From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/i915/ringbuffer: Delay after invalidating gen6+ xcs
Date: Thu, 30 Aug 2018 18:45:45 +0300 [thread overview]
Message-ID: <20180830154545.GN5565@intel.com> (raw)
In-Reply-To: <20180828170429.28492-2-chris@chris-wilson.co.uk>
On Tue, Aug 28, 2018 at 06:04:29PM +0100, Chris Wilson wrote:
> During stress testing of full-ppgtt (on Baytrail at least), we found
> that the invalidation around a context/mm switch was insufficient (writes
> would go astray). Adding a second MI_FLUSH_DW barrier prevents this, but
> it is unclear as to whether this is merely a delaying tactic or if it is
> truly serialising with the TLB invalidation. Either way, it is
> empirically required.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107715
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
> drivers/gpu/drm/i915/intel_ringbuffer.c | 101 +++++++++++-------------
> 1 file changed, 47 insertions(+), 54 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index d40f55a8dc34..952b6269bab0 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1944,40 +1944,62 @@ static void gen6_bsd_submit_request(struct i915_request *request)
> intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> }
>
> -static int gen6_bsd_ring_flush(struct i915_request *rq, u32 mode)
> +static int emit_mi_flush_dw(struct i915_request *rq, u32 mode, u32 invflags)
> {
> u32 cmd, *cs;
>
> - cs = intel_ring_begin(rq, 4);
> - if (IS_ERR(cs))
> - return PTR_ERR(cs);
> + do {
> + cs = intel_ring_begin(rq, 4);
> + if (IS_ERR(cs))
> + return PTR_ERR(cs);
>
> - cmd = MI_FLUSH_DW;
> + cmd = MI_FLUSH_DW;
>
> - /* We always require a command barrier so that subsequent
> - * commands, such as breadcrumb interrupts, are strictly ordered
> - * wrt the contents of the write cache being flushed to memory
> - * (and thus being coherent from the CPU).
> - */
> - cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
> + /*
> + * We always require a command barrier so that subsequent
> + * commands, such as breadcrumb interrupts, are strictly ordered
> + * wrt the contents of the write cache being flushed to memory
> + * (and thus being coherent from the CPU).
> + */
> + cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
>
> - /*
> - * Bspec vol 1c.5 - video engine command streamer:
> - * "If ENABLED, all TLBs will be invalidated once the flush
> - * operation is complete. This bit is only valid when the
> - * Post-Sync Operation field is a value of 1h or 3h."
> - */
> - if (mode & EMIT_INVALIDATE)
> - cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD;
> + /*
> + * Bspec vol 1c.3 - blitter engine command streamer:
> + * "If ENABLED, all TLBs will be invalidated once the flush
> + * operation is complete. This bit is only valid when the
> + * Post-Sync Operation field is a value of 1h or 3h."
> + */
> + if (mode & EMIT_INVALIDATE)
> + cmd |= invflags;
> + *cs++ = cmd;
> + *cs++ = I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT;
> + *cs++ = 0;
> + *cs++ = MI_NOOP;
> + intel_ring_advance(rq, cs);
> +
> + /*
> + * Not only do we need a full barrier (post-sync write) after
> + * invalidating the TLBs, but we need to wait a little bit
> + * longer. Whether this is merely delaying us, or the
> + * subsequent flush is a key part of serialising with the
> + * post-sync op, this extra pass appears vital before a
> + * mm switch!
> + */
> + if (!(mode & EMIT_INVALIDATE))
> + break;
> +
> + mode &= ~EMIT_INVALIDATE;
> + } while (1);
I find the loop thingy somewhat hard to read. I'd probably have
written it as something like
{
if (mode & EMIT_INVALIDATE)
mi_flush(INVALIDATE_TLB);
mi_flush(0);
}
Either way it seems to do what it says so
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>
> - *cs++ = cmd;
> - *cs++ = I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT;
> - *cs++ = 0;
> - *cs++ = MI_NOOP;
> - intel_ring_advance(rq, cs);
> return 0;
> }
>
> +static int gen6_bsd_ring_flush(struct i915_request *rq, u32 mode)
> +{
> + return emit_mi_flush_dw(rq, mode,
> + MI_INVALIDATE_TLB | MI_INVALIDATE_BSD);
> +}
> +
> static int
> hsw_emit_bb_start(struct i915_request *rq,
> u64 offset, u32 len,
> @@ -2022,36 +2044,7 @@ gen6_emit_bb_start(struct i915_request *rq,
>
> static int gen6_ring_flush(struct i915_request *rq, u32 mode)
> {
> - u32 cmd, *cs;
> -
> - cs = intel_ring_begin(rq, 4);
> - if (IS_ERR(cs))
> - return PTR_ERR(cs);
> -
> - cmd = MI_FLUSH_DW;
> -
> - /* We always require a command barrier so that subsequent
> - * commands, such as breadcrumb interrupts, are strictly ordered
> - * wrt the contents of the write cache being flushed to memory
> - * (and thus being coherent from the CPU).
> - */
> - cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
> -
> - /*
> - * Bspec vol 1c.3 - blitter engine command streamer:
> - * "If ENABLED, all TLBs will be invalidated once the flush
> - * operation is complete. This bit is only valid when the
> - * Post-Sync Operation field is a value of 1h or 3h."
> - */
> - if (mode & EMIT_INVALIDATE)
> - cmd |= MI_INVALIDATE_TLB;
> - *cs++ = cmd;
> - *cs++ = I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT;
> - *cs++ = 0;
> - *cs++ = MI_NOOP;
> - intel_ring_advance(rq, cs);
> -
> - return 0;
> + return emit_mi_flush_dw(rq, mode, MI_INVALIDATE_TLB);
> }
>
> static void intel_ring_init_semaphores(struct drm_i915_private *dev_priv,
> --
> 2.18.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Ville Syrjälä
Intel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2018-08-30 15:45 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-28 17:04 Missing byt full-ppgtt massage Chris Wilson
2018-08-28 17:04 ` [PATCH] drm/i915/ringbuffer: Delay after invalidating gen6+ xcs Chris Wilson
2018-08-30 15:45 ` Ville Syrjälä [this message]
2018-08-28 18:11 ` ✓ Fi.CI.BAT: success for " Patchwork
2018-08-28 20:14 ` Missing byt full-ppgtt massage Rodrigo Vivi
2018-08-28 20:17 ` Chris Wilson
2018-08-28 21:51 ` ✓ Fi.CI.IGT: success for drm/i915/ringbuffer: Delay after invalidating gen6+ xcs Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180830154545.GN5565@intel.com \
--to=ville.syrjala@linux.intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.