All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org, stable@vger.kernel.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
Date: Wed, 7 Nov 2018 17:04:24 +0200	[thread overview]
Message-ID: <20181107150424.GK9144@intel.com> (raw)
In-Reply-To: <20181105094305.5767-1-chris@chris-wilson.co.uk>

On Mon, Nov 05, 2018 at 09:43:05AM +0000, Chris Wilson wrote:
> Exercising the gpu reloc path strenuously revealed an issue where the
> updated relocations (from MI_STORE_DWORD_IMM) were not being observed
> upon execution. After some experiments with adding pipecontrols (a lot
> of pipecontrols (32) as gen4/5 do not have a bit to wait on earlier pipe
> controls or even the current on), it was discovered that we merely
> needed to delay the EMIT_INVALIDATE by several flushes. It is important
> to note that it is the EMIT_INVALIDATE as opposed to the EMIT_FLUSH that
> needs the delay as opposed to what one might first expect -- that the
> delay is required for the TLB invalidation to take effect (one presumes
> to purge any CS buffers) as opposed to a delay after flushing to ensure
> the writes have landed before triggering invalidation.
> 
> Testcase: igt/gem_tiled_fence_blits
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: stable@vger.kernel.org
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 38 +++++++++++++++++++++++--
>  1 file changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index b8a7a014d46d..87eebc13c0d8 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -91,6 +91,7 @@ static int
>  gen4_render_ring_flush(struct i915_request *rq, u32 mode)
>  {
>  	u32 cmd, *cs;
> +	int i;
>  
>  	/*
>  	 * read/write caches:
> @@ -127,12 +128,45 @@ gen4_render_ring_flush(struct i915_request *rq, u32 mode)
>  			cmd |= MI_INVALIDATE_ISP;
>  	}
>  
> -	cs = intel_ring_begin(rq, 2);
> +	i = 2;
> +	if (mode & EMIT_INVALIDATE)
> +		i += 20;
> +
> +	cs = intel_ring_begin(rq, i);
>  	if (IS_ERR(cs))
>  		return PTR_ERR(cs);
>  
>  	*cs++ = cmd;
> -	*cs++ = MI_NOOP;
> +
> +	/*
> +	 * A random delay to let the CS invalidate take effect? Without this
> +	 * delay, the GPU relocation path fails as the CS does not see
> +	 * the updated contents. Just as important, if we apply the flushes
> +	 * to the EMIT_FLUSH branch (i.e. immediately after the relocation
> +	 * write and before the invalidate on the next batch), the relocations
> +	 * still fail. This implies that is a delay following invalidation
> +	 * that is required to reset the caches as opposed to a delay to
> +	 * ensure the memory is written.
> +	 */
> +	if (mode & EMIT_INVALIDATE) {
> +		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> +		*cs++ = i915_ggtt_offset(rq->engine->scratch) |
> +			PIPE_CONTROL_GLOBAL_GTT;
> +		*cs++ = 0;
> +		*cs++ = 0;
> +
> +		for (i = 0; i < 12; i++)
> +			*cs++ = MI_FLUSH;
> +
> +		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> +		*cs++ = i915_ggtt_offset(rq->engine->scratch) |
> +			PIPE_CONTROL_GLOBAL_GTT;
> +		*cs++ = 0;
> +		*cs++ = 0;
> +	}

This smells a lot like the snb a/b w/a, except there the spec says to
use 8 STORE_DWORDS. I suppose the choice of a specific command isn't
critical, and it's just a matter of stuffing the pipeline with something
that's takes long enough to let the TLB invalidate finish?

Anyways, patch itself seems as reasonable as one might expect for an
issue like this.

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

> +
> +	*cs++ = cmd;
> +
>  	intel_ring_advance(rq, cs);
>  
>  	return 0;
> -- 
> 2.19.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel

WARNING: multiple messages have this Message-ID (diff)
From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org, stable@vger.kernel.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
Date: Wed, 7 Nov 2018 17:04:24 +0200	[thread overview]
Message-ID: <20181107150424.GK9144@intel.com> (raw)
In-Reply-To: <20181105094305.5767-1-chris@chris-wilson.co.uk>

On Mon, Nov 05, 2018 at 09:43:05AM +0000, Chris Wilson wrote:
> Exercising the gpu reloc path strenuously revealed an issue where the
> updated relocations (from MI_STORE_DWORD_IMM) were not being observed
> upon execution. After some experiments with adding pipecontrols (a lot
> of pipecontrols (32) as gen4/5 do not have a bit to wait on earlier pipe
> controls or even the current on), it was discovered that we merely
> needed to delay the EMIT_INVALIDATE by several flushes. It is important
> to note that it is the EMIT_INVALIDATE as opposed to the EMIT_FLUSH that
> needs the delay as opposed to what one might first expect -- that the
> delay is required for the TLB invalidation to take effect (one presumes
> to purge any CS buffers) as opposed to a delay after flushing to ensure
> the writes have landed before triggering invalidation.
> 
> Testcase: igt/gem_tiled_fence_blits
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: stable@vger.kernel.org
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 38 +++++++++++++++++++++++--
>  1 file changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index b8a7a014d46d..87eebc13c0d8 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -91,6 +91,7 @@ static int
>  gen4_render_ring_flush(struct i915_request *rq, u32 mode)
>  {
>  	u32 cmd, *cs;
> +	int i;
>  
>  	/*
>  	 * read/write caches:
> @@ -127,12 +128,45 @@ gen4_render_ring_flush(struct i915_request *rq, u32 mode)
>  			cmd |= MI_INVALIDATE_ISP;
>  	}
>  
> -	cs = intel_ring_begin(rq, 2);
> +	i = 2;
> +	if (mode & EMIT_INVALIDATE)
> +		i += 20;
> +
> +	cs = intel_ring_begin(rq, i);
>  	if (IS_ERR(cs))
>  		return PTR_ERR(cs);
>  
>  	*cs++ = cmd;
> -	*cs++ = MI_NOOP;
> +
> +	/*
> +	 * A random delay to let the CS invalidate take effect? Without this
> +	 * delay, the GPU relocation path fails as the CS does not see
> +	 * the updated contents. Just as important, if we apply the flushes
> +	 * to the EMIT_FLUSH branch (i.e. immediately after the relocation
> +	 * write and before the invalidate on the next batch), the relocations
> +	 * still fail. This implies that is a delay following invalidation
> +	 * that is required to reset the caches as opposed to a delay to
> +	 * ensure the memory is written.
> +	 */
> +	if (mode & EMIT_INVALIDATE) {
> +		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> +		*cs++ = i915_ggtt_offset(rq->engine->scratch) |
> +			PIPE_CONTROL_GLOBAL_GTT;
> +		*cs++ = 0;
> +		*cs++ = 0;
> +
> +		for (i = 0; i < 12; i++)
> +			*cs++ = MI_FLUSH;
> +
> +		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> +		*cs++ = i915_ggtt_offset(rq->engine->scratch) |
> +			PIPE_CONTROL_GLOBAL_GTT;
> +		*cs++ = 0;
> +		*cs++ = 0;
> +	}

This smells a lot like the snb a/b w/a, except there the spec says to
use 8 STORE_DWORDS. I suppose the choice of a specific command isn't
critical, and it's just a matter of stuffing the pipeline with something
that's takes long enough to let the TLB invalidate finish?

Anyways, patch itself seems as reasonable as one might expect for an
issue like this.

Reviewed-by: Ville Syrj�l� <ville.syrjala@linux.intel.com>

> +
> +	*cs++ = cmd;
> +
>  	intel_ring_advance(rq, cs);
>  
>  	return 0;
> -- 
> 2.19.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrj�l�
Intel

  parent reply	other threads:[~2018-11-07 15:04 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-05  9:43 [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5 Chris Wilson
2018-11-05 10:23 ` ✓ Fi.CI.BAT: success for " Patchwork
2018-11-05 11:13 ` ✓ Fi.CI.IGT: " Patchwork
2018-11-07 15:04 ` Ville Syrjälä [this message]
2018-11-07 15:04   ` [Intel-gfx] [PATCH] " Ville Syrjälä
2018-11-07 15:12   ` Chris Wilson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181107150424.GK9144@intel.com \
    --to=ville.syrjala@linux.intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.