* [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
@ 2018-11-05  9:43 Chris Wilson
  2018-11-05 10:23 ` ✓ Fi.CI.BAT: success for " Patchwork
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Chris Wilson @ 2018-11-05  9:43 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson, stable

Exercising the GPU reloc path strenuously revealed an issue where the
updated relocations (from MI_STORE_DWORD_IMM) were not being observed
upon execution. After some experiments with adding pipecontrols (a lot
of pipecontrols (32), as gen4/5 do not have a bit to wait on earlier pipe
controls or even the current one), it was discovered that we merely
needed to delay the EMIT_INVALIDATE by several flushes. It is important
to note that, contrary to what one might first expect, it is the
EMIT_INVALIDATE and not the EMIT_FLUSH that needs the delay. That is,
the delay is required for the TLB invalidation to take effect (one
presumes to purge any CS buffers), rather than being a delay after
flushing to ensure the writes have landed before triggering invalidation.

Testcase: igt/gem_tiled_fence_blits
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 38 +++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index b8a7a014d46d..87eebc13c0d8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -91,6 +91,7 @@ static int
 gen4_render_ring_flush(struct i915_request *rq, u32 mode)
 {
 	u32 cmd, *cs;
+	int i;
 
 	/*
 	 * read/write caches:
@@ -127,12 +128,45 @@ gen4_render_ring_flush(struct i915_request *rq, u32 mode)
 			cmd |= MI_INVALIDATE_ISP;
 	}
 
-	cs = intel_ring_begin(rq, 2);
+	i = 2;
+	if (mode & EMIT_INVALIDATE)
+		i += 20;
+
+	cs = intel_ring_begin(rq, i);
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
 	*cs++ = cmd;
-	*cs++ = MI_NOOP;
+
+	/*
+	 * A random delay to let the CS invalidate take effect? Without this
+	 * delay, the GPU relocation path fails as the CS does not see
+	 * the updated contents. Just as important, if we apply the flushes
+	 * to the EMIT_FLUSH branch (i.e. immediately after the relocation
+	 * write and before the invalidate on the next batch), the relocations
+	 * still fail. This implies that it is a delay following invalidation
+	 * that is required to reset the caches as opposed to a delay to
+	 * ensure the memory is written.
+	 */
+	if (mode & EMIT_INVALIDATE) {
+		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
+		*cs++ = i915_ggtt_offset(rq->engine->scratch) |
+			PIPE_CONTROL_GLOBAL_GTT;
+		*cs++ = 0;
+		*cs++ = 0;
+
+		for (i = 0; i < 12; i++)
+			*cs++ = MI_FLUSH;
+
+		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
+		*cs++ = i915_ggtt_offset(rq->engine->scratch) |
+			PIPE_CONTROL_GLOBAL_GTT;
+		*cs++ = 0;
+		*cs++ = 0;
+	}
+
+	*cs++ = cmd;
+
 	intel_ring_advance(rq, cs);
 
 	return 0;
-- 
2.19.1
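
For readers checking the dword accounting in the hunk above, here is a
minimal userspace sketch, not driver code. It assumes placeholder opcode
values and hypothetical sketch_* names (none of them are the real i915
encodings or helpers) and only mirrors the emission order of the patch:
cmd, one 4-dword pipe control, 12 flushes, a second 4-dword pipe control,
then cmd again. The point is that the 2 + 20 dwords reserved via
intel_ring_begin() match what is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder opcodes: illustrative values only, NOT real i915 encodings. */
enum {
	SKETCH_CMD          = 0x1,
	SKETCH_PIPE_CONTROL = 0x2,
	SKETCH_MI_FLUSH     = 0x3,
};

/* Hypothetical mode bits standing in for EMIT_INVALIDATE / EMIT_FLUSH. */
#define SKETCH_EMIT_INVALIDATE (1u << 0)
#define SKETCH_EMIT_FLUSH      (1u << 1)

/* Dwords reserved by the patch: 2 for the base case, +20 for the delay. */
static size_t sketch_reserved_dwords(uint32_t mode)
{
	size_t n = 2;

	if (mode & SKETCH_EMIT_INVALIDATE)
		n += 20;
	return n;
}

/* One 4-dword pipe control packet (header, address, two data dwords). */
static uint32_t *sketch_emit_pipe_control(uint32_t *cs)
{
	*cs++ = SKETCH_PIPE_CONTROL;
	*cs++ = 0; /* scratch address would go here */
	*cs++ = 0;
	*cs++ = 0;
	return cs;
}

/* Mirrors the emission order of the patch; returns the dwords written. */
static size_t sketch_emit_flush(uint32_t *buf, size_t len, uint32_t mode)
{
	uint32_t *cs = buf;
	int i;

	/* The caller must provide at least the reserved space. */
	assert(len >= sketch_reserved_dwords(mode));

	*cs++ = SKETCH_CMD;
	if (mode & SKETCH_EMIT_INVALIDATE) {
		cs = sketch_emit_pipe_control(cs);
		for (i = 0; i < 12; i++)
			*cs++ = SKETCH_MI_FLUSH;
		cs = sketch_emit_pipe_control(cs);
	}
	*cs++ = SKETCH_CMD;

	return (size_t)(cs - buf);
}
```

With the invalidate bit set the sketch writes 1 + 4 + 12 + 4 + 1 dwords;
without it, only the two cmd dwords are written, the second taking the
slot previously filled by MI_NOOP.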

* ✓ Fi.CI.BAT: success for drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
  2018-11-05  9:43 [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5 Chris Wilson
@ 2018-11-05 10:23 ` Patchwork
  2018-11-05 11:13 ` ✓ Fi.CI.IGT: " Patchwork
  2018-11-07 15:04   ` Ville Syrjälä
  2 siblings, 0 replies; 6+ messages in thread
From: Patchwork @ 2018-11-05 10:23 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
URL   : https://patchwork.freedesktop.org/series/52013/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_5085 -> Patchwork_10722 =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_10722 need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_10722, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/52013/revisions/1/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_10722:

  === IGT changes ===

    ==== Warnings ====

    igt@drv_selftest@live_guc:
      fi-icl-u:           PASS -> SKIP +2

    
== Known issues ==

  Here are the changes found in Patchwork_10722 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
      fi-blb-e6850:       PASS -> INCOMPLETE (fdo#107718)

    
    ==== Possible fixes ====

    igt@gem_cpu_reloc@basic:
      fi-skl-6700hq:      INCOMPLETE (fdo#108011) -> PASS

    igt@gem_exec_suspend@basic-s3:
      fi-glk-dsi:         FAIL (fdo#103375) -> PASS

    igt@kms_frontbuffer_tracking@basic:
      fi-hsw-peppy:       DMESG-WARN (fdo#102614) -> PASS
      fi-icl-u:           FAIL (fdo#103167) -> PASS

    igt@kms_pipe_crc_basic@read-crc-pipe-b-frame-sequence:
      fi-byt-clapper:     FAIL (fdo#103191, fdo#107362) -> PASS

    
    ==== Warnings ====

    igt@drv_selftest@live_contexts:
      fi-icl-u:           DMESG-FAIL (fdo#108569) -> INCOMPLETE (fdo#108315, fdo#108535)

    
  fdo#102614 https://bugs.freedesktop.org/show_bug.cgi?id=102614
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191
  fdo#103375 https://bugs.freedesktop.org/show_bug.cgi?id=103375
  fdo#107362 https://bugs.freedesktop.org/show_bug.cgi?id=107362
  fdo#107718 https://bugs.freedesktop.org/show_bug.cgi?id=107718
  fdo#108011 https://bugs.freedesktop.org/show_bug.cgi?id=108011
  fdo#108315 https://bugs.freedesktop.org/show_bug.cgi?id=108315
  fdo#108535 https://bugs.freedesktop.org/show_bug.cgi?id=108535
  fdo#108569 https://bugs.freedesktop.org/show_bug.cgi?id=108569


== Participating hosts (45 -> 44) ==

  Additional (4): fi-kbl-7560u fi-gdg-551 fi-bwr-2160 fi-pnv-d510 
  Missing    (5): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 


== Build changes ==

    * Linux: CI_DRM_5085 -> Patchwork_10722

  CI_DRM_5085: 6ae61ee5db4af12c0b21bf39e0400ccf024187c4 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4706: 5421c73a7db3cfaa85ab24325fe6e898cbb27fb3 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_10722: 2c7f605ac0d85a04c93ff6866fcbe7d07dead990 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

2c7f605ac0d8 drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10722/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* ✓ Fi.CI.IGT: success for drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
  2018-11-05  9:43 [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5 Chris Wilson
  2018-11-05 10:23 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2018-11-05 11:13 ` Patchwork
  2018-11-07 15:04   ` Ville Syrjälä
  2 siblings, 0 replies; 6+ messages in thread
From: Patchwork @ 2018-11-05 11:13 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
URL   : https://patchwork.freedesktop.org/series/52013/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_5085_full -> Patchwork_10722_full =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_10722_full need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_10722_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_10722_full:

  === IGT changes ===

    ==== Warnings ====

    igt@pm_rc6_residency@rc6-accuracy:
      shard-snb:          PASS -> SKIP

    
== Known issues ==

  Here are the changes found in Patchwork_10722_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@gem_exec_reuse@baggage:
      shard-apl:          PASS -> INCOMPLETE (fdo#103927)

    igt@gem_exec_schedule@pi-ringfull-bsd:
      shard-skl:          NOTRUN -> FAIL (fdo#103158) +1

    igt@gem_softpin@noreloc-s3:
      shard-skl:          NOTRUN -> INCOMPLETE (fdo#104108, fdo#107773)

    igt@kms_busy@extended-modeset-hang-newfb-render-a:
      shard-skl:          NOTRUN -> DMESG-WARN (fdo#107956) +4

    igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-b:
      shard-kbl:          PASS -> DMESG-WARN (fdo#107956)

    igt@kms_busy@extended-pageflip-hang-newfb-render-a:
      shard-apl:          PASS -> DMESG-WARN (fdo#107956)

    igt@kms_cursor_crc@cursor-128x42-onscreen:
      shard-skl:          NOTRUN -> FAIL (fdo#103232)

    igt@kms_cursor_crc@cursor-256x256-dpms:
      shard-glk:          PASS -> FAIL (fdo#103232) +2

    igt@kms_fbcon_fbt@psr:
      shard-skl:          NOTRUN -> FAIL (fdo#107882)

    igt@kms_flip_tiling@flip-y-tiled:
      shard-skl:          NOTRUN -> FAIL (fdo#108303)

    igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-fullscreen:
      shard-apl:          PASS -> FAIL (fdo#103167)

    igt@kms_frontbuffer_tracking@fbc-stridechange:
      shard-skl:          NOTRUN -> FAIL (fdo#105683)

    igt@kms_plane@plane-position-covered-pipe-b-planes:
      shard-glk:          PASS -> FAIL (fdo#103166) +1

    igt@kms_plane_alpha_blend@pipe-a-constant-alpha-max:
      shard-skl:          NOTRUN -> FAIL (fdo#108145) +4

    igt@kms_plane_alpha_blend@pipe-b-alpha-basic:
      shard-skl:          NOTRUN -> FAIL (fdo#107815, fdo#108145) +1

    igt@kms_properties@connector-properties-atomic:
      shard-skl:          NOTRUN -> FAIL (fdo#108642)

    igt@kms_setmode@basic:
      shard-skl:          NOTRUN -> FAIL (fdo#99912)

    igt@kms_sysfs_edid_timing:
      shard-skl:          NOTRUN -> FAIL (fdo#100047)

    
    ==== Possible fixes ====

    igt@gem_exec_reloc@basic-wc-cpu-noreloc:
      shard-snb:          INCOMPLETE (fdo#105411) -> PASS

    igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-c:
      shard-hsw:          DMESG-WARN (fdo#107956) -> PASS

    igt@kms_busy@extended-pageflip-modeset-hang-oldfb-render-a:
      shard-snb:          DMESG-WARN (fdo#107956) -> PASS

    igt@kms_cursor_crc@cursor-128x42-onscreen:
      shard-glk:          FAIL (fdo#103232) -> PASS +1

    igt@kms_cursor_crc@cursor-64x64-offscreen:
      shard-skl:          FAIL (fdo#103232) -> PASS +1

    igt@kms_draw_crc@draw-method-xrgb2101010-pwrite-ytiled:
      shard-skl:          FAIL (fdo#103184) -> PASS

    igt@kms_draw_crc@draw-method-xrgb8888-pwrite-untiled:
      shard-skl:          FAIL (fdo#108472) -> PASS

    igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-shrfb-draw-mmap-cpu:
      shard-skl:          FAIL (fdo#105682) -> PASS

    igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-mmap-cpu:
      shard-apl:          FAIL (fdo#103167) -> PASS

    igt@kms_frontbuffer_tracking@fbc-1p-rte:
      shard-glk:          FAIL (fdo#105682, fdo#103167) -> PASS

    igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-onoff:
      shard-glk:          FAIL (fdo#103167) -> PASS +1

    igt@kms_frontbuffer_tracking@fbc-rgb101010-draw-mmap-wc:
      shard-skl:          FAIL (fdo#105682, fdo#103167) -> PASS

    igt@kms_plane_alpha_blend@pipe-a-coverage-7efc:
      shard-skl:          FAIL (fdo#107815, fdo#108145) -> PASS

    igt@kms_plane_multiple@atomic-pipe-a-tiling-x:
      shard-apl:          FAIL (fdo#103166) -> PASS +2

    igt@kms_plane_multiple@atomic-pipe-a-tiling-yf:
      shard-glk:          FAIL (fdo#103166) -> PASS

    igt@perf@oa-exponents:
      shard-glk:          FAIL (fdo#105483) -> PASS

    igt@pm_rpm@system-suspend:
      shard-skl:          INCOMPLETE (fdo#107807, fdo#104108, fdo#107773) -> PASS

    
  fdo#100047 https://bugs.freedesktop.org/show_bug.cgi?id=100047
  fdo#103158 https://bugs.freedesktop.org/show_bug.cgi?id=103158
  fdo#103166 https://bugs.freedesktop.org/show_bug.cgi?id=103166
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#103184 https://bugs.freedesktop.org/show_bug.cgi?id=103184
  fdo#103232 https://bugs.freedesktop.org/show_bug.cgi?id=103232
  fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
  fdo#104108 https://bugs.freedesktop.org/show_bug.cgi?id=104108
  fdo#105411 https://bugs.freedesktop.org/show_bug.cgi?id=105411
  fdo#105483 https://bugs.freedesktop.org/show_bug.cgi?id=105483
  fdo#105682 https://bugs.freedesktop.org/show_bug.cgi?id=105682
  fdo#105683 https://bugs.freedesktop.org/show_bug.cgi?id=105683
  fdo#107773 https://bugs.freedesktop.org/show_bug.cgi?id=107773
  fdo#107807 https://bugs.freedesktop.org/show_bug.cgi?id=107807
  fdo#107815 https://bugs.freedesktop.org/show_bug.cgi?id=107815
  fdo#107882 https://bugs.freedesktop.org/show_bug.cgi?id=107882
  fdo#107956 https://bugs.freedesktop.org/show_bug.cgi?id=107956
  fdo#108145 https://bugs.freedesktop.org/show_bug.cgi?id=108145
  fdo#108303 https://bugs.freedesktop.org/show_bug.cgi?id=108303
  fdo#108472 https://bugs.freedesktop.org/show_bug.cgi?id=108472
  fdo#108642 https://bugs.freedesktop.org/show_bug.cgi?id=108642
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912


== Participating hosts (6 -> 6) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_5085 -> Patchwork_10722

  CI_DRM_5085: 6ae61ee5db4af12c0b21bf39e0400ccf024187c4 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4706: 5421c73a7db3cfaa85ab24325fe6e898cbb27fb3 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_10722: 2c7f605ac0d85a04c93ff6866fcbe7d07dead990 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10722/shards.html

* Re: [Intel-gfx] [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
  2018-11-05  9:43 [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5 Chris Wilson
@ 2018-11-07 15:04   ` Ville Syrjälä
  2018-11-05 11:13 ` ✓ Fi.CI.IGT: " Patchwork
  2018-11-07 15:04   ` Ville Syrjälä
  2 siblings, 0 replies; 6+ messages in thread
From: Ville Syrjälä @ 2018-11-07 15:04 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, stable

On Mon, Nov 05, 2018 at 09:43:05AM +0000, Chris Wilson wrote:
> Exercising the gpu reloc path strenuously revealed an issue where the
> updated relocations (from MI_STORE_DWORD_IMM) were not being observed
> upon execution. After some experiments with adding pipecontrols (a lot
> of pipecontrols (32) as gen4/5 do not have a bit to wait on earlier pipe
> controls or even the current on), it was discovered that we merely
> needed to delay the EMIT_INVALIDATE by several flushes. It is important
> to note that it is the EMIT_INVALIDATE as opposed to the EMIT_FLUSH that
> needs the delay as opposed to what one might first expect -- that the
> delay is required for the TLB invalidation to take effect (one presumes
> to purge any CS buffers) as opposed to a delay after flushing to ensure
> the writes have landed before triggering invalidation.
> 
> Testcase: igt/gem_tiled_fence_blits
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: stable@vger.kernel.org
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 38 +++++++++++++++++++++++--
>  1 file changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index b8a7a014d46d..87eebc13c0d8 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -91,6 +91,7 @@ static int
>  gen4_render_ring_flush(struct i915_request *rq, u32 mode)
>  {
>  	u32 cmd, *cs;
> +	int i;
>  
>  	/*
>  	 * read/write caches:
> @@ -127,12 +128,45 @@ gen4_render_ring_flush(struct i915_request *rq, u32 mode)
>  			cmd |= MI_INVALIDATE_ISP;
>  	}
>  
> -	cs = intel_ring_begin(rq, 2);
> +	i = 2;
> +	if (mode & EMIT_INVALIDATE)
> +		i += 20;
> +
> +	cs = intel_ring_begin(rq, i);
>  	if (IS_ERR(cs))
>  		return PTR_ERR(cs);
>  
>  	*cs++ = cmd;
> -	*cs++ = MI_NOOP;
> +
> +	/*
> +	 * A random delay to let the CS invalidate take effect? Without this
> +	 * delay, the GPU relocation path fails as the CS does not see
> +	 * the updated contents. Just as important, if we apply the flushes
> +	 * to the EMIT_FLUSH branch (i.e. immediately after the relocation
> +	 * write and before the invalidate on the next batch), the relocations
> +	 * still fail. This implies that is a delay following invalidation
> +	 * that is required to reset the caches as opposed to a delay to
> +	 * ensure the memory is written.
> +	 */
> +	if (mode & EMIT_INVALIDATE) {
> +		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> +		*cs++ = i915_ggtt_offset(rq->engine->scratch) |
> +			PIPE_CONTROL_GLOBAL_GTT;
> +		*cs++ = 0;
> +		*cs++ = 0;
> +
> +		for (i = 0; i < 12; i++)
> +			*cs++ = MI_FLUSH;
> +
> +		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> +		*cs++ = i915_ggtt_offset(rq->engine->scratch) |
> +			PIPE_CONTROL_GLOBAL_GTT;
> +		*cs++ = 0;
> +		*cs++ = 0;
> +	}

This smells a lot like the snb a/b w/a, except there the spec says to
use 8 STORE_DWORDS. I suppose the choice of a specific command isn't
critical, and it's just a matter of stuffing the pipeline with something
that takes long enough to let the TLB invalidate finish?

Anyways, patch itself seems as reasonable as one might expect for an
issue like this.

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

> +
> +	*cs++ = cmd;
> +
>  	intel_ring_advance(rq, cs);
>  
>  	return 0;
> -- 
> 2.19.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel


* Re: [Intel-gfx] [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
  2018-11-07 15:04   ` Ville Syrjälä
  (?)
@ 2018-11-07 15:12   ` Chris Wilson
  -1 siblings, 0 replies; 6+ messages in thread
From: Chris Wilson @ 2018-11-07 15:12 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx, stable

Quoting Ville Syrjälä (2018-11-07 15:04:24)
> On Mon, Nov 05, 2018 at 09:43:05AM +0000, Chris Wilson wrote:
> > Exercising the gpu reloc path strenuously revealed an issue where the
> > updated relocations (from MI_STORE_DWORD_IMM) were not being observed
> > upon execution. After some experiments with adding pipecontrols (a lot
> > of pipecontrols (32) as gen4/5 do not have a bit to wait on earlier pipe
> > controls or even the current on), it was discovered that we merely
> > needed to delay the EMIT_INVALIDATE by several flushes. It is important
> > to note that it is the EMIT_INVALIDATE as opposed to the EMIT_FLUSH that
> > needs the delay as opposed to what one might first expect -- that the
> > delay is required for the TLB invalidation to take effect (one presumes
> > to purge any CS buffers) as opposed to a delay after flushing to ensure
> > the writes have landed before triggering invalidation.
> > 
> > Testcase: igt/gem_tiled_fence_blits
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: stable@vger.kernel.org
> > ---
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 38 +++++++++++++++++++++++--
> >  1 file changed, 36 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index b8a7a014d46d..87eebc13c0d8 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -91,6 +91,7 @@ static int
> >  gen4_render_ring_flush(struct i915_request *rq, u32 mode)
> >  {
> >       u32 cmd, *cs;
> > +     int i;
> >  
> >       /*
> >        * read/write caches:
> > @@ -127,12 +128,45 @@ gen4_render_ring_flush(struct i915_request *rq, u32 mode)
> >                       cmd |= MI_INVALIDATE_ISP;
> >       }
> >  
> > -     cs = intel_ring_begin(rq, 2);
> > +     i = 2;
> > +     if (mode & EMIT_INVALIDATE)
> > +             i += 20;
> > +
> > +     cs = intel_ring_begin(rq, i);
> >       if (IS_ERR(cs))
> >               return PTR_ERR(cs);
> >  
> >       *cs++ = cmd;
> > -     *cs++ = MI_NOOP;
> > +
> > +     /*
> > +      * A random delay to let the CS invalidate take effect? Without this
> > +      * delay, the GPU relocation path fails as the CS does not see
> > +      * the updated contents. Just as important, if we apply the flushes
> > +      * to the EMIT_FLUSH branch (i.e. immediately after the relocation
> > +      * write and before the invalidate on the next batch), the relocations
> > +      * still fail. This implies that is a delay following invalidation
> > +      * that is required to reset the caches as opposed to a delay to
> > +      * ensure the memory is written.
> > +      */
> > +     if (mode & EMIT_INVALIDATE) {
> > +             *cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> > +             *cs++ = i915_ggtt_offset(rq->engine->scratch) |
> > +                     PIPE_CONTROL_GLOBAL_GTT;
> > +             *cs++ = 0;
> > +             *cs++ = 0;
> > +
> > +             for (i = 0; i < 12; i++)
> > +                     *cs++ = MI_FLUSH;
> > +
> > +             *cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> > +             *cs++ = i915_ggtt_offset(rq->engine->scratch) |
> > +                     PIPE_CONTROL_GLOBAL_GTT;
> > +             *cs++ = 0;
> > +             *cs++ = 0;
> > +     }
> 
> This smells a lot like the snb a/b w/a, except there the spec says to
> use 8 STORE_DWORDS.

Yeah, the similarity wasn't lost, except that w/a is to cover the
coherency aspect of the writes not being flushed. This feels a bit
fishier in that the experiments indicate it's an issue on the
invalidation path as opposed to flushing the writes.

And the other w/a to use umpteen pipecontrols to get around the lack of
PIPE_CONTROL_FLUSH.

> I suppose the choice of a specific command isn't
> critical, and it's just a matter of stuffing the pipeline with something
> that's takes long enough to let the TLB invalidate finish?

Except the MI_FLUSH are more effective in fewer number than
PIPE_CONTROLs. Probably because each one translates to a few pipe
controls or something, quite heavy.

> Anyways, patch itself seems as reasonable as one might expect for an
> issue like this.

As nasty as one would expect.

For the record, we are not entirely out of danger. gem_exec_whisper
continues to indicate a problem, but one step at a time. (I haven't yet
found quite what's upsetting it, except that if we do each batch
synchronously and verify each one, it's happy.)
-Chris
