* [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
@ 2018-11-05 9:43 Chris Wilson
2018-11-05 10:23 ` ✓ Fi.CI.BAT: success for " Patchwork
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Chris Wilson @ 2018-11-05 9:43 UTC
To: intel-gfx; +Cc: Chris Wilson, stable
Exercising the GPU reloc path strenuously revealed an issue where the
updated relocations (from MI_STORE_DWORD_IMM) were not being observed
upon execution. After some experiments with adding pipecontrols (a lot
of pipecontrols, 32, as gen4/5 do not have a bit to wait on earlier
pipecontrols or even the current one), it was discovered that we merely
needed to delay the EMIT_INVALIDATE by several flushes. It is important
to note that it is the EMIT_INVALIDATE, not the EMIT_FLUSH, that needs
the delay. Contrary to what one might first expect, the delay is needed
to let the TLB invalidation take effect (presumably to purge any CS
buffers), not to ensure after flushing that the writes have landed
before the invalidation is triggered.
Testcase: igt/gem_tiled_fence_blits
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
---
drivers/gpu/drm/i915/intel_ringbuffer.c | 38 +++++++++++++++++++++++--
1 file changed, 36 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index b8a7a014d46d..87eebc13c0d8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -91,6 +91,7 @@ static int
gen4_render_ring_flush(struct i915_request *rq, u32 mode)
{
u32 cmd, *cs;
+ int i;
/*
* read/write caches:
@@ -127,12 +128,45 @@ gen4_render_ring_flush(struct i915_request *rq, u32 mode)
cmd |= MI_INVALIDATE_ISP;
}
- cs = intel_ring_begin(rq, 2);
+ i = 2;
+ if (mode & EMIT_INVALIDATE)
+ i += 20;
+
+ cs = intel_ring_begin(rq, i);
if (IS_ERR(cs))
return PTR_ERR(cs);
*cs++ = cmd;
- *cs++ = MI_NOOP;
+
+ /*
+ * A random delay to let the CS invalidate take effect? Without this
+ * delay, the GPU relocation path fails as the CS does not see
+ * the updated contents. Just as important, if we apply the flushes
+ * to the EMIT_FLUSH branch (i.e. immediately after the relocation
+ * write and before the invalidate on the next batch), the relocations
+ * still fail. This implies that it is a delay following invalidation
+ * that is required to reset the caches as opposed to a delay to
+ * ensure the memory is written.
+ */
+ if (mode & EMIT_INVALIDATE) {
+ *cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
+ *cs++ = i915_ggtt_offset(rq->engine->scratch) |
+ PIPE_CONTROL_GLOBAL_GTT;
+ *cs++ = 0;
+ *cs++ = 0;
+
+ for (i = 0; i < 12; i++)
+ *cs++ = MI_FLUSH;
+
+ *cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
+ *cs++ = i915_ggtt_offset(rq->engine->scratch) |
+ PIPE_CONTROL_GLOBAL_GTT;
+ *cs++ = 0;
+ *cs++ = 0;
+ }
+
+ *cs++ = cmd;
+
intel_ring_advance(rq, cs);
return 0;
--
2.19.1
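A userspace model of the patched gen4_render_ring_flush() may help check the ring-space accounting. It writes the same command sequence into a plain array and asserts that the number of dwords emitted matches what would be passed to intel_ring_begin(): 2 for a plain flush and 22 (1 + 4 + 12 + 4 + 1) when EMIT_INVALIDATE is set. This is a sketch under stated assumptions, not kernel code: the opcode macros are assumed to mirror the i915 definitions of the time, the EMIT_* values are made up, and gen4_emit_flush(), gen4_flush_dwords() and gen4_flush_check() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Opcode macros assumed to mirror the i915 headers of the time; check
 * the kernel sources before relying on the exact bit patterns. */
#define MI_INSTR(opcode, flags)  (((u32)(opcode) << 23) | (flags))
#define MI_FLUSH                 MI_INSTR(0x04, 0)
#define GFX_OP_PIPE_CONTROL(len) ((0x3u << 29) | (0x3u << 27) | (0x2u << 24) | ((len) - 2))
#define PIPE_CONTROL_QW_WRITE    (1u << 14)
#define PIPE_CONTROL_GLOBAL_GTT  (1u << 2)

/* Hypothetical flag values for this sketch only. */
#define EMIT_INVALIDATE (1u << 0)
#define EMIT_FLUSH      (1u << 1)

/* Ring space to reserve, mirroring the patch: 2 dwords, +20 on invalidate. */
static unsigned int gen4_flush_dwords(u32 mode)
{
	unsigned int n = 2;

	if (mode & EMIT_INVALIDATE)
		n += 20;
	return n;
}

/* Emits the patched sequence into a plain array instead of the ring;
 * 'scratch' stands in for i915_ggtt_offset(rq->engine->scratch). */
static u32 *gen4_emit_flush(u32 *cs, u32 cmd, u32 mode, u32 scratch)
{
	int i;

	*cs++ = cmd;

	if (mode & EMIT_INVALIDATE) {
		/* 4 dwords: PIPE_CONTROL with a QW write to scratch */
		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
		*cs++ = scratch | PIPE_CONTROL_GLOBAL_GTT;
		*cs++ = 0;
		*cs++ = 0;

		/* 12 single-dword MI_FLUSHes used purely as a delay */
		for (i = 0; i < 12; i++)
			*cs++ = MI_FLUSH;

		/* another 4-dword PIPE_CONTROL */
		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
		*cs++ = scratch | PIPE_CONTROL_GLOBAL_GTT;
		*cs++ = 0;
		*cs++ = 0;
	}

	/* the flush command is repeated where MI_NOOP used to be */
	*cs++ = cmd;
	return cs;
}

/* Returns the number of dwords emitted, asserting that it matches
 * the amount reserved up front. */
static unsigned int gen4_flush_check(u32 mode)
{
	u32 buf[32];
	unsigned int used;

	used = (unsigned int)(gen4_emit_flush(buf, MI_FLUSH, mode, 0x1000) - buf);
	assert(used == gen4_flush_dwords(mode));
	return used;
}
```

Because the space is reserved before anything is written, any change to the number of padding commands has to be mirrored in the `i += 20` reservation; in this model gen4_flush_check() flags such a mismatch.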
* ✓ Fi.CI.BAT: success for drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
2018-11-05 9:43 [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5 Chris Wilson
@ 2018-11-05 10:23 ` Patchwork
2018-11-05 11:13 ` ✓ Fi.CI.IGT: " Patchwork
2018-11-07 15:04 ` Ville Syrjälä
2 siblings, 0 replies; 6+ messages in thread
From: Patchwork @ 2018-11-05 10:23 UTC
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
URL : https://patchwork.freedesktop.org/series/52013/
State : success
== Summary ==
= CI Bug Log - changes from CI_DRM_5085 -> Patchwork_10722 =
== Summary - WARNING ==
Minor unknown changes coming with Patchwork_10722 need to be verified
manually.
If you think the reported changes have nothing to do with the changes
introduced in Patchwork_10722, please notify your bug team to allow them
to document this new failure mode, which will reduce false positives in CI.
External URL: https://patchwork.freedesktop.org/api/1.0/series/52013/revisions/1/mbox/
== Possible new issues ==
Here are the unknown changes that may have been introduced in Patchwork_10722:
=== IGT changes ===
==== Warnings ====
igt@drv_selftest@live_guc:
fi-icl-u: PASS -> SKIP +2
== Known issues ==
Here are the changes found in Patchwork_10722 that come from known issues:
=== IGT changes ===
==== Issues hit ====
igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
fi-blb-e6850: PASS -> INCOMPLETE (fdo#107718)
==== Possible fixes ====
igt@gem_cpu_reloc@basic:
fi-skl-6700hq: INCOMPLETE (fdo#108011) -> PASS
igt@gem_exec_suspend@basic-s3:
fi-glk-dsi: FAIL (fdo#103375) -> PASS
igt@kms_frontbuffer_tracking@basic:
fi-hsw-peppy: DMESG-WARN (fdo#102614) -> PASS
fi-icl-u: FAIL (fdo#103167) -> PASS
igt@kms_pipe_crc_basic@read-crc-pipe-b-frame-sequence:
fi-byt-clapper: FAIL (fdo#103191, fdo#107362) -> PASS
==== Warnings ====
igt@drv_selftest@live_contexts:
fi-icl-u: DMESG-FAIL (fdo#108569) -> INCOMPLETE (fdo#108315, fdo#108535)
fdo#102614 https://bugs.freedesktop.org/show_bug.cgi?id=102614
fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191
fdo#103375 https://bugs.freedesktop.org/show_bug.cgi?id=103375
fdo#107362 https://bugs.freedesktop.org/show_bug.cgi?id=107362
fdo#107718 https://bugs.freedesktop.org/show_bug.cgi?id=107718
fdo#108011 https://bugs.freedesktop.org/show_bug.cgi?id=108011
fdo#108315 https://bugs.freedesktop.org/show_bug.cgi?id=108315
fdo#108535 https://bugs.freedesktop.org/show_bug.cgi?id=108535
fdo#108569 https://bugs.freedesktop.org/show_bug.cgi?id=108569
== Participating hosts (45 -> 44) ==
Additional (4): fi-kbl-7560u fi-gdg-551 fi-bwr-2160 fi-pnv-d510
Missing (5): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600
== Build changes ==
* Linux: CI_DRM_5085 -> Patchwork_10722
CI_DRM_5085: 6ae61ee5db4af12c0b21bf39e0400ccf024187c4 @ git://anongit.freedesktop.org/gfx-ci/linux
IGT_4706: 5421c73a7db3cfaa85ab24325fe6e898cbb27fb3 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
Patchwork_10722: 2c7f605ac0d85a04c93ff6866fcbe7d07dead990 @ git://anongit.freedesktop.org/gfx-ci/linux
== Linux commits ==
2c7f605ac0d8 drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10722/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* ✓ Fi.CI.IGT: success for drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
2018-11-05 9:43 [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5 Chris Wilson
2018-11-05 10:23 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2018-11-05 11:13 ` Patchwork
2018-11-07 15:04 ` Ville Syrjälä
2 siblings, 0 replies; 6+ messages in thread
From: Patchwork @ 2018-11-05 11:13 UTC
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
URL : https://patchwork.freedesktop.org/series/52013/
State : success
== Summary ==
= CI Bug Log - changes from CI_DRM_5085_full -> Patchwork_10722_full =
== Summary - WARNING ==
Minor unknown changes coming with Patchwork_10722_full need to be verified
manually.
If you think the reported changes have nothing to do with the changes
introduced in Patchwork_10722_full, please notify your bug team to allow them
to document this new failure mode, which will reduce false positives in CI.
== Possible new issues ==
Here are the unknown changes that may have been introduced in Patchwork_10722_full:
=== IGT changes ===
==== Warnings ====
igt@pm_rc6_residency@rc6-accuracy:
shard-snb: PASS -> SKIP
== Known issues ==
Here are the changes found in Patchwork_10722_full that come from known issues:
=== IGT changes ===
==== Issues hit ====
igt@gem_exec_reuse@baggage:
shard-apl: PASS -> INCOMPLETE (fdo#103927)
igt@gem_exec_schedule@pi-ringfull-bsd:
shard-skl: NOTRUN -> FAIL (fdo#103158) +1
igt@gem_softpin@noreloc-s3:
shard-skl: NOTRUN -> INCOMPLETE (fdo#104108, fdo#107773)
igt@kms_busy@extended-modeset-hang-newfb-render-a:
shard-skl: NOTRUN -> DMESG-WARN (fdo#107956) +4
igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-b:
shard-kbl: PASS -> DMESG-WARN (fdo#107956)
igt@kms_busy@extended-pageflip-hang-newfb-render-a:
shard-apl: PASS -> DMESG-WARN (fdo#107956)
igt@kms_cursor_crc@cursor-128x42-onscreen:
shard-skl: NOTRUN -> FAIL (fdo#103232)
igt@kms_cursor_crc@cursor-256x256-dpms:
shard-glk: PASS -> FAIL (fdo#103232) +2
igt@kms_fbcon_fbt@psr:
shard-skl: NOTRUN -> FAIL (fdo#107882)
igt@kms_flip_tiling@flip-y-tiled:
shard-skl: NOTRUN -> FAIL (fdo#108303)
igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-fullscreen:
shard-apl: PASS -> FAIL (fdo#103167)
igt@kms_frontbuffer_tracking@fbc-stridechange:
shard-skl: NOTRUN -> FAIL (fdo#105683)
igt@kms_plane@plane-position-covered-pipe-b-planes:
shard-glk: PASS -> FAIL (fdo#103166) +1
igt@kms_plane_alpha_blend@pipe-a-constant-alpha-max:
shard-skl: NOTRUN -> FAIL (fdo#108145) +4
igt@kms_plane_alpha_blend@pipe-b-alpha-basic:
shard-skl: NOTRUN -> FAIL (fdo#107815, fdo#108145) +1
igt@kms_properties@connector-properties-atomic:
shard-skl: NOTRUN -> FAIL (fdo#108642)
igt@kms_setmode@basic:
shard-skl: NOTRUN -> FAIL (fdo#99912)
igt@kms_sysfs_edid_timing:
shard-skl: NOTRUN -> FAIL (fdo#100047)
==== Possible fixes ====
igt@gem_exec_reloc@basic-wc-cpu-noreloc:
shard-snb: INCOMPLETE (fdo#105411) -> PASS
igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-c:
shard-hsw: DMESG-WARN (fdo#107956) -> PASS
igt@kms_busy@extended-pageflip-modeset-hang-oldfb-render-a:
shard-snb: DMESG-WARN (fdo#107956) -> PASS
igt@kms_cursor_crc@cursor-128x42-onscreen:
shard-glk: FAIL (fdo#103232) -> PASS +1
igt@kms_cursor_crc@cursor-64x64-offscreen:
shard-skl: FAIL (fdo#103232) -> PASS +1
igt@kms_draw_crc@draw-method-xrgb2101010-pwrite-ytiled:
shard-skl: FAIL (fdo#103184) -> PASS
igt@kms_draw_crc@draw-method-xrgb8888-pwrite-untiled:
shard-skl: FAIL (fdo#108472) -> PASS
igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-shrfb-draw-mmap-cpu:
shard-skl: FAIL (fdo#105682) -> PASS
igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-mmap-cpu:
shard-apl: FAIL (fdo#103167) -> PASS
igt@kms_frontbuffer_tracking@fbc-1p-rte:
shard-glk: FAIL (fdo#105682, fdo#103167) -> PASS
igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-onoff:
shard-glk: FAIL (fdo#103167) -> PASS +1
igt@kms_frontbuffer_tracking@fbc-rgb101010-draw-mmap-wc:
shard-skl: FAIL (fdo#105682, fdo#103167) -> PASS
igt@kms_plane_alpha_blend@pipe-a-coverage-7efc:
shard-skl: FAIL (fdo#107815, fdo#108145) -> PASS
igt@kms_plane_multiple@atomic-pipe-a-tiling-x:
shard-apl: FAIL (fdo#103166) -> PASS +2
igt@kms_plane_multiple@atomic-pipe-a-tiling-yf:
shard-glk: FAIL (fdo#103166) -> PASS
igt@perf@oa-exponents:
shard-glk: FAIL (fdo#105483) -> PASS
igt@pm_rpm@system-suspend:
shard-skl: INCOMPLETE (fdo#107807, fdo#104108, fdo#107773) -> PASS
fdo#100047 https://bugs.freedesktop.org/show_bug.cgi?id=100047
fdo#103158 https://bugs.freedesktop.org/show_bug.cgi?id=103158
fdo#103166 https://bugs.freedesktop.org/show_bug.cgi?id=103166
fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
fdo#103184 https://bugs.freedesktop.org/show_bug.cgi?id=103184
fdo#103232 https://bugs.freedesktop.org/show_bug.cgi?id=103232
fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
fdo#104108 https://bugs.freedesktop.org/show_bug.cgi?id=104108
fdo#105411 https://bugs.freedesktop.org/show_bug.cgi?id=105411
fdo#105483 https://bugs.freedesktop.org/show_bug.cgi?id=105483
fdo#105682 https://bugs.freedesktop.org/show_bug.cgi?id=105682
fdo#105683 https://bugs.freedesktop.org/show_bug.cgi?id=105683
fdo#107773 https://bugs.freedesktop.org/show_bug.cgi?id=107773
fdo#107807 https://bugs.freedesktop.org/show_bug.cgi?id=107807
fdo#107815 https://bugs.freedesktop.org/show_bug.cgi?id=107815
fdo#107882 https://bugs.freedesktop.org/show_bug.cgi?id=107882
fdo#107956 https://bugs.freedesktop.org/show_bug.cgi?id=107956
fdo#108145 https://bugs.freedesktop.org/show_bug.cgi?id=108145
fdo#108303 https://bugs.freedesktop.org/show_bug.cgi?id=108303
fdo#108472 https://bugs.freedesktop.org/show_bug.cgi?id=108472
fdo#108642 https://bugs.freedesktop.org/show_bug.cgi?id=108642
fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912
== Participating hosts (6 -> 6) ==
No changes in participating hosts
== Build changes ==
* Linux: CI_DRM_5085 -> Patchwork_10722
CI_DRM_5085: 6ae61ee5db4af12c0b21bf39e0400ccf024187c4 @ git://anongit.freedesktop.org/gfx-ci/linux
IGT_4706: 5421c73a7db3cfaa85ab24325fe6e898cbb27fb3 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
Patchwork_10722: 2c7f605ac0d85a04c93ff6866fcbe7d07dead990 @ git://anongit.freedesktop.org/gfx-ci/linux
piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10722/shards.html
* Re: [Intel-gfx] [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
2018-11-05 9:43 [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5 Chris Wilson
@ 2018-11-07 15:04 ` Ville Syrjälä
2018-11-05 11:13 ` ✓ Fi.CI.IGT: " Patchwork
2018-11-07 15:04 ` Ville Syrjälä
2 siblings, 0 replies; 6+ messages in thread
From: Ville Syrjälä @ 2018-11-07 15:04 UTC
To: Chris Wilson; +Cc: intel-gfx, stable
On Mon, Nov 05, 2018 at 09:43:05AM +0000, Chris Wilson wrote:
> Exercising the GPU reloc path strenuously revealed an issue where the
> updated relocations (from MI_STORE_DWORD_IMM) were not being observed
> upon execution. After some experiments with adding pipecontrols (a lot
> of pipecontrols, 32, as gen4/5 do not have a bit to wait on earlier
> pipecontrols or even the current one), it was discovered that we merely
> needed to delay the EMIT_INVALIDATE by several flushes. It is important
> to note that it is the EMIT_INVALIDATE, not the EMIT_FLUSH, that needs
> the delay. Contrary to what one might first expect, the delay is needed
> to let the TLB invalidation take effect (presumably to purge any CS
> buffers), not to ensure after flushing that the writes have landed
> before the invalidation is triggered.
>
> Testcase: igt/gem_tiled_fence_blits
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: stable@vger.kernel.org
> ---
> drivers/gpu/drm/i915/intel_ringbuffer.c | 38 +++++++++++++++++++++++--
> 1 file changed, 36 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index b8a7a014d46d..87eebc13c0d8 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -91,6 +91,7 @@ static int
> gen4_render_ring_flush(struct i915_request *rq, u32 mode)
> {
> u32 cmd, *cs;
> + int i;
>
> /*
> * read/write caches:
> @@ -127,12 +128,45 @@ gen4_render_ring_flush(struct i915_request *rq, u32 mode)
> cmd |= MI_INVALIDATE_ISP;
> }
>
> - cs = intel_ring_begin(rq, 2);
> + i = 2;
> + if (mode & EMIT_INVALIDATE)
> + i += 20;
> +
> + cs = intel_ring_begin(rq, i);
> if (IS_ERR(cs))
> return PTR_ERR(cs);
>
> *cs++ = cmd;
> - *cs++ = MI_NOOP;
> +
> + /*
> + * A random delay to let the CS invalidate take effect? Without this
> + * delay, the GPU relocation path fails as the CS does not see
> + * the updated contents. Just as important, if we apply the flushes
> + * to the EMIT_FLUSH branch (i.e. immediately after the relocation
> + * write and before the invalidate on the next batch), the relocations
> + * still fail. This implies that it is a delay following invalidation
> + * that is required to reset the caches as opposed to a delay to
> + * ensure the memory is written.
> + */
> + if (mode & EMIT_INVALIDATE) {
> + *cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> + *cs++ = i915_ggtt_offset(rq->engine->scratch) |
> + PIPE_CONTROL_GLOBAL_GTT;
> + *cs++ = 0;
> + *cs++ = 0;
> +
> + for (i = 0; i < 12; i++)
> + *cs++ = MI_FLUSH;
> +
> + *cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> + *cs++ = i915_ggtt_offset(rq->engine->scratch) |
> + PIPE_CONTROL_GLOBAL_GTT;
> + *cs++ = 0;
> + *cs++ = 0;
> + }
This smells a lot like the snb a/b w/a, except there the spec says to
use 8 STORE_DWORDS. I suppose the choice of a specific command isn't
critical, and it's just a matter of stuffing the pipeline with something
that takes long enough to let the TLB invalidate finish?
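If the choice of command really is not critical, and only the time it keeps the CS busy matters, the padding could be factored into a helper parameterised on the filler command. The sketch below is hypothetical: emit_delay() and delay_dwords() are not i915 functions, and nothing here says how long any command actually takes on gen4/5 hardware. It only shows that swapping MI_FLUSH for another filler changes the per-command dword cost that must be reserved, not the shape of the code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/*
 * Hypothetical helper, not an i915 function: pad the command stream
 * with 'count' copies of a filler command that is 'len' dwords long.
 * The only intent is to keep the CS busy; the filler's side effects,
 * if any, are assumed harmless (e.g. writes aimed at a scratch page).
 */
static u32 *emit_delay(u32 *cs, const u32 *filler, size_t len,
		       unsigned int count)
{
	unsigned int n;
	size_t i;

	assert(len > 0);	/* a filler command is at least one dword */
	for (n = 0; n < count; n++)
		for (i = 0; i < len; i++)
			*cs++ = filler[i];
	return cs;
}

/* Ring space the padding consumes, to be added to the reservation. */
static size_t delay_dwords(size_t len, unsigned int count)
{
	return len * count;
}
```

For instance, passing a single-dword MI_FLUSH with a count of 12 reproduces the padding in the patch and needs 12 dwords of ring space, while a 4-dword filler repeated 8 times would need 32. Whether either delays for long enough can only be established experimentally, as was done here.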
Anyways, patch itself seems as reasonable as one might expect for an
issue like this.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> +
> + *cs++ = cmd;
> +
> intel_ring_advance(rq, cs);
>
> return 0;
> --
> 2.19.1
--
Ville Syrjälä
Intel
* Re: [Intel-gfx] [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
2018-11-07 15:04 ` Ville Syrjälä
(?)
@ 2018-11-07 15:12 ` Chris Wilson
-1 siblings, 0 replies; 6+ messages in thread
From: Chris Wilson @ 2018-11-07 15:12 UTC
To: Ville Syrjälä; +Cc: intel-gfx, stable
Quoting Ville Syrjälä (2018-11-07 15:04:24)
> On Mon, Nov 05, 2018 at 09:43:05AM +0000, Chris Wilson wrote:
> > Exercising the GPU reloc path strenuously revealed an issue where the
> > updated relocations (from MI_STORE_DWORD_IMM) were not being observed
> > upon execution. After some experiments with adding pipecontrols (a lot
> > of pipecontrols, 32, as gen4/5 do not have a bit to wait on earlier
> > pipecontrols or even the current one), it was discovered that we merely
> > needed to delay the EMIT_INVALIDATE by several flushes. It is important
> > to note that it is the EMIT_INVALIDATE, not the EMIT_FLUSH, that needs
> > the delay. Contrary to what one might first expect, the delay is needed
> > to let the TLB invalidation take effect (presumably to purge any CS
> > buffers), not to ensure after flushing that the writes have landed
> > before the invalidation is triggered.
> >
> > Testcase: igt/gem_tiled_fence_blits
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: stable@vger.kernel.org
> > ---
> > drivers/gpu/drm/i915/intel_ringbuffer.c | 38 +++++++++++++++++++++++--
> > 1 file changed, 36 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index b8a7a014d46d..87eebc13c0d8 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -91,6 +91,7 @@ static int
> > gen4_render_ring_flush(struct i915_request *rq, u32 mode)
> > {
> > u32 cmd, *cs;
> > + int i;
> >
> > /*
> > * read/write caches:
> > @@ -127,12 +128,45 @@ gen4_render_ring_flush(struct i915_request *rq, u32 mode)
> > cmd |= MI_INVALIDATE_ISP;
> > }
> >
> > - cs = intel_ring_begin(rq, 2);
> > + i = 2;
> > + if (mode & EMIT_INVALIDATE)
> > + i += 20;
> > +
> > + cs = intel_ring_begin(rq, i);
> > if (IS_ERR(cs))
> > return PTR_ERR(cs);
> >
> > *cs++ = cmd;
> > - *cs++ = MI_NOOP;
> > +
> > + /*
> > + * A random delay to let the CS invalidate take effect? Without this
> > + * delay, the GPU relocation path fails as the CS does not see
> > + * the updated contents. Just as important, if we apply the flushes
> > + * to the EMIT_FLUSH branch (i.e. immediately after the relocation
> > + * write and before the invalidate on the next batch), the relocations
> > + * still fail. This implies that it is a delay following invalidation
> > + * that is required to reset the caches as opposed to a delay to
> > + * ensure the memory is written.
> > + */
> > + if (mode & EMIT_INVALIDATE) {
> > + *cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> > + *cs++ = i915_ggtt_offset(rq->engine->scratch) |
> > + PIPE_CONTROL_GLOBAL_GTT;
> > + *cs++ = 0;
> > + *cs++ = 0;
> > +
> > + for (i = 0; i < 12; i++)
> > + *cs++ = MI_FLUSH;
> > +
> > + *cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> > + *cs++ = i915_ggtt_offset(rq->engine->scratch) |
> > + PIPE_CONTROL_GLOBAL_GTT;
> > + *cs++ = 0;
> > + *cs++ = 0;
> > + }
>
> This smells a lot like the snb a/b w/a, except there the spec says to
> use 8 STORE_DWORDS.
Yeah, the similarity wasn't lost, except that w/a is to cover the
coherency aspect of the writes not being flushed. This feels a bit
fishier in that the experiments indicate it's an issue on the
invalidation path as opposed to flushing the writes.
It also resembles the other w/a, which uses umpteen pipecontrols to get
around the lack of PIPE_CONTROL_FLUSH.
> I suppose the choice of a specific command isn't
> critical, and it's just a matter of stuffing the pipeline with something
> that takes long enough to let the TLB invalidate finish?
Except that the MI_FLUSHes are more effective, in fewer numbers, than
PIPE_CONTROLs. Probably because each one translates into a few
pipecontrols or something similar, which is quite heavy.
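The ring-space side of the MI_FLUSH versus PIPE_CONTROL comparison can be checked with simple arithmetic. Assuming the 32 pipecontrols from the early experiments were the same 4-dword GFX_OP_PIPE_CONTROL(4) the patch emits (the commit message does not say), they would cost 128 dwords per invalidate, whereas the merged delay costs 2 × 4 + 12 × 1 = 20 dwords, which is exactly the `i += 20` in the patch. delay_cost() and check_patch_budget() are hypothetical helpers for this arithmetic only; they say nothing about how long either command takes to execute.

```c
#include <assert.h>

/* Dword sizes of the commands as the patch emits them. */
#define PIPE_CONTROL_DW 4u	/* GFX_OP_PIPE_CONTROL(4) */
#define MI_FLUSH_DW     1u

/* Ring space consumed by a delay built from n_pc PIPE_CONTROLs and
 * n_flush MI_FLUSHes (hypothetical helper, arithmetic only). */
static unsigned int delay_cost(unsigned int n_pc, unsigned int n_flush)
{
	return n_pc * PIPE_CONTROL_DW + n_flush * MI_FLUSH_DW;
}

/* The patch reserves 2 dwords for the flush command (emitted twice)
 * plus 20 for the delay; check that the arithmetic adds up to 22. */
static void check_patch_budget(void)
{
	assert(2 + delay_cost(2, 12) == 22);
}
```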
> Anyways, patch itself seems as reasonable as one might expect for an
> issue like this.
As nasty as one would expect.
For the record, we are not entirely out of danger. gem_exec_whisper
continues to indicate a problem, but one step at a time. (I haven't yet
found quite what's upsetting it, except that if we execute each batch
synchronously and verify each one, it's happy.)
-Chris
Thread overview: 6+ messages
2018-11-05 9:43 [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5 Chris Wilson
2018-11-05 10:23 ` ✓ Fi.CI.BAT: success for " Patchwork
2018-11-05 11:13 ` ✓ Fi.CI.IGT: " Patchwork
2018-11-07 15:04 ` [Intel-gfx] [PATCH] " Ville Syrjälä
2018-11-07 15:04 ` Ville Syrjälä
2018-11-07 15:12 ` Chris Wilson