* [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
@ 2018-11-05 9:43 Chris Wilson
2018-11-05 10:23 ` ✓ Fi.CI.BAT: success for " Patchwork
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Chris Wilson @ 2018-11-05 9:43 UTC (permalink / raw)
To: intel-gfx; +Cc: Chris Wilson, stable
Exercising the gpu reloc path strenuously revealed an issue where the
updated relocations (from MI_STORE_DWORD_IMM) were not being observed
upon execution. After some experiments with adding pipecontrols (a lot
of pipecontrols (32) as gen4/5 do not have a bit to wait on earlier pipe
controls or even the current one), it was discovered that we merely
needed to delay the EMIT_INVALIDATE by several flushes. It is important
to note that it is the EMIT_INVALIDATE, not the EMIT_FLUSH, that needs
the delay, contrary to what one might first expect: the delay is
required for the TLB invalidation to take effect (one presumes to purge
any CS buffers), rather than after flushing to ensure the writes have
landed before triggering invalidation.
Testcase: igt/gem_tiled_fence_blits
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
---
drivers/gpu/drm/i915/intel_ringbuffer.c | 38 +++++++++++++++++++++++--
1 file changed, 36 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index b8a7a014d46d..87eebc13c0d8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -91,6 +91,7 @@ static int
 gen4_render_ring_flush(struct i915_request *rq, u32 mode)
 {
 	u32 cmd, *cs;
+	int i;
 
 	/*
 	 * read/write caches:
@@ -127,12 +128,45 @@ gen4_render_ring_flush(struct i915_request *rq, u32 mode)
 			cmd |= MI_INVALIDATE_ISP;
 	}
 
-	cs = intel_ring_begin(rq, 2);
+	i = 2;
+	if (mode & EMIT_INVALIDATE)
+		i += 20;
+
+	cs = intel_ring_begin(rq, i);
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
 	*cs++ = cmd;
-	*cs++ = MI_NOOP;
+
+	/*
+	 * A random delay to let the CS invalidate take effect? Without this
+	 * delay, the GPU relocation path fails as the CS does not see
+	 * the updated contents. Just as important, if we apply the flushes
+	 * to the EMIT_FLUSH branch (i.e. immediately after the relocation
+	 * write and before the invalidate on the next batch), the relocations
+	 * still fail. This implies that it is a delay following invalidation
+	 * that is required to reset the caches as opposed to a delay to
+	 * ensure the memory is written.
+	 */
+	if (mode & EMIT_INVALIDATE) {
+		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
+		*cs++ = i915_ggtt_offset(rq->engine->scratch) |
+			PIPE_CONTROL_GLOBAL_GTT;
+		*cs++ = 0;
+		*cs++ = 0;
+
+		for (i = 0; i < 12; i++)
+			*cs++ = MI_FLUSH;
+
+		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
+		*cs++ = i915_ggtt_offset(rq->engine->scratch) |
+			PIPE_CONTROL_GLOBAL_GTT;
+		*cs++ = 0;
+		*cs++ = 0;
+	}
+
+	*cs++ = cmd;
+
 	intel_ring_advance(rq, cs);
 
 	return 0;
--
2.19.1
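
The dword budget in the patch above (2 dwords normally, 2 + 20 when
EMIT_INVALIDATE is set) can be cross-checked with a small userspace
sketch. This is only an illustration, not driver code: every SKETCH_*
name and encoding below is a placeholder invented for the example and
is not the real i915 opcode value; only the layout of the emitted
sequence follows the hunk.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Placeholder encodings, invented for this sketch. They are NOT the
 * real i915 opcode or flag values; only the dword layout matters.
 */
#define SKETCH_EMIT_INVALIDATE	(1u << 0)
#define SKETCH_EMIT_FLUSH	(1u << 1)
#define SKETCH_MI_FLUSH		0x0f000000u
/* Stands in for GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE. */
#define SKETCH_PIPE_CONTROL_4	0x0c000000u
#define SKETCH_GLOBAL_GTT	(1u << 2)

#define SKETCH_NUM_FLUSHES	12

/* Dwords to reserve, mirroring "i = 2; if (invalidate) i += 20;". */
static size_t sketch_flush_dwords(uint32_t mode)
{
	size_t n = 2;

	if (mode & SKETCH_EMIT_INVALIDATE)
		n += 20;

	return n;
}

/* One 4-dword pipe control writing a qword to the scratch offset. */
static uint32_t *sketch_emit_pipe_control(uint32_t *cs, uint32_t scratch)
{
	*cs++ = SKETCH_PIPE_CONTROL_4;
	*cs++ = scratch | SKETCH_GLOBAL_GTT;
	*cs++ = 0;
	*cs++ = 0;

	return cs;
}

/*
 * Write the flush sequence into buf and return the number of dwords
 * emitted: cmd, then (only when invalidating) pipe control, 12 flushes,
 * pipe control, and finally cmd again.
 */
static size_t sketch_emit_flush(uint32_t *buf, uint32_t mode,
				uint32_t cmd, uint32_t scratch)
{
	uint32_t *cs = buf;
	int i;

	*cs++ = cmd;

	if (mode & SKETCH_EMIT_INVALIDATE) {
		cs = sketch_emit_pipe_control(cs, scratch);

		for (i = 0; i < SKETCH_NUM_FLUSHES; i++)
			*cs++ = SKETCH_MI_FLUSH;

		cs = sketch_emit_pipe_control(cs, scratch);
	}

	*cs++ = cmd;

	return (size_t)(cs - buf);
}
```

Calling sketch_emit_flush() with SKETCH_EMIT_INVALIDATE set emits
1 + 4 + 12 + 4 + 1 = 22 dwords, which matches what
sketch_flush_dwords() reserves; without it, only the two copies of cmd
are written.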
* ✓ Fi.CI.BAT: success for drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
2018-11-05 9:43 [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5 Chris Wilson
@ 2018-11-05 10:23 ` Patchwork
2018-11-05 11:13 ` ✓ Fi.CI.IGT: " Patchwork
2018-11-07 15:04 ` Ville Syrjälä
2 siblings, 0 replies; 6+ messages in thread
From: Patchwork @ 2018-11-05 10:23 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
URL : https://patchwork.freedesktop.org/series/52013/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_5085 -> Patchwork_10722 =

== Summary - WARNING ==

Minor unknown changes coming with Patchwork_10722 need to be verified
manually.

If you think the reported changes have nothing to do with the changes
introduced in Patchwork_10722, please notify your bug team to allow them
to document this new failure mode, which will reduce false positives in CI.

External URL: https://patchwork.freedesktop.org/api/1.0/series/52013/revisions/1/mbox/

== Possible new issues ==

Here are the unknown changes that may have been introduced in Patchwork_10722:

=== IGT changes ===

==== Warnings ====

igt@drv_selftest@live_guc:
  fi-icl-u: PASS -> SKIP +2

== Known issues ==

Here are the changes found in Patchwork_10722 that come from known issues:

=== IGT changes ===

==== Issues hit ====

igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
  fi-blb-e6850: PASS -> INCOMPLETE (fdo#107718)

==== Possible fixes ====

igt@gem_cpu_reloc@basic:
  fi-skl-6700hq: INCOMPLETE (fdo#108011) -> PASS
igt@gem_exec_suspend@basic-s3:
  fi-glk-dsi: FAIL (fdo#103375) -> PASS
igt@kms_frontbuffer_tracking@basic:
  fi-hsw-peppy: DMESG-WARN (fdo#102614) -> PASS
  fi-icl-u: FAIL (fdo#103167) -> PASS
igt@kms_pipe_crc_basic@read-crc-pipe-b-frame-sequence:
  fi-byt-clapper: FAIL (fdo#103191, fdo#107362) -> PASS

==== Warnings ====

igt@drv_selftest@live_contexts:
  fi-icl-u: DMESG-FAIL (fdo#108569) -> INCOMPLETE (fdo#108315, fdo#108535)

fdo#102614 https://bugs.freedesktop.org/show_bug.cgi?id=102614
fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191
fdo#103375 https://bugs.freedesktop.org/show_bug.cgi?id=103375
fdo#107362 https://bugs.freedesktop.org/show_bug.cgi?id=107362
fdo#107718 https://bugs.freedesktop.org/show_bug.cgi?id=107718
fdo#108011 https://bugs.freedesktop.org/show_bug.cgi?id=108011
fdo#108315 https://bugs.freedesktop.org/show_bug.cgi?id=108315
fdo#108535 https://bugs.freedesktop.org/show_bug.cgi?id=108535
fdo#108569 https://bugs.freedesktop.org/show_bug.cgi?id=108569

== Participating hosts (45 -> 44) ==

Additional (4): fi-kbl-7560u fi-gdg-551 fi-bwr-2160 fi-pnv-d510
Missing (5): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600

== Build changes ==

* Linux: CI_DRM_5085 -> Patchwork_10722

CI_DRM_5085: 6ae61ee5db4af12c0b21bf39e0400ccf024187c4 @ git://anongit.freedesktop.org/gfx-ci/linux
IGT_4706: 5421c73a7db3cfaa85ab24325fe6e898cbb27fb3 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
Patchwork_10722: 2c7f605ac0d85a04c93ff6866fcbe7d07dead990 @ git://anongit.freedesktop.org/gfx-ci/linux

== Linux commits ==

2c7f605ac0d8 drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10722/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* ✓ Fi.CI.IGT: success for drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
2018-11-05 9:43 [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5 Chris Wilson
2018-11-05 10:23 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2018-11-05 11:13 ` Patchwork
2018-11-07 15:04 ` Ville Syrjälä
2 siblings, 0 replies; 6+ messages in thread
From: Patchwork @ 2018-11-05 11:13 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
URL : https://patchwork.freedesktop.org/series/52013/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_5085_full -> Patchwork_10722_full =

== Summary - WARNING ==

Minor unknown changes coming with Patchwork_10722_full need to be verified
manually.

If you think the reported changes have nothing to do with the changes
introduced in Patchwork_10722_full, please notify your bug team to allow them
to document this new failure mode, which will reduce false positives in CI.

== Possible new issues ==

Here are the unknown changes that may have been introduced in Patchwork_10722_full:

=== IGT changes ===

==== Warnings ====

igt@pm_rc6_residency@rc6-accuracy:
  shard-snb: PASS -> SKIP

== Known issues ==

Here are the changes found in Patchwork_10722_full that come from known issues:

=== IGT changes ===

==== Issues hit ====

igt@gem_exec_reuse@baggage:
  shard-apl: PASS -> INCOMPLETE (fdo#103927)
igt@gem_exec_schedule@pi-ringfull-bsd:
  shard-skl: NOTRUN -> FAIL (fdo#103158) +1
igt@gem_softpin@noreloc-s3:
  shard-skl: NOTRUN -> INCOMPLETE (fdo#104108, fdo#107773)
igt@kms_busy@extended-modeset-hang-newfb-render-a:
  shard-skl: NOTRUN -> DMESG-WARN (fdo#107956) +4
igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-b:
  shard-kbl: PASS -> DMESG-WARN (fdo#107956)
igt@kms_busy@extended-pageflip-hang-newfb-render-a:
  shard-apl: PASS -> DMESG-WARN (fdo#107956)
igt@kms_cursor_crc@cursor-128x42-onscreen:
  shard-skl: NOTRUN -> FAIL (fdo#103232)
igt@kms_cursor_crc@cursor-256x256-dpms:
  shard-glk: PASS -> FAIL (fdo#103232) +2
igt@kms_fbcon_fbt@psr:
  shard-skl: NOTRUN -> FAIL (fdo#107882)
igt@kms_flip_tiling@flip-y-tiled:
  shard-skl: NOTRUN -> FAIL (fdo#108303)
igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-fullscreen:
  shard-apl: PASS -> FAIL (fdo#103167)
igt@kms_frontbuffer_tracking@fbc-stridechange:
  shard-skl: NOTRUN -> FAIL (fdo#105683)
igt@kms_plane@plane-position-covered-pipe-b-planes:
  shard-glk: PASS -> FAIL (fdo#103166) +1
igt@kms_plane_alpha_blend@pipe-a-constant-alpha-max:
  shard-skl: NOTRUN -> FAIL (fdo#108145) +4
igt@kms_plane_alpha_blend@pipe-b-alpha-basic:
  shard-skl: NOTRUN -> FAIL (fdo#107815, fdo#108145) +1
igt@kms_properties@connector-properties-atomic:
  shard-skl: NOTRUN -> FAIL (fdo#108642)
igt@kms_setmode@basic:
  shard-skl: NOTRUN -> FAIL (fdo#99912)
igt@kms_sysfs_edid_timing:
  shard-skl: NOTRUN -> FAIL (fdo#100047)

==== Possible fixes ====

igt@gem_exec_reloc@basic-wc-cpu-noreloc:
  shard-snb: INCOMPLETE (fdo#105411) -> PASS
igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-c:
  shard-hsw: DMESG-WARN (fdo#107956) -> PASS
igt@kms_busy@extended-pageflip-modeset-hang-oldfb-render-a:
  shard-snb: DMESG-WARN (fdo#107956) -> PASS
igt@kms_cursor_crc@cursor-128x42-onscreen:
  shard-glk: FAIL (fdo#103232) -> PASS +1
igt@kms_cursor_crc@cursor-64x64-offscreen:
  shard-skl: FAIL (fdo#103232) -> PASS +1
igt@kms_draw_crc@draw-method-xrgb2101010-pwrite-ytiled:
  shard-skl: FAIL (fdo#103184) -> PASS
igt@kms_draw_crc@draw-method-xrgb8888-pwrite-untiled:
  shard-skl: FAIL (fdo#108472) -> PASS
igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-shrfb-draw-mmap-cpu:
  shard-skl: FAIL (fdo#105682) -> PASS
igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-mmap-cpu:
  shard-apl: FAIL (fdo#103167) -> PASS
igt@kms_frontbuffer_tracking@fbc-1p-rte:
  shard-glk: FAIL (fdo#105682, fdo#103167) -> PASS
igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-onoff:
  shard-glk: FAIL (fdo#103167) -> PASS +1
igt@kms_frontbuffer_tracking@fbc-rgb101010-draw-mmap-wc:
  shard-skl: FAIL (fdo#105682, fdo#103167) -> PASS
igt@kms_plane_alpha_blend@pipe-a-coverage-7efc:
  shard-skl: FAIL (fdo#107815, fdo#108145) -> PASS
igt@kms_plane_multiple@atomic-pipe-a-tiling-x:
  shard-apl: FAIL (fdo#103166) -> PASS +2
igt@kms_plane_multiple@atomic-pipe-a-tiling-yf:
  shard-glk: FAIL (fdo#103166) -> PASS
igt@perf@oa-exponents:
  shard-glk: FAIL (fdo#105483) -> PASS
igt@pm_rpm@system-suspend:
  shard-skl: INCOMPLETE (fdo#107807, fdo#104108, fdo#107773) -> PASS

fdo#100047 https://bugs.freedesktop.org/show_bug.cgi?id=100047
fdo#103158 https://bugs.freedesktop.org/show_bug.cgi?id=103158
fdo#103166 https://bugs.freedesktop.org/show_bug.cgi?id=103166
fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
fdo#103184 https://bugs.freedesktop.org/show_bug.cgi?id=103184
fdo#103232 https://bugs.freedesktop.org/show_bug.cgi?id=103232
fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
fdo#104108 https://bugs.freedesktop.org/show_bug.cgi?id=104108
fdo#105411 https://bugs.freedesktop.org/show_bug.cgi?id=105411
fdo#105483 https://bugs.freedesktop.org/show_bug.cgi?id=105483
fdo#105682 https://bugs.freedesktop.org/show_bug.cgi?id=105682
fdo#105683 https://bugs.freedesktop.org/show_bug.cgi?id=105683
fdo#107773 https://bugs.freedesktop.org/show_bug.cgi?id=107773
fdo#107807 https://bugs.freedesktop.org/show_bug.cgi?id=107807
fdo#107815 https://bugs.freedesktop.org/show_bug.cgi?id=107815
fdo#107882 https://bugs.freedesktop.org/show_bug.cgi?id=107882
fdo#107956 https://bugs.freedesktop.org/show_bug.cgi?id=107956
fdo#108145 https://bugs.freedesktop.org/show_bug.cgi?id=108145
fdo#108303 https://bugs.freedesktop.org/show_bug.cgi?id=108303
fdo#108472 https://bugs.freedesktop.org/show_bug.cgi?id=108472
fdo#108642 https://bugs.freedesktop.org/show_bug.cgi?id=108642
fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912

== Participating hosts (6 -> 6) ==

No changes in participating hosts

== Build changes ==

* Linux: CI_DRM_5085 -> Patchwork_10722

CI_DRM_5085: 6ae61ee5db4af12c0b21bf39e0400ccf024187c4 @ git://anongit.freedesktop.org/gfx-ci/linux
IGT_4706: 5421c73a7db3cfaa85ab24325fe6e898cbb27fb3 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
Patchwork_10722: 2c7f605ac0d85a04c93ff6866fcbe7d07dead990 @ git://anongit.freedesktop.org/gfx-ci/linux
piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10722/shards.html
* Re: [Intel-gfx] [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
2018-11-05 9:43 [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5 Chris Wilson
@ 2018-11-07 15:04 ` Ville Syrjälä
2018-11-05 11:13 ` ✓ Fi.CI.IGT: " Patchwork
2018-11-07 15:04 ` Ville Syrjälä
2 siblings, 0 replies; 6+ messages in thread
From: Ville Syrjälä @ 2018-11-07 15:04 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx, stable

On Mon, Nov 05, 2018 at 09:43:05AM +0000, Chris Wilson wrote:
> Exercising the gpu reloc path strenuously revealed an issue where the
> updated relocations (from MI_STORE_DWORD_IMM) were not being observed
> upon execution. After some experiments with adding pipecontrols (a lot
> of pipecontrols (32) as gen4/5 do not have a bit to wait on earlier pipe
> controls or even the current on), it was discovered that we merely
> needed to delay the EMIT_INVALIDATE by several flushes. It is important
> to note that it is the EMIT_INVALIDATE as opposed to the EMIT_FLUSH that
> needs the delay as opposed to what one might first expect -- that the
> delay is required for the TLB invalidation to take effect (one presumes
> to purge any CS buffers) as opposed to a delay after flushing to ensure
> the writes have landed before triggering invalidation.
>
> Testcase: igt/gem_tiled_fence_blits
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: stable@vger.kernel.org
> ---
> drivers/gpu/drm/i915/intel_ringbuffer.c | 38 +++++++++++++++++++++++--
> 1 file changed, 36 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index b8a7a014d46d..87eebc13c0d8 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -91,6 +91,7 @@ static int
>  gen4_render_ring_flush(struct i915_request *rq, u32 mode)
>  {
>  	u32 cmd, *cs;
> +	int i;
>
>  	/*
>  	 * read/write caches:
> @@ -127,12 +128,45 @@ gen4_render_ring_flush(struct i915_request *rq, u32 mode)
>  			cmd |= MI_INVALIDATE_ISP;
>  	}
>
> -	cs = intel_ring_begin(rq, 2);
> +	i = 2;
> +	if (mode & EMIT_INVALIDATE)
> +		i += 20;
> +
> +	cs = intel_ring_begin(rq, i);
>  	if (IS_ERR(cs))
>  		return PTR_ERR(cs);
>
>  	*cs++ = cmd;
> -	*cs++ = MI_NOOP;
> +
> +	/*
> +	 * A random delay to let the CS invalidate take effect? Without this
> +	 * delay, the GPU relocation path fails as the CS does not see
> +	 * the updated contents. Just as important, if we apply the flushes
> +	 * to the EMIT_FLUSH branch (i.e. immediately after the relocation
> +	 * write and before the invalidate on the next batch), the relocations
> +	 * still fail. This implies that is a delay following invalidation
> +	 * that is required to reset the caches as opposed to a delay to
> +	 * ensure the memory is written.
> +	 */
> +	if (mode & EMIT_INVALIDATE) {
> +		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> +		*cs++ = i915_ggtt_offset(rq->engine->scratch) |
> +			PIPE_CONTROL_GLOBAL_GTT;
> +		*cs++ = 0;
> +		*cs++ = 0;
> +
> +		for (i = 0; i < 12; i++)
> +			*cs++ = MI_FLUSH;
> +
> +		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> +		*cs++ = i915_ggtt_offset(rq->engine->scratch) |
> +			PIPE_CONTROL_GLOBAL_GTT;
> +		*cs++ = 0;
> +		*cs++ = 0;
> +	}

This smells a lot like the snb a/b w/a, except there the spec says to
use 8 STORE_DWORDS. I suppose the choice of a specific command isn't
critical, and it's just a matter of stuffing the pipeline with something
that takes long enough to let the TLB invalidate finish?

Anyways, patch itself seems as reasonable as one might expect for an
issue like this.

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

> +
> +	*cs++ = cmd;
> +
>  	intel_ring_advance(rq, cs);
>
>  	return 0;
> --
> 2.19.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

--
Ville Syrjälä
Intel
* Re: [Intel-gfx] [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
2018-11-07 15:04 ` Ville Syrjälä
(?)
@ 2018-11-07 15:12 ` Chris Wilson
-1 siblings, 0 replies; 6+ messages in thread
From: Chris Wilson @ 2018-11-07 15:12 UTC (permalink / raw)
To: Ville Syrjälä; +Cc: intel-gfx, stable

Quoting Ville Syrjälä (2018-11-07 15:04:24)
> On Mon, Nov 05, 2018 at 09:43:05AM +0000, Chris Wilson wrote:
> > Exercising the gpu reloc path strenuously revealed an issue where the
> > updated relocations (from MI_STORE_DWORD_IMM) were not being observed
> > upon execution. After some experiments with adding pipecontrols (a lot
> > of pipecontrols (32) as gen4/5 do not have a bit to wait on earlier pipe
> > controls or even the current on), it was discovered that we merely
> > needed to delay the EMIT_INVALIDATE by several flushes. It is important
> > to note that it is the EMIT_INVALIDATE as opposed to the EMIT_FLUSH that
> > needs the delay as opposed to what one might first expect -- that the
> > delay is required for the TLB invalidation to take effect (one presumes
> > to purge any CS buffers) as opposed to a delay after flushing to ensure
> > the writes have landed before triggering invalidation.
> >
> > Testcase: igt/gem_tiled_fence_blits
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: stable@vger.kernel.org
> > ---
> > drivers/gpu/drm/i915/intel_ringbuffer.c | 38 +++++++++++++++++++++++--
> > 1 file changed, 36 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index b8a7a014d46d..87eebc13c0d8 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -91,6 +91,7 @@ static int
> >  gen4_render_ring_flush(struct i915_request *rq, u32 mode)
> >  {
> >  	u32 cmd, *cs;
> > +	int i;
> >
> >  	/*
> >  	 * read/write caches:
> > @@ -127,12 +128,45 @@ gen4_render_ring_flush(struct i915_request *rq, u32 mode)
> >  			cmd |= MI_INVALIDATE_ISP;
> >  	}
> >
> > -	cs = intel_ring_begin(rq, 2);
> > +	i = 2;
> > +	if (mode & EMIT_INVALIDATE)
> > +		i += 20;
> > +
> > +	cs = intel_ring_begin(rq, i);
> >  	if (IS_ERR(cs))
> >  		return PTR_ERR(cs);
> >
> >  	*cs++ = cmd;
> > -	*cs++ = MI_NOOP;
> > +
> > +	/*
> > +	 * A random delay to let the CS invalidate take effect? Without this
> > +	 * delay, the GPU relocation path fails as the CS does not see
> > +	 * the updated contents. Just as important, if we apply the flushes
> > +	 * to the EMIT_FLUSH branch (i.e. immediately after the relocation
> > +	 * write and before the invalidate on the next batch), the relocations
> > +	 * still fail. This implies that is a delay following invalidation
> > +	 * that is required to reset the caches as opposed to a delay to
> > +	 * ensure the memory is written.
> > +	 */
> > +	if (mode & EMIT_INVALIDATE) {
> > +		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> > +		*cs++ = i915_ggtt_offset(rq->engine->scratch) |
> > +			PIPE_CONTROL_GLOBAL_GTT;
> > +		*cs++ = 0;
> > +		*cs++ = 0;
> > +
> > +		for (i = 0; i < 12; i++)
> > +			*cs++ = MI_FLUSH;
> > +
> > +		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> > +		*cs++ = i915_ggtt_offset(rq->engine->scratch) |
> > +			PIPE_CONTROL_GLOBAL_GTT;
> > +		*cs++ = 0;
> > +		*cs++ = 0;
> > +	}
>
> This smells a lot like the snb a/b w/a, except there the spec says to
> use 8 STORE_DWORDS.

Yeah, the similarity wasn't lost, except that w/a is to cover the
coherency aspect of the writes not being flushed. This feels a bit
fishier in that the experiments indicate it's an issue on the
invalidation path as opposed to flushing the writes. And the other w/a
to use umpteen pipecontrols to get around the lack of
PIPE_CONTROL_FLUSH.

> I suppose the choice of a specific command isn't
> critical, and it's just a matter of stuffing the pipeline with something
> that's takes long enough to let the TLB invalidate finish?

Except the MI_FLUSH are more effective in fewer number than
PIPE_CONTROLs. Probably because each one translates to a few pipe
controls or something, quite heavy.

> Anyways, patch itself seems as reasonable as one might expect for an
> issue like this.

As nasty as one would expect. For the record, we are not entirely out
of danger. gem_exec_whisper continues to indicate a problem, but one
step at a time. (I haven't found quite what's upsetting it yet, except
that if we do each batch synchronously and verify each one, it's happy.)
-Chris
end of thread, other threads:[~2018-11-08 0:35 UTC | newest]
Thread overview: 6+ messages
2018-11-05 9:43 [PATCH] drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5 Chris Wilson
2018-11-05 10:23 ` ✓ Fi.CI.BAT: success for " Patchwork
2018-11-05 11:13 ` ✓ Fi.CI.IGT: " Patchwork
2018-11-07 15:04 ` [Intel-gfx] [PATCH] " Ville Syrjälä
2018-11-07 15:04 ` Ville Syrjälä
2018-11-07 15:12 ` Chris Wilson