* [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes
@ 2015-01-22 13:13 Chris Wilson
2015-01-22 13:24 ` [Intel-gfx] " Daniel Vetter
2015-01-22 22:41 ` [PATCH] " shuang.he
0 siblings, 2 replies; 9+ messages in thread
From: Chris Wilson @ 2015-01-22 13:13 UTC (permalink / raw)
To: intel-gfx; +Cc: Chris Wilson, stable
This looked like an odd regression from
commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Jun 12 10:28:55 2014 +0100
drm/i915: Restrict GPU boost to the RCS engine
but in reality it undercovered a much older coherency bug. The issue that
boosting the GPU frequency on the BCS ring was masking was that we could
wake the CPU up after completion of a BCS batch and inspect memory prior
to the write cache being fully evicted. In order to serialise the
breadcrumb interrupt (and so ensure that the CPU's view of memory is
coherent) we need to perform a post-sync operation in the MI_FLUSH_DW.
Testcase: gpuX-rcs-gpu-read-after-write
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
---
drivers/gpu/drm/i915/intel_ringbuffer.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index e18ed05dc0d5..53615eba2a97 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2019,6 +2019,14 @@ static int gen6_ring_flush(struct intel_engine_cs *ring,
cmd = MI_FLUSH_DW;
if (INTEL_INFO(ring->dev)->gen >= 8)
cmd += 1;
+
+ /* We always require a command barrier so that subsequent
+ * commands, such as breadcrumb interrupts, are strictly ordered
+ * wrt the contents of the write cache being flushed to memory
+ * (and thus being coherent from the CPU).
+ */
+ cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
+
/*
* Bspec vol 1c.3 - blitter engine command streamer:
* "If ENABLED, all TLBs will be invalidated once the flush
@@ -2026,8 +2034,7 @@ static int gen6_ring_flush(struct intel_engine_cs *ring,
* Post-Sync Operation field is a value of 1h or 3h."
*/
if (invalidate & I915_GEM_DOMAIN_RENDER)
- cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
- MI_FLUSH_DW_OP_STOREDW;
+ cmd |= MI_INVALIDATE_TLB;
intel_ring_emit(ring, cmd);
intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
if (INTEL_INFO(ring->dev)->gen >= 8) {
--
2.1.4
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [Intel-gfx] [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes 2015-01-22 13:13 [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes Chris Wilson @ 2015-01-22 13:24 ` Daniel Vetter 2015-01-22 13:32 ` Chris Wilson 2015-01-22 13:42 ` [PATCH v2] " Chris Wilson 2015-01-22 22:41 ` [PATCH] " shuang.he 1 sibling, 2 replies; 9+ messages in thread From: Daniel Vetter @ 2015-01-22 13:24 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx, stable On Thu, Jan 22, 2015 at 2:13 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote: > This looked like an odd regression from > > commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b > Author: Chris Wilson <chris@chris-wilson.co.uk> > Date: Thu Jun 12 10:28:55 2014 +0100 > > drm/i915: Restrict GPU boost to the RCS engine > > but in reality it undercovered a much older coherency bug. The issue that > boosting the GPU frequency on the BCS ring was masking was that we could > wake the CPU up after completion of a BCS batch and inspect memory prior > to the write cache being fully evicted. In order to serialise the > breadcrumb interrupt (and so ensure that the CPU's view of memory is > coherent) we need to perform a post-sync operation in the MI_FLUSH_DW. > > Testcase: gpuX-rcs-gpu-read-after-write > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: stable@vger.kernel.org We also need this in gen8_emit_flush and gen6_bsd_ring_flush I think. And interesting that the subsequent seqno write can apparently be reordered with cache flushing. Or do we just need lots more of those (wasn't the magic number once 32 or so)? Anyway can't argue with hw, so Acked (with the other 2 functions updated). -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes 2015-01-22 13:24 ` [Intel-gfx] " Daniel Vetter @ 2015-01-22 13:32 ` Chris Wilson 2015-01-22 13:42 ` [PATCH v2] " Chris Wilson 1 sibling, 0 replies; 9+ messages in thread From: Chris Wilson @ 2015-01-22 13:32 UTC (permalink / raw) To: Daniel Vetter; +Cc: intel-gfx, stable On Thu, Jan 22, 2015 at 02:24:18PM +0100, Daniel Vetter wrote: > On Thu, Jan 22, 2015 at 2:13 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > This looked like an odd regression from > > > > commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b > > Author: Chris Wilson <chris@chris-wilson.co.uk> > > Date: Thu Jun 12 10:28:55 2014 +0100 > > > > drm/i915: Restrict GPU boost to the RCS engine > > > > but in reality it undercovered a much older coherency bug. The issue that > > boosting the GPU frequency on the BCS ring was masking was that we could > > wake the CPU up after completion of a BCS batch and inspect memory prior > > to the write cache being fully evicted. In order to serialise the > > breadcrumb interrupt (and so ensure that the CPU's view of memory is > > coherent) we need to perform a post-sync operation in the MI_FLUSH_DW. > > > > Testcase: gpuX-rcs-gpu-read-after-write > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: stable@vger.kernel.org > > We also need this in gen8_emit_flush and gen6_bsd_ring_flush I think. > > And interesting that the subsequent seqno write can apparently be > reordered with cache flushing. Or do we just need lots more of those > (wasn't the magic number once 32 or so)? That magic number was for the reordering of the seqno write with interrupt generation, iirc. But yes, it does sound oddly reminiscent. At least they did give us post-sync operations, even if we always end up having to use then. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2] drm/i915: Insert a command barrier on BLT/BSD cache flushes 2015-01-22 13:24 ` [Intel-gfx] " Daniel Vetter 2015-01-22 13:32 ` Chris Wilson @ 2015-01-22 13:42 ` Chris Wilson 2015-01-22 22:45 ` shuang.he 2015-02-09 16:00 ` [Intel-gfx] " Jani Nikula 1 sibling, 2 replies; 9+ messages in thread From: Chris Wilson @ 2015-01-22 13:42 UTC (permalink / raw) To: intel-gfx; +Cc: Chris Wilson, stable This looked like an odd regression from commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Jun 12 10:28:55 2014 +0100 drm/i915: Restrict GPU boost to the RCS engine but in reality it undercovered a much older coherency bug. The issue that boosting the GPU frequency on the BCS ring was masking was that we could wake the CPU up after completion of a BCS batch and inspect memory prior to the write cache being fully evicted. In order to serialise the breadcrumb interrupt (and so ensure that the CPU's view of memory is coherent) we need to perform a post-sync operation in the MI_FLUSH_DW. v2: Fix all the MI_FLUSH_DW (bsd plus the duplication in execlists). Testcase: gpuX-rcs-gpu-read-after-write Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Acked-by: Daniel Vetter <daniel@ffwll.ch> --- drivers/gpu/drm/i915/intel_lrc.c | 20 +++++++++++--------- drivers/gpu/drm/i915/intel_ringbuffer.c | 23 +++++++++++++++++++---- 2 files changed, 30 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index e405b61cdac5..8e71d8851c9a 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1237,15 +1237,17 @@ static int gen8_emit_flush(struct intel_ringbuffer *ringbuf, cmd = MI_FLUSH_DW + 1; - if (ring == &dev_priv->ring[VCS]) { - if (invalidate_domains & I915_GEM_GPU_DOMAINS) - cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD | - MI_FLUSH_DW_STORE_INDEX | - MI_FLUSH_DW_OP_STOREDW; - } else { - if (invalidate_domains & I915_GEM_DOMAIN_RENDER) - cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX | - MI_FLUSH_DW_OP_STOREDW; + /* We always require a command barrier so that subsequent + * commands, such as breadcrumb interrupts, are strictly ordered + * wrt the contents of the write cache being flushed to memory + * (and thus being coherent from the CPU). + */ + cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW; + + if (invalidate_domains & I915_GEM_GPU_DOMAINS) { + cmd |= MI_INVALIDATE_TLB; + if (ring == &dev_priv->ring[VCS]) + cmd |= MI_INVALIDATE_BSD; } intel_logical_ring_emit(ringbuf, cmd); diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 23020d67329b..718530fd6c6b 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -2224,6 +2224,14 @@ static int gen6_bsd_ring_flush(struct intel_engine_cs *ring, cmd = MI_FLUSH_DW; if (INTEL_INFO(ring->dev)->gen >= 8) cmd += 1; + + /* We always require a command barrier so that subsequent + * commands, such as breadcrumb interrupts, are strictly ordered + * wrt the contents of the write cache being flushed to memory + * (and thus being coherent from the CPU). + */ + cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW; + /* * Bspec vol 1c.5 - video engine command streamer: * "If ENABLED, all TLBs will be invalidated once the flush @@ -2231,8 +2239,8 @@ static int gen6_bsd_ring_flush(struct intel_engine_cs *ring, * Post-Sync Operation field is a value of 1h or 3h." */ if (invalidate & I915_GEM_GPU_DOMAINS) - cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD | - MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW; + cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD; + intel_ring_emit(ring, cmd); intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT); if (INTEL_INFO(ring->dev)->gen >= 8) { @@ -2328,6 +2336,14 @@ static int gen6_ring_flush(struct intel_engine_cs *ring, cmd = MI_FLUSH_DW; if (INTEL_INFO(ring->dev)->gen >= 8) cmd += 1; + + /* We always require a command barrier so that subsequent + * commands, such as breadcrumb interrupts, are strictly ordered + * wrt the contents of the write cache being flushed to memory + * (and thus being coherent from the CPU). + */ + cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW; + /* * Bspec vol 1c.3 - blitter engine command streamer: * "If ENABLED, all TLBs will be invalidated once the flush @@ -2335,8 +2351,7 @@ static int gen6_ring_flush(struct intel_engine_cs *ring, * Post-Sync Operation field is a value of 1h or 3h." */ if (invalidate & I915_GEM_DOMAIN_RENDER) - cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX | - MI_FLUSH_DW_OP_STOREDW; + cmd |= MI_INVALIDATE_TLB; intel_ring_emit(ring, cmd); intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT); if (INTEL_INFO(ring->dev)->gen >= 8) { -- 2.1.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2] drm/i915: Insert a command barrier on BLT/BSD cache flushes 2015-01-22 13:42 ` [PATCH v2] " Chris Wilson @ 2015-01-22 22:45 ` shuang.he 2015-02-09 16:00 ` [Intel-gfx] " Jani Nikula 1 sibling, 0 replies; 9+ messages in thread From: shuang.he @ 2015-01-22 22:45 UTC (permalink / raw) To: shuang.he, ethan.gao, intel-gfx, chris Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com) Task id: 5626 -------------------------------------Summary------------------------------------- Platform Delta drm-intel-nightly Series Applied PNV 353/353 353/353 ILK 353/353 353/353 SNB +1 399/422 400/422 IVB +1-1 486/487 486/487 BYT 296/296 296/296 HSW +1 507/508 508/508 BDW +2 399/402 401/402 -------------------------------------Detailed------------------------------------- Platform Test drm-intel-nightly Series Applied SNB igt_kms_flip_nonexisting-fb NSPT(1, M35)PASS(2, M35) PASS(1, M35) *IVB igt_gem_pwrite_pread_display-copy-performance PASS(2, M21M4) DMESG_WARN(1, M4) IVB igt_gem_pwrite_pread_snooped-pwrite-blt-cpu_mmap-performance DMESG_WARN(1, M21)PASS(1, M4) PASS(1, M4) HSW igt_gem_storedw_loop_blt DMESG_WARN(2, M19M20)PASS(1, M20) PASS(1, M20) BDW igt_gem_pwrite_pread_display-pwrite-blt-gtt_mmap-performance DMESG_WARN(2, M28)PASS(1, M30) PASS(1, M28) BDW igt_gem_pwrite_pread_uncached-pwrite-blt-gtt_mmap-performance DMESG_WARN(1, M28)PASS(1, M28) PASS(1, M28) Note: You need to pay more attention to line start with '*' _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Intel-gfx] [PATCH v2] drm/i915: Insert a command barrier on BLT/BSD cache flushes 2015-01-22 13:42 ` [PATCH v2] " Chris Wilson 2015-01-22 22:45 ` shuang.he @ 2015-02-09 16:00 ` Jani Nikula 2015-02-09 16:07 ` Chris Wilson 1 sibling, 1 reply; 9+ messages in thread From: Jani Nikula @ 2015-02-09 16:00 UTC (permalink / raw) To: Chris Wilson, intel-gfx; +Cc: stable, Daniel Vetter On Thu, 22 Jan 2015, Chris Wilson <chris@chris-wilson.co.uk> wrote: > This looked like an odd regression from > > commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b > Author: Chris Wilson <chris@chris-wilson.co.uk> > Date: Thu Jun 12 10:28:55 2014 +0100 > > drm/i915: Restrict GPU boost to the RCS engine > > but in reality it undercovered a much older coherency bug. The issue that > boosting the GPU frequency on the BCS ring was masking was that we could > wake the CPU up after completion of a BCS batch and inspect memory prior > to the write cache being fully evicted. In order to serialise the > breadcrumb interrupt (and so ensure that the CPU's view of memory is > coherent) we need to perform a post-sync operation in the MI_FLUSH_DW. > > v2: Fix all the MI_FLUSH_DW (bsd plus the duplication in execlists). > > Testcase: gpuX-rcs-gpu-read-after-write > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: stable@vger.kernel.org > Acked-by: Daniel Vetter <daniel@ffwll.ch> > --- > drivers/gpu/drm/i915/intel_lrc.c | 20 +++++++++++--------- > drivers/gpu/drm/i915/intel_ringbuffer.c | 23 +++++++++++++++++++---- > 2 files changed, 30 insertions(+), 13 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > index e405b61cdac5..8e71d8851c9a 100644 > --- a/drivers/gpu/drm/i915/intel_lrc.c > +++ b/drivers/gpu/drm/i915/intel_lrc.c > @@ -1237,15 +1237,17 @@ static int gen8_emit_flush(struct intel_ringbuffer *ringbuf, > > cmd = MI_FLUSH_DW + 1; > > - if (ring == &dev_priv->ring[VCS]) { > - if (invalidate_domains & I915_GEM_GPU_DOMAINS) > - cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD | > - MI_FLUSH_DW_STORE_INDEX | > - MI_FLUSH_DW_OP_STOREDW; > - } else { > - if (invalidate_domains & I915_GEM_DOMAIN_RENDER) > - cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX | > - MI_FLUSH_DW_OP_STOREDW; > + /* We always require a command barrier so that subsequent > + * commands, such as breadcrumb interrupts, are strictly ordered > + * wrt the contents of the write cache being flushed to memory > + * (and thus being coherent from the CPU). > + */ > + cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW; > + > + if (invalidate_domains & I915_GEM_GPU_DOMAINS) { Why do you change the mask from I915_GEM_DOMAIN_RENDER to I915_GEM_GPU_DOMAINS for ring != VCS? BR, Jani. > + cmd |= MI_INVALIDATE_TLB; > + if (ring == &dev_priv->ring[VCS]) > + cmd |= MI_INVALIDATE_BSD; > } > > intel_logical_ring_emit(ringbuf, cmd); > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c > index 23020d67329b..718530fd6c6b 100644 > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > @@ -2224,6 +2224,14 @@ static int gen6_bsd_ring_flush(struct intel_engine_cs *ring, > cmd = MI_FLUSH_DW; > if (INTEL_INFO(ring->dev)->gen >= 8) > cmd += 1; > + > + /* We always require a command barrier so that subsequent > + * commands, such as breadcrumb interrupts, are strictly ordered > + * wrt the contents of the write cache being flushed to memory > + * (and thus being coherent from the CPU). > + */ > + cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW; > + > /* > * Bspec vol 1c.5 - video engine command streamer: > * "If ENABLED, all TLBs will be invalidated once the flush > @@ -2231,8 +2239,8 @@ static int gen6_bsd_ring_flush(struct intel_engine_cs *ring, > * Post-Sync Operation field is a value of 1h or 3h." > */ > if (invalidate & I915_GEM_GPU_DOMAINS) > - cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD | > - MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW; > + cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD; > + > intel_ring_emit(ring, cmd); > intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT); > if (INTEL_INFO(ring->dev)->gen >= 8) { > @@ -2328,6 +2336,14 @@ static int gen6_ring_flush(struct intel_engine_cs *ring, > cmd = MI_FLUSH_DW; > if (INTEL_INFO(ring->dev)->gen >= 8) > cmd += 1; > + > + /* We always require a command barrier so that subsequent > + * commands, such as breadcrumb interrupts, are strictly ordered > + * wrt the contents of the write cache being flushed to memory > + * (and thus being coherent from the CPU). > + */ > + cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW; > + > /* > * Bspec vol 1c.3 - blitter engine command streamer: > * "If ENABLED, all TLBs will be invalidated once the flush > @@ -2335,8 +2351,7 @@ static int gen6_ring_flush(struct intel_engine_cs *ring, > * Post-Sync Operation field is a value of 1h or 3h." > */ > if (invalidate & I915_GEM_DOMAIN_RENDER) > - cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX | > - MI_FLUSH_DW_OP_STOREDW; > + cmd |= MI_INVALIDATE_TLB; > intel_ring_emit(ring, cmd); > intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT); > if (INTEL_INFO(ring->dev)->gen >= 8) { > -- > 2.1.4 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Jani Nikula, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Intel-gfx] [PATCH v2] drm/i915: Insert a command barrier on BLT/BSD cache flushes 2015-02-09 16:00 ` [Intel-gfx] " Jani Nikula @ 2015-02-09 16:07 ` Chris Wilson 2015-02-09 18:15 ` Jani Nikula 0 siblings, 1 reply; 9+ messages in thread From: Chris Wilson @ 2015-02-09 16:07 UTC (permalink / raw) To: Jani Nikula; +Cc: intel-gfx, stable, Daniel Vetter On Mon, Feb 09, 2015 at 06:00:25PM +0200, Jani Nikula wrote: > On Thu, 22 Jan 2015, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > This looked like an odd regression from > > > > commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b > > Author: Chris Wilson <chris@chris-wilson.co.uk> > > Date: Thu Jun 12 10:28:55 2014 +0100 > > > > drm/i915: Restrict GPU boost to the RCS engine > > > > but in reality it undercovered a much older coherency bug. The issue that > > boosting the GPU frequency on the BCS ring was masking was that we could > > wake the CPU up after completion of a BCS batch and inspect memory prior > > to the write cache being fully evicted. In order to serialise the > > breadcrumb interrupt (and so ensure that the CPU's view of memory is > > coherent) we need to perform a post-sync operation in the MI_FLUSH_DW. > > > > v2: Fix all the MI_FLUSH_DW (bsd plus the duplication in execlists). > > > > Testcase: gpuX-rcs-gpu-read-after-write > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: stable@vger.kernel.org > > Acked-by: Daniel Vetter <daniel@ffwll.ch> > > --- > > drivers/gpu/drm/i915/intel_lrc.c | 20 +++++++++++--------- > > drivers/gpu/drm/i915/intel_ringbuffer.c | 23 +++++++++++++++++++---- > > 2 files changed, 30 insertions(+), 13 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > > index e405b61cdac5..8e71d8851c9a 100644 > > --- a/drivers/gpu/drm/i915/intel_lrc.c > > +++ b/drivers/gpu/drm/i915/intel_lrc.c > > @@ -1237,15 +1237,17 @@ static int gen8_emit_flush(struct intel_ringbuffer *ringbuf, > > > > cmd = MI_FLUSH_DW + 1; > > > > - if (ring == &dev_priv->ring[VCS]) { > > - if (invalidate_domains & I915_GEM_GPU_DOMAINS) > > - cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD | > > - MI_FLUSH_DW_STORE_INDEX | > > - MI_FLUSH_DW_OP_STOREDW; > > - } else { > > - if (invalidate_domains & I915_GEM_DOMAIN_RENDER) > > - cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX | > > - MI_FLUSH_DW_OP_STOREDW; > > + /* We always require a command barrier so that subsequent > > + * commands, such as breadcrumb interrupts, are strictly ordered > > + * wrt the contents of the write cache being flushed to memory > > + * (and thus being coherent from the CPU). > > + */ > > + cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW; > > + > > + if (invalidate_domains & I915_GEM_GPU_DOMAINS) { > > Why do you change the mask from I915_GEM_DOMAIN_RENDER to > I915_GEM_GPU_DOMAINS for ring != VCS? My bad, I didn't notice that execlists was originally broken. The patch is correct. -Chris -- Chris Wilson, Intel Open Source Technology Centre ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] drm/i915: Insert a command barrier on BLT/BSD cache flushes 2015-02-09 16:07 ` Chris Wilson @ 2015-02-09 18:15 ` Jani Nikula 0 siblings, 0 replies; 9+ messages in thread From: Jani Nikula @ 2015-02-09 18:15 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx, stable On Mon, 09 Feb 2015, Chris Wilson <chris@chris-wilson.co.uk> wrote: > On Mon, Feb 09, 2015 at 06:00:25PM +0200, Jani Nikula wrote: >> On Thu, 22 Jan 2015, Chris Wilson <chris@chris-wilson.co.uk> wrote: >> > This looked like an odd regression from >> > >> > commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b >> > Author: Chris Wilson <chris@chris-wilson.co.uk> >> > Date: Thu Jun 12 10:28:55 2014 +0100 >> > >> > drm/i915: Restrict GPU boost to the RCS engine >> > >> > but in reality it undercovered a much older coherency bug. The issue that >> > boosting the GPU frequency on the BCS ring was masking was that we could >> > wake the CPU up after completion of a BCS batch and inspect memory prior >> > to the write cache being fully evicted. In order to serialise the >> > breadcrumb interrupt (and so ensure that the CPU's view of memory is >> > coherent) we need to perform a post-sync operation in the MI_FLUSH_DW. >> > >> > v2: Fix all the MI_FLUSH_DW (bsd plus the duplication in execlists). >> > >> > Testcase: gpuX-rcs-gpu-read-after-write >> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >> > Cc: stable@vger.kernel.org >> > Acked-by: Daniel Vetter <daniel@ffwll.ch> >> > --- >> > drivers/gpu/drm/i915/intel_lrc.c | 20 +++++++++++--------- >> > drivers/gpu/drm/i915/intel_ringbuffer.c | 23 +++++++++++++++++++---- >> > 2 files changed, 30 insertions(+), 13 deletions(-) >> > >> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c >> > index e405b61cdac5..8e71d8851c9a 100644 >> > --- a/drivers/gpu/drm/i915/intel_lrc.c >> > +++ b/drivers/gpu/drm/i915/intel_lrc.c >> > @@ -1237,15 +1237,17 @@ static int gen8_emit_flush(struct intel_ringbuffer *ringbuf, >> > >> > cmd = MI_FLUSH_DW + 1; >> > >> > - if (ring == &dev_priv->ring[VCS]) { >> > - if (invalidate_domains & I915_GEM_GPU_DOMAINS) >> > - cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD | >> > - MI_FLUSH_DW_STORE_INDEX | >> > - MI_FLUSH_DW_OP_STOREDW; >> > - } else { >> > - if (invalidate_domains & I915_GEM_DOMAIN_RENDER) >> > - cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX | >> > - MI_FLUSH_DW_OP_STOREDW; >> > + /* We always require a command barrier so that subsequent >> > + * commands, such as breadcrumb interrupts, are strictly ordered >> > + * wrt the contents of the write cache being flushed to memory >> > + * (and thus being coherent from the CPU). >> > + */ >> > + cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW; >> > + >> > + if (invalidate_domains & I915_GEM_GPU_DOMAINS) { >> >> Why do you change the mask from I915_GEM_DOMAIN_RENDER to >> I915_GEM_GPU_DOMAINS for ring != VCS? > > My bad, I didn't notice that execlists was originally broken. The patch > is correct. I'll take your and Daniel's word for it. I hope I won't have to regret not asking you to split this into two patches... Pushed to drm-intel-next-fixes, thanks for the patch and ack. BR, Jani. > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre -- Jani Nikula, Intel Open Source Technology Center _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes 2015-01-22 13:13 [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes Chris Wilson 2015-01-22 13:24 ` [Intel-gfx] " Daniel Vetter @ 2015-01-22 22:41 ` shuang.he 1 sibling, 0 replies; 9+ messages in thread From: shuang.he @ 2015-01-22 22:41 UTC (permalink / raw) To: shuang.he, ethan.gao, intel-gfx, chris Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com) Task id: 5625 -------------------------------------Summary------------------------------------- Platform Delta drm-intel-nightly Series Applied PNV -2 353/353 351/353 ILK 200/200 200/200 SNB 400/422 400/422 IVB 487/487 487/487 BYT 296/296 296/296 HSW 508/508 508/508 BDW 401/402 401/402 -------------------------------------Detailed------------------------------------- Platform Test drm-intel-nightly Series Applied *PNV igt_gen3_render_linear_blits PASS(4, M25M23) NRUN(1, M23) *PNV igt_gen3_render_mixed_blits PASS(3, M25M23) CRASH(1, M23) Note: You need to pay more attention to line start with '*' _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-02-09 18:16 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-01-22 13:13 [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes Chris Wilson 2015-01-22 13:24 ` [Intel-gfx] " Daniel Vetter 2015-01-22 13:32 ` Chris Wilson 2015-01-22 13:42 ` [PATCH v2] " Chris Wilson 2015-01-22 22:45 ` shuang.he 2015-02-09 16:00 ` [Intel-gfx] " Jani Nikula 2015-02-09 16:07 ` Chris Wilson 2015-02-09 18:15 ` Jani Nikula 2015-01-22 22:41 ` [PATCH] " shuang.he
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox