* [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue
@ 2015-08-14 12:38 Imre Deak
2015-08-14 12:38 ` [PATCH v2 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno Imre Deak
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Imre Deak @ 2015-08-14 12:38 UTC (permalink / raw)
To: intel-gfx; +Cc: Jani Nikula, Mika Kuoppala
This is a v2 of [1]. Since v1 the HW team confirmed that there is an
HW issue in A steppings with the GPU/CPU snoop logic, which explains why
we need this workaround.
In v2 I fixed a typo and limited the workaround to A steppings, since
in later steppings this problem is fixed.
[1]
http://lists.freedesktop.org/archives/intel-gfx/2015-June/068218.html
Imre Deak (2):
drm/i915/bxt: work around HW coherency issue when accessing GPU seqno
drm/i915/bxt: work around HW coherency issue for cached GEM mappings
drivers/gpu/drm/i915/i915_gem.c | 11 ++++++++++-
drivers/gpu/drm/i915/intel_lrc.c | 20 ++++++++++++++++++++
drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++++++
3 files changed, 37 insertions(+), 1 deletion(-)
--
2.1.4
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread* [PATCH v2 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-08-14 12:38 [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue Imre Deak @ 2015-08-14 12:38 ` Imre Deak 2015-08-14 13:12 ` Chris Wilson 2015-08-14 15:35 ` [PATCH v3 " Imre Deak 2015-08-14 12:38 ` [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings Imre Deak 2015-08-14 12:49 ` [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue Chris Wilson 2 siblings, 2 replies; 16+ messages in thread From: Imre Deak @ 2015-08-14 12:38 UTC (permalink / raw) To: intel-gfx; +Cc: Jani Nikula, Mika Kuoppala By running igt/store_dword_loop_render on BXT we can hit a coherency problem where the seqno written at GPU command completion time is not seen by the CPU. This results in __i915_wait_request seeing the stale seqno and not completing the request (not considering the lost interrupt/GPU reset mechanism). I also verified that this isn't a case of a lost interrupt, or that the command didn't complete somehow: when the coherency issue occured I read the seqno via an uncached GTT mapping too. While the cached version of the seqno still showed the stale value the one read via the uncached mapping was the correct one. Work around this issue by clflushing the corresponding CPU cacheline following any store of the seqno and preceding any reading of it. When reading it do this only when the caller expects a coherent view. v2: - fix using the proper logical && instead of a bitwise & (Jani, Mika) - limit the workaround to A stepping, on later steppings this HW issue is fixed Testcase: igt/store_dword_loop_render Signed-off-by: Imre Deak <imre.deak@intel.com> --- drivers/gpu/drm/i915/intel_lrc.c | 20 ++++++++++++++++++++ drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++++++ 2 files changed, 27 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 138964a..46f2be0 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1691,12 +1691,32 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request, static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency) { + + /* + * On BXT A steppings there is a HW coherency issue whereby the + * MI_STORE_DATA_IMM storing the completed request's seqno + * occasionally doesn't invalidate the CPU cache. Work around this by + * clflushing the corresponding cacheline whenever the caller wants + * the coherency to be guaranteed. Note that this cacheline is known + * to be clean at this point, since we only write it in + * gen8_set_seqno(), where we also do a clflush after the write. So + * this clflush in practice becomes an invalidate operation. + */ + + if (IS_BROXTON(ring->dev) && INTEL_REVID(ring->dev) < BXT_REVID_B0 && + !lazy_coherency) + intel_flush_status_page(ring, I915_GEM_HWS_INDEX); + return intel_read_status_page(ring, I915_GEM_HWS_INDEX); } static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno) { intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno); + + /* See gen8_get_seqno() explaining the reason for the clflush. */ + if (IS_BROXTON(ring->dev) && INTEL_REVID(ring->dev) < BXT_REVID_B0) + intel_flush_status_page(ring, I915_GEM_HWS_INDEX); } static int gen8_emit_request(struct drm_i915_gem_request *request) diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 2e85fda..95b0b4b 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -377,6 +377,13 @@ intel_ring_sync_index(struct intel_engine_cs *ring, return idx; } +static inline void +intel_flush_status_page(struct intel_engine_cs *ring, int reg) +{ + drm_clflush_virt_range(&ring->status_page.page_addr[reg], + sizeof(uint32_t)); +} + static inline u32 intel_read_status_page(struct intel_engine_cs *ring, int reg) -- 2.1.4 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-08-14 12:38 ` [PATCH v2 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno Imre Deak @ 2015-08-14 13:12 ` Chris Wilson 2015-08-14 13:31 ` Imre Deak 2015-08-14 15:35 ` [PATCH v3 " Imre Deak 1 sibling, 1 reply; 16+ messages in thread From: Chris Wilson @ 2015-08-14 13:12 UTC (permalink / raw) To: Imre Deak; +Cc: Jani Nikula, intel-gfx, Mika Kuoppala On Fri, Aug 14, 2015 at 03:38:56PM +0300, Imre Deak wrote: > By running igt/store_dword_loop_render on BXT we can hit a coherency > problem where the seqno written at GPU command completion time is not > seen by the CPU. This results in __i915_wait_request seeing the stale > seqno and not completing the request (not considering the lost > interrupt/GPU reset mechanism). I also verified that this isn't a case > of a lost interrupt, or that the command didn't complete somehow: when > the coherency issue occured I read the seqno via an uncached GTT mapping > too. While the cached version of the seqno still showed the stale value > the one read via the uncached mapping was the correct one. > > Work around this issue by clflushing the corresponding CPU cacheline > following any store of the seqno and preceding any reading of it. When > reading it do this only when the caller expects a coherent view. > > v2: > - fix using the proper logical && instead of a bitwise & (Jani, Mika) > - limit the workaround to A stepping, on later steppings this HW issue > is fixed We have vfuncs in order to avoid the pointer dance (and boy is it a pretty and quite convoluted dance). -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-08-14 13:12 ` Chris Wilson @ 2015-08-14 13:31 ` Imre Deak 0 siblings, 0 replies; 16+ messages in thread From: Imre Deak @ 2015-08-14 13:31 UTC (permalink / raw) To: Chris Wilson; +Cc: Jani Nikula, intel-gfx, Mika Kuoppala On pe, 2015-08-14 at 14:12 +0100, Chris Wilson wrote: > On Fri, Aug 14, 2015 at 03:38:56PM +0300, Imre Deak wrote: > > By running igt/store_dword_loop_render on BXT we can hit a coherency > > problem where the seqno written at GPU command completion time is not > > seen by the CPU. This results in __i915_wait_request seeing the stale > > seqno and not completing the request (not considering the lost > > interrupt/GPU reset mechanism). I also verified that this isn't a case > > of a lost interrupt, or that the command didn't complete somehow: when > > the coherency issue occured I read the seqno via an uncached GTT mapping > > too. While the cached version of the seqno still showed the stale value > > the one read via the uncached mapping was the correct one. > > > > Work around this issue by clflushing the corresponding CPU cacheline > > following any store of the seqno and preceding any reading of it. When > > reading it do this only when the caller expects a coherent view. > > > > v2: > > - fix using the proper logical && instead of a bitwise & (Jani, Mika) > > - limit the workaround to A stepping, on later steppings this HW issue > > is fixed > > We have vfuncs in order to avoid the pointer dance (and boy is it a > pretty and quite convoluted dance). Ok, I'll add new get_seqno/set_seqno vfuncs. > -Chris > _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-08-14 12:38 ` [PATCH v2 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno Imre Deak 2015-08-14 13:12 ` Chris Wilson @ 2015-08-14 15:35 ` Imre Deak 2015-08-14 16:21 ` Chris Wilson 1 sibling, 1 reply; 16+ messages in thread From: Imre Deak @ 2015-08-14 15:35 UTC (permalink / raw) To: intel-gfx; +Cc: Jani Nikula, Mika Kuoppala By running igt/store_dword_loop_render on BXT we can hit a coherency problem where the seqno written at GPU command completion time is not seen by the CPU. This results in __i915_wait_request seeing the stale seqno and not completing the request (not considering the lost interrupt/GPU reset mechanism). I also verified that this isn't a case of a lost interrupt, or that the command didn't complete somehow: when the coherency issue occured I read the seqno via an uncached GTT mapping too. While the cached version of the seqno still showed the stale value the one read via the uncached mapping was the correct one. Work around this issue by clflushing the corresponding CPU cacheline following any store of the seqno and preceding any reading of it. When reading it do this only when the caller expects a coherent view. v2: - fix using the proper logical && instead of a bitwise & (Jani, Mika) - limit the workaround to A stepping, on later steppings this HW issue is fixed v3: - use a separate get_seqno/set_seqno vfunc (Chris) Testcase: igt/store_dword_loop_render Signed-off-by: Imre Deak <imre.deak@intel.com> --- drivers/gpu/drm/i915/intel_lrc.c | 64 ++++++++++++++++++++++++++++----- drivers/gpu/drm/i915/intel_ringbuffer.h | 7 ++++ 2 files changed, 63 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 138964a..1269d57 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1699,6 +1699,34 @@ static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno) intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno); } +static u32 bxt_a_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency) +{ + + /* + * On BXT A steppings there is a HW coherency issue whereby the + * MI_STORE_DATA_IMM storing the completed request's seqno + * occasionally doesn't invalidate the CPU cache. Work around this by + * clflushing the corresponding cacheline whenever the caller wants + * the coherency to be guaranteed. Note that this cacheline is known + * to be clean at this point, since we only write it in + * bxt_a_set_seqno(), where we also do a clflush after the write. So + * this clflush in practice becomes an invalidate operation. + */ + + if (!lazy_coherency) + intel_flush_status_page(ring, I915_GEM_HWS_INDEX); + + return intel_read_status_page(ring, I915_GEM_HWS_INDEX); +} + +static void bxt_a_set_seqno(struct intel_engine_cs *ring, u32 seqno) +{ + intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno); + + /* See bxt_a_get_seqno() explaining the reason for the clflush. */ + intel_flush_status_page(ring, I915_GEM_HWS_INDEX); +} + static int gen8_emit_request(struct drm_i915_gem_request *request) { struct intel_ringbuffer *ringbuf = request->ringbuf; @@ -1868,8 +1896,13 @@ static int logical_render_ring_init(struct drm_device *dev) ring->init_hw = gen8_init_render_ring; ring->init_context = gen8_init_rcs_context; ring->cleanup = intel_fini_pipe_control; - ring->get_seqno = gen8_get_seqno; - ring->set_seqno = gen8_set_seqno; + if (IS_BROXTON(dev) && INTEL_REVID(dev) < BXT_REVID_B0) { + ring->get_seqno = bxt_a_get_seqno; + ring->set_seqno = bxt_a_set_seqno; + } else { + ring->get_seqno = gen8_get_seqno; + ring->set_seqno = gen8_set_seqno; + } ring->emit_request = gen8_emit_request; ring->emit_flush = gen8_emit_flush_render; ring->irq_get = gen8_logical_ring_get_irq; @@ -1915,8 +1948,13 @@ static int logical_bsd_ring_init(struct drm_device *dev) GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT; ring->init_hw = gen8_init_common_ring; - ring->get_seqno = gen8_get_seqno; - ring->set_seqno = gen8_set_seqno; + if (IS_BROXTON(dev) && INTEL_REVID(dev) < BXT_REVID_B0) { + ring->get_seqno = bxt_a_get_seqno; + ring->set_seqno = bxt_a_set_seqno; + } else { + ring->get_seqno = gen8_get_seqno; + ring->set_seqno = gen8_set_seqno; + } ring->emit_request = gen8_emit_request; ring->emit_flush = gen8_emit_flush; ring->irq_get = gen8_logical_ring_get_irq; @@ -1965,8 +2003,13 @@ static int logical_blt_ring_init(struct drm_device *dev) GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT; ring->init_hw = gen8_init_common_ring; - ring->get_seqno = gen8_get_seqno; - ring->set_seqno = gen8_set_seqno; + if (IS_BROXTON(dev) && INTEL_REVID(dev) < BXT_REVID_B0) { + ring->get_seqno = bxt_a_get_seqno; + ring->set_seqno = bxt_a_set_seqno; + } else { + ring->get_seqno = gen8_get_seqno; + ring->set_seqno = gen8_set_seqno; + } ring->emit_request = gen8_emit_request; ring->emit_flush = gen8_emit_flush; ring->irq_get = gen8_logical_ring_get_irq; @@ -1990,8 +2033,13 @@ static int logical_vebox_ring_init(struct drm_device *dev) GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT; ring->init_hw = gen8_init_common_ring; - ring->get_seqno = gen8_get_seqno; - ring->set_seqno = gen8_set_seqno; + if (IS_BROXTON(dev) && INTEL_REVID(dev) < BXT_REVID_B0) { + ring->get_seqno = bxt_a_get_seqno; + ring->set_seqno = bxt_a_set_seqno; + } else { + ring->get_seqno = gen8_get_seqno; + ring->set_seqno = gen8_set_seqno; + } ring->emit_request = gen8_emit_request; ring->emit_flush = gen8_emit_flush; ring->irq_get = gen8_logical_ring_get_irq; diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 2e85fda..95b0b4b 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -377,6 +377,13 @@ intel_ring_sync_index(struct intel_engine_cs *ring, return idx; } +static inline void +intel_flush_status_page(struct intel_engine_cs *ring, int reg) +{ + drm_clflush_virt_range(&ring->status_page.page_addr[reg], + sizeof(uint32_t)); +} + static inline u32 intel_read_status_page(struct intel_engine_cs *ring, int reg) -- 2.1.4 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v3 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-08-14 15:35 ` [PATCH v3 " Imre Deak @ 2015-08-14 16:21 ` Chris Wilson 0 siblings, 0 replies; 16+ messages in thread From: Chris Wilson @ 2015-08-14 16:21 UTC (permalink / raw) To: Imre Deak; +Cc: Jani Nikula, intel-gfx, Mika Kuoppala On Fri, Aug 14, 2015 at 06:35:27PM +0300, Imre Deak wrote: > By running igt/store_dword_loop_render on BXT we can hit a coherency > problem where the seqno written at GPU command completion time is not > seen by the CPU. This results in __i915_wait_request seeing the stale > seqno and not completing the request (not considering the lost > interrupt/GPU reset mechanism). I also verified that this isn't a case > of a lost interrupt, or that the command didn't complete somehow: when > the coherency issue occured I read the seqno via an uncached GTT mapping > too. While the cached version of the seqno still showed the stale value > the one read via the uncached mapping was the correct one. > > Work around this issue by clflushing the corresponding CPU cacheline > following any store of the seqno and preceding any reading of it. When > reading it do this only when the caller expects a coherent view. > > v2: > - fix using the proper logical && instead of a bitwise & (Jani, Mika) > - limit the workaround to A stepping, on later steppings this HW issue > is fixed > v3: > - use a separate get_seqno/set_seqno vfunc (Chris) > > Testcase: igt/store_dword_loop_render > Signed-off-by: Imre Deak <imre.deak@intel.com> I'm not going to quibble about the whitespace or the code duplication that isn't your fault... Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings 2015-08-14 12:38 [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue Imre Deak 2015-08-14 12:38 ` [PATCH v2 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno Imre Deak @ 2015-08-14 12:38 ` Imre Deak 2015-08-14 13:11 ` Chris Wilson ` (2 more replies) 2015-08-14 12:49 ` [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue Chris Wilson 2 siblings, 3 replies; 16+ messages in thread From: Imre Deak @ 2015-08-14 12:38 UTC (permalink / raw) To: intel-gfx; +Cc: Jani Nikula, Mika Kuoppala Due to a coherency issue on BXT A steppings we can't guarantee a coherent view of cached GPU mappings, so fall back to uncached mappings. Note that this still won't fix cases where userspace expects a coherent view without synchronizing (via a set domain call). It still makes sense to limit the kernel's notion of the mapping to be uncached, for example for relocations to work properly during execbuffer time. Also in case user space does synchronize the buffer, this will still guarantee that we'll do the proper clflushing for the buffer. v2: - limit the WA to A steppings, on later stepping this HW issue is fixed Testcast: igt/gem_store_dword_batches_loop/cached-mapping Signed-off-by: Imre Deak <imre.deak@intel.com> --- drivers/gpu/drm/i915/i915_gem.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 407b6b3..987ffa8 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3742,7 +3742,16 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data, level = I915_CACHE_NONE; break; case I915_CACHING_CACHED: - level = I915_CACHE_LLC; + /* + * Due to a HW issue on BXT A stepping, GPU stores via a + * snooped mapping may leave stale data in a corresponding CPU + * cacheline, whereas normally such cachelines would get + * invalidated. As a workaround assume that these stores are + * not coherent, which means we'll flush the CPU cache manually + * whenever doing a CPU/GPU sync operation. + */ + level = IS_BROXTON(dev) && INTEL_REVID(dev) < BXT_REVID_B0 ? + I915_CACHE_NONE : I915_CACHE_LLC; break; case I915_CACHING_DISPLAY: level = HAS_WT(dev) ? I915_CACHE_WT : I915_CACHE_NONE; -- 2.1.4 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings 2015-08-14 12:38 ` [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings Imre Deak @ 2015-08-14 13:11 ` Chris Wilson 2015-08-14 13:29 ` Imre Deak 2015-08-14 15:43 ` [PATCH v3 2/2] drm/i915/bxt: don't allow cached GEM mappings on A stepping Imre Deak 2015-08-16 11:45 ` [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings shuang.he 2 siblings, 1 reply; 16+ messages in thread From: Chris Wilson @ 2015-08-14 13:11 UTC (permalink / raw) To: Imre Deak; +Cc: Jani Nikula, intel-gfx, Mika Kuoppala On Fri, Aug 14, 2015 at 03:38:57PM +0300, Imre Deak wrote: > Due to a coherency issue on BXT A steppings we can't guarantee a > coherent view of cached GPU mappings, so fall back to uncached mappings. > Note that this still won't fix cases where userspace expects a coherent > view without synchronizing (via a set domain call). It still makes sense > to limit the kernel's notion of the mapping to be uncached, for example > for relocations to work properly during execbuffer time. Also in case > user space does synchronize the buffer, this will still guarantee that > we'll do the proper clflushing for the buffer. > > v2: > - limit the WA to A steppings, on later stepping this HW issue is fixed This has to report the failure, ENODEV otherwise userspace will be terribly confused (it will try to CPU coherent access assuming it will be fast, when it is better to use alternative paths). -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings 2015-08-14 13:11 ` Chris Wilson @ 2015-08-14 13:29 ` Imre Deak 0 siblings, 0 replies; 16+ messages in thread From: Imre Deak @ 2015-08-14 13:29 UTC (permalink / raw) To: Chris Wilson; +Cc: Jani Nikula, intel-gfx, Mika Kuoppala On pe, 2015-08-14 at 14:11 +0100, Chris Wilson wrote: > On Fri, Aug 14, 2015 at 03:38:57PM +0300, Imre Deak wrote: > > Due to a coherency issue on BXT A steppings we can't guarantee a > > coherent view of cached GPU mappings, so fall back to uncached mappings. > > Note that this still won't fix cases where userspace expects a coherent > > view without synchronizing (via a set domain call). It still makes sense > > to limit the kernel's notion of the mapping to be uncached, for example > > for relocations to work properly during execbuffer time. Also in case > > user space does synchronize the buffer, this will still guarantee that > > we'll do the proper clflushing for the buffer. > > > > v2: > > - limit the WA to A steppings, on later stepping this HW issue is fixed > > This has to report the failure, ENODEV otherwise userspace will be > terribly confused (it will try to CPU coherent access assuming it will > be fast, when it is better to use alternative paths). Ok, I was not sure how existing user space would handle the failure, but if it has the fall-back logic then ENODEV is the better solution. Will change this. > -Chris > _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 2/2] drm/i915/bxt: don't allow cached GEM mappings on A stepping 2015-08-14 12:38 ` [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings Imre Deak 2015-08-14 13:11 ` Chris Wilson @ 2015-08-14 15:43 ` Imre Deak 2015-08-14 16:18 ` Chris Wilson 2015-08-16 11:45 ` [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings shuang.he 2 siblings, 1 reply; 16+ messages in thread From: Imre Deak @ 2015-08-14 15:43 UTC (permalink / raw) To: intel-gfx; +Cc: Jani Nikula, Mika Kuoppala Due to a coherency issue on BXT A steppings we can't guarantee a coherent view of cached (CPU snooped) GPU mappings, so fail such requests. User space is supposed to fall back to uncached mappings in this case. v2: - limit the WA to A steppings, on later stepping this HW issue is fixed v3: - return error instead of trying to work around the issue in kernel, since that could confuse user space (Chris) Testcast: igt/gem_store_dword_batches_loop/cached-mapping Signed-off-by: Imre Deak <imre.deak@intel.com> --- drivers/gpu/drm/i915/i915_gem.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 407b6b3..95fd4ff 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3742,6 +3742,15 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data, level = I915_CACHE_NONE; break; case I915_CACHING_CACHED: + /* + * Due to a HW issue on BXT A stepping, GPU stores via a + * snooped mapping may leave stale data in a corresponding CPU + * cacheline, whereas normally such cachelines would get + * invalidated. + */ + if (IS_BROXTON(dev) && INTEL_REVID(dev) < BXT_REVID_B0) + return -ENODEV; + level = I915_CACHE_LLC; break; case I915_CACHING_DISPLAY: -- 2.1.4 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v3 2/2] drm/i915/bxt: don't allow cached GEM mappings on A stepping 2015-08-14 15:43 ` [PATCH v3 2/2] drm/i915/bxt: don't allow cached GEM mappings on A stepping Imre Deak @ 2015-08-14 16:18 ` Chris Wilson 2015-08-26 7:13 ` Daniel Vetter 0 siblings, 1 reply; 16+ messages in thread From: Chris Wilson @ 2015-08-14 16:18 UTC (permalink / raw) To: Imre Deak; +Cc: Jani Nikula, intel-gfx, Mika Kuoppala On Fri, Aug 14, 2015 at 06:43:30PM +0300, Imre Deak wrote: > Due to a coherency issue on BXT A steppings we can't guarantee a > coherent view of cached (CPU snooped) GPU mappings, so fail such > requests. User space is supposed to fall back to uncached mappings in > this case. > > v2: > - limit the WA to A steppings, on later stepping this HW issue is fixed > v3: > - return error instead of trying to work around the issue in kernel, > since that could confuse user space (Chris) > > Testcast: igt/gem_store_dword_batches_loop/cached-mapping > Signed-off-by: Imre Deak <imre.deak@intel.com> I checked in the two userspaces where I have used snooping that both handle an error correctly, so Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 2/2] drm/i915/bxt: don't allow cached GEM mappings on A stepping 2015-08-14 16:18 ` Chris Wilson @ 2015-08-26 7:13 ` Daniel Vetter 0 siblings, 0 replies; 16+ messages in thread From: Daniel Vetter @ 2015-08-26 7:13 UTC (permalink / raw) To: Chris Wilson, Imre Deak, intel-gfx, Mika Kuoppala, Jani Nikula On Fri, Aug 14, 2015 at 05:18:27PM +0100, Chris Wilson wrote: > On Fri, Aug 14, 2015 at 06:43:30PM +0300, Imre Deak wrote: > > Due to a coherency issue on BXT A steppings we can't guarantee a > > coherent view of cached (CPU snooped) GPU mappings, so fail such > > requests. User space is supposed to fall back to uncached mappings in > > this case. > > > > v2: > > - limit the WA to A steppings, on later stepping this HW issue is fixed > > v3: > > - return error instead of trying to work around the issue in kernel, > > since that could confuse user space (Chris) > > > > Testcast: igt/gem_store_dword_batches_loop/cached-mapping > > Signed-off-by: Imre Deak <imre.deak@intel.com> > > I checked in the two userspaces where I have used snooping that both > handle an error correctly, so > > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Both patches applied, thanks. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings 2015-08-14 12:38 ` [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings Imre Deak 2015-08-14 13:11 ` Chris Wilson 2015-08-14 15:43 ` [PATCH v3 2/2] drm/i915/bxt: don't allow cached GEM mappings on A stepping Imre Deak @ 2015-08-16 11:45 ` shuang.he 2 siblings, 0 replies; 16+ messages in thread From: shuang.he @ 2015-08-16 11:45 UTC (permalink / raw) To: shuang.he, julianx.dumez, christophe.sureau, lei.a.liu, intel-gfx, imre.deak Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com) Task id: 7201 -------------------------------------Summary------------------------------------- Platform Delta drm-intel-nightly Series Applied ILK -1 302/302 301/302 SNB 315/315 315/315 IVB 336/336 336/336 BYT 283/283 283/283 HSW 378/378 378/378 -------------------------------------Detailed------------------------------------- Platform Test drm-intel-nightly Series Applied *ILK igt@kms_flip@flip-vs-dpms-interruptible PASS(1) DMESG_WARN(1) Note: You need to pay more attention to line start with '*' _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue 2015-08-14 12:38 [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue Imre Deak 2015-08-14 12:38 ` [PATCH v2 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno Imre Deak 2015-08-14 12:38 ` [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings Imre Deak @ 2015-08-14 12:49 ` Chris Wilson 2015-08-14 13:26 ` Imre Deak 2 siblings, 1 reply; 16+ messages in thread From: Chris Wilson @ 2015-08-14 12:49 UTC (permalink / raw) To: Imre Deak; +Cc: Jani Nikula, intel-gfx, Mika Kuoppala On Fri, Aug 14, 2015 at 03:38:55PM +0300, Imre Deak wrote: > This is a v2 of [1]. Since v1 the HW team confirmed that there is an > HW issue in A steppings with the GPU/CPU snoop logic, which explains why > we need this workaround. I've been testing this extensively, and I do believe we can use clflush for all gen - it is a measurable improvement over using the heavyweight/locked read of ACTHD, and none of the coherency tests have thrown any warnings (over a period of a couple of months now). -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue 2015-08-14 12:49 ` [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue Chris Wilson @ 2015-08-14 13:26 ` Imre Deak 2015-08-14 13:31 ` Chris Wilson 0 siblings, 1 reply; 16+ messages in thread From: Imre Deak @ 2015-08-14 13:26 UTC (permalink / raw) To: Chris Wilson; +Cc: Jani Nikula, intel-gfx, Mika Kuoppala On pe, 2015-08-14 at 13:49 +0100, Chris Wilson wrote: > On Fri, Aug 14, 2015 at 03:38:55PM +0300, Imre Deak wrote: > > This is a v2 of [1]. Since v1 the HW team confirmed that there is an > > HW issue in A steppings with the GPU/CPU snoop logic, which explains why > > we need this workaround. > > I've been testing this extensively, and I do believe we can use clflush > for all gen - it is a measurable improvement over using the > heavyweight/locked read of ACTHD, and none of the coherency tests have > thrown any warnings (over a period of a couple of months now). Ok. Do you mean only places where there is already the ACTHD w/a, or to extend it also to other platforms? Imo doing this as a follow-up to this patchset would make sense, to keep platform specific changes separate. --Imre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue 2015-08-14 13:26 ` Imre Deak @ 2015-08-14 13:31 ` Chris Wilson 0 siblings, 0 replies; 16+ messages in thread From: Chris Wilson @ 2015-08-14 13:31 UTC (permalink / raw) To: Imre Deak; +Cc: Jani Nikula, intel-gfx, Mika Kuoppala On Fri, Aug 14, 2015 at 04:26:29PM +0300, Imre Deak wrote: > On pe, 2015-08-14 at 13:49 +0100, Chris Wilson wrote: > > On Fri, Aug 14, 2015 at 03:38:55PM +0300, Imre Deak wrote: > > > This is a v2 of [1]. Since v1 the HW team confirmed that there is an > > > HW issue in A steppings with the GPU/CPU snoop logic, which explains why > > > we need this workaround. > > > > I've been testing this extensively, and I do believe we can use clflush > > for all gen - it is a measurable improvement over using the > > heavyweight/locked read of ACTHD, and none of the coherency tests have > > thrown any warnings (over a period of a couple of months now). > > Ok. Do you mean only places where there is already the ACTHD w/a, or to > extend it also to other platforms? I mean replace the current ACTHD w/a. > Imo doing this as a follow-up to this patchset would make sense, to keep > platform specific changes separate. Agreed. Get the w/a in place, then do the conversion so that we have a safe fallback. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2015-08-26 7:13 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-08-14 12:38 [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue Imre Deak 2015-08-14 12:38 ` [PATCH v2 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno Imre Deak 2015-08-14 13:12 ` Chris Wilson 2015-08-14 13:31 ` Imre Deak 2015-08-14 15:35 ` [PATCH v3 " Imre Deak 2015-08-14 16:21 ` Chris Wilson 2015-08-14 12:38 ` [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings Imre Deak 2015-08-14 13:11 ` Chris Wilson 2015-08-14 13:29 ` Imre Deak 2015-08-14 15:43 ` [PATCH v3 2/2] drm/i915/bxt: don't allow cached GEM mappings on A stepping Imre Deak 2015-08-14 16:18 ` Chris Wilson 2015-08-26 7:13 ` Daniel Vetter 2015-08-16 11:45 ` [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings shuang.he 2015-08-14 12:49 ` [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue Chris Wilson 2015-08-14 13:26 ` Imre Deak 2015-08-14 13:31 ` Chris Wilson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox