* [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue
@ 2015-08-14 12:38 Imre Deak
2015-08-14 12:38 ` [PATCH v2 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno Imre Deak
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Imre Deak @ 2015-08-14 12:38 UTC (permalink / raw)
To: intel-gfx; +Cc: Jani Nikula, Mika Kuoppala
This is a v2 of [1]. Since v1 the HW team confirmed that there is an
HW issue in A steppings with the GPU/CPU snoop logic, which explains why
we need this workaround.
In v2 I fixed a typo and limited the workaround to A steppings, since
in later steppings this problem is fixed.
[1]
http://lists.freedesktop.org/archives/intel-gfx/2015-June/068218.html
Imre Deak (2):
drm/i915/bxt: work around HW coherency issue when accessing GPU seqno
drm/i915/bxt: work around HW coherency issue for cached GEM mappings
drivers/gpu/drm/i915/i915_gem.c | 11 ++++++++++-
drivers/gpu/drm/i915/intel_lrc.c | 20 ++++++++++++++++++++
drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++++++
3 files changed, 37 insertions(+), 1 deletion(-)
--
2.1.4
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno
2015-08-14 12:38 [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue Imre Deak
@ 2015-08-14 12:38 ` Imre Deak
2015-08-14 13:12 ` Chris Wilson
2015-08-14 15:35 ` [PATCH v3 " Imre Deak
2015-08-14 12:38 ` [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings Imre Deak
2015-08-14 12:49 ` [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue Chris Wilson
2 siblings, 2 replies; 16+ messages in thread
From: Imre Deak @ 2015-08-14 12:38 UTC (permalink / raw)
To: intel-gfx; +Cc: Jani Nikula, Mika Kuoppala
By running igt/store_dword_loop_render on BXT we can hit a coherency
problem where the seqno written at GPU command completion time is not
seen by the CPU. This results in __i915_wait_request seeing the stale
seqno and not completing the request (not considering the lost
interrupt/GPU reset mechanism). I also verified that this isn't a case
of a lost interrupt, or that the command didn't complete somehow: when
the coherency issue occured I read the seqno via an uncached GTT mapping
too. While the cached version of the seqno still showed the stale value
the one read via the uncached mapping was the correct one.
Work around this issue by clflushing the corresponding CPU cacheline
following any store of the seqno and preceding any reading of it. When
reading it do this only when the caller expects a coherent view.
v2:
- fix using the proper logical && instead of a bitwise & (Jani, Mika)
- limit the workaround to A stepping, on later steppings this HW issue
is fixed
Testcase: igt/store_dword_loop_render
Signed-off-by: Imre Deak <imre.deak@intel.com>
---
drivers/gpu/drm/i915/intel_lrc.c | 20 ++++++++++++++++++++
drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++++++
2 files changed, 27 insertions(+)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 138964a..46f2be0 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1691,12 +1691,32 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
{
+
+ /*
+ * On BXT A steppings there is a HW coherency issue whereby the
+ * MI_STORE_DATA_IMM storing the completed request's seqno
+ * occasionally doesn't invalidate the CPU cache. Work around this by
+ * clflushing the corresponding cacheline whenever the caller wants
+ * the coherency to be guaranteed. Note that this cacheline is known
+ * to be clean at this point, since we only write it in
+ * gen8_set_seqno(), where we also do a clflush after the write. So
+ * this clflush in practice becomes an invalidate operation.
+ */
+
+ if (IS_BROXTON(ring->dev) && INTEL_REVID(ring->dev) < BXT_REVID_B0 &&
+ !lazy_coherency)
+ intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
+
return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
}
static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
{
intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
+
+ /* See gen8_get_seqno() explaining the reason for the clflush. */
+ if (IS_BROXTON(ring->dev) && INTEL_REVID(ring->dev) < BXT_REVID_B0)
+ intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
}
static int gen8_emit_request(struct drm_i915_gem_request *request)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 2e85fda..95b0b4b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -377,6 +377,13 @@ intel_ring_sync_index(struct intel_engine_cs *ring,
return idx;
}
+static inline void
+intel_flush_status_page(struct intel_engine_cs *ring, int reg)
+{
+ drm_clflush_virt_range(&ring->status_page.page_addr[reg],
+ sizeof(uint32_t));
+}
+
static inline u32
intel_read_status_page(struct intel_engine_cs *ring,
int reg)
--
2.1.4
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings
2015-08-14 12:38 [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue Imre Deak
2015-08-14 12:38 ` [PATCH v2 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno Imre Deak
@ 2015-08-14 12:38 ` Imre Deak
2015-08-14 13:11 ` Chris Wilson
` (2 more replies)
2015-08-14 12:49 ` [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue Chris Wilson
2 siblings, 3 replies; 16+ messages in thread
From: Imre Deak @ 2015-08-14 12:38 UTC (permalink / raw)
To: intel-gfx; +Cc: Jani Nikula, Mika Kuoppala
Due to a coherency issue on BXT A steppings we can't guarantee a
coherent view of cached GPU mappings, so fall back to uncached mappings.
Note that this still won't fix cases where userspace expects a coherent
view without synchronizing (via a set domain call). It still makes sense
to limit the kernel's notion of the mapping to be uncached, for example
for relocations to work properly during execbuffer time. Also in case
user space does synchronize the buffer, this will still guarantee that
we'll do the proper clflushing for the buffer.
v2:
- limit the WA to A steppings, on later stepping this HW issue is fixed
Testcast: igt/gem_store_dword_batches_loop/cached-mapping
Signed-off-by: Imre Deak <imre.deak@intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 407b6b3..987ffa8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3742,7 +3742,16 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
level = I915_CACHE_NONE;
break;
case I915_CACHING_CACHED:
- level = I915_CACHE_LLC;
+ /*
+ * Due to a HW issue on BXT A stepping, GPU stores via a
+ * snooped mapping may leave stale data in a corresponding CPU
+ * cacheline, whereas normally such cachelines would get
+ * invalidated. As a workaround assume that these stores are
+ * not coherent, which means we'll flush the CPU cache manually
+ * whenever doing a CPU/GPU sync operation.
+ */
+ level = IS_BROXTON(dev) && INTEL_REVID(dev) < BXT_REVID_B0 ?
+ I915_CACHE_NONE : I915_CACHE_LLC;
break;
case I915_CACHING_DISPLAY:
level = HAS_WT(dev) ? I915_CACHE_WT : I915_CACHE_NONE;
--
2.1.4
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue
2015-08-14 12:38 [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue Imre Deak
2015-08-14 12:38 ` [PATCH v2 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno Imre Deak
2015-08-14 12:38 ` [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings Imre Deak
@ 2015-08-14 12:49 ` Chris Wilson
2015-08-14 13:26 ` Imre Deak
2 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2015-08-14 12:49 UTC (permalink / raw)
To: Imre Deak; +Cc: Jani Nikula, intel-gfx, Mika Kuoppala
On Fri, Aug 14, 2015 at 03:38:55PM +0300, Imre Deak wrote:
> This is a v2 of [1]. Since v1 the HW team confirmed that there is an
> HW issue in A steppings with the GPU/CPU snoop logic, which explains why
> we need this workaround.
I've been testing this extensively, and I do believe we can use clflush
for all gen - it is a measurable improvement over using the
heavyweight/locked read of ACTHD, and none of the coherency tests have
thrown any warnings (over a period of a couple of months now).
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings
2015-08-14 12:38 ` [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings Imre Deak
@ 2015-08-14 13:11 ` Chris Wilson
2015-08-14 13:29 ` Imre Deak
2015-08-14 15:43 ` [PATCH v3 2/2] drm/i915/bxt: don't allow cached GEM mappings on A stepping Imre Deak
2015-08-16 11:45 ` [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings shuang.he
2 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2015-08-14 13:11 UTC (permalink / raw)
To: Imre Deak; +Cc: Jani Nikula, intel-gfx, Mika Kuoppala
On Fri, Aug 14, 2015 at 03:38:57PM +0300, Imre Deak wrote:
> Due to a coherency issue on BXT A steppings we can't guarantee a
> coherent view of cached GPU mappings, so fall back to uncached mappings.
> Note that this still won't fix cases where userspace expects a coherent
> view without synchronizing (via a set domain call). It still makes sense
> to limit the kernel's notion of the mapping to be uncached, for example
> for relocations to work properly during execbuffer time. Also in case
> user space does synchronize the buffer, this will still guarantee that
> we'll do the proper clflushing for the buffer.
>
> v2:
> - limit the WA to A steppings, on later stepping this HW issue is fixed
This has to report the failure, ENODEV otherwise userspace will be
terribly confused (it will try to CPU coherent access assuming it will
be fast, when it is better to use alternative paths).
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno
2015-08-14 12:38 ` [PATCH v2 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno Imre Deak
@ 2015-08-14 13:12 ` Chris Wilson
2015-08-14 13:31 ` Imre Deak
2015-08-14 15:35 ` [PATCH v3 " Imre Deak
1 sibling, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2015-08-14 13:12 UTC (permalink / raw)
To: Imre Deak; +Cc: Jani Nikula, intel-gfx, Mika Kuoppala
On Fri, Aug 14, 2015 at 03:38:56PM +0300, Imre Deak wrote:
> By running igt/store_dword_loop_render on BXT we can hit a coherency
> problem where the seqno written at GPU command completion time is not
> seen by the CPU. This results in __i915_wait_request seeing the stale
> seqno and not completing the request (not considering the lost
> interrupt/GPU reset mechanism). I also verified that this isn't a case
> of a lost interrupt, or that the command didn't complete somehow: when
> the coherency issue occured I read the seqno via an uncached GTT mapping
> too. While the cached version of the seqno still showed the stale value
> the one read via the uncached mapping was the correct one.
>
> Work around this issue by clflushing the corresponding CPU cacheline
> following any store of the seqno and preceding any reading of it. When
> reading it do this only when the caller expects a coherent view.
>
> v2:
> - fix using the proper logical && instead of a bitwise & (Jani, Mika)
> - limit the workaround to A stepping, on later steppings this HW issue
> is fixed
We have vfuncs in order to avoid the pointer dance (and boy is it a
pretty and quite convoluted dance).
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue
2015-08-14 12:49 ` [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue Chris Wilson
@ 2015-08-14 13:26 ` Imre Deak
2015-08-14 13:31 ` Chris Wilson
0 siblings, 1 reply; 16+ messages in thread
From: Imre Deak @ 2015-08-14 13:26 UTC (permalink / raw)
To: Chris Wilson; +Cc: Jani Nikula, intel-gfx, Mika Kuoppala
On pe, 2015-08-14 at 13:49 +0100, Chris Wilson wrote:
> On Fri, Aug 14, 2015 at 03:38:55PM +0300, Imre Deak wrote:
> > This is a v2 of [1]. Since v1 the HW team confirmed that there is an
> > HW issue in A steppings with the GPU/CPU snoop logic, which explains why
> > we need this workaround.
>
> I've been testing this extensively, and I do believe we can use clflush
> for all gen - it is a measurable improvement over using the
> heavyweight/locked read of ACTHD, and none of the coherency tests have
> thrown any warnings (over a period of a couple of months now).
Ok. Do you mean only places where there is already the ACTHD w/a, or to
extend it also to other platforms?
Imo doing this as a follow-up to this patchset would make sense, to keep
platform specific changes separate.
--Imre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings
2015-08-14 13:11 ` Chris Wilson
@ 2015-08-14 13:29 ` Imre Deak
0 siblings, 0 replies; 16+ messages in thread
From: Imre Deak @ 2015-08-14 13:29 UTC (permalink / raw)
To: Chris Wilson; +Cc: Jani Nikula, intel-gfx, Mika Kuoppala
On pe, 2015-08-14 at 14:11 +0100, Chris Wilson wrote:
> On Fri, Aug 14, 2015 at 03:38:57PM +0300, Imre Deak wrote:
> > Due to a coherency issue on BXT A steppings we can't guarantee a
> > coherent view of cached GPU mappings, so fall back to uncached mappings.
> > Note that this still won't fix cases where userspace expects a coherent
> > view without synchronizing (via a set domain call). It still makes sense
> > to limit the kernel's notion of the mapping to be uncached, for example
> > for relocations to work properly during execbuffer time. Also in case
> > user space does synchronize the buffer, this will still guarantee that
> > we'll do the proper clflushing for the buffer.
> >
> > v2:
> > - limit the WA to A steppings, on later stepping this HW issue is fixed
>
> This has to report the failure, ENODEV otherwise userspace will be
> terribly confused (it will try to CPU coherent access assuming it will
> be fast, when it is better to use alternative paths).
Ok, I was not sure how existing user space would handle the failure, but
if it has the fall-back logic then ENODEV is the better solution. Will
change this.
> -Chris
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue
2015-08-14 13:26 ` Imre Deak
@ 2015-08-14 13:31 ` Chris Wilson
0 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2015-08-14 13:31 UTC (permalink / raw)
To: Imre Deak; +Cc: Jani Nikula, intel-gfx, Mika Kuoppala
On Fri, Aug 14, 2015 at 04:26:29PM +0300, Imre Deak wrote:
> On pe, 2015-08-14 at 13:49 +0100, Chris Wilson wrote:
> > On Fri, Aug 14, 2015 at 03:38:55PM +0300, Imre Deak wrote:
> > > This is a v2 of [1]. Since v1 the HW team confirmed that there is an
> > > HW issue in A steppings with the GPU/CPU snoop logic, which explains why
> > > we need this workaround.
> >
> > I've been testing this extensively, and I do believe we can use clflush
> > for all gen - it is a measurable improvement over using the
> > heavyweight/locked read of ACTHD, and none of the coherency tests have
> > thrown any warnings (over a period of a couple of months now).
>
> Ok. Do you mean only places where there is already the ACTHD w/a, or to
> extend it also to other platforms?
I mean replace the current ACTHD w/a.
> Imo doing this as a follow-up to this patchset would make sense, to keep
> platform specific changes separate.
Agreed. Get the w/a in place, then do the conversion so that we have a
safe fallback.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno
2015-08-14 13:12 ` Chris Wilson
@ 2015-08-14 13:31 ` Imre Deak
0 siblings, 0 replies; 16+ messages in thread
From: Imre Deak @ 2015-08-14 13:31 UTC (permalink / raw)
To: Chris Wilson; +Cc: Jani Nikula, intel-gfx, Mika Kuoppala
On pe, 2015-08-14 at 14:12 +0100, Chris Wilson wrote:
> On Fri, Aug 14, 2015 at 03:38:56PM +0300, Imre Deak wrote:
> > By running igt/store_dword_loop_render on BXT we can hit a coherency
> > problem where the seqno written at GPU command completion time is not
> > seen by the CPU. This results in __i915_wait_request seeing the stale
> > seqno and not completing the request (not considering the lost
> > interrupt/GPU reset mechanism). I also verified that this isn't a case
> > of a lost interrupt, or that the command didn't complete somehow: when
> > the coherency issue occured I read the seqno via an uncached GTT mapping
> > too. While the cached version of the seqno still showed the stale value
> > the one read via the uncached mapping was the correct one.
> >
> > Work around this issue by clflushing the corresponding CPU cacheline
> > following any store of the seqno and preceding any reading of it. When
> > reading it do this only when the caller expects a coherent view.
> >
> > v2:
> > - fix using the proper logical && instead of a bitwise & (Jani, Mika)
> > - limit the workaround to A stepping, on later steppings this HW issue
> > is fixed
>
> We have vfuncs in order to avoid the pointer dance (and boy is it a
> pretty and quite convoluted dance).
Ok, I'll add new get_seqno/set_seqno vfuncs.
> -Chris
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno
2015-08-14 12:38 ` [PATCH v2 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno Imre Deak
2015-08-14 13:12 ` Chris Wilson
@ 2015-08-14 15:35 ` Imre Deak
2015-08-14 16:21 ` Chris Wilson
1 sibling, 1 reply; 16+ messages in thread
From: Imre Deak @ 2015-08-14 15:35 UTC (permalink / raw)
To: intel-gfx; +Cc: Jani Nikula, Mika Kuoppala
By running igt/store_dword_loop_render on BXT we can hit a coherency
problem where the seqno written at GPU command completion time is not
seen by the CPU. This results in __i915_wait_request seeing the stale
seqno and not completing the request (not considering the lost
interrupt/GPU reset mechanism). I also verified that this isn't a case
of a lost interrupt, or that the command didn't complete somehow: when
the coherency issue occured I read the seqno via an uncached GTT mapping
too. While the cached version of the seqno still showed the stale value
the one read via the uncached mapping was the correct one.
Work around this issue by clflushing the corresponding CPU cacheline
following any store of the seqno and preceding any reading of it. When
reading it do this only when the caller expects a coherent view.
v2:
- fix using the proper logical && instead of a bitwise & (Jani, Mika)
- limit the workaround to A stepping, on later steppings this HW issue
is fixed
v3:
- use a separate get_seqno/set_seqno vfunc (Chris)
Testcase: igt/store_dword_loop_render
Signed-off-by: Imre Deak <imre.deak@intel.com>
---
drivers/gpu/drm/i915/intel_lrc.c | 64 ++++++++++++++++++++++++++++-----
drivers/gpu/drm/i915/intel_ringbuffer.h | 7 ++++
2 files changed, 63 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 138964a..1269d57 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1699,6 +1699,34 @@ static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
}
+static u32 bxt_a_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
+{
+
+ /*
+ * On BXT A steppings there is a HW coherency issue whereby the
+ * MI_STORE_DATA_IMM storing the completed request's seqno
+ * occasionally doesn't invalidate the CPU cache. Work around this by
+ * clflushing the corresponding cacheline whenever the caller wants
+ * the coherency to be guaranteed. Note that this cacheline is known
+ * to be clean at this point, since we only write it in
+ * bxt_a_set_seqno(), where we also do a clflush after the write. So
+ * this clflush in practice becomes an invalidate operation.
+ */
+
+ if (!lazy_coherency)
+ intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
+
+ return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
+}
+
+static void bxt_a_set_seqno(struct intel_engine_cs *ring, u32 seqno)
+{
+ intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
+
+ /* See bxt_a_get_seqno() explaining the reason for the clflush. */
+ intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
+}
+
static int gen8_emit_request(struct drm_i915_gem_request *request)
{
struct intel_ringbuffer *ringbuf = request->ringbuf;
@@ -1868,8 +1896,13 @@ static int logical_render_ring_init(struct drm_device *dev)
ring->init_hw = gen8_init_render_ring;
ring->init_context = gen8_init_rcs_context;
ring->cleanup = intel_fini_pipe_control;
- ring->get_seqno = gen8_get_seqno;
- ring->set_seqno = gen8_set_seqno;
+ if (IS_BROXTON(dev) && INTEL_REVID(dev) < BXT_REVID_B0) {
+ ring->get_seqno = bxt_a_get_seqno;
+ ring->set_seqno = bxt_a_set_seqno;
+ } else {
+ ring->get_seqno = gen8_get_seqno;
+ ring->set_seqno = gen8_set_seqno;
+ }
ring->emit_request = gen8_emit_request;
ring->emit_flush = gen8_emit_flush_render;
ring->irq_get = gen8_logical_ring_get_irq;
@@ -1915,8 +1948,13 @@ static int logical_bsd_ring_init(struct drm_device *dev)
GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
ring->init_hw = gen8_init_common_ring;
- ring->get_seqno = gen8_get_seqno;
- ring->set_seqno = gen8_set_seqno;
+ if (IS_BROXTON(dev) && INTEL_REVID(dev) < BXT_REVID_B0) {
+ ring->get_seqno = bxt_a_get_seqno;
+ ring->set_seqno = bxt_a_set_seqno;
+ } else {
+ ring->get_seqno = gen8_get_seqno;
+ ring->set_seqno = gen8_set_seqno;
+ }
ring->emit_request = gen8_emit_request;
ring->emit_flush = gen8_emit_flush;
ring->irq_get = gen8_logical_ring_get_irq;
@@ -1965,8 +2003,13 @@ static int logical_blt_ring_init(struct drm_device *dev)
GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
ring->init_hw = gen8_init_common_ring;
- ring->get_seqno = gen8_get_seqno;
- ring->set_seqno = gen8_set_seqno;
+ if (IS_BROXTON(dev) && INTEL_REVID(dev) < BXT_REVID_B0) {
+ ring->get_seqno = bxt_a_get_seqno;
+ ring->set_seqno = bxt_a_set_seqno;
+ } else {
+ ring->get_seqno = gen8_get_seqno;
+ ring->set_seqno = gen8_set_seqno;
+ }
ring->emit_request = gen8_emit_request;
ring->emit_flush = gen8_emit_flush;
ring->irq_get = gen8_logical_ring_get_irq;
@@ -1990,8 +2033,13 @@ static int logical_vebox_ring_init(struct drm_device *dev)
GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
ring->init_hw = gen8_init_common_ring;
- ring->get_seqno = gen8_get_seqno;
- ring->set_seqno = gen8_set_seqno;
+ if (IS_BROXTON(dev) && INTEL_REVID(dev) < BXT_REVID_B0) {
+ ring->get_seqno = bxt_a_get_seqno;
+ ring->set_seqno = bxt_a_set_seqno;
+ } else {
+ ring->get_seqno = gen8_get_seqno;
+ ring->set_seqno = gen8_set_seqno;
+ }
ring->emit_request = gen8_emit_request;
ring->emit_flush = gen8_emit_flush;
ring->irq_get = gen8_logical_ring_get_irq;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 2e85fda..95b0b4b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -377,6 +377,13 @@ intel_ring_sync_index(struct intel_engine_cs *ring,
return idx;
}
+static inline void
+intel_flush_status_page(struct intel_engine_cs *ring, int reg)
+{
+ drm_clflush_virt_range(&ring->status_page.page_addr[reg],
+ sizeof(uint32_t));
+}
+
static inline u32
intel_read_status_page(struct intel_engine_cs *ring,
int reg)
--
2.1.4
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v3 2/2] drm/i915/bxt: don't allow cached GEM mappings on A stepping
2015-08-14 12:38 ` [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings Imre Deak
2015-08-14 13:11 ` Chris Wilson
@ 2015-08-14 15:43 ` Imre Deak
2015-08-14 16:18 ` Chris Wilson
2015-08-16 11:45 ` [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings shuang.he
2 siblings, 1 reply; 16+ messages in thread
From: Imre Deak @ 2015-08-14 15:43 UTC (permalink / raw)
To: intel-gfx; +Cc: Jani Nikula, Mika Kuoppala
Due to a coherency issue on BXT A steppings we can't guarantee a
coherent view of cached (CPU snooped) GPU mappings, so fail such
requests. User space is supposed to fall back to uncached mappings in
this case.
v2:
- limit the WA to A steppings, on later stepping this HW issue is fixed
v3:
- return error instead of trying to work around the issue in kernel,
since that could confuse user space (Chris)
Testcast: igt/gem_store_dword_batches_loop/cached-mapping
Signed-off-by: Imre Deak <imre.deak@intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 407b6b3..95fd4ff 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3742,6 +3742,15 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
level = I915_CACHE_NONE;
break;
case I915_CACHING_CACHED:
+ /*
+ * Due to a HW issue on BXT A stepping, GPU stores via a
+ * snooped mapping may leave stale data in a corresponding CPU
+ * cacheline, whereas normally such cachelines would get
+ * invalidated.
+ */
+ if (IS_BROXTON(dev) && INTEL_REVID(dev) < BXT_REVID_B0)
+ return -ENODEV;
+
level = I915_CACHE_LLC;
break;
case I915_CACHING_DISPLAY:
--
2.1.4
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v3 2/2] drm/i915/bxt: don't allow cached GEM mappings on A stepping
2015-08-14 15:43 ` [PATCH v3 2/2] drm/i915/bxt: don't allow cached GEM mappings on A stepping Imre Deak
@ 2015-08-14 16:18 ` Chris Wilson
2015-08-26 7:13 ` Daniel Vetter
0 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2015-08-14 16:18 UTC (permalink / raw)
To: Imre Deak; +Cc: Jani Nikula, intel-gfx, Mika Kuoppala
On Fri, Aug 14, 2015 at 06:43:30PM +0300, Imre Deak wrote:
> Due to a coherency issue on BXT A steppings we can't guarantee a
> coherent view of cached (CPU snooped) GPU mappings, so fail such
> requests. User space is supposed to fall back to uncached mappings in
> this case.
>
> v2:
> - limit the WA to A steppings, on later stepping this HW issue is fixed
> v3:
> - return error instead of trying to work around the issue in kernel,
> since that could confuse user space (Chris)
>
> Testcast: igt/gem_store_dword_batches_loop/cached-mapping
> Signed-off-by: Imre Deak <imre.deak@intel.com>
I checked in the two userspaces where I have used snooping that both
handle an error correctly, so
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno
2015-08-14 15:35 ` [PATCH v3 " Imre Deak
@ 2015-08-14 16:21 ` Chris Wilson
0 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2015-08-14 16:21 UTC (permalink / raw)
To: Imre Deak; +Cc: Jani Nikula, intel-gfx, Mika Kuoppala
On Fri, Aug 14, 2015 at 06:35:27PM +0300, Imre Deak wrote:
> By running igt/store_dword_loop_render on BXT we can hit a coherency
> problem where the seqno written at GPU command completion time is not
> seen by the CPU. This results in __i915_wait_request seeing the stale
> seqno and not completing the request (not considering the lost
> interrupt/GPU reset mechanism). I also verified that this isn't a case
> of a lost interrupt, or that the command didn't complete somehow: when
> the coherency issue occured I read the seqno via an uncached GTT mapping
> too. While the cached version of the seqno still showed the stale value
> the one read via the uncached mapping was the correct one.
>
> Work around this issue by clflushing the corresponding CPU cacheline
> following any store of the seqno and preceding any reading of it. When
> reading it do this only when the caller expects a coherent view.
>
> v2:
> - fix using the proper logical && instead of a bitwise & (Jani, Mika)
> - limit the workaround to A stepping, on later steppings this HW issue
> is fixed
> v3:
> - use a separate get_seqno/set_seqno vfunc (Chris)
>
> Testcase: igt/store_dword_loop_render
> Signed-off-by: Imre Deak <imre.deak@intel.com>
I'm not going to quibble about the whitespace or the code duplication
that isn't your fault...
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings
2015-08-14 12:38 ` [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings Imre Deak
2015-08-14 13:11 ` Chris Wilson
2015-08-14 15:43 ` [PATCH v3 2/2] drm/i915/bxt: don't allow cached GEM mappings on A stepping Imre Deak
@ 2015-08-16 11:45 ` shuang.he
2 siblings, 0 replies; 16+ messages in thread
From: shuang.he @ 2015-08-16 11:45 UTC (permalink / raw)
To: shuang.he, julianx.dumez, christophe.sureau, lei.a.liu, intel-gfx,
imre.deak
Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 7201
-------------------------------------Summary-------------------------------------
Platform Delta drm-intel-nightly Series Applied
ILK -1 302/302 301/302
SNB 315/315 315/315
IVB 336/336 336/336
BYT 283/283 283/283
HSW 378/378 378/378
-------------------------------------Detailed-------------------------------------
Platform Test drm-intel-nightly Series Applied
*ILK igt@kms_flip@flip-vs-dpms-interruptible PASS(1) DMESG_WARN(1)
Note: You need to pay more attention to line start with '*'
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 2/2] drm/i915/bxt: don't allow cached GEM mappings on A stepping
2015-08-14 16:18 ` Chris Wilson
@ 2015-08-26 7:13 ` Daniel Vetter
0 siblings, 0 replies; 16+ messages in thread
From: Daniel Vetter @ 2015-08-26 7:13 UTC (permalink / raw)
To: Chris Wilson, Imre Deak, intel-gfx, Mika Kuoppala, Jani Nikula
On Fri, Aug 14, 2015 at 05:18:27PM +0100, Chris Wilson wrote:
> On Fri, Aug 14, 2015 at 06:43:30PM +0300, Imre Deak wrote:
> > Due to a coherency issue on BXT A steppings we can't guarantee a
> > coherent view of cached (CPU snooped) GPU mappings, so fail such
> > requests. User space is supposed to fall back to uncached mappings in
> > this case.
> >
> > v2:
> > - limit the WA to A steppings, on later stepping this HW issue is fixed
> > v3:
> > - return error instead of trying to work around the issue in kernel,
> > since that could confuse user space (Chris)
> >
> > Testcast: igt/gem_store_dword_batches_loop/cached-mapping
> > Signed-off-by: Imre Deak <imre.deak@intel.com>
>
> I checked in the two userspaces where I have used snooping that both
> handle an error correctly, so
>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Both patches applied, thanks.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2015-08-26 7:13 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-14 12:38 [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue Imre Deak
2015-08-14 12:38 ` [PATCH v2 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno Imre Deak
2015-08-14 13:12 ` Chris Wilson
2015-08-14 13:31 ` Imre Deak
2015-08-14 15:35 ` [PATCH v3 " Imre Deak
2015-08-14 16:21 ` Chris Wilson
2015-08-14 12:38 ` [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings Imre Deak
2015-08-14 13:11 ` Chris Wilson
2015-08-14 13:29 ` Imre Deak
2015-08-14 15:43 ` [PATCH v3 2/2] drm/i915/bxt: don't allow cached GEM mappings on A stepping Imre Deak
2015-08-14 16:18 ` Chris Wilson
2015-08-26 7:13 ` Daniel Vetter
2015-08-16 11:45 ` [PATCH v2 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings shuang.he
2015-08-14 12:49 ` [PATCH v2 0/2] drm/i915/bxt: work around HW coherency issue Chris Wilson
2015-08-14 13:26 ` Imre Deak
2015-08-14 13:31 ` Chris Wilson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox