[PATCH 1/2] drm/i915: mark explicitly the execlist context object as cache coherent

public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed

* [PATCH 1/2] drm/i915: mark explicitly the execlist context object as cache coherent
@ 2015-09-15 18:30 Imre Deak
  2015-09-15 18:30 ` [PATCH 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem Imre Deak
  0 siblings, 1 reply; 5+ messages in thread
From: Imre Deak @ 2015-09-15 18:30 UTC (permalink / raw)
  To: intel-gfx

The context object on all execlist platforms is mapped to the GPU with a
CPU-GPU cache coherent mapping. This matched the default cache level on
LLC platforms, but not on CHV. Set the cache level explicitly to fix
this up. On BXT A stepping the coherency is not guaranteed due to a HW
issue, to work around this we treat the mapping as not coherent.

Atm this change is only for consistency, it doesn't affect anything in
practice, but the following patch will depend on it to do a CLFLUSH
only on non-coherent platforms.

Signed-off-by: Imre Deak <imre.deak@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index fe06accb0..3f18ea1 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2465,6 +2465,16 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
 		return -ENOMEM;
 	}
 
+	if (!(IS_BROXTON(dev) && INTEL_REVID(dev) <= BXT_REVID_B0)) {
+		ret = i915_gem_object_set_cache_level(ctx_obj,
+						      I915_CACHE_LLC);
+		if (ret) {
+			DRM_ERROR("set cache level for context failed: %d\n",
+				  ret);
+			goto error_deref_obj;
+		}
+	}
+
 	ringbuf = intel_engine_create_ringbuffer(ring, 4 * PAGE_SIZE);
 	if (IS_ERR(ringbuf)) {
 		ret = PTR_ERR(ringbuf);
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem
  2015-09-15 18:30 [PATCH 1/2] drm/i915: mark explicitly the execlist context object as cache coherent Imre Deak
@ 2015-09-15 18:30 ` Imre Deak
  2015-09-16  8:17   ` Chris Wilson
  2015-09-17 16:16   ` Imre Deak
  0 siblings, 2 replies; 5+ messages in thread
From: Imre Deak @ 2015-09-15 18:30 UTC (permalink / raw)
  To: intel-gfx

The execlist context object is mapped with a CPU/GPU coherent mapping
everywhere, but on BXT A stepping due to a HW issue the coherency is not
guaranteed. To work around this flush the CPU cache after any change
from the CPU to the context object. Note that this also includes any
changes done by the VM core as opposed to the driver, when
reading from backing store/bzeroing the pages.

I noticed this problem via a GPU hang, where IPEHR pointed to an invalid
opcode value. I couldn't find this value on the ring but looking at the
contents of the active context object it turned out to be a parameter
dword of a bigger command there. The original command opcode itself
was zeroed out, based on the above I assume due to a CPU writeback of
the corresponding cacheline. When restoring the context the GPU would
jump over the zeroed out opcode and hang when trying to execute the
above parameter dword.

I could easily reproduce this by running igt/gem_render_copy_redux and
gem_tiled_blits/basic in parallel, but I guess it could be triggered by
anything involving frequent switches between two separate contexts. With
this workaround I couldn't reproduce the problem.

Note that I also considered using set_pages_array_uc/wc on the context
object but this wouldn't work with kmap_atomic which always returns a WB
mapping, at least on HIGHMEM. The alternative would be keeping a UC/WC
kernel mapping around whenever the context object is pinned, but this
would be a bigger change. Since I'm not sure if there would be any
benefit in using set_pages_array, I chose the simpler clflush method.

Signed-off-by: Imre Deak <imre.deak@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  2 ++
 drivers/gpu/drm/i915/i915_gem.c  |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c | 18 ++++++++++++++++++
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4ea3e7b..ca45269 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3103,6 +3103,8 @@ i915_gem_object_ggtt_unpin(struct drm_i915_gem_object *obj)
 	i915_gem_object_ggtt_unpin_view(obj, &i915_ggtt_view_normal);
 }
 
+bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj);
+
 /* i915_gem_fence.c */
 int __must_check i915_gem_object_get_fence(struct drm_i915_gem_object *obj);
 int __must_check i915_gem_object_put_fence(struct drm_i915_gem_object *obj);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cb0df7e..4543124 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -53,7 +53,7 @@ static bool cpu_cache_is_coherent(struct drm_device *dev,
 	return HAS_LLC(dev) || level != I915_CACHE_NONE;
 }
 
-static bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
+bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
 {
 	if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
 		return true;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3f18ea1..de866be 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -362,6 +362,8 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
 	struct drm_i915_gem_object *rb_obj = rq->ringbuf->obj;
 	struct page *page;
 	uint32_t *reg_state;
+	void *clflush_start;
+	void *clflush_end;
 
 	BUG_ON(!ctx_obj);
 	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj));
@@ -373,6 +375,9 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
 	reg_state[CTX_RING_TAIL+1] = rq->tail;
 	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(rb_obj);
 
+	clflush_start = &reg_state[CTX_RING_TAIL + 1];
+	clflush_end = &reg_state[CTX_RING_BUFFER_START + 2];
+
 	if (ppgtt && !USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
 		/* True 32b PPGTT with dynamic page allocation: update PDP
 		 * registers and point the unallocated PDPs to scratch page.
@@ -383,8 +388,14 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
 		ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
 		ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
 		ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
+
+		clflush_end = &reg_state[CTX_PDP0_LDW + 2];
 	}
 
+	if (cpu_write_needs_clflush(ctx_obj))
+		drm_clflush_virt_range(clflush_start,
+				       clflush_end - clflush_start);
+
 	kunmap_atomic(reg_state);
 
 	return 0;
@@ -1036,6 +1047,13 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 	if (ret)
 		return ret;
 
+	/*
+	 * For a non-coherent mapping we need to flush any related CPU cache
+	 * lines, otherwise the writeback of these would corrupt the data
+	 * written by the GPU.
+	 */
+	i915_gem_clflush_object(ctx_obj, false);
+
 	ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
 	if (ret)
 		goto unpin_ctx_obj;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem
  2015-09-15 18:30 ` [PATCH 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem Imre Deak
@ 2015-09-16  8:17   ` Chris Wilson
  2015-09-16 12:43     ` Imre Deak
  2015-09-17 16:16   ` Imre Deak
  1 sibling, 1 reply; 5+ messages in thread
From: Chris Wilson @ 2015-09-16  8:17 UTC (permalink / raw)
  To: Imre Deak; +Cc: intel-gfx

On Tue, Sep 15, 2015 at 09:30:20PM +0300, Imre Deak wrote:
> The execlist context object is mapped with a CPU/GPU coherent mapping
> everywhere, but on BXT A stepping due to a HW issue the coherency is not
> guaranteed. To work around this flush the CPU cache after any change
> from the CPU to the context object. Note that this also includes any
> changes done by the VM core as opposed to the driver, when
> reading from backing store/bzeroing the pages.
> 
> I noticed this problem via a GPU hang, where IPEHR pointed to an invalid
> opcode value. I couldn't find this value on the ring but looking at the
> contents of the active context object it turned out to be a parameter
> dword of a bigger command there. The original command opcode itself
> was zeroed out, based on the above I assume due to a CPU writeback of
> the corresponding cacheline. When restoring the context the GPU would
> jump over the zeroed out opcode and hang when trying to execute the
> above parameter dword.
> 
> I could easily reproduce this by running igt/gem_render_copy_redux and
> gem_tiled_blits/basic in parallel, but I guess it could be triggered by
> anything involving frequent switches between two separate contexts. With
> this workaround I couldn't reproduce the problem.
> 
> Note that I also considered using set_pages_array_uc/wc on the context
> object but this wouldn't work with kmap_atomic which always returns a WB
> mapping, at least on HIGHMEM. The alternative would be keeping a UC/WC
> kernel mapping around whenever the context object is pinned, but this
> would be a bigger change. Since I'm not sure if there would be any
> benefit in using set_pages_array, I chose the simpler clflush method.

Nope. Fix execlists to use correct GEM domain management. From
experience the whole context object needs to be flushed if no longer
coherent.

Are you absolutely sure that you want to enable snooping on those pages
since that historically would be bogus? I would expect some strong
bspec reference saying that it is legal.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem
  2015-09-16  8:17   ` Chris Wilson
@ 2015-09-16 12:43     ` Imre Deak
  0 siblings, 0 replies; 5+ messages in thread
From: Imre Deak @ 2015-09-16 12:43 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On ke, 2015-09-16 at 09:17 +0100, Chris Wilson wrote:
> On Tue, Sep 15, 2015 at 09:30:20PM +0300, Imre Deak wrote:
> > The execlist context object is mapped with a CPU/GPU coherent mapping
> > everywhere, but on BXT A stepping due to a HW issue the coherency is not
> > guaranteed. To work around this flush the CPU cache after any change
> > from the CPU to the context object. Note that this also includes any
> > changes done by the VM core as opposed to the driver, when
> > reading from backing store/bzeroing the pages.
> > 
> > I noticed this problem via a GPU hang, where IPEHR pointed to an invalid
> > opcode value. I couldn't find this value on the ring but looking at the
> > contents of the active context object it turned out to be a parameter
> > dword of a bigger command there. The original command opcode itself
> > was zeroed out, based on the above I assume due to a CPU writeback of
> > the corresponding cacheline. When restoring the context the GPU would
> > jump over the zeroed out opcode and hang when trying to execute the
> > above parameter dword.
> > 
> > I could easily reproduce this by running igt/gem_render_copy_redux and
> > gem_tiled_blits/basic in parallel, but I guess it could be triggered by
> > anything involving frequent switches between two separate contexts. With
> > this workaround I couldn't reproduce the problem.
> > 
> > Note that I also considered using set_pages_array_uc/wc on the context
> > object but this wouldn't work with kmap_atomic which always returns a WB
> > mapping, at least on HIGHMEM. The alternative would be keeping a UC/WC
> > kernel mapping around whenever the context object is pinned, but this
> > would be a bigger change. Since I'm not sure if there would be any
> > benefit in using set_pages_array, I chose the simpler clflush method.
> 
> Nope. Fix execlists to use correct GEM domain management.

Not sure what you mean. In both execlist and ringbuffer mode the context
object is already put to the CPU domain, so there is no difference
there. We don't touch the context in ringbuffer mode, so there is
nothing else to do there. In execlist mode we do and so - after this
change - call i915_gem_clflush_object() which is what is done for all
other GEM objects too. This one in turn will do the proper thing based
on the given platform and the object's cache level, which is also
correct after patch 1/2.

> From experience the whole context object needs to be flushed if no longer
> coherent.

We know it's no longer coherent when we first pin it in
intel_lr_context_do_pin() so we need to flush there the whole object.
(This is after we initialized it in populate_lr_context() and/or
possibly read it back from backing storage.) Afterwards we know it won't
get accessed from the CPU until we submit it (execlist_update_context),
where we only update the ring tail/start and PDP values in it, so there
it's enough to flush these particular addresses.

Btw, I noticed now that I missed the GUC path, where there is a similar
update before submission in lr_context_update().

> Are you absolutely sure that you want to enable snooping on those pages
> since that historically would be bogus? I would expect some strong
> bspec reference saying that it is legal.

Well I don't want it, but this is what we do already anyway. On CHV and
BXT the GTT PTE PAT index is ignored and so all entries map to PAT index
0. We set up this PAT index as snooped, since we also have the HWSP in
GTT. We could go around this by mapping the HWSP also non-snooped and do
uncached seqno reads or as Ville pointed out, moving the HWSP to PPGTT.
In both cases we could set PAT index 0 as not snooped.

In either case I think the above is a separate issue that could be
addressed as a follow-up. Even if we map the context object unsnooped
we'll need the logic to flush it.

--Imre

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem
  2015-09-15 18:30 ` [PATCH 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem Imre Deak
  2015-09-16  8:17   ` Chris Wilson
@ 2015-09-17 16:16   ` Imre Deak
  1 sibling, 0 replies; 5+ messages in thread
From: Imre Deak @ 2015-09-17 16:16 UTC (permalink / raw)
  To: intel-gfx

On ti, 2015-09-15 at 21:30 +0300, Imre Deak wrote:
> The execlist context object is mapped with a CPU/GPU coherent mapping
> everywhere, but on BXT A stepping due to a HW issue the coherency is not
> guaranteed. To work around this flush the CPU cache after any change
> from the CPU to the context object. Note that this also includes any
> changes done by the VM core as opposed to the driver, when
> reading from backing store/bzeroing the pages.
> 
> I noticed this problem via a GPU hang, where IPEHR pointed to an invalid
> opcode value. I couldn't find this value on the ring but looking at the
> contents of the active context object it turned out to be a parameter
> dword of a bigger command there. The original command opcode itself
> was zeroed out, based on the above I assume due to a CPU writeback of
> the corresponding cacheline. When restoring the context the GPU would
> jump over the zeroed out opcode and hang when trying to execute the
> above parameter dword.
> 
> I could easily reproduce this by running igt/gem_render_copy_redux and
> gem_tiled_blits/basic in parallel, but I guess it could be triggered by
> anything involving frequent switches between two separate contexts. With
> this workaround I couldn't reproduce the problem.
> 
> Note that I also considered using set_pages_array_uc/wc on the context
> object but this wouldn't work with kmap_atomic which always returns a WB
> mapping, at least on HIGHMEM. The alternative would be keeping a UC/WC
> kernel mapping around whenever the context object is pinned, but this
> would be a bigger change. Since I'm not sure if there would be any
> benefit in using set_pages_array, I chose the simpler clflush method.

Ville noticed a race condition with the above approach when updating the
context during context switch: it's possible that the GPU writes the
updated ring head value at the same time the driver updates the tail
value. In this case the driver could corrupt the updated head value when
doing the clflush, since head and tail are in the same cacheline. So
instead we can map the corresponding page uncached avoiding this
scenario. I'll follow up with a v2 patchset doing this.

> Signed-off-by: Imre Deak <imre.deak@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h  |  2 ++
>  drivers/gpu/drm/i915/i915_gem.c  |  2 +-
>  drivers/gpu/drm/i915/intel_lrc.c | 18 ++++++++++++++++++
>  3 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 4ea3e7b..ca45269 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3103,6 +3103,8 @@ i915_gem_object_ggtt_unpin(struct drm_i915_gem_object *obj)
>  	i915_gem_object_ggtt_unpin_view(obj, &i915_ggtt_view_normal);
>  }
>  
> +bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj);
> +
>  /* i915_gem_fence.c */
>  int __must_check i915_gem_object_get_fence(struct drm_i915_gem_object *obj);
>  int __must_check i915_gem_object_put_fence(struct drm_i915_gem_object *obj);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index cb0df7e..4543124 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -53,7 +53,7 @@ static bool cpu_cache_is_coherent(struct drm_device *dev,
>  	return HAS_LLC(dev) || level != I915_CACHE_NONE;
>  }
>  
> -static bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> +bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>  {
>  	if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
>  		return true;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 3f18ea1..de866be 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -362,6 +362,8 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
>  	struct drm_i915_gem_object *rb_obj = rq->ringbuf->obj;
>  	struct page *page;
>  	uint32_t *reg_state;
> +	void *clflush_start;
> +	void *clflush_end;
>  
>  	BUG_ON(!ctx_obj);
>  	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj));
> @@ -373,6 +375,9 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
>  	reg_state[CTX_RING_TAIL+1] = rq->tail;
>  	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(rb_obj);
>  
> +	clflush_start = &reg_state[CTX_RING_TAIL + 1];
> +	clflush_end = &reg_state[CTX_RING_BUFFER_START + 2];
> +
>  	if (ppgtt && !USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
>  		/* True 32b PPGTT with dynamic page allocation: update PDP
>  		 * registers and point the unallocated PDPs to scratch page.
> @@ -383,8 +388,14 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
>  		ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
>  		ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
>  		ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
> +
> +		clflush_end = &reg_state[CTX_PDP0_LDW + 2];
>  	}
>  
> +	if (cpu_write_needs_clflush(ctx_obj))
> +		drm_clflush_virt_range(clflush_start,
> +				       clflush_end - clflush_start);
> +
>  	kunmap_atomic(reg_state);
>  
>  	return 0;
> @@ -1036,6 +1047,13 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
>  	if (ret)
>  		return ret;
>  
> +	/*
> +	 * For a non-coherent mapping we need to flush any related CPU cache
> +	 * lines, otherwise the writeback of these would corrupt the data
> +	 * written by the GPU.
> +	 */
> +	i915_gem_clflush_object(ctx_obj, false);
> +
>  	ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
>  	if (ret)
>  		goto unpin_ctx_obj;


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-09-17 16:16 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-15 18:30 [PATCH 1/2] drm/i915: mark explicitly the execlist context object as cache coherent Imre Deak
2015-09-15 18:30 ` [PATCH 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem Imre Deak
2015-09-16  8:17   ` Chris Wilson
2015-09-16 12:43     ` Imre Deak
2015-09-17 16:16   ` Imre Deak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox