[PATCH] drm/i915: Prevent TLB error on first execution on SNB

public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed

* [PATCH] drm/i915: Prevent TLB error on first execution on SNB
@ 2015-02-13 12:59 Chris Wilson
  2015-02-13 13:43 ` Daniel Vetter
  2015-02-13 19:42 ` [PATCH] " shuang.he
  0 siblings, 2 replies; 11+ messages in thread
From: Chris Wilson @ 2015-02-13 12:59 UTC (permalink / raw)
  To: intel-gfx

Long ago I found that I was getting sporadic errors when booting SNB,
with the symptom being that the first batch died with IPEHR != *ACTHD,
typically caused by the TLB being invalid. These magically disappeared
if I held the forcewake during the entire ring initialisation sequence.
(It can probably be shortened to a short critical section, but the whole
initialisation is full of register writes and so we would be taking and
releasing forcewake almost continually, and so holding it over the
entire sequence will probably be a net win!)

Note some of the kernels I encounted the issue already had the deferred
forcewake release, so it is still relevant.

I know that there have been a few other reports with similar failure
conditions on SNB, I think such as
References: https://bugs.freedesktop.org/show_bug.cgi?id=80913

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8d15c8110962..2426f6d9b5a5 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4868,6 +4868,7 @@ int i915_gem_init(struct drm_device *dev)
 		dev_priv->gt.stop_ring = intel_logical_ring_stop;
 	}

+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
 	ret = i915_gem_init_userptr(dev);
 	if (ret)
 		goto out_unlock;
@@ -4894,6 +4895,7 @@ int i915_gem_init(struct drm_device *dev)
 	}

 out_unlock:
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 	mutex_unlock(&dev->struct_mutex);

 	return ret;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] drm/i915: Prevent TLB error on first execution on SNB
  2015-02-13 12:59 [PATCH] drm/i915: Prevent TLB error on first execution on SNB Chris Wilson
@ 2015-02-13 13:43 ` Daniel Vetter
  2015-02-13 14:12   ` Chris Wilson
  2015-02-13 14:35   ` [PATCH v2] " Chris Wilson
  2015-02-13 19:42 ` [PATCH] " shuang.he
  1 sibling, 2 replies; 11+ messages in thread
From: Daniel Vetter @ 2015-02-13 13:43 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Feb 13, 2015 at 12:59:45PM +0000, Chris Wilson wrote:
> Long ago I found that I was getting sporadic errors when booting SNB,
> with the symptom being that the first batch died with IPEHR != *ACTHD,
> typically caused by the TLB being invalid. These magically disappeared
> if I held the forcewake during the entire ring initialisation sequence.
> (It can probably be shortened to a short critical section, but the whole
> initialisation is full of register writes and so we would be taking and
> releasing forcewake almost continually, and so holding it over the
> entire sequence will probably be a net win!)
> 
> Note some of the kernels I encounted the issue already had the deferred
> forcewake release, so it is still relevant.
> 
> I know that there have been a few other reports with similar failure
> conditions on SNB, I think such as
> References: https://bugs.freedesktop.org/show_bug.cgi?id=80913
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Given that we've already added a forcewake critical section around
individual ring inits this makes maybe a bit too much sense. But I do
wonder whether we don't need the same for resume and gpu resets?

With the split into hw/sw setup we could get that by pusing the
forcewake_get/put inti i915_gem_init_hw. Does the magic still work with
that? And if we put it there there fw_get/put in init_ring_common is fully
redundant and could be remove.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 8d15c8110962..2426f6d9b5a5 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4868,6 +4868,7 @@ int i915_gem_init(struct drm_device *dev)
>  		dev_priv->gt.stop_ring = intel_logical_ring_stop;
>  	}
>  
> +	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
>  	ret = i915_gem_init_userptr(dev);
>  	if (ret)
>  		goto out_unlock;
> @@ -4894,6 +4895,7 @@ int i915_gem_init(struct drm_device *dev)
>  	}
>  
>  out_unlock:
> +	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
>  	mutex_unlock(&dev->struct_mutex);
>  
>  	return ret;
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] drm/i915: Prevent TLB error on first execution on SNB
  2015-02-13 13:43 ` Daniel Vetter
@ 2015-02-13 14:12   ` Chris Wilson
  2015-02-23 22:54     ` Daniel Vetter
  2015-02-13 14:35   ` [PATCH v2] " Chris Wilson
  1 sibling, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2015-02-13 14:12 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Fri, Feb 13, 2015 at 02:43:40PM +0100, Daniel Vetter wrote:
> On Fri, Feb 13, 2015 at 12:59:45PM +0000, Chris Wilson wrote:
> > Long ago I found that I was getting sporadic errors when booting SNB,
> > with the symptom being that the first batch died with IPEHR != *ACTHD,
> > typically caused by the TLB being invalid. These magically disappeared
> > if I held the forcewake during the entire ring initialisation sequence.
> > (It can probably be shortened to a short critical section, but the whole
> > initialisation is full of register writes and so we would be taking and
> > releasing forcewake almost continually, and so holding it over the
> > entire sequence will probably be a net win!)
> > 
> > Note some of the kernels I encounted the issue already had the deferred
> > forcewake release, so it is still relevant.
> > 
> > I know that there have been a few other reports with similar failure
> > conditions on SNB, I think such as
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=80913
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Given that we've already added a forcewake critical section around
> individual ring inits this makes maybe a bit too much sense. But I do
> wonder whether we don't need the same for resume and gpu resets?
> 
> With the split into hw/sw setup we could get that by pusing the
> forcewake_get/put inti i915_gem_init_hw. Does the magic still work with
> that? And if we put it there there fw_get/put in init_ring_common is fully
> redundant and could be remove.

Hmm, my original thought was to keep the engine alive from the first
programming of CTL up until we fed in the first request (which is the
ppgtt/context init). We can add a second forcewake layer into init_hw to
give the same security blanket for resume/reset. Sound reasonable?

And I should add a comment saying that this is a security blanket.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2] drm/i915: Prevent TLB error on first execution on SNB
  2015-02-13 13:43 ` Daniel Vetter
  2015-02-13 14:12   ` Chris Wilson
@ 2015-02-13 14:35   ` Chris Wilson
  2015-02-14  0:19     ` shuang.he
  2015-03-10 10:31     ` Daniel Vetter
  1 sibling, 2 replies; 11+ messages in thread
From: Chris Wilson @ 2015-02-13 14:35 UTC (permalink / raw)
  To: intel-gfx

Long ago I found that I was getting sporadic errors when booting SNB,
with the symptom being that the first batch died with IPEHR != *ACTHD,
typically caused by the TLB being invalid. These magically disappeared
if I held the forcewake during the entire ring initialisation sequence.
(It can probably be shortened to a short critical section, but the whole
initialisation is full of register writes and so we would be taking and
releasing forcewake almost continually, and so holding it over the
entire sequence will probably be a net win!)

Note some of the kernels I encounted the issue already had the deferred
forcewake release, so it is still relevant.

I know that there have been a few other reports with similar failure
conditions on SNB, I think such as
References: https://bugs.freedesktop.org/show_bug.cgi?id=80913

v2: Wrap i915_gem_init_hw() with its own security blanket as we take
that path following resume and reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8d15c8110962..08450922f373 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4783,6 +4783,9 @@ i915_gem_init_hw(struct drm_device *dev)
 	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
 		return -EIO;
 
+	/* Double layer security blanket, see i915_gem_init() */
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
 	if (dev_priv->ellc_size)
 		I915_WRITE(HSW_IDICR, I915_READ(HSW_IDICR) | IDIHASHMSK(0xf));
 
@@ -4815,7 +4818,7 @@ i915_gem_init_hw(struct drm_device *dev)
 	for_each_ring(ring, dev_priv, i) {
 		ret = ring->init_hw(ring);
 		if (ret)
-			return ret;
+			goto out;
 	}
 
 	for (i = 0; i < NUM_L3_SLICES(dev); i++)
@@ -4832,9 +4835,11 @@ i915_gem_init_hw(struct drm_device *dev)
 		DRM_ERROR("Context enable failed %d\n", ret);
 		i915_gem_cleanup_ringbuffer(dev);
 
-		return ret;
+		goto out;
 	}
 
+out:
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 	return ret;
 }
 
@@ -4868,6 +4873,14 @@ int i915_gem_init(struct drm_device *dev)
 		dev_priv->gt.stop_ring = intel_logical_ring_stop;
 	}
 
+	/* This is just a security blanket to placate dragons.
+	 * On some systems, we very sporadically observe that the first TLBs
+	 * used by the CS may be stale, despite us poking the TLB reset. If
+	 * we hold the forcewake during initialisation these problems
+	 * just magically go away.
+	 */
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
 	ret = i915_gem_init_userptr(dev);
 	if (ret)
 		goto out_unlock;
@@ -4894,6 +4907,7 @@ int i915_gem_init(struct drm_device *dev)
 	}
 
 out_unlock:
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 	mutex_unlock(&dev->struct_mutex);
 
 	return ret;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] drm/i915: Prevent TLB error on first execution on SNB
  2015-02-13 12:59 [PATCH] drm/i915: Prevent TLB error on first execution on SNB Chris Wilson
  2015-02-13 13:43 ` Daniel Vetter
@ 2015-02-13 19:42 ` shuang.he
  1 sibling, 0 replies; 11+ messages in thread
From: shuang.he @ 2015-02-13 19:42 UTC (permalink / raw)
  To: shuang.he, ethan.gao, intel-gfx, chris

Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 5770
-------------------------------------Summary-------------------------------------
Platform          Delta          drm-intel-nightly          Series Applied
PNV                                  281/281              281/281
ILK                                  313/313              313/313
SNB                                  309/309              309/309
IVB                                  382/382              382/382
BYT                                  296/296              296/296
HSW                 -2              426/426              424/426
BDW                 -1              318/318              317/318
-------------------------------------Detailed-------------------------------------
Platform  Test                                drm-intel-nightly          Series Applied
*HSW  igt_kms_flip_plain-flip-fb-recreate      PASS(1)      TIMEOUT(1)
*HSW  igt_kms_flip_plain-flip-fb-recreate-interruptible      PASS(1)      TIMEOUT(1)
*BDW  igt_gem_gtt_hog      PASS(1)      DMESG_WARN(1)
Note: You need to pay more attention to line start with '*'
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] drm/i915: Prevent TLB error on first execution on SNB
  2015-02-13 14:35   ` [PATCH v2] " Chris Wilson
@ 2015-02-14  0:19     ` shuang.he
  2015-03-10 10:31     ` Daniel Vetter
  1 sibling, 0 replies; 11+ messages in thread
From: shuang.he @ 2015-02-14  0:19 UTC (permalink / raw)
  To: shuang.he, ethan.gao, intel-gfx, chris

Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 5773
-------------------------------------Summary-------------------------------------
Platform          Delta          drm-intel-nightly          Series Applied
PNV                 -1              281/281              280/281
ILK                                  313/313              313/313
SNB                                  309/309              309/309
IVB                                  382/382              382/382
BYT                                  296/296              296/296
HSW                                  426/426              426/426
BDW                 -1              318/318              317/318
-------------------------------------Detailed-------------------------------------
Platform  Test                                drm-intel-nightly          Series Applied
*PNV  igt_gen3_render_mixed_blits      PASS(1)      CRASH(1)
*BDW  igt_gem_gtt_hog      PASS(1)      DMESG_WARN(1)
Note: You need to pay more attention to line start with '*'
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] drm/i915: Prevent TLB error on first execution on SNB
  2015-02-13 14:12   ` Chris Wilson
@ 2015-02-23 22:54     ` Daniel Vetter
  0 siblings, 0 replies; 11+ messages in thread
From: Daniel Vetter @ 2015-02-23 22:54 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx

On Fri, Feb 13, 2015 at 02:12:48PM +0000, Chris Wilson wrote:
> On Fri, Feb 13, 2015 at 02:43:40PM +0100, Daniel Vetter wrote:
> > On Fri, Feb 13, 2015 at 12:59:45PM +0000, Chris Wilson wrote:
> > > Long ago I found that I was getting sporadic errors when booting SNB,
> > > with the symptom being that the first batch died with IPEHR != *ACTHD,
> > > typically caused by the TLB being invalid. These magically disappeared
> > > if I held the forcewake during the entire ring initialisation sequence.
> > > (It can probably be shortened to a short critical section, but the whole
> > > initialisation is full of register writes and so we would be taking and
> > > releasing forcewake almost continually, and so holding it over the
> > > entire sequence will probably be a net win!)
> > > 
> > > Note some of the kernels I encounted the issue already had the deferred
> > > forcewake release, so it is still relevant.
> > > 
> > > I know that there have been a few other reports with similar failure
> > > conditions on SNB, I think such as
> > > References: https://bugs.freedesktop.org/show_bug.cgi?id=80913
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > 
> > Given that we've already added a forcewake critical section around
> > individual ring inits this makes maybe a bit too much sense. But I do
> > wonder whether we don't need the same for resume and gpu resets?
> > 
> > With the split into hw/sw setup we could get that by pusing the
> > forcewake_get/put inti i915_gem_init_hw. Does the magic still work with
> > that? And if we put it there there fw_get/put in init_ring_common is fully
> > redundant and could be remove.
> 
> Hmm, my original thought was to keep the engine alive from the first
> programming of CTL up until we fed in the first request (which is the
> ppgtt/context init). We can add a second forcewake layer into init_hw to
> give the same security blanket for resume/reset. Sound reasonable?

With the split into sw/hw setup init_hw should be all that's needed for
coverage, nothing touches the hw outside of it. Hence I think the original
outer layer is redundant. Does the magic still work if we drop that part,
or have I missed some hw access (there really shouldn't be any)?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] drm/i915: Prevent TLB error on first execution on SNB
  2015-02-13 14:35   ` [PATCH v2] " Chris Wilson
  2015-02-14  0:19     ` shuang.he
@ 2015-03-10 10:31     ` Daniel Vetter
  2015-03-10 10:35       ` Chris Wilson
  1 sibling, 1 reply; 11+ messages in thread
From: Daniel Vetter @ 2015-03-10 10:31 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Feb 13, 2015 at 02:35:59PM +0000, Chris Wilson wrote:
> Long ago I found that I was getting sporadic errors when booting SNB,
> with the symptom being that the first batch died with IPEHR != *ACTHD,
> typically caused by the TLB being invalid. These magically disappeared
> if I held the forcewake during the entire ring initialisation sequence.
> (It can probably be shortened to a short critical section, but the whole
> initialisation is full of register writes and so we would be taking and
> releasing forcewake almost continually, and so holding it over the
> entire sequence will probably be a net win!)
> 
> Note some of the kernels I encounted the issue already had the deferred
> forcewake release, so it is still relevant.
> 
> I know that there have been a few other reports with similar failure
> conditions on SNB, I think such as
> References: https://bugs.freedesktop.org/show_bug.cgi?id=80913
> 
> v2: Wrap i915_gem_init_hw() with its own security blanket as we take
> that path following resume and reset.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 8d15c8110962..08450922f373 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4783,6 +4783,9 @@ i915_gem_init_hw(struct drm_device *dev)
>  	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
>  		return -EIO;
>  
> +	/* Double layer security blanket, see i915_gem_init() */
> +	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> +
>  	if (dev_priv->ellc_size)
>  		I915_WRITE(HSW_IDICR, I915_READ(HSW_IDICR) | IDIHASHMSK(0xf));
>  
> @@ -4815,7 +4818,7 @@ i915_gem_init_hw(struct drm_device *dev)
>  	for_each_ring(ring, dev_priv, i) {
>  		ret = ring->init_hw(ring);
>  		if (ret)
> -			return ret;
> +			goto out;
>  	}
>  
>  	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> @@ -4832,9 +4835,11 @@ i915_gem_init_hw(struct drm_device *dev)
>  		DRM_ERROR("Context enable failed %d\n", ret);
>  		i915_gem_cleanup_ringbuffer(dev);
>  
> -		return ret;
> +		goto out;
>  	}
>  
> +out:
> +	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
>  	return ret;
>  }
>  
> @@ -4868,6 +4873,14 @@ int i915_gem_init(struct drm_device *dev)
>  		dev_priv->gt.stop_ring = intel_logical_ring_stop;
>  	}
>  
> +	/* This is just a security blanket to placate dragons.
> +	 * On some systems, we very sporadically observe that the first TLBs
> +	 * used by the CS may be stale, despite us poking the TLB reset. If
> +	 * we hold the forcewake during initialisation these problems
> +	 * just magically go away.
> +	 */
> +	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);

gem_init shouldn't ever touch the hw except through gem_init_hw. Do we
really need the double-layer here? Also the forcewake hack in the ring
init code should now be redundant, too.
-Daniel

> +
>  	ret = i915_gem_init_userptr(dev);
>  	if (ret)
>  		goto out_unlock;
> @@ -4894,6 +4907,7 @@ int i915_gem_init(struct drm_device *dev)
>  	}
>  
>  out_unlock:
> +	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
>  	mutex_unlock(&dev->struct_mutex);
>  
>  	return ret;
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] drm/i915: Prevent TLB error on first execution on SNB
  2015-03-10 10:31     ` Daniel Vetter
@ 2015-03-10 10:35       ` Chris Wilson
  2015-03-10 12:55         ` Daniel Vetter
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2015-03-10 10:35 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, Mar 10, 2015 at 11:31:04AM +0100, Daniel Vetter wrote:
> On Fri, Feb 13, 2015 at 02:35:59PM +0000, Chris Wilson wrote:
> > Long ago I found that I was getting sporadic errors when booting SNB,
> > with the symptom being that the first batch died with IPEHR != *ACTHD,
> > typically caused by the TLB being invalid. These magically disappeared
> > if I held the forcewake during the entire ring initialisation sequence.
> > (It can probably be shortened to a short critical section, but the whole
> > initialisation is full of register writes and so we would be taking and
> > releasing forcewake almost continually, and so holding it over the
> > entire sequence will probably be a net win!)
> > 
> > Note some of the kernels I encounted the issue already had the deferred
> > forcewake release, so it is still relevant.
> > 
> > I know that there have been a few other reports with similar failure
> > conditions on SNB, I think such as
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=80913
> > 
> > v2: Wrap i915_gem_init_hw() with its own security blanket as we take
> > that path following resume and reset.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 18 ++++++++++++++++--
> >  1 file changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 8d15c8110962..08450922f373 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4783,6 +4783,9 @@ i915_gem_init_hw(struct drm_device *dev)
> >  	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
> >  		return -EIO;
> >  
> > +	/* Double layer security blanket, see i915_gem_init() */
> > +	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> > +
> >  	if (dev_priv->ellc_size)
> >  		I915_WRITE(HSW_IDICR, I915_READ(HSW_IDICR) | IDIHASHMSK(0xf));
> >  
> > @@ -4815,7 +4818,7 @@ i915_gem_init_hw(struct drm_device *dev)
> >  	for_each_ring(ring, dev_priv, i) {
> >  		ret = ring->init_hw(ring);
> >  		if (ret)
> > -			return ret;
> > +			goto out;
> >  	}
> >  
> >  	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> > @@ -4832,9 +4835,11 @@ i915_gem_init_hw(struct drm_device *dev)
> >  		DRM_ERROR("Context enable failed %d\n", ret);
> >  		i915_gem_cleanup_ringbuffer(dev);
> >  
> > -		return ret;
> > +		goto out;
> >  	}
> >  
> > +out:
> > +	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> >  	return ret;
> >  }
> >  
> > @@ -4868,6 +4873,14 @@ int i915_gem_init(struct drm_device *dev)
> >  		dev_priv->gt.stop_ring = intel_logical_ring_stop;
> >  	}
> >  
> > +	/* This is just a security blanket to placate dragons.
> > +	 * On some systems, we very sporadically observe that the first TLBs
> > +	 * used by the CS may be stale, despite us poking the TLB reset. If
> > +	 * we hold the forcewake during initialisation these problems
> > +	 * just magically go away.
> > +	 */
> > +	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> 
> gem_init shouldn't ever touch the hw except through gem_init_hw. Do we
> really need the double-layer here?

There are register accesses before, so yes since that's how I tested
it...

> Also the forcewake hack in the ring
> init code should now be redundant, too.

I am of the opinion that they still serve documentary value. Unless you
have an assert_force_wake() handy.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] drm/i915: Prevent TLB error on first execution on SNB
  2015-03-10 10:35       ` Chris Wilson
@ 2015-03-10 12:55         ` Daniel Vetter
  2015-03-10 13:31           ` Jani Nikula
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Vetter @ 2015-03-10 12:55 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx

On Tue, Mar 10, 2015 at 10:35:41AM +0000, Chris Wilson wrote:
> On Tue, Mar 10, 2015 at 11:31:04AM +0100, Daniel Vetter wrote:
> > On Fri, Feb 13, 2015 at 02:35:59PM +0000, Chris Wilson wrote:
> > > Long ago I found that I was getting sporadic errors when booting SNB,
> > > with the symptom being that the first batch died with IPEHR != *ACTHD,
> > > typically caused by the TLB being invalid. These magically disappeared
> > > if I held the forcewake during the entire ring initialisation sequence.
> > > (It can probably be shortened to a short critical section, but the whole
> > > initialisation is full of register writes and so we would be taking and
> > > releasing forcewake almost continually, and so holding it over the
> > > entire sequence will probably be a net win!)
> > > 
> > > Note some of the kernels I encounted the issue already had the deferred
> > > forcewake release, so it is still relevant.
> > > 
> > > I know that there have been a few other reports with similar failure
> > > conditions on SNB, I think such as
> > > References: https://bugs.freedesktop.org/show_bug.cgi?id=80913
> > > 
> > > v2: Wrap i915_gem_init_hw() with its own security blanket as we take
> > > that path following resume and reset.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem.c | 18 ++++++++++++++++--
> > >  1 file changed, 16 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index 8d15c8110962..08450922f373 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -4783,6 +4783,9 @@ i915_gem_init_hw(struct drm_device *dev)
> > >  	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
> > >  		return -EIO;
> > >  
> > > +	/* Double layer security blanket, see i915_gem_init() */
> > > +	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> > > +
> > >  	if (dev_priv->ellc_size)
> > >  		I915_WRITE(HSW_IDICR, I915_READ(HSW_IDICR) | IDIHASHMSK(0xf));
> > >  
> > > @@ -4815,7 +4818,7 @@ i915_gem_init_hw(struct drm_device *dev)
> > >  	for_each_ring(ring, dev_priv, i) {
> > >  		ret = ring->init_hw(ring);
> > >  		if (ret)
> > > -			return ret;
> > > +			goto out;
> > >  	}
> > >  
> > >  	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> > > @@ -4832,9 +4835,11 @@ i915_gem_init_hw(struct drm_device *dev)
> > >  		DRM_ERROR("Context enable failed %d\n", ret);
> > >  		i915_gem_cleanup_ringbuffer(dev);
> > >  
> > > -		return ret;
> > > +		goto out;
> > >  	}
> > >  
> > > +out:
> > > +	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> > >  	return ret;
> > >  }
> > >  
> > > @@ -4868,6 +4873,14 @@ int i915_gem_init(struct drm_device *dev)
> > >  		dev_priv->gt.stop_ring = intel_logical_ring_stop;
> > >  	}
> > >  
> > > +	/* This is just a security blanket to placate dragons.
> > > +	 * On some systems, we very sporadically observe that the first TLBs
> > > +	 * used by the CS may be stale, despite us poking the TLB reset. If
> > > +	 * we hold the forcewake during initialisation these problems
> > > +	 * just magically go away.
> > > +	 */
> > > +	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> > 
> > gem_init shouldn't ever touch the hw except through gem_init_hw. Do we
> > really need the double-layer here?
> 
> There are register accesses before, so yes since that's how I tested
> it...
> 
> > Also the forcewake hack in the ring
> > init code should now be redundant, too.
> 
> I am of the opinion that they still serve documentary value. Unless you
> have an assert_force_wake() handy.

Ok, count me convinced.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

And I guess this is for Jani + cc: stable.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] drm/i915: Prevent TLB error on first execution on SNB
  2015-03-10 12:55         ` Daniel Vetter
@ 2015-03-10 13:31           ` Jani Nikula
  0 siblings, 0 replies; 11+ messages in thread
From: Jani Nikula @ 2015-03-10 13:31 UTC (permalink / raw)
  To: Daniel Vetter, Chris Wilson

On Tue, 10 Mar 2015, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Tue, Mar 10, 2015 at 10:35:41AM +0000, Chris Wilson wrote:
>> On Tue, Mar 10, 2015 at 11:31:04AM +0100, Daniel Vetter wrote:
>> > On Fri, Feb 13, 2015 at 02:35:59PM +0000, Chris Wilson wrote:
>> > > Long ago I found that I was getting sporadic errors when booting SNB,
>> > > with the symptom being that the first batch died with IPEHR != *ACTHD,
>> > > typically caused by the TLB being invalid. These magically disappeared
>> > > if I held the forcewake during the entire ring initialisation sequence.
>> > > (It can probably be shortened to a short critical section, but the whole
>> > > initialisation is full of register writes and so we would be taking and
>> > > releasing forcewake almost continually, and so holding it over the
>> > > entire sequence will probably be a net win!)
>> > > 
>> > > Note some of the kernels I encounted the issue already had the deferred
>> > > forcewake release, so it is still relevant.
>> > > 
>> > > I know that there have been a few other reports with similar failure
>> > > conditions on SNB, I think such as
>> > > References: https://bugs.freedesktop.org/show_bug.cgi?id=80913
>> > > 
>> > > v2: Wrap i915_gem_init_hw() with its own security blanket as we take
>> > > that path following resume and reset.
>> > > 
>> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > > ---
>> > >  drivers/gpu/drm/i915/i915_gem.c | 18 ++++++++++++++++--
>> > >  1 file changed, 16 insertions(+), 2 deletions(-)
>> > > 
>> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> > > index 8d15c8110962..08450922f373 100644
>> > > --- a/drivers/gpu/drm/i915/i915_gem.c
>> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
>> > > @@ -4783,6 +4783,9 @@ i915_gem_init_hw(struct drm_device *dev)
>> > >  	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
>> > >  		return -EIO;
>> > >  
>> > > +	/* Double layer security blanket, see i915_gem_init() */
>> > > +	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
>> > > +
>> > >  	if (dev_priv->ellc_size)
>> > >  		I915_WRITE(HSW_IDICR, I915_READ(HSW_IDICR) | IDIHASHMSK(0xf));
>> > >  
>> > > @@ -4815,7 +4818,7 @@ i915_gem_init_hw(struct drm_device *dev)
>> > >  	for_each_ring(ring, dev_priv, i) {
>> > >  		ret = ring->init_hw(ring);
>> > >  		if (ret)
>> > > -			return ret;
>> > > +			goto out;
>> > >  	}
>> > >  
>> > >  	for (i = 0; i < NUM_L3_SLICES(dev); i++)
>> > > @@ -4832,9 +4835,11 @@ i915_gem_init_hw(struct drm_device *dev)
>> > >  		DRM_ERROR("Context enable failed %d\n", ret);
>> > >  		i915_gem_cleanup_ringbuffer(dev);
>> > >  
>> > > -		return ret;
>> > > +		goto out;
>> > >  	}
>> > >  
>> > > +out:
>> > > +	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
>> > >  	return ret;
>> > >  }
>> > >  
>> > > @@ -4868,6 +4873,14 @@ int i915_gem_init(struct drm_device *dev)
>> > >  		dev_priv->gt.stop_ring = intel_logical_ring_stop;
>> > >  	}
>> > >  
>> > > +	/* This is just a security blanket to placate dragons.
>> > > +	 * On some systems, we very sporadically observe that the first TLBs
>> > > +	 * used by the CS may be stale, despite us poking the TLB reset. If
>> > > +	 * we hold the forcewake during initialisation these problems
>> > > +	 * just magically go away.
>> > > +	 */
>> > > +	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
>> > 
>> > gem_init shouldn't ever touch the hw except through gem_init_hw. Do we
>> > really need the double-layer here?
>> 
>> There are register accesses before, so yes since that's how I tested
>> it...
>> 
>> > Also the forcewake hack in the ring
>> > init code should now be redundant, too.
>> 
>> I am of the opinion that they still serve documentary value. Unless you
>> have an assert_force_wake() handy.
>
> Ok, count me convinced.
>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> And I guess this is for Jani + cc: stable.

Pushed to drm-intel-fixes, thanks for the patch and review.

BR,
Jani.


> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Jani Nikula, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-03-10 13:30 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-02-13 12:59 [PATCH] drm/i915: Prevent TLB error on first execution on SNB Chris Wilson
2015-02-13 13:43 ` Daniel Vetter
2015-02-13 14:12   ` Chris Wilson
2015-02-23 22:54     ` Daniel Vetter
2015-02-13 14:35   ` [PATCH v2] " Chris Wilson
2015-02-14  0:19     ` shuang.he
2015-03-10 10:31     ` Daniel Vetter
2015-03-10 10:35       ` Chris Wilson
2015-03-10 12:55         ` Daniel Vetter
2015-03-10 13:31           ` Jani Nikula
2015-02-13 19:42 ` [PATCH] " shuang.he

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox