* [PATCH] drm/i915: Prevent TLB error on first execution on SNB @ 2015-02-13 12:59 Chris Wilson 2015-02-13 13:43 ` Daniel Vetter 2015-02-13 19:42 ` [PATCH] " shuang.he 0 siblings, 2 replies; 11+ messages in thread From: Chris Wilson @ 2015-02-13 12:59 UTC (permalink / raw) To: intel-gfx Long ago I found that I was getting sporadic errors when booting SNB, with the symptom being that the first batch died with IPEHR != *ACTHD, typically caused by the TLB being invalid. These magically disappeared if I held the forcewake during the entire ring initialisation sequence. (It can probably be shortened to a short critical section, but the whole initialisation is full of register writes and so we would be taking and releasing forcewake almost continually, and so holding it over the entire sequence will probably be a net win!) Note some of the kernels I encounted the issue already had the deferred forcewake release, so it is still relevant. I know that there have been a few other reports with similar failure conditions on SNB, I think such as References: https://bugs.freedesktop.org/show_bug.cgi?id=80913 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/i915_gem.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 8d15c8110962..2426f6d9b5a5 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -4868,6 +4868,7 @@ int i915_gem_init(struct drm_device *dev) dev_priv->gt.stop_ring = intel_logical_ring_stop; } + intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL); ret = i915_gem_init_userptr(dev); if (ret) goto out_unlock; @@ -4894,6 +4895,7 @@ int i915_gem_init(struct drm_device *dev) } out_unlock: + intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL); mutex_unlock(&dev->struct_mutex); return ret; -- 2.1.4 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] drm/i915: Prevent TLB error on first execution on SNB 2015-02-13 12:59 [PATCH] drm/i915: Prevent TLB error on first execution on SNB Chris Wilson @ 2015-02-13 13:43 ` Daniel Vetter 2015-02-13 14:12 ` Chris Wilson 2015-02-13 14:35 ` [PATCH v2] " Chris Wilson 2015-02-13 19:42 ` [PATCH] " shuang.he 1 sibling, 2 replies; 11+ messages in thread From: Daniel Vetter @ 2015-02-13 13:43 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx On Fri, Feb 13, 2015 at 12:59:45PM +0000, Chris Wilson wrote: > Long ago I found that I was getting sporadic errors when booting SNB, > with the symptom being that the first batch died with IPEHR != *ACTHD, > typically caused by the TLB being invalid. These magically disappeared > if I held the forcewake during the entire ring initialisation sequence. > (It can probably be shortened to a short critical section, but the whole > initialisation is full of register writes and so we would be taking and > releasing forcewake almost continually, and so holding it over the > entire sequence will probably be a net win!) > > Note some of the kernels I encounted the issue already had the deferred > forcewake release, so it is still relevant. > > I know that there have been a few other reports with similar failure > conditions on SNB, I think such as > References: https://bugs.freedesktop.org/show_bug.cgi?id=80913 > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Given that we've already added a forcewake critical section around individual ring inits this makes maybe a bit too much sense. But I do wonder whether we don't need the same for resume and gpu resets? With the split into hw/sw setup we could get that by pusing the forcewake_get/put inti i915_gem_init_hw. Does the magic still work with that? And if we put it there there fw_get/put in init_ring_common is fully redundant and could be remove. -Daniel > --- > drivers/gpu/drm/i915/i915_gem.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index 8d15c8110962..2426f6d9b5a5 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -4868,6 +4868,7 @@ int i915_gem_init(struct drm_device *dev) > dev_priv->gt.stop_ring = intel_logical_ring_stop; > } > > + intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL); > ret = i915_gem_init_userptr(dev); > if (ret) > goto out_unlock; > @@ -4894,6 +4895,7 @@ int i915_gem_init(struct drm_device *dev) > } > > out_unlock: > + intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL); > mutex_unlock(&dev->struct_mutex); > > return ret; > -- > 2.1.4 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] drm/i915: Prevent TLB error on first execution on SNB 2015-02-13 13:43 ` Daniel Vetter @ 2015-02-13 14:12 ` Chris Wilson 2015-02-23 22:54 ` Daniel Vetter 2015-02-13 14:35 ` [PATCH v2] " Chris Wilson 1 sibling, 1 reply; 11+ messages in thread From: Chris Wilson @ 2015-02-13 14:12 UTC (permalink / raw) To: Daniel Vetter; +Cc: intel-gfx On Fri, Feb 13, 2015 at 02:43:40PM +0100, Daniel Vetter wrote: > On Fri, Feb 13, 2015 at 12:59:45PM +0000, Chris Wilson wrote: > > Long ago I found that I was getting sporadic errors when booting SNB, > > with the symptom being that the first batch died with IPEHR != *ACTHD, > > typically caused by the TLB being invalid. These magically disappeared > > if I held the forcewake during the entire ring initialisation sequence. > > (It can probably be shortened to a short critical section, but the whole > > initialisation is full of register writes and so we would be taking and > > releasing forcewake almost continually, and so holding it over the > > entire sequence will probably be a net win!) > > > > Note some of the kernels I encounted the issue already had the deferred > > forcewake release, so it is still relevant. > > > > I know that there have been a few other reports with similar failure > > conditions on SNB, I think such as > > References: https://bugs.freedesktop.org/show_bug.cgi?id=80913 > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Given that we've already added a forcewake critical section around > individual ring inits this makes maybe a bit too much sense. But I do > wonder whether we don't need the same for resume and gpu resets? > > With the split into hw/sw setup we could get that by pusing the > forcewake_get/put inti i915_gem_init_hw. Does the magic still work with > that? And if we put it there there fw_get/put in init_ring_common is fully > redundant and could be remove. Hmm, my original thought was to keep the engine alive from the first programming of CTL up until we fed in the first request (which is the ppgtt/context init). We can add a second forcewake layer into init_hw to give the same security blanket for resume/reset. Sound reasonable? And I should add a comment saying that this is a security blanket. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] drm/i915: Prevent TLB error on first execution on SNB 2015-02-13 14:12 ` Chris Wilson @ 2015-02-23 22:54 ` Daniel Vetter 0 siblings, 0 replies; 11+ messages in thread From: Daniel Vetter @ 2015-02-23 22:54 UTC (permalink / raw) To: Chris Wilson, Daniel Vetter, intel-gfx On Fri, Feb 13, 2015 at 02:12:48PM +0000, Chris Wilson wrote: > On Fri, Feb 13, 2015 at 02:43:40PM +0100, Daniel Vetter wrote: > > On Fri, Feb 13, 2015 at 12:59:45PM +0000, Chris Wilson wrote: > > > Long ago I found that I was getting sporadic errors when booting SNB, > > > with the symptom being that the first batch died with IPEHR != *ACTHD, > > > typically caused by the TLB being invalid. These magically disappeared > > > if I held the forcewake during the entire ring initialisation sequence. > > > (It can probably be shortened to a short critical section, but the whole > > > initialisation is full of register writes and so we would be taking and > > > releasing forcewake almost continually, and so holding it over the > > > entire sequence will probably be a net win!) > > > > > > Note some of the kernels I encounted the issue already had the deferred > > > forcewake release, so it is still relevant. > > > > > > I know that there have been a few other reports with similar failure > > > conditions on SNB, I think such as > > > References: https://bugs.freedesktop.org/show_bug.cgi?id=80913 > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > Given that we've already added a forcewake critical section around > > individual ring inits this makes maybe a bit too much sense. But I do > > wonder whether we don't need the same for resume and gpu resets? > > > > With the split into hw/sw setup we could get that by pusing the > > forcewake_get/put inti i915_gem_init_hw. Does the magic still work with > > that? And if we put it there there fw_get/put in init_ring_common is fully > > redundant and could be remove. > > Hmm, my original thought was to keep the engine alive from the first > programming of CTL up until we fed in the first request (which is the > ppgtt/context init). We can add a second forcewake layer into init_hw to > give the same security blanket for resume/reset. Sound reasonable? With the split into sw/hw setup init_hw should be all that's needed for coverage, nothing touches the hw outside of it. Hence I think the original outer layer is redundant. Does the magic still work if we drop that part, or have I missed some hw access (there really shouldn't be any)? -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2] drm/i915: Prevent TLB error on first execution on SNB 2015-02-13 13:43 ` Daniel Vetter 2015-02-13 14:12 ` Chris Wilson @ 2015-02-13 14:35 ` Chris Wilson 2015-02-14 0:19 ` shuang.he 2015-03-10 10:31 ` Daniel Vetter 1 sibling, 2 replies; 11+ messages in thread From: Chris Wilson @ 2015-02-13 14:35 UTC (permalink / raw) To: intel-gfx Long ago I found that I was getting sporadic errors when booting SNB, with the symptom being that the first batch died with IPEHR != *ACTHD, typically caused by the TLB being invalid. These magically disappeared if I held the forcewake during the entire ring initialisation sequence. (It can probably be shortened to a short critical section, but the whole initialisation is full of register writes and so we would be taking and releasing forcewake almost continually, and so holding it over the entire sequence will probably be a net win!) Note some of the kernels I encounted the issue already had the deferred forcewake release, so it is still relevant. I know that there have been a few other reports with similar failure conditions on SNB, I think such as References: https://bugs.freedesktop.org/show_bug.cgi?id=80913 v2: Wrap i915_gem_init_hw() with its own security blanket as we take that path following resume and reset. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/i915_gem.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 8d15c8110962..08450922f373 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -4783,6 +4783,9 @@ i915_gem_init_hw(struct drm_device *dev) if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt()) return -EIO; + /* Double layer security blanket, see i915_gem_init() */ + intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL); + if (dev_priv->ellc_size) I915_WRITE(HSW_IDICR, I915_READ(HSW_IDICR) | IDIHASHMSK(0xf)); @@ -4815,7 +4818,7 @@ i915_gem_init_hw(struct drm_device *dev) for_each_ring(ring, dev_priv, i) { ret = ring->init_hw(ring); if (ret) - return ret; + goto out; } for (i = 0; i < NUM_L3_SLICES(dev); i++) @@ -4832,9 +4835,11 @@ i915_gem_init_hw(struct drm_device *dev) DRM_ERROR("Context enable failed %d\n", ret); i915_gem_cleanup_ringbuffer(dev); - return ret; + goto out; } +out: + intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL); return ret; } @@ -4868,6 +4873,14 @@ int i915_gem_init(struct drm_device *dev) dev_priv->gt.stop_ring = intel_logical_ring_stop; } + /* This is just a security blanket to placate dragons. + * On some systems, we very sporadically observe that the first TLBs + * used by the CS may be stale, despite us poking the TLB reset. If + * we hold the forcewake during initialisation these problems + * just magically go away. + */ + intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL); + ret = i915_gem_init_userptr(dev); if (ret) goto out_unlock; @@ -4894,6 +4907,7 @@ int i915_gem_init(struct drm_device *dev) } out_unlock: + intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL); mutex_unlock(&dev->struct_mutex); return ret; -- 2.1.4 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v2] drm/i915: Prevent TLB error on first execution on SNB 2015-02-13 14:35 ` [PATCH v2] " Chris Wilson @ 2015-02-14 0:19 ` shuang.he 2015-03-10 10:31 ` Daniel Vetter 1 sibling, 0 replies; 11+ messages in thread From: shuang.he @ 2015-02-14 0:19 UTC (permalink / raw) To: shuang.he, ethan.gao, intel-gfx, chris Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com) Task id: 5773 -------------------------------------Summary------------------------------------- Platform Delta drm-intel-nightly Series Applied PNV -1 281/281 280/281 ILK 313/313 313/313 SNB 309/309 309/309 IVB 382/382 382/382 BYT 296/296 296/296 HSW 426/426 426/426 BDW -1 318/318 317/318 -------------------------------------Detailed------------------------------------- Platform Test drm-intel-nightly Series Applied *PNV igt_gen3_render_mixed_blits PASS(1) CRASH(1) *BDW igt_gem_gtt_hog PASS(1) DMESG_WARN(1) Note: You need to pay more attention to line start with '*' _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] drm/i915: Prevent TLB error on first execution on SNB 2015-02-13 14:35 ` [PATCH v2] " Chris Wilson 2015-02-14 0:19 ` shuang.he @ 2015-03-10 10:31 ` Daniel Vetter 2015-03-10 10:35 ` Chris Wilson 1 sibling, 1 reply; 11+ messages in thread From: Daniel Vetter @ 2015-03-10 10:31 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx On Fri, Feb 13, 2015 at 02:35:59PM +0000, Chris Wilson wrote: > Long ago I found that I was getting sporadic errors when booting SNB, > with the symptom being that the first batch died with IPEHR != *ACTHD, > typically caused by the TLB being invalid. These magically disappeared > if I held the forcewake during the entire ring initialisation sequence. > (It can probably be shortened to a short critical section, but the whole > initialisation is full of register writes and so we would be taking and > releasing forcewake almost continually, and so holding it over the > entire sequence will probably be a net win!) > > Note some of the kernels I encounted the issue already had the deferred > forcewake release, so it is still relevant. > > I know that there have been a few other reports with similar failure > conditions on SNB, I think such as > References: https://bugs.freedesktop.org/show_bug.cgi?id=80913 > > v2: Wrap i915_gem_init_hw() with its own security blanket as we take > that path following resume and reset. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/i915_gem.c | 18 ++++++++++++++++-- > 1 file changed, 16 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index 8d15c8110962..08450922f373 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -4783,6 +4783,9 @@ i915_gem_init_hw(struct drm_device *dev) > if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt()) > return -EIO; > > + /* Double layer security blanket, see i915_gem_init() */ > + intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL); > + > if (dev_priv->ellc_size) > I915_WRITE(HSW_IDICR, I915_READ(HSW_IDICR) | IDIHASHMSK(0xf)); > > @@ -4815,7 +4818,7 @@ i915_gem_init_hw(struct drm_device *dev) > for_each_ring(ring, dev_priv, i) { > ret = ring->init_hw(ring); > if (ret) > - return ret; > + goto out; > } > > for (i = 0; i < NUM_L3_SLICES(dev); i++) > @@ -4832,9 +4835,11 @@ i915_gem_init_hw(struct drm_device *dev) > DRM_ERROR("Context enable failed %d\n", ret); > i915_gem_cleanup_ringbuffer(dev); > > - return ret; > + goto out; > } > > +out: > + intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL); > return ret; > } > > @@ -4868,6 +4873,14 @@ int i915_gem_init(struct drm_device *dev) > dev_priv->gt.stop_ring = intel_logical_ring_stop; > } > > + /* This is just a security blanket to placate dragons. > + * On some systems, we very sporadically observe that the first TLBs > + * used by the CS may be stale, despite us poking the TLB reset. If > + * we hold the forcewake during initialisation these problems > + * just magically go away. > + */ > + intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL); gem_init shouldn't ever touch the hw except through gem_init_hw. Do we really need the double-layer here? Also the forcewake hack in the ring init code should now be redundant, too. -Daniel > + > ret = i915_gem_init_userptr(dev); > if (ret) > goto out_unlock; > @@ -4894,6 +4907,7 @@ int i915_gem_init(struct drm_device *dev) > } > > out_unlock: > + intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL); > mutex_unlock(&dev->struct_mutex); > > return ret; > -- > 2.1.4 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] drm/i915: Prevent TLB error on first execution on SNB 2015-03-10 10:31 ` Daniel Vetter @ 2015-03-10 10:35 ` Chris Wilson 2015-03-10 12:55 ` Daniel Vetter 0 siblings, 1 reply; 11+ messages in thread From: Chris Wilson @ 2015-03-10 10:35 UTC (permalink / raw) To: Daniel Vetter; +Cc: intel-gfx On Tue, Mar 10, 2015 at 11:31:04AM +0100, Daniel Vetter wrote: > On Fri, Feb 13, 2015 at 02:35:59PM +0000, Chris Wilson wrote: > > Long ago I found that I was getting sporadic errors when booting SNB, > > with the symptom being that the first batch died with IPEHR != *ACTHD, > > typically caused by the TLB being invalid. These magically disappeared > > if I held the forcewake during the entire ring initialisation sequence. > > (It can probably be shortened to a short critical section, but the whole > > initialisation is full of register writes and so we would be taking and > > releasing forcewake almost continually, and so holding it over the > > entire sequence will probably be a net win!) > > > > Note some of the kernels I encounted the issue already had the deferred > > forcewake release, so it is still relevant. > > > > I know that there have been a few other reports with similar failure > > conditions on SNB, I think such as > > References: https://bugs.freedesktop.org/show_bug.cgi?id=80913 > > > > v2: Wrap i915_gem_init_hw() with its own security blanket as we take > > that path following resume and reset. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > --- > > drivers/gpu/drm/i915/i915_gem.c | 18 ++++++++++++++++-- > > 1 file changed, 16 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > index 8d15c8110962..08450922f373 100644 > > --- a/drivers/gpu/drm/i915/i915_gem.c > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > @@ -4783,6 +4783,9 @@ i915_gem_init_hw(struct drm_device *dev) > > if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt()) > > return -EIO; > > > > + /* Double layer security blanket, see i915_gem_init() */ > > + intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL); > > + > > if (dev_priv->ellc_size) > > I915_WRITE(HSW_IDICR, I915_READ(HSW_IDICR) | IDIHASHMSK(0xf)); > > > > @@ -4815,7 +4818,7 @@ i915_gem_init_hw(struct drm_device *dev) > > for_each_ring(ring, dev_priv, i) { > > ret = ring->init_hw(ring); > > if (ret) > > - return ret; > > + goto out; > > } > > > > for (i = 0; i < NUM_L3_SLICES(dev); i++) > > @@ -4832,9 +4835,11 @@ i915_gem_init_hw(struct drm_device *dev) > > DRM_ERROR("Context enable failed %d\n", ret); > > i915_gem_cleanup_ringbuffer(dev); > > > > - return ret; > > + goto out; > > } > > > > +out: > > + intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL); > > return ret; > > } > > > > @@ -4868,6 +4873,14 @@ int i915_gem_init(struct drm_device *dev) > > dev_priv->gt.stop_ring = intel_logical_ring_stop; > > } > > > > + /* This is just a security blanket to placate dragons. > > + * On some systems, we very sporadically observe that the first TLBs > > + * used by the CS may be stale, despite us poking the TLB reset. If > > + * we hold the forcewake during initialisation these problems > > + * just magically go away. > > + */ > > + intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL); > > gem_init shouldn't ever touch the hw except through gem_init_hw. Do we > really need the double-layer here? There are register accesses before, so yes since that's how I tested it... > Also the forcewake hack in the ring > init code should now be redundant, too. I am of the opinion that they still serve documentary value. Unless you have an assert_force_wake() handy. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] drm/i915: Prevent TLB error on first execution on SNB 2015-03-10 10:35 ` Chris Wilson @ 2015-03-10 12:55 ` Daniel Vetter 2015-03-10 13:31 ` Jani Nikula 0 siblings, 1 reply; 11+ messages in thread From: Daniel Vetter @ 2015-03-10 12:55 UTC (permalink / raw) To: Chris Wilson, Daniel Vetter, intel-gfx On Tue, Mar 10, 2015 at 10:35:41AM +0000, Chris Wilson wrote: > On Tue, Mar 10, 2015 at 11:31:04AM +0100, Daniel Vetter wrote: > > On Fri, Feb 13, 2015 at 02:35:59PM +0000, Chris Wilson wrote: > > > Long ago I found that I was getting sporadic errors when booting SNB, > > > with the symptom being that the first batch died with IPEHR != *ACTHD, > > > typically caused by the TLB being invalid. These magically disappeared > > > if I held the forcewake during the entire ring initialisation sequence. > > > (It can probably be shortened to a short critical section, but the whole > > > initialisation is full of register writes and so we would be taking and > > > releasing forcewake almost continually, and so holding it over the > > > entire sequence will probably be a net win!) > > > > > > Note some of the kernels I encounted the issue already had the deferred > > > forcewake release, so it is still relevant. > > > > > > I know that there have been a few other reports with similar failure > > > conditions on SNB, I think such as > > > References: https://bugs.freedesktop.org/show_bug.cgi?id=80913 > > > > > > v2: Wrap i915_gem_init_hw() with its own security blanket as we take > > > that path following resume and reset. > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > --- > > > drivers/gpu/drm/i915/i915_gem.c | 18 ++++++++++++++++-- > > > 1 file changed, 16 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > > index 8d15c8110962..08450922f373 100644 > > > --- a/drivers/gpu/drm/i915/i915_gem.c > > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > > @@ -4783,6 +4783,9 @@ i915_gem_init_hw(struct drm_device *dev) > > > if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt()) > > > return -EIO; > > > > > > + /* Double layer security blanket, see i915_gem_init() */ > > > + intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL); > > > + > > > if (dev_priv->ellc_size) > > > I915_WRITE(HSW_IDICR, I915_READ(HSW_IDICR) | IDIHASHMSK(0xf)); > > > > > > @@ -4815,7 +4818,7 @@ i915_gem_init_hw(struct drm_device *dev) > > > for_each_ring(ring, dev_priv, i) { > > > ret = ring->init_hw(ring); > > > if (ret) > > > - return ret; > > > + goto out; > > > } > > > > > > for (i = 0; i < NUM_L3_SLICES(dev); i++) > > > @@ -4832,9 +4835,11 @@ i915_gem_init_hw(struct drm_device *dev) > > > DRM_ERROR("Context enable failed %d\n", ret); > > > i915_gem_cleanup_ringbuffer(dev); > > > > > > - return ret; > > > + goto out; > > > } > > > > > > +out: > > > + intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL); > > > return ret; > > > } > > > > > > @@ -4868,6 +4873,14 @@ int i915_gem_init(struct drm_device *dev) > > > dev_priv->gt.stop_ring = intel_logical_ring_stop; > > > } > > > > > > + /* This is just a security blanket to placate dragons. > > > + * On some systems, we very sporadically observe that the first TLBs > > > + * used by the CS may be stale, despite us poking the TLB reset. If > > > + * we hold the forcewake during initialisation these problems > > > + * just magically go away. > > > + */ > > > + intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL); > > > > gem_init shouldn't ever touch the hw except through gem_init_hw. Do we > > really need the double-layer here? > > There are register accesses before, so yes since that's how I tested > it... > > > Also the forcewake hack in the ring > > init code should now be redundant, too. > > I am of the opinion that they still serve documentary value. Unless you > have an assert_force_wake() handy. Ok, count me convinced. Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> And I guess this is for Jani + cc: stable. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] drm/i915: Prevent TLB error on first execution on SNB 2015-03-10 12:55 ` Daniel Vetter @ 2015-03-10 13:31 ` Jani Nikula 0 siblings, 0 replies; 11+ messages in thread From: Jani Nikula @ 2015-03-10 13:31 UTC (permalink / raw) To: Daniel Vetter, Chris Wilson On Tue, 10 Mar 2015, Daniel Vetter <daniel@ffwll.ch> wrote: > On Tue, Mar 10, 2015 at 10:35:41AM +0000, Chris Wilson wrote: >> On Tue, Mar 10, 2015 at 11:31:04AM +0100, Daniel Vetter wrote: >> > On Fri, Feb 13, 2015 at 02:35:59PM +0000, Chris Wilson wrote: >> > > Long ago I found that I was getting sporadic errors when booting SNB, >> > > with the symptom being that the first batch died with IPEHR != *ACTHD, >> > > typically caused by the TLB being invalid. These magically disappeared >> > > if I held the forcewake during the entire ring initialisation sequence. >> > > (It can probably be shortened to a short critical section, but the whole >> > > initialisation is full of register writes and so we would be taking and >> > > releasing forcewake almost continually, and so holding it over the >> > > entire sequence will probably be a net win!) >> > > >> > > Note some of the kernels I encounted the issue already had the deferred >> > > forcewake release, so it is still relevant. >> > > >> > > I know that there have been a few other reports with similar failure >> > > conditions on SNB, I think such as >> > > References: https://bugs.freedesktop.org/show_bug.cgi?id=80913 >> > > >> > > v2: Wrap i915_gem_init_hw() with its own security blanket as we take >> > > that path following resume and reset. >> > > >> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >> > > --- >> > > drivers/gpu/drm/i915/i915_gem.c | 18 ++++++++++++++++-- >> > > 1 file changed, 16 insertions(+), 2 deletions(-) >> > > >> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c >> > > index 8d15c8110962..08450922f373 100644 >> > > --- a/drivers/gpu/drm/i915/i915_gem.c >> > > +++ b/drivers/gpu/drm/i915/i915_gem.c >> > > @@ -4783,6 +4783,9 @@ i915_gem_init_hw(struct drm_device *dev) >> > > if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt()) >> > > return -EIO; >> > > >> > > + /* Double layer security blanket, see i915_gem_init() */ >> > > + intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL); >> > > + >> > > if (dev_priv->ellc_size) >> > > I915_WRITE(HSW_IDICR, I915_READ(HSW_IDICR) | IDIHASHMSK(0xf)); >> > > >> > > @@ -4815,7 +4818,7 @@ i915_gem_init_hw(struct drm_device *dev) >> > > for_each_ring(ring, dev_priv, i) { >> > > ret = ring->init_hw(ring); >> > > if (ret) >> > > - return ret; >> > > + goto out; >> > > } >> > > >> > > for (i = 0; i < NUM_L3_SLICES(dev); i++) >> > > @@ -4832,9 +4835,11 @@ i915_gem_init_hw(struct drm_device *dev) >> > > DRM_ERROR("Context enable failed %d\n", ret); >> > > i915_gem_cleanup_ringbuffer(dev); >> > > >> > > - return ret; >> > > + goto out; >> > > } >> > > >> > > +out: >> > > + intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL); >> > > return ret; >> > > } >> > > >> > > @@ -4868,6 +4873,14 @@ int i915_gem_init(struct drm_device *dev) >> > > dev_priv->gt.stop_ring = intel_logical_ring_stop; >> > > } >> > > >> > > + /* This is just a security blanket to placate dragons. >> > > + * On some systems, we very sporadically observe that the first TLBs >> > > + * used by the CS may be stale, despite us poking the TLB reset. If >> > > + * we hold the forcewake during initialisation these problems >> > > + * just magically go away. >> > > + */ >> > > + intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL); >> > >> > gem_init shouldn't ever touch the hw except through gem_init_hw. Do we >> > really need the double-layer here? >> >> There are register accesses before, so yes since that's how I tested >> it... >> >> > Also the forcewake hack in the ring >> > init code should now be redundant, too. >> >> I am of the opinion that they still serve documentary value. Unless you >> have an assert_force_wake() handy. > > Ok, count me convinced. > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> > > And I guess this is for Jani + cc: stable. Pushed to drm-intel-fixes, thanks for the patch and review. BR, Jani. > -Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch -- Jani Nikula, Intel Open Source Technology Center _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] drm/i915: Prevent TLB error on first execution on SNB 2015-02-13 12:59 [PATCH] drm/i915: Prevent TLB error on first execution on SNB Chris Wilson 2015-02-13 13:43 ` Daniel Vetter @ 2015-02-13 19:42 ` shuang.he 1 sibling, 0 replies; 11+ messages in thread From: shuang.he @ 2015-02-13 19:42 UTC (permalink / raw) To: shuang.he, ethan.gao, intel-gfx, chris Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com) Task id: 5770 -------------------------------------Summary------------------------------------- Platform Delta drm-intel-nightly Series Applied PNV 281/281 281/281 ILK 313/313 313/313 SNB 309/309 309/309 IVB 382/382 382/382 BYT 296/296 296/296 HSW -2 426/426 424/426 BDW -1 318/318 317/318 -------------------------------------Detailed------------------------------------- Platform Test drm-intel-nightly Series Applied *HSW igt_kms_flip_plain-flip-fb-recreate PASS(1) TIMEOUT(1) *HSW igt_kms_flip_plain-flip-fb-recreate-interruptible PASS(1) TIMEOUT(1) *BDW igt_gem_gtt_hog PASS(1) DMESG_WARN(1) Note: You need to pay more attention to line start with '*' _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2015-03-10 13:30 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-02-13 12:59 [PATCH] drm/i915: Prevent TLB error on first execution on SNB Chris Wilson 2015-02-13 13:43 ` Daniel Vetter 2015-02-13 14:12 ` Chris Wilson 2015-02-23 22:54 ` Daniel Vetter 2015-02-13 14:35 ` [PATCH v2] " Chris Wilson 2015-02-14 0:19 ` shuang.he 2015-03-10 10:31 ` Daniel Vetter 2015-03-10 10:35 ` Chris Wilson 2015-03-10 12:55 ` Daniel Vetter 2015-03-10 13:31 ` Jani Nikula 2015-02-13 19:42 ` [PATCH] " shuang.he
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox