* [PATCH 1/2] drm/i915: Reorder media/render reset on g4x @ 2017-05-18 12:34 Mika Kuoppala 2017-05-18 12:34 ` [PATCH 2/2] drm/i915/g4x: Fix unreliable gpu reset Mika Kuoppala 0 siblings, 1 reply; 9+ messages in thread From: Mika Kuoppala @ 2017-05-18 12:34 UTC (permalink / raw) To: intel-gfx From: Chris Wilson <chris@chris-wilson.co.uk> Ville found a reference to WaMediaResetBeforeFullReset which we presume means that we should simply do the media reset first. References: https://bugs.freedesktop.org/show_bug.cgi?id=100942 Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> --- drivers/gpu/drm/i915/intel_uncore.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index a9a6933afda2..7eaaf2225e1a 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -1468,12 +1468,6 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask) struct pci_dev *pdev = dev_priv->drm.pdev; int ret; - pci_write_config_byte(pdev, I915_GDRST, - GRDOM_RENDER | GRDOM_RESET_ENABLE); - ret = wait_for(g4x_reset_complete(pdev), 500); - if (ret) - goto out; - /* WaVcpClkGateDisableForMediaReset:ctg,elk */ I915_WRITE(VDECCLK_GATE_D, I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE); POSTING_READ(VDECCLK_GATE_D); @@ -1481,11 +1475,17 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask) pci_write_config_byte(pdev, I915_GDRST, GRDOM_MEDIA | GRDOM_RESET_ENABLE); ret = wait_for(g4x_reset_complete(pdev), 500); + if (ret) + goto out; /* WaVcpClkGateDisableForMediaReset:ctg,elk */ I915_WRITE(VDECCLK_GATE_D, I915_READ(VDECCLK_GATE_D) & ~VCP_UNIT_CLOCK_GATE_DISABLE); POSTING_READ(VDECCLK_GATE_D); + pci_write_config_byte(pdev, I915_GDRST, + GRDOM_RENDER | GRDOM_RESET_ENABLE); + ret = wait_for(g4x_reset_complete(pdev), 500); + out: pci_write_config_byte(pdev, I915_GDRST, 0); return ret; -- 2.11.0 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/2] drm/i915/g4x: Fix unreliable gpu reset 2017-05-18 12:34 [PATCH 1/2] drm/i915: Reorder media/render reset on g4x Mika Kuoppala @ 2017-05-18 12:34 ` Mika Kuoppala 2017-05-18 12:40 ` Mika Kuoppala ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Mika Kuoppala @ 2017-05-18 12:34 UTC (permalink / raw) To: intel-gfx; +Cc: Tomi Sarvela ELK seems to very picky about the preconditions to reset. Evidence on Eaglelake (8086:2e12 (rev 03)) shows that it does not like if reset occurs when there is active ring. Ville found out that there is workaround with name 'WaMediaResetMainRingCleanup' which suggests that we need to cleanup rings before resetting. It is unclear what cleanup exactly means but evidence shows that stopping the ring does have an effect on reset reliability. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100998 Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tomi Sarvela <tomi.p.sarvela@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> --- drivers/gpu/drm/i915/intel_uncore.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index 7eaaf2225e1a..1d473cd1f8a4 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -1427,6 +1427,26 @@ int i915_reg_read_ioctl(struct drm_device *dev, return ret; } +static void gen3_stop_rings(struct drm_i915_private *dev_priv) +{ + struct intel_engine_cs *engine; + enum intel_engine_id id; + + for_each_engine(engine, dev_priv, id) { + const i915_reg_t mode_r = RING_MI_MODE(engine->mmio_base); + + I915_WRITE_FW(mode_r, _MASKED_BIT_ENABLE(STOP_RING)); + if (intel_wait_for_register_fw(dev_priv, + mode_r, + MODE_IDLE, + MODE_IDLE, + 1000)) { + DRM_DEBUG("%s : timed out STOP_RING\n", + engine->name); + } + } +} + static bool i915_reset_complete(struct pci_dev *pdev) { u8 gdrst; @@ -1472,6 +1492,12 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask) I915_WRITE(VDECCLK_GATE_D, I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE); POSTING_READ(VDECCLK_GATE_D); + /* We stop engines, otherwise we might get failed reset and a + * dead gpu (on elk). + */ + /* WaMediaResetMainRingCleanup:ctg,elk (supposedly) */ + gen3_stop_rings(dev_priv); + pci_write_config_byte(pdev, I915_GDRST, GRDOM_MEDIA | GRDOM_RESET_ENABLE); ret = wait_for(g4x_reset_complete(pdev), 500); -- 2.11.0 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] drm/i915/g4x: Fix unreliable gpu reset 2017-05-18 12:34 ` [PATCH 2/2] drm/i915/g4x: Fix unreliable gpu reset Mika Kuoppala @ 2017-05-18 12:40 ` Mika Kuoppala 2017-05-18 12:44 ` Chris Wilson 2017-05-18 14:28 ` [PATCH v2 2/2] drm/i915/g4x: Improve gpu reset reliability Mika Kuoppala 2 siblings, 0 replies; 9+ messages in thread From: Mika Kuoppala @ 2017-05-18 12:40 UTC (permalink / raw) To: intel-gfx; +Cc: Tomi Sarvela Mika Kuoppala <mika.kuoppala@linux.intel.com> writes: > ELK seems to very picky about the preconditions to reset. > Evidence on Eaglelake (8086:2e12 (rev 03)) shows that it does > not like if reset occurs when there is active ring. > > Ville found out that there is workaround with name > 'WaMediaResetMainRingCleanup' which suggests that we need to > cleanup rings before resetting. It is unclear what cleanup > exactly means but evidence shows that stopping the ring > does have an effect on reset reliability. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100998 This is wrong. Should be Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100942 > Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tomi Sarvela <tomi.p.sarvela@intel.com> > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> > --- > drivers/gpu/drm/i915/intel_uncore.c | 26 ++++++++++++++++++++++++++ > 1 file changed, 26 insertions(+) > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c > index 7eaaf2225e1a..1d473cd1f8a4 100644 > --- a/drivers/gpu/drm/i915/intel_uncore.c > +++ b/drivers/gpu/drm/i915/intel_uncore.c > @@ -1427,6 +1427,26 @@ int i915_reg_read_ioctl(struct drm_device *dev, > return ret; > } > > +static void gen3_stop_rings(struct drm_i915_private *dev_priv) > +{ > + struct intel_engine_cs *engine; > + enum intel_engine_id id; > + > + for_each_engine(engine, dev_priv, id) { > + const i915_reg_t mode_r = RING_MI_MODE(engine->mmio_base); > + > + I915_WRITE_FW(mode_r, _MASKED_BIT_ENABLE(STOP_RING)); > + if (intel_wait_for_register_fw(dev_priv, > + mode_r, > + MODE_IDLE, > + MODE_IDLE, > + 1000)) { > + DRM_DEBUG("%s : timed out STOP_RING\n", > + engine->name); > + } > + } > +} > + > static bool i915_reset_complete(struct pci_dev *pdev) > { > u8 gdrst; > @@ -1472,6 +1492,12 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask) > I915_WRITE(VDECCLK_GATE_D, I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE); > POSTING_READ(VDECCLK_GATE_D); > > + /* We stop engines, otherwise we might get failed reset and a > + * dead gpu (on elk). > + */ > + /* WaMediaResetMainRingCleanup:ctg,elk (supposedly) */ > + gen3_stop_rings(dev_priv); > + > pci_write_config_byte(pdev, I915_GDRST, > GRDOM_MEDIA | GRDOM_RESET_ENABLE); > ret = wait_for(g4x_reset_complete(pdev), 500); > -- > 2.11.0 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] drm/i915/g4x: Fix unreliable gpu reset 2017-05-18 12:34 ` [PATCH 2/2] drm/i915/g4x: Fix unreliable gpu reset Mika Kuoppala 2017-05-18 12:40 ` Mika Kuoppala @ 2017-05-18 12:44 ` Chris Wilson 2017-05-18 12:51 ` Mika Kuoppala 2017-05-18 14:28 ` [PATCH v2 2/2] drm/i915/g4x: Improve gpu reset reliability Mika Kuoppala 2 siblings, 1 reply; 9+ messages in thread From: Chris Wilson @ 2017-05-18 12:44 UTC (permalink / raw) To: Mika Kuoppala; +Cc: Tomi Sarvela, intel-gfx On Thu, May 18, 2017 at 03:34:22PM +0300, Mika Kuoppala wrote: > ELK seems to very picky about the preconditions to reset. > Evidence on Eaglelake (8086:2e12 (rev 03)) shows that it does > not like if reset occurs when there is active ring. > > Ville found out that there is workaround with name > 'WaMediaResetMainRingCleanup' which suggests that we need to > cleanup rings before resetting. It is unclear what cleanup > exactly means but evidence shows that stopping the ring > does have an effect on reset reliability. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100998 > Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tomi Sarvela <tomi.p.sarvela@intel.com> > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> > --- > drivers/gpu/drm/i915/intel_uncore.c | 26 ++++++++++++++++++++++++++ > 1 file changed, 26 insertions(+) > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c > index 7eaaf2225e1a..1d473cd1f8a4 100644 > --- a/drivers/gpu/drm/i915/intel_uncore.c > +++ b/drivers/gpu/drm/i915/intel_uncore.c > @@ -1427,6 +1427,26 @@ int i915_reg_read_ioctl(struct drm_device *dev, > return ret; > } > > +static void gen3_stop_rings(struct drm_i915_private *dev_priv) > +{ > + struct intel_engine_cs *engine; > + enum intel_engine_id id; > + > + for_each_engine(engine, dev_priv, id) { > + const i915_reg_t mode_r = RING_MI_MODE(engine->mmio_base); > + > + I915_WRITE_FW(mode_r, _MASKED_BIT_ENABLE(STOP_RING)); This will only stop between instructions on the CS (i.e. within the ring) aiui . If we did hang in a shader, this is not going to achieve very much. > + if (intel_wait_for_register_fw(dev_priv, > + mode_r, > + MODE_IDLE, > + MODE_IDLE, > + 1000)) { > + DRM_DEBUG("%s : timed out STOP_RING\n", > + engine->name); If we make this a DRM_ERROR I expect it to fire through the hang tests in BAT. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] drm/i915/g4x: Fix unreliable gpu reset 2017-05-18 12:44 ` Chris Wilson @ 2017-05-18 12:51 ` Mika Kuoppala 2017-05-18 12:58 ` Chris Wilson 0 siblings, 1 reply; 9+ messages in thread From: Mika Kuoppala @ 2017-05-18 12:51 UTC (permalink / raw) To: Chris Wilson; +Cc: Tomi Sarvela, intel-gfx Chris Wilson <chris@chris-wilson.co.uk> writes: > On Thu, May 18, 2017 at 03:34:22PM +0300, Mika Kuoppala wrote: >> ELK seems to very picky about the preconditions to reset. >> Evidence on Eaglelake (8086:2e12 (rev 03)) shows that it does >> not like if reset occurs when there is active ring. >> >> Ville found out that there is workaround with name >> 'WaMediaResetMainRingCleanup' which suggests that we need to >> cleanup rings before resetting. It is unclear what cleanup >> exactly means but evidence shows that stopping the ring >> does have an effect on reset reliability. >> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100998 >> Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> >> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> >> Cc: Chris Wilson <chris@chris-wilson.co.uk> >> Cc: Tomi Sarvela <tomi.p.sarvela@intel.com> >> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> >> --- >> drivers/gpu/drm/i915/intel_uncore.c | 26 ++++++++++++++++++++++++++ >> 1 file changed, 26 insertions(+) >> >> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c >> index 7eaaf2225e1a..1d473cd1f8a4 100644 >> --- a/drivers/gpu/drm/i915/intel_uncore.c >> +++ b/drivers/gpu/drm/i915/intel_uncore.c >> @@ -1427,6 +1427,26 @@ int i915_reg_read_ioctl(struct drm_device *dev, >> return ret; >> } >> >> +static void gen3_stop_rings(struct drm_i915_private *dev_priv) >> +{ >> + struct intel_engine_cs *engine; >> + enum intel_engine_id id; >> + >> + for_each_engine(engine, dev_priv, id) { >> + const i915_reg_t mode_r = RING_MI_MODE(engine->mmio_base); >> + >> + I915_WRITE_FW(mode_r, _MASKED_BIT_ENABLE(STOP_RING)); > > This will only stop between instructions on the CS (i.e. within the > ring) aiui . If we did hang in a shader, this is not going to achieve > very much. The subject is too grandiose then. Should I add to the subject 'basic hangs'? This seems to help get our hang tests through, bringing more coverage. I pondered adding ctl, head and tail writes to zero. It would not help with shades, but should not hurt either. > >> + if (intel_wait_for_register_fw(dev_priv, >> + mode_r, >> + MODE_IDLE, >> + MODE_IDLE, >> + 1000)) { >> + DRM_DEBUG("%s : timed out STOP_RING\n", >> + engine->name); > > If we make this a DRM_ERROR I expect it to fire through the hang tests > in BAT. Why would stopping the ring be a problem with a clean chained batch? Well I try to see how it goes. -Mika > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] drm/i915/g4x: Fix unreliable gpu reset 2017-05-18 12:51 ` Mika Kuoppala @ 2017-05-18 12:58 ` Chris Wilson 0 siblings, 0 replies; 9+ messages in thread From: Chris Wilson @ 2017-05-18 12:58 UTC (permalink / raw) To: Mika Kuoppala; +Cc: Tomi Sarvela, intel-gfx On Thu, May 18, 2017 at 03:51:19PM +0300, Mika Kuoppala wrote: > Chris Wilson <chris@chris-wilson.co.uk> writes: > > > On Thu, May 18, 2017 at 03:34:22PM +0300, Mika Kuoppala wrote: > >> ELK seems to very picky about the preconditions to reset. > >> Evidence on Eaglelake (8086:2e12 (rev 03)) shows that it does > >> not like if reset occurs when there is active ring. > >> > >> Ville found out that there is workaround with name > >> 'WaMediaResetMainRingCleanup' which suggests that we need to > >> cleanup rings before resetting. It is unclear what cleanup > >> exactly means but evidence shows that stopping the ring > >> does have an effect on reset reliability. > >> > >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100998 > >> Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> > >> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> > >> Cc: Chris Wilson <chris@chris-wilson.co.uk> > >> Cc: Tomi Sarvela <tomi.p.sarvela@intel.com> > >> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> > >> --- > >> drivers/gpu/drm/i915/intel_uncore.c | 26 ++++++++++++++++++++++++++ > >> 1 file changed, 26 insertions(+) > >> > >> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c > >> index 7eaaf2225e1a..1d473cd1f8a4 100644 > >> --- a/drivers/gpu/drm/i915/intel_uncore.c > >> +++ b/drivers/gpu/drm/i915/intel_uncore.c > >> @@ -1427,6 +1427,26 @@ int i915_reg_read_ioctl(struct drm_device *dev, > >> return ret; > >> } > >> > >> +static void gen3_stop_rings(struct drm_i915_private *dev_priv) > >> +{ > >> + struct intel_engine_cs *engine; > >> + enum intel_engine_id id; > >> + > >> + for_each_engine(engine, dev_priv, id) { > >> + const i915_reg_t mode_r = RING_MI_MODE(engine->mmio_base); > >> + > >> + I915_WRITE_FW(mode_r, _MASKED_BIT_ENABLE(STOP_RING)); > > > > This will only stop between instructions on the CS (i.e. within the > > ring) aiui . If we did hang in a shader, this is not going to achieve > > very much. > > The subject is too grandiose then. Should I add to the subject 'basic > hangs'? > > This seems to help get our hang tests through, bringing more coverage. > I pondered adding ctl, head and tail writes to zero. It would not help > with shades, but should not hurt either. > > > > >> + if (intel_wait_for_register_fw(dev_priv, > >> + mode_r, > >> + MODE_IDLE, > >> + MODE_IDLE, > >> + 1000)) { > >> + DRM_DEBUG("%s : timed out STOP_RING\n", > >> + engine->name); > > > > If we make this a DRM_ERROR I expect it to fire through the hang tests > > in BAT. > > Why would stopping the ring be a problem with a clean chained batch? > Well I try to see how it goes. My understanding is that the STOP_RING only takes affect on the ring itself, our chained batches stay away from the ring. At least that's the problem I encountered when trying to use STOP_RING + SYNC_FLUSH for seqno flushing. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2 2/2] drm/i915/g4x: Improve gpu reset reliability 2017-05-18 12:34 ` [PATCH 2/2] drm/i915/g4x: Fix unreliable gpu reset Mika Kuoppala 2017-05-18 12:40 ` Mika Kuoppala 2017-05-18 12:44 ` Chris Wilson @ 2017-05-18 14:28 ` Mika Kuoppala 2017-05-18 22:03 ` Chris Wilson 2 siblings, 1 reply; 9+ messages in thread From: Mika Kuoppala @ 2017-05-18 14:28 UTC (permalink / raw) To: intel-gfx; +Cc: Tomi Sarvela ELK seems to very picky about the preconditions to reset. Evidence on Eaglelake (8086:2e12 (rev 03)) shows that it does not like if reset occurs when there is active ring. Ville found out that there is workaround with name 'WaMediaResetMainRingCleanup' which suggests that we need to cleanup rings before resetting. It is unclear what cleanup exactly means but evidence shows that stopping the ring does have an effect on reset reliability. This patch makes reset succesful on hangs induced by chained batches (the igt ones). Note that if the hang is inside a shader, it is possible that our attempts to stop the ring achieves anything. v2: zero ctl,head,tail also. bug ref. use driver debugs (Chris) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100942 Testcase: igt/gem_busy/*-hang Testcase: igt/gem_ringfill/hang-* Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tomi Sarvela <tomi.p.sarvela@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> --- drivers/gpu/drm/i915/intel_uncore.c | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index 7eaaf2225e1a..43da84be0321 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -1427,6 +1427,35 @@ int i915_reg_read_ioctl(struct drm_device *dev, return ret; } +static void gen3_stop_rings(struct drm_i915_private *dev_priv) +{ + struct intel_engine_cs *engine; + enum intel_engine_id id; + + for_each_engine(engine, dev_priv, id) { + const u32 base = engine->mmio_base; + const i915_reg_t mode = RING_MI_MODE(base); + + I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING)); + if (intel_wait_for_register_fw(dev_priv, + mode, + MODE_IDLE, + MODE_IDLE, + 500)) + DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n", + engine->name); + + I915_WRITE_FW(RING_CTL(base), 0); + I915_WRITE_FW(RING_HEAD(base), 0); + I915_WRITE_FW(RING_TAIL(base), 0); + + /* Check acts as a post */ + if (I915_READ_FW(RING_HEAD(base)) != 0) + DRM_DEBUG_DRIVER("%s: ring head not parked\n", + engine->name); + } +} + static bool i915_reset_complete(struct pci_dev *pdev) { u8 gdrst; @@ -1472,6 +1501,12 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask) I915_WRITE(VDECCLK_GATE_D, I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE); POSTING_READ(VDECCLK_GATE_D); + /* We stop engines, otherwise we might get failed reset and a + * dead gpu (on elk). + */ + /* WaMediaResetMainRingCleanup:ctg,elk (supposedly) */ + gen3_stop_rings(dev_priv); + pci_write_config_byte(pdev, I915_GDRST, GRDOM_MEDIA | GRDOM_RESET_ENABLE); ret = wait_for(g4x_reset_complete(pdev), 500); -- 2.11.0 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2 2/2] drm/i915/g4x: Improve gpu reset reliability 2017-05-18 14:28 ` [PATCH v2 2/2] drm/i915/g4x: Improve gpu reset reliability Mika Kuoppala @ 2017-05-18 22:03 ` Chris Wilson 2017-05-19 11:22 ` Mika Kuoppala 0 siblings, 1 reply; 9+ messages in thread From: Chris Wilson @ 2017-05-18 22:03 UTC (permalink / raw) To: Mika Kuoppala; +Cc: Tomi Sarvela, intel-gfx On Thu, May 18, 2017 at 05:28:41PM +0300, Mika Kuoppala wrote: > ELK seems to very picky about the preconditions to reset. > Evidence on Eaglelake (8086:2e12 (rev 03)) shows that it does > not like if reset occurs when there is active ring. > > Ville found out that there is workaround with name > 'WaMediaResetMainRingCleanup' which suggests that we need to > cleanup rings before resetting. It is unclear what cleanup > exactly means but evidence shows that stopping the ring > does have an effect on reset reliability. This patch makes > reset succesful on hangs induced by chained batches (the igt ones). > Note that if the hang is inside a shader, it is possible > that our attempts to stop the ring achieves anything. > > v2: zero ctl,head,tail also. bug ref. use driver debugs (Chris) > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100942 > Testcase: igt/gem_busy/*-hang > Testcase: igt/gem_ringfill/hang-* Maybe add # elk to these to indicate the problem isn't quite that widespread! > Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tomi Sarvela <tomi.p.sarvela@intel.com> > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> > --- > drivers/gpu/drm/i915/intel_uncore.c | 35 +++++++++++++++++++++++++++++++++++ > 1 file changed, 35 insertions(+) > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c > index 7eaaf2225e1a..43da84be0321 100644 > --- a/drivers/gpu/drm/i915/intel_uncore.c > +++ b/drivers/gpu/drm/i915/intel_uncore.c > @@ -1427,6 +1427,35 @@ int i915_reg_read_ioctl(struct drm_device *dev, > return ret; > } > > +static void gen3_stop_rings(struct drm_i915_private *dev_priv) > +{ > + struct intel_engine_cs *engine; > + enum intel_engine_id id; > + > + for_each_engine(engine, dev_priv, id) { > + const u32 base = engine->mmio_base; > + const i915_reg_t mode = RING_MI_MODE(base); > + > + I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING)); > + if (intel_wait_for_register_fw(dev_priv, > + mode, > + MODE_IDLE, > + MODE_IDLE, > + 500)) > + DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n", > + engine->name); > + > + I915_WRITE_FW(RING_CTL(base), 0); > + I915_WRITE_FW(RING_HEAD(base), 0); > + I915_WRITE_FW(RING_TAIL(base), 0); > + > + /* Check acts as a post */ > + if (I915_READ_FW(RING_HEAD(base)) != 0) > + DRM_DEBUG_DRIVER("%s: ring head not parked\n", > + engine->name); > + } > +} > + > static bool i915_reset_complete(struct pci_dev *pdev) > { > u8 gdrst; > @@ -1472,6 +1501,12 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask) > I915_WRITE(VDECCLK_GATE_D, I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE); > POSTING_READ(VDECCLK_GATE_D); > > + /* We stop engines, otherwise we might get failed reset and a > + * dead gpu (on elk). > + */ > + /* WaMediaResetMainRingCleanup:ctg,elk (supposedly) */ Join this into a single comment block, s/supposedly/presumably/ Just a small concern we have some duplication of stop_ring() here, but I don't have a better suggestion (along the lines of export intel_stop_ring, gen3_engine_stop_ring, so far looks more confusing than helpful). As you have tested with DRM_ERROR to be sure that fear about this simply timing out for our spinning batches, it looks good to me. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 2/2] drm/i915/g4x: Improve gpu reset reliability 2017-05-18 22:03 ` Chris Wilson @ 2017-05-19 11:22 ` Mika Kuoppala 0 siblings, 0 replies; 9+ messages in thread From: Mika Kuoppala @ 2017-05-19 11:22 UTC (permalink / raw) To: Chris Wilson; +Cc: Tomi Sarvela, intel-gfx Chris Wilson <chris@chris-wilson.co.uk> writes: > On Thu, May 18, 2017 at 05:28:41PM +0300, Mika Kuoppala wrote: >> ELK seems to very picky about the preconditions to reset. >> Evidence on Eaglelake (8086:2e12 (rev 03)) shows that it does >> not like if reset occurs when there is active ring. >> >> Ville found out that there is workaround with name >> 'WaMediaResetMainRingCleanup' which suggests that we need to >> cleanup rings before resetting. It is unclear what cleanup >> exactly means but evidence shows that stopping the ring >> does have an effect on reset reliability. This patch makes >> reset succesful on hangs induced by chained batches (the igt ones). >> Note that if the hang is inside a shader, it is possible >> that our attempts to stop the ring achieves anything. >> >> v2: zero ctl,head,tail also. bug ref. use driver debugs (Chris) >> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100942 >> Testcase: igt/gem_busy/*-hang >> Testcase: igt/gem_ringfill/hang-* > > Maybe add # elk to these to indicate the problem isn't quite that > widespread! > >> Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> >> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> >> Cc: Chris Wilson <chris@chris-wilson.co.uk> >> Cc: Tomi Sarvela <tomi.p.sarvela@intel.com> >> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> >> --- >> drivers/gpu/drm/i915/intel_uncore.c | 35 +++++++++++++++++++++++++++++++++++ >> 1 file changed, 35 insertions(+) >> >> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c >> index 7eaaf2225e1a..43da84be0321 100644 >> --- a/drivers/gpu/drm/i915/intel_uncore.c >> +++ b/drivers/gpu/drm/i915/intel_uncore.c >> @@ -1427,6 +1427,35 @@ int i915_reg_read_ioctl(struct drm_device *dev, >> return ret; >> } >> >> +static void gen3_stop_rings(struct drm_i915_private *dev_priv) >> +{ >> + struct intel_engine_cs *engine; >> + enum intel_engine_id id; >> + >> + for_each_engine(engine, dev_priv, id) { >> + const u32 base = engine->mmio_base; >> + const i915_reg_t mode = RING_MI_MODE(base); >> + >> + I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING)); >> + if (intel_wait_for_register_fw(dev_priv, >> + mode, >> + MODE_IDLE, >> + MODE_IDLE, >> + 500)) >> + DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n", >> + engine->name); >> + >> + I915_WRITE_FW(RING_CTL(base), 0); >> + I915_WRITE_FW(RING_HEAD(base), 0); >> + I915_WRITE_FW(RING_TAIL(base), 0); >> + >> + /* Check acts as a post */ >> + if (I915_READ_FW(RING_HEAD(base)) != 0) >> + DRM_DEBUG_DRIVER("%s: ring head not parked\n", >> + engine->name); >> + } >> +} >> + >> static bool i915_reset_complete(struct pci_dev *pdev) >> { >> u8 gdrst; >> @@ -1472,6 +1501,12 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask) >> I915_WRITE(VDECCLK_GATE_D, I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE); >> POSTING_READ(VDECCLK_GATE_D); >> >> + /* We stop engines, otherwise we might get failed reset and a >> + * dead gpu (on elk). >> + */ >> + /* WaMediaResetMainRingCleanup:ctg,elk (supposedly) */ > > Join this into a single comment block, s/supposedly/presumably/ > > Just a small concern we have some duplication of stop_ring() here, but I > don't have a better suggestion (along the lines of export intel_stop_ring, > gen3_engine_stop_ring, so far looks more confusing than helpful). As > you I had a patch which piggypacked engine->reset_hw(engine, NULL) to do the dirty work of stopping the ring. But the stop_ring of() intel_ringbuffer.c was giving up halfway if it didn't find idling the ring succesful, leaving head/tail intact. And that was on the prepare reset path. The boon was that it stopped the rings before killing the tasklet. But I decided to do more surgical approach directy on top of reset. If we find another gen which is suspectible, we might want to either piggypack on reset_hw or do a engine->stop_ring() and use it in prepare for reset path. > have tested with DRM_ERROR to be sure that fear about this simply > timing out for our spinning batches, it looks good to me. > Spinning batches in general seem to go idle nice, but gem_ringfill will spew out that ring_stop timeout debug. > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Thanks for review. Patch pushed. -Mika _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-05-19 11:23 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-05-18 12:34 [PATCH 1/2] drm/i915: Reorder media/render reset on g4x Mika Kuoppala 2017-05-18 12:34 ` [PATCH 2/2] drm/i915/g4x: Fix unreliable gpu reset Mika Kuoppala 2017-05-18 12:40 ` Mika Kuoppala 2017-05-18 12:44 ` Chris Wilson 2017-05-18 12:51 ` Mika Kuoppala 2017-05-18 12:58 ` Chris Wilson 2017-05-18 14:28 ` [PATCH v2 2/2] drm/i915/g4x: Improve gpu reset reliability Mika Kuoppala 2017-05-18 22:03 ` Chris Wilson 2017-05-19 11:22 ` Mika Kuoppala
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.