From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tomi Sarvela <tomi.p.sarvela@intel.com>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH v2 2/2] drm/i915/g4x: Improve gpu reset reliability
Date: Fri, 19 May 2017 14:22:57 +0300 [thread overview]
Message-ID: <87d1b5jdke.fsf@gaia.fi.intel.com> (raw)
In-Reply-To: <20170518220345.GA31803@nuc-i3427.alporthouse.com>
Chris Wilson <chris@chris-wilson.co.uk> writes:
> On Thu, May 18, 2017 at 05:28:41PM +0300, Mika Kuoppala wrote:
>> ELK seems to very picky about the preconditions to reset.
>> Evidence on Eaglelake (8086:2e12 (rev 03)) shows that it does
>> not like if reset occurs when there is active ring.
>>
>> Ville found out that there is workaround with name
>> 'WaMediaResetMainRingCleanup' which suggests that we need to
>> cleanup rings before resetting. It is unclear what cleanup
>> exactly means but evidence shows that stopping the ring
>> does have an effect on reset reliability. This patch makes
>> reset succesful on hangs induced by chained batches (the igt ones).
>> Note that if the hang is inside a shader, it is possible
>> that our attempts to stop the ring achieves anything.
>>
>> v2: zero ctl,head,tail also. bug ref. use driver debugs (Chris)
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100942
>> Testcase: igt/gem_busy/*-hang
>> Testcase: igt/gem_ringfill/hang-*
>
> Maybe add # elk to these to indicate the problem isn't quite that
> widespread!
>
>> Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Tomi Sarvela <tomi.p.sarvela@intel.com>
>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>> ---
>> drivers/gpu/drm/i915/intel_uncore.c | 35 +++++++++++++++++++++++++++++++++++
>> 1 file changed, 35 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
>> index 7eaaf2225e1a..43da84be0321 100644
>> --- a/drivers/gpu/drm/i915/intel_uncore.c
>> +++ b/drivers/gpu/drm/i915/intel_uncore.c
>> @@ -1427,6 +1427,35 @@ int i915_reg_read_ioctl(struct drm_device *dev,
>> return ret;
>> }
>>
>> +static void gen3_stop_rings(struct drm_i915_private *dev_priv)
>> +{
>> + struct intel_engine_cs *engine;
>> + enum intel_engine_id id;
>> +
>> + for_each_engine(engine, dev_priv, id) {
>> + const u32 base = engine->mmio_base;
>> + const i915_reg_t mode = RING_MI_MODE(base);
>> +
>> + I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
>> + if (intel_wait_for_register_fw(dev_priv,
>> + mode,
>> + MODE_IDLE,
>> + MODE_IDLE,
>> + 500))
>> + DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
>> + engine->name);
>> +
>> + I915_WRITE_FW(RING_CTL(base), 0);
>> + I915_WRITE_FW(RING_HEAD(base), 0);
>> + I915_WRITE_FW(RING_TAIL(base), 0);
>> +
>> + /* Check acts as a post */
>> + if (I915_READ_FW(RING_HEAD(base)) != 0)
>> + DRM_DEBUG_DRIVER("%s: ring head not parked\n",
>> + engine->name);
>> + }
>> +}
>> +
>> static bool i915_reset_complete(struct pci_dev *pdev)
>> {
>> u8 gdrst;
>> @@ -1472,6 +1501,12 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
>> I915_WRITE(VDECCLK_GATE_D, I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE);
>> POSTING_READ(VDECCLK_GATE_D);
>>
>> + /* We stop engines, otherwise we might get failed reset and a
>> + * dead gpu (on elk).
>> + */
>> + /* WaMediaResetMainRingCleanup:ctg,elk (supposedly) */
>
> Join this into a single comment block, s/supposedly/presumably/
>
> Just a small concern we have some duplication of stop_ring() here, but I
> don't have a better suggestion (along the lines of export intel_stop_ring,
> gen3_engine_stop_ring, so far looks more confusing than helpful). As
> you
I had a patch which piggypacked engine->reset_hw(engine, NULL) to do
the dirty work of stopping the ring. But the stop_ring of()
intel_ringbuffer.c was giving up halfway if it didn't find
idling the ring succesful, leaving head/tail intact.
And that was on the prepare reset path. The boon was that
it stopped the rings before killing the tasklet.
But I decided to do more surgical approach directy on top of
reset. If we find another gen which is suspectible, we might
want to either piggypack on reset_hw or do a engine->stop_ring()
and use it in prepare for reset path.
> have tested with DRM_ERROR to be sure that fear about this simply
> timing out for our spinning batches, it looks good to me.
>
Spinning batches in general seem to go idle nice, but gem_ringfill
will spew out that ring_stop timeout debug.
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Thanks for review. Patch pushed.
-Mika
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
prev parent reply other threads:[~2017-05-19 11:23 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-05-18 12:34 [PATCH 1/2] drm/i915: Reorder media/render reset on g4x Mika Kuoppala
2017-05-18 12:34 ` [PATCH 2/2] drm/i915/g4x: Fix unreliable gpu reset Mika Kuoppala
2017-05-18 12:40 ` Mika Kuoppala
2017-05-18 12:44 ` Chris Wilson
2017-05-18 12:51 ` Mika Kuoppala
2017-05-18 12:58 ` Chris Wilson
2017-05-18 14:28 ` [PATCH v2 2/2] drm/i915/g4x: Improve gpu reset reliability Mika Kuoppala
2017-05-18 22:03 ` Chris Wilson
2017-05-19 11:22 ` Mika Kuoppala [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87d1b5jdke.fsf@gaia.fi.intel.com \
--to=mika.kuoppala@linux.intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
--cc=tomi.p.sarvela@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.