From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
intel-gfx@lists.freedesktop.org, stable@vger.kernel.org
Subject: Re: [PATCH] drm/i915: Try harder to reset the gen8+ engines
Date: Tue, 06 Sep 2016 16:25:11 +0300 [thread overview]
Message-ID: <87h99t88js.fsf@gaia.fi.intel.com> (raw)
In-Reply-To: <20160906112533.GA16682@nuc-i3427.alporthouse.com>
Chris Wilson <chris@chris-wilson.co.uk> writes:
> On Tue, Sep 06, 2016 at 02:11:47PM +0300, Mika Kuoppala wrote:
>>
>> Resending my r-b...
>
> It's not enough, even retrying the reset a few times, we still
> eventually get a timeout.
>
> This is just the usual
> 10: 0xffffffff
> 20: goto 10
> hanging batch
>>
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>>
>> > If at first we don't succeed, try again.
>> >
>> > Running the reset and recovery routines in a loop ends in a "reset
>> > request timeout" with a mtbf of an hour on Braswell. This is eerily
>> > similar to the unrecoverable reset condition that first prompted us to
>> > use the reset-request mechanism in commit 7fd2d26921d1 ("drm/i915: Reset
>> > request handling for gen8+"). Repeating the reset request makes the
>> > failure much harder to reproduce (but there is no reason to believe that
>> > it is more than mere paper over a timing or other issue).
>> >
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>> > Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
>> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> > Cc: stable@vger.kernel.org
>> > ---
>> > drivers/gpu/drm/i915/intel_uncore.c | 24 +++++++++++++-----------
>> > 1 file changed, 13 insertions(+), 11 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
>> > index e9f68cd56e32..1be8ced03ba5 100644
>> > --- a/drivers/gpu/drm/i915/intel_uncore.c
>> > +++ b/drivers/gpu/drm/i915/intel_uncore.c
>> > @@ -1688,20 +1688,22 @@ int intel_wait_for_register(struct drm_i915_private *dev_priv,
>> > static int gen8_request_engine_reset(struct intel_engine_cs *engine)
>> > {
>> > struct drm_i915_private *dev_priv = engine->i915;
>> > - int ret;
>> > + int loop = 3;
>> >
>>
>> retries?
>>
>> > - I915_WRITE_FW(RING_RESET_CTL(engine->mmio_base),
>> > - _MASKED_BIT_ENABLE(RESET_CTL_REQUEST_RESET));
>> > + do {
>> > + I915_WRITE_FW(RING_RESET_CTL(engine->mmio_base),
>> > + _MASKED_BIT_ENABLE(RESET_CTL_REQUEST_RESET));
>> >
>> > - ret = intel_wait_for_register_fw(dev_priv,
>> > - RING_RESET_CTL(engine->mmio_base),
>> > - RESET_CTL_READY_TO_RESET,
>> > - RESET_CTL_READY_TO_RESET,
>> > - 700);
>> > - if (ret)
>> > - DRM_ERROR("%s: reset request timeout\n", engine->name);
>> > + if (!intel_wait_for_register_fw(dev_priv,
>> > + RING_RESET_CTL(engine->mmio_base),
>> > + RESET_CTL_READY_TO_RESET,
>> > + RESET_CTL_READY_TO_RESET,
>> > + 700))
>>
>>
>> Did you check between the write didn't stick vs the readyness didn't
>> signal?
>
> The read will flush the write. So reading back the bit we just set shows
> us that the hw is not yet ready.
>
> Shouldn't we also be waiting for the reest bit to clear on cancel? And
> verifying that the reset itself did?
>
Request -> timeout -> cancel and wait 2 lowest bit to be zero and
then only after that retry?
And would not hurt to check they are zero before continuing after
a succesful reset also.
-Mika
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 6b84bd63310c..36d9a485b788 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -1701,6 +1701,9 @@ static void gen8_unrequest_engine_reset(struct intel_engine_cs *engine)
>
> I915_WRITE_FW(RING_RESET_CTL(engine->mmio_base),
> _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));
> + intel_wait_for_register_fw(dev_priv,
> + RING_RESET_CTL(engine->mmio_base),
> + RESET_CTL_READY_TO_RESET, 0, 700);
> }
>
> static int gen8_reset_engines(struct drm_i915_private *dev_priv,
> @@ -1708,18 +1711,18 @@ static int gen8_reset_engines(struct drm_i915_private *dev_priv,
> {
> struct intel_engine_cs *engine;
> unsigned int tmp;
> + int ret = -EIO;
>
> for_each_engine_masked(engine, dev_priv, engine_mask, tmp)
> if (gen8_request_engine_reset(engine))
> goto not_ready;
>
> - return gen6_reset_engines(dev_priv, engine_mask);
> + ret = gen6_reset_engines(dev_priv, engine_mask);
>
> -not_ready:
> for_each_engine_masked(engine, dev_priv, engine_mask, tmp)
> gen8_unrequest_engine_reset(engine);
>
> - return -EIO;
> + return ret;
> }
>
> typedef int (*reset_func)(struct drm_i915_private *, unsigned engine_mask);
>
> --
> Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
prev parent reply other threads:[~2016-09-06 13:27 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-09-05 9:30 [PATCH] drm/i915: Try harder to reset the gen8+ engines Chris Wilson
2016-09-05 9:50 ` ✓ Fi.CI.BAT: success for " Patchwork
2016-09-06 11:11 ` [PATCH] " Mika Kuoppala
2016-09-06 11:25 ` Chris Wilson
2016-09-06 13:25 ` Mika Kuoppala [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87h99t88js.fsf@gaia.fi.intel.com \
--to=mika.kuoppala@linux.intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=daniel.vetter@ffwll.ch \
--cc=intel-gfx@lists.freedesktop.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox