From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH v2 2/2] drm/i915: Abandon the reset if we fail to stop the engines
Date: Fri, 27 Oct 2017 15:18:44 +0300
Message-ID: <87inf0g4wr.fsf@gaia.fi.intel.com>
In-Reply-To: <20171027104011.2341-2-chris@chris-wilson.co.uk>
Chris Wilson <chris@chris-wilson.co.uk> writes:
> Some machines, *cough* snb *cough*, fail catastrophically if asked to
> reset the GPU under certain conditions. The initial guess is that this
> is when the rings are still busy at the time of the reset request
> (because that's a pattern we've seen elsewhere, hence why we do try
> gen3_stop_engines() before reset) so abandon the reset and leave the
> device wedged, if gen3_stop_engines() fails.
>
> v2: Only give up if not idle after emptying the ring.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103240
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
> drivers/gpu/drm/i915/intel_uncore.c | 43 ++++++++++++++++++++++++++++---------
> 1 file changed, 33 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 96ee6b2754be..f80dbff3595f 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -1372,7 +1372,7 @@ int i915_reg_read_ioctl(struct drm_device *dev,
> return ret;
> }
>
> -static void gen3_stop_engine(struct intel_engine_cs *engine)
> +static bool gen3_stop_engine(struct intel_engine_cs *engine)
> {
> struct drm_i915_private *dev_priv = engine->i915;
> const u32 base = engine->mmio_base;
> @@ -1392,26 +1392,46 @@ static void gen3_stop_engine(struct intel_engine_cs *engine)
> I915_WRITE_FW(RING_HEAD(base), 0);
> I915_WRITE_FW(RING_TAIL(base), 0);
>
> + /* Check acts as a post */
> + if (intel_wait_for_register_fw(dev_priv,
> + mode,
> + MODE_IDLE,
> + MODE_IDLE,
> + 1000)) {
> + DRM_DEBUG_DRIVER("%s: timed out after clearing ring\n",
> + engine->name);
> + return false;
> + }
> +
I recall that this bailout was the reason I didn't want to
use stop_ring() from intel_ringbuffer.c.
Now, if you choose to reintroduce the bailout, I think you can
make a generic stop_engine() and get rid of the copy.
-Mika
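
To illustrate the suggestion above, here is a rough, standalone sketch of
what a single shared helper with a bailout might look like. It is not i915
code: struct mock_engine, mock_wait_for_register(), stop_engine_generic()
and stop_engines() are hypothetical stand-ins, the MODE_IDLE value is a
placeholder, and the registers are plain struct fields rather than MMIO.
It only mirrors the order of operations in the quoted hunk (empty the ring,
wait for idle, bail out on timeout, otherwise disable the ring).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder bit for this mock only. */
#define MODE_IDLE (1u << 9)

/* Hypothetical stand-in for one engine's ring registers (not real MMIO). */
struct mock_engine {
	const char *name;
	uint32_t head, tail, ctl, mode;
	bool stuck; /* simulates hardware that never reports idle */
};

/*
 * Mock of a bounded register poll: returns 0 once (mode & mask) == value,
 * nonzero if that never happens within the given number of tries.
 */
static int mock_wait_for_register(struct mock_engine *e, uint32_t mask,
				  uint32_t value, unsigned int tries)
{
	while (tries--) {
		if (!e->stuck)
			e->mode |= MODE_IDLE; /* healthy engine settles */
		if ((e->mode & mask) == value)
			return 0;
	}
	return -1;
}

/* One helper for every caller: reports whether the engine really stopped. */
static bool stop_engine_generic(struct mock_engine *e)
{
	assert(e != NULL);

	/* Empty the ring first. */
	e->head = 0;
	e->tail = 0;

	/* Bail out, leaving the ring enabled, if it does not go idle. */
	if (mock_wait_for_register(e, MODE_IDLE, MODE_IDLE, 1000)) {
		fprintf(stderr, "%s: timed out after clearing ring\n", e->name);
		return false;
	}

	/* The ring must be empty before it is disabled. */
	e->ctl = 0;
	return true;
}

/* Try to stop every engine; the result is true only if all of them idled. */
static bool stop_engines(struct mock_engine *engines, size_t count)
{
	bool idle = true;

	for (size_t i = 0; i < count; i++)
		idle &= stop_engine_generic(&engines[i]);

	return idle;
}
```

With a helper of this shape, the reset path could abandon the reset when
stop_engines() returns false, and the ringbuffer code could call the same
function instead of keeping its own copy.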
> /* The ring must be empty before it is disabled */
> I915_WRITE_FW(RING_CTL(base), 0);
> + POSTING_READ_FW(RING_CTL(base));
>
> - /* Check acts as a post */
> - if (I915_READ_FW(RING_HEAD(base)) != 0)
> - DRM_DEBUG_DRIVER("%s: ring head not parked\n",
> - engine->name);
> + return true;
> }
>
> -static void i915_stop_engines(struct drm_i915_private *dev_priv,
> - unsigned engine_mask)
> +static int i915_stop_engines(struct drm_i915_private *dev_priv,
> + unsigned engine_mask)
> {
> struct intel_engine_cs *engine;
> enum intel_engine_id id;
> + bool idle;
>
> if (INTEL_GEN(dev_priv) < 3)
> - return;
> + return true;
>
> + idle = true;
> for_each_engine_masked(engine, dev_priv, engine_mask, id)
> - gen3_stop_engine(engine);
> + idle &= gen3_stop_engine(engine);
> + if (idle)
> + return true;
> +
> + dev_err(dev_priv->drm.dev, "Failed to stop all engines\n");
> + for_each_engine_masked(engine, dev_priv, engine_mask, id) {
> + struct drm_printer p = drm_debug_printer(__func__);
> + intel_engine_dump(engine, &p);
> + }
> + return false;
> }
>
> static bool i915_reset_complete(struct pci_dev *pdev)
> @@ -1772,7 +1792,10 @@ int intel_gpu_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
> *
> * FIXME: Wa for more modern gens needs to be validated
> */
> - i915_stop_engines(dev_priv, engine_mask);
> + if (!i915_stop_engines(dev_priv, engine_mask)) {
> + ret = -EIO;
> + break;
> + }
>
> ret = -ENODEV;
> if (reset)
> --
> 2.15.0.rc2