public inbox for intel-gfx@lists.freedesktop.org
From: Arun Siluvery <arun.siluvery@linux.intel.com>
To: intel-gfx@lists.freedesktop.org
Cc: Tomas Elf <tomas.elf@intel.com>
Subject: [PATCH v2 08/15] drm/i915/tdr: Add support for per engine reset recovery
Date: Fri, 17 Jun 2016 08:09:08 +0100
Message-ID: <1466147355-4635-9-git-send-email-arun.siluvery@linux.intel.com>
In-Reply-To: <1466147355-4635-1-git-send-email-arun.siluvery@linux.intel.com>

This change implements support for per-engine reset as an initial, less
intrusive hang recovery option to be attempted before falling back to the
legacy full GPU reset recovery mode if necessary. This is only supported
from Gen8 onwards.

The hangchecker determines which engines are hung and invokes the error
handler to recover them. The error handler schedules recovery for each
hung engine. The recovery procedure is as follows:
 - force engine to idle: this is done by issuing a reset request
 - save current state which includes head and current request
 - reset engine
 - restore saved state and resubmit context

If the engine reset fails, we fall back to the heavyweight full GPU reset,
which resets all engines and reinitializes the complete HW and SW state.

Possible reasons for failure:
 - the engine is not ready for reset
 - the HW and SW do not agree on the context that caused the hang
 - the reset itself failed for some reason

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
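Note for readers: the recovery sequence and the fallback described in the
commit message can be sketched as a small user-space model. This is only an
illustration under stated assumptions, not the driver code: struct
model_engine, model_reset_engine() and model_recover() are hypothetical names,
the boolean fields stand in for hardware behaviour, and the errno values are
arbitrary choices for the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an engine; not the i915 structure. */
struct model_engine {
	bool ready_for_reset;   /* would the reset request be accepted? */
	bool state_consistent;  /* do HW and SW agree on the hung context? */
	bool reset_ok;          /* would the engine reset itself succeed? */
	bool reset_requested;   /* models a pending reset request */
	int head;               /* models the ring head */
	int saved_head;         /* state captured before the reset */
};

enum model_recovery { RECOVERED_ENGINE, RECOVERED_FULL_GPU };

/* Per-engine recovery: idle, save, reset, restore. */
static int model_reset_engine(struct model_engine *e)
{
	int ret;

	assert(e != NULL);

	/* 1. Force the engine to idle by issuing a reset request. */
	if (!e->ready_for_reset)
		return -EBUSY;
	e->reset_requested = true;

	/* 2. Save the current state (only the head in this model). */
	if (!e->state_consistent) {
		ret = -EINVAL;
		goto unrequest;
	}
	e->saved_head = e->head;

	/* 3. Reset the engine; success also drops the pending request. */
	if (!e->reset_ok) {
		ret = -EIO;
		goto unrequest;
	}
	e->head = 0;
	e->reset_requested = false;

	/* 4. Restore the saved state so the context can be resubmitted. */
	e->head = e->saved_head;
	return 0;

unrequest:
	/* Back off from the reset request on the failure paths. */
	e->reset_requested = false;
	return ret;
}

/* Try the per-engine reset first, then fall back to a full reset. */
static enum model_recovery model_recover(struct model_engine *e)
{
	if (model_reset_engine(e) == 0)
		return RECOVERED_ENGINE;

	/* The full GPU reset reinitializes everything in this model. */
	e->head = 0;
	e->saved_head = 0;
	return RECOVERED_FULL_GPU;
}
```

In this model each of the three failure reasons listed above maps to one
early exit, and every exit taken after the request was issued clears it again
before the caller falls back to the full reset.
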
 drivers/gpu/drm/i915/i915_drv.c     | 54 +++++++++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_drv.h     |  4 +++
 drivers/gpu/drm/i915/i915_gem.c     |  4 +--
 drivers/gpu/drm/i915/intel_uncore.c | 33 +++++++++++++++++++++++
 4 files changed, 91 insertions(+), 4 deletions(-)
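
Note for readers: the comments this patch adds to intel_uncore.c describe
issuing a reset request through the reset control register and backing off by
clearing the request bit. The sketch below models that handshake in user
space. It rests on assumptions the patch does not show: the bit positions, the
"ready" acknowledgement bit and the bounded polling loop are all hypothetical,
as are the names model_reset_ctl, model_request_reset() and
model_unrequest_reset(). The bodies of gen8_request_engine_reset() and
gen8_unrequest_engine_reset() are not part of this patch and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout, chosen for illustration only. */
#define MODEL_REQUEST_RESET  (1u << 0)
#define MODEL_READY_TO_RESET (1u << 1)

/* Models a reset control register plus the simulated hardware behind it. */
struct model_reset_ctl {
	uint32_t value;        /* register contents */
	bool engine_can_idle;  /* will the hardware acknowledge the request? */
};

/* Set the request bit, then poll a bounded number of times for the ack. */
static int model_request_reset(struct model_reset_ctl *ctl, int max_polls)
{
	assert(ctl != NULL);

	ctl->value |= MODEL_REQUEST_RESET;

	for (int i = 0; i < max_polls; i++) {
		/* Simulated hardware: acknowledge once the engine is idle. */
		if (ctl->engine_can_idle)
			ctl->value |= MODEL_READY_TO_RESET;

		if (ctl->value & MODEL_READY_TO_RESET)
			return 0;
	}

	return -1; /* the engine never became ready for reset */
}

/* Back off from a previously issued request by clearing the bits. */
static void model_unrequest_reset(struct model_reset_ctl *ctl)
{
	assert(ctl != NULL);
	ctl->value &= ~(MODEL_REQUEST_RESET | MODEL_READY_TO_RESET);
}
```

A caller in this model would invoke model_unrequest_reset() whenever it gives
up after a successful or failed model_request_reset(), which mirrors the role
intel_clear_reset_request() plays on the failure paths of
i915_reset_engine().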

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index dfac8b3..cb76da1 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1027,10 +1027,60 @@ error:
  */
 int i915_reset_engine(struct intel_engine_cs *engine)
 {
+	struct drm_device *dev = engine->i915->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs_state state = {};
 	int ret;
 
-	/* FIXME: replace me with engine reset sequence */
-	ret = -ENODEV;
+	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
+
+	i915_gem_reset_engine_status(dev_priv, engine);
+
+	/* Take wake lock to prevent power saving mode */
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+	ret = intel_request_for_reset(engine);
+	if (ret) {
+		DRM_ERROR("Failed to disable %s\n", engine->name);
+		goto out;
+	}
+
+	ret = engine->save(engine, &state);
+	if (ret)
+		goto enable_engine;
+
+	ret = intel_gpu_reset(dev_priv, intel_engine_flag(engine));
+	if (ret) {
+		DRM_ERROR("Failed to reset %s, ret=%d\n", engine->name, ret);
+		goto enable_engine;
+	}
+
+	ret = engine->init_hw(engine);
+	if (ret)
+		goto out;
+
+	/*
+	 * Restart the engine after reset.
+	 * Engine state is first restored and the context is resubmitted.
+	 */
+	engine->start(engine, &state);
+
+enable_engine:
+	/*
+	 * The reset request only needs to be cleared if saving the engine
+	 * state or the reset itself failed. After a successful reset the
+	 * engine is re-enabled automatically, so this step is skipped.
+	 */
+	if (ret)
+		intel_clear_reset_request(engine);
+
+out:
+	if (state.req)
+		i915_gem_request_unreference(state.req);
+
+	/* Wake up anything waiting on this engine's queue */
+	wake_up_all(&engine->irq_queue);
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b35ca02..b2105bb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2910,6 +2910,8 @@ extern int intel_gpu_reset(struct drm_i915_private *dev_priv, u32 engine_mask);
 extern bool intel_has_gpu_reset(struct drm_i915_private *dev_priv);
 extern int i915_reset(struct drm_i915_private *dev_priv);
 extern bool intel_has_engine_reset(struct drm_i915_private *dev_priv);
+extern int intel_request_for_reset(struct intel_engine_cs *engine);
+extern int intel_clear_reset_request(struct intel_engine_cs *engine);
 extern int i915_reset_engine(struct intel_engine_cs *engine);
 extern int intel_guc_reset(struct drm_i915_private *dev_priv);
 extern void intel_engine_init_hangcheck(struct intel_engine_cs *engine);
@@ -3331,6 +3333,8 @@ static inline bool i915_stop_ring_allow_warn(struct drm_i915_private *dev_priv)
 }
 
 void i915_gem_reset(struct drm_device *dev);
+void i915_gem_reset_engine_status(struct drm_i915_private *dev_priv,
+				  struct intel_engine_cs *engine);
 bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
 int __must_check i915_gem_init(struct drm_device *dev);
 int i915_gem_init_engines(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 21d0dea..6160564 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3082,8 +3082,8 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
 	return NULL;
 }
 
-static void i915_gem_reset_engine_status(struct drm_i915_private *dev_priv,
-				       struct intel_engine_cs *engine)
+void i915_gem_reset_engine_status(struct drm_i915_private *dev_priv,
+				  struct intel_engine_cs *engine)
 {
 	struct drm_i915_gem_request *request;
 	bool ring_hung;
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index d973c2a..63acaeb 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1734,6 +1734,39 @@ int intel_guc_reset(struct drm_i915_private *dev_priv)
 	return ret;
 }
 
+/*
+ * On gen8+ a reset request has to be issued via the reset control register
+ * before a GPU engine can be reset in order to stop the command streamer
+ * and idle the engine. This replaces the legacy way of stopping an engine
+ * by writing to the stop ring bit in the MI_MODE register.
+ */
+int intel_request_for_reset(struct intel_engine_cs *engine)
+{
+	if (!intel_has_engine_reset(engine->i915)) {
+		DRM_ERROR("Engine Reset not supported on Gen%d\n",
+			  INTEL_INFO(engine->i915)->gen);
+		return -EINVAL;
+	}
+
+	return gen8_request_engine_reset(engine);
+}
+
+/*
+ * It is possible to back off from a previously issued reset request by simply
+ * clearing the reset request bit in the reset control register.
+ */
+int intel_clear_reset_request(struct intel_engine_cs *engine)
+{
+	if (!intel_has_engine_reset(engine->i915)) {
+		DRM_ERROR("Request to clear reset not supported on Gen%d\n",
+			  INTEL_INFO(engine->i915)->gen);
+		return -EINVAL;
+	}
+
+	gen8_unrequest_engine_reset(engine);
+	return 0;
+}
+
 bool intel_uncore_unclaimed_mmio(struct drm_i915_private *dev_priv)
 {
 	return check_for_unclaimed_mmio(dev_priv);
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
