public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
From: Michel Thierry <michel.thierry@intel.com>
To: intel-gfx@lists.freedesktop.org
Subject: [PATCH v9 15/21] drm/i915/guc: Add support for reset engine using GuC commands
Date: Thu, 15 Jun 2017 13:18:22 -0700	[thread overview]
Message-ID: <20170615201828.23144-16-michel.thierry@intel.com> (raw)
In-Reply-To: <20170615201828.23144-1-michel.thierry@intel.com>

This patch adds per engine reset and recovery (TDR) support when GuC is
used to submit workloads to GPU.

In the case of i915 directly submission to ELSP, driver manages hang
detection, recovery and resubmission. With GuC submission these tasks
are shared between driver and GuC. i915 is still responsible for detecting
a hang, and when it does it only requests GuC to reset that Engine. GuC
internally manages acquiring forcewake and idling the engine before actually
resetting it.

Once the reset is successful, i915 takes over again and handles resubmission.
The scheduler in i915 knows which requests are pending so after resetting
a engine, pending workloads/requests are resubmitted again.

v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
non-guc funtion names.

v3: Removed debug message about engine restarting from which request,
since the new baseline do it regardless of submission mode. (Chris)

Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c            | 24 +++++++++++----
 drivers/gpu/drm/i915/i915_drv.h            |  1 +
 drivers/gpu/drm/i915/i915_guc_submission.c | 48 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_guc_fwif.h      |  6 ++++
 drivers/gpu/drm/i915/intel_uc.h            |  1 +
 drivers/gpu/drm/i915/intel_uncore.c        |  5 ----
 6 files changed, 75 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 78df1477080a..32a580dc45de 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1940,11 +1940,21 @@ int i915_reset_engine(struct intel_engine_cs *engine)
 	 */
 	i915_gem_reset_engine(engine, active_request);
 
-	/* finally, reset engine */
-	ret = intel_gpu_reset(dev_priv, intel_engine_flag(engine));
-	if (ret) {
-		DRM_ERROR("Failed to reset %s, ret=%d\n", engine->name, ret);
-		goto out;
+	if (!dev_priv->guc.execbuf_client) {
+		/* finally, reset engine */
+		ret = intel_gpu_reset(dev_priv, intel_engine_flag(engine));
+		if (ret) {
+			DRM_ERROR("Failed to reset %s, ret=%d\n",
+				  engine->name, ret);
+			goto out;
+		}
+	} else {
+		ret = i915_guc_reset_engine(engine);
+		if (ret) {
+			DRM_ERROR("GuC failed to reset %s, ret=%d\n",
+				  engine->name, ret);
+			goto out;
+		}
 	}
 
 	i915_gem_reset_finish_engine(engine);
@@ -1958,6 +1968,10 @@ int i915_reset_engine(struct intel_engine_cs *engine)
 	if (ret)
 		goto out;
 
+	/* for guc too */
+	if (dev_priv->guc.execbuf_client)
+		i915_guc_submission_reenable_engine(engine);
+
 	error->reset_engine_count[engine->id]++;
 out:
 	return ret;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e657d56120d7..9869acacb548 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3105,6 +3105,7 @@ extern void i915_reset(struct drm_i915_private *dev_priv);
 extern int i915_reset_engine(struct intel_engine_cs *engine);
 extern bool intel_has_reset_engine(struct drm_i915_private *dev_priv);
 extern int intel_reset_guc(struct drm_i915_private *dev_priv);
+extern int i915_guc_reset_engine(struct intel_engine_cs *engine);
 extern void intel_engine_init_hangcheck(struct intel_engine_cs *engine);
 extern void intel_hangcheck_init(struct drm_i915_private *dev_priv);
 extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 2bb18d24ed42..782ef09b90f5 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -1362,6 +1362,25 @@ void i915_guc_submission_disable(struct drm_i915_private *dev_priv)
 	guc->execbuf_client = NULL;
 }
 
+void i915_guc_submission_reenable_engine(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+	struct intel_guc *guc = &dev_priv->guc;
+	struct i915_guc_client *client = guc->execbuf_client;
+	const int wqi_size = sizeof(struct guc_wq_item);
+	struct drm_i915_gem_request *rq;
+
+	GEM_BUG_ON(!client);
+	intel_guc_sample_forcewake(guc);
+
+	spin_lock_irq(&engine->timeline->lock);
+	list_for_each_entry(rq, &engine->timeline->requests, link) {
+		guc_client_update_wq_rsvd(client, wqi_size);
+		__i915_guc_submit(rq);
+	}
+	spin_unlock_irq(&engine->timeline->lock);
+}
+
 /**
  * intel_guc_suspend() - notify GuC entering suspend state
  * @dev_priv:	i915 device private
@@ -1413,3 +1432,32 @@ int intel_guc_resume(struct drm_i915_private *dev_priv)
 
 	return intel_guc_send(guc, data, ARRAY_SIZE(data));
 }
+
+int i915_guc_reset_engine(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+	struct intel_guc *guc = &dev_priv->guc;
+	struct i915_gem_context *ctx;
+	u32 data[7];
+
+	if (!i915.enable_guc_submission)
+		return 0;
+
+	ctx = dev_priv->kernel_context;
+
+	/*
+	 * The affected context report is populated by GuC and is provided
+	 * to the driver using the shared page. We request for it but don't
+	 * use it as scheduler has all of these details.
+	 */
+	data[0] = INTEL_GUC_ACTION_REQUEST_ENGINE_RESET;
+	data[1] = engine->guc_id;
+	data[2] = INTEL_GUC_RESET_OPTION_REPORT_AFFECTED_CONTEXTS;
+	data[3] = 0;
+	data[4] = 0;
+	data[5] = guc->execbuf_client->stage_id;
+	/* first page is shared data with GuC */
+	data[6] = guc_ggtt_offset(ctx->engine[RCS].state);
+
+	return intel_guc_send(guc, data, ARRAY_SIZE(data));
+}
diff --git a/drivers/gpu/drm/i915/intel_guc_fwif.h b/drivers/gpu/drm/i915/intel_guc_fwif.h
index 6cbf7c97d186..022704a835b3 100644
--- a/drivers/gpu/drm/i915/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/intel_guc_fwif.h
@@ -546,6 +546,7 @@ union guc_log_control {
 /* This Action will be programmed in C180 - SOFT_SCRATCH_O_REG */
 enum intel_guc_action {
 	INTEL_GUC_ACTION_DEFAULT = 0x0,
+	INTEL_GUC_ACTION_REQUEST_ENGINE_RESET = 0x3,
 	INTEL_GUC_ACTION_SAMPLE_FORCEWAKE = 0x6,
 	INTEL_GUC_ACTION_ALLOCATE_DOORBELL = 0x10,
 	INTEL_GUC_ACTION_DEALLOCATE_DOORBELL = 0x20,
@@ -561,6 +562,11 @@ enum intel_guc_action {
 	INTEL_GUC_ACTION_LIMIT
 };
 
+/* Reset engine options */
+enum action_engine_reset_options {
+	INTEL_GUC_RESET_OPTION_REPORT_AFFECTED_CONTEXTS = 0x10,
+};
+
 /*
  * The GuC sends its response to a command by overwriting the
  * command in SS0. The response is distinguishable from a command
diff --git a/drivers/gpu/drm/i915/intel_uc.h b/drivers/gpu/drm/i915/intel_uc.h
index 69daf4c01cd0..ff6d19b766e0 100644
--- a/drivers/gpu/drm/i915/intel_uc.h
+++ b/drivers/gpu/drm/i915/intel_uc.h
@@ -250,6 +250,7 @@ int i915_guc_wq_reserve(struct drm_i915_gem_request *rq);
 void i915_guc_wq_unreserve(struct drm_i915_gem_request *request);
 void i915_guc_submission_disable(struct drm_i915_private *dev_priv);
 void i915_guc_submission_fini(struct drm_i915_private *dev_priv);
+void i915_guc_submission_reenable_engine(struct intel_engine_cs *engine);
 struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size);
 
 /* intel_guc_log.c */
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 713a88cebf57..2c73a0c7024a 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1723,14 +1723,9 @@ bool intel_has_gpu_reset(struct drm_i915_private *dev_priv)
 	return intel_get_gpu_reset(dev_priv) != NULL;
 }
 
-/*
- * When GuC submission is enabled, GuC manages ELSP and can initiate the
- * engine reset too. For now, fall back to full GPU reset if it is enabled.
- */
 bool intel_has_reset_engine(struct drm_i915_private *dev_priv)
 {
 	return (dev_priv->info.has_reset_engine &&
-		!dev_priv->guc.execbuf_client &&
 		i915.reset >= 2);
 }
 
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  parent reply	other threads:[~2017-06-15 20:18 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-15 20:18 [PATCH v9 00/21] Gen8+ engine-reset Michel Thierry
2017-06-15 20:18 ` [PATCH v9 01/21] drm/i915: Look for active requests earlier in the reset path Michel Thierry
2017-06-15 20:18 ` [PATCH v9 02/21] drm/i915: Update i915.reset to handle engine resets Michel Thierry
2017-06-15 20:18 ` [PATCH v9 03/21] drm/i915: Modify error handler for per engine hang recovery Michel Thierry
2017-06-15 20:18 ` [PATCH v9 04/21] drm/i915: Include reset engine information in has_gpu_reset getparam Michel Thierry
2017-06-15 20:18 ` [PATCH v9 05/21] drm/i915: Add support for per engine reset recovery Michel Thierry
2017-06-19 12:31   ` Chris Wilson
2017-06-19 12:46   ` Chris Wilson
2017-06-19 18:42     ` Michel Thierry
2017-06-15 20:18 ` [PATCH v9 06/21] drm/i915: Add engine reset count to error state Michel Thierry
2017-06-15 20:18 ` [PATCH v9 07/21] drm/i915: Export per-engine reset count info to debugfs Michel Thierry
2017-06-15 20:18 ` [PATCH v9 RFC 08/21] drm/i915: Carry on with reset even if hw engine is not ready Michel Thierry
2017-06-15 20:18 ` [PATCH v9 09/21] drm/i915: Enable Engine reset and recovery support Michel Thierry
2017-06-15 20:18 ` [PATCH v9 10/21] drm/i915: Add engine reset count in get-reset-stats ioctl Michel Thierry
2017-06-15 21:14   ` Chris Wilson
2017-06-15 21:23     ` Michel Thierry
2017-06-15 20:18 ` [PATCH v9 11/21] drm/i915/selftests: reset engine self tests Michel Thierry
2017-06-15 20:18 ` [PATCH v9 12/21] drm/i915/guc: fix mmio whitelist mmio_start offset and add reminder Michel Thierry
2017-06-15 20:18 ` [PATCH v9 13/21] drm/i915/guc: Provide register list to be saved/restored during engine reset Michel Thierry
2017-06-15 20:18 ` [PATCH v9 14/21] drm/i915/guc: Rename the function that resets the GuC Michel Thierry
2017-06-15 20:18 ` Michel Thierry [this message]
2017-06-15 20:18 ` [PATCH v9 16/21] drm/i915: Watchdog timeout: Pass GuC shared data structure during param load Michel Thierry
2017-06-15 20:18 ` [PATCH v9 17/21] drm/i915: Watchdog timeout: IRQ handler for gen8+ Michel Thierry
2017-06-15 20:18 ` [PATCH v9 18/21] drm/i915: Watchdog timeout: Ringbuffer command emission " Michel Thierry
2017-06-15 20:18 ` [PATCH v9 19/21] drm/i915: Watchdog timeout: DRM kernel interface to set the timeout Michel Thierry
2017-06-15 20:18 ` [PATCH v9 20/21] drm/i915: Watchdog timeout: Include threshold value in error state Michel Thierry
2017-06-15 20:18 ` [PATCH v9 21/21] drm/i915: Watchdog timeout: Export media reset count from GuC to debugfs Michel Thierry
2017-06-15 20:42 ` ✓ Fi.CI.BAT: success for Gen8+ engine-reset (rev13) Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170615201828.23144-16-michel.thierry@intel.com \
    --to=michel.thierry@intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox