public inbox for intel-gfx@lists.freedesktop.org
* [PATCH v2 00/15] Execlist based Engine reset and recovery
@ 2016-06-17  7:09 Arun Siluvery
  2016-06-17  7:09 ` [PATCH v2 01/15] drm/i915: Update i915.reset to handle engine resets Arun Siluvery
                   ` (15 more replies)
  0 siblings, 16 replies; 23+ messages in thread
From: Arun Siluvery @ 2016-06-17  7:09 UTC (permalink / raw)
  To: intel-gfx

This series is to add Engine reset support (aka TDR - Timeout detection and
recovery) to i915. This is supported from Gen8 onwards and this is the
implementation with Execlist submission.

The upstream effort was originally started by Tomas Elf; I am now continuing it.

These patches are based on current nightly. TDR makes some changes to the
wait_request path; there are patches on the list that improve that path but
they may take some time to get merged, so these patches will need to be
updated accordingly if those get merged first.

This is mainly a rebase of previous version [1].

[1] https://lists.freedesktop.org/archives/intel-gfx/2016-April/092349.html

Arun Siluvery (13):
  drm/i915: Update i915.reset to handle engine resets
  drm/i915/tdr: Extend the idea of reset_counter to engine reset
  drm/i915/tdr: Modify error handler for per engine hang recovery
  drm/i915/tdr: Prepare execlist submission to handle tdr resubmission
    after reset
  drm/i915/tdr: Capture engine state before reset
  drm/i915/tdr: Restore engine state and start after reset
  drm/i915/tdr: Add support for per engine reset recovery
  drm/i915: Extending i915_gem_check_wedge to check engine reset in
    progress
  drm/i915: Port of Added scheduler support to __wait_request() calls
  drm/i915/tdr: Add engine reset count to error state
  drm/i915/tdr: Export reset count info to debugfs
  drm/i915/tdr: Enable Engine reset and recovery support
  drm/i915: Disable GuC submission for testing Engine reset patches

Mika Kuoppala (1):
  drm/i915: Skip reset request if there is one already

Tomas Elf (1):
  drm/i915: Reinstate hang recovery work queue.

 drivers/gpu/drm/i915/i915_debugfs.c     |  33 +++++
 drivers/gpu/drm/i915/i915_dma.c         |   1 +
 drivers/gpu/drm/i915/i915_drv.c         |  73 ++++++++++
 drivers/gpu/drm/i915/i915_drv.h         |  34 ++++-
 drivers/gpu/drm/i915/i915_gem.c         |  79 ++++++++---
 drivers/gpu/drm/i915/i915_gpu_error.c   |   3 +
 drivers/gpu/drm/i915/i915_irq.c         | 231 +++++++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_params.c      |  10 +-
 drivers/gpu/drm/i915/i915_params.h      |   2 +-
 drivers/gpu/drm/i915/intel_display.c    |   5 +-
 drivers/gpu/drm/i915/intel_lrc.c        | 210 +++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_lrc.h        |   3 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |   8 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  19 +++
 drivers/gpu/drm/i915/intel_uncore.c     |  49 ++++++-
 15 files changed, 667 insertions(+), 93 deletions(-)

-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 01/15] drm/i915: Update i915.reset to handle engine resets
  2016-06-17  7:09 [PATCH v2 00/15] Execlist based Engine reset and recovery Arun Siluvery
@ 2016-06-17  7:09 ` Arun Siluvery
  2016-06-17  7:09 ` [PATCH v2 02/15] drm/i915/tdr: Extend the idea of reset_counter to engine reset Arun Siluvery
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 23+ messages in thread
From: Arun Siluvery @ 2016-06-17  7:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

In preparation for the engine reset work, update this parameter to handle
more than one type of reset. The default at the moment is still full gpu reset.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_params.c | 6 +++---
 drivers/gpu/drm/i915/i915_params.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 573e787..b012da0 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -45,7 +45,7 @@ struct i915_params i915 __read_mostly = {
 	.fastboot = 0,
 	.prefault_disable = 0,
 	.load_detect_test = 0,
-	.reset = true,
+	.reset = 1,
 	.invert_brightness = 0,
 	.disable_display = 0,
 	.enable_cmd_parser = 1,
@@ -110,8 +110,8 @@ MODULE_PARM_DESC(vbt_sdvo_panel_type,
 	"Override/Ignore selection of SDVO panel mode in the VBT "
 	"(-2=ignore, -1=auto [default], index in VBT BIOS table)");
 
-module_param_named_unsafe(reset, i915.reset, bool, 0600);
-MODULE_PARM_DESC(reset, "Attempt GPU resets (default: true)");
+module_param_named_unsafe(reset, i915.reset, int, 0600);
+MODULE_PARM_DESC(reset, "Attempt GPU resets (0=disabled, 1=full gpu reset [default], 2=engine reset)");
 
 module_param_named_unsafe(enable_hangcheck, i915.enable_hangcheck, bool, 0644);
 MODULE_PARM_DESC(enable_hangcheck,
diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
index 1323261..efe7523 100644
--- a/drivers/gpu/drm/i915/i915_params.h
+++ b/drivers/gpu/drm/i915/i915_params.h
@@ -34,6 +34,7 @@ struct i915_params {
 	int lvds_channel_mode;
 	int panel_use_ssc;
 	int vbt_sdvo_panel_type;
+	int reset;
 	int enable_rc6;
 	int enable_dc;
 	int enable_fbc;
@@ -57,7 +58,6 @@ struct i915_params {
 	bool fastboot;
 	bool prefault_disable;
 	bool load_detect_test;
-	bool reset;
 	bool disable_display;
 	bool verbose_state_checks;
 	bool nuclear_pageflip;
-- 
1.9.1

* [PATCH v2 02/15] drm/i915/tdr: Extend the idea of reset_counter to engine reset
  2016-06-17  7:09 [PATCH v2 00/15] Execlist based Engine reset and recovery Arun Siluvery
  2016-06-17  7:09 ` [PATCH v2 01/15] drm/i915: Update i915.reset to handle engine resets Arun Siluvery
@ 2016-06-17  7:09 ` Arun Siluvery
  2016-06-17  7:35   ` Chris Wilson
  2016-06-17  7:09 ` [PATCH v2 03/15] drm/i915: Reinstate hang recovery work queue Arun Siluvery
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 23+ messages in thread
From: Arun Siluvery @ 2016-06-17  7:09 UTC (permalink / raw)
  To: intel-gfx

This change extends the idea of the reset_counter variable to engine resets
by creating an additional counter for each engine. The least significant bit
is set to mark that an engine reset is pending; once the reset is successful
the counter is incremented again, which is also used to count the number of
engine resets.
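
The counter encoding described above can be sketched outside the driver as
follows. This is illustrative user-space code, not i915 itself; the helper
names are made up, and plain (non-atomic) ints stand in for atomic_t:

```c
#include <assert.h>

/*
 * Sketch of the per-engine counter encoding: bit 0 marks a pending
 * reset, and the number of resets is recovered as (counter + 1) / 2,
 * mirroring i915_engine_reset_count() in the patch below.
 */
static unsigned int counter;

static void mark_reset_pending(void)  { counter |= 1; }  /* set LSB */
static void mark_reset_done(void)     { counter += 1; }  /* odd -> even */
static int reset_in_progress(void)    { return counter & 1; }
static unsigned int reset_count(void) { return (counter + 1) / 2; }
```

Note that reset_count() already reports the new count while a reset is still
pending, which matches the (value + 1) / 2 arithmetic used in the diff.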

Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9fa9698..8bb05b2 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1401,6 +1401,12 @@ struct i915_gpu_error {
 #define I915_RESET_IN_PROGRESS_FLAG	1
 #define I915_WEDGED			(1 << 31)
 
+	/* indicates request to reset engine */
+#define I915_ENGINE_RESET_IN_PROGRESS	(1<<0)
+
+	/* extending the idea of reset_counter to engine reset */
+	atomic_t engine_reset_counter[I915_NUM_ENGINES];
+
 	/**
 	 * Waitqueue to signal when the reset has completed. Used by clients
 	 * that wait for dev_priv->mm.wedged to settle.
@@ -3296,6 +3302,19 @@ static inline u32 i915_reset_count(struct i915_gpu_error *error)
 	return ((i915_reset_counter(error) & ~I915_WEDGED) + 1) / 2;
 }
 
+static inline bool i915_engine_reset_in_progress(struct i915_gpu_error *error,
+						 u32 engine_id)
+{
+	return unlikely(atomic_read(&error->engine_reset_counter[engine_id])
+			& I915_ENGINE_RESET_IN_PROGRESS);
+}
+
+static inline u32 i915_engine_reset_count(struct i915_gpu_error *error,
+					  struct intel_engine_cs *engine)
+{
+	return (atomic_read(&error->engine_reset_counter[engine->id]) + 1) / 2;
+}
+
 static inline bool i915_stop_ring_allow_ban(struct drm_i915_private *dev_priv)
 {
 	return dev_priv->gpu_error.stop_rings == 0 ||
-- 
1.9.1

* [PATCH v2 03/15] drm/i915: Reinstate hang recovery work queue.
  2016-06-17  7:09 [PATCH v2 00/15] Execlist based Engine reset and recovery Arun Siluvery
  2016-06-17  7:09 ` [PATCH v2 01/15] drm/i915: Update i915.reset to handle engine resets Arun Siluvery
  2016-06-17  7:09 ` [PATCH v2 02/15] drm/i915/tdr: Extend the idea of reset_counter to engine reset Arun Siluvery
@ 2016-06-17  7:09 ` Arun Siluvery
  2016-06-17  7:09 ` [PATCH v2 04/15] drm/i915/tdr: Modify error handler for per engine hang recovery Arun Siluvery
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 23+ messages in thread
From: Arun Siluvery @ 2016-06-17  7:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala, Tomas Elf

From: Tomas Elf <tomas.elf@intel.com>

There used to be a work queue separating the error handler from the hang
recovery path, which was removed a while back in this commit:

commit b8d24a06568368076ebd5a858a011699a97bfa42
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Wed Jan 28 17:03:14 2015 +0200

    drm/i915: Remove nested work in gpu error handling

Now we need to revert most of that commit since the work queue separating
hang detection from hang recovery is needed in preparation for the upcoming
watchdog timeout feature. The watchdog interrupt service routine will be a
second callsite of the error handler alongside the periodic hang checker,
which runs in a work queue context. Seeing as the error handler will be
serving a caller in a hard interrupt execution context that means that the
error handler must never end up in a situation where it needs to grab the
struct_mutex.  Unfortunately, that is exactly what we need to do first at
the start of the hang recovery path, which might potentially sleep if the
struct_mutex is already held by another thread. Not good when you're in a
hard interrupt context.

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_dma.c |  1 +
 drivers/gpu/drm/i915/i915_drv.h |  1 +
 drivers/gpu/drm/i915/i915_irq.c | 26 +++++++++++++++++++++-----
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 24b670f..8a94fc5 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1552,6 +1552,7 @@ int i915_driver_unload(struct drm_device *dev)
 	/* Free error state after interrupts are fully disabled. */
 	cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
 	i915_destroy_error_state(dev);
+	cancel_work_sync(&dev_priv->gpu_error.work);
 
 	/* Flush any outstanding unpin_work. */
 	flush_workqueue(dev_priv->wq);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8bb05b2..9ad15c7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1372,6 +1372,7 @@ struct i915_gpu_error {
 	spinlock_t lock;
 	/* Protected by the above dev->gpu_error.lock. */
 	struct drm_i915_error_state *first_error;
+	struct work_struct work;
 
 	unsigned long missed_irq_rings;
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 4378a65..224ff2b 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2516,14 +2516,18 @@ static void i915_error_wake_up(struct drm_i915_private *dev_priv,
 }
 
 /**
- * i915_reset_and_wakeup - do process context error handling work
- * @dev_priv: i915 device private
+ * i915_error_work_func - do process context error handling work
+ * @work: work item containing error struct, passed by the error handler
  *
  * Fire an error uevent so userspace can see that a hang or error
  * was detected.
  */
-static void i915_reset_and_wakeup(struct drm_i915_private *dev_priv)
+static void i915_error_work_func(struct work_struct *work)
 {
+	struct i915_gpu_error *error = container_of(work, struct i915_gpu_error,
+	                                            work);
+	struct drm_i915_private *dev_priv =
+	        container_of(error, struct drm_i915_private, gpu_error);
 	struct kobject *kobj = &dev_priv->dev->primary->kdev->kobj;
 	char *error_event[] = { I915_ERROR_UEVENT "=1", NULL };
 	char *reset_event[] = { I915_RESET_UEVENT "=1", NULL };
@@ -2717,7 +2721,19 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
 		i915_error_wake_up(dev_priv, false);
 	}
 
-	i915_reset_and_wakeup(dev_priv);
+	/*
+	 * Our reset work can grab modeset locks (since it needs to reset the
+	 * state of outstanding pageflips). Hence it must not be run on our own
+	 * dev-priv->wq work queue for otherwise the flush_work in the pageflip
+	 * code will deadlock.
+	 * If error_work is already in the work queue then it will not be added
+	 * again. It hasn't yet executed so it will see the reset flags when
+	 * it is scheduled. If it isn't in the queue or it is currently
+	 * executing then this call will add it to the queue again so that
+	 * even if it misses the reset flags during the current call it is
+	 * guaranteed to see them on the next call.
+	 */
+	schedule_work(&dev_priv->gpu_error.work);
 }
 
 /* Called from drm generic code, passed 'crtc' which
@@ -4556,7 +4572,7 @@ void intel_irq_init(struct drm_i915_private *dev_priv)
 	struct drm_device *dev = dev_priv->dev;
 
 	intel_hpd_init_work(dev_priv);
-
+	INIT_WORK(&dev_priv->gpu_error.work, i915_error_work_func);
 	INIT_WORK(&dev_priv->rps.work, gen6_pm_rps_work);
 	INIT_WORK(&dev_priv->l3_parity.error_work, ivybridge_parity_work);
 
-- 
1.9.1

* [PATCH v2 04/15] drm/i915/tdr: Modify error handler for per engine hang recovery
  2016-06-17  7:09 [PATCH v2 00/15] Execlist based Engine reset and recovery Arun Siluvery
                   ` (2 preceding siblings ...)
  2016-06-17  7:09 ` [PATCH v2 03/15] drm/i915: Reinstate hang recovery work queue Arun Siluvery
@ 2016-06-17  7:09 ` Arun Siluvery
  2016-06-17  7:09 ` [PATCH v2 05/15] drm/i915/tdr: Prepare execlist submission to handle tdr resubmission after reset Arun Siluvery
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 23+ messages in thread
From: Arun Siluvery @ 2016-06-17  7:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ian Lister, Tomas Elf

This is a preparatory patch which modifies the error handler to do per
engine hang recovery. The actual patch which implements this sequence
follows later in the series. The aim is to prepare the existing recovery
function to attempt this new path where applicable (which fails at this
point because the core implementation is lacking) and then continue
recovery using legacy full gpu reset.

A helper function is also added to query whether engine reset is supported.
The engine reset and full gpu reset sections are also extracted into
separate functions, which helps readability and makes branching between
reset types simpler.

The error events used to notify userspace of a reset are adapted to engine
reset so that existing listeners are not broken. In legacy full gpu reset
we report an error event, a reset event before resetting the gpu, and a
reset done event marking the completion of the reset. The same behaviour is
kept for engine reset, except that the reset event is dispatched only once
even when multiple engines are hung. Finally, once the reset is complete,
we send the reset done event as usual.
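
The event ordering described above can be sketched as follows. The helper
names and event storage are invented for illustration and are not driver
code:

```c
#include <assert.h>
#include <string.h>

/*
 * Sketch of the uevent ordering: one error event, a single reset event
 * even when several engines are hung, then one reset done event after
 * recovery completes.
 */
static const char *events[8];
static int nevents;

static void uevent(const char *name)
{
	events[nevents++] = name;
}

static void handle_hang(int hung_engines)
{
	uevent("ERROR");       /* always reported first */
	uevent("RESET");       /* dispatched once, not per hung engine */
	while (hung_engines--)
		;              /* per-engine reset would run here */
	uevent("RESET_DONE");  /* marks completion of recovery */
}
```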

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ian Lister <ian.lister@intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c     |  23 ++++
 drivers/gpu/drm/i915/i915_drv.h     |   2 +
 drivers/gpu/drm/i915/i915_irq.c     | 205 +++++++++++++++++++++++++++---------
 drivers/gpu/drm/i915/intel_uncore.c |   5 +
 4 files changed, 188 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 3eb47fb..dfac8b3 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1012,6 +1012,29 @@ error:
 	return ret;
 }
 
+/**
+ * i915_reset_engine - reset GPU engine to recover from a hang
+ * @engine: engine to reset
+ *
+ * Reset a specific GPU engine. Useful if a hang is detected.
+ * Returns zero on successful reset or otherwise an error code.
+ *
+ * Procedure is fairly simple:
+ *  - force engine to idle
+ *  - save current state which includes head and current request
+ *  - reset engine
+ *  - restore saved state and resubmit context
+ */
+int i915_reset_engine(struct intel_engine_cs *engine)
+{
+	int ret;
+
+	/* FIXME: replace me with engine reset sequence */
+	ret = -ENODEV;
+
+	return ret;
+}
+
 static int i915_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
 	struct intel_device_info *intel_info =
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9ad15c7..b35ca02 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2909,6 +2909,8 @@ extern long i915_compat_ioctl(struct file *filp, unsigned int cmd,
 extern int intel_gpu_reset(struct drm_i915_private *dev_priv, u32 engine_mask);
 extern bool intel_has_gpu_reset(struct drm_i915_private *dev_priv);
 extern int i915_reset(struct drm_i915_private *dev_priv);
+extern bool intel_has_engine_reset(struct drm_i915_private *dev_priv);
+extern int i915_reset_engine(struct intel_engine_cs *engine);
 extern int intel_guc_reset(struct drm_i915_private *dev_priv);
 extern void intel_engine_init_hangcheck(struct intel_engine_cs *engine);
 extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 224ff2b..1990950 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2515,6 +2515,88 @@ static void i915_error_wake_up(struct drm_i915_private *dev_priv,
 		wake_up_all(&dev_priv->gpu_error.reset_queue);
 }
 
+static int i915_reset_engines(struct drm_i915_private *dev_priv)
+{
+	struct intel_engine_cs *engine;
+
+	for_each_engine(engine, dev_priv) {
+		int ret;
+		struct i915_gpu_error *error = &dev_priv->gpu_error;
+
+		if (!i915_engine_reset_in_progress(error, engine->id))
+			continue;
+
+		ret = i915_reset_engine(engine);
+		if (ret) {
+			struct intel_engine_cs *e;
+
+			DRM_ERROR("Reset of %s failed! ret=%d",
+				  engine->name, ret);
+
+			/*
+			 * when engine reset fails we switch to full gpu reset
+			 * which clears everything; In the case where multiple
+			 * engines are hung we would've already scheduled work
+			 * items and when they attempt to do engine reset they
+			 * won't find any active request (full gpu reset
+			 * would've cleared it). To make the work items exit
+			 * safely, clear engine reset pending mask.
+			 */
+			for_each_engine(e, dev_priv) {
+				if (i915_engine_reset_in_progress(error, e->id))
+					atomic_and(~I915_ENGINE_RESET_IN_PROGRESS,
+						   &error->engine_reset_counter[e->id]);
+			}
+
+			return ret;
+		}
+
+		atomic_inc(&error->engine_reset_counter[engine->id]);
+	}
+
+	return 0;
+}
+
+static int i915_reset_full(struct drm_i915_private *dev_priv)
+{
+	struct i915_gpu_error *error = &dev_priv->gpu_error;
+	int ret;
+
+	/* ensure device is awake */
+	assert_rpm_wakelock_held(dev_priv);
+
+	intel_prepare_reset(dev_priv);
+
+	/*
+	 * All state reset _must_ be completed before we update the
+	 * reset counter, for otherwise waiters might miss the reset
+	 * pending state and not properly drop locks, resulting in
+	 * deadlocks with the reset work.
+	 */
+	ret = i915_reset(dev_priv);
+
+	intel_finish_reset(dev_priv);
+
+	if (ret == 0) {
+		/*
+		 * After all the gem state is reset, increment the reset
+		 * counter and wake up everyone waiting for the reset to
+		 * complete.
+		 *
+		 * Since unlock operations are a one-sided barrier only,
+		 * we need to insert a barrier here to order any seqno
+		 * updates before
+		 * the counter increment.
+		 */
+		smp_mb__before_atomic();
+		atomic_inc(&error->reset_counter);
+	} else {
+		atomic_or(I915_WEDGED, &error->reset_counter);
+	}
+
+	return ret;
+}
+
 /**
  * i915_error_work_func - do process context error handling work
  * @work: work item containing error struct, passed by the error handler
@@ -2528,6 +2610,7 @@ static void i915_error_work_func(struct work_struct *work)
 	                                            work);
 	struct drm_i915_private *dev_priv =
 	        container_of(error, struct drm_i915_private, gpu_error);
+	struct drm_device *dev = dev_priv->dev;
 	struct kobject *kobj = &dev_priv->dev->primary->kdev->kobj;
 	char *error_event[] = { I915_ERROR_UEVENT "=1", NULL };
 	char *reset_event[] = { I915_RESET_UEVENT "=1", NULL };
@@ -2537,6 +2620,39 @@ static void i915_error_work_func(struct work_struct *work)
 	kobject_uevent_env(kobj, KOBJ_CHANGE, error_event);
 
 	/*
+	 * This event needs to be sent before performing gpu reset. When
+	 * engine resets are supported we iterate through all engines and
+	 * reset hung engines individually. To keep the event dispatch
+	 * mechanism consistent with full gpu reset, this is only sent once
+	 * even when multiple engines are hung. It is also safe to move this
+	 * here because when we are in this function, we will definitely
+	 * perform gpu reset.
+	 */
+	kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, reset_event);
+
+	/*
+	 * In most cases it's guaranteed that we get here with an RPM
+	 * reference held, for example because there is a pending GPU
+	 * request that won't finish until the reset is done. This
+	 * isn't the case at least when we get here by doing a
+	 * simulated reset via debugs, so get an RPM reference.
+	 */
+	intel_runtime_pm_get(dev_priv);
+
+	mutex_lock(&dev->struct_mutex);
+
+	if (!i915_reset_in_progress(error)) {
+		ret = i915_reset_engines(dev_priv);
+		if (ret) {
+			/* attempt full gpu reset to recover */
+			atomic_or(I915_RESET_IN_PROGRESS_FLAG,
+				  &dev_priv->gpu_error.reset_counter);
+		}
+	}
+
+	mutex_unlock(&dev->struct_mutex);
+
+	/*
 	 * Note that there's only one work item which does gpu resets, so we
 	 * need not worry about concurrent gpu resets potentially incrementing
 	 * error->reset_counter twice. We only need to take care of another
@@ -2548,41 +2664,21 @@ static void i915_error_work_func(struct work_struct *work)
 	 */
 	if (i915_reset_in_progress(&dev_priv->gpu_error)) {
 		DRM_DEBUG_DRIVER("resetting chip\n");
-		kobject_uevent_env(kobj, KOBJ_CHANGE, reset_event);
-
-		/*
-		 * In most cases it's guaranteed that we get here with an RPM
-		 * reference held, for example because there is a pending GPU
-		 * request that won't finish until the reset is done. This
-		 * isn't the case at least when we get here by doing a
-		 * simulated reset via debugs, so get an RPM reference.
-		 */
-		intel_runtime_pm_get(dev_priv);
-
-		intel_prepare_reset(dev_priv);
-
-		/*
-		 * All state reset _must_ be completed before we update the
-		 * reset counter, for otherwise waiters might miss the reset
-		 * pending state and not properly drop locks, resulting in
-		 * deadlocks with the reset work.
-		 */
-		ret = i915_reset(dev_priv);
-
-		intel_finish_reset(dev_priv);
 
-		intel_runtime_pm_put(dev_priv);
-
-		if (ret == 0)
-			kobject_uevent_env(kobj,
-					   KOBJ_CHANGE, reset_done_event);
+		ret = i915_reset_full(dev_priv);
+	}
 
-		/*
-		 * Note: The wake_up also serves as a memory barrier so that
-		 * waiters see the update value of the reset counter atomic_t.
-		 */
+	/*
+	 * Note: The wake_up also serves as a memory barrier so that
+	 * waiters see the update value of the reset counter atomic_t.
+	 */
+	if (!i915_terminally_wedged(error)) {
 		i915_error_wake_up(dev_priv, true);
+		kobject_uevent_env(&dev->primary->kdev->kobj,
+				   KOBJ_CHANGE, reset_done_event);
 	}
+
+	intel_runtime_pm_put(dev_priv);
 }
 
 static void i915_report_and_clear_eir(struct drm_i915_private *dev_priv)
@@ -2680,6 +2776,8 @@ static void i915_report_and_clear_eir(struct drm_i915_private *dev_priv)
  * i915_handle_error - handle a gpu error
  * @dev_priv: i915 device private
  * @engine_mask: mask representing engines that are hung
+ * @fmt: formatted hang msg that gets logged in captured error state
+ *
  * Do some basic checking of register state at error time and
  * dump it to the syslog.  Also call i915_capture_error_state() to make
  * sure we get a record and make it available in debugfs.  Fire a uevent
@@ -2702,26 +2800,39 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
 	i915_report_and_clear_eir(dev_priv);
 
 	if (engine_mask) {
-		atomic_or(I915_RESET_IN_PROGRESS_FLAG,
-				&dev_priv->gpu_error.reset_counter);
+		if (intel_has_engine_reset(dev_priv)) {
+			struct intel_engine_cs *engine;
+			struct i915_gpu_error *error = &dev_priv->gpu_error;
 
-		/*
-		 * Wakeup waiting processes so that the reset function
-		 * i915_reset_and_wakeup doesn't deadlock trying to grab
-		 * various locks. By bumping the reset counter first, the woken
-		 * processes will see a reset in progress and back off,
-		 * releasing their locks and then wait for the reset completion.
-		 * We must do this for _all_ gpu waiters that might hold locks
-		 * that the reset work needs to acquire.
-		 *
-		 * Note: The wake_up serves as the required memory barrier to
-		 * ensure that the waiters see the updated value of the reset
-		 * counter atomic_t.
-		 */
-		i915_error_wake_up(dev_priv, false);
+			for_each_engine_masked(engine, dev_priv, engine_mask) {
+				if (i915_engine_reset_in_progress(error, engine->id))
+					continue;
+
+				atomic_or(I915_ENGINE_RESET_IN_PROGRESS,
+					  &error->engine_reset_counter[engine->id]);
+			}
+		} else {
+			atomic_or(I915_RESET_IN_PROGRESS_FLAG,
+				  &dev_priv->gpu_error.reset_counter);
+		}
 	}
 
 	/*
+	 * Wakeup waiting processes so that the reset function
+	 * i915_reset_and_wakeup doesn't deadlock trying to grab
+	 * various locks. By bumping the reset counter first, the woken
+	 * processes will see a reset in progress and back off,
+	 * releasing their locks and then wait for the reset completion.
+	 * We must do this for _all_ gpu waiters that might hold locks
+	 * that the reset work needs to acquire.
+	 *
+	 * Note: The wake_up serves as the required memory barrier to
+	 * ensure that the waiters see the updated value of the reset
+	 * counter atomic_t.
+	 */
+	i915_error_wake_up(dev_priv, false);
+
+	/*
 	 * Our reset work can grab modeset locks (since it needs to reset the
 	 * state of outstanding pageflips). Hence it must not be run on our own
 	 * dev-priv->wq work queue for otherwise the flush_work in the pageflip
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index c1ca458..d973c2a 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1710,6 +1710,11 @@ bool intel_has_gpu_reset(struct drm_i915_private *dev_priv)
 	return intel_get_gpu_reset(dev_priv) != NULL;
 }
 
+bool intel_has_engine_reset(struct drm_i915_private *dev_priv)
+{
+	return (INTEL_INFO(dev_priv)->gen >= 8 && i915.reset == 2);
+}
+
 int intel_guc_reset(struct drm_i915_private *dev_priv)
 {
 	int ret;
-- 
1.9.1

* [PATCH v2 05/15] drm/i915/tdr: Prepare execlist submission to handle tdr resubmission after reset
  2016-06-17  7:09 [PATCH v2 00/15] Execlist based Engine reset and recovery Arun Siluvery
                   ` (3 preceding siblings ...)
  2016-06-17  7:09 ` [PATCH v2 04/15] drm/i915/tdr: Modify error handler for per engine hang recovery Arun Siluvery
@ 2016-06-17  7:09 ` Arun Siluvery
  2016-06-17  7:39   ` Chris Wilson
  2016-06-17  7:09 ` [PATCH v2 06/15] drm/i915/tdr: Capture engine state before reset Arun Siluvery
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 23+ messages in thread
From: Arun Siluvery @ 2016-06-17  7:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Tomas Elf

To resume execution after an engine reset we resubmit the context, and this
needs to be treated differently, otherwise we would count it as a
completely new submission. This change modifies the submission path to
account for this.

During resubmission we only submit the head request that caused the hang;
once this is done we continue with the normal submission of two contexts at
a time. The intention is to restore the submission state at the time of the
hang.
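
The dequeue policy described above can be sketched with hypothetical types
(the real execlists code additionally merges consecutive requests from the
same context, which this sketch omits):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch of the submission policy: normally up to two requests are
 * paired for the ELSP, but on TDR resubmission only the head request
 * (the one that hung) is submitted. Types and names are invented.
 */
struct fake_request { int id; };

/* Fills out[0..1]; returns how many requests were picked. */
static int pick_requests(struct fake_request *queue, int len,
			 int tdr_resubmission, struct fake_request *out[2])
{
	int n = 0;

	out[0] = out[1] = NULL;
	for (int i = 0; i < len && n < 2; i++) {
		out[n++] = &queue[i];
		if (tdr_resubmission)
			break;  /* only the head request after a reset */
	}
	return n;
}
```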

Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 36 +++++++++++++++++++++++++++---------
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4fad830..35255ce 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -340,7 +340,8 @@ uint64_t intel_lr_context_descriptor(struct i915_gem_context *ctx,
 }
 
 static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
-				 struct drm_i915_gem_request *rq1)
+				 struct drm_i915_gem_request *rq1,
+				 bool tdr_resubmission)
 {
 
 	struct intel_engine_cs *engine = rq0->engine;
@@ -349,13 +350,15 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
 
 	if (rq1) {
 		desc[1] = intel_lr_context_descriptor(rq1->ctx, rq1->engine);
-		rq1->elsp_submitted++;
+		if (!tdr_resubmission)
+			rq1->elsp_submitted++;
 	} else {
 		desc[1] = 0;
 	}
 
 	desc[0] = intel_lr_context_descriptor(rq0->ctx, rq0->engine);
-	rq0->elsp_submitted++;
+	if (!tdr_resubmission)
+		rq0->elsp_submitted++;
 
 	/* You must always write both descriptors in the order below. */
 	I915_WRITE_FW(RING_ELSP(engine), upper_32_bits(desc[1]));
@@ -396,7 +399,8 @@ static void execlists_update_context(struct drm_i915_gem_request *rq)
 }
 
 static void execlists_submit_requests(struct drm_i915_gem_request *rq0,
-				      struct drm_i915_gem_request *rq1)
+				      struct drm_i915_gem_request *rq1,
+				      bool tdr_resubmission)
 {
 	struct drm_i915_private *dev_priv = rq0->i915;
 	unsigned int fw_domains = rq0->engine->fw_domains;
@@ -409,13 +413,14 @@ static void execlists_submit_requests(struct drm_i915_gem_request *rq0,
 	spin_lock_irq(&dev_priv->uncore.lock);
 	intel_uncore_forcewake_get__locked(dev_priv, fw_domains);
 
-	execlists_elsp_write(rq0, rq1);
+	execlists_elsp_write(rq0, rq1, tdr_resubmission);
 
 	intel_uncore_forcewake_put__locked(dev_priv, fw_domains);
 	spin_unlock_irq(&dev_priv->uncore.lock);
 }
 
-static void execlists_context_unqueue(struct intel_engine_cs *engine)
+static void execlists_context_unqueue(struct intel_engine_cs *engine,
+				      bool tdr_resubmission)
 {
 	struct drm_i915_gem_request *req0 = NULL, *req1 = NULL;
 	struct drm_i915_gem_request *cursor, *tmp;
@@ -433,6 +438,19 @@ static void execlists_context_unqueue(struct intel_engine_cs *engine)
 				 execlist_link) {
 		if (!req0) {
 			req0 = cursor;
+
+			/*
+			 * Only submit head request if this is a resubmission
+			 * following engine reset. The intention is to restore
+			 * the original submission state from the situation
+			 * when the hang originally happened. Once the request
+			 * that caused the hang is resubmitted we can continue
+			 * normally by submitting two requests at a time.
+			 */
+			if (tdr_resubmission) {
+				req1 = NULL;
+				break;
+			}
 		} else if (req0->ctx == cursor->ctx) {
 			/* Same ctx: ignore first request, as second request
 			 * will update tail past first request's workload */
@@ -466,7 +484,7 @@ static void execlists_context_unqueue(struct intel_engine_cs *engine)
 		req0->tail &= ringbuf->size - 1;
 	}
 
-	execlists_submit_requests(req0, req1);
+	execlists_submit_requests(req0, req1, tdr_resubmission);
 }
 
 static unsigned int
@@ -578,7 +596,7 @@ static void intel_lrc_irq_handler(unsigned long data)
 	if (submit_contexts) {
 		if (!engine->disable_lite_restore_wa ||
 		    (csb[i][0] & GEN8_CTX_STATUS_ACTIVE_IDLE))
-			execlists_context_unqueue(engine);
+			execlists_context_unqueue(engine, false);
 	}
 
 	spin_unlock(&engine->execlist_lock);
@@ -618,7 +636,7 @@ static void execlists_context_queue(struct drm_i915_gem_request *request)
 	list_add_tail(&request->execlist_link, &engine->execlist_queue);
 	request->ctx_hw_id = request->ctx->hw_id;
 	if (num_elements == 0)
-		execlists_context_unqueue(engine);
+		execlists_context_unqueue(engine, false);
 
 	spin_unlock_bh(&engine->execlist_lock);
 }
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 06/15] drm/i915/tdr: Capture engine state before reset
@ 2016-06-17  7:09 ` Arun Siluvery
  15 siblings, 0 replies; 23+ messages in thread
From: Arun Siluvery @ 2016-06-17  7:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Tomas Elf

A minimal engine state is saved before resetting the engine; this state
includes the head register and the currently active request.

A consistency check is performed on the active request to make sure that
the context the HW is executing is the same as the one associated with the
active request. This check is important because engine recovery in
execlist mode relies on context resubmission after reset. If the state is
inconsistent, resubmission can cause unforeseen side effects such as
unexpected preemptions.

The engine is restarted after reset with this state.
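The consistency check boils down to two register fields agreeing with the driver's view. A minimal standalone C model of the check (a sketch only — the bit positions follow the patch below, while the register values and context ids used here are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 14/15 of EXECLIST_STATUS_LO flag an active execlist element;
 * the upper dword of the status register carries the context id. */
#define EXECLIST_STATUS_ELEMENT0_ACTIVE (1u << 14)
#define EXECLIST_STATUS_ELEMENT1_ACTIVE (1u << 15)

/* HW and driver agree when an execlist element is active and the HW
 * context id matches the context of the request at the head of the
 * execlist queue. */
static bool hw_sw_state_consistent(uint32_t status_lo, uint32_t hw_ctx_id,
				   uint32_t req_ctx_id)
{
	bool hw_active = status_lo & (EXECLIST_STATUS_ELEMENT0_ACTIVE |
				      EXECLIST_STATUS_ELEMENT1_ACTIVE);

	return hw_active && hw_ctx_id == req_ctx_id;
}
```

Only when this predicate holds does the driver take a reference on the head request and proceed with the reset.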

Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 80 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h        |  3 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h | 10 +++++
 3 files changed, 93 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 35255ce..b83552a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1022,6 +1022,82 @@ void intel_lr_context_unpin(struct i915_gem_context *ctx,
 	i915_gem_context_unreference(ctx);
 }
 
+/**
+ * intel_execlist_get_current_request() - return the request currently
+ * being processed by the given engine
+ *
+ * @engine: engine whose currently running request is to be returned
+ *
+ * Returns:
+ *  req - if a valid request is found in the execlist queue and the HW
+ *        agrees; the caller must release the acquired reference.
+ *  NULL - otherwise
+ */
+static struct drm_i915_gem_request *
+intel_execlist_get_current_request(struct intel_engine_cs *engine)
+{
+	struct drm_i915_gem_request *req;
+	unsigned long flags;
+
+	spin_lock_irqsave(&engine->execlist_lock, flags);
+
+	req = list_first_entry_or_null(&engine->execlist_queue,
+				       struct drm_i915_gem_request,
+				       execlist_link);
+	/*
+	 * Only acknowledge the request in the execlist queue if it's actually
+	 * been submitted to hardware, otherwise there's the risk of
+	 * inconsistency between the (unsubmitted) request and the idle
+	 * hardware state.
+	 */
+	if (req && req->ctx && req->elsp_submitted) {
+		u32 execlist_status;
+		u32 hw_context;
+		u32 hw_active;
+		struct drm_i915_private *dev_priv = engine->i915;
+
+		hw_context = I915_READ(RING_EXECLIST_STATUS_CTX_ID(engine));
+		execlist_status = I915_READ(RING_EXECLIST_STATUS_LO(engine));
+		hw_active = ((execlist_status & EXECLIST_STATUS_ELEMENT0_ACTIVE) ||
+			     (execlist_status & EXECLIST_STATUS_ELEMENT1_ACTIVE));
+
+		/* If both HW and driver agree then we found it */
+		if (hw_active && hw_context == req->ctx->hw_id)
+			i915_gem_request_reference(req);
+	} else {
+		req = NULL;
+		WARN(1, "No active request for %s\n", engine->name);
+	}
+
+	spin_unlock_irqrestore(&engine->execlist_lock, flags);
+
+	return req;
+}
+
+/**
+ * gen8_engine_state_save() - save minimum engine state
+ * @engine: engine whose state is to be saved
+ * @state: location where the state is saved
+ *
+ * The captured engine state includes the head, tail and active request.
+ * After reset, the engine is restarted with this state.
+ *
+ * Returns:
+ *	0 if ok, otherwise propagates error codes.
+ */
+static int gen8_engine_state_save(struct intel_engine_cs *engine,
+				  struct intel_engine_cs_state *state)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+
+	state->head = I915_READ_HEAD(engine);
+	state->req = intel_execlist_get_current_request(engine);
+	if (!state->req)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
 {
 	int ret, i;
@@ -1977,6 +2053,10 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->emit_bb_start = gen8_emit_bb_start;
 	engine->get_seqno = gen8_get_seqno;
 	engine->set_seqno = gen8_set_seqno;
+
+	/* engine reset supporting functions */
+	engine->save = gen8_engine_state_save;
+
 	if (IS_BXT_REVID(engine->i915, 0, BXT_REVID_A1)) {
 		engine->irq_seqno_barrier = bxt_a_seqno_barrier;
 		engine->set_seqno = bxt_a_set_seqno;
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index a8db42a..0ef6fb5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -31,7 +31,10 @@
 /* Execlists regs */
 #define RING_ELSP(ring)				_MMIO((ring)->mmio_base + 0x230)
 #define RING_EXECLIST_STATUS_LO(ring)		_MMIO((ring)->mmio_base + 0x234)
+#define   EXECLIST_STATUS_ELEMENT0_ACTIVE       (1 << 14)
+#define   EXECLIST_STATUS_ELEMENT1_ACTIVE       (1 << 15)
 #define RING_EXECLIST_STATUS_HI(ring)		_MMIO((ring)->mmio_base + 0x234 + 4)
+#define RING_EXECLIST_STATUS_CTX_ID(ring)	RING_EXECLIST_STATUS_HI(ring)
 #define RING_CONTEXT_CONTROL(ring)		_MMIO((ring)->mmio_base + 0x244)
 #define	  CTX_CTRL_INHIBIT_SYN_CTX_SWITCH	(1 << 3)
 #define	  CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT	(1 << 0)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index b33c876..daf2727 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -141,6 +141,12 @@ struct  i915_ctx_workarounds {
 	struct drm_i915_gem_object *obj;
 };
 
+struct intel_engine_cs_state {
+	u32 head;
+	u32 tail;
+	struct drm_i915_gem_request *req;
+};
+
 struct intel_engine_cs {
 	struct drm_i915_private *i915;
 	const char	*name;
@@ -204,6 +210,10 @@ struct intel_engine_cs {
 #define I915_DISPATCH_RS     0x4
 	void		(*cleanup)(struct intel_engine_cs *ring);
 
+	/* engine reset supporting functions */
+	int (*save)(struct intel_engine_cs *engine,
+		    struct intel_engine_cs_state *state);
+
 	/* GEN8 signal/wait table - never trust comments!
 	 *	  signal to	signal to    signal to   signal to      signal to
 	 *	    RCS		   VCS          BCS        VECS		 VCS2
-- 
1.9.1


* [PATCH v2 07/15] drm/i915/tdr: Restore engine state and start after reset
@ 2016-06-17  7:09 ` Arun Siluvery
  2016-06-17  7:32   ` Chris Wilson
  15 siblings, 1 reply; 23+ messages in thread
From: Arun Siluvery @ 2016-06-17  7:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Tomas Elf

We capture the state of an engine before resetting it; once the reset is
successful the engine is restored with the same state and restarted.

The state includes the head register and the active request. We also nudge
the head forward if it hasn't advanced, otherwise when the engine is
restarted the HW executes the same instruction and may hang again.
Generally the head automatically advances to the next instruction as soon
as the HW reads the current instruction, without waiting for it to
complete; however, an MBOX wait inserted directly into the VCS/BCS engines
does not behave this way, and the head will keep pointing at the same
instruction until it completes.
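The head adjustment is plain arithmetic and can be sketched in isolation (illustrative model only, not driver code; `QWORD_ALIGN` stands in for the kernel's `roundup()`/`ALIGN()` on an 8-byte boundary, and the addresses used are made up):

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of 8 (same result as the kernel's
 * roundup(x, 8) / ALIGN(x, 8) for the values used here). */
#define QWORD_ALIGN(x) (((x) + 7u) & ~7u)

/* Sketch of the head fix-up before the engine restart: align a stuck or
 * misaligned head to a QWORD boundary, then keep it within the ring. */
static uint32_t fixup_head(uint32_t head_addr, uint32_t tail_addr,
			   uint32_t ring_size, int stuck)
{
	if (stuck || (head_addr & 0x7))
		head_addr = QWORD_ALIGN(head_addr);

	if (head_addr > tail_addr)
		head_addr = tail_addr;	/* never run past the tail */
	else if (head_addr >= ring_size)
		head_addr = 0;		/* wrap around */

	return head_addr;
}
```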

If the head is modified, this is also updated in the context image so that
the HW sees the up-to-date value.

A valid request is expected in the saved state at this point, otherwise we
would not have got this far; the context that submitted this request is
resubmitted to the HW. The request that caused the hang will be at the
head of the execlist queue, and unless we resubmit and complete that
request it cannot be removed from the queue.

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 94 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  9 ++++
 2 files changed, 103 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b83552a..c9aa2ca 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -487,6 +487,30 @@ static void execlists_context_unqueue(struct intel_engine_cs *engine,
 	execlists_submit_requests(req0, req1, tdr_resubmission);
 }
 
+/**
+ * intel_execlists_resubmit() - resubmit pending context after engine reset
+ * @engine: engine to do resubmission for
+ *
+ * In execlists mode, engine reset postprocessing mainly consists of
+ * resubmitting the context after reset; for this we bypass the execlist
+ * queue. This is necessary because at the point of TDR hang recovery the
+ * hardware is hung, and resubmitting a fixed-up context (the one the TDR
+ * identified as hung and fixed up to move past the blocking batch buffer)
+ * through a hung execlist queue would lock up the TDR. Instead, opt for
+ * direct ELSP submission without depending on the rest of the driver.
+ */
+static void intel_execlists_resubmit(struct intel_engine_cs *engine)
+{
+	unsigned long flags;
+
+	if (WARN_ON(list_empty(&engine->execlist_queue)))
+		return;
+
+	spin_lock_irqsave(&engine->execlist_lock, flags);
+	execlists_context_unqueue(engine, true);
+	spin_unlock_irqrestore(&engine->execlist_lock, flags);
+}
+
 static unsigned int
 execlists_check_remove_request(struct intel_engine_cs *engine, u32 ctx_id)
 {
@@ -1098,6 +1122,75 @@ static int gen8_engine_state_save(struct intel_engine_cs *engine,
 	return 0;
 }
 
+/**
+ * gen8_engine_start() - restore saved state and start engine
+ * @engine: engine to be started
+ * @state: state to be restored
+ *
+ * Returns:
+ *	0 if ok, otherwise propagates error codes.
+ */
+static int gen8_engine_start(struct intel_engine_cs *engine,
+			     struct intel_engine_cs_state *state)
+{
+	u32 head;
+	u32 head_addr, tail_addr;
+	u32 *reg_state;
+	struct intel_ringbuffer *ringbuf;
+	struct i915_gem_context *ctx;
+	struct drm_i915_private *dev_priv = engine->i915;
+
+	ctx = state->req->ctx;
+	ringbuf = ctx->engine[engine->id].ringbuf;
+	reg_state = ctx->engine[engine->id].lrc_reg_state;
+
+	head = state->head;
+	head_addr = head & HEAD_ADDR;
+
+	if (head == engine->hangcheck.last_head) {
+		/*
+		 * The engine has not advanced since the last time it hung,
+		 * force it to advance to the next QWORD. In most cases the
+		 * engine head pointer will automatically advance to the
+		 * next instruction as soon as it has read the current
+		 * instruction, without waiting for it to complete. This
+		 * seems to be the default behaviour, however an MBOX wait
+		 * inserted directly to the VCS/BCS engines does not behave
+		 * in the same way, instead the head pointer will still be
+		 * pointing at the MBOX instruction until it completes.
+		 */
+		head_addr = roundup(head_addr, 8);
+		engine->hangcheck.last_head = head;
+	} else if (head_addr & 0x7) {
+		/* Ensure head pointer is pointing to a QWORD boundary */
+		head_addr = ALIGN(head_addr, 8);
+	}
+
+	tail_addr = reg_state[CTX_RING_TAIL+1] & TAIL_ADDR;
+
+	if (head_addr > tail_addr)
+		head_addr = tail_addr;
+	else if (head_addr >= ringbuf->size)
+		head_addr = 0;
+
+	head &= ~HEAD_ADDR;
+	head |= (head_addr & HEAD_ADDR);
+
+	/* Restore head */
+	reg_state[CTX_RING_HEAD+1] = head;
+	I915_WRITE_HEAD(engine, head);
+
+	/* set head */
+	ringbuf->head = head;
+	ringbuf->last_retired_head = -1;
+	intel_ring_update_space(ringbuf);
+
+	if (state->req)
+		intel_execlists_resubmit(engine);
+
+	return 0;
+}
+
 static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
 {
 	int ret, i;
@@ -2056,6 +2149,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 
 	/* engine reset supporting functions */
 	engine->save = gen8_engine_state_save;
+	engine->start = gen8_engine_start;
 
 	if (IS_BXT_REVID(engine->i915, 0, BXT_REVID_A1)) {
 		engine->irq_seqno_barrier = bxt_a_seqno_barrier;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index daf2727..55cb0b5 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -92,6 +92,13 @@ struct intel_ring_hangcheck {
 	enum intel_ring_hangcheck_action action;
 	int deadlock;
 	u32 instdone[I915_NUM_INSTDONE_REG];
+
+	/*
+	 * Last recorded ring head index.
+	 * This is only ever a ring index, whereas the active
+	 * head may be a graphics address in a ring buffer.
+	 */
+	u32 last_head;
 };
 
 struct intel_ringbuffer {
@@ -213,6 +220,8 @@ struct intel_engine_cs {
 	/* engine reset supporting functions */
 	int (*save)(struct intel_engine_cs *engine,
 		    struct intel_engine_cs_state *state);
+	int (*start)(struct intel_engine_cs *engine,
+		     struct intel_engine_cs_state *state);
 
 	/* GEN8 signal/wait table - never trust comments!
 	 *	  signal to	signal to    signal to   signal to      signal to
-- 
1.9.1


* [PATCH v2 08/15] drm/i915/tdr: Add support for per engine reset recovery
@ 2016-06-17  7:09 ` Arun Siluvery
  15 siblings, 0 replies; 23+ messages in thread
From: Arun Siluvery @ 2016-06-17  7:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Tomas Elf

This change implements support for per-engine reset as an initial, less
intrusive hang recovery option to be attempted before falling back to the
legacy full GPU reset recovery mode if necessary. This is only supported
from Gen8 onwards.

The hang checker determines which engines are hung and invokes the error
handler to recover from the hang. The error handler schedules recovery for
each hung engine. The recovery procedure is as follows:
 - force the engine to idle: this is done by issuing a reset request
 - save the current state, which includes the head and the current request
 - reset the engine
 - restore the saved state and resubmit the context

If the engine reset fails we fall back to the heavyweight full GPU reset,
which resets all engines and reinitializes the complete HW and SW state.

Possible reasons for failure:
 - the engine is not ready for reset
 - the HW and SW do not agree on the context that caused the hang
 - the reset itself failed for some reason
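The fallback policy can be condensed into a toy control-flow model. This is a sketch only: the `engine_ops` callbacks are hypothetical stand-ins for the driver's hooks (reset request, state save, reset, restart), not real i915 functions, each returning 0 on success or a negative errno:

```c
#include <assert.h>

enum reset_result { RESET_ENGINE, RESET_FULL_GPU };

/* Stand-ins for the per-engine recovery steps; each returns 0 on
 * success, negative errno on failure, like the driver hooks would. */
struct engine_ops {
	int (*request_reset)(void);	/* force engine to idle */
	int (*save)(void);		/* capture head + active request */
	int (*reset)(void);
	int (*restart)(void);		/* restore state, resubmit ctx */
};

/* If any step fails, fall back to the heavyweight full GPU reset. */
static enum reset_result recover_engine(const struct engine_ops *ops)
{
	if (ops->request_reset() || ops->save() ||
	    ops->reset() || ops->restart())
		return RESET_FULL_GPU;

	return RESET_ENGINE;
}

static int op_ok(void)   { return 0; }
static int op_fail(void) { return -5; }	/* an -EIO-like failure */
```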

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c     | 54 +++++++++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_drv.h     |  4 +++
 drivers/gpu/drm/i915/i915_gem.c     |  4 +--
 drivers/gpu/drm/i915/intel_uncore.c | 33 +++++++++++++++++++++++
 4 files changed, 91 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index dfac8b3..cb76da1 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1027,10 +1027,60 @@ error:
  */
 int i915_reset_engine(struct intel_engine_cs *engine)
 {
+	struct drm_device *dev = engine->i915->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs_state state = {};
 	int ret;
 
-	/* FIXME: replace me with engine reset sequence */
-	ret = -ENODEV;
+	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
+
+	i915_gem_reset_engine_status(dev_priv, engine);
+
+	/* Take a wake lock to prevent power-saving mode */
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+	ret = intel_request_for_reset(engine);
+	if (ret) {
+		DRM_ERROR("Failed to disable %s\n", engine->name);
+		goto out;
+	}
+
+	ret = engine->save(engine, &state);
+	if (ret)
+		goto enable_engine;
+
+	ret = intel_gpu_reset(dev_priv, intel_engine_flag(engine));
+	if (ret) {
+		DRM_ERROR("Failed to reset %s, ret=%d\n", engine->name, ret);
+		goto enable_engine;
+	}
+
+	ret = engine->init_hw(engine);
+	if (ret)
+		goto out;
+
+	/*
+	 * Restart the engine after reset.
+	 * Engine state is first restored and the context is resubmitted.
+	 */
+	engine->start(engine, &state);
+
+enable_engine:
+	/*
+	 * We only need to re-enable the engine if we could not save its
+	 * state or the reset failed. If the reset succeeds the engine is
+	 * re-enabled automatically, so this step can be skipped.
+	 */
+	if (ret)
+		intel_clear_reset_request(engine);
+
+out:
+	if (state.req)
+		i915_gem_request_unreference(state.req);
+
+	/* Wake up anything waiting on this engine's queue */
+	wake_up_all(&engine->irq_queue);
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b35ca02..b2105bb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2910,6 +2910,8 @@ extern int intel_gpu_reset(struct drm_i915_private *dev_priv, u32 engine_mask);
 extern bool intel_has_gpu_reset(struct drm_i915_private *dev_priv);
 extern int i915_reset(struct drm_i915_private *dev_priv);
 extern bool intel_has_engine_reset(struct drm_i915_private *dev_priv);
+extern int intel_request_for_reset(struct intel_engine_cs *engine);
+extern int intel_clear_reset_request(struct intel_engine_cs *engine);
 extern int i915_reset_engine(struct intel_engine_cs *engine);
 extern int intel_guc_reset(struct drm_i915_private *dev_priv);
 extern void intel_engine_init_hangcheck(struct intel_engine_cs *engine);
@@ -3331,6 +3333,8 @@ static inline bool i915_stop_ring_allow_warn(struct drm_i915_private *dev_priv)
 }
 
 void i915_gem_reset(struct drm_device *dev);
+void i915_gem_reset_engine_status(struct drm_i915_private *dev_priv,
+				  struct intel_engine_cs *ring);
 bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
 int __must_check i915_gem_init(struct drm_device *dev);
 int i915_gem_init_engines(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 21d0dea..6160564 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3082,8 +3082,8 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
 	return NULL;
 }
 
-static void i915_gem_reset_engine_status(struct drm_i915_private *dev_priv,
-				       struct intel_engine_cs *engine)
+void i915_gem_reset_engine_status(struct drm_i915_private *dev_priv,
+				  struct intel_engine_cs *engine)
 {
 	struct drm_i915_gem_request *request;
 	bool ring_hung;
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index d973c2a..63acaeb 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1734,6 +1734,39 @@ int intel_guc_reset(struct drm_i915_private *dev_priv)
 	return ret;
 }
 
+/*
+ * On gen8+ a reset request has to be issued via the reset control register
+ * before a GPU engine can be reset in order to stop the command streamer
+ * and idle the engine. This replaces the legacy way of stopping an engine
+ * by writing to the stop ring bit in the MI_MODE register.
+ */
+int intel_request_for_reset(struct intel_engine_cs *engine)
+{
+	if (!intel_has_engine_reset(engine->i915)) {
+		DRM_ERROR("Engine Reset not supported on Gen%d\n",
+			  INTEL_INFO(engine->i915)->gen);
+		return -EINVAL;
+	}
+
+	return gen8_request_engine_reset(engine);
+}
+
+/*
+ * It is possible to back off from a previously issued reset request by simply
+ * clearing the reset request bit in the reset control register.
+ */
+int intel_clear_reset_request(struct intel_engine_cs *engine)
+{
+	if (!intel_has_engine_reset(engine->i915)) {
+		DRM_ERROR("Request to clear reset not supported on Gen%d\n",
+			  INTEL_INFO(engine->i915)->gen);
+		return -EINVAL;
+	}
+
+	gen8_unrequest_engine_reset(engine);
+	return 0;
+}
+
 bool intel_uncore_unclaimed_mmio(struct drm_i915_private *dev_priv)
 {
 	return check_for_unclaimed_mmio(dev_priv);
-- 
1.9.1


* [PATCH v2 09/15] drm/i915: Skip reset request if there is one already
@ 2016-06-17  7:09 ` Arun Siluvery
  15 siblings, 0 replies; 23+ messages in thread
From: Arun Siluvery @ 2016-06-17  7:09 UTC (permalink / raw)
  To: intel-gfx

From: Mika Kuoppala <mika.kuoppala@linux.intel.com>

To perform an engine reset we first disable the engine in order to capture
its state. This is done by issuing a reset request. Because we are reusing
the existing infrastructure, when we actually reset the engine the reset
function checks the engine mask and issues the reset request again, which
is unnecessary. To avoid this we check whether the engine is already
prepared and, if so, simply return at that point.
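The shortcut is a single combined-mask test on the reset control register. As a standalone sketch (bit positions as in the patch below; the register values fed in here are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define RESET_CTL_REQUEST_RESET		(1u << 0)
#define RESET_CTL_READY_TO_RESET	(1u << 1)

/* The engine is already prepared only when BOTH the request bit and the
 * ready bit are set; one masked comparison covers all four states. */
static int engine_already_prepared(uint32_t reset_ctl)
{
	const uint32_t ready = RESET_CTL_REQUEST_RESET |
			       RESET_CTL_READY_TO_RESET;

	return (reset_ctl & ready) == ready;
}
```

With only the ready bit set (or only the request bit) the request still has to be issued, which is exactly why `(reg & ready) == ready` is used rather than a simple non-zero test.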

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_uncore.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 63acaeb..6355c2a7 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1621,13 +1621,18 @@ static int wait_for_register_fw(struct drm_i915_private *dev_priv,
 static int gen8_request_engine_reset(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
+	const i915_reg_t reset_ctrl = RING_RESET_CTL(engine->mmio_base);
+	const u32 ready = RESET_CTL_REQUEST_RESET | RESET_CTL_READY_TO_RESET;
 	int ret;
 
-	I915_WRITE_FW(RING_RESET_CTL(engine->mmio_base),
-		      _MASKED_BIT_ENABLE(RESET_CTL_REQUEST_RESET));
+	/* If the engine has already been prepared, we can shortcut here */
+	if ((I915_READ_FW(reset_ctrl) & ready) == ready)
+		return 0;
+
+	I915_WRITE_FW(reset_ctrl, _MASKED_BIT_ENABLE(RESET_CTL_REQUEST_RESET));
 
 	ret = wait_for_register_fw(dev_priv,
-				   RING_RESET_CTL(engine->mmio_base),
+				   reset_ctrl,
 				   RESET_CTL_READY_TO_RESET,
 				   RESET_CTL_READY_TO_RESET,
 				   700);
-- 
1.9.1


* [PATCH v2 10/15] drm/i915: Extending i915_gem_check_wedge to check engine reset in progress
@ 2016-06-17  7:09 ` Arun Siluvery
  2016-06-17  7:41   ` Chris Wilson
  15 siblings, 1 reply; 23+ messages in thread
From: Arun Siluvery @ 2016-06-17  7:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ian Lister, Tomas Elf

i915_gem_check_wedge() now returns a non-zero result in three different cases:

1. Legacy: A hang has been detected and full GPU reset is in progress.

2. Per-engine recovery:
   a. A single engine reference can be passed to the function, in which
   case only that engine will be checked. If that particular engine is
   detected to be hung and is to be reset, this yields a non-zero result,
   but a reset in progress on any other engine does not.

   b. No engine reference is passed to the function, in which case all
   engines are checked for ongoing per-engine hang recovery.

__i915_wait_request() is updated such that if an engine reset is pending
we ask the waiter to try again so that engine recovery can continue. If
i915_wait_request() did not take per-engine hang recovery into account, a
waiting thread would have no way of knowing that a per-engine recovery is
about to happen and that it needs to back off.
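The intended wait-exit condition is simply the conjunction of "no full GPU reset in flight" and "no per-engine reset pending". A truth-table sketch of that intent (a model of the semantics, not driver code):

```c
#include <assert.h>
#include <stdbool.h>

/* A waiter may stop blocking on the reset queue only when neither a
 * full GPU reset nor any per-engine reset is in flight. */
static bool may_stop_waiting(bool full_reset_in_progress,
			     bool engine_reset_pending)
{
	return !full_reset_in_progress && !engine_reset_pending;
}
```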

Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Ian Lister <ian.lister@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---

These changes are based on the current nightly. I am aware of the changes
being made to the wait_request path in the "thundering herd" series, but
my understanding is that that series has other dependencies. We can add
incremental changes once it is merged.

 drivers/gpu/drm/i915/i915_gem.c | 43 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6160564..bc404da 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -100,12 +100,31 @@ static void i915_gem_info_remove_obj(struct drm_i915_private *dev_priv,
 	spin_unlock(&dev_priv->mm.object_stat_lock);
 }
 
+static bool i915_engine_reset_pending(struct i915_gpu_error *error,
+				     struct intel_engine_cs *engine)
+{
+	int i;
+
+	if (engine)
+		return i915_engine_reset_in_progress(error, engine->id);
+
+	for (i = 0; i < I915_NUM_ENGINES; ++i) {
+		if (i915_engine_reset_in_progress(error, i))
+			return true;
+	}
+
+	return false;
+}
+
 static int
 i915_gem_wait_for_error(struct i915_gpu_error *error)
 {
 	int ret;
 
-	if (!i915_reset_in_progress(error))
+#define EXIT_COND (!i915_reset_in_progress(error) &&	\
+		   !i915_engine_reset_pending(error, NULL))
+
+	if (EXIT_COND)
 		return 0;
 
 	/*
@@ -114,7 +133,7 @@ i915_gem_wait_for_error(struct i915_gpu_error *error)
 	 * we should simply try to bail out and fail as gracefully as possible.
 	 */
 	ret = wait_event_interruptible_timeout(error->reset_queue,
-					       !i915_reset_in_progress(error),
+					       EXIT_COND,
 					       10*HZ);
 	if (ret == 0) {
 		DRM_ERROR("Timed out waiting for the gpu reset to complete\n");
@@ -1325,12 +1344,18 @@ put_rpm:
 }
 
 static int
-i915_gem_check_wedge(unsigned reset_counter, bool interruptible)
+i915_gem_check_wedge(struct drm_i915_private *dev_priv,
+		     struct intel_engine_cs *engine,
+		     bool interruptible)
 {
+	struct i915_gpu_error *error = &dev_priv->gpu_error;
+	unsigned reset_counter = i915_reset_counter(error);
+
 	if (__i915_terminally_wedged(reset_counter))
 		return -EIO;
 
-	if (__i915_reset_in_progress(reset_counter)) {
+	if (__i915_reset_in_progress(reset_counter) ||
+	    i915_engine_reset_pending(error, engine)) {
 		/* Non-interruptible callers can't handle -EAGAIN, hence return
 		 * -EIO unconditionally for these. */
 		if (!interruptible)
@@ -1500,6 +1525,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
 	for (;;) {
 		struct timer_list timer;
+		int reset_pending;
 
 		prepare_to_wait(&engine->irq_queue, &wait, state);
 
@@ -1515,6 +1541,13 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
+		reset_pending = i915_engine_reset_pending(&dev_priv->gpu_error,
+							  NULL);
+		if (reset_pending) {
+			ret = -EAGAIN;
+			break;
+		}
+
 		if (i915_gem_request_completed(req, false)) {
 			ret = 0;
 			break;
@@ -2997,7 +3030,7 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
 	 * EIO if the GPU is already wedged, or EAGAIN to drop the struct_mutex
 	 * and restart.
 	 */
-	ret = i915_gem_check_wedge(reset_counter, dev_priv->mm.interruptible);
+	ret = i915_gem_check_wedge(dev_priv, NULL, dev_priv->mm.interruptible);
 	if (ret)
 		return ret;
 
-- 
1.9.1


* [PATCH v2 11/15] drm/i915: Port of Added scheduler support to __wait_request() calls
@ 2016-06-17  7:09 ` Arun Siluvery
  2016-06-17  7:42   ` Chris Wilson
  15 siblings, 1 reply; 23+ messages in thread
From: Arun Siluvery @ 2016-06-17  7:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Tomas Elf

This is a partial port of the following patch from John Harrison's GPU
scheduler patch series: (patch sent to Intel-GFX with the subject line
"[Intel-gfx] [RFC 19/39] drm/i915: Added scheduler support to __wait_request()
calls" on Fri 17 July 2015)

	Author: John Harrison <John.C.Harrison@Intel.com>
	Date:   Thu Apr 10 10:48:55 2014 +0100
	Subject: drm/i915: Added scheduler support to __wait_request() calls

Removed all scheduler references and backported it to this baseline. The
reason we need this is that Chris Wilson has pointed out that threads that
don't hold the struct_mutex should not be thrown out of __i915_wait_request
during TDR hang recovery. Therefore we need a way to determine which
threads are holding the mutex and which are not.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  7 ++++++-
 drivers/gpu/drm/i915/i915_gem.c         | 34 ++++++++++++++++++++++-----------
 drivers/gpu/drm/i915/intel_display.c    |  5 +++--
 drivers/gpu/drm/i915/intel_ringbuffer.c |  8 +++++---
 4 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b2105bb..3e02b41 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3350,8 +3350,13 @@ void __i915_add_request(struct drm_i915_gem_request *req,
 	__i915_add_request(req, NULL, true)
 #define i915_add_request_no_flush(req) \
 	__i915_add_request(req, NULL, false)
+
+/* flags used by users of __i915_wait_request */
+#define I915_WAIT_REQUEST_INTERRUPTIBLE  (1 << 0)
+#define I915_WAIT_REQUEST_LOCKED         (1 << 1)
+
 int __i915_wait_request(struct drm_i915_gem_request *req,
-			bool interruptible,
+			u32 flags,
 			s64 *timeout,
 			struct intel_rps_client *rps);
 int __must_check i915_wait_request(struct drm_i915_gem_request *req);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index bc404da..b0c2263 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1455,7 +1455,9 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 /**
  * __i915_wait_request - wait until execution of request has finished
  * @req: duh!
- * @interruptible: do an interruptible wait (normally yes)
+ * @flags: flags to define the nature of wait
+ *    I915_WAIT_REQUEST_INTERRUPTIBLE - do an interruptible wait (normally yes)
+ *    I915_WAIT_REQUEST_LOCKED - caller is holding struct_mutex
  * @timeout: in - how long to wait (NULL forever); out - how much time remaining
  * @rps: RPS client
  *
@@ -1470,7 +1472,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
  * errno with remaining time filled in timeout argument.
  */
 int __i915_wait_request(struct drm_i915_gem_request *req,
-			bool interruptible,
+			u32 flags,
 			s64 *timeout,
 			struct intel_rps_client *rps)
 {
@@ -1478,6 +1480,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	struct drm_i915_private *dev_priv = req->i915;
 	const bool irq_test_in_progress =
 		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
+	bool interruptible = flags & I915_WAIT_REQUEST_INTERRUPTIBLE;
 	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
 	DEFINE_WAIT(wait);
 	unsigned long timeout_expire;
@@ -1526,6 +1529,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	for (;;) {
 		struct timer_list timer;
 		int reset_pending;
+		bool locked = flags & I915_WAIT_REQUEST_LOCKED;
 
 		prepare_to_wait(&engine->irq_queue, &wait, state);
 
@@ -1543,7 +1547,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
 		reset_pending = i915_engine_reset_pending(&dev_priv->gpu_error,
 							  NULL);
-		if (reset_pending) {
+		if (reset_pending || locked) {
 			ret = -EAGAIN;
 			break;
 		}
@@ -1705,14 +1709,15 @@ int
 i915_wait_request(struct drm_i915_gem_request *req)
 {
 	struct drm_i915_private *dev_priv = req->i915;
-	bool interruptible;
+	u32 flags;
 	int ret;
 
-	interruptible = dev_priv->mm.interruptible;
-
 	BUG_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
 
-	ret = __i915_wait_request(req, interruptible, NULL, NULL);
+	flags = dev_priv->mm.interruptible ? I915_WAIT_REQUEST_INTERRUPTIBLE : 0;
+	flags |= I915_WAIT_REQUEST_LOCKED;
+
+	ret = __i915_wait_request(req, flags, NULL, NULL);
 	if (ret)
 		return ret;
 
@@ -1824,7 +1829,9 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 	mutex_unlock(&dev->struct_mutex);
 	ret = 0;
 	for (i = 0; ret == 0 && i < n; i++)
-		ret = __i915_wait_request(requests[i], true, NULL, rps);
+		ret = __i915_wait_request(requests[i],
+					  I915_WAIT_REQUEST_INTERRUPTIBLE,
+					  NULL, rps);
 	mutex_lock(&dev->struct_mutex);
 
 	for (i = 0; i < n; i++) {
@@ -3442,7 +3449,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 
 	for (i = 0; i < n; i++) {
 		if (ret == 0)
-			ret = __i915_wait_request(req[i], true,
+			ret = __i915_wait_request(req[i], I915_WAIT_REQUEST_INTERRUPTIBLE,
 						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
 						  to_rps_client(file));
 		i915_gem_request_unreference(req[i]);
@@ -3473,8 +3480,13 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 
 	if (!i915_semaphore_is_enabled(to_i915(obj->base.dev))) {
 		struct drm_i915_private *i915 = to_i915(obj->base.dev);
+		u32 flags;
+
+		flags = i915->mm.interruptible ? I915_WAIT_REQUEST_INTERRUPTIBLE : 0;
+		flags |= I915_WAIT_REQUEST_LOCKED;
+
 		ret = __i915_wait_request(from_req,
-					  i915->mm.interruptible,
+					  flags,
 					  NULL,
 					  &i915->rps.semaphores);
 		if (ret)
@@ -4476,7 +4488,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (target == NULL)
 		return 0;
 
-	ret = __i915_wait_request(target, true, NULL, NULL);
+	ret = __i915_wait_request(target, I915_WAIT_REQUEST_INTERRUPTIBLE, NULL, NULL);
 	if (ret == 0)
 		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 095f83e..fa29091 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11546,7 +11546,7 @@ static void intel_mmio_flip_work_func(struct work_struct *w)
 
 	if (work->flip_queued_req)
 		WARN_ON(__i915_wait_request(work->flip_queued_req,
-					    false, NULL,
+					    0, NULL,
 					    &dev_priv->rps.mmioflips));
 
 	/* For framebuffer backed by dmabuf, wait for fence */
@@ -13602,7 +13602,8 @@ static int intel_atomic_prepare_commit(struct drm_device *dev,
 				continue;
 
 			ret = __i915_wait_request(intel_plane_state->wait_req,
-						  true, NULL, NULL);
+						  I915_WAIT_REQUEST_INTERRUPTIBLE,
+						  NULL, NULL);
 			if (ret) {
 				/* Any hang should be swallowed by the wait */
 				WARN_ON(ret == -EIO);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index fedd270..8d34f1c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2414,6 +2414,7 @@ void intel_cleanup_engine(struct intel_engine_cs *engine)
 int intel_engine_idle(struct intel_engine_cs *engine)
 {
 	struct drm_i915_gem_request *req;
+	u32 flags;
 
 	/* Wait upon the last request to be completed */
 	if (list_empty(&engine->request_list))
@@ -2423,10 +2424,11 @@ int intel_engine_idle(struct intel_engine_cs *engine)
 			 struct drm_i915_gem_request,
 			 list);
 
+	flags = req->i915->mm.interruptible ? I915_WAIT_REQUEST_INTERRUPTIBLE : 0;
+	flags |= I915_WAIT_REQUEST_LOCKED;
+
 	/* Make sure we do not trigger any retires */
-	return __i915_wait_request(req,
-				   req->i915->mm.interruptible,
-				   NULL, NULL);
+	return __i915_wait_request(req, flags, NULL, NULL);
 }
 
 int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request)
-- 
1.9.1


* [PATCH v2 12/15] drm/i915/tdr: Add engine reset count to error state
  2016-06-17  7:09 [PATCH v2 00/15] Execlist based Engine reset and recovery Arun Siluvery
                   ` (10 preceding siblings ...)
  2016-06-17  7:09 ` [PATCH v2 11/15] drm/i915: Port of Added scheduler support to __wait_request() calls Arun Siluvery
@ 2016-06-17  7:09 ` Arun Siluvery
  2016-06-17  7:09 ` [PATCH v2 13/15] drm/i915/tdr: Export reset count info to debugfs Arun Siluvery
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 23+ messages in thread
From: Arun Siluvery @ 2016-06-17  7:09 UTC (permalink / raw)
  To: intel-gfx

The driver maintains a count of how many times a given engine has been
reset, so it is useful to capture this in the error state as well. It
gives an idea of how the engine was coping with the workloads it was
executing before this error state was captured.

Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h       | 1 +
 drivers/gpu/drm/i915/i915_gpu_error.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3e02b41..48a96b3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -506,6 +506,7 @@ struct drm_i915_error_state {
 		int hangcheck_score;
 		enum intel_ring_hangcheck_action hangcheck_action;
 		int num_requests;
+		u32 reset_count;
 
 		/* our own tracking of ring head and tail */
 		u32 cpu_ring_head;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 34ff245..c431248 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -303,6 +303,7 @@ static void i915_ring_error_state(struct drm_i915_error_state_buf *m,
 	err_printf(m, "  hangcheck: %s [%d]\n",
 		   hangcheck_action_to_str(ring->hangcheck_action),
 		   ring->hangcheck_score);
+	err_printf(m, "  engine reset count: %u\n", ring->reset_count);
 }
 
 void i915_error_printf(struct drm_i915_error_state_buf *e, const char *f, ...)
@@ -967,6 +968,8 @@ static void i915_record_ring_state(struct drm_i915_private *dev_priv,
 
 	ering->hangcheck_score = engine->hangcheck.score;
 	ering->hangcheck_action = engine->hangcheck.action;
+	ering->reset_count = i915_engine_reset_count(&dev_priv->gpu_error,
+						     engine);
 
 	if (USES_PPGTT(dev_priv)) {
 		int i;
-- 
1.9.1


* [PATCH v2 13/15] drm/i915/tdr: Export reset count info to debugfs
  2016-06-17  7:09 [PATCH v2 00/15] Execlist based Engine reset and recovery Arun Siluvery
                   ` (11 preceding siblings ...)
  2016-06-17  7:09 ` [PATCH v2 12/15] drm/i915/tdr: Add engine reset count to error state Arun Siluvery
@ 2016-06-17  7:09 ` Arun Siluvery
  2016-06-17  7:21   ` Chris Wilson
  2016-06-17  7:09 ` [PATCH v2 14/15] drm/i915/tdr: Enable Engine reset and recovery support Arun Siluvery
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 23+ messages in thread
From: Arun Siluvery @ 2016-06-17  7:09 UTC (permalink / raw)
  To: intel-gfx

A new entry is added to export the reset counts to debugfs; this includes
the full gpu reset count and the per-engine reset counts. This is useful
for tests that are expected to trigger a reset: these counts are checked
before and after the test to confirm it.

Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 5b75266..b02ca7a 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4814,6 +4814,38 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_wedged_fops,
 			i915_wedged_get, i915_wedged_set,
 			"%llu\n");
 
+
+static ssize_t i915_reset_info_read(struct file *filp, char __user *ubuf,
+				    size_t max, loff_t *ppos)
+{
+	int len;
+	char buf[300];
+	struct drm_device *dev = filp->private_data;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_gpu_error *error = &dev_priv->gpu_error;
+	struct intel_engine_cs *engine;
+
+	len = scnprintf(buf, sizeof(buf), "full gpu reset = %u\n",
+			i915_reset_count(error));
+
+	for_each_engine(engine, dev_priv) {
+		len += scnprintf(buf + len, sizeof(buf) - len,
+				 "%s = %u\n", engine->name,
+				 i915_engine_reset_count(error, engine));
+	}
+
+	len += scnprintf(buf + len - 1, sizeof(buf) - len, "\n");
+
+	return simple_read_from_buffer(ubuf, max, ppos, buf, len);
+}
+
+static const struct file_operations i915_reset_info_fops = {
+	.owner = THIS_MODULE,
+	.open = simple_open,
+	.read = i915_reset_info_read,
+	.llseek = default_llseek,
+};
+
 static int
 i915_ring_stop_get(void *data, u64 *val)
 {
@@ -5474,6 +5506,7 @@ static const struct i915_debugfs_files {
 	const struct file_operations *fops;
 } i915_debugfs_files[] = {
 	{"i915_wedged", &i915_wedged_fops},
+	{"i915_reset_info", &i915_reset_info_fops},
 	{"i915_max_freq", &i915_max_freq_fops},
 	{"i915_min_freq", &i915_min_freq_fops},
 	{"i915_cache_sharing", &i915_cache_sharing_fops},
-- 
1.9.1


* [PATCH v2 14/15] drm/i915/tdr: Enable Engine reset and recovery support
  2016-06-17  7:09 [PATCH v2 00/15] Execlist based Engine reset and recovery Arun Siluvery
                   ` (12 preceding siblings ...)
  2016-06-17  7:09 ` [PATCH v2 13/15] drm/i915/tdr: Export reset count info to debugfs Arun Siluvery
@ 2016-06-17  7:09 ` Arun Siluvery
  2016-06-17  7:09 ` [ONLY FOR BAT v2 15/15] drm/i915: Disable GuC submission for testing Engine reset patches Arun Siluvery
  2016-06-17  7:33 ` ✗ Ro.CI.BAT: failure for Execlist based Engine reset and recovery Patchwork
  15 siblings, 0 replies; 23+ messages in thread
From: Arun Siluvery @ 2016-06-17  7:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Tomas Elf

Everything in place, flip the switch.

This feature is available only from Gen8 onwards; for earlier generations
the driver falls back to the legacy full gpu reset.

Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_params.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index b012da0..d3f021c 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -45,7 +45,7 @@ struct i915_params i915 __read_mostly = {
 	.fastboot = 0,
 	.prefault_disable = 0,
 	.load_detect_test = 0,
-	.reset = 1,
+	.reset = 2,
 	.invert_brightness = 0,
 	.disable_display = 0,
 	.enable_cmd_parser = 1,
@@ -111,7 +111,7 @@ MODULE_PARM_DESC(vbt_sdvo_panel_type,
 	"(-2=ignore, -1=auto [default], index in VBT BIOS table)");
 
 module_param_named_unsafe(reset, i915.reset, int, 0600);
-MODULE_PARM_DESC(reset, "Attempt GPU resets (0=disabled, 1=full gpu reset [default], 2=engine reset)");
+MODULE_PARM_DESC(reset, "Attempt GPU resets (0=disabled, 1=full gpu reset, 2=engine reset [default])");
 
 module_param_named_unsafe(enable_hangcheck, i915.enable_hangcheck, bool, 0644);
 MODULE_PARM_DESC(enable_hangcheck,
-- 
1.9.1


* [ONLY FOR BAT v2 15/15] drm/i915: Disable GuC submission for testing Engine reset patches
  2016-06-17  7:09 [PATCH v2 00/15] Execlist based Engine reset and recovery Arun Siluvery
                   ` (13 preceding siblings ...)
  2016-06-17  7:09 ` [PATCH v2 14/15] drm/i915/tdr: Enable Engine reset and recovery support Arun Siluvery
@ 2016-06-17  7:09 ` Arun Siluvery
  2016-06-17  7:33 ` ✗ Ro.CI.BAT: failure for Execlist based Engine reset and recovery Patchwork
  15 siblings, 0 replies; 23+ messages in thread
From: Arun Siluvery @ 2016-06-17  7:09 UTC (permalink / raw)
  To: intel-gfx

Engine reset is currently implemented only for Execlist based submission,
so GuC submission needs to be disabled for BAT testing purposes.

Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---

The current TDR implementation covers only Execlist submission; we need
this patch to enable BAT testing with Execlist submission.

 drivers/gpu/drm/i915/i915_params.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index d3f021c..b5881fe 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -54,8 +54,8 @@ struct i915_params i915 __read_mostly = {
 	.verbose_state_checks = 1,
 	.nuclear_pageflip = 0,
 	.edp_vswing = 0,
-	.enable_guc_loading = -1,
-	.enable_guc_submission = -1,
+	.enable_guc_loading = 0,
+	.enable_guc_submission = 0,
 	.guc_log_level = -1,
 	.enable_dp_mst = true,
 	.inject_load_failure = 0,
-- 
1.9.1


* Re: [PATCH v2 13/15] drm/i915/tdr: Export reset count info to debugfs
  2016-06-17  7:09 ` [PATCH v2 13/15] drm/i915/tdr: Export reset count info to debugfs Arun Siluvery
@ 2016-06-17  7:21   ` Chris Wilson
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2016-06-17  7:21 UTC (permalink / raw)
  To: Arun Siluvery; +Cc: intel-gfx

On Fri, Jun 17, 2016 at 08:09:13AM +0100, Arun Siluvery wrote:
> A new entry is added to export the reset counts to debugfs; this
> includes the full gpu reset count and the per-engine reset counts. This
> is useful for tests that are expected to trigger a reset: these counts
> are checked before and after the test to confirm it.
> 
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c | 33 +++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 5b75266..b02ca7a 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -4814,6 +4814,38 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_wedged_fops,
>  			i915_wedged_get, i915_wedged_set,
>  			"%llu\n");
>  
> +
> +static ssize_t i915_reset_info_read(struct file *filp, char __user *ubuf,
> +				    size_t max, loff_t *ppos)
> +{
> +	int len;
> +	char buf[300];
> +	struct drm_device *dev = filp->private_data;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct i915_gpu_error *error = &dev_priv->gpu_error;
> +	struct intel_engine_cs *engine;
> +
> +	len = scnprintf(buf, sizeof(buf), "full gpu reset = %u\n",
> +			i915_reset_count(error));
> +
> +	for_each_engine(engine, dev_priv) {
> +		len += scnprintf(buf + len, sizeof(buf) - len,
> +				 "%s = %u\n", engine->name,
> +				 i915_engine_reset_count(error, engine));
> +	}
> +
> +	len += scnprintf(buf + len - 1, sizeof(buf) - len, "\n");
> +
> +	return simple_read_from_buffer(ubuf, max, ppos, buf, len);
> +}
> +
> +static const struct file_operations i915_reset_info_fops = {
> +	.owner = THIS_MODULE,
> +	.open = simple_open,
> +	.read = i915_reset_info_read,
> +	.llseek = default_llseek,
> +};

Why not a simple seq file? Why not extend hangcheck info?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH v2 07/15] drm/i915/tdr: Restore engine state and start after reset
  2016-06-17  7:09 ` [PATCH v2 07/15] drm/i915/tdr: Restore engine state and start after reset Arun Siluvery
@ 2016-06-17  7:32   ` Chris Wilson
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2016-06-17  7:32 UTC (permalink / raw)
  To: Arun Siluvery; +Cc: intel-gfx, Tomas Elf

On Fri, Jun 17, 2016 at 08:09:07AM +0100, Arun Siluvery wrote:
> We capture the state of an engine before resetting it; once the reset is
> successful the engine is restored with the same state and restarted.
> 
> The state includes the head register and the active request. We also
> nudge the head forward if it hasn't advanced, otherwise when the engine
> is restarted the HW executes the same instruction and may hang again.
> Generally the head automatically advances to the next instruction as
> soon as the HW reads the current instruction, without waiting for it to
> complete; however, an MBOX wait inserted directly into the VCS/BCS
> engines doesn't behave in the same way; instead the head will still be
> pointing at the same instruction until it completes.
> 
> If the head is modified, this is also updated in the context image so that
> HW sees up to date value.
> 
> A valid request is expected in the state at this point otherwise we
> wouldn't have reached this point, the context that submitted this request
> is resubmitted to HW. The request that caused the hang would be at the
> start of execlist queue, unless we resubmit and complete this request, it
> cannot be removed from the queue.
> 
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Signed-off-by: Tomas Elf <tomas.elf@intel.com>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c        | 94 +++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  9 ++++
>  2 files changed, 103 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index b83552a..c9aa2ca 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -487,6 +487,30 @@ static void execlists_context_unqueue(struct intel_engine_cs *engine,
>  	execlists_submit_requests(req0, req1, tdr_resubmission);
>  }
>  
> +/**
> + * intel_execlists_resubmit()
> + * @engine: engine to do resubmission for
> + *
> + * In execlists mode, engine reset postprocess mainly includes resubmission of
> + * context after reset, for this we bypass the execlist queue. This is
> + * necessary since at the point of TDR hang recovery the hardware will be hung
> + * and resubmitting a fixed context (the context that the TDR has identified
> + * as hung and fixed up in order to move past the blocking batch buffer) to a
> + * hung execlist queue will lock up the TDR.  Instead, opt for direct ELSP
> + * submission without depending on the rest of the driver.
> + */
> +static void intel_execlists_resubmit(struct intel_engine_cs *engine)
> +{
> +	unsigned long flags;
> +
> +	if (WARN_ON(list_empty(&engine->execlist_queue)))
> +		return;
> +
> +	spin_lock_irqsave(&engine->execlist_lock, flags);
> +	execlists_context_unqueue(engine, true);
> +	spin_unlock_irqrestore(&engine->execlist_lock, flags);
> +}
> +
>  static unsigned int
>  execlists_check_remove_request(struct intel_engine_cs *engine, u32 ctx_id)
>  {
> @@ -1098,6 +1122,75 @@ static int gen8_engine_state_save(struct intel_engine_cs *engine,
>  	return 0;
>  }
>  
> +/**
> + * gen8_engine_start() - restore saved state and start engine
> + * @engine: engine to be started
> + * @state: state to be restored
> + *
> + * Returns:
> + *	0 if ok, otherwise propagates error codes.
> + */
> +static int gen8_engine_start(struct intel_engine_cs *engine,
> +			     struct intel_engine_cs_state *state)
> +{
> +	u32 head;
> +	u32 head_addr, tail_addr;
> +	u32 *reg_state;
> +	struct intel_ringbuffer *ringbuf;
> +	struct i915_gem_context *ctx;
> +	struct drm_i915_private *dev_priv = engine->i915;
> +
> +	ctx = state->req->ctx;
> +	ringbuf = ctx->engine[engine->id].ringbuf;
> +	reg_state = ctx->engine[engine->id].lrc_reg_state;
> +
> +	head = state->head;
> +	head_addr = head & HEAD_ADDR;
> +
> +	if (head == engine->hangcheck.last_head) {
> +		/*
> +		 * The engine has not advanced since the last time it hung,
> +		 * force it to advance to the next QWORD. In most cases the
> +		 * engine head pointer will automatically advance to the
> +		 * next instruction as soon as it has read the current
> +		 * instruction, without waiting for it to complete. This
> +		 * seems to be the default behaviour, however an MBOX wait
> +		 * inserted directly to the VCS/BCS engines does not behave
> +		 * in the same way, instead the head pointer will still be
> +		 * pointing at the MBOX instruction until it completes.
> +		 */
> +		head_addr = roundup(head_addr, 8);
> +		engine->hangcheck.last_head = head;
> +	} else if (head_addr & 0x7) {
> +		/* Ensure head pointer is pointing to a QWORD boundary */
> +		head_addr = ALIGN(head_addr, 8);
> +	}
> +
> +	tail_addr = reg_state[CTX_RING_TAIL+1] & TAIL_ADDR;
> +
> +	if (head_addr > tail_addr)
> +		head_addr = tail_addr;
> +	else if (head_addr >= ringbuf->size)
> +		head_addr = 0;
> +
> +	head &= ~HEAD_ADDR;
> +	head |= (head_addr & HEAD_ADDR);
> +
> +	/* Restore head */
> +	reg_state[CTX_RING_HEAD+1] = head;
> +	I915_WRITE_HEAD(engine, head);
> +
> +	/* set head */
> +	ringbuf->head = head;
> +	ringbuf->last_retired_head = -1;
> +	intel_ring_update_space(ringbuf);
> +
> +	if (state->req)
> +		intel_execlists_resubmit(engine);

So given that we have a request, why not just use the request to set the
state? We don't need to save anything as either we ensure the ring is
stopped (no new request) or submit the next request.

Also we already have a callback to start the engines, that would be easy
to extend to support starting at a particular request (was intended to
be).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* ✗ Ro.CI.BAT: failure for Execlist based Engine reset and recovery
  2016-06-17  7:09 [PATCH v2 00/15] Execlist based Engine reset and recovery Arun Siluvery
                   ` (14 preceding siblings ...)
  2016-06-17  7:09 ` [ONLY FOR BAT v2 15/15] drm/i915: Disable GuC submission for testing Engine reset patches Arun Siluvery
@ 2016-06-17  7:33 ` Patchwork
  15 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2016-06-17  7:33 UTC (permalink / raw)
  To: arun.siluvery; +Cc: intel-gfx

== Series Details ==

Series: Execlist based Engine reset and recovery
URL   : https://patchwork.freedesktop.org/series/8805/
State : failure

== Summary ==

Series 8805v1 Execlist based Engine reset and recovery
http://patchwork.freedesktop.org/api/1.0/series/8805/revisions/1/mbox

Test drv_getparams_basic:
        Subgroup basic-eu-total:
                pass       -> SKIP       (ro-ivb2-i7-3770)
        Subgroup basic-subslice-total:
                pass       -> SKIP       (ro-ivb2-i7-3770)
Test drv_hangman:
        Subgroup error-state-basic:
                pass       -> DMESG-WARN (ro-bdw-i7-5557U)
                pass       -> TIMEOUT    (ro-ilk1-i5-650)
                pass       -> INCOMPLETE (fi-bdw-i7-5557u)
                pass       -> DMESG-WARN (fi-skl-i5-6260u)
                pass       -> DMESG-WARN (ro-skl3-i5-6260u)
                pass       -> TIMEOUT    (ro-snb-i7-2620M)
                pass       -> INCOMPLETE (ro-bdw-i5-5250u)
                pass       -> TIMEOUT    (fi-snb-i7-2600)
                pass       -> SKIP       (ro-ivb2-i7-3770)
                pass       -> TIMEOUT    (ro-byt-n2820)
                pass       -> DMESG-WARN (fi-skl-i7-6700k)
Test drv_module_reload_basic:
                pass       -> DMESG-FAIL (ro-ivb2-i7-3770)
                pass       -> INCOMPLETE (ro-bdw-i7-5600u)
                pass       -> INCOMPLETE (ro-hsw-i3-4010u)
                dmesg-warn -> PASS       (ro-skl3-i5-6260u)
                pass       -> INCOMPLETE (ro-hsw-i7-4770r)
Test gem_basic:
        Subgroup bad-close:
                pass       -> SKIP       (ro-ivb2-i7-3770)
                pass       -> INCOMPLETE (ro-byt-n2820)
                pass       -> INCOMPLETE (ro-ilk1-i5-650)
                pass       -> INCOMPLETE (ro-snb-i7-2620M)
                pass       -> INCOMPLETE (fi-snb-i7-2600)
        Subgroup create-close:
                pass       -> INCOMPLETE (ro-ivb2-i7-3770)

fi-bdw-i7-5557u  total:29   pass:23   dwarn:0   dfail:0   fail:0   skip:5  
fi-skl-i5-6260u  total:213  pass:201  dwarn:1   dfail:0   fail:0   skip:11 
fi-skl-i7-6700k  total:213  pass:187  dwarn:1   dfail:0   fail:0   skip:25 
fi-snb-i7-2600   total:30   pass:23   dwarn:0   dfail:0   fail:0   skip:5  
ro-bdw-i5-5250u  total:29   pass:23   dwarn:0   dfail:0   fail:0   skip:5  
ro-bdw-i7-5557U  total:213  pass:197  dwarn:1   dfail:0   fail:0   skip:15 
ro-bdw-i7-5600u  total:26   pass:14   dwarn:0   dfail:0   fail:0   skip:11 
ro-bsw-n3050     total:189  pass:155  dwarn:1   dfail:0   fail:2   skip:30 
ro-byt-n2820     total:30   pass:20   dwarn:0   dfail:0   fail:0   skip:8  
ro-hsw-i3-4010u  total:26   pass:18   dwarn:0   dfail:0   fail:0   skip:7  
ro-hsw-i7-4770r  total:26   pass:18   dwarn:0   dfail:0   fail:0   skip:7  
ro-ilk-i7-620lm  total:30   pass:10   dwarn:0   dfail:0   fail:0   skip:18 
ro-ilk1-i5-650   total:30   pass:14   dwarn:0   dfail:0   fail:0   skip:14 
ro-ivb2-i7-3770  total:31   pass:21   dwarn:0   dfail:1   fail:0   skip:8  
ro-skl3-i5-6260u total:213  pass:201  dwarn:1   dfail:0   fail:0   skip:11 
ro-snb-i7-2620M  total:30   pass:23   dwarn:0   dfail:0   fail:0   skip:5  
fi-hsw-i7-4770k failed to connect after reboot
ro-ivb-i7-3770 failed to connect after reboot

Results at /archive/results/CI_IGT_test/RO_Patchwork_1200/

3eb202e drm-intel-nightly: 2016y-06m-16d-12h-38m-37s UTC integration manifest
65684b6 drm/i915: Disable GuC submission for testing Engine reset patches
8307947 drm/i915/tdr: Enable Engine reset and recovery support
b6ea345 drm/i915/tdr: Export reset count info to debugfs
ef4e35f drm/i915/tdr: Add engine reset count to error state
ea0b384 drm/i915: Port of Added scheduler support to __wait_request() calls
0f1a809 drm/i915: Extending i915_gem_check_wedge to check engine reset in progress
3cb8ac0 drm/i915: Skip reset request if there is one already
e86f130 drm/i915/tdr: Add support for per engine reset recovery
b81330c drm/i915/tdr: Restore engine state and start after reset
ea3164f drm/i915/tdr: Capture engine state before reset
76c2f77 drm/i915/tdr: Prepare execlist submission to handle tdr resubmission after reset
c187d84 drm/i915/tdr: Modify error handler for per engine hang recovery
7c4b258 drm/i915: Reinstate hang recovery work queue.
61754eb drm/i915/tdr: Extend the idea of reset_counter to engine reset
38b06a3 drm/i915: Update i915.reset to handle engine resets


* Re: [PATCH v2 02/15] drm/i915/tdr: Extend the idea of reset_counter to engine reset
  2016-06-17  7:09 ` [PATCH v2 02/15] drm/i915/tdr: Extend the idea of reset_counter to engine reset Arun Siluvery
@ 2016-06-17  7:35   ` Chris Wilson
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2016-06-17  7:35 UTC (permalink / raw)
  To: Arun Siluvery; +Cc: intel-gfx

On Fri, Jun 17, 2016 at 08:09:02AM +0100, Arun Siluvery wrote:
> This change extends the idea of the reset_counter variable to engine reset by
> creating additional variables for each engine. The least significant bit is set
> to mark that an engine reset is pending, and once the reset succeeds the counter
> is incremented again; this is further used to count the number of engine resets.

You still haven't split it up into the separate fields.

atomic_t flags;
atomic_t global_reset_counter;
atomic_t engine_reset_counter[NUM_ENGINES];

That way checking whether a reset in progress is just one atomic read,
not N+1.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [PATCH v2 05/15] drm/i915/tdr: Prepare execlist submission to handle tdr resubmission after reset
  2016-06-17  7:09 ` [PATCH v2 05/15] drm/i915/tdr: Prepare execlist submission to handle tdr resubmission after reset Arun Siluvery
@ 2016-06-17  7:39   ` Chris Wilson
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2016-06-17  7:39 UTC (permalink / raw)
  To: Arun Siluvery; +Cc: intel-gfx, Tomas Elf

On Fri, Jun 17, 2016 at 08:09:05AM +0100, Arun Siluvery wrote:
> To resume execution after an engine reset we resubmit the context, and this
> needs to be treated differently, otherwise we would count it as a completely
> new submission. This change modifies the submission path to account for
> this.
> 
> During resubmission we only submit the head request that caused the hang;
> once this is done we continue with the normal submission of two contexts at
> a time. The intention is to restore the submission state at the time of the
> hang.

Why do we want to resubmit the hanging request (with HEAD incremented by a
couple of dwords)? Kill the job, move on.

How quickly do you decide to actually abandon the request?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [PATCH v2 10/15] drm/i915: Extending i915_gem_check_wedge to check engine reset in progress
  2016-06-17  7:09 ` [PATCH v2 10/15] drm/i915: Extending i915_gem_check_wedge to check engine reset in progress Arun Siluvery
@ 2016-06-17  7:41   ` Chris Wilson
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2016-06-17  7:41 UTC (permalink / raw)
  To: Arun Siluvery; +Cc: intel-gfx, Ian Lister, Tomas Elf

On Fri, Jun 17, 2016 at 08:09:10AM +0100, Arun Siluvery wrote:
> @@ -1515,6 +1541,13 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  			break;
>  		}
>  
> +		reset_pending = i915_engine_reset_pending(&dev_priv->gpu_error,
> +							  NULL);
> +		if (reset_pending) {
> +			ret = -EAGAIN;

You haven't prepared all call paths to handle the error.
Tell me again what you need struct_mutex for?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [PATCH v2 11/15] drm/i915: Port of Added scheduler support to __wait_request() calls
  2016-06-17  7:09 ` [PATCH v2 11/15] drm/i915: Port of Added scheduler support to __wait_request() calls Arun Siluvery
@ 2016-06-17  7:42   ` Chris Wilson
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2016-06-17  7:42 UTC (permalink / raw)
  To: Arun Siluvery; +Cc: intel-gfx, Tomas Elf

On Fri, Jun 17, 2016 at 08:09:11AM +0100, Arun Siluvery wrote:
> This is a partial port of the following patch from John Harrison's GPU
> scheduler patch series: (patch sent to Intel-GFX with the subject line
> "[Intel-gfx] [RFC 19/39] drm/i915: Added scheduler support to __wait_request()
> calls" on Fri 17 July 2015)
> 
> 	Author: John Harrison <John.C.Harrison@Intel.com>
> 	Date:   Thu Apr 10 10:48:55 2014 +0100
> 	Subject: drm/i915: Added scheduler support to __wait_request() calls
> 
> Removed all scheduler references and backported it to this baseline. The reason
> we need this is that Chris Wilson has pointed out that threads that don't
> hold the struct_mutex should not be thrown out of __i915_wait_request during
> TDR hang recovery. Therefore we need a way to determine which threads are
> holding the mutex and which are not.
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Signed-off-by: Tomas Elf <tomas.elf@intel.com>
> Signed-off-by: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |  7 ++++++-
>  drivers/gpu/drm/i915/i915_gem.c         | 34 ++++++++++++++++++++++-----------
>  drivers/gpu/drm/i915/intel_display.c    |  5 +++--
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  8 +++++---
>  4 files changed, 37 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index b2105bb..3e02b41 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3350,8 +3350,13 @@ void __i915_add_request(struct drm_i915_gem_request *req,
>  	__i915_add_request(req, NULL, true)
>  #define i915_add_request_no_flush(req) \
>  	__i915_add_request(req, NULL, false)
> +
> +/* flags used by users of __i915_wait_request */
> +#define I915_WAIT_REQUEST_INTERRUPTIBLE  (1 << 0)
> +#define I915_WAIT_REQUEST_LOCKED         (1 << 1)
> +
>  int __i915_wait_request(struct drm_i915_gem_request *req,
> -			bool interruptible,
> +			u32 flags,
>  			s64 *timeout,
>  			struct intel_rps_client *rps);
>  int __must_check i915_wait_request(struct drm_i915_gem_request *req);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index bc404da..b0c2263 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1455,7 +1455,9 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>  /**
>   * __i915_wait_request - wait until execution of request has finished
>   * @req: duh!
> - * @interruptible: do an interruptible wait (normally yes)
> + * @flags: flags to define the nature of wait
>  + *    I915_WAIT_REQUEST_INTERRUPTIBLE - do an interruptible wait (normally yes)
>  + *    I915_WAIT_REQUEST_LOCKED - caller is holding struct_mutex
>   * @timeout: in - how long to wait (NULL forever); out - how much time remaining
>   * @rps: RPS client
>   *
> @@ -1470,7 +1472,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>   * errno with remaining time filled in timeout argument.
>   */
>  int __i915_wait_request(struct drm_i915_gem_request *req,
> -			bool interruptible,
> +			u32 flags,
>  			s64 *timeout,
>  			struct intel_rps_client *rps)
>  {
> @@ -1478,6 +1480,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  	struct drm_i915_private *dev_priv = req->i915;
>  	const bool irq_test_in_progress =
>  		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
> +	bool interruptible = flags & I915_WAIT_REQUEST_INTERRUPTIBLE;
>  	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
>  	DEFINE_WAIT(wait);
>  	unsigned long timeout_expire;
> @@ -1526,6 +1529,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  	for (;;) {
>  		struct timer_list timer;
>  		int reset_pending;
> +		bool locked = flags & I915_WAIT_REQUEST_LOCKED;
>  
>  		prepare_to_wait(&engine->irq_queue, &wait, state);
>  
> @@ -1543,7 +1547,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  
>  		reset_pending = i915_engine_reset_pending(&dev_priv->gpu_error,
>  							  NULL);
> -		if (reset_pending) {
> +		if (reset_pending || locked) {

Very funny!
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


end of thread, other threads:[~2016-06-17  7:42 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-17  7:09 [PATCH v2 00/15] Execlist based Engine reset and recovery Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 01/15] drm/i915: Update i915.reset to handle engine resets Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 02/15] drm/i915/tdr: Extend the idea of reset_counter to engine reset Arun Siluvery
2016-06-17  7:35   ` Chris Wilson
2016-06-17  7:09 ` [PATCH v2 03/15] drm/i915: Reinstate hang recovery work queue Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 04/15] drm/i915/tdr: Modify error handler for per engine hang recovery Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 05/15] drm/i915/tdr: Prepare execlist submission to handle tdr resubmission after reset Arun Siluvery
2016-06-17  7:39   ` Chris Wilson
2016-06-17  7:09 ` [PATCH v2 06/15] drm/i915/tdr: Capture engine state before reset Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 07/15] drm/i915/tdr: Restore engine state and start after reset Arun Siluvery
2016-06-17  7:32   ` Chris Wilson
2016-06-17  7:09 ` [PATCH v2 08/15] drm/i915/tdr: Add support for per engine reset recovery Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 09/15] drm/i915: Skip reset request if there is one already Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 10/15] drm/i915: Extending i915_gem_check_wedge to check engine reset in progress Arun Siluvery
2016-06-17  7:41   ` Chris Wilson
2016-06-17  7:09 ` [PATCH v2 11/15] drm/i915: Port of Added scheduler support to __wait_request() calls Arun Siluvery
2016-06-17  7:42   ` Chris Wilson
2016-06-17  7:09 ` [PATCH v2 12/15] drm/i915/tdr: Add engine reset count to error state Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 13/15] drm/i915/tdr: Export reset count info to debugfs Arun Siluvery
2016-06-17  7:21   ` Chris Wilson
2016-06-17  7:09 ` [PATCH v2 14/15] drm/i915/tdr: Enable Engine reset and recovery support Arun Siluvery
2016-06-17  7:09 ` [ONLY FOR BAT v2 15/15] drm/i915: Disable GuC submission for testing Engine reset patches Arun Siluvery
2016-06-17  7:33 ` ✗ Ro.CI.BAT: failure for Execlist based Engine reset and recovery Patchwork

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox