intel-gfx.lists.freedesktop.org archive mirror
* [PATCH 1/3] drm/i915: Trim error mask to known engines
@ 2018-03-16 21:49 Chris Wilson
  2018-03-16 21:50 ` [PATCH 2/3] drm/i915: Add control flags to i915_handle_error() Chris Wilson
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Chris Wilson @ 2018-03-16 21:49 UTC (permalink / raw)
  To: intel-gfx

For the convenience of userspace passing in an arbitrary reset mask,
remove unknown engines from the set of engines that are to be reset.
This means that we always follow a per-engine reset with a full-device
reset when userspace writes -1 into debugfs/i915_wedged.

Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 828f3104488c..44eef355e12c 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2985,6 +2985,7 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
 	 */
 	intel_runtime_pm_get(dev_priv);
 
+	engine_mask &= INTEL_INFO(dev_priv)->ring_mask;
 	i915_capture_error_state(dev_priv, engine_mask, error_msg);
 	i915_clear_error_registers(dev_priv);
 
-- 
2.16.2
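
A minimal userspace sketch of the trimming in this patch, for illustration only. The struct device_info type and the trim_engine_mask() helper are hypothetical stand-ins (the patch itself applies the mask inline against INTEL_INFO(dev_priv)->ring_mask); the sketch assumes one bit per engine in a 32-bit mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the device info; one bit per engine present. */
struct device_info {
	uint32_t ring_mask;
};

/* Drop every requested bit that does not name a known engine. */
static uint32_t trim_engine_mask(const struct device_info *info,
				 uint32_t requested)
{
	uint32_t trimmed = requested & info->ring_mask;

	/* After trimming, no unknown engine may remain in the mask. */
	assert((trimmed & ~info->ring_mask) == 0);
	return trimmed;
}
```

With three known engines (ring_mask = 0x7), a request of 0xffffffff, which is what writing -1 amounts to once truncated to 32 bits, would be reduced to 0x7.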

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 2/3] drm/i915: Add control flags to i915_handle_error()
  2018-03-16 21:49 [PATCH 1/3] drm/i915: Trim error mask to known engines Chris Wilson
@ 2018-03-16 21:50 ` Chris Wilson
  2018-03-19 16:48   ` Michel Thierry
  2018-03-16 21:50 ` [PATCH 3/3] drm/i915/execlists: Refactor out complete_preempt_context() Chris Wilson
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2018-03-16 21:50 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

Not all callers want the GPU error to be handled in the same way, so expose
a control parameter. In the first instance, some callers do not want the
heavyweight error capture so add a bit to request the state to be
captured and saved.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c              |  4 ++--
 drivers/gpu/drm/i915/i915_drv.h                  |  4 +++-
 drivers/gpu/drm/i915/i915_irq.c                  | 22 ++++++++++++++--------
 drivers/gpu/drm/i915/intel_hangcheck.c           |  6 +++---
 drivers/gpu/drm/i915/selftests/intel_hangcheck.c |  2 +-
 5 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 5378863e3238..03b74a92caed 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3952,8 +3952,8 @@ i915_wedged_set(void *data, u64 val)
 		engine->hangcheck.stalled = true;
 	}
 
-	i915_handle_error(i915, val, "Manually set wedged engine mask = %llx",
-			  val);
+	i915_handle_error(i915, val, I915_ERROR_CAPTURE,
+			  "Manually set wedged engine mask = %llx", val);
 
 	wait_on_bit(&i915->gpu_error.flags,
 		    I915_RESET_HANDOFF,
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e27ba8fb64e6..53009ba50640 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2751,10 +2751,12 @@ static inline void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
 			   &dev_priv->gpu_error.hangcheck_work, delay);
 }
 
-__printf(3, 4)
+__printf(4, 5)
 void i915_handle_error(struct drm_i915_private *dev_priv,
 		       u32 engine_mask,
+		       unsigned long flags,
 		       const char *fmt, ...);
+#define I915_ERROR_CAPTURE BIT(0)
 
 extern void intel_irq_init(struct drm_i915_private *dev_priv);
 extern void intel_irq_fini(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 44eef355e12c..b3a4dc7cb26c 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2955,6 +2955,7 @@ static void i915_clear_error_registers(struct drm_i915_private *dev_priv)
  * i915_handle_error - handle a gpu error
  * @dev_priv: i915 device private
  * @engine_mask: mask representing engines that are hung
+ * @flags: control flags
  * @fmt: Error message format string
  *
  * Do some basic checking of register state at error time and
@@ -2965,16 +2966,11 @@ static void i915_clear_error_registers(struct drm_i915_private *dev_priv)
  */
 void i915_handle_error(struct drm_i915_private *dev_priv,
 		       u32 engine_mask,
+		       unsigned long flags,
 		       const char *fmt, ...)
 {
 	struct intel_engine_cs *engine;
 	unsigned int tmp;
-	va_list args;
-	char error_msg[80];
-
-	va_start(args, fmt);
-	vscnprintf(error_msg, sizeof(error_msg), fmt, args);
-	va_end(args);
 
 	/*
 	 * In most cases it's guaranteed that we get here with an RPM
@@ -2986,8 +2982,18 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
 	intel_runtime_pm_get(dev_priv);
 
 	engine_mask &= INTEL_INFO(dev_priv)->ring_mask;
-	i915_capture_error_state(dev_priv, engine_mask, error_msg);
-	i915_clear_error_registers(dev_priv);
+
+	if (flags & I915_ERROR_CAPTURE) {
+		char error_msg[80];
+		va_list args;
+
+		va_start(args, fmt);
+		vscnprintf(error_msg, sizeof(error_msg), fmt, args);
+		va_end(args);
+
+		i915_capture_error_state(dev_priv, engine_mask, error_msg);
+		i915_clear_error_registers(dev_priv);
+	}
 
 	/*
 	 * Try engine reset when available. We fall back to full reset if
diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index 42e45ae87393..13d1a269c771 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -246,7 +246,7 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
 	 */
 	tmp = I915_READ_CTL(engine);
 	if (tmp & RING_WAIT) {
-		i915_handle_error(dev_priv, 0,
+		i915_handle_error(dev_priv, 0, 0,
 				  "Kicking stuck wait on %s",
 				  engine->name);
 		I915_WRITE_CTL(engine, tmp);
@@ -258,7 +258,7 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
 		default:
 			return ENGINE_DEAD;
 		case 1:
-			i915_handle_error(dev_priv, 0,
+			i915_handle_error(dev_priv, 0, 0,
 					  "Kicking stuck semaphore on %s",
 					  engine->name);
 			I915_WRITE_CTL(engine, tmp);
@@ -392,7 +392,7 @@ static void hangcheck_declare_hang(struct drm_i915_private *i915,
 				 "%s, ", engine->name);
 	msg[len-2] = '\0';
 
-	return i915_handle_error(i915, hung, "%s", msg);
+	return i915_handle_error(i915, hung, I915_ERROR_CAPTURE, "%s", msg);
 }
 
 /*
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index df7898c8edcb..84aad2b4825d 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -1084,7 +1084,7 @@ static int igt_handle_error(void *arg)
 	engine->hangcheck.stalled = true;
 	engine->hangcheck.seqno = intel_engine_get_seqno(engine);
 
-	i915_handle_error(i915, intel_engine_flag(engine), "%s", __func__);
+	i915_handle_error(i915, intel_engine_flag(engine), 0, "%s", __func__);
 
 	xchg(&i915->gpu_error.first_error, error);
 
-- 
2.16.2
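
The shape of this change can be sketched in plain userspace C, assuming a simplified handler. The names handle_error(), capture_error_state(), ERROR_CAPTURE, last_capture and capture_count are hypothetical; vsnprintf() stands in for the kernel's vscnprintf(), and the format attribute is the GCC/Clang spelling of what __printf() provides in the kernel. The point is that the message is only formatted when the caller asks for a capture.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define ERROR_CAPTURE (1UL << 0) /* stand-in for I915_ERROR_CAPTURE */

static char last_capture[80]; /* last message handed to the capture path */
static int capture_count;     /* how many captures were requested */

/* Hypothetical stand-in for the heavyweight error-state capture. */
static void capture_error_state(const char *msg)
{
	snprintf(last_capture, sizeof(last_capture), "%s", msg);
	capture_count++;
}

/* fmt is argument 2 and the variadic arguments start at 3. */
__attribute__((format(printf, 2, 3)))
static void handle_error(unsigned long flags, const char *fmt, ...)
{
	if (flags & ERROR_CAPTURE) {
		char error_msg[80];
		va_list args;

		/* Format the message only when it will actually be used. */
		va_start(args, fmt);
		vsnprintf(error_msg, sizeof(error_msg), fmt, args);
		va_end(args);

		capture_error_state(error_msg);
	}

	/* Reset handling would follow here regardless of the flags. */
}
```

A caller passing 0 for flags skips both the formatting and the capture, which mirrors the hangcheck "kick" callers in the patch.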


* [PATCH 3/3] drm/i915/execlists: Refactor out complete_preempt_context()
  2018-03-16 21:49 [PATCH 1/3] drm/i915: Trim error mask to known engines Chris Wilson
  2018-03-16 21:50 ` [PATCH 2/3] drm/i915: Add control flags to i915_handle_error() Chris Wilson
@ 2018-03-16 21:50 ` Chris Wilson
  2018-03-16 22:19 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Trim error mask to known engines Patchwork
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2018-03-16 21:50 UTC (permalink / raw)
  To: intel-gfx

As a complement to inject_preempt_context(), follow up with the function
to handle its completion. This will be useful should we wish to extend
the duties of the preempt-context for execlists.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 53f1c009ed7b..0bfaeb56b8c7 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -531,8 +531,17 @@ static void inject_preempt_context(struct intel_engine_cs *engine)
 	if (execlists->ctrl_reg)
 		writel(EL_CTRL_LOAD, execlists->ctrl_reg);
 
-	execlists_clear_active(&engine->execlists, EXECLISTS_ACTIVE_HWACK);
-	execlists_set_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT);
+	execlists_clear_active(execlists, EXECLISTS_ACTIVE_HWACK);
+	execlists_set_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
+}
+
+static void complete_preempt_context(struct intel_engine_execlists *execlists)
+{
+	execlists_cancel_port_requests(execlists);
+	execlists_unwind_incomplete_requests(execlists);
+
+	GEM_BUG_ON(!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT));
+	execlists_clear_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
 }
 
 static void execlists_dequeue(struct intel_engine_cs *engine)
@@ -939,14 +948,7 @@ static void execlists_submission_tasklet(unsigned long data)
 			if (status & GEN8_CTX_STATUS_COMPLETE &&
 			    buf[2*head + 1] == execlists->preempt_complete_status) {
 				GEM_TRACE("%s preempt-idle\n", engine->name);
-
-				execlists_cancel_port_requests(execlists);
-				execlists_unwind_incomplete_requests(execlists);
-
-				GEM_BUG_ON(!execlists_is_active(execlists,
-								EXECLISTS_ACTIVE_PREEMPT));
-				execlists_clear_active(execlists,
-						       EXECLISTS_ACTIVE_PREEMPT);
+				complete_preempt_context(execlists);
 				continue;
 			}
 
-- 
2.16.2
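
The inject/complete pairing in this patch can be illustrated with a small state sketch. Everything here is an assumption made for illustration: struct execlists_state, the ACTIVE_* values and both helpers are hypothetical, the state is modelled as a plain bitmask (the driver's own helpers may represent it differently), and assert() stands in for GEM_BUG_ON(). Cancelling ports and unwinding requests are left out.

```c
#include <assert.h>

/* Hypothetical bitmask flags modelling the execlists "active" state. */
enum {
	ACTIVE_HWACK   = 1u << 0,
	ACTIVE_PREEMPT = 1u << 1,
};

struct execlists_state {
	unsigned int active;
};

/* Start a preemption: drop the HW ack and mark preemption in flight. */
static void inject_preempt(struct execlists_state *el)
{
	el->active &= ~ACTIVE_HWACK;
	el->active |= ACTIVE_PREEMPT;
}

/* Finish a preemption: it must have been in flight; then clear the mark. */
static void complete_preempt(struct execlists_state *el)
{
	assert(el->active & ACTIVE_PREEMPT);
	el->active &= ~ACTIVE_PREEMPT;
}
```

Keeping the completion steps in one helper means a later extension of the preempt-context's duties only has to touch the two paired functions.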


* ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Trim error mask to known engines
  2018-03-16 21:49 [PATCH 1/3] drm/i915: Trim error mask to known engines Chris Wilson
  2018-03-16 21:50 ` [PATCH 2/3] drm/i915: Add control flags to i915_handle_error() Chris Wilson
  2018-03-16 21:50 ` [PATCH 3/3] drm/i915/execlists: Refactor out complete_preempt_context() Chris Wilson
@ 2018-03-16 22:19 ` Patchwork
  2018-03-17  3:26 ` ✗ Fi.CI.IGT: warning " Patchwork
  2018-03-19 13:12 ` [PATCH 1/3] " Chris Wilson
  4 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2018-03-16 22:19 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/3] drm/i915: Trim error mask to known engines
URL   : https://patchwork.freedesktop.org/series/40130/
State : success

== Summary ==

Series 40130v1 series starting with [1/3] drm/i915: Trim error mask to known engines
https://patchwork.freedesktop.org/api/1.0/series/40130/revisions/1/mbox/

---- Known issues:

Test gem_mmap_gtt:
        Subgroup basic-small-bo-tiledx:
                pass       -> FAIL       (fi-gdg-551) fdo#102575
Test kms_pipe_crc_basic:
        Subgroup nonblocking-crc-pipe-b-frame-sequence:
                pass       -> FAIL       (fi-cfl-s2) fdo#103481

fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575
fdo#103481 https://bugs.freedesktop.org/show_bug.cgi?id=103481

fi-bdw-5557u     total:285  pass:264  dwarn:0   dfail:0   fail:0   skip:21  time:434s
fi-bdw-gvtdvm    total:285  pass:261  dwarn:0   dfail:0   fail:0   skip:24  time:440s
fi-blb-e6850     total:285  pass:220  dwarn:1   dfail:0   fail:0   skip:64  time:381s
fi-bsw-n3050     total:285  pass:239  dwarn:0   dfail:0   fail:0   skip:46  time:537s
fi-bwr-2160      total:285  pass:180  dwarn:0   dfail:0   fail:0   skip:105 time:296s
fi-bxt-dsi       total:285  pass:255  dwarn:0   dfail:0   fail:0   skip:30  time:515s
fi-bxt-j4205     total:285  pass:256  dwarn:0   dfail:0   fail:0   skip:29  time:513s
fi-byt-j1900     total:285  pass:250  dwarn:0   dfail:0   fail:0   skip:35  time:520s
fi-byt-n2820     total:285  pass:246  dwarn:0   dfail:0   fail:0   skip:39  time:505s
fi-cfl-8700k     total:285  pass:257  dwarn:0   dfail:0   fail:0   skip:28  time:412s
fi-cfl-s2        total:285  pass:258  dwarn:0   dfail:0   fail:1   skip:26  time:588s
fi-cfl-u         total:285  pass:259  dwarn:0   dfail:0   fail:0   skip:26  time:509s
fi-cnl-drrs      total:285  pass:254  dwarn:3   dfail:0   fail:0   skip:28  time:535s
fi-cnl-y3        total:285  pass:259  dwarn:0   dfail:0   fail:0   skip:26  time:598s
fi-elk-e7500     total:285  pass:225  dwarn:1   dfail:0   fail:0   skip:59  time:430s
fi-gdg-551       total:285  pass:176  dwarn:0   dfail:0   fail:1   skip:108 time:337s
fi-glk-1         total:285  pass:257  dwarn:0   dfail:0   fail:0   skip:28  time:541s
fi-hsw-4770      total:285  pass:258  dwarn:0   dfail:0   fail:0   skip:27  time:404s
fi-ilk-650       total:285  pass:225  dwarn:0   dfail:0   fail:0   skip:60  time:421s
fi-ivb-3520m     total:285  pass:256  dwarn:0   dfail:0   fail:0   skip:29  time:474s
fi-ivb-3770      total:285  pass:252  dwarn:0   dfail:0   fail:0   skip:33  time:428s
fi-kbl-7500u     total:285  pass:260  dwarn:1   dfail:0   fail:0   skip:24  time:477s
fi-kbl-7567u     total:285  pass:265  dwarn:0   dfail:0   fail:0   skip:20  time:470s
fi-kbl-r         total:285  pass:258  dwarn:0   dfail:0   fail:0   skip:27  time:513s
fi-pnv-d510      total:285  pass:219  dwarn:1   dfail:0   fail:0   skip:65  time:658s
fi-skl-6260u     total:285  pass:265  dwarn:0   dfail:0   fail:0   skip:20  time:443s
fi-skl-6600u     total:285  pass:258  dwarn:0   dfail:0   fail:0   skip:27  time:538s
fi-skl-6700hq    total:285  pass:259  dwarn:0   dfail:0   fail:0   skip:26  time:542s
fi-skl-6700k2    total:285  pass:261  dwarn:0   dfail:0   fail:0   skip:24  time:514s
fi-skl-6770hq    total:285  pass:265  dwarn:0   dfail:0   fail:0   skip:20  time:504s
fi-skl-guc       total:285  pass:257  dwarn:0   dfail:0   fail:0   skip:28  time:429s
fi-skl-gvtdvm    total:285  pass:262  dwarn:0   dfail:0   fail:0   skip:23  time:449s
fi-snb-2520m     total:285  pass:245  dwarn:0   dfail:0   fail:0   skip:40  time:612s
fi-snb-2600      total:285  pass:245  dwarn:0   dfail:0   fail:0   skip:40  time:404s

fa73baa35269b86a26e10f36bc35d3b657e2e932 drm-tip: 2018y-03m-16d-19h-43m-50s UTC integration manifest
86b3fd65972f drm/i915/execlists: Refactor out complete_preempt_context()
db534d0b84dc drm/i915: Add control flags to i915_handle_error()
bae6ba421c75 drm/i915: Trim error mask to known engines

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8383/issues.html

* ✗ Fi.CI.IGT: warning for series starting with [1/3] drm/i915: Trim error mask to known engines
  2018-03-16 21:49 [PATCH 1/3] drm/i915: Trim error mask to known engines Chris Wilson
                   ` (2 preceding siblings ...)
  2018-03-16 22:19 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Trim error mask to known engines Patchwork
@ 2018-03-17  3:26 ` Patchwork
  2018-03-19 13:12 ` [PATCH 1/3] " Chris Wilson
  4 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2018-03-17  3:26 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/3] drm/i915: Trim error mask to known engines
URL   : https://patchwork.freedesktop.org/series/40130/
State : warning

== Summary ==

---- Possible new issues:

Test kms_cursor_crc:
        Subgroup cursor-128x128-suspend:
                pass       -> SKIP       (shard-snb)
Test kms_frontbuffer_tracking:
        Subgroup fbc-2p-primscrn-indfb-plflip-blt:
                pass       -> SKIP       (shard-hsw)
Test pm_rc6_residency:
        Subgroup rc6-accuracy:
                pass       -> SKIP       (shard-snb)

---- Known issues:

Test drv_suspend:
        Subgroup debugfs-reader:
                pass       -> SKIP       (shard-snb) fdo#102365
Test kms_flip:
        Subgroup plain-flip-fb-recreate:
                pass       -> FAIL       (shard-hsw) fdo#100368 +2
Test kms_frontbuffer_tracking:
        Subgroup fbc-2p-shrfb-fliptrack:
                pass       -> SKIP       (shard-hsw) fdo#101623
Test kms_pipe_crc_basic:
        Subgroup hang-read-crc-pipe-c:
                fail       -> PASS       (shard-apl) fdo#103191
Test kms_plane:
        Subgroup plane-panning-bottom-right-suspend-pipe-c-planes:
                pass       -> INCOMPLETE (shard-hsw) fdo#103375
Test kms_sysfs_edid_timing:
                warn       -> PASS       (shard-apl) fdo#100047

fdo#102365 https://bugs.freedesktop.org/show_bug.cgi?id=102365
fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
fdo#101623 https://bugs.freedesktop.org/show_bug.cgi?id=101623
fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191
fdo#103375 https://bugs.freedesktop.org/show_bug.cgi?id=103375
fdo#100047 https://bugs.freedesktop.org/show_bug.cgi?id=100047

shard-apl        total:3442 pass:1815 dwarn:1   dfail:0   fail:7   skip:1619 time:13013s
shard-hsw        total:3369 pass:1728 dwarn:1   dfail:0   fail:3   skip:1635 time:11699s
shard-snb        total:3442 pass:1356 dwarn:1   dfail:0   fail:2   skip:2083 time:7191s
Blacklisted hosts:
shard-kbl        total:3442 pass:1936 dwarn:1   dfail:1   fail:9   skip:1495 time:9778s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8383/shards.html

* Re: [PATCH 1/3] drm/i915: Trim error mask to known engines
  2018-03-16 21:49 [PATCH 1/3] drm/i915: Trim error mask to known engines Chris Wilson
                   ` (3 preceding siblings ...)
  2018-03-17  3:26 ` ✗ Fi.CI.IGT: warning " Patchwork
@ 2018-03-19 13:12 ` Chris Wilson
  2018-03-19 16:31   ` Michel Thierry
  4 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2018-03-19 13:12 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika

Quoting Chris Wilson (2018-03-16 21:49:59)
> For the convenience of userspace passing in an arbitrary reset mask,
> remove unknown engines from the set of engines that are to be reset.
> This means that we always follow a per-engine reset with a full-device
> reset when userspace writes -1 into debugfs/i915_wedged.
> 
> Reported-by: Michał Winiarski <michal.winiarski@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>

Please? It papers over the issue in gem_exec_capture...
-Chris

> ---
>  drivers/gpu/drm/i915/i915_irq.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 828f3104488c..44eef355e12c 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2985,6 +2985,7 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
>          */
>         intel_runtime_pm_get(dev_priv);
>  
> +       engine_mask &= INTEL_INFO(dev_priv)->ring_mask;
>         i915_capture_error_state(dev_priv, engine_mask, error_msg);
>         i915_clear_error_registers(dev_priv);
>  
> -- 
> 2.16.2
> 

* Re: [PATCH 1/3] drm/i915: Trim error mask to known engines
  2018-03-19 13:12 ` [PATCH 1/3] " Chris Wilson
@ 2018-03-19 16:31   ` Michel Thierry
  2018-03-19 17:04     ` Chris Wilson
  2018-03-19 17:16     ` Chris Wilson
  0 siblings, 2 replies; 11+ messages in thread
From: Michel Thierry @ 2018-03-19 16:31 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx@lists.freedesktop.org; +Cc: Mika@freedesktop.org

On 19/03/18 06:12, Chris Wilson wrote:
> Quoting Chris Wilson (2018-03-16 21:49:59)
>> For the convenience of userspace passing in an arbitrary reset mask,
>> remove unknown engines from the set of engines that are to be reset.
>> This means that we always follow a per-engine reset with a full-device
>> reset when userspace writes -1 into debugfs/i915_wedged.

I thought that was the desired behaviour...

>>
>> Reported-by: Michał Winiarski <michal.winiarski@intel.com>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> Cc: Michał Winiarski <michal.winiarski@intel.com>
> 
> Please? It papers over the issue in gem_exec_capture...
> -Chris
> 

Reviewed-by: Michel Thierry <michel.thierry@intel.com>

>> ---
>>   drivers/gpu/drm/i915/i915_irq.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>> index 828f3104488c..44eef355e12c 100644
>> --- a/drivers/gpu/drm/i915/i915_irq.c
>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>> @@ -2985,6 +2985,7 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
>>           */
>>          intel_runtime_pm_get(dev_priv);
>>   
>> +       engine_mask &= INTEL_INFO(dev_priv)->ring_mask;
>>          i915_capture_error_state(dev_priv, engine_mask, error_msg);
>>          i915_clear_error_registers(dev_priv);
>>   
>> -- 
>> 2.16.2
>>

* Re: [PATCH 2/3] drm/i915: Add control flags to i915_handle_error()
  2018-03-16 21:50 ` [PATCH 2/3] drm/i915: Add control flags to i915_handle_error() Chris Wilson
@ 2018-03-19 16:48   ` Michel Thierry
  2018-03-19 17:08     ` Chris Wilson
  0 siblings, 1 reply; 11+ messages in thread
From: Michel Thierry @ 2018-03-19 16:48 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx@lists.freedesktop.org; +Cc: Kuoppala, Mika

On 16/03/18 14:50, Chris Wilson wrote:
> Not all callers want the GPU error to be handled in the same way, so expose
> a control parameter. In the first instance, some callers do not want the
> heavyweight error capture so add a bit to request the state to be
> captured and saved.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Jeff McGee <jeff.mcgee@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c              |  4 ++--
>   drivers/gpu/drm/i915/i915_drv.h                  |  4 +++-
>   drivers/gpu/drm/i915/i915_irq.c                  | 22 ++++++++++++++--------
>   drivers/gpu/drm/i915/intel_hangcheck.c           |  6 +++---
>   drivers/gpu/drm/i915/selftests/intel_hangcheck.c |  2 +-
>   5 files changed, 23 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 5378863e3238..03b74a92caed 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -3952,8 +3952,8 @@ i915_wedged_set(void *data, u64 val)
>   		engine->hangcheck.stalled = true;
>   	}
>   
> -	i915_handle_error(i915, val, "Manually set wedged engine mask = %llx",
> -			  val);
> +	i915_handle_error(i915, val, I915_ERROR_CAPTURE,
> +			  "Manually set wedged engine mask = %llx", val);
>   
>   	wait_on_bit(&i915->gpu_error.flags,
>   		    I915_RESET_HANDOFF,
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index e27ba8fb64e6..53009ba50640 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2751,10 +2751,12 @@ static inline void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
>   			   &dev_priv->gpu_error.hangcheck_work, delay);
>   }
>   
> -__printf(3, 4)
> +__printf(4, 5)
>   void i915_handle_error(struct drm_i915_private *dev_priv,
>   		       u32 engine_mask,
> +		       unsigned long flags,
>   		       const char *fmt, ...);
> +#define I915_ERROR_CAPTURE BIT(0)
>   
Should this be in the new i915_gpu_error.h?

>   extern void intel_irq_init(struct drm_i915_private *dev_priv);
>   extern void intel_irq_fini(struct drm_i915_private *dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 44eef355e12c..b3a4dc7cb26c 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2955,6 +2955,7 @@ static void i915_clear_error_registers(struct drm_i915_private *dev_priv)
>    * i915_handle_error - handle a gpu error
>    * @dev_priv: i915 device private
>    * @engine_mask: mask representing engines that are hung
> + * @flags: control flags
>    * @fmt: Error message format string
>    *
>    * Do some basic checking of register state at error time and
> @@ -2965,16 +2966,11 @@ static void i915_clear_error_registers(struct drm_i915_private *dev_priv)
>    */
>   void i915_handle_error(struct drm_i915_private *dev_priv,
>   		       u32 engine_mask,
> +		       unsigned long flags,
>   		       const char *fmt, ...)
>   {
>   	struct intel_engine_cs *engine;
>   	unsigned int tmp;
> -	va_list args;
> -	char error_msg[80];
> -
> -	va_start(args, fmt);
> -	vscnprintf(error_msg, sizeof(error_msg), fmt, args);
> -	va_end(args);
>   
>   	/*
>   	 * In most cases it's guaranteed that we get here with an RPM
> @@ -2986,8 +2982,18 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
>   	intel_runtime_pm_get(dev_priv);
>   
>   	engine_mask &= INTEL_INFO(dev_priv)->ring_mask;
> -	i915_capture_error_state(dev_priv, engine_mask, error_msg);
> -	i915_clear_error_registers(dev_priv);
> +
> +	if (flags & I915_ERROR_CAPTURE) {
> +		char error_msg[80];
> +		va_list args;
> +
> +		va_start(args, fmt);
> +		vscnprintf(error_msg, sizeof(error_msg), fmt, args);
> +		va_end(args);
> +
> +		i915_capture_error_state(dev_priv, engine_mask, error_msg);
> +		i915_clear_error_registers(dev_priv);
> +	}
         else
	    DRM_INFO or DRM_NOTE the error_msg  ?

Otherwise the 'kicking wait/semaphore' text from below will never be 
printed.
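
One way to read this suggestion, as a userspace sketch under stated assumptions: format the message before the branch, then either capture it or log it. The names handle_error_logged(), ERROR_CAPTURE, captured and logged are hypothetical, and writing into the logged buffer merely stands in for a DRM_INFO()/DRM_NOTE() call.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define ERROR_CAPTURE (1UL << 0) /* stand-in for I915_ERROR_CAPTURE */

static char captured[80]; /* message handed to the capture path */
static char logged[80];   /* message that would have gone to the log */

static void handle_error_logged(unsigned long flags, const char *fmt, ...)
{
	char error_msg[80];
	va_list args;

	/* Format up front so that both branches can use the message. */
	va_start(args, fmt);
	vsnprintf(error_msg, sizeof(error_msg), fmt, args);
	va_end(args);

	if (flags & ERROR_CAPTURE)
		snprintf(captured, sizeof(captured), "%s", error_msg);
	else
		snprintf(logged, sizeof(logged), "%s", error_msg);
}
```

The trade-off is that the formatting cost is paid on every call, including the lightweight ones that the flag was meant to keep cheap.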

>   
>   	/*
>   	 * Try engine reset when available. We fall back to full reset if
> diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> index 42e45ae87393..13d1a269c771 100644
> --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> @@ -246,7 +246,7 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
>   	 */
>   	tmp = I915_READ_CTL(engine);
>   	if (tmp & RING_WAIT) {
> -		i915_handle_error(dev_priv, 0,
> +		i915_handle_error(dev_priv, 0, 0,
>   				  "Kicking stuck wait on %s",
>   				  engine->name);
>   		I915_WRITE_CTL(engine, tmp);
> @@ -258,7 +258,7 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
>   		default:
>   			return ENGINE_DEAD;
>   		case 1:
> -			i915_handle_error(dev_priv, 0,
> +			i915_handle_error(dev_priv, 0, 0,
>   					  "Kicking stuck semaphore on %s",
>   					  engine->name);
>   			I915_WRITE_CTL(engine, tmp);
> @@ -392,7 +392,7 @@ static void hangcheck_declare_hang(struct drm_i915_private *i915,
>   				 "%s, ", engine->name);
>   	msg[len-2] = '\0';
>   
> -	return i915_handle_error(i915, hung, "%s", msg);
> +	return i915_handle_error(i915, hung, I915_ERROR_CAPTURE, "%s", msg);
>   }
>   
>   /*
> diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> index df7898c8edcb..84aad2b4825d 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> @@ -1084,7 +1084,7 @@ static int igt_handle_error(void *arg)
>   	engine->hangcheck.stalled = true;
>   	engine->hangcheck.seqno = intel_engine_get_seqno(engine);
>   
> -	i915_handle_error(i915, intel_engine_flag(engine), "%s", __func__);
> +	i915_handle_error(i915, intel_engine_flag(engine), 0, "%s", __func__);
>   
>   	xchg(&i915->gpu_error.first_error, error);
>   
> 

* Re: [PATCH 1/3] drm/i915: Trim error mask to known engines
  2018-03-19 16:31   ` Michel Thierry
@ 2018-03-19 17:04     ` Chris Wilson
  2018-03-19 17:16     ` Chris Wilson
  1 sibling, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2018-03-19 17:04 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx@lists.freedesktop.org; +Cc: Mika@freedesktop.org

Quoting Michel Thierry (2018-03-19 16:31:05)
> On 19/03/18 06:12, Chris Wilson wrote:
> > Quoting Chris Wilson (2018-03-16 21:49:59)
> >> For the convenience of userspace passing in an arbitrary reset mask,
> >> remove unknown engines from the set of engines that are to be reset.
> >> This means that we always follow a per-engine reset with a full-device
> >> reset when userspace writes -1 into debugfs/i915_wedged.
> 
> I thought that was the desired behaviour...

The name has been misleading for a few years, ever since we actually had
working GPU reset.
-Chris

* Re: [PATCH 2/3] drm/i915: Add control flags to i915_handle_error()
  2018-03-19 16:48   ` Michel Thierry
@ 2018-03-19 17:08     ` Chris Wilson
  0 siblings, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2018-03-19 17:08 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx@lists.freedesktop.org; +Cc: Kuoppala, Mika

Quoting Michel Thierry (2018-03-19 16:48:01)
> On 16/03/18 14:50, Chris Wilson wrote:
> > Not all callers want the GPU error to be handled in the same way, so expose
> > a control parameter. In the first instance, some callers do not want the
> > heavyweight error capture so add a bit to request the state to be
> > captured and saved.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Jeff McGee <jeff.mcgee@intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_debugfs.c              |  4 ++--
> >   drivers/gpu/drm/i915/i915_drv.h                  |  4 +++-
> >   drivers/gpu/drm/i915/i915_irq.c                  | 22 ++++++++++++++--------
> >   drivers/gpu/drm/i915/intel_hangcheck.c           |  6 +++---
> >   drivers/gpu/drm/i915/selftests/intel_hangcheck.c |  2 +-
> >   5 files changed, 23 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 5378863e3238..03b74a92caed 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -3952,8 +3952,8 @@ i915_wedged_set(void *data, u64 val)
> >               engine->hangcheck.stalled = true;
> >       }
> >   
> > -     i915_handle_error(i915, val, "Manually set wedged engine mask = %llx",
> > -                       val);
> > +     i915_handle_error(i915, val, I915_ERROR_CAPTURE,
> > +                       "Manually set wedged engine mask = %llx", val);
> >   
> >       wait_on_bit(&i915->gpu_error.flags,
> >                   I915_RESET_HANDOFF,
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index e27ba8fb64e6..53009ba50640 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2751,10 +2751,12 @@ static inline void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
> >                          &dev_priv->gpu_error.hangcheck_work, delay);
> >   }
> >   
> > -__printf(3, 4)
> > +__printf(4, 5)
> >   void i915_handle_error(struct drm_i915_private *dev_priv,
> >                      u32 engine_mask,
> > +                    unsigned long flags,
> >                      const char *fmt, ...);
> > +#define I915_ERROR_CAPTURE BIT(0)
> >   
> Should this be in the new i915_gpu_error.h?

And move it over from i915_irq.c to somewhere more useful? Probably
intel_hangcheck.c since i915_gpu_error.c is conditionally compiled.

> >   extern void intel_irq_init(struct drm_i915_private *dev_priv);
> >   extern void intel_irq_fini(struct drm_i915_private *dev_priv);
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index 44eef355e12c..b3a4dc7cb26c 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -2955,6 +2955,7 @@ static void i915_clear_error_registers(struct drm_i915_private *dev_priv)
> >    * i915_handle_error - handle a gpu error
> >    * @dev_priv: i915 device private
> >    * @engine_mask: mask representing engines that are hung
> > + * @flags: control flags
> >    * @fmt: Error message format string
> >    *
> >    * Do some basic checking of register state at error time and
> > @@ -2965,16 +2966,11 @@ static void i915_clear_error_registers(struct drm_i915_private *dev_priv)
> >    */
> >   void i915_handle_error(struct drm_i915_private *dev_priv,
> >                      u32 engine_mask,
> > +                    unsigned long flags,
> >                      const char *fmt, ...)
> >   {
> >       struct intel_engine_cs *engine;
> >       unsigned int tmp;
> > -     va_list args;
> > -     char error_msg[80];
> > -
> > -     va_start(args, fmt);
> > -     vscnprintf(error_msg, sizeof(error_msg), fmt, args);
> > -     va_end(args);
> >   
> >       /*
> >        * In most cases it's guaranteed that we get here with an RPM
> > @@ -2986,8 +2982,18 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
> >       intel_runtime_pm_get(dev_priv);
> >   
> >       engine_mask &= INTEL_INFO(dev_priv)->ring_mask;
> > -     i915_capture_error_state(dev_priv, engine_mask, error_msg);
> > -     i915_clear_error_registers(dev_priv);
> > +
> > +     if (flags & I915_ERROR_CAPTURE) {
> > +             char error_msg[80];
> > +             va_list args;
> > +
> > +             va_start(args, fmt);
> > +             vscnprintf(error_msg, sizeof(error_msg), fmt, args);
> > +             va_end(args);
> > +
> > +             i915_capture_error_state(dev_priv, engine_mask, error_msg);
> > +             i915_clear_error_registers(dev_priv);
> > +     }
>          else
>             DRM_INFO or DRM_NOTE the error_msg  ?
> 
> Otherwise the 'kicking wait/semaphore' text from below will never be 
> printed.

I liked it disappearing ;)

So we should feed it to i915_reset(const char *reason). That could then
probably replace I915_RESET_QUIET.
-Chris
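
A rough userspace sketch of the idea above, under stated assumptions: the message is formatted unconditionally, the capture step stays behind the flag, and the same string is handed on as the reset reason so it is not lost when capture is skipped. Everything here is a hypothetical stand-in (ERROR_CAPTURE for I915_ERROR_CAPTURE, do_reset() for i915_reset(const char *reason), vsnprintf() for the kernel's vscnprintf()); it is one possible reading of the suggestion, not the eventual patch.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define ERROR_CAPTURE (1UL << 0) /* hypothetical stand-in for I915_ERROR_CAPTURE */

/* Stand-ins that only record what they were given. */
static char last_captured[80];
static char last_reason[80];

static void capture_error_state(const char *msg)
{
	snprintf(last_captured, sizeof(last_captured), "%s", msg);
}

static void do_reset(const char *reason)
{
	snprintf(last_reason, sizeof(last_reason), "%s", reason);
}

/*
 * The format string is argument 2 and the variadic arguments start at 3.
 * Adding a parameter before fmt shifts both indices by one, which is why
 * the quoted patch changes __printf(3, 4) to __printf(4, 5).
 */
#if defined(__GNUC__)
__attribute__((format(printf, 2, 3)))
#endif
static void handle_error(unsigned long flags, const char *fmt, ...)
{
	char msg[80];
	va_list args;

	/* Format once, regardless of flags, so the reason is always available. */
	va_start(args, fmt);
	vsnprintf(msg, sizeof(msg), fmt, args);
	va_end(args);

	/* Only the heavyweight capture is optional. */
	if (flags & ERROR_CAPTURE)
		capture_error_state(msg);

	do_reset(msg);
}
```

With this shape a caller passing flags == 0 still gets its "kicking ..." text reported through the reset path, which is the concern raised in the review comment.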

* Re: [PATCH 1/3] drm/i915: Trim error mask to known engines
  2018-03-19 16:31   ` Michel Thierry
  2018-03-19 17:04     ` Chris Wilson
@ 2018-03-19 17:16     ` Chris Wilson
  1 sibling, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2018-03-19 17:16 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx@lists.freedesktop.org; +Cc: Mika@freedesktop.org

Quoting Michel Thierry (2018-03-19 16:31:05)
> On 19/03/18 06:12, Chris Wilson wrote:
> > Quoting Chris Wilson (2018-03-16 21:49:59)
> >> For the convenience of userspace passing in an arbitrary reset mask,
> >> remove unknown engines from the set of engines that are to be reset.
> >> This means that we always follow a per-engine reset with a full-device
> >> reset when userspace writes -1 into debugfs/i915_wedged.
> 
> I thought that was the desired behaviour...
> 
> >>
> >> Reported-by: Michał Winiarski <michal.winiarski@intel.com>
> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> >> Cc: Michał Winiarski <michal.winiarski@intel.com>
> > 
> > Please? It papers over the issue in gem_exec_capture...
> > -Chris
> > 
> 
> Reviewed-by: Michel Thierry <michel.thierry@intel.com>

Snaffled it up so that CI stops spuriously tripping over
gem_exec_capture. We should arrange some fault-injection testing for
live_hangcheck selftests.
-Chris

end of thread: 2018-03-19 17:16 UTC

Thread overview: 11+ messages
2018-03-16 21:49 [PATCH 1/3] drm/i915: Trim error mask to known engines Chris Wilson
2018-03-16 21:50 ` [PATCH 2/3] drm/i915: Add control flags to i915_handle_error() Chris Wilson
2018-03-19 16:48   ` Michel Thierry
2018-03-19 17:08     ` Chris Wilson
2018-03-16 21:50 ` [PATCH 3/3] drm/i915/execlists: Refactor out complete_preempt_context() Chris Wilson
2018-03-16 22:19 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Trim error mask to known engines Patchwork
2018-03-17  3:26 ` ✗ Fi.CI.IGT: warning " Patchwork
2018-03-19 13:12 ` [PATCH 1/3] " Chris Wilson
2018-03-19 16:31   ` Michel Thierry
2018-03-19 17:04     ` Chris Wilson
2018-03-19 17:16     ` Chris Wilson
