* [PATCH 1/3] drm/i915: Trim error mask to known engines
@ 2018-03-16 21:49 Chris Wilson
2018-03-16 21:50 ` [PATCH 2/3] drm/i915: Add control flags to i915_handle_error() Chris Wilson
` (4 more replies)
0 siblings, 5 replies; 11+ messages in thread
From: Chris Wilson @ 2018-03-16 21:49 UTC (permalink / raw)
To: intel-gfx
For the convenience of userspace passing in an arbitrary reset mask,
remove unknown engines from the set of engines that are to be reset.
This means that we always follow a per-engine reset with a full-device
reset when userspace writes -1 into debugfs/i915_wedged.
Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
---
drivers/gpu/drm/i915/i915_irq.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 828f3104488c..44eef355e12c 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2985,6 +2985,7 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
*/
intel_runtime_pm_get(dev_priv);
+ engine_mask &= INTEL_INFO(dev_priv)->ring_mask;
i915_capture_error_state(dev_priv, engine_mask, error_msg);
i915_clear_error_registers(dev_priv);
--
2.16.2
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
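As an illustration of the one-line change in patch 1/3: the trimming is a bitwise AND of the requested mask with the mask of engines the device actually has. The sketch below is a userspace model, not driver code. struct device_info and trim_engine_mask() are hypothetical names that stand in for INTEL_INFO(dev_priv)->ring_mask and the in-place "engine_mask &= ..." statement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-device info that holds ring_mask. */
struct device_info {
	uint32_t ring_mask; /* one bit per engine present on this device */
};

/* Drop every bit that does not correspond to an engine on this device. */
static uint32_t trim_engine_mask(const struct device_info *info,
				 uint32_t engine_mask)
{
	assert(info);
	return engine_mask & info->ring_mask;
}
```

With a ring_mask of 0x7, a request with all bits set (the -1 written to debugfs) is reduced to 0x7, and a request that names only unknown engines is reduced to 0.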
* [PATCH 2/3] drm/i915: Add control flags to i915_handle_error()
2018-03-16 21:49 [PATCH 1/3] drm/i915: Trim error mask to known engines Chris Wilson
@ 2018-03-16 21:50 ` Chris Wilson
2018-03-19 16:48 ` Michel Thierry
2018-03-16 21:50 ` [PATCH 3/3] drm/i915/execlists: Refactor out complete_preempt_context() Chris Wilson
` (3 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2018-03-16 21:50 UTC (permalink / raw)
To: intel-gfx; +Cc: Mika Kuoppala
Not all callers want the GPU error to be handled in the same way, so expose
a control parameter. In the first instance, some callers do not want the
heavyweight error capture so add a bit to request the state to be
captured and saved.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
drivers/gpu/drm/i915/i915_debugfs.c | 4 ++--
drivers/gpu/drm/i915/i915_drv.h | 4 +++-
drivers/gpu/drm/i915/i915_irq.c | 22 ++++++++++++++--------
drivers/gpu/drm/i915/intel_hangcheck.c | 6 +++---
drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 2 +-
5 files changed, 23 insertions(+), 15 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 5378863e3238..03b74a92caed 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3952,8 +3952,8 @@ i915_wedged_set(void *data, u64 val)
engine->hangcheck.stalled = true;
}
- i915_handle_error(i915, val, "Manually set wedged engine mask = %llx",
- val);
+ i915_handle_error(i915, val, I915_ERROR_CAPTURE,
+ "Manually set wedged engine mask = %llx", val);
wait_on_bit(&i915->gpu_error.flags,
I915_RESET_HANDOFF,
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e27ba8fb64e6..53009ba50640 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2751,10 +2751,12 @@ static inline void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
&dev_priv->gpu_error.hangcheck_work, delay);
}
-__printf(3, 4)
+__printf(4, 5)
void i915_handle_error(struct drm_i915_private *dev_priv,
u32 engine_mask,
+ unsigned long flags,
const char *fmt, ...);
+#define I915_ERROR_CAPTURE BIT(0)
extern void intel_irq_init(struct drm_i915_private *dev_priv);
extern void intel_irq_fini(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 44eef355e12c..b3a4dc7cb26c 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2955,6 +2955,7 @@ static void i915_clear_error_registers(struct drm_i915_private *dev_priv)
* i915_handle_error - handle a gpu error
* @dev_priv: i915 device private
* @engine_mask: mask representing engines that are hung
+ * @flags: control flags
* @fmt: Error message format string
*
* Do some basic checking of register state at error time and
@@ -2965,16 +2966,11 @@ static void i915_clear_error_registers(struct drm_i915_private *dev_priv)
*/
void i915_handle_error(struct drm_i915_private *dev_priv,
u32 engine_mask,
+ unsigned long flags,
const char *fmt, ...)
{
struct intel_engine_cs *engine;
unsigned int tmp;
- va_list args;
- char error_msg[80];
-
- va_start(args, fmt);
- vscnprintf(error_msg, sizeof(error_msg), fmt, args);
- va_end(args);
/*
* In most cases it's guaranteed that we get here with an RPM
@@ -2986,8 +2982,18 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
intel_runtime_pm_get(dev_priv);
engine_mask &= INTEL_INFO(dev_priv)->ring_mask;
- i915_capture_error_state(dev_priv, engine_mask, error_msg);
- i915_clear_error_registers(dev_priv);
+
+ if (flags & I915_ERROR_CAPTURE) {
+ char error_msg[80];
+ va_list args;
+
+ va_start(args, fmt);
+ vscnprintf(error_msg, sizeof(error_msg), fmt, args);
+ va_end(args);
+
+ i915_capture_error_state(dev_priv, engine_mask, error_msg);
+ i915_clear_error_registers(dev_priv);
+ }
/*
* Try engine reset when available. We fall back to full reset if
diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index 42e45ae87393..13d1a269c771 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -246,7 +246,7 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
*/
tmp = I915_READ_CTL(engine);
if (tmp & RING_WAIT) {
- i915_handle_error(dev_priv, 0,
+ i915_handle_error(dev_priv, 0, 0,
"Kicking stuck wait on %s",
engine->name);
I915_WRITE_CTL(engine, tmp);
@@ -258,7 +258,7 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
default:
return ENGINE_DEAD;
case 1:
- i915_handle_error(dev_priv, 0,
+ i915_handle_error(dev_priv, 0, 0,
"Kicking stuck semaphore on %s",
engine->name);
I915_WRITE_CTL(engine, tmp);
@@ -392,7 +392,7 @@ static void hangcheck_declare_hang(struct drm_i915_private *i915,
"%s, ", engine->name);
msg[len-2] = '\0';
- return i915_handle_error(i915, hung, "%s", msg);
+ return i915_handle_error(i915, hung, I915_ERROR_CAPTURE, "%s", msg);
}
/*
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index df7898c8edcb..84aad2b4825d 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -1084,7 +1084,7 @@ static int igt_handle_error(void *arg)
engine->hangcheck.stalled = true;
engine->hangcheck.seqno = intel_engine_get_seqno(engine);
- i915_handle_error(i915, intel_engine_flag(engine), "%s", __func__);
+ i915_handle_error(i915, intel_engine_flag(engine), 0, "%s", __func__);
xchg(&i915->gpu_error.first_error, error);
--
2.16.2
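The shape of the change in patch 2/3 can be modelled in userspace roughly as follows. This is a minimal sketch under stated assumptions, not the driver's implementation: handle_error(), ERROR_CAPTURE, last_capture and capture_count are hypothetical names, vsnprintf() replaces the kernel-only vscnprintf(), and the format attribute is the GCC/Clang extension that the kernel's __printf() macro wraps. It shows why the attribute indices move from (3, 4) to (4, 5) when a parameter is inserted before fmt, and that the message is only formatted when capture is requested.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define ERROR_CAPTURE (1UL << 0) /* models I915_ERROR_CAPTURE, i.e. BIT(0) */

/* Hypothetical stand-ins for the saved error state. */
static char last_capture[80];
static int capture_count;

/* fmt is now the 4th parameter, so the variadic arguments start at the 5th. */
__attribute__((format(printf, 4, 5)))
static void handle_error(const void *dev, unsigned int engine_mask,
			 unsigned long flags, const char *fmt, ...)
{
	(void)dev;
	(void)engine_mask;

	if (flags & ERROR_CAPTURE) {
		va_list args;

		/* Only pay for formatting and capture when asked to. */
		va_start(args, fmt);
		vsnprintf(last_capture, sizeof(last_capture), fmt, args);
		va_end(args);
		capture_count++;
	}

	/* Reset handling would follow here regardless of the flag. */
}
```

A caller that passes 0 for flags, as the "Kicking stuck wait" paths do in the patch, leaves the captured state untouched. A caller that passes ERROR_CAPTURE gets the formatted message stored.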
* [PATCH 3/3] drm/i915/execlists: Refactor out complete_preempt_context()
2018-03-16 21:49 [PATCH 1/3] drm/i915: Trim error mask to known engines Chris Wilson
2018-03-16 21:50 ` [PATCH 2/3] drm/i915: Add control flags to i915_handle_error() Chris Wilson
@ 2018-03-16 21:50 ` Chris Wilson
2018-03-16 22:19 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Trim error mask to known engines Patchwork
` (2 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2018-03-16 21:50 UTC (permalink / raw)
To: intel-gfx
As a complement to inject_preempt_context(), follow up with the function
to handle its completion. This will be useful should we wish to extend
the duties of the preempt-context for execlists.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
---
drivers/gpu/drm/i915/intel_lrc.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 53f1c009ed7b..0bfaeb56b8c7 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -531,8 +531,17 @@ static void inject_preempt_context(struct intel_engine_cs *engine)
if (execlists->ctrl_reg)
writel(EL_CTRL_LOAD, execlists->ctrl_reg);
- execlists_clear_active(&engine->execlists, EXECLISTS_ACTIVE_HWACK);
- execlists_set_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT);
+ execlists_clear_active(execlists, EXECLISTS_ACTIVE_HWACK);
+ execlists_set_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
+}
+
+static void complete_preempt_context(struct intel_engine_execlists *execlists)
+{
+ execlists_cancel_port_requests(execlists);
+ execlists_unwind_incomplete_requests(execlists);
+
+ GEM_BUG_ON(!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT));
+ execlists_clear_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
}
static void execlists_dequeue(struct intel_engine_cs *engine)
@@ -939,14 +948,7 @@ static void execlists_submission_tasklet(unsigned long data)
if (status & GEN8_CTX_STATUS_COMPLETE &&
buf[2*head + 1] == execlists->preempt_complete_status) {
GEM_TRACE("%s preempt-idle\n", engine->name);
-
- execlists_cancel_port_requests(execlists);
- execlists_unwind_incomplete_requests(execlists);
-
- GEM_BUG_ON(!execlists_is_active(execlists,
- EXECLISTS_ACTIVE_PREEMPT));
- execlists_clear_active(execlists,
- EXECLISTS_ACTIVE_PREEMPT);
+ complete_preempt_context(execlists);
continue;
}
--
2.16.2
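The refactor in patch 3/3 pairs an "inject" step with a "complete" step that share one state flag. The following is a simplified userspace model of that pairing, offered as a sketch only: struct execlists_model, the ACTIVE_* masks and the unwound counter are hypothetical, the real helpers operate on bit indices in struct intel_engine_execlists, and assert() stands in for GEM_BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state bits modelling EXECLISTS_ACTIVE_HWACK / _PREEMPT. */
enum {
	ACTIVE_HWACK   = 1u << 0,
	ACTIVE_PREEMPT = 1u << 1,
};

struct execlists_model {
	unsigned int active; /* bitmask of the ACTIVE_* states above */
	int unwound;         /* counts how often requests were unwound */
};

static void set_active(struct execlists_model *el, unsigned int bit)
{
	el->active |= bit;
}

static void clear_active(struct execlists_model *el, unsigned int bit)
{
	el->active &= ~bit;
}

static bool is_active(const struct execlists_model *el, unsigned int bit)
{
	return el->active & bit;
}

/* Start a preemption: drop the HW ack state and mark preemption pending. */
static void inject_preempt(struct execlists_model *el)
{
	clear_active(el, ACTIVE_HWACK);
	set_active(el, ACTIVE_PREEMPT);
}

/* Finish a preemption: unwind, then clear the pending mark. */
static void complete_preempt(struct execlists_model *el)
{
	el->unwound++; /* stands in for cancelling ports and unwinding requests */

	assert(is_active(el, ACTIVE_PREEMPT)); /* must follow inject_preempt() */
	clear_active(el, ACTIVE_PREEMPT);
}
```

Keeping the completion logic in one function means a later extension of the preempt context only has to touch the inject/complete pair, which matches the motivation given in the commit message.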
* ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Trim error mask to known engines
2018-03-16 21:49 [PATCH 1/3] drm/i915: Trim error mask to known engines Chris Wilson
2018-03-16 21:50 ` [PATCH 2/3] drm/i915: Add control flags to i915_handle_error() Chris Wilson
2018-03-16 21:50 ` [PATCH 3/3] drm/i915/execlists: Refactor out complete_preempt_context() Chris Wilson
@ 2018-03-16 22:19 ` Patchwork
2018-03-17 3:26 ` ✗ Fi.CI.IGT: warning " Patchwork
2018-03-19 13:12 ` [PATCH 1/3] " Chris Wilson
4 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2018-03-16 22:19 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [1/3] drm/i915: Trim error mask to known engines
URL : https://patchwork.freedesktop.org/series/40130/
State : success
== Summary ==
Series 40130v1 series starting with [1/3] drm/i915: Trim error mask to known engines
https://patchwork.freedesktop.org/api/1.0/series/40130/revisions/1/mbox/
---- Known issues:
Test gem_mmap_gtt:
Subgroup basic-small-bo-tiledx:
pass -> FAIL (fi-gdg-551) fdo#102575
Test kms_pipe_crc_basic:
Subgroup nonblocking-crc-pipe-b-frame-sequence:
pass -> FAIL (fi-cfl-s2) fdo#103481
fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575
fdo#103481 https://bugs.freedesktop.org/show_bug.cgi?id=103481
fi-bdw-5557u total:285 pass:264 dwarn:0 dfail:0 fail:0 skip:21 time:434s
fi-bdw-gvtdvm total:285 pass:261 dwarn:0 dfail:0 fail:0 skip:24 time:440s
fi-blb-e6850 total:285 pass:220 dwarn:1 dfail:0 fail:0 skip:64 time:381s
fi-bsw-n3050 total:285 pass:239 dwarn:0 dfail:0 fail:0 skip:46 time:537s
fi-bwr-2160 total:285 pass:180 dwarn:0 dfail:0 fail:0 skip:105 time:296s
fi-bxt-dsi total:285 pass:255 dwarn:0 dfail:0 fail:0 skip:30 time:515s
fi-bxt-j4205 total:285 pass:256 dwarn:0 dfail:0 fail:0 skip:29 time:513s
fi-byt-j1900 total:285 pass:250 dwarn:0 dfail:0 fail:0 skip:35 time:520s
fi-byt-n2820 total:285 pass:246 dwarn:0 dfail:0 fail:0 skip:39 time:505s
fi-cfl-8700k total:285 pass:257 dwarn:0 dfail:0 fail:0 skip:28 time:412s
fi-cfl-s2 total:285 pass:258 dwarn:0 dfail:0 fail:1 skip:26 time:588s
fi-cfl-u total:285 pass:259 dwarn:0 dfail:0 fail:0 skip:26 time:509s
fi-cnl-drrs total:285 pass:254 dwarn:3 dfail:0 fail:0 skip:28 time:535s
fi-cnl-y3 total:285 pass:259 dwarn:0 dfail:0 fail:0 skip:26 time:598s
fi-elk-e7500 total:285 pass:225 dwarn:1 dfail:0 fail:0 skip:59 time:430s
fi-gdg-551 total:285 pass:176 dwarn:0 dfail:0 fail:1 skip:108 time:337s
fi-glk-1 total:285 pass:257 dwarn:0 dfail:0 fail:0 skip:28 time:541s
fi-hsw-4770 total:285 pass:258 dwarn:0 dfail:0 fail:0 skip:27 time:404s
fi-ilk-650 total:285 pass:225 dwarn:0 dfail:0 fail:0 skip:60 time:421s
fi-ivb-3520m total:285 pass:256 dwarn:0 dfail:0 fail:0 skip:29 time:474s
fi-ivb-3770 total:285 pass:252 dwarn:0 dfail:0 fail:0 skip:33 time:428s
fi-kbl-7500u total:285 pass:260 dwarn:1 dfail:0 fail:0 skip:24 time:477s
fi-kbl-7567u total:285 pass:265 dwarn:0 dfail:0 fail:0 skip:20 time:470s
fi-kbl-r total:285 pass:258 dwarn:0 dfail:0 fail:0 skip:27 time:513s
fi-pnv-d510 total:285 pass:219 dwarn:1 dfail:0 fail:0 skip:65 time:658s
fi-skl-6260u total:285 pass:265 dwarn:0 dfail:0 fail:0 skip:20 time:443s
fi-skl-6600u total:285 pass:258 dwarn:0 dfail:0 fail:0 skip:27 time:538s
fi-skl-6700hq total:285 pass:259 dwarn:0 dfail:0 fail:0 skip:26 time:542s
fi-skl-6700k2 total:285 pass:261 dwarn:0 dfail:0 fail:0 skip:24 time:514s
fi-skl-6770hq total:285 pass:265 dwarn:0 dfail:0 fail:0 skip:20 time:504s
fi-skl-guc total:285 pass:257 dwarn:0 dfail:0 fail:0 skip:28 time:429s
fi-skl-gvtdvm total:285 pass:262 dwarn:0 dfail:0 fail:0 skip:23 time:449s
fi-snb-2520m total:285 pass:245 dwarn:0 dfail:0 fail:0 skip:40 time:612s
fi-snb-2600 total:285 pass:245 dwarn:0 dfail:0 fail:0 skip:40 time:404s
fa73baa35269b86a26e10f36bc35d3b657e2e932 drm-tip: 2018y-03m-16d-19h-43m-50s UTC integration manifest
86b3fd65972f drm/i915/execlists: Refactor out complete_preempt_context()
db534d0b84dc drm/i915: Add control flags to i915_handle_error()
bae6ba421c75 drm/i915: Trim error mask to known engines
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8383/issues.html
* ✗ Fi.CI.IGT: warning for series starting with [1/3] drm/i915: Trim error mask to known engines
2018-03-16 21:49 [PATCH 1/3] drm/i915: Trim error mask to known engines Chris Wilson
` (2 preceding siblings ...)
2018-03-16 22:19 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Trim error mask to known engines Patchwork
@ 2018-03-17 3:26 ` Patchwork
2018-03-19 13:12 ` [PATCH 1/3] " Chris Wilson
4 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2018-03-17 3:26 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [1/3] drm/i915: Trim error mask to known engines
URL : https://patchwork.freedesktop.org/series/40130/
State : warning
== Summary ==
---- Possible new issues:
Test kms_cursor_crc:
Subgroup cursor-128x128-suspend:
pass -> SKIP (shard-snb)
Test kms_frontbuffer_tracking:
Subgroup fbc-2p-primscrn-indfb-plflip-blt:
pass -> SKIP (shard-hsw)
Test pm_rc6_residency:
Subgroup rc6-accuracy:
pass -> SKIP (shard-snb)
---- Known issues:
Test drv_suspend:
Subgroup debugfs-reader:
pass -> SKIP (shard-snb) fdo#102365
Test kms_flip:
Subgroup plain-flip-fb-recreate:
pass -> FAIL (shard-hsw) fdo#100368 +2
Test kms_frontbuffer_tracking:
Subgroup fbc-2p-shrfb-fliptrack:
pass -> SKIP (shard-hsw) fdo#101623
Test kms_pipe_crc_basic:
Subgroup hang-read-crc-pipe-c:
fail -> PASS (shard-apl) fdo#103191
Test kms_plane:
Subgroup plane-panning-bottom-right-suspend-pipe-c-planes:
pass -> INCOMPLETE (shard-hsw) fdo#103375
Test kms_sysfs_edid_timing:
warn -> PASS (shard-apl) fdo#100047
fdo#102365 https://bugs.freedesktop.org/show_bug.cgi?id=102365
fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
fdo#101623 https://bugs.freedesktop.org/show_bug.cgi?id=101623
fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191
fdo#103375 https://bugs.freedesktop.org/show_bug.cgi?id=103375
fdo#100047 https://bugs.freedesktop.org/show_bug.cgi?id=100047
shard-apl total:3442 pass:1815 dwarn:1 dfail:0 fail:7 skip:1619 time:13013s
shard-hsw total:3369 pass:1728 dwarn:1 dfail:0 fail:3 skip:1635 time:11699s
shard-snb total:3442 pass:1356 dwarn:1 dfail:0 fail:2 skip:2083 time:7191s
Blacklisted hosts:
shard-kbl total:3442 pass:1936 dwarn:1 dfail:1 fail:9 skip:1495 time:9778s
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8383/shards.html
* Re: [PATCH 1/3] drm/i915: Trim error mask to known engines
2018-03-16 21:49 [PATCH 1/3] drm/i915: Trim error mask to known engines Chris Wilson
` (3 preceding siblings ...)
2018-03-17 3:26 ` ✗ Fi.CI.IGT: warning " Patchwork
@ 2018-03-19 13:12 ` Chris Wilson
2018-03-19 16:31 ` Michel Thierry
4 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2018-03-19 13:12 UTC (permalink / raw)
To: intel-gfx; +Cc: Mika
Quoting Chris Wilson (2018-03-16 21:49:59)
> For the convenience of userspace passing in an arbitrary reset mask,
> remove unknown engines from the set of engines that are to be reset.
> This means that we always follow a per-engine reset with a full-device
> reset when userspace writes -1 into debugfs/i915_wedged.
>
> Reported-by: Michał Winiarski <michal.winiarski@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
Please? It papers over the issue in gem_exec_capture...
-Chris
> ---
> drivers/gpu/drm/i915/i915_irq.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 828f3104488c..44eef355e12c 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2985,6 +2985,7 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
> */
> intel_runtime_pm_get(dev_priv);
>
> + engine_mask &= INTEL_INFO(dev_priv)->ring_mask;
> i915_capture_error_state(dev_priv, engine_mask, error_msg);
> i915_clear_error_registers(dev_priv);
>
> --
> 2.16.2
>
* Re: [PATCH 1/3] drm/i915: Trim error mask to known engines
2018-03-19 13:12 ` [PATCH 1/3] " Chris Wilson
@ 2018-03-19 16:31 ` Michel Thierry
2018-03-19 17:04 ` Chris Wilson
2018-03-19 17:16 ` Chris Wilson
0 siblings, 2 replies; 11+ messages in thread
From: Michel Thierry @ 2018-03-19 16:31 UTC (permalink / raw)
To: Chris Wilson, intel-gfx@lists.freedesktop.org; +Cc: Mika@freedesktop.org
On 19/03/18 06:12, Chris Wilson wrote:
> Quoting Chris Wilson (2018-03-16 21:49:59)
>> For the convenience of userspace passing in an arbitrary reset mask,
>> remove unknown engines from the set of engines that are to be reset.
>> This means that we always follow a per-engine reset with a full-device
>> reset when userspace writes -1 into debugfs/i915_wedged.
I thought that was the desired behaviour...
>>
>> Reported-by: Michał Winiarski <michal.winiarski@intel.com>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> Cc: Michał Winiarski <michal.winiarski@intel.com>
>
> Please? It papers over the issue in gem_exec_capture...
> -Chris
>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_irq.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>> index 828f3104488c..44eef355e12c 100644
>> --- a/drivers/gpu/drm/i915/i915_irq.c
>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>> @@ -2985,6 +2985,7 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
>> */
>> intel_runtime_pm_get(dev_priv);
>>
>> + engine_mask &= INTEL_INFO(dev_priv)->ring_mask;
>> i915_capture_error_state(dev_priv, engine_mask, error_msg);
>> i915_clear_error_registers(dev_priv);
>>
>> --
>> 2.16.2
>>
* Re: [PATCH 2/3] drm/i915: Add control flags to i915_handle_error()
2018-03-16 21:50 ` [PATCH 2/3] drm/i915: Add control flags to i915_handle_error() Chris Wilson
@ 2018-03-19 16:48 ` Michel Thierry
2018-03-19 17:08 ` Chris Wilson
0 siblings, 1 reply; 11+ messages in thread
From: Michel Thierry @ 2018-03-19 16:48 UTC (permalink / raw)
To: Chris Wilson, intel-gfx@lists.freedesktop.org; +Cc: Kuoppala, Mika
On 16/03/18 14:50, Chris Wilson wrote:
> Not all callers want the GPU error to be handled in the same way, so expose
> a control parameter. In the first instance, some callers do not want the
> heavyweight error capture so add a bit to request the state to be
> captured and saved.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Jeff McGee <jeff.mcgee@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
> drivers/gpu/drm/i915/i915_debugfs.c | 4 ++--
> drivers/gpu/drm/i915/i915_drv.h | 4 +++-
> drivers/gpu/drm/i915/i915_irq.c | 22 ++++++++++++++--------
> drivers/gpu/drm/i915/intel_hangcheck.c | 6 +++---
> drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 2 +-
> 5 files changed, 23 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 5378863e3238..03b74a92caed 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -3952,8 +3952,8 @@ i915_wedged_set(void *data, u64 val)
> engine->hangcheck.stalled = true;
> }
>
> - i915_handle_error(i915, val, "Manually set wedged engine mask = %llx",
> - val);
> + i915_handle_error(i915, val, I915_ERROR_CAPTURE,
> + "Manually set wedged engine mask = %llx", val);
>
> wait_on_bit(&i915->gpu_error.flags,
> I915_RESET_HANDOFF,
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index e27ba8fb64e6..53009ba50640 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2751,10 +2751,12 @@ static inline void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
> &dev_priv->gpu_error.hangcheck_work, delay);
> }
>
> -__printf(3, 4)
> +__printf(4, 5)
> void i915_handle_error(struct drm_i915_private *dev_priv,
> u32 engine_mask,
> + unsigned long flags,
> const char *fmt, ...);
> +#define I915_ERROR_CAPTURE BIT(0)
>
Should this be in the new i915_gpu_error.h?
> extern void intel_irq_init(struct drm_i915_private *dev_priv);
> extern void intel_irq_fini(struct drm_i915_private *dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 44eef355e12c..b3a4dc7cb26c 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2955,6 +2955,7 @@ static void i915_clear_error_registers(struct drm_i915_private *dev_priv)
> * i915_handle_error - handle a gpu error
> * @dev_priv: i915 device private
> * @engine_mask: mask representing engines that are hung
> + * @flags: control flags
> * @fmt: Error message format string
> *
> * Do some basic checking of register state at error time and
> @@ -2965,16 +2966,11 @@ static void i915_clear_error_registers(struct drm_i915_private *dev_priv)
> */
> void i915_handle_error(struct drm_i915_private *dev_priv,
> u32 engine_mask,
> + unsigned long flags,
> const char *fmt, ...)
> {
> struct intel_engine_cs *engine;
> unsigned int tmp;
> - va_list args;
> - char error_msg[80];
> -
> - va_start(args, fmt);
> - vscnprintf(error_msg, sizeof(error_msg), fmt, args);
> - va_end(args);
>
> /*
> * In most cases it's guaranteed that we get here with an RPM
> @@ -2986,8 +2982,18 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
> intel_runtime_pm_get(dev_priv);
>
> engine_mask &= INTEL_INFO(dev_priv)->ring_mask;
> - i915_capture_error_state(dev_priv, engine_mask, error_msg);
> - i915_clear_error_registers(dev_priv);
> +
> + if (flags & I915_ERROR_CAPTURE) {
> + char error_msg[80];
> + va_list args;
> +
> + va_start(args, fmt);
> + vscnprintf(error_msg, sizeof(error_msg), fmt, args);
> + va_end(args);
> +
> + i915_capture_error_state(dev_priv, engine_mask, error_msg);
> + i915_clear_error_registers(dev_priv);
> + }
else
DRM_INFO or DRM_NOTE the error_msg ?
Otherwise the 'kicking wait/semaphore' text from below will never be
printed.
>
> /*
> * Try engine reset when available. We fall back to full reset if
> diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> index 42e45ae87393..13d1a269c771 100644
> --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> @@ -246,7 +246,7 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
> */
> tmp = I915_READ_CTL(engine);
> if (tmp & RING_WAIT) {
> - i915_handle_error(dev_priv, 0,
> + i915_handle_error(dev_priv, 0, 0,
> "Kicking stuck wait on %s",
> engine->name);
> I915_WRITE_CTL(engine, tmp);
> @@ -258,7 +258,7 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
> default:
> return ENGINE_DEAD;
> case 1:
> - i915_handle_error(dev_priv, 0,
> + i915_handle_error(dev_priv, 0, 0,
> "Kicking stuck semaphore on %s",
> engine->name);
> I915_WRITE_CTL(engine, tmp);
> @@ -392,7 +392,7 @@ static void hangcheck_declare_hang(struct drm_i915_private *i915,
> "%s, ", engine->name);
> msg[len-2] = '\0';
>
> - return i915_handle_error(i915, hung, "%s", msg);
> + return i915_handle_error(i915, hung, I915_ERROR_CAPTURE, "%s", msg);
> }
>
> /*
> diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> index df7898c8edcb..84aad2b4825d 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> @@ -1084,7 +1084,7 @@ static int igt_handle_error(void *arg)
> engine->hangcheck.stalled = true;
> engine->hangcheck.seqno = intel_engine_get_seqno(engine);
>
> - i915_handle_error(i915, intel_engine_flag(engine), "%s", __func__);
> + i915_handle_error(i915, intel_engine_flag(engine), 0, "%s", __func__);
>
> xchg(&i915->gpu_error.first_error, error);
>
>
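One way to read the review comment above (log the message when it is not captured) is sketched below. This is a userspace model under stated assumptions, not a proposed patch: report_error(), ERROR_CAPTURE and the captured/logged buffers are hypothetical, and writing into the logged buffer stands in for a DRM_INFO() or DRM_NOTE() call. The difference from the posted patch is that the message is formatted unconditionally so that both branches can use it.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define ERROR_CAPTURE (1UL << 0) /* models I915_ERROR_CAPTURE */

/* Hypothetical sinks: saved error state and the kernel log. */
static char captured[80];
static char logged[80];

__attribute__((format(printf, 2, 3)))
static void report_error(unsigned long flags, const char *fmt, ...)
{
	char error_msg[80];
	va_list args;

	/* Format once, before deciding where the message goes. */
	va_start(args, fmt);
	vsnprintf(error_msg, sizeof(error_msg), fmt, args);
	va_end(args);

	if (flags & ERROR_CAPTURE)
		snprintf(captured, sizeof(captured), "%s", error_msg);
	else
		/* Stand-in for DRM_INFO("%s\n", error_msg). */
		snprintf(logged, sizeof(logged), "%s", error_msg);
}
```

With this shape, a "Kicking stuck wait on %s" call made with flags of 0 still leaves a trace in the log instead of being dropped silently.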
* Re: [PATCH 1/3] drm/i915: Trim error mask to known engines
2018-03-19 16:31 ` Michel Thierry
@ 2018-03-19 17:04 ` Chris Wilson
2018-03-19 17:16 ` Chris Wilson
1 sibling, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2018-03-19 17:04 UTC (permalink / raw)
To: Michel Thierry, intel-gfx@lists.freedesktop.org; +Cc: Mika@freedesktop.org
Quoting Michel Thierry (2018-03-19 16:31:05)
> On 19/03/18 06:12, Chris Wilson wrote:
> > Quoting Chris Wilson (2018-03-16 21:49:59)
> >> For the convenience of userspace passing in an arbitrary reset mask,
> >> remove unknown engines from the set of engines that are to be reset.
> >> This means that we always follow a per-engine reset with a full-device
> >> reset when userspace writes -1 into debugfs/i915_wedged.
>
> I thought that was the desired behaviour...
The name has been misleading for a few years, ever since we actually had
working GPU reset.
-Chris
* Re: [PATCH 2/3] drm/i915: Add control flags to i915_handle_error()
2018-03-19 16:48 ` Michel Thierry
@ 2018-03-19 17:08 ` Chris Wilson
0 siblings, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2018-03-19 17:08 UTC (permalink / raw)
To: Michel Thierry, intel-gfx@lists.freedesktop.org; +Cc: Kuoppala, Mika
Quoting Michel Thierry (2018-03-19 16:48:01)
> On 16/03/18 14:50, Chris Wilson wrote:
> > Not all callers want the GPU error to be handled in the same way, so expose
> > a control parameter. In the first instance, some callers do not want the
> > heavyweight error capture so add a bit to request the state to be
> > captured and saved.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Jeff McGee <jeff.mcgee@intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_debugfs.c | 4 ++--
> > drivers/gpu/drm/i915/i915_drv.h | 4 +++-
> > drivers/gpu/drm/i915/i915_irq.c | 22 ++++++++++++++--------
> > drivers/gpu/drm/i915/intel_hangcheck.c | 6 +++---
> > drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 2 +-
> > 5 files changed, 23 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 5378863e3238..03b74a92caed 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -3952,8 +3952,8 @@ i915_wedged_set(void *data, u64 val)
> > engine->hangcheck.stalled = true;
> > }
> >
> > - i915_handle_error(i915, val, "Manually set wedged engine mask = %llx",
> > - val);
> > + i915_handle_error(i915, val, I915_ERROR_CAPTURE,
> > + "Manually set wedged engine mask = %llx", val);
> >
> > wait_on_bit(&i915->gpu_error.flags,
> > I915_RESET_HANDOFF,
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index e27ba8fb64e6..53009ba50640 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2751,10 +2751,12 @@ static inline void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
> > &dev_priv->gpu_error.hangcheck_work, delay);
> > }
> >
> > -__printf(3, 4)
> > +__printf(4, 5)
> > void i915_handle_error(struct drm_i915_private *dev_priv,
> > u32 engine_mask,
> > + unsigned long flags,
> > const char *fmt, ...);
> > +#define I915_ERROR_CAPTURE BIT(0)
> >
> Should this be in the new i915_gpu_error.h?
And move it over from i915_irq.c to somewhere more useful? Probably
intel_hangcheck.c since i915_gpu_error.c is conditionally compiled.
> > extern void intel_irq_init(struct drm_i915_private *dev_priv);
> > extern void intel_irq_fini(struct drm_i915_private *dev_priv);
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index 44eef355e12c..b3a4dc7cb26c 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -2955,6 +2955,7 @@ static void i915_clear_error_registers(struct drm_i915_private *dev_priv)
> > * i915_handle_error - handle a gpu error
> > * @dev_priv: i915 device private
> > * @engine_mask: mask representing engines that are hung
> > + * @flags: control flags
> > * @fmt: Error message format string
> > *
> > * Do some basic checking of register state at error time and
> > @@ -2965,16 +2966,11 @@ static void i915_clear_error_registers(struct drm_i915_private *dev_priv)
> > */
> > void i915_handle_error(struct drm_i915_private *dev_priv,
> > u32 engine_mask,
> > + unsigned long flags,
> > const char *fmt, ...)
> > {
> > struct intel_engine_cs *engine;
> > unsigned int tmp;
> > - va_list args;
> > - char error_msg[80];
> > -
> > - va_start(args, fmt);
> > - vscnprintf(error_msg, sizeof(error_msg), fmt, args);
> > - va_end(args);
> >
> > /*
> > * In most cases it's guaranteed that we get here with an RPM
> > @@ -2986,8 +2982,18 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
> > intel_runtime_pm_get(dev_priv);
> >
> > engine_mask &= INTEL_INFO(dev_priv)->ring_mask;
> > - i915_capture_error_state(dev_priv, engine_mask, error_msg);
> > - i915_clear_error_registers(dev_priv);
> > +
> > + if (flags & I915_ERROR_CAPTURE) {
> > + char error_msg[80];
> > + va_list args;
> > +
> > + va_start(args, fmt);
> > + vscnprintf(error_msg, sizeof(error_msg), fmt, args);
> > + va_end(args);
> > +
> > + i915_capture_error_state(dev_priv, engine_mask, error_msg);
> > + i915_clear_error_registers(dev_priv);
> > + }
> else
> DRM_INFO or DRM_NOTE the error_msg ?
>
> Otherwise the 'kicking wait/semaphore' text from below will never be
> printed.
I liked it disappearing ;)
So we should feed it to i915_reset(const char *reason). That could then
probably replace I915_RESET_QUIET.
-Chris
* Re: [PATCH 1/3] drm/i915: Trim error mask to known engines
2018-03-19 16:31 ` Michel Thierry
2018-03-19 17:04 ` Chris Wilson
@ 2018-03-19 17:16 ` Chris Wilson
1 sibling, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2018-03-19 17:16 UTC (permalink / raw)
To: Michel Thierry, intel-gfx@lists.freedesktop.org; +Cc: Mika@freedesktop.org
Quoting Michel Thierry (2018-03-19 16:31:05)
> On 19/03/18 06:12, Chris Wilson wrote:
> > Quoting Chris Wilson (2018-03-16 21:49:59)
> >> For the convenience of userspace passing in an arbitrary reset mask,
> >> remove unknown engines from the set of engines that are to be reset.
> >> This means that we always follow a per-engine reset with a full-device
> >> reset when userspace writes -1 into debugfs/i915_wedged.
>
> I thought that was the desired behaviour...
>
> >>
> >> Reported-by: Michał Winiarski <michal.winiarski@intel.com>
> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> >> Cc: Michał Winiarski <michal.winiarski@intel.com>
> >
> > Please? It papers over the issue in gem_exec_capture...
> > -Chris
> >
>
> Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Snaffled it up so that CI stops spuriously tripping over
gem_exec_capture. We should arrange some faultinjection testing for
live_hangcheck selftests.
-Chris