* [PATCH 1/5] drm/i915: Stop engines around GPU reset preparations
@ 2018-03-02 14:32 Chris Wilson
2018-03-02 14:32 ` [PATCH 2/5] drm/i915: Suspend submission tasklets around wedging Chris Wilson
` (6 more replies)
0 siblings, 7 replies; 13+ messages in thread
From: Chris Wilson @ 2018-03-02 14:32 UTC (permalink / raw)
To: intel-gfx
As we make preparations to reset the GPU state, we assume that the GPU
is hung and will not advance. Make this assumption more explicit by
setting the STOP_RING bit on the engines as part of our early reset
preparations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180302113324.23189-1-chris@chris-wilson.co.uk
---
drivers/gpu/drm/i915/i915_drv.c | 3 +++
drivers/gpu/drm/i915/i915_drv.h | 11 +++++++++--
drivers/gpu/drm/i915/intel_uncore.c | 36 +++++++++++++++++++++++++++++++++++-
3 files changed, 47 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index aaa861b51024..925f5722d077 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1908,6 +1908,8 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
error->reset_count++;
disable_irq(i915->drm.irq);
+ intel_gpu_reset_prepare(i915, ALL_ENGINES);
+
ret = i915_gem_reset_prepare(i915);
if (ret) {
dev_err(i915->drm.dev, "GPU recovery failed\n");
@@ -1969,6 +1971,7 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
finish:
i915_gem_reset_finish(i915);
+ intel_gpu_reset_finish(i915, ALL_ENGINES);
enable_irq(i915->drm.irq);
wakeup:
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 10c9e5e619ab..0cb141f768ed 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2957,8 +2957,15 @@ extern const struct dev_pm_ops i915_pm_ops;
extern int i915_driver_load(struct pci_dev *pdev,
const struct pci_device_id *ent);
extern void i915_driver_unload(struct drm_device *dev);
-extern int intel_gpu_reset(struct drm_i915_private *dev_priv, u32 engine_mask);
-extern bool intel_has_gpu_reset(struct drm_i915_private *dev_priv);
+
+bool intel_has_gpu_reset(struct drm_i915_private *dev_priv);
+
+void intel_gpu_reset_prepare(struct drm_i915_private *dev_priv,
+ unsigned int engine_mask);
+int intel_gpu_reset(struct drm_i915_private *dev_priv,
+ unsigned int engine_mask);
+void intel_gpu_reset_finish(struct drm_i915_private *dev_priv,
+ unsigned int engine_mask);
#define I915_RESET_QUIET BIT(0)
extern void i915_reset(struct drm_i915_private *i915, unsigned int flags);
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 5ae9a62712ca..e193af2feefb 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1899,7 +1899,31 @@ static reset_func intel_get_gpu_reset(struct drm_i915_private *dev_priv)
return NULL;
}
-int intel_gpu_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
+static void i915_engines_set_mode(struct drm_i915_private *dev_priv,
+ unsigned int engine_mask,
+ u32 mode)
+{
+ struct intel_engine_cs *engine;
+ enum intel_engine_id id;
+
+ if (INTEL_GEN(dev_priv) < 3)
+ return;
+
+ for_each_engine_masked(engine, dev_priv, engine_mask, id)
+ I915_WRITE_FW(RING_MI_MODE(engine->mmio_base), mode);
+}
+
+void intel_gpu_reset_prepare(struct drm_i915_private *dev_priv,
+ unsigned int engine_mask)
+{
+ intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+ i915_engines_set_mode(dev_priv, engine_mask,
+ _MASKED_BIT_ENABLE(STOP_RING));
+}
+
+int intel_gpu_reset(struct drm_i915_private *dev_priv,
+ unsigned int engine_mask)
{
reset_func reset = intel_get_gpu_reset(dev_priv);
int retry;
@@ -1939,6 +1963,16 @@ int intel_gpu_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
return ret;
}
+void intel_gpu_reset_finish(struct drm_i915_private *dev_priv,
+ unsigned int engine_mask)
+{
+ /* Clear the STOP_RING bit as the reset may not have occurred */
+ i915_engines_set_mode(dev_priv, engine_mask,
+ _MASKED_BIT_DISABLE(STOP_RING));
+
+ intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+}
+
bool intel_has_gpu_reset(struct drm_i915_private *dev_priv)
{
return intel_get_gpu_reset(dev_priv) != NULL;
--
2.16.2
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* [PATCH 2/5] drm/i915: Suspend submission tasklets around wedging
2018-03-02 14:32 [PATCH 1/5] drm/i915: Stop engines around GPU reset preparations Chris Wilson
@ 2018-03-02 14:32 ` Chris Wilson
2018-03-02 14:34 ` Mika Kuoppala
2018-03-02 14:32 ` [PATCH 3/5] drm/i915/execlists: Move irq state manipulation inside irq disabled region Chris Wilson
` (5 subsequent siblings)
6 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2018-03-02 14:32 UTC (permalink / raw)
To: intel-gfx
After starting hard at sequences like
[ 28.199013] systemd-1 2..s. 26062228us : execlists_submission_tasklet: rcs0 cs-irq head=0 [0?], tail=1 [1?]
[ 28.199095] systemd-1 2..s. 26062229us : execlists_submission_tasklet: rcs0 csb[1]: status=0x00000018:0x00000000, active=0x1
[ 28.199177] systemd-1 2..s. 26062230us : execlists_submission_tasklet: rcs0 out[0]: ctx=0.1, seqno=3, prio=-1024
[ 28.199258] systemd-1 2..s. 26062231us : execlists_submission_tasklet: rcs0 completed ctx=0
[ 28.199340] gem_eio-829 1..s1 26066853us : execlists_submission_tasklet: rcs0 in[0]: ctx=1.1, seqno=1, prio=0
[ 28.199421] <idle>-0 2..s. 26066863us : execlists_submission_tasklet: rcs0 cs-irq head=1 [1?], tail=2 [2?]
[ 28.199503] <idle>-0 2..s. 26066865us : execlists_submission_tasklet: rcs0 csb[2]: status=0x00000001:0x00000000, active=0x1
[ 28.199585] gem_eio-829 1..s1 26067077us : execlists_submission_tasklet: rcs0 in[1]: ctx=3.1, seqno=2, prio=0
[ 28.199667] gem_eio-829 1..s1 26067078us : execlists_submission_tasklet: rcs0 in[0]: ctx=1.2, seqno=1, prio=0
[ 28.199749] <idle>-0 2..s. 26067084us : execlists_submission_tasklet: rcs0 cs-irq head=2 [2?], tail=3 [3?]
[ 28.199830] <idle>-0 2..s. 26067085us : execlists_submission_tasklet: rcs0 csb[3]: status=0x00008002:0x00000001, active=0x1
[ 28.199912] <idle>-0 2..s. 26067086us : execlists_submission_tasklet: rcs0 out[0]: ctx=1.2, seqno=1, prio=0
[ 28.199994] gem_eio-829 2..s. 28246084us : execlists_submission_tasklet: rcs0 cs-irq head=3 [3?], tail=4 [4?]
[ 28.200096] gem_eio-829 2..s. 28246088us : execlists_submission_tasklet: rcs0 csb[4]: status=0x00000014:0x00000001, active=0x5
[ 28.200178] gem_eio-829 2..s. 28246089us : execlists_submission_tasklet: rcs0 out[0]: ctx=0.0, seqno=0, prio=0
[ 28.200260] gem_eio-829 2..s. 28246127us : execlists_submission_tasklet: execlists_submission_tasklet:886 GEM_BUG_ON(buf[2 * head + 1] != port->context_id)
the conclusion is that the only place where the ports are reset to zero
is from engine->cancel_requests called during i915_gem_set_wedged().
The race is horrible as it results from calling set-wedged on active HW
(the GPU reset failed) and as such we need to be careful as the HW state
changes beneath us. Fortunately, it's the same scary conditions as
affect normal reset, so we can reuse the same machinery to disable state
tracking as we clobber it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104945
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180302113324.23189-2-chris@chris-wilson.co.uk
---
drivers/gpu/drm/i915/i915_gem.c | 6 +++++-
drivers/gpu/drm/i915/intel_lrc.c | 5 +++++
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c29b1a1cbe96..dcdcc09240b9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3212,8 +3212,10 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
* rolling the global seqno forward (since this would complete requests
* for which we haven't set the fence error to EIO yet).
*/
- for_each_engine(engine, i915, id)
+ for_each_engine(engine, i915, id) {
+ i915_gem_reset_prepare_engine(engine);
engine->submit_request = nop_submit_request;
+ }
/*
* Make sure no one is running the old callback before we proceed with
@@ -3255,6 +3257,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
intel_engine_init_global_seqno(engine,
intel_engine_last_submit(engine));
spin_unlock_irqrestore(&engine->timeline->lock, flags);
+
+ i915_gem_reset_finish_engine(engine);
}
wake_up_all(&i915->gpu_error.reset_queue);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 14288743909f..c1a3636e94fc 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -687,6 +687,8 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
struct rb_node *rb;
unsigned long flags;
+ GEM_TRACE("%s\n", engine->name);
+
spin_lock_irqsave(&engine->timeline->lock, flags);
/* Cancel the requests on the HW and clear the ELSP tracker. */
@@ -733,6 +735,9 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
*/
clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
+ /* Mark all CS interrupts as complete */
+ execlists->active = 0;
+
spin_unlock_irqrestore(&engine->timeline->lock, flags);
}
--
2.16.2
* [PATCH 3/5] drm/i915/execlists: Move irq state manipulation inside irq disabled region
2018-03-02 14:32 [PATCH 1/5] drm/i915: Stop engines around GPU reset preparations Chris Wilson
2018-03-02 14:32 ` [PATCH 2/5] drm/i915: Suspend submission tasklets around wedging Chris Wilson
@ 2018-03-02 14:32 ` Chris Wilson
2018-03-02 14:32 ` [PATCH 4/5] drm/i915/execlists: Split spinlock from its irq disabling side-effect Chris Wilson
` (4 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Chris Wilson @ 2018-03-02 14:32 UTC (permalink / raw)
To: intel-gfx
Although this state (execlists->active and engine->irq_posted) itself is
not protected by the engine->timeline spinlock, it does conveniently
ensure that irqs are disabled. We can use this to protect our
manipulation of the state and so ensure that the next IRQ to arrive sees
consistent state and (hopefully) ignores the engine being reset.
Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180302131246.22036-1-chris@chris-wilson.co.uk
---
drivers/gpu/drm/i915/intel_lrc.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c1a3636e94fc..0482e54c94f0 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1618,10 +1618,10 @@ static void reset_common_ring(struct intel_engine_cs *engine,
GEM_TRACE("%s seqno=%x\n",
engine->name, request ? request->global_seqno : 0);
- reset_irq(engine);
-
spin_lock_irqsave(&engine->timeline->lock, flags);
+ reset_irq(engine);
+
/*
* Catch up with any missed context-switch interrupts.
*
@@ -1636,11 +1636,11 @@ static void reset_common_ring(struct intel_engine_cs *engine,
/* Push back any incomplete requests for replay after the reset. */
__unwind_incomplete_requests(engine);
- spin_unlock_irqrestore(&engine->timeline->lock, flags);
-
/* Mark all CS interrupts as complete */
execlists->active = 0;
+ spin_unlock_irqrestore(&engine->timeline->lock, flags);
+
/* If the request was innocent, we leave the request in the ELSP
* and will try to replay it on restarting. The context image may
* have been corrupted by the reset, in which case we may have
--
2.16.2
* [PATCH 4/5] drm/i915/execlists: Split spinlock from its irq disabling side-effect
2018-03-02 14:32 [PATCH 1/5] drm/i915: Stop engines around GPU reset preparations Chris Wilson
2018-03-02 14:32 ` [PATCH 2/5] drm/i915: Suspend submission tasklets around wedging Chris Wilson
2018-03-02 14:32 ` [PATCH 3/5] drm/i915/execlists: Move irq state manipulation inside irq disabled region Chris Wilson
@ 2018-03-02 14:32 ` Chris Wilson
2018-03-02 15:50 ` Mika Kuoppala
2018-03-02 14:32 ` [PATCH 5/5] drm/i915: Call prepare/finish around intel_gpu_reset() during GEM sanitize Chris Wilson
` (3 subsequent siblings)
6 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2018-03-02 14:32 UTC (permalink / raw)
To: intel-gfx
During reset/wedging, we have to clean up the requests on the timeline
and flush the pending interrupt state. Currently, we are abusing the irq
disabling of the timeline spinlock to protect the irq state in
conjunction with the engine's timeline requests, but this is accidental
and conflates the spinlock with the irq state. A baffling state of
affairs for the reader.
Instead, explicitly disable irqs over the critical section, and separate
modifying the irq state from the timeline's requests.
Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/intel_lrc.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0482e54c94f0..7d1109aceabb 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -689,11 +689,13 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
GEM_TRACE("%s\n", engine->name);
- spin_lock_irqsave(&engine->timeline->lock, flags);
+ local_irq_save(flags);
/* Cancel the requests on the HW and clear the ELSP tracker. */
execlists_cancel_port_requests(execlists);
+ spin_lock(&engine->timeline->lock);
+
/* Mark all executing requests as skipped. */
list_for_each_entry(rq, &engine->timeline->requests, link) {
GEM_BUG_ON(!rq->global_seqno);
@@ -727,6 +729,8 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
execlists->first = NULL;
GEM_BUG_ON(port_isset(execlists->port));
+ spin_unlock(&engine->timeline->lock);
+
/*
* The port is checked prior to scheduling a tasklet, but
* just in case we have suspended the tasklet to do the
@@ -738,7 +742,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
/* Mark all CS interrupts as complete */
execlists->active = 0;
- spin_unlock_irqrestore(&engine->timeline->lock, flags);
+ local_irq_restore(flags);
}
/*
@@ -1618,10 +1622,11 @@ static void reset_common_ring(struct intel_engine_cs *engine,
GEM_TRACE("%s seqno=%x\n",
engine->name, request ? request->global_seqno : 0);
- spin_lock_irqsave(&engine->timeline->lock, flags);
+ local_irq_save(flags);
reset_irq(engine);
+
/*
* Catch up with any missed context-switch interrupts.
*
@@ -1634,14 +1639,17 @@ static void reset_common_ring(struct intel_engine_cs *engine,
execlists_cancel_port_requests(execlists);
/* Push back any incomplete requests for replay after the reset. */
+ spin_lock(&engine->timeline->lock);
__unwind_incomplete_requests(engine);
+ spin_unlock(&engine->timeline->lock);
/* Mark all CS interrupts as complete */
execlists->active = 0;
- spin_unlock_irqrestore(&engine->timeline->lock, flags);
+ local_irq_restore(flags);
- /* If the request was innocent, we leave the request in the ELSP
+ /*
+ * If the request was innocent, we leave the request in the ELSP
* and will try to replay it on restarting. The context image may
* have been corrupted by the reset, in which case we may have
* to service a new GPU hang, but more likely we can continue on
@@ -1654,7 +1662,8 @@ static void reset_common_ring(struct intel_engine_cs *engine,
if (!request || request->fence.error != -EIO)
return;
- /* We want a simple context + ring to execute the breadcrumb update.
+ /*
+ * We want a simple context + ring to execute the breadcrumb update.
* We cannot rely on the context being intact across the GPU hang,
* so clear it and rebuild just what we need for the breadcrumb.
* All pending requests for this context will be zapped, and any
--
2.16.2
* [PATCH 5/5] drm/i915: Call prepare/finish around intel_gpu_reset() during GEM sanitize
2018-03-02 14:32 [PATCH 1/5] drm/i915: Stop engines around GPU reset preparations Chris Wilson
` (2 preceding siblings ...)
2018-03-02 14:32 ` [PATCH 4/5] drm/i915/execlists: Split spinlock from its irq disabling side-effect Chris Wilson
@ 2018-03-02 14:32 ` Chris Wilson
2018-03-02 15:51 ` Mika Kuoppala
2018-03-02 15:43 ` ✓ Fi.CI.BAT: success for series starting with [1/5] drm/i915: Stop engines around GPU reset preparations Patchwork
` (2 subsequent siblings)
6 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2018-03-02 14:32 UTC (permalink / raw)
To: intel-gfx
During GEM sanitization, we reset the GPU so that it's always in a
default state whenever we take over or return the GPU back to the BIOS.
We call the GPU reset directly, so that we don't get caught up in trying
to handle GEM or KMS state that isn't ready at that time, but now we
have a couple of helpers to prepare and finish around the HW reset. Use
them, so that we can extend them as required for updating HW state
tracking around resets.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index dcdcc09240b9..7ae9c877c1c6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4874,6 +4874,8 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
mutex_unlock(&i915->drm.struct_mutex);
}
+ intel_gpu_reset_prepare(i915, ALL_ENGINES);
+
/*
* If we inherit context state from the BIOS or earlier occupants
* of the GPU, the GPU may be in an inconsistent state when we
@@ -4884,6 +4886,8 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
*/
if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
+
+ intel_gpu_reset_finish(i915, ALL_ENGINES);
}
int i915_gem_suspend(struct drm_i915_private *dev_priv)
--
2.16.2
* Re: [PATCH 2/5] drm/i915: Suspend submission tasklets around wedging
2018-03-02 14:32 ` [PATCH 2/5] drm/i915: Suspend submission tasklets around wedging Chris Wilson
@ 2018-03-02 14:34 ` Mika Kuoppala
0 siblings, 0 replies; 13+ messages in thread
From: Mika Kuoppala @ 2018-03-02 14:34 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> After starting hard at sequences like
Perhaps you meant staring, but starting is fine too.
-Mika
>
> [ 28.199013] systemd-1 2..s. 26062228us : execlists_submission_tasklet: rcs0 cs-irq head=0 [0?], tail=1 [1?]
> [ 28.199095] systemd-1 2..s. 26062229us : execlists_submission_tasklet: rcs0 csb[1]: status=0x00000018:0x00000000, active=0x1
> [ 28.199177] systemd-1 2..s. 26062230us : execlists_submission_tasklet: rcs0 out[0]: ctx=0.1, seqno=3, prio=-1024
> [ 28.199258] systemd-1 2..s. 26062231us : execlists_submission_tasklet: rcs0 completed ctx=0
> [ 28.199340] gem_eio-829 1..s1 26066853us : execlists_submission_tasklet: rcs0 in[0]: ctx=1.1, seqno=1, prio=0
> [ 28.199421] <idle>-0 2..s. 26066863us : execlists_submission_tasklet: rcs0 cs-irq head=1 [1?], tail=2 [2?]
> [ 28.199503] <idle>-0 2..s. 26066865us : execlists_submission_tasklet: rcs0 csb[2]: status=0x00000001:0x00000000, active=0x1
> [ 28.199585] gem_eio-829 1..s1 26067077us : execlists_submission_tasklet: rcs0 in[1]: ctx=3.1, seqno=2, prio=0
> [ 28.199667] gem_eio-829 1..s1 26067078us : execlists_submission_tasklet: rcs0 in[0]: ctx=1.2, seqno=1, prio=0
> [ 28.199749] <idle>-0 2..s. 26067084us : execlists_submission_tasklet: rcs0 cs-irq head=2 [2?], tail=3 [3?]
> [ 28.199830] <idle>-0 2..s. 26067085us : execlists_submission_tasklet: rcs0 csb[3]: status=0x00008002:0x00000001, active=0x1
> [ 28.199912] <idle>-0 2..s. 26067086us : execlists_submission_tasklet: rcs0 out[0]: ctx=1.2, seqno=1, prio=0
> [ 28.199994] gem_eio-829 2..s. 28246084us : execlists_submission_tasklet: rcs0 cs-irq head=3 [3?], tail=4 [4?]
> [ 28.200096] gem_eio-829 2..s. 28246088us : execlists_submission_tasklet: rcs0 csb[4]: status=0x00000014:0x00000001, active=0x5
> [ 28.200178] gem_eio-829 2..s. 28246089us : execlists_submission_tasklet: rcs0 out[0]: ctx=0.0, seqno=0, prio=0
> [ 28.200260] gem_eio-829 2..s. 28246127us : execlists_submission_tasklet: execlists_submission_tasklet:886 GEM_BUG_ON(buf[2 * head + 1] != port->context_id)
>
> the conclusion is that the only place where the ports are reset to zero,
> is from engine->cancel_requests called during i915_gem_set_wedged().
>
> The race is horrible as it results from calling set-wedged on active HW
> (the GPU reset failed) and as such we need to be careful as the HW state
> changes beneath us. Fortunately, it's the same scary conditions as
> affect normal reset, so we can reuse the same machinery to disable state
> tracking as we clobber it.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104945
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Link: https://patchwork.freedesktop.org/patch/msgid/20180302113324.23189-2-chris@chris-wilson.co.uk
> ---
> drivers/gpu/drm/i915/i915_gem.c | 6 +++++-
> drivers/gpu/drm/i915/intel_lrc.c | 5 +++++
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c29b1a1cbe96..dcdcc09240b9 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3212,8 +3212,10 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
> * rolling the global seqno forward (since this would complete requests
> * for which we haven't set the fence error to EIO yet).
> */
> - for_each_engine(engine, i915, id)
> + for_each_engine(engine, i915, id) {
> + i915_gem_reset_prepare_engine(engine);
> engine->submit_request = nop_submit_request;
> + }
>
> /*
> * Make sure no one is running the old callback before we proceed with
> @@ -3255,6 +3257,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
> intel_engine_init_global_seqno(engine,
> intel_engine_last_submit(engine));
> spin_unlock_irqrestore(&engine->timeline->lock, flags);
> +
> + i915_gem_reset_finish_engine(engine);
> }
>
> wake_up_all(&i915->gpu_error.reset_queue);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 14288743909f..c1a3636e94fc 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -687,6 +687,8 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
> struct rb_node *rb;
> unsigned long flags;
>
> + GEM_TRACE("%s\n", engine->name);
> +
> spin_lock_irqsave(&engine->timeline->lock, flags);
>
> /* Cancel the requests on the HW and clear the ELSP tracker. */
> @@ -733,6 +735,9 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
> */
> clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
>
> + /* Mark all CS interrupts as complete */
> + execlists->active = 0;
> +
> spin_unlock_irqrestore(&engine->timeline->lock, flags);
> }
>
> --
> 2.16.2
* ✓ Fi.CI.BAT: success for series starting with [1/5] drm/i915: Stop engines around GPU reset preparations
2018-03-02 14:32 [PATCH 1/5] drm/i915: Stop engines around GPU reset preparations Chris Wilson
` (3 preceding siblings ...)
2018-03-02 14:32 ` [PATCH 5/5] drm/i915: Call prepare/finish around intel_gpu_reset() during GEM sanitize Chris Wilson
@ 2018-03-02 15:43 ` Patchwork
2018-03-02 18:10 ` ✗ Fi.CI.IGT: failure " Patchwork
2018-03-03 8:46 ` [PATCH 1/5] " Chris Wilson
6 siblings, 0 replies; 13+ messages in thread
From: Patchwork @ 2018-03-02 15:43 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [1/5] drm/i915: Stop engines around GPU reset preparations
URL : https://patchwork.freedesktop.org/series/39284/
State : success
== Summary ==
Series 39284v1 series starting with [1/5] drm/i915: Stop engines around GPU reset preparations
https://patchwork.freedesktop.org/api/1.0/series/39284/revisions/1/mbox/
fi-bdw-5557u total:288 pass:267 dwarn:0 dfail:0 fail:0 skip:21 time:415s
fi-bdw-gvtdvm total:288 pass:264 dwarn:0 dfail:0 fail:0 skip:24 time:423s
fi-blb-e6850 total:288 pass:223 dwarn:1 dfail:0 fail:0 skip:64 time:372s
fi-bsw-n3050 total:288 pass:242 dwarn:0 dfail:0 fail:0 skip:46 time:486s
fi-bwr-2160 total:288 pass:183 dwarn:0 dfail:0 fail:0 skip:105 time:277s
fi-bxt-j4205 total:288 pass:259 dwarn:0 dfail:0 fail:0 skip:29 time:478s
fi-byt-j1900 total:288 pass:253 dwarn:0 dfail:0 fail:0 skip:35 time:467s
fi-byt-n2820 total:288 pass:249 dwarn:0 dfail:0 fail:0 skip:39 time:465s
fi-cfl-8700k total:288 pass:260 dwarn:0 dfail:0 fail:0 skip:28 time:395s
fi-cfl-s2 total:288 pass:262 dwarn:0 dfail:0 fail:0 skip:26 time:555s
fi-cfl-u total:288 pass:262 dwarn:0 dfail:0 fail:0 skip:26 time:491s
fi-elk-e7500 total:288 pass:229 dwarn:0 dfail:0 fail:0 skip:59 time:413s
fi-gdg-551 total:288 pass:179 dwarn:0 dfail:0 fail:1 skip:108 time:289s
fi-glk-1 total:288 pass:260 dwarn:0 dfail:0 fail:0 skip:28 time:506s
fi-hsw-4770 total:288 pass:261 dwarn:0 dfail:0 fail:0 skip:27 time:386s
fi-ilk-650 total:288 pass:228 dwarn:0 dfail:0 fail:0 skip:60 time:404s
fi-ivb-3520m total:288 pass:259 dwarn:0 dfail:0 fail:0 skip:29 time:450s
fi-ivb-3770 total:288 pass:255 dwarn:0 dfail:0 fail:0 skip:33 time:408s
fi-kbl-7500u total:288 pass:263 dwarn:1 dfail:0 fail:0 skip:24 time:447s
fi-kbl-7560u total:288 pass:269 dwarn:0 dfail:0 fail:0 skip:19 time:488s
fi-kbl-7567u total:288 pass:268 dwarn:0 dfail:0 fail:0 skip:20 time:446s
fi-kbl-r total:288 pass:261 dwarn:0 dfail:0 fail:0 skip:27 time:492s
fi-pnv-d510 total:288 pass:222 dwarn:1 dfail:0 fail:0 skip:65 time:586s
fi-skl-6260u total:288 pass:268 dwarn:0 dfail:0 fail:0 skip:20 time:429s
fi-skl-6600u total:288 pass:261 dwarn:0 dfail:0 fail:0 skip:27 time:496s
fi-skl-6700hq total:288 pass:262 dwarn:0 dfail:0 fail:0 skip:26 time:519s
fi-skl-6700k2 total:288 pass:264 dwarn:0 dfail:0 fail:0 skip:24 time:487s
fi-skl-6770hq total:288 pass:268 dwarn:0 dfail:0 fail:0 skip:20 time:477s
fi-skl-guc total:288 pass:260 dwarn:0 dfail:0 fail:0 skip:28 time:403s
fi-skl-gvtdvm total:288 pass:265 dwarn:0 dfail:0 fail:0 skip:23 time:428s
fi-snb-2520m total:288 pass:248 dwarn:0 dfail:0 fail:0 skip:40 time:519s
fi-snb-2600 total:288 pass:248 dwarn:0 dfail:0 fail:0 skip:40 time:390s
fi-cnl-y3 failed to collect. IGT log at Patchwork_8218/fi-cnl-y3/run0.log
7075ab436b3bb8b97dfde3eb16b2545398938f83 drm-tip: 2018y-03m-02d-13h-04m-00s UTC integration manifest
676919bfcffd drm/i915: Call prepare/finish around intel_gpu_reset() during GEM sanitize
93c1a7f6cb61 drm/i915/execlists: Split spinlock from its irq disabling side-effect
0a4ef58484b9 drm/i915/execlists: Move irq state manipulation inside irq disabled region
1fcd7b68df1b drm/i915: Suspend submission tasklets around wedging
63aeeb0f1728 drm/i915: Stop engines around GPU reset preparations
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8218/issues.html
* Re: [PATCH 4/5] drm/i915/execlists: Split spinlock from its irq disabling side-effect
2018-03-02 14:32 ` [PATCH 4/5] drm/i915/execlists: Split spinlock from its irq disabling side-effect Chris Wilson
@ 2018-03-02 15:50 ` Mika Kuoppala
2018-03-02 16:04 ` Chris Wilson
0 siblings, 1 reply; 13+ messages in thread
From: Mika Kuoppala @ 2018-03-02 15:50 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> During reset/wedging, we have to clean up the requests on the timeline
> and flush the pending interrupt state. Currently, we are abusing the irq
> disabling of the timeline spinlock to protect the irq state in
> conjunction to the engine's timeline requests, but this is accidental
> and conflates the spinlock with the irq state. A baffling state of
> affairs for the reader.
>
> Instead, explicitly disable irqs over the critical section, and separate
> modifying the irq state from the timeline's requests.
>
> Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/intel_lrc.c | 21 +++++++++++++++------
> 1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 0482e54c94f0..7d1109aceabb 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -689,11 +689,13 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
>
> GEM_TRACE("%s\n", engine->name);
>
> - spin_lock_irqsave(&engine->timeline->lock, flags);
> + local_irq_save(flags);
Chris explained in irc that this is for lockdep only. It was a bit
confusing as we already are guaranteed exclusive access to
state by tasklet being killed and dead at this point.
I think this warrants a comment that this is to soothe lockdep.
>
> /* Cancel the requests on the HW and clear the ELSP tracker. */
> execlists_cancel_port_requests(execlists);
>
> + spin_lock(&engine->timeline->lock);
> +
> /* Mark all executing requests as skipped. */
> list_for_each_entry(rq, &engine->timeline->requests, link) {
> GEM_BUG_ON(!rq->global_seqno);
> @@ -727,6 +729,8 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
> execlists->first = NULL;
> GEM_BUG_ON(port_isset(execlists->port));
>
> + spin_unlock(&engine->timeline->lock);
> +
> /*
> * The port is checked prior to scheduling a tasklet, but
> * just in case we have suspended the tasklet to do the
> @@ -738,7 +742,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
> /* Mark all CS interrupts as complete */
> execlists->active = 0;
>
> - spin_unlock_irqrestore(&engine->timeline->lock, flags);
> + local_irq_restore(flags);
> }
>
> /*
> @@ -1618,10 +1622,11 @@ static void reset_common_ring(struct intel_engine_cs *engine,
> GEM_TRACE("%s seqno=%x\n",
> engine->name, request ? request->global_seqno : 0);
>
> - spin_lock_irqsave(&engine->timeline->lock, flags);
> + local_irq_save(flags);
>
> reset_irq(engine);
>
> +
Unintentional extra line?
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> /*
> * Catch up with any missed context-switch interrupts.
> *
> @@ -1634,14 +1639,17 @@ static void reset_common_ring(struct intel_engine_cs *engine,
> execlists_cancel_port_requests(execlists);
>
> /* Push back any incomplete requests for replay after the reset. */
> + spin_lock(&engine->timeline->lock);
> __unwind_incomplete_requests(engine);
> + spin_unlock(&engine->timeline->lock);
>
> /* Mark all CS interrupts as complete */
> execlists->active = 0;
>
> - spin_unlock_irqrestore(&engine->timeline->lock, flags);
> + local_irq_restore(flags);
>
> - /* If the request was innocent, we leave the request in the ELSP
> + /*
> + * If the request was innocent, we leave the request in the ELSP
> * and will try to replay it on restarting. The context image may
> * have been corrupted by the reset, in which case we may have
> * to service a new GPU hang, but more likely we can continue on
> @@ -1654,7 +1662,8 @@ static void reset_common_ring(struct intel_engine_cs *engine,
> if (!request || request->fence.error != -EIO)
> return;
>
> - /* We want a simple context + ring to execute the breadcrumb update.
> + /*
> + * We want a simple context + ring to execute the breadcrumb update.
> * We cannot rely on the context being intact across the GPU hang,
> * so clear it and rebuild just what we need for the breadcrumb.
> * All pending requests for this context will be zapped, and any
> --
> 2.16.2
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 5/5] drm/i915: Call prepare/finish around intel_gpu_reset() during GEM sanitize
2018-03-02 14:32 ` [PATCH 5/5] drm/i915: Call prepare/finish around intel_gpu_reset() during GEM sanitize Chris Wilson
@ 2018-03-02 15:51 ` Mika Kuoppala
0 siblings, 0 replies; 13+ messages in thread
From: Mika Kuoppala @ 2018-03-02 15:51 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> During GEM sanitization, we reset the GPU so that it's always in a
> default state whenever we take over or return the GPU back to the BIOS.
> We call the GPU reset directly, so that we don't get caught up in trying
> to handle GEM or KMS state that isn't ready at that time, but now we
> have a couple of helpers to prepare and finish around the HW reset. Use
> them, so that we can extend them as required for updating HW state
> tracking around resets.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
* Re: [PATCH 4/5] drm/i915/execlists: Split spinlock from its irq disabling side-effect
2018-03-02 15:50 ` Mika Kuoppala
@ 2018-03-02 16:04 ` Chris Wilson
2018-03-02 16:11 ` Mika Kuoppala
0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2018-03-02 16:04 UTC (permalink / raw)
To: Mika Kuoppala, intel-gfx
Quoting Mika Kuoppala (2018-03-02 15:50:53)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > During reset/wedging, we have to clean up the requests on the timeline
> > and flush the pending interrupt state. Currently, we are abusing the irq
> > disabling of the timeline spinlock to protect the irq state in
> > conjunction with the engine's timeline requests, but this is accidental
> > and conflates the spinlock with the irq state. A baffling state of
> > affairs for the reader.
> >
> > Instead, explicitly disable irqs over the critical section, and separate
> > modifying the irq state from the timeline's requests.
> >
> > Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Michel Thierry <michel.thierry@intel.com>
> > ---
> > drivers/gpu/drm/i915/intel_lrc.c | 21 +++++++++++++++------
> > 1 file changed, 15 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 0482e54c94f0..7d1109aceabb 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -689,11 +689,13 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
> >
> > GEM_TRACE("%s\n", engine->name);
> >
> > - spin_lock_irqsave(&engine->timeline->lock, flags);
> > + local_irq_save(flags);
>
> Chris explained in irc that this is for lockdep only. It was a bit
> confusing, as we are already guaranteed exclusive access to the
> state by the tasklet being killed and dead at this point.
>
> I think this warrants a comment that this is to soothe lockdep.
/*
* Before we call engine->cancel_requests(), we should have exclusive
* access to the submission state. This is arranged for us by the caller
* disabling the interrupt generation, the tasklet and other threads
* that may then access the same state, giving us a free hand to
* reset state. However, we still need to let lockdep be aware that
* we know this state may be accessed in hardirq context, so we
* disable the irq around this manipulation and we want to keep
* the spinlock focused on its duties and not accidentally conflate
* coverage to the submission's irq state. (Similarly, although we
* shouldn't need to disable irq around the manipulation of the
* submission's irq state, we also wish to remind ourselves that
* it is irq state.)
*/
> >
> > /* Cancel the requests on the HW and clear the ELSP tracker. */
> > execlists_cancel_port_requests(execlists);
> >
> > + spin_lock(&engine->timeline->lock);
> > @@ -1618,10 +1622,11 @@ static void reset_common_ring(struct intel_engine_cs *engine,
> > GEM_TRACE("%s seqno=%x\n",
> > engine->name, request ? request->global_seqno : 0);
> >
> > - spin_lock_irqsave(&engine->timeline->lock, flags);
/* See execlists_cancel_requests() for the irq/spinlock split. */
> > + local_irq_save(flags);
Good?
-Chris
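
[For readers following the thread: the split being agreed on above can be
condensed into a before/after sketch. This is abridged from the diffs quoted
above, not a complete function; locals and unrelated steps are elided.]

/* Before: the spinlock's irqsave side-effect covered everything,
 * conflating the timeline lock with protection of the irq state. */
spin_lock_irqsave(&engine->timeline->lock, flags);
/* ... cancel ports, unwind requests, clear execlists->active ... */
spin_unlock_irqrestore(&engine->timeline->lock, flags);

/* After: irq disabling and the timeline lock are kept distinct. */
local_irq_save(flags);				/* covers the irq state */

execlists_cancel_port_requests(execlists);	/* HW/ELSP tracking */

spin_lock(&engine->timeline->lock);		/* covers only the timeline */
/* ... walk/unwind engine->timeline->requests ... */
spin_unlock(&engine->timeline->lock);

execlists->active = 0;				/* irq state, under irq-off */

local_irq_restore(flags);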
* Re: [PATCH 4/5] drm/i915/execlists: Split spinlock from its irq disabling side-effect
2018-03-02 16:04 ` Chris Wilson
@ 2018-03-02 16:11 ` Mika Kuoppala
0 siblings, 0 replies; 13+ messages in thread
From: Mika Kuoppala @ 2018-03-02 16:11 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> Quoting Mika Kuoppala (2018-03-02 15:50:53)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>>
>> > During reset/wedging, we have to clean up the requests on the timeline
>> > and flush the pending interrupt state. Currently, we are abusing the irq
>> > disabling of the timeline spinlock to protect the irq state in
>> > conjunction with the engine's timeline requests, but this is accidental
>> > and conflates the spinlock with the irq state. A baffling state of
>> > affairs for the reader.
>> >
>> > Instead, explicitly disable irqs over the critical section, and separate
>> > modifying the irq state from the timeline's requests.
>> >
>> > Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> > Cc: Michel Thierry <michel.thierry@intel.com>
>> > ---
>> > drivers/gpu/drm/i915/intel_lrc.c | 21 +++++++++++++++------
>> > 1 file changed, 15 insertions(+), 6 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> > index 0482e54c94f0..7d1109aceabb 100644
>> > --- a/drivers/gpu/drm/i915/intel_lrc.c
>> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> > @@ -689,11 +689,13 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
>> >
>> > GEM_TRACE("%s\n", engine->name);
>> >
>> > - spin_lock_irqsave(&engine->timeline->lock, flags);
>> > + local_irq_save(flags);
>>
>> Chris explained in irc that this is for lockdep only. It was a bit
>> confusing, as we are already guaranteed exclusive access to the
>> state by the tasklet being killed and dead at this point.
>>
>> I think this warrants a comment that this is to soothe lockdep.
>
> /*
> * Before we call engine->cancel_requests(), we should have exclusive
> * access to the submission state. This is arranged for us by the caller
> * disabling the interrupt generation, the tasklet and other threads
> * that may then access the same state, giving us a free hand to
> * reset state. However, we still need to let lockdep be aware that
> * we know this state may be accessed in hardirq context, so we
> * disable the irq around this manipulation and we want to keep
> * the spinlock focused on its duties and not accidentally conflate
> * coverage to the submission's irq state. (Similarly, although we
> * shouldn't need to disable irq around the manipulation of the
> * submission's irq state, we also wish to remind ourselves that
> * it is irq state.)
> */
>
>> >
>> > /* Cancel the requests on the HW and clear the ELSP tracker. */
>> > execlists_cancel_port_requests(execlists);
>> >
>> > + spin_lock(&engine->timeline->lock);
>> > @@ -1618,10 +1622,11 @@ static void reset_common_ring(struct intel_engine_cs *engine,
>> > GEM_TRACE("%s seqno=%x\n",
>> > engine->name, request ? request->global_seqno : 0);
>> >
>> > - spin_lock_irqsave(&engine->timeline->lock, flags);
>
> /* See execlists_cancel_requests() for the irq/spinlock split. */
>> > + local_irq_save(flags);
>
> Good?
Much more than I bargained for. Excellent!
-Mika
* ✗ Fi.CI.IGT: failure for series starting with [1/5] drm/i915: Stop engines around GPU reset preparations
2018-03-02 14:32 [PATCH 1/5] drm/i915: Stop engines around GPU reset preparations Chris Wilson
` (4 preceding siblings ...)
2018-03-02 15:43 ` ✓ Fi.CI.BAT: success for series starting with [1/5] drm/i915: Stop engines around GPU reset preparations Patchwork
@ 2018-03-02 18:10 ` Patchwork
2018-03-03 8:46 ` [PATCH 1/5] " Chris Wilson
6 siblings, 0 replies; 13+ messages in thread
From: Patchwork @ 2018-03-02 18:10 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [1/5] drm/i915: Stop engines around GPU reset preparations
URL : https://patchwork.freedesktop.org/series/39284/
State : failure
== Summary ==
---- Possible new issues:
Test drv_selftest:
Subgroup live_hangcheck:
pass -> INCOMPLETE (shard-apl)
Test kms_cursor_legacy:
Subgroup short-flip-before-cursor-atomic-transitions-varying-size:
pass -> FAIL (shard-hsw)
---- Known issues:
Test kms_chv_cursor_fail:
Subgroup pipe-b-128x128-left-edge:
dmesg-warn -> PASS (shard-snb) fdo#105185
Test kms_cursor_crc:
Subgroup cursor-256x256-suspend:
pass -> SKIP (shard-snb) fdo#103375
Test kms_sysfs_edid_timing:
pass -> WARN (shard-apl) fdo#100047
Test perf:
Subgroup blocking:
pass -> FAIL (shard-hsw) fdo#102252
fdo#105185 https://bugs.freedesktop.org/show_bug.cgi?id=105185
fdo#103375 https://bugs.freedesktop.org/show_bug.cgi?id=103375
fdo#100047 https://bugs.freedesktop.org/show_bug.cgi?id=100047
fdo#102252 https://bugs.freedesktop.org/show_bug.cgi?id=102252
shard-apl total:3445 pass:1803 dwarn:1 dfail:0 fail:7 skip:1632 time:12245s
shard-hsw total:3463 pass:1768 dwarn:1 dfail:0 fail:3 skip:1690 time:12090s
shard-snb total:3463 pass:1361 dwarn:1 dfail:0 fail:1 skip:2100 time:6913s
Blacklisted hosts:
shard-kbl total:3445 pass:1926 dwarn:1 dfail:0 fail:7 skip:1510 time:9778s
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8218/shards.html
* Re: [PATCH 1/5] drm/i915: Stop engines around GPU reset preparations
2018-03-02 14:32 [PATCH 1/5] drm/i915: Stop engines around GPU reset preparations Chris Wilson
` (5 preceding siblings ...)
2018-03-02 18:10 ` ✗ Fi.CI.IGT: failure " Patchwork
@ 2018-03-03 8:46 ` Chris Wilson
6 siblings, 0 replies; 13+ messages in thread
From: Chris Wilson @ 2018-03-03 8:46 UTC (permalink / raw)
To: intel-gfx
Quoting Chris Wilson (2018-03-02 14:32:42)
> As we make preparations to reset the GPU state, we assume that the GPU
> is hung and will not advance. Make this assumption more explicit by
> setting the STOP_RING bit on the engines as part of our early reset
> preparations.
I had to drop this one as it causes a CS arbitration event (i.e. context
switch) when the CS parser hits the STOP_RING. As we were doing the stop
prior to disabling the tasklet, this meant we were handling a spurious
interrupt. Note this means that we are generating spurious interrupts
prior to the reset currently; it's just that we are effectively masking them :)
As for the problem it was trying to solve? I am thinking along the lines
of more set-wedge races that used to be prevented by the stop_machine().
-Chris
end of thread, other threads:[~2018-03-03 8:46 UTC | newest]
Thread overview: 13+ messages
2018-03-02 14:32 [PATCH 1/5] drm/i915: Stop engines around GPU reset preparations Chris Wilson
2018-03-02 14:32 ` [PATCH 2/5] drm/i915: Suspend submission tasklets around wedging Chris Wilson
2018-03-02 14:34 ` Mika Kuoppala
2018-03-02 14:32 ` [PATCH 3/5] drm/i915/execlists: Move irq state manipulation inside irq disabled region Chris Wilson
2018-03-02 14:32 ` [PATCH 4/5] drm/i915/execlists: Split spinlock from its irq disabling side-effect Chris Wilson
2018-03-02 15:50 ` Mika Kuoppala
2018-03-02 16:04 ` Chris Wilson
2018-03-02 16:11 ` Mika Kuoppala
2018-03-02 14:32 ` [PATCH 5/5] drm/i915: Call prepare/finish around intel_gpu_reset() during GEM sanitize Chris Wilson
2018-03-02 15:51 ` Mika Kuoppala
2018-03-02 15:43 ` ✓ Fi.CI.BAT: success for series starting with [1/5] drm/i915: Stop engines around GPU reset preparations Patchwork
2018-03-02 18:10 ` ✗ Fi.CI.IGT: failure " Patchwork
2018-03-03 8:46 ` [PATCH 1/5] " Chris Wilson