From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/5] drm/i915: Suspend submission tasklets around wedging
Date: Fri, 02 Mar 2018 16:34:15 +0200 [thread overview]
Message-ID: <87h8pypmvs.fsf@gaia.fi.intel.com> (raw)
In-Reply-To: <20180302143246.2579-2-chris@chris-wilson.co.uk>
Chris Wilson <chris@chris-wilson.co.uk> writes:
> After starting hard at sequences like
Perhaps you meant staring, but starting is fine too.
-Mika
>
> [ 28.199013] systemd-1 2..s. 26062228us : execlists_submission_tasklet: rcs0 cs-irq head=0 [0?], tail=1 [1?]
> [ 28.199095] systemd-1 2..s. 26062229us : execlists_submission_tasklet: rcs0 csb[1]: status=0x00000018:0x00000000, active=0x1
> [ 28.199177] systemd-1 2..s. 26062230us : execlists_submission_tasklet: rcs0 out[0]: ctx=0.1, seqno=3, prio=-1024
> [ 28.199258] systemd-1 2..s. 26062231us : execlists_submission_tasklet: rcs0 completed ctx=0
> [ 28.199340] gem_eio-829 1..s1 26066853us : execlists_submission_tasklet: rcs0 in[0]: ctx=1.1, seqno=1, prio=0
> [ 28.199421] <idle>-0 2..s. 26066863us : execlists_submission_tasklet: rcs0 cs-irq head=1 [1?], tail=2 [2?]
> [ 28.199503] <idle>-0 2..s. 26066865us : execlists_submission_tasklet: rcs0 csb[2]: status=0x00000001:0x00000000, active=0x1
> [ 28.199585] gem_eio-829 1..s1 26067077us : execlists_submission_tasklet: rcs0 in[1]: ctx=3.1, seqno=2, prio=0
> [ 28.199667] gem_eio-829 1..s1 26067078us : execlists_submission_tasklet: rcs0 in[0]: ctx=1.2, seqno=1, prio=0
> [ 28.199749] <idle>-0 2..s. 26067084us : execlists_submission_tasklet: rcs0 cs-irq head=2 [2?], tail=3 [3?]
> [ 28.199830] <idle>-0 2..s. 26067085us : execlists_submission_tasklet: rcs0 csb[3]: status=0x00008002:0x00000001, active=0x1
> [ 28.199912] <idle>-0 2..s. 26067086us : execlists_submission_tasklet: rcs0 out[0]: ctx=1.2, seqno=1, prio=0
> [ 28.199994] gem_eio-829 2..s. 28246084us : execlists_submission_tasklet: rcs0 cs-irq head=3 [3?], tail=4 [4?]
> [ 28.200096] gem_eio-829 2..s. 28246088us : execlists_submission_tasklet: rcs0 csb[4]: status=0x00000014:0x00000001, active=0x5
> [ 28.200178] gem_eio-829 2..s. 28246089us : execlists_submission_tasklet: rcs0 out[0]: ctx=0.0, seqno=0, prio=0
> [ 28.200260] gem_eio-829 2..s. 28246127us : execlists_submission_tasklet: execlists_submission_tasklet:886 GEM_BUG_ON(buf[2 * head + 1] != port->context_id)
>
> the conclusion is that the only place where the ports are reset to zero,
> is from engine->cancel_requests called during i915_gem_set_wedged().
>
> The race is horrible as it results from calling set-wedged on active HW
> (the GPU reset failed) and as such we need to be careful as the HW state
> changes beneath us. Fortunately, it's the same scary conditions as
> affect normal reset, so we can reuse the same machinery to disable state
> tracking as we clobber it.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104945
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Link: https://patchwork.freedesktop.org/patch/msgid/20180302113324.23189-2-chris@chris-wilson.co.uk
> ---
> drivers/gpu/drm/i915/i915_gem.c | 6 +++++-
> drivers/gpu/drm/i915/intel_lrc.c | 5 +++++
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c29b1a1cbe96..dcdcc09240b9 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3212,8 +3212,10 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
> * rolling the global seqno forward (since this would complete requests
> * for which we haven't set the fence error to EIO yet).
> */
> - for_each_engine(engine, i915, id)
> + for_each_engine(engine, i915, id) {
> + i915_gem_reset_prepare_engine(engine);
> engine->submit_request = nop_submit_request;
> + }
>
> /*
> * Make sure no one is running the old callback before we proceed with
> @@ -3255,6 +3257,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
> intel_engine_init_global_seqno(engine,
> intel_engine_last_submit(engine));
> spin_unlock_irqrestore(&engine->timeline->lock, flags);
> +
> + i915_gem_reset_finish_engine(engine);
> }
>
> wake_up_all(&i915->gpu_error.reset_queue);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 14288743909f..c1a3636e94fc 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -687,6 +687,8 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
> struct rb_node *rb;
> unsigned long flags;
>
> + GEM_TRACE("%s\n", engine->name);
> +
> spin_lock_irqsave(&engine->timeline->lock, flags);
>
> /* Cancel the requests on the HW and clear the ELSP tracker. */
> @@ -733,6 +735,9 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
> */
> clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
>
> + /* Mark all CS interrupts as complete */
> + execlists->active = 0;
> +
> spin_unlock_irqrestore(&engine->timeline->lock, flags);
> }
>
> --
> 2.16.2
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2018-03-02 14:38 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-03-02 14:32 [PATCH 1/5] drm/i915: Stop engines around GPU reset preparations Chris Wilson
2018-03-02 14:32 ` [PATCH 2/5] drm/i915: Suspend submission tasklets around wedging Chris Wilson
2018-03-02 14:34 ` Mika Kuoppala [this message]
2018-03-02 14:32 ` [PATCH 3/5] drm/i915/execlists: Move irq state manipulation inside irq disabled region Chris Wilson
2018-03-02 14:32 ` [PATCH 4/5] drm/i915/execlists: Split spinlock from its irq disabling side-effect Chris Wilson
2018-03-02 15:50 ` Mika Kuoppala
2018-03-02 16:04 ` Chris Wilson
2018-03-02 16:11 ` Mika Kuoppala
2018-03-02 14:32 ` [PATCH 5/5] drm/i915: Call prepare/finish around intel_gpu_reset() during GEM sanitize Chris Wilson
2018-03-02 15:51 ` Mika Kuoppala
2018-03-02 15:43 ` ✓ Fi.CI.BAT: success for series starting with [1/5] drm/i915: Stop engines around GPU reset preparations Patchwork
2018-03-02 18:10 ` ✗ Fi.CI.IGT: failure " Patchwork
2018-03-03 8:46 ` [PATCH 1/5] " Chris Wilson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87h8pypmvs.fsf@gaia.fi.intel.com \
--to=mika.kuoppala@linux.intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox