From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/5] drm/i915: Suspend submission tasklets around wedging
Date: Fri, 02 Mar 2018 16:34:15 +0200
Message-ID: <87h8pypmvs.fsf@gaia.fi.intel.com>
In-Reply-To: <20180302143246.2579-2-chris@chris-wilson.co.uk>

Chris Wilson <chris@chris-wilson.co.uk> writes:

> After starting hard at sequences like

Perhaps you meant staring, but starting is fine too.
-Mika

>
> [   28.199013]  systemd-1       2..s. 26062228us : execlists_submission_tasklet: rcs0 cs-irq head=0 [0?], tail=1 [1?]
> [   28.199095]  systemd-1       2..s. 26062229us : execlists_submission_tasklet: rcs0 csb[1]: status=0x00000018:0x00000000, active=0x1
> [   28.199177]  systemd-1       2..s. 26062230us : execlists_submission_tasklet: rcs0 out[0]: ctx=0.1, seqno=3, prio=-1024
> [   28.199258]  systemd-1       2..s. 26062231us : execlists_submission_tasklet: rcs0 completed ctx=0
> [   28.199340]  gem_eio-829     1..s1 26066853us : execlists_submission_tasklet: rcs0 in[0]:  ctx=1.1, seqno=1, prio=0
> [   28.199421]   <idle>-0       2..s. 26066863us : execlists_submission_tasklet: rcs0 cs-irq head=1 [1?], tail=2 [2?]
> [   28.199503]   <idle>-0       2..s. 26066865us : execlists_submission_tasklet: rcs0 csb[2]: status=0x00000001:0x00000000, active=0x1
> [   28.199585]  gem_eio-829     1..s1 26067077us : execlists_submission_tasklet: rcs0 in[1]:  ctx=3.1, seqno=2, prio=0
> [   28.199667]  gem_eio-829     1..s1 26067078us : execlists_submission_tasklet: rcs0 in[0]:  ctx=1.2, seqno=1, prio=0
> [   28.199749]   <idle>-0       2..s. 26067084us : execlists_submission_tasklet: rcs0 cs-irq head=2 [2?], tail=3 [3?]
> [   28.199830]   <idle>-0       2..s. 26067085us : execlists_submission_tasklet: rcs0 csb[3]: status=0x00008002:0x00000001, active=0x1
> [   28.199912]   <idle>-0       2..s. 26067086us : execlists_submission_tasklet: rcs0 out[0]: ctx=1.2, seqno=1, prio=0
> [   28.199994]  gem_eio-829     2..s. 28246084us : execlists_submission_tasklet: rcs0 cs-irq head=3 [3?], tail=4 [4?]
> [   28.200096]  gem_eio-829     2..s. 28246088us : execlists_submission_tasklet: rcs0 csb[4]: status=0x00000014:0x00000001, active=0x5
> [   28.200178]  gem_eio-829     2..s. 28246089us : execlists_submission_tasklet: rcs0 out[0]: ctx=0.0, seqno=0, prio=0
> [   28.200260]  gem_eio-829     2..s. 28246127us : execlists_submission_tasklet: execlists_submission_tasklet:886 GEM_BUG_ON(buf[2 * head + 1] != port->context_id)
>
> the conclusion is that the only place where the ports are reset to zero,
> is from engine->cancel_requests called during i915_gem_set_wedged().
>
> The race is horrible as it results from calling set-wedged on active HW
> (the GPU reset failed) and as such we need to be careful as the HW state
> changes beneath us. Fortunately, it's the same scary conditions as
> affect normal reset, so we can reuse the same machinery to disable state
> tracking as we clobber it.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104945
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Link: https://patchwork.freedesktop.org/patch/msgid/20180302113324.23189-2-chris@chris-wilson.co.uk
> ---
>  drivers/gpu/drm/i915/i915_gem.c  | 6 +++++-
>  drivers/gpu/drm/i915/intel_lrc.c | 5 +++++
>  2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c29b1a1cbe96..dcdcc09240b9 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3212,8 +3212,10 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>  	 * rolling the global seqno forward (since this would complete requests
>  	 * for which we haven't set the fence error to EIO yet).
>  	 */
> -	for_each_engine(engine, i915, id)
> +	for_each_engine(engine, i915, id) {
> +		i915_gem_reset_prepare_engine(engine);
>  		engine->submit_request = nop_submit_request;
> +	}
>  
>  	/*
>  	 * Make sure no one is running the old callback before we proceed with
> @@ -3255,6 +3257,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>  		intel_engine_init_global_seqno(engine,
>  					       intel_engine_last_submit(engine));
>  		spin_unlock_irqrestore(&engine->timeline->lock, flags);
> +
> +		i915_gem_reset_finish_engine(engine);
>  	}
>  
>  	wake_up_all(&i915->gpu_error.reset_queue);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 14288743909f..c1a3636e94fc 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -687,6 +687,8 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
>  	struct rb_node *rb;
>  	unsigned long flags;
>  
> +	GEM_TRACE("%s\n", engine->name);
> +
>  	spin_lock_irqsave(&engine->timeline->lock, flags);
>  
>  	/* Cancel the requests on the HW and clear the ELSP tracker. */
> @@ -733,6 +735,9 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
>  	 */
>  	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
>  
> +	/* Mark all CS interrupts as complete */
> +	execlists->active = 0;
> +
>  	spin_unlock_irqrestore(&engine->timeline->lock, flags);
>  }
>  
> -- 
> 2.16.2
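
As a rough illustration of the race described in the commit message, the
user-space C sketch below models a submission tasklet that compares an
incoming context-status event against the tracked port, and a wedging path
that zeroes the port while the hardware is still live. Every name in it
(model_engine, model_tasklet, model_set_wedged and so on) is hypothetical
and exists only for this sketch; it is not i915 code, and it simplifies the
real behaviour by dropping the event that arrives while the tasklet is
suspended.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-engine submission state (not the i915 structs). */
struct model_engine {
	int tasklet_disable_count;    /* > 0 means the tasklet must not run */
	unsigned int port_context_id; /* context the driver believes is on port 0 */
	unsigned int active;          /* models execlists->active */
};

enum tasklet_result { TASKLET_SKIPPED, TASKLET_OK, TASKLET_MISMATCH };

/* Models the tasklet body: check a context-status event against the port. */
static enum tasklet_result model_tasklet(struct model_engine *e,
					  unsigned int csb_context_id)
{
	if (e->tasklet_disable_count > 0)
		return TASKLET_SKIPPED; /* suspended: state is not inspected */
	if (csb_context_id != e->port_context_id)
		return TASKLET_MISMATCH; /* corresponds to the GEM_BUG_ON in the trace */
	return TASKLET_OK;
}

/* Models the prepare/finish pair that brackets the state clobbering. */
static void model_reset_prepare(struct model_engine *e)
{
	e->tasklet_disable_count++;
}

static void model_reset_finish(struct model_engine *e)
{
	assert(e->tasklet_disable_count > 0);
	e->tasklet_disable_count--;
}

/* Models cancel_requests: ports are reset to zero and active is cleared. */
static void model_cancel_requests(struct model_engine *e)
{
	e->port_context_id = 0;
	e->active = 0;
}

/*
 * Models wedging on live hardware: a late status event for
 * late_csb_context_id arrives right after the ports were cleared.
 * With suspend == true the tasklet is held off for the whole window.
 */
static enum tasklet_result model_set_wedged(struct model_engine *e,
					     unsigned int late_csb_context_id,
					     bool suspend)
{
	enum tasklet_result seen;

	if (suspend)
		model_reset_prepare(e);

	model_cancel_requests(e);
	seen = model_tasklet(e, late_csb_context_id);

	if (suspend)
		model_reset_finish(e);

	return seen;
}
```

In this model, calling model_set_wedged() with suspend set to false on an
engine tracking context 1 reports TASKLET_MISMATCH for a late event carrying
context 1, which mirrors the final lines of the trace (the event names
context 1, the port reads ctx=0.0). With suspend set to true the same event
is reported as TASKLET_SKIPPED.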

Thread overview: 13+ messages
2018-03-02 14:32 [PATCH 1/5] drm/i915: Stop engines around GPU reset preparations Chris Wilson
2018-03-02 14:32 ` [PATCH 2/5] drm/i915: Suspend submission tasklets around wedging Chris Wilson
2018-03-02 14:34   ` Mika Kuoppala [this message]
2018-03-02 14:32 ` [PATCH 3/5] drm/i915/execlists: Move irq state manipulation inside irq disabled region Chris Wilson
2018-03-02 14:32 ` [PATCH 4/5] drm/i915/execlists: Split spinlock from its irq disabling side-effect Chris Wilson
2018-03-02 15:50   ` Mika Kuoppala
2018-03-02 16:04     ` Chris Wilson
2018-03-02 16:11       ` Mika Kuoppala
2018-03-02 14:32 ` [PATCH 5/5] drm/i915: Call prepare/finish around intel_gpu_reset() during GEM sanitize Chris Wilson
2018-03-02 15:51   ` Mika Kuoppala
2018-03-02 15:43 ` ✓ Fi.CI.BAT: success for series starting with [1/5] drm/i915: Stop engines around GPU reset preparations Patchwork
2018-03-02 18:10 ` ✗ Fi.CI.IGT: failure " Patchwork
2018-03-03  8:46 ` [PATCH 1/5] " Chris Wilson
