public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Subject: Re: [PATCH v3 2/2] drm/i915: Restore engine->submit_request before unwedging
Date: Wed, 15 Mar 2017 09:23:01 +0000	[thread overview]
Message-ID: <cdf9081e-ed02-c03d-9dbd-00332a53683d@linux.intel.com> (raw)
In-Reply-To: <20170314093441.7397-2-chris@chris-wilson.co.uk>


On 14/03/2017 09:34, Chris Wilson wrote:
> When we wedge the device, we override engine->submit_request with a nop
> to ensure that all in-flight requests are marked in error. However, igt
> would like to unwedge the device to test -EIO handling. This requires us
> to flush those in-flight requests and restore the original
> engine->submit_request.
>
> v2: Use a vfunc to unify enabling request submission to engines
> v3: Split new vfunc to a separate patch.
>
> Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests")
> Testcase: igt/gem_eio
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c |  2 +-
>  drivers/gpu/drm/i915/i915_drv.h |  1 +
>  drivers/gpu/drm/i915/i915_gem.c | 51 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 53 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index e312b61ba6bb..11d1066b673c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1822,7 +1822,7 @@ void i915_reset(struct drm_i915_private *dev_priv)
>  		return;
>
>  	/* Clear any previous failed attempts at recovery. Time to try again. */
> -	__clear_bit(I915_WEDGED, &error->flags);
> +	i915_gem_unset_wedged(dev_priv);
>  	error->reset_count++;
>
>  	pr_notice("drm/i915: Resetting chip after gpu hang\n");
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 48ff64812289..53a791d8d992 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3406,6 +3406,7 @@ int i915_gem_reset_prepare(struct drm_i915_private *dev_priv);
>  void i915_gem_reset(struct drm_i915_private *dev_priv);
>  void i915_gem_reset_finish(struct drm_i915_private *dev_priv);
>  void i915_gem_set_wedged(struct drm_i915_private *dev_priv);
> +void i915_gem_unset_wedged(struct drm_i915_private *dev_priv);
>
>  void i915_gem_init_mmio(struct drm_i915_private *i915);
>  int __must_check i915_gem_init(struct drm_i915_private *dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 202bb850f260..e06830916a05 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3000,6 +3000,57 @@ void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
>  	mod_delayed_work(dev_priv->wq, &dev_priv->gt.idle_work, 0);
>  }
>
> +void i915_gem_unset_wedged(struct drm_i915_private *i915)
> +{
> +	struct i915_gem_timeline *tl;
> +	int i;
> +
> +	lockdep_assert_held(&i915->drm.struct_mutex);
> +	if (!test_bit(I915_WEDGED, &i915->gpu_error.flags))
> +		return;
> +
> +	/* Before unwedging, make sure that all pending operations
> +	 * are flushed and errored out. No more can be submitted until
> +	 * we reset the wedged bit.
> +	 */
> +	list_for_each_entry(tl, &i915->gt.timelines, link) {
> +		for (i = 0; i < ARRAY_SIZE(tl->engine); i++) {
> +			struct drm_i915_gem_request *rq;
> +
> +			rq = i915_gem_active_peek(&tl->engine[i].last_request,
> +						  &i915->drm.struct_mutex);
> +			if (!rq)
> +				continue;
> +
> +			/* We can't use our normal waiter as we want to
> +			 * avoid recursively trying to handle the current
> +			 * reset. The basic dma_fence_default_wait() installs
> +			 * a callback for dma_fence_signal(), which is
> +			 * triggered by our nop handler (indirectly, the
> +			 * callback enables the signaler thread which is
> +			 * woken by the nop_submit_request() advancing the seqno
> +			 * and when the seqno passes the fence, the signaler
> +			 * then signals the fence waking us up).
> +			 */
> +			dma_fence_default_wait(&rq->fence, false,
> +					       MAX_SCHEDULE_TIMEOUT);
> +		}
> +	}
> +
> +	/* Undo nop_submit_request. We prevent all new i915 requests from
> +	 * being queued (by disallowing execbuf whilst wedged) so having
> +	 * waited for all active requests above, we know the system is idle
> +	 * and do not have to worry about a thread being inside
> +	 * engine->submit_request() as we swap over. So unlike installing
> +	 * the nop_submit_request on reset, we can do this from normal
> +	 * context and do not require stop_machine().
> +	 */
> +	intel_engines_enable_submission(i915);

So the point of the dma_fence_default_wait above is it to ensure all 
nop_submit_request call backs have completed? I don't at the moment 
understand how could there be such callbacks since unwedge is happening 
after the wedge. So the wedge already installed the nop handler, and by 
the time we get to another reset attempt, isn't it already guaranteed 
all of those have exited?

Regards,

Tvrtko

> +
> +	smp_mb__before_atomic(); /* complete takeover before enabling execbuf */
> +	clear_bit(I915_WEDGED, &i915->gpu_error.flags);
> +}
> +
>  static void
>  i915_gem_retire_work_handler(struct work_struct *work)
>  {
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2017-03-15  9:23 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-14  9:34 [PATCH v3 1/2] drm/i915: Move engine->submit_request selection to a vfunc Chris Wilson
2017-03-14  9:34 ` [PATCH v3 2/2] drm/i915: Restore engine->submit_request before unwedging Chris Wilson
2017-03-15  9:23   ` Tvrtko Ursulin [this message]
2017-03-15  9:34     ` Chris Wilson
2017-03-15 10:04       ` Tvrtko Ursulin
2017-03-14 16:17 ` ✓ Fi.CI.BAT: success for series starting with [v3,1/2] drm/i915: Move engine->submit_request selection to a vfunc Patchwork
2017-03-14 16:31 ` [PATCH v3 1/2] " Tvrtko Ursulin
2017-03-14 21:33   ` Chris Wilson
2017-03-15  8:14     ` Tvrtko Ursulin
2017-03-15  9:41       ` Chris Wilson
2017-03-15 10:05         ` Tvrtko Ursulin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cdf9081e-ed02-c03d-9dbd-00332a53683d@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=mika.kuoppala@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox