public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>,
	Michel Thierry <michel.thierry@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 4/9] drm/i915/tdr: Add support for per engine reset recovery
Date: Mon, 19 Dec 2016 16:24:18 +0200	[thread overview]
Message-ID: <87pokot2st.fsf@gaia.fi.intel.com> (raw)
In-Reply-To: <20161216204521.GG29871@nuc-i3427.alporthouse.com>

Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Fri, Dec 16, 2016 at 12:20:05PM -0800, Michel Thierry wrote:
>> From: Arun Siluvery <arun.siluvery@linux.intel.com>
>> 
>> This change implements support for per-engine reset as an initial, less
>> intrusive hang recovery option to be attempted before falling back to the
>> legacy full GPU reset recovery mode if necessary. This is only supported
>> from Gen8 onwards.
>> 
>> Hangchecker determines which engines are hung and invokes error handler to
>> recover from it. Error handler schedules recovery for each of those engines
>> that are hung. The recovery procedure is as follows,
>>  - identifies the request that caused the hang and it is dropped
>>  - force engine to idle: this is done by issuing a reset request
>>  - reset and re-init engine
>>  - restart submissions to the engine
>> 
>> If engine reset fails then we fall back to heavy weight full gpu reset
>> which resets all engines and reinitiazes complete state of HW and SW.
>> 
>> v2: Rebase.
>> 
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> Signed-off-by: Tomas Elf <tomas.elf@intel.com>
>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_drv.c     | 56 +++++++++++++++++++++++++++++++++++--
>>  drivers/gpu/drm/i915/i915_drv.h     |  3 ++
>>  drivers/gpu/drm/i915/i915_gem.c     |  2 +-
>>  drivers/gpu/drm/i915/intel_lrc.c    | 12 ++++++++
>>  drivers/gpu/drm/i915/intel_lrc.h    |  1 +
>>  drivers/gpu/drm/i915/intel_uncore.c | 41 ++++++++++++++++++++++++---
>>  6 files changed, 108 insertions(+), 7 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
>> index e5688edd62cd..a034793bc246 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.c
>> +++ b/drivers/gpu/drm/i915/i915_drv.c
>> @@ -1830,18 +1830,70 @@ void i915_reset(struct drm_i915_private *dev_priv)
>>   *
>>   * Reset a specific GPU engine. Useful if a hang is detected.
>>   * Returns zero on successful reset or otherwise an error code.
>> + *
>> + * Procedure is fairly simple:
>> + *  - identifies the request that caused the hang and it is dropped
>> + *  - force engine to idle: this is done by issuing a reset request
>> + *  - reset engine
>> + *  - restart submissions to the engine
>>   */
>>  int i915_reset_engine(struct intel_engine_cs *engine)
>
> What's the serialisation between potential callers of
> i915_reset_engine()?
>

I feel that making 'reset_in_progress' per engine feature
would clarify this and would be more fitting as now it is
the one engine that can be in reset at particular point in time,
decoupled with others.

Having global 'reset_in_progress' and then a per engine
resets just doesn't feel right.

-Mika

>>  {
>>  	int ret;
>>  	struct drm_i915_private *dev_priv = engine->i915;
>>  
>> -	/* FIXME: replace me with engine reset sequence */
>> -	ret = -ENODEV;
>> +	/*
>> +	 * We need to first idle the engine by issuing a reset request,
>> +	 * then perform soft reset and re-initialize hw state, for all of
>> +	 * this GT power need to be awake so ensure it does throughout the
>> +	 * process
>> +	 */
>> +	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
>> +
>> +	/*
>> +	 * the request that caused the hang is stuck on elsp, identify the
>> +	 * active request and drop it, adjust head to skip the offending
>> +	 * request to resume executing remaining requests in the queue.
>> +	 */
>> +	i915_gem_reset_engine(engine);
>
> Must freeze the engine and irqs first, before calling
> i915_gem_reset_engine() (i.e. something like disable_engines_irq,
> cancelling tasklet)
>
> Eeek note that the current i915_gem_reset_engine() is lacking a
> spinlock.
>
>> +
>> +	ret = intel_engine_reset_begin(engine);
>> +	if (ret) {
>> +		DRM_ERROR("Failed to disable %s\n", engine->name);
>> +		goto error;
>> +	}
>> +
>> +	ret = intel_gpu_reset(dev_priv, intel_engine_flag(engine));
>> +	if (ret) {
>> +		DRM_ERROR("Failed to reset %s, ret=%d\n", engine->name, ret);
>> +		intel_engine_reset_cancel(engine);
>> +		goto error;
>> +	}
>> +
>> +	ret = engine->init_hw(engine);
>> +	if (ret)
>> +		goto error;
>>  
>> +	intel_engine_reset_cancel(engine);
>> +	intel_execlists_restart_submission(engine);
>
> engine->init_hw(engine) *is* intel_execlists_restart_submission.
>
>> +
>> +	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
>> +	return 0;
>> +
>> +error:
>>  	/* use full gpu reset to recover on error */
>>  	set_bit(I915_RESET_IN_PROGRESS, &dev_priv->gpu_error.flags);
>>  
>> +	/* Engine reset is performed without taking struct_mutex, since it
>> +	 * failed we now fallback to full gpu reset. Wakeup any waiters
>> +	 * which should now see the reset_in_progress and release
>> +	 * struct_mutex for us to continue recovery.
>> +	 */
>> +	rcu_read_lock();
>> +	intel_engine_wakeup(engine);
>> +	rcu_read_unlock();
>> +
>> +	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
>>  	return ret;
>>  }
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  parent reply	other threads:[~2016-12-19 14:24 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-16 20:20 [PATCH 0/9] Execlist based engine-reset Michel Thierry
2016-12-16 20:20 ` [PATCH 1/9] drm/i915: Keep i915_handle_error kerneldoc parameters together Michel Thierry
2016-12-16 21:40   ` Chris Wilson
2016-12-19  8:22   ` Tvrtko Ursulin
2016-12-16 20:20 ` [PATCH 2/9] drm/i915: Update i915.reset to handle engine resets Michel Thierry
2016-12-16 20:20 ` [PATCH 3/9] drm/i915/tdr: Modify error handler for per engine hang recovery Michel Thierry
2016-12-16 20:20 ` [PATCH 4/9] drm/i915/tdr: Add support for per engine reset recovery Michel Thierry
2016-12-16 20:45   ` Chris Wilson
2016-12-19  5:02     ` Michel Thierry
2016-12-19  9:51       ` Chris Wilson
2017-01-05 23:55         ` Michel Thierry
2016-12-19 14:24     ` Mika Kuoppala [this message]
2016-12-19 14:42       ` Chris Wilson
2016-12-16 20:20 ` [PATCH 5/9] drm/i915: Skip reset request if there is one already Michel Thierry
2016-12-16 20:20 ` [PATCH 6/9] drm/i915/tdr: Add engine reset count to error state Michel Thierry
2016-12-16 20:37   ` Chris Wilson
2016-12-16 21:19     ` Michel Thierry
2016-12-19  8:27   ` Tvrtko Ursulin
2016-12-19 18:09     ` Michel Thierry
2016-12-16 20:20 ` [PATCH 7/9] drm/i915/tdr: Export per-engine reset count info to debugfs Michel Thierry
2016-12-16 20:20 ` [PATCH 8/9] drm/i915/tdr: Enable Engine reset and recovery support Michel Thierry
2016-12-16 20:20 ` [RFC 9/9] drm/i915: Add engine reset count in get-reset-stats ioctl Michel Thierry
2016-12-16 20:45 ` ✗ Fi.CI.BAT: warning for Execlist based engine-reset Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87pokot2st.fsf@gaia.fi.intel.com \
    --to=mika.kuoppala@linux.intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=michel.thierry@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox