From: Michel Thierry <michel.thierry@intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>,
intel-gfx@lists.freedesktop.org,
Mika Kuoppala <mika.kuoppala@linux.intel.com>
Subject: Re: [PATCH 4/9] drm/i915/tdr: Add support for per engine reset recovery
Date: Sun, 18 Dec 2016 21:02:33 -0800 [thread overview]
Message-ID: <2e64469d-2db6-7f17-e76c-baf788f9c1a8@intel.com> (raw)
In-Reply-To: <20161216204521.GG29871@nuc-i3427.alporthouse.com>
On 12/16/2016 12:45 PM, Chris Wilson wrote:
> On Fri, Dec 16, 2016 at 12:20:05PM -0800, Michel Thierry wrote:
>> From: Arun Siluvery <arun.siluvery@linux.intel.com>
>>
>> This change implements support for per-engine reset as an initial, less
>> intrusive hang recovery option to be attempted before falling back to the
>> legacy full GPU reset recovery mode if necessary. This is only supported
>> from Gen8 onwards.
>>
>> Hangchecker determines which engines are hung and invokes error handler to
>> recover from it. Error handler schedules recovery for each of those engines
>> that are hung. The recovery procedure is as follows,
>> - identifies the request that caused the hang and it is dropped
>> - force engine to idle: this is done by issuing a reset request
>> - reset and re-init engine
>> - restart submissions to the engine
>>
>> If engine reset fails then we fall back to heavy weight full gpu reset
>> which resets all engines and reinitiazes complete state of HW and SW.
>>
>> v2: Rebase.
>>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> Signed-off-by: Tomas Elf <tomas.elf@intel.com>
>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_drv.c | 56 +++++++++++++++++++++++++++++++++++--
>> drivers/gpu/drm/i915/i915_drv.h | 3 ++
>> drivers/gpu/drm/i915/i915_gem.c | 2 +-
>> drivers/gpu/drm/i915/intel_lrc.c | 12 ++++++++
>> drivers/gpu/drm/i915/intel_lrc.h | 1 +
>> drivers/gpu/drm/i915/intel_uncore.c | 41 ++++++++++++++++++++++++---
>> 6 files changed, 108 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
>> index e5688edd62cd..a034793bc246 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.c
>> +++ b/drivers/gpu/drm/i915/i915_drv.c
>> @@ -1830,18 +1830,70 @@ void i915_reset(struct drm_i915_private *dev_priv)
>> *
>> * Reset a specific GPU engine. Useful if a hang is detected.
>> * Returns zero on successful reset or otherwise an error code.
>> + *
>> + * Procedure is fairly simple:
>> + * - identifies the request that caused the hang and it is dropped
>> + * - force engine to idle: this is done by issuing a reset request
>> + * - reset engine
>> + * - restart submissions to the engine
>> */
>> int i915_reset_engine(struct intel_engine_cs *engine)
>
> What's the serialisation between potential callers of
> i915_reset_engine()?
>
I haven't seen simultaneous calls happening yet, would a
reset_engine-specific mutex be enough?
>> {
>> int ret;
>> struct drm_i915_private *dev_priv = engine->i915;
>>
>> - /* FIXME: replace me with engine reset sequence */
>> - ret = -ENODEV;
>> + /*
>> + * We need to first idle the engine by issuing a reset request,
>> + * then perform soft reset and re-initialize hw state, for all of
>> + * this GT power need to be awake so ensure it does throughout the
>> + * process
>> + */
>> + intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
>> +
>> + /*
>> + * the request that caused the hang is stuck on elsp, identify the
>> + * active request and drop it, adjust head to skip the offending
>> + * request to resume executing remaining requests in the queue.
>> + */
>> + i915_gem_reset_engine(engine);
>
> Must freeze the engine and irqs first, before calling
> i915_gem_reset_engine() (i.e. something like disable_engines_irq,
> cancelling tasklet)
>
Will do.
> Eeek note that the current i915_gem_reset_engine() is lacking a
> spinlock.
>
The new mutex (for i915_reset_engine) should cover this.
>> +
>> + ret = intel_engine_reset_begin(engine);
>> + if (ret) {
>> + DRM_ERROR("Failed to disable %s\n", engine->name);
>> + goto error;
>> + }
>> +
>> + ret = intel_gpu_reset(dev_priv, intel_engine_flag(engine));
>> + if (ret) {
>> + DRM_ERROR("Failed to reset %s, ret=%d\n", engine->name, ret);
>> + intel_engine_reset_cancel(engine);
>> + goto error;
>> + }
>> +
>> + ret = engine->init_hw(engine);
>> + if (ret)
>> + goto error;
>>
>> + intel_engine_reset_cancel(engine);
>> + intel_execlists_restart_submission(engine);
>
> engine->init_hw(engine) *is* intel_execlists_restart_submission.
>
I'll make these changes,
Thanks.
>> +
>> + intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
>> + return 0;
>> +
>> +error:
>> /* use full gpu reset to recover on error */
>> set_bit(I915_RESET_IN_PROGRESS, &dev_priv->gpu_error.flags);
>>
>> + /* Engine reset is performed without taking struct_mutex, since it
>> + * failed we now fallback to full gpu reset. Wakeup any waiters
>> + * which should now see the reset_in_progress and release
>> + * struct_mutex for us to continue recovery.
>> + */
>> + rcu_read_lock();
>> + intel_engine_wakeup(engine);
>> + rcu_read_unlock();
>> +
>> + intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
>> return ret;
>> }
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2016-12-19 5:02 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-12-16 20:20 [PATCH 0/9] Execlist based engine-reset Michel Thierry
2016-12-16 20:20 ` [PATCH 1/9] drm/i915: Keep i915_handle_error kerneldoc parameters together Michel Thierry
2016-12-16 21:40 ` Chris Wilson
2016-12-19 8:22 ` Tvrtko Ursulin
2016-12-16 20:20 ` [PATCH 2/9] drm/i915: Update i915.reset to handle engine resets Michel Thierry
2016-12-16 20:20 ` [PATCH 3/9] drm/i915/tdr: Modify error handler for per engine hang recovery Michel Thierry
2016-12-16 20:20 ` [PATCH 4/9] drm/i915/tdr: Add support for per engine reset recovery Michel Thierry
2016-12-16 20:45 ` Chris Wilson
2016-12-19 5:02 ` Michel Thierry [this message]
2016-12-19 9:51 ` Chris Wilson
2017-01-05 23:55 ` Michel Thierry
2016-12-19 14:24 ` Mika Kuoppala
2016-12-19 14:42 ` Chris Wilson
2016-12-16 20:20 ` [PATCH 5/9] drm/i915: Skip reset request if there is one already Michel Thierry
2016-12-16 20:20 ` [PATCH 6/9] drm/i915/tdr: Add engine reset count to error state Michel Thierry
2016-12-16 20:37 ` Chris Wilson
2016-12-16 21:19 ` Michel Thierry
2016-12-19 8:27 ` Tvrtko Ursulin
2016-12-19 18:09 ` Michel Thierry
2016-12-16 20:20 ` [PATCH 7/9] drm/i915/tdr: Export per-engine reset count info to debugfs Michel Thierry
2016-12-16 20:20 ` [PATCH 8/9] drm/i915/tdr: Enable Engine reset and recovery support Michel Thierry
2016-12-16 20:20 ` [RFC 9/9] drm/i915: Add engine reset count in get-reset-stats ioctl Michel Thierry
2016-12-16 20:45 ` ✗ Fi.CI.BAT: warning for Execlist based engine-reset Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2e64469d-2db6-7f17-e76c-baf788f9c1a8@intel.com \
--to=michel.thierry@intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
--cc=mika.kuoppala@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox