From: Arun Siluvery <arun.siluvery@linux.intel.com>
To: intel-gfx@lists.freedesktop.org
Cc: Ian Lister <ian.lister@intel.com>, Tomas Elf <tomas.elf@intel.com>
Subject: [PATCH v2 10/15] drm/i915: Extending i915_gem_check_wedge to check engine reset in progress
Date: Fri, 17 Jun 2016 08:09:10 +0100
Message-ID: <1466147355-4635-11-git-send-email-arun.siluvery@linux.intel.com>
In-Reply-To: <1466147355-4635-1-git-send-email-arun.siluvery@linux.intel.com>

i915_gem_check_wedge() now returns a non-zero result in three different
cases:

1. Legacy: A hang has been detected and a full GPU reset is in progress.

2. Per-engine recovery:

   a. A single engine reference is passed to the function, in which case
      only that engine is checked. A non-zero result is returned if that
      engine has been detected as hung and is about to be reset, but not
      if a reset is in progress for any other engine.

   b. No engine reference is passed to the function, in which case all
      engines are checked for ongoing per-engine hang recovery.

__i915_wait_request() is updated so that, if an engine reset is pending,
the waiter is asked to try again (-EAGAIN), which lets engine recovery
continue. If i915_wait_request does not take per-engine hang recovery into
account, a waiting thread has no way of knowing that a per-engine recovery
is about to happen and that it needs to back off.

Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Ian Lister <ian.lister@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
These changes are based on the current nightly. I am aware of the changes
being made to the wait_request patch in the "thundering herd" series, but my
understanding is that it has other dependencies. We can add incremental
changes once that series is merged.
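
For reviewers, the return-value rules described in the commit message can be
sketched as a small standalone model. This is only an illustration, not the
driver code: struct model_error, MODEL_NUM_ENGINES, MODEL_ALL_ENGINES and the
model_* functions are hypothetical names, and an engine index of
MODEL_ALL_ENGINES stands in for passing a NULL engine reference.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_NUM_ENGINES 5     /* hypothetical engine count for this model */
#define MODEL_ALL_ENGINES (-1)  /* stands in for a NULL engine reference */

/* Hypothetical stand-in for the reset state tracked by the driver. */
struct model_error {
	bool terminally_wedged;               /* GPU declared unrecoverable */
	bool full_reset_in_progress;          /* legacy full GPU reset */
	bool engine_reset[MODEL_NUM_ENGINES]; /* per-engine recovery pending */
};

/* Case 2a checks one engine; case 2b checks every engine. */
bool model_engine_reset_pending(const struct model_error *e, int engine)
{
	int i;

	if (engine != MODEL_ALL_ENGINES)
		return e->engine_reset[engine];

	for (i = 0; i < MODEL_NUM_ENGINES; ++i) {
		if (e->engine_reset[i])
			return true;
	}

	return false;
}

/* Returns 0, -EAGAIN (caller should back off and retry) or -EIO. */
int model_check_wedge(const struct model_error *e, int engine,
		      bool interruptible)
{
	if (e->terminally_wedged)
		return -EIO;

	if (e->full_reset_in_progress ||
	    model_engine_reset_pending(e, engine)) {
		/* Non-interruptible callers cannot handle -EAGAIN. */
		return interruptible ? -EAGAIN : -EIO;
	}

	return 0;
}
```

In this model, with only engine 2 marked as pending, an interruptible check
against engine 2 or against all engines asks the caller to retry, while a
check against engine 1 reports nothing to wait for.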
drivers/gpu/drm/i915/i915_gem.c | 43 ++++++++++++++++++++++++++++++++++++-----
1 file changed, 38 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6160564..bc404da 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -100,12 +100,31 @@ static void i915_gem_info_remove_obj(struct drm_i915_private *dev_priv,
 	spin_unlock(&dev_priv->mm.object_stat_lock);
 }
+static bool i915_engine_reset_pending(struct i915_gpu_error *error,
+				      struct intel_engine_cs *engine)
+{
+	int i;
+
+	if (engine)
+		return i915_engine_reset_in_progress(error, engine->id);
+
+	for (i = 0; i < I915_NUM_ENGINES; ++i) {
+		if (i915_engine_reset_in_progress(error, i))
+			return true;
+	}
+
+	return false;
+}
+
 static int
 i915_gem_wait_for_error(struct i915_gpu_error *error)
 {
 	int ret;
-	if (!i915_reset_in_progress(error))
+#define EXIT_COND (!i915_reset_in_progress(error) && \
+		   !i915_engine_reset_pending(error, NULL))
+
+	if (EXIT_COND)
 		return 0;
 	/*
@@ -114,7 +133,7 @@ i915_gem_wait_for_error(struct i915_gpu_error *error)
 	 * we should simply try to bail out and fail as gracefully as possible.
 	 */
 	ret = wait_event_interruptible_timeout(error->reset_queue,
-					       !i915_reset_in_progress(error),
+					       EXIT_COND,
 					       10*HZ);
 	if (ret == 0) {
 		DRM_ERROR("Timed out waiting for the gpu reset to complete\n");
@@ -1325,12 +1344,18 @@ put_rpm:
 }
 static int
-i915_gem_check_wedge(unsigned reset_counter, bool interruptible)
+i915_gem_check_wedge(struct drm_i915_private *dev_priv,
+		     struct intel_engine_cs *engine,
+		     bool interruptible)
 {
+	struct i915_gpu_error *error = &dev_priv->gpu_error;
+	unsigned reset_counter = i915_reset_counter(error);
+
 	if (__i915_terminally_wedged(reset_counter))
 		return -EIO;
-	if (__i915_reset_in_progress(reset_counter)) {
+	if (__i915_reset_in_progress(reset_counter) ||
+	    i915_engine_reset_pending(error, engine)) {
 		/* Non-interruptible callers can't handle -EAGAIN, hence return
 		 * -EIO unconditionally for these. */
 		if (!interruptible)
@@ -1500,6 +1525,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	for (;;) {
 		struct timer_list timer;
+		int reset_pending;
 		prepare_to_wait(&engine->irq_queue, &wait, state);
@@ -1515,6 +1541,13 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
+		reset_pending = i915_engine_reset_pending(&dev_priv->gpu_error,
+							  NULL);
+		if (reset_pending) {
+			ret = -EAGAIN;
+			break;
+		}
+
 		if (i915_gem_request_completed(req, false)) {
 			ret = 0;
 			break;
@@ -2997,7 +3030,7 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
 	 * EIO if the GPU is already wedged, or EAGAIN to drop the struct_mutex
 	 * and restart.
 	 */
-	ret = i915_gem_check_wedge(reset_counter, dev_priv->mm.interruptible);
+	ret = i915_gem_check_wedge(dev_priv, NULL, dev_priv->mm.interruptible);
 	if (ret)
 		return ret;
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx