From: Arun Siluvery <arun.siluvery@linux.intel.com>
To: intel-gfx@lists.freedesktop.org
Cc: Tomas Elf <tomas.elf@intel.com>
Subject: [PATCH 16/20] drm/i915: Fix __i915_wait_request() behaviour during hang detection.
Date: Wed, 13 Jan 2016 17:28:28 +0000 [thread overview]
Message-ID: <1452706112-8617-17-git-send-email-arun.siluvery@linux.intel.com> (raw)
In-Reply-To: <1452706112-8617-1-git-send-email-arun.siluvery@linux.intel.com>
From: Tomas Elf <tomas.elf@intel.com>
Use is_locked parameter in __i915_wait_request() to determine if a thread
should be forced to back off and retry or if it can continue sleeping. Don't
return -EIO from __i915_wait_request since that is bad for the upper layers,
only -EAGAIN to signify reset in progress. (unless the driver is terminally
wedged, in which case there's no mode of recovery left to attempt)
Also, use is_locked in trace_i915_gem_request_wait_begin() trace point for more
accurate reflection of current thread's lock state.
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 88 ++++++++++++++++++++++++++++++++-------
drivers/gpu/drm/i915/i915_trace.h | 15 ++-----
2 files changed, 78 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7122315..80df5b5 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1287,7 +1287,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
unsigned long timeout_expire;
s64 before, now;
int ret;
- int reset_in_progress = 0;
WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
@@ -1312,7 +1311,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
gen6_rps_boost(dev_priv, rps, req->emitted_jiffies);
/* Record current time in case interrupted by signal, or wedged */
- trace_i915_gem_request_wait_begin(req);
+ trace_i915_gem_request_wait_begin(req, is_locked);
before = ktime_get_raw_ns();
/* Optimistic spin for the next jiffie before touching IRQs */
@@ -1327,23 +1326,84 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
for (;;) {
struct timer_list timer;
+ bool full_gpu_reset_completed_unlocked = false;
+ bool reset_in_progress_locked = false;
prepare_to_wait(&ring->irq_queue, &wait, state);
- /* We need to check whether any gpu reset happened in between
- * the caller grabbing the seqno and now ... */
- reset_in_progress =
- i915_gem_check_wedge(ring->dev->dev_private, NULL, interruptible);
+ /*
+ * Rules for waiting with/without struct_mutex held during
+ * asynchronous TDR hang detection/recovery:
+ *
+ * ("reset in progress" = TDR has detected a hang, hang
+ * recovery may or may not have commenced)
+ *
+ * 1. Is the driver terminally wedged? If so, return -EIO since
+ * there is no point in having the caller retry - the driver
+ * is irrecoverably stuck.
+ *
+ * 2. Is this thread holding the struct_mutex and is any of the
+ * following true?
+ *
+ * a) Is any kind of reset in progress?
+ * b) Has a full GPU reset happened while this thread were
+ * sleeping?
+ *
+ * If so:
+ * Return -EAGAIN. The caller should interpret this as:
+ * Release struct_mutex, try to acquire struct_mutex
+ * (through i915_mutex_lock_interruptible(), which will
+ * fail as long as any reset is in progress) and retry the
+ * call to __i915_wait_request(), which hopefully will go
+ * better as soon as the hang has been resolved.
+ *
+ * 3. Is this thread not holding the struct_mutex and has a
+ * full GPU reset completed? (that is, the reset count has
+ * changed but there is currenly no full GPU reset in
+ * progress?)
+ *
+ * If so:
+ * Return 0. Since the request has been purged there is no
+ * requests left to wait for. Just go home.
+ *
+ * 4. Is this thread not holding the struct_mutex and is any
+ * kind of reset in progress?
+ *
+ * If so:
+ * This thread may keep on waiting.
+ */
- if ((reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter)) ||
- reset_in_progress) {
+ if (i915_terminally_wedged(&dev_priv->gpu_error)) {
+ ret = -EIO;
+ break;
+ }
- /* ... but upgrade the -EAGAIN to an -EIO if the gpu
- * is truely gone. */
- if (reset_in_progress)
- ret = reset_in_progress;
- else
- ret = -EAGAIN;
+ reset_in_progress_locked =
+ (((i915_gem_check_wedge(ring->dev->dev_private, NULL, interruptible)) ||
+ (reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))) &&
+ is_locked);
+
+ if (reset_in_progress_locked) {
+ /*
+ * If the caller is holding the struct_mutex throw them
+ * out since TDR needs access to it.
+ */
+ ret = -EAGAIN;
+ break;
+ }
+
+ full_gpu_reset_completed_unlocked =
+ (((reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter)) &&
+ (!i915_reset_in_progress(&dev_priv->gpu_error))));
+
+ if (full_gpu_reset_completed_unlocked) {
+ /*
+ * Full GPU reset without holding the struct_mutex has
+ * completed - just return. If recovery is still in
+ * progress the thread will keep on sleeping until
+ * recovery is complete.
+ */
+ ret = 0;
break;
}
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 5c15d43..7dcac93 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -591,8 +591,8 @@ DEFINE_EVENT(i915_gem_request, i915_gem_request_complete,
);
TRACE_EVENT(i915_gem_request_wait_begin,
- TP_PROTO(struct drm_i915_gem_request *req),
- TP_ARGS(req),
+ TP_PROTO(struct drm_i915_gem_request *req, bool blocking),
+ TP_ARGS(req, blocking),
TP_STRUCT__entry(
__field(u32, dev)
@@ -601,25 +601,18 @@ TRACE_EVENT(i915_gem_request_wait_begin,
__field(bool, blocking)
),
- /* NB: the blocking information is racy since mutex_is_locked
- * doesn't check that the current thread holds the lock. The only
- * other option would be to pass the boolean information of whether
- * or not the class was blocking down through the stack which is
- * less desirable.
- */
TP_fast_assign(
struct intel_engine_cs *ring =
i915_gem_request_get_ring(req);
__entry->dev = ring->dev->primary->index;
__entry->ring = ring->id;
__entry->seqno = i915_gem_request_get_seqno(req);
- __entry->blocking =
- mutex_is_locked(&ring->dev->struct_mutex);
+ __entry->blocking = blocking;
),
TP_printk("dev=%u, ring=%u, seqno=%u, blocking=%s",
__entry->dev, __entry->ring,
- __entry->seqno, __entry->blocking ? "yes (NB)" : "no")
+ __entry->seqno, __entry->blocking ? "yes" : "no")
);
DEFINE_EVENT(i915_gem_request, i915_gem_request_wait_end,
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2016-01-13 17:29 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-13 17:28 [PATCH 00/20] TDR/watchdog support for gen8 Arun Siluvery
2016-01-13 17:28 ` [PATCH 01/20] drm/i915: Make i915_gem_reset_ring_status() public Arun Siluvery
2016-01-13 17:28 ` [PATCH 02/20] drm/i915: Generalise common GPU engine reset request/unrequest code Arun Siluvery
2016-01-22 11:24 ` Mika Kuoppala
2016-01-13 17:28 ` [PATCH 03/20] drm/i915: TDR / per-engine hang recovery support for gen8 Arun Siluvery
2016-01-13 21:16 ` Chris Wilson
2016-01-13 21:21 ` Chris Wilson
2016-01-29 14:16 ` Mika Kuoppala
2016-01-13 17:28 ` [PATCH 04/20] drm/i915: TDR / per-engine hang detection Arun Siluvery
2016-01-13 20:37 ` Chris Wilson
2016-01-13 17:28 ` [PATCH 05/20] drm/i915: Extending i915_gem_check_wedge to check engine reset in progress Arun Siluvery
2016-01-13 20:49 ` Chris Wilson
2016-01-13 17:28 ` [PATCH 06/20] drm/i915: Reinstate hang recovery work queue Arun Siluvery
2016-01-13 21:01 ` Chris Wilson
2016-01-13 17:28 ` [PATCH 07/20] drm/i915: Watchdog timeout: Hang detection integration into error handler Arun Siluvery
2016-01-13 21:13 ` Chris Wilson
2016-01-13 17:28 ` [PATCH 08/20] drm/i915: Watchdog timeout: IRQ handler for gen8 Arun Siluvery
2016-01-13 17:28 ` [PATCH 09/20] drm/i915: Watchdog timeout: Ringbuffer command emission " Arun Siluvery
2016-01-13 17:28 ` [PATCH 10/20] drm/i915: Watchdog timeout: DRM kernel interface enablement Arun Siluvery
2016-01-13 17:28 ` [PATCH 11/20] drm/i915: Fake lost context event interrupts through forced CSB checking Arun Siluvery
2016-01-13 17:28 ` [PATCH 12/20] drm/i915: Debugfs interface for per-engine hang recovery Arun Siluvery
2016-01-13 17:28 ` [PATCH 13/20] drm/i915: Test infrastructure for context state inconsistency simulation Arun Siluvery
2016-01-13 17:28 ` [PATCH 14/20] drm/i915: TDR/watchdog trace points Arun Siluvery
2016-01-13 17:28 ` [PATCH 15/20] drm/i915: Port of Added scheduler support to __wait_request() calls Arun Siluvery
2016-01-13 17:28 ` Arun Siluvery [this message]
2016-01-13 17:28 ` [PATCH 17/20] drm/i915: Extended error state with TDR count, watchdog count and engine reset count Arun Siluvery
2016-01-13 17:28 ` [PATCH 18/20] drm/i915: TDR / per-engine hang recovery kernel docs Arun Siluvery
2016-01-13 17:28 ` [PATCH 19/20] drm/i915: drm/i915 changes to simulated hangs Arun Siluvery
2016-01-13 17:28 ` [PATCH 20/20] drm/i915: Enable TDR / per-engine hang recovery Arun Siluvery
2016-01-14 8:30 ` ✗ failure: Fi.CI.BAT Patchwork
-- strict thread matches above, loose matches on Subject: below --
2015-10-23 1:32 [PATCH 00/20] TDR/watchdog support for gen8 Tomas Elf
2015-10-23 1:32 ` [PATCH 16/20] drm/i915: Fix __i915_wait_request() behaviour during hang detection Tomas Elf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1452706112-8617-17-git-send-email-arun.siluvery@linux.intel.com \
--to=arun.siluvery@linux.intel.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=tomas.elf@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).