From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Subject: [CI 02/20] drm/i915: Delay queuing hangcheck to wait-request
Date: Fri,  1 Jul 2016 17:23:11 +0100	[thread overview]
Message-ID: <1467390209-3576-2-git-send-email-chris@chris-wilson.co.uk> (raw)
In-Reply-To: <1467390209-3576-1-git-send-email-chris@chris-wilson.co.uk>

We can defer queuing the hangcheck from the start of every request until
we actually wait upon a request. This reduces the overhead of every
request, but may increase the latency of detecting a hang. However, if
nothing ever waits upon the GPU, did it ever hang? It also improves the
robustness of wait-request by ensuring that the hangchecker is
indeed running before we sleep indefinitely (and thereby ensuring that
we never actually sleep forever waiting for a dead GPU).

As pointed out by Tvrtko, it is possible for a GPU hang to go unnoticed
for as long as nobody is waiting for the GPU. Though this is rare, during
that time we may be consuming more power than if we had promptly
recovered, and in the most extreme case we may exhaust all memory before
forcing the hangcheck. Something to be wary of in the future.
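
[Not part of the patch: a minimal userspace C sketch of the idea, for
illustration only. struct gpu, queue_hangcheck(), add_request_old() and
wait_request_new() are simplified stand-ins invented here, not the i915
API; the real change is in the diff below.]

#include <stdbool.h>

struct gpu { bool hangcheck_armed; };

static void queue_hangcheck(struct gpu *gpu)
{
	/* In i915 this schedules delayed work; here we just mark it armed. */
	gpu->hangcheck_armed = true;
}

/* Before: every request submission armed the hangcheck. */
static void add_request_old(struct gpu *gpu)
{
	/* ... emit the request ... */
	queue_hangcheck(gpu);		/* per-request overhead */
}

/* After: only arm the hangcheck when somebody actually waits. */
static void wait_request_new(struct gpu *gpu)
{
	/*
	 * Ensure the hangchecker is running before we sleep, so that a
	 * dead GPU cannot leave the waiter sleeping forever.
	 */
	queue_hangcheck(gpu);
	/* ... sleep until the request completes or the hang is handled ... */
}

int main(void)
{
	struct gpu gpu = { .hangcheck_armed = false };

	add_request_old(&gpu);		/* old scheme: armed at submit */
	gpu.hangcheck_armed = false;

	wait_request_new(&gpu);		/* new scheme: armed at wait */
	return gpu.hangcheck_armed ? 0 : 1;
}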

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 15 +++++++++++----
 drivers/gpu/drm/i915/i915_irq.c | 10 ++++------
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1d9878258103..e0b1e286bf87 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1532,6 +1532,15 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
+		/* Ensure that even if the GPU hangs, we get woken up.
+		 *
+		 * However, note that if no one is waiting, we never notice
+		 * a gpu hang. Eventually, we will have to wait for a resource
+		 * held by the GPU and so trigger a hangcheck. In the most
+		 * pathological case, this will be upon memory starvation!
+		 */
+		i915_queue_hangcheck(dev_priv);
+
 		timer.function = NULL;
 		if (timeout || missed_irq(dev_priv, engine)) {
 			unsigned long expire;
@@ -2919,8 +2928,6 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	/* Not allowed to fail! */
 	WARN(ret, "emit|add_request failed: %d!\n", ret);
 
-	i915_queue_hangcheck(engine->i915);
-
 	queue_delayed_work(dev_priv->wq,
 			   &dev_priv->mm.retire_work,
 			   round_jiffies_up_relative(HZ));
@@ -3264,8 +3271,8 @@ i915_gem_retire_requests(struct drm_i915_private *dev_priv)
 
 	if (idle)
 		mod_delayed_work(dev_priv->wq,
-				   &dev_priv->mm.idle_work,
-				   msecs_to_jiffies(100));
+				 &dev_priv->mm.idle_work,
+				 msecs_to_jiffies(100));
 
 	return idle;
 }
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 4378a659d962..5614582ca240 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3135,10 +3135,10 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	intel_uncore_arm_unclaimed_mmio_detection(dev_priv);
 
 	for_each_engine_id(engine, dev_priv, id) {
+		bool busy = waitqueue_active(&engine->irq_queue);
 		u64 acthd;
 		u32 seqno;
 		unsigned user_interrupts;
-		bool busy = true;
 
 		semaphore_clear_deadlocks(dev_priv);
 
@@ -3161,12 +3161,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 		if (engine->hangcheck.seqno == seqno) {
 			if (ring_idle(engine, seqno)) {
 				engine->hangcheck.action = HANGCHECK_IDLE;
-				if (waitqueue_active(&engine->irq_queue)) {
+				if (busy) {
 					/* Safeguard against driver failure */
 					user_interrupts = kick_waiters(engine);
 					engine->hangcheck.score += BUSY;
-				} else
-					busy = false;
+				}
 			} else {
 				/* We always increment the hangcheck score
 				 * if the ring is busy and still processing
@@ -3240,9 +3239,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 		goto out;
 	}
 
+	/* Reset timer in case GPU hangs without another request being added */
 	if (busy_count)
-		/* Reset timer case chip hangs without another request
-		 * being added */
 		i915_queue_hangcheck(dev_priv);
 
 out:
-- 
2.8.1

Thread overview: 21+ messages
2016-07-01 16:23 [CI 01/20] drm/i915/shrinker: Flush active on objects before counting Chris Wilson
2016-07-01 16:23 ` Chris Wilson [this message]
2016-07-01 16:23 ` [CI 03/20] drm/i915: Remove the dedicated hangcheck workqueue Chris Wilson
2016-07-01 16:23 ` [CI 04/20] drm/i915: Make queueing the hangcheck work inline Chris Wilson
2016-07-01 16:23 ` [CI 05/20] drm/i915: Separate GPU hang waitqueue from advance Chris Wilson
2016-07-01 16:23 ` [CI 06/20] drm/i915: Slaughter the thundering i915_wait_request herd Chris Wilson
2016-07-01 16:23 ` [CI 07/20] drm/i915: Spin after waking up for an interrupt Chris Wilson
2016-07-01 16:23 ` [CI 08/20] drm/i915: Use HWS for seqno tracking everywhere Chris Wilson
2016-07-01 16:23 ` [CI 09/20] drm/i915: Stop mapping the scratch page into CPU space Chris Wilson
2016-07-01 16:23 ` [CI 10/20] drm/i915: Allocate scratch page from stolen Chris Wilson
2016-07-01 16:23 ` [CI 11/20] drm/i915: Refactor scratch object allocation for gen2 w/a buffer Chris Wilson
2016-07-01 16:23 ` [CI 12/20] drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk) Chris Wilson
2016-07-01 16:23 ` [CI 13/20] drm/i915: Check the CPU cached value in HWS of seqno after waking the waiter Chris Wilson
2016-07-01 16:23 ` [CI 14/20] drm/i915: Only apply one barrier after a breadcrumb interrupt is posted Chris Wilson
2016-07-01 16:23 ` [CI 15/20] drm/i915: Stop setting wraparound seqno on initialisation Chris Wilson
2016-07-01 16:23 ` [CI 16/20] drm/i915: Convert trace-irq to the breadcrumb waiter Chris Wilson
2016-07-01 16:23 ` [CI 17/20] drm/i915: Embed signaling node into the GEM request Chris Wilson
2016-07-01 16:23 ` [CI 18/20] drm/i915: Move the get/put irq locking into the caller Chris Wilson
2016-07-01 16:23 ` [CI 19/20] drm/i915: Simplify enabling user-interrupts with L3-remapping Chris Wilson
2016-07-01 16:23 ` [CI 20/20] drm/i915: Remove debug noise on detecting fault-injection of missed interrupts Chris Wilson
2016-07-01 16:51 ` ✗ Ro.CI.BAT: failure for series starting with [CI,01/20] drm/i915/shrinker: Flush active on objects before counting Patchwork
