From: Arun Siluvery <arun.siluvery@linux.intel.com>
To: intel-gfx@lists.freedesktop.org
Cc: Tomas Elf <tomas.elf@intel.com>
Subject: [PATCH 19/20] drm/i915: drm/i915 changes to simulated hangs
Date: Wed, 13 Jan 2016 17:28:31 +0000
Message-ID: <1452706112-8617-20-git-send-email-arun.siluvery@linux.intel.com>
In-Reply-To: <1452706112-8617-1-git-send-email-arun.siluvery@linux.intel.com>
From: Tim Gore <tim.gore@intel.com>
Simulated hangs, as used by drv_hangman and some other IGT tests, are not
handled correctly by the new per-engine hang recovery mode. This patch fixes
the issues that prevented them from working in the execlist case.
1) A "simulated" hang is produced by not submitting a particular batch buffer
to the hardware. The batch is therefore never processed by the hardware and
remains in the software queue, which should lead the TDR mechanism to declare
a hang.
Until now, submission of the batch was blocked in
intel_logical_ring_advance_and_submit. Because the request then never enters
the execlist_queue, the TDR mechanism does not detect a hang and the situation
is never cleared. In addition, blocking the batch buffer at that point happens
before the intel_ctx_submit_request object is allocated. During TDR we need to
actually complete the submission process to unhang the ring, but we are not in
user context and so cannot allocate the request object.
To overcome both issues I moved the point where submission is blocked to
execlists_context_unqueue. The request now enters ring->execlist_queue, so the
TDR mechanism detects the hang and can resubmit the request once the
stop_rings bit has been cleared.
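A minimal userspace sketch of the new gating point, assuming heavily
simplified stand-in types. The names sim_request, sim_engine,
sim_ring_stopped and sim_context_unqueue are hypothetical and are not driver
API. It shows that a request on a stopped engine stays at the head of the
software queue with elsp_submitted bumped but never reaches the (modelled)
hardware, and that clearing the engine's stop_rings bit lets the same request
go out on the next unqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the driver structures. */
struct sim_request {
	int elsp_submitted;	/* times this request was counted as submitted */
	bool on_hw;		/* true once actually handed to the modelled hw */
};

struct sim_engine {
	unsigned int id;		/* engine index, selects a stop_rings bit */
	uint32_t *stop_rings;		/* shared mask, one bit per engine */
	struct sim_request *queue_head;	/* head of the software execlist queue */
};

static bool sim_ring_stopped(const struct sim_engine *e)
{
	return (*e->stop_rings & (1u << e->id)) != 0;
}

/* Loosely models the simulated-hang check added to execlists_context_unqueue(). */
static void sim_context_unqueue(struct sim_engine *e)
{
	struct sim_request *req0 = e->queue_head;

	if (!req0)
		return;

	if (sim_ring_stopped(e)) {
		/* Simulated hang: count it as submitted, but do not touch the hw.
		 * The request stays queued, so hang detection can see it. */
		req0->elsp_submitted++;
		return;
	}

	/* Normal path: the request is actually submitted. */
	req0->elsp_submitted++;
	req0->on_hw = true;
}
```

Because the request is still queued when the hang is detected, recovery can
simply clear the stop bit and run the unqueue path again, with no new
allocation needed outside user context.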
2) A further problem arises from a workaround in i915_hangcheck_sample that
deals with intel_execlists_TDR_get_current_request() reporting a context
submission status of "...INCONSISTENT" while the hardware is idle. Because a
simulated hang leaves the sw and hw context IDs out of sync, it also produces
an "...INCONSISTENT" status. This triggers the workaround, which resubmits the
batch and clears the hang without a ring reset. We want the ring reset to
happen, since it is part of what is being tested. So I have made
intel_execlists_TDR_get_current_request() aware of simulated hangs: it returns
a status of OK in that case. The workaround is therefore not triggered, and
the TDR mechanism declares a ring hang and performs a ring reset.
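The status decision described in 2) can be sketched as a pure function. This
is a minimal illustration under assumed names: sim_get_submission_status and
the SIM_STATUS_* values are hypothetical stand-ins for
intel_execlists_TDR_get_current_request() and the
CONTEXT_SUBMISSION_STATUS_* values, and the stopped-ring flag is passed in
rather than read from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for CONTEXT_SUBMISSION_STATUS_OK / _INCONSISTENT. */
enum sim_submission_status {
	SIM_STATUS_OK,
	SIM_STATUS_INCONSISTENT,
};

static enum sim_submission_status
sim_get_submission_status(bool ring_stopped, uint32_t hw_context,
			  uint32_t sw_context, bool hw_active)
{
	/* Simulated hang: the sw/hw mismatch is expected, so report OK and
	 * let hang recovery go ahead with a ring reset. */
	if (ring_stopped)
		return SIM_STATUS_OK;

	/* Otherwise the contexts must match and the hw must be active. */
	return (hw_context == sw_context && hw_active) ?
		SIM_STATUS_OK : SIM_STATUS_INCONSISTENT;
}
```

Only the stopped-ring case changes; an unexpected mismatch on a running
engine is still reported as inconsistent.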
Issue: VIZ-5488
Signed-off-by: Tim Gore <tim.gore@intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
---
drivers/gpu/drm/i915/i915_drv.c | 10 ++++++++++
drivers/gpu/drm/i915/intel_lrc.c | 26 +++++++++++++++++++++-----
2 files changed, 31 insertions(+), 5 deletions(-)
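The i915_drv.c hunk clears the reset engine's bit in gpu_error.stop_rings and,
once no per-ring bits remain, zeroes the whole word so that the ALLOW_BAN/ERROR
control bits are dropped as well. A minimal sketch of that mask arithmetic,
assuming SIM_NUM_RINGS = 5 as a stand-in for I915_NUM_RINGS; the function
name is hypothetical, and the use of bit 31 as a control bit in the example
values is illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stands in for I915_NUM_RINGS (one stop bit per ring). */
#define SIM_NUM_RINGS 5

/* Hypothetical helper mirroring the stop_rings clearing logic. */
static uint32_t sim_clear_stop_ring(uint32_t stop_rings, unsigned int engine_id)
{
	if (!stop_rings)
		return 0;

	/* Clear this engine's simulated-hang bit. */
	stop_rings &= ~(1u << engine_id);

	/* No per-ring bits left: drop the control bits above them too. */
	if ((stop_rings & ((1u << SIM_NUM_RINGS) - 1)) == 0)
		stop_rings = 0;

	return stop_rings;
}
```

Control bits survive while any other engine still has a simulated hang
pending, and are only cleared when the last one is reset.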
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 73976f9..fd51c26 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1167,6 +1167,16 @@ int i915_reset_engine(struct intel_engine_cs *engine)
}
}
+ /* Clear any simulated hang flags */
+ if (dev_priv->gpu_error.stop_rings) {
+ DRM_INFO("Simulated gpu hang, reset stop_rings bits %08x\n",
+ (0x1 << engine->id));
+ dev_priv->gpu_error.stop_rings &= ~(0x1 << engine->id);
+ /* if all hangs are cleared, then clear the ALLOW_BAN/ERROR bits */
+ if ((dev_priv->gpu_error.stop_rings & ((1 << I915_NUM_RINGS) - 1)) == 0)
+ dev_priv->gpu_error.stop_rings = 0;
+ }
+
/* Sample the current ring head position */
head = I915_READ_HEAD(engine) & HEAD_ADDR;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 85107a1..b565d78 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -653,6 +653,16 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring, bool tdr_res
}
}
+ /* Check for a simulated hang request */
+ if (intel_ring_stopped(ring)) {
+ /*
+ * Mark the request at the head of the queue as submitted but
+ * don't actually submit it.
+ */
+ req0->elsp_submitted++;
+ return;
+ }
+
WARN_ON(req1 && req1->elsp_submitted && !tdr_resubmission);
execlists_submit_requests(req0, req1, tdr_resubmission);
@@ -1045,16 +1055,12 @@ static int logical_ring_wait_for_space(struct drm_i915_gem_request *req,
static void
intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request)
{
- struct intel_engine_cs *ring = request->ring;
struct drm_i915_private *dev_priv = request->i915;
intel_logical_ring_advance(request->ringbuf);
request->tail = request->ringbuf->tail;
- if (intel_ring_stopped(ring))
- return;
-
if (dev_priv->guc.execbuf_client)
i915_guc_submit(dev_priv->guc.execbuf_client, request);
else
@@ -3290,7 +3296,17 @@ intel_execlists_TDR_get_current_request(struct intel_engine_cs *ring,
}
if (tmpctx) {
- status = ((hw_context == sw_context) && hw_active) ?
+ /*
+ * Check for simulated hang. In this case the head entry in the
+ * sw execlist queue will not have been submitted to the ELSP, so
+ * the hw and sw context IDs may well disagree, but we still want
+ * to proceed with hang recovery. So we return OK which allows
+ * the TDR recovery mechanism to proceed with a ring reset.
+ */
+ if (intel_ring_stopped(ring))
+ status = CONTEXT_SUBMISSION_STATUS_OK;
+ else
+ status = ((hw_context == sw_context) && hw_active) ?
CONTEXT_SUBMISSION_STATUS_OK :
CONTEXT_SUBMISSION_STATUS_INCONSISTENT;
} else {
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Thread overview: 33+ messages
2016-01-13 17:28 [PATCH 00/20] TDR/watchdog support for gen8 Arun Siluvery
2016-01-13 17:28 ` [PATCH 01/20] drm/i915: Make i915_gem_reset_ring_status() public Arun Siluvery
2016-01-13 17:28 ` [PATCH 02/20] drm/i915: Generalise common GPU engine reset request/unrequest code Arun Siluvery
2016-01-22 11:24 ` Mika Kuoppala
2016-01-13 17:28 ` [PATCH 03/20] drm/i915: TDR / per-engine hang recovery support for gen8 Arun Siluvery
2016-01-13 21:16 ` Chris Wilson
2016-01-13 21:21 ` Chris Wilson
2016-01-29 14:16 ` Mika Kuoppala
2016-01-13 17:28 ` [PATCH 04/20] drm/i915: TDR / per-engine hang detection Arun Siluvery
2016-01-13 20:37 ` Chris Wilson
2016-01-13 17:28 ` [PATCH 05/20] drm/i915: Extending i915_gem_check_wedge to check engine reset in progress Arun Siluvery
2016-01-13 20:49 ` Chris Wilson
2016-01-13 17:28 ` [PATCH 06/20] drm/i915: Reinstate hang recovery work queue Arun Siluvery
2016-01-13 21:01 ` Chris Wilson
2016-01-13 17:28 ` [PATCH 07/20] drm/i915: Watchdog timeout: Hang detection integration into error handler Arun Siluvery
2016-01-13 21:13 ` Chris Wilson
2016-01-13 17:28 ` [PATCH 08/20] drm/i915: Watchdog timeout: IRQ handler for gen8 Arun Siluvery
2016-01-13 17:28 ` [PATCH 09/20] drm/i915: Watchdog timeout: Ringbuffer command emission " Arun Siluvery
2016-01-13 17:28 ` [PATCH 10/20] drm/i915: Watchdog timeout: DRM kernel interface enablement Arun Siluvery
2016-01-13 17:28 ` [PATCH 11/20] drm/i915: Fake lost context event interrupts through forced CSB checking Arun Siluvery
2016-01-13 17:28 ` [PATCH 12/20] drm/i915: Debugfs interface for per-engine hang recovery Arun Siluvery
2016-01-13 17:28 ` [PATCH 13/20] drm/i915: Test infrastructure for context state inconsistency simulation Arun Siluvery
2016-01-13 17:28 ` [PATCH 14/20] drm/i915: TDR/watchdog trace points Arun Siluvery
2016-01-13 17:28 ` [PATCH 15/20] drm/i915: Port of Added scheduler support to __wait_request() calls Arun Siluvery
2016-01-13 17:28 ` [PATCH 16/20] drm/i915: Fix __i915_wait_request() behaviour during hang detection Arun Siluvery
2016-01-13 17:28 ` [PATCH 17/20] drm/i915: Extended error state with TDR count, watchdog count and engine reset count Arun Siluvery
2016-01-13 17:28 ` [PATCH 18/20] drm/i915: TDR / per-engine hang recovery kernel docs Arun Siluvery
2016-01-13 17:28 ` Arun Siluvery [this message]
2016-01-13 17:28 ` [PATCH 20/20] drm/i915: Enable TDR / per-engine hang recovery Arun Siluvery
2016-01-14 8:30 ` ✗ failure: Fi.CI.BAT Patchwork
Earlier posting of this series (matched by Subject):
2015-10-23 1:32 [PATCH 00/20] TDR/watchdog support for gen8 Tomas Elf
2015-10-23 1:32 ` [PATCH 19/20] drm/i915: drm/i915 changes to simulated hangs Tomas Elf
2016-01-22 13:43 ` Chris Wilson
2016-01-22 13:52 ` Arun Siluvery