From: John.C.Harrison@Intel.com
To: Intel-GFX@Lists.FreeDesktop.Org
Cc: DRI-Devel@Lists.FreeDesktop.Org
Subject: [Intel-gfx] [PATCH] drm/i915: Don't wait forever in drop_caches
Date: Wed, 2 Nov 2022 18:35:02 -0700
Message-ID: <20221103013502.2804729-1-John.C.Harrison@Intel.com>
From: John Harrison <John.C.Harrison@Intel.com>
At the end of each test, IGT makes a drop_caches call via debugfs with
special flags set. One of the possible paths waits for idle with an
infinite timeout. That makes it hard to debug a "can't go idle" test
failure caught by CI. Best case, the CI system times out (after 90s),
attempts a bunch of state dump actions and then reboots the system to
recover it. Worst case, the CI system can't do anything at all, times
out (after 1000s) and simply reboots. A serial port log of dmesg is
sometimes available, sometimes not.

So rather than making life hard for ourselves, change the timeout to
10s rather than infinite. Also, trigger the standard
wedge/reset/recover sequence on timeout so that testing can continue
with a working system (if possible).

v2: Rationalise timeout defines (review feedback from Jani & Tvrtko).

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 10 ++++++++--
 drivers/gpu/drm/i915/i915_drv.h     |  2 --
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index ae987e92251dd..a224584ea4eb1 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -621,6 +621,9 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_perf_noa_delay_fops,
 			i915_perf_noa_delay_set,
 			"%llu\n");
 
+#define DROPCACHE_IDLE_ENGINES_TIMEOUT_MS 200
+#define DROPCACHE_IDLE_GT_TIMEOUT (HZ * 10)
+
 #define DROP_UNBOUND	BIT(0)
 #define DROP_BOUND	BIT(1)
 #define DROP_RETIRE	BIT(2)
@@ -641,6 +644,7 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_perf_noa_delay_fops,
 		  DROP_RESET_ACTIVE | \
 		  DROP_RESET_SEQNO | \
 		  DROP_RCU)
+
 static int
 i915_drop_caches_get(void *data, u64 *val)
 {
@@ -654,14 +658,16 @@ gt_drop_caches(struct intel_gt *gt, u64 val)
 	int ret;
 
 	if (val & DROP_RESET_ACTIVE &&
-	    wait_for(intel_engines_are_idle(gt), I915_IDLE_ENGINES_TIMEOUT))
+	    wait_for(intel_engines_are_idle(gt), DROPCACHE_IDLE_ENGINES_TIMEOUT_MS))
 		intel_gt_set_wedged(gt);
 
 	if (val & DROP_RETIRE)
 		intel_gt_retire_requests(gt);
 
 	if (val & (DROP_IDLE | DROP_ACTIVE)) {
-		ret = intel_gt_wait_for_idle(gt, MAX_SCHEDULE_TIMEOUT);
+		ret = intel_gt_wait_for_idle(gt, DROPCACHE_IDLE_GT_TIMEOUT);
+		if (ret == -ETIME)
+			intel_gt_set_wedged(gt);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 05b3300cc4edf..4c2adaad8e9ed 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -162,8 +162,6 @@ struct i915_gem_mm {
 	u32 shrink_count;
 };
 
-#define I915_IDLE_ENGINES_TIMEOUT (200) /* in ms */
-
 unsigned long i915_fence_context_timeout(const struct drm_i915_private *i915,
 					 u64 context);
 
--
2.37.3
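
For readers without the i915 tree to hand, the control flow the patch
introduces (bounded wait, wedge on -ETIME, still return the error to the
caller) can be sketched in user-space C. This is only an illustration
under stated assumptions: struct fake_gt, fake_wait_for_idle(),
fake_drop_caches() and FAKE_IDLE_TIMEOUT_TICKS are hypothetical
stand-ins, not the driver's API, and a simple tick counter replaces
jiffies.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the GT state; not the i915 structure. */
struct fake_gt {
	int busy_ticks;	/* ticks of work left before the GT is idle */
	bool wedged;	/* set once recovery has been requested */
};

/* Plays the role of DROPCACHE_IDLE_GT_TIMEOUT, in made-up ticks. */
#define FAKE_IDLE_TIMEOUT_TICKS 10

/* Poll once per tick; give up with -ETIME when the budget is spent. */
static int fake_wait_for_idle(struct fake_gt *gt, long timeout_ticks)
{
	assert(gt != NULL);

	while (gt->busy_ticks > 0) {
		if (timeout_ticks-- <= 0)
			return -ETIME;
		gt->busy_ticks--;
	}
	return 0;
}

/*
 * Same shape as the patched path: the wait is bounded, a timeout
 * marks the GT as wedged, and the error is still propagated.
 */
static int fake_drop_caches(struct fake_gt *gt)
{
	int ret = fake_wait_for_idle(gt, FAKE_IDLE_TIMEOUT_TICKS);

	if (ret == -ETIME)
		gt->wedged = true;

	return ret;
}
```

Calling fake_drop_caches() on a fake_gt with busy_ticks = 3 returns 0
and leaves wedged clear. With busy_ticks = 1000 it returns -ETIME after
the tick budget runs out and sets wedged, which is the point at which
the real driver would start its wedge/reset/recover sequence.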