Intel-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: John.C.Harrison@Intel.com
To: Intel-GFX@Lists.FreeDesktop.Org
Cc: DRI-Devel@Lists.FreeDesktop.Org
Subject: [Intel-gfx] [PATCH 2/2] drm/i915/guc: Don't deadlock busyness stats vs reset
Date: Fri, 28 Oct 2022 12:46:49 -0700	[thread overview]
Message-ID: <20221028194649.1130223-3-John.C.Harrison@Intel.com> (raw)
In-Reply-To: <20221028194649.1130223-1-John.C.Harrison@Intel.com>

From: John Harrison <John.C.Harrison@Intel.com>

The engine busyness stats has a worker function to do things like
64bit extend the 32bit hardware counters. The GuC's reset prepare
function flushes out this worker function to ensure no corruption
happens during the reset. Unforunately, the worker function has an
infinite wait for active resets to finish before doing its work. Thus
a deadlock would occur if the worker function had actually started
just as the reset starts.

Update the worker to abort if a reset is in progress rather than
waiting for it to complete. It will still acquire the reset lock in
the case where a reset was not already in progress. So the processing
is still safe from corruption, but the deadlock can no longer occur.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/intel_reset.c             | 15 ++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_reset.h             |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  6 ++++--
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 3159df6cdd492..2f48c6e4420ea 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -1407,7 +1407,7 @@ void intel_gt_handle_error(struct intel_gt *gt,
 	intel_runtime_pm_put(gt->uncore->rpm, wakeref);
 }
 
-int intel_gt_reset_trylock(struct intel_gt *gt, int *srcu)
+static int _intel_gt_reset_trylock(struct intel_gt *gt, int *srcu, bool retry)
 {
 	might_lock(&gt->reset.backoff_srcu);
 	might_sleep();
@@ -1416,6 +1416,9 @@ int intel_gt_reset_trylock(struct intel_gt *gt, int *srcu)
 	while (test_bit(I915_RESET_BACKOFF, &gt->reset.flags)) {
 		rcu_read_unlock();
 
+		if (!retry)
+			return -EBUSY;
+
 		if (wait_event_interruptible(gt->reset.queue,
 					     !test_bit(I915_RESET_BACKOFF,
 						       &gt->reset.flags)))
@@ -1429,6 +1432,16 @@ int intel_gt_reset_trylock(struct intel_gt *gt, int *srcu)
 	return 0;
 }
 
+int intel_gt_reset_trylock_noretry(struct intel_gt *gt, int *srcu)
+{
+	return _intel_gt_reset_trylock(gt, srcu, false);
+}
+
+int intel_gt_reset_trylock(struct intel_gt *gt, int *srcu)
+{
+	return _intel_gt_reset_trylock(gt, srcu, true);
+}
+
 void intel_gt_reset_unlock(struct intel_gt *gt, int tag)
 __releases(&gt->reset.backoff_srcu)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.h b/drivers/gpu/drm/i915/gt/intel_reset.h
index adc734e673870..7f863726eb6a2 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.h
+++ b/drivers/gpu/drm/i915/gt/intel_reset.h
@@ -38,6 +38,7 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine,
 
 void __i915_request_reset(struct i915_request *rq, bool guilty);
 
+int __must_check intel_gt_reset_trylock_noretry(struct intel_gt *gt, int *srcu);
 int __must_check intel_gt_reset_trylock(struct intel_gt *gt, int *srcu);
 void intel_gt_reset_unlock(struct intel_gt *gt, int tag);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 941613be3b9dd..1fa1bc7dde3df 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1401,9 +1401,11 @@ static void guc_timestamp_ping(struct work_struct *wrk)
 
 	/*
 	 * Synchronize with gt reset to make sure the worker does not
-	 * corrupt the engine/guc stats.
+	 * corrupt the engine/guc stats. NB: can't actually block waiting
+	 * for a reset to complete as the reset requires flushing out
+	 * any running worker thread. So waiting would deadlock.
 	 */
-	ret = intel_gt_reset_trylock(gt, &srcu);
+	ret = intel_gt_reset_trylock_noretry(gt, &srcu);
 	if (ret)
 		return;
 
-- 
2.37.3


  parent reply	other threads:[~2022-10-28 19:45 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-28 19:46 [Intel-gfx] [PATCH 0/2] Fix for two GuC issues John.C.Harrison
2022-10-28 19:46 ` [Intel-gfx] [PATCH 1/2] drm/i915/guc: Properly initialise kernel contexts John.C.Harrison
2022-10-28 19:46 ` John.C.Harrison [this message]
2022-10-31 10:09   ` [Intel-gfx] [PATCH 2/2] drm/i915/guc: Don't deadlock busyness stats vs reset Tvrtko Ursulin
2022-10-31 12:51     ` Tvrtko Ursulin
2022-10-31 18:30       ` John Harrison
2022-11-01  9:58         ` Tvrtko Ursulin
2022-11-01 16:56           ` John Harrison
2022-11-02  8:17             ` Tvrtko Ursulin
2022-10-28 20:58 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Fix for two GuC issues Patchwork
2022-10-29  0:07 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221028194649.1130223-3-John.C.Harrison@Intel.com \
    --to=john.c.harrison@intel.com \
    --cc=DRI-Devel@Lists.FreeDesktop.Org \
    --cc=Intel-GFX@Lists.FreeDesktop.Org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox