From: Daniel Vetter <daniel.vetter@ffwll.ch>
To: Intel Graphics Development <intel-gfx@lists.freedesktop.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
Mika Kuoppala <mika.kuoppala@intel.com>,
Daniel Vetter <daniel.vetter@intel.com>
Subject: [PATCH 3/7] drm/i915: More surgically unbreak the modeset vs reset deadlock
Date: Thu, 20 Jul 2017 19:57:50 +0200 [thread overview]
Message-ID: <20170720175754.30751-4-daniel.vetter@ffwll.ch> (raw)
In-Reply-To: <20170720175754.30751-1-daniel.vetter@ffwll.ch>
There's no reason to entirely wedge the gpu, for the minimal deadlock
bugfix we only need to unbreak/decouple the atomic commit from the gpu
reset. The simplest way to fix that is by replacing the
unconditional fence wait a the top of commit_tail by a wait which
completes either when the fences are done (normal case, or when a
reset doesn't need to touch the display state). Or when the gpu reset
needs to force-unblock all pending modeset states.
The lesser source of deadlocks is when we try to pin a new framebuffer
and run into a stall. There's a bunch of places this can happen, like
eviction, changing the caching mode, acquiring a fence on older
platforms. And we can't just break the depency loop and keep going,
the only way would be to break out and restart. But the problem with
that approach is that we must stall for the reset to complete before
we grab any locks, and with the atomic infrastructure that's a bit
tricky. The only place is the ioctl code, and we don't want to insert
code into e.g. the BUSY ioctl. Hence for that problem just create a
critical section, and if any code is in there, wedge the GPU. For the
steady-state this should never be a problem.
Note that in both cases TDR itself keeps working, so from a userspace
pov this trickery isn't observable. Users themselvs might spot a short
glitch while the rendering is catching up again, but that's still
better than pre-TDR where we've thrown away all the rendering,
including innocent batches. Also, this fixes the regression TDR
introduced of making gpu resets deadlock-prone when we do need to
touch the display.
One thing I noticed is that gpu_error.flags seems to use both our own
wait-queue in gpu_error.wait_queue, and the generic wait_on_bit
facilities. Not entirely sure why this inconsistency exists, I just
picked one style.
A possible future avenue could be to insert the gpu reset in-between
ongoing modeset changes, which would avoid the momentary glitch. But
that's a lot more work to implement in the atomic commit machinery,
and given that we only need this for pre-g4x hw, of questionable
utility just for the sake of polishing gpu reset even more on those
old boxes. It might be useful for other features though.
v2: Rebase onto 4.13 with a s/wait_queue_t/struct wait_queue_entry/.
v3: Really emabarrassing fixup, I checked the wrong bit and broke the
unbreak/wakeup logic.
v4: Also handle deadlocks in pin_to_display.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 3 +++
drivers/gpu/drm/i915/intel_display.c | 45 +++++++++++++++++++++++++++++++-----
2 files changed, 42 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 07e98b07c5bc..9b055deca36d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1505,6 +1505,8 @@ struct i915_gpu_error {
/* Protected by the above dev->gpu_error.lock. */
struct i915_gpu_state *first_error;
+ atomic_t pending_fb_pin;
+
unsigned long missed_irq_rings;
/**
@@ -1564,6 +1566,7 @@ struct i915_gpu_error {
unsigned long flags;
#define I915_RESET_BACKOFF 0
#define I915_RESET_HANDOFF 1
+#define I915_RESET_MODESET 2
#define I915_WEDGED (BITS_PER_LONG - 1)
#define I915_RESET_ENGINE (I915_WEDGED - I915_NUM_ENGINES)
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index f6bd6282d7f7..63ea8d6b2ebd 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2162,6 +2162,8 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, unsigned int rotation)
*/
intel_runtime_pm_get(dev_priv);
+ atomic_inc(&dev_priv->gpu_error.pending_fb_pin);
+
vma = i915_gem_object_pin_to_display_plane(obj, alignment, &view);
if (IS_ERR(vma))
goto err;
@@ -2189,6 +2191,8 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, unsigned int rotation)
i915_vma_get(vma);
err:
+ atomic_dec(&dev_priv->gpu_error.pending_fb_pin);
+
intel_runtime_pm_put(dev_priv);
return vma;
}
@@ -3471,11 +3475,14 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
!gpu_reset_clobbers_display(dev_priv))
return;
- /* We have a modeset vs reset deadlock, defensively unbreak it.
- *
- * FIXME: We can do a _lot_ better, this is just a first iteration.*/
- i915_gem_set_wedged(dev_priv);
- DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending modeset updates\n");
+ /* We have a modeset vs reset deadlock, defensively unbreak it. */
+ set_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags);
+ wake_up_all(&dev_priv->gpu_error.wait_queue);
+
+ if (atomic_read(&dev_priv->gpu_error.pending_fb_pin)) {
+ DRM_DEBUG_KMS("Modeset potentially stuck, unbreaking through wedging\n");
+ i915_gem_set_wedged(dev_priv);
+ }
/*
* Need mode_config.mutex so that we don't
@@ -3570,6 +3577,8 @@ void intel_finish_reset(struct drm_i915_private *dev_priv)
drm_modeset_drop_locks(ctx);
drm_modeset_acquire_fini(ctx);
mutex_unlock(&dev->mode_config.mutex);
+
+ clear_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags);
}
static bool abort_flip_on_reset(struct intel_crtc *crtc)
@@ -12381,6 +12390,30 @@ static void intel_atomic_helper_free_state_worker(struct work_struct *work)
intel_atomic_helper_free_state(dev_priv);
}
+static void intel_atomic_commit_fence_wait(struct intel_atomic_state *intel_state)
+{
+ struct wait_queue_entry wait_fence, wait_reset;
+ struct drm_i915_private *dev_priv = to_i915(intel_state->base.dev);
+
+ init_wait_entry(&wait_fence, 0);
+ init_wait_entry(&wait_reset, 0);
+ for (;;) {
+ prepare_to_wait(&intel_state->commit_ready.wait,
+ &wait_fence, TASK_UNINTERRUPTIBLE);
+ prepare_to_wait(&dev_priv->gpu_error.wait_queue,
+ &wait_reset, TASK_UNINTERRUPTIBLE);
+
+
+ if (i915_sw_fence_done(&intel_state->commit_ready)
+ || test_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags))
+ break;
+
+ schedule();
+ }
+ finish_wait(&intel_state->commit_ready.wait, &wait_fence);
+ finish_wait(&dev_priv->gpu_error.wait_queue, &wait_reset);
+}
+
static void intel_atomic_commit_tail(struct drm_atomic_state *state)
{
struct drm_device *dev = state->dev;
@@ -12394,7 +12427,7 @@ static void intel_atomic_commit_tail(struct drm_atomic_state *state)
unsigned crtc_vblank_mask = 0;
int i;
- i915_sw_fence_wait(&intel_state->commit_ready);
+ intel_atomic_commit_fence_wait(intel_state);
drm_atomic_helper_wait_for_dependencies(state);
--
2.13.2
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2017-07-20 17:58 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-20 17:57 [PATCH 0/7] gpu reset and page_flip removal, take 2 Daniel Vetter
2017-07-20 17:57 ` [PATCH 1/7] drm/i915: Avoid the gpu reset vs. modeset deadlock Daniel Vetter
2017-07-20 19:47 ` Chris Wilson
2017-07-20 20:04 ` Daniel Vetter
2017-07-20 20:16 ` Chris Wilson
2017-07-20 20:18 ` Daniel Vetter
2017-07-20 17:57 ` [PATCH 2/7] drm/i915: Push i915_sw_fence_wait into the nonblocking atomic commit Daniel Vetter
2017-08-03 19:44 ` Michel Thierry
2017-08-07 15:33 ` Daniel Vetter
2017-08-07 17:06 ` Michel Thierry
2017-07-20 17:57 ` Daniel Vetter [this message]
2017-08-03 19:35 ` [PATCH 3/7] drm/i915: More surgically unbreak the modeset vs reset deadlock Michel Thierry
2017-08-07 15:39 ` Daniel Vetter
2017-07-20 17:57 ` [PATCH 4/7] drm/i915: Rip out legacy page_flip completion/irq handling Daniel Vetter
2017-07-20 17:57 ` [PATCH 5/7] drm/i915: adjust has_pending_fb_unpin to atomic Daniel Vetter
2017-07-20 17:57 ` [PATCH 6/7] drm/i915: Remove intel_flip_work infrastructure Daniel Vetter
2017-07-20 17:57 ` [PATCH 7/7] drm/i915: Drop unpin stall in atomic_prepare_commit Daniel Vetter
2017-07-20 20:47 ` Daniel Vetter
2017-07-20 18:14 ` ✓ Fi.CI.BAT: success for gpu reset and page_flip removal, take 2 Patchwork
2017-07-20 19:45 ` Chris Wilson
-- strict thread matches above, loose matches on Subject: below --
2017-07-20 11:43 [PATCH 1/7] drm/i915: Avoid the gpu reset vs. modeset deadlock Daniel Vetter
2017-07-20 11:43 ` [PATCH 3/7] drm/i915: More surgically unbreak the modeset vs reset deadlock Daniel Vetter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170720175754.30751-4-daniel.vetter@ffwll.ch \
--to=daniel.vetter@ffwll.ch \
--cc=daniel.vetter@intel.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=mika.kuoppala@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox