From: Daniel Vetter <daniel@ffwll.ch>
To: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Intel GFX <intel-gfx@lists.freedesktop.org>,
Ben Widawsky <ben@bwidawsk.net>
Subject: Re: [PATCH 13/16] drm/i915: More correct (slower) ppgtt cleanup
Date: Thu, 17 Jul 2014 11:49:29 +0200 [thread overview]
Message-ID: <20140717094929.GJ15237@phenom.ffwll.local> (raw)
In-Reply-To: <1404238671-18760-14-git-send-email-benjamin.widawsky@intel.com>
On Tue, Jul 01, 2014 at 11:17:48AM -0700, Ben Widawsky wrote:
> If a VM still have objects which are bound (exactly: have a node
> reserved in the drm_mm), and we are in the middle of a reset, we have no
> hope of the standard methods fixing the situation (ring idle won't
> work). We must therefore let the reset handler take it's course, and
> then we can resume tearing down the VM.
>
> This logic very much duplicates^Wresembles the logic in our wait for
> error code. I've decided to leave it as open coded because I expect this
> bit of code to require tweaks and changes over time.
>
> Interruptions via signal causes a really similar problem.
>
> This should deprecate the need for the yet unmerged patch from Chris
> (and an identical patch from me, which was first!!):
> drm/i915: Prevent signals from interrupting close()
>
> I have a followup patch to implement deferred free, before you complain.
>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Imo this goes in the wrong direction. ppgtt_cleanup really shouldn't ever
have a need to wait for the gpu. We need to rework the lifetimes such that
we keep the ppgtt alive until the gpu is done with it. Similarly to how we
keep the objects themselves around when the gpu is still using them. Even
when userspace has already dropped the last reference.
Having such a stark behaviour difference between ppgtt lifetimes and
object lifetimes only leads to unecessary complexity and fragility in the
code. And this patch here is a good example for this.
-Daniel
> ---
> drivers/gpu/drm/i915/i915_gem_context.c | 51 +++++++++++++++++++++++++++++++--
> 1 file changed, 48 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 8d106d9..e1b5613 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -101,6 +101,32 @@ static void do_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt)
> struct drm_device *dev = ppgtt->base.dev;
> struct drm_i915_private *dev_priv = dev->dev_private;
> struct i915_address_space *vm = &ppgtt->base;
> + bool do_idle = false;
> + int ret;
> +
> + /* If we get here while in reset, we need to let the reset handler run
> + * first, or else our VM teardown isn't going to go smoothly. There are
> + * a could of options at this point, but letting the reset handler do
> + * it's thing is the most desirable. The reset handler will take care of
> + * retiring the stuck requests.
> + */
> + if (i915_reset_in_progress(&dev_priv->gpu_error)) {
> + mutex_unlock(&dev->struct_mutex);
> +#define EXIT_COND (!i915_reset_in_progress(&dev_priv->gpu_error) || \
> + i915_terminally_wedged(&dev_priv->gpu_error))
> + ret = wait_event_timeout(dev_priv->gpu_error.reset_queue,
> + EXIT_COND,
> + 10 * HZ);
> + if (!ret) {
> + /* it's unlikely idling will solve anything, but it
> + * shouldn't hurt to try. */
> + do_idle = true;
> + /* TODO: go down kicking and screaming harder */
> + }
> +#undef EXIT_COND
> +
> + mutex_lock(&dev->struct_mutex);
> + }
>
> if (ppgtt == dev_priv->mm.aliasing_ppgtt ||
> (list_empty(&vm->active_list) && list_empty(&vm->inactive_list))) {
> @@ -117,14 +143,33 @@ static void do_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt)
> if (!list_empty(&vm->active_list)) {
> struct i915_vma *vma;
>
> + do_idle = true;
> list_for_each_entry(vma, &vm->active_list, mm_list)
> if (WARN_ON(list_empty(&vma->vma_link) ||
> list_is_singular(&vma->vma_link)))
> break;
> - i915_gem_evict_vm(&ppgtt->base, true, true);
> - } else {
> + } else
> i915_gem_retire_requests(dev);
> - i915_gem_evict_vm(&ppgtt->base, false, true);
> +
> + /* We have a problem here where VM teardown cannot be interrupted, or
> + * else the ppgtt cleanup will fail. As an example, a precisely timed
> + * SIGKILL could leads to an OOPS, or worse. There are two options:
> + * 1. Make the eviction uninterruptible
> + * 2. Defer the eviction if it was interrupted.
> + *
> + * Option #1 is not the friendliest, but it's the easiest to implement,
> + * and least error prone.
> + * TODO: Implement option 2
> + */
> + ret = i915_gem_evict_vm(&ppgtt->base, do_idle, !do_idle);
> + if (ret == -ERESTARTSYS)
> + ret = i915_gem_evict_vm(&ppgtt->base, do_idle, false);
> + WARN_ON(ret);
> + WARN_ON(!list_empty(&vm->active_list));
> +
> + /* This is going to blow up badly if the mm is unclean */
> + if (WARN_ON(!list_empty(&ppgtt->base.mm.head_node.node_list))) {
> + /* TODO: go down kicking and screaming harder++ */
> }
>
> ppgtt->base.cleanup(&ppgtt->base);
> --
> 2.0.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
next prev parent reply other threads:[~2014-07-17 9:49 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-07-01 18:17 [PATCH 00/16] Enabling GEN8 full PPGTT + fixes Ben Widawsky
2014-07-01 18:17 ` [PATCH 01/16] drm/i915: Split up do_switch Ben Widawsky
2014-07-01 18:17 ` [PATCH 02/16] drm/i915: Extract l3 remapping out of ctx switch Ben Widawsky
2014-07-01 18:17 ` [PATCH 03/16] drm/i915/ppgtt: Load address space after mi_set_context Ben Widawsky
2014-07-01 18:17 ` [PATCH 04/16] drm/i915: Fix another another use-after-free in do_switch Ben Widawsky
2014-08-09 20:15 ` [PATCH] [v2] " Ben Widawsky
2014-08-10 8:04 ` Chris Wilson
2014-08-11 9:26 ` Daniel Vetter
2014-07-01 18:17 ` [PATCH 05/16] drm/i915/ctx: Return earlier on failure Ben Widawsky
2014-07-04 8:14 ` Chris Wilson
2014-07-01 18:17 ` [PATCH 06/16] drm/i915/error: Check the potential ctx obj's vm Ben Widawsky
2014-07-17 8:47 ` Daniel Vetter
2014-07-01 18:17 ` [PATCH 07/16] drm/i915/error: vma error capture prettyify Ben Widawsky
2014-07-01 18:17 ` [PATCH 08/16] drm/i915/error: Do a better job of disambiguating VMAs Ben Widawsky
2014-07-04 7:57 ` Chris Wilson
2014-07-04 16:56 ` Ben Widawsky
2014-07-17 8:51 ` Daniel Vetter
2014-07-20 23:49 ` Ben Widawsky
2014-07-01 18:17 ` [PATCH 09/16] drm/i915/error: Capture vmas instead of BOs Ben Widawsky
2014-07-01 18:17 ` [PATCH 10/16] drm/i915: Add some extra guards in evict_vm Ben Widawsky
2014-07-01 18:17 ` [PATCH 11/16] drm/i915: Make an uninterruptible evict Ben Widawsky
2014-07-01 18:17 ` [PATCH 12/16] drm/i915: Reorder ctx unref on ppgtt cleanup Ben Widawsky
2014-07-17 9:56 ` Daniel Vetter
2014-07-01 18:17 ` [PATCH 13/16] drm/i915: More correct (slower) " Ben Widawsky
2014-07-17 9:49 ` Daniel Vetter [this message]
2014-07-01 18:17 ` [PATCH 14/16] drm/i915: Defer PPGTT cleanup Ben Widawsky
2014-07-01 18:17 ` [PATCH 15/16] drm/i915/bdw: Enable full PPGTT Ben Widawsky
2014-07-01 18:17 ` [PATCH 16/16] drm/i915: Get the error state over the wire (HACKish) Ben Widawsky
2014-07-04 8:02 ` Chris Wilson
2014-07-03 22:01 ` [PATCH 1/2] drm/i915/gen8: Invalidate TLBs before PDP reload Ben Widawsky
2014-07-03 22:01 ` [PATCH 2/2] drm/i915: Remove false assertion in ppgtt_release Ben Widawsky
2014-07-04 7:51 ` [PATCH 1/2] drm/i915/gen8: Invalidate TLBs before PDP reload Chris Wilson
2014-07-04 16:55 ` Ben Widawsky
2014-07-17 12:04 ` [PATCH 00/16] Enabling GEN8 full PPGTT + fixes Daniel Vetter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140717094929.GJ15237@phenom.ffwll.local \
--to=daniel@ffwll.ch \
--cc=ben@bwidawsk.net \
--cc=benjamin.widawsky@intel.com \
--cc=intel-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox