From: Daniel Vetter <daniel@ffwll.ch>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
intel-gfx@lists.freedesktop.org, "Goel,
Akash" <akash.goel@intel.com>
Subject: Re: [PATCH 02/33] drm/i915: Do not overwrite the request with zero on reallocation
Date: Mon, 8 Aug 2016 11:25:56 +0200 [thread overview]
Message-ID: <20160808092556.GY6232@phenom.ffwll.local> (raw)
In-Reply-To: <1470581141-14432-3-git-send-email-chris@chris-wilson.co.uk>
On Sun, Aug 07, 2016 at 03:45:10PM +0100, Chris Wilson wrote:
> When using RCU lookup for the request, commit 0eafec6d3244 ("drm/i915:
> Enable lockless lookup of request tracking via RCU"), we acknowledge that
> we may race with another thread that could have reallocated the request.
> In order for the first thread not to blow up, the second thread must not
> clear the request completed before overwriting it. In the RCU lookup, we
> allow for the engine/seqno to be replaced but we do not allow for it to
> be zeroed.
>
> The choice we make is to either add extra checking to the RCU lookup, or
> embrace the inherent races (as intended). It is more complicated as we
> need to manually clear everything we depend upon being zero initialised,
> but we benefit from not emiting the memset() to clear the entire
> frequently allocated structure (that memset turns up in throughput
> profiles). And at the same time, the lookup remains flexible for future
> adjustments.
>
> v2: Old style LRC requires another variable to be initialize. (The
> danger inherent in not zeroing everything.)
> v3: request->batch also needs to be cleared
>
> Fixes: 0eafec6d3244 ("drm/i915: Enable lockless lookup of request...")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: "Goel, Akash" <akash.goel@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_request.c | 37 ++++++++++++++++++++++++++++++++-
> drivers/gpu/drm/i915/i915_gem_request.h | 11 ++++++++++
> 2 files changed, 47 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index 6a1661643d3d..b7ffde002a62 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -355,7 +355,35 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
> if (req && i915_gem_request_completed(req))
> i915_gem_request_retire(req);
>
> - req = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);
> + /* Beware: Dragons be flying overhead.
> + *
> + * We use RCU to look up requests in flight. The lookups may
> + * race with the request being allocated from the slab freelist.
> + * That is the request we are writing to here, may be in the process
> + * of being read by __i915_gem_active_get_request_rcu(). As such,
> + * we have to be very careful when overwriting the contents. During
> + * the RCU lookup, we change chase the request->engine pointer,
> + * read the request->fence.seqno and increment the reference count.
> + *
> + * The reference count is incremented atomically. If it is zero,
> + * the lookup knows the request is unallocated and complete. Otherwise,
> + * it is either still in use, or has been reallocated and reset
> + * with fence_init(). This increment is safe for release as we check
> + * that the request we have a reference to and matches the active
> + * request.
> + *
> + * Before we increment the refcount, we chase the request->engine
> + * pointer. We must not call kmem_cache_zalloc() or else we set
> + * that pointer to NULL and cause a crash during the lookup. If
> + * we see the request is completed (based on the value of the
> + * old engine and seqno), the lookup is complete and reports NULL.
> + * If we decide the request is not completed (new engine or seqno),
> + * then we grab a reference and double check that it is still the
> + * active request - which it won't be and restart the lookup.
> + *
> + * Do not use kmem_cache_zalloc() here!
> + */
> + req = kmem_cache_alloc(dev_priv->requests, GFP_KERNEL);
> if (!req)
> return ERR_PTR(-ENOMEM);
>
> @@ -375,6 +403,13 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
> req->engine = engine;
> req->ctx = i915_gem_context_get(ctx);
See my earlier review - if we go with this I think we should fully embrace
it and not clear anything where it's not needed. Otherwise we have a funny
mix of defensive clearing to NULL and needing to be careful.
> + /* No zalloc, must clear what we need by hand */
> + req->signaling.wait.tsk = NULL;
This shouldn't be non-NULL once the refcount has dropped to 0. Maybe a
WARN_ON instead?
> + req->previous_context = NULL;
We unconditionally set this in advance_context (together with a bunch of
other ring state tracked in the request). Do we really need to reset this
here?
> + req->file_priv = NULL;
This is already cleared in either request_retire or _release. Again maybe
just a WARN_ON?.
> + req->batch_obj = NULL;
Agreed with this one, we might reuse the request for a non-execbuf
request. But I think we also need to reset ->pid here.
> + req->elsp_submitted = 0;
Needed, but feels misplaced since it's lrc stuff. I think it'd be better
to stuff this into intel_logical_ring_alloc_request_extras.
Aside, while reviewing this I noticed that the /** comments in
i915_gem_request.h aren't really kerneldoc - the metadata is missing. Also
would be great to include all that into a new section in i915.rst.
I didn't spot anything else that could result in harm - but I probably
missed something somewhere ;-)
I'm happy with all the comments&other changes in this patch.
-Daniel
> +
> /*
> * Reserve space in the ring buffer for all the commands required to
> * eventually emit this request. This is to guarantee that the
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> index b2456dede3ad..721eb8cbce9b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.h
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -51,6 +51,13 @@ struct intel_signal_node {
> * emission time to be associated with the request for tracking how far ahead
> * of the GPU the submission is.
> *
> + * When modifying this structure be very aware that we perform a lockless
> + * RCU lookup of it that may race against reallocation of the struct
> + * from the slab freelist. We intentionally do not zero the structure on
> + * allocation so that the lookup can use the dangling pointers (and is
> + * cogniscent that those pointers may be wrong). Instead, everything that
> + * needs to be initialised must be done so explicitly.
> + *
> * The requests are reference counted.
> */
> struct drm_i915_gem_request {
> @@ -465,6 +472,10 @@ __i915_gem_active_get_rcu(const struct i915_gem_active *active)
> * just report the active tracker is idle. If the new request is
> * incomplete, then we acquire a reference on it and check that
> * it remained the active request.
> + *
> + * It is then imperative that we do not zero the request on
> + * reallocation, so that we can chase the dangling pointers!
> + * See i915_gem_request_alloc().
> */
> do {
> struct drm_i915_gem_request *request;
> --
> 2.8.1
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2016-08-08 9:26 UTC|newest]
Thread overview: 125+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-08-07 14:45 First class VMA, take 2 Chris Wilson
2016-08-07 14:45 ` [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance Chris Wilson
2016-08-08 9:12 ` Daniel Vetter
2016-08-08 9:30 ` Chris Wilson
2016-08-08 9:45 ` Chris Wilson
2016-08-09 6:36 ` Joonas Lahtinen
2016-08-09 7:14 ` Chris Wilson
2016-08-09 8:48 ` Joonas Lahtinen
2016-08-09 9:05 ` Chris Wilson
2016-08-10 10:12 ` Daniel Vetter
2016-08-10 10:13 ` Daniel Vetter
2016-08-10 11:00 ` Joonas Lahtinen
2016-08-12 9:50 ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 02/33] drm/i915: Do not overwrite the request with zero on reallocation Chris Wilson
2016-08-08 9:25 ` Daniel Vetter [this message]
2016-08-08 9:56 ` Chris Wilson
2016-08-09 6:32 ` Daniel Vetter
2016-08-07 14:45 ` [PATCH 03/33] drm/i915: Move missed interrupt detection from hangcheck to breadcrumbs Chris Wilson
2016-08-09 14:08 ` [PATCH v2] " Chris Wilson
2016-08-09 14:10 ` [PATCH v3] " Chris Wilson
2016-08-09 15:24 ` Mika Kuoppala
2016-08-07 14:45 ` [PATCH 04/33] drm/i915: Use RCU to annotate and enforce protection for breadcrumb's bh Chris Wilson
2016-08-08 9:33 ` Daniel Vetter
2016-08-12 9:56 ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 05/33] drm/i915: Reduce amount of duplicate buffer information captured on error Chris Wilson
2016-08-10 7:04 ` Joonas Lahtinen
2016-08-10 7:15 ` Chris Wilson
2016-08-10 8:07 ` Joonas Lahtinen
2016-08-10 8:36 ` Chris Wilson
2016-08-10 10:51 ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 06/33] drm/i915: Stop the machine whilst capturing the GPU crash dump Chris Wilson
2016-08-07 14:45 ` [PATCH 07/33] drm/i915: Store the active context object on all engines upon error Chris Wilson
2016-08-09 9:02 ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 08/33] drm/i915: Move setting of request->batch into its single callsite Chris Wilson
2016-08-09 15:53 ` Mika Kuoppala
2016-08-09 16:04 ` Chris Wilson
2016-08-10 7:19 ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 09/33] drm/i915: Mark unmappable GGTT entries as PIN_HIGH Chris Wilson
2016-08-08 9:09 ` Joonas Lahtinen
2016-08-09 11:05 ` Tvrtko Ursulin
2016-08-09 11:13 ` Chris Wilson
2016-08-09 11:20 ` Chris Wilson
2016-08-07 14:45 ` [PATCH 10/33] drm/i915: Remove inactive/active list from debugfs Chris Wilson
2016-08-09 10:29 ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 11/33] drm/i915: Focus debugfs/i915_gem_pinned to show only display pins Chris Wilson
2016-08-09 10:39 ` Joonas Lahtinen
2016-08-09 10:46 ` Chris Wilson
2016-08-09 11:32 ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 12/33] drm/i915: Reduce i915_gem_objects to only show object information Chris Wilson
2016-08-10 7:29 ` Joonas Lahtinen
2016-08-10 7:38 ` Chris Wilson
2016-08-10 8:10 ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 13/33] drm/i915: Remove redundant WARN_ON from __i915_add_request() Chris Wilson
2016-08-08 9:03 ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 14/33] drm/i915: Create a VMA for an object Chris Wilson
2016-08-08 9:01 ` Joonas Lahtinen
2016-08-08 9:09 ` Chris Wilson
2016-08-10 10:58 ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 15/33] drm/i915: Track pinned vma inside guc Chris Wilson
2016-08-11 16:19 ` Dave Gordon
2016-08-11 16:41 ` Chris Wilson
2016-08-07 14:45 ` [PATCH 16/33] drm/i915: Convert fence computations to use vma directly Chris Wilson
2016-08-09 10:27 ` Joonas Lahtinen
2016-08-09 10:33 ` Chris Wilson
2016-08-07 14:45 ` [PATCH 17/33] drm/i915: Use VMA directly for checking tiling parameters Chris Wilson
2016-08-09 6:18 ` Joonas Lahtinen
2016-08-09 8:03 ` Chris Wilson
2016-08-07 14:45 ` [PATCH 18/33] drm/i915: Use VMA as the primary object for context state Chris Wilson
2016-08-10 8:03 ` Joonas Lahtinen
2016-08-10 8:25 ` Chris Wilson
2016-08-10 10:54 ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 19/33] drm/i915: Only clflush the context object when binding Chris Wilson
2016-08-10 8:41 ` Joonas Lahtinen
2016-08-10 9:02 ` Chris Wilson
2016-08-10 10:50 ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 20/33] drm/i915: Use VMA for ringbuffer tracking Chris Wilson
2016-08-11 9:32 ` Joonas Lahtinen
2016-08-11 9:58 ` Chris Wilson
2016-08-07 14:45 ` [PATCH 21/33] drm/i915: Use VMA for scratch page tracking Chris Wilson
2016-08-08 8:00 ` [PATCH 1/3] " Chris Wilson
2016-08-08 8:00 ` [PATCH 2/3] drm/i915: Move common scratch allocation/destroy to intel_engine_cs.c Chris Wilson
2016-08-08 9:24 ` Matthew Auld
2016-08-08 8:00 ` [PATCH 3/3] drm/i915: Move common seqno reset " Chris Wilson
2016-08-08 9:40 ` Matthew Auld
2016-08-08 10:15 ` Chris Wilson
2016-08-08 15:34 ` Matthew Auld
2016-08-11 10:06 ` [PATCH 21/33] drm/i915: Use VMA for scratch page tracking Joonas Lahtinen
2016-08-11 10:22 ` Chris Wilson
2016-08-07 14:45 ` [PATCH 22/33] drm/i915/overlay: Use VMA as the primary tracker for images Chris Wilson
2016-08-11 10:17 ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 23/33] drm/i915: Use VMA as the primary tracker for semaphore page Chris Wilson
2016-08-11 10:42 ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 24/33] drm/i915: Use VMA for render state page tracking Chris Wilson
2016-08-11 10:46 ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 25/33] drm/i915: Use VMA for wa_ctx tracking Chris Wilson
2016-08-11 10:53 ` Joonas Lahtinen
2016-08-11 11:02 ` Chris Wilson
2016-08-11 12:41 ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 26/33] drm/i915: Track pinned VMA Chris Wilson
2016-08-11 12:18 ` Joonas Lahtinen
2016-08-11 12:37 ` Chris Wilson
2016-08-07 14:45 ` [PATCH 27/33] drm/i915: Print the batchbuffer offset next to BBADDR in error state Chris Wilson
2016-08-11 12:24 ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 28/33] drm/i915: Move per-request pid from request to ctx Chris Wilson
2016-08-11 12:32 ` Joonas Lahtinen
2016-08-11 12:41 ` Chris Wilson
2016-08-07 14:45 ` [PATCH 29/33] drm/i915: Only record active and pending requests upon a GPU hang Chris Wilson
2016-08-11 12:36 ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 30/33] drm/i915: Record the RING_MODE register for post-mortem debugging Chris Wilson
2016-08-08 11:35 ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 31/33] drm/i915: Always use the GTT for error capture Chris Wilson
2016-08-07 14:45 ` [PATCH 32/33] drm/i915: Consolidate error object printing Chris Wilson
2016-08-09 11:44 ` Joonas Lahtinen
2016-08-09 11:53 ` Chris Wilson
2016-08-10 10:55 ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 33/33] drm/i915: Compress GPU objects in error state Chris Wilson
2016-08-10 10:32 ` Joonas Lahtinen
2016-08-10 10:52 ` Chris Wilson
2016-08-10 11:26 ` Joonas Lahtinen
2016-08-07 15:16 ` ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance Patchwork
2016-08-08 9:46 ` ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev4) Patchwork
2016-08-08 10:34 ` ✗ Fi.CI.BAT: " Patchwork
2016-08-09 14:10 ` ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev5) Patchwork
2016-08-09 14:20 ` ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev6) Patchwork
2016-08-10 6:43 ` Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160808092556.GY6232@phenom.ffwll.local \
--to=daniel@ffwll.ch \
--cc=akash.goel@intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=daniel.vetter@ffwll.ch \
--cc=intel-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.