From: Daniel Vetter <daniel@ffwll.ch>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
	intel-gfx@lists.freedesktop.org,
	"Goel, Akash" <akash.goel@intel.com>
Subject: Re: [PATCH 02/33] drm/i915: Do not overwrite the request with zero on reallocation
Date: Mon, 8 Aug 2016 11:25:56 +0200
Message-ID: <20160808092556.GY6232@phenom.ffwll.local>
In-Reply-To: <1470581141-14432-3-git-send-email-chris@chris-wilson.co.uk>

On Sun, Aug 07, 2016 at 03:45:10PM +0100, Chris Wilson wrote:
> When using RCU lookup for the request, commit 0eafec6d3244 ("drm/i915:
> Enable lockless lookup of request tracking via RCU"), we acknowledge that
> we may race with another thread that could have reallocated the request.
> In order for the first thread not to blow up, the second thread must not
> clear the request completed before overwriting it. In the RCU lookup, we
> allow for the engine/seqno to be replaced but we do not allow for it to
> be zeroed.
>
> The choice we make is to either add extra checking to the RCU lookup, or
> embrace the inherent races (as intended). It is more complicated as we
> need to manually clear everything we depend upon being zero initialised,
> but we benefit from not emitting the memset() to clear the entire
> frequently allocated structure (that memset turns up in throughput
> profiles). And at the same time, the lookup remains flexible for future
> adjustments.
>
> v2: Old style LRC requires another variable to be initialised. (The
> danger inherent in not zeroing everything.)
> v3: request->batch also needs to be cleared
>
> Fixes: 0eafec6d3244 ("drm/i915: Enable lockless lookup of request...")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: "Goel, Akash" <akash.goel@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_request.c | 37 ++++++++++++++++++++++++++++++++-
> drivers/gpu/drm/i915/i915_gem_request.h | 11 ++++++++++
> 2 files changed, 47 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index 6a1661643d3d..b7ffde002a62 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -355,7 +355,35 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
> if (req && i915_gem_request_completed(req))
> i915_gem_request_retire(req);
>
> - req = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);
> + /* Beware: Dragons be flying overhead.
> + *
> + * We use RCU to look up requests in flight. The lookups may
> + * race with the request being allocated from the slab freelist.
> + * That is the request we are writing to here, may be in the process
> + * of being read by __i915_gem_active_get_request_rcu(). As such,
> + * we have to be very careful when overwriting the contents. During
> + * the RCU lookup, we chase the request->engine pointer,
> + * read the request->fence.seqno and increment the reference count.
> + *
> + * The reference count is incremented atomically. If it is zero,
> + * the lookup knows the request is unallocated and complete. Otherwise,
> + * it is either still in use, or has been reallocated and reset
> + * with fence_init(). This increment is safe for release as we check
> + * that the request we have a reference to matches the active
> + * request.
> + *
> + * Before we increment the refcount, we chase the request->engine
> + * pointer. We must not call kmem_cache_zalloc() or else we set
> + * that pointer to NULL and cause a crash during the lookup. If
> + * we see the request is completed (based on the value of the
> + * old engine and seqno), the lookup is complete and reports NULL.
> + * If we decide the request is not completed (new engine or seqno),
> + * then we grab a reference and double check that it is still the
> + * active request - which it won't be and restart the lookup.
> + *
> + * Do not use kmem_cache_zalloc() here!
> + */
> + req = kmem_cache_alloc(dev_priv->requests, GFP_KERNEL);
> if (!req)
> return ERR_PTR(-ENOMEM);
>
> @@ -375,6 +403,13 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
> req->engine = engine;
> req->ctx = i915_gem_context_get(ctx);

See my earlier review: if we go with this, I think we should fully embrace
it and not clear anything where it's not needed. Otherwise we end up with a
funny mix of defensive clearing to NULL and needing to be careful.

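
To make the trade-off concrete, here is a minimal userspace sketch of allocating without zeroing. Everything in it is a hypothetical stand-in (a one-slot cache instead of the slab, a four-field struct instead of drm_i915_gem_request); it only illustrates that recycled memory keeps its old contents, so the engine pointer moves from one valid value to the next while the fields that must read as zero are reset by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the handful of fields discussed in this thread. */
struct engine {
	int id;
};

struct request {
	struct engine *engine;       /* chased by the lockless lookup, never NULLed */
	unsigned int seqno;
	void *batch_obj;             /* must read as NULL for non-execbuf requests */
	unsigned int elsp_submitted; /* backend state that must start at 0 */
};

/* One-slot "slab": handing back the same memory models freelist reuse. */
static struct request slot;
static int slot_in_use;

/* Like kmem_cache_alloc(): returns recycled memory without clearing it. */
static struct request *request_cache_alloc(void)
{
	if (slot_in_use)
		return NULL;
	slot_in_use = 1;
	return &slot;
}

static void request_cache_free(struct request *rq)
{
	assert(rq == &slot);
	slot_in_use = 0; /* contents are deliberately left intact */
}

static struct request *request_alloc(struct engine *engine, unsigned int seqno)
{
	struct request *rq = request_cache_alloc();

	if (!rq)
		return NULL;

	/* The engine pointer goes straight from old-valid to new-valid. */
	rq->engine = engine;
	rq->seqno = seqno;

	/* No zalloc: clear by hand only what later code relies on being zero. */
	rq->batch_obj = NULL;
	rq->elsp_submitted = 0;

	return rq;
}
```

The cost of this style is visible in request_alloc(): every field that some later path assumes to be zero needs its own line, and forgetting one is silent.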
> + /* No zalloc, must clear what we need by hand */
> + req->signaling.wait.tsk = NULL;

This should already be NULL once the refcount has dropped to 0. Maybe a
WARN_ON instead?

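
A rough userspace sketch of what "WARN_ON instead of clearing" could look like. The WARN_ON macro below is a stand-in written for this example, not the kernel's, and the struct and function names are hypothetical. One assumption worth stating: the check only makes sense for memory that has already been a request once; memory that is fresh from the allocator holds arbitrary bytes, so the real code would need some way (for example a slab constructor) to guarantee a sane first state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

static int warn_count;

/* Userspace stand-in for WARN_ON(): report, count, and return the condition. */
static int warn_on(int cond, const char *expr)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", expr);
	}
	return cond;
}
#define WARN_ON(x) warn_on(!!(x), #x)

/* Hypothetical subset of the request: fields that retire/release should clear. */
struct request {
	void *wait_tsk;
	void *file_priv;
};

/*
 * Instead of unconditionally writing NULL on reallocation, assert the
 * invariant that the previous user left the field clear, and only repair
 * it (loudly) when that invariant has been broken.
 */
static void request_reinit_checked(struct request *rq)
{
	assert(rq != NULL);

	if (WARN_ON(rq->wait_tsk != NULL))
		rq->wait_tsk = NULL;
	if (WARN_ON(rq->file_priv != NULL))
		rq->file_priv = NULL;
}
```

Repairing after the warning is a choice made for this sketch; warning without repairing would be the stricter reading of the suggestion.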
> + req->previous_context = NULL;

We unconditionally set this in advance_context (together with a bunch of
other ring state tracked in the request). Do we really need to reset this
here?

> + req->file_priv = NULL;

This is already cleared in either request_retire or _release. Again, maybe
just a WARN_ON?

> + req->batch_obj = NULL;

Agreed on this one: we might reuse the request for a non-execbuf
request. But I think we also need to reset ->pid here.

> + req->elsp_submitted = 0;

Needed, but feels misplaced since it's lrc stuff. I think it'd be better
to stuff this into intel_logical_ring_alloc_request_extras.

Aside: while reviewing this I noticed that the /** comments in
i915_gem_request.h aren't really kerneldoc, since the metadata is missing.
It would also be great to include all that in a new section in i915.rst.

I didn't spot anything else that could result in harm - but I probably
missed something somewhere ;-)

I'm happy with all the comments and other changes in this patch.

-Daniel
> +
> /*
> * Reserve space in the ring buffer for all the commands required to
> * eventually emit this request. This is to guarantee that the
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> index b2456dede3ad..721eb8cbce9b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.h
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -51,6 +51,13 @@ struct intel_signal_node {
> * emission time to be associated with the request for tracking how far ahead
> * of the GPU the submission is.
> *
> + * When modifying this structure be very aware that we perform a lockless
> + * RCU lookup of it that may race against reallocation of the struct
> + * from the slab freelist. We intentionally do not zero the structure on
> + * allocation so that the lookup can use the dangling pointers (and is
> + * cognisant that those pointers may be wrong). Instead, everything that
> + * needs to be initialised must be done so explicitly.
> + *
> * The requests are reference counted.
> */
> struct drm_i915_gem_request {
> @@ -465,6 +472,10 @@ __i915_gem_active_get_rcu(const struct i915_gem_active *active)
> * just report the active tracker is idle. If the new request is
> * incomplete, then we acquire a reference on it and check that
> * it remained the active request.
> + *
> + * It is then imperative that we do not zero the request on
> + * reallocation, so that we can chase the dangling pointers!
> + * See i915_gem_request_alloc().
> */
> do {
> struct drm_i915_gem_request *request;
> --
> 2.8.1
>
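
For readers following the thread, the lookup described in the quoted comments can be modelled roughly as below. This is a simplified C11 sketch with hypothetical names (active_get, request_get_unless_zero and so on); it uses plain atomics in place of RCU and does not reproduce the memory barriers of the real __i915_gem_active_get_rcu(). Its purpose is to show the order of operations: the engine pointer is dereferenced before any reference is held, which is why reallocation must never leave it NULL.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct engine {
	unsigned int last_completed_seqno;
};

struct request {
	struct engine *engine; /* may be stale after reallocation, never NULL */
	unsigned int seqno;
	atomic_int refcount;   /* 0 means the request is free */
};

/* Wrap-safe "has the engine passed this seqno?" check. */
static int request_completed(const struct request *rq)
{
	/* Dereferences rq->engine without holding a reference. */
	return (int)(rq->engine->last_completed_seqno - rq->seqno) >= 0;
}

/* Take a reference only if the request is still allocated. */
static int request_get_unless_zero(struct request *rq)
{
	int old = atomic_load(&rq->refcount);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&rq->refcount, &old, old + 1))
			return 1;
	}
	return 0;
}

static void request_put(struct request *rq)
{
	int old = atomic_fetch_sub(&rq->refcount, 1);

	assert(old > 0);
	(void)old;
}

/* Returns a referenced request, or NULL if the tracker is idle. */
static struct request *active_get(struct request *_Atomic *active)
{
	for (;;) {
		struct request *rq = atomic_load(active);

		if (!rq)
			return NULL;

		/* Old engine/seqno say "completed": report idle. */
		if (request_completed(rq))
			return NULL;

		/* Refcount already zero: the request is free, report idle. */
		if (!request_get_unless_zero(rq))
			return NULL;

		/* Only trust the reference if rq is still the active request. */
		if (rq == atomic_load(active))
			return rq;

		/* It was reallocated under us: drop the reference and retry. */
		request_put(rq);
	}
}
```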
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch