From: Daniel Vetter <daniel@ffwll.ch>
To: Chris Wilson <chris@chris-wilson.co.uk>,
Daniel Vetter <daniel@ffwll.ch>,
intel-gfx@lists.freedesktop.org
Subject: Re: [CI 13/19] drm/i915: Remove (struct_mutex) locking for busy-ioctl
Date: Fri, 5 Aug 2016 22:12:44 +0200
Message-ID: <20160805201244.GN6232@phenom.ffwll.local>
In-Reply-To: <20160805193042.GN24508@nuc-i3427.alporthouse.com>
On Fri, Aug 05, 2016 at 08:30:42PM +0100, Chris Wilson wrote:
> On Fri, Aug 05, 2016 at 09:08:34PM +0200, Daniel Vetter wrote:
> > On Fri, Aug 05, 2016 at 10:14:18AM +0100, Chris Wilson wrote:
> > > By applying the same logic as for wait-ioctl, we can query whether a
> > > request has completed without holding struct_mutex. The biggest impact
> > > system-wide is removing the flush_active and the contention that causes.
> > >
> > > Testcase: igt/gem_busy
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Akash Goel <akash.goel@intel.com>
> > > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > ---
> > > drivers/gpu/drm/i915/i915_gem.c | 131 +++++++++++++++++++++++++++++++---------
> > > 1 file changed, 101 insertions(+), 30 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index ceb00970b2da..b99d64bfb7eb 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -3736,49 +3736,120 @@ i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
> > > i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
> > > }
> > >
> > > +static __always_inline unsigned __busy_read_flag(unsigned int id)
> > > +{
> > > +	/* Note that we could alias engines in the execbuf API, but
> > > +	 * that would be very unwise as it prevents userspace from
> > > +	 * fine control over engine selection. Ahem.
> > > +	 *
> > > +	 * This should be something like EXEC_MAX_ENGINE instead of
> > > +	 * I915_NUM_ENGINES.
> > > +	 */
> > > +	BUILD_BUG_ON(I915_NUM_ENGINES > 16);
> > > +	return 0x10000 << id;
> > > +}
> > > +
> > > +static __always_inline unsigned int __busy_write_id(unsigned int id)
> > > +{
> > > +	return id;
> > > +}
> > > +
> > > +static __always_inline unsigned
> > > +__busy_set_if_active(const struct i915_gem_active *active,
> > > +		     unsigned int (*flag)(unsigned int id))
> > > +{
> > > +	/* For more discussion about the barriers and locking concerns,
> > > +	 * see __i915_gem_active_get_rcu().
> > > +	 */
> > > +	do {
> > > +		struct drm_i915_gem_request *request;
> > > +		unsigned int id;
> > > +
> > > +		request = rcu_dereference(active->request);
> > > +		if (!request || i915_gem_request_completed(request))
> > > +			return 0;
> > > +
> > > +		id = request->engine->exec_id;
> > > +
> > > +		/* Check that the pointer wasn't reassigned and overwritten. */
> >
> > cf. our discussion in active_get_rcu - there's no fence_get_rcu in sight
> > anywhere here, hence this needs an smp_rmb().
>
> I toyed with smp_rmb().
>
> The rcu_dereference() followed by rcu_access_pointer() is ordered.
>
> So I was back to dancing around "were the dependent reads ordered by
> the first rcu_dereference() ordered in front of the second access, which
> was itself ordered after the first?" I probably should have stuck in the
> smp_rmb() and stopped worrying - it is still going to be cheaper than
> the refcount traffic.
It's the read of exec_id vs. the 2nd read of request which isn't ordered,
and which we want to be ordered to ensure we read the right engine id (and
not some bogus thing since the request was recycled meanwhile). And I
think the smp_rmb() is indeed required in there:
1. first active->request lookup
2. 2nd active->request lookup (compiler/cpu is allowed to do that, I think
   it could even reorder ahead of 1 since it's not a dependent read)
   <- gpu completes request, evil other thread does all the clean&recycles
      with new bogus engine
3. sample the engine->exec_id
4. bail out of the loop since the requests looked up in 1&2 match.
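
To make the ordering concrete, here is a rough userspace sketch of the same lookup with the barrier in place. It is only an analogue under stated assumptions: C11 atomics stand in for the kernel primitives (an acquire load for rcu_dereference(), atomic_thread_fence(memory_order_acquire) for smp_rmb()), the kernel memory model is not the C11 one, and the struct layouts, MAX_ENGINES and the un-prefixed function names are made up for illustration rather than taken from i915.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the i915 structures; not the real layout. */
#define MAX_ENGINES 16

struct engine {
	unsigned int exec_id;
};

struct request {
	_Atomic(struct engine *) engine; /* rewritten if the request is recycled */
	atomic_bool completed;
};

struct active {
	_Atomic(struct request *) request;
};

static unsigned int busy_read_flag(unsigned int id)
{
	/* Readers get one bit each in the upper 16 bits. */
	static_assert(MAX_ENGINES <= 16, "read flags must fit in 16 bits");
	return 0x10000u << id;
}

static unsigned int busy_write_id(unsigned int id)
{
	return id;
}

static unsigned int busy_set_if_active(struct active *active,
				       unsigned int (*flag)(unsigned int id))
{
	for (;;) {
		struct request *request;
		struct engine *engine;
		unsigned int id;

		/* First lookup (stands in for rcu_dereference()). */
		request = atomic_load_explicit(&active->request,
					       memory_order_acquire);
		if (!request ||
		    atomic_load_explicit(&request->completed,
					 memory_order_acquire))
			return 0;

		/* Sample the engine id through the pointer. */
		engine = atomic_load_explicit(&request->engine,
					      memory_order_relaxed);
		id = engine->exec_id;

		/*
		 * Analogue of smp_rmb(): the loads above may not be
		 * reordered after the re-read of active->request below.
		 */
		atomic_thread_fence(memory_order_acquire);

		/* Second lookup: only trust id if the slot is unchanged. */
		if (request == atomic_load_explicit(&active->request,
						    memory_order_relaxed))
			return flag(id);
	}
}
```

With a request on engine 3 installed in the slot, busy_set_if_active(&a, busy_read_flag) yields 0x10000 << 3 and busy_set_if_active(&a, busy_write_id) yields 3; once the request is marked completed or the slot is cleared, both return 0.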
> > Also nitpick: The two
> > rcu_dereference(active->request) feel a bit silly. If we move the first in
> > front of the loop, and update the local request pointer (using a tmp) it
> > would look tidier, and we could even move the loop termination condition
> > into the while () check (and move the return flag(id) at the end of the
> > function).
>
> I was quite content with only having to think of one phase through the
> loop and not worry about state being carried forward.
>
>  __busy_set_if_active(const struct i915_gem_active *active,
>  		     unsigned int (*flag)(unsigned int id))
>  {
> +	struct drm_i915_gem_request *request;
> +	unsigned int id;
> +
>  	/* For more discussion about the barriers and locking concerns,
>  	 * see __i915_gem_active_get_rcu().
>  	 */
> +	request = rcu_dereference(active->request);
>  	do {
> -		struct drm_i915_gem_request *request;
> -		unsigned int id;
> +		struct drm_i915_gem_request *tmp;
>
> -		request = rcu_dereference(active->request);
>  		if (!request || i915_gem_request_completed(request))
>  			return 0;
>
>  		id = request->engine->exec_id;
>
>  		/* Check that the pointer wasn't reassigned and overwritten. */
> -		if (request == rcu_access_pointer(active->request))
> -			return flag(id);
> +		tmp = rcu_dereference(active->request);
> +		if (tmp == request)
> +			break;
> +
> +		request = tmp;
>  	} while (1);
> +
> +	return flag(id);
>  }
>
> is also not as well optimised by gcc, apparently.
Hm yeah, underwhelming. I'm ok with either I guess.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx