All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel@ffwll.ch>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Intel-GFX@Lists.FreeDesktop.Org
Subject: Re: [RFC] drm/i915: Add sync framework support to execbuff IOCTL
Date: Mon, 6 Jul 2015 17:37:57 +0200	[thread overview]
Message-ID: <20150706153757.GA7568@phenom.ffwll.local> (raw)
In-Reply-To: <559A9CF8.9010601@linux.intel.com>

On Mon, Jul 06, 2015 at 04:21:28PM +0100, Tvrtko Ursulin wrote:
> 
> On 07/06/2015 04:12 PM, Daniel Vetter wrote:
> >On Mon, Jul 06, 2015 at 03:46:49PM +0100, Tvrtko Ursulin wrote:
> >>On 07/06/2015 03:26 PM, John Harrison wrote:
> >>>On 06/07/2015 14:59, Daniel Vetter wrote:
> >>>>On Mon, Jul 06, 2015 at 01:58:25PM +0100, John Harrison wrote:
> >>>>>On 06/07/2015 10:29, Daniel Vetter wrote:
> >>>>>>On Fri, Jul 03, 2015 at 12:17:33PM +0100, Tvrtko Ursulin wrote:
> >>>>>>>On 07/02/2015 04:55 PM, Chris Wilson wrote:
> >>>>>>>>It would be nice if we could reuse one seqno both for
> >>>>>>>>internal/external
> >>>>>>>>fences. If you need to expose a fence ordering within a timeline
> >>>>>>>>that is
> >>>>>>>>based on the creation stamp rather than execution stamp, it seems
> >>>>>>>>like
> >>>>>>>>we could just add such a stamp when creating the sync_pt and not
> >>>>>>>>worry
> >>>>>>>>about its relationship to the execution seqno.
> >>>>>>>>
> >>>>>>>>Doing so does expose that requests are reordered to userspace
> >>>>>>>>since the
> >>>>>>>>signalling timeline is not the same as userspace's ordered
> >>>>>>>>timeline. Not
> >>>>>>>>sure if that is a problem or not.
> >>>>>>>>
> >>>>>>>>Afaict the sync uapi is based on waiting for all of a set of
> >>>>>>>>fences to
> >>>>>>>>retire. It doesn't seem to rely on fence ordering (that is knowing
> >>>>>>>>that
> >>>>>>>>fence A will signal before fence B so it need only wait on fence B).
> >>>>>>>>
> >>>>>>>>Here's hoping that we can have both simplicity and efficiency...
> >>>>>>>Jumping in with not even perfect understanding of everything here -
> >>>>>>>but
> >>>>>>>timeline business has always been confusing me. There is nothing in
> >>>>>>>the
> >>>>>>>uapi which needs it afaics and iirc there was some discussion at
> >>>>>>>the time
> >>>>>>>Jesse floated his patches that it can be removed. Based on that when I
> >>>>>>>squashed his patches and ported them on top of John's request to fence
> >>>>>>>conversion it ended up something like the below (manually edited a
> >>>>>>>bit to
> >>>>>>>be less noisy and some prep patches omitted):
> >>>>>>>
> >>>>>>>This implements the ioctl based uapi and indeed seqnos are not
> >>>>>>>actually
> >>>>>>>used in waits. So is this insufficient for some reason? (Other that it
> >>>>>>>does not implement the input fence side of things.)
> >>>>>>Yeah android syncpt on top of struct fence embedded int i915 request is
> >>>>>>what I'd have expected.
> >>>>>The thing I'm not happy with in that plan is that it leaves the kernel
> >>>>>driver at the mercy of user land applications. If we return a fence
> >>>>>object
> >>>>>to user land via a file descriptor (or indeed any other mechanism)
> >>>>>then that
> >>>>>fence object must be locked until user land closes the file. If the
> >>>>>fence
> >>>>>object is the one embedded within our request structure then that
> >>>>>means user
> >>>>>land is effectively locking our request structure. Given that more
> >>>>>and more
> >>>>>stuff is being attached to the request, that could be a fair bit of
> >>>>>memory
> >>>>>tied up that we can do nothing about. E.g. if a rogue/buggy application
> >>>>>requests a fence be returned for every batch buffer submitted but never
> >>>>>closes them. Whereas, if we go the route of a separate fence object
> >>>>>specifically for user land then they can leak them like a sieve and
> >>>>>we won't
> >>>>>really care so much.
> >>>>Userspace can exhaust kernel allocations, that's nothing new. And if we
> >>>>keep it userspace simply needs to leak a few more fence fds than if
> >>>>there's a bit more data attached to it.
> >>>>
> >>>>The solution to this problem is to have a mem cgroup limit set. No
> >>>>need to
> >>>>complicate our kernel code.
> >>>
> >>>There is still the extra complication that request unreferencing cannot
> >>>require any kind of mutex lock if we are allowing it to happen from
> >>>outside of the driver. That means the unreference callback must move the
> >>>request to a 'please clean me later' list, schedule a worker thread to
> >>>run, and thus do the clean up asynchronously.
> >>
> >>For this particular issue my solution was to extend the sync_fence
> >>constructor to take a mutex and store it inside the object. Then at
> >>destruction time, which happens at sync_fd->f_ops->release() time, it is
> >>just a matter of calling kref_put_mutex instead of kref_put.
> >>
> >>Seemed to work under some quick testing but that is as much as I did back
> >>then.
> >
> >The problem is that it doesn't scale since everyone wants some other kind
> >of mutex to serialize the final kref_put. If something is supposed to be
> >cross-subsystem/driver (which is the case for fences) then we really can't
> >do that kind of leaky locking design. Imo we should have a kref_put_mutex
> >considered harmful sign somewhere ...
> 
> I get the argument about everything wanting to add their own not scaling,
> but don't tie with the leaky comment? Also mutex is a pretty standard thing,
> especially since kref_put_mutex. :D If you look at it from that angle it
> kind of just exposes to the super class what the base class can do.

leaky not as in leaking memory, but leaky abstraction - we impose locking
internals for i915 onto users of the fence interface (somewhat at least).
> 
> >If you have weak references somewhere and need to prevent the object from
> >disappearing untimely while chasing that weak reference then imo the
> >better design pattern is to use kref_get_unless_zero. If you need the
> >serialization the mutex provides for some other reason (someone is only
> >hodling the mutex instead of grabbing a proper refernce when they really
> >should grab one) then your refcounting scheme probably needs another kind
> >of fixup patch.
> 
> I don't see how weak references can work since if the request goes
> information is lost, unless stored somewhere else.

So weak refernce is a pointer which doesn't hold a reference to the object
controlled with kref, but protected by some lock. Latest when the object
is destroyed we need to clear that pointer in the free callback of
kref_put (and so also grab the lock that protects that pointer). The
problem is that anyone else chasing that weak reference might race with
the final kref_put and increase the refcount from 0 to 1, which isn't
good.

There's two ways to do that:
- kref_put_mutex on the release side.
- in the acquire side do a kref_get_unless_zero _while_ holding that lock.

Two upsides of the later approach:
- You can have an unrestricted amount of weak references, each protected
  with their own lock. kref_put_mutex only works up to one.
- It doesn't serialize the final kref_put with the lock - doing that
  allows folks to rely on the mutex instead of a proper refcount to make
  the obj stick around, which is imo a design antipattern that I suffered
  through trying to cleaning it up agian a lot.

But that's really a tangent to the discussion here, I have no idea whether
this applies here since I didn't read your patch in detail.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2015-07-06 15:41 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-02 11:09 [RFC] drm/i915: Add sync framework support to execbuff IOCTL John.C.Harrison
2015-07-02 11:54 ` Chris Wilson
2015-07-02 12:02   ` Chris Wilson
2015-07-02 13:01   ` John Harrison
2015-07-02 13:22     ` Chris Wilson
2015-07-02 15:43       ` John Harrison
2015-07-02 15:55         ` Chris Wilson
2015-07-03 11:17           ` Tvrtko Ursulin
2015-07-06  9:29             ` Daniel Vetter
2015-07-06 12:58               ` John Harrison
2015-07-06 13:59                 ` Daniel Vetter
2015-07-06 14:26                   ` John Harrison
2015-07-06 14:41                     ` Daniel Vetter
2015-07-06 14:46                     ` Tvrtko Ursulin
2015-07-06 15:12                       ` Daniel Vetter
2015-07-06 15:21                         ` Tvrtko Ursulin
2015-07-06 15:37                           ` Daniel Vetter [this message]
2015-07-06 16:34                             ` Tvrtko Ursulin
2015-07-06 17:58                               ` Daniel Vetter
2015-07-07  9:15                 ` Tvrtko Ursulin
2015-07-29 21:19                   ` Jesse Barnes
2015-07-30 11:36                     ` John Harrison

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150706153757.GA7568@phenom.ffwll.local \
    --to=daniel@ffwll.ch \
    --cc=Intel-GFX@Lists.FreeDesktop.Org \
    --cc=tvrtko.ursulin@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.