Intel-GFX Archive on lore.kernel.org
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>,
	Jason Ekstrand <jason@jlekstrand.net>
Cc: "Intel GFX" <intel-gfx@lists.freedesktop.org>,
	"Chris Wilson" <chris.p.wilson@intel.com>,
	"Thomas Hellstrom" <thomas.hellstrom@intel.com>,
	"Maling list - DRI developers" <dri-devel@lists.freedesktop.org>,
	"Daniel Vetter" <daniel.vetter@intel.com>,
	"Christian König" <christian.koenig@amd.com>
Subject: Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document
Date: Wed, 8 Jun 2022 08:33:25 +0100	[thread overview]
Message-ID: <4be022cc-518e-49e1-96bd-b9720a313401@linux.intel.com> (raw)
In-Reply-To: <20220607213209.GY4461@nvishwa1-DESK>



On 07/06/2022 22:32, Niranjana Vishwanathapura wrote:
> On Tue, Jun 07, 2022 at 11:18:11AM -0700, Niranjana Vishwanathapura wrote:
>> On Tue, Jun 07, 2022 at 12:12:03PM -0500, Jason Ekstrand wrote:
>>>  On Fri, Jun 3, 2022 at 6:52 PM Niranjana Vishwanathapura
>>>  <niranjana.vishwanathapura@intel.com> wrote:
>>>
>>>    On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel Landwerlin wrote:
>>>    >   On 02/06/2022 23:35, Jason Ekstrand wrote:
>>>    >
>>>    >     On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura
>>>    >     <niranjana.vishwanathapura@intel.com> wrote:
>>>    >
>>>    >       >On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew Brost wrote:
>>>    >       >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin
>>>    wrote:
>>>    >       >> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote:
>>>    >       >> > +VM_BIND/UNBIND ioctl will immediately start binding/unbinding
>>>    >       >> > +the mapping in an async worker. The binding and unbinding will
>>>    >       >> > +work like a special GPU engine. The binding and unbinding
>>>    >       >> > +operations are serialized and will wait on specified input
>>>    >       >> > +fences before the operation and will signal the output fences
>>>    >       >> > +upon the completion of the operation. Due to serialization,
>>>    >       >> > +completion of an operation will also indicate that all
>>>    >       >> > +previous operations are also complete.
>>>    >       >>
>>>    >       >> I guess we should avoid saying "will immediately start
>>>    >       >> binding/unbinding" if there are fences involved.
>>>    >       >>
>>>    >       >> And the fact that it's happening in an async worker seems to
>>>    >       >> imply it's not immediate.
>>>    >       >>
>>>    >
>>>    >       Ok, will fix.
>>>    >       This was added because in the earlier design binding was deferred
>>>    >       until the next execbuf. But now it is non-deferred (immediate in
>>>    >       that sense). But yah, this is confusing and I will fix it.
>>>    >
>>>    >       >>
>>>    >       >> I have a question on the behavior of the bind operation when
>>>    >       >> no input fence is provided. Let's say I do:
>>>    >       >>
>>>    >       >> VM_BIND (out_fence=fence1)
>>>    >       >>
>>>    >       >> VM_BIND (out_fence=fence2)
>>>    >       >>
>>>    >       >> VM_BIND (out_fence=fence3)
>>>    >       >>
>>>    >       >>
>>>    >       >> In what order are the fences going to be signaled?
>>>    >       >>
>>>    >       >> In the order of VM_BIND ioctls? Or out of order?
>>>    >       >>
>>>    >       >> Because you wrote "serialized", I assume it's in order.
>>>    >       >>
>>>    >
>>>    >       Yes, in the order of VM_BIND/UNBIND ioctls. Note that bind and
>>>    >       unbind will use the same queue and hence are ordered.
>>>    >
>>>    >       >>
>>>    >       >> One thing I didn't realize is that because we only get one
>>>    >       >> "VM_BIND" engine, there is a disconnect from the Vulkan
>>>    >       >> specification.
>>>    >       >>
>>>    >       >> In Vulkan, VM_BIND operations are serialized but per engine.
>>>    >       >>
>>>    >       >> So you could have something like this :
>>>    >       >>
>>>    >       >> VM_BIND (engine=rcs0, in_fence=fence1, out_fence=fence2)
>>>    >       >>
>>>    >       >> VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4)
>>>    >       >>
>>>    >       >>
>>>    >       >> fence1 is not signaled
>>>    >       >>
>>>    >       >> fence3 is signaled
>>>    >       >>
>>>    >       >> So the second VM_BIND will proceed before the first VM_BIND.
>>>    >       >>
>>>    >       >>
>>>    >       >> I guess we can deal with that scenario in userspace by doing
>>>    >       >> the wait ourselves in one thread per engine.
>>>    >       >>
>>>    >       >> But then it makes the VM_BIND input fences useless.
>>>    >       >>
>>>    >       >>
>>>    >       >> Daniel: what do you think? Should we rework this or just
>>>    >       >> deal with wait fences in userspace?
>>>    >       >>
>>>    >       >
>>>    >       >My opinion is to rework this, but make the ordering via an
>>>    >       >engine param optional.
>>>    >       >
>>>    >       >e.g. A VM can be configured so all binds are ordered within
>>>    >       >the VM.
>>>    >       >
>>>    >       >e.g. A VM can be configured so all binds accept an engine
>>>    >       >argument (in the case of i915 this is likely a gem context
>>>    >       >handle) and binds are ordered with respect to that engine.
>>>    >       >
>>>    >       >This gives UMDs options, as the latter likely consumes more
>>>    >       >KMD resources, so if a UMD can live with binds being ordered
>>>    >       >within the VM it can use a mode consuming fewer resources.
>>>    >       >
>>>    >
>>>    >       I think we need to be careful here if we are looking for some
>>>    >       out-of-(submission)-order completion of vm_bind/unbind.
>>>    >       In-order completion means that, in a batch of binds and unbinds
>>>    >       to be completed in order, the user only needs to specify the
>>>    >       in-fence for the first bind/unbind call and the out-fence for
>>>    >       the last bind/unbind call. Also, the VA released by an unbind
>>>    >       call can be re-used by any subsequent bind call in that
>>>    >       in-order batch.
>>>    >
>>>    >       These things will break if binding/unbinding were allowed to go
>>>    >       out of (submission) order, and the user would need to be extra
>>>    >       careful not to run into premature triggering of the out-fence,
>>>    >       binds failing because the VA is still in use, etc.
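[Editorial aside: the in-order completion semantics described above can be modeled with a small sketch. Everything here is illustrative only; none of these names belong to any proposed uapi. Ops on a single in-order bind queue retire strictly in submission order, so an op's out-fence signaling implies every earlier op on that queue has completed.]

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a single in-order VM_BIND queue. All names are
 * hypothetical and exist only to illustrate the ordering guarantee. */
struct bind_queue {
	size_t submitted; /* number of ops submitted so far */
	size_t completed; /* number of ops retired, always oldest first */
};

/* Submit a bind/unbind op; returns its position in the queue. */
static size_t queue_submit(struct bind_queue *q)
{
	return q->submitted++;
}

/* Retire the oldest outstanding op (the async worker's job). */
static void queue_complete_next(struct bind_queue *q)
{
	if (q->completed < q->submitted)
		q->completed++;
}

/* An op's out-fence is signaled iff the queue has retired past it,
 * which by construction implies all earlier ops are also done. */
static int op_fence_signaled(const struct bind_queue *q, size_t op)
{
	return op < q->completed;
}
```

[In this model, a batch of N ops only needs an in-fence on op 0 and an out-fence on op N-1: seeing the fence of op N-1 signal is proof that the whole batch retired.]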
>>>    >
>>>    >       Also, VM_BIND binds the provided mapping on the specified
>>>    >       address space (VM). So, the uapi is not engine/context specific.
>>>    >
>>>    >       We can, however, add a 'queue' to the uapi, which can be one of
>>>    >       the pre-defined queues:
>>>    >       I915_VM_BIND_QUEUE_0
>>>    >       I915_VM_BIND_QUEUE_1
>>>    >       ...
>>>    >       I915_VM_BIND_QUEUE_(N-1)
>>>    >
>>>    >       KMD will spawn an async work queue for each queue, which will
>>>    >       only bind the mappings on that queue in the order of submission.
>>>    >       The user can assign a queue per engine or anything like that.
>>>    >
>>>    >       But again, here the user needs to be careful not to deadlock
>>>    >       these queues with a circular dependency of fences.
>>>    >
>>>    >       I prefer adding this later as an extension, based on whether it
>>>    >       is really helping with the implementation.
>>>    >
>>>    >     I can tell you right now that having everything on a single
>>>    >     in-order queue will not get us the perf we want.  What Vulkan
>>>    >     really wants is one of two things:
>>>    >      1. No implicit ordering of VM_BIND ops.  They just happen in
>>>    >     whatever order their dependencies are resolved, and we ensure
>>>    >     ordering ourselves by having a syncobj in the VkQueue.
>>>    >      2. The ability to create multiple VM_BIND queues.  We need at
>>>    >     least 2, but I don't see why there needs to be a limit besides
>>>    >     the limits the i915 API already has on the number of engines.
>>>    >     Vulkan could expose multiple sparse binding queues to the
>>>    >     client if it's not arbitrarily limited.
>>>
>>>    Thanks Jason, Lionel.
>>>
>>>    Jason, what are you referring to when you say "limits the i915 API
>>>    already has on the number of engines"? I am not sure if there is such
>>>    a uapi today.
>>>
>>>  There's a limit of something like 64 total engines today based on the
>>>  number of bits we can cram into the exec flags in execbuffer2.  I think
>>>  someone had an extended version that allowed more but I ripped it out
>>>  because no one was using it.  Of course, execbuffer3 might not have
>>>  that problem at all.
>>>
>>
>> Thanks Jason.
>> Ok, I am not sure which exec flag that is, but yah, execbuffer3 probably
>> will not have this limitation. So, we need to define a VM_BIND_MAX_QUEUE
>> and somehow export it to the user (I am thinking of embedding it in
>> I915_PARAM_HAS_VM_BIND: bits[0]->HAS_VM_BIND, bits[1-3]->'n' meaning 2^n
>> queues).
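[Editorial aside: that encoding could be consumed in userspace roughly as below. This is a sketch only: the bit layout is just the proposal above, not a settled uapi, and the helper names are made up.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoding of the proposed I915_PARAM_HAS_VM_BIND value:
 * bit 0     -> VM_BIND is supported
 * bits 1..3 -> 'n', where the number of bind queues is 2^n
 */
static bool has_vm_bind(uint32_t param)
{
	return param & 0x1;
}

static unsigned int vm_bind_num_queues(uint32_t param)
{
	unsigned int n = (param >> 1) & 0x7; /* extract bits[1:3] */

	return 1u << n;
}
```

[For example, a reported value of 0x7 (bit 0 set, n = 3) would mean VM_BIND is supported with 8 queues.]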
> 
> Ah, I think you are talking about I915_EXEC_RING_MASK (0x3f), which execbuf3
> will also have. So, we can simply define, in the vm_bind/unbind structures:
> 
> #define I915_VM_BIND_MAX_QUEUE   64
>         __u32 queue;
> 
> I think that will keep things simple.
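[Editorial aside: as a rough illustration of how the proposed `queue` field and the I915_VM_BIND_MAX_QUEUE define above might sit in a bind request. The struct layout and every field other than `queue` are invented for this sketch and are not taken from any posted uapi.]

```c
#include <assert.h>
#include <stdint.h>

#define I915_VM_BIND_MAX_QUEUE 64

/* Hypothetical vm_bind argument block; only 'queue' and the define above
 * come from the discussion, the rest is illustrative. */
struct vm_bind_sketch {
	uint32_t vm_id;   /* target address space (VM) */
	uint32_t queue;   /* in-order bind queue, 0 .. I915_VM_BIND_MAX_QUEUE-1 */
	uint64_t start;   /* GPU virtual address of the mapping */
	uint64_t length;  /* size of the mapping */
};

/* A KMD would reject a queue index outside the advertised range. */
static int vm_bind_queue_valid(const struct vm_bind_sketch *req)
{
	return req->queue < I915_VM_BIND_MAX_QUEUE;
}
```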

Hmmm? What does the execbuf2 limit have to do with how many engines the
hardware can have? I suggest not doing that.

The change which added this to context creation:

	if (set.num_engines > I915_EXEC_RING_MASK + 1)
		return -EINVAL;

needs to be undone, so that users can create engine maps with all hardware
engines and execbuf3 can access them all.

Regards,

Tvrtko

> 
> Niranjana
> 
>>
>>>    I am trying to see how many queues we need and don't want it to be
>>>    arbitrarily large and unduly blow up memory usage and complexity in
>>>    the i915 driver.
>>>
>>>  I expect a Vulkan driver to use at most 2 in the vast majority of
>>>  cases. I could imagine a client wanting to create more than 1 sparse
>>>  queue, in which case it'll be N+1, but that's unlikely.  As far as
>>>  complexity goes, once you allow two, I don't think the complexity is
>>>  going up by allowing N.  As for memory usage, creating more queues
>>>  means more memory.  That's a trade-off that userspace can make.
>>>  Again, the expected number here is 1 or 2 in the vast majority of
>>>  cases, so I don't think you need to worry.
>>
>> Ok, will start with n=3 meaning 8 queues.
>> That would require us to create 8 workqueues.
>> We can change 'n' later if required.
>>
>> Niranjana
>>
>>>
>>>    >     Why?  Because Vulkan has two basic kinds of bind operations and
>>>    >     we don't want any dependencies between them:
>>>    >      1. Immediate.  These happen right after BO creation or maybe as
>>>    >     part of vkBindImageMemory() or vkBindBufferMemory().  These don't
>>>    >     happen on a queue and we don't want them serialized with
>>>    >     anything.  To synchronize with submit, we'll have a syncobj in
>>>    >     the VkDevice which is signaled by all immediate bind operations,
>>>    >     and make submits wait on it.
>>>    >      2. Queued (sparse): These happen on a VkQueue which may be the
>>>    >     same as a render/compute queue or may be its own queue.  It's up
>>>    >     to us what we want to advertise.  From the Vulkan API PoV, this
>>>    >     is like any other queue.  Operations on it wait on and signal
>>>    >     semaphores.  If we have a VM_BIND engine, we'd provide syncobjs
>>>    >     to wait and signal just like we do in execbuf().
>>>    >     The important thing is that we don't want one type of operation
>>>    >     to block on the other.  If immediate binds are blocking on
>>>    >     sparse binds, it's going to cause over-synchronization issues.
>>>    >     In terms of the internal implementation, I know that there's
>>>    >     going to be a lock on the VM and that we can't actually do these
>>>    >     things in parallel.  That's fine.  Once the dma_fences have
>>>    >     signaled and we're
>>>
>>>    That's correct. It is like a single VM_BIND engine with multiple
>>>    queues feeding into it.
>>>
>>>  Right.  As long as the queues themselves are independent and can block
>>>  on dma_fences without holding up other queues, I think we're fine.
>>>
>>>    >     unblocked to do the bind operation, I don't care if there's a
>>>    >     bit of synchronization due to locking.  That's expected.  What
>>>    >     we can't afford to have is an immediate bind operation suddenly
>>>    >     blocking on a sparse operation which is blocked on a compute
>>>    >     job that's going to run for another 5 ms.
>>>
>>>    As the VM_BIND queue is per VM, VM_BIND on one VM doesn't block
>>>    VM_BIND on other VMs. I am not sure about the use cases here, but
>>>    just wanted to clarify.
>>>
>>>  Yes, that's what I would expect.
>>>  --Jason
>>>
>>>    Niranjana
>>>
>>>    >     For reference, Windows solves this by allowing arbitrarily many
>>>    >     paging queues (what they call a VM_BIND engine/queue).  That
>>>    >     design works pretty well and solves the problems in question.
>>>    >     Again, we could just make everything out-of-order and require
>>>    >     using syncobjs to order things as userspace wants.  That'd be
>>>    >     fine too.
>>>    >     One more note while I'm here: danvet said something on IRC
>>>    >     about VM_BIND queues waiting for syncobjs to materialize.  We
>>>    >     don't really want/need this.  We already have all the machinery
>>>    >     in userspace to handle wait-before-signal and waiting for
>>>    >     syncobj fences to materialize, and that machinery is on by
>>>    >     default.  It would actually take MORE work in Mesa to turn it
>>>    >     off and take advantage of the kernel being able to wait for
>>>    >     syncobjs to materialize.  Also, getting that right is
>>>    >     ridiculously hard and I really don't want to get it wrong in
>>>    >     kernel space.  When we do memory fences, wait-before-signal
>>>    >     will be a thing.  We don't need to try and make it a thing for
>>>    >     syncobj.
>>>    >     --Jason
>>>    >
>>>    >   Thanks Jason,
>>>    >
>>>    >   I missed the bit in the Vulkan spec that says we're allowed to
>>>    >   have a sparse queue that does not implement either graphics or
>>>    >   compute operations:
>>>    >
>>>    >     "While some implementations may include
>>>    >     VK_QUEUE_SPARSE_BINDING_BIT support in queue families that also
>>>    >     include graphics and compute support, other implementations may
>>>    >     only expose a VK_QUEUE_SPARSE_BINDING_BIT-only queue family."
>>>    >
>>>    >   So it can all be a vm_bind engine that just does bind/unbind
>>>    >   operations.
>>>    >
>>>    >   But yes, we need another engine for the immediate/non-sparse
>>>    >   operations.
>>>    >
>>>    >   -Lionel
>>>    >
>>>    >         >
>>>    >       Daniel, any thoughts?
>>>    >
>>>    >       Niranjana
>>>    >
>>>    >       >Matt
>>>    >       >
>>>    >       >>
>>>    >       >> Sorry I noticed this late.
>>>    >       >>
>>>    >       >>
>>>    >       >> -Lionel
>>>    >       >>
>>>    >       >>
