From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
To: Jason Ekstrand <jason@jlekstrand.net>,
Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: "Intel GFX" <intel-gfx@lists.freedesktop.org>,
"Chris Wilson" <chris.p.wilson@intel.com>,
"Thomas Hellstrom" <thomas.hellstrom@intel.com>,
"Maling list - DRI developers" <dri-devel@lists.freedesktop.org>,
"Daniel Vetter" <daniel.vetter@intel.com>,
"Christian König" <christian.koenig@amd.com>
Subject: Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document
Date: Fri, 3 Jun 2022 10:20:25 +0300 [thread overview]
Message-ID: <bd615d4e-3911-a9ce-5d9f-fb85f7866d32@intel.com> (raw)
In-Reply-To: <CAOFGe94AXn_vqON++LpiCTqOspCrVZawcYmjL3W6A7tA5vjTpQ@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 9238 bytes --]
On 02/06/2022 23:35, Jason Ekstrand wrote:
> On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com> wrote:
>
> On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew Brost wrote:
> >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin wrote:
> >> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote:
> >> > +VM_BIND/UNBIND ioctl will immediately start
> binding/unbinding the mapping in an
> >> > +async worker. The binding and unbinding will work like a
> special GPU engine.
> >> > +The binding and unbinding operations are serialized and will
> wait on specified
> >> > +input fences before the operation and will signal the output
> fences upon the
> >> > +completion of the operation. Due to serialization,
> completion of an operation
> >> > +will also indicate that all previous operations are also
> complete.
> >>
> >> I guess we should avoid saying "will immediately start
> binding/unbinding" if
> >> there are fences involved.
> >>
> >> And the fact that it's happening in an async worker seem to
> imply it's not
> >> immediate.
> >>
>
> Ok, will fix.
> This was added because in earlier design binding was deferred
> until next execbuff.
> But now it is non-deferred (immediate in that sense). But yah,
> this is confusing
> and will fix it.
>
> >>
> >> I have a question on the behavior of the bind operation when no
> input fence
> >> is provided. Let say I do :
> >>
> >> VM_BIND (out_fence=fence1)
> >>
> >> VM_BIND (out_fence=fence2)
> >>
> >> VM_BIND (out_fence=fence3)
> >>
> >>
> >> In what order are the fences going to be signaled?
> >>
> >> In the order of VM_BIND ioctls? Or out of order?
> >>
> >> Because you wrote "serialized I assume it's : in order
> >>
>
> Yes, in the order of VM_BIND/UNBIND ioctls. Note that bind and
> unbind will use
> the same queue and hence are ordered.
>
> >>
> >> One thing I didn't realize is that because we only get one
> "VM_BIND" engine,
> >> there is a disconnect from the Vulkan specification.
> >>
> >> In Vulkan VM_BIND operations are serialized but per engine.
> >>
> >> So you could have something like this :
> >>
> >> VM_BIND (engine=rcs0, in_fence=fence1, out_fence=fence2)
> >>
> >> VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4)
> >>
> >>
> >> fence1 is not signaled
> >>
> >> fence3 is signaled
> >>
> >> So the second VM_BIND will proceed before the first VM_BIND.
> >>
> >>
> >> I guess we can deal with that scenario in userspace by doing
> the wait
> >> ourselves in one thread per engines.
> >>
> >> But then it makes the VM_BIND input fences useless.
> >>
> >>
> >> Daniel : what do you think? Should be rework this or just deal
> with wait
> >> fences in userspace?
> >>
> >
> >My opinion is rework this but make the ordering via an engine
> param optional.
> >
> >e.g. A VM can be configured so all binds are ordered within the VM
> >
> >e.g. A VM can be configured so all binds accept an engine
> argument (in
> >the case of the i915 likely this is a gem context handle) and binds
> >ordered with respect to that engine.
> >
> >This gives UMDs options as the later likely consumes more KMD
> resources
> >so if a different UMD can live with binds being ordered within the VM
> >they can use a mode consuming less resources.
> >
>
> I think we need to be careful here if we are looking for some out of
> (submission) order completion of vm_bind/unbind.
> In-order completion means, in a batch of binds and unbinds to be
> completed in-order, user only needs to specify in-fence for the
> first bind/unbind call and the our-fence for the last bind/unbind
> call. Also, the VA released by an unbind call can be re-used by
> any subsequent bind call in that in-order batch.
>
> These things will break if binding/unbinding were to be allowed to
> go out of order (of submission) and user need to be extra careful
> not to run into pre-mature triggereing of out-fence and bind failing
> as VA is still in use etc.
>
> Also, VM_BIND binds the provided mapping on the specified address
> space
> (VM). So, the uapi is not engine/context specific.
>
> We can however add a 'queue' to the uapi which can be one from the
> pre-defined queues,
> I915_VM_BIND_QUEUE_0
> I915_VM_BIND_QUEUE_1
> ...
> I915_VM_BIND_QUEUE_(N-1)
>
> KMD will spawn an async work queue for each queue which will only
> bind the mappings on that queue in the order of submission.
> User can assign the queue to per engine or anything like that.
>
> But again here, user need to be careful and not deadlock these
> queues with circular dependency of fences.
>
> I prefer adding this later an as extension based on whether it
> is really helping with the implementation.
>
>
> I can tell you right now that having everything on a single in-order
> queue will not get us the perf we want. What vulkan really wants is
> one of two things:
>
> 1. No implicit ordering of VM_BIND ops. They just happen in whatever
> their dependencies are resolved and we ensure ordering ourselves by
> having a syncobj in the VkQueue.
>
> 2. The ability to create multiple VM_BIND queues. We need at least 2
> but I don't see why there needs to be a limit besides the limits the
> i915 API already has on the number of engines. Vulkan could expose
> multiple sparse binding queues to the client if it's not arbitrarily
> limited.
>
> Why? Because Vulkan has two basic kind of bind operations and we
> don't want any dependencies between them:
>
> 1. Immediate. These happen right after BO creation or maybe as part
> of vkBindImageMemory() or VkBindBufferMemory(). These don't happen on
> a queue and we don't want them serialized with anything. To
> synchronize with submit, we'll have a syncobj in the VkDevice which is
> signaled by all immediate bind operations and make submits wait on it.
>
> 2. Queued (sparse): These happen on a VkQueue which may be the same
> as a render/compute queue or may be its own queue. It's up to us what
> we want to advertise. From the Vulkan API PoV, this is like any other
> queue. Operations on it wait on and signal semaphores. If we have a
> VM_BIND engine, we'd provide syncobjs to wait and signal just like we
> do in execbuf().
>
> The important thing is that we don't want one type of operation to
> block on the other. If immediate binds are blocking on sparse binds,
> it's going to cause over-synchronization issues.
>
> In terms of the internal implementation, I know that there's going to
> be a lock on the VM and that we can't actually do these things in
> parallel. That's fine. Once the dma_fences have signaled and we're
> unblocked to do the bind operation, I don't care if there's a bit of
> synchronization due to locking. That's expected. What we can't
> afford to have is an immediate bind operation suddenly blocking on a
> sparse operation which is blocked on a compute job that's going to run
> for another 5ms.
>
> For reference, Windows solves this by allowing arbitrarily many paging
> queues (what they call a VM_BIND engine/queue). That design works
> pretty well and solves the problems in question. Again, we could just
> make everything out-of-order and require using syncobjs to order
> things as userspace wants. That'd be fine too.
>
> One more note while I'm here: danvet said something on IRC about
> VM_BIND queues waiting for syncobjs to materialize. We don't really
> want/need this. We already have all the machinery in userspace to
> handle wait-before-signal and waiting for syncobj fences to
> materialize and that machinery is on by default. It would actually
> take MORE work in Mesa to turn it off and take advantage of the kernel
> being able to wait for syncobjs to materialize. Also, getting that
> right is ridiculously hard and I really don't want to get it wrong in
> kernel space. When we do memory fences, wait-before-signal will be a
> thing. We don't need to try and make it a thing for syncobj.
>
> --Jason
Thanks Jason,
I missed the bit in the Vulkan spec that we're allowed to have a sparse
queue that does not implement either graphics or compute operations :
"While some implementations may include VK_QUEUE_SPARSE_BINDING_BIT
support in queue families that also include
graphics and compute support, other implementations may only
expose a VK_QUEUE_SPARSE_BINDING_BIT-only queue
family."
So it can all be all a vm_bind engine that just does bind/unbind
operations.
But yes we need another engine for the immediate/non-sparse operations.
-Lionel
> Daniel, any thoughts?
>
> Niranjana
>
> >Matt
> >
> >>
> >> Sorry I noticed this late.
> >>
> >>
> >> -Lionel
> >>
> >>
>
[-- Attachment #2: Type: text/html, Size: 14209 bytes --]
next prev parent reply other threads:[~2022-06-03 7:20 UTC|newest]
Thread overview: 76+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-17 18:32 [Intel-gfx] [RFC v3 0/3] drm/doc/rfc: i915 VM_BIND feature design + uapi Niranjana Vishwanathapura
2022-05-17 18:32 ` [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document Niranjana Vishwanathapura
2022-05-19 22:52 ` Zanoni, Paulo R
2022-05-23 19:05 ` Niranjana Vishwanathapura
2022-05-23 19:08 ` Niranjana Vishwanathapura
2022-05-24 10:08 ` Lionel Landwerlin
2022-06-01 14:25 ` Lionel Landwerlin
2022-06-01 20:28 ` Matthew Brost
2022-06-02 20:11 ` Niranjana Vishwanathapura
2022-06-02 20:35 ` Jason Ekstrand
2022-06-03 7:20 ` Lionel Landwerlin [this message]
2022-06-03 23:51 ` Niranjana Vishwanathapura
2022-06-07 17:12 ` Jason Ekstrand
2022-06-07 18:18 ` Niranjana Vishwanathapura
2022-06-07 21:32 ` Niranjana Vishwanathapura
2022-06-08 7:33 ` Tvrtko Ursulin
2022-06-08 21:44 ` Niranjana Vishwanathapura
2022-06-08 21:55 ` Jason Ekstrand
2022-06-08 22:48 ` Niranjana Vishwanathapura
2022-06-09 14:49 ` Lionel Landwerlin
2022-06-09 19:31 ` Niranjana Vishwanathapura
2022-06-10 6:53 ` Lionel Landwerlin
2022-06-10 7:54 ` Niranjana Vishwanathapura
2022-06-10 8:18 ` Lionel Landwerlin
2022-06-10 17:42 ` Niranjana Vishwanathapura
2022-06-13 13:33 ` Zeng, Oak
2022-06-13 18:02 ` Niranjana Vishwanathapura
2022-06-14 7:04 ` Lionel Landwerlin
2022-06-14 17:01 ` Niranjana Vishwanathapura
2022-06-14 21:12 ` Zeng, Oak
2022-06-14 21:47 ` Zeng, Oak
2022-06-01 21:18 ` Matthew Brost
2022-06-02 5:42 ` Lionel Landwerlin
2022-06-02 16:22 ` Matthew Brost
2022-06-02 20:24 ` Niranjana Vishwanathapura
2022-06-02 20:16 ` Bas Nieuwenhuizen
2022-06-02 2:13 ` Zeng, Oak
2022-06-02 20:48 ` Niranjana Vishwanathapura
2022-06-06 20:45 ` Zeng, Oak
2022-05-17 18:32 ` [Intel-gfx] [RFC v3 2/3] drm/i915: Update i915 uapi documentation Niranjana Vishwanathapura
2022-06-08 11:24 ` Matthew Auld
2022-06-10 1:43 ` Niranjana Vishwanathapura
2022-05-17 18:32 ` [Intel-gfx] [RFC v3 3/3] drm/doc/rfc: VM_BIND uapi definition Niranjana Vishwanathapura
2022-05-19 23:07 ` Zanoni, Paulo R
2022-05-23 19:19 ` Niranjana Vishwanathapura
2022-06-01 9:02 ` Dave Airlie
2022-06-01 9:27 ` Daniel Vetter
2022-06-02 5:08 ` Niranjana Vishwanathapura
2022-06-03 6:53 ` Niranjana Vishwanathapura
2022-06-07 10:42 ` Tvrtko Ursulin
2022-06-07 21:25 ` Niranjana Vishwanathapura
2022-06-08 7:34 ` Tvrtko Ursulin
2022-06-08 19:52 ` Niranjana Vishwanathapura
2022-06-08 6:40 ` Lionel Landwerlin
2022-06-08 6:43 ` Lionel Landwerlin
2022-06-08 8:36 ` Tvrtko Ursulin
2022-06-08 8:45 ` Lionel Landwerlin
2022-06-08 8:54 ` Tvrtko Ursulin
2022-06-08 20:45 ` Niranjana Vishwanathapura
2022-06-15 9:49 ` Tvrtko Ursulin
2022-06-08 7:12 ` Lionel Landwerlin
2022-06-08 21:24 ` Matthew Brost
2022-06-07 10:27 ` Tvrtko Ursulin
2022-06-07 19:37 ` Niranjana Vishwanathapura
2022-06-08 7:17 ` Tvrtko Ursulin
2022-06-08 9:12 ` Matthew Auld
2022-06-08 21:32 ` Niranjana Vishwanathapura
2022-06-09 8:36 ` Matthew Auld
2022-06-09 18:53 ` Niranjana Vishwanathapura
2022-06-10 10:16 ` Tvrtko Ursulin
2022-06-10 10:32 ` Matthew Auld
2022-06-10 8:34 ` Matthew Brost
2022-05-17 20:49 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/doc/rfc: i915 VM_BIND feature design + uapi (rev3) Patchwork
2022-05-17 20:49 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-05-17 21:09 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-05-18 2:33 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bd615d4e-3911-a9ce-5d9f-fb85f7866d32@intel.com \
--to=lionel.g.landwerlin@intel.com \
--cc=chris.p.wilson@intel.com \
--cc=christian.koenig@amd.com \
--cc=daniel.vetter@intel.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-gfx@lists.freedesktop.org \
--cc=jason@jlekstrand.net \
--cc=niranjana.vishwanathapura@intel.com \
--cc=thomas.hellstrom@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox