From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 10 Jun 2022 00:54:35 -0700
From: Niranjana Vishwanathapura
To: Lionel Landwerlin
Cc: Intel GFX, Maling list - DRI developers, Thomas Hellstrom, Chris Wilson, Daniel Vetter, Christian König
Subject: Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document
Message-ID: <20220610075434.GA376@nvishwa1-DESK>
In-Reply-To: <891017f7-e276-66a1-dd9b-cbebc8f8a00d@intel.com>

On Fri, Jun 10, 2022 at 09:53:24AM +0300, Lionel Landwerlin wrote:
> On 09/06/2022 22:31, Niranjana Vishwanathapura wrote:
>> On Thu, Jun 09, 2022 at 05:49:09PM +0300, Lionel Landwerlin wrote:
>>> On 09/06/2022 00:55, Jason Ekstrand wrote:
>>>> On Wed, Jun 8, 2022 at 4:44 PM Niranjana Vishwanathapura wrote:
>>>>> On Wed, Jun 08, 2022 at 08:33:25AM +0100, Tvrtko Ursulin wrote:
>>>>>> On 07/06/2022 22:32, Niranjana Vishwanathapura wrote:
>>>>>>> On Tue, Jun 07, 2022 at 11:18:11AM -0700, Niranjana Vishwanathapura wrote:
>>>>>>>> On Tue, Jun 07, 2022 at 12:12:03PM -0500, Jason Ekstrand wrote:
>>>>>>>>> On Fri, Jun 3, 2022 at 6:52 PM Niranjana Vishwanathapura wrote:
>>>>>>>>>> On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel Landwerlin wrote:
>>>>>>>>>>> On 02/06/2022 23:35, Jason Ekstrand wrote:
>>>>>>>>>>>> On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura wrote:
>>>>>>>>>>>>> On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew Brost wrote:
>>>>>>>>>>>>>> On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin wrote:
>>>>>>>>>>>>>>> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote:
>>>>>>>>>>>>>>>> +VM_BIND/UNBIND ioctl will immediately start binding/unbinding
>>>>>>>>>>>>>>>> +the mapping in an async worker. The binding and unbinding will
>>>>>>>>>>>>>>>> +work like a special GPU engine. The binding and unbinding
>>>>>>>>>>>>>>>> +operations are serialized and will wait on specified input
>>>>>>>>>>>>>>>> +fences before the operation and will signal the output fences
>>>>>>>>>>>>>>>> +upon the completion of the operation. Due to serialization,
>>>>>>>>>>>>>>>> +completion of an operation will also indicate that all
>>>>>>>>>>>>>>>> +previous operations are also complete.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I guess we should avoid saying "will immediately start
>>>>>>>>>>>>>>> binding/unbinding" if there are fences involved.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And the fact that it's happening in an async worker seems to
>>>>>>>>>>>>>>> imply it's not immediate.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok, will fix.
>>>>>>>>>>>>> This was added because in the earlier design, binding was deferred
>>>>>>>>>>>>> until the next execbuf. But now it is non-deferred (immediate in
>>>>>>>>>>>>> that sense). But yeah, this is confusing and I will fix it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have a question on the behavior of the bind operation when no
>>>>>>>>>>>>>>> input fence is provided. Let's say I do:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> VM_BIND (out_fence=fence1)
>>>>>>>>>>>>>>> VM_BIND (out_fence=fence2)
>>>>>>>>>>>>>>> VM_BIND (out_fence=fence3)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In what order are the fences going to be signaled?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In the order of VM_BIND ioctls? Or out of order?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Because you wrote "serialized" I assume it's: in order.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, in the order of VM_BIND/UNBIND ioctls. Note that bind and
>>>>>>>>>>>>> unbind will use the same queue and hence are ordered.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One thing I didn't realize is that because we only get one
>>>>>>>>>>>>>>> "VM_BIND" engine, there is a disconnect from the Vulkan
>>>>>>>>>>>>>>> specification.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In Vulkan, VM_BIND operations are serialized but per engine.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So you could have something like this:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> VM_BIND (engine=rcs0, in_fence=fence1, out_fence=fence2)
>>>>>>>>>>>>>>> VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> fence1 is not signaled
>>>>>>>>>>>>>>> fence3 is signaled
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So the second VM_BIND will proceed before the first VM_BIND.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I guess we can deal with that scenario in userspace by doing the
>>>>>>>>>>>>>>> wait ourselves, in one thread per engine.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But then it makes the VM_BIND input fences useless.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Daniel: what do you think? Should we rework this or just deal
>>>>>>>>>>>>>>> with wait fences in userspace?
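(For illustration, the userspace fallback mentioned above could look roughly
like the sketch below: one bind thread per engine does the in-fence wait
itself and then issues an unfenced VM_BIND. This is a hedged sketch only;
vm_bind_ioctl() and struct bind_op are hypothetical placeholders, not part of
the proposed uapi.)

    #include <stdint.h>
    #include <xf86drm.h>

    struct bind_op;                                 /* hypothetical bind descriptor */
    void vm_bind_ioctl(int fd, struct bind_op *op); /* hypothetical ioctl wrapper   */

    /* One such worker per engine: waiting here blocks only this
     * engine's binds; binds for other engines proceed independently. */
    static void bind_worker(int fd, uint32_t in_syncobj, struct bind_op *op)
    {
            drmSyncobjWait(fd, &in_syncobj, 1, INT64_MAX,
                           DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT, NULL);
            vm_bind_ioctl(fd, op); /* no in-fence: the bind starts immediately */
    }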
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My opinion is rework this, but make the ordering via an engine
>>>>>>>>>>>>>> param optional.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> e.g. A VM can be configured so all binds are ordered within the
>>>>>>>>>>>>>> VM.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> e.g. A VM can be configured so all binds accept an engine
>>>>>>>>>>>>>> argument (in the case of the i915, likely this is a gem context
>>>>>>>>>>>>>> handle) and binds are ordered with respect to that engine.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This gives UMDs options, as the latter likely consumes more KMD
>>>>>>>>>>>>>> resources, so if a different UMD can live with binds being
>>>>>>>>>>>>>> ordered within the VM, it can use a mode consuming fewer
>>>>>>>>>>>>>> resources.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think we need to be careful here if we are looking for some out
>>>>>>>>>>>>> of (submission) order completion of vm_bind/unbind.
>>>>>>>>>>>>> In-order completion means that, in a batch of binds and unbinds
>>>>>>>>>>>>> to be completed in order, the user only needs to specify an
>>>>>>>>>>>>> in-fence for the first bind/unbind call and the out-fence for the
>>>>>>>>>>>>> last bind/unbind call. Also, the VA released by an unbind call can
>>>>>>>>>>>>> be re-used by any subsequent bind call in that in-order batch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> These things will break if binding/unbinding were allowed to go
>>>>>>>>>>>>> out of (submission) order, and the user would need to be extra
>>>>>>>>>>>>> careful not to run into premature triggering of the out-fence,
>>>>>>>>>>>>> binds failing because the VA is still in use, etc.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, VM_BIND binds the provided mapping on the specified address
>>>>>>>>>>>>> space (VM). So, the uapi is not engine/context specific.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We can, however, add a 'queue' to the uapi, which can be one of
>>>>>>>>>>>>> the pre-defined queues:
>>>>>>>>>>>>> I915_VM_BIND_QUEUE_0
>>>>>>>>>>>>> I915_VM_BIND_QUEUE_1
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> I915_VM_BIND_QUEUE_(N-1)
>>>>>>>>>>>>>
>>>>>>>>>>>>> KMD will spawn an async work queue for each queue, which will only
>>>>>>>>>>>>> bind the mappings on that queue in the order of submission. The
>>>>>>>>>>>>> user can assign a queue per engine, or anything like that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But again here, the user needs to be careful not to deadlock these
>>>>>>>>>>>>> queues with circular dependencies between fences.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I prefer adding this later as an extension, based on whether it
>>>>>>>>>>>>> really helps with the implementation.
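(To make the in-order batch semantics above concrete, here is a hedged sketch.
vm_bind()/vm_unbind() are invented placeholder wrappers, not the actual ioctl
names; fences are passed as syncobj handles, with -1 meaning "none".)

    #include <stdint.h>

    /* Hypothetical wrappers around the proposed ioctls: */
    void vm_unbind(int fd, uint32_t vm_id, uint64_t va, uint64_t len,
                   int in_fence, int out_fence);
    void vm_bind(int fd, uint32_t vm_id, uint32_t handle, uint64_t va,
                 uint64_t len, int in_fence, int out_fence);

    void rebind_batch(int fd, uint32_t vm_id, uint32_t bo1, uint32_t bo2,
                      uint64_t va, uint64_t va2, uint64_t len,
                      int in_fence, int out_fence)
    {
            /* Only the first op in the batch carries the in-fence... */
            vm_unbind(fd, vm_id, va, len, in_fence, -1);
            /* ...the VA it released may be re-used at once within the
             * in-order batch... */
            vm_bind(fd, vm_id, bo1, va, len, -1, -1);
            /* ...and only the last op carries the out-fence; in-order
             * completion means it also covers the two ops above. */
            vm_bind(fd, vm_id, bo2, va2, len, -1, out_fence);
    }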
>>>>>>>>>>>>
>>>>>>>>>>>> I can tell you right now that having everything on a single
>>>>>>>>>>>> in-order queue will not get us the perf we want. What Vulkan
>>>>>>>>>>>> really wants is one of two things:
>>>>>>>>>>>>  1. No implicit ordering of VM_BIND ops. They just happen in
>>>>>>>>>>>>     whatever order their dependencies are resolved, and we ensure
>>>>>>>>>>>>     ordering ourselves by having a syncobj in the VkQueue.
>>>>>>>>>>>>  2. The ability to create multiple VM_BIND queues. We need at
>>>>>>>>>>>>     least 2, but I don't see why there needs to be a limit besides
>>>>>>>>>>>>     the limits the i915 API already has on the number of engines.
>>>>>>>>>>>>     Vulkan could expose multiple sparse binding queues to the
>>>>>>>>>>>>     client if it's not arbitrarily limited.
>>>>>>>>>>
>>>>>>>>>> Thanks Jason, Lionel.
>>>>>>>>>>
>>>>>>>>>> Jason, what are you referring to when you say "limits the i915 API
>>>>>>>>>> already has on the number of engines"? I am not sure there is such
>>>>>>>>>> an uapi today.
>>>>>>>>>
>>>>>>>>> There's a limit of something like 64 total engines today, based on
>>>>>>>>> the number of bits we can cram into the exec flags in execbuffer2. I
>>>>>>>>> think someone had an extended version that allowed more, but I ripped
>>>>>>>>> it out because no one was using it. Of course, execbuffer3 might not
>>>>>>>>> have that problem at all.
>>>>>>>>
>>>>>>>> Thanks Jason.
>>>>>>>> Ok, I am not sure which exec flag that is, but yeah, execbuffer3
>>>>>>>> probably will not have this limitation. So, we need to define a
>>>>>>>> VM_BIND_MAX_QUEUE and somehow export it to the user (I am thinking of
>>>>>>>> embedding it in I915_PARAM_HAS_VM_BIND: bits[0]->HAS_VM_BIND,
>>>>>>>> bits[1-3]->'n', meaning 2^n queues).
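(A hedged sketch of how userspace could decode that getparam encoding. The bit
layout is exactly the one floated above and nothing more; the helper names are
invented for illustration.)

    #include <stdbool.h>
    #include <stdint.h>

    /* Proposed encoding: bits[0] = HAS_VM_BIND, bits[1-3] = 'n',
     * meaning 2^n vm_bind queues are supported. */
    static bool has_vm_bind(uint32_t param)
    {
            return param & 0x1;
    }

    static unsigned int num_vm_bind_queues(uint32_t param)
    {
            unsigned int n = (param >> 1) & 0x7;
            return 1u << n; /* e.g. n=3 -> 8 queues */
    }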
>>>>>>>
>>>>>>> Ah, I think you are talking about I915_EXEC_RING_MASK (0x3f), which
>>>>>>> execbuf3
>>>>
>>>> Yup! That's exactly the limit I was talking about.
>>>>>>>
>>>>>>> will also have. So, we can simply define in the vm_bind/unbind
>>>>>>> structures:
>>>>>>>
>>>>>>> #define I915_VM_BIND_MAX_QUEUE   64
>>>>>>>         __u32 queue;
>>>>>>>
>>>>>>> I think that will keep things simple.
>>>>>>
>>>>>> Hmmm? What does the execbuf2 limit have to do with how many engines the
>>>>>> hardware can have? I suggest not doing that.
>>>>>>
>>>>>> The change which added this to context creation:
>>>>>>
>>>>>>        if (set.num_engines > I915_EXEC_RING_MASK + 1)
>>>>>>                return -EINVAL;
>>>>>>
>>>>>> needs to be undone, so that users can create engine maps with all
>>>>>> hardware engines, and execbuf3 can access them all.
>>>>>
>>>>> The earlier plan was to carry I915_EXEC_RING_MASK (0x3f) over to execbuf3
>>>>> as well. Hence, I was using the same limit for VM_BIND queues (64, or 65
>>>>> if we make it N+1).
>>>>> But, as discussed in another thread of this RFC series, we are planning
>>>>> to drop I915_EXEC_RING_MASK in execbuf3. So, there won't be any uapi that
>>>>> limits the number of engines (and hence the number of vm_bind queues
>>>>> that need to be supported).
>>>>>
>>>>> If we leave the number of vm_bind queues arbitrarily large
>>>>> (__u32 queue_idx), then we need a hashmap for queue (a wq, work_item and
>>>>> a linked list) lookup from the user-specified queue index. The other
>>>>> option is to just put some hard limit (say 64 or 65) and use an array of
>>>>> queues in the VM (each created upon first use). I prefer this.
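(A minimal kernel-side sketch of that second option, purely for illustration;
the structures, names and locking below are invented here, not taken from the
actual i915 patches. A fixed-size per-VM array holds lazily created, ordered
bind queues, so lookup is a plain array index instead of a hashmap.)

    #include <linux/err.h>
    #include <linux/slab.h>
    #include <linux/workqueue.h>

    #define I915_VM_BIND_MAX_QUEUE 64

    struct vm_bind_queue {
            /* Ordered workqueue: binds on this queue complete in
             * submission order. */
            struct workqueue_struct *wq;
    };

    struct sketch_vm {
            /* queues[i] is allocated on the first VM_BIND that uses
             * queue_idx == i. */
            struct vm_bind_queue *queues[I915_VM_BIND_MAX_QUEUE];
    };

    /* Caller is assumed to hold the VM lock (locking elided here). */
    static struct vm_bind_queue *get_bind_queue(struct sketch_vm *vm, u32 idx)
    {
            struct vm_bind_queue *q;

            if (idx >= I915_VM_BIND_MAX_QUEUE)
                    return ERR_PTR(-EINVAL);
            if (vm->queues[idx])
                    return vm->queues[idx];

            q = kzalloc(sizeof(*q), GFP_KERNEL);
            if (!q)
                    return ERR_PTR(-ENOMEM);
            q->wq = alloc_ordered_workqueue("vm_bind_q%u", 0, idx);
            if (!q->wq) {
                    kfree(q);
                    return ERR_PTR(-ENOMEM);
            }
            vm->queues[idx] = q;
            return q;
    }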
>>>>
>>>> I don't get why a VM_BIND queue is any different from any other queue or
>>>> userspace-visible kernel object. But I'll leave those details up to danvet
>>>> or whoever else might be reviewing the implementation.
>>>> --Jason
>>>
>>> I kind of agree here. Wouldn't it be simpler to have the bind queue created
>>> like the others when we build the engine map?
>>>
>>> For userspace it's then just a matter of selecting the right queue ID when
>>> submitting.
>>>
>>> If there is ever a possibility to have this work on the GPU, it would all
>>> be ready.
>>
>> I did sync offline with Matt Brost on this.
>> We can add a VM_BIND engine class and let the user create VM_BIND engines
>> (queues). The problem is that in i915 the engine creation interface is
>> bound to gem_context. So, in the vm_bind ioctl, we would need both a
>> context_id and a queue_idx for proper lookup of the user-created engine.
>> This is a bit awkward, as vm_bind is an interface to the VM (address space)
>> and has nothing to do with gem_context.
>
> A gem_context has a single vm object, right?
>
> Set through I915_CONTEXT_PARAM_VM at creation, or given a default one if
> not.
>
> So it's just like picking up the vm the way it's done at execbuffer time
> right now: eb->context->vm

Are you suggesting replacing 'vm_id' with 'context_id' in the VM_BIND/UNBIND
ioctl (and probably calling it CONTEXT_BIND/UNBIND), because the VM can be
obtained from the context?
I think the interface is clean as an interface to the VM. It is only that we
don't have a clean way to create a raw VM_BIND engine (one not associated
with any context) with the i915 uapi.
Maybe we can add such an interface, but I don't think it is worth it (we
might as well just use a queue_idx in the VM_BIND/UNBIND ioctl, as I
mentioned above). Anyone have any thoughts?

>> Another problem is that if two VMs are binding with the same defined
>> engine, binding on VM1 can get unnecessarily blocked by binding on VM2
>> (which may be waiting on its in_fence).
>
> Maybe I'm missing something, but how can you have 2 vm objects with a
> single gem_context right now?

No, we don't have 2 VMs for a gem_context.
Say ctx1 has vm1 and ctx2 has vm2. The first vm_bind call is for vm1, with
q_idx 1 in the ctx1 engine map. The second vm_bind call is for vm2, with
q_idx 2 in the ctx2 engine map. If those two queue indices point to the same
underlying vm_bind engine, then the second vm_bind call gets blocked until
the first vm_bind call's 'in' fence is triggered and its bind completes.

With per-VM queues, this is not a problem, as two VMs will not end up sharing
the same queue.

BTW, I just posted an updated patch series:
https://www.spinics.net/lists/dri-devel/msg350483.html

Niranjana

>>
>> So, my preference here is to just add a 'u32 queue' index in the
>> vm_bind/unbind ioctl, and the queues are per VM.
>>
>> Niranjana
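(For illustration, that per-VM queue preference could end up looking roughly
like this in the bind structure. This is a sketch for discussion: apart from
the 'queue' field and I915_VM_BIND_MAX_QUEUE debated above, the exact field
set is a placeholder, not the actual uapi from the patch series.)

    #include <linux/types.h>

    #define I915_VM_BIND_MAX_QUEUE 64

    /* Hypothetical layout, for discussion purposes only. */
    struct drm_i915_gem_vm_bind {
            __u32 vm_id;      /* the VM (address space) to bind into     */
            __u32 queue;      /* per-VM in-order bind queue,
                               * in [0, I915_VM_BIND_MAX_QUEUE)          */
            __u32 handle;     /* GEM object to map                       */
            __u32 pad;
            __u64 start;      /* GPU virtual address of the mapping      */
            __u64 offset;     /* offset into the object                  */
            __u64 length;     /* length of the mapping                   */
            __u64 flags;
            __u64 extensions; /* in/out fences would be chained off here */
    };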
>>>
>>> Thanks,
>>>
>>> -Lionel
>>>>>
>>>>> Niranjana
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Tvrtko
>>>>>>>
>>>>>>> Niranjana
>>>>>>>>>>
>>>>>>>>>> I am trying to see how many queues we need, and I don't want it to
>>>>>>>>>> be arbitrarily large, unduly blowing up memory usage and complexity
>>>>>>>>>> in the i915 driver.
>>>>>>>>>
>>>>>>>>> I expect a Vulkan driver to use at most 2 in the vast majority of
>>>>>>>>> cases. I could imagine a client wanting to create more than 1 sparse
>>>>>>>>> queue, in which case it'll be N+1, but that's unlikely. As far as
>>>>>>>>> complexity goes, once you allow two, I don't think the complexity
>>>>>>>>> goes up by allowing N. As for memory usage, creating more queues
>>>>>>>>> means more memory. That's a trade-off that userspace can make.
>>>>>>>>> Again, the expected number here is 1 or 2 in the vast majority of
>>>>>>>>> cases, so I don't think you need to worry.
>>>>>>>>
>>>>>>>> Ok, will start with n=3, meaning 8 queues.
>>>>>>>> That would require us to create 8 workqueues.
>>>>>>>> We can change 'n' later if required.
>>>>>>>>
>>>>>>>> Niranjana
>>>>>>>>>>>>
>>>>>>>>>>>> Why? Because Vulkan has two basic kinds of bind operations, and we
>>>>>>>>>>>> don't want any dependencies between them:
>>>>>>>>>>>>  1. Immediate. These happen right after BO creation, or maybe as
>>>>>>>>>>>>     part of vkBindImageMemory() or vkBindBufferMemory(). These
>>>>>>>>>>>>     don't happen on a queue and we don't want them serialized with
>>>>>>>>>>>>     anything. To synchronize with submit, we'll have a syncobj in
>>>>>>>>>>>>     the VkDevice which is signaled by all immediate bind
>>>>>>>>>>>>     operations, and make submits wait on it.
>>>>>>>>>>>>  2. Queued (sparse): These happen on a VkQueue, which may be the
>>>>>>>>>>>>     same as a render/compute queue or may be its own queue. It's
>>>>>>>>>>>>     up to us what we want to advertise. From the Vulkan API PoV,
>>>>>>>>>>>>     this is like any other queue. Operations on it wait on and
>>>>>>>>>>>>     signal semaphores. If we have a VM_BIND engine, we'd provide
>>>>>>>>>>>>     syncobjs to wait on and signal, just like we do in execbuf().
>>>>>>>>>>>> The important thing is that we don't want one type of operation to
>>>>>>>>>>>> block on the other. If immediate binds are blocking on sparse
>>>>>>>>>>>> binds, it's going to cause over-synchronization issues.
>>>>>>>>>>>> In terms of the internal implementation, I know that there's going
>>>>>>>>>>>> to be a lock on the VM and that we can't actually do these things
>>>>>>>>>>>> in parallel. That's fine. Once the dma_fences have signaled and
>>>>>>>>>>>> we're
>>>>>>>>>>
>>>>>>>>>> That's correct. It is like a single VM_BIND engine with multiple
>>>>>>>>>> queues feeding into it.
>>>>>>>>>
>>>>>>>>> Right. As long as the queues themselves are independent and can block
>>>>>>>>> on dma_fences without holding up other queues, I think we're fine.
>>>>>>>>>>>>
>>>>>>>>>>>> unblocked to do the bind operation, I don't care if there's a bit
>>>>>>>>>>>> of synchronization due to locking. That's expected. What we can't
>>>>>>>>>>>> afford to have is an immediate bind operation suddenly blocking on
>>>>>>>>>>>> a sparse operation which is blocked on a compute job that's going
>>>>>>>>>>>> to run for another 5ms.
>>>>>>>>>>
>>>>>>>>>> As the VM_BIND queue is per VM, a VM_BIND on one VM doesn't block
>>>>>>>>>> VM_BIND on other VMs. I am not sure about the use cases here, but I
>>>>>>>>>> just wanted to clarify.
>>>>>>>>>
>>>>>>>>> Yes, that's what I would expect.
>>>>>>>>> --Jason
>>>>>>>>>>
>>>>>>>>>> Niranjana
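(A rough sketch of the two Mesa-side bind paths described above, assuming a
VM_BIND uapi with syncobj in/out fences. Every name here is invented for
illustration; submit_vm_bind() and struct bind_op are placeholders, not a
real API.)

    #include <stdint.h>

    struct bind_op; /* hypothetical bind descriptor */
    void submit_vm_bind(int fd, struct bind_op *op,
                        uint32_t in_syncobj, uint32_t out_syncobj);

    struct sketch_device {
            int fd;
            /* Device-wide syncobj: every submit waits on it. */
            uint32_t immediate_bind_syncobj;
    };

    /* Path 1: immediate bind (vkBindImageMemory() and friends). No
     * queue and no in-fence; just signal the device-wide syncobj so
     * that later submits order against all immediate binds. */
    void bind_immediate(struct sketch_device *dev, struct bind_op *op)
    {
            submit_vm_bind(dev->fd, op, 0, dev->immediate_bind_syncobj);
    }

    /* Path 2: queued (sparse) bind on a VkQueue. The queue's wait and
     * signal semaphores map directly onto the bind's in/out fences,
     * just as execbuf does for rendering work. */
    void bind_sparse(struct sketch_device *dev, struct bind_op *op,
                     uint32_t wait_syncobj, uint32_t signal_syncobj)
    {
            submit_vm_bind(dev->fd, op, wait_syncobj, signal_syncobj);
    }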
>>>>>>>>>>>>
>>>>>>>>>>>> For reference, Windows solves this by allowing arbitrarily many
>>>>>>>>>>>> paging queues (what they call a VM_BIND engine/queue). That design
>>>>>>>>>>>> works pretty well and solves the problems in question. Again, we
>>>>>>>>>>>> could just make everything out-of-order and require using syncobjs
>>>>>>>>>>>> to order things as userspace wants. That'd be fine too.
>>>>>>>>>>>> One more note while I'm here: danvet said something on IRC about
>>>>>>>>>>>> VM_BIND queues waiting for syncobjs to materialize. We don't
>>>>>>>>>>>> really want/need this. We already have all the machinery in
>>>>>>>>>>>> userspace to handle wait-before-signal and waiting for syncobj
>>>>>>>>>>>> fences to materialize, and that machinery is on by default. It
>>>>>>>>>>>> would actually take MORE work in Mesa to turn it off and take
>>>>>>>>>>>> advantage of the kernel being able to wait for syncobjs to
>>>>>>>>>>>> materialize. Also, getting that right is ridiculously hard and I
>>>>>>>>>>>> really don't want to get it wrong in kernel space. When we do
>>>>>>>>>>>> memory fences, wait-before-signal will be a thing. We don't need
>>>>>>>>>>>> to try and make it a thing for syncobj.
>>>>>>>>>>>> --Jason
>>>>>>>>>>>
>>>>>>>>>>> Thanks Jason,
>>>>>>>>>>>
>>>>>>>>>>> I missed the bit in the Vulkan spec that says we're allowed to have
>>>>>>>>>>> a sparse queue that does not implement either graphics or compute
>>>>>>>>>>> operations:
>>>>>>>>>>>
>>>>>>>>>>>     "While some implementations may include
>>>>>>>>>>>      VK_QUEUE_SPARSE_BINDING_BIT support in queue families that
>>>>>>>>>>>      also include graphics and compute support, other
>>>>>>>>>>>      implementations may only expose a
>>>>>>>>>>>      VK_QUEUE_SPARSE_BINDING_BIT-only queue family."
>>>>>>>>>>>
>>>>>>>>>>> So it can all be a vm_bind engine that just does bind/unbind
>>>>>>>>>>> operations.
>>>>>>>>>>>
>>>>>>>>>>> But yes, we need another engine for the immediate/non-sparse
>>>>>>>>>>> operations.
>>>>>>>>>>>
>>>>>>>>>>> -Lionel
>>>>>>>>>>>>>
>>>>>>>>>>>>> Daniel, any thoughts?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Niranjana
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sorry I noticed this late.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Lionel