From: Matthew Brost <matthew.brost@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: dri-devel@lists.freedesktop.org, paulo.r.zanoni@intel.com,
intel-gfx@lists.freedesktop.org, chris.p.wilson@intel.com,
thomas.hellstrom@intel.com, daniel.vetter@intel.com,
christian.koenig@amd.com, matthew.auld@intel.com
Subject: Re: [Intel-gfx] [PATCH 3/3] drm/doc/rfc: VM_BIND uapi definition
Date: Mon, 13 Jun 2022 16:39:47 -0700 [thread overview]
Message-ID: <20220613233947.GA15145@jons-linux-dev-box> (raw)
In-Reply-To: <5ebcd237-a6df-add2-070a-056ccb83427a@linux.intel.com>
On Mon, Jun 13, 2022 at 07:09:06PM +0100, Tvrtko Ursulin wrote:
>
> On 13/06/2022 18:49, Niranjana Vishwanathapura wrote:
> > On Mon, Jun 13, 2022 at 05:22:02PM +0100, Tvrtko Ursulin wrote:
> > >
> > > On 13/06/2022 16:05, Niranjana Vishwanathapura wrote:
> > > > On Mon, Jun 13, 2022 at 09:24:18AM +0100, Tvrtko Ursulin wrote:
> > > > >
> > > > > On 10/06/2022 17:14, Niranjana Vishwanathapura wrote:
> > > > > > On Fri, Jun 10, 2022 at 05:48:39PM +0300, Lionel Landwerlin wrote:
> > > > > > > On 10/06/2022 13:37, Tvrtko Ursulin wrote:
> > > > > > > >
> > > > > > > > On 10/06/2022 08:07, Niranjana Vishwanathapura wrote:
> > > > > > > > > VM_BIND and related uapi definitions
> > > > > > > > >
> > > > > > > > > Signed-off-by: Niranjana Vishwanathapura
> > > > > > > > > <niranjana.vishwanathapura@intel.com>
> > > > > > > > > ---
> > > > > > > > > Documentation/gpu/rfc/i915_vm_bind.h | 490
> > > > > > > > > +++++++++++++++++++++++++++
> > > > > > > > > 1 file changed, 490 insertions(+)
> > > > > > > > > create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
> > > > > > > > >
> > > > > > > > > diff --git
> > > > > > > > > a/Documentation/gpu/rfc/i915_vm_bind.h
> > > > > > > > > b/Documentation/gpu/rfc/i915_vm_bind.h
> > > > > > > > > new file mode 100644
> > > > > > > > > index 000000000000..9fc854969cfb
> > > > > > > > > --- /dev/null
> > > > > > > > > +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> > > > > > > > > @@ -0,0 +1,490 @@
> > > > > > > > > +/* SPDX-License-Identifier: MIT */
> > > > > > > > > +/*
> > > > > > > > > + * Copyright © 2022 Intel Corporation
> > > > > > > > > + */
> > > > > > > > > +
> > > > > > > > > +/**
> > > > > > > > > + * DOC: I915_PARAM_HAS_VM_BIND
> > > > > > > > > + *
> > > > > > > > > + * VM_BIND feature availability.
> > > > > > > > > + * See typedef drm_i915_getparam_t param.
> > > > > > > > > + * bit[0]: If set, VM_BIND is supported, otherwise not.
> > > > > > > > > > + * bits[8-15]: VM_BIND implementation version.
> > > > > > > > > > + * Version 0 will not have VM_BIND/UNBIND timeline fence array support.
> > > > > > > > > > + */
> > > > > > > > > +#define I915_PARAM_HAS_VM_BIND 57
> > > > > > > > > +
> > > > > > > > > +/**
> > > > > > > > > + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> > > > > > > > > + *
> > > > > > > > > + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> > > > > > > > > + * See struct drm_i915_gem_vm_control flags.
> > > > > > > > > + *
> > > > > > > > > > + * The older execbuf2 ioctl does not support the VM_BIND mode of operation.
> > > > > > > > > > + * For VM_BIND mode, there is a new execbuf3 ioctl which does not accept
> > > > > > > > > > + * any execlist (see struct drm_i915_gem_execbuffer3 for more details).
> > > > > > > > > > + */
> > > > > > > > > +#define I915_VM_CREATE_FLAGS_USE_VM_BIND (1 << 0)
> > > > > > > > > +
> > > > > > > > > +/**
> > > > > > > > > + * DOC: I915_CONTEXT_CREATE_FLAGS_LONG_RUNNING
> > > > > > > > > + *
> > > > > > > > > + * Flag to declare context as long running.
> > > > > > > > > + * See struct drm_i915_gem_context_create_ext flags.
> > > > > > > > > + *
> > > > > > > > > > + * dma-fence usage expects fences to complete within a reasonable amount
> > > > > > > > > > + * of time. Compute workloads, on the other hand, can be long running.
> > > > > > > > > > + * Hence it is not appropriate for compute contexts to export request
> > > > > > > > > > + * completion dma-fences to user space. Their dma-fence usage is limited
> > > > > > > > > > + * to in-kernel consumption only. Compute contexts need to use
> > > > > > > > > > + * user/memory fences instead.
> > > > > > > > > > + *
> > > > > > > > > > + * So, long running contexts do not support output fences. Hence,
> > > > > > > > > > + * I915_EXEC_FENCE_SIGNAL (see &drm_i915_gem_exec_fence.flags) must not
> > > > > > > > > > + * be used. The DRM_I915_GEM_WAIT ioctl call is also not supported for
> > > > > > > > > > + * objects mapped to long running contexts.
> > > > > > > > > > + */
> > > > > > > > > +#define I915_CONTEXT_CREATE_FLAGS_LONG_RUNNING (1u << 2)
> > > > > > > > > +
> > > > > > > > > +/* VM_BIND related ioctls */
> > > > > > > > > +#define DRM_I915_GEM_VM_BIND 0x3d
> > > > > > > > > +#define DRM_I915_GEM_VM_UNBIND 0x3e
> > > > > > > > > +#define DRM_I915_GEM_EXECBUFFER3 0x3f
> > > > > > > > > +#define DRM_I915_GEM_WAIT_USER_FENCE 0x40
> > > > > > > > > +
> > > > > > > > > > +#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
> > > > > > > > > > +#define DRM_IOCTL_I915_GEM_VM_UNBIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
> > > > > > > > > > +#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
> > > > > > > > > > +#define DRM_IOCTL_I915_GEM_WAIT_USER_FENCE DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_WAIT_USER_FENCE, struct drm_i915_gem_wait_user_fence)
> > > > > > > > > +
> > > > > > > > > +/**
> > > > > > > > > + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> > > > > > > > > + *
> > > > > > > > > > + * This structure is passed to the VM_BIND ioctl and specifies the
> > > > > > > > > > + * mapping of a GPU virtual address (VA) range to the section of an
> > > > > > > > > > + * object that should be bound in the device page table of the specified
> > > > > > > > > > + * address space (VM). The VA range specified must be unique (i.e., not
> > > > > > > > > > + * currently bound) and can be mapped to the whole object or to a
> > > > > > > > > > + * section of the object (partial binding). Multiple VA mappings can be
> > > > > > > > > > + * created to the same section of the object (aliasing).
> > > > > > > > > > + *
> > > > > > > > > > + * The @queue_idx specifies the queue to use for binding. The same queue
> > > > > > > > > > + * can be used for both VM_BIND and VM_UNBIND calls. All submitted bind
> > > > > > > > > > + * and unbind operations in a queue are performed in the order of
> > > > > > > > > > + * submission.
> > > > > > > > > > + *
> > > > > > > > > > + * The @start, @offset and @length should be 4K page aligned. However,
> > > > > > > > > > + * DG2 and XEHPSDV have a 64K page size for device local-memory and a
> > > > > > > > > > + * compact page table. On those platforms, for binding device
> > > > > > > > > > + * local-memory objects, @start should be 2M aligned, and @offset and
> > > > > > > > > > + * @length should be 64K aligned. Also, on those platforms, it is not
> > > > > > > > > > + * allowed to bind a device local-memory object and a system memory
> > > > > > > > > > + * object in a single 2M section of the VA range.
> > > > > > > > > > + */
> > > > > > > > > +struct drm_i915_gem_vm_bind {
> > > > > > > > > + /** @vm_id: VM (address space) id to bind */
> > > > > > > > > + __u32 vm_id;
> > > > > > > > > +
> > > > > > > > > + /** @queue_idx: Index of queue for binding */
> > > > > > > > > + __u32 queue_idx;
> > > > > > > >
> > > > > > > > I have a question here to which I did not find an answer by
> > > > > > > > browsing the old threads.
> > > > > > > >
> > > > > > > > Queue index appears to be an implicit synchronisation mechanism,
> > > > > > > > right? Operations on the same index are executed/completed in the
> > > > > > > > order of ioctl submission?
> > > > > > > >
> > > > > > > > Do we _have_ to implement this on the kernel side, or could we
> > > > > > > > just allow in/out fences and let userspace deal with it?
> > > > > > >
> > > > > > >
> > > > > > > It orders operations like in a queue. Which is kind
> > > > > > > of what happens with existing queues/engines.
> > > > > > >
> > > > > > > If I understood correctly, it's going to be a
> > > > > > > kthread + a linked list right?
> > > > > > >
> > > > > >
> > > > > > Yes, that is correct.
> > > > > >
> > > > > > >
> > > > > > > -Lionel
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > An arbitrary/on-demand number of queues will add complexity on
> > > > > > > > the kernel side, which should be avoided if possible.
> > > > > > > >
> > > > > >
> > > > > > It was discussed in the other thread. Jason prefers this over putting
> > > > > > an artificial limit on the number of queues (as the user can anyway
> > > > > > exhaust the memory). I think the complexity in the driver is manageable.
> > > > >
> > > > > You'll need to create tracking structures on demand, with
> > > > > atomic replace of last fence, ref counting and locking of
> > > > > some sort, more or less?
> > > > >
> > > >
> > > > We will have a workqueue, a work item and a linked list per queue.
> > > > A VM_BIND/UNBIND call will add the mapping request to the specified
> > > > queue's linked list and schedule the work item on the workqueue of
> > > > that queue. I am not sure what you mean by the last fence and
> > > > replacing it.
> > > >
> > > > > > The other option being discussed is to have the user create those
> > > > > > queues (like creating an engine map) beforehand and use them in the
> > > > > > vm_bind and vm_unbind ioctls. This puts a limit on the number of
> > > > > > queues. But it is not clean either, and I am not sure it is worth
> > > > > > making the interface more complex.
> > > > > > https://www.spinics.net/lists/dri-devel/msg350448.html
> > > > >
> > > > > What about the third option of a flag to return a fence (of
> > > > > some sort) and pass in a fence? That way userspace can
> > > > > imagine zero or N queues with very little effort on the
> > > > > kernel side. Was this considered?
> > > > >
> > > >
> > > > I am not clear on what fence you are talking about here and how it
> > > > helps with the number of vm_bind queues. Can you elaborate?
> > >
> > > It is actually already documented that bind/unbind will support input
> > > and output fences - so what do these queues provide on top of what
> > > userspace can already achieve by using those fences? Purely a
> > > convenience, or is there more to it?
> > >
> >
> > Oh, the vm_bind queues are discussed in this thread.
> > https://lists.freedesktop.org/archives/intel-gfx/2022-June/299217.html
> >
> > Apparently Vulkan has a requirement for multiple queues, with each queue
> > processing vm_bind/unbind calls in the order of submission.
>
> I don't see how that answers my question, so I will take the liberty of
> repeating it. What do these queues provide on top of what userspace can
> already achieve by using in/out fences? Purely a convenience, or is there
> more to it?
>
> Queue1:
>
> out_fence_A = vm_bind A
> out_fence_B = vm_bind B, in_fence=out_fence_A
> execbuf(in_fence = out_fence_B)
>
> Queue2:
>
> out_fence_C = vm_bind C
> out_fence_D = vm_bind D, in_fence=out_fence_C
> execbuf(in_fence = out_fence_D)
>
> Parallel bind:
> out_fence_E = vm_bind E
> out_fence_F = vm_bind F
> merged_fence = fence_merge(out_fence_E, out_fence_F)
> execbuf(in_fence = merged_fence)
>
Let's say you do this and only 1 queue:
VM_BIND_A (in_fence=fence_A)
VM_BIND_B (in_fence=NULL)
With 1 queue, VM_BIND_B is blocked on fence_A, hence the need for more than 1
queue.
e.g.
VM_BIND_A (queue_id=0, in_fence=fence_A)
VM_BIND_B (queue_id=1, in_fence=NULL)
Now VM_BIND_B can be executed immediately, regardless of the status of fence_A.
Matt
> Regards,
>
> Tvrtko