Intel-GFX Archive on lore.kernel.org
From: Matthew Brost <matthew.brost@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: dri-devel@lists.freedesktop.org, paulo.r.zanoni@intel.com,
	intel-gfx@lists.freedesktop.org, chris.p.wilson@intel.com,
	thomas.hellstrom@intel.com, daniel.vetter@intel.com,
	christian.koenig@amd.com, matthew.auld@intel.com
Subject: Re: [Intel-gfx] [PATCH 3/3] drm/doc/rfc: VM_BIND uapi definition
Date: Mon, 13 Jun 2022 16:39:47 -0700	[thread overview]
Message-ID: <20220613233947.GA15145@jons-linux-dev-box> (raw)
In-Reply-To: <5ebcd237-a6df-add2-070a-056ccb83427a@linux.intel.com>

On Mon, Jun 13, 2022 at 07:09:06PM +0100, Tvrtko Ursulin wrote:
> 
> On 13/06/2022 18:49, Niranjana Vishwanathapura wrote:
> > On Mon, Jun 13, 2022 at 05:22:02PM +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 13/06/2022 16:05, Niranjana Vishwanathapura wrote:
> > > > On Mon, Jun 13, 2022 at 09:24:18AM +0100, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 10/06/2022 17:14, Niranjana Vishwanathapura wrote:
> > > > > > On Fri, Jun 10, 2022 at 05:48:39PM +0300, Lionel Landwerlin wrote:
> > > > > > > On 10/06/2022 13:37, Tvrtko Ursulin wrote:
> > > > > > > > 
> > > > > > > > On 10/06/2022 08:07, Niranjana Vishwanathapura wrote:
> > > > > > > > > VM_BIND and related uapi definitions
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> > > > > > > > > ---
> > > > > > > > >   Documentation/gpu/rfc/i915_vm_bind.h | 490 +++++++++++++++++++++++++++
> > > > > > > > >   1 file changed, 490 insertions(+)
> > > > > > > > >   create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
> > > > > > > > > 
> > > > > > > > > diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
> > > > > > > > > new file mode 100644
> > > > > > > > > index 000000000000..9fc854969cfb
> > > > > > > > > --- /dev/null
> > > > > > > > > +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> > > > > > > > > @@ -0,0 +1,490 @@
> > > > > > > > > +/* SPDX-License-Identifier: MIT */
> > > > > > > > > +/*
> > > > > > > > > + * Copyright © 2022 Intel Corporation
> > > > > > > > > + */
> > > > > > > > > +
> > > > > > > > > +/**
> > > > > > > > > + * DOC: I915_PARAM_HAS_VM_BIND
> > > > > > > > > + *
> > > > > > > > > + * VM_BIND feature availability.
> > > > > > > > > + * See typedef drm_i915_getparam_t param.
> > > > > > > > > + * bit[0]: If set, VM_BIND is supported, otherwise not.
> > > > > > > > > + * bits[8-15]: VM_BIND implementation version.
> > > > > > > > > + * Version 0 will not have VM_BIND/UNBIND timeline fence array support.
> > > > > > > > > + */
> > > > > > > > > +#define I915_PARAM_HAS_VM_BIND        57
> > > > > > > > > +
> > > > > > > > > +/**
> > > > > > > > > + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> > > > > > > > > + *
> > > > > > > > > + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> > > > > > > > > + * See struct drm_i915_gem_vm_control flags.
> > > > > > > > > + *
> > > > > > > > > + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
> > > > > > > > > + * For VM_BIND mode, there is a new execbuf3 ioctl which will not accept
> > > > > > > > > + * any execlist (see struct drm_i915_gem_execbuffer3 for more details).
> > > > > > > > > + *
> > > > > > > > > + */
> > > > > > > > > +#define I915_VM_CREATE_FLAGS_USE_VM_BIND    (1 << 0)
> > > > > > > > > +
> > > > > > > > > +/**
> > > > > > > > > + * DOC: I915_CONTEXT_CREATE_FLAGS_LONG_RUNNING
> > > > > > > > > + *
> > > > > > > > > + * Flag to declare context as long running.
> > > > > > > > > + * See struct drm_i915_gem_context_create_ext flags.
> > > > > > > > > + *
> > > > > > > > > + * Usage of dma-fence expects that they complete in a reasonable
> > > > > > > > > + * amount of time. Compute, on the other hand, can be long running.
> > > > > > > > > + * Hence it is not appropriate for compute contexts to export request
> > > > > > > > > + * completion dma-fences to the user. The dma-fence usage will be
> > > > > > > > > + * limited to in-kernel consumption only. Compute contexts need to
> > > > > > > > > + * use user/memory fences.
> > > > > > > > > + *
> > > > > > > > > + * So, long running contexts do not support output fences. Hence,
> > > > > > > > > + * I915_EXEC_FENCE_SIGNAL (see &drm_i915_gem_exec_fence.flags) is
> > > > > > > > > + * expected not to be used. The DRM_I915_GEM_WAIT ioctl call is also
> > > > > > > > > + * not supported for objects mapped to long running contexts.
> > > > > > > > > + */
> > > > > > > > > +#define I915_CONTEXT_CREATE_FLAGS_LONG_RUNNING   (1u << 2)
> > > > > > > > > +
> > > > > > > > > +/* VM_BIND related ioctls */
> > > > > > > > > +#define DRM_I915_GEM_VM_BIND        0x3d
> > > > > > > > > +#define DRM_I915_GEM_VM_UNBIND        0x3e
> > > > > > > > > +#define DRM_I915_GEM_EXECBUFFER3    0x3f
> > > > > > > > > +#define DRM_I915_GEM_WAIT_USER_FENCE    0x40
> > > > > > > > > +
> > > > > > > > > +#define DRM_IOCTL_I915_GEM_VM_BIND    DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
> > > > > > > > > +#define DRM_IOCTL_I915_GEM_VM_UNBIND    DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
> > > > > > > > > +#define DRM_IOCTL_I915_GEM_EXECBUFFER3    DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
> > > > > > > > > +#define DRM_IOCTL_I915_GEM_WAIT_USER_FENCE    DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_WAIT_USER_FENCE, struct drm_i915_gem_wait_user_fence)
> > > > > > > > > +
> > > > > > > > > +/**
> > > > > > > > > + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> > > > > > > > > + *
> > > > > > > > > + * This structure is passed to the VM_BIND ioctl and specifies the
> > > > > > > > > + * mapping of a GPU virtual address (VA) range to the section of an
> > > > > > > > > + * object that should be bound in the device page table of the
> > > > > > > > > + * specified address space (VM). The VA range specified must be
> > > > > > > > > + * unique (i.e., not currently bound) and can be mapped to the whole
> > > > > > > > > + * object or to a section of the object (partial binding). Multiple
> > > > > > > > > + * VA mappings can be created to the same section of the object
> > > > > > > > > + * (aliasing).
> > > > > > > > > + *
> > > > > > > > > + * The @queue_idx specifies the queue to use for binding. The same
> > > > > > > > > + * queue can be used for both VM_BIND and VM_UNBIND calls. All
> > > > > > > > > + * submitted bind and unbind operations in a queue are performed in
> > > > > > > > > + * the order of submission.
> > > > > > > > > + *
> > > > > > > > > + * The @start, @offset and @length should be 4K page aligned.
> > > > > > > > > + * However, DG2 and XEHPSDV have a 64K page size for device
> > > > > > > > > + * local-memory and have a compact page table. On those platforms,
> > > > > > > > > + * for binding device local-memory objects, @start should be 2M
> > > > > > > > > + * aligned, and @offset and @length should be 64K aligned. Also, on
> > > > > > > > > + * those platforms, it is not allowed to bind a device local-memory
> > > > > > > > > + * object and a system memory object in a single 2M section of VA
> > > > > > > > > + * range.
> > > > > > > > > + */
> > > > > > > > > +struct drm_i915_gem_vm_bind {
> > > > > > > > > +    /** @vm_id: VM (address space) id to bind */
> > > > > > > > > +    __u32 vm_id;
> > > > > > > > > +
> > > > > > > > > +    /** @queue_idx: Index of queue for binding */
> > > > > > > > > +    __u32 queue_idx;
> > > > > > > > 
> > > > > > > > I have a question here to which I did not find
> > > > > > > > an answer by browsing the old threads.
> > > > > > > > 
> > > > > > > > Queue index appears to be an implicit
> > > > > > > > synchronisation mechanism, right? Operations on
> > > > > > > > the same index are executed/complete in order of
> > > > > > > > ioctl submission?
> > > > > > > > 
> > > > > > > > Do we _have_ to implement this on the kernel
> > > > > > > > side and could just allow in/out fence and let
> > > > > > > > userspace deal with it?
> > > > > > > 
> > > > > > > 
> > > > > > > It orders operations like in a queue. Which is kind
> > > > > > > of what happens with existing queues/engines.
> > > > > > > 
> > > > > > > If I understood correctly, it's going to be a
> > > > > > > kthread + a linked list right?
> > > > > > > 
> > > > > > 
> > > > > > Yes, that is correct.
> > > > > > 
> > > > > > > 
> > > > > > > -Lionel
> > > > > > > 
> > > > > > > 
> > > > > > > > 
> > > > > > > > Arbitrary/on-demand number of queues will add
> > > > > > > > the complexity on the kernel side which should
> > > > > > > > be avoided if possible.
> > > > > > > > 
> > > > > > 
> > > > > > It was discussed in the other thread. Jason prefers this over putting
> > > > > > an artificial limit on the number of queues (as the user can anyway
> > > > > > exhaust the memory). I think complexity in the driver is manageable.
> > > > > 
> > > > > You'll need to create tracking structures on demand, with
> > > > > atomic replace of last fence, ref counting and locking of
> > > > > some sort, more or less?
> > > > > 
> > > > 
> > > > We will have a workqueue, a work item and a linked list per queue.
> > > > VM_BIND/UNBIND call will add the mapping request to the
> > > > specified queue's
> > > > linked list and schedule the work item on the workqueue of that queue.
> > > > I am not sure what you mean by last fence and replacing it.
> > > > 
> > > > > > The other option being discussed in to have the user create those
> > > > > > queues (like creating engine map) before hand and use that in vm_bind
> > > > > > and vm_unbind ioctls. This puts a limit on the number of queues.
> > > > > > But it is not clean either and not sure it is worth
> > > > > > making the interface
> > > > > > more complex.
> > > > > > https://www.spinics.net/lists/dri-devel/msg350448.html
> > > > > 
> > > > > What about the third option of a flag to return a fence (of
> > > > > some sort) and pass in a fence? That way userspace can
> > > > > imagine zero or N queues with very little effort on the
> > > > > kernel side. Was this considered?
> > > > > 
> > > > 
> > > > I am not clear on what fence you are talking about here and how it
> > > > helps with the number of vm_bind queues. Can you elaborate?
> > > 
> > > It is actually already documented that bind/unbind will support input
> > > and output fences - so what are these queues on top of what userspace
> > > can already achieve by using them? Purely a convenience or is there
> > > more to it?
> > > 
> > 
> > Oh, the vm_bind queues are discussed in this thread.
> > https://lists.freedesktop.org/archives/intel-gfx/2022-June/299217.html
> > 
> > Apparently Vulkan has a requirement for multiple queues, with each queue
> > processing vm_bind/unbind calls in the order of submission.
> 
> I don't see how that answers my question so I will take the liberty of
> repeating it. What are these queues on top of what userspace can already
> achieve by using in-out fences? Purely a convenience or is there more to it?
> 
> Queue1:
> 
> out_fence_A = vm_bind A
> out_fence_B = vm_bind B, in_fence=out_fence_A
> execbuf(in_fence = out_fence_B)
> 
> Queue2:
> 
> out_fence_C = vm_bind C
> out_fence_D = vm_bind D, in_fence=out_fence_C
> execbuf(in_fence = out_fence_D)
> 
> Parallel bind:
> out_fence_E = vm_bind E
> out_fence_F = vm_bind F
> merged_fence = fence_merge(out_fence_E, out_fence_F)
> execbuf(in_fence = merged_fence)
> 

Let's say you do this with only 1 queue:

VM_BIND_A (in_fence=fence_A)
VM_BIND_B (in_fence=NULL)

With 1 queue, VM_BIND_B is blocked on fence_A, hence the need for more
than 1 queue.

e.g.
 
VM_BIND_A (queue_id=0, in_fence=fence_A)
VM_BIND_B (queue_id=1, in_fence=NULL)

Now VM_BIND_B can immediately be executed regardless of fence_A status.

Matt

> Regards,
> 
> Tvrtko


Thread overview: 31+ messages
2022-06-10  7:07 [Intel-gfx] [PATCH 0/3] drm/doc/rfc: i915 VM_BIND feature design + uapi Niranjana Vishwanathapura
2022-06-10  7:07 ` [Intel-gfx] [PATCH 1/3] drm/doc/rfc: VM_BIND feature design document Niranjana Vishwanathapura
2022-06-10  7:07 ` [Intel-gfx] [PATCH 2/3] drm/i915: Update i915 uapi documentation Niranjana Vishwanathapura
2022-06-10 11:01   ` Matthew Auld
2022-06-10 16:36     ` Niranjana Vishwanathapura
2022-06-10  7:07 ` [Intel-gfx] [PATCH 3/3] drm/doc/rfc: VM_BIND uapi definition Niranjana Vishwanathapura
2022-06-10  8:53   ` Matthew Brost
2022-06-10  8:56     ` Matthew Brost
2022-06-10 16:35       ` Niranjana Vishwanathapura
2022-06-13 17:56         ` Niranjana Vishwanathapura
2022-06-14  6:27     ` Lionel Landwerlin
2022-06-14 16:14       ` Niranjana Vishwanathapura
2022-06-10 10:37   ` Tvrtko Ursulin
2022-06-10 14:48     ` Lionel Landwerlin
2022-06-10 16:14       ` Niranjana Vishwanathapura
2022-06-13  8:24         ` Tvrtko Ursulin
2022-06-13 15:05           ` Niranjana Vishwanathapura
2022-06-13 16:22             ` Tvrtko Ursulin
2022-06-13 17:49               ` Niranjana Vishwanathapura
2022-06-13 18:09                 ` Tvrtko Ursulin
2022-06-13 23:39                   ` Matthew Brost [this message]
2022-06-14  7:16                     ` Tvrtko Ursulin
2022-06-14 15:43                       ` Niranjana Vishwanathapura
2022-06-14 16:02                         ` Tvrtko Ursulin
2022-06-14 16:07                           ` Tvrtko Ursulin
2022-06-14 16:42                             ` Niranjana Vishwanathapura
2022-06-15  7:22                               ` Tvrtko Ursulin
2022-06-15 15:20                                 ` Niranjana Vishwanathapura
2022-06-16  8:53                                   ` Tvrtko Ursulin
2022-06-10 15:35 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for drm/doc/rfc: i915 VM_BIND feature design + uapi Patchwork
2022-06-17  5:13 ` [Intel-gfx] [PATCH 0/3] " Niranjana Vishwanathapura
