Re: [Intel-gfx] [PATCH 3/3] drm/doc/rfc: VM_BIND uapi definition

Intel-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: paulo.r.zanoni@intel.com, intel-gfx@lists.freedesktop.org,
	chris.p.wilson@intel.com, thomas.hellstrom@intel.com,
	dri-devel@lists.freedesktop.org, daniel.vetter@intel.com,
	christian.koenig@amd.com, matthew.auld@intel.com
Subject: Re: [Intel-gfx] [PATCH 3/3] drm/doc/rfc: VM_BIND uapi definition
Date: Mon, 13 Jun 2022 19:09:06 +0100	[thread overview]
Message-ID: <5ebcd237-a6df-add2-070a-056ccb83427a@linux.intel.com> (raw)
In-Reply-To: <20220613174956.GH376@nvishwa1-DESK>


On 13/06/2022 18:49, Niranjana Vishwanathapura wrote:
> On Mon, Jun 13, 2022 at 05:22:02PM +0100, Tvrtko Ursulin wrote:
>>
>> On 13/06/2022 16:05, Niranjana Vishwanathapura wrote:
>>> On Mon, Jun 13, 2022 at 09:24:18AM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 10/06/2022 17:14, Niranjana Vishwanathapura wrote:
>>>>> On Fri, Jun 10, 2022 at 05:48:39PM +0300, Lionel Landwerlin wrote:
>>>>>> On 10/06/2022 13:37, Tvrtko Ursulin wrote:
>>>>>>>
>>>>>>> On 10/06/2022 08:07, Niranjana Vishwanathapura wrote:
>>>>>>>> VM_BIND and related uapi definitions
>>>>>>>>
>>>>>>>> Signed-off-by: Niranjana Vishwanathapura 
>>>>>>>> <niranjana.vishwanathapura@intel.com>
>>>>>>>> ---
>>>>>>>>   Documentation/gpu/rfc/i915_vm_bind.h | 490 
>>>>>>>> +++++++++++++++++++++++++++
>>>>>>>>   1 file changed, 490 insertions(+)
>>>>>>>>   create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>>>>>>>
>>>>>>>> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h 
>>>>>>>> b/Documentation/gpu/rfc/i915_vm_bind.h
>>>>>>>> new file mode 100644
>>>>>>>> index 000000000000..9fc854969cfb
>>>>>>>> --- /dev/null
>>>>>>>> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>>>>>>>> @@ -0,0 +1,490 @@
>>>>>>>> +/* SPDX-License-Identifier: MIT */
>>>>>>>> +/*
>>>>>>>> + * Copyright © 2022 Intel Corporation
>>>>>>>> + */
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * DOC: I915_PARAM_HAS_VM_BIND
>>>>>>>> + *
>>>>>>>> + * VM_BIND feature availability.
>>>>>>>> + * See typedef drm_i915_getparam_t param.
>>>>>>>> + * bit[0]: If set, VM_BIND is supported, otherwise not.
>>>>>>>> + * bits[8-15]: VM_BIND implementation version.
>>>>>>>> + * version 0 will not have VM_BIND/UNBIND timeline fence array 
>>>>>>>> support.
>>>>>>>> + */
>>>>>>>> +#define I915_PARAM_HAS_VM_BIND        57
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>>>>>>>> + *
>>>>>>>> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>>>>>>>> + * See struct drm_i915_gem_vm_control flags.
>>>>>>>> + *
>>>>>>>> + * The older execbuf2 ioctl will not support VM_BIND mode of 
>>>>>>>> operation.
>>>>>>>> + * For VM_BIND mode, we have new execbuf3 ioctl which will not 
>>>>>>>> accept any
>>>>>>>> + * execlist (See struct drm_i915_gem_execbuffer3 for more 
>>>>>>>> details).
>>>>>>>> + *
>>>>>>>> + */
>>>>>>>> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND    (1 << 0)
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * DOC: I915_CONTEXT_CREATE_FLAGS_LONG_RUNNING
>>>>>>>> + *
>>>>>>>> + * Flag to declare context as long running.
>>>>>>>> + * See struct drm_i915_gem_context_create_ext flags.
>>>>>>>> + *
>>>>>>>> + * Usage of dma-fence expects that they complete in reasonable 
>>>>>>>> amount of time.
>>>>>>>> + * Compute on the other hand can be long running. Hence it is 
>>>>>>>> not appropriate
>>>>>>>> + * for compute contexts to export request completion dma-fence 
>>>>>>>> to user.
>>>>>>>> + * The dma-fence usage will be limited to in-kernel consumption 
>>>>>>>> only.
>>>>>>>> + * Compute contexts need to use user/memory fence.
>>>>>>>> + *
>>>>>>>> + * So, long running contexts do not support output fences. Hence,
>>>>>>>> + * I915_EXEC_FENCE_SIGNAL (See &drm_i915_gem_exec_fence.flags) 
>>>>>>>> is expected
>>>>>>>> + * to be not used. DRM_I915_GEM_WAIT ioctl call is also not 
>>>>>>>> supported for
>>>>>>>> + * objects mapped to long running contexts.
>>>>>>>> + */
>>>>>>>> +#define I915_CONTEXT_CREATE_FLAGS_LONG_RUNNING   (1u << 2)
>>>>>>>> +
>>>>>>>> +/* VM_BIND related ioctls */
>>>>>>>> +#define DRM_I915_GEM_VM_BIND        0x3d
>>>>>>>> +#define DRM_I915_GEM_VM_UNBIND        0x3e
>>>>>>>> +#define DRM_I915_GEM_EXECBUFFER3    0x3f
>>>>>>>> +#define DRM_I915_GEM_WAIT_USER_FENCE    0x40
>>>>>>>> +
>>>>>>>> +#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + 
>>>>>>>> DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
>>>>>>>> +#define DRM_IOCTL_I915_GEM_VM_UNBIND DRM_IOWR(DRM_COMMAND_BASE 
>>>>>>>> + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
>>>>>>>> +#define DRM_IOCTL_I915_GEM_EXECBUFFER3 
>>>>>>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct 
>>>>>>>> drm_i915_gem_execbuffer3)
>>>>>>>> +#define DRM_IOCTL_I915_GEM_WAIT_USER_FENCE 
>>>>>>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_WAIT_USER_FENCE, struct 
>>>>>>>> drm_i915_gem_wait_user_fence)
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>>>>>>>> + *
>>>>>>>> + * This structure is passed to VM_BIND ioctl and specifies the 
>>>>>>>> mapping of GPU
>>>>>>>> + * virtual address (VA) range to the section of an object that 
>>>>>>>> should be bound
>>>>>>>> + * in the device page table of the specified address space (VM).
>>>>>>>> + * The VA range specified must be unique (ie., not currently 
>>>>>>>> bound) and can
>>>>>>>> + * be mapped to whole object or a section of the object 
>>>>>>>> (partial binding).
>>>>>>>> + * Multiple VA mappings can be created to the same section of 
>>>>>>>> the object
>>>>>>>> + * (aliasing).
>>>>>>>> + *
>>>>>>>> + * The @queue_idx specifies the queue to use for binding. Same 
>>>>>>>> queue can be
>>>>>>>> + * used for both VM_BIND and VM_UNBIND calls. All submitted 
>>>>>>>> bind and unbind
>>>>>>>> + * operations in a queue are performed in the order of submission.
>>>>>>>> + *
>>>>>>>> + * The @start, @offset and @length should be 4K page aligned. 
>>>>>>>> However the DG2
>>>>>>>> + * and XEHPSDV has 64K page size for device local-memory and 
>>>>>>>> has compact page
>>>>>>>> + * table. On those platforms, for binding device local-memory 
>>>>>>>> objects, the
>>>>>>>> + * @start should be 2M aligned, @offset and @length should be 
>>>>>>>> 64K aligned.
>>>>>>>> + * Also, on those platforms, it is not allowed to bind an 
>>>>>>>> device local-memory
>>>>>>>> + * object and a system memory object in a single 2M section of 
>>>>>>>> VA range.
>>>>>>>> + */
>>>>>>>> +struct drm_i915_gem_vm_bind {
>>>>>>>> +    /** @vm_id: VM (address space) id to bind */
>>>>>>>> +    __u32 vm_id;
>>>>>>>> +
>>>>>>>> +    /** @queue_idx: Index of queue for binding */
>>>>>>>> +    __u32 queue_idx;
>>>>>>>
>>>>>>> I have a question here to which I did not find an answer by 
>>>>>>> browsing the old threads.
>>>>>>>
>>>>>>> Queue index appears to be an implicit synchronisation mechanism, 
>>>>>>> right? Operations on the same index are executed/complete in 
>>>>>>> order of ioctl submission?
>>>>>>>
>>>>>>> Do we _have_ to implement this on the kernel side and could just 
>>>>>>> allow in/out fence and let userspace deal with it?
>>>>>>
>>>>>>
>>>>>> It orders operations like in a queue. Which is kind of what 
>>>>>> happens with existing queues/engines.
>>>>>>
>>>>>> If I understood correctly, it's going to be a kthread + a linked 
>>>>>> list right?
>>>>>>
>>>>>
>>>>> Yes, that is correct.
>>>>>
>>>>>>
>>>>>> -Lionel
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Arbitrary/on-demand number of queues will add the complexity on 
>>>>>>> the kernel side which should be avoided if possible.
>>>>>>>
>>>>>
>>>>> It was discussed in the other thread. Jason prefers this over putting
>>>>> an artificial limit on number of queues (as user can anyway can 
>>>>> exhaust
>>>>> the memory). I think complexity in the driver is manageable.
>>>>
>>>> You'll need to create tracking structures on demand, with atomic 
>>>> replace of last fence, ref counting and locking of some sort, more 
>>>> or less?
>>>>
>>>
>>> We will have a workqueue, an work item and a linked list per queue.
>>> VM_BIND/UNBIND call will add the mapping request to the specified 
>>> queue's
>>> linked list and schedule the work item on the workqueue of that queue.
>>> I am not sure what you mean by last fence and replacing it.
>>>
>>>>> The other option being discussed in to have the user create those
>>>>> queues (like creating engine map) before hand and use that in vm_bind
>>>>> and vm_unbind ioctls. This puts a limit on the number of queues.
>>>>> But it is not clean either and not sure it is worth making the 
>>>>> interface
>>>>> more complex.
>>>>> https://www.spinics.net/lists/dri-devel/msg350448.html
>>>>
>>>> What about the third option of a flag to return a fence (of some 
>>>> sort) and pass in a fence? That way userspace can imagine zero or N 
>>>> queues with very little effort on the kernel side. Was this considered?
>>>>
>>>
>>> I am not clear what fence you are talking about here and how does that
>>> help with the number of vm_bind queues. Can you eloborate?
>>
>> It is actually already documented that bind/unbind will support input 
>> and output fences - so what are these queues on top of what userspace 
>> can already achieve by using them? Purely a convenience or there is 
>> more to it?
>>
> 
> Oh, the vm_bind queues are discussed in this thread.
> https://lists.freedesktop.org/archives/intel-gfx/2022-June/299217.html
> 
> Apparently Vulkan has requirement for multiple queues, each queue
> processing vm_bind/unbind calls in the order of submission.

I don't see how that answers my question so I will take the freedom to 
repeat it. What are these queues on top of what userspace can already 
achieve by using in-out fences? Purely a convenience or there is more to it?

Queue1:

out_fence_A = vm_bind A
out_fence_B = vm_bind B, in_fence=out_fence_A
execbuf(in_fence = out_fence_B)

Queue2:

out_fence_C = vm_bind C
out_fence_D = vm_bind D, in_fence=out_fence_C
execbuf(in_fence = out_fence_D)

Parallel bind:
out_fence_E = vm_bind E
out_fence_F = vm_bind F
merged_fence = fence_merge(out_fence_E, out_fence_F)
execbuf(in_fence = merged_fence)

Regards,

Tvrtko

next prev parent reply	other threads:[~2022-06-13 18:09 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-10  7:07 [Intel-gfx] [PATCH 0/3] drm/doc/rfc: i915 VM_BIND feature design + uapi Niranjana Vishwanathapura
2022-06-10  7:07 ` [Intel-gfx] [PATCH 1/3] drm/doc/rfc: VM_BIND feature design document Niranjana Vishwanathapura
2022-06-10  7:07 ` [Intel-gfx] [PATCH 2/3] drm/i915: Update i915 uapi documentation Niranjana Vishwanathapura
2022-06-10 11:01   ` Matthew Auld
2022-06-10 16:36     ` Niranjana Vishwanathapura
2022-06-10  7:07 ` [Intel-gfx] [PATCH 3/3] drm/doc/rfc: VM_BIND uapi definition Niranjana Vishwanathapura
2022-06-10  8:53   ` Matthew Brost
2022-06-10  8:56     ` Matthew Brost
2022-06-10 16:35       ` Niranjana Vishwanathapura
2022-06-13 17:56         ` Niranjana Vishwanathapura
2022-06-14  6:27     ` Lionel Landwerlin
2022-06-14 16:14       ` Niranjana Vishwanathapura
2022-06-10 10:37   ` Tvrtko Ursulin
2022-06-10 14:48     ` Lionel Landwerlin
2022-06-10 16:14       ` Niranjana Vishwanathapura
2022-06-13  8:24         ` Tvrtko Ursulin
2022-06-13 15:05           ` Niranjana Vishwanathapura
2022-06-13 16:22             ` Tvrtko Ursulin
2022-06-13 17:49               ` Niranjana Vishwanathapura
2022-06-13 18:09                 ` Tvrtko Ursulin [this message]
2022-06-13 23:39                   ` Matthew Brost
2022-06-14  7:16                     ` Tvrtko Ursulin
2022-06-14 15:43                       ` Niranjana Vishwanathapura
2022-06-14 16:02                         ` Tvrtko Ursulin
2022-06-14 16:07                           ` Tvrtko Ursulin
2022-06-14 16:42                             ` Niranjana Vishwanathapura
2022-06-15  7:22                               ` Tvrtko Ursulin
2022-06-15 15:20                                 ` Niranjana Vishwanathapura
2022-06-16  8:53                                   ` Tvrtko Ursulin
2022-06-10 15:35 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for drm/doc/rfc: i915 VM_BIND feature design + uapi Patchwork
2022-06-17  5:13 ` [Intel-gfx] [PATCH 0/3] " Niranjana Vishwanathapura

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5ebcd237-a6df-add2-070a-056ccb83427a@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=chris.p.wilson@intel.com \
    --cc=christian.koenig@amd.com \
    --cc=daniel.vetter@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=matthew.auld@intel.com \
    --cc=niranjana.vishwanathapura@intel.com \
    --cc=paulo.r.zanoni@intel.com \
    --cc=thomas.hellstrom@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox