Intel-XE Archive on lore.kernel.org
From: "Christian König" <christian.koenig@amd.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	kenneth.w.graunke@intel.com, lionel.g.landwerlin@intel.com,
	jose.souza@intel.com, simona.vetter@ffwll.ch,
	thomas.hellstrom@linux.intel.com, boris.brezillon@collabora.com,
	airlied@gmail.com, mihail.atanassov@arm.com,
	steven.price@arm.com, shashank.sharma@amd.com
Subject: Re: [RFC PATCH 02/29] dma-fence: Add dma_fence_user_fence
Date: Fri, 22 Nov 2024 11:28:57 +0100
Message-ID: <42e5ad2f-b4d0-4d48-888c-2dd398044f62@amd.com>
In-Reply-To: <Zz/t5pp9rCg9b7lw@lstrano-desk.jf.intel.com>

On 22.11.24 at 03:35, Matthew Brost wrote:
> [SNIP]
>>> The flow here would be: a user job needs to wait on an external dma-fence
>>> in a syncobj, sync file, etc., so it calls the convert dma-fence to user
>>> fence IOCTL before the submission (patches 22 and 28 in this series),
>>> programs the wait via ring instructions, and then does the user submission.
>>> This would avoid blocking on external dma-fences in the submission path.
>>>
>>> I think this makes sense, and having a lightweight helper to normalize
>>> this flow across drivers makes some sense too.
>> Well, we have pretty much the same concept, but all writes are done by the
>> hardware and do not take a round trip through the CPU.
>>
> Hmm, I'm curious how that works on your end. Doesn't the DMA fence
> signaling have to go through the kernel?

No, we have a protected_fence packet which basically writes the current 
processing status (RPTR) into a location defined by the kernel driver.

So neither the value nor the location of the write can be manipulated by 
userspace.

This way queues can signal their status to each other without a CPU
round trip and without writing into a shared memory location. Writing
into a memory location can probably be done by any hardware, but that
usually has tons of scheduling implications, e.g. priority inversion.
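
To make the idea above concrete, here is a minimal user-space C sketch
of a protected fence style write. It is only a model under stated
assumptions, not amdgpu code and not the real packet format: the names
queue_model, protected_fence_write, fence_reached and
protected_fence_demo are hypothetical. The point it illustrates is that
the write takes neither a value nor an address from the submitter; the
value is the queue's own read pointer and the slot is fixed by the
"kernel" side, while consumers only get a read-only view.

```c
#include <stdint.h>

/* Hypothetical model of one hardware queue (not a real driver struct). */
struct queue_model {
	uint64_t rptr;        /* current processing position of the queue */
	uint64_t *fence_slot; /* location picked by the kernel driver */
};

/*
 * Models the effect of a protected fence packet: the queue copies its
 * own RPTR into the kernel-defined slot. There is no value or address
 * parameter, so a submitter has nothing it could manipulate.
 */
static void protected_fence_write(struct queue_model *q)
{
	*q->fence_slot = q->rptr;
}

/* Consumer side: a read-only view of the slot, compared to a target. */
static int fence_reached(const uint64_t *slot, uint64_t wait_value)
{
	return *slot >= wait_value;
}

/*
 * Runs a producer queue for 'steps' packets and returns the final slot
 * value, or UINT64_MAX if the waiter saw the fence too early or never.
 */
uint64_t protected_fence_demo(unsigned int steps)
{
	uint64_t slot = 0;                 /* "kernel-owned" storage */
	const uint64_t *ro_view = &slot;   /* what other queues may read */
	struct queue_model producer = { .rptr = 0, .fence_slot = &slot };

	for (unsigned int i = 0; i < steps; i++) {
		if (fence_reached(ro_view, steps))
			return UINT64_MAX; /* signaled before the work was done */
		producer.rptr++;           /* queue consumed one more packet */
		protected_fence_write(&producer);
	}
	return fence_reached(ro_view, steps) ? slot : UINT64_MAX;
}
```

Calling protected_fence_demo(3) walks the producer through three
packets and returns the slot value the waiter finally observes.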

> Yes, of course, in Xe we program seqno writes through the GPU when we
> can, but our bind code currently opportunistically bypasses the GPU.
> Eventually, I think it will become a 100% CPU operation for various
> reasons. Likewise, if a fence is coming from an external process, there
> is no GPU job to write the seqno.

Good point, for that use case the implementation would be useful for us 
as well.

> Of course, we could issue a GPU job to
> write the seqno, but this would add latency. In the case of VM bind, we
> really want to completely decouple that from the GPU for various reasons
> (I can explain why if needed, but it's kind of off-topic).
>
>> We have a read-only mapped seq64 area in the kernel-reserved part of the VM
>> address space.
>>
>> Through this area the queues can see each other's fence progress, and we can
>> say things like "BO mapping and TLB flush are finished when this seq64
>> increases, so please suspend further processing until you see that".
>>
>> Could be that this is useful for more than XE, but at least for AMD I
>> currently don't see that.
>>
> Ok, we have no other current users, and if you feel it is better to
> carry this in Xe in a way that it can be moved to the common layer
> later, there’s no issue with that. We have several other components like
> this in Xe that are generic but currently live in Xe.

It's probably overkill for DMA-buf, but maybe we can put that stuff into 
DRM.

Christian.

>
> Matt
>
>> Regards,
>> Christian.
>>
>>> Matt
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> +}
>>>>> +EXPORT_SYMBOL(dma_fence_user_fence_free);
>>>>> +
>>>>> +/**
>>>>> + * dma_fence_user_fence_attach() - Attach user fence to dma-fence
>>>>> + *
>>>>> + * @fence: fence
>>>>> + * @user_fence: user fence
>>>>> + * @map: IOSYS map to write seqno to
>>>>> + * @seqno: seqno to write to IOSYS map
>>>>> + *
>>>>> + * Attach a user fence, which is a seqno write to an IOSYS map, to a DMA fence.
>>>>> + * The caller must guarantee that the memory in the IOSYS map doesn't move
>>>>> + * before the fence signals. This is typically done by installing the DMA fence
>>>>> + * into the BO's DMA reservation bookkeeping slot from which the IOSYS map
>>>>> + * was derived.
>>>>> + */
>>>>> +void dma_fence_user_fence_attach(struct dma_fence *fence,
>>>>> +				 struct dma_fence_user_fence *user_fence,
>>>>> +				 struct iosys_map *map, u64 seqno)
>>>>> +{
>>>>> +	int err;
>>>>> +
>>>>> +	user_fence->map = *map;
>>>>> +	user_fence->seqno = seqno;
>>>>> +
>>>>> +	err = dma_fence_add_callback(fence, &user_fence->cb, user_fence_cb);
>>>>> +	if (err == -ENOENT)
>>>>> +		user_fence_cb(NULL, &user_fence->cb);
>>>>> +}
>>>>> +EXPORT_SYMBOL(dma_fence_user_fence_attach);
>>>>> diff --git a/include/linux/dma-fence-user-fence.h b/include/linux/dma-fence-user-fence.h
>>>>> new file mode 100644
>>>>> index 000000000000..8678129c7d56
>>>>> --- /dev/null
>>>>> +++ b/include/linux/dma-fence-user-fence.h
>>>>> @@ -0,0 +1,31 @@
>>>>> +/* SPDX-License-Identifier: MIT */
>>>>> +/*
>>>>> + * Copyright © 2024 Intel Corporation
>>>>> + */
>>>>> +
>>>>> +#ifndef __LINUX_DMA_FENCE_USER_FENCE_H
>>>>> +#define __LINUX_DMA_FENCE_USER_FENCE_H
>>>>> +
>>>>> +#include <linux/dma-fence.h>
>>>>> +#include <linux/iosys-map.h>
>>>>> +
>>>>> +/** struct dma_fence_user_fence - User fence */
>>>>> +struct dma_fence_user_fence {
>>>>> +	/** @cb: dma-fence callback used to attach user fence to dma-fence */
>>>>> +	struct dma_fence_cb cb;
>>>>> +	/** @map: IOSYS map to write seqno to */
>>>>> +	struct iosys_map map;
>>>>> +	/** @seqno: seqno to write to IOSYS map */
>>>>> +	u64 seqno;
>>>>> +};
>>>>> +
>>>>> +struct dma_fence_user_fence *dma_fence_user_fence_alloc(void);
>>>>> +
>>>>> +void dma_fence_user_fence_free(struct dma_fence_user_fence *user_fence);
>>>>> +
>>>>> +void dma_fence_user_fence_attach(struct dma_fence *fence,
>>>>> +				 struct dma_fence_user_fence *user_fence,
>>>>> +				 struct iosys_map *map,
>>>>> +				 u64 seqno);
>>>>> +
>>>>> +#endif
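
For readers following the quoted patch, the attach logic can be tried
out in user space with a small mock. This is a sketch under
assumptions, not the kernel implementation: the mock_* types and
functions and user_fence_demo are hypothetical stand-ins, a plain
uint64_t pointer replaces the iosys_map, and the body of the real
user_fence_cb is not shown in the quote, so the callback here simply
writes the seqno. What it does mirror is the control flow visible in
dma_fence_user_fence_attach(): register a callback, and if the fence
has already signaled (add-callback returns -ENOENT), run the callback
directly so the seqno write is never lost.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mock_fence;
struct mock_cb;
typedef void (*mock_cb_fn)(struct mock_fence *f, struct mock_cb *cb);

struct mock_cb {
	mock_cb_fn fn;
	struct mock_cb *next;
};

struct mock_fence {
	bool signaled;
	struct mock_cb *cbs; /* callbacks pending until signal */
};

/* Like dma_fence_add_callback(): -ENOENT if the fence already signaled. */
static int mock_fence_add_callback(struct mock_fence *f, struct mock_cb *cb,
				   mock_cb_fn fn)
{
	if (f->signaled)
		return -ENOENT;
	cb->fn = fn;
	cb->next = f->cbs;
	f->cbs = cb;
	return 0;
}

/* Mark the fence signaled and run every pending callback once. */
static void mock_fence_signal(struct mock_fence *f)
{
	struct mock_cb *cb = f->cbs;

	f->signaled = true;
	f->cbs = NULL;
	while (cb) {
		struct mock_cb *next = cb->next;

		cb->fn(f, cb);
		cb = next;
	}
}

struct mock_user_fence {
	struct mock_cb cb;
	uint64_t *addr;  /* stand-in for the iosys_map */
	uint64_t seqno;
};

/* Assumed callback behaviour: write the seqno to the mapped location. */
static void mock_user_fence_cb(struct mock_fence *f, struct mock_cb *cb)
{
	struct mock_user_fence *uf = (struct mock_user_fence *)
		((char *)cb - offsetof(struct mock_user_fence, cb));

	(void)f; /* may be NULL on the already-signaled path */
	*uf->addr = uf->seqno;
}

static void mock_user_fence_attach(struct mock_fence *f,
				   struct mock_user_fence *uf,
				   uint64_t *addr, uint64_t seqno)
{
	uf->addr = addr;
	uf->seqno = seqno;
	if (mock_fence_add_callback(f, &uf->cb, mock_user_fence_cb) == -ENOENT)
		mock_user_fence_cb(NULL, &uf->cb);
}

/* Returns the slot value after attach (and signal, if still pending). */
uint64_t user_fence_demo(bool already_signaled)
{
	uint64_t slot = 0;
	struct mock_fence f = { .signaled = already_signaled, .cbs = NULL };
	struct mock_user_fence uf;

	mock_user_fence_attach(&f, &uf, &slot, 42);
	if (!already_signaled) {
		if (slot != 0)
			return UINT64_MAX; /* written before the fence signaled */
		mock_fence_signal(&f);
	}
	return slot;
}
```

Both user_fence_demo(false) and user_fence_demo(true) end with the
seqno in the slot; the difference is only whether the write happens at
signal time or immediately inside attach.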
