* [PATCH 0/3] Enable device atomics with a VM bind flag
@ 2024-04-10 17:03 Nirmoy Das
2024-04-10 17:03 ` [PATCH 1/3] drm/xe: Consolidate setting PTE_AE into one place Nirmoy Das
` (6 more replies)
0 siblings, 7 replies; 27+ messages in thread
From: Nirmoy Das @ 2024-04-10 17:03 UTC (permalink / raw)
To: intel-xe
Cc: Nirmoy Das, Balasubramani Vivekanandan, Brian Welty, Fei Yang,
Lionel G Landwerlin, Matt Roper, Matthew Brost, Michal Mrozek,
Oak Zeng, Thomas Hellstr_m
Currently device atomics in SMEM only buffer is not supported and
given that simultaneous usage of device atomics and CPU atomics on
the same SMEM buffer is not guaranteed to function without migration,
and UMD expects no migration for SMEM-only buffer objects, so this
provide a way to set device atomics when UMD is certain to use the
buffer only for device atomics.
Test-with: 20240410170041.24963-1-nirmoy.das@intel.com
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Lionel G Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Mrozek <michal.mrozek@intel.com>
Cc: Oak Zeng <oak.zeng@intel.com>
Cc: Thomas Hellstr_m <thomas.hellstrom@linux.intel.com>
Nirmoy Das (3):
drm/xe: Consolidate setting PTE_AE into one place
drm/xe: Add function to check if BO has single placement
drm/xe/uapi: Introduce VMA bind flag for device atomics
drivers/gpu/drm/xe/xe_bo.c | 14 ++++++++++++++
drivers/gpu/drm/xe/xe_bo.h | 1 +
drivers/gpu/drm/xe/xe_pt.c | 4 +---
drivers/gpu/drm/xe/xe_vm.c | 32 ++++++++++++++++++++++++++++----
drivers/gpu/drm/xe/xe_vm_types.h | 2 ++
include/uapi/drm/xe_drm.h | 9 +++++----
6 files changed, 51 insertions(+), 11 deletions(-)
--
2.42.0
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 1/3] drm/xe: Consolidate setting PTE_AE into one place
2024-04-10 17:03 [PATCH 0/3] Enable device atomics with a VM bind flag Nirmoy Das
@ 2024-04-10 17:03 ` Nirmoy Das
2024-04-11 23:22 ` Matt Roper
2024-04-10 17:03 ` [PATCH 2/3] drm/xe: Add function to check if BO has single placement Nirmoy Das
` (5 subsequent siblings)
6 siblings, 1 reply; 27+ messages in thread
From: Nirmoy Das @ 2024-04-10 17:03 UTC (permalink / raw)
To: intel-xe; +Cc: Nirmoy Das
Currently decision to set PTE_AE is spread between xe_pt
and xe_vm files and there is no reason to be keep it that
way. Consolidate the logic for better maintainability.
This also remove the extra care needed for PVC which only
allows setting PTE_AE for LMEM.
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
---
drivers/gpu/drm/xe/xe_pt.c | 4 +---
drivers/gpu/drm/xe/xe_vm.c | 7 ++++---
2 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 5b7930f46cf3..7dc13a8bb44f 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -597,7 +597,6 @@ static int
xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
struct xe_vm_pgtable_update *entries, u32 *num_entries)
{
- struct xe_device *xe = tile_to_xe(tile);
struct xe_bo *bo = xe_vma_bo(vma);
bool is_devmem = !xe_vma_is_userptr(vma) && bo &&
(xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo));
@@ -619,8 +618,7 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id];
int ret;
- if ((vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT) &&
- (is_devmem || !IS_DGFX(xe)))
+ if (vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT)
xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
if (is_devmem) {
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index a196dbe65252..8f3474c5f480 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -904,9 +904,6 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
for_each_tile(tile, vm->xe, id)
vma->tile_mask |= 0x1 << id;
- if (GRAPHICS_VER(vm->xe) >= 20 || vm->xe->info.platform == XE_PVC)
- vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
-
vma->pat_index = pat_index;
if (bo) {
@@ -914,6 +911,10 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
xe_bo_assert_held(bo);
+ if (GRAPHICS_VER(vm->xe) >= 20 || xe_bo_is_vram(bo) ||
+ !IS_DGFX(vm->xe))
+ vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
+
vm_bo = drm_gpuvm_bo_obtain(vma->gpuva.vm, &bo->ttm.base);
if (IS_ERR(vm_bo)) {
xe_vma_free(vma);
--
2.42.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 2/3] drm/xe: Add function to check if BO has single placement
2024-04-10 17:03 [PATCH 0/3] Enable device atomics with a VM bind flag Nirmoy Das
2024-04-10 17:03 ` [PATCH 1/3] drm/xe: Consolidate setting PTE_AE into one place Nirmoy Das
@ 2024-04-10 17:03 ` Nirmoy Das
2024-04-10 17:03 ` [PATCH 3/3] drm/xe/uapi: Introduce VMA bind flag for device atomics Nirmoy Das
` (4 subsequent siblings)
6 siblings, 0 replies; 27+ messages in thread
From: Nirmoy Das @ 2024-04-10 17:03 UTC (permalink / raw)
To: intel-xe; +Cc: Nirmoy Das
A new helper function xe_bo_has_single_placement() to check
if a BO has single placement.
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
---
drivers/gpu/drm/xe/xe_bo.c | 14 ++++++++++++++
drivers/gpu/drm/xe/xe_bo.h | 1 +
2 files changed, 15 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index fdeb3691d3f6..281126667ba7 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -94,6 +94,20 @@ bool xe_bo_is_stolen(struct xe_bo *bo)
return bo->ttm.resource->mem_type == XE_PL_STOLEN;
}
+/**
+ * xe_bo_has_single_placement - check if BO is placed only in one memory location
+ * @bo: The BO
+ *
+ * This function checks whether a given BO is placed in only one memory location.
+ *
+ * Returns: true if the BO is placed in a single memory location, false otherwise.
+ *
+ */
+bool xe_bo_has_single_placement(struct xe_bo *bo)
+{
+ return bo->placement.num_placement == 1;
+}
+
/**
* xe_bo_is_stolen_devmem - check if BO is of stolen type accessed via PCI BAR
* @bo: The BO
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index a885b14bf595..6de894c728f5 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -206,6 +206,7 @@ bool mem_type_is_vram(u32 mem_type);
bool xe_bo_is_vram(struct xe_bo *bo);
bool xe_bo_is_stolen(struct xe_bo *bo);
bool xe_bo_is_stolen_devmem(struct xe_bo *bo);
+bool xe_bo_has_single_placement(struct xe_bo *bo);
uint64_t vram_region_gpu_offset(struct ttm_resource *res);
bool xe_bo_can_migrate(struct xe_bo *bo, u32 mem_type);
--
2.42.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 3/3] drm/xe/uapi: Introduce VMA bind flag for device atomics
2024-04-10 17:03 [PATCH 0/3] Enable device atomics with a VM bind flag Nirmoy Das
2024-04-10 17:03 ` [PATCH 1/3] drm/xe: Consolidate setting PTE_AE into one place Nirmoy Das
2024-04-10 17:03 ` [PATCH 2/3] drm/xe: Add function to check if BO has single placement Nirmoy Das
@ 2024-04-10 17:03 ` Nirmoy Das
2024-04-10 17:35 ` Matthew Brost
` (2 more replies)
2024-04-10 18:04 ` ✓ CI.Patch_applied: success for Enable device atomics with a VM bind flag Patchwork
` (3 subsequent siblings)
6 siblings, 3 replies; 27+ messages in thread
From: Nirmoy Das @ 2024-04-10 17:03 UTC (permalink / raw)
To: intel-xe; +Cc: Nirmoy Das
Adds a new VMA bind flag to enable device atomics on SMEM only buffers.
Given that simultaneous usage of device atomics and CPU atomics on
the same SMEM buffer is not guaranteed to function without migration,
and UMD expects no migration for SMEM-only buffer objects, so this provide
a way to set device atomics when UMD is certain to use the buffer only
for device atomics.
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
---
drivers/gpu/drm/xe/xe_vm.c | 27 +++++++++++++++++++++++++--
drivers/gpu/drm/xe/xe_vm_types.h | 2 ++
include/uapi/drm/xe_drm.h | 9 +++++----
3 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 8f3474c5f480..530b4bbc186c 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -851,6 +851,7 @@ static void xe_vma_free(struct xe_vma *vma)
#define VMA_CREATE_FLAG_READ_ONLY BIT(0)
#define VMA_CREATE_FLAG_IS_NULL BIT(1)
#define VMA_CREATE_FLAG_DUMPABLE BIT(2)
+#define VMA_CREATE_FLAG_DEVICE_ATOMICS BIT(3)
static struct xe_vma *xe_vma_create(struct xe_vm *vm,
struct xe_bo *bo,
@@ -864,6 +865,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
bool read_only = (flags & VMA_CREATE_FLAG_READ_ONLY);
bool is_null = (flags & VMA_CREATE_FLAG_IS_NULL);
bool dumpable = (flags & VMA_CREATE_FLAG_DUMPABLE);
+ bool enable_atomics = (flags & VMA_CREATE_FLAG_IS_NULL);
xe_assert(vm->xe, start < end);
xe_assert(vm->xe, end < vm->size);
@@ -912,7 +914,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
xe_bo_assert_held(bo);
if (GRAPHICS_VER(vm->xe) >= 20 || xe_bo_is_vram(bo) ||
- !IS_DGFX(vm->xe))
+ !IS_DGFX(vm->xe) || enable_atomics)
vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
vm_bo = drm_gpuvm_bo_obtain(vma->gpuva.vm, &bo->ttm.base);
@@ -2174,6 +2176,18 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
operation, (ULL)addr, (ULL)range,
(ULL)bo_offset_or_userptr);
+ if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
+ (vm->xe->info.platform == XE_PVC && !xe_bo_is_vram(bo))) {
+ drm_warn(&vm->xe->drm, "Setting device atomics on SMEM is not supported for this platform");
+ return ERR_PTR(-EINVAL);
+ }
+
+ if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
+ !xe_bo_has_single_placement(bo))
+ drm_warn(&vm->xe->drm, "DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS can be only set if the BO has single placement");
+ return ERR_PTR(-EINVAL);
+ }
+
switch (operation) {
case DRM_XE_VM_BIND_OP_MAP:
case DRM_XE_VM_BIND_OP_MAP_USERPTR:
@@ -2216,6 +2230,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
if (__op->op == DRM_GPUVA_OP_MAP) {
op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL;
op->map.dumpable = flags & DRM_XE_VM_BIND_FLAG_DUMPABLE;
+ op->map.enable_device_atomics = flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS;
op->map.pat_index = pat_index;
} else if (__op->op == DRM_GPUVA_OP_PREFETCH) {
op->prefetch.region = prefetch_region;
@@ -2412,6 +2427,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
VMA_CREATE_FLAG_IS_NULL : 0;
flags |= op->map.dumpable ?
VMA_CREATE_FLAG_DUMPABLE : 0;
+ flags |= op->map.enable_device_atomics ?
+ VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
vma = new_vma(vm, &op->base.map, op->map.pat_index,
flags);
@@ -2439,6 +2456,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
flags |= op->base.remap.unmap->va->flags &
XE_VMA_DUMPABLE ?
VMA_CREATE_FLAG_DUMPABLE : 0;
+ flags |= op->base.remap.unmap->va->flags ?
+ VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
vma = new_vma(vm, op->base.remap.prev,
old->pat_index, flags);
@@ -2476,6 +2495,9 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
flags |= op->base.remap.unmap->va->flags &
XE_VMA_DUMPABLE ?
VMA_CREATE_FLAG_DUMPABLE : 0;
+ flags |= op->base.remap.unmap->va->flags ?
+ VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
+
vma = new_vma(vm, op->base.remap.next,
old->pat_index, flags);
@@ -2831,7 +2853,8 @@ static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
(DRM_XE_VM_BIND_FLAG_READONLY | \
DRM_XE_VM_BIND_FLAG_IMMEDIATE | \
DRM_XE_VM_BIND_FLAG_NULL | \
- DRM_XE_VM_BIND_FLAG_DUMPABLE)
+ DRM_XE_VM_BIND_FLAG_DUMPABLE | \
+ DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS)
#define XE_64K_PAGE_MASK 0xffffull
#define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP)
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index badf3945083d..7b9c68909c78 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -282,6 +282,8 @@ struct xe_vma_op_map {
bool dumpable;
/** @pat_index: The pat index to use for this operation. */
u16 pat_index;
+ /** @enable_device_atomics: Whether the VMA will allow device atomics */
+ bool enable_device_atomics;
};
/** struct xe_vma_op_remap - VMA remap operation */
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 1446c3bae515..bffe8b1c040c 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -969,10 +969,11 @@ struct drm_xe_vm_bind_op {
/** @op: Bind operation to perform */
__u32 op;
-#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
-#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
-#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
-#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
+#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
+#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
+#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
+#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
+#define DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS (1 << 4)
/** @flags: Bind flags */
__u32 flags;
--
2.42.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 3/3] drm/xe/uapi: Introduce VMA bind flag for device atomics
2024-04-10 17:03 ` [PATCH 3/3] drm/xe/uapi: Introduce VMA bind flag for device atomics Nirmoy Das
@ 2024-04-10 17:35 ` Matthew Brost
2024-04-11 9:22 ` Nirmoy Das
2024-04-11 14:00 ` Nirmoy Das
2024-04-11 13:14 ` Lionel Landwerlin
2024-04-11 23:44 ` Matt Roper
2 siblings, 2 replies; 27+ messages in thread
From: Matthew Brost @ 2024-04-10 17:35 UTC (permalink / raw)
To: Nirmoy Das; +Cc: intel-xe
On Wed, Apr 10, 2024 at 07:03:08PM +0200, Nirmoy Das wrote:
> Adds a new VMA bind flag to enable device atomics on SMEM only buffers.
>
> Given that simultaneous usage of device atomics and CPU atomics on
> the same SMEM buffer is not guaranteed to function without migration,
> and UMD expects no migration for SMEM-only buffer objects, so this provide
> a way to set device atomics when UMD is certain to use the buffer only
> for device atomics.
>
For new uAPI we will need a UMD PR using it and provide a link to the PR
in the commit message.
> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> ---
> drivers/gpu/drm/xe/xe_vm.c | 27 +++++++++++++++++++++++++--
> drivers/gpu/drm/xe/xe_vm_types.h | 2 ++
> include/uapi/drm/xe_drm.h | 9 +++++----
> 3 files changed, 32 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 8f3474c5f480..530b4bbc186c 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -851,6 +851,7 @@ static void xe_vma_free(struct xe_vma *vma)
> #define VMA_CREATE_FLAG_READ_ONLY BIT(0)
> #define VMA_CREATE_FLAG_IS_NULL BIT(1)
> #define VMA_CREATE_FLAG_DUMPABLE BIT(2)
> +#define VMA_CREATE_FLAG_DEVICE_ATOMICS BIT(3)
>
> static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> struct xe_bo *bo,
> @@ -864,6 +865,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> bool read_only = (flags & VMA_CREATE_FLAG_READ_ONLY);
> bool is_null = (flags & VMA_CREATE_FLAG_IS_NULL);
> bool dumpable = (flags & VMA_CREATE_FLAG_DUMPABLE);
> + bool enable_atomics = (flags & VMA_CREATE_FLAG_IS_NULL);
s/flags & VMA_CREATE_FLAG_IS_NULL/flags & VMA_CREATE_FLAG_DEVICE_ATOMICS/
>
> xe_assert(vm->xe, start < end);
> xe_assert(vm->xe, end < vm->size);
> @@ -912,7 +914,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> xe_bo_assert_held(bo);
>
> if (GRAPHICS_VER(vm->xe) >= 20 || xe_bo_is_vram(bo) ||
> - !IS_DGFX(vm->xe))
> + !IS_DGFX(vm->xe) || enable_atomics)
> vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>
> vm_bo = drm_gpuvm_bo_obtain(vma->gpuva.vm, &bo->ttm.base);
> @@ -2174,6 +2176,18 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> operation, (ULL)addr, (ULL)range,
> (ULL)bo_offset_or_userptr);
>
> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
> + (vm->xe->info.platform == XE_PVC && !xe_bo_is_vram(bo))) {
> + drm_warn(&vm->xe->drm, "Setting device atomics on SMEM is not supported for this platform");
> + return ERR_PTR(-EINVAL);
> + }
> +
> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
> + !xe_bo_has_single_placement(bo))
> + drm_warn(&vm->xe->drm, "DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS can be only set if the BO has single placement");
> + return ERR_PTR(-EINVAL);
> + }
> +
A few things here.
- These check should go in xe_vm_bind_ioctl() in the loop that looks up
the BOs and parses them. Probably move implementation to a helper(s)
too as it is already a pretty big ugly loop in xe_vm_bind_ioctl and
adding more logic will make it uglier. Search for
drm_gem_object_lookup in xe_vm_bind_ioctl() for the loop I am refering
to.
- Rather than drm_warns just use XE_IOCTL_DBG for these checks, that is
the style or input to IOCTLs when we return -EINVAL in Xe.
- Are these checks valid? At one point on PVC I had to code to do
atomics between CPU and GPU by migrating the BO back and forth on page
faults. I think the i915 does this too? Are we abandoning that idea?
- Lastly if these check are valid, rather than a platform check in the
code I'd rather see a bit in intel_device_info.
> switch (operation) {
> case DRM_XE_VM_BIND_OP_MAP:
> case DRM_XE_VM_BIND_OP_MAP_USERPTR:
> @@ -2216,6 +2230,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> if (__op->op == DRM_GPUVA_OP_MAP) {
> op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL;
> op->map.dumpable = flags & DRM_XE_VM_BIND_FLAG_DUMPABLE;
> + op->map.enable_device_atomics = flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS;
> op->map.pat_index = pat_index;
> } else if (__op->op == DRM_GPUVA_OP_PREFETCH) {
> op->prefetch.region = prefetch_region;
> @@ -2412,6 +2427,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
> VMA_CREATE_FLAG_IS_NULL : 0;
> flags |= op->map.dumpable ?
> VMA_CREATE_FLAG_DUMPABLE : 0;
> + flags |= op->map.enable_device_atomics ?
> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>
> vma = new_vma(vm, &op->base.map, op->map.pat_index,
> flags);
> @@ -2439,6 +2456,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
> flags |= op->base.remap.unmap->va->flags &
> XE_VMA_DUMPABLE ?
> VMA_CREATE_FLAG_DUMPABLE : 0;
> + flags |= op->base.remap.unmap->va->flags ?
> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>
> vma = new_vma(vm, op->base.remap.prev,
> old->pat_index, flags);
> @@ -2476,6 +2495,9 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
> flags |= op->base.remap.unmap->va->flags &
> XE_VMA_DUMPABLE ?
> VMA_CREATE_FLAG_DUMPABLE : 0;
> + flags |= op->base.remap.unmap->va->flags ?
> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
> +
Extra newline.
>
> vma = new_vma(vm, op->base.remap.next,
> old->pat_index, flags);
> @@ -2831,7 +2853,8 @@ static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
> (DRM_XE_VM_BIND_FLAG_READONLY | \
> DRM_XE_VM_BIND_FLAG_IMMEDIATE | \
> DRM_XE_VM_BIND_FLAG_NULL | \
> - DRM_XE_VM_BIND_FLAG_DUMPABLE)
> + DRM_XE_VM_BIND_FLAG_DUMPABLE | \
> + DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS)
> #define XE_64K_PAGE_MASK 0xffffull
> #define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP)
>
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index badf3945083d..7b9c68909c78 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -282,6 +282,8 @@ struct xe_vma_op_map {
> bool dumpable;
> /** @pat_index: The pat index to use for this operation. */
> u16 pat_index;
> + /** @enable_device_atomics: Whether the VMA will allow device atomics */
> + bool enable_device_atomics;
Put this next to the other bools in xe_vma_op_map.
> };
>
> /** struct xe_vma_op_remap - VMA remap operation */
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index 1446c3bae515..bffe8b1c040c 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -969,10 +969,11 @@ struct drm_xe_vm_bind_op {
> /** @op: Bind operation to perform */
> __u32 op;
>
> -#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
> -#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
> -#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
> -#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
> +#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
> +#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
> +#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
> +#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
> +#define DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS (1 << 4)
Kernel doc for new flag.
Matt
> /** @flags: Bind flags */
> __u32 flags;
>
> --
> 2.42.0
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* ✓ CI.Patch_applied: success for Enable device atomics with a VM bind flag
2024-04-10 17:03 [PATCH 0/3] Enable device atomics with a VM bind flag Nirmoy Das
` (2 preceding siblings ...)
2024-04-10 17:03 ` [PATCH 3/3] drm/xe/uapi: Introduce VMA bind flag for device atomics Nirmoy Das
@ 2024-04-10 18:04 ` Patchwork
2024-04-10 18:04 ` ✓ CI.checkpatch: " Patchwork
` (2 subsequent siblings)
6 siblings, 0 replies; 27+ messages in thread
From: Patchwork @ 2024-04-10 18:04 UTC (permalink / raw)
To: Nirmoy Das; +Cc: intel-xe
== Series Details ==
Series: Enable device atomics with a VM bind flag
URL : https://patchwork.freedesktop.org/series/132290/
State : success
== Summary ==
=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: 3658e2ec471a drm-tip: 2024y-04m-10d-17h-23m-58s UTC integration manifest
=== git am output follows ===
Applying: drm/xe: Consolidate setting PTE_AE into one place
Applying: drm/xe: Add function to check if BO has single placement
Applying: drm/xe/uapi: Introduce VMA bind flag for device atomics
^ permalink raw reply [flat|nested] 27+ messages in thread
* ✓ CI.checkpatch: success for Enable device atomics with a VM bind flag
2024-04-10 17:03 [PATCH 0/3] Enable device atomics with a VM bind flag Nirmoy Das
` (3 preceding siblings ...)
2024-04-10 18:04 ` ✓ CI.Patch_applied: success for Enable device atomics with a VM bind flag Patchwork
@ 2024-04-10 18:04 ` Patchwork
2024-04-10 18:04 ` ✗ CI.KUnit: failure " Patchwork
2024-04-11 16:22 ` [PATCH 0/3] " Zeng, Oak
6 siblings, 0 replies; 27+ messages in thread
From: Patchwork @ 2024-04-10 18:04 UTC (permalink / raw)
To: Nirmoy Das; +Cc: intel-xe
== Series Details ==
Series: Enable device atomics with a VM bind flag
URL : https://patchwork.freedesktop.org/series/132290/
State : success
== Summary ==
+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
10b531c2aeb176a1a539b4a77216232f97719cec
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit c9691d35ef5cb85263c7d9702c1e8aca9163e6b1
Author: Nirmoy Das <nirmoy.das@intel.com>
Date: Wed Apr 10 19:03:08 2024 +0200
drm/xe/uapi: Introduce VMA bind flag for device atomics
Adds a new VMA bind flag to enable device atomics on SMEM only buffers.
Given that simultaneous usage of device atomics and CPU atomics on
the same SMEM buffer is not guaranteed to function without migration,
and UMD expects no migration for SMEM-only buffer objects, so this provide
a way to set device atomics when UMD is certain to use the buffer only
for device atomics.
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
+ /mt/dim checkpatch 3658e2ec471a1d67baa30a554583bcdd4be55857 drm-intel
49d3c4f1432d drm/xe: Consolidate setting PTE_AE into one place
dca148c74afb drm/xe: Add function to check if BO has single placement
c9691d35ef5c drm/xe/uapi: Introduce VMA bind flag for device atomics
^ permalink raw reply [flat|nested] 27+ messages in thread
* ✗ CI.KUnit: failure for Enable device atomics with a VM bind flag
2024-04-10 17:03 [PATCH 0/3] Enable device atomics with a VM bind flag Nirmoy Das
` (4 preceding siblings ...)
2024-04-10 18:04 ` ✓ CI.checkpatch: " Patchwork
@ 2024-04-10 18:04 ` Patchwork
2024-04-11 16:22 ` [PATCH 0/3] " Zeng, Oak
6 siblings, 0 replies; 27+ messages in thread
From: Patchwork @ 2024-04-10 18:04 UTC (permalink / raw)
To: Nirmoy Das; +Cc: intel-xe
== Series Details ==
Series: Enable device atomics with a VM bind flag
URL : https://patchwork.freedesktop.org/series/132290/
State : failure
== Summary ==
+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
ERROR:root:../arch/x86/um/user-offsets.c:17:6: warning: no previous prototype for ‘foo’ [-Wmissing-prototypes]
17 | void foo(void)
| ^~~
In file included from ../arch/um/kernel/asm-offsets.c:1:
../arch/x86/um/shared/sysdep/kernel-offsets.h:9:6: warning: no previous prototype for ‘foo’ [-Wmissing-prototypes]
9 | void foo(void)
| ^~~
../arch/x86/um/bugs_64.c:9:6: warning: no previous prototype for ‘arch_check_bugs’ [-Wmissing-prototypes]
9 | void arch_check_bugs(void)
| ^~~~~~~~~~~~~~~
../arch/x86/um/bugs_64.c:13:6: warning: no previous prototype for ‘arch_examine_signal’ [-Wmissing-prototypes]
13 | void arch_examine_signal(int sig, struct uml_pt_regs *regs)
| ^~~~~~~~~~~~~~~~~~~
../arch/x86/um/fault.c:18:5: warning: no previous prototype for ‘arch_fixup’ [-Wmissing-prototypes]
18 | int arch_fixup(unsigned long address, struct uml_pt_regs *regs)
| ^~~~~~~~~~
../arch/x86/um/vdso/um_vdso.c:16:5: warning: no previous prototype for ‘__vdso_clock_gettime’ [-Wmissing-prototypes]
16 | int __vdso_clock_gettime(clockid_t clock, struct __kernel_old_timespec *ts)
| ^~~~~~~~~~~~~~~~~~~~
../arch/x86/um/vdso/um_vdso.c:30:5: warning: no previous prototype for ‘__vdso_gettimeofday’ [-Wmissing-prototypes]
30 | int __vdso_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz)
| ^~~~~~~~~~~~~~~~~~~
../arch/x86/um/vdso/um_vdso.c:44:21: warning: no previous prototype for ‘__vdso_time’ [-Wmissing-prototypes]
44 | __kernel_old_time_t __vdso_time(__kernel_old_time_t *t)
| ^~~~~~~~~~~
../arch/x86/um/vdso/um_vdso.c:57:1: warning: no previous prototype for ‘__vdso_getcpu’ [-Wmissing-prototypes]
57 | __vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
| ^~~~~~~~~~~~~
../arch/x86/um/os-Linux/registers.c:146:15: warning: no previous prototype for ‘get_thread_reg’ [-Wmissing-prototypes]
146 | unsigned long get_thread_reg(int reg, jmp_buf *buf)
| ^~~~~~~~~~~~~~
../arch/x86/um/os-Linux/mcontext.c:7:6: warning: no previous prototype for ‘get_regs_from_mc’ [-Wmissing-prototypes]
7 | void get_regs_from_mc(struct uml_pt_regs *regs, mcontext_t *mc)
| ^~~~~~~~~~~~~~~~
../arch/um/os-Linux/skas/process.c:107:6: warning: no previous prototype for ‘wait_stub_done’ [-Wmissing-prototypes]
107 | void wait_stub_done(int pid)
| ^~~~~~~~~~~~~~
../arch/um/os-Linux/skas/process.c:683:6: warning: no previous prototype for ‘__switch_mm’ [-Wmissing-prototypes]
683 | void __switch_mm(struct mm_id *mm_idp)
| ^~~~~~~~~~~
../arch/um/kernel/skas/mmu.c:17:5: warning: no previous prototype for ‘init_new_context’ [-Wmissing-prototypes]
17 | int init_new_context(struct task_struct *task, struct mm_struct *mm)
| ^~~~~~~~~~~~~~~~
../arch/um/kernel/skas/mmu.c:60:6: warning: no previous prototype for ‘destroy_context’ [-Wmissing-prototypes]
60 | void destroy_context(struct mm_struct *mm)
| ^~~~~~~~~~~~~~~
../arch/um/os-Linux/main.c:187:7: warning: no previous prototype for ‘__wrap_malloc’ [-Wmissing-prototypes]
187 | void *__wrap_malloc(int size)
| ^~~~~~~~~~~~~
../arch/um/os-Linux/main.c:208:7: warning: no previous prototype for ‘__wrap_calloc’ [-Wmissing-prototypes]
208 | void *__wrap_calloc(int n, int size)
| ^~~~~~~~~~~~~
../arch/um/os-Linux/main.c:222:6: warning: no previous prototype for ‘__wrap_free’ [-Wmissing-prototypes]
222 | void __wrap_free(void *ptr)
| ^~~~~~~~~~~
../arch/um/kernel/skas/process.c:36:12: warning: no previous prototype for ‘start_uml’ [-Wmissing-prototypes]
36 | int __init start_uml(void)
| ^~~~~~~~~
../arch/um/os-Linux/mem.c:28:6: warning: no previous prototype for ‘kasan_map_memory’ [-Wmissing-prototypes]
28 | void kasan_map_memory(void *start, size_t len)
| ^~~~~~~~~~~~~~~~
../arch/um/os-Linux/mem.c:212:13: warning: no previous prototype for ‘check_tmpexec’ [-Wmissing-prototypes]
212 | void __init check_tmpexec(void)
| ^~~~~~~~~~~~~
../arch/x86/um/ptrace_64.c:111:5: warning: no previous prototype for ‘poke_user’ [-Wmissing-prototypes]
111 | int poke_user(struct task_struct *child, long addr, long data)
| ^~~~~~~~~
../arch/x86/um/ptrace_64.c:171:5: warning: no previous prototype for ‘peek_user’ [-Wmissing-prototypes]
171 | int peek_user(struct task_struct *child, long addr, long data)
| ^~~~~~~~~
../arch/um/os-Linux/signal.c:75:6: warning: no previous prototype for ‘sig_handler’ [-Wmissing-prototypes]
75 | void sig_handler(int sig, struct siginfo *si, mcontext_t *mc)
| ^~~~~~~~~~~
../arch/um/os-Linux/signal.c:111:6: warning: no previous prototype for ‘timer_alarm_handler’ [-Wmissing-prototypes]
111 | void timer_alarm_handler(int sig, struct siginfo *unused_si, mcontext_t *mc)
| ^~~~~~~~~~~~~~~~~~~
../arch/x86/um/signal.c:560:6: warning: no previous prototype for ‘sys_rt_sigreturn’ [-Wmissing-prototypes]
560 | long sys_rt_sigreturn(void)
| ^~~~~~~~~~~~~~~~
../arch/um/os-Linux/start_up.c:301:12: warning: no previous prototype for ‘parse_iomem’ [-Wmissing-prototypes]
301 | int __init parse_iomem(char *str, int *add)
| ^~~~~~~~~~~
../arch/um/kernel/mem.c:202:8: warning: no previous prototype for ‘pgd_alloc’ [-Wmissing-prototypes]
202 | pgd_t *pgd_alloc(struct mm_struct *mm)
| ^~~~~~~~~
../arch/um/kernel/mem.c:215:7: warning: no previous prototype for ‘uml_kmalloc’ [-Wmissing-prototypes]
215 | void *uml_kmalloc(int size, int flags)
| ^~~~~~~~~~~
../arch/x86/um/syscalls_64.c:48:6: warning: no previous prototype for ‘arch_switch_to’ [-Wmissing-prototypes]
48 | void arch_switch_to(struct task_struct *to)
| ^~~~~~~~~~~~~~
../arch/um/kernel/process.c:51:5: warning: no previous prototype for ‘pid_to_processor_id’ [-Wmissing-prototypes]
51 | int pid_to_processor_id(int pid)
| ^~~~~~~~~~~~~~~~~~~
../arch/um/kernel/process.c:87:7: warning: no previous prototype for ‘__switch_to’ [-Wmissing-prototypes]
87 | void *__switch_to(struct task_struct *from, struct task_struct *to)
| ^~~~~~~~~~~
../arch/um/kernel/process.c:140:6: warning: no previous prototype for ‘fork_handler’ [-Wmissing-prototypes]
140 | void fork_handler(void)
| ^~~~~~~~~~~~
../arch/um/kernel/process.c:217:6: warning: no previous prototype for ‘arch_cpu_idle’ [-Wmissing-prototypes]
217 | void arch_cpu_idle(void)
| ^~~~~~~~~~~~~
../arch/um/kernel/process.c:253:5: warning: no previous prototype for ‘copy_to_user_proc’ [-Wmissing-prototypes]
253 | int copy_to_user_proc(void __user *to, void *from, int size)
| ^~~~~~~~~~~~~~~~~
../arch/um/kernel/process.c:263:5: warning: no previous prototype for ‘clear_user_proc’ [-Wmissing-prototypes]
263 | int clear_user_proc(void __user *buf, int size)
| ^~~~~~~~~~~~~~~
../arch/um/kernel/process.c:271:6: warning: no previous prototype for ‘set_using_sysemu’ [-Wmissing-prototypes]
271 | void set_using_sysemu(int value)
| ^~~~~~~~~~~~~~~~
../arch/um/kernel/process.c:278:5: warning: no previous prototype for ‘get_using_sysemu’ [-Wmissing-prototypes]
278 | int get_using_sysemu(void)
| ^~~~~~~~~~~~~~~~
../arch/um/kernel/process.c:316:12: warning: no previous prototype for ‘make_proc_sysemu’ [-Wmissing-prototypes]
316 | int __init make_proc_sysemu(void)
| ^~~~~~~~~~~~~~~~
../arch/um/kernel/process.c:348:15: warning: no previous prototype for ‘arch_align_stack’ [-Wmissing-prototypes]
348 | unsigned long arch_align_stack(unsigned long sp)
| ^~~~~~~~~~~~~~~~
../arch/um/kernel/reboot.c:45:6: warning: no previous prototype for ‘machine_restart’ [-Wmissing-prototypes]
45 | void machine_restart(char * __unused)
| ^~~~~~~~~~~~~~~
../arch/um/kernel/reboot.c:51:6: warning: no previous prototype for ‘machine_power_off’ [-Wmissing-prototypes]
51 | void machine_power_off(void)
| ^~~~~~~~~~~~~~~~~
../arch/um/kernel/reboot.c:57:6: warning: no previous prototype for ‘machine_halt’ [-Wmissing-prototypes]
57 | void machine_halt(void)
| ^~~~~~~~~~~~
../arch/um/kernel/tlb.c:579:6: warning: no previous prototype for ‘flush_tlb_mm_range’ [-Wmissing-prototypes]
579 | void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
| ^~~~~~~~~~~~~~~~~~
../arch/um/kernel/tlb.c:594:6: warning: no previous prototype for ‘force_flush_all’ [-Wmissing-prototypes]
594 | void force_flush_all(void)
| ^~~~~~~~~~~~~~~
../arch/um/kernel/um_arch.c:408:19: warning: no previous prototype for ‘read_initrd’ [-Wmissing-prototypes]
408 | int __init __weak read_initrd(void)
| ^~~~~~~~~~~
../arch/um/kernel/um_arch.c:461:7: warning: no previous prototype for ‘text_poke’ [-Wmissing-prototypes]
461 | void *text_poke(void *addr, const void *opcode, size_t len)
| ^~~~~~~~~
../arch/um/kernel/um_arch.c:473:6: warning: no previous prototype for ‘text_poke_sync’ [-Wmissing-prototypes]
473 | void text_poke_sync(void)
| ^~~~~~~~~~~~~~
../arch/um/kernel/kmsg_dump.c:60:12: warning: no previous prototype for ‘kmsg_dumper_stdout_init’ [-Wmissing-prototypes]
60 | int __init kmsg_dumper_stdout_init(void)
| ^~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_vm.c: In function ‘vm_bind_ioctl_ops_create’:
../drivers/gpu/drm/xe/xe_vm.c:2170:6: warning: unused variable ‘err’ [-Wunused-variable]
2170 | int err;
| ^~~
../drivers/gpu/drm/xe/xe_vm.c:2169:23: warning: unused variable ‘vm_bo’ [-Wunused-variable]
2169 | struct drm_gpuvm_bo *vm_bo;
| ^~~~~
../drivers/gpu/drm/xe/xe_vm.c:2168:23: warning: unused variable ‘__op’ [-Wunused-variable]
2168 | struct drm_gpuva_op *__op;
| ^~~~
../drivers/gpu/drm/xe/xe_vm.c:2167:24: warning: unused variable ‘ops’ [-Wunused-variable]
2167 | struct drm_gpuva_ops *ops;
| ^~~
../drivers/gpu/drm/xe/xe_vm.c:2166:25: warning: unused variable ‘obj’ [-Wunused-variable]
2166 | struct drm_gem_object *obj = bo ? &bo->ttm.base : NULL;
| ^~~
../drivers/gpu/drm/xe/xe_vm.c: At top level:
../drivers/gpu/drm/xe/xe_vm.c:2191:2: error: expected identifier or ‘(’ before ‘switch’
2191 | switch (operation) {
| ^~~~~~
../drivers/gpu/drm/xe/xe_vm.c:2224:2: error: expected identifier or ‘(’ before ‘if’
2224 | if (IS_ERR(ops))
| ^~
In file included from ../include/linux/fwnode.h:14,
from ../include/linux/logic_pio.h:11,
from ../include/asm-generic/io.h:609,
from ../arch/um/include/asm/io.h:24,
from ../include/linux/io.h:13,
from ../include/linux/iosys-map.h:10,
from ../drivers/gpu/drm/xe/xe_bo_types.h:9,
from ../drivers/gpu/drm/xe/xe_vm.h:9,
from ../drivers/gpu/drm/xe/xe_vm.c:6:
../include/linux/list.h:778:2: error: expected identifier or ‘(’ before ‘for’
778 | for (pos = list_first_entry(head, typeof(*pos), member); \
| ^~~
../include/drm/drm_gpuvm.h:995:40: note: in expansion of macro ‘list_for_each_entry’
995 | #define drm_gpuva_for_each_op(op, ops) list_for_each_entry(op, &(ops)->list, entry)
| ^~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_vm.c:2227:2: note: in expansion of macro ‘drm_gpuva_for_each_op’
2227 | drm_gpuva_for_each_op(__op, ops) {
| ^~~~~~~~~~~~~~~~~~~~~
In file included from ../include/linux/list.h:5,
from ../include/linux/fwnode.h:14,
from ../include/linux/logic_pio.h:11,
from ../include/asm-generic/io.h:609,
from ../arch/um/include/asm/io.h:24,
from ../include/linux/io.h:13,
from ../include/linux/iosys-map.h:10,
from ../drivers/gpu/drm/xe/xe_bo_types.h:9,
from ../drivers/gpu/drm/xe/xe_vm.h:9,
from ../drivers/gpu/drm/xe/xe_vm.c:6:
../include/linux/container_of.h:23:48: error: expected identifier or ‘(’ before ‘)’ token
23 | ((type *)(__mptr - offsetof(type, member))); })
| ^
../include/linux/list.h:601:2: note: in expansion of macro ‘container_of’
601 | container_of(ptr, type, member)
| ^~~~~~~~~~~~
../include/linux/list.h:612:2: note: in expansion of macro ‘list_entry’
612 | list_entry((ptr)->next, type, member)
| ^~~~~~~~~~
../include/linux/list.h:778:13: note: in expansion of macro ‘list_first_entry’
778 | for (pos = list_first_entry(head, typeof(*pos), member); \
| ^~~~~~~~~~~~~~~~
../include/drm/drm_gpuvm.h:995:40: note: in expansion of macro ‘list_for_each_entry’
995 | #define drm_gpuva_for_each_op(op, ops) list_for_each_entry(op, &(ops)->list, entry)
| ^~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_vm.c:2227:2: note: in expansion of macro ‘drm_gpuva_for_each_op’
2227 | drm_gpuva_for_each_op(__op, ops) {
| ^~~~~~~~~~~~~~~~~~~~~
In file included from ../include/linux/fwnode.h:14,
from ../include/linux/logic_pio.h:11,
from ../include/asm-generic/io.h:609,
from ../arch/um/include/asm/io.h:24,
from ../include/linux/io.h:13,
from ../include/linux/iosys-map.h:10,
from ../drivers/gpu/drm/xe/xe_bo_types.h:9,
from ../drivers/gpu/drm/xe/xe_vm.h:9,
from ../drivers/gpu/drm/xe/xe_vm.c:6:
../include/linux/list.h:779:7: error: expected identifier or ‘(’ before ‘!’ token
779 | !list_entry_is_head(pos, head, member); \
| ^
../include/drm/drm_gpuvm.h:995:40: note: in expansion of macro ‘list_for_each_entry’
995 | #define drm_gpuva_for_each_op(op, ops) list_for_each_entry(op, &(ops)->list, entry)
| ^~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_vm.c:2227:2: note: in expansion of macro ‘drm_gpuva_for_each_op’
2227 | drm_gpuva_for_each_op(__op, ops) {
| ^~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_vm.c:2227:24: warning: data definition has no type or storage class
2227 | drm_gpuva_for_each_op(__op, ops) {
| ^~~~
../include/linux/list.h:780:7: note: in definition of macro ‘list_for_each_entry’
780 | pos = list_next_entry(pos, member))
| ^~~
../drivers/gpu/drm/xe/xe_vm.c:2227:2: note: in expansion of macro ‘drm_gpuva_for_each_op’
2227 | drm_gpuva_for_each_op(__op, ops) {
| ^~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_vm.c:2227:24: error: type defaults to ‘int’ in declaration of ‘__op’ [-Werror=implicit-int]
2227 | drm_gpuva_for_each_op(__op, ops) {
| ^~~~
../include/linux/list.h:780:7: note: in definition of macro ‘list_for_each_entry’
780 | pos = list_next_entry(pos, member))
| ^~~
../drivers/gpu/drm/xe/xe_vm.c:2227:2: note: in expansion of macro ‘drm_gpuva_for_each_op’
2227 | drm_gpuva_for_each_op(__op, ops) {
| ^~~~~~~~~~~~~~~~~~~~~
In file included from ../include/linux/list.h:5,
from ../include/linux/fwnode.h:14,
from ../include/linux/logic_pio.h:11,
from ../include/asm-generic/io.h:609,
from ../arch/um/include/asm/io.h:24,
from ../include/linux/io.h:13,
from ../include/linux/iosys-map.h:10,
from ../drivers/gpu/drm/xe/xe_bo_types.h:9,
from ../drivers/gpu/drm/xe/xe_vm.h:9,
from ../drivers/gpu/drm/xe/xe_vm.c:6:
../include/linux/container_of.h:18:41: error: braced-group within expression allowed only inside a function
18 | #define container_of(ptr, type, member) ({ \
| ^
../include/linux/list.h:601:2: note: in expansion of macro ‘container_of’
601 | container_of(ptr, type, member)
| ^~~~~~~~~~~~
../include/linux/list.h:645:2: note: in expansion of macro ‘list_entry’
645 | list_entry((pos)->member.next, typeof(*(pos)), member)
| ^~~~~~~~~~
../include/linux/list.h:780:13: note: in expansion of macro ‘list_next_entry’
780 | pos = list_next_entry(pos, member))
| ^~~~~~~~~~~~~~~
../include/drm/drm_gpuvm.h:995:40: note: in expansion of macro ‘list_for_each_entry’
995 | #define drm_gpuva_for_each_op(op, ops) list_for_each_entry(op, &(ops)->list, entry)
| ^~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_vm.c:2227:2: note: in expansion of macro ‘drm_gpuva_for_each_op’
2227 | drm_gpuva_for_each_op(__op, ops) {
| ^~~~~~~~~~~~~~~~~~~~~
In file included from ../include/linux/fwnode.h:14,
from ../include/linux/logic_pio.h:11,
from ../include/asm-generic/io.h:609,
from ../arch/um/include/asm/io.h:24,
from ../include/linux/io.h:13,
from ../include/linux/iosys-map.h:10,
from ../drivers/gpu/drm/xe/xe_bo_types.h:9,
from ../drivers/gpu/drm/xe/xe_vm.h:9,
from ../drivers/gpu/drm/xe/xe_vm.c:6:
../include/linux/list.h:780:41: error: expected ‘,’ or ‘;’ before ‘)’ token
780 | pos = list_next_entry(pos, member))
| ^
../include/drm/drm_gpuvm.h:995:40: note: in expansion of macro ‘list_for_each_entry’
995 | #define drm_gpuva_for_each_op(op, ops) list_for_each_entry(op, &(ops)->list, entry)
| ^~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_vm.c:2227:2: note: in expansion of macro ‘drm_gpuva_for_each_op’
2227 | drm_gpuva_for_each_op(__op, ops) {
| ^~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_vm.c:2242:2: error: expected identifier or ‘(’ before ‘return’
2242 | return ops;
| ^~~~~~
../drivers/gpu/drm/xe/xe_vm.c:2243:1: error: expected identifier or ‘(’ before ‘}’ token
2243 | }
| ^
../drivers/gpu/drm/xe/xe_vm.c:2151:13: warning: ‘print_op’ defined but not used [-Wunused-function]
2151 | static void print_op(struct xe_device *xe, struct drm_gpuva_op *op)
| ^~~~~~~~
cc1: some warnings being treated as errors
make[7]: *** [../scripts/Makefile.build:244: drivers/gpu/drm/xe/xe_vm.o] Error 1
make[7]: *** Waiting for unfinished jobs....
../lib/iomap.c:156:5: warning: no previous prototype for ‘ioread64_lo_hi’ [-Wmissing-prototypes]
156 | u64 ioread64_lo_hi(const void __iomem *addr)
| ^~~~~~~~~~~~~~
../lib/iomap.c:163:5: warning: no previous prototype for ‘ioread64_hi_lo’ [-Wmissing-prototypes]
163 | u64 ioread64_hi_lo(const void __iomem *addr)
| ^~~~~~~~~~~~~~
../lib/iomap.c:170:5: warning: no previous prototype for ‘ioread64be_lo_hi’ [-Wmissing-prototypes]
170 | u64 ioread64be_lo_hi(const void __iomem *addr)
| ^~~~~~~~~~~~~~~~
../lib/iomap.c:178:5: warning: no previous prototype for ‘ioread64be_hi_lo’ [-Wmissing-prototypes]
178 | u64 ioread64be_hi_lo(const void __iomem *addr)
| ^~~~~~~~~~~~~~~~
../lib/iomap.c:264:6: warning: no previous prototype for ‘iowrite64_lo_hi’ [-Wmissing-prototypes]
264 | void iowrite64_lo_hi(u64 val, void __iomem *addr)
| ^~~~~~~~~~~~~~~
../lib/iomap.c:272:6: warning: no previous prototype for ‘iowrite64_hi_lo’ [-Wmissing-prototypes]
272 | void iowrite64_hi_lo(u64 val, void __iomem *addr)
| ^~~~~~~~~~~~~~~
../lib/iomap.c:280:6: warning: no previous prototype for ‘iowrite64be_lo_hi’ [-Wmissing-prototypes]
280 | void iowrite64be_lo_hi(u64 val, void __iomem *addr)
| ^~~~~~~~~~~~~~~~~
../lib/iomap.c:288:6: warning: no previous prototype for ‘iowrite64be_hi_lo’ [-Wmissing-prototypes]
288 | void iowrite64be_hi_lo(u64 val, void __iomem *addr)
| ^~~~~~~~~~~~~~~~~
make[6]: *** [../scripts/Makefile.build:485: drivers/gpu/drm/xe] Error 2
make[5]: *** [../scripts/Makefile.build:485: drivers/gpu/drm] Error 2
make[4]: *** [../scripts/Makefile.build:485: drivers/gpu] Error 2
make[3]: *** [../scripts/Makefile.build:485: drivers] Error 2
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [/kernel/Makefile:1919: .] Error 2
make[1]: *** [/kernel/Makefile:240: __sub-make] Error 2
make: *** [Makefile:240: __sub-make] Error 2
[18:04:24] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[18:04:28] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make ARCH=um O=.kunit --jobs=48
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/3] drm/xe/uapi: Introduce VMA bind flag for device atomics
2024-04-10 17:35 ` Matthew Brost
@ 2024-04-11 9:22 ` Nirmoy Das
2024-04-11 14:00 ` Nirmoy Das
1 sibling, 0 replies; 27+ messages in thread
From: Nirmoy Das @ 2024-04-11 9:22 UTC (permalink / raw)
To: Matthew Brost, Nirmoy Das; +Cc: intel-xe
Hi Matt,
On 4/10/2024 7:35 PM, Matthew Brost wrote:
> On Wed, Apr 10, 2024 at 07:03:08PM +0200, Nirmoy Das wrote:
>> Adds a new VMA bind flag to enable device atomics on SMEM only buffers.
>>
>> Given that simultaneous usage of device atomics and CPU atomics on
>> the same SMEM buffer is not guaranteed to function without migration,
>> and UMD expects no migration for SMEM-only buffer objects, so this provide
>> a way to set device atomics when UMD is certain to use the buffer only
>> for device atomics.
>>
> For new uAPI we will need a UMD PR using it and provide a link to the PR
> in the commit message.
I will resend this once I have the UMD PR.
Thanks,
Nirmoy
>
>> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_vm.c | 27 +++++++++++++++++++++++++--
>> drivers/gpu/drm/xe/xe_vm_types.h | 2 ++
>> include/uapi/drm/xe_drm.h | 9 +++++----
>> 3 files changed, 32 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index 8f3474c5f480..530b4bbc186c 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -851,6 +851,7 @@ static void xe_vma_free(struct xe_vma *vma)
>> #define VMA_CREATE_FLAG_READ_ONLY BIT(0)
>> #define VMA_CREATE_FLAG_IS_NULL BIT(1)
>> #define VMA_CREATE_FLAG_DUMPABLE BIT(2)
>> +#define VMA_CREATE_FLAG_DEVICE_ATOMICS BIT(3)
>>
>> static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>> struct xe_bo *bo,
>> @@ -864,6 +865,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>> bool read_only = (flags & VMA_CREATE_FLAG_READ_ONLY);
>> bool is_null = (flags & VMA_CREATE_FLAG_IS_NULL);
>> bool dumpable = (flags & VMA_CREATE_FLAG_DUMPABLE);
>> + bool enable_atomics = (flags & VMA_CREATE_FLAG_IS_NULL);
> s/flags & VMA_CREATE_FLAG_IS_NULL/flags & VMA_CREATE_FLAG_DEVICE_ATOMICS/
>
>>
>> xe_assert(vm->xe, start < end);
>> xe_assert(vm->xe, end < vm->size);
>> @@ -912,7 +914,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>> xe_bo_assert_held(bo);
>>
>> if (GRAPHICS_VER(vm->xe) >= 20 || xe_bo_is_vram(bo) ||
>> - !IS_DGFX(vm->xe))
>> + !IS_DGFX(vm->xe) || enable_atomics)
>> vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>>
>> vm_bo = drm_gpuvm_bo_obtain(vma->gpuva.vm, &bo->ttm.base);
>> @@ -2174,6 +2176,18 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>> operation, (ULL)addr, (ULL)range,
>> (ULL)bo_offset_or_userptr);
>>
>> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
>> + (vm->xe->info.platform == XE_PVC && !xe_bo_is_vram(bo))) {
>> + drm_warn(&vm->xe->drm, "Setting device atomics on SMEM is not supported for this platform");
>> + return ERR_PTR(-EINVAL);
>> + }
>> +
>> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
>> + !xe_bo_has_single_placement(bo))
>> + drm_warn(&vm->xe->drm, "DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS can be only set if the BO has single placement");
>> + return ERR_PTR(-EINVAL);
>> + }
>> +
> A few things here.
>
> - These check should go in xe_vm_bind_ioctl() in the loop that looks up
> the BOs and parses them. Probably move implementation to a helper(s)
> too as it is already a pretty big ugly loop in xe_vm_bind_ioctl and
> adding more logic will make it uglier. Search for
> drm_gem_object_lookup in xe_vm_bind_ioctl() for the loop I am refering
> to.
>
> - Rather than drm_warns just use XE_IOCTL_DBG for these checks, that is
> the style or input to IOCTLs when we return -EINVAL in Xe.
>
> - Are these checks valid? At one point on PVC I had to code to do
> atomics between CPU and GPU by migrating the BO back and forth on page
> faults. I think the i915 does this too? Are we abandoning that idea?
>
> - Lastly if these check are valid, rather than a platform check in the
> code I'd rather see a bit in intel_device_info.
>
>> switch (operation) {
>> case DRM_XE_VM_BIND_OP_MAP:
>> case DRM_XE_VM_BIND_OP_MAP_USERPTR:
>> @@ -2216,6 +2230,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>> if (__op->op == DRM_GPUVA_OP_MAP) {
>> op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL;
>> op->map.dumpable = flags & DRM_XE_VM_BIND_FLAG_DUMPABLE;
>> + op->map.enable_device_atomics = flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS;
>> op->map.pat_index = pat_index;
>> } else if (__op->op == DRM_GPUVA_OP_PREFETCH) {
>> op->prefetch.region = prefetch_region;
>> @@ -2412,6 +2427,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>> VMA_CREATE_FLAG_IS_NULL : 0;
>> flags |= op->map.dumpable ?
>> VMA_CREATE_FLAG_DUMPABLE : 0;
>> + flags |= op->map.enable_device_atomics ?
>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>>
>> vma = new_vma(vm, &op->base.map, op->map.pat_index,
>> flags);
>> @@ -2439,6 +2456,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>> flags |= op->base.remap.unmap->va->flags &
>> XE_VMA_DUMPABLE ?
>> VMA_CREATE_FLAG_DUMPABLE : 0;
>> + flags |= op->base.remap.unmap->va->flags ?
>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>>
>> vma = new_vma(vm, op->base.remap.prev,
>> old->pat_index, flags);
>> @@ -2476,6 +2495,9 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>> flags |= op->base.remap.unmap->va->flags &
>> XE_VMA_DUMPABLE ?
>> VMA_CREATE_FLAG_DUMPABLE : 0;
>> + flags |= op->base.remap.unmap->va->flags ?
>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>> +
> Extra newline.
>
>>
>> vma = new_vma(vm, op->base.remap.next,
>> old->pat_index, flags);
>> @@ -2831,7 +2853,8 @@ static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
>> (DRM_XE_VM_BIND_FLAG_READONLY | \
>> DRM_XE_VM_BIND_FLAG_IMMEDIATE | \
>> DRM_XE_VM_BIND_FLAG_NULL | \
>> - DRM_XE_VM_BIND_FLAG_DUMPABLE)
>> + DRM_XE_VM_BIND_FLAG_DUMPABLE | \
>> + DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS)
>> #define XE_64K_PAGE_MASK 0xffffull
>> #define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
>> index badf3945083d..7b9c68909c78 100644
>> --- a/drivers/gpu/drm/xe/xe_vm_types.h
>> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
>> @@ -282,6 +282,8 @@ struct xe_vma_op_map {
>> bool dumpable;
>> /** @pat_index: The pat index to use for this operation. */
>> u16 pat_index;
>> + /** @enable_device_atomics: Whether the VMA will allow device atomics */
>> + bool enable_device_atomics;
> Put this next to the other bools in xe_vma_op_map.
>
>> };
>>
>> /** struct xe_vma_op_remap - VMA remap operation */
>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>> index 1446c3bae515..bffe8b1c040c 100644
>> --- a/include/uapi/drm/xe_drm.h
>> +++ b/include/uapi/drm/xe_drm.h
>> @@ -969,10 +969,11 @@ struct drm_xe_vm_bind_op {
>> /** @op: Bind operation to perform */
>> __u32 op;
>>
>> -#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
>> -#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
>> -#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
>> -#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
>> +#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
>> +#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
>> +#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
>> +#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
>> +#define DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS (1 << 4)
> Kernel doc for new flag.
>
> Matt
>
>> /** @flags: Bind flags */
>> __u32 flags;
>>
>> --
>> 2.42.0
>>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/3] drm/xe/uapi: Introduce VMA bind flag for device atomics
2024-04-10 17:03 ` [PATCH 3/3] drm/xe/uapi: Introduce VMA bind flag for device atomics Nirmoy Das
2024-04-10 17:35 ` Matthew Brost
@ 2024-04-11 13:14 ` Lionel Landwerlin
2024-04-11 13:32 ` Lionel Landwerlin
2024-04-11 13:42 ` Nirmoy Das
2024-04-11 23:44 ` Matt Roper
2 siblings, 2 replies; 27+ messages in thread
From: Lionel Landwerlin @ 2024-04-11 13:14 UTC (permalink / raw)
To: Nirmoy Das, intel-xe
On 10/04/2024 20:03, Nirmoy Das wrote:
> Adds a new VMA bind flag to enable device atomics on SMEM only buffers.
>
> Given that simultaneous usage of device atomics and CPU atomics on
> the same SMEM buffer is not guaranteed to function without migration,
> and UMD expects no migration for SMEM-only buffer objects, so this provide
> a way to set device atomics when UMD is certain to use the buffer only
> for device atomics.
>
> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> ---
> drivers/gpu/drm/xe/xe_vm.c | 27 +++++++++++++++++++++++++--
> drivers/gpu/drm/xe/xe_vm_types.h | 2 ++
> include/uapi/drm/xe_drm.h | 9 +++++----
> 3 files changed, 32 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 8f3474c5f480..530b4bbc186c 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -851,6 +851,7 @@ static void xe_vma_free(struct xe_vma *vma)
> #define VMA_CREATE_FLAG_READ_ONLY BIT(0)
> #define VMA_CREATE_FLAG_IS_NULL BIT(1)
> #define VMA_CREATE_FLAG_DUMPABLE BIT(2)
> +#define VMA_CREATE_FLAG_DEVICE_ATOMICS BIT(3)
>
> static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> struct xe_bo *bo,
> @@ -864,6 +865,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> bool read_only = (flags & VMA_CREATE_FLAG_READ_ONLY);
> bool is_null = (flags & VMA_CREATE_FLAG_IS_NULL);
> bool dumpable = (flags & VMA_CREATE_FLAG_DUMPABLE);
> + bool enable_atomics = (flags & VMA_CREATE_FLAG_IS_NULL);
>
> xe_assert(vm->xe, start < end);
> xe_assert(vm->xe, end < vm->size);
> @@ -912,7 +914,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> xe_bo_assert_held(bo);
>
> if (GRAPHICS_VER(vm->xe) >= 20 || xe_bo_is_vram(bo) ||
> - !IS_DGFX(vm->xe))
> + !IS_DGFX(vm->xe) || enable_atomics)
> vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>
> vm_bo = drm_gpuvm_bo_obtain(vma->gpuva.vm, &bo->ttm.base);
> @@ -2174,6 +2176,18 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> operation, (ULL)addr, (ULL)range,
> (ULL)bo_offset_or_userptr);
>
> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
> + (vm->xe->info.platform == XE_PVC && !xe_bo_is_vram(bo))) {
> + drm_warn(&vm->xe->drm, "Setting device atomics on SMEM is not supported for this platform");
> + return ERR_PTR(-EINVAL);
> + }
> +
> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
> + !xe_bo_has_single_placement(bo))
> + drm_warn(&vm->xe->drm, "DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS can be only set if the BO has single placement");
> + return ERR_PTR(-EINVAL);
> + }
> +
> switch (operation) {
> case DRM_XE_VM_BIND_OP_MAP:
> case DRM_XE_VM_BIND_OP_MAP_USERPTR:
> @@ -2216,6 +2230,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> if (__op->op == DRM_GPUVA_OP_MAP) {
> op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL;
> op->map.dumpable = flags & DRM_XE_VM_BIND_FLAG_DUMPABLE;
> + op->map.enable_device_atomics = flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS;
> op->map.pat_index = pat_index;
> } else if (__op->op == DRM_GPUVA_OP_PREFETCH) {
> op->prefetch.region = prefetch_region;
> @@ -2412,6 +2427,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
> VMA_CREATE_FLAG_IS_NULL : 0;
> flags |= op->map.dumpable ?
> VMA_CREATE_FLAG_DUMPABLE : 0;
> + flags |= op->map.enable_device_atomics ?
> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>
> vma = new_vma(vm, &op->base.map, op->map.pat_index,
> flags);
> @@ -2439,6 +2456,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
> flags |= op->base.remap.unmap->va->flags &
> XE_VMA_DUMPABLE ?
> VMA_CREATE_FLAG_DUMPABLE : 0;
> + flags |= op->base.remap.unmap->va->flags ?
> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>
> vma = new_vma(vm, op->base.remap.prev,
> old->pat_index, flags);
> @@ -2476,6 +2495,9 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
> flags |= op->base.remap.unmap->va->flags &
> XE_VMA_DUMPABLE ?
> VMA_CREATE_FLAG_DUMPABLE : 0;
> + flags |= op->base.remap.unmap->va->flags ?
> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
> +
>
> vma = new_vma(vm, op->base.remap.next,
> old->pat_index, flags);
> @@ -2831,7 +2853,8 @@ static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
> (DRM_XE_VM_BIND_FLAG_READONLY | \
> DRM_XE_VM_BIND_FLAG_IMMEDIATE | \
> DRM_XE_VM_BIND_FLAG_NULL | \
> - DRM_XE_VM_BIND_FLAG_DUMPABLE)
> + DRM_XE_VM_BIND_FLAG_DUMPABLE | \
> + DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS)
> #define XE_64K_PAGE_MASK 0xffffull
> #define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP)
>
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index badf3945083d..7b9c68909c78 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -282,6 +282,8 @@ struct xe_vma_op_map {
> bool dumpable;
> /** @pat_index: The pat index to use for this operation. */
> u16 pat_index;
> + /** @enable_device_atomics: Whether the VMA will allow device atomics */
> + bool enable_device_atomics;
> };
>
> /** struct xe_vma_op_remap - VMA remap operation */
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index 1446c3bae515..bffe8b1c040c 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -969,10 +969,11 @@ struct drm_xe_vm_bind_op {
> /** @op: Bind operation to perform */
> __u32 op;
>
> -#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
> -#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
> -#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
> -#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
> +#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
> +#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
> +#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
> +#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
We should probably document that it's an error to add this flag if the
BO has a single memory region.
Are we supposed to the ability to set that flag or is there going to be
a query?
A query might make sense since it's going to be rejected for some platform.
-Lionel
> +#define DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS (1 << 4)
> /** @flags: Bind flags */
> __u32 flags;
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/3] drm/xe/uapi: Introduce VMA bind flag for device atomics
2024-04-11 13:14 ` Lionel Landwerlin
@ 2024-04-11 13:32 ` Lionel Landwerlin
2024-04-11 13:42 ` Nirmoy Das
1 sibling, 0 replies; 27+ messages in thread
From: Lionel Landwerlin @ 2024-04-11 13:32 UTC (permalink / raw)
To: Nirmoy Das, intel-xe
On 11/04/2024 16:14, Lionel Landwerlin wrote:
> On 10/04/2024 20:03, Nirmoy Das wrote:
>> Adds a new VMA bind flag to enable device atomics on SMEM only buffers.
>>
>> Given that simultaneous usage of device atomics and CPU atomics on
>> the same SMEM buffer is not guaranteed to function without migration,
>> and UMD expects no migration for SMEM-only buffer objects, so this
>> provide
>> a way to set device atomics when UMD is certain to use the buffer only
>> for device atomics.
>>
>> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_vm.c | 27 +++++++++++++++++++++++++--
>> drivers/gpu/drm/xe/xe_vm_types.h | 2 ++
>> include/uapi/drm/xe_drm.h | 9 +++++----
>> 3 files changed, 32 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index 8f3474c5f480..530b4bbc186c 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -851,6 +851,7 @@ static void xe_vma_free(struct xe_vma *vma)
>> #define VMA_CREATE_FLAG_READ_ONLY BIT(0)
>> #define VMA_CREATE_FLAG_IS_NULL BIT(1)
>> #define VMA_CREATE_FLAG_DUMPABLE BIT(2)
>> +#define VMA_CREATE_FLAG_DEVICE_ATOMICS BIT(3)
>> static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>> struct xe_bo *bo,
>> @@ -864,6 +865,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm
>> *vm,
>> bool read_only = (flags & VMA_CREATE_FLAG_READ_ONLY);
>> bool is_null = (flags & VMA_CREATE_FLAG_IS_NULL);
>> bool dumpable = (flags & VMA_CREATE_FLAG_DUMPABLE);
>> + bool enable_atomics = (flags & VMA_CREATE_FLAG_IS_NULL);
>> xe_assert(vm->xe, start < end);
>> xe_assert(vm->xe, end < vm->size);
>> @@ -912,7 +914,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm
>> *vm,
>> xe_bo_assert_held(bo);
>> if (GRAPHICS_VER(vm->xe) >= 20 || xe_bo_is_vram(bo) ||
>> - !IS_DGFX(vm->xe))
>> + !IS_DGFX(vm->xe) || enable_atomics)
>> vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>> vm_bo = drm_gpuvm_bo_obtain(vma->gpuva.vm, &bo->ttm.base);
>> @@ -2174,6 +2176,18 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm,
>> struct xe_bo *bo,
>> operation, (ULL)addr, (ULL)range,
>> (ULL)bo_offset_or_userptr);
>> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
>> + (vm->xe->info.platform == XE_PVC && !xe_bo_is_vram(bo))) {
>> + drm_warn(&vm->xe->drm, "Setting device atomics on SMEM is
>> not supported for this platform");
>> + return ERR_PTR(-EINVAL);
>> + }
>> +
>> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
>> + !xe_bo_has_single_placement(bo))
>> + drm_warn(&vm->xe->drm, "DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS
>> can be only set if the BO has single placement");
>> + return ERR_PTR(-EINVAL);
>> + }
>> +
>> switch (operation) {
>> case DRM_XE_VM_BIND_OP_MAP:
>> case DRM_XE_VM_BIND_OP_MAP_USERPTR:
>> @@ -2216,6 +2230,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm,
>> struct xe_bo *bo,
>> if (__op->op == DRM_GPUVA_OP_MAP) {
>> op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL;
>> op->map.dumpable = flags & DRM_XE_VM_BIND_FLAG_DUMPABLE;
>> + op->map.enable_device_atomics = flags &
>> DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS;
>> op->map.pat_index = pat_index;
>> } else if (__op->op == DRM_GPUVA_OP_PREFETCH) {
>> op->prefetch.region = prefetch_region;
>> @@ -2412,6 +2427,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm
>> *vm, struct xe_exec_queue *q,
>> VMA_CREATE_FLAG_IS_NULL : 0;
>> flags |= op->map.dumpable ?
>> VMA_CREATE_FLAG_DUMPABLE : 0;
>> + flags |= op->map.enable_device_atomics ?
>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>> vma = new_vma(vm, &op->base.map, op->map.pat_index,
>> flags);
>> @@ -2439,6 +2456,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm
>> *vm, struct xe_exec_queue *q,
>> flags |= op->base.remap.unmap->va->flags &
>> XE_VMA_DUMPABLE ?
>> VMA_CREATE_FLAG_DUMPABLE : 0;
>> + flags |= op->base.remap.unmap->va->flags ?
>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>> vma = new_vma(vm, op->base.remap.prev,
>> old->pat_index, flags);
>> @@ -2476,6 +2495,9 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm
>> *vm, struct xe_exec_queue *q,
>> flags |= op->base.remap.unmap->va->flags &
>> XE_VMA_DUMPABLE ?
>> VMA_CREATE_FLAG_DUMPABLE : 0;
>> + flags |= op->base.remap.unmap->va->flags ?
>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>> +
>> vma = new_vma(vm, op->base.remap.next,
>> old->pat_index, flags);
>> @@ -2831,7 +2853,8 @@ static int vm_bind_ioctl_ops_execute(struct
>> xe_vm *vm,
>> (DRM_XE_VM_BIND_FLAG_READONLY | \
>> DRM_XE_VM_BIND_FLAG_IMMEDIATE | \
>> DRM_XE_VM_BIND_FLAG_NULL | \
>> - DRM_XE_VM_BIND_FLAG_DUMPABLE)
>> + DRM_XE_VM_BIND_FLAG_DUMPABLE | \
>> + DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS)
>> #define XE_64K_PAGE_MASK 0xffffull
>> #define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP)
>> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h
>> b/drivers/gpu/drm/xe/xe_vm_types.h
>> index badf3945083d..7b9c68909c78 100644
>> --- a/drivers/gpu/drm/xe/xe_vm_types.h
>> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
>> @@ -282,6 +282,8 @@ struct xe_vma_op_map {
>> bool dumpable;
>> /** @pat_index: The pat index to use for this operation. */
>> u16 pat_index;
>> + /** @enable_device_atomics: Whether the VMA will allow device
>> atomics */
>> + bool enable_device_atomics;
>> };
>> /** struct xe_vma_op_remap - VMA remap operation */
>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>> index 1446c3bae515..bffe8b1c040c 100644
>> --- a/include/uapi/drm/xe_drm.h
>> +++ b/include/uapi/drm/xe_drm.h
>> @@ -969,10 +969,11 @@ struct drm_xe_vm_bind_op {
>> /** @op: Bind operation to perform */
>> __u32 op;
>> -#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
>> -#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
>> -#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
>> -#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
>> +#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
>> +#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
>> +#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
>> +#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
>
>
> We should probably document that it's an error to add this flag if the
> BO has a single memory region.
>
> Are we supposed to the ability to set that flag or is there going to
> be a query?
typo : Are we supposed to detect the ability to set that flag or is
there going to be a query?
>
> A query might make sense since it's going to be rejected for some
> platform.
>
>
> -Lionel
>
>
>> +#define DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS (1 << 4)
>> /** @flags: Bind flags */
>> __u32 flags;
>
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/3] drm/xe/uapi: Introduce VMA bind flag for device atomics
2024-04-11 13:14 ` Lionel Landwerlin
2024-04-11 13:32 ` Lionel Landwerlin
@ 2024-04-11 13:42 ` Nirmoy Das
2024-04-11 14:00 ` Lionel Landwerlin
1 sibling, 1 reply; 27+ messages in thread
From: Nirmoy Das @ 2024-04-11 13:42 UTC (permalink / raw)
To: Lionel Landwerlin, intel-xe
Hi Lionel,
On 4/11/2024 3:14 PM, Lionel Landwerlin wrote:
> On 10/04/2024 20:03, Nirmoy Das wrote:
>> Adds a new VMA bind flag to enable device atomics on SMEM only buffers.
>>
>> Given that simultaneous usage of device atomics and CPU atomics on
>> the same SMEM buffer is not guaranteed to function without migration,
>> and UMD expects no migration for SMEM-only buffer objects, so this
>> provide
>> a way to set device atomics when UMD is certain to use the buffer only
>> for device atomics.
>>
>> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_vm.c | 27 +++++++++++++++++++++++++--
>> drivers/gpu/drm/xe/xe_vm_types.h | 2 ++
>> include/uapi/drm/xe_drm.h | 9 +++++----
>> 3 files changed, 32 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index 8f3474c5f480..530b4bbc186c 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -851,6 +851,7 @@ static void xe_vma_free(struct xe_vma *vma)
>> #define VMA_CREATE_FLAG_READ_ONLY BIT(0)
>> #define VMA_CREATE_FLAG_IS_NULL BIT(1)
>> #define VMA_CREATE_FLAG_DUMPABLE BIT(2)
>> +#define VMA_CREATE_FLAG_DEVICE_ATOMICS BIT(3)
>> static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>> struct xe_bo *bo,
>> @@ -864,6 +865,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm
>> *vm,
>> bool read_only = (flags & VMA_CREATE_FLAG_READ_ONLY);
>> bool is_null = (flags & VMA_CREATE_FLAG_IS_NULL);
>> bool dumpable = (flags & VMA_CREATE_FLAG_DUMPABLE);
>> + bool enable_atomics = (flags & VMA_CREATE_FLAG_IS_NULL);
>> xe_assert(vm->xe, start < end);
>> xe_assert(vm->xe, end < vm->size);
>> @@ -912,7 +914,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm
>> *vm,
>> xe_bo_assert_held(bo);
>> if (GRAPHICS_VER(vm->xe) >= 20 || xe_bo_is_vram(bo) ||
>> - !IS_DGFX(vm->xe))
>> + !IS_DGFX(vm->xe) || enable_atomics)
>> vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>> vm_bo = drm_gpuvm_bo_obtain(vma->gpuva.vm, &bo->ttm.base);
>> @@ -2174,6 +2176,18 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm,
>> struct xe_bo *bo,
>> operation, (ULL)addr, (ULL)range,
>> (ULL)bo_offset_or_userptr);
>> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
>> + (vm->xe->info.platform == XE_PVC && !xe_bo_is_vram(bo))) {
>> + drm_warn(&vm->xe->drm, "Setting device atomics on SMEM is
>> not supported for this platform");
>> + return ERR_PTR(-EINVAL);
>> + }
>> +
>> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
>> + !xe_bo_has_single_placement(bo))
>> + drm_warn(&vm->xe->drm, "DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS
>> can be only set if the BO has single placement");
>> + return ERR_PTR(-EINVAL);
>> + }
>> +
>> switch (operation) {
>> case DRM_XE_VM_BIND_OP_MAP:
>> case DRM_XE_VM_BIND_OP_MAP_USERPTR:
>> @@ -2216,6 +2230,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm,
>> struct xe_bo *bo,
>> if (__op->op == DRM_GPUVA_OP_MAP) {
>> op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL;
>> op->map.dumpable = flags & DRM_XE_VM_BIND_FLAG_DUMPABLE;
>> + op->map.enable_device_atomics = flags &
>> DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS;
>> op->map.pat_index = pat_index;
>> } else if (__op->op == DRM_GPUVA_OP_PREFETCH) {
>> op->prefetch.region = prefetch_region;
>> @@ -2412,6 +2427,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm
>> *vm, struct xe_exec_queue *q,
>> VMA_CREATE_FLAG_IS_NULL : 0;
>> flags |= op->map.dumpable ?
>> VMA_CREATE_FLAG_DUMPABLE : 0;
>> + flags |= op->map.enable_device_atomics ?
>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>> vma = new_vma(vm, &op->base.map, op->map.pat_index,
>> flags);
>> @@ -2439,6 +2456,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm
>> *vm, struct xe_exec_queue *q,
>> flags |= op->base.remap.unmap->va->flags &
>> XE_VMA_DUMPABLE ?
>> VMA_CREATE_FLAG_DUMPABLE : 0;
>> + flags |= op->base.remap.unmap->va->flags ?
>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>> vma = new_vma(vm, op->base.remap.prev,
>> old->pat_index, flags);
>> @@ -2476,6 +2495,9 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm
>> *vm, struct xe_exec_queue *q,
>> flags |= op->base.remap.unmap->va->flags &
>> XE_VMA_DUMPABLE ?
>> VMA_CREATE_FLAG_DUMPABLE : 0;
>> + flags |= op->base.remap.unmap->va->flags ?
>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>> +
>> vma = new_vma(vm, op->base.remap.next,
>> old->pat_index, flags);
>> @@ -2831,7 +2853,8 @@ static int vm_bind_ioctl_ops_execute(struct
>> xe_vm *vm,
>> (DRM_XE_VM_BIND_FLAG_READONLY | \
>> DRM_XE_VM_BIND_FLAG_IMMEDIATE | \
>> DRM_XE_VM_BIND_FLAG_NULL | \
>> - DRM_XE_VM_BIND_FLAG_DUMPABLE)
>> + DRM_XE_VM_BIND_FLAG_DUMPABLE | \
>> + DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS)
>> #define XE_64K_PAGE_MASK 0xffffull
>> #define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP)
>> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h
>> b/drivers/gpu/drm/xe/xe_vm_types.h
>> index badf3945083d..7b9c68909c78 100644
>> --- a/drivers/gpu/drm/xe/xe_vm_types.h
>> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
>> @@ -282,6 +282,8 @@ struct xe_vma_op_map {
>> bool dumpable;
>> /** @pat_index: The pat index to use for this operation. */
>> u16 pat_index;
>> + /** @enable_device_atomics: Whether the VMA will allow device
>> atomics */
>> + bool enable_device_atomics;
>> };
>> /** struct xe_vma_op_remap - VMA remap operation */
>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>> index 1446c3bae515..bffe8b1c040c 100644
>> --- a/include/uapi/drm/xe_drm.h
>> +++ b/include/uapi/drm/xe_drm.h
>> @@ -969,10 +969,11 @@ struct drm_xe_vm_bind_op {
>> /** @op: Bind operation to perform */
>> __u32 op;
>> -#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
>> -#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
>> -#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
>> -#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
>> +#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
>> +#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
>> +#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
>> +#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
>
>
> We should probably document that it's an error to add this flag if the
> BO has a single memory region.
It is an error if this flag is set on non single memory region as
atomics on SMEM+LMEM buffer will be handled with migration.
Let me know if I got that wrong.
>
> Are we supposed to the ability to set that flag or is there going to
> be a query?
>
> A query might make sense since it's going to be rejected for some
> platform.
Can we use the rejection as a query ? I am using it in a IGT test
https://patchwork.freedesktop.org/patch/588759/?series=132289&rev=1
Regards,
Nirmoy
>
>
> -Lionel
>
>
>> +#define DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS (1 << 4)
>> /** @flags: Bind flags */
>> __u32 flags;
>
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/3] drm/xe/uapi: Introduce VMA bind flag for device atomics
2024-04-11 13:42 ` Nirmoy Das
@ 2024-04-11 14:00 ` Lionel Landwerlin
2024-04-11 14:07 ` Souza, Jose
0 siblings, 1 reply; 27+ messages in thread
From: Lionel Landwerlin @ 2024-04-11 14:00 UTC (permalink / raw)
To: Nirmoy Das, intel-xe
On 11/04/2024 16:42, Nirmoy Das wrote:
> Hi Lionel,
>
> On 4/11/2024 3:14 PM, Lionel Landwerlin wrote:
>> On 10/04/2024 20:03, Nirmoy Das wrote:
>>> Adds a new VMA bind flag to enable device atomics on SMEM only buffers.
>>>
>>> Given that simultaneous usage of device atomics and CPU atomics on
>>> the same SMEM buffer is not guaranteed to function without migration,
>>> and UMD expects no migration for SMEM-only buffer objects, so this
>>> provide
>>> a way to set device atomics when UMD is certain to use the buffer only
>>> for device atomics.
>>>
>>> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
>>> ---
>>> drivers/gpu/drm/xe/xe_vm.c | 27 +++++++++++++++++++++++++--
>>> drivers/gpu/drm/xe/xe_vm_types.h | 2 ++
>>> include/uapi/drm/xe_drm.h | 9 +++++----
>>> 3 files changed, 32 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>>> index 8f3474c5f480..530b4bbc186c 100644
>>> --- a/drivers/gpu/drm/xe/xe_vm.c
>>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>>> @@ -851,6 +851,7 @@ static void xe_vma_free(struct xe_vma *vma)
>>> #define VMA_CREATE_FLAG_READ_ONLY BIT(0)
>>> #define VMA_CREATE_FLAG_IS_NULL BIT(1)
>>> #define VMA_CREATE_FLAG_DUMPABLE BIT(2)
>>> +#define VMA_CREATE_FLAG_DEVICE_ATOMICS BIT(3)
>>> static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>>> struct xe_bo *bo,
>>> @@ -864,6 +865,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm
>>> *vm,
>>> bool read_only = (flags & VMA_CREATE_FLAG_READ_ONLY);
>>> bool is_null = (flags & VMA_CREATE_FLAG_IS_NULL);
>>> bool dumpable = (flags & VMA_CREATE_FLAG_DUMPABLE);
>>> + bool enable_atomics = (flags & VMA_CREATE_FLAG_IS_NULL);
>>> xe_assert(vm->xe, start < end);
>>> xe_assert(vm->xe, end < vm->size);
>>> @@ -912,7 +914,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm
>>> *vm,
>>> xe_bo_assert_held(bo);
>>> if (GRAPHICS_VER(vm->xe) >= 20 || xe_bo_is_vram(bo) ||
>>> - !IS_DGFX(vm->xe))
>>> + !IS_DGFX(vm->xe) || enable_atomics)
>>> vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>>> vm_bo = drm_gpuvm_bo_obtain(vma->gpuva.vm, &bo->ttm.base);
>>> @@ -2174,6 +2176,18 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm,
>>> struct xe_bo *bo,
>>> operation, (ULL)addr, (ULL)range,
>>> (ULL)bo_offset_or_userptr);
>>> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
>>> + (vm->xe->info.platform == XE_PVC && !xe_bo_is_vram(bo))) {
>>> + drm_warn(&vm->xe->drm, "Setting device atomics on SMEM is
>>> not supported for this platform");
>>> + return ERR_PTR(-EINVAL);
>>> + }
>>> +
>>> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
>>> + !xe_bo_has_single_placement(bo))
>>> + drm_warn(&vm->xe->drm, "DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS
>>> can be only set if the BO has single placement");
>>> + return ERR_PTR(-EINVAL);
>>> + }
>>> +
>>> switch (operation) {
>>> case DRM_XE_VM_BIND_OP_MAP:
>>> case DRM_XE_VM_BIND_OP_MAP_USERPTR:
>>> @@ -2216,6 +2230,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm,
>>> struct xe_bo *bo,
>>> if (__op->op == DRM_GPUVA_OP_MAP) {
>>> op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL;
>>> op->map.dumpable = flags & DRM_XE_VM_BIND_FLAG_DUMPABLE;
>>> + op->map.enable_device_atomics = flags &
>>> DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS;
>>> op->map.pat_index = pat_index;
>>> } else if (__op->op == DRM_GPUVA_OP_PREFETCH) {
>>> op->prefetch.region = prefetch_region;
>>> @@ -2412,6 +2427,8 @@ static int vm_bind_ioctl_ops_parse(struct
>>> xe_vm *vm, struct xe_exec_queue *q,
>>> VMA_CREATE_FLAG_IS_NULL : 0;
>>> flags |= op->map.dumpable ?
>>> VMA_CREATE_FLAG_DUMPABLE : 0;
>>> + flags |= op->map.enable_device_atomics ?
>>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>>> vma = new_vma(vm, &op->base.map, op->map.pat_index,
>>> flags);
>>> @@ -2439,6 +2456,8 @@ static int vm_bind_ioctl_ops_parse(struct
>>> xe_vm *vm, struct xe_exec_queue *q,
>>> flags |= op->base.remap.unmap->va->flags &
>>> XE_VMA_DUMPABLE ?
>>> VMA_CREATE_FLAG_DUMPABLE : 0;
>>> + flags |= op->base.remap.unmap->va->flags ?
>>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>>> vma = new_vma(vm, op->base.remap.prev,
>>> old->pat_index, flags);
>>> @@ -2476,6 +2495,9 @@ static int vm_bind_ioctl_ops_parse(struct
>>> xe_vm *vm, struct xe_exec_queue *q,
>>> flags |= op->base.remap.unmap->va->flags &
>>> XE_VMA_DUMPABLE ?
>>> VMA_CREATE_FLAG_DUMPABLE : 0;
>>> + flags |= op->base.remap.unmap->va->flags ?
>>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>>> +
>>> vma = new_vma(vm, op->base.remap.next,
>>> old->pat_index, flags);
>>> @@ -2831,7 +2853,8 @@ static int vm_bind_ioctl_ops_execute(struct
>>> xe_vm *vm,
>>> (DRM_XE_VM_BIND_FLAG_READONLY | \
>>> DRM_XE_VM_BIND_FLAG_IMMEDIATE | \
>>> DRM_XE_VM_BIND_FLAG_NULL | \
>>> - DRM_XE_VM_BIND_FLAG_DUMPABLE)
>>> + DRM_XE_VM_BIND_FLAG_DUMPABLE | \
>>> + DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS)
>>> #define XE_64K_PAGE_MASK 0xffffull
>>> #define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP)
>>> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h
>>> b/drivers/gpu/drm/xe/xe_vm_types.h
>>> index badf3945083d..7b9c68909c78 100644
>>> --- a/drivers/gpu/drm/xe/xe_vm_types.h
>>> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
>>> @@ -282,6 +282,8 @@ struct xe_vma_op_map {
>>> bool dumpable;
>>> /** @pat_index: The pat index to use for this operation. */
>>> u16 pat_index;
>>> + /** @enable_device_atomics: Whether the VMA will allow device
>>> atomics */
>>> + bool enable_device_atomics;
>>> };
>>> /** struct xe_vma_op_remap - VMA remap operation */
>>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>>> index 1446c3bae515..bffe8b1c040c 100644
>>> --- a/include/uapi/drm/xe_drm.h
>>> +++ b/include/uapi/drm/xe_drm.h
>>> @@ -969,10 +969,11 @@ struct drm_xe_vm_bind_op {
>>> /** @op: Bind operation to perform */
>>> __u32 op;
>>> -#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
>>> -#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
>>> -#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
>>> -#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
>>> +#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
>>> +#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
>>> +#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
>>> +#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
>>
>>
>> We should probably document that it's an error to add this flag if
>> the BO has a single memory region.
>
> It is an error if this flag is set on non single memory region as
> atomics on SMEM+LMEM buffer will be handled with migration.
>
> Let me know if I got that wrong.
>
>>
>> Are we supposed to the ability to set that flag or is there going to
>> be a query?
>>
>> A query might make sense since it's going to be rejected for some
>> platform.
>
> Can we use the rejection as a query ? I am using it in a IGT test
> https://patchwork.freedesktop.org/patch/588759/?series=132289&rev=1
>
>
> Regards,
>
> Nirmoy
Okay, sounds doable.
Will try to put up an MR for Mesa soon.
-Lionel
>
>>
>>
>> -Lionel
>>
>>
>>> +#define DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS (1 << 4)
>>> /** @flags: Bind flags */
>>> __u32 flags;
>>
>>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/3] drm/xe/uapi: Introduce VMA bind flag for device atomics
2024-04-10 17:35 ` Matthew Brost
2024-04-11 9:22 ` Nirmoy Das
@ 2024-04-11 14:00 ` Nirmoy Das
1 sibling, 0 replies; 27+ messages in thread
From: Nirmoy Das @ 2024-04-11 14:00 UTC (permalink / raw)
To: Matthew Brost, Nirmoy Das; +Cc: intel-xe
Hi Matt,
Sorry, my previous reply wasn't complete, addressing remaining reviews
here.
On 4/10/2024 7:35 PM, Matthew Brost wrote:
> On Wed, Apr 10, 2024 at 07:03:08PM +0200, Nirmoy Das wrote:
>> Adds a new VMA bind flag to enable device atomics on SMEM only buffers.
>>
>> Given that simultaneous usage of device atomics and CPU atomics on
>> the same SMEM buffer is not guaranteed to function without migration,
>> and UMD expects no migration for SMEM-only buffer objects, so this provide
>> a way to set device atomics when UMD is certain to use the buffer only
>> for device atomics.
>>
> For new uAPI we will need a UMD PR using it and provide a link to the PR
> in the commit message.
>
>> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_vm.c | 27 +++++++++++++++++++++++++--
>> drivers/gpu/drm/xe/xe_vm_types.h | 2 ++
>> include/uapi/drm/xe_drm.h | 9 +++++----
>> 3 files changed, 32 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index 8f3474c5f480..530b4bbc186c 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -851,6 +851,7 @@ static void xe_vma_free(struct xe_vma *vma)
>> #define VMA_CREATE_FLAG_READ_ONLY BIT(0)
>> #define VMA_CREATE_FLAG_IS_NULL BIT(1)
>> #define VMA_CREATE_FLAG_DUMPABLE BIT(2)
>> +#define VMA_CREATE_FLAG_DEVICE_ATOMICS BIT(3)
>>
>> static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>> struct xe_bo *bo,
>> @@ -864,6 +865,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>> bool read_only = (flags & VMA_CREATE_FLAG_READ_ONLY);
>> bool is_null = (flags & VMA_CREATE_FLAG_IS_NULL);
>> bool dumpable = (flags & VMA_CREATE_FLAG_DUMPABLE);
>> + bool enable_atomics = (flags & VMA_CREATE_FLAG_IS_NULL);
> s/flags & VMA_CREATE_FLAG_IS_NULL/flags & VMA_CREATE_FLAG_DEVICE_ATOMICS/
Strange that the test worked with this wrong change. Probably made this
mistake later on. I will fix it.
>
>>
>> xe_assert(vm->xe, start < end);
>> xe_assert(vm->xe, end < vm->size);
>> @@ -912,7 +914,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>> xe_bo_assert_held(bo);
>>
>> if (GRAPHICS_VER(vm->xe) >= 20 || xe_bo_is_vram(bo) ||
>> - !IS_DGFX(vm->xe))
>> + !IS_DGFX(vm->xe) || enable_atomics)
>> vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>>
>> vm_bo = drm_gpuvm_bo_obtain(vma->gpuva.vm, &bo->ttm.base);
>> @@ -2174,6 +2176,18 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>> operation, (ULL)addr, (ULL)range,
>> (ULL)bo_offset_or_userptr);
>>
>> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
>> + (vm->xe->info.platform == XE_PVC && !xe_bo_is_vram(bo))) {
>> + drm_warn(&vm->xe->drm, "Setting device atomics on SMEM is not supported for this platform");
>> + return ERR_PTR(-EINVAL);
>> + }
>> +
>> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
>> + !xe_bo_has_single_placement(bo))
>> + drm_warn(&vm->xe->drm, "DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS can be only set if the BO has single placement");
>> + return ERR_PTR(-EINVAL);
>> + }
>> +
> A few things here.
>
> - These check should go in xe_vm_bind_ioctl() in the loop that looks up
> the BOs and parses them. Probably move implementation to a helper(s)
> too as it is already a pretty big ugly loop in xe_vm_bind_ioctl and
> adding more logic will make it uglier. Search for
> drm_gem_object_lookup in xe_vm_bind_ioctl() for the loop I am refering
> to.
>
> - Rather than drm_warns just use XE_IOCTL_DBG for these checks, that is
> the style or input to IOCTLs when we return -EINVAL in Xe.
I will do that, thanks for pointing out.
>
> - Are these checks valid? At one point on PVC I had to code to do
> atomics between CPU and GPU by migrating the BO back and forth on page
> faults. I think the i915 does this too? Are we abandoning that idea?
Currently we only set PTE_AE on LMEM for all dgfx. With this flag, we
can set that pte bit for SMEM only BO
which isn't valid for PVC as you pointed out above. UMD expects no
migration for SMEM-only so on PVC, we can't set this
bit for SMEM only BO which is why I am returning an error.
>
> - Lastly if these check are valid, rather than a platform check in the
> code I'd rather see a bit in intel_device_info.
Ok makes sense. I will add one.
>
>> switch (operation) {
>> case DRM_XE_VM_BIND_OP_MAP:
>> case DRM_XE_VM_BIND_OP_MAP_USERPTR:
>> @@ -2216,6 +2230,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>> if (__op->op == DRM_GPUVA_OP_MAP) {
>> op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL;
>> op->map.dumpable = flags & DRM_XE_VM_BIND_FLAG_DUMPABLE;
>> + op->map.enable_device_atomics = flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS;
>> op->map.pat_index = pat_index;
>> } else if (__op->op == DRM_GPUVA_OP_PREFETCH) {
>> op->prefetch.region = prefetch_region;
>> @@ -2412,6 +2427,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>> VMA_CREATE_FLAG_IS_NULL : 0;
>> flags |= op->map.dumpable ?
>> VMA_CREATE_FLAG_DUMPABLE : 0;
>> + flags |= op->map.enable_device_atomics ?
>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>>
>> vma = new_vma(vm, &op->base.map, op->map.pat_index,
>> flags);
>> @@ -2439,6 +2456,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>> flags |= op->base.remap.unmap->va->flags &
>> XE_VMA_DUMPABLE ?
>> VMA_CREATE_FLAG_DUMPABLE : 0;
>> + flags |= op->base.remap.unmap->va->flags ?
>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>>
>> vma = new_vma(vm, op->base.remap.prev,
>> old->pat_index, flags);
>> @@ -2476,6 +2495,9 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>> flags |= op->base.remap.unmap->va->flags &
>> XE_VMA_DUMPABLE ?
>> VMA_CREATE_FLAG_DUMPABLE : 0;
>> + flags |= op->base.remap.unmap->va->flags ?
>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>> +
> Extra newline.
>
>>
>> vma = new_vma(vm, op->base.remap.next,
>> old->pat_index, flags);
>> @@ -2831,7 +2853,8 @@ static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
>> (DRM_XE_VM_BIND_FLAG_READONLY | \
>> DRM_XE_VM_BIND_FLAG_IMMEDIATE | \
>> DRM_XE_VM_BIND_FLAG_NULL | \
>> - DRM_XE_VM_BIND_FLAG_DUMPABLE)
>> + DRM_XE_VM_BIND_FLAG_DUMPABLE | \
>> + DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS)
>> #define XE_64K_PAGE_MASK 0xffffull
>> #define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
>> index badf3945083d..7b9c68909c78 100644
>> --- a/drivers/gpu/drm/xe/xe_vm_types.h
>> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
>> @@ -282,6 +282,8 @@ struct xe_vma_op_map {
>> bool dumpable;
>> /** @pat_index: The pat index to use for this operation. */
>> u16 pat_index;
>> + /** @enable_device_atomics: Whether the VMA will allow device atomics */
>> + bool enable_device_atomics;
> Put this next to the other bools in xe_vma_op_map.
Will do that.
>
>> };
>>
>> /** struct xe_vma_op_remap - VMA remap operation */
>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>> index 1446c3bae515..bffe8b1c040c 100644
>> --- a/include/uapi/drm/xe_drm.h
>> +++ b/include/uapi/drm/xe_drm.h
>> @@ -969,10 +969,11 @@ struct drm_xe_vm_bind_op {
>> /** @op: Bind operation to perform */
>> __u32 op;
>>
>> -#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
>> -#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
>> -#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
>> -#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
>> +#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
>> +#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
>> +#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
>> +#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
>> +#define DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS (1 << 4)
> Kernel doc for new flag.
Will add in my next revision .
Thanks,
Nirmoy
>
> Matt
>
>> /** @flags: Bind flags */
>> __u32 flags;
>>
>> --
>> 2.42.0
>>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/3] drm/xe/uapi: Introduce VMA bind flag for device atomics
2024-04-11 14:00 ` Lionel Landwerlin
@ 2024-04-11 14:07 ` Souza, Jose
2024-04-11 14:32 ` Nirmoy Das
0 siblings, 1 reply; 27+ messages in thread
From: Souza, Jose @ 2024-04-11 14:07 UTC (permalink / raw)
To: Das, Nirmoy, intel-xe@lists.freedesktop.org, Landwerlin, Lionel G
On Thu, 2024-04-11 at 17:00 +0300, Lionel Landwerlin wrote:
> On 11/04/2024 16:42, Nirmoy Das wrote:
> > Hi Lionel,
> >
> > On 4/11/2024 3:14 PM, Lionel Landwerlin wrote:
> > > On 10/04/2024 20:03, Nirmoy Das wrote:
> > > > Adds a new VMA bind flag to enable device atomics on SMEM only buffers.
> > > >
> > > > Given that simultaneous usage of device atomics and CPU atomics on
> > > > the same SMEM buffer is not guaranteed to function without migration,
> > > > and UMD expects no migration for SMEM-only buffer objects, so this
> > > > provide
> > > > a way to set device atomics when UMD is certain to use the buffer only
> > > > for device atomics.
> > > >
> > > > Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> > > > ---
> > > > drivers/gpu/drm/xe/xe_vm.c | 27 +++++++++++++++++++++++++--
> > > > drivers/gpu/drm/xe/xe_vm_types.h | 2 ++
> > > > include/uapi/drm/xe_drm.h | 9 +++++----
> > > > 3 files changed, 32 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > > > index 8f3474c5f480..530b4bbc186c 100644
> > > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > > @@ -851,6 +851,7 @@ static void xe_vma_free(struct xe_vma *vma)
> > > > #define VMA_CREATE_FLAG_READ_ONLY BIT(0)
> > > > #define VMA_CREATE_FLAG_IS_NULL BIT(1)
> > > > #define VMA_CREATE_FLAG_DUMPABLE BIT(2)
> > > > +#define VMA_CREATE_FLAG_DEVICE_ATOMICS BIT(3)
> > > > static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> > > > struct xe_bo *bo,
> > > > @@ -864,6 +865,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm
> > > > *vm,
> > > > bool read_only = (flags & VMA_CREATE_FLAG_READ_ONLY);
> > > > bool is_null = (flags & VMA_CREATE_FLAG_IS_NULL);
> > > > bool dumpable = (flags & VMA_CREATE_FLAG_DUMPABLE);
> > > > + bool enable_atomics = (flags & VMA_CREATE_FLAG_IS_NULL);
> > > > xe_assert(vm->xe, start < end);
> > > > xe_assert(vm->xe, end < vm->size);
> > > > @@ -912,7 +914,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm
> > > > *vm,
> > > > xe_bo_assert_held(bo);
> > > > if (GRAPHICS_VER(vm->xe) >= 20 || xe_bo_is_vram(bo) ||
> > > > - !IS_DGFX(vm->xe))
> > > > + !IS_DGFX(vm->xe) || enable_atomics)
> > > > vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
> > > > vm_bo = drm_gpuvm_bo_obtain(vma->gpuva.vm, &bo->ttm.base);
> > > > @@ -2174,6 +2176,18 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm,
> > > > struct xe_bo *bo,
> > > > operation, (ULL)addr, (ULL)range,
> > > > (ULL)bo_offset_or_userptr);
> > > > + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
> > > > + (vm->xe->info.platform == XE_PVC && !xe_bo_is_vram(bo))) {
> > > > + drm_warn(&vm->xe->drm, "Setting device atomics on SMEM is
> > > > not supported for this platform");
> > > > + return ERR_PTR(-EINVAL);
> > > > + }
> > > > +
> > > > + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
> > > > + !xe_bo_has_single_placement(bo))
> > > > + drm_warn(&vm->xe->drm, "DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS
> > > > can be only set if the BO has single placement");
> > > > + return ERR_PTR(-EINVAL);
> > > > + }
> > > > +
> > > > switch (operation) {
> > > > case DRM_XE_VM_BIND_OP_MAP:
> > > > case DRM_XE_VM_BIND_OP_MAP_USERPTR:
> > > > @@ -2216,6 +2230,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm,
> > > > struct xe_bo *bo,
> > > > if (__op->op == DRM_GPUVA_OP_MAP) {
> > > > op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL;
> > > > op->map.dumpable = flags & DRM_XE_VM_BIND_FLAG_DUMPABLE;
> > > > + op->map.enable_device_atomics = flags &
> > > > DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS;
> > > > op->map.pat_index = pat_index;
> > > > } else if (__op->op == DRM_GPUVA_OP_PREFETCH) {
> > > > op->prefetch.region = prefetch_region;
> > > > @@ -2412,6 +2427,8 @@ static int vm_bind_ioctl_ops_parse(struct
> > > > xe_vm *vm, struct xe_exec_queue *q,
> > > > VMA_CREATE_FLAG_IS_NULL : 0;
> > > > flags |= op->map.dumpable ?
> > > > VMA_CREATE_FLAG_DUMPABLE : 0;
> > > > + flags |= op->map.enable_device_atomics ?
> > > > + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
> > > > vma = new_vma(vm, &op->base.map, op->map.pat_index,
> > > > flags);
> > > > @@ -2439,6 +2456,8 @@ static int vm_bind_ioctl_ops_parse(struct
> > > > xe_vm *vm, struct xe_exec_queue *q,
> > > > flags |= op->base.remap.unmap->va->flags &
> > > > XE_VMA_DUMPABLE ?
> > > > VMA_CREATE_FLAG_DUMPABLE : 0;
> > > > + flags |= op->base.remap.unmap->va->flags ?
> > > > + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
> > > > vma = new_vma(vm, op->base.remap.prev,
> > > > old->pat_index, flags);
> > > > @@ -2476,6 +2495,9 @@ static int vm_bind_ioctl_ops_parse(struct
> > > > xe_vm *vm, struct xe_exec_queue *q,
> > > > flags |= op->base.remap.unmap->va->flags &
> > > > XE_VMA_DUMPABLE ?
> > > > VMA_CREATE_FLAG_DUMPABLE : 0;
> > > > + flags |= op->base.remap.unmap->va->flags ?
> > > > + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
> > > > +
> > > > vma = new_vma(vm, op->base.remap.next,
> > > > old->pat_index, flags);
> > > > @@ -2831,7 +2853,8 @@ static int vm_bind_ioctl_ops_execute(struct
> > > > xe_vm *vm,
> > > > (DRM_XE_VM_BIND_FLAG_READONLY | \
> > > > DRM_XE_VM_BIND_FLAG_IMMEDIATE | \
> > > > DRM_XE_VM_BIND_FLAG_NULL | \
> > > > - DRM_XE_VM_BIND_FLAG_DUMPABLE)
> > > > + DRM_XE_VM_BIND_FLAG_DUMPABLE | \
> > > > + DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS)
> > > > #define XE_64K_PAGE_MASK 0xffffull
> > > > #define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP)
> > > > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h
> > > > b/drivers/gpu/drm/xe/xe_vm_types.h
> > > > index badf3945083d..7b9c68909c78 100644
> > > > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > > > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > > > @@ -282,6 +282,8 @@ struct xe_vma_op_map {
> > > > bool dumpable;
> > > > /** @pat_index: The pat index to use for this operation. */
> > > > u16 pat_index;
> > > > + /** @enable_device_atomics: Whether the VMA will allow device
> > > > atomics */
> > > > + bool enable_device_atomics;
> > > > };
> > > > /** struct xe_vma_op_remap - VMA remap operation */
> > > > diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> > > > index 1446c3bae515..bffe8b1c040c 100644
> > > > --- a/include/uapi/drm/xe_drm.h
> > > > +++ b/include/uapi/drm/xe_drm.h
> > > > @@ -969,10 +969,11 @@ struct drm_xe_vm_bind_op {
> > > > /** @op: Bind operation to perform */
> > > > __u32 op;
> > > > -#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
> > > > -#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
> > > > -#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
> > > > -#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
> > > > +#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
> > > > +#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
> > > > +#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
> > > > +#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
> > >
> > >
> > > We should probably document that it's an error to add this flag if
> > > the BO has a single memory region.
> >
> > It is an error if this flag is set on non single memory region as
> > atomics on SMEM+LMEM buffer will be handled with migration.
> >
> > Let me know if I got that wrong.
> >
> > >
> > > Are we supposed to the ability to set that flag or is there going to
> > > be a query?
> > >
> > > A query might make sense since it's going to be rejected for some
> > > platform.
> >
> > Can we use the rejection as a query ? I am using it in a IGT test
> > https://patchwork.freedesktop.org/patch/588759/?series=132289&rev=1
> >
> >
That was not acceptable for DRM_XE_VM_BIND_FLAG_DUMPABLE. At the end we were able to land DRM_XE_VM_BIND_FLAG_DUMPABLE in Linux 6.8 so we did not
needed the query. But for this one will be needed.
> > Regards,
> >
> > Nirmoy
>
>
> Okay, sounds doable.
>
> Will try to put up an MR for Mesa soon.
>
>
> -Lionel
>
>
> >
> > >
> > >
> > > -Lionel
> > >
> > >
> > > > +#define DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS (1 << 4)
> > > > /** @flags: Bind flags */
> > > > __u32 flags;
> > >
> > >
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/3] drm/xe/uapi: Introduce VMA bind flag for device atomics
2024-04-11 14:07 ` Souza, Jose
@ 2024-04-11 14:32 ` Nirmoy Das
2024-04-11 14:40 ` Souza, Jose
0 siblings, 1 reply; 27+ messages in thread
From: Nirmoy Das @ 2024-04-11 14:32 UTC (permalink / raw)
To: Souza, Jose, Das, Nirmoy, intel-xe@lists.freedesktop.org,
Landwerlin, Lionel G
Hi Jose,
On 4/11/2024 4:07 PM, Souza, Jose wrote:
> On Thu, 2024-04-11 at 17:00 +0300, Lionel Landwerlin wrote:
>> On 11/04/2024 16:42, Nirmoy Das wrote:
>>> Hi Lionel,
>>>
>>> On 4/11/2024 3:14 PM, Lionel Landwerlin wrote:
>>>> On 10/04/2024 20:03, Nirmoy Das wrote:
>>>>> Adds a new VMA bind flag to enable device atomics on SMEM only buffers.
>>>>>
>>>>> Given that simultaneous usage of device atomics and CPU atomics on
>>>>> the same SMEM buffer is not guaranteed to function without migration,
>>>>> and UMD expects no migration for SMEM-only buffer objects, so this
>>>>> provide
>>>>> a way to set device atomics when UMD is certain to use the buffer only
>>>>> for device atomics.
>>>>>
>>>>> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
>>>>> ---
>>>>> drivers/gpu/drm/xe/xe_vm.c | 27 +++++++++++++++++++++++++--
>>>>> drivers/gpu/drm/xe/xe_vm_types.h | 2 ++
>>>>> include/uapi/drm/xe_drm.h | 9 +++++----
>>>>> 3 files changed, 32 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>>>>> index 8f3474c5f480..530b4bbc186c 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_vm.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>>>>> @@ -851,6 +851,7 @@ static void xe_vma_free(struct xe_vma *vma)
>>>>> #define VMA_CREATE_FLAG_READ_ONLY BIT(0)
>>>>> #define VMA_CREATE_FLAG_IS_NULL BIT(1)
>>>>> #define VMA_CREATE_FLAG_DUMPABLE BIT(2)
>>>>> +#define VMA_CREATE_FLAG_DEVICE_ATOMICS BIT(3)
>>>>> static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>>>>> struct xe_bo *bo,
>>>>> @@ -864,6 +865,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm
>>>>> *vm,
>>>>> bool read_only = (flags & VMA_CREATE_FLAG_READ_ONLY);
>>>>> bool is_null = (flags & VMA_CREATE_FLAG_IS_NULL);
>>>>> bool dumpable = (flags & VMA_CREATE_FLAG_DUMPABLE);
>>>>> + bool enable_atomics = (flags & VMA_CREATE_FLAG_IS_NULL);
>>>>> xe_assert(vm->xe, start < end);
>>>>> xe_assert(vm->xe, end < vm->size);
>>>>> @@ -912,7 +914,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm
>>>>> *vm,
>>>>> xe_bo_assert_held(bo);
>>>>> if (GRAPHICS_VER(vm->xe) >= 20 || xe_bo_is_vram(bo) ||
>>>>> - !IS_DGFX(vm->xe))
>>>>> + !IS_DGFX(vm->xe) || enable_atomics)
>>>>> vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>>>>> vm_bo = drm_gpuvm_bo_obtain(vma->gpuva.vm, &bo->ttm.base);
>>>>> @@ -2174,6 +2176,18 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm,
>>>>> struct xe_bo *bo,
>>>>> operation, (ULL)addr, (ULL)range,
>>>>> (ULL)bo_offset_or_userptr);
>>>>> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
>>>>> + (vm->xe->info.platform == XE_PVC && !xe_bo_is_vram(bo))) {
>>>>> + drm_warn(&vm->xe->drm, "Setting device atomics on SMEM is
>>>>> not supported for this platform");
>>>>> + return ERR_PTR(-EINVAL);
>>>>> + }
>>>>> +
>>>>> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
>>>>> + !xe_bo_has_single_placement(bo))
>>>>> + drm_warn(&vm->xe->drm, "DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS
>>>>> can be only set if the BO has single placement");
>>>>> + return ERR_PTR(-EINVAL);
>>>>> + }
>>>>> +
>>>>> switch (operation) {
>>>>> case DRM_XE_VM_BIND_OP_MAP:
>>>>> case DRM_XE_VM_BIND_OP_MAP_USERPTR:
>>>>> @@ -2216,6 +2230,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm,
>>>>> struct xe_bo *bo,
>>>>> if (__op->op == DRM_GPUVA_OP_MAP) {
>>>>> op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL;
>>>>> op->map.dumpable = flags & DRM_XE_VM_BIND_FLAG_DUMPABLE;
>>>>> + op->map.enable_device_atomics = flags &
>>>>> DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS;
>>>>> op->map.pat_index = pat_index;
>>>>> } else if (__op->op == DRM_GPUVA_OP_PREFETCH) {
>>>>> op->prefetch.region = prefetch_region;
>>>>> @@ -2412,6 +2427,8 @@ static int vm_bind_ioctl_ops_parse(struct
>>>>> xe_vm *vm, struct xe_exec_queue *q,
>>>>> VMA_CREATE_FLAG_IS_NULL : 0;
>>>>> flags |= op->map.dumpable ?
>>>>> VMA_CREATE_FLAG_DUMPABLE : 0;
>>>>> + flags |= op->map.enable_device_atomics ?
>>>>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>>>>> vma = new_vma(vm, &op->base.map, op->map.pat_index,
>>>>> flags);
>>>>> @@ -2439,6 +2456,8 @@ static int vm_bind_ioctl_ops_parse(struct
>>>>> xe_vm *vm, struct xe_exec_queue *q,
>>>>> flags |= op->base.remap.unmap->va->flags &
>>>>> XE_VMA_DUMPABLE ?
>>>>> VMA_CREATE_FLAG_DUMPABLE : 0;
>>>>> + flags |= op->base.remap.unmap->va->flags ?
>>>>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>>>>> vma = new_vma(vm, op->base.remap.prev,
>>>>> old->pat_index, flags);
>>>>> @@ -2476,6 +2495,9 @@ static int vm_bind_ioctl_ops_parse(struct
>>>>> xe_vm *vm, struct xe_exec_queue *q,
>>>>> flags |= op->base.remap.unmap->va->flags &
>>>>> XE_VMA_DUMPABLE ?
>>>>> VMA_CREATE_FLAG_DUMPABLE : 0;
>>>>> + flags |= op->base.remap.unmap->va->flags ?
>>>>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>>>>> +
>>>>> vma = new_vma(vm, op->base.remap.next,
>>>>> old->pat_index, flags);
>>>>> @@ -2831,7 +2853,8 @@ static int vm_bind_ioctl_ops_execute(struct
>>>>> xe_vm *vm,
>>>>> (DRM_XE_VM_BIND_FLAG_READONLY | \
>>>>> DRM_XE_VM_BIND_FLAG_IMMEDIATE | \
>>>>> DRM_XE_VM_BIND_FLAG_NULL | \
>>>>> - DRM_XE_VM_BIND_FLAG_DUMPABLE)
>>>>> + DRM_XE_VM_BIND_FLAG_DUMPABLE | \
>>>>> + DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS)
>>>>> #define XE_64K_PAGE_MASK 0xffffull
>>>>> #define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP)
>>>>> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h
>>>>> b/drivers/gpu/drm/xe/xe_vm_types.h
>>>>> index badf3945083d..7b9c68909c78 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_vm_types.h
>>>>> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
>>>>> @@ -282,6 +282,8 @@ struct xe_vma_op_map {
>>>>> bool dumpable;
>>>>> /** @pat_index: The pat index to use for this operation. */
>>>>> u16 pat_index;
>>>>> + /** @enable_device_atomics: Whether the VMA will allow device
>>>>> atomics */
>>>>> + bool enable_device_atomics;
>>>>> };
>>>>> /** struct xe_vma_op_remap - VMA remap operation */
>>>>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>>>>> index 1446c3bae515..bffe8b1c040c 100644
>>>>> --- a/include/uapi/drm/xe_drm.h
>>>>> +++ b/include/uapi/drm/xe_drm.h
>>>>> @@ -969,10 +969,11 @@ struct drm_xe_vm_bind_op {
>>>>> /** @op: Bind operation to perform */
>>>>> __u32 op;
>>>>> -#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
>>>>> -#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
>>>>> -#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
>>>>> -#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
>>>>> +#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
>>>>> +#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
>>>>> +#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
>>>>> +#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
>>>>
>>>> We should probably document that it's an error to add this flag if
>>>> the BO has a single memory region.
>>> It is an error if this flag is set on non single memory region as
>>> atomics on SMEM+LMEM buffer will be handled with migration.
>>>
>>> Let me know if I got that wrong.
>>>
>>>> Are we supposed to the ability to set that flag or is there going to
>>>> be a query?
>>>>
>>>> A query might make sense since it's going to be rejected for some
>>>> platform.
>>> Can we use the rejection as a query ? I am using it in a IGT test
>>> https://patchwork.freedesktop.org/patch/588759/?series=132289&rev=1
>>>
>>>
> That was not acceptable for DRM_XE_VM_BIND_FLAG_DUMPABLE. At the end we were able to land DRM_XE_VM_BIND_FLAG_DUMPABLE in Linux 6.8 so we did not
> needed the query. But for this one will be needed.
If the rejection can't work as query, I can add one. Would
adding(DRM_XE_QUERY_CONFIG_SUPP_DEV_ATOMIC_ON_SMEM) a new flag into
'struct drm_xe_query_config' work ?
Regards,
Nirmoy
>
>
>>> Regards,
>>>
>>> Nirmoy
>>
>> Okay, sounds doable.
>>
>> Will try to put up an MR for Mesa soon.
>>
>>
>> -Lionel
>>
>>
>>>>
>>>> -Lionel
>>>>
>>>>
>>>>> +#define DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS (1 << 4)
>>>>> /** @flags: Bind flags */
>>>>> __u32 flags;
>>>>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/3] drm/xe/uapi: Introduce VMA bind flag for device atomics
2024-04-11 14:32 ` Nirmoy Das
@ 2024-04-11 14:40 ` Souza, Jose
2024-04-11 14:54 ` Nirmoy Das
0 siblings, 1 reply; 27+ messages in thread
From: Souza, Jose @ 2024-04-11 14:40 UTC (permalink / raw)
To: Das, Nirmoy, intel-xe@lists.freedesktop.org,
nirmoy.das@linux.intel.com, Landwerlin, Lionel G
On Thu, 2024-04-11 at 16:32 +0200, Nirmoy Das wrote:
> Hi Jose,
>
> On 4/11/2024 4:07 PM, Souza, Jose wrote:
> > On Thu, 2024-04-11 at 17:00 +0300, Lionel Landwerlin wrote:
> > > On 11/04/2024 16:42, Nirmoy Das wrote:
> > > > Hi Lionel,
> > > >
> > > > On 4/11/2024 3:14 PM, Lionel Landwerlin wrote:
> > > > > On 10/04/2024 20:03, Nirmoy Das wrote:
> > > > > > Adds a new VMA bind flag to enable device atomics on SMEM only buffers.
> > > > > >
> > > > > > Given that simultaneous usage of device atomics and CPU atomics on
> > > > > > the same SMEM buffer is not guaranteed to function without migration,
> > > > > > and UMD expects no migration for SMEM-only buffer objects, so this
> > > > > > provide
> > > > > > a way to set device atomics when UMD is certain to use the buffer only
> > > > > > for device atomics.
> > > > > >
> > > > > > Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> > > > > > ---
> > > > > > drivers/gpu/drm/xe/xe_vm.c | 27 +++++++++++++++++++++++++--
> > > > > > drivers/gpu/drm/xe/xe_vm_types.h | 2 ++
> > > > > > include/uapi/drm/xe_drm.h | 9 +++++----
> > > > > > 3 files changed, 32 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > > > > > index 8f3474c5f480..530b4bbc186c 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > > > > @@ -851,6 +851,7 @@ static void xe_vma_free(struct xe_vma *vma)
> > > > > > #define VMA_CREATE_FLAG_READ_ONLY BIT(0)
> > > > > > #define VMA_CREATE_FLAG_IS_NULL BIT(1)
> > > > > > #define VMA_CREATE_FLAG_DUMPABLE BIT(2)
> > > > > > +#define VMA_CREATE_FLAG_DEVICE_ATOMICS BIT(3)
> > > > > > static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> > > > > > struct xe_bo *bo,
> > > > > > @@ -864,6 +865,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm
> > > > > > *vm,
> > > > > > bool read_only = (flags & VMA_CREATE_FLAG_READ_ONLY);
> > > > > > bool is_null = (flags & VMA_CREATE_FLAG_IS_NULL);
> > > > > > bool dumpable = (flags & VMA_CREATE_FLAG_DUMPABLE);
> > > > > > + bool enable_atomics = (flags & VMA_CREATE_FLAG_IS_NULL);
> > > > > > xe_assert(vm->xe, start < end);
> > > > > > xe_assert(vm->xe, end < vm->size);
> > > > > > @@ -912,7 +914,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm
> > > > > > *vm,
> > > > > > xe_bo_assert_held(bo);
> > > > > > if (GRAPHICS_VER(vm->xe) >= 20 || xe_bo_is_vram(bo) ||
> > > > > > - !IS_DGFX(vm->xe))
> > > > > > + !IS_DGFX(vm->xe) || enable_atomics)
> > > > > > vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
> > > > > > vm_bo = drm_gpuvm_bo_obtain(vma->gpuva.vm, &bo->ttm.base);
> > > > > > @@ -2174,6 +2176,18 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm,
> > > > > > struct xe_bo *bo,
> > > > > > operation, (ULL)addr, (ULL)range,
> > > > > > (ULL)bo_offset_or_userptr);
> > > > > > + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
> > > > > > + (vm->xe->info.platform == XE_PVC && !xe_bo_is_vram(bo))) {
> > > > > > + drm_warn(&vm->xe->drm, "Setting device atomics on SMEM is
> > > > > > not supported for this platform");
> > > > > > + return ERR_PTR(-EINVAL);
> > > > > > + }
> > > > > > +
> > > > > > + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
> > > > > > + !xe_bo_has_single_placement(bo))
> > > > > > + drm_warn(&vm->xe->drm, "DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS
> > > > > > can be only set if the BO has single placement");
> > > > > > + return ERR_PTR(-EINVAL);
> > > > > > + }
> > > > > > +
> > > > > > switch (operation) {
> > > > > > case DRM_XE_VM_BIND_OP_MAP:
> > > > > > case DRM_XE_VM_BIND_OP_MAP_USERPTR:
> > > > > > @@ -2216,6 +2230,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm,
> > > > > > struct xe_bo *bo,
> > > > > > if (__op->op == DRM_GPUVA_OP_MAP) {
> > > > > > op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL;
> > > > > > op->map.dumpable = flags & DRM_XE_VM_BIND_FLAG_DUMPABLE;
> > > > > > + op->map.enable_device_atomics = flags &
> > > > > > DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS;
> > > > > > op->map.pat_index = pat_index;
> > > > > > } else if (__op->op == DRM_GPUVA_OP_PREFETCH) {
> > > > > > op->prefetch.region = prefetch_region;
> > > > > > @@ -2412,6 +2427,8 @@ static int vm_bind_ioctl_ops_parse(struct
> > > > > > xe_vm *vm, struct xe_exec_queue *q,
> > > > > > VMA_CREATE_FLAG_IS_NULL : 0;
> > > > > > flags |= op->map.dumpable ?
> > > > > > VMA_CREATE_FLAG_DUMPABLE : 0;
> > > > > > + flags |= op->map.enable_device_atomics ?
> > > > > > + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
> > > > > > vma = new_vma(vm, &op->base.map, op->map.pat_index,
> > > > > > flags);
> > > > > > @@ -2439,6 +2456,8 @@ static int vm_bind_ioctl_ops_parse(struct
> > > > > > xe_vm *vm, struct xe_exec_queue *q,
> > > > > > flags |= op->base.remap.unmap->va->flags &
> > > > > > XE_VMA_DUMPABLE ?
> > > > > > VMA_CREATE_FLAG_DUMPABLE : 0;
> > > > > > + flags |= op->base.remap.unmap->va->flags ?
> > > > > > + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
> > > > > > vma = new_vma(vm, op->base.remap.prev,
> > > > > > old->pat_index, flags);
> > > > > > @@ -2476,6 +2495,9 @@ static int vm_bind_ioctl_ops_parse(struct
> > > > > > xe_vm *vm, struct xe_exec_queue *q,
> > > > > > flags |= op->base.remap.unmap->va->flags &
> > > > > > XE_VMA_DUMPABLE ?
> > > > > > VMA_CREATE_FLAG_DUMPABLE : 0;
> > > > > > + flags |= op->base.remap.unmap->va->flags ?
> > > > > > + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
> > > > > > +
> > > > > > vma = new_vma(vm, op->base.remap.next,
> > > > > > old->pat_index, flags);
> > > > > > @@ -2831,7 +2853,8 @@ static int vm_bind_ioctl_ops_execute(struct
> > > > > > xe_vm *vm,
> > > > > > (DRM_XE_VM_BIND_FLAG_READONLY | \
> > > > > > DRM_XE_VM_BIND_FLAG_IMMEDIATE | \
> > > > > > DRM_XE_VM_BIND_FLAG_NULL | \
> > > > > > - DRM_XE_VM_BIND_FLAG_DUMPABLE)
> > > > > > + DRM_XE_VM_BIND_FLAG_DUMPABLE | \
> > > > > > + DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS)
> > > > > > #define XE_64K_PAGE_MASK 0xffffull
> > > > > > #define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP)
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h
> > > > > > b/drivers/gpu/drm/xe/xe_vm_types.h
> > > > > > index badf3945083d..7b9c68909c78 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > > > > > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > > > > > @@ -282,6 +282,8 @@ struct xe_vma_op_map {
> > > > > > bool dumpable;
> > > > > > /** @pat_index: The pat index to use for this operation. */
> > > > > > u16 pat_index;
> > > > > > + /** @enable_device_atomics: Whether the VMA will allow device
> > > > > > atomics */
> > > > > > + bool enable_device_atomics;
> > > > > > };
> > > > > > /** struct xe_vma_op_remap - VMA remap operation */
> > > > > > diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> > > > > > index 1446c3bae515..bffe8b1c040c 100644
> > > > > > --- a/include/uapi/drm/xe_drm.h
> > > > > > +++ b/include/uapi/drm/xe_drm.h
> > > > > > @@ -969,10 +969,11 @@ struct drm_xe_vm_bind_op {
> > > > > > /** @op: Bind operation to perform */
> > > > > > __u32 op;
> > > > > > -#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
> > > > > > -#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
> > > > > > -#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
> > > > > > -#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
> > > > > > +#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
> > > > > > +#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
> > > > > > +#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
> > > > > > +#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
> > > > >
> > > > > We should probably document that it's an error to add this flag if
> > > > > the BO has a single memory region.
> > > > It is an error if this flag is set on non single memory region as
> > > > atomics on SMEM+LMEM buffer will be handled with migration.
> > > >
> > > > Let me know if I got that wrong.
> > > >
> > > > > Are we supposed to the ability to set that flag or is there going to
> > > > > be a query?
> > > > >
> > > > > A query might make sense since it's going to be rejected for some
> > > > > platform.
> > > > Can we use the rejection as a query ? I am using it in a IGT test
> > > > https://patchwork.freedesktop.org/patch/588759/?series=132289&rev=1
> > > >
> > > >
> > That was not acceptable for DRM_XE_VM_BIND_FLAG_DUMPABLE. At the end we were able to land DRM_XE_VM_BIND_FLAG_DUMPABLE in Linux 6.8 so we did not
> > needed the query. But for this one will be needed.
>
> If the rejection can't work as query, I can add one. Would
> adding(DRM_XE_QUERY_CONFIG_SUPP_DEV_ATOMIC_ON_SMEM) a new flag into
> 'struct drm_xe_query_config' work ?
As another bit after DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM? That lgtm but even better would be add a new category like
DRM_XE_QUERY_CONFIG_SUPPORTED_FEATURES and then DRM_XE_QUERY_CONFIG_SUPPORTED_FEATURES_ATOMIC_ON_SMEM (1 << 0).
So we have a u64 just to check for features supported by different KMD versions.
But ask the opinion of other engineers in Xe chat.
>
>
> Regards,
>
> Nirmoy
>
> >
> >
> > > > Regards,
> > > >
> > > > Nirmoy
> > >
> > > Okay, sounds doable.
> > >
> > > Will try to put up an MR for Mesa soon.
> > >
> > >
> > > -Lionel
> > >
> > >
> > > > >
> > > > > -Lionel
> > > > >
> > > > >
> > > > > > +#define DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS (1 << 4)
> > > > > > /** @flags: Bind flags */
> > > > > > __u32 flags;
> > > > >
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/3] drm/xe/uapi: Introduce VMA bind flag for device atomics
2024-04-11 14:40 ` Souza, Jose
@ 2024-04-11 14:54 ` Nirmoy Das
0 siblings, 0 replies; 27+ messages in thread
From: Nirmoy Das @ 2024-04-11 14:54 UTC (permalink / raw)
To: Souza, Jose, intel-xe@lists.freedesktop.org,
nirmoy.das@linux.intel.com, Landwerlin, Lionel G
Hi Jose,
On 4/11/2024 4:40 PM, Souza, Jose wrote:
> On Thu, 2024-04-11 at 16:32 +0200, Nirmoy Das wrote:
>> Hi Jose,
>>
>> On 4/11/2024 4:07 PM, Souza, Jose wrote:
>>> On Thu, 2024-04-11 at 17:00 +0300, Lionel Landwerlin wrote:
>>>> On 11/04/2024 16:42, Nirmoy Das wrote:
>>>>> Hi Lionel,
>>>>>
>>>>> On 4/11/2024 3:14 PM, Lionel Landwerlin wrote:
>>>>>> On 10/04/2024 20:03, Nirmoy Das wrote:
>>>>>>> Adds a new VMA bind flag to enable device atomics on SMEM only buffers.
>>>>>>>
>>>>>>> Given that simultaneous usage of device atomics and CPU atomics on
>>>>>>> the same SMEM buffer is not guaranteed to function without migration,
>>>>>>> and UMD expects no migration for SMEM-only buffer objects, so this
>>>>>>> provide
>>>>>>> a way to set device atomics when UMD is certain to use the buffer only
>>>>>>> for device atomics.
>>>>>>>
>>>>>>> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/xe/xe_vm.c | 27 +++++++++++++++++++++++++--
>>>>>>> drivers/gpu/drm/xe/xe_vm_types.h | 2 ++
>>>>>>> include/uapi/drm/xe_drm.h | 9 +++++----
>>>>>>> 3 files changed, 32 insertions(+), 6 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>>>>>>> index 8f3474c5f480..530b4bbc186c 100644
>>>>>>> --- a/drivers/gpu/drm/xe/xe_vm.c
>>>>>>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>>>>>>> @@ -851,6 +851,7 @@ static void xe_vma_free(struct xe_vma *vma)
>>>>>>> #define VMA_CREATE_FLAG_READ_ONLY BIT(0)
>>>>>>> #define VMA_CREATE_FLAG_IS_NULL BIT(1)
>>>>>>> #define VMA_CREATE_FLAG_DUMPABLE BIT(2)
>>>>>>> +#define VMA_CREATE_FLAG_DEVICE_ATOMICS BIT(3)
>>>>>>> static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>>>>>>> struct xe_bo *bo,
>>>>>>> @@ -864,6 +865,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm
>>>>>>> *vm,
>>>>>>> bool read_only = (flags & VMA_CREATE_FLAG_READ_ONLY);
>>>>>>> bool is_null = (flags & VMA_CREATE_FLAG_IS_NULL);
>>>>>>> bool dumpable = (flags & VMA_CREATE_FLAG_DUMPABLE);
>>>>>>> + bool enable_atomics = (flags & VMA_CREATE_FLAG_IS_NULL);
>>>>>>> xe_assert(vm->xe, start < end);
>>>>>>> xe_assert(vm->xe, end < vm->size);
>>>>>>> @@ -912,7 +914,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm
>>>>>>> *vm,
>>>>>>> xe_bo_assert_held(bo);
>>>>>>> if (GRAPHICS_VER(vm->xe) >= 20 || xe_bo_is_vram(bo) ||
>>>>>>> - !IS_DGFX(vm->xe))
>>>>>>> + !IS_DGFX(vm->xe) || enable_atomics)
>>>>>>> vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>>>>>>> vm_bo = drm_gpuvm_bo_obtain(vma->gpuva.vm, &bo->ttm.base);
>>>>>>> @@ -2174,6 +2176,18 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm,
>>>>>>> struct xe_bo *bo,
>>>>>>> operation, (ULL)addr, (ULL)range,
>>>>>>> (ULL)bo_offset_or_userptr);
>>>>>>> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
>>>>>>> + (vm->xe->info.platform == XE_PVC && !xe_bo_is_vram(bo))) {
>>>>>>> + drm_warn(&vm->xe->drm, "Setting device atomics on SMEM is
>>>>>>> not supported for this platform");
>>>>>>> + return ERR_PTR(-EINVAL);
>>>>>>> + }
>>>>>>> +
>>>>>>> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
>>>>>>> + !xe_bo_has_single_placement(bo))
>>>>>>> + drm_warn(&vm->xe->drm, "DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS
>>>>>>> can be only set if the BO has single placement");
>>>>>>> + return ERR_PTR(-EINVAL);
>>>>>>> + }
>>>>>>> +
>>>>>>> switch (operation) {
>>>>>>> case DRM_XE_VM_BIND_OP_MAP:
>>>>>>> case DRM_XE_VM_BIND_OP_MAP_USERPTR:
>>>>>>> @@ -2216,6 +2230,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm,
>>>>>>> struct xe_bo *bo,
>>>>>>> if (__op->op == DRM_GPUVA_OP_MAP) {
>>>>>>> op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL;
>>>>>>> op->map.dumpable = flags & DRM_XE_VM_BIND_FLAG_DUMPABLE;
>>>>>>> + op->map.enable_device_atomics = flags &
>>>>>>> DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS;
>>>>>>> op->map.pat_index = pat_index;
>>>>>>> } else if (__op->op == DRM_GPUVA_OP_PREFETCH) {
>>>>>>> op->prefetch.region = prefetch_region;
>>>>>>> @@ -2412,6 +2427,8 @@ static int vm_bind_ioctl_ops_parse(struct
>>>>>>> xe_vm *vm, struct xe_exec_queue *q,
>>>>>>> VMA_CREATE_FLAG_IS_NULL : 0;
>>>>>>> flags |= op->map.dumpable ?
>>>>>>> VMA_CREATE_FLAG_DUMPABLE : 0;
>>>>>>> + flags |= op->map.enable_device_atomics ?
>>>>>>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>>>>>>> vma = new_vma(vm, &op->base.map, op->map.pat_index,
>>>>>>> flags);
>>>>>>> @@ -2439,6 +2456,8 @@ static int vm_bind_ioctl_ops_parse(struct
>>>>>>> xe_vm *vm, struct xe_exec_queue *q,
>>>>>>> flags |= op->base.remap.unmap->va->flags &
>>>>>>> XE_VMA_DUMPABLE ?
>>>>>>> VMA_CREATE_FLAG_DUMPABLE : 0;
>>>>>>> + flags |= op->base.remap.unmap->va->flags ?
>>>>>>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>>>>>>> vma = new_vma(vm, op->base.remap.prev,
>>>>>>> old->pat_index, flags);
>>>>>>> @@ -2476,6 +2495,9 @@ static int vm_bind_ioctl_ops_parse(struct
>>>>>>> xe_vm *vm, struct xe_exec_queue *q,
>>>>>>> flags |= op->base.remap.unmap->va->flags &
>>>>>>> XE_VMA_DUMPABLE ?
>>>>>>> VMA_CREATE_FLAG_DUMPABLE : 0;
>>>>>>> + flags |= op->base.remap.unmap->va->flags ?
>>>>>>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>>>>>>> +
>>>>>>> vma = new_vma(vm, op->base.remap.next,
>>>>>>> old->pat_index, flags);
>>>>>>> @@ -2831,7 +2853,8 @@ static int vm_bind_ioctl_ops_execute(struct
>>>>>>> xe_vm *vm,
>>>>>>> (DRM_XE_VM_BIND_FLAG_READONLY | \
>>>>>>> DRM_XE_VM_BIND_FLAG_IMMEDIATE | \
>>>>>>> DRM_XE_VM_BIND_FLAG_NULL | \
>>>>>>> - DRM_XE_VM_BIND_FLAG_DUMPABLE)
>>>>>>> + DRM_XE_VM_BIND_FLAG_DUMPABLE | \
>>>>>>> + DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS)
>>>>>>> #define XE_64K_PAGE_MASK 0xffffull
>>>>>>> #define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP)
>>>>>>> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h
>>>>>>> b/drivers/gpu/drm/xe/xe_vm_types.h
>>>>>>> index badf3945083d..7b9c68909c78 100644
>>>>>>> --- a/drivers/gpu/drm/xe/xe_vm_types.h
>>>>>>> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
>>>>>>> @@ -282,6 +282,8 @@ struct xe_vma_op_map {
>>>>>>> bool dumpable;
>>>>>>> /** @pat_index: The pat index to use for this operation. */
>>>>>>> u16 pat_index;
>>>>>>> + /** @enable_device_atomics: Whether the VMA will allow device
>>>>>>> atomics */
>>>>>>> + bool enable_device_atomics;
>>>>>>> };
>>>>>>> /** struct xe_vma_op_remap - VMA remap operation */
>>>>>>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>>>>>>> index 1446c3bae515..bffe8b1c040c 100644
>>>>>>> --- a/include/uapi/drm/xe_drm.h
>>>>>>> +++ b/include/uapi/drm/xe_drm.h
>>>>>>> @@ -969,10 +969,11 @@ struct drm_xe_vm_bind_op {
>>>>>>> /** @op: Bind operation to perform */
>>>>>>> __u32 op;
>>>>>>> -#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
>>>>>>> -#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
>>>>>>> -#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
>>>>>>> -#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
>>>>>>> +#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
>>>>>>> +#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
>>>>>>> +#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
>>>>>>> +#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
>>>>>> We should probably document that it's an error to add this flag if
>>>>>> the BO has a single memory region.
>>>>> It is an error if this flag is set on non single memory region as
>>>>> atomics on SMEM+LMEM buffer will be handled with migration.
>>>>>
>>>>> Let me know if I got that wrong.
>>>>>
>>>>>> Are we supposed to the ability to set that flag or is there going to
>>>>>> be a query?
>>>>>>
>>>>>> A query might make sense since it's going to be rejected for some
>>>>>> platform.
>>>>> Can we use the rejection as a query ? I am using it in a IGT test
>>>>> https://patchwork.freedesktop.org/patch/588759/?series=132289&rev=1
>>>>>
>>>>>
>>> That was not acceptable for DRM_XE_VM_BIND_FLAG_DUMPABLE. At the end we were able to land DRM_XE_VM_BIND_FLAG_DUMPABLE in Linux 6.8 so we did not
>>> needed the query. But for this one will be needed.
>> If the rejection can't work as query, I can add one. Would
>> adding(DRM_XE_QUERY_CONFIG_SUPP_DEV_ATOMIC_ON_SMEM) a new flag into
>> 'struct drm_xe_query_config' work ?
> As another bit after DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM?
Yes.
> That lgtm but even better would be add a new category like
> DRM_XE_QUERY_CONFIG_SUPPORTED_FEATURES and then DRM_XE_QUERY_CONFIG_SUPPORTED_FEATURES_ATOMIC_ON_SMEM (1 << 0).
> So we have a u64 just to check for features supported by different KMD versions.
>
> But ask the opinion of other engineers in Xe chat.
Done. Let's see what others prefer.
Thanks,
Nirmoy
>
>>
>> Regards,
>>
>> Nirmoy
>>
>>>
>>>>> Regards,
>>>>>
>>>>> Nirmoy
>>>> Okay, sounds doable.
>>>>
>>>> Will try to put up an MR for Mesa soon.
>>>>
>>>>
>>>> -Lionel
>>>>
>>>>
>>>>>> -Lionel
>>>>>>
>>>>>>
>>>>>>> +#define DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS (1 << 4)
>>>>>>> /** @flags: Bind flags */
>>>>>>> __u32 flags;
^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: [PATCH 0/3] Enable device atomics with a VM bind flag
2024-04-10 17:03 [PATCH 0/3] Enable device atomics with a VM bind flag Nirmoy Das
` (5 preceding siblings ...)
2024-04-10 18:04 ` ✗ CI.KUnit: failure " Patchwork
@ 2024-04-11 16:22 ` Zeng, Oak
2024-04-11 17:00 ` Nirmoy Das
6 siblings, 1 reply; 27+ messages in thread
From: Zeng, Oak @ 2024-04-11 16:22 UTC (permalink / raw)
To: Das, Nirmoy, intel-xe@lists.freedesktop.org
Cc: Vivekanandan, Balasubramani, Welty, Brian, Yang, Fei,
Landwerlin, Lionel G, Roper, Matthew D, Brost, Matthew,
Mrozek, Michal, Thomas Hellstr_m
> -----Original Message-----
> From: Das, Nirmoy <nirmoy.das@intel.com>
> Sent: Wednesday, April 10, 2024 1:03 PM
> To: intel-xe@lists.freedesktop.org
> Cc: Das, Nirmoy <nirmoy.das@intel.com>; Vivekanandan, Balasubramani
> <balasubramani.vivekanandan@intel.com>; Welty, Brian
> <brian.welty@intel.com>; Yang, Fei <fei.yang@intel.com>; Landwerlin, Lionel G
> <lionel.g.landwerlin@intel.com>; Roper, Matthew D
> <matthew.d.roper@intel.com>; Brost, Matthew <matthew.brost@intel.com>;
> Mrozek, Michal <michal.mrozek@intel.com>; Zeng, Oak <oak.zeng@intel.com>;
> Thomas Hellstr_m <thomas.hellstrom@linux.intel.com>
> Subject: [PATCH 0/3] Enable device atomics with a VM bind flag
>
> Currently device atomics in SMEM only buffer is not supported and
> given that simultaneous usage of device atomics and CPU atomics on
> the same SMEM buffer is not guaranteed to function without migration,
> and UMD expects no migration for SMEM-only buffer objects, so this
> provide a way to set device atomics when UMD is certain to use the
> buffer only for device atomics.
If I understand it correctly, per umd-kmd contract, kmd should assume all shared allocations support system-wide atomic by default (i.e. even if there is not any user settings). So if we want to introduce API to let user to say "only device atomic is expected on this allocation", the API would be "No system-wide atomics on this allocation".
Similarly, we can introduce API to let user say:
there is no atomics expected to this allocation
no device atomic or
no host atomic
But the above assumption (all allocation by default support system-wide atomics) doesn't work for device or host allocations on some platform, i.e., as explained in the commit message...
So the API format is still an open to me. Should we define thing similar to umd api here: https://spec.oneapi.io/level-zero/latest/core/api.html#ze-memory-atomic-attr-exp-flags-t
Oak
>
> Test-with: 20240410170041.24963-1-nirmoy.das@intel.com
> Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
> Cc: Brian Welty <brian.welty@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Lionel G Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Michal Mrozek <michal.mrozek@intel.com>
> Cc: Oak Zeng <oak.zeng@intel.com>
> Cc: Thomas Hellstr_m <thomas.hellstrom@linux.intel.com>
>
> Nirmoy Das (3):
> drm/xe: Consolidate setting PTE_AE into one place
> drm/xe: Add function to check if BO has single placement
> drm/xe/uapi: Introduce VMA bind flag for device atomics
>
> drivers/gpu/drm/xe/xe_bo.c | 14 ++++++++++++++
> drivers/gpu/drm/xe/xe_bo.h | 1 +
> drivers/gpu/drm/xe/xe_pt.c | 4 +---
> drivers/gpu/drm/xe/xe_vm.c | 32 ++++++++++++++++++++++++++++----
> drivers/gpu/drm/xe/xe_vm_types.h | 2 ++
> include/uapi/drm/xe_drm.h | 9 +++++----
> 6 files changed, 51 insertions(+), 11 deletions(-)
>
> --
> 2.42.0
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/3] Enable device atomics with a VM bind flag
2024-04-11 16:22 ` [PATCH 0/3] " Zeng, Oak
@ 2024-04-11 17:00 ` Nirmoy Das
2024-04-11 17:23 ` Zeng, Oak
0 siblings, 1 reply; 27+ messages in thread
From: Nirmoy Das @ 2024-04-11 17:00 UTC (permalink / raw)
To: Zeng, Oak, Das, Nirmoy, intel-xe@lists.freedesktop.org
Cc: Vivekanandan, Balasubramani, Welty, Brian, Yang, Fei,
Landwerlin, Lionel G, Roper, Matthew D, Brost, Matthew,
Mrozek, Michal, Thomas Hellstr_m
Hi Oak,
On 4/11/2024 6:22 PM, Zeng, Oak wrote:
>
>> -----Original Message-----
>> From: Das, Nirmoy <nirmoy.das@intel.com>
>> Sent: Wednesday, April 10, 2024 1:03 PM
>> To: intel-xe@lists.freedesktop.org
>> Cc: Das, Nirmoy <nirmoy.das@intel.com>; Vivekanandan, Balasubramani
>> <balasubramani.vivekanandan@intel.com>; Welty, Brian
>> <brian.welty@intel.com>; Yang, Fei <fei.yang@intel.com>; Landwerlin, Lionel G
>> <lionel.g.landwerlin@intel.com>; Roper, Matthew D
>> <matthew.d.roper@intel.com>; Brost, Matthew <matthew.brost@intel.com>;
>> Mrozek, Michal <michal.mrozek@intel.com>; Zeng, Oak <oak.zeng@intel.com>;
>> Thomas Hellstr_m <thomas.hellstrom@linux.intel.com>
>> Subject: [PATCH 0/3] Enable device atomics with a VM bind flag
>>
>> Currently device atomics in SMEM only buffer is not supported and
>> given that simultaneous usage of device atomics and CPU atomics on
>> the same SMEM buffer is not guaranteed to function without migration,
>> and UMD expects no migration for SMEM-only buffer objects, so this
>> provide a way to set device atomics when UMD is certain to use the
>> buffer only for device atomics.
> If I understand it correctly, per umd-kmd contract, kmd should assume all shared allocations support system-wide atomic by default
Do we have any reference to this ? I wish this is documented somewhere.
For Level0, the system-wide atomics is default expectation only when
there is KMD migration and on buffer with SMEM+LMEM placements. This is
my understanding
after long discussion that I had with Michal, Matt, and Lionel.
This patch is needed because a vulkan application can always try to do
cpu atomics on SMEM only BO and we can't allow device atomics on that
by default.
> (i.e. even if there is not any user settings). So if we want to introduce API to let user to say "only device atomic is expected on this allocation", the API would be "No system-wide atomics on this allocation".
I think the flag name suites well with existing
"ZE_MEMORY_ATOMIC_ATTR_EXP_FLAG_DEVICE_ATOMICS".
>
> Similarly, we can introduce API to let user say:
> there is no atomics expected to this allocation
> no device atomic or
> no host atomic
We can add those once we have such requirement from UMD.
Regards,
Nirmoy
>
> But the above assumption (all allocation by default support system-wide atomics) doesn't work for device or host allocations on some platform, i.e., as explained in the commit message...
>
> So the API format is still an open to me. Should we define thing similar to umd api here: https://spec.oneapi.io/level-zero/latest/core/api.html#ze-memory-atomic-attr-exp-flags-t
>
> Oak
>
>
>> Test-with: 20240410170041.24963-1-nirmoy.das@intel.com
>> Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
>> Cc: Brian Welty <brian.welty@intel.com>
>> Cc: Fei Yang <fei.yang@intel.com>
>> Cc: Lionel G Landwerlin <lionel.g.landwerlin@intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Michal Mrozek <michal.mrozek@intel.com>
>> Cc: Oak Zeng <oak.zeng@intel.com>
>> Cc: Thomas Hellstr_m <thomas.hellstrom@linux.intel.com>
>>
>> Nirmoy Das (3):
>> drm/xe: Consolidate setting PTE_AE into one place
>> drm/xe: Add function to check if BO has single placement
>> drm/xe/uapi: Introduce VMA bind flag for device atomics
>>
>> drivers/gpu/drm/xe/xe_bo.c | 14 ++++++++++++++
>> drivers/gpu/drm/xe/xe_bo.h | 1 +
>> drivers/gpu/drm/xe/xe_pt.c | 4 +---
>> drivers/gpu/drm/xe/xe_vm.c | 32 ++++++++++++++++++++++++++++----
>> drivers/gpu/drm/xe/xe_vm_types.h | 2 ++
>> include/uapi/drm/xe_drm.h | 9 +++++----
>> 6 files changed, 51 insertions(+), 11 deletions(-)
>>
>> --
>> 2.42.0
^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: [PATCH 0/3] Enable device atomics with a VM bind flag
2024-04-11 17:00 ` Nirmoy Das
@ 2024-04-11 17:23 ` Zeng, Oak
2024-04-12 5:06 ` Mrozek, Michal
0 siblings, 1 reply; 27+ messages in thread
From: Zeng, Oak @ 2024-04-11 17:23 UTC (permalink / raw)
To: Nirmoy Das, Das, Nirmoy, intel-xe@lists.freedesktop.org,
Mrozek, Michal
Cc: Vivekanandan, Balasubramani, Welty, Brian, Yang, Fei,
Landwerlin, Lionel G, Roper, Matthew D, Brost, Matthew,
Thomas Hellstr_m
> -----Original Message-----
> From: Nirmoy Das <nirmoy.das@linux.intel.com>
> Sent: Thursday, April 11, 2024 1:01 PM
> To: Zeng, Oak <oak.zeng@intel.com>; Das, Nirmoy <nirmoy.das@intel.com>;
> intel-xe@lists.freedesktop.org
> Cc: Vivekanandan, Balasubramani <balasubramani.vivekanandan@intel.com>;
> Welty, Brian <brian.welty@intel.com>; Yang, Fei <fei.yang@intel.com>;
> Landwerlin, Lionel G <lionel.g.landwerlin@intel.com>; Roper, Matthew D
> <matthew.d.roper@intel.com>; Brost, Matthew <matthew.brost@intel.com>;
> Mrozek, Michal <michal.mrozek@intel.com>; Thomas Hellstr_m
> <thomas.hellstrom@linux.intel.com>
> Subject: Re: [PATCH 0/3] Enable device atomics with a VM bind flag
>
> Hi Oak,
>
> On 4/11/2024 6:22 PM, Zeng, Oak wrote:
> >
> >> -----Original Message-----
> >> From: Das, Nirmoy <nirmoy.das@intel.com>
> >> Sent: Wednesday, April 10, 2024 1:03 PM
> >> To: intel-xe@lists.freedesktop.org
> >> Cc: Das, Nirmoy <nirmoy.das@intel.com>; Vivekanandan, Balasubramani
> >> <balasubramani.vivekanandan@intel.com>; Welty, Brian
> >> <brian.welty@intel.com>; Yang, Fei <fei.yang@intel.com>; Landwerlin,
> Lionel G
> >> <lionel.g.landwerlin@intel.com>; Roper, Matthew D
> >> <matthew.d.roper@intel.com>; Brost, Matthew
> <matthew.brost@intel.com>;
> >> Mrozek, Michal <michal.mrozek@intel.com>; Zeng, Oak
> <oak.zeng@intel.com>;
> >> Thomas Hellstr_m <thomas.hellstrom@linux.intel.com>
> >> Subject: [PATCH 0/3] Enable device atomics with a VM bind flag
> >>
> >> Currently device atomics in SMEM only buffer is not supported and
> >> given that simultaneous usage of device atomics and CPU atomics on
> >> the same SMEM buffer is not guaranteed to function without migration,
> >> and UMD expects no migration for SMEM-only buffer objects, so this
> >> provide a way to set device atomics when UMD is certain to use the
> >> buffer only for device atomics.
> > If I understand it correctly, per umd-kmd contract, kmd should assume all
> shared allocations support system-wide atomic by default
>
> Do we have any reference to this ? I wish this is documented somewhere.
I don't know such document either... I had that impression when I discussed with UMD previously
>
> For Level0, the system-wide atomics is default expectation only when
> there is KMD migration and on buffer with SMEM+LMEM placements. This is
> my understanding
>
> after long discussion that I had with Michal, Matt, and Lionel.
>
> This patch is needed because a vulkan application can always try to do
> cpu atomics on SMEM only BO and we can't allow device atomics on that
> by default.
>
> > (i.e. even if there is not any user settings). So if we want to introduce API to
> let user to say "only device atomic is expected on this allocation", the API would
> be "No system-wide atomics on this allocation".
> I think the flag name suites well with existing
> "ZE_MEMORY_ATOMIC_ATTR_EXP_FLAG_DEVICE_ATOMICS".
Yah, your flag matches above L0 definition.
What is not clear to me is, what is the default behavior of an allocation? If the default behavior is system-wide atomics (means simultaneous host and device atomic operations are possible), then what is the point of SET_DEVICE_ATOMIC...
I think the uAPI format depends on the default behavior:
If default is system-wide atomic, then uAPI should be NO_SYSTEM_ATOMIC, NO_DEVICE_ATOMIC...
If default is no atomic, then uAPI should be SET_DEVICE_ATOMIC, SET_SYSTEM_ATOMIC...
@Mrozek, Michal can you comment?
Oak
> >
> > Similarly, we can introduce API to let user say:
> > there is no atomics expected to this allocation
> > no device atomic or
> > no host atomic
>
> We can add those once we have such requirement from UMD.
>
>
> Regards,
>
> Nirmoy
>
> >
> > But the above assumption (all allocation by default support system-wide
> atomics) doesn't work for device or host allocations on some platform, i.e., as
> explained in the commit message...
> >
> > So the API format is still an open to me. Should we define thing similar to
> umd api here: https://spec.oneapi.io/level-zero/latest/core/api.html#ze-
> memory-atomic-attr-exp-flags-t
> >
> > Oak
> >
> >
> >> Test-with: 20240410170041.24963-1-nirmoy.das@intel.com
> >> Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
> >> Cc: Brian Welty <brian.welty@intel.com>
> >> Cc: Fei Yang <fei.yang@intel.com>
> >> Cc: Lionel G Landwerlin <lionel.g.landwerlin@intel.com>
> >> Cc: Matt Roper <matthew.d.roper@intel.com>
> >> Cc: Matthew Brost <matthew.brost@intel.com>
> >> Cc: Michal Mrozek <michal.mrozek@intel.com>
> >> Cc: Oak Zeng <oak.zeng@intel.com>
> >> Cc: Thomas Hellstr_m <thomas.hellstrom@linux.intel.com>
> >>
> >> Nirmoy Das (3):
> >> drm/xe: Consolidate setting PTE_AE into one place
> >> drm/xe: Add function to check if BO has single placement
> >> drm/xe/uapi: Introduce VMA bind flag for device atomics
> >>
> >> drivers/gpu/drm/xe/xe_bo.c | 14 ++++++++++++++
> >> drivers/gpu/drm/xe/xe_bo.h | 1 +
> >> drivers/gpu/drm/xe/xe_pt.c | 4 +---
> >> drivers/gpu/drm/xe/xe_vm.c | 32 ++++++++++++++++++++++++++++---
> -
> >> drivers/gpu/drm/xe/xe_vm_types.h | 2 ++
> >> include/uapi/drm/xe_drm.h | 9 +++++----
> >> 6 files changed, 51 insertions(+), 11 deletions(-)
> >>
> >> --
> >> 2.42.0
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/3] drm/xe: Consolidate setting PTE_AE into one place
2024-04-10 17:03 ` [PATCH 1/3] drm/xe: Consolidate setting PTE_AE into one place Nirmoy Das
@ 2024-04-11 23:22 ` Matt Roper
2024-04-12 9:34 ` Nirmoy Das
0 siblings, 1 reply; 27+ messages in thread
From: Matt Roper @ 2024-04-11 23:22 UTC (permalink / raw)
To: Nirmoy Das; +Cc: intel-xe
On Wed, Apr 10, 2024 at 07:03:06PM +0200, Nirmoy Das wrote:
> Currently decision to set PTE_AE is spread between xe_pt
> and xe_vm files and there is no reason to be keep it that
> way. Consolidate the logic for better maintainability.
Does this series bisect properly? I.e., if we run the driver with this
patch applied, but the other two patches missing, isn't it going to turn
on the AE bit in the page table in some BMG SMEM cases where it
shouldn't? It seems like we should at least mention that in the commit
message to avoid confusion.
Matt
>
> This also remove the extra care needed for PVC which only
> allows setting PTE_AE for LMEM.
>
> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> ---
> drivers/gpu/drm/xe/xe_pt.c | 4 +---
> drivers/gpu/drm/xe/xe_vm.c | 7 ++++---
> 2 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 5b7930f46cf3..7dc13a8bb44f 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -597,7 +597,6 @@ static int
> xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
> struct xe_vm_pgtable_update *entries, u32 *num_entries)
> {
> - struct xe_device *xe = tile_to_xe(tile);
> struct xe_bo *bo = xe_vma_bo(vma);
> bool is_devmem = !xe_vma_is_userptr(vma) && bo &&
> (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo));
> @@ -619,8 +618,7 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
> struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id];
> int ret;
>
> - if ((vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT) &&
> - (is_devmem || !IS_DGFX(xe)))
> + if (vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT)
> xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
>
> if (is_devmem) {
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index a196dbe65252..8f3474c5f480 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -904,9 +904,6 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> for_each_tile(tile, vm->xe, id)
> vma->tile_mask |= 0x1 << id;
>
> - if (GRAPHICS_VER(vm->xe) >= 20 || vm->xe->info.platform == XE_PVC)
> - vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
> -
> vma->pat_index = pat_index;
>
> if (bo) {
> @@ -914,6 +911,10 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>
> xe_bo_assert_held(bo);
>
> + if (GRAPHICS_VER(vm->xe) >= 20 || xe_bo_is_vram(bo) ||
> + !IS_DGFX(vm->xe))
> + vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
> +
> vm_bo = drm_gpuvm_bo_obtain(vma->gpuva.vm, &bo->ttm.base);
> if (IS_ERR(vm_bo)) {
> xe_vma_free(vma);
> --
> 2.42.0
>
--
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/3] drm/xe/uapi: Introduce VMA bind flag for device atomics
2024-04-10 17:03 ` [PATCH 3/3] drm/xe/uapi: Introduce VMA bind flag for device atomics Nirmoy Das
2024-04-10 17:35 ` Matthew Brost
2024-04-11 13:14 ` Lionel Landwerlin
@ 2024-04-11 23:44 ` Matt Roper
2024-04-12 8:06 ` Nirmoy Das
2 siblings, 1 reply; 27+ messages in thread
From: Matt Roper @ 2024-04-11 23:44 UTC (permalink / raw)
To: Nirmoy Das; +Cc: intel-xe
On Wed, Apr 10, 2024 at 07:03:08PM +0200, Nirmoy Das wrote:
> Adds a new VMA bind flag to enable device atomics on SMEM only buffers.
>
> Given that simultaneous usage of device atomics and CPU atomics on
> the same SMEM buffer is not guaranteed to function without migration,
> and UMD expects no migration for SMEM-only buffer objects, so this provide
> a way to set device atomics when UMD is certain to use the buffer only
> for device atomics.
>
> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> ---
> drivers/gpu/drm/xe/xe_vm.c | 27 +++++++++++++++++++++++++--
> drivers/gpu/drm/xe/xe_vm_types.h | 2 ++
> include/uapi/drm/xe_drm.h | 9 +++++----
> 3 files changed, 32 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 8f3474c5f480..530b4bbc186c 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -851,6 +851,7 @@ static void xe_vma_free(struct xe_vma *vma)
> #define VMA_CREATE_FLAG_READ_ONLY BIT(0)
> #define VMA_CREATE_FLAG_IS_NULL BIT(1)
> #define VMA_CREATE_FLAG_DUMPABLE BIT(2)
> +#define VMA_CREATE_FLAG_DEVICE_ATOMICS BIT(3)
>
> static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> struct xe_bo *bo,
> @@ -864,6 +865,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> bool read_only = (flags & VMA_CREATE_FLAG_READ_ONLY);
> bool is_null = (flags & VMA_CREATE_FLAG_IS_NULL);
> bool dumpable = (flags & VMA_CREATE_FLAG_DUMPABLE);
> + bool enable_atomics = (flags & VMA_CREATE_FLAG_IS_NULL);
Is this a copy-paste mistake? I expected this to be
VMA_CREATE_FLAG_DEVICE_ATOMICS, not VMA_CREATE_FLAG_IS_NULL.
>
> xe_assert(vm->xe, start < end);
> xe_assert(vm->xe, end < vm->size);
> @@ -912,7 +914,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> xe_bo_assert_held(bo);
>
> if (GRAPHICS_VER(vm->xe) >= 20 || xe_bo_is_vram(bo) ||
> - !IS_DGFX(vm->xe))
> + !IS_DGFX(vm->xe) || enable_atomics)
Is this the intended logic? This will still turn on AE=1
unconditionally on Xe2 platforms (regardless of the flags, placement,
etc.). I thought it would be something more like
if (xe_bo_is_vram(bo) || !IS_DGFX(vm_xe) ||
(GRAPHICS_VER(vm->xe) >= 20 && enable_atomics))
so that AE=1 only gets applied to discrete smem in cases where we're on
an Xe2 or later platform _and_ userspace used the flag to tell us that
device-scope atomics are sufficient for its needs.
Matt
> vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>
> vm_bo = drm_gpuvm_bo_obtain(vma->gpuva.vm, &bo->ttm.base);
> @@ -2174,6 +2176,18 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> operation, (ULL)addr, (ULL)range,
> (ULL)bo_offset_or_userptr);
>
> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
> + (vm->xe->info.platform == XE_PVC && !xe_bo_is_vram(bo))) {
> + drm_warn(&vm->xe->drm, "Setting device atomics on SMEM is not supported for this platform");
> + return ERR_PTR(-EINVAL);
> + }
> +
> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
> + !xe_bo_has_single_placement(bo))
> + drm_warn(&vm->xe->drm, "DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS can be only set if the BO has single placement");
> + return ERR_PTR(-EINVAL);
> + }
> +
> switch (operation) {
> case DRM_XE_VM_BIND_OP_MAP:
> case DRM_XE_VM_BIND_OP_MAP_USERPTR:
> @@ -2216,6 +2230,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> if (__op->op == DRM_GPUVA_OP_MAP) {
> op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL;
> op->map.dumpable = flags & DRM_XE_VM_BIND_FLAG_DUMPABLE;
> + op->map.enable_device_atomics = flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS;
> op->map.pat_index = pat_index;
> } else if (__op->op == DRM_GPUVA_OP_PREFETCH) {
> op->prefetch.region = prefetch_region;
> @@ -2412,6 +2427,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
> VMA_CREATE_FLAG_IS_NULL : 0;
> flags |= op->map.dumpable ?
> VMA_CREATE_FLAG_DUMPABLE : 0;
> + flags |= op->map.enable_device_atomics ?
> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>
> vma = new_vma(vm, &op->base.map, op->map.pat_index,
> flags);
> @@ -2439,6 +2456,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
> flags |= op->base.remap.unmap->va->flags &
> XE_VMA_DUMPABLE ?
> VMA_CREATE_FLAG_DUMPABLE : 0;
> + flags |= op->base.remap.unmap->va->flags ?
> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>
> vma = new_vma(vm, op->base.remap.prev,
> old->pat_index, flags);
> @@ -2476,6 +2495,9 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
> flags |= op->base.remap.unmap->va->flags &
> XE_VMA_DUMPABLE ?
> VMA_CREATE_FLAG_DUMPABLE : 0;
> + flags |= op->base.remap.unmap->va->flags ?
> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
> +
>
> vma = new_vma(vm, op->base.remap.next,
> old->pat_index, flags);
> @@ -2831,7 +2853,8 @@ static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
> (DRM_XE_VM_BIND_FLAG_READONLY | \
> DRM_XE_VM_BIND_FLAG_IMMEDIATE | \
> DRM_XE_VM_BIND_FLAG_NULL | \
> - DRM_XE_VM_BIND_FLAG_DUMPABLE)
> + DRM_XE_VM_BIND_FLAG_DUMPABLE | \
> + DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS)
> #define XE_64K_PAGE_MASK 0xffffull
> #define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP)
>
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index badf3945083d..7b9c68909c78 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -282,6 +282,8 @@ struct xe_vma_op_map {
> bool dumpable;
> /** @pat_index: The pat index to use for this operation. */
> u16 pat_index;
> + /** @enable_device_atomics: Whether the VMA will allow device atomics */
> + bool enable_device_atomics;
> };
>
> /** struct xe_vma_op_remap - VMA remap operation */
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index 1446c3bae515..bffe8b1c040c 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -969,10 +969,11 @@ struct drm_xe_vm_bind_op {
> /** @op: Bind operation to perform */
> __u32 op;
>
> -#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
> -#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
> -#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
> -#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
> +#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
> +#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
> +#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
> +#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
> +#define DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS (1 << 4)
> /** @flags: Bind flags */
> __u32 flags;
>
> --
> 2.42.0
>
--
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: [PATCH 0/3] Enable device atomics with a VM bind flag
2024-04-11 17:23 ` Zeng, Oak
@ 2024-04-12 5:06 ` Mrozek, Michal
0 siblings, 0 replies; 27+ messages in thread
From: Mrozek, Michal @ 2024-04-12 5:06 UTC (permalink / raw)
To: Zeng, Oak, Nirmoy Das, Das, Nirmoy,
intel-xe@lists.freedesktop.org
Cc: Vivekanandan, Balasubramani, Welty, Brian, Yang, Fei,
Landwerlin, Lionel G, Roper, Matthew D, Brost, Matthew,
Thomas Hellstr_m
"
I think the uAPI format depends on the default behavior:
If default is system-wide atomic, then uAPI should be NO_SYSTEM_ATOMIC, NO_DEVICE_ATOMIC...
If default is no atomic, then uAPI should be SET_DEVICE_ATOMIC, SET_SYSTEM_ATOMIC...
@Mrozek, Michal can you comment?
"
There are following use cases in Compute
1. Device USM
LMEM only region
Device Atomics only
Atomics possible only from GPU
2. Host USM
SMEM only region
Device Atomics only
Atomics possible from CPU and GPU.
Concurrent atomics from CPU and GPU not allowed.
3. Shared USM with KMD migration
SMEM + LMEM region
Global atomics enabled
Atomics possible from CPU and GPU
Concurrent atomics from CPU and GPU are allowed
So in order to allocate #1 and #2 you need to indicate a need for device atomics and not global atomics.
Michal
-----Original Message-----
From: Zeng, Oak <oak.zeng@intel.com>
Sent: Thursday, April 11, 2024 7:24 PM
To: Nirmoy Das <nirmoy.das@linux.intel.com>; Das, Nirmoy <nirmoy.das@intel.com>; intel-xe@lists.freedesktop.org; Mrozek, Michal <michal.mrozek@intel.com>
Cc: Vivekanandan, Balasubramani <balasubramani.vivekanandan@intel.com>; Welty, Brian <brian.welty@intel.com>; Yang, Fei <fei.yang@intel.com>; Landwerlin, Lionel G <lionel.g.landwerlin@intel.com>; Roper, Matthew D <matthew.d.roper@intel.com>; Brost, Matthew <matthew.brost@intel.com>; Thomas Hellstr_m <thomas.hellstrom@linux.intel.com>
Subject: RE: [PATCH 0/3] Enable device atomics with a VM bind flag
> -----Original Message-----
> From: Nirmoy Das <nirmoy.das@linux.intel.com>
> Sent: Thursday, April 11, 2024 1:01 PM
> To: Zeng, Oak <oak.zeng@intel.com>; Das, Nirmoy
> <nirmoy.das@intel.com>; intel-xe@lists.freedesktop.org
> Cc: Vivekanandan, Balasubramani
> <balasubramani.vivekanandan@intel.com>;
> Welty, Brian <brian.welty@intel.com>; Yang, Fei <fei.yang@intel.com>;
> Landwerlin, Lionel G <lionel.g.landwerlin@intel.com>; Roper, Matthew D
> <matthew.d.roper@intel.com>; Brost, Matthew <matthew.brost@intel.com>;
> Mrozek, Michal <michal.mrozek@intel.com>; Thomas Hellstr_m
> <thomas.hellstrom@linux.intel.com>
> Subject: Re: [PATCH 0/3] Enable device atomics with a VM bind flag
>
> Hi Oak,
>
> On 4/11/2024 6:22 PM, Zeng, Oak wrote:
> >
> >> -----Original Message-----
> >> From: Das, Nirmoy <nirmoy.das@intel.com>
> >> Sent: Wednesday, April 10, 2024 1:03 PM
> >> To: intel-xe@lists.freedesktop.org
> >> Cc: Das, Nirmoy <nirmoy.das@intel.com>; Vivekanandan, Balasubramani
> >> <balasubramani.vivekanandan@intel.com>; Welty, Brian
> >> <brian.welty@intel.com>; Yang, Fei <fei.yang@intel.com>;
> >> Landwerlin,
> Lionel G
> >> <lionel.g.landwerlin@intel.com>; Roper, Matthew D
> >> <matthew.d.roper@intel.com>; Brost, Matthew
> <matthew.brost@intel.com>;
> >> Mrozek, Michal <michal.mrozek@intel.com>; Zeng, Oak
> <oak.zeng@intel.com>;
> >> Thomas Hellstr_m <thomas.hellstrom@linux.intel.com>
> >> Subject: [PATCH 0/3] Enable device atomics with a VM bind flag
> >>
> >> Currently device atomics in SMEM only buffer is not supported and
> >> given that simultaneous usage of device atomics and CPU atomics on
> >> the same SMEM buffer is not guaranteed to function without
> >> migration, and UMD expects no migration for SMEM-only buffer
> >> objects, so this provide a way to set device atomics when UMD is
> >> certain to use the buffer only for device atomics.
> > If I understand it correctly, per umd-kmd contract, kmd should
> > assume all
> shared allocations support system-wide atomic by default
>
> Do we have any reference to this ? I wish this is documented somewhere.
I don't know such document either... I had that impression when I discussed with UMD previously
>
> For Level0, the system-wide atomics is default expectation only when
> there is KMD migration and on buffer with SMEM+LMEM placements. This
> is my understanding
>
> after long discussion that I had with Michal, Matt, and Lionel.
>
> This patch is needed because a vulkan application can always try to do
> cpu atomics on SMEM only BO and we can't allow device atomics on that
> by default.
>
> > (i.e. even if there is not any user settings). So if we want to
> > introduce API to
> let user to say "only device atomic is expected on this allocation",
> the API would be "No system-wide atomics on this allocation".
> I think the flag name suites well with existing
> "ZE_MEMORY_ATOMIC_ATTR_EXP_FLAG_DEVICE_ATOMICS".
Yah, your flag matches above L0 definition.
What is not clear to me is, what is the default behavior of an allocation? If the default behavior is system-wide atomics (means simultaneous host and device atomic operations are possible), then what is the point of SET_DEVICE_ATOMIC...
I think the uAPI format depends on the default behavior:
If default is system-wide atomic, then uAPI should be NO_SYSTEM_ATOMIC, NO_DEVICE_ATOMIC...
If default is no atomic, then uAPI should be SET_DEVICE_ATOMIC, SET_SYSTEM_ATOMIC...
@Mrozek, Michal can you comment?
Oak
> >
> > Similarly, we can introduce API to let user say:
> > there is no atomics expected to this allocation
> > no device atomic or
> > no host atomic
>
> We can add those once we have such requirement from UMD.
>
>
> Regards,
>
> Nirmoy
>
> >
> > But the above assumption (all allocation by default support
> > system-wide
> atomics) doesn't work for device or host allocations on some platform,
> i.e., as explained in the commit message...
> >
> > So the API format is still an open to me. Should we define thing
> > similar to
> umd api here:
> https://spec.oneapi.io/level-zero/latest/core/api.html#ze-
> memory-atomic-attr-exp-flags-t
> >
> > Oak
> >
> >
> >> Test-with: 20240410170041.24963-1-nirmoy.das@intel.com
> >> Cc: Balasubramani Vivekanandan
> >> <balasubramani.vivekanandan@intel.com>
> >> Cc: Brian Welty <brian.welty@intel.com>
> >> Cc: Fei Yang <fei.yang@intel.com>
> >> Cc: Lionel G Landwerlin <lionel.g.landwerlin@intel.com>
> >> Cc: Matt Roper <matthew.d.roper@intel.com>
> >> Cc: Matthew Brost <matthew.brost@intel.com>
> >> Cc: Michal Mrozek <michal.mrozek@intel.com>
> >> Cc: Oak Zeng <oak.zeng@intel.com>
> >> Cc: Thomas Hellstr_m <thomas.hellstrom@linux.intel.com>
> >>
> >> Nirmoy Das (3):
> >> drm/xe: Consolidate setting PTE_AE into one place
> >> drm/xe: Add function to check if BO has single placement
> >> drm/xe/uapi: Introduce VMA bind flag for device atomics
> >>
> >> drivers/gpu/drm/xe/xe_bo.c | 14 ++++++++++++++
> >> drivers/gpu/drm/xe/xe_bo.h | 1 +
> >> drivers/gpu/drm/xe/xe_pt.c | 4 +---
> >> drivers/gpu/drm/xe/xe_vm.c | 32 ++++++++++++++++++++++++++++---
> -
> >> drivers/gpu/drm/xe/xe_vm_types.h | 2 ++
> >> include/uapi/drm/xe_drm.h | 9 +++++----
> >> 6 files changed, 51 insertions(+), 11 deletions(-)
> >>
> >> --
> >> 2.42.0
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/3] drm/xe/uapi: Introduce VMA bind flag for device atomics
2024-04-11 23:44 ` Matt Roper
@ 2024-04-12 8:06 ` Nirmoy Das
0 siblings, 0 replies; 27+ messages in thread
From: Nirmoy Das @ 2024-04-12 8:06 UTC (permalink / raw)
To: Matt Roper; +Cc: intel-xe
Hi Matt,
On 4/12/2024 1:44 AM, Matt Roper wrote:
> On Wed, Apr 10, 2024 at 07:03:08PM +0200, Nirmoy Das wrote:
>> Adds a new VMA bind flag to enable device atomics on SMEM only buffers.
>>
>> Given that simultaneous usage of device atomics and CPU atomics on
>> the same SMEM buffer is not guaranteed to function without migration,
>> and UMD expects no migration for SMEM-only buffer objects, so this provide
>> a way to set device atomics when UMD is certain to use the buffer only
>> for device atomics.
>>
>> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_vm.c | 27 +++++++++++++++++++++++++--
>> drivers/gpu/drm/xe/xe_vm_types.h | 2 ++
>> include/uapi/drm/xe_drm.h | 9 +++++----
>> 3 files changed, 32 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index 8f3474c5f480..530b4bbc186c 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -851,6 +851,7 @@ static void xe_vma_free(struct xe_vma *vma)
>> #define VMA_CREATE_FLAG_READ_ONLY BIT(0)
>> #define VMA_CREATE_FLAG_IS_NULL BIT(1)
>> #define VMA_CREATE_FLAG_DUMPABLE BIT(2)
>> +#define VMA_CREATE_FLAG_DEVICE_ATOMICS BIT(3)
>>
>> static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>> struct xe_bo *bo,
>> @@ -864,6 +865,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>> bool read_only = (flags & VMA_CREATE_FLAG_READ_ONLY);
>> bool is_null = (flags & VMA_CREATE_FLAG_IS_NULL);
>> bool dumpable = (flags & VMA_CREATE_FLAG_DUMPABLE);
>> + bool enable_atomics = (flags & VMA_CREATE_FLAG_IS_NULL);
> Is this a copy-paste mistake? I expected this to be
> VMA_CREATE_FLAG_DEVICE_ATOMICS, not VMA_CREATE_FLAG_IS_NULL.
Yes, it is. Fixed it locally now.
>
>>
>> xe_assert(vm->xe, start < end);
>> xe_assert(vm->xe, end < vm->size);
>> @@ -912,7 +914,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>> xe_bo_assert_held(bo);
>>
>> if (GRAPHICS_VER(vm->xe) >= 20 || xe_bo_is_vram(bo) ||
>> - !IS_DGFX(vm->xe))
>> + !IS_DGFX(vm->xe) || enable_atomics)
> Is this the intended logic? This will still turn on AE=1
> unconditionally on Xe2 platforms (regardless of the flags, placement,
> etc.). I thought it would be something more like
>
> if (xe_bo_is_vram(bo) || !IS_DGFX(vm_xe) ||
> (GRAPHICS_VER(vm->xe) >= 20 && enable_atomics))
>
> so that AE=1 only gets applied to discrete smem in cases where we're on
> an Xe2 or later platform _and_ userspace used the flag to tell us that
> device-scope atomics are sufficient for its needs.
I realized it yesterday while adding platform flags as suggested by Matt
B. I have next version which
makes it cleaner.
Thanks,
Nirmoy
>
>
> Matt
>
>> vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>>
>> vm_bo = drm_gpuvm_bo_obtain(vma->gpuva.vm, &bo->ttm.base);
>> @@ -2174,6 +2176,18 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>> operation, (ULL)addr, (ULL)range,
>> (ULL)bo_offset_or_userptr);
>>
>> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
>> + (vm->xe->info.platform == XE_PVC && !xe_bo_is_vram(bo))) {
>> + drm_warn(&vm->xe->drm, "Setting device atomics on SMEM is not supported for this platform");
>> + return ERR_PTR(-EINVAL);
>> + }
>> +
>> + if (bo && (flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS) &&
>> + !xe_bo_has_single_placement(bo))
>> + drm_warn(&vm->xe->drm, "DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS can be only set if the BO has single placement");
>> + return ERR_PTR(-EINVAL);
>> + }
>> +
>> switch (operation) {
>> case DRM_XE_VM_BIND_OP_MAP:
>> case DRM_XE_VM_BIND_OP_MAP_USERPTR:
>> @@ -2216,6 +2230,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>> if (__op->op == DRM_GPUVA_OP_MAP) {
>> op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL;
>> op->map.dumpable = flags & DRM_XE_VM_BIND_FLAG_DUMPABLE;
>> + op->map.enable_device_atomics = flags & DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS;
>> op->map.pat_index = pat_index;
>> } else if (__op->op == DRM_GPUVA_OP_PREFETCH) {
>> op->prefetch.region = prefetch_region;
>> @@ -2412,6 +2427,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>> VMA_CREATE_FLAG_IS_NULL : 0;
>> flags |= op->map.dumpable ?
>> VMA_CREATE_FLAG_DUMPABLE : 0;
>> + flags |= op->map.enable_device_atomics ?
>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>>
>> vma = new_vma(vm, &op->base.map, op->map.pat_index,
>> flags);
>> @@ -2439,6 +2456,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>> flags |= op->base.remap.unmap->va->flags &
>> XE_VMA_DUMPABLE ?
>> VMA_CREATE_FLAG_DUMPABLE : 0;
>> + flags |= op->base.remap.unmap->va->flags ?
>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>>
>> vma = new_vma(vm, op->base.remap.prev,
>> old->pat_index, flags);
>> @@ -2476,6 +2495,9 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>> flags |= op->base.remap.unmap->va->flags &
>> XE_VMA_DUMPABLE ?
>> VMA_CREATE_FLAG_DUMPABLE : 0;
>> + flags |= op->base.remap.unmap->va->flags ?
>> + VMA_CREATE_FLAG_DEVICE_ATOMICS : 0;
>> +
>>
>> vma = new_vma(vm, op->base.remap.next,
>> old->pat_index, flags);
>> @@ -2831,7 +2853,8 @@ static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
>> (DRM_XE_VM_BIND_FLAG_READONLY | \
>> DRM_XE_VM_BIND_FLAG_IMMEDIATE | \
>> DRM_XE_VM_BIND_FLAG_NULL | \
>> - DRM_XE_VM_BIND_FLAG_DUMPABLE)
>> + DRM_XE_VM_BIND_FLAG_DUMPABLE | \
>> + DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS)
>> #define XE_64K_PAGE_MASK 0xffffull
>> #define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
>> index badf3945083d..7b9c68909c78 100644
>> --- a/drivers/gpu/drm/xe/xe_vm_types.h
>> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
>> @@ -282,6 +282,8 @@ struct xe_vma_op_map {
>> bool dumpable;
>> /** @pat_index: The pat index to use for this operation. */
>> u16 pat_index;
>> + /** @enable_device_atomics: Whether the VMA will allow device atomics */
>> + bool enable_device_atomics;
>> };
>>
>> /** struct xe_vma_op_remap - VMA remap operation */
>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>> index 1446c3bae515..bffe8b1c040c 100644
>> --- a/include/uapi/drm/xe_drm.h
>> +++ b/include/uapi/drm/xe_drm.h
>> @@ -969,10 +969,11 @@ struct drm_xe_vm_bind_op {
>> /** @op: Bind operation to perform */
>> __u32 op;
>>
>> -#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
>> -#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
>> -#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
>> -#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
>> +#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
>> +#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
>> +#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
>> +#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
>> +#define DRM_XE_VM_BIND_FLAG_DEVICE_ATOMICS (1 << 4)
>> /** @flags: Bind flags */
>> __u32 flags;
>>
>> --
>> 2.42.0
>>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/3] drm/xe: Consolidate setting PTE_AE into one place
2024-04-11 23:22 ` Matt Roper
@ 2024-04-12 9:34 ` Nirmoy Das
2024-04-12 11:33 ` Nirmoy Das
0 siblings, 1 reply; 27+ messages in thread
From: Nirmoy Das @ 2024-04-12 9:34 UTC (permalink / raw)
To: Matt Roper, Nirmoy Das; +Cc: intel-xe
Hi Matt,
On 4/12/2024 1:22 AM, Matt Roper wrote:
> On Wed, Apr 10, 2024 at 07:03:06PM +0200, Nirmoy Das wrote:
>> Currently decision to set PTE_AE is spread between xe_pt
>> and xe_vm files and there is no reason to be keep it that
>> way. Consolidate the logic for better maintainability.
> Does this series bisect properly? I.e., if we run the driver with this
> patch applied, but the other two patches missing, isn't it going to turn
> on the AE bit in the page table in some BMG SMEM cases where it
> shouldn't? It seems like we should at least mention that in the commit
> message to avoid confusion.
Sorry for the confusion; it came up because I made a mistake with the
if-conditions. I didn't intend to introduce any functional changes with
this patch.
Version 2 corrects the if-condition, which now only allows setting
PTE_AE on PVC+VRAM, XE2+VRAM, and XE2+iGFX, thus maintaining the
functionality as it was
Regards,
Nirmoy
>
>
> Matt
>
>> This also remove the extra care needed for PVC which only
>> allows setting PTE_AE for LMEM.
>>
>> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_pt.c | 4 +---
>> drivers/gpu/drm/xe/xe_vm.c | 7 ++++---
>> 2 files changed, 5 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
>> index 5b7930f46cf3..7dc13a8bb44f 100644
>> --- a/drivers/gpu/drm/xe/xe_pt.c
>> +++ b/drivers/gpu/drm/xe/xe_pt.c
>> @@ -597,7 +597,6 @@ static int
>> xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
>> struct xe_vm_pgtable_update *entries, u32 *num_entries)
>> {
>> - struct xe_device *xe = tile_to_xe(tile);
>> struct xe_bo *bo = xe_vma_bo(vma);
>> bool is_devmem = !xe_vma_is_userptr(vma) && bo &&
>> (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo));
>> @@ -619,8 +618,7 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
>> struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id];
>> int ret;
>>
>> - if ((vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT) &&
>> - (is_devmem || !IS_DGFX(xe)))
>> + if (vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT)
>> xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
>>
>> if (is_devmem) {
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index a196dbe65252..8f3474c5f480 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -904,9 +904,6 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>> for_each_tile(tile, vm->xe, id)
>> vma->tile_mask |= 0x1 << id;
>>
>> - if (GRAPHICS_VER(vm->xe) >= 20 || vm->xe->info.platform == XE_PVC)
>> - vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>> -
>> vma->pat_index = pat_index;
>>
>> if (bo) {
>> @@ -914,6 +911,10 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>>
>> xe_bo_assert_held(bo);
>>
>> + if (GRAPHICS_VER(vm->xe) >= 20 || xe_bo_is_vram(bo) ||
>> + !IS_DGFX(vm->xe))
>> + vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>> +
>> vm_bo = drm_gpuvm_bo_obtain(vma->gpuva.vm, &bo->ttm.base);
>> if (IS_ERR(vm_bo)) {
>> xe_vma_free(vma);
>> --
>> 2.42.0
>>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/3] drm/xe: Consolidate setting PTE_AE into one place
2024-04-12 9:34 ` Nirmoy Das
@ 2024-04-12 11:33 ` Nirmoy Das
0 siblings, 0 replies; 27+ messages in thread
From: Nirmoy Das @ 2024-04-12 11:33 UTC (permalink / raw)
To: Matt Roper, Nirmoy Das; +Cc: intel-xe
Hi Matt,
On 4/12/2024 11:34 AM, Nirmoy Das wrote:
> Hi Matt,
>
> On 4/12/2024 1:22 AM, Matt Roper wrote:
>> On Wed, Apr 10, 2024 at 07:03:06PM +0200, Nirmoy Das wrote:
>>> Currently decision to set PTE_AE is spread between xe_pt
>>> and xe_vm files and there is no reason to be keep it that
>>> way. Consolidate the logic for better maintainability.
>> Does this series bisect properly? I.e., if we run the driver with this
>> patch applied, but the other two patches missing, isn't it going to turn
>> on the AE bit in the page table in some BMG SMEM cases where it
>> shouldn't? It seems like we should at least mention that in the commit
>> message to avoid confusion.
> Sorry for the confusion; it came up because I made a mistake with the
> if-conditions. I didn't intend to introduce any functional changes
> with this patch.
>
> Version 2 corrects the if-condition, which now only allows setting
> PTE_AE on PVC+VRAM, XE2+VRAM, and XE2+iGFX, thus maintaining the
> functionality as it was
I miss understood your comment. Indeed, this is setting PTE_AE when only
BO exist and leaving others. This need to be noted in the commit
message. I will do that.
Regards,
Nirmoy
>
> Regards,
>
> Nirmoy
>
>>
>>
>> Matt
>>
>>> This also remove the extra care needed for PVC which only
>>> allows setting PTE_AE for LMEM.
>>>
>>> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
>>> ---
>>> drivers/gpu/drm/xe/xe_pt.c | 4 +---
>>> drivers/gpu/drm/xe/xe_vm.c | 7 ++++---
>>> 2 files changed, 5 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
>>> index 5b7930f46cf3..7dc13a8bb44f 100644
>>> --- a/drivers/gpu/drm/xe/xe_pt.c
>>> +++ b/drivers/gpu/drm/xe/xe_pt.c
>>> @@ -597,7 +597,6 @@ static int
>>> xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
>>> struct xe_vm_pgtable_update *entries, u32 *num_entries)
>>> {
>>> - struct xe_device *xe = tile_to_xe(tile);
>>> struct xe_bo *bo = xe_vma_bo(vma);
>>> bool is_devmem = !xe_vma_is_userptr(vma) && bo &&
>>> (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo));
>>> @@ -619,8 +618,7 @@ xe_pt_stage_bind(struct xe_tile *tile, struct
>>> xe_vma *vma,
>>> struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id];
>>> int ret;
>>> - if ((vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT) &&
>>> - (is_devmem || !IS_DGFX(xe)))
>>> + if (vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT)
>>> xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
>>> if (is_devmem) {
>>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>>> index a196dbe65252..8f3474c5f480 100644
>>> --- a/drivers/gpu/drm/xe/xe_vm.c
>>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>>> @@ -904,9 +904,6 @@ static struct xe_vma *xe_vma_create(struct xe_vm
>>> *vm,
>>> for_each_tile(tile, vm->xe, id)
>>> vma->tile_mask |= 0x1 << id;
>>> - if (GRAPHICS_VER(vm->xe) >= 20 || vm->xe->info.platform ==
>>> XE_PVC)
>>> - vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>>> -
>>> vma->pat_index = pat_index;
>>> if (bo) {
>>> @@ -914,6 +911,10 @@ static struct xe_vma *xe_vma_create(struct
>>> xe_vm *vm,
>>> xe_bo_assert_held(bo);
>>> + if (GRAPHICS_VER(vm->xe) >= 20 || xe_bo_is_vram(bo) ||
>>> + !IS_DGFX(vm->xe))
>>> + vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>>> +
>>> vm_bo = drm_gpuvm_bo_obtain(vma->gpuva.vm, &bo->ttm.base);
>>> if (IS_ERR(vm_bo)) {
>>> xe_vma_free(vma);
>>> --
>>> 2.42.0
>>>
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2024-04-12 11:33 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-10 17:03 [PATCH 0/3] Enable device atomics with a VM bind flag Nirmoy Das
2024-04-10 17:03 ` [PATCH 1/3] drm/xe: Consolidate setting PTE_AE into one place Nirmoy Das
2024-04-11 23:22 ` Matt Roper
2024-04-12 9:34 ` Nirmoy Das
2024-04-12 11:33 ` Nirmoy Das
2024-04-10 17:03 ` [PATCH 2/3] drm/xe: Add function to check if BO has single placement Nirmoy Das
2024-04-10 17:03 ` [PATCH 3/3] drm/xe/uapi: Introduce VMA bind flag for device atomics Nirmoy Das
2024-04-10 17:35 ` Matthew Brost
2024-04-11 9:22 ` Nirmoy Das
2024-04-11 14:00 ` Nirmoy Das
2024-04-11 13:14 ` Lionel Landwerlin
2024-04-11 13:32 ` Lionel Landwerlin
2024-04-11 13:42 ` Nirmoy Das
2024-04-11 14:00 ` Lionel Landwerlin
2024-04-11 14:07 ` Souza, Jose
2024-04-11 14:32 ` Nirmoy Das
2024-04-11 14:40 ` Souza, Jose
2024-04-11 14:54 ` Nirmoy Das
2024-04-11 23:44 ` Matt Roper
2024-04-12 8:06 ` Nirmoy Das
2024-04-10 18:04 ` ✓ CI.Patch_applied: success for Enable device atomics with a VM bind flag Patchwork
2024-04-10 18:04 ` ✓ CI.checkpatch: " Patchwork
2024-04-10 18:04 ` ✗ CI.KUnit: failure " Patchwork
2024-04-11 16:22 ` [PATCH 0/3] " Zeng, Oak
2024-04-11 17:00 ` Nirmoy Das
2024-04-11 17:23 ` Zeng, Oak
2024-04-12 5:06 ` Mrozek, Michal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox