* [PATCH i-g-t v7 00/10] Add SVM madvise feature for multi-GPU configurations
@ 2025-11-13 17:04 nishit.sharma
2025-11-13 17:04 ` [PATCH i-g-t v7 01/10] lib/xe: Add instance parameter to xe_vm_madvise and introduce lr_sync helpers nishit.sharma
` (8 more replies)
0 siblings, 9 replies; 21+ messages in thread
From: nishit.sharma @ 2025-11-13 17:04 UTC (permalink / raw)
To: igt-dev, nishit.sharma
From: Nishit Sharma <nishit.sharma@intel.com>
This patch series adds comprehensive SVM multi-GPU IGT test coverage for
madvise and prefetch functionality.
ver2:
- Test name changed in commits
- In patchwork the v1 series was incomplete because the last patch was not sent
ver3:
- Tags were added in patch 7, which had not been sent to patchwork
ver4:
- A patch added a file that was not available in the source tree, which caused a
CI build failure.
ver5:
- Added subtest function wrappers
- Subtests now execute on all enumerated GPUs
ver7:
- Optimized frequently called functions
- Incorporated review comments (Thomas Hellstrom)
Nishit Sharma (10):
lib/xe: Add instance parameter to xe_vm_madvise and introduce lr_sync
helpers
tests/intel/xe_exec_system_allocator: Add parameter in madvise call
tests/intel/xe_multi_gpusvm: Add SVM multi-GPU cross-GPU memory access
test
tests/intel/xe_multi_gpusvm: Add SVM multi-GPU atomic operations
tests/intel/xe_multi_gpusvm: Add SVM multi-GPU coherency test
tests/intel/xe_multi_gpusvm: Add SVM multi-GPU performance test
tests/intel/xe_multi_gpusvm: Add SVM multi-GPU fault handling test
tests/intel/xe_multi_gpusvm: Add SVM multi-GPU simultaneous access
test
tests/intel/xe_multi_gpusvm: Add SVM multi-GPU conflicting madvise
test
tests/intel/xe_multi_gpusvm: Add SVM multi-GPU migration test
include/drm-uapi/xe_drm.h | 4 +-
lib/xe/xe_ioctl.c | 53 +-
lib/xe/xe_ioctl.h | 11 +-
tests/intel/xe_exec_system_allocator.c | 8 +-
tests/intel/xe_multi_gpusvm.c | 1441 ++++++++++++++++++++++++
tests/meson.build | 1 +
6 files changed, 1504 insertions(+), 14 deletions(-)
create mode 100644 tests/intel/xe_multi_gpusvm.c
--
2.48.1
^ permalink raw reply [flat|nested] 21+ messages in thread* [PATCH i-g-t v7 01/10] lib/xe: Add instance parameter to xe_vm_madvise and introduce lr_sync helpers 2025-11-13 17:04 [PATCH i-g-t v7 00/10] Add SVM madvise feature for multi-GPU configurations nishit.sharma @ 2025-11-13 17:04 ` nishit.sharma 2025-11-13 17:04 ` [PATCH i-g-t v7 02/10] tests/intel/xe_exec_system_allocator: Add parameter in madvise call nishit.sharma ` (7 subsequent siblings) 8 siblings, 0 replies; 21+ messages in thread From: nishit.sharma @ 2025-11-13 17:04 UTC (permalink / raw) To: igt-dev, nishit.sharma From: Nishit Sharma <nishit.sharma@intel.com> Add an 'instance' parameter to xe_vm_madvise and __xe_vm_madvise to support per-instance memory advice operations.Implement xe_vm_bind_lr_sync and xe_vm_unbind_lr_sync helpers for synchronous VM bind/unbind using user fences. These changes improve memory advice and binding operations for multi-GPU and multi-instance scenarios in IGT tests. Signed-off-by: Nishit Sharma <nishit.sharma@intel.com> --- include/drm-uapi/xe_drm.h | 4 +-- lib/xe/xe_ioctl.c | 53 +++++++++++++++++++++++++++++++++++---- lib/xe/xe_ioctl.h | 11 +++++--- 3 files changed, 58 insertions(+), 10 deletions(-) diff --git a/include/drm-uapi/xe_drm.h b/include/drm-uapi/xe_drm.h index 89ab54935..3472efa58 100644 --- a/include/drm-uapi/xe_drm.h +++ b/include/drm-uapi/xe_drm.h @@ -2060,8 +2060,8 @@ struct drm_xe_madvise { /** @preferred_mem_loc.migration_policy: Page migration policy */ __u16 migration_policy; - /** @preferred_mem_loc.pad : MBZ */ - __u16 pad; + /** @preferred_mem_loc.region_instance: Region instance */ + __u16 region_instance; /** @preferred_mem_loc.reserved : Reserved */ __u64 reserved; diff --git a/lib/xe/xe_ioctl.c b/lib/xe/xe_ioctl.c index 39c4667a1..06ce8a339 100644 --- a/lib/xe/xe_ioctl.c +++ b/lib/xe/xe_ioctl.c @@ -687,7 +687,8 @@ int64_t xe_wait_ufence(int fd, uint64_t *addr, uint64_t value, } int __xe_vm_madvise(int fd, uint32_t vm, uint64_t addr, uint64_t range, - uint64_t ext, uint32_t type, uint32_t op_val, uint16_t policy) + uint64_t ext, uint32_t type, uint32_t op_val, uint16_t policy, + uint16_t instance) { struct drm_xe_madvise madvise = { .type = type, @@ -704,6 +705,7 @@ int __xe_vm_madvise(int fd, uint32_t vm, uint64_t addr, uint64_t range, case DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC: madvise.preferred_mem_loc.devmem_fd = op_val; madvise.preferred_mem_loc.migration_policy = policy; + madvise.preferred_mem_loc.region_instance = instance; igt_debug("madvise.preferred_mem_loc.devmem_fd = %d\n", madvise.preferred_mem_loc.devmem_fd); break; @@ -731,14 +733,55 @@ int __xe_vm_madvise(int fd, uint32_t vm, uint64_t addr, uint64_t range, * @type: type of attribute * @op_val: fd/atomic value/pat index, depending upon type of operation * @policy: Page migration policy + * @instance: vram instance * * Function initializes different members of struct drm_xe_madvise and calls * MADVISE IOCTL . * - * Asserts in case of error returned by DRM_IOCTL_XE_MADVISE. + * Returns error number in failure and 0 if pass. 
*/ -void xe_vm_madvise(int fd, uint32_t vm, uint64_t addr, uint64_t range, - uint64_t ext, uint32_t type, uint32_t op_val, uint16_t policy) +int xe_vm_madvise(int fd, uint32_t vm, uint64_t addr, uint64_t range, + uint64_t ext, uint32_t type, uint32_t op_val, uint16_t policy, + uint16_t instance) { - igt_assert_eq(__xe_vm_madvise(fd, vm, addr, range, ext, type, op_val, policy), 0); + return __xe_vm_madvise(fd, vm, addr, range, ext, type, op_val, policy, instance); +} + +#define BIND_SYNC_VAL 0x686868 +void xe_vm_bind_lr_sync(int fd, uint32_t vm, uint32_t bo, uint64_t offset, + uint64_t addr, uint64_t size, uint32_t flags) +{ + volatile uint64_t *sync_addr = malloc(sizeof(*sync_addr)); + struct drm_xe_sync sync = { + .flags = DRM_XE_SYNC_FLAG_SIGNAL, + .type = DRM_XE_SYNC_TYPE_USER_FENCE, + .addr = to_user_pointer((uint64_t *)sync_addr), + .timeline_value = BIND_SYNC_VAL, + }; + + igt_assert(!!sync_addr); + xe_vm_bind_async_flags(fd, vm, 0, bo, 0, addr, size, &sync, 1, flags); + if (*sync_addr != BIND_SYNC_VAL) + xe_wait_ufence(fd, (uint64_t *)sync_addr, BIND_SYNC_VAL, 0, NSEC_PER_SEC * 10); + /* Only free if the wait succeeds */ + free((void *)sync_addr); +} + +void xe_vm_unbind_lr_sync(int fd, uint32_t vm, uint64_t offset, + uint64_t addr, uint64_t size) +{ + volatile uint64_t *sync_addr = malloc(sizeof(*sync_addr)); + struct drm_xe_sync sync = { + .flags = DRM_XE_SYNC_FLAG_SIGNAL, + .type = DRM_XE_SYNC_TYPE_USER_FENCE, + .addr = to_user_pointer((uint64_t *)sync_addr), + .timeline_value = BIND_SYNC_VAL, + }; + + igt_assert(!!sync_addr); + *sync_addr = 0; + xe_vm_unbind_async(fd, vm, 0, 0, addr, size, &sync, 1); + if (*sync_addr != BIND_SYNC_VAL) + xe_wait_ufence(fd, (uint64_t *)sync_addr, BIND_SYNC_VAL, 0, NSEC_PER_SEC * 10); + free((void *)sync_addr); } diff --git a/lib/xe/xe_ioctl.h b/lib/xe/xe_ioctl.h index ae8a23a54..1ae38029d 100644 --- a/lib/xe/xe_ioctl.h +++ b/lib/xe/xe_ioctl.h @@ -100,13 +100,18 @@ int __xe_wait_ufence(int fd, uint64_t *addr, uint64_t value, int64_t xe_wait_ufence(int fd, uint64_t *addr, uint64_t value, uint32_t exec_queue, int64_t timeout); int __xe_vm_madvise(int fd, uint32_t vm, uint64_t addr, uint64_t range, uint64_t ext, - uint32_t type, uint32_t op_val, uint16_t policy); -void xe_vm_madvise(int fd, uint32_t vm, uint64_t addr, uint64_t range, uint64_t ext, - uint32_t type, uint32_t op_val, uint16_t policy); + uint32_t type, uint32_t op_val, uint16_t policy, uint16_t instance); +int xe_vm_madvise(int fd, uint32_t vm, uint64_t addr, uint64_t range, uint64_t ext, + uint32_t type, uint32_t op_val, uint16_t policy, uint16_t instance); int xe_vm_number_vmas_in_range(int fd, struct drm_xe_vm_query_mem_range_attr *vmas_attr); int xe_vm_vma_attrs(int fd, struct drm_xe_vm_query_mem_range_attr *vmas_attr, struct drm_xe_mem_range_attr *mem_attr); struct drm_xe_mem_range_attr *xe_vm_get_mem_attr_values_in_range(int fd, uint32_t vm, uint64_t start, uint64_t range, uint32_t *num_ranges); +void xe_vm_bind_lr_sync(int fd, uint32_t vm, uint32_t bo, + uint64_t offset, uint64_t addr, + uint64_t size, uint32_t flags); +void xe_vm_unbind_lr_sync(int fd, uint32_t vm, uint64_t offset, + uint64_t addr, uint64_t size); #endif /* XE_IOCTL_H */ -- 2.48.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
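For reference, a minimal caller-side sketch of the updated interface introduced by this patch. The helper and constant names follow the diff above; the VM setup and the choice of preferred location/instance are illustrative assumptions, not part of the patch:

#include "igt.h"
#include "xe/xe_ioctl.h"
#include "xe/xe_query.h"

/*
 * Illustrative only: create a fault-capable LR VM, mirror the CPU address
 * space with the new synchronous bind helper, then request a preferred
 * memory location on a specific VRAM instance via the extended madvise.
 */
static void madvise_instance_example(int fd, void *buf, size_t size,
				     int devmem_fd, uint16_t vram_instance)
{
	struct xe_device *xe = xe_device_get(fd);
	uint32_t vm;
	int ret;

	vm = xe_vm_create(fd, DRM_XE_VM_CREATE_FLAG_LR_MODE |
			  DRM_XE_VM_CREATE_FLAG_FAULT_MODE, 0);

	/* Bind the whole VA range in CPU-mirror mode; waits on a user fence. */
	xe_vm_bind_lr_sync(fd, vm, 0, 0, 0, 1ull << xe->va_bits,
			   DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR);

	/* The new trailing argument selects the region instance. */
	ret = xe_vm_madvise(fd, vm, to_user_pointer(buf), size, 0,
			    DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC,
			    devmem_fd, 0, vram_instance);
	igt_assert_eq(ret, 0);

	xe_vm_unbind_lr_sync(fd, vm, 0, 0, 1ull << xe->va_bits);
	xe_vm_destroy(fd, vm);
}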
* [PATCH i-g-t v7 02/10] tests/intel/xe_exec_system_allocator: Add parameter in madvise call 2025-11-13 17:04 [PATCH i-g-t v7 00/10] Add SVM madvise feature for multi-GPU configurations nishit.sharma 2025-11-13 17:04 ` [PATCH i-g-t v7 01/10] lib/xe: Add instance parameter to xe_vm_madvise and introduce lr_sync helpers nishit.sharma @ 2025-11-13 17:04 ` nishit.sharma 2025-11-18 13:25 ` Gurram, Pravalika 2025-11-13 17:04 ` [PATCH i-g-t v7 03/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU cross-GPU memory access test nishit.sharma ` (6 subsequent siblings) 8 siblings, 1 reply; 21+ messages in thread From: nishit.sharma @ 2025-11-13 17:04 UTC (permalink / raw) To: igt-dev, nishit.sharma From: Nishit Sharma <nishit.sharma@intel.com> Parameter instance added in xe_vm_madvise() call. This parameter addition cause compilation issue in system_allocator test. As a fix 0 as instance parameter passed in xe_vm_madvise() calls. Signed-off-by: Nishit Sharma <nishit.sharma@intel.com> --- tests/intel/xe_exec_system_allocator.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tests/intel/xe_exec_system_allocator.c b/tests/intel/xe_exec_system_allocator.c index b88967e58..1e7175061 100644 --- a/tests/intel/xe_exec_system_allocator.c +++ b/tests/intel/xe_exec_system_allocator.c @@ -1164,7 +1164,7 @@ madvise_swizzle_op_exec(int fd, uint32_t vm, struct test_exec_data *data, xe_vm_madvise(fd, vm, to_user_pointer(data), bo_size, 0, DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC, preferred_loc, - 0); + 0, 0); } static void @@ -1172,7 +1172,7 @@ xe_vm_madvixe_pat_attr(int fd, uint32_t vm, uint64_t addr, uint64_t range, int pat_index) { xe_vm_madvise(fd, vm, addr, range, 0, - DRM_XE_MEM_RANGE_ATTR_PAT, pat_index, 0); + DRM_XE_MEM_RANGE_ATTR_PAT, pat_index, 0, 0); } static void @@ -1181,7 +1181,7 @@ xe_vm_madvise_atomic_attr(int fd, uint32_t vm, uint64_t addr, uint64_t range, { xe_vm_madvise(fd, vm, addr, range, 0, DRM_XE_MEM_RANGE_ATTR_ATOMIC, - mem_attr, 0); + mem_attr, 0, 0); } static void @@ -1190,7 +1190,7 @@ xe_vm_madvise_migrate_pages(int fd, uint32_t vm, uint64_t addr, uint64_t range) xe_vm_madvise(fd, vm, addr, range, 0, DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC, DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM, - DRM_XE_MIGRATE_ALL_PAGES); + DRM_XE_MIGRATE_ALL_PAGES, 0); } static void -- 2.48.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* RE: [PATCH i-g-t v7 02/10] tests/intel/xe_exec_system_allocator: Add parameter in madvise call 2025-11-13 17:04 ` [PATCH i-g-t v7 02/10] tests/intel/xe_exec_system_allocator: Add parameter in madvise call nishit.sharma @ 2025-11-18 13:25 ` Gurram, Pravalika 0 siblings, 0 replies; 21+ messages in thread From: Gurram, Pravalika @ 2025-11-18 13:25 UTC (permalink / raw) To: Sharma, Nishit, igt-dev@lists.freedesktop.org, Sharma, Nishit > -----Original Message----- > From: igt-dev <igt-dev-bounces@lists.freedesktop.org> On Behalf Of > nishit.sharma@intel.com > Sent: Thursday, November 13, 2025 10:35 PM > To: igt-dev@lists.freedesktop.org; Sharma, Nishit > <nishit.sharma@intel.com> > Subject: [PATCH i-g-t v7 02/10] tests/intel/xe_exec_system_allocator: Add > parameter in madvise call > > From: Nishit Sharma <nishit.sharma@intel.com> > > Parameter instance added in xe_vm_madvise() call. This parameter addition > cause compilation issue in system_allocator test. As a fix > 0 as instance parameter passed in xe_vm_madvise() calls. > > Signed-off-by: Nishit Sharma <nishit.sharma@intel.com> > --- > tests/intel/xe_exec_system_allocator.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/tests/intel/xe_exec_system_allocator.c > b/tests/intel/xe_exec_system_allocator.c > index b88967e58..1e7175061 100644 > --- a/tests/intel/xe_exec_system_allocator.c > +++ b/tests/intel/xe_exec_system_allocator.c > @@ -1164,7 +1164,7 @@ madvise_swizzle_op_exec(int fd, uint32_t vm, > struct test_exec_data *data, > xe_vm_madvise(fd, vm, to_user_pointer(data), bo_size, 0, > DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC, > preferred_loc, > - 0); > + 0, 0); > } > > static void > @@ -1172,7 +1172,7 @@ xe_vm_madvixe_pat_attr(int fd, uint32_t vm, > uint64_t addr, uint64_t range, > int pat_index) > { > xe_vm_madvise(fd, vm, addr, range, 0, > - DRM_XE_MEM_RANGE_ATTR_PAT, pat_index, 0); > + DRM_XE_MEM_RANGE_ATTR_PAT, pat_index, 0, 0); > } > > static void > @@ -1181,7 +1181,7 @@ xe_vm_madvise_atomic_attr(int fd, uint32_t vm, > uint64_t addr, uint64_t range, { > xe_vm_madvise(fd, vm, addr, range, 0, > DRM_XE_MEM_RANGE_ATTR_ATOMIC, > - mem_attr, 0); > + mem_attr, 0, 0); > } > > static void > @@ -1190,7 +1190,7 @@ xe_vm_madvise_migrate_pages(int fd, uint32_t > vm, uint64_t addr, uint64_t range) > xe_vm_madvise(fd, vm, addr, range, 0, > DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC, > DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM, > - DRM_XE_MIGRATE_ALL_PAGES); > + DRM_XE_MIGRATE_ALL_PAGES, 0); > } please include this with previous patch lib change and all callers should be in one patch --Pravalika > > static void > -- > 2.48.1 ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH i-g-t v7 03/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU cross-GPU memory access test 2025-11-13 17:04 [PATCH i-g-t v7 00/10] Add SVM madvise feature for multi-GPU configurations nishit.sharma 2025-11-13 17:04 ` [PATCH i-g-t v7 01/10] lib/xe: Add instance parameter to xe_vm_madvise and introduce lr_sync helpers nishit.sharma 2025-11-13 17:04 ` [PATCH i-g-t v7 02/10] tests/intel/xe_exec_system_allocator: Add parameter in madvise call nishit.sharma @ 2025-11-13 17:04 ` nishit.sharma 2025-11-13 17:04 ` [PATCH i-g-t v7 04/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU atomic operations nishit.sharma ` (5 subsequent siblings) 8 siblings, 0 replies; 21+ messages in thread From: nishit.sharma @ 2025-11-13 17:04 UTC (permalink / raw) To: igt-dev, nishit.sharma From: Nishit Sharma <nishit.sharma@intel.com> This test allocates a buffer in SVM, writes data to it from src GPU , and reads/verifies the data from dst GPU. Optionally, the CPU also reads or modifies the buffer and both GPUs verify the results, ensuring correct cross-GPU and CPU memory access in a multi-GPU environment. Signed-off-by: Nishit Sharma <nishit.sharma@intel.com> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> --- tests/intel/xe_multi_gpusvm.c | 373 ++++++++++++++++++++++++++++++++++ tests/meson.build | 1 + 2 files changed, 374 insertions(+) create mode 100644 tests/intel/xe_multi_gpusvm.c diff --git a/tests/intel/xe_multi_gpusvm.c b/tests/intel/xe_multi_gpusvm.c new file mode 100644 index 000000000..6614ea3d1 --- /dev/null +++ b/tests/intel/xe_multi_gpusvm.c @@ -0,0 +1,373 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2023 Intel Corporation + */ + +#include <unistd.h> + +#include "drmtest.h" +#include "igt.h" +#include "igt_multigpu.h" + +#include "intel_blt.h" +#include "intel_mocs.h" +#include "intel_reg.h" + +#include "xe/xe_ioctl.h" +#include "xe/xe_query.h" +#include "xe/xe_util.h" + +/** + * TEST: Basic multi-gpu SVM testing + * Category: SVM + * Mega feature: Compute + * Sub-category: Compute tests + * Functionality: SVM p2p access, madvise and prefetch. + * Test category: functionality test + * + * SUBTEST: cross-gpu-mem-access + * Description: + * This test creates two malloced regions, places the destination + * region both remotely and locally and copies to it. Reads back to + * system memory and checks the result. 
+ * + */ + +#define MAX_XE_REGIONS 8 +#define MAX_XE_GPUS 8 +#define NUM_LOOPS 1 +#define BATCH_SIZE(_fd) ALIGN(SZ_8K, xe_get_default_alignment(_fd)) +#define BIND_SYNC_VAL 0x686868 +#define EXEC_SYNC_VAL 0x676767 +#define COPY_SIZE SZ_64M + +struct xe_svm_gpu_info { + bool supports_faults; + int vram_regions[MAX_XE_REGIONS]; + unsigned int num_regions; + unsigned int va_bits; + int fd; +}; + +struct multigpu_ops_args { + bool prefetch_req; + bool op_mod; +}; + +typedef void (*gpu_pair_fn) ( + struct xe_svm_gpu_info *src, + struct xe_svm_gpu_info *dst, + struct drm_xe_engine_class_instance *eci, + void *extra_args +); + +static void for_each_gpu_pair(int num_gpus, + struct xe_svm_gpu_info *gpus, + struct drm_xe_engine_class_instance *eci, + gpu_pair_fn fn, + void *extra_args); + +static void gpu_mem_access_wrapper(struct xe_svm_gpu_info *src, + struct xe_svm_gpu_info *dst, + struct drm_xe_engine_class_instance *eci, + void *extra_args); + +static void open_pagemaps(int fd, struct xe_svm_gpu_info *info); + +static void +create_vm_and_queue(struct xe_svm_gpu_info *gpu, struct drm_xe_engine_class_instance *eci, + uint32_t *vm, uint32_t *exec_queue) +{ + *vm = xe_vm_create(gpu->fd, + DRM_XE_VM_CREATE_FLAG_LR_MODE | DRM_XE_VM_CREATE_FLAG_FAULT_MODE, 0); + *exec_queue = xe_exec_queue_create(gpu->fd, *vm, eci, 0); + xe_vm_bind_lr_sync(gpu->fd, *vm, 0, 0, 0, 1ull << gpu->va_bits, + DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR); +} + +static void +setup_sync(struct drm_xe_sync *sync, volatile uint64_t **sync_addr, uint64_t timeline_value) +{ + *sync_addr = malloc(sizeof(**sync_addr)); + igt_assert(*sync_addr); + sync->flags = DRM_XE_SYNC_FLAG_SIGNAL; + sync->type = DRM_XE_SYNC_TYPE_USER_FENCE; + sync->addr = to_user_pointer((uint64_t *)*sync_addr); + sync->timeline_value = timeline_value; + **sync_addr = 0; +} + +static void +cleanup_vm_and_queue(struct xe_svm_gpu_info *gpu, uint32_t vm, uint32_t exec_queue) +{ + xe_vm_unbind_lr_sync(gpu->fd, vm, 0, 0, 1ull << gpu->va_bits); + xe_exec_queue_destroy(gpu->fd, exec_queue); + xe_vm_destroy(gpu->fd, vm); +} + +static void xe_multigpu_madvise(int src_fd, uint32_t vm, uint64_t addr, uint64_t size, + uint64_t ext, uint32_t type, int dst_fd, uint16_t policy, + uint16_t instance, uint32_t exec_queue, int local_fd, + uint16_t local_vram) +{ + int ret; + +#define SYSTEM_MEMORY 0 + if (src_fd != dst_fd) { + ret = xe_vm_madvise(src_fd, vm, addr, size, ext, type, dst_fd, policy, instance); + if (ret == -ENOLINK) { + igt_info("No fast interconnect between GPU0 and GPU1, falling back to local VRAM\n"); + ret = xe_vm_madvise(src_fd, vm, addr, size, ext, type, local_fd, + policy, local_vram); + if (ret) { + igt_info("Local VRAM madvise failed, falling back to system memory\n"); + ret = xe_vm_madvise(src_fd, vm, addr, size, ext, type, + SYSTEM_MEMORY, policy, SYSTEM_MEMORY); + igt_assert_eq(ret, 0); + } + } else { + igt_assert_eq(ret, 0); + } + } else { + ret = xe_vm_madvise(src_fd, vm, addr, size, ext, type, dst_fd, policy, instance); + igt_assert_eq(ret, 0); + + } + +} + +static void xe_multigpu_prefetch(int src_fd, uint32_t vm, uint64_t addr, uint64_t size, + struct drm_xe_sync *sync, volatile uint64_t *sync_addr, + uint32_t exec_queue, bool prefetch_req) +{ + if (prefetch_req) { + xe_vm_prefetch_async(src_fd, vm, 0, 0, addr, size, sync, 1, + DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC); + if (*sync_addr != sync->timeline_value) + xe_wait_ufence(src_fd, (uint64_t *)sync_addr, sync->timeline_value, + exec_queue, NSEC_PER_SEC * 10); + } + free((void *)sync_addr); +} + +static void 
for_each_gpu_pair(int num_gpus, struct xe_svm_gpu_info *gpus, + struct drm_xe_engine_class_instance *eci, + gpu_pair_fn fn, void *extra_args) +{ + for (int src = 0; src < num_gpus; src++) { + if(!gpus[src].supports_faults) + continue; + + for (int dst = 0; dst < num_gpus; dst++) { + if (src == dst) + continue; + fn(&gpus[src], &gpus[dst], eci, extra_args); + } + } +} + +static void batch_init(int fd, uint32_t vm, uint64_t src_addr, + uint64_t dst_addr, uint64_t copy_size, + uint32_t *bo, uint64_t *addr) +{ + uint32_t width = copy_size / 256; + uint32_t height = 1; + uint32_t batch_bo_size = BATCH_SIZE(fd); + uint32_t batch_bo; + uint64_t batch_addr; + void *batch; + uint32_t *cmd; + uint32_t mocs_index = intel_get_uc_mocs_index(fd); + int i = 0; + + batch_bo = xe_bo_create(fd, vm, batch_bo_size, vram_if_possible(fd, 0), 0); + batch = xe_bo_map(fd, batch_bo, batch_bo_size); + cmd = (uint32_t *) batch; + cmd[i++] = MEM_COPY_CMD | (1 << 19); + cmd[i++] = width - 1; + cmd[i++] = height - 1; + cmd[i++] = width - 1; + cmd[i++] = width - 1; + cmd[i++] = src_addr & ((1UL << 32) - 1); + cmd[i++] = src_addr >> 32; + cmd[i++] = dst_addr & ((1UL << 32) - 1); + cmd[i++] = dst_addr >> 32; + cmd[i++] = mocs_index << XE2_MEM_COPY_MOCS_SHIFT | mocs_index; + cmd[i++] = MI_BATCH_BUFFER_END; + cmd[i++] = MI_BATCH_BUFFER_END; + + batch_addr = to_user_pointer(batch); + /* Punch a gap in the SVM map where we map the batch_bo */ + xe_vm_bind_lr_sync(fd, vm, batch_bo, 0, batch_addr, batch_bo_size, 0); + *bo = batch_bo; + *addr = batch_addr; +} + +static void batch_fini(int fd, uint32_t vm, uint32_t bo, uint64_t addr) +{ + /* Unmap the batch bo by re-instating the SVM binding. */ + xe_vm_bind_lr_sync(fd, vm, 0, 0, addr, BATCH_SIZE(fd), + DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR); + gem_close(fd, bo); +} + + +static void open_pagemaps(int fd, struct xe_svm_gpu_info *info) +{ + unsigned int count = 0; + uint64_t regions = all_memory_regions(fd); + uint32_t region; + + xe_for_each_mem_region(fd, regions, region) { + if (XE_IS_VRAM_MEMORY_REGION(fd, region)) { + struct drm_xe_mem_region *mem_region = + xe_mem_region(fd, 1ull << (region - 1)); + igt_assert(count < MAX_XE_REGIONS); + info->vram_regions[count++] = mem_region->instance; + } + } + + info->num_regions = count; +} + +static int get_device_info(struct xe_svm_gpu_info gpus[], int num_gpus) +{ + int cnt; + int xe; + int i; + + for (i = 0, cnt = 0 && i < 128; cnt < num_gpus; i++) { + xe = __drm_open_driver_another(i, DRIVER_XE); + if (xe < 0) + break; + + gpus[cnt].fd = xe; + cnt++; + } + + return cnt; +} + +static void +copy_src_dst(struct xe_svm_gpu_info *gpu0, + struct xe_svm_gpu_info *gpu1, + struct drm_xe_engine_class_instance *eci, + bool prefetch_req) +{ + uint32_t vm[1]; + uint32_t exec_queue[2]; + uint32_t batch_bo; + void *copy_src, *copy_dst; + uint64_t batch_addr; + struct drm_xe_sync sync = {}; + volatile uint64_t *sync_addr; + int local_fd = gpu0->fd; + uint16_t local_vram = gpu0->vram_regions[0]; + + create_vm_and_queue(gpu0, eci, &vm[0], &exec_queue[0]); + + /* Allocate source and destination buffers */ + copy_src = aligned_alloc(xe_get_default_alignment(gpu0->fd), SZ_64M); + igt_assert(copy_src); + copy_dst = aligned_alloc(xe_get_default_alignment(gpu1->fd), SZ_64M); + igt_assert(copy_dst); + + /* + * Initialize, map and bind the batch bo. Note that Xe doesn't seem to enjoy + * batch buffer memory accessed over PCIe p2p. 
+ */ + batch_init(gpu0->fd, vm[0], to_user_pointer(copy_src), to_user_pointer(copy_dst), + COPY_SIZE, &batch_bo, &batch_addr); + + /* Fill the source with a pattern, clear the destination. */ + memset(copy_src, 0x67, COPY_SIZE); + memset(copy_dst, 0x0, COPY_SIZE); + + xe_multigpu_madvise(gpu0->fd, vm[0], to_user_pointer(copy_dst), COPY_SIZE, + 0, DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC, + gpu1->fd, 0, gpu1->vram_regions[0], exec_queue[0], + local_fd, local_vram); + + setup_sync(&sync, &sync_addr, BIND_SYNC_VAL); + xe_multigpu_prefetch(gpu0->fd, vm[0], to_user_pointer(copy_dst), COPY_SIZE, &sync, + sync_addr, exec_queue[0], prefetch_req); + + sync_addr = (void *)((char *)batch_addr + SZ_4K); + sync.addr = to_user_pointer((uint64_t *)sync_addr); + sync.timeline_value = EXEC_SYNC_VAL; + *sync_addr = 0; + + /* Execute a GPU copy. */ + xe_exec_sync(gpu0->fd, exec_queue[0], batch_addr, &sync, 1); + if (*sync_addr != EXEC_SYNC_VAL) + xe_wait_ufence(gpu0->fd, (uint64_t *)sync_addr, EXEC_SYNC_VAL, exec_queue[0], + NSEC_PER_SEC * 10); + + igt_assert(memcmp(copy_src, copy_dst, COPY_SIZE) == 0); + + free(copy_dst); + free(copy_src); + munmap((void *)batch_addr, BATCH_SIZE(gpu0->fd)); + batch_fini(gpu0->fd, vm[0], batch_bo, batch_addr); + cleanup_vm_and_queue(gpu0, vm[0], exec_queue[0]); +} + +static void +gpu_mem_access_wrapper(struct xe_svm_gpu_info *src, + struct xe_svm_gpu_info *dst, + struct drm_xe_engine_class_instance *eci, + void *extra_args) +{ + struct multigpu_ops_args *args = (struct multigpu_ops_args *)extra_args; + igt_assert(src); + igt_assert(dst); + + copy_src_dst(src, dst, eci, args->prefetch_req); +} + +igt_main +{ + struct xe_svm_gpu_info gpus[MAX_XE_GPUS]; + struct xe_device *xe; + int gpu, gpu_cnt; + + struct drm_xe_engine_class_instance eci = { + .engine_class = DRM_XE_ENGINE_CLASS_COPY, + }; + + igt_fixture { + gpu_cnt = get_device_info(gpus, ARRAY_SIZE(gpus)); + igt_skip_on(gpu_cnt < 2); + + for (gpu = 0; gpu < gpu_cnt; ++gpu) { + igt_assert(gpu < MAX_XE_GPUS); + + open_pagemaps(gpus[gpu].fd, &gpus[gpu]); + /* NOTE! inverted return value. */ + gpus[gpu].supports_faults = !xe_supports_faults(gpus[gpu].fd); + fprintf(stderr, "GPU %u has %u VRAM regions%s, and %s SVM VMs.\n", + gpu, gpus[gpu].num_regions, + gpus[gpu].num_regions != 1 ? "s" : "", + gpus[gpu].supports_faults ? "supports" : "doesn't support"); + + xe = xe_device_get(gpus[gpu].fd); + gpus[gpu].va_bits = xe->va_bits; + } + } + + igt_describe("gpu-gpu write-read"); + igt_subtest("cross-gpu-mem-access") { + struct multigpu_ops_args op_args; + op_args.prefetch_req = 1; + for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_mem_access_wrapper, &op_args); + op_args.prefetch_req = 0; + for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_mem_access_wrapper, &op_args); + } + + igt_fixture { + int cnt; + + for (cnt = 0; cnt < gpu_cnt; cnt++) + drm_close_driver(gpus[cnt].fd); + } +} diff --git a/tests/meson.build b/tests/meson.build index 9736f2338..1209f84a4 100644 --- a/tests/meson.build +++ b/tests/meson.build @@ -313,6 +313,7 @@ intel_xe_progs = [ 'xe_media_fill', 'xe_mmap', 'xe_module_load', + 'xe_multi_gpusvm', 'xe_noexec_ping_pong', 'xe_oa', 'xe_pat', -- 2.48.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
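One pattern this patch introduces and the later subtests reuse is the user-fence wait after each exec. A condensed sketch of that pattern follows; the constants and the 10 s timeout are taken from the test above, while the helper name is illustrative. In the test itself the fence slot is placed at batch_addr + SZ_4K, inside the already-mapped batch buffer:

#include "igt.h"
#include "xe/xe_ioctl.h"

#define EXEC_SYNC_VAL 0x676767

/* Submit one batch and wait for the GPU to signal the user fence. */
static void exec_and_wait(int fd, uint32_t exec_queue, uint64_t batch_addr,
			  volatile uint64_t *fence)
{
	struct drm_xe_sync sync = {
		.type = DRM_XE_SYNC_TYPE_USER_FENCE,
		.flags = DRM_XE_SYNC_FLAG_SIGNAL,
		.addr = to_user_pointer((uint64_t *)fence),
		.timeline_value = EXEC_SYNC_VAL,
	};

	*fence = 0;
	xe_exec_sync(fd, exec_queue, batch_addr, &sync, 1);
	if (*fence != EXEC_SYNC_VAL)
		xe_wait_ufence(fd, (uint64_t *)fence, EXEC_SYNC_VAL,
			       exec_queue, NSEC_PER_SEC * 10);
}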
* [PATCH i-g-t v7 04/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU atomic operations 2025-11-13 17:04 [PATCH i-g-t v7 00/10] Add SVM madvise feature for multi-GPU configurations nishit.sharma ` (2 preceding siblings ...) 2025-11-13 17:04 ` [PATCH i-g-t v7 03/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU cross-GPU memory access test nishit.sharma @ 2025-11-13 17:04 ` nishit.sharma 2025-11-21 14:11 ` Gurram, Pravalika 2025-11-13 17:04 ` [PATCH i-g-t v7 05/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU coherency test nishit.sharma ` (4 subsequent siblings) 8 siblings, 1 reply; 21+ messages in thread From: nishit.sharma @ 2025-11-13 17:04 UTC (permalink / raw) To: igt-dev, nishit.sharma From: Nishit Sharma <nishit.sharma@intel.com> This test performs atomic increment operation on a shared SVM buffer from both GPUs and the CPU in a multi-GPU environment. It uses madvise and prefetch to control buffer placement and verifies correctness and ordering of atomic updates across agents. Signed-off-by: Nishit Sharma <nishit.sharma@intel.com> --- tests/intel/xe_multi_gpusvm.c | 157 +++++++++++++++++++++++++++++++++- 1 file changed, 156 insertions(+), 1 deletion(-) diff --git a/tests/intel/xe_multi_gpusvm.c b/tests/intel/xe_multi_gpusvm.c index 6614ea3d1..54e036724 100644 --- a/tests/intel/xe_multi_gpusvm.c +++ b/tests/intel/xe_multi_gpusvm.c @@ -31,6 +31,11 @@ * region both remotely and locally and copies to it. Reads back to * system memory and checks the result. * + * SUBTEST: atomic-inc-gpu-op + * Description: + * This test does atomic operation in multi-gpu by executing atomic + * operation on GPU1 and then atomic operation on GPU2 using same + * adress */ #define MAX_XE_REGIONS 8 @@ -40,6 +45,7 @@ #define BIND_SYNC_VAL 0x686868 #define EXEC_SYNC_VAL 0x676767 #define COPY_SIZE SZ_64M +#define ATOMIC_OP_VAL 56 struct xe_svm_gpu_info { bool supports_faults; @@ -49,6 +55,16 @@ struct xe_svm_gpu_info { int fd; }; +struct test_exec_data { + uint32_t batch[32]; + uint64_t pad; + uint64_t vm_sync; + uint64_t exec_sync; + uint32_t data; + uint32_t expected_data; + uint64_t batch_addr; +}; + struct multigpu_ops_args { bool prefetch_req; bool op_mod; @@ -72,7 +88,10 @@ static void gpu_mem_access_wrapper(struct xe_svm_gpu_info *src, struct drm_xe_engine_class_instance *eci, void *extra_args); -static void open_pagemaps(int fd, struct xe_svm_gpu_info *info); +static void gpu_atomic_inc_wrapper(struct xe_svm_gpu_info *src, + struct xe_svm_gpu_info *dst, + struct drm_xe_engine_class_instance *eci, + void *extra_args); static void create_vm_and_queue(struct xe_svm_gpu_info *gpu, struct drm_xe_engine_class_instance *eci, @@ -166,6 +185,35 @@ static void for_each_gpu_pair(int num_gpus, struct xe_svm_gpu_info *gpus, } } +static void open_pagemaps(int fd, struct xe_svm_gpu_info *info); + +static void +atomic_batch_init(int fd, uint32_t vm, uint64_t src_addr, + uint32_t *bo, uint64_t *addr) +{ + uint32_t batch_bo_size = BATCH_SIZE(fd); + uint32_t batch_bo; + uint64_t batch_addr; + void *batch; + uint32_t *cmd; + int i = 0; + + batch_bo = xe_bo_create(fd, vm, batch_bo_size, vram_if_possible(fd, 0), 0); + batch = xe_bo_map(fd, batch_bo, batch_bo_size); + cmd = (uint32_t *)batch; + + cmd[i++] = MI_ATOMIC | MI_ATOMIC_INC; + cmd[i++] = src_addr; + cmd[i++] = src_addr >> 32; + cmd[i++] = MI_BATCH_BUFFER_END; + + batch_addr = to_user_pointer(batch); + /* Punch a gap in the SVM map where we map the batch_bo */ + xe_vm_bind_lr_sync(fd, vm, batch_bo, 0, batch_addr, batch_bo_size, 0); + *bo = batch_bo; + *addr = 
batch_addr; +} + static void batch_init(int fd, uint32_t vm, uint64_t src_addr, uint64_t dst_addr, uint64_t copy_size, uint32_t *bo, uint64_t *addr) @@ -325,6 +373,105 @@ gpu_mem_access_wrapper(struct xe_svm_gpu_info *src, copy_src_dst(src, dst, eci, args->prefetch_req); } +static void +atomic_inc_op(struct xe_svm_gpu_info *gpu0, + struct xe_svm_gpu_info *gpu1, + struct drm_xe_engine_class_instance *eci, + bool prefetch_req) +{ + uint64_t addr; + uint32_t vm[2]; + uint32_t exec_queue[2]; + uint32_t batch_bo; + struct test_exec_data *data; + uint64_t batch_addr; + struct drm_xe_sync sync = {}; + volatile uint64_t *sync_addr; + volatile uint32_t *shared_val; + + create_vm_and_queue(gpu0, eci, &vm[0], &exec_queue[0]); + create_vm_and_queue(gpu1, eci, &vm[1], &exec_queue[1]); + + data = aligned_alloc(SZ_2M, SZ_4K); + igt_assert(data); + data[0].vm_sync = 0; + addr = to_user_pointer(data); + + shared_val = (volatile uint32_t *)addr; + *shared_val = ATOMIC_OP_VAL - 1; + + atomic_batch_init(gpu0->fd, vm[0], addr, &batch_bo, &batch_addr); + + /* Place destination in an optionally remote location to test */ + xe_multigpu_madvise(gpu0->fd, vm[0], addr, SZ_4K, 0, + DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC, + gpu0->fd, 0, gpu0->vram_regions[0], exec_queue[0], + 0, 0); + + setup_sync(&sync, &sync_addr, BIND_SYNC_VAL); + xe_multigpu_prefetch(gpu0->fd, vm[0], addr, SZ_4K, &sync, + sync_addr, exec_queue[0], prefetch_req); + + sync_addr = (void *)((char *)batch_addr + SZ_4K); + sync.addr = to_user_pointer((uint64_t *)sync_addr); + sync.timeline_value = EXEC_SYNC_VAL; + *sync_addr = 0; + + /* Executing ATOMIC_INC on GPU0. */ + xe_exec_sync(gpu0->fd, exec_queue[0], batch_addr, &sync, 1); + if (*sync_addr != EXEC_SYNC_VAL) + xe_wait_ufence(gpu0->fd, (uint64_t *)sync_addr, EXEC_SYNC_VAL, exec_queue[0], + NSEC_PER_SEC * 10); + + igt_assert_eq(*shared_val, ATOMIC_OP_VAL); + + atomic_batch_init(gpu1->fd, vm[1], addr, &batch_bo, &batch_addr); + + /* Place destination in an optionally remote location to test */ + xe_multigpu_madvise(gpu1->fd, vm[1], addr, SZ_4K, 0, + DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC, + gpu1->fd, 0, gpu1->vram_regions[0], exec_queue[0], + 0, 0); + + setup_sync(&sync, &sync_addr, BIND_SYNC_VAL); + xe_multigpu_prefetch(gpu1->fd, vm[1], addr, SZ_4K, &sync, + sync_addr, exec_queue[1], prefetch_req); + + sync_addr = (void *)((char *)batch_addr + SZ_4K); + sync.addr = to_user_pointer((uint64_t *)sync_addr); + sync.timeline_value = EXEC_SYNC_VAL; + *sync_addr = 0; + + /* Execute ATOMIC_INC on GPU1 */ + xe_exec_sync(gpu1->fd, exec_queue[1], batch_addr, &sync, 1); + if (*sync_addr != EXEC_SYNC_VAL) + xe_wait_ufence(gpu1->fd, (uint64_t *)sync_addr, EXEC_SYNC_VAL, exec_queue[1], + NSEC_PER_SEC * 10); + + igt_assert_eq(*shared_val, ATOMIC_OP_VAL + 1); + + munmap((void *)batch_addr, BATCH_SIZE(gpu0->fd)); + batch_fini(gpu0->fd, vm[0], batch_bo, batch_addr); + batch_fini(gpu1->fd, vm[1], batch_bo, batch_addr); + free(data); + + cleanup_vm_and_queue(gpu0, vm[0], exec_queue[0]); + cleanup_vm_and_queue(gpu1, vm[1], exec_queue[1]); +} + +static void +gpu_atomic_inc_wrapper(struct xe_svm_gpu_info *src, + struct xe_svm_gpu_info *dst, + struct drm_xe_engine_class_instance *eci, + void *extra_args) +{ + struct multigpu_ops_args *args = (struct multigpu_ops_args *)extra_args; + igt_assert(src); + igt_assert(dst); + + atomic_inc_op(src, dst, eci, args->prefetch_req); +} + igt_main { struct xe_svm_gpu_info gpus[MAX_XE_GPUS]; @@ -364,6 +511,14 @@ igt_main for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_mem_access_wrapper, 
&op_args); } + igt_subtest("atomic-inc-gpu-op") { + struct multigpu_ops_args atomic_args; + atomic_args.prefetch_req = 1; + for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_atomic_inc_wrapper, &atomic_args); + atomic_args.prefetch_req = 0; + for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_atomic_inc_wrapper, &atomic_args); + } + igt_fixture { int cnt; -- 2.48.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
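For clarity, the CPU-side equivalent of what each MI_ATOMIC | MI_ATOMIC_INC batch performs on the shared SVM value is a single atomic 32-bit increment. A sketch like the one below (helper name is illustrative) could be used if the subtest is later extended to mix CPU and GPU increments:

#include <stdint.h>

/* One atomic increment of the shared 32-bit SVM value, matching what the
 * MI_ATOMIC | MI_ATOMIC_INC batch does from the GPU side. */
static inline void cpu_atomic_inc(volatile uint32_t *shared_val)
{
	__atomic_fetch_add(shared_val, 1, __ATOMIC_SEQ_CST);
}

With ATOMIC_OP_VAL - 1 as the starting value, one increment from GPU0, one from GPU1 and one from the CPU would leave ATOMIC_OP_VAL + 2 in the buffer.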
* RE: [PATCH i-g-t v7 04/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU atomic operations 2025-11-13 17:04 ` [PATCH i-g-t v7 04/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU atomic operations nishit.sharma @ 2025-11-21 14:11 ` Gurram, Pravalika 0 siblings, 0 replies; 21+ messages in thread From: Gurram, Pravalika @ 2025-11-21 14:11 UTC (permalink / raw) To: Sharma, Nishit, igt-dev@lists.freedesktop.org, Sharma, Nishit > -----Original Message----- > From: igt-dev <igt-dev-bounces@lists.freedesktop.org> On Behalf Of > nishit.sharma@intel.com > Sent: Thursday, November 13, 2025 10:35 PM > To: igt-dev@lists.freedesktop.org; Sharma, Nishit > <nishit.sharma@intel.com> > Subject: [PATCH i-g-t v7 04/10] tests/intel/xe_multi_gpusvm: Add SVM > multi-GPU atomic operations > > From: Nishit Sharma <nishit.sharma@intel.com> > > This test performs atomic increment operation on a shared SVM buffer from > both GPUs and the CPU in a multi-GPU environment. It uses madvise and > prefetch to control buffer placement and verifies correctness and ordering of > atomic updates across agents. > > Signed-off-by: Nishit Sharma <nishit.sharma@intel.com> > --- > tests/intel/xe_multi_gpusvm.c | 157 > +++++++++++++++++++++++++++++++++- > 1 file changed, 156 insertions(+), 1 deletion(-) > > diff --git a/tests/intel/xe_multi_gpusvm.c b/tests/intel/xe_multi_gpusvm.c > index 6614ea3d1..54e036724 100644 > --- a/tests/intel/xe_multi_gpusvm.c > +++ b/tests/intel/xe_multi_gpusvm.c > @@ -31,6 +31,11 @@ > * region both remotely and locally and copies to it. Reads back to > * system memory and checks the result. > * > + * SUBTEST: atomic-inc-gpu-op > + * Description: > + * This test does atomic operation in multi-gpu by executing atomic > + * operation on GPU1 and then atomic operation on GPU2 using same > + * adress > */ > > #define MAX_XE_REGIONS 8 > @@ -40,6 +45,7 @@ > #define BIND_SYNC_VAL 0x686868 > #define EXEC_SYNC_VAL 0x676767 > #define COPY_SIZE SZ_64M > +#define ATOMIC_OP_VAL 56 > > struct xe_svm_gpu_info { > bool supports_faults; > @@ -49,6 +55,16 @@ struct xe_svm_gpu_info { > int fd; > }; > > +struct test_exec_data { > + uint32_t batch[32]; > + uint64_t pad; > + uint64_t vm_sync; > + uint64_t exec_sync; > + uint32_t data; > + uint32_t expected_data; > + uint64_t batch_addr; > +}; > + > struct multigpu_ops_args { > bool prefetch_req; > bool op_mod; > @@ -72,7 +88,10 @@ static void gpu_mem_access_wrapper(struct > xe_svm_gpu_info *src, > struct drm_xe_engine_class_instance *eci, > void *extra_args); > > -static void open_pagemaps(int fd, struct xe_svm_gpu_info *info); > +static void gpu_atomic_inc_wrapper(struct xe_svm_gpu_info *src, > + struct xe_svm_gpu_info *dst, > + struct drm_xe_engine_class_instance *eci, > + void *extra_args); > > static void > create_vm_and_queue(struct xe_svm_gpu_info *gpu, struct > drm_xe_engine_class_instance *eci, @@ -166,6 +185,35 @@ static void > for_each_gpu_pair(int num_gpus, struct xe_svm_gpu_info *gpus, > } > } > > +static void open_pagemaps(int fd, struct xe_svm_gpu_info *info); > + > +static void > +atomic_batch_init(int fd, uint32_t vm, uint64_t src_addr, > + uint32_t *bo, uint64_t *addr) > +{ > + uint32_t batch_bo_size = BATCH_SIZE(fd); > + uint32_t batch_bo; > + uint64_t batch_addr; > + void *batch; > + uint32_t *cmd; > + int i = 0; > + > + batch_bo = xe_bo_create(fd, vm, batch_bo_size, > vram_if_possible(fd, 0), 0); > + batch = xe_bo_map(fd, batch_bo, batch_bo_size); > + cmd = (uint32_t *)batch; > + > + cmd[i++] = MI_ATOMIC | MI_ATOMIC_INC; > + cmd[i++] = src_addr; > + 
cmd[i++] = src_addr >> 32; > + cmd[i++] = MI_BATCH_BUFFER_END; > + > + batch_addr = to_user_pointer(batch); > + /* Punch a gap in the SVM map where we map the batch_bo */ > + xe_vm_bind_lr_sync(fd, vm, batch_bo, 0, batch_addr, > batch_bo_size, 0); > + *bo = batch_bo; > + *addr = batch_addr; > +} > + > static void batch_init(int fd, uint32_t vm, uint64_t src_addr, > uint64_t dst_addr, uint64_t copy_size, > uint32_t *bo, uint64_t *addr) > @@ -325,6 +373,105 @@ gpu_mem_access_wrapper(struct > xe_svm_gpu_info *src, > copy_src_dst(src, dst, eci, args->prefetch_req); } > > +static void > +atomic_inc_op(struct xe_svm_gpu_info *gpu0, > + struct xe_svm_gpu_info *gpu1, > + struct drm_xe_engine_class_instance *eci, > + bool prefetch_req) > +{ > + uint64_t addr; > + uint32_t vm[2]; > + uint32_t exec_queue[2]; > + uint32_t batch_bo; > + struct test_exec_data *data; > + uint64_t batch_addr; > + struct drm_xe_sync sync = {}; > + volatile uint64_t *sync_addr; > + volatile uint32_t *shared_val; > + > + create_vm_and_queue(gpu0, eci, &vm[0], &exec_queue[0]); > + create_vm_and_queue(gpu1, eci, &vm[1], &exec_queue[1]); > + > + data = aligned_alloc(SZ_2M, SZ_4K); > + igt_assert(data); > + data[0].vm_sync = 0; > + addr = to_user_pointer(data); > + > + shared_val = (volatile uint32_t *)addr; > + *shared_val = ATOMIC_OP_VAL - 1; > + > + atomic_batch_init(gpu0->fd, vm[0], addr, &batch_bo, &batch_addr); > + > + /* Place destination in an optionally remote location to test */ This is not remote location. It local memory to gpu0 > + xe_multigpu_madvise(gpu0->fd, vm[0], addr, SZ_4K, 0, > + DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC, > + gpu0->fd, 0, gpu0->vram_regions[0], > exec_queue[0], > + 0, 0); > + > + setup_sync(&sync, &sync_addr, BIND_SYNC_VAL); > + xe_multigpu_prefetch(gpu0->fd, vm[0], addr, SZ_4K, &sync, > + sync_addr, exec_queue[0], prefetch_req); > + > + sync_addr = (void *)((char *)batch_addr + SZ_4K); > + sync.addr = to_user_pointer((uint64_t *)sync_addr); > + sync.timeline_value = EXEC_SYNC_VAL; > + *sync_addr = 0; > + > + /* Executing ATOMIC_INC on GPU0. */ > + xe_exec_sync(gpu0->fd, exec_queue[0], batch_addr, &sync, 1); > + if (*sync_addr != EXEC_SYNC_VAL) > + xe_wait_ufence(gpu0->fd, (uint64_t *)sync_addr, > EXEC_SYNC_VAL, exec_queue[0], > + NSEC_PER_SEC * 10); > + > + igt_assert_eq(*shared_val, ATOMIC_OP_VAL); This test only doing GPU0 writes -> CPU reads GPU1 writes -> CPU reads GPUs NEVER communicate with each other Only tests GPU -> CPU coherency Missing GPU <-> GPU coherency am thinking if we do below operation it will be good. 
GPU0 writes -> GPU1 READS what GPU0 wrote -> (GPU-to-GPU) GPU1 writes -> GPU0 READS what GPU1 wrote -> (GPU-to-GPU) This test GPU-to-GPU shared memory -Pravalika > + > + atomic_batch_init(gpu1->fd, vm[1], addr, &batch_bo, &batch_addr); > + > + /* Place destination in an optionally remote location to test */ > + xe_multigpu_madvise(gpu1->fd, vm[1], addr, SZ_4K, 0, > + DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC, > + gpu1->fd, 0, gpu1->vram_regions[0], > exec_queue[0], > + 0, 0); > + > + setup_sync(&sync, &sync_addr, BIND_SYNC_VAL); > + xe_multigpu_prefetch(gpu1->fd, vm[1], addr, SZ_4K, &sync, > + sync_addr, exec_queue[1], prefetch_req); > + > + sync_addr = (void *)((char *)batch_addr + SZ_4K); > + sync.addr = to_user_pointer((uint64_t *)sync_addr); > + sync.timeline_value = EXEC_SYNC_VAL; > + *sync_addr = 0; > + > + /* Execute ATOMIC_INC on GPU1 */ > + xe_exec_sync(gpu1->fd, exec_queue[1], batch_addr, &sync, 1); > + if (*sync_addr != EXEC_SYNC_VAL) > + xe_wait_ufence(gpu1->fd, (uint64_t *)sync_addr, > EXEC_SYNC_VAL, exec_queue[1], > + NSEC_PER_SEC * 10); > + > + igt_assert_eq(*shared_val, ATOMIC_OP_VAL + 1); > + > + munmap((void *)batch_addr, BATCH_SIZE(gpu0->fd)); > + batch_fini(gpu0->fd, vm[0], batch_bo, batch_addr); > + batch_fini(gpu1->fd, vm[1], batch_bo, batch_addr); > + free(data); > + > + cleanup_vm_and_queue(gpu0, vm[0], exec_queue[0]); > + cleanup_vm_and_queue(gpu1, vm[1], exec_queue[1]); } > + > +static void > +gpu_atomic_inc_wrapper(struct xe_svm_gpu_info *src, > + struct xe_svm_gpu_info *dst, > + struct drm_xe_engine_class_instance *eci, > + void *extra_args) > +{ > + struct multigpu_ops_args *args = (struct multigpu_ops_args > *)extra_args; > + igt_assert(src); > + igt_assert(dst); > + > + atomic_inc_op(src, dst, eci, args->prefetch_req); } > + > igt_main > { > struct xe_svm_gpu_info gpus[MAX_XE_GPUS]; @@ -364,6 +511,14 > @@ igt_main > for_each_gpu_pair(gpu_cnt, gpus, &eci, > gpu_mem_access_wrapper, &op_args); > } > > + igt_subtest("atomic-inc-gpu-op") { > + struct multigpu_ops_args atomic_args; > + atomic_args.prefetch_req = 1; > + for_each_gpu_pair(gpu_cnt, gpus, &eci, > gpu_atomic_inc_wrapper, &atomic_args); > + atomic_args.prefetch_req = 0; > + for_each_gpu_pair(gpu_cnt, gpus, &eci, > gpu_atomic_inc_wrapper, &atomic_args); > + } > + > igt_fixture { > int cnt; > > -- > 2.48.1 ^ permalink raw reply [flat|nested] 21+ messages in thread
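A sketch of the GPU-to-GPU check suggested above, reusing helpers from this series (store_dword_batch_init from patch 5, batch_init/batch_fini from patch 3, and an exec-and-wait step as sketched after patch 3). The function name, parameter list, and the fence placement at batch + SZ_4K are illustrative assumptions, not submitted code:

/* GPU0 stores a value to the shared SVM address, GPU1 copies it out, and
 * the CPU verifies that GPU1 observed GPU0's write; swapping the roles
 * covers the reverse direction. */
static void cross_check(struct xe_svm_gpu_info *g0, struct xe_svm_gpu_info *g1,
			uint32_t vm0, uint32_t vm1, uint32_t q0, uint32_t q1,
			uint64_t shared, void *copy_dst, int value)
{
	uint32_t bo0, bo1;
	uint64_t batch0, batch1;

	/* GPU0: MI_STORE_DWORD_IMM of 'value' to the shared address. */
	store_dword_batch_init(g0->fd, vm0, shared, &bo0, &batch0, value);
	exec_and_wait(g0->fd, q0, batch0,
		      (volatile uint64_t *)(uintptr_t)(batch0 + SZ_4K));

	/* GPU1: copy the shared range into copy_dst. */
	batch_init(g1->fd, vm1, shared, to_user_pointer(copy_dst),
		   SZ_4K, &bo1, &batch1);
	exec_and_wait(g1->fd, q1, batch1,
		      (volatile uint64_t *)(uintptr_t)(batch1 + SZ_4K));

	/* CPU: GPU1's copy must contain what GPU0 wrote. */
	igt_assert_eq(*(uint32_t *)copy_dst, value);

	batch_fini(g1->fd, vm1, bo1, batch1);
	batch_fini(g0->fd, vm0, bo0, batch0);
}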
* [PATCH i-g-t v7 05/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU coherency test 2025-11-13 17:04 [PATCH i-g-t v7 00/10] Add SVM madvise feature for multi-GPU configurations nishit.sharma ` (3 preceding siblings ...) 2025-11-13 17:04 ` [PATCH i-g-t v7 04/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU atomic operations nishit.sharma @ 2025-11-13 17:04 ` nishit.sharma 2025-11-13 17:04 ` [PATCH i-g-t v7 06/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU performance test nishit.sharma ` (3 subsequent siblings) 8 siblings, 0 replies; 21+ messages in thread From: nishit.sharma @ 2025-11-13 17:04 UTC (permalink / raw) To: igt-dev, nishit.sharma From: Nishit Sharma <nishit.sharma@intel.com> This test verifies memory coherency in a multi-GPU environment using SVM. GPU 1 writes to a shared buffer, GPU 2 reads and checks for correct data without explicit synchronization, and the test is repeated with CPU and both GPUs to ensure consistent memory visibility across agents. Signed-off-by: Nishit Sharma <nishit.sharma@intel.com> --- tests/intel/xe_multi_gpusvm.c | 203 +++++++++++++++++++++++++++++++++- 1 file changed, 201 insertions(+), 2 deletions(-) diff --git a/tests/intel/xe_multi_gpusvm.c b/tests/intel/xe_multi_gpusvm.c index 54e036724..6792ef72c 100644 --- a/tests/intel/xe_multi_gpusvm.c +++ b/tests/intel/xe_multi_gpusvm.c @@ -34,8 +34,13 @@ * SUBTEST: atomic-inc-gpu-op * Description: * This test does atomic operation in multi-gpu by executing atomic - * operation on GPU1 and then atomic operation on GPU2 using same - * adress + * operation on GPU1 and then atomic operation on GPU2 using same + * adress + * + * SUBTEST: coherency-multi-gpu + * Description: + * This test checks coherency in multi-gpu by writing from GPU0 + * reading from GPU1 and verify and repeating with CPU and both GPUs */ #define MAX_XE_REGIONS 8 @@ -93,6 +98,11 @@ static void gpu_atomic_inc_wrapper(struct xe_svm_gpu_info *src, struct drm_xe_engine_class_instance *eci, void *extra_args); +static void gpu_coherecy_test_wrapper(struct xe_svm_gpu_info *src, + struct xe_svm_gpu_info *dst, + struct drm_xe_engine_class_instance *eci, + void *extra_args); + static void create_vm_and_queue(struct xe_svm_gpu_info *gpu, struct drm_xe_engine_class_instance *eci, uint32_t *vm, uint32_t *exec_queue) @@ -214,6 +224,35 @@ atomic_batch_init(int fd, uint32_t vm, uint64_t src_addr, *addr = batch_addr; } +static void +store_dword_batch_init(int fd, uint32_t vm, uint64_t src_addr, + uint32_t *bo, uint64_t *addr, int value) +{ + uint32_t batch_bo_size = BATCH_SIZE(fd); + uint32_t batch_bo; + uint64_t batch_addr; + void *batch; + uint32_t *cmd; + int i = 0; + + batch_bo = xe_bo_create(fd, vm, batch_bo_size, vram_if_possible(fd, 0), 0); + batch = xe_bo_map(fd, batch_bo, batch_bo_size); + cmd = (uint32_t *) batch; + + cmd[i++] = MI_STORE_DWORD_IMM_GEN4; + cmd[i++] = src_addr; + cmd[i++] = src_addr >> 32; + cmd[i++] = value; + cmd[i++] = MI_BATCH_BUFFER_END; + + batch_addr = to_user_pointer(batch); + + /* Punch a gap in the SVM map where we map the batch_bo */ + xe_vm_bind_lr_sync(fd, vm, batch_bo, 0, batch_addr, batch_bo_size, 0); + *bo = batch_bo; + *addr = batch_addr; +} + static void batch_init(int fd, uint32_t vm, uint64_t src_addr, uint64_t dst_addr, uint64_t copy_size, uint32_t *bo, uint64_t *addr) @@ -373,6 +412,143 @@ gpu_mem_access_wrapper(struct xe_svm_gpu_info *src, copy_src_dst(src, dst, eci, args->prefetch_req); } +static void +coherency_test_multigpu(struct xe_svm_gpu_info *gpu0, + struct xe_svm_gpu_info *gpu1, + struct 
drm_xe_engine_class_instance *eci, + bool coh_fail_set, + bool prefetch_req) +{ + uint64_t addr; + uint32_t vm[2]; + uint32_t exec_queue[2]; + uint32_t batch_bo, batch1_bo[2]; + uint64_t batch_addr, batch1_addr[2]; + struct drm_xe_sync sync = {}; + volatile uint64_t *sync_addr; + int value = 60; + uint64_t *data1; + void *copy_dst; + + create_vm_and_queue(gpu0, eci, &vm[0], &exec_queue[0]); + create_vm_and_queue(gpu1, eci, &vm[1], &exec_queue[1]); + + data1 = aligned_alloc(SZ_2M, SZ_4K); + igt_assert(data1); + addr = to_user_pointer(data1); + + copy_dst = aligned_alloc(SZ_2M, SZ_4K); + igt_assert(copy_dst); + + store_dword_batch_init(gpu0->fd, vm[0], addr, &batch_bo, &batch_addr, value); + + /* Place destination in GPU0 local memory location to test */ + xe_multigpu_madvise(gpu0->fd, vm[0], addr, SZ_4K, 0, + DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC, + gpu0->fd, 0, gpu0->vram_regions[0], exec_queue[0], + 0, 0); + + setup_sync(&sync, &sync_addr, BIND_SYNC_VAL); + xe_multigpu_prefetch(gpu0->fd, vm[0], addr, SZ_4K, &sync, + sync_addr, exec_queue[0], prefetch_req); + + sync_addr = (void *)((char *)batch_addr + SZ_4K); + sync.addr = to_user_pointer((uint64_t *)sync_addr); + sync.timeline_value = EXEC_SYNC_VAL; + *sync_addr = 0; + + /* Execute STORE command on GPU0 */ + xe_exec_sync(gpu0->fd, exec_queue[0], batch_addr, &sync, 1); + if (*sync_addr != EXEC_SYNC_VAL) + xe_wait_ufence(gpu0->fd, (uint64_t *)sync_addr, EXEC_SYNC_VAL, exec_queue[0], + NSEC_PER_SEC * 10); + + igt_assert_eq(*(uint64_t *)addr, value); + + /* Creating batch for GPU1 using addr as Src which have value from GPU0 */ + batch_init(gpu1->fd, vm[1], addr, to_user_pointer(copy_dst), + SZ_4K, &batch_bo, &batch_addr); + + /* Place destination in GPU1 local memory location to test */ + xe_multigpu_madvise(gpu1->fd, vm[1], addr, SZ_4K, 0, + DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC, + gpu1->fd, 0, gpu1->vram_regions[0], exec_queue[1], + 0, 0); + + setup_sync(&sync, &sync_addr, BIND_SYNC_VAL); + xe_multigpu_prefetch(gpu1->fd, vm[1], addr, SZ_4K, &sync, + sync_addr, exec_queue[1], prefetch_req); + + sync_addr = (void *)((char *)batch_addr + SZ_4K); + sync.addr = to_user_pointer((uint64_t *)sync_addr); + sync.timeline_value = EXEC_SYNC_VAL; + *sync_addr = 0; + + /* Execute COPY command on GPU1 */ + xe_exec_sync(gpu1->fd, exec_queue[1], batch_addr, &sync, 1); + if (*sync_addr != EXEC_SYNC_VAL) + xe_wait_ufence(gpu1->fd, (uint64_t *)sync_addr, EXEC_SYNC_VAL, exec_queue[1], + NSEC_PER_SEC * 10); + + igt_assert_eq(*(uint64_t *)copy_dst, value); + + /* CPU writes 10, memset set bytes no integer hence memset fills 4 bytes with 0x0A */ + memset((void *)(uintptr_t)addr, 10, sizeof(int)); + igt_assert_eq(*(uint64_t *)addr, 0x0A0A0A0A); + + if (coh_fail_set) { + igt_info("coherency fail impl\n"); + + /* Coherency fail scenario */ + store_dword_batch_init(gpu0->fd, vm[0], addr, &batch1_bo[0], &batch1_addr[0], value + 10); + store_dword_batch_init(gpu1->fd, vm[1], addr, &batch1_bo[1], &batch1_addr[1], value + 20); + + sync_addr = (void *)((char *)batch1_addr[0] + SZ_4K); + sync.addr = to_user_pointer((uint64_t *)sync_addr); + sync.timeline_value = EXEC_SYNC_VAL; + *sync_addr = 0; + + /* Execute STORE command on GPU1 */ + xe_exec_sync(gpu0->fd, exec_queue[0], batch1_addr[0], &sync, 1); + if (*sync_addr != EXEC_SYNC_VAL) + xe_wait_ufence(gpu0->fd, (uint64_t *)sync_addr, EXEC_SYNC_VAL, exec_queue[0], + NSEC_PER_SEC * 10); + + sync_addr = (void *)((char *)batch1_addr[1] + SZ_4K); + sync.addr = to_user_pointer((uint64_t *)sync_addr); + sync.timeline_value = 
EXEC_SYNC_VAL; + *sync_addr = 0; + + /* Execute STORE command on GPU2 */ + xe_exec_sync(gpu1->fd, exec_queue[1], batch1_addr[1], &sync, 1); + if (*sync_addr != EXEC_SYNC_VAL) + xe_wait_ufence(gpu1->fd, (uint64_t *)sync_addr, EXEC_SYNC_VAL, exec_queue[1], + NSEC_PER_SEC * 10); + + igt_warn_on_f(*(uint64_t *)addr != (value + 10), + "GPU2(dst_gpu] has overwritten value at addr\n"); + + munmap((void *)batch1_addr[0], BATCH_SIZE(gpu0->fd)); + munmap((void *)batch1_addr[1], BATCH_SIZE(gpu1->fd)); + + batch_fini(gpu0->fd, vm[0], batch1_bo[0], batch1_addr[0]); + batch_fini(gpu1->fd, vm[1], batch1_bo[1], batch1_addr[1]); + } + + /* CPU writes 11, memset set bytes no integer hence memset fills 4 bytes with 0x0B */ + memset((void *)(uintptr_t)addr, 11, sizeof(int)); + igt_assert_eq(*(uint64_t *)addr, 0x0B0B0B0B); + + munmap((void *)batch_addr, BATCH_SIZE(gpu0->fd)); + batch_fini(gpu0->fd, vm[0], batch_bo, batch_addr); + batch_fini(gpu1->fd, vm[1], batch_bo, batch_addr); + free(data1); + free(copy_dst); + + cleanup_vm_and_queue(gpu0, vm[0], exec_queue[0]); + cleanup_vm_and_queue(gpu1, vm[1], exec_queue[1]); +} + static void atomic_inc_op(struct xe_svm_gpu_info *gpu0, struct xe_svm_gpu_info *gpu1, @@ -472,6 +648,19 @@ gpu_atomic_inc_wrapper(struct xe_svm_gpu_info *src, atomic_inc_op(src, dst, eci, args->prefetch_req); } +static void +gpu_coherecy_test_wrapper(struct xe_svm_gpu_info *src, + struct xe_svm_gpu_info *dst, + struct drm_xe_engine_class_instance *eci, + void *extra_args) +{ + struct multigpu_ops_args *args = (struct multigpu_ops_args *)extra_args; + igt_assert(src); + igt_assert(dst); + + coherency_test_multigpu(src, dst, eci, args->op_mod, args->prefetch_req); +} + igt_main { struct xe_svm_gpu_info gpus[MAX_XE_GPUS]; @@ -519,6 +708,16 @@ igt_main for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_atomic_inc_wrapper, &atomic_args); } + igt_subtest("coherency-multi-gpu") { + struct multigpu_ops_args coh_args; + coh_args.prefetch_req = 1; + coh_args.op_mod = 0; + for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_coherecy_test_wrapper, &coh_args); + coh_args.prefetch_req = 0; + coh_args.op_mod = 1; + for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_coherecy_test_wrapper, &coh_args); + } + igt_fixture { int cnt; -- 2.48.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
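The magic constants in the CPU checks above come straight from how memset() fills bytes; a standalone check (not part of the test) makes this explicit:

#include <assert.h>
#include <stdint.h>
#include <string.h>

/* memset() writes the byte value into every byte, so filling a 32-bit
 * word with 10 (0x0A) yields 0x0A0A0A0A, and with 11 (0x0B) 0x0B0B0B0B. */
int main(void)
{
	uint32_t v = 0;

	memset(&v, 10, sizeof(v));
	assert(v == 0x0A0A0A0AU);

	memset(&v, 11, sizeof(v));
	assert(v == 0x0B0B0B0BU);

	return 0;
}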
* [PATCH i-g-t v7 06/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU performance test 2025-11-13 17:04 [PATCH i-g-t v7 00/10] Add SVM madvise feature for multi-GPU configurations nishit.sharma ` (4 preceding siblings ...) 2025-11-13 17:04 ` [PATCH i-g-t v7 05/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU coherency test nishit.sharma @ 2025-11-13 17:04 ` nishit.sharma 2025-11-13 17:04 ` [PATCH i-g-t v7 07/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU fault handling test nishit.sharma ` (2 subsequent siblings) 8 siblings, 0 replies; 21+ messages in thread From: nishit.sharma @ 2025-11-13 17:04 UTC (permalink / raw) To: igt-dev, nishit.sharma From: Nishit Sharma <nishit.sharma@intel.com> This test measures latency and bandwidth for buffer access from each GPU and the CPU in a multi-GPU SVM environment. It compares performance for local versus remote access using madvise and prefetch to control buffer placement Signed-off-by: Nishit Sharma <nishit.sharma@intel.com> --- tests/intel/xe_multi_gpusvm.c | 181 ++++++++++++++++++++++++++++++++++ 1 file changed, 181 insertions(+) diff --git a/tests/intel/xe_multi_gpusvm.c b/tests/intel/xe_multi_gpusvm.c index 6792ef72c..2c8e62e34 100644 --- a/tests/intel/xe_multi_gpusvm.c +++ b/tests/intel/xe_multi_gpusvm.c @@ -13,6 +13,8 @@ #include "intel_mocs.h" #include "intel_reg.h" +#include "time.h" + #include "xe/xe_ioctl.h" #include "xe/xe_query.h" #include "xe/xe_util.h" @@ -41,6 +43,11 @@ * Description: * This test checks coherency in multi-gpu by writing from GPU0 * reading from GPU1 and verify and repeating with CPU and both GPUs + * + * SUBTEST: latency-multi-gpu + * Description: + * This test measures and compares latency and bandwidth for buffer access + * from CPU, local GPU, and remote GPU */ #define MAX_XE_REGIONS 8 @@ -103,6 +110,11 @@ static void gpu_coherecy_test_wrapper(struct xe_svm_gpu_info *src, struct drm_xe_engine_class_instance *eci, void *extra_args); +static void gpu_latency_test_wrapper(struct xe_svm_gpu_info *src, + struct xe_svm_gpu_info *dst, + struct drm_xe_engine_class_instance *eci, + void *extra_args); + static void create_vm_and_queue(struct xe_svm_gpu_info *gpu, struct drm_xe_engine_class_instance *eci, uint32_t *vm, uint32_t *exec_queue) @@ -197,6 +209,11 @@ static void for_each_gpu_pair(int num_gpus, struct xe_svm_gpu_info *gpus, static void open_pagemaps(int fd, struct xe_svm_gpu_info *info); +static double time_diff(struct timespec *start, struct timespec *end) +{ + return (end->tv_sec - start->tv_sec) + (end->tv_nsec - start->tv_nsec) / 1e9; +} + static void atomic_batch_init(int fd, uint32_t vm, uint64_t src_addr, uint32_t *bo, uint64_t *addr) @@ -549,6 +566,147 @@ coherency_test_multigpu(struct xe_svm_gpu_info *gpu0, cleanup_vm_and_queue(gpu1, vm[1], exec_queue[1]); } +static void +latency_test_multigpu(struct xe_svm_gpu_info *gpu0, + struct xe_svm_gpu_info *gpu1, + struct drm_xe_engine_class_instance *eci, + bool remote_copy, + bool prefetch_req) +{ + uint64_t addr; + uint32_t vm[2]; + uint32_t exec_queue[2]; + uint32_t batch_bo; + uint8_t *copy_dst; + uint64_t batch_addr; + struct drm_xe_sync sync = {}; + volatile uint64_t *sync_addr; + int value = 60; + int shared_val[4]; + struct test_exec_data *data; + struct timespec t_start, t_end; + double cpu_latency, gpu1_latency, gpu2_latency; + double cpu_bw, gpu1_bw, gpu2_bw; + + create_vm_and_queue(gpu0, eci, &vm[0], &exec_queue[0]); + create_vm_and_queue(gpu1, eci, &vm[1], &exec_queue[1]); + + data = aligned_alloc(SZ_2M, SZ_4K); + igt_assert(data); + 
data[0].vm_sync = 0; + addr = to_user_pointer(data); + + copy_dst = aligned_alloc(SZ_2M, SZ_4K); + igt_assert(copy_dst); + + store_dword_batch_init(gpu0->fd, vm[0], addr, &batch_bo, &batch_addr, value); + + /* Measure GPU0 access latency/bandwidth */ + clock_gettime(CLOCK_MONOTONIC, &t_start); + + /* GPU0(src_gpu) access */ + xe_multigpu_madvise(gpu0->fd, vm[0], addr, SZ_4K, 0, + DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC, + gpu0->fd, 0, gpu0->vram_regions[0], exec_queue[0], + 0, 0); + + setup_sync(&sync, &sync_addr, BIND_SYNC_VAL); + xe_multigpu_prefetch(gpu0->fd, vm[0], addr, SZ_4K, &sync, + sync_addr, exec_queue[0], prefetch_req); + + clock_gettime(CLOCK_MONOTONIC, &t_end); + gpu1_latency = time_diff(&t_start, &t_end); + gpu1_bw = COPY_SIZE / gpu1_latency / (1024 * 1024); // MB/s + + sync_addr = (void *)((char *)batch_addr + SZ_4K); + sync.addr = to_user_pointer((uint64_t *)sync_addr); + sync.timeline_value = EXEC_SYNC_VAL; + *sync_addr = 0; + + /* Execute STORE command on GPU0 */ + xe_exec_sync(gpu0->fd, exec_queue[0], batch_addr, &sync, 1); + if (*sync_addr != EXEC_SYNC_VAL) + xe_wait_ufence(gpu0->fd, (uint64_t *)sync_addr, EXEC_SYNC_VAL, exec_queue[0], + NSEC_PER_SEC * 10); + + memcpy(shared_val, (void *)addr, 4); + igt_assert_eq(shared_val[0], value); + + /* CPU writes 10, memset set bytes no integer hence memset fills 4 bytes with 0x0A */ + memset((void *)(uintptr_t)addr, 10, sizeof(int)); + memcpy(shared_val, (void *)(uintptr_t)addr, sizeof(shared_val)); + igt_assert_eq(shared_val[0], 0x0A0A0A0A); + + *(uint64_t *)addr = 50; + + if(remote_copy) { + igt_info("creating batch for COPY_CMD on GPU1\n"); + batch_init(gpu1->fd, vm[1], addr, to_user_pointer(copy_dst), + SZ_4K, &batch_bo, &batch_addr); + } else { + igt_info("creating batch for STORE_CMD on GPU1\n"); + store_dword_batch_init(gpu1->fd, vm[1], addr, &batch_bo, &batch_addr, value + 10); + } + + /* Measure GPU1 access latency/bandwidth */ + clock_gettime(CLOCK_MONOTONIC, &t_start); + + /* GPU1(dst_gpu) access */ + xe_multigpu_madvise(gpu1->fd, vm[1], addr, SZ_4K, 0, + DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC, + gpu1->fd, 0, gpu1->vram_regions[0], exec_queue[1], + 0, 0); + + setup_sync(&sync, &sync_addr, BIND_SYNC_VAL); + xe_multigpu_prefetch(gpu1->fd, vm[1], addr, SZ_4K, &sync, + sync_addr, exec_queue[1], prefetch_req); + + clock_gettime(CLOCK_MONOTONIC, &t_end); + gpu2_latency = time_diff(&t_start, &t_end); + gpu2_bw = COPY_SIZE / gpu2_latency / (1024 * 1024); // MB/s + + sync_addr = (void *)((char *)batch_addr + SZ_4K); + sync.addr = to_user_pointer((uint64_t *)sync_addr); + sync.timeline_value = EXEC_SYNC_VAL; + *sync_addr = 0; + + /* Execute COPY/STORE command on GPU1 */ + xe_exec_sync(gpu1->fd, exec_queue[1], batch_addr, &sync, 1); + if (*sync_addr != EXEC_SYNC_VAL) + xe_wait_ufence(gpu1->fd, (uint64_t *)sync_addr, EXEC_SYNC_VAL, exec_queue[1], + NSEC_PER_SEC * 10); + + if (!remote_copy) + igt_assert_eq(*(uint64_t *)addr, value + 10); + else + igt_assert_eq(*(uint64_t *)copy_dst, 50); + + /* CPU writes 11, memset set bytes no integer hence memset fills 4 bytes with 0x0B */ + /* Measure CPU access latency/bandwidth */ + clock_gettime(CLOCK_MONOTONIC, &t_start); + memset((void *)(uintptr_t)addr, 11, sizeof(int)); + memcpy(shared_val, (void *)(uintptr_t)addr, sizeof(shared_val)); + clock_gettime(CLOCK_MONOTONIC, &t_end); + cpu_latency = time_diff(&t_start, &t_end); + cpu_bw = COPY_SIZE / cpu_latency / (1024 * 1024); // MB/s + + igt_assert_eq(shared_val[0], 0x0B0B0B0B); + + /* Print results */ + igt_info("CPU: Latency %.6f s, Bandwidth 
%.2f MB/s\n", cpu_latency, cpu_bw); + igt_info("GPU: Latency %.6f s, Bandwidth %.2f MB/s\n", gpu1_latency, gpu1_bw); + igt_info("GPU: Latency %.6f s, Bandwidth %.2f MB/s\n", gpu2_latency, gpu2_bw); + + munmap((void *)batch_addr, BATCH_SIZE(gpu0->fd)); + batch_fini(gpu0->fd, vm[0], batch_bo, batch_addr); + batch_fini(gpu1->fd, vm[1], batch_bo, batch_addr); + free(data); + free(copy_dst); + + cleanup_vm_and_queue(gpu0, vm[0], exec_queue[0]); + cleanup_vm_and_queue(gpu1, vm[1], exec_queue[1]); +} + static void atomic_inc_op(struct xe_svm_gpu_info *gpu0, struct xe_svm_gpu_info *gpu1, @@ -661,6 +819,19 @@ gpu_coherecy_test_wrapper(struct xe_svm_gpu_info *src, coherency_test_multigpu(src, dst, eci, args->op_mod, args->prefetch_req); } +static void +gpu_latency_test_wrapper(struct xe_svm_gpu_info *src, + struct xe_svm_gpu_info *dst, + struct drm_xe_engine_class_instance *eci, + void *extra_args) +{ + struct multigpu_ops_args *args = (struct multigpu_ops_args *)extra_args; + igt_assert(src); + igt_assert(dst); + + latency_test_multigpu(src, dst, eci, args->op_mod, args->prefetch_req); +} + igt_main { struct xe_svm_gpu_info gpus[MAX_XE_GPUS]; @@ -718,6 +889,16 @@ igt_main for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_coherecy_test_wrapper, &coh_args); } + igt_subtest("latency-multi-gpu") { + struct multigpu_ops_args latency_args; + latency_args.prefetch_req = 1; + latency_args.op_mod = 1; + for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_latency_test_wrapper, &latency_args); + latency_args.prefetch_req = 0; + latency_args.op_mod = 0; + for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_latency_test_wrapper, &latency_args); + } + igt_fixture { int cnt; -- 2.48.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
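The latency subtest above times each step with CLOCK_MONOTONIC and converts the elapsed time into MB/s via time_diff(). A minimal, standalone sketch of that measurement pattern is shown below; the 64 MiB buffer and the plain memcpy() are illustrative stand-ins for the access being measured, not part of the IGT test:

/*
 * Standalone sketch: time an access with CLOCK_MONOTONIC and report MB/s,
 * mirroring the time_diff()/bandwidth arithmetic used in the subtest.
 * Build with something like: cc -O2 -o bw_sketch bw_sketch.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static double time_diff(struct timespec *start, struct timespec *end)
{
	return (end->tv_sec - start->tv_sec) +
	       (end->tv_nsec - start->tv_nsec) / 1e9;
}

int main(void)
{
	size_t size = 64 << 20;			/* 64 MiB, stand-in for COPY_SIZE */
	char *src = malloc(size), *dst = malloc(size);
	struct timespec t_start, t_end;
	double seconds, mbps;

	if (!src || !dst)
		return 1;

	memset(src, 0xab, size);		/* make sure the pages are populated */

	clock_gettime(CLOCK_MONOTONIC, &t_start);
	memcpy(dst, src, size);			/* the access being measured */
	clock_gettime(CLOCK_MONOTONIC, &t_end);

	seconds = time_diff(&t_start, &t_end);
	mbps = size / seconds / (1024 * 1024);
	printf("CPU copy: latency %.6f s, bandwidth %.2f MB/s\n", seconds, mbps);

	free(src);
	free(dst);
	return 0;
}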
* [PATCH i-g-t v7 07/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU fault handling test 2025-11-13 17:04 [PATCH i-g-t v7 00/10] Add SVM madvise feature for multi-GPU configurations nishit.sharma ` (5 preceding siblings ...) 2025-11-13 17:04 ` [PATCH i-g-t v7 06/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU performance test nishit.sharma @ 2025-11-13 17:04 ` nishit.sharma 2025-11-13 17:04 ` [PATCH i-g-t v7 08/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU simultaneous access test nishit.sharma 2025-11-13 17:04 ` [PATCH i-g-t v7 09/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU conflicting madvise test nishit.sharma 8 siblings, 0 replies; 21+ messages in thread From: nishit.sharma @ 2025-11-13 17:04 UTC (permalink / raw) To: igt-dev, nishit.sharma From: Nishit Sharma <nishit.sharma@intel.com> This test intentionally triggers page faults by accessing regions without prefetch for both GPUs in a multi-GPU environment. Signed-off-by: Nishit Sharma <nishit.sharma@intel.com> --- tests/intel/xe_multi_gpusvm.c | 102 ++++++++++++++++++++++++++++++++++ 1 file changed, 102 insertions(+) diff --git a/tests/intel/xe_multi_gpusvm.c b/tests/intel/xe_multi_gpusvm.c index 2c8e62e34..6feb543ae 100644 --- a/tests/intel/xe_multi_gpusvm.c +++ b/tests/intel/xe_multi_gpusvm.c @@ -15,6 +15,7 @@ #include "time.h" +#include "xe/xe_gt.h" #include "xe/xe_ioctl.h" #include "xe/xe_query.h" #include "xe/xe_util.h" @@ -48,6 +49,11 @@ * Description: * This test measures and compares latency and bandwidth for buffer access * from CPU, local GPU, and remote GPU + * + * SUBTEST: pagefault-multi-gpu + * Description: + * This test intentionally triggers page faults by accessing unmapped SVM + * regions from both GPUs */ #define MAX_XE_REGIONS 8 @@ -115,6 +121,11 @@ static void gpu_latency_test_wrapper(struct xe_svm_gpu_info *src, struct drm_xe_engine_class_instance *eci, void *extra_args); +static void gpu_fault_test_wrapper(struct xe_svm_gpu_info *src, + struct xe_svm_gpu_info *dst, + struct drm_xe_engine_class_instance *eci, + void *extra_args); + static void create_vm_and_queue(struct xe_svm_gpu_info *gpu, struct drm_xe_engine_class_instance *eci, uint32_t *vm, uint32_t *exec_queue) @@ -707,6 +718,76 @@ latency_test_multigpu(struct xe_svm_gpu_info *gpu0, cleanup_vm_and_queue(gpu1, vm[1], exec_queue[1]); } +static void +pagefault_test_multigpu(struct xe_svm_gpu_info *gpu0, + struct xe_svm_gpu_info *gpu1, + struct drm_xe_engine_class_instance *eci, + bool prefetch_req) +{ + uint64_t addr; + uint32_t vm[2]; + uint32_t exec_queue[2]; + uint32_t batch_bo; + uint64_t batch_addr; + struct drm_xe_sync sync = {}; + volatile uint64_t *sync_addr; + int value = 60, pf_count_1, pf_count_2; + void *data; + const char *pf_count_stat = "svm_pagefault_count"; + + create_vm_and_queue(gpu0, eci, &vm[0], &exec_queue[0]); + create_vm_and_queue(gpu1, eci, &vm[1], &exec_queue[1]); + + data = aligned_alloc(SZ_2M, SZ_4K); + igt_assert(data); + addr = to_user_pointer(data); + + pf_count_1 = xe_gt_stats_get_count(gpu0->fd, eci->gt_id, pf_count_stat); + + /* checking pagefault count on GPU */ + store_dword_batch_init(gpu0->fd, vm[0], addr, &batch_bo, &batch_addr, value); + + xe_multigpu_madvise(gpu0->fd, vm[0], addr, SZ_4K, 0, + DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC, + gpu0->fd, 0, gpu0->vram_regions[0], exec_queue[0], + 0, 0); + + setup_sync(&sync, &sync_addr, BIND_SYNC_VAL); + xe_multigpu_prefetch(gpu0->fd, vm[0], addr, SZ_4K, &sync, + sync_addr, exec_queue[0], prefetch_req); + + sync_addr = (void *)((char *)batch_addr + SZ_4K); + 
sync.addr = to_user_pointer((uint64_t *)sync_addr); + sync.timeline_value = EXEC_SYNC_VAL; + *sync_addr = 0; + + /* Execute STORE command on GPU */ + xe_exec_sync(gpu0->fd, exec_queue[0], batch_addr, &sync, 1); + if (*sync_addr != EXEC_SYNC_VAL) + xe_wait_ufence(gpu0->fd, (uint64_t *)sync_addr, EXEC_SYNC_VAL, exec_queue[0], + NSEC_PER_SEC * 10); + + pf_count_2 = xe_gt_stats_get_count(gpu0->fd, eci->gt_id, pf_count_stat); + + if (pf_count_2 != pf_count_1) { + igt_warn("GPU pf: pf_count_2(%d) != pf_count_1(%d) prefetch_req :%d\n", + pf_count_2, pf_count_1, prefetch_req); + } + + igt_assert_eq(*(uint64_t *)addr, value); + + /* CPU writes 11, memset set bytes no integer hence memset fills 4 bytes with 0x0B */ + memset((void *)(uintptr_t)addr, 11, sizeof(int)); + igt_assert_eq(*(uint64_t *)addr, 0x0B0B0B0B); + + munmap((void *)batch_addr, BATCH_SIZE(gpu0->fd)); + batch_fini(gpu0->fd, vm[0], batch_bo, batch_addr); + free(data); + + cleanup_vm_and_queue(gpu0, vm[0], exec_queue[0]); + cleanup_vm_and_queue(gpu1, vm[1], exec_queue[1]); +} + static void atomic_inc_op(struct xe_svm_gpu_info *gpu0, struct xe_svm_gpu_info *gpu1, @@ -832,6 +913,19 @@ gpu_latency_test_wrapper(struct xe_svm_gpu_info *src, latency_test_multigpu(src, dst, eci, args->op_mod, args->prefetch_req); } +static void +gpu_fault_test_wrapper(struct xe_svm_gpu_info *src, + struct xe_svm_gpu_info *dst, + struct drm_xe_engine_class_instance *eci, + void *extra_args) +{ + struct multigpu_ops_args *args = (struct multigpu_ops_args *)extra_args; + igt_assert(src); + igt_assert(dst); + + pagefault_test_multigpu(src, dst, eci, args->prefetch_req); +} + igt_main { struct xe_svm_gpu_info gpus[MAX_XE_GPUS]; @@ -899,6 +993,14 @@ igt_main for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_latency_test_wrapper, &latency_args); } + igt_subtest("pagefault-multi-gpu") { + struct multigpu_ops_args fault_args; + fault_args.prefetch_req = 1; + for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_fault_test_wrapper, &fault_args); + fault_args.prefetch_req = 0; + for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_fault_test_wrapper, &fault_args); + } + igt_fixture { int cnt; -- 2.48.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
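The memset() comments above are the reason the assertions check for 0x0B0B0B0B rather than 11: memset() replicates a byte value, so writing 11 (0x0B) across the bytes of a 32-bit word yields 0x0B0B0B0B. A small standalone check of that behaviour, using a uint32_t purely for illustration:

#include <assert.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
	uint32_t val = 0;

	/* memset() stores the byte 0x0B into each of the four bytes, so the
	 * word reads back as 0x0B0B0B0B rather than the integer 11 */
	memset(&val, 11, sizeof(val));
	assert(val == 0x0B0B0B0B);

	return 0;
}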
* [PATCH i-g-t v7 08/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU simultaneous access test 2025-11-13 17:04 [PATCH i-g-t v7 00/10] Add SVM madvise feature for multi-GPU configurations nishit.sharma ` (6 preceding siblings ...) 2025-11-13 17:04 ` [PATCH i-g-t v7 07/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU fault handling test nishit.sharma @ 2025-11-13 17:04 ` nishit.sharma 2025-11-13 17:04 ` [PATCH i-g-t v7 09/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU conflicting madvise test nishit.sharma 8 siblings, 0 replies; 21+ messages in thread From: nishit.sharma @ 2025-11-13 17:04 UTC (permalink / raw) To: igt-dev, nishit.sharma From: Nishit Sharma <nishit.sharma@intel.com> This test launches compute or copy workloads on both GPUs that access the same SVM buffer, using synchronization primitives (fences/semaphores) to coordinate access. It verifies data integrity and checks for the absence of race conditions in a multi-GPU SVM environment. Signed-off-by: Nishit Sharma <nishit.sharma@intel.com> --- tests/intel/xe_multi_gpusvm.c | 133 ++++++++++++++++++++++++++++++++++ 1 file changed, 133 insertions(+) diff --git a/tests/intel/xe_multi_gpusvm.c b/tests/intel/xe_multi_gpusvm.c index 6feb543ae..dc2a8f9c8 100644 --- a/tests/intel/xe_multi_gpusvm.c +++ b/tests/intel/xe_multi_gpusvm.c @@ -54,6 +54,11 @@ * Description: * This test intentionally triggers page faults by accessing unmapped SVM * regions from both GPUs + * + * SUBTEST: concurrent-access-multi-gpu + * Description: + * This tests aunches simultaneous workloads on both GPUs accessing the + * same SVM buffer synchronizes with fences, and verifies data integrity */ #define MAX_XE_REGIONS 8 @@ -126,6 +131,11 @@ static void gpu_fault_test_wrapper(struct xe_svm_gpu_info *src, struct drm_xe_engine_class_instance *eci, void *extra_args); +static void gpu_simult_test_wrapper(struct xe_svm_gpu_info *src, + struct xe_svm_gpu_info *dst, + struct drm_xe_engine_class_instance *eci, + void *extra_args); + static void create_vm_and_queue(struct xe_svm_gpu_info *gpu, struct drm_xe_engine_class_instance *eci, uint32_t *vm, uint32_t *exec_queue) @@ -900,6 +910,108 @@ gpu_coherecy_test_wrapper(struct xe_svm_gpu_info *src, coherency_test_multigpu(src, dst, eci, args->op_mod, args->prefetch_req); } +static void +multigpu_access_test(struct xe_svm_gpu_info *gpu0, + struct xe_svm_gpu_info *gpu1, + struct drm_xe_engine_class_instance *eci, + bool no_prefetch) +{ + uint64_t addr; + uint32_t vm[2]; + uint32_t exec_queue[2]; + uint32_t batch_bo[2]; + struct test_exec_data *data; + uint64_t batch_addr[2]; + struct drm_xe_sync sync[2] = {}; + volatile uint64_t *sync_addr[2]; + volatile uint32_t *shared_val; + + create_vm_and_queue(gpu0, eci, &vm[0], &exec_queue[0]); + create_vm_and_queue(gpu1, eci, &vm[1], &exec_queue[1]); + + data = aligned_alloc(SZ_2M, SZ_4K); + igt_assert(data); + data[0].vm_sync = 0; + addr = to_user_pointer(data); + + shared_val = (volatile uint32_t *)addr; + *shared_val = ATOMIC_OP_VAL - 1; + + atomic_batch_init(gpu0->fd, vm[0], addr, &batch_bo[0], &batch_addr[0]); + *shared_val = ATOMIC_OP_VAL - 2; + atomic_batch_init(gpu1->fd, vm[1], addr, &batch_bo[1], &batch_addr[1]); + + /* Place destination in an optionally remote location to test */ + xe_multigpu_madvise(gpu0->fd, vm[0], addr, SZ_4K, 0, + DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC, + gpu0->fd, 0, gpu0->vram_regions[0], exec_queue[0], + 0, 0); + xe_multigpu_madvise(gpu1->fd, vm[1], addr, SZ_4K, 0, + DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC, + gpu1->fd, 0, gpu1->vram_regions[0], 
exec_queue[1], + 0, 0); + + setup_sync(&sync[0], &sync_addr[0], BIND_SYNC_VAL); + setup_sync(&sync[1], &sync_addr[1], BIND_SYNC_VAL); + + /* For simultaneous access need to call xe_wait_ufence for both gpus after prefetch */ + if(!no_prefetch) { + xe_vm_prefetch_async(gpu0->fd, vm[0], 0, 0, addr, + SZ_4K, &sync[0], 1, + DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC); + + xe_vm_prefetch_async(gpu1->fd, vm[1], 0, 0, addr, + SZ_4K, &sync[1], 1, + DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC); + + if (*sync_addr[0] != BIND_SYNC_VAL) + xe_wait_ufence(gpu0->fd, (uint64_t *)sync_addr[0], BIND_SYNC_VAL, exec_queue[0], + NSEC_PER_SEC * 10); + free((void *)sync_addr[0]); + if (*sync_addr[1] != BIND_SYNC_VAL) + xe_wait_ufence(gpu1->fd, (uint64_t *)sync_addr[1], BIND_SYNC_VAL, exec_queue[1], + NSEC_PER_SEC * 10); + free((void *)sync_addr[1]); + } + + if (no_prefetch) { + free((void *)sync_addr[0]); + free((void *)sync_addr[1]); + } + + for (int i = 0; i < 100; i++) { + sync_addr[0] = (void *)((char *)batch_addr[0] + SZ_4K); + sync[0].addr = to_user_pointer((uint64_t *)sync_addr[0]); + sync[0].timeline_value = EXEC_SYNC_VAL; + + sync_addr[1] = (void *)((char *)batch_addr[1] + SZ_4K); + sync[1].addr = to_user_pointer((uint64_t *)sync_addr[1]); + sync[1].timeline_value = EXEC_SYNC_VAL; + *sync_addr[0] = 0; + *sync_addr[1] = 0; + + xe_exec_sync(gpu0->fd, exec_queue[0], batch_addr[0], &sync[0], 1); + if (*sync_addr[0] != EXEC_SYNC_VAL) + xe_wait_ufence(gpu0->fd, (uint64_t *)sync_addr[0], EXEC_SYNC_VAL, exec_queue[0], + NSEC_PER_SEC * 10); + xe_exec_sync(gpu1->fd, exec_queue[1], batch_addr[1], &sync[1], 1); + if (*sync_addr[1] != EXEC_SYNC_VAL) + xe_wait_ufence(gpu1->fd, (uint64_t *)sync_addr[1], EXEC_SYNC_VAL, exec_queue[1], + NSEC_PER_SEC * 10); + } + + igt_assert_eq(*(uint64_t *)addr, 254); + + munmap((void *)batch_addr[0], BATCH_SIZE(gpu0->fd)); + munmap((void *)batch_addr[1], BATCH_SIZE(gpu0->fd)); + batch_fini(gpu0->fd, vm[0], batch_bo[0], batch_addr[0]); + batch_fini(gpu1->fd, vm[1], batch_bo[1], batch_addr[1]); + free(data); + + cleanup_vm_and_queue(gpu0, vm[0], exec_queue[0]); + cleanup_vm_and_queue(gpu1, vm[1], exec_queue[1]); +} + static void gpu_latency_test_wrapper(struct xe_svm_gpu_info *src, struct xe_svm_gpu_info *dst, @@ -926,6 +1038,19 @@ gpu_fault_test_wrapper(struct xe_svm_gpu_info *src, pagefault_test_multigpu(src, dst, eci, args->prefetch_req); } +static void +gpu_simult_test_wrapper(struct xe_svm_gpu_info *src, + struct xe_svm_gpu_info *dst, + struct drm_xe_engine_class_instance *eci, + void *extra_args) +{ + struct multigpu_ops_args *args = (struct multigpu_ops_args *)extra_args; + igt_assert(src); + igt_assert(dst); + + multigpu_access_test(src, dst, eci, args->prefetch_req); +} + igt_main { struct xe_svm_gpu_info gpus[MAX_XE_GPUS]; @@ -1001,6 +1126,14 @@ igt_main for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_fault_test_wrapper, &fault_args); } + igt_subtest("concurrent-access-multi-gpu") { + struct multigpu_ops_args simul_args; + simul_args.prefetch_req = 1; + for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_simult_test_wrapper, &simul_args); + simul_args.prefetch_req = 0; + for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_simult_test_wrapper, &simul_args); + } + igt_fixture { int cnt; -- 2.48.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
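The final igt_assert_eq(..., 254) in the concurrent-access subtest follows from the setup: the shared value is left at ATOMIC_OP_VAL - 2 (54) before execution and, assuming each batch built by atomic_batch_init() performs one atomic increment per execution, 100 iterations on each of the two GPUs add 200. A CPU-thread analogue of that arithmetic, as a sketch only, with pthreads standing in for the two exec queues:

#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define ATOMIC_OP_VAL 56

static _Atomic uint64_t shared_val = ATOMIC_OP_VAL - 2;	/* 54, as in the test */

static void *inc_loop(void *arg)
{
	(void)arg;
	for (int i = 0; i < 100; i++)		/* 100 execs per "GPU" */
		atomic_fetch_add(&shared_val, 1);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, inc_loop, NULL);
	pthread_create(&b, NULL, inc_loop, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	/* 54 + 2 * 100 = 254, matching the expected value in the subtest */
	assert(atomic_load(&shared_val) == 254);
	return 0;
}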
* [PATCH i-g-t v7 09/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU conflicting madvise test 2025-11-13 17:04 [PATCH i-g-t v7 00/10] Add SVM madvise feature for multi-GPU configurations nishit.sharma ` (7 preceding siblings ...) 2025-11-13 17:04 ` [PATCH i-g-t v7 08/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU simultaneous access test nishit.sharma @ 2025-11-13 17:04 ` nishit.sharma 8 siblings, 0 replies; 21+ messages in thread From: nishit.sharma @ 2025-11-13 17:04 UTC (permalink / raw) To: igt-dev, nishit.sharma From: Nishit Sharma <nishit.sharma@intel.com> This test calls madvise operations on GPU0 with the preferred location set to GPU1 and vice versa. It reports conflicts when conflicting memory advice is given for shared SVM buffers in a multi-GPU environment. Signed-off-by: Nishit Sharma <nishit.sharma@intel.com> --- tests/intel/xe_multi_gpusvm.c | 143 ++++++++++++++++++++++++++++++++++ 1 file changed, 143 insertions(+) diff --git a/tests/intel/xe_multi_gpusvm.c b/tests/intel/xe_multi_gpusvm.c index dc2a8f9c8..afbf010e6 100644 --- a/tests/intel/xe_multi_gpusvm.c +++ b/tests/intel/xe_multi_gpusvm.c @@ -59,6 +59,11 @@ * Description: * This tests aunches simultaneous workloads on both GPUs accessing the * same SVM buffer synchronizes with fences, and verifies data integrity + * + * SUBTEST: conflicting-madvise-gpu + * Description: + * This test checks conflicting madvise by allocating shared buffer + * prefetches from both and checks for migration conflicts */ #define MAX_XE_REGIONS 8 @@ -69,6 +74,8 @@ #define EXEC_SYNC_VAL 0x676767 #define COPY_SIZE SZ_64M #define ATOMIC_OP_VAL 56 +#define USER_FENCE_VALUE 0xdeadbeefdeadbeefull +#define FIVE_SEC (5LL * NSEC_PER_SEC) struct xe_svm_gpu_info { bool supports_faults; @@ -136,6 +143,11 @@ static void gpu_simult_test_wrapper(struct xe_svm_gpu_info *src, struct drm_xe_engine_class_instance *eci, void *extra_args); +static void gpu_conflict_test_wrapper(struct xe_svm_gpu_info *src, + struct xe_svm_gpu_info *dst, + struct drm_xe_engine_class_instance *eci, + void *extra_args); + static void create_vm_and_queue(struct xe_svm_gpu_info *gpu, struct drm_xe_engine_class_instance *eci, uint32_t *vm, uint32_t *exec_queue) @@ -798,6 +810,116 @@ pagefault_test_multigpu(struct xe_svm_gpu_info *gpu0, cleanup_vm_and_queue(gpu1, vm[1], exec_queue[1]); } +#define XE_BO_FLAG_SYSTEM BIT(1) +#define XE_BO_FLAG_CPU_ADDR_MIRROR BIT(24) + +static void +conflicting_madvise(struct xe_svm_gpu_info *gpu0, + struct xe_svm_gpu_info *gpu1, + struct drm_xe_engine_class_instance *eci, + bool no_prefetch) +{ + uint64_t addr; + uint32_t vm[2]; + uint32_t exec_queue[2]; + uint32_t batch_bo[2]; + void *data; + uint64_t batch_addr[2]; + struct drm_xe_sync sync[2] = {}; + volatile uint64_t *sync_addr[2]; + int local_fd; + uint16_t local_vram; + + create_vm_and_queue(gpu0, eci, &vm[0], &exec_queue[0]); + create_vm_and_queue(gpu1, eci, &vm[1], &exec_queue[1]); + + data = aligned_alloc(SZ_2M, SZ_4K); + igt_assert(data); + addr = to_user_pointer(data); + + xe_vm_madvise(gpu0->fd, vm[0], addr, SZ_4K, 0, + DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC, + DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM, 0, 0); + + store_dword_batch_init(gpu0->fd, vm[0], addr, &batch_bo[0], &batch_addr[0], 10); + store_dword_batch_init(gpu1->fd, vm[0], addr, &batch_bo[1], &batch_addr[1], 20); + + /* Place destination in an optionally remote location to test */ + local_fd = gpu0->fd; + local_vram = gpu0->vram_regions[0]; + xe_multigpu_madvise(gpu0->fd, vm[0], addr, SZ_4K, + 0, DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC, + 
gpu1->fd, 0, gpu1->vram_regions[0], exec_queue[0], + local_fd, local_vram); + + local_fd = gpu1->fd; + local_vram = gpu1->vram_regions[0]; + xe_multigpu_madvise(gpu1->fd, vm[1], addr, SZ_4K, + 0, DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC, + gpu0->fd, 0, gpu0->vram_regions[0], exec_queue[0], + local_fd, local_vram); + + setup_sync(&sync[0], &sync_addr[0], BIND_SYNC_VAL); + setup_sync(&sync[1], &sync_addr[1], BIND_SYNC_VAL); + + /* For simultaneous access need to call xe_wait_ufence for both gpus after prefetch */ + if(!no_prefetch) { + xe_vm_prefetch_async(gpu0->fd, vm[0], 0, 0, addr, + SZ_4K, &sync[0], 1, + DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC); + + xe_vm_prefetch_async(gpu1->fd, vm[1], 0, 0, addr, + SZ_4K, &sync[1], 1, + DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC); + + if (*sync_addr[0] != BIND_SYNC_VAL) + xe_wait_ufence(gpu0->fd, (uint64_t *)sync_addr[0], BIND_SYNC_VAL, exec_queue[0], + NSEC_PER_SEC * 10); + free((void *)sync_addr[0]); + if (*sync_addr[1] != BIND_SYNC_VAL) + xe_wait_ufence(gpu1->fd, (uint64_t *)sync_addr[1], BIND_SYNC_VAL, exec_queue[1], + NSEC_PER_SEC * 10); + free((void *)sync_addr[1]); + } + + if (no_prefetch) { + free((void *)sync_addr[0]); + free((void *)sync_addr[1]); + } + + for (int i = 0; i < 1; i++) { + sync_addr[0] = (void *)((char *)batch_addr[0] + SZ_4K); + sync[0].addr = to_user_pointer((uint64_t *)sync_addr[0]); + sync[0].timeline_value = EXEC_SYNC_VAL; + + sync_addr[1] = (void *)((char *)batch_addr[1] + SZ_4K); + sync[1].addr = to_user_pointer((uint64_t *)sync_addr[1]); + sync[1].timeline_value = EXEC_SYNC_VAL; + *sync_addr[0] = 0; + *sync_addr[1] = 0; + + xe_exec_sync(gpu0->fd, exec_queue[0], batch_addr[0], &sync[0], 1); + if (*sync_addr[0] != EXEC_SYNC_VAL) + xe_wait_ufence(gpu0->fd, (uint64_t *)sync_addr[0], EXEC_SYNC_VAL, exec_queue[0], + NSEC_PER_SEC * 10); + xe_exec_sync(gpu1->fd, exec_queue[1], batch_addr[1], &sync[1], 1); + if (*sync_addr[1] != EXEC_SYNC_VAL) + xe_wait_ufence(gpu1->fd, (uint64_t *)sync_addr[1], EXEC_SYNC_VAL, exec_queue[1], + NSEC_PER_SEC * 10); + } + + igt_assert_eq(*(uint64_t *)addr, 20); + + munmap((void *)batch_addr[0], BATCH_SIZE(gpu0->fd)); + munmap((void *)batch_addr[1], BATCH_SIZE(gpu0->fd)); + batch_fini(gpu0->fd, vm[0], batch_bo[0], batch_addr[0]); + batch_fini(gpu1->fd, vm[1], batch_bo[1], batch_addr[1]); + free(data); + + cleanup_vm_and_queue(gpu0, vm[0], exec_queue[0]); + cleanup_vm_and_queue(gpu1, vm[1], exec_queue[1]); +} + static void atomic_inc_op(struct xe_svm_gpu_info *gpu0, struct xe_svm_gpu_info *gpu1, @@ -1012,6 +1134,19 @@ multigpu_access_test(struct xe_svm_gpu_info *gpu0, cleanup_vm_and_queue(gpu1, vm[1], exec_queue[1]); } +static void +gpu_conflict_test_wrapper(struct xe_svm_gpu_info *src, + struct xe_svm_gpu_info *dst, + struct drm_xe_engine_class_instance *eci, + void *extra_args) +{ + struct multigpu_ops_args *args = (struct multigpu_ops_args *)extra_args; + igt_assert(src); + igt_assert(dst); + + conflicting_madvise(src, dst, eci, args->prefetch_req); +} + static void gpu_latency_test_wrapper(struct xe_svm_gpu_info *src, struct xe_svm_gpu_info *dst, @@ -1108,6 +1243,14 @@ igt_main for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_coherecy_test_wrapper, &coh_args); } + igt_subtest("conflicting-madvise-gpu") { + struct multigpu_ops_args conflict_args; + conflict_args.prefetch_req = 1; + for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_conflict_test_wrapper, &conflict_args); + conflict_args.prefetch_req = 0; + for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_conflict_test_wrapper, &conflict_args); + } + igt_subtest("latency-multi-gpu") { 
struct multigpu_ops_args latency_args; latency_args.prefetch_req = 1; -- 2.48.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH i-g-t v7 00/10] Madvise feature in SVM for Multi-GPU configs
@ 2025-11-13 16:32 nishit.sharma
2025-11-13 16:33 ` [PATCH i-g-t v7 01/10] lib/xe: Add instance parameter to xe_vm_madvise and introduce lr_sync helpers nishit.sharma
0 siblings, 1 reply; 21+ messages in thread
From: nishit.sharma @ 2025-11-13 16:32 UTC (permalink / raw)
To: igt-dev, thomas.hellstrom, nishit.sharma
From: Nishit Sharma <nishit.sharma@intel.com>
This patch series adds comprehensive SVM multi-GPU IGT test coverage for
madvise and prefetch functionality.
ver2:
- Test name changed in commits
- In patchwork the v1 series is incomplete because the last patch was not sent
ver3:
- Tags were added in patch 7; the revision was not sent to patchwork
ver4:
- A patch added a file that does not exist in the source tree, which caused
a CI build failure.
ver5:
- Added subtest function wrappers
- Subtests now execute for all enumerated GPUs
ver7:
- Optimized calls to frequently called functions
- Incorporated review comments (Thomas Hellstrom)
Nishit Sharma (10):
lib/xe: Add instance parameter to xe_vm_madvise and introduce lr_sync
helpers
tests/intel/xe_exec_system_allocator: Add parameter in madvise call
tests/intel/xe_multi_gpusvm: Add SVM multi-GPU cross-GPU memory access
test
tests/intel/xe_multi_gpusvm.c: Add SVM multi-GPU atomic operations
tests/intel/xe_multi_gpusvm.c: Add SVM multi-GPU coherency test
tests/intel/xe_multi_gpusvm.c: Add SVM multi-GPU performance test
tests/intel/xe_multi_gpusvm.c: Add SVM multi-GPU fault handling test
tests/intel/xe_multi_gpusvm.c: Add SVM multi-GPU simultaneous access
test
tests/intel/xe_multi_gpusvm.c: Add SVM multi-GPU conflicting madvise
test
tests/intel/xe_multi-gpusvm.c: Add SVM multi-GPU migration test
include/drm-uapi/xe_drm.h | 4 +-
lib/xe/xe_ioctl.c | 53 +-
lib/xe/xe_ioctl.h | 11 +-
tests/intel/xe_exec_system_allocator.c | 8 +-
tests/intel/xe_multi_gpusvm.c | 1441 ++++++++++++++++++++++++
tests/meson.build | 1 +
6 files changed, 1504 insertions(+), 14 deletions(-)
create mode 100644 tests/intel/xe_multi_gpusvm.c
--
2.48.1
^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH i-g-t v7 01/10] lib/xe: Add instance parameter to xe_vm_madvise and introduce lr_sync helpers
  2025-11-13 16:32 [PATCH i-g-t v7 00/10] Madvise feature in SVM for Multi-GPU configs nishit.sharma
@ 2025-11-13 16:33 ` nishit.sharma
  2025-11-17 12:34   ` Hellstrom, Thomas
  0 siblings, 1 reply; 21+ messages in thread
From: nishit.sharma @ 2025-11-13 16:33 UTC (permalink / raw)
To: igt-dev, thomas.hellstrom, nishit.sharma

From: Nishit Sharma <nishit.sharma@intel.com>

Add an 'instance' parameter to xe_vm_madvise and __xe_vm_madvise to
support per-instance memory advice operations.Implement xe_vm_bind_lr_sync
and xe_vm_unbind_lr_sync helpers for synchronous VM bind/unbind using user
fences.
These changes improve memory advice and binding operations for multi-GPU
and multi-instance scenarios in IGT tests.

Signed-off-by: Nishit Sharma <nishit.sharma@intel.com>
---
 include/drm-uapi/xe_drm.h |  4 +--
 lib/xe/xe_ioctl.c         | 53 +++++++++++++++++++++++++++++++++++----
 lib/xe/xe_ioctl.h         | 11 +++++---
 3 files changed, 58 insertions(+), 10 deletions(-)

diff --git a/include/drm-uapi/xe_drm.h b/include/drm-uapi/xe_drm.h
index 89ab54935..3472efa58 100644
--- a/include/drm-uapi/xe_drm.h
+++ b/include/drm-uapi/xe_drm.h
@@ -2060,8 +2060,8 @@ struct drm_xe_madvise {
 			/** @preferred_mem_loc.migration_policy: Page migration policy */
 			__u16 migration_policy;
 
-			/** @preferred_mem_loc.pad : MBZ */
-			__u16 pad;
+			/** @preferred_mem_loc.region_instance: Region instance */
+			__u16 region_instance;
 
 			/** @preferred_mem_loc.reserved : Reserved */
 			__u64 reserved;
diff --git a/lib/xe/xe_ioctl.c b/lib/xe/xe_ioctl.c
index 39c4667a1..06ce8a339 100644
--- a/lib/xe/xe_ioctl.c
+++ b/lib/xe/xe_ioctl.c
@@ -687,7 +687,8 @@ int64_t xe_wait_ufence(int fd, uint64_t *addr, uint64_t value,
 }
 
 int __xe_vm_madvise(int fd, uint32_t vm, uint64_t addr, uint64_t range,
-		    uint64_t ext, uint32_t type, uint32_t op_val, uint16_t policy)
+		    uint64_t ext, uint32_t type, uint32_t op_val, uint16_t policy,
+		    uint16_t instance)
 {
 	struct drm_xe_madvise madvise = {
 		.type = type,
@@ -704,6 +705,7 @@ int __xe_vm_madvise(int fd, uint32_t vm, uint64_t addr, uint64_t range,
 	case DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC:
 		madvise.preferred_mem_loc.devmem_fd = op_val;
 		madvise.preferred_mem_loc.migration_policy = policy;
+		madvise.preferred_mem_loc.region_instance = instance;
 		igt_debug("madvise.preferred_mem_loc.devmem_fd = %d\n",
 			  madvise.preferred_mem_loc.devmem_fd);
 		break;
@@ -731,14 +733,55 @@ int __xe_vm_madvise(int fd, uint32_t vm, uint64_t addr, uint64_t range,
  * @type: type of attribute
  * @op_val: fd/atomic value/pat index, depending upon type of operation
  * @policy: Page migration policy
+ * @instance: vram instance
  *
  * Function initializes different members of struct drm_xe_madvise and calls
  * MADVISE IOCTL .
  *
- * Asserts in case of error returned by DRM_IOCTL_XE_MADVISE.
+ * Returns error number in failure and 0 if pass.
  */
-void xe_vm_madvise(int fd, uint32_t vm, uint64_t addr, uint64_t range,
-		   uint64_t ext, uint32_t type, uint32_t op_val, uint16_t policy)
+int xe_vm_madvise(int fd, uint32_t vm, uint64_t addr, uint64_t range,
+		  uint64_t ext, uint32_t type, uint32_t op_val, uint16_t policy,
+		  uint16_t instance)
 {
-	igt_assert_eq(__xe_vm_madvise(fd, vm, addr, range, ext, type, op_val, policy), 0);
+	return __xe_vm_madvise(fd, vm, addr, range, ext, type, op_val, policy, instance);
+}
+
+#define BIND_SYNC_VAL 0x686868
+void xe_vm_bind_lr_sync(int fd, uint32_t vm, uint32_t bo, uint64_t offset,
+			uint64_t addr, uint64_t size, uint32_t flags)
+{
+	volatile uint64_t *sync_addr = malloc(sizeof(*sync_addr));
+	struct drm_xe_sync sync = {
+		.flags = DRM_XE_SYNC_FLAG_SIGNAL,
+		.type = DRM_XE_SYNC_TYPE_USER_FENCE,
+		.addr = to_user_pointer((uint64_t *)sync_addr),
+		.timeline_value = BIND_SYNC_VAL,
+	};
+
+	igt_assert(!!sync_addr);
+	xe_vm_bind_async_flags(fd, vm, 0, bo, 0, addr, size, &sync, 1, flags);
+	if (*sync_addr != BIND_SYNC_VAL)
+		xe_wait_ufence(fd, (uint64_t *)sync_addr, BIND_SYNC_VAL, 0, NSEC_PER_SEC * 10);
+	/* Only free if the wait succeeds */
+	free((void *)sync_addr);
+}
+
+void xe_vm_unbind_lr_sync(int fd, uint32_t vm, uint64_t offset,
+			  uint64_t addr, uint64_t size)
+{
+	volatile uint64_t *sync_addr = malloc(sizeof(*sync_addr));
+	struct drm_xe_sync sync = {
+		.flags = DRM_XE_SYNC_FLAG_SIGNAL,
+		.type = DRM_XE_SYNC_TYPE_USER_FENCE,
+		.addr = to_user_pointer((uint64_t *)sync_addr),
+		.timeline_value = BIND_SYNC_VAL,
+	};
+
+	igt_assert(!!sync_addr);
+	*sync_addr = 0;
+	xe_vm_unbind_async(fd, vm, 0, 0, addr, size, &sync, 1);
+	if (*sync_addr != BIND_SYNC_VAL)
+		xe_wait_ufence(fd, (uint64_t *)sync_addr, BIND_SYNC_VAL, 0, NSEC_PER_SEC * 10);
+	free((void *)sync_addr);
 }
diff --git a/lib/xe/xe_ioctl.h b/lib/xe/xe_ioctl.h
index ae8a23a54..1ae38029d 100644
--- a/lib/xe/xe_ioctl.h
+++ b/lib/xe/xe_ioctl.h
@@ -100,13 +100,18 @@ int __xe_wait_ufence(int fd, uint64_t *addr, uint64_t value,
 int64_t xe_wait_ufence(int fd, uint64_t *addr, uint64_t value,
 		       uint32_t exec_queue, int64_t timeout);
 int __xe_vm_madvise(int fd, uint32_t vm, uint64_t addr, uint64_t range, uint64_t ext,
-		    uint32_t type, uint32_t op_val, uint16_t policy);
-void xe_vm_madvise(int fd, uint32_t vm, uint64_t addr, uint64_t range, uint64_t ext,
-		   uint32_t type, uint32_t op_val, uint16_t policy);
+		    uint32_t type, uint32_t op_val, uint16_t policy, uint16_t instance);
+int xe_vm_madvise(int fd, uint32_t vm, uint64_t addr, uint64_t range, uint64_t ext,
+		  uint32_t type, uint32_t op_val, uint16_t policy, uint16_t instance);
 int xe_vm_number_vmas_in_range(int fd, struct drm_xe_vm_query_mem_range_attr *vmas_attr);
 int xe_vm_vma_attrs(int fd, struct drm_xe_vm_query_mem_range_attr *vmas_attr,
 		    struct drm_xe_mem_range_attr *mem_attr);
 struct drm_xe_mem_range_attr *xe_vm_get_mem_attr_values_in_range(int fd, uint32_t vm, uint64_t start,
 								 uint64_t range, uint32_t *num_ranges);
+void xe_vm_bind_lr_sync(int fd, uint32_t vm, uint32_t bo,
+			uint64_t offset, uint64_t addr,
+			uint64_t size, uint32_t flags);
+void xe_vm_unbind_lr_sync(int fd, uint32_t vm, uint64_t offset,
+			  uint64_t addr, uint64_t size);
 #endif	/* XE_IOCTL_H */
--
2.48.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread
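A minimal usage sketch of the extended xe_vm_madvise() signature and the lr_sync helpers above (not part of the series): it assumes the caller already has a device fd, a VM, a bo, a range and a target VRAM instance; the devmem fd, migration policy, bind flags and instance value are placeholders rather than values taken from the patches.

#include "igt.h"
#include "xe/xe_ioctl.h"

/*
 * Hypothetical helper, not from the series: advise a range toward the
 * VRAM of a specific region instance, then bind/unbind it synchronously
 * through the user-fence helpers. fd/vm/bo/addr/size/vram_instance are
 * assumed to be set up by the caller.
 */
static void madvise_preferred_loc_example(int fd, uint32_t vm, uint32_t bo,
					  uint64_t addr, uint64_t size,
					  uint16_t vram_instance)
{
	int err;

	/* Prefer device memory of @vram_instance for [addr, addr + size) */
	err = xe_vm_madvise(fd, vm, addr, size, 0,
			    DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC,
			    fd /* devmem_fd */, 0 /* migration_policy */,
			    vram_instance);
	igt_assert_eq(err, 0);

	/* Bind returns only after the user fence signals */
	xe_vm_bind_lr_sync(fd, vm, bo, 0, addr, size, 0);

	/* ... submit GPU work touching [addr, addr + size) here ... */

	xe_vm_unbind_lr_sync(fd, vm, 0, addr, size);
}

The point of the helpers is that subtests no longer have to open-code the drm_xe_sync user-fence setup and the xe_wait_ufence() wait for every long-running VM bind and unbind.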
* Re: [PATCH i-g-t v7 01/10] lib/xe: Add instance parameter to xe_vm_madvise and introduce lr_sync helpers
  2025-11-13 16:33 ` [PATCH i-g-t v7 01/10] lib/xe: Add instance parameter to xe_vm_madvise and introduce lr_sync helpers nishit.sharma
@ 2025-11-17 12:34   ` Hellstrom, Thomas
  2025-11-17 15:43     ` Sharma, Nishit
  0 siblings, 1 reply; 21+ messages in thread
From: Hellstrom, Thomas @ 2025-11-17 12:34 UTC (permalink / raw)
To: igt-dev@lists.freedesktop.org, Sharma, Nishit

On Thu, 2025-11-13 at 16:33 +0000, nishit.sharma@intel.com wrote:
> From: Nishit Sharma <nishit.sharma@intel.com>
> 
> Add an 'instance' parameter to xe_vm_madvise and __xe_vm_madvise to
> support per-instance memory advice operations.Implement
> xe_vm_bind_lr_sync
> and xe_vm_unbind_lr_sync helpers for synchronous VM bind/unbind using
> user
> fences.
> These changes improve memory advice and binding operations for multi-
> GPU
> and multi-instance scenarios in IGT tests.

s memory advice/memory advise/ ?

Also the lr_sync part is unrelated and should be split out to a
separate patch.

Thanks,
Thomas

[quoted patch trimmed]

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH i-g-t v7 01/10] lib/xe: Add instance parameter to xe_vm_madvise and introduce lr_sync helpers
  2025-11-17 12:34   ` Hellstrom, Thomas
@ 2025-11-17 15:43     ` Sharma, Nishit
  2025-11-18  9:23       ` Hellstrom, Thomas
  0 siblings, 1 reply; 21+ messages in thread
From: Sharma, Nishit @ 2025-11-17 15:43 UTC (permalink / raw)
To: Hellstrom, Thomas, igt-dev@lists.freedesktop.org

On 11/17/2025 6:04 PM, Hellstrom, Thomas wrote:
> On Thu, 2025-11-13 at 16:33 +0000, nishit.sharma@intel.com wrote:
>> From: Nishit Sharma <nishit.sharma@intel.com>
>>
>> Add an 'instance' parameter to xe_vm_madvise and __xe_vm_madvise to
>> support per-instance memory advice operations.Implement
>> xe_vm_bind_lr_sync
>> and xe_vm_unbind_lr_sync helpers for synchronous VM bind/unbind using
>> user
>> fences.
>> These changes improve memory advice and binding operations for multi-
>> GPU
>> and multi-instance scenarios in IGT tests.
> s memory advice/memory advise/ ?

Git it. Will edit the description.

>
> Also the lr_sync part is unrelated and should be split out to a
> separate patch.

Sure will create separate patch for lr_sync part. Also the
xe_exec_system_allocator() changes available in Patch [2/10] should be
merged along madvise changes within

Patch [1/10]?

>
> Thanks,
> Thomas

[quoted patch trimmed]

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH i-g-t v7 01/10] lib/xe: Add instance parameter to xe_vm_madvise and introduce lr_sync helpers
  2025-11-17 15:43     ` Sharma, Nishit
@ 2025-11-18  9:23       ` Hellstrom, Thomas
  0 siblings, 0 replies; 21+ messages in thread
From: Hellstrom, Thomas @ 2025-11-18  9:23 UTC (permalink / raw)
To: igt-dev@lists.freedesktop.org, Sharma, Nishit

On Mon, 2025-11-17 at 21:13 +0530, Sharma, Nishit wrote:
> 
> On 11/17/2025 6:04 PM, Hellstrom, Thomas wrote:
> > On Thu, 2025-11-13 at 16:33 +0000, nishit.sharma@intel.com wrote:
> > > From: Nishit Sharma <nishit.sharma@intel.com>
> > > 
> > > Add an 'instance' parameter to xe_vm_madvise and __xe_vm_madvise
> > > to
> > > support per-instance memory advice operations.Implement
> > > xe_vm_bind_lr_sync
> > > and xe_vm_unbind_lr_sync helpers for synchronous VM bind/unbind
> > > using
> > > user
> > > fences.
> > > These changes improve memory advice and binding operations for
> > > multi-
> > > GPU
> > > and multi-instance scenarios in IGT tests.
> > s memory advice/memory advise/ ?
> Git it. Will edit the description.
> > 
> > Also the lr_sync part is unrelated and should be split out to a
> > separate patch.
> 
> Sure will create separate patch for lr_sync part. Also the
> xe_exec_system_allocator() changes available in Patch [2/10] should
> be
> merged along madvise changes within
> 
> Patch [1/10]?

Yes, IIRC I noted that in that review. We need to ensure that the code
compiles after each patch.

/Thomas

> 
> > 
> > Thanks,
> > Thomas

[quoted patch trimmed]

^ permalink raw reply	[flat|nested] 21+ messages in thread
* [PATCH i-g-t v2 0/7] Madvise feature in SVM for Multi-GPU configs
@ 2025-11-04 15:31 nishit.sharma
2025-11-13 17:00 ` [PATCH i-g-t v7 00/10] " Nishit Sharma
0 siblings, 1 reply; 21+ messages in thread
From: nishit.sharma @ 2025-11-04 15:31 UTC (permalink / raw)
To: igt-dev, nishit.sharma
From: Nishit Sharma <nishit.sharma@intel.com>
This patch series adds comprehensive SVM multi-GPU IGT test coverage for
madvise and prefetch functionality.
ver2:
- Test name changed in commits
- In patchwork v1 patch is missing due to last patch was not sent
Cc: thomas.hellstrom@intel.com
Nishit Sharma (7):
tests/intel/xe_multi_gpusvm.c: Add SVM multi-GPU cross-GPU memory
access test
tests/intel/xe_multi_gpusvm.c: Add SVM multi-GPU atomic operations
tests/intel/xe_multi_gpusvm.c: Add SVM multi-GPU coherency test
tests/intel/xe_multi_gpusvm.c: Add SVM multi-GPU performance test
tests/intel/xe_multi_gpusvm.c: Add SVM multi-GPU fault handling test
tests/intel/xe_multi_gpusvm.c: Add SVM multi-GPU simultaneous access
test
tests/intel/xe_multi_gpusvm.c: Add SVM multi-GPU conflicting madvise
test
lib/xe/xe_ioctl.c | 40 +
lib/xe/xe_ioctl.h | 5 +
tags | 263672 +++++++++++++++++++++++++++++++
tests/intel/xe_multi_gpusvm.c | 1379 +
tests/intel/xe_multisvm.c | 41 +-
tests/meson.build | 1 +
6 files changed, 265100 insertions(+), 38 deletions(-)
create mode 100644 tags
create mode 100644 tests/intel/xe_multi_gpusvm.c
--
2.48.1
^ permalink raw reply	[flat|nested] 21+ messages in thread
end of thread, other threads:[~2025-11-21 14:11 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-13 17:04 [PATCH i-g-t v7 00/10] Add SVM madvise feature for multi-GPU configurations nishit.sharma
2025-11-13 17:04 ` [PATCH i-g-t v7 01/10] lib/xe: Add instance parameter to xe_vm_madvise and introduce lr_sync helpers nishit.sharma
2025-11-13 17:04 ` [PATCH i-g-t v7 02/10] tests/intel/xe_exec_system_allocator: Add parameter in madvise call nishit.sharma
2025-11-18 13:25   ` Gurram, Pravalika
2025-11-13 17:04 ` [PATCH i-g-t v7 03/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU cross-GPU memory access test nishit.sharma
2025-11-13 17:04 ` [PATCH i-g-t v7 04/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU atomic operations nishit.sharma
2025-11-21 14:11   ` Gurram, Pravalika
2025-11-13 17:04 ` [PATCH i-g-t v7 05/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU coherency test nishit.sharma
2025-11-13 17:04 ` [PATCH i-g-t v7 06/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU performance test nishit.sharma
2025-11-13 17:04 ` [PATCH i-g-t v7 07/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU fault handling test nishit.sharma
2025-11-13 17:04 ` [PATCH i-g-t v7 08/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU simultaneous access test nishit.sharma
2025-11-13 17:04 ` [PATCH i-g-t v7 09/10] tests/intel/xe_multi_gpusvm: Add SVM multi-GPU conflicting madvise test nishit.sharma

-- strict thread matches above, loose matches on Subject: below --
2025-11-13 17:16 [PATCH i-g-t v7 00/10] Madvise feature in SVM for Multi-GPU configs nishit.sharma
2025-11-13 17:16 ` [PATCH i-g-t v7 01/10] lib/xe: Add instance parameter to xe_vm_madvise and introduce lr_sync helpers nishit.sharma
2025-11-13 17:15 [PATCH i-g-t v7 00/10] Madvise feature in SVM for Multi-GPU configs nishit.sharma
2025-11-13 17:15 ` [PATCH i-g-t v7 01/10] lib/xe: Add instance parameter to xe_vm_madvise and introduce lr_sync helpers nishit.sharma
2025-11-13 17:09 [PATCH i-g-t v7 00/10] SVM madvise feature in multi-GPU config nishit.sharma
2025-11-13 17:09 ` [PATCH i-g-t v7 01/10] lib/xe: Add instance parameter to xe_vm_madvise and introduce lr_sync helpers nishit.sharma
2025-11-13 16:49 [PATCH i-g-t v7 00/10] Madvise feature in SVM for Multi-GPU configs nishit.sharma
2025-11-13 16:49 ` [PATCH i-g-t v7 01/10] lib/xe: Add instance parameter to xe_vm_madvise and introduce lr_sync helpers nishit.sharma
2025-11-13 16:32 [PATCH i-g-t v7 00/10] Madvise feature in SVM for Multi-GPU configs nishit.sharma
2025-11-13 16:33 ` [PATCH i-g-t v7 01/10] lib/xe: Add instance parameter to xe_vm_madvise and introduce lr_sync helpers nishit.sharma
2025-11-17 12:34   ` Hellstrom, Thomas
2025-11-17 15:43     ` Sharma, Nishit
2025-11-18  9:23       ` Hellstrom, Thomas
2025-11-04 15:31 [PATCH i-g-t v2 0/7] Madvise feature in SVM for Multi-GPU configs nishit.sharma
2025-11-13 17:00 ` [PATCH i-g-t v7 00/10] " Nishit Sharma
2025-11-13 17:00   ` [PATCH i-g-t v7 01/10] lib/xe: Add instance parameter to xe_vm_madvise and introduce lr_sync helpers Nishit Sharma