From: "Hellstrom, Thomas" <thomas.hellstrom@intel.com>
To: "igt-dev@lists.freedesktop.org" <igt-dev@lists.freedesktop.org>,
"Sharma, Nishit" <nishit.sharma@intel.com>
Subject: Re: [PATCH i-g-t v7 05/10] tests/intel/xe_multi_gpusvm.c: Add SVM multi-GPU coherency test
Date: Mon, 17 Nov 2025 14:02:51 +0000
Message-ID: <c14d068ad4a7a6bc6ab4f1ca4622c492619f557f.camel@intel.com>
In-Reply-To: <20251113163308.633818-6-nishit.sharma@intel.com>
On Thu, 2025-11-13 at 16:33 +0000, nishit.sharma@intel.com wrote:
> From: Nishit Sharma <nishit.sharma@intel.com>
>
> This test verifies memory coherency in a multi-GPU environment using SVM.
> GPU 1 writes to a shared buffer, GPU 2 reads and checks for correct data
> without explicit synchronization, and the test is repeated with CPU and
> both GPUs to ensure consistent memory visibility across agents.
>
> Signed-off-by: Nishit Sharma <nishit.sharma@intel.com>
> ---
>  tests/intel/xe_multi_gpusvm.c | 203 +++++++++++++++++++++++++++++++++-
>  1 file changed, 201 insertions(+), 2 deletions(-)
>
> diff --git a/tests/intel/xe_multi_gpusvm.c b/tests/intel/xe_multi_gpusvm.c
> index 54e036724..6792ef72c 100644
> --- a/tests/intel/xe_multi_gpusvm.c
> +++ b/tests/intel/xe_multi_gpusvm.c
> @@ -34,8 +34,13 @@
>   * SUBTEST: atomic-inc-gpu-op
>   * Description:
>   *	This test does atomic operation in multi-gpu by executing atomic
> - *	operation on GPU1 and then atomic operation on GPU2 using same
> - *	adress
> + *	operation on GPU1 and then atomic operation on GPU2 using same
> + *	adress
> + *
> + * SUBTEST: coherency-multi-gpu
> + * Description:
> + *	This test checks coherency in multi-gpu by writing from GPU0,
> + *	reading from GPU1 and verifying, then repeating with the CPU and both GPUs
>   */
>
> #define MAX_XE_REGIONS 8
> @@ -93,6 +98,11 @@ static void gpu_atomic_inc_wrapper(struct xe_svm_gpu_info *src,
>  				   struct drm_xe_engine_class_instance *eci,
>  				   void *extra_args);
>
> +static void gpu_coherecy_test_wrapper(struct xe_svm_gpu_info *src,
> +				      struct xe_svm_gpu_info *dst,
> +				      struct drm_xe_engine_class_instance *eci,
> +				      void *extra_args);
> +
>  static void
>  create_vm_and_queue(struct xe_svm_gpu_info *gpu, struct drm_xe_engine_class_instance *eci,
>  		    uint32_t *vm, uint32_t *exec_queue)
> @@ -214,6 +224,35 @@ atomic_batch_init(int fd, uint32_t vm, uint64_t src_addr,
> *addr = batch_addr;
> }
>
> +static void
> +store_dword_batch_init(int fd, uint32_t vm, uint64_t src_addr,
> + uint32_t *bo, uint64_t *addr, int value)
> +{
> + uint32_t batch_bo_size = BATCH_SIZE(fd);
> + uint32_t batch_bo;
> + uint64_t batch_addr;
> + void *batch;
> + uint32_t *cmd;
> + int i = 0;
> +
> +	batch_bo = xe_bo_create(fd, vm, batch_bo_size, vram_if_possible(fd, 0), 0);
> + batch = xe_bo_map(fd, batch_bo, batch_bo_size);
> + cmd = (uint32_t *) batch;
> +
> + cmd[i++] = MI_STORE_DWORD_IMM_GEN4;
> + cmd[i++] = src_addr;
> + cmd[i++] = src_addr >> 32;
> + cmd[i++] = value;
> + cmd[i++] = MI_BATCH_BUFFER_END;
> +
> + batch_addr = to_user_pointer(batch);
> +
> + /* Punch a gap in the SVM map where we map the batch_bo */
> +	xe_vm_bind_lr_sync(fd, vm, batch_bo, 0, batch_addr, batch_bo_size, 0);
> + *bo = batch_bo;
> + *addr = batch_addr;
> +}
> +
> static void batch_init(int fd, uint32_t vm, uint64_t src_addr,
> uint64_t dst_addr, uint64_t copy_size,
> uint32_t *bo, uint64_t *addr)
> @@ -373,6 +412,143 @@ gpu_mem_access_wrapper(struct xe_svm_gpu_info *src,
> copy_src_dst(src, dst, eci, args->prefetch_req);
> }
>
> +static void
> +coherency_test_multigpu(struct xe_svm_gpu_info *gpu0,
> + struct xe_svm_gpu_info *gpu1,
> + struct drm_xe_engine_class_instance *eci,
> + bool coh_fail_set,
> + bool prefetch_req)
> +{
> + uint64_t addr;
> + uint32_t vm[2];
> + uint32_t exec_queue[2];
> + uint32_t batch_bo, batch1_bo[2];
> + uint64_t batch_addr, batch1_addr[2];
> + struct drm_xe_sync sync = {};
> + volatile uint64_t *sync_addr;
> + int value = 60;
> + uint64_t *data1;
> + void *copy_dst;
> +
> + create_vm_and_queue(gpu0, eci, &vm[0], &exec_queue[0]);
> + create_vm_and_queue(gpu1, eci, &vm[1], &exec_queue[1]);
> +
> + data1 = aligned_alloc(SZ_2M, SZ_4K);
> + igt_assert(data1);
> + addr = to_user_pointer(data1);
> +
> + copy_dst = aligned_alloc(SZ_2M, SZ_4K);
> + igt_assert(copy_dst);
> +
> +	store_dword_batch_init(gpu0->fd, vm[0], addr, &batch_bo, &batch_addr, value);
> +
> +	/* Place destination in GPU0 local memory location to test */
Indentation looks odd throughout this function. Has a formatting /
style checker been run on these patches?
> +	xe_multigpu_madvise(gpu0->fd, vm[0], addr, SZ_4K, 0,
> +			    DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC,
> +			    gpu0->fd, 0, gpu0->vram_regions[0], exec_queue[0],
> +			    0, 0);
> +
> + setup_sync(&sync, &sync_addr, BIND_SYNC_VAL);
> +	xe_multigpu_prefetch(gpu0->fd, vm[0], addr, SZ_4K, &sync,
> +			     sync_addr, exec_queue[0], prefetch_req);
> +
> + sync_addr = (void *)((char *)batch_addr + SZ_4K);
> + sync.addr = to_user_pointer((uint64_t *)sync_addr);
> + sync.timeline_value = EXEC_SYNC_VAL;
> + *sync_addr = 0;
> +
> + /* Execute STORE command on GPU0 */
> + xe_exec_sync(gpu0->fd, exec_queue[0], batch_addr, &sync, 1);
> +	if (*sync_addr != EXEC_SYNC_VAL)
> +		xe_wait_ufence(gpu0->fd, (uint64_t *)sync_addr, EXEC_SYNC_VAL, exec_queue[0],
> +			       NSEC_PER_SEC * 10);
> +
> + igt_assert_eq(*(uint64_t *)addr, value);
This assert will cause a CPU read, which migrates the data to system
memory, so perhaps not ideal if we want to test coherency across gpus?
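As an untested sketch of an alternative, reusing batch_init() and the
exec / wait pattern from this patch (check_dst here is a hypothetical
scratch buffer, not something in the patch): verify GPU0's store with
a GPU0 copy into a separate buffer, so addr itself is never faulted to
the CPU:

	/*
	 * Copy addr into check_dst on GPU0 and assert on the copy, so
	 * the CPU never touches addr and the SVM range stays where the
	 * madvise / prefetch placed it until GPU1 reads it.
	 */
	void *check_dst = aligned_alloc(SZ_2M, SZ_4K);

	igt_assert(check_dst);
	batch_init(gpu0->fd, vm[0], addr, to_user_pointer(check_dst),
		   SZ_4K, &batch_bo, &batch_addr);
	/* ... xe_exec_sync() + xe_wait_ufence() as above ... */
	igt_assert_eq(*(uint64_t *)check_dst, value);
	free(check_dst);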
> +
> +	/* Create batch for GPU1 using addr as src, which has the value from GPU0 */
> + batch_init(gpu1->fd, vm[1], addr, to_user_pointer(copy_dst),
> + SZ_4K, &batch_bo, &batch_addr);
> +
> +	/* Place destination in GPU1 local memory location to test */
> +	xe_multigpu_madvise(gpu1->fd, vm[1], addr, SZ_4K, 0,
> +			    DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC,
> +			    gpu1->fd, 0, gpu1->vram_regions[0], exec_queue[1],
> +			    0, 0);
> +
> + setup_sync(&sync, &sync_addr, BIND_SYNC_VAL);
> +	xe_multigpu_prefetch(gpu1->fd, vm[1], addr, SZ_4K, &sync,
> +			     sync_addr, exec_queue[1], prefetch_req);
> +
> + sync_addr = (void *)((char *)batch_addr + SZ_4K);
> + sync.addr = to_user_pointer((uint64_t *)sync_addr);
> + sync.timeline_value = EXEC_SYNC_VAL;
> + *sync_addr = 0;
> +
> + /* Execute COPY command on GPU1 */
> + xe_exec_sync(gpu1->fd, exec_queue[1], batch_addr, &sync, 1);
> +	if (*sync_addr != EXEC_SYNC_VAL)
> +		xe_wait_ufence(gpu1->fd, (uint64_t *)sync_addr, EXEC_SYNC_VAL, exec_queue[1],
> +			       NSEC_PER_SEC * 10);
> +
> + igt_assert_eq(*(uint64_t *)copy_dst, value);
> +
> +	/* CPU writes 10; memset sets bytes, not an integer, so it fills 4 bytes with 0x0A */
> + memset((void *)(uintptr_t)addr, 10, sizeof(int));
> + igt_assert_eq(*(uint64_t *)addr, 0x0A0A0A0A);
> +
> + if (coh_fail_set) {
> + igt_info("coherency fail impl\n");
> +
> + /* Coherency fail scenario */
> +		store_dword_batch_init(gpu0->fd, vm[0], addr, &batch1_bo[0], &batch1_addr[0], value + 10);
> +		store_dword_batch_init(gpu1->fd, vm[1], addr, &batch1_bo[1], &batch1_addr[1], value + 20);
> +
> +		sync_addr = (void *)((char *)batch1_addr[0] + SZ_4K);
> + sync.addr = to_user_pointer((uint64_t *)sync_addr);
> + sync.timeline_value = EXEC_SYNC_VAL;
> + *sync_addr = 0;
> +
> + /* Execute STORE command on GPU1 */
> +		xe_exec_sync(gpu0->fd, exec_queue[0], batch1_addr[0], &sync, 1);
> +		if (*sync_addr != EXEC_SYNC_VAL)
> +			xe_wait_ufence(gpu0->fd, (uint64_t *)sync_addr, EXEC_SYNC_VAL, exec_queue[0],
> +				       NSEC_PER_SEC * 10);
> +
> +		sync_addr = (void *)((char *)batch1_addr[1] + SZ_4K);
> + sync.addr = to_user_pointer((uint64_t *)sync_addr);
> + sync.timeline_value = EXEC_SYNC_VAL;
> + *sync_addr = 0;
> +
> + /* Execute STORE command on GPU2 */
> +		xe_exec_sync(gpu1->fd, exec_queue[1], batch1_addr[1], &sync, 1);
> +		if (*sync_addr != EXEC_SYNC_VAL)
> +			xe_wait_ufence(gpu1->fd, (uint64_t *)sync_addr, EXEC_SYNC_VAL, exec_queue[1],
> +				       NSEC_PER_SEC * 10);
> +
> +		igt_warn_on_f(*(uint64_t *)addr != (value + 10),
> +			      "GPU2(dst_gpu] has overwritten value at addr\n");
Parenthesis mismatch.
BTW, isn't gpu2 supposed to overwrite the value here? Perhaps I'm
missing something?
Also, regarding the previous comment WRT the naming gpu0 and gpu1 vs
gpu1 and gpu2: shouldn't we try to be consistent here to avoid
confusion?
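If the second store (GPU1 / exec_queue[1]) is expected to win, I would
have guessed the check should read something like this untested sketch:

	/* The later store from exec_queue[1] should be what sticks. */
	igt_warn_on_f(*(uint64_t *)addr != (value + 20),
		      "GPU1 (second writer) store not visible at addr\n");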
/Thomas
> +
> +		munmap((void *)batch1_addr[0], BATCH_SIZE(gpu0->fd));
> +		munmap((void *)batch1_addr[1], BATCH_SIZE(gpu1->fd));
> +
> +		batch_fini(gpu0->fd, vm[0], batch1_bo[0], batch1_addr[0]);
> +		batch_fini(gpu1->fd, vm[1], batch1_bo[1], batch1_addr[1]);
> + }
> +
> +	/* CPU writes 11; memset sets bytes, not an integer, so it fills 4 bytes with 0x0B */
> + memset((void *)(uintptr_t)addr, 11, sizeof(int));
> + igt_assert_eq(*(uint64_t *)addr, 0x0B0B0B0B);
> +
> + munmap((void *)batch_addr, BATCH_SIZE(gpu0->fd));
> + batch_fini(gpu0->fd, vm[0], batch_bo, batch_addr);
> + batch_fini(gpu1->fd, vm[1], batch_bo, batch_addr);
> + free(data1);
> + free(copy_dst);
> +
> + cleanup_vm_and_queue(gpu0, vm[0], exec_queue[0]);
> + cleanup_vm_and_queue(gpu1, vm[1], exec_queue[1]);
> +}
> +
> static void
> atomic_inc_op(struct xe_svm_gpu_info *gpu0,
> struct xe_svm_gpu_info *gpu1,
> @@ -472,6 +648,19 @@ gpu_atomic_inc_wrapper(struct xe_svm_gpu_info *src,
> atomic_inc_op(src, dst, eci, args->prefetch_req);
> }
>
> +static void
> +gpu_coherecy_test_wrapper(struct xe_svm_gpu_info *src,
> + struct xe_svm_gpu_info *dst,
> + struct drm_xe_engine_class_instance *eci,
> + void *extra_args)
> +{
> +	struct multigpu_ops_args *args = (struct multigpu_ops_args *)extra_args;
> + igt_assert(src);
> + igt_assert(dst);
> +
> +	coherency_test_multigpu(src, dst, eci, args->op_mod, args->prefetch_req);
> +}
> +
> igt_main
> {
> struct xe_svm_gpu_info gpus[MAX_XE_GPUS];
> @@ -519,6 +708,16 @@ igt_main
>  		for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_atomic_inc_wrapper, &atomic_args);
> }
>
> + igt_subtest("coherency-multi-gpu") {
> + struct multigpu_ops_args coh_args;
> + coh_args.prefetch_req = 1;
> + coh_args.op_mod = 0;
> +		for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_coherecy_test_wrapper, &coh_args);
> + coh_args.prefetch_req = 0;
> + coh_args.op_mod = 1;
> +		for_each_gpu_pair(gpu_cnt, gpus, &eci, gpu_coherecy_test_wrapper, &coh_args);
> + }
> +
> igt_fixture {
> int cnt;
>