From: nishit.sharma@intel.com
To: igt-dev@lists.freedesktop.org, pravalika.gurram@intel.com
Subject: [PATCH i-g-t v13 09/11] tests/intel/xe_multigpu_svm: Add SVM multi-GPU simultaneous access test
Date: Mon, 5 Jan 2026 03:59:58 +0000
Message-ID: <20260105040000.181183-10-nishit.sharma@intel.com>
In-Reply-To: <20260105040000.181183-1-nishit.sharma@intel.com>
References: <20260105040000.181183-1-nishit.sharma@intel.com>
List-Id: Development mailing list for IGT GPU Tools <igt-dev@lists.freedesktop.org>

From: Nishit Sharma <nishit.sharma@intel.com>

This test launches atomic increment workloads on two GPUs in parallel,
both accessing the same SVM buffer. It verifies that concurrent atomic
operations from multiple GPUs produce the expected result, ensuring data
integrity and the absence of race conditions in a multi-GPU SVM
environment.
Signed-off-by: Nishit Sharma <nishit.sharma@intel.com>
Reviewed-by: Pravalika Gurram <pravalika.gurram@intel.com>
Reviewed-by: Thomas Hellström
---
 tests/intel/xe_multigpu_svm.c | 193 ++++++++++++++++++++++++++++++++++
 1 file changed, 193 insertions(+)

diff --git a/tests/intel/xe_multigpu_svm.c b/tests/intel/xe_multigpu_svm.c
index f0a9a8f3c..08414ace5 100644
--- a/tests/intel/xe_multigpu_svm.c
+++ b/tests/intel/xe_multigpu_svm.c
@@ -13,6 +13,7 @@
 #include "intel_mocs.h"
 #include "intel_reg.h"
+#include "intel_gpu_commands.h"
 #include "time.h"
 #include "xe/xe_gt.h"
@@ -101,6 +102,17 @@
  * Test cross-GPU memory access with prefetch to verify page fault
  * suppression when memory is pre-migrated to target GPU's VRAM
  *
+ * SUBTEST: mgpu-concurrent-access-basic
+ * Description:
+ *	Test concurrent atomic memory operations where multiple GPUs
+ *	simultaneously access and modify the same memory location without
+ *	prefetch to validate cross-GPU coherency and synchronization
+ *
+ * SUBTEST: mgpu-concurrent-access-prefetch
+ * Description:
+ *	Test concurrent atomic memory operations with prefetch where
+ *	multiple GPUs simultaneously access shared memory to validate
+ *	coherency with memory migration and local VRAM access
  */
 
 #define MAX_XE_REGIONS 8
@@ -112,6 +124,7 @@
 #define COPY_SIZE SZ_64M
 #define ATOMIC_OP_VAL 56
 #define BATCH_VALUE 60
+#define NUM_ITER 200
 
 #define MULTIGPU_PREFETCH BIT(1)
 #define MULTIGPU_XGPU_ACCESS BIT(2)
@@ -121,6 +134,7 @@
 #define MULTIGPU_PERF_OP BIT(6)
 #define MULTIGPU_PERF_REM_COPY BIT(7)
 #define MULTIGPU_PFAULT_OP BIT(8)
+#define MULTIGPU_CONC_ACCESS BIT(9)
 
 #define INIT 2
 #define STORE 3
@@ -181,6 +195,11 @@ static void gpu_fault_test_wrapper(struct xe_svm_gpu_info *src,
				   struct drm_xe_engine_class_instance *eci,
				   unsigned int flags);
 
+static void gpu_simult_test_wrapper(struct xe_svm_gpu_info *src,
+				    struct xe_svm_gpu_info *dst,
+				    struct drm_xe_engine_class_instance *eci,
+				    unsigned int flags);
+
 static void create_vm_and_queue(struct xe_svm_gpu_info *gpu,
				struct drm_xe_engine_class_instance *eci,
				uint32_t *vm, uint32_t *exec_queue)
@@ -1057,6 +1076,164 @@ pagefault_test_multigpu(struct xe_svm_gpu_info *gpu1,
	cleanup_vm_and_queue(gpu2, vm[1], exec_queue[1]);
 }
 
+static void
+multigpu_access_test(struct xe_svm_gpu_info *gpu1,
+		     struct xe_svm_gpu_info *gpu2,
+		     struct drm_xe_engine_class_instance *eci,
+		     unsigned int flags)
+{
+	uint64_t addr;
+	uint32_t vm[2];
+	uint32_t exec_queue[2];
+	uint32_t batch_bo[2];
+	struct test_exec_data *data;
+	uint64_t batch_addr[2];
+	struct drm_xe_sync sync[2] = {};
+	uint64_t *sync_addr[2];
+	uint32_t verify_batch_bo;
+	uint64_t verify_batch_addr;
+	uint64_t *verify_result;
+	uint32_t final_value;
+	uint64_t final_timeline;
+
+	/* Skip if either GPU doesn't support faults */
+	if (mgpu_check_fault_support(gpu1, gpu2))
+		return;
+
+	create_vm_and_queue(gpu1, eci, &vm[0], &exec_queue[0]);
+	create_vm_and_queue(gpu2, eci, &vm[1], &exec_queue[1]);
+
+	data = aligned_alloc(SZ_2M, SZ_4K);
+	igt_assert(data);
+	data[0].vm_sync = 0;
+	addr = to_user_pointer(data);
+
+	WRITE_ONCE(*(uint64_t *)addr, 0);
+
+	/* GPU1: Atomic Batch create */
+	gpu_batch_create(gpu1, vm[0], exec_queue[0], addr, 0,
+			 &batch_bo[0], &batch_addr[0], flags, ATOMIC);
+	/* GPU2: Atomic Batch create */
+	gpu_batch_create(gpu2, vm[1], exec_queue[1], addr, 0,
+			 &batch_bo[1], &batch_addr[1], flags, ATOMIC);
+
+	/* gpu_madvise_sync calls xe_exec() also, here intention is different */
+	xe_multigpu_madvise(gpu1->fd, vm[0], addr, SZ_4K, 0,
+			    DRM_XE_MEM_RANGE_ATTR_ATOMIC,
+			    DRM_XE_ATOMIC_GLOBAL, 0, 0, exec_queue[0]);
+
+	xe_multigpu_madvise(gpu1->fd, vm[0], addr, SZ_4K, 0,
+			    DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC,
+			    gpu1->fd, 0, gpu1->vram_regions[0], exec_queue[0]);
+
+	xe_multigpu_madvise(gpu2->fd, vm[1], addr, SZ_4K, 0,
+			    DRM_XE_MEM_RANGE_ATTR_ATOMIC,
+			    DRM_XE_ATOMIC_GLOBAL, 0, 0, exec_queue[1]);
+
+	xe_multigpu_madvise(gpu2->fd, vm[1], addr, SZ_4K, 0,
+			    DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC,
+			    gpu2->fd, 0, gpu2->vram_regions[0], exec_queue[1]);
+
+	setup_sync(&sync[0], &sync_addr[0], BIND_SYNC_VAL);
+	setup_sync(&sync[1], &sync_addr[1], BIND_SYNC_VAL);
+
+	xe_multigpu_prefetch(gpu1->fd, vm[0], addr, SZ_4K, &sync[0],
+			     sync_addr[0], exec_queue[0], flags);
+
+	xe_multigpu_prefetch(gpu2->fd, vm[1], addr, SZ_4K, &sync[1],
+			     sync_addr[1], exec_queue[1], flags);
+
+	free(sync_addr[0]);
+	free(sync_addr[1]);
+
+	igt_info("Starting %d concurrent atomic increment iterations\n", NUM_ITER);
+	for (int i = 0; i < NUM_ITER; i++) {
+		bool last = (i == NUM_ITER - 1);
+
+		if (last) {
+			sync_addr[0] = (void *)((char *)batch_addr[0] + SZ_4K);
+			sync[0].flags = DRM_XE_SYNC_FLAG_SIGNAL;
+			sync[0].type = DRM_XE_SYNC_TYPE_USER_FENCE;
+			sync[0].addr = to_user_pointer((uint64_t *)sync_addr[0]);
+			sync[0].timeline_value = EXEC_SYNC_VAL + i;
+			WRITE_ONCE(*sync_addr[0], 0);
+
+			sync_addr[1] = (void *)((char *)batch_addr[1] + SZ_4K);
+			sync[1].flags = DRM_XE_SYNC_FLAG_SIGNAL;
+			sync[1].type = DRM_XE_SYNC_TYPE_USER_FENCE;
+			sync[1].addr = to_user_pointer((uint64_t *)sync_addr[1]);
+			sync[1].timeline_value = EXEC_SYNC_VAL + i;
+			WRITE_ONCE(*sync_addr[1], 0);
+		}
+
+		/* === CONCURRENT EXECUTION: Launch both GPUs simultaneously === */
+		xe_exec_sync(gpu1->fd, exec_queue[0], batch_addr[0],
+			     last ? &sync[0] : NULL, last ? 1 : 0);
+
+		xe_exec_sync(gpu2->fd, exec_queue[1], batch_addr[1],
+			     last ? &sync[1] : NULL, last ? 1 : 0);
+	}
+
+	/* NOW wait only for the last operations to complete */
+	final_timeline = EXEC_SYNC_VAL + NUM_ITER - 1;
+	if (NUM_ITER > 0) {
+		if (READ_ONCE(*sync_addr[0]) != final_timeline)
+			xe_wait_ufence(gpu1->fd, (uint64_t *)sync_addr[0], final_timeline,
+				       exec_queue[0], NSEC_PER_SEC * 30);
+
+		if (READ_ONCE(*sync_addr[1]) != final_timeline)
+			xe_wait_ufence(gpu2->fd, (uint64_t *)sync_addr[1], final_timeline,
+				       exec_queue[1], NSEC_PER_SEC * 30);
+	}
+
+	igt_info("Both GPUs completed execution %u\n", READ_ONCE(*(uint32_t *)addr));
+
+	/* === Verification using GPU read (not CPU) === */
+	verify_result = aligned_alloc(SZ_2M, SZ_4K);
+	igt_assert(verify_result);
+	memset(verify_result, 0xDE, SZ_4K);
+
+	/* Use GPU1 to read final value */
+	gpu_batch_create(gpu1, vm[0], exec_queue[0], addr, to_user_pointer(verify_result),
+			 &verify_batch_bo, &verify_batch_addr, flags, INIT);
+
+	sync_addr[0] = (void *)((char *)verify_batch_addr + SZ_4K);
+	sync[0].addr = to_user_pointer((uint64_t *)sync_addr[0]);
+	sync[0].timeline_value = EXEC_SYNC_VAL;
+	sync[0].flags = DRM_XE_SYNC_FLAG_SIGNAL;
+	sync[0].type = DRM_XE_SYNC_TYPE_USER_FENCE;
+	WRITE_ONCE(*sync_addr[0], 0);
+
+	xe_exec_sync(gpu1->fd, exec_queue[0], verify_batch_addr, &sync[0], 1);
+	if (READ_ONCE(*sync_addr[0]) != EXEC_SYNC_VAL)
+		xe_wait_ufence(gpu1->fd, (uint64_t *)sync_addr[0], EXEC_SYNC_VAL,
+			       exec_queue[0], NSEC_PER_SEC * 10);
+
+	/* NOW CPU can read verify_result */
+	final_value = READ_ONCE(*(uint32_t *)verify_result);
+
+	igt_info("GPU verification batch copied value: %u\n", final_value);
+	igt_info("CPU direct read shows: %u\n", (unsigned int)*(uint64_t *)addr);
+
+	/* Expected: 0 + (NUM_ITER * 2 GPUs) = 400 */
+	igt_assert_f((final_value == 2 * NUM_ITER),
+		     "Expected %u value, got %u\n",
+		     2 * NUM_ITER, final_value);
+
+	munmap((void *)verify_batch_addr, BATCH_SIZE(gpu1->fd));
+	batch_fini(gpu1->fd, vm[0], verify_batch_bo, verify_batch_addr);
+	free(verify_result);
+
+	munmap((void *)batch_addr[0], BATCH_SIZE(gpu1->fd));
+	munmap((void *)batch_addr[1], BATCH_SIZE(gpu2->fd));
+	batch_fini(gpu1->fd, vm[0], batch_bo[0], batch_addr[0]);
+	batch_fini(gpu2->fd, vm[1], batch_bo[1], batch_addr[1]);
+	free(data);
+
+	cleanup_vm_and_queue(gpu1, vm[0], exec_queue[0]);
+	cleanup_vm_and_queue(gpu2, vm[1], exec_queue[1]);
+}
+
 static void
 gpu_mem_access_wrapper(struct xe_svm_gpu_info *src,
		       struct xe_svm_gpu_info *dst,
@@ -1117,6 +1294,18 @@ gpu_fault_test_wrapper(struct xe_svm_gpu_info *src,
	pagefault_test_multigpu(src, dst, eci, flags);
 }
 
+static void
+gpu_simult_test_wrapper(struct xe_svm_gpu_info *src,
+			struct xe_svm_gpu_info *dst,
+			struct drm_xe_engine_class_instance *eci,
+			unsigned int flags)
+{
+	igt_assert(src);
+	igt_assert(dst);
+
+	multigpu_access_test(src, dst, eci, flags);
+}
+
 static void
 test_mgpu_exec(int gpu_cnt, struct xe_svm_gpu_info *gpus,
	       struct drm_xe_engine_class_instance *eci,
@@ -1132,6 +1321,8 @@ test_mgpu_exec(int gpu_cnt, struct xe_svm_gpu_info *gpus,
		for_each_gpu_pair(gpu_cnt, gpus, eci, gpu_latency_test_wrapper, flags);
	if (flags & MULTIGPU_PFAULT_OP)
		for_each_gpu_pair(gpu_cnt, gpus, eci, gpu_fault_test_wrapper, flags);
+	if (flags & MULTIGPU_CONC_ACCESS)
+		for_each_gpu_pair(gpu_cnt, gpus, eci, gpu_simult_test_wrapper, flags);
 }
 
 struct section {
@@ -1169,6 +1360,8 @@ int igt_main()
		  MULTIGPU_PREFETCH | MULTIGPU_PERF_OP | MULTIGPU_PERF_REM_COPY },
		{ "pagefault-basic", MULTIGPU_PFAULT_OP },
		{ "pagefault-prefetch", MULTIGPU_PREFETCH | MULTIGPU_PFAULT_OP },
+		{ "concurrent-access-basic", MULTIGPU_CONC_ACCESS },
+		{ "concurrent-access-prefetch", MULTIGPU_PREFETCH | MULTIGPU_CONC_ACCESS },
		{ NULL },
	};
-- 
2.48.1