From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 12 Nov 2024 14:23:40 +0100
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [i-g-t, v2, 1/4] tests/intel/xe_exec_fault_mode: separate exec_sync and batch buffer
To: fei.yang@intel.com, igt-dev@lists.freedesktop.org
References: <20241108011041.2257553-1-fei.yang@intel.com> <20241108011041.2257553-2-fei.yang@intel.com>
From: Nirmoy Das
In-Reply-To: <20241108011041.2257553-2-fei.yang@intel.com>
Content-Type: text/plain; charset=UTF-8
List-Id: Development mailing list for IGT GPU Tools
Sender: "igt-dev"

On 11/8/2024 2:10 AM, fei.yang@intel.com wrote:
> From: Fei Yang
>
> In INVALIDATE cases the test purposely remaps the data buffer to a
> different physical location in the middle of execution to exercise the
> page fault handling flow. After the remapping we lose access to the old
> physical location, which would cause a problem when comparing the ufence
> value at the end of the execution. To fix this, the exec_sync data needs
> to be separated from the batch buffer holding the instructions, and
> during the execution we don't remap the exec_sync data.
>
> v2: Separate only exec_sync.
> Keep data field together with the batch
> buffer (Matt Roper)
>
> Signed-off-by: Fei Yang

Tested it locally.

Reviewed-by: Nirmoy Das

> ---
>  tests/intel/xe_exec_fault_mode.c | 66 ++++++++++++++++++++++++--------
>  1 file changed, 50 insertions(+), 16 deletions(-)
>
> diff --git a/tests/intel/xe_exec_fault_mode.c b/tests/intel/xe_exec_fault_mode.c
> index d416c773b..ae40e099b 100644
> --- a/tests/intel/xe_exec_fault_mode.c
> +++ b/tests/intel/xe_exec_fault_mode.c
> @@ -116,6 +116,7 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>  {
>  	uint32_t vm;
>  	uint64_t addr = 0x1a0000;
> +	uint64_t sync_addr = 0x101a0000;
>  #define USER_FENCE_VALUE 0xdeadbeefdeadbeefull
>  	struct drm_xe_sync sync[1] = {
>  		{ .type = DRM_XE_SYNC_TYPE_USER_FENCE, .flags = DRM_XE_SYNC_FLAG_SIGNAL,
> @@ -128,15 +129,15 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>  	};
>  	uint32_t exec_queues[MAX_N_EXEC_QUEUES];
>  	uint32_t bind_exec_queues[MAX_N_EXEC_QUEUES];
> -	size_t bo_size;
> +	size_t bo_size, sync_size;
>  	uint32_t bo = 0;
>  	struct {
>  		uint32_t batch[16];
>  		uint64_t pad;
>  		uint64_t vm_sync;
> -		uint64_t exec_sync;
>  		uint32_t data;
>  	} *data;
> +	uint64_t *exec_sync;
>  	int i, j, b;
>  	int map_fd = -1;
>
> @@ -151,6 +152,8 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>  			  DRM_XE_VM_CREATE_FLAG_FAULT_MODE, 0);
>  	bo_size = sizeof(*data) * n_execs;
>  	bo_size = xe_bb_size(fd, bo_size);
> +	sync_size = sizeof(*exec_sync) * n_execs;
> +	sync_size = xe_bb_size(fd, sync_size);
>
>  	if (flags & USERPTR) {
>  #define MAP_ADDRESS	0x00007fadeadbe000
> @@ -178,6 +181,12 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>  	}
>  	memset(data, 0, bo_size);
>
> +#define EXEC_SYNC_ADDRESS 0x00007fbdeadbe000
> +	exec_sync = mmap((void *)EXEC_SYNC_ADDRESS, sync_size, PROT_READ | PROT_WRITE,
> +			 MAP_SHARED | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
> +	igt_assert(exec_sync != MAP_FAILED);
> +	memset(exec_sync, 0, sync_size);
> +
>  	for (i = 0; i < n_exec_queues; i++) {
>  		exec_queues[i] = xe_exec_queue_create(fd, vm, eci, 0);
>  		if (flags & BIND_EXEC_QUEUE)
> @@ -212,6 +221,13 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>  		       bind_exec_queues[0], NSEC_PER_SEC);
>  	data[0].vm_sync = 0;
>
> +	xe_vm_bind_userptr_async(fd, vm, bind_exec_queues[0],
> +				 to_user_pointer(exec_sync), sync_addr,
> +				 sync_size, sync, 1);
> +	xe_wait_ufence(fd, &data[0].vm_sync, USER_FENCE_VALUE,
> +		       bind_exec_queues[0], NSEC_PER_SEC);
> +	data[0].vm_sync = 0;
> +
>  	if (flags & PREFETCH) {
>  		/* Should move to system memory */
>  		xe_vm_prefetch_async(fd, vm, bind_exec_queues[0], 0, addr,
> @@ -239,14 +255,14 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>  		data[i].batch[b++] = MI_BATCH_BUFFER_END;
>  		igt_assert(b <= ARRAY_SIZE(data[i].batch));
>
> -		sync[0].addr = addr + (char *)&data[i].exec_sync - (char *)data;
> +		sync[0].addr = sync_addr + (char *)&exec_sync[i] - (char *)exec_sync;
>
>  		exec.exec_queue_id = exec_queues[e];
>  		exec.address = batch_addr;
>  		xe_exec(fd, &exec);
>
>  		if (flags & REBIND && i + 1 != n_execs) {
> -			xe_wait_ufence(fd, &data[i].exec_sync, USER_FENCE_VALUE,
> +			xe_wait_ufence(fd, &exec_sync[i], USER_FENCE_VALUE,
>  				       exec_queues[e], NSEC_PER_SEC);
>  			xe_vm_unbind_async(fd, vm, bind_exec_queues[e], 0,
>  					   addr, bo_size, NULL, 0);
> @@ -275,7 +291,7 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>  			 * physical memory on next mmap call triggering
>  			 * an invalidate.
>  			 */
> -			xe_wait_ufence(fd, &data[i].exec_sync,
> +			xe_wait_ufence(fd, &exec_sync[i],
>  				       USER_FENCE_VALUE, exec_queues[e],
>  				       NSEC_PER_SEC);
>  			igt_assert_eq(data[i].data, 0xc0ffee);
> @@ -311,48 +327,66 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>  		 * submission here. For RACE cases we need to wait for all submissions
>  		 * to complete because the GuC scheduling can be out of order, the
>  		 * completion of the last submission doesn't mean all submission are
> -		 * completed.
> +		 * completed. For REBIND cases we only need to wait for the last
> +		 * submission.
>  		 */
> -		j = (flags & INVALIDATE && !(flags & RACE)) ? n_execs - 1 : 0;
> +		j = ((flags & INVALIDATE && !(flags & RACE)) ||
> +		     (flags & REBIND)) ? n_execs - 1 : 0;
>
>  		for (i = j; i < n_execs; i++) {
>  			int64_t timeout = NSEC_PER_SEC;
>
>  			if (flags & INVALID_VA && !(flags & ENABLE_SCRATCH))
> -				igt_assert_eq(__xe_wait_ufence(fd, &data[i].exec_sync, USER_FENCE_VALUE,
> +				igt_assert_eq(__xe_wait_ufence(fd, &exec_sync[i], USER_FENCE_VALUE,
>  							       exec_queues[i % n_exec_queues], &timeout), -EIO);
>  			else
> -				igt_assert_eq(__xe_wait_ufence(fd, &data[i].exec_sync, USER_FENCE_VALUE,
> +				igt_assert_eq(__xe_wait_ufence(fd, &exec_sync[i], USER_FENCE_VALUE,
>  							       exec_queues[i % n_exec_queues], &timeout), 0);
>  		}
>  	}
> -	sync[0].addr = to_user_pointer(&data[0].vm_sync);
> -	xe_vm_unbind_async(fd, vm, bind_exec_queues[0], 0, addr, bo_size,
> -			   sync, 1);
> -	xe_wait_ufence(fd, &data[0].vm_sync, USER_FENCE_VALUE,
> -		       bind_exec_queues[0], NSEC_PER_SEC);
>
>  	if (flags & INVALID_FAULT) {
>  		for (i = 0; i < n_execs; i++) {
>  			int ret;
>  			int64_t timeout = NSEC_PER_SEC;
>
> -			ret = __xe_wait_ufence(fd, &data[i].exec_sync, USER_FENCE_VALUE,
> +			ret = __xe_wait_ufence(fd, &exec_sync[i], USER_FENCE_VALUE,
>  					       exec_queues[i % n_exec_queues], &timeout);
>  			igt_assert(ret == -EIO || ret == 0);
>  		}
>  	} else if (!(flags & INVALID_VA)) {
> +		/*
> +		 * For INVALIDATE && RACE cases, due the the remmap in the
> +		 * middle of the execution, we lose access to some of the
> +		 * 0xc0ffee written to the old location, so check only for
> +		 * the second half of the submissions.
> +		 */
> +		if (flags & INVALIDATE && flags & RACE)
> +			j = n_execs / 2 + 1;
>  		for (i = j; i < n_execs; i++)
>  			igt_assert_eq(data[i].data, 0xc0ffee);
> -
>  	}
>
> +	sync[0].addr = to_user_pointer(&data[0].vm_sync);
> +	data[0].vm_sync = 0;
> +	xe_vm_unbind_async(fd, vm, bind_exec_queues[0], 0, sync_addr, sync_size,
> +			   sync, 1);
> +	xe_wait_ufence(fd, &data[0].vm_sync, USER_FENCE_VALUE,
> +		       bind_exec_queues[0], NSEC_PER_SEC);
> +	data[0].vm_sync = 0;
> +	xe_vm_unbind_async(fd, vm, bind_exec_queues[0], 0, addr, bo_size,
> +			   sync, 1);
> +	xe_wait_ufence(fd, &data[0].vm_sync, USER_FENCE_VALUE,
> +		       bind_exec_queues[0], NSEC_PER_SEC);
> +
>  	for (i = 0; i < n_exec_queues; i++) {
>  		xe_exec_queue_destroy(fd, exec_queues[i]);
>  		if (bind_exec_queues[i])
>  			xe_exec_queue_destroy(fd, bind_exec_queues[i]);
>  	}
>
> +	munmap(exec_sync, sync_size);
> +
>  	if (bo) {
>  		munmap(data, bo_size);
>  		gem_close(fd, bo);