Re: [i-g-t 1/4] tests/intel/xe_exec_fault_mode: separate sync data and batch buffer

Igt-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: Matt Roper <matthew.d.roper@intel.com>
To: <fei.yang@intel.com>
Cc: <igt-dev@lists.freedesktop.org>
Subject: Re: [i-g-t 1/4] tests/intel/xe_exec_fault_mode: separate sync data and batch buffer
Date: Tue, 5 Nov 2024 15:55:44 -0800	[thread overview]
Message-ID: <20241105235544.GN4891@mdroper-desk1.amr.corp.intel.com> (raw)
In-Reply-To: <20241030230350.1681757-2-fei.yang@intel.com>

On Wed, Oct 30, 2024 at 04:03:47PM -0700, fei.yang@intel.com wrote:
> From: Fei Yang <fei.yang@intel.com>
> 
> In INVALIDATE cases the test purposely remap the data buffer to a
> different physical location in the midle of execution to exercise the
> page fault handling flow. After the remapping we lose access to the old
> physical location, and that would cause a problem for verifying stored
> data and comparing ufence value at the end of the execution. To fix this
> the data used for synchronization purpose needs to be separated from the
> batch buffer for instructions, and during the execution we remap the
> batch buffer only.
> 
> Signed-off-by: Fei Yang <fei.yang@intel.com>
> ---
>  tests/intel/xe_exec_fault_mode.c | 70 ++++++++++++++++++++++----------
>  1 file changed, 48 insertions(+), 22 deletions(-)
> 
> diff --git a/tests/intel/xe_exec_fault_mode.c b/tests/intel/xe_exec_fault_mode.c
> index d416c773b..995517087 100644
> --- a/tests/intel/xe_exec_fault_mode.c
> +++ b/tests/intel/xe_exec_fault_mode.c
> @@ -116,6 +116,7 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>  {
>  	uint32_t vm;
>  	uint64_t addr = 0x1a0000;
> +	uint64_t syncaddr = 0x101a0000;
>  #define USER_FENCE_VALUE	0xdeadbeefdeadbeefull
>  	struct drm_xe_sync sync[1] = {
>  		{ .type = DRM_XE_SYNC_TYPE_USER_FENCE, .flags = DRM_XE_SYNC_FLAG_SIGNAL,
> @@ -128,15 +129,17 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>  	};
>  	uint32_t exec_queues[MAX_N_EXEC_QUEUES];
>  	uint32_t bind_exec_queues[MAX_N_EXEC_QUEUES];
> -	size_t bo_size;
> +	size_t bo_size, sync_size;
>  	uint32_t bo = 0;
>  	struct {
>  		uint32_t batch[16];
>  		uint64_t pad;
> +	} *data;
> +	struct {
>  		uint64_t vm_sync;

I don't think we need to move the vm_sync since that one is just used
for binding.  The first time we use it, we've already finished waiting
on it successfully before the invalidate+race flag combination does the
read, mmap, write operations that could potentially make it lose racing
updates to the 'data' structure.

>  		uint64_t exec_sync;
>  		uint32_t data;

I think we want to keep the 'data' field in the original data structure.
Based on the minimal comments in the test code, it seems like the goal
of this flag combination is to ensure that the batch buffer's write to
this field doesn't cause a GPU fault due to a racing mmap.  If we move
the target of the batchbuffer write to a separate area that isn't
subject to the mmap, then we've lost the coverage this subtest was
trying to provide.

Due to the way we read, remap, write it's possible that we can
potentially lose the 0xc0ffee value written if the timing is just right,
which will fail the assertion at the end of the test to make sure every
0xc0ffee was written.  But I think that assertion is simply unwanted for
the invalidate+race flag combination.  We should probably skip that
final assertion for this specific flag combination since we don't always
expect every single 0xc0ffee write to still be visible at the end of the
test.

As far as I can see, just moving just the exec_sync to the new area
should be sufficient to ensure that we don't wind up waiting forever on
a sync object that was actually already signalled (but that we "missed"
due to an older copy of the data structure being copied into the
re-mmap'd area).


Matt


> -	} *data;
> +	} *syncdata;
>  	int i, j, b;
>  	int map_fd = -1;
>  
> @@ -151,6 +154,8 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>  				  DRM_XE_VM_CREATE_FLAG_FAULT_MODE, 0);
>  	bo_size = sizeof(*data) * n_execs;
>  	bo_size = xe_bb_size(fd, bo_size);
> +	sync_size = sizeof(*syncdata) * n_execs;
> +	sync_size = xe_bb_size(fd, sync_size);
>  
>  	if (flags & USERPTR) {
>  #define	MAP_ADDRESS	0x00007fadeadbe000
> @@ -178,6 +183,12 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>  	}
>  	memset(data, 0, bo_size);
>  
> +#define SYNCDATA_ADDRESS	0x00007fbdeadbe000
> +	syncdata = mmap((void *)SYNCDATA_ADDRESS, sync_size, PROT_READ | PROT_WRITE,
> +			MAP_SHARED | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
> +	igt_assert(syncdata != MAP_FAILED);
> +	memset(syncdata, 0, sync_size);
> +
>  	for (i = 0; i < n_exec_queues; i++) {
>  		exec_queues[i] = xe_exec_queue_create(fd, vm, eci, 0);
>  		if (flags & BIND_EXEC_QUEUE)
> @@ -187,7 +198,14 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>  			bind_exec_queues[i] = 0;
>  	};
>  
> -	sync[0].addr = to_user_pointer(&data[0].vm_sync);
> +	sync[0].addr = to_user_pointer(&syncdata[0].vm_sync);
> +	xe_vm_bind_userptr_async(fd, vm, bind_exec_queues[0],
> +				 to_user_pointer(syncdata), syncaddr,
> +				 sync_size, sync, 1);
> +	xe_wait_ufence(fd, &syncdata[0].vm_sync, USER_FENCE_VALUE,
> +			bind_exec_queues[0], NSEC_PER_SEC);
> +	syncdata[0].vm_sync = 0;
> +
>  	if (flags & IMMEDIATE) {
>  		if (bo)
>  			xe_vm_bind_async_flags(fd, vm, bind_exec_queues[0], bo, 0,
> @@ -208,24 +226,24 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>  						 bo_size, sync, 1);
>  	}
>  
> -	xe_wait_ufence(fd, &data[0].vm_sync, USER_FENCE_VALUE,
> +	xe_wait_ufence(fd, &syncdata[0].vm_sync, USER_FENCE_VALUE,
>  		       bind_exec_queues[0], NSEC_PER_SEC);
> -	data[0].vm_sync = 0;
> +	syncdata[0].vm_sync = 0;
>  
>  	if (flags & PREFETCH) {
>  		/* Should move to system memory */
>  		xe_vm_prefetch_async(fd, vm, bind_exec_queues[0], 0, addr,
>  				     bo_size, sync, 1, 0);
> -		xe_wait_ufence(fd, &data[0].vm_sync, USER_FENCE_VALUE,
> +		xe_wait_ufence(fd, &syncdata[0].vm_sync, USER_FENCE_VALUE,
>  			       bind_exec_queues[0], NSEC_PER_SEC);
> -		data[0].vm_sync = 0;
> +		syncdata[0].vm_sync = 0;
>  	}
>  
>  	for (i = 0; i < n_execs; i++) {
>  		uint64_t batch_offset = (char *)&data[i].batch - (char *)data;
>  		uint64_t batch_addr = addr + batch_offset;
> -		uint64_t sdi_offset = (char *)&data[i].data - (char *)data;
> -		uint64_t sdi_addr = addr + sdi_offset;
> +		uint64_t sdi_offset = (char *)&syncdata[i].data - (char *)syncdata;
> +		uint64_t sdi_addr = syncaddr + sdi_offset;
>  		int e = i % n_exec_queues;
>  
>  		b = 0;
> @@ -239,19 +257,19 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>  		data[i].batch[b++] = MI_BATCH_BUFFER_END;
>  		igt_assert(b <= ARRAY_SIZE(data[i].batch));
>  
> -		sync[0].addr = addr + (char *)&data[i].exec_sync - (char *)data;
> +		sync[0].addr = syncaddr + (char *)&syncdata[i].exec_sync - (char *)syncdata;
>  
>  		exec.exec_queue_id = exec_queues[e];
>  		exec.address = batch_addr;
>  		xe_exec(fd, &exec);
>  
>  		if (flags & REBIND && i + 1 != n_execs) {
> -			xe_wait_ufence(fd, &data[i].exec_sync, USER_FENCE_VALUE,
> +			xe_wait_ufence(fd, &syncdata[i].exec_sync, USER_FENCE_VALUE,
>  				       exec_queues[e], NSEC_PER_SEC);
>  			xe_vm_unbind_async(fd, vm, bind_exec_queues[e], 0,
>  					   addr, bo_size, NULL, 0);
>  
> -			sync[0].addr = to_user_pointer(&data[0].vm_sync);
> +			sync[0].addr = to_user_pointer(&syncdata[0].vm_sync);
>  			addr += bo_size;
>  			if (bo)
>  				xe_vm_bind_async(fd, vm, bind_exec_queues[e], bo,
> @@ -262,9 +280,9 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>  							 to_user_pointer(data),
>  							 addr, bo_size, sync,
>  							 1);
> -			xe_wait_ufence(fd, &data[0].vm_sync, USER_FENCE_VALUE,
> +			xe_wait_ufence(fd, &syncdata[0].vm_sync, USER_FENCE_VALUE,
>  				       bind_exec_queues[e], NSEC_PER_SEC);
> -			data[0].vm_sync = 0;
> +			syncdata[0].vm_sync = 0;
>  		}
>  
>  		if (flags & INVALIDATE && i + 1 != n_execs) {
> @@ -275,10 +293,10 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>  				 * physical memory on next mmap call triggering
>  				 * an invalidate.
>  				 */
> -				xe_wait_ufence(fd, &data[i].exec_sync,
> +				xe_wait_ufence(fd, &syncdata[i].exec_sync,
>  					       USER_FENCE_VALUE, exec_queues[e],
>  					       NSEC_PER_SEC);
> -				igt_assert_eq(data[i].data, 0xc0ffee);
> +				igt_assert_eq(syncdata[i].data, 0xc0ffee);
>  			} else if (i * 2 != n_execs) {
>  				/*
>  				 * We issue 1 mmap which races against running
> @@ -319,17 +337,22 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>  			int64_t timeout = NSEC_PER_SEC;
>  
>  			if (flags & INVALID_VA && !(flags & ENABLE_SCRATCH))
> -				igt_assert_eq(__xe_wait_ufence(fd, &data[i].exec_sync, USER_FENCE_VALUE,
> +				igt_assert_eq(__xe_wait_ufence(fd, &syncdata[i].exec_sync, USER_FENCE_VALUE,
>  							       exec_queues[i % n_exec_queues], &timeout), -EIO);
>  			else
> -				igt_assert_eq(__xe_wait_ufence(fd, &data[i].exec_sync, USER_FENCE_VALUE,
> +				igt_assert_eq(__xe_wait_ufence(fd, &syncdata[i].exec_sync, USER_FENCE_VALUE,
>  							       exec_queues[i % n_exec_queues], &timeout), 0);
>  		}
>  	}
> -	sync[0].addr = to_user_pointer(&data[0].vm_sync);
> +	sync[0].addr = to_user_pointer(&syncdata[0].vm_sync);
>  	xe_vm_unbind_async(fd, vm, bind_exec_queues[0], 0, addr, bo_size,
>  			   sync, 1);
> -	xe_wait_ufence(fd, &data[0].vm_sync, USER_FENCE_VALUE,
> +	xe_wait_ufence(fd, &syncdata[0].vm_sync, USER_FENCE_VALUE,
> +		       bind_exec_queues[0], NSEC_PER_SEC);
> +	syncdata[0].vm_sync = 0;
> +	xe_vm_unbind_async(fd, vm, bind_exec_queues[0], 0, syncaddr, sync_size,
> +			   sync, 1);
> +	xe_wait_ufence(fd, &syncdata[0].vm_sync, USER_FENCE_VALUE,
>  		       bind_exec_queues[0], NSEC_PER_SEC);
>  
>  	if (flags & INVALID_FAULT) {
> @@ -337,13 +360,13 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>  			int ret;
>  			int64_t timeout = NSEC_PER_SEC;
>  
> -			ret = __xe_wait_ufence(fd, &data[i].exec_sync, USER_FENCE_VALUE,
> +			ret = __xe_wait_ufence(fd, &syncdata[i].exec_sync, USER_FENCE_VALUE,
>  					       exec_queues[i % n_exec_queues], &timeout);
>  			igt_assert(ret == -EIO || ret == 0);
>  		}
>  	} else if (!(flags & INVALID_VA)) {
>  		for (i = j; i < n_execs; i++)
> -			igt_assert_eq(data[i].data, 0xc0ffee);
> +			igt_assert_eq(syncdata[i].data, 0xc0ffee);
>  
>  	}
>  
> @@ -353,12 +376,15 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>  			xe_exec_queue_destroy(fd, bind_exec_queues[i]);
>  	}
>  
> +	munmap(syncdata, sync_size);
> +
>  	if (bo) {
>  		munmap(data, bo_size);
>  		gem_close(fd, bo);
>  	} else if (!(flags & INVALIDATE)) {
>  		free(data);
>  	}
> +
>  	xe_vm_destroy(fd, vm);
>  	if (map_fd != -1)
>  		close(map_fd);
> -- 
> 2.25.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

next prev parent reply	other threads:[~2024-11-05 23:55 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-30 23:03 [i-g-t 0/4] separate sync data and batch buffer fei.yang
2024-10-30 23:03 ` [i-g-t 1/4] tests/intel/xe_exec_fault_mode: " fei.yang
2024-11-05 23:55   ` Matt Roper [this message]
2024-10-30 23:03 ` [i-g-t 2/4] tests/intel/xe_exec_threads: remove redundant wait fei.yang
2024-11-05 23:59   ` Matt Roper
2024-10-30 23:03 ` [i-g-t 3/4] tests/intel/xe_exec_threads: wait for all submissions to complete fei.yang
2024-10-30 23:03 ` [i-g-t 4/4] tests/intel/xe_exec_threads: separate sync data and batch buffer fei.yang
2024-10-31  0:57 ` ✓ CI.xeBAT: success for " Patchwork
2024-10-31  0:59 ` ✓ Fi.CI.BAT: " Patchwork
2024-10-31  2:05 ` ✗ CI.xeFULL: failure " Patchwork
2024-10-31 11:10 ` ✗ Fi.CI.IGT: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241105235544.GN4891@mdroper-desk1.amr.corp.intel.com \
    --to=matthew.d.roper@intel.com \
    --cc=fei.yang@intel.com \
    --cc=igt-dev@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox