From: "Christian König" <christian.koenig@amd.com>
To: vitaly.prosyak@amd.com, igt-dev@lists.freedesktop.org
Cc: Alex Deucher <alexander.deucher@amd.com>,
	Jesse Zhang <jesse.zhang@amd.com>
Subject: Re: [PATCH] tests/amdgpu: add USERPTR PTE invalidation regression test
Date: Wed, 13 May 2026 09:02:56 +0200	[thread overview]
Message-ID: <8e8bb1f2-df0e-4826-9eec-e9b189b61c49@amd.com> (raw)
In-Reply-To: <20260512182348.115625-1-vitaly.prosyak@amd.com>

On 5/12/26 20:21, vitaly.prosyak@amd.com wrote:
> From: Vitaly Prosyak <vitaly.prosyak@amd.com>
> 
> Add amd_userptr_invalidation test to verify that GPU page table entries
> are properly invalidated when the userspace backing of a USERPTR buffer
> object is released via munmap().
> 
> The test contains two subtests:
> 
>   userptr-unmap-revalidate:
>     Allocates a USERPTR BO, releases its backing via munmap(), then
>     submits a CS that still references the BO in the bo_list.  Verifies
>     that the kernel detects the invalidated BO during revalidation and
>     rejects the command submission.
> 
>   userptr-unmap-stress:
>     Allocates a 256 MB USERPTR region, establishes GPU mappings with an
>     initial SDMA copy, releases the backing via munmap(), then creates
>     memory pressure with pipes and child processes.  A second SDMA copy
>     through the old VA range is submitted without the USERPTR BO in the
>     bo_list.  Verifies that GPU PTEs were invalidated by checking for
>     GPU page faults (via klogctl) and confirming that the destination
>     buffer does not contain the original data pattern.
> 
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Jesse Zhang <jesse.zhang@amd.com>
> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>

Skimming over it, the patch looks like it does the right thing, but I'm certainly not an expert on this code base.

So only Acked-by: Christian König <christian.koenig@amd.com>.

Regards,
Christian.

> ---
>  lib/amdgpu/amd_command_submission.c     |   2 +-
>  tests/amdgpu/amd_userptr_invalidation.c | 590 ++++++++++++++++++++++++
>  tests/amdgpu/meson.build                |   1 +
>  3 files changed, 592 insertions(+), 1 deletion(-)
>  create mode 100644 tests/amdgpu/amd_userptr_invalidation.c
> 
> diff --git a/lib/amdgpu/amd_command_submission.c b/lib/amdgpu/amd_command_submission.c
> index c80e06fb5..1a5fd9446 100644
> --- a/lib/amdgpu/amd_command_submission.c
> +++ b/lib/amdgpu/amd_command_submission.c
> @@ -139,7 +139,7 @@ int amdgpu_test_exec_cs_helper(amdgpu_device_handle device, unsigned int ip_type
>  						 0, &expired);
>  		ring_context->err_codes.err_code_wait_for_fence = r;
>  		if (expect_failure) {
> -			igt_info("EXPECT FAILURE amdgpu_cs_query_fence_status%d"
> +			igt_info("EXPECT FAILURE amdgpu_cs_query_fence_status %d\n"
>  				 "expired %d PID %d\n", r, expired, getpid());
>  		} else {
>  			/* we allow ECANCELED or ENODATA for good jobs temporally */
> diff --git a/tests/amdgpu/amd_userptr_invalidation.c b/tests/amdgpu/amd_userptr_invalidation.c
> new file mode 100644
> index 000000000..6938c11a4
> --- /dev/null
> +++ b/tests/amdgpu/amd_userptr_invalidation.c
> @@ -0,0 +1,590 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright 2026 Advanced Micro Devices, Inc.
> + *
> + * Test for USERPTR BO PTE invalidation after munmap.
> + *
> + * When userspace releases the backing pages of a USERPTR buffer object
> + * via munmap(), the kernel must invalidate the corresponding GPU page
> + * table entries so that subsequent command submissions through the same
> + * GPU virtual address range do not access the old physical pages.
> + *
> + * The test verifies two aspects of this behavior:
> + *
> + *   userptr-unmap-revalidate
> + *       Submits a CS that includes the USERPTR BO in its bo_list after
> + *       the backing has been released.  The kernel is expected to detect
> + *       the invalidated BO during revalidation and reject the submission.
> + *
> + *   userptr-unmap-stress
> + *       Allocates a large (256 MB) USERPTR region, establishes GPU
> + *       mappings via an initial SDMA copy, then releases the backing
> + *       and creates memory pressure with many pipes and child processes.
> + *       A second SDMA copy through the old VA range is submitted without
> + *       the USERPTR BO in the bo_list.  The test verifies that the GPU
> + *       does not read back the original data pattern (0xAA), confirming
> + *       that the PTEs were properly invalidated.
> + *
> + *       Detection uses three complementary signals:
> + *         - GPU page faults logged by the kernel (klogctl)
> + *         - destination buffer containing only zeros (dummy page)
> + *         - absence of original 0xAA pattern in destination
> + */
> +
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <signal.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/mman.h>
> +#include <sys/klog.h>
> +#include <sys/wait.h>
> +#include <unistd.h>
> +
> +#include "igt.h"
> +#include "ioctl_wrappers.h"
> +#include "lib/amdgpu/amd_memory.h"
> +#include "lib/amdgpu/amd_sdma.h"
> +#include "lib/amdgpu/amd_ip_blocks.h"
> +#include "lib/amdgpu/amd_command_submission.h"
> +#include "lib/amdgpu/amd_utils.h"
> +
> +#define BUF_SZ			(64 * 1024)
> +#define PM4_DW			256
> +
> +#define STRESS_TARGET_SZ	(256UL * 1024 * 1024)
> +#define STRESS_CHILDREN		2048
> +#define STRESS_PIPES		200000
> +#define STRESS_SCAN_CHUNK	(4UL * 1024 * 1024)
> +#define STRESS_PTE_STEP		(64UL * 1024 * 1024)
> +
> +/**
> + * count_gpu_page_faults() - count GPU page faults in dmesg for a given PID
> + * @pid: process ID to match in fault messages
> + * @since_uptime_ms: only count faults after this uptime (milliseconds)
> + *
> + * Read the kernel ring buffer via klogctl(3) and count lines containing
> + * "[gfxhub] page fault" followed by the given PID on the next line.
> + * Only messages with a kernel timestamp >= @since_uptime_ms are counted.
> + *
> + * Return: number of matching page fault entries.
> + */
> +static unsigned int
> +count_gpu_page_faults(pid_t pid, unsigned long since_uptime_ms)
> +{
> +	int bufsize, len;
> +	char *buf, *p, *line_start;
> +	unsigned int count = 0;
> +	unsigned int total_lines = 0;
> +	unsigned int fault_lines = 0;
> +	char pid_pattern[64];
> +	bool prev_was_fault = false;
> +
> +	snprintf(pid_pattern, sizeof(pid_pattern), "pid %d ", (int)pid);
> +
> +	bufsize = klogctl(10, NULL, 0);
> +	if (bufsize <= 0)
> +		bufsize = 1 << 20;
> +
> +	buf = malloc(bufsize + 1);
> +	if (!buf)
> +		return 0;
> +
> +	len = klogctl(3, buf, bufsize);
> +	if (len <= 0) {
> +		free(buf);
> +		return 0;
> +	}
> +	buf[len] = '\0';
> +
> +	line_start = buf;
> +	for (p = buf; p <= buf + len; p++) {
> +		const char *ts_start;
> +		double ts;
> +
> +		if (*p != '\n' && *p != '\0')
> +			continue;
> +
> +		*p = '\0';
> +		total_lines++;
> +
> +		/* Filter by kernel timestamp */
> +		ts_start = strchr(line_start, '[');
> +		if (ts_start && sscanf(ts_start, "[%lf]", &ts) == 1) {
> +			unsigned long ts_ms = (unsigned long)(ts * 1000.0);
> +
> +			if (ts_ms < since_uptime_ms) {
> +				prev_was_fault = false;
> +				line_start = p + 1;
> +				continue;
> +			}
> +		}
> +
> +		if (strstr(line_start, "[gfxhub] page fault")) {
> +			prev_was_fault = true;
> +			fault_lines++;
> +		} else if (prev_was_fault && strstr(line_start, pid_pattern)) {
> +			count++;
> +			prev_was_fault = false;
> +		} else {
> +			prev_was_fault = false;
> +		}
> +
> +		line_start = p + 1;
> +	}
> +
> +	igt_info("  klogctl: lines=%u faults=%u matched=%u\n",
> +		 total_lines, fault_lines, count);
> +	free(buf);
> +	return count;
> +}
> +
> +/**
> + * get_uptime_ms() - read current system uptime in milliseconds
> + *
> + * Read /proc/uptime for the kernel monotonic timestamp that matches
> + * the timestamps used in dmesg.
> + *
> + * Return: uptime in milliseconds, or 0 on error.
> + */
> +static unsigned long get_uptime_ms(void)
> +{
> +	FILE *fp;
> +	double uptime;
> +
> +	fp = fopen("/proc/uptime", "r");
> +	if (!fp)
> +		return 0;
> +	if (fscanf(fp, "%lf", &uptime) != 1)
> +		uptime = 0;
> +	fclose(fp);
> +	return (unsigned long)(uptime * 1000.0);
> +}
> +
> +/**
> + * amdgpu_userptr_unmap_revalidate() - test CS rejection after munmap
> + * @dev: amdgpu device handle
> + *
> + * Allocate a USERPTR BO, release its backing via munmap(), then submit
> + * a CS that still references the BO in its bo_list.  The kernel should
> + * detect that the BO pages are no longer valid and reject the CS.
> + *
> + * If the CS is accepted, the destination buffer is scanned for bytes
> + * that do not match the original fill or zero patterns.
> + */
> +static void amdgpu_userptr_unmap_revalidate(amdgpu_device_handle dev)
> +{
> +	const struct amdgpu_ip_block_version *ip_block;
> +	struct amdgpu_ring_context *ring_context;
> +	amdgpu_bo_handle up_bo;
> +	amdgpu_va_handle up_va_h;
> +	uint64_t up_va;
> +	void *up_cpu;
> +	amdgpu_bo_handle dst_bo;
> +	amdgpu_va_handle dst_va_h;
> +	uint64_t dst_mc;
> +	void *dst_cpu_ptr;
> +	uint8_t *dst;
> +	unsigned int suspicious;
> +	uint64_t i;
> +	int r;
> +
> +	ip_block = get_ip_block(dev, AMDGPU_HW_IP_DMA);
> +	igt_assert(ip_block);
> +
> +	ring_context = calloc(1, sizeof(*ring_context));
> +	igt_assert(ring_context);
> +	ring_context->write_length = BUF_SZ;
> +	ring_context->pm4 = calloc(PM4_DW, sizeof(*ring_context->pm4));
> +	ring_context->pm4_size = PM4_DW;
> +	ring_context->secure = false;
> +	ring_context->res_cnt = 2;
> +	igt_assert(ring_context->pm4);
> +
> +	r = amdgpu_cs_ctx_create(dev, &ring_context->context_handle);
> +	igt_assert_eq(r, 0);
> +
> +	/* Allocate and fill USERPTR BO */
> +	up_cpu = mmap(NULL, BUF_SZ, PROT_READ | PROT_WRITE,
> +		      MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
> +	igt_assert(up_cpu != MAP_FAILED);
> +	memset(up_cpu, 0x77, BUF_SZ);
> +
> +	r = amdgpu_create_bo_from_user_mem(dev, up_cpu, BUF_SZ, &up_bo);
> +	igt_assert_eq(r, 0);
> +
> +	r = amdgpu_va_range_alloc(dev, amdgpu_gpu_va_range_general,
> +				  BUF_SZ, sysconf(_SC_PAGE_SIZE), 0,
> +				  &up_va, &up_va_h, 0);
> +	igt_assert_eq(r, 0);
> +
> +	r = amdgpu_bo_va_op(up_bo, 0, BUF_SZ, up_va, 0, AMDGPU_VA_OP_MAP);
> +	igt_assert_eq(r, 0);
> +
> +	/* Allocate VRAM destination */
> +	r = amdgpu_bo_alloc_and_map(dev, BUF_SZ, sysconf(_SC_PAGE_SIZE),
> +				    AMDGPU_GEM_DOMAIN_VRAM,
> +				    AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED,
> +				    &dst_bo, &dst_cpu_ptr, &dst_mc,
> +				    &dst_va_h);
> +	igt_assert_eq(r, 0);
> +	memset(dst_cpu_ptr, 0, BUF_SZ);
> +
> +	/* Release USERPTR backing before CS */
> +	munmap(up_cpu, BUF_SZ);
> +
> +	/* Submit CS with invalidated USERPTR BO in bo_list */
> +	ring_context->bo_mc = up_va;
> +	ring_context->bo_mc2 = dst_mc;
> +	ring_context->resources[0] = up_bo;
> +	ring_context->resources[1] = dst_bo;
> +
> +	ip_block->funcs->copy_linear(ip_block->funcs, ring_context,
> +				     &ring_context->pm4_dw);
> +
> +	amdgpu_test_exec_cs_helper(dev, ip_block->type, ring_context, 1);
> +	r = ring_context->err_codes.err_code_cs_submit;
> +
> +	if (r != 0) {
> +		igt_info("CS rejected (r=%d) after munmap\n", r);
> +	} else {
> +		dst = (uint8_t *)dst_cpu_ptr;
> +		suspicious = 0;
> +		for (i = 0; i < BUF_SZ; i++) {
> +			if (dst[i] != 0 && dst[i] != 0x77)
> +				suspicious++;
> +		}
> +		igt_info("CS completed: %u/%d unexpected bytes\n",
> +			 suspicious, BUF_SZ);
> +	}
> +
> +	amdgpu_bo_unmap_and_free(dst_bo, dst_va_h, dst_mc, BUF_SZ);
> +	amdgpu_bo_va_op(up_bo, 0, BUF_SZ, up_va, 0, AMDGPU_VA_OP_UNMAP);
> +	amdgpu_va_range_free(up_va_h);
> +	amdgpu_bo_free(up_bo);
> +	amdgpu_cs_ctx_free(ring_context->context_handle);
> +	free(ring_context->pm4);
> +	free(ring_context);
> +}
> +
> +/**
> + * amdgpu_userptr_unmap_stress() - stress test PTE invalidation
> + * @dev: amdgpu device handle
> + *
> + * Phase 1: Allocate a large USERPTR region filled with 0xAA, create a
> + *          VRAM destination BO, and perform an initial SDMA copy to
> + *          populate GPU page table entries.
> + *
> + * Phase 2: Release the USERPTR backing via munmap().  This triggers the
> + *          MMU notifier which should invalidate the GPU PTEs.
> + *
> + * Phase 3: Create memory pressure by opening many pipes and forking
> + *          child processes.  This increases the chance that the freed
> + *          physical pages are reassigned.
> + *
> + * Phase 4: Poison the destination with 0xCC and submit a second SDMA
> + *          copy through the old VA range without the USERPTR BO in the
> + *          bo_list.  If PTEs were invalidated, the GPU will fault and
> + *          the fault handler will redirect reads to a zeroed dummy page.
> + *          The test checks that no original 0xAA data appears in the
> + *          destination.
> + */
> +static void amdgpu_userptr_unmap_stress(amdgpu_device_handle dev)
> +{
> +	const struct amdgpu_ip_block_version *ip_block;
> +	struct amdgpu_ring_context *ring_context;
> +	amdgpu_bo_handle up_bo;
> +	amdgpu_va_handle up_va_h;
> +	uint64_t up_va;
> +	void *up_cpu;
> +	amdgpu_bo_handle dst_bo;
> +	amdgpu_va_handle dst_va_h;
> +	uint64_t dst_mc;
> +	void *dst_cpu_ptr;
> +	int (*pipes)[2];
> +	unsigned int pipes_opened;
> +	pid_t *children;
> +	unsigned int children_spawned;
> +	uint64_t off;
> +	unsigned int i;
> +	pid_t pid;
> +	volatile uint8_t sink;
> +	uint8_t *base;
> +	uint8_t *scan;
> +	uint64_t p;
> +	unsigned int non_poison = 0;
> +	unsigned int original_count = 0;
> +	unsigned int page_faults;
> +	unsigned long ts_before;
> +	pid_t my_pid;
> +	int r;
> +
> +	up_bo = NULL;
> +	up_cpu = MAP_FAILED;
> +	pipes = NULL;
> +	pipes_opened = 0;
> +	children = NULL;
> +	children_spawned = 0;
> +
> +	ip_block = get_ip_block(dev, AMDGPU_HW_IP_DMA);
> +	igt_assert(ip_block);
> +
> +	ring_context = calloc(1, sizeof(*ring_context));
> +	igt_assert(ring_context);
> +	ring_context->pm4 = calloc(PM4_DW, sizeof(*ring_context->pm4));
> +	ring_context->pm4_size = PM4_DW;
> +	ring_context->secure = false;
> +	igt_assert(ring_context->pm4);
> +
> +	r = amdgpu_cs_ctx_create(dev, &ring_context->context_handle);
> +	igt_assert_eq(r, 0);
> +
> +	r = amdgpu_bo_alloc_and_map(dev, STRESS_SCAN_CHUNK,
> +				    sysconf(_SC_PAGE_SIZE),
> +				    AMDGPU_GEM_DOMAIN_VRAM,
> +				    AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED,
> +				    &dst_bo, &dst_cpu_ptr, &dst_mc,
> +				    &dst_va_h);
> +	igt_assert_eq(r, 0);
> +
> +	/* Phase 1: allocate USERPTR region and establish GPU mappings */
> +	igt_info("Phase 1: allocating %lu MB USERPTR region\n",
> +		 (unsigned long)(STRESS_TARGET_SZ / (1024 * 1024)));
> +
> +	up_cpu = mmap(NULL, STRESS_TARGET_SZ, PROT_READ | PROT_WRITE,
> +		      MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
> +	igt_assert(up_cpu != MAP_FAILED);
> +	memset(up_cpu, 0xAA, STRESS_TARGET_SZ);
> +
> +	r = amdgpu_create_bo_from_user_mem(dev, up_cpu, STRESS_TARGET_SZ,
> +					   &up_bo);
> +	igt_assert_eq(r, 0);
> +
> +	r = amdgpu_va_range_alloc(dev, amdgpu_gpu_va_range_general,
> +				  STRESS_TARGET_SZ, sysconf(_SC_PAGE_SIZE), 0,
> +				  &up_va, &up_va_h, 0);
> +	igt_assert_eq(r, 0);
> +
> +	r = amdgpu_bo_va_op(up_bo, 0, STRESS_TARGET_SZ, up_va, 0,
> +			     AMDGPU_VA_OP_MAP);
> +	igt_assert_eq(r, 0);
> +
> +	/* Touch pages at intervals to populate GPU PTEs */
> +	sink = 0;
> +	base = (uint8_t *)up_cpu;
> +	for (off = 0; off < STRESS_TARGET_SZ; off += STRESS_PTE_STEP)
> +		sink += base[off];
> +	(void)sink;
> +
> +	/* Initial SDMA copy to ensure GPU has walked the page tables */
> +	ring_context->bo_mc = up_va;
> +	ring_context->bo_mc2 = dst_mc;
> +	ring_context->write_length = STRESS_SCAN_CHUNK;
> +	ring_context->resources[0] = up_bo;
> +	ring_context->resources[1] = dst_bo;
> +	ring_context->res_cnt = 2;
> +
> +	ip_block->funcs->copy_linear(ip_block->funcs, ring_context,
> +				     &ring_context->pm4_dw);
> +	r = amdgpu_test_exec_cs_helper(dev, ip_block->type, ring_context, 0);
> +	igt_assert_eq(r, 0);
> +	igt_info("Phase 1: initial SDMA copy OK\n");
> +
> +	/* Phase 2: release USERPTR backing */
> +	igt_info("Phase 2: munmap() %lu MB USERPTR backing\n",
> +		 (unsigned long)(STRESS_TARGET_SZ / (1024 * 1024)));
> +	my_pid = getpid();
> +	ts_before = get_uptime_ms();
> +	munmap(up_cpu, STRESS_TARGET_SZ);
> +	up_cpu = MAP_FAILED;
> +
> +	/* Phase 3: create memory pressure */
> +	igt_info("Phase 3: creating memory pressure\n");
> +
> +	pipes = calloc(STRESS_PIPES, sizeof(*pipes));
> +	igt_assert(pipes);
> +
> +	for (i = 0; i < STRESS_PIPES; i++) {
> +		if (pipe(pipes[i]) < 0) {
> +			igt_info("  pipe allocation stopped at %u/%d (errno=%d)\n",
> +				 i, STRESS_PIPES, errno);
> +			break;
> +		}
> +		(void)write(pipes[i][1], "X", 1);
> +		pipes_opened = i + 1;
> +	}
> +	igt_info("  opened %u pipes\n", pipes_opened);
> +
> +	children = calloc(STRESS_CHILDREN, sizeof(*children));
> +	igt_assert(children);
> +
> +	for (i = 0; i < STRESS_CHILDREN; i++) {
> +		pid = fork();
> +		if (pid == 0) {
> +			pause();
> +			_exit(0);
> +		}
> +		igt_assert(pid > 0);
> +		children[i] = pid;
> +		children_spawned = i + 1;
> +	}
> +	igt_info("  spawned %u children\n", children_spawned);
> +
> +	/*
> +	 * Phase 4: submit SDMA copy through the old VA range.
> +	 *
> +	 * The USERPTR BO is intentionally omitted from the bo_list so
> +	 * the kernel does not attempt to revalidate it.  If the PTEs
> +	 * were invalidated, the SDMA engine will fault and the kernel
> +	 * fault handler will map a zeroed dummy page, so the
> +	 * destination will contain zeros instead of the original 0xAA.
> +	 */
> +	igt_info("Phase 4: submitting CS through unmapped VA range\n");
> +
> +	memset(dst_cpu_ptr, 0xCC, STRESS_SCAN_CHUNK);
> +
> +	memset(ring_context->pm4, 0, PM4_DW * sizeof(uint32_t));
> +	ring_context->pm4_dw = 0;
> +	ring_context->bo_mc = up_va;
> +	ring_context->bo_mc2 = dst_mc;
> +	ring_context->write_length = STRESS_SCAN_CHUNK;
> +	ring_context->res_cnt = 1;
> +	ring_context->resources[0] = dst_bo;
> +
> +	ip_block->funcs->copy_linear(ip_block->funcs, ring_context,
> +				     &ring_context->pm4_dw);
> +
> +	r = amdgpu_test_exec_cs_helper(dev, ip_block->type, ring_context, 1);
> +	if (ring_context->err_codes.err_code_cs_submit != 0) {
> +		igt_info("  CS rejected (r=%d)\n",
> +			 ring_context->err_codes.err_code_cs_submit);
> +		goto cleanup;
> +	}
> +
> +	/* Scan destination for original data pattern */
> +	scan = (uint8_t *)dst_cpu_ptr;
> +	non_poison = 0;
> +	original_count = 0;
> +	for (p = 0; p < STRESS_SCAN_CHUNK; p++) {
> +		if (scan[p] != 0xCC)
> +			non_poison++;
> +		if (scan[p] == 0xAA)
> +			original_count++;
> +	}
> +
> +	igt_info("  %u/%lu non-poison bytes (%u original 0xAA)\n",
> +		 non_poison, (unsigned long)STRESS_SCAN_CHUNK,
> +		 original_count);
> +
> +cleanup:
> +	for (i = 0; i < children_spawned; i++) {
> +		kill(children[i], SIGKILL);
> +		waitpid(children[i], NULL, 0);
> +	}
> +	free(children);
> +
> +	for (i = 0; i < pipes_opened; i++) {
> +		close(pipes[i][0]);
> +		close(pipes[i][1]);
> +	}
> +	free(pipes);
> +
> +	/*
> +	 * Read GPU page faults after releasing file descriptors so
> +	 * klogctl has room to work.  Brief sleep to let deferred
> +	 * printk flush any remaining fault messages.
> +	 */
> +	usleep(500000);
> +	page_faults = count_gpu_page_faults(my_pid, ts_before);
> +	igt_info("  %u GPU page faults for PID %d\n",
> +		 page_faults, (int)my_pid);
> +
> +	if (up_bo) {
> +		amdgpu_bo_va_op(up_bo, 0, STRESS_TARGET_SZ, up_va, 0,
> +				AMDGPU_VA_OP_UNMAP);
> +		amdgpu_va_range_free(up_va_h);
> +		amdgpu_bo_free(up_bo);
> +	}
> +
> +	amdgpu_bo_unmap_and_free(dst_bo, dst_va_h, dst_mc, STRESS_SCAN_CHUNK);
> +	amdgpu_cs_ctx_free(ring_context->context_handle);
> +	free(ring_context->pm4);
> +	free(ring_context);
> +
> +	/*
> +	 * Invalidation is confirmed when any of the following holds:
> +	 *
> +	 *   (a) CS was rejected outright (already handled above).
> +	 *   (b) GPU page faults were logged for this PID.
> +	 *   (c) Destination contains non-original data (zeros from the
> +	 *       dummy page), proving PTEs no longer point at the old
> +	 *       physical pages.
> +	 *
> +	 * Page fault messages may be suppressed by printk ratelimiting,
> +	 * so the data pattern check (c) is the primary detection method.
> +	 */
> +	if (page_faults > 0) {
> +		igt_info("PTE invalidation confirmed: %u page faults\n",
> +			 page_faults);
> +	} else if (non_poison > 0 && original_count == 0) {
> +		igt_info("PTE invalidation confirmed: dummy page data\n");
> +	} else {
> +		igt_assert_f(non_poison == 0 || original_count == 0,
> +			     "destination contains %u bytes of original data "
> +			     "(0xAA) after munmap with no GPU page faults\n",
> +			     original_count);
> +	}
> +}
> +
> +igt_main
> +{
> +	amdgpu_device_handle device;
> +	struct amdgpu_gpu_info gpu_info = {0};
> +	uint32_t major, minor;
> +	int fd = -1;
> +	int r;
> +	bool arr_cap[AMD_IP_MAX] = {0};
> +
> +	igt_fixture {
> +		log_total_time(true, igt_test_name());
> +		fd = drm_open_driver(DRIVER_AMDGPU);
> +
> +		r = amdgpu_device_initialize(fd, &major, &minor, &device);
> +		igt_require(r == 0);
> +
> +		igt_info("Initialized amdgpu, driver version %d.%d\n",
> +			 major, minor);
> +
> +		r = amdgpu_query_gpu_info(device, &gpu_info);
> +		igt_assert_eq(r, 0);
> +
> +		r = setup_amdgpu_ip_blocks(major, minor, &gpu_info, device);
> +		igt_assert_eq(r, 0);
> +
> +		asic_rings_readness(device, 1, arr_cap);
> +	}
> +
> +	igt_describe("Submit CS with USERPTR BO in bo_list after munmap "
> +		     "and verify the kernel rejects it");
> +	igt_subtest_with_dynamic("userptr-unmap-revalidate") {
> +		igt_require(arr_cap[AMD_IP_DMA]);
> +		igt_dynamic_f("userptr-unmap-revalidate")
> +			amdgpu_userptr_unmap_revalidate(device);
> +	}
> +
> +	igt_describe("Stress test: release large USERPTR backing under "
> +		     "memory pressure and verify GPU PTEs are invalidated");
> +	igt_subtest_with_dynamic("userptr-unmap-stress") {
> +		igt_require(arr_cap[AMD_IP_DMA]);
> +		igt_dynamic_f("userptr-unmap-stress")
> +			amdgpu_userptr_unmap_stress(device);
> +	}
> +
> +	igt_fixture {
> +		amdgpu_device_deinitialize(device);
> +		drm_close_driver(fd);
> +		log_total_time(false, igt_test_name());
> +	}
> +}
> diff --git a/tests/amdgpu/meson.build b/tests/amdgpu/meson.build
> index 0dc689e40..801239547 100644
> --- a/tests/amdgpu/meson.build
> +++ b/tests/amdgpu/meson.build
> @@ -53,6 +53,7 @@ if libdrm_amdgpu.found()
>  			  'amd_vpe',
>  			  'amd_mem',
>  			  'amd_remote_mem',
> +			  'amd_userptr_invalidation',
>  			]
>  	if libdrm_amdgpu.version().version_compare('> 2.4.97')
>  		amdgpu_progs +=[ 'amd_syncobj', ]



Thread overview: 7+ messages
2026-05-12 18:21 [PATCH] tests/amdgpu: add USERPTR PTE invalidation regression test vitaly.prosyak
2026-05-13  0:20 ` ✓ i915.CI.BAT: success for " Patchwork
2026-05-13  1:13 ` [PATCH] " Zhang, Jesse(Jie)
2026-05-13  1:33 ` ✓ Xe.CI.BAT: success for " Patchwork
2026-05-13  7:02 ` Christian König [this message]
2026-05-13 19:15 ` ✗ Xe.CI.FULL: failure " Patchwork
2026-05-14  0:41 ` ✗ i915.CI.Full: " Patchwork
