From: S Sebinraj <s.sebinraj@intel.com>
To: igt-dev@lists.freedesktop.org
Cc: carlos.santa@intel.com, matthew.brost@intel.com,
jeevaka.badrappan@intel.com, karthik.b.s@intel.com,
krzysztof.karas@intel.com, kamil.konieczny@intel.com,
zbigniew.kempczynski@intel.com, S Sebinraj <s.sebinraj@intel.com>
Subject: [PATCH i-g-t v3] tests/intel/xe_exec: Add VM rebind stress test with cpumask cycling
Date: Fri, 24 Apr 2026 09:54:57 +0530
Message-ID: <20260424042457.2178562-1-s.sebinraj@intel.com>
Add a new subtest threads-wq-stress-rebind-bindexecqueue that stresses
the VM rebind path under workqueue CPU pool migration pressure.
The test spawns per-engine threads that continuously perform VM unbind/
rebind cycles using per-slot bind exec queues, while a helper child
process rapidly cycles the global unbound workqueue cpumask through
progressively wider CPU sets (f -> ff -> fff -> ffff) at 100ms intervals.
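The cycling is equivalent to this sketch (error handling omitted; the
actual cpumask_stressor_loop() below also fails the test on an
unexpected write error):

    static const char * const masks[] = { "f", "ff", "fff", "ffff" };
    int fd = open("/sys/devices/virtual/workqueue/cpumask", O_WRONLY);

    for (;;)
        for (int i = 0; i < 4; i++) {
            write(fd, masks[i], strlen(masks[i]));
            usleep(100000); /* 100ms */
        }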
A new WQ_STRESS flag enables timed fence waits in test_legacy_mode at
three syncobj_wait() checkpoints (per-exec-queue, bind-chain, and unbind/
TLB-invalidation fences) using a 5-second deadline. If any fence misses
the deadline, a shared atomic is set and all threads bail out immediately
rather than running for the full 30-second window.
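Each checkpoint reduces to the same pattern (abridged from
test_legacy_mode() in the diff below):

    if (atomic_load(&wq_stress_hang_detected) ||
        !syncobj_wait(fd, &syncobjs[i], 1, stress_fence_deadline(),
                      0, NULL)) {
        atomic_store(&wq_stress_hang_detected, true);
        igt_thread_fail();
        goto wq_stress_cleanup;
    }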
All GPU work runs in a forked child that writes its result via a pipe;
the parent polls with a 60-second timeout and restores the original
cpumask regardless of the outcome.
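On the parent side this is a plain poll() on the read end of the pipe
(abridged from the subtest below); a poll timeout or EOF on the pipe
is treated as a hang:

    struct pollfd pfd = { .fd = result_pipe[0], .events = POLLIN };

    if (poll(&pfd, 1, WQ_CHILD_TIMEOUT_MS) <= 0 ||
        read(result_pipe[0], &result_byte, 1) != 1)
        result_byte = 0xFF; /* no result from the child: assume hang */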
When a hang is detected, the child drops the kernel page cache
and issues xe_force_gt_reset_all() to unblock any DMA fences still
pending from the stress run, then writes the hang result and calls
_exit() immediately without closing the DRM fd. Closing the fd while GPU
work is stuck would block indefinitely in dma_resv_wait_timeout(intr=false,
TASK_UNINTERRUPTIBLE). For the same reason, GPU resource teardown
(exec queues, BOs, VMs) is skipped in test_legacy_mode when a hang
is detected.
On the failure path the test is aborted rather than only marked as
failed: the child process is left hung in the D (uninterruptible
sleep) state and could affect other tests. Aborting forces a reboot of
the testing environment instead.
A bug with this same signature previously regressed the Xe driver: the
whole system hung because a fence signal never completed, which was
finally traced to an issue in kernel workqueue scheduling. A hard
reboot was the only way to bring the system back:
https://patchwork.freedesktop.org/patch/715805/
v2:
- Abort the hung test instead of marking as fail alone
- Code comment corrections
- Fix type casting
- Write check for cpumask
v3:
- Rename title of commit
- Correct commenting style
Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com>
Cc: Carlos Santa <carlos.santa@intel.com>
Signed-off-by: S Sebinraj <s.sebinraj@intel.com>
---
tests/intel/xe_exec_threads.c | 334 +++++++++++++++++++++++++++++++++-
1 file changed, 327 insertions(+), 7 deletions(-)
diff --git a/tests/intel/xe_exec_threads.c b/tests/intel/xe_exec_threads.c
index f082a0eda..7197f4345 100644
--- a/tests/intel/xe_exec_threads.c
+++ b/tests/intel/xe_exec_threads.c
@@ -13,9 +13,15 @@
*/
#include <fcntl.h>
+#include <inttypes.h>
+#include <poll.h>
+#include <signal.h>
+#include <stdatomic.h>
+#include <sys/wait.h>
#include "igt.h"
#include "lib/igt_syncobj.h"
+#include "lib/igt_thread.h"
#include "lib/intel_reg.h"
#include "xe_drm.h"
@@ -42,9 +48,89 @@
#define BIND_EXEC_QUEUE (0x1 << 13)
#define MANY_QUEUES (0x1 << 14)
#define MULTI_QUEUE (0x1 << 15)
+#define WQ_STRESS (0x1 << 16)
+
+/*
+ * Maximum fence wait time when WQ_STRESS is active. If any bind/unbind
+ * fence takes longer than this to signal, the workqueue is considered stuck
+ * and the test fails — this is the direct symptom of the
+ * Xe hang caused by the kernel workqueue pool_workqueue pending_pwqs bug.
+ */
+#define WQ_FENCE_TIMEOUT_NS (5LL * NSEC_PER_SEC)
+
+/*
+ * Maximum time the parent waits for the child to write its result byte.
+ * The child's stress loop runs for up to 30s (igt_until_timeout(30)), plus
+ * cleanup overhead (GT reset, drop_caches, cpumask restore, sleep(1)).
+ * 60s gives ample headroom on a healthy kernel.
+ */
+#define WQ_CHILD_TIMEOUT_MS (60 * 1000)
+
+/* Sysfs node that controls the unbound workqueue CPU affinity mask. */
+#define WQ_CPUMASK_PATH "/sys/devices/virtual/workqueue/cpumask"
+
+/* Procfs node for dropping the kernel's page, dentry, and inode caches. */
+#define DROP_CACHES_PATH "/proc/sys/vm/drop_caches"
pthread_barrier_t barrier;
+/*
+ * Set to true by the first thread that detects a fence stall under WQ_STRESS.
+ * All other threads and the igt_until_timeout loop check this to bail out
+ * immediately rather than hammering a hung kernel for the full timeout.
+ */
+static _Atomic bool wq_stress_hang_detected;
+
+/*
+ * stress_fence_deadline - compute an absolute CLOCK_MONOTONIC deadline
+ * 5 seconds from now, used as the syncobj_wait timeout under WQ_STRESS.
+ */
+static int64_t stress_fence_deadline(void)
+{
+ struct timespec ts;
+
+ clock_gettime(CLOCK_MONOTONIC, &ts);
+ return (int64_t)ts.tv_sec * NSEC_PER_SEC + (int64_t)ts.tv_nsec +
+ WQ_FENCE_TIMEOUT_NS;
+}
+
+/*
+ * cpumask_stressor_loop
+ *
+ * Rapidly cycles the kernel unbound workqueue cpumask through progressively
+ * wider CPU sets (mirroring the original shell reproduction script). This
+ * forces workqueue work items to be migrated between CPU pools, exercising
+ * the wq_node_nr_active / pool_workqueue plug-unplug path in which the
+ * pending_pwqs scheduling bug hides.
+ *
+ * The original reproduction commands were:
+ * for i in {1..1000}; do
+ * echo f > /sys/devices/virtual/workqueue/cpumask
+ * echo ff > /sys/devices/virtual/workqueue/cpumask
+ * ...
+ * sleep .1
+ * done
+ */
+static void cpumask_stressor_loop(void)
+{
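+ /* Hex CPU masks of increasing width: "f" covers CPUs 0-3, "ffff" CPUs 0-15 */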
+ static const char * const masks[] = { "f", "ff", "fff", "ffff" };
+ int wq_fd;
+
+ wq_fd = open(WQ_CPUMASK_PATH, O_WRONLY);
+ if (wq_fd < 0)
+ exit(IGT_EXIT_FAILURE);
+
+ for (;;) {
+ for (int i = 0; i < ARRAY_SIZE(masks); i++) {
+ if (write(wq_fd, masks[i], strlen(masks[i])) < 0 &&
+ errno != EINVAL)
+ exit(IGT_EXIT_FAILURE); /* unexpected error - fail the test */
+ usleep(100000); /* 100ms */
+ }
+ }
+ /* Not reached: the helper runs until igt_stop_helper() kills it */
+}
+
static void
test_balancer(int fd, int gt, uint32_t vm, uint64_t addr, uint64_t userptr,
int class, int n_exec_queues, int n_execs, unsigned int flags)
@@ -600,6 +686,10 @@ test_legacy_mode(int fd, uint32_t vm, uint64_t addr, uint64_t userptr,
uint64_t exec_addr;
int e = i % n_exec_queues;
+ /* Bail early if another thread already detected a hang */
+ if ((flags & WQ_STRESS) && atomic_load(&wq_stress_hang_detected))
+ goto wq_stress_cleanup;
+
if (flags & MANY_QUEUES) {
if (exec_queues[e]) {
igt_assert(syncobj_wait(fd, &syncobjs[e], 1,
@@ -693,15 +783,65 @@ test_legacy_mode(int fd, uint32_t vm, uint64_t addr, uint64_t userptr,
}
}
- for (i = 0; i < n_exec_queues; i++)
- igt_assert(syncobj_wait(fd, &syncobjs[i], 1, INT64_MAX, 0,
- NULL));
- igt_assert(syncobj_wait(fd, &sync[0].handle, 1, INT64_MAX, 0, NULL));
+
+ for (i = 0; i < n_exec_queues; i++) {
+ if (flags & WQ_STRESS) {
+ /*
+ * Drain all exec-queue fences under a 5 sec deadline.
+ * A timeout means the workqueue is hung, so bail immediately.
+ */
+ if (atomic_load(&wq_stress_hang_detected) ||
+ !syncobj_wait(fd, &syncobjs[i], 1,
+ stress_fence_deadline(), 0, NULL)) {
+ igt_critical("exec-queue[%d] fence stalled "
+ "under WQ_STRESS, workqueue "
+ "scheduling hang suspected\n", i);
+ atomic_store(&wq_stress_hang_detected, true);
+ igt_thread_fail();
+ goto wq_stress_cleanup;
+ }
+ } else {
+ igt_assert(syncobj_wait(fd, &syncobjs[i], 1,
+ INT64_MAX, 0, NULL));
+ }
+ }
+
+ if (flags & WQ_STRESS) {
+ if (atomic_load(&wq_stress_hang_detected) ||
+ !syncobj_wait(fd, &sync[0].handle, 1,
+ stress_fence_deadline(), 0, NULL)) {
+ igt_critical("bind-chain fence stalled under WQ_STRESS\n");
+ atomic_store(&wq_stress_hang_detected, true);
+ igt_thread_fail();
+ goto wq_stress_cleanup;
+ }
+ } else {
+ igt_assert(syncobj_wait(fd, &sync[0].handle, 1,
+ INT64_MAX, 0, NULL));
+ }
sync[0].flags |= DRM_XE_SYNC_FLAG_SIGNAL;
xe_vm_unbind_async(fd, vm, bind_exec_queues[0], 0, addr,
bo_size, sync, 1);
- igt_assert(syncobj_wait(fd, &sync[0].handle, 1, INT64_MAX, 0, NULL));
+ if (flags & WQ_STRESS) {
+ /*
+ * This is the most critical fence under WQ_STRESS: it covers the
+ * TLB-invalidation completion triggered by xe_vm_unbind_async().
+ * If ttm_bo_delayed_delete() workers are stuck in the workqueue
+ * the TLB flush fence will never signal and we will timeout here.
+ */
+ if (atomic_load(&wq_stress_hang_detected) ||
+ !syncobj_wait(fd, &sync[0].handle, 1,
+ stress_fence_deadline(), 0, NULL)) {
+ igt_critical("unbind/TLB-invalidation fence stalled "
+ "under WQ_STRESS, "
+ "ttm_bo_delayed_delete work item likely stuck\n");
+ atomic_store(&wq_stress_hang_detected, true);
+ igt_thread_fail();
+ goto wq_stress_cleanup;
+ }
+ } else {
+ igt_assert(syncobj_wait(fd, &sync[0].handle, 1,
+ INT64_MAX, 0, NULL));
+ }
for (i = flags & INVALIDATE ? n_execs - 1 : 0;
i < n_execs; i++) {
@@ -713,9 +853,21 @@ test_legacy_mode(int fd, uint32_t vm, uint64_t addr, uint64_t userptr,
igt_assert_eq(data[i].data, 0xc0ffee);
}
+wq_stress_cleanup:
syncobj_destroy(fd, sync[0].handle);
for (i = 0; i < n_exec_queues; i++) {
syncobj_destroy(fd, syncobjs[i]);
+ if ((flags & WQ_STRESS) && atomic_load(&wq_stress_hang_detected)) {
+ /*
+ * Under WQ_STRESS, if a hang was detected skip all GPU resource
+ * teardown calls (xe_exec_queue_destroy, xe_vm_destroy, gem_close).
+ * Those ioctls wait for pending GPU work to drain and will hang
+ * indefinitely if the workqueue is stuck. The kernel reclaims all
+ * GPU resources automatically on process exit.
+ */
+ continue;
+ }
+
xe_exec_queue_destroy(fd, exec_queues[i]);
if (bind_exec_queues[i])
xe_exec_queue_destroy(fd, bind_exec_queues[i]);
@@ -723,11 +875,14 @@ test_legacy_mode(int fd, uint32_t vm, uint64_t addr, uint64_t userptr,
if (bo) {
munmap(data, bo_size);
- gem_close(fd, bo);
+ if (!(flags & WQ_STRESS) || !atomic_load(&wq_stress_hang_detected))
+ gem_close(fd, bo);
} else if (!(flags & INVALIDATE)) {
free(data);
}
- if (owns_vm)
+
+ if (owns_vm &&
+ (!(flags & WQ_STRESS) || !atomic_load(&wq_stress_hang_detected)))
xe_vm_destroy(fd, vm);
if (owns_fd)
drm_close_driver(fd);
@@ -1529,6 +1684,171 @@ int igt_main()
}
}
+ /**
+ * SUBTEST: threads-wq-stress-rebind-bindexecqueue
+ * Description: Concurrently hammers VM bind/unbind cycles using per-slot
+ * bind exec queues across all engines while a background
+ * process rapidly cycles the unbound workqueue cpumask,
+ * forcing work items to migrate between CPU pools.
+ *
+ * Each bind and unbind fence is waited on with a 5-second
+ * deadline. The test passes if all fences signal within that
+ * window across repeated iterations for up to 30 seconds.
+ * It fails if any fence stalls beyond the deadline, indicating
+ * that GPU work items are no longer being scheduled.
+ */
+ igt_subtest("threads-wq-stress-rebind-bindexecqueue") {
+ char orig_cpumask[64] = {};
+ int cfd, result_pipe[2];
+ pid_t child;
+ uint8_t result_byte;
+
+ struct igt_helper_process cpumask_proc = {};
+ int child_fd;
+ bool hang;
+ uint8_t r;
+
+ /* Needs write access to workqueue cpumask sysfs node (root) */
+ igt_require(access(WQ_CPUMASK_PATH, W_OK) == 0);
+
+ /* Save current cpumask so we can restore it after the test */
+ cfd = open(WQ_CPUMASK_PATH, O_RDONLY);
+ igt_assert_neq(cfd, -1);
+ igt_assert(read(cfd, orig_cpumask, sizeof(orig_cpumask) - 1) > 0);
+ close(cfd);
+ orig_cpumask[strcspn(orig_cpumask, "\n")] = '\0';
+
+ /*
+ * Start the cpumask stressor in the parent so that any unexpected
+ * write failure propagates through igt_stop_helper() and fails
+ * the test - running GPU stress without active cpumask cycling
+ * would make this test meaningless.
+ */
+ cpumask_proc.use_SIGKILL = true;
+ igt_fork_helper(&cpumask_proc)
+ cpumask_stressor_loop();
+
+ /* IPC channel: child writes a 1-byte result; parent reads via poll() */
+ igt_assert_eq(pipe(result_pipe), 0);
+
+ /*
+ * All GPU work is submitted through child_fd (opened inside the
+ * child). The fixture fd held by the parent has no pending GPU
+ * work, so drm_close_driver(fd) in the end fixture will never
+ * block — even if the child's _exit() gets stuck in
+ * dma_resv_wait_timeout() (intr=false, TASK_UNINTERRUPTIBLE).
+ */
+ child = fork();
+ igt_assert_neq(child, -1);
+
+ if (child == 0) {
+ /* ---- child: owns all GPU resources ---- */
+ close(result_pipe[0]);
+
+ child_fd = drm_open_driver(DRIVER_XE);
+
+ atomic_store(&wq_stress_hang_detected, false);
+ igt_until_timeout(30) {
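+ /*
+ * REBIND drives continuous unbind/rebind cycles, BIND_EXEC_QUEUE
+ * routes binds through per-slot bind exec queues, and WQ_STRESS
+ * enables the timed fence waits with early bail-out on a stall.
+ */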
+ threads(child_fd,
+ REBIND | BIND_EXEC_QUEUE | WQ_STRESS);
+ if (atomic_load(&wq_stress_hang_detected))
+ break;
+ }
+
+ /* Restore cpumask from child */
+ cfd = open(WQ_CPUMASK_PATH, O_WRONLY);
+ if (cfd >= 0) {
+ write(cfd, orig_cpumask, strlen(orig_cpumask));
+ close(cfd);
+ }
+
+ hang = atomic_load(&wq_stress_hang_detected);
+ if (hang) {
+ int dc_fd;
+
+ igt_critical("WorkQueue hang detected; dropping "
+ "VM page cache and forcing GT "
+ "reset\n");
+
+ dc_fd = open(DROP_CACHES_PATH,
+ O_WRONLY);
+ if (dc_fd >= 0) {
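+ /* "3" frees the page cache plus dentries and inodes */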
+ write(dc_fd, "3", 1);
+ close(dc_fd);
+ }
+
+ xe_force_gt_reset_all(child_fd);
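+ /* Give the forced reset a moment to settle before reporting */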
+ sleep(1);
+ }
+
+ /*
+ * Write result BEFORE _exit(). When hang == true the
+ * subsequent _exit() triggers do_exit() -> exit_files()
+ * -> fput(child_fd) -> dma_resv_wait_timeout(intr=false)
+ * and blocks in TASK_UNINTERRUPTIBLE. The parent already
+ * has the answer at this point and does not need to wait
+ * for the child to actually exit.
+ */
+ r = hang ? 1 : 0;
+ write(result_pipe[1], &r, 1);
+ close(result_pipe[1]);
+
+ if (!hang)
+ drm_close_driver(child_fd);
+ _exit(hang ? IGT_EXIT_FAILURE : IGT_EXIT_SUCCESS);
+ }
+
+ /* ---- parent ---- */
+ close(result_pipe[1]);
+
+ /*
+ * Wait up to 60s for the child to write its result byte.
+ * The child's stress loop runs for up to 30s, plus cleanup
+ * (GT reset, drop_caches, cpumask restore) — 60s total gives
+ * enough headroom on a healthy kernel while still catching a
+ * child that has silently deadlocked before ever writing.
+ *
+ * poll() timeout -> treat as hang (result_byte = 0xFF).
+ * read() returning 0 (EOF, no byte written) -> same.
+ */
+ {
+ struct pollfd pfd = {
+ .fd = result_pipe[0],
+ .events = POLLIN,
+ };
+ int poll_ret = poll(&pfd, 1, WQ_CHILD_TIMEOUT_MS);
+
+ if (poll_ret <= 0) {
+ /* timeout (0) or poll error (-1) */
+ igt_warn("Timed out waiting for child result "
+ "after 60s - treating as hang\n");
+ result_byte = 0xFF;
+ } else if (read(result_pipe[0], &result_byte, 1) != 1) {
+ result_byte = 0xFF; /* EOF: child crashed before writing */
+ }
+ }
+ close(result_pipe[0]);
+
+ /* Restore cpumask from parent too. */
+ igt_stop_helper(&cpumask_proc);
+ cfd = open(WQ_CPUMASK_PATH, O_WRONLY);
+ if (cfd >= 0) {
+ write(cfd, orig_cpumask, strlen(orig_cpumask));
+ close(cfd);
+ }
+
+ /*
+ * Abort if a hang was detected, as moving forward without doing
+ * so could lead to undefined behavior and further issues in
+ * other tests.
+ */
+ igt_abort_on_f(result_byte != 0,
+ "WQ stress worker detected fence stall "
+ "- workqueue scheduling hang confirmed\n");
+
+ /* Clean path: no hang, reap child normally and continue */
+ waitpid(child, NULL, 0);
+ }
+
igt_fixture()
drm_close_driver(fd);
}
--
2.43.0