* [PATCH v2] tests/xe_exec_reset: Replace fixed usleep with dynamic GT idle wait
@ 2026-06-09 16:25 Xin Wang
From: Xin Wang @ 2026-06-09 16:25 UTC
  To: igt-dev; +Cc: Xin Wang, Matthew Brost

The CLOSE_FD subtests simulate a user abruptly closing the DRM file
descriptor (e.g. via Ctrl+C) while GPU work is still in flight. After
closing the fd, the kernel begins tearing down the VM and releasing
resources including the ASID. If the next test starts before this
cleanup completes, ASID reuse can cause hardware-level collisions and
trigger a cascade of CAT errors, corrupting the subsequent test.

The previous workaround of a fixed 150ms sleep was fragile: on some
platforms or under varying system load, cleanup takes longer and the
sleep is insufficient. Replace it with a dynamic poll that waits until
the GT enters the C6 idle state, which is a reliable hardware signal
that all GPU activity has ceased and resources have been reclaimed.
The poll checks every 10ms with a 1 second timeout, adapting to actual
hardware completion time rather than guessing.
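
For illustration only, the poll-with-timeout pattern described above can be
sketched in plain C without any IGT dependency. This is not how igt_wait()
is implemented; poll_until(), poll_cond_fn and fake_idle() are hypothetical
names introduced for this sketch, and fake_idle() merely stands in for a
real "is the GT in C6?" query.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical predicate type: returns true once the awaited state is reached. */
typedef bool (*poll_cond_fn)(void *data);

/* Sleep for the given number of milliseconds. */
static void sleep_ms(long ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (ms % 1000) * 1000000L,
	};

	nanosleep(&ts, NULL);
}

/* Milliseconds elapsed on the monotonic clock since *start. */
static long elapsed_ms(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (now.tv_sec - start->tv_sec) * 1000L +
	       (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/*
 * Call cond() every interval_ms until it returns true or timeout_ms
 * has elapsed. Returns true if the condition was met, false on timeout.
 */
static bool poll_until(poll_cond_fn cond, void *data,
		       long timeout_ms, long interval_ms)
{
	struct timespec start;

	assert(timeout_ms >= 0 && interval_ms > 0);
	clock_gettime(CLOCK_MONOTONIC, &start);

	for (;;) {
		if (cond(data))
			return true;
		if (elapsed_ms(&start) >= timeout_ms)
			return cond(data); /* one last check at the deadline */
		sleep_ms(interval_ms);
	}
}

/* Stand-in predicate: reports "idle" once the counter in *data reaches zero. */
static bool fake_idle(void *data)
{
	int *remaining = data;

	if (*remaining > 0) {
		(*remaining)--;
		return false;
	}
	return true;
}
```

With the values used in this patch, the call would look like
poll_until(cond, data, 1000, 10). Unlike a fixed sleep, the loop returns as
soon as the condition holds, and the return value lets the caller tell a
timeout apart from success.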

v2: Fix regression on VFs, where the GT C6 idle status is not available;
    fall back to the fixed 150ms delay there.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Xin Wang <x.wang@intel.com>
---
 tests/intel/xe_exec_reset.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/tests/intel/xe_exec_reset.c b/tests/intel/xe_exec_reset.c
index dc1ae7e99c..19643b8874 100644
--- a/tests/intel/xe_exec_reset.c
+++ b/tests/intel/xe_exec_reset.c
@@ -15,6 +15,7 @@
 #include <fcntl.h>
 
 #include "igt.h"
+#include "igt_sysfs.h"
 #include "lib/igt_syncobj.h"
 #include "lib/intel_reg.h"
 #include "xe_drm.h"
@@ -157,6 +158,20 @@ static void test_spin(int fd, struct drm_xe_engine_class_instance *eci,
  * @parallel:	parallel
  */
 
+static void
+wait_gt_idle(int gt_id)
+{
+	int fd = drm_open_driver(DRIVER_XE);
+
+	if (xe_sysfs_gt_has_node(fd, gt_id, "gtidle/idle_status"))
+		igt_wait(xe_gt_is_in_c6(fd, gt_id), 1000, 10);
+	else
+		/* GT C6 idle status not available (e.g. VF), fall back to fixed delay */
+		usleep(150000);
+
+	drm_close_driver(fd);
+}
+
 static void
 test_balancer(int fd, int gt, int class, int n_exec_queues, int n_execs,
 	      unsigned int flags)
@@ -277,8 +292,7 @@ test_balancer(int fd, int gt, int class, int n_exec_queues, int n_execs,
 				xe_exec_queue_destroy(fd, exec_queues[i]);
 		}
 		drm_close_driver(fd);
-		/* FIXME: wait for idle */
-		usleep(150000);
+		wait_gt_idle(gt);
 		return;
 	}
 
@@ -542,8 +556,7 @@ test_compute_mode(int fd, struct drm_xe_engine_class_instance *eci,
 				xe_exec_queue_destroy(fd, exec_queues[i]);
 		}
 		drm_close_driver(fd);
-		/* FIXME: wait for idle */
-		usleep(150000);
+		wait_gt_idle(eci->gt_id);
 		return;
 	}
 
-- 
2.43.0


Thread overview: 5+ messages
2026-06-09 16:25 [PATCH v2] tests/xe_exec_reset: Replace fixed usleep with dynamic GT idle wait Xin Wang
2026-06-09 18:36 ` ✓ i915.CI.BAT: success for tests/xe_exec_reset: Replace fixed usleep with dynamic GT idle wait (rev2) Patchwork
2026-06-09 18:45 ` ✓ Xe.CI.BAT: " Patchwork
2026-06-10  7:23 ` ✓ Xe.CI.FULL: " Patchwork
2026-06-10 21:23 ` ✗ i915.CI.Full: failure " Patchwork
