* [PATCH i-g-t 0/4] Add test coverage for multi queue reset
@ 2026-04-27 21:20 Stuart Summers
From: Stuart Summers @ 2026-04-27 21:20 UTC (permalink / raw)
Cc: igt-dev, niranjana.vishwanathapura, Stuart Summers
Combine series [1] and [2] for final CI testing before
merge.
For [2], also change the author to me: the original author is no
longer available here, and the patch is now quite different in this
latest iteration anyway.
[1] already has a review. It is included here since it also relates
to multi queue reset: it removes the unused CANCEL logic from the CM
cases, which makes it clearer why those flags aren't implemented for
multi-queue either, given they aren't used in the non-multi-queue
path as well.
[1]: https://patchwork.freedesktop.org/patch/716611/
[2]: https://patchwork.freedesktop.org/series/164653/
Stuart Summers (4):
tests/intel/xe_exec_reset: Remove CANCEL logic for CM variants
tests/intel/xe_exec_reset: Add a comment about return for syncobj wait
tests/intel/xe_exec_reset: Add checks for hanging queue wait_ufence
return
tests/intel/xe_exec_reset: Add multi queue subtests
lib/xe/xe_legacy.c | 86 +++++++++++---
tests/intel/xe_exec_reset.c | 223 ++++++++++++++++++++++++++++++++++--
2 files changed, 284 insertions(+), 25 deletions(-)
--
2.43.0
* [PATCH i-g-t 1/4] tests/intel/xe_exec_reset: Remove CANCEL logic for CM variants
From: Stuart Summers @ 2026-04-27 21:20 UTC (permalink / raw)
Cc: igt-dev, niranjana.vishwanathapura, Stuart Summers,
Priyanka Dandamudi
The compute mode tests are expected to be long running and will
not trigger the TDR logic in the KMD. The test reflects this: the
legacy cases implement the CANCEL case, which is meant to trigger
the TDR-based resets, while the CM cases do not support it.
Remove the CANCEL logic from the CM tests since it isn't actually
used in any of the test cases here.
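The removal can be sanity-checked in isolation: since the CM callers never
pass CANCEL, the old and new spinner conditions are equivalent. A standalone
sketch (the CANCEL bit value here is illustrative only; the real define lives
in tests/intel/xe_exec_reset.c):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bit only, not the test's actual value. */
#define CANCEL (0x1 << 6)

/* Old condition guarding the spinner setup in test_compute_mode(). */
static bool old_cond(int i, unsigned int flags)
{
	return !i || (flags & CANCEL);
}

/* New condition after the patch; with CANCEL never set by the CM
 * callers (flags & CANCEL == 0), the two agree for every i. */
static bool new_cond(int i)
{
	return !i;
}
```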
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Acked-by: Priyanka Dandamudi <priyanka.dandamudi@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
tests/intel/xe_exec_reset.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tests/intel/xe_exec_reset.c b/tests/intel/xe_exec_reset.c
index 66580ea44..bc1a74fa3 100644
--- a/tests/intel/xe_exec_reset.c
+++ b/tests/intel/xe_exec_reset.c
@@ -422,7 +422,7 @@ test_compute_mode(int fd, struct drm_xe_engine_class_instance *eci,
uint64_t exec_addr;
int e = i % n_exec_queues;
- if (!i || flags & CANCEL) {
+ if (!i) {
spin_opts.addr = base_addr + spin_offset;
xe_spin_init(&data[i].spin, &spin_opts);
exec_addr = spin_opts.addr;
@@ -479,7 +479,7 @@ test_compute_mode(int fd, struct drm_xe_engine_class_instance *eci,
xe_vm_unbind_async(fd, vm, 0, 0, addr, bo_size, sync, 1);
xe_wait_ufence(fd, &data[0].vm_sync, USER_FENCE_VALUE, 0, 3 * NSEC_PER_SEC);
- if (!(flags & (GT_RESET | CANCEL))) {
+ if (!(flags & GT_RESET)) {
for (i = 1; i < n_execs; i++)
igt_assert_eq(data[i].data, 0xc0ffee);
}
--
2.43.0
* [PATCH i-g-t 2/4] tests/intel/xe_exec_reset: Add a comment about return for syncobj wait
From: Stuart Summers @ 2026-04-27 21:20 UTC (permalink / raw)
Cc: igt-dev, niranjana.vishwanathapura, Stuart Summers
Add a comment to the syncobj wait after a hanging submission to note
that, even in the hang case, we expect the syncobj wait to return
successfully. This differs from the wait-ufence case, which only
succeeds if hardware did in fact execute the batch through the
MI_USER_INTERRUPT that satisfies the user fence.
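The distinction can be modeled in a standalone sketch (assumptions, not IGT
code): on reset the KMD force-signals the dma-fence backing the syncobj, but
the user-fence value is only ever written by the batch itself, so a hung
batch leaves it unsatisfied. USER_FENCE_VALUE here is a placeholder:

```c
#include <stdbool.h>
#include <stdint.h>

#define USER_FENCE_VALUE 0xdeadbeefdeadbeefull /* placeholder value */

/* Toy model of one submission that hangs and is then reset. */
struct submission {
	bool fence_signaled;  /* dma-fence backing the syncobj */
	uint64_t ufence;      /* memory the batch would have written */
};

/* On reset the KMD still signals the fence, but the batch never ran,
 * so the user-fence memory is left untouched. */
static void reset_hung_submission(struct submission *s)
{
	s->fence_signaled = true;
}

/* syncobj_wait succeeds whenever the fence is signaled, hang or not. */
static bool syncobj_wait_ok(const struct submission *s)
{
	return s->fence_signaled;
}

/* The ufence wait succeeds only if the batch actually executed. */
static bool ufence_wait_ok(const struct submission *s)
{
	return s->ufence == USER_FENCE_VALUE;
}
```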
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
lib/xe/xe_legacy.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/lib/xe/xe_legacy.c b/lib/xe/xe_legacy.c
index 084445305..6aeddc578 100644
--- a/lib/xe/xe_legacy.c
+++ b/lib/xe/xe_legacy.c
@@ -181,8 +181,13 @@ xe_legacy_test_mode(int fd, struct drm_xe_engine_class_instance *eci,
return;
}
- for (i = 0; i < n_exec_queues && n_execs; i++)
+ for (i = 0; i < n_exec_queues && n_execs; i++) {
+ /*
+ * Expectation here is that on reset, submissions will
+ * still satisfy the syncobj_wait.
+ */
igt_assert(syncobj_wait(fd, &syncobjs[i], 1, INT64_MAX, 0, NULL));
+ }
igt_assert(syncobj_wait(fd, &sync[0].handle, 1, INT64_MAX, 0, NULL));
--
2.43.0
* [PATCH i-g-t 3/4] tests/intel/xe_exec_reset: Add checks for hanging queue wait_ufence return
From: Stuart Summers @ 2026-04-27 21:20 UTC (permalink / raw)
Cc: igt-dev, niranjana.vishwanathapura, Stuart Summers
There is a 3 second user fence wait timeout for the compute mode
variants of this test. Instead of just skipping the wait altogether,
make sure it does in fact return -ETIME as expected there.
Also add the i == 0 cases to the actual data checks in the legacy
and compute paths, to stay consistent and be a little more explicit
about what we're checking there.
This also lets us add a little more detail to these cases in some
planned changes around hanging multi queue secondary queues.
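The acceptable return values this patch asserts in test_compute_mode() can be
summarized as a small predicate. A standalone sketch (flag bits copied from
the test's defines; pure model, no DRM calls):

```c
#include <errno.h>
#include <stdbool.h>

#define GT_RESET  (0x1 << 0)
#define CAT_ERROR (0x1 << 5)

/* Model of the __xe_wait_ufence() expectations after this patch:
 * the hanging submission (i == 0) must time out or die with -EIO,
 * submissions racing a reset may die or complete, and everything
 * else must complete successfully. */
static bool wait_result_ok(int i, unsigned int flags, int err)
{
	if (i == 0)
		return err == -ETIME || err == -EIO;
	if (flags & (GT_RESET | CAT_ERROR))
		return err == -EIO || err == 0;
	return err == 0;
}
```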
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
lib/xe/xe_legacy.c | 8 ++++++--
tests/intel/xe_exec_reset.c | 17 ++++++++++++-----
2 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/lib/xe/xe_legacy.c b/lib/xe/xe_legacy.c
index 6aeddc578..3371a91ac 100644
--- a/lib/xe/xe_legacy.c
+++ b/lib/xe/xe_legacy.c
@@ -230,9 +230,13 @@ xe_legacy_test_mode(int fd, struct drm_xe_engine_class_instance *eci,
igt_assert(syncobj_wait(fd, &sync[0].handle, 1, INT64_MAX, 0, NULL));
if (!use_capture_mode && !(flags & (GT_RESET | CANCEL | COMPRESSION))) {
- for (i = flags & LONG_SPIN ? n_exec_queues : 1;
- i < n_execs + extra_execs; i++)
+ for (i = flags & LONG_SPIN ? n_exec_queues : 0;
+ i < n_execs + extra_execs; i++) {
+ if (!i)
+ continue;
+
igt_assert_eq(data[i].data, 0xc0ffee);
+ }
}
syncobj_destroy(fd, sync[0].handle);
diff --git a/tests/intel/xe_exec_reset.c b/tests/intel/xe_exec_reset.c
index bc1a74fa3..1c9275804 100644
--- a/tests/intel/xe_exec_reset.c
+++ b/tests/intel/xe_exec_reset.c
@@ -462,26 +462,33 @@ test_compute_mode(int fd, struct drm_xe_engine_class_instance *eci,
return;
}
- for (i = 1; i < n_execs; i++) {
+ for (i = 0; i < n_execs; i++) {
int64_t timeout = 3 * NSEC_PER_SEC;
int err;
err = __xe_wait_ufence(fd, &data[i].exec_sync, USER_FENCE_VALUE,
exec_queues[i % n_exec_queues], &timeout);
- if (flags & GT_RESET || flags & CAT_ERROR)
+ if (!i) {
+ igt_assert(err == -ETIME || err == -EIO);
+ } else if (flags & GT_RESET || flags & CAT_ERROR) {
/* exec races with reset: may return -EIO or complete */
igt_assert(err == -EIO || !err);
- else
+ } else {
igt_assert_eq(err, 0);
+ }
}
sync[0].addr = to_user_pointer(&data[0].vm_sync);
xe_vm_unbind_async(fd, vm, 0, 0, addr, bo_size, sync, 1);
xe_wait_ufence(fd, &data[0].vm_sync, USER_FENCE_VALUE, 0, 3 * NSEC_PER_SEC);
- if (!(flags & GT_RESET)) {
- for (i = 1; i < n_execs; i++)
+ if (!(flags & (GT_RESET))) {
+ for (i = 0; i < n_execs; i++) {
+ if (!i)
+ continue;
+
igt_assert_eq(data[i].data, 0xc0ffee);
+ }
}
for (i = 0; i < n_exec_queues; i++)
--
2.43.0
* [PATCH i-g-t 4/4] tests/intel/xe_exec_reset: Add multi queue subtests
From: Stuart Summers @ 2026-04-27 21:20 UTC (permalink / raw)
Cc: igt-dev, niranjana.vishwanathapura, Stuart Summers, Fei Yang,
Katarzyna Piecielska, Priyanka Dandamudi, Daniel Charles,
Kamil Konieczny
Extend the existing test cases in tests/intel/xe_exec_reset.c
to include testing of reset flows for both primary queue
and secondary queues.
Engine resets without CAT faults are triggered via the *-cancel
cases. These don't set the CANCEL flag, since that puts a spinner
on every queue and adds no extra coverage for multi queue over
non-multi-queue.
Since the *-cancel cases are currently implemented only for the
legacy cases, do the same for multi queue.
New MULTI_QUEUE and SECONDARY_QUEUE flags are added to cover
the general multi queue cases and the cases where we are triggering
engine resets and/or cat faults on secondary queues specifically.
Note that for multi queue it is interesting to test these secondary
queue reset scenarios: GuC communicates them to the driver via the
primary queue, after which the entire queue group is torn down. The
test cases here ensure nothing breaks when we hit a scenario like
this.
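The new flag plumbing can be sketched standalone: SECONDARY_QUEUE is only
valid on top of MULTI_QUEUE (and, on the compute-mode path, CAT_ERROR), and
it moves the hanging submission from the primary queue to a secondary one.
Flag bits copied from the patch; a model of its igt_assert_f() guards and
hang_position selection, not the IGT code itself:

```c
#include <stdbool.h>

#define CAT_ERROR       (0x1 << 5)
#define MULTI_QUEUE     (0x1 << 14)
#define SECONDARY_QUEUE (0x1 << 15)

/* Legacy path: SECONDARY_QUEUE requires MULTI_QUEUE. */
static bool legacy_flags_valid(unsigned int flags)
{
	return !(flags & SECONDARY_QUEUE) || (flags & MULTI_QUEUE);
}

/* Compute-mode path: SECONDARY_QUEUE additionally requires CAT_ERROR. */
static bool cm_flags_valid(unsigned int flags)
{
	return !(flags & SECONDARY_QUEUE) ||
	       ((flags & MULTI_QUEUE) && (flags & CAT_ERROR));
}

/* Which submission is made to hang: index 1 (a secondary queue in the
 * group) with SECONDARY_QUEUE set, otherwise index 0 (the primary). */
static int hang_position(unsigned int flags)
{
	return flags & SECONDARY_QUEUE ? 1 : 0;
}
```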
Signed-off-by: Fei Yang <fei.yang@intel.com>
Signed-off-by: Katarzyna Piecielska <katarzyna.piecielska@intel.com>
Signed-off-by: Priyanka Dandamudi <priyanka.dandamudi@intel.com>
Signed-off-by: Daniel Charles <daniel.charles@intel.com>
Signed-off-by: Kamil Konieczny <kamil.konieczny@linux.intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
lib/xe/xe_legacy.c | 71 ++++++++++--
tests/intel/xe_exec_reset.c | 210 ++++++++++++++++++++++++++++++++++--
2 files changed, 262 insertions(+), 19 deletions(-)
diff --git a/lib/xe/xe_legacy.c b/lib/xe/xe_legacy.c
index 3371a91ac..f9bd5bcb6 100644
--- a/lib/xe/xe_legacy.c
+++ b/lib/xe/xe_legacy.c
@@ -13,6 +13,8 @@
/* Batch buffer element count, in number of dwords(u32) */
#define BATCH_DW_COUNT 16
+#define SECONDARY_QUEUE (0x1 << 15)
+#define MULTI_QUEUE (0x1 << 14)
#define COMPRESSION (0x1 << 13)
#define SYSTEM (0x1 << 12)
#define LONG_SPIN_REUSE_QUEUE (0x1 << 11)
@@ -70,10 +72,14 @@ xe_legacy_test_mode(int fd, struct drm_xe_engine_class_instance *eci,
xe_spin_nsec_to_ticks(fd, 0, THREE_SEC) : 0,
};
int i, b;
+ int hang_position = flags & SECONDARY_QUEUE ? 1 : 0;
int extra_execs = (flags & LONG_SPIN_REUSE_QUEUE) ? n_exec_queues : 0;
igt_assert_lte(n_exec_queues, MAX_N_EXECQUEUES);
+ igt_assert_f(!(flags & SECONDARY_QUEUE) || (flags & MULTI_QUEUE),
+ "SECONDARY_QUEUE requires MULTI_QUEUE to be set");
+
if (flags & COMPRESSION)
igt_require(intel_gen(intel_get_drm_devid(fd)) >= 20);
@@ -101,7 +107,20 @@ xe_legacy_test_mode(int fd, struct drm_xe_engine_class_instance *eci,
data = xe_bo_map(fd, bo, bo_size);
for (i = 0; i < n_exec_queues; i++) {
- exec_queues[i] = xe_exec_queue_create(fd, vm, eci, 0);
+ if (flags & MULTI_QUEUE) {
+ struct drm_xe_ext_set_property multi_queue = {
+ .base.next_extension = 0,
+ .base.name = DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY,
+ .property = DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP,
+ };
+
+ uint64_t ext = to_user_pointer(&multi_queue);
+
+ multi_queue.value = i ? exec_queues[0] : DRM_XE_MULTI_GROUP_CREATE;
+ exec_queues[i] = xe_exec_queue_create(fd, vm, eci, ext);
+ } else {
+ exec_queues[i] = xe_exec_queue_create(fd, vm, eci, 0);
+ }
syncobjs[i] = syncobj_create(fd, 0);
}
@@ -123,17 +142,22 @@ xe_legacy_test_mode(int fd, struct drm_xe_engine_class_instance *eci,
}
for (i = 0; i < n_execs; i++) {
- u64 base_addr = (!use_capture_mode && (flags & CAT_ERROR) && !i)
- ? (addr + bo_size * 128) : addr;
+ u64 base_addr = (!use_capture_mode && flags & CAT_ERROR &&
+ i == hang_position) ?
+ (addr + bo_size * 128) : addr;
u64 batch_offset = (char *)&data[i].batch - (char *)data;
u64 batch_addr = base_addr + batch_offset;
u64 spin_offset = (char *)&data[i].spin - (char *)data;
u64 sdi_offset = (char *)&data[i].data - (char *)data;
u64 sdi_addr = base_addr + sdi_offset;
u64 exec_addr;
- int e = i % n_exec_queues;
+ int err, e = i % n_exec_queues;
- if (!i || flags & CANCEL ||
+ /*
+ * For cat fault on a secondary queue the fault will
+ * be on the spinner.
+ */
+ if (i == hang_position || flags & CANCEL ||
(flags & LONG_SPIN && i < n_exec_queues)) {
spin_opts.addr = base_addr + spin_offset;
xe_spin_init(&data[i].spin, &spin_opts);
@@ -160,10 +184,17 @@ xe_legacy_test_mode(int fd, struct drm_xe_engine_class_instance *eci,
if (e != i)
syncobj_reset(fd, &syncobjs[e], 1);
- xe_exec(fd, &exec);
+ /*
+ * Secondary queues are reset when the primary queue
+ * is reset. The submission can race here and it is
+ * expected for those to fail submission if the primary
+ * reset has already happened.
+ */
+ err = __xe_exec(fd, &exec);
+ igt_assert(!err || ((flags & MULTI_QUEUE) && err == -ECANCELED));
- if (!i && !(flags & CAT_ERROR) && !use_capture_mode &&
- !(flags & COMPRESSION))
+ if (i == hang_position && !(flags & CAT_ERROR) &&
+ !use_capture_mode && !(flags & COMPRESSION))
xe_spin_wait_started(&data[i].spin);
}
@@ -186,7 +217,21 @@ xe_legacy_test_mode(int fd, struct drm_xe_engine_class_instance *eci,
* Expectation here is that on reset, submissions will
* still satisfy the syncobj_wait.
*/
- igt_assert(syncobj_wait(fd, &syncobjs[i], 1, INT64_MAX, 0, NULL));
+ int err = syncobj_wait_err(fd, &syncobjs[i], 1, INT64_MAX, 0);
+
+ /*
+ * Currently any time GuC resets a queue which is part of a
+ * multi queue queue group submitted by the KMD, the KMD
+ * will tear down the entire group. This means we don't know
+ * whether a particular queue submitted prior to the hanging
+ * queue will complete or not. So we have to check all possible
+ * return values here.
+ *
+ * In the event we get an -ECANCELED at the exec above and the
+ * syncobj was not installed, we expect this to return -EINVAL
+ * here instead.
+ */
+ igt_assert(!err || ((flags & MULTI_QUEUE) && err == -EINVAL));
}
igt_assert(syncobj_wait(fd, &sync[0].handle, 1, INT64_MAX, 0, NULL));
@@ -232,7 +277,13 @@ xe_legacy_test_mode(int fd, struct drm_xe_engine_class_instance *eci,
if (!use_capture_mode && !(flags & (GT_RESET | CANCEL | COMPRESSION))) {
for (i = flags & LONG_SPIN ? n_exec_queues : 0;
i < n_execs + extra_execs; i++) {
- if (!i)
+ /*
+ * For multi-queue there is no guarantee which
+ * queue will be scheduled first as they are all
+ * submitted at the same priority in this test.
+ * So we can't guarantee any data integrity here.
+ */
+ if (i == hang_position || flags & MULTI_QUEUE)
continue;
igt_assert_eq(data[i].data, 0xc0ffee);
diff --git a/tests/intel/xe_exec_reset.c b/tests/intel/xe_exec_reset.c
index 1c9275804..a3cf290ab 100644
--- a/tests/intel/xe_exec_reset.c
+++ b/tests/intel/xe_exec_reset.c
@@ -112,7 +112,7 @@ static void test_spin(int fd, struct drm_xe_engine_class_instance *eci,
#define MAX_N_EXECQUEUES 16
#define GT_RESET (0x1 << 0)
#define CLOSE_FD (0x1 << 1)
-#define CLOSE_EXEC_QUEUES (0x1 << 2)
+#define CLOSE_EXEC_QUEUES (0x1 << 2)
#define VIRTUAL (0x1 << 3)
#define PARALLEL (0x1 << 4)
#define CAT_ERROR (0x1 << 5)
@@ -124,6 +124,8 @@ static void test_spin(int fd, struct drm_xe_engine_class_instance *eci,
#define LONG_SPIN_REUSE_QUEUE (0x1 << 11)
#define SYSTEM (0x1 << 12)
#define COMPRESSION (0x1 << 13)
+#define MULTI_QUEUE (0x1 << 14)
+#define SECONDARY_QUEUE (0x1 << 15)
/**
* SUBTEST: %s-cat-error
@@ -354,6 +356,45 @@ test_balancer(int fd, int gt, int class, int n_exec_queues, int n_execs,
*
* SUBTEST: cm-close-execqueues-close-fd
* Description: Test compute mode close exec_queues close fd
+ *
+ * SUBTEST: multi-queue-cat-error
+ * Description: Test cat error with multi_queue
+ *
+ * SUBTEST: multi-queue-cat-error-on-secondary
+ * Description: Test cat error with multi_queue
+ * on a secondary queue
+ *
+ * SUBTEST: multi-queue-gt-reset
+ * Description: Test GT reset with multi_queue
+ *
+ * SUBTEST: multi-queue-cancel
+ * Description: Test engine reset with multi_queue
+ *
+ * SUBTEST: multi-queue-cancel-on-secondary
+ * Description: Test engine reset with multi_queue
+ * on a secondary queue
+ *
+ * SUBTEST: multi-queue-close-fd
+ * Description: Test close fd with multi_queue
+ *
+ * SUBTEST: multi-queue-close-execqueues
+ * Description: Test close execqueues with multi_queue
+ *
+ * SUBTEST: cm-multi-queue-cat-error
+ * Description: Test compute mode cat error with multi_queue
+ *
+ * SUBTEST: cm-multi-queue-cat-error-on-secondary
+ * Description: Test compute mode cat error with multi_queue
+ * on a secondary queue
+ *
+ * SUBTEST: cm-multi-queue-gt-reset
+ * Description: Test compute mode GT reset with multi_queue
+ *
+ * SUBTEST: cm-multi-queue-close-fd
+ * Description: Test compute mode close fd with multi_queue
+ *
+ * SUBTEST: cm-multi-queue-close-execqueues
+ * Description: Test compute mode close execqueues with multi_queue
*/
static void
@@ -385,9 +426,14 @@ test_compute_mode(int fd, struct drm_xe_engine_class_instance *eci,
} *data;
struct xe_spin_opts spin_opts = { .preempt = flags & PREEMPT };
int i, b;
+ int hang_position = flags & SECONDARY_QUEUE ? 1 : 0;
igt_assert_lte(n_exec_queues, MAX_N_EXECQUEUES);
+ igt_assert_f(!(flags & SECONDARY_QUEUE) ||
+ ((flags & MULTI_QUEUE) && (flags & CAT_ERROR)),
+ "SECONDARY_QUEUE requires MULTI_QUEUE and CAT_ERROR to be set");
+
if (flags & CLOSE_FD)
fd = drm_open_driver(DRIVER_XE);
@@ -402,7 +448,20 @@ test_compute_mode(int fd, struct drm_xe_engine_class_instance *eci,
memset(data, 0, bo_size);
for (i = 0; i < n_exec_queues; i++) {
- exec_queues[i] = xe_exec_queue_create(fd, vm, eci, 0);
+ if (flags & MULTI_QUEUE) {
+ struct drm_xe_ext_set_property multi_queue = {
+ .base.next_extension = 0,
+ .base.name = DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY,
+ .property = DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP,
+ };
+
+ uint64_t ext = to_user_pointer(&multi_queue);
+
+ multi_queue.value = i ? exec_queues[0] : DRM_XE_MULTI_GROUP_CREATE;
+ exec_queues[i] = xe_exec_queue_create(fd, vm, eci, ext);
+ } else {
+ exec_queues[i] = xe_exec_queue_create(fd, vm, eci, 0);
+ }
};
sync[0].addr = to_user_pointer(&data[0].vm_sync);
@@ -412,17 +471,21 @@ test_compute_mode(int fd, struct drm_xe_engine_class_instance *eci,
data[0].vm_sync = 0;
for (i = 0; i < n_execs; i++) {
- uint64_t base_addr = flags & CAT_ERROR && !i ?
- addr + bo_size * 128 : addr;
+ uint64_t base_addr = (flags & CAT_ERROR && i == hang_position) ?
+ (addr + bo_size * 128) : addr;
uint64_t batch_offset = (char *)&data[i].batch - (char *)data;
uint64_t batch_addr = base_addr + batch_offset;
uint64_t spin_offset = (char *)&data[i].spin - (char *)data;
uint64_t sdi_offset = (char *)&data[i].data - (char *)data;
uint64_t sdi_addr = base_addr + sdi_offset;
uint64_t exec_addr;
- int e = i % n_exec_queues;
+ int err, e = i % n_exec_queues;
- if (!i) {
+ /*
+ * For cat fault on a secondary queue the fault will
+ * be on the spinner.
+ */
+ if (i == hang_position) {
spin_opts.addr = base_addr + spin_offset;
xe_spin_init(&data[i].spin, &spin_opts);
exec_addr = spin_opts.addr;
@@ -443,7 +506,18 @@ test_compute_mode(int fd, struct drm_xe_engine_class_instance *eci,
exec.exec_queue_id = exec_queues[e];
exec.address = exec_addr;
- xe_exec(fd, &exec);
+
+ /*
+ * Secondary queues are reset when the primary queue
+ * is reset. The submission can race here and it is
+ * expected for those to fail submission if the primary
+ * reset has already happened.
+ */
+ err = __xe_exec(fd, &exec);
+ igt_assert(!err || ((flags & MULTI_QUEUE) && err == -ECANCELED));
+
+ if (i == hang_position && !(flags & CAT_ERROR))
+ xe_spin_wait_started(&data[i].spin);
}
if (flags & GT_RESET) {
@@ -468,8 +542,18 @@ test_compute_mode(int fd, struct drm_xe_engine_class_instance *eci,
err = __xe_wait_ufence(fd, &data[i].exec_sync, USER_FENCE_VALUE,
exec_queues[i % n_exec_queues], &timeout);
- if (!i) {
+ if (i == hang_position) {
igt_assert(err == -ETIME || err == -EIO);
+ } else if (flags & MULTI_QUEUE) {
+ /*
+ * Currently any time GuC resets a queue submitted
+ * by the KMD, the KMD will tear down the entire
+ * queue group. This means we don't know whether
+ * a particular queue submitted prior to the hanging
+ * queue will complete or not. So we have to check
+ * all possible return values here.
+ */
+ igt_assert(err == -ETIME || err == -EIO || !err);
} else if (flags & GT_RESET || flags & CAT_ERROR) {
/* exec races with reset: may return -EIO or complete */
igt_assert(err == -EIO || !err);
@@ -484,7 +568,13 @@ test_compute_mode(int fd, struct drm_xe_engine_class_instance *eci,
if (!(flags & (GT_RESET))) {
for (i = 0; i < n_execs; i++) {
- if (!i)
+ /*
+ * For multi-queue there is no guarantee which
+ * queue will be scheduled first as they are all
+ * submitted at the same priority in this test.
+ * So we can't guarantee any data integrity here.
+ */
+ if (i == hang_position || flags & MULTI_QUEUE)
continue;
igt_assert_eq(data[i].data, 0xc0ffee);
@@ -987,6 +1077,108 @@ int igt_main()
xe_for_each_gt(fd, gt)
gt_mocs_reset(fd, gt);
+ igt_subtest("multi-queue-cat-error") {
+ igt_require(intel_graphics_ver(intel_get_drm_devid(fd)) >= IP_VER(35, 0));
+ xe_for_each_multi_queue_engine(fd, hwe)
+ xe_legacy_test_mode(fd, hwe, 16, 16,
+ CAT_ERROR | MULTI_QUEUE,
+ LEGACY_MODE_ADDR,
+ false);
+ }
+
+ igt_subtest("multi-queue-cat-error-on-secondary") {
+ igt_require(intel_graphics_ver(intel_get_drm_devid(fd)) >= IP_VER(35, 0));
+ xe_for_each_multi_queue_engine(fd, hwe)
+ xe_legacy_test_mode(fd, hwe, 16, 16,
+ CAT_ERROR | MULTI_QUEUE |
+ SECONDARY_QUEUE,
+ LEGACY_MODE_ADDR,
+ false);
+ }
+
+ igt_subtest("multi-queue-gt-reset") {
+ igt_require(intel_graphics_ver(intel_get_drm_devid(fd)) >= IP_VER(35, 0));
+ xe_for_each_multi_queue_engine(fd, hwe)
+ xe_legacy_test_mode(fd, hwe, 16, 16,
+ GT_RESET | MULTI_QUEUE,
+ LEGACY_MODE_ADDR,
+ false);
+ }
+
+ igt_subtest("multi-queue-cancel") {
+ igt_require(intel_graphics_ver(intel_get_drm_devid(fd)) >= IP_VER(35, 0));
+ xe_for_each_multi_queue_engine(fd, hwe)
+ xe_legacy_test_mode(fd, hwe, 16, 16,
+ MULTI_QUEUE,
+ LEGACY_MODE_ADDR,
+ false);
+ }
+
+ igt_subtest("multi-queue-cancel-on-secondary") {
+ igt_require(intel_graphics_ver(intel_get_drm_devid(fd)) >= IP_VER(35, 0));
+ xe_for_each_multi_queue_engine(fd, hwe)
+ xe_legacy_test_mode(fd, hwe, 16, 16,
+ MULTI_QUEUE | SECONDARY_QUEUE,
+ LEGACY_MODE_ADDR,
+ false);
+ }
+
+ igt_subtest("multi-queue-close-fd") {
+ igt_require(intel_graphics_ver(intel_get_drm_devid(fd)) >= IP_VER(35, 0));
+ xe_for_each_multi_queue_engine(fd, hwe)
+ xe_legacy_test_mode(-1, hwe, 16, 256,
+ CLOSE_FD | MULTI_QUEUE,
+ LEGACY_MODE_ADDR,
+ false);
+ }
+
+ igt_subtest("multi-queue-close-execqueues") {
+ igt_require(intel_graphics_ver(intel_get_drm_devid(fd)) >= IP_VER(35, 0));
+ xe_for_each_multi_queue_engine(fd, hwe)
+ xe_legacy_test_mode(-1, hwe, 16, 256,
+ CLOSE_EXEC_QUEUES | CLOSE_FD |
+ MULTI_QUEUE,
+ LEGACY_MODE_ADDR,
+ false);
+ }
+
+ igt_subtest("cm-multi-queue-cat-error") {
+ igt_require(intel_graphics_ver(intel_get_drm_devid(fd)) >= IP_VER(35, 0));
+ xe_for_each_multi_queue_engine(fd, hwe)
+ test_compute_mode(fd, hwe, 16, 16,
+ CAT_ERROR | MULTI_QUEUE);
+ }
+
+ igt_subtest("cm-multi-queue-cat-error-on-secondary") {
+ igt_require(intel_graphics_ver(intel_get_drm_devid(fd)) >= IP_VER(35, 0));
+ xe_for_each_multi_queue_engine(fd, hwe)
+ test_compute_mode(fd, hwe, 16, 16,
+ CAT_ERROR | MULTI_QUEUE |
+ SECONDARY_QUEUE);
+ }
+
+ igt_subtest("cm-multi-queue-gt-reset") {
+ igt_require(intel_graphics_ver(intel_get_drm_devid(fd)) >= IP_VER(35, 0));
+ xe_for_each_multi_queue_engine(fd, hwe)
+ test_compute_mode(fd, hwe, 16, 16,
+ GT_RESET | MULTI_QUEUE);
+ }
+
+ igt_subtest("cm-multi-queue-close-fd") {
+ igt_require(intel_graphics_ver(intel_get_drm_devid(fd)) >= IP_VER(35, 0));
+ xe_for_each_multi_queue_engine(fd, hwe)
+ test_compute_mode(-1, hwe, 16, 256,
+ CLOSE_FD | MULTI_QUEUE);
+ }
+
+ igt_subtest("cm-multi-queue-close-execqueues") {
+ igt_require(intel_graphics_ver(intel_get_drm_devid(fd)) >= IP_VER(35, 0));
+ xe_for_each_multi_queue_engine(fd, hwe)
+ test_compute_mode(-1, hwe, 16, 256,
+ CLOSE_EXEC_QUEUES | CLOSE_FD |
+ MULTI_QUEUE);
+ }
+
igt_fixture()
drm_close_driver(fd);
}
--
2.43.0