[igt-dev] [PATCH i-g-t 0/8] [RFC] benchmarks/gem

Igt-dev Archive on lore.kernel.org

* [igt-dev] [PATCH i-g-t 0/8] [RFC] benchmarks/gem_wsim: added basic xe support
@ 2023-09-06 15:51 Marcin Bernatowicz
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 1/8] lib/xe_spin: xe_spin_opts for xe_spin initialization Marcin Bernatowicz
                   ` (8 more replies)
  0 siblings, 9 replies; 22+ messages in thread
From: Marcin Bernatowicz @ 2023-09-06 15:51 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Added basic Xe support with a few examples.
A single binary handles both i915 and Xe devices,
but workload definitions differ between i915 and Xe.
Xe does not use the context abstraction; it introduces new VM and Exec Queue
steps, and the BATCH step references an exec queue.
For more details see wsim/README.
Some functionality is still missing: working sets and
load balancing (input is needed on if/how to do it in Xe - exec queue
width?).

The tool is handy for scheduling tests; we find it useful for verifying vGPU
profiles that define different execution quantum/preemption timeout settings.

There is also some rationale for the tool in the following thread:
https://lore.kernel.org/dri-devel/a443495f-5d1b-52e1-9b2f-80167deb6d57@linux.intel.com/

With this patch it should be possible to run the following on an Xe device:

gem_wsim -w benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim -c 36 -r 600

Best with drm debug logs disabled:

echo 0 > /sys/module/drm/parameters/debug

The "lib/xe_spin: fixed duration xe_spin capability" patch is already
under review at https://patchwork.freedesktop.org/series/122624/

v2:
- split patches to ease review (Kamil);
  all benchmarks/gem_wsim patches before the [RFC] one
  contain fixes (for the scale duration option), cleanups (checkpatch.pl),
  and refactors (some code moved to functions) that are
  not related to Xe and are ready to be applied
- lib/xe_spin is under review in another thread: https://patchwork.freedesktop.org/series/122624/

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>

Marcin Bernatowicz (8):
  lib/xe_spin: xe_spin_opts for xe_spin initialization
  lib/xe_spin: fixed duration xe_spin capability
  lib/igt_device_scan: Xe get integrated/discrete card functions
  benchmarks/gem_wsim: scale duration option fixes
  benchmarks/gem_wsim: cleanups
  benchmarks/gem_wsim: allow comments in workload description files
  benchmarks/gem_wsim: extract prepare_ctxs function, add w_sync
  [RFC] benchmarks/gem_wsim: added basic xe support

 benchmarks/gem_wsim.c                         | 875 ++++++++++++++----
 benchmarks/wsim/README                        |  87 +-
 benchmarks/wsim/xe_cloud-gaming-60fps.wsim    |  25 +
 benchmarks/wsim/xe_example.wsim               |  28 +
 benchmarks/wsim/xe_example01.wsim             |  19 +
 benchmarks/wsim/xe_example_fence.wsim         |  23 +
 .../wsim/xe_media_load_balance_fhd26u7.wsim   |  63 ++
 lib/igt_device_scan.c                         |  34 +-
 lib/igt_device_scan.h                         |   2 +
 lib/xe/xe_spin.c                              | 123 ++-
 lib/xe/xe_spin.h                              |  27 +-
 tests/intel/xe_dma_buf_sync.c                 |   6 +-
 tests/intel/xe_exec_balancer.c                |   9 +-
 tests/intel/xe_exec_reset.c                   |  24 +-
 tests/intel/xe_exec_threads.c                 |   7 +-
 tests/intel/xe_vm.c                           |   7 +-
 16 files changed, 1114 insertions(+), 245 deletions(-)
 create mode 100644 benchmarks/wsim/xe_cloud-gaming-60fps.wsim
 create mode 100644 benchmarks/wsim/xe_example.wsim
 create mode 100644 benchmarks/wsim/xe_example01.wsim
 create mode 100644 benchmarks/wsim/xe_example_fence.wsim
 create mode 100644 benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim

-- 
2.30.2


* [igt-dev] [PATCH i-g-t 1/8] lib/xe_spin: xe_spin_opts for xe_spin initialization
  2023-09-06 15:51 [igt-dev] [PATCH i-g-t 0/8] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
@ 2023-09-06 15:51 ` Marcin Bernatowicz
  2023-09-20 16:43   ` Kamil Konieczny
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 2/8] lib/xe_spin: fixed duration xe_spin capability Marcin Bernatowicz
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Marcin Bernatowicz @ 2023-09-06 15:51 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Introduced struct xe_spin_opts for xe_spin initialization and
adjusted tests to the new xe_spin_init signature.
Added the xe_spin_init_opts macro (Zbyszek).
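
The options-struct-plus-macro pattern described above can be sketched in isolation as follows. This is only an illustration: every demo_* name is hypothetical and stands in for the real xe_spin types, and the only property shown is that fields omitted from the designated initializers default to zero/false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the IGT definitions. */
struct demo_opts {
	uint64_t addr;
	bool preempt;
};

struct demo_result {
	uint64_t addr;
	bool preempt;
};

/* Takes all options through one pointer, so new fields do not change the signature. */
static void demo_init(struct demo_result *out, const struct demo_opts *opts)
{
	assert(out && opts);
	out->addr = opts->addr;
	out->preempt = opts->preempt;
}

/* Builds a temporary options struct from designated initializers. */
#define demo_init_opts(out, ...) \
	demo_init(out, &((struct demo_opts){__VA_ARGS__}))

static struct demo_result demo_make(uint64_t addr)
{
	struct demo_result r;

	/* .preempt is not named, so the compound literal leaves it false. */
	demo_init_opts(&r, .addr = addr);
	return r;
}
```

A caller that needs every field spells them out, e.g. demo_init_opts(&r, .addr = 0x1000, .preempt = true); a caller that only cares about the address relies on the zero default.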

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 lib/xe/xe_spin.c               | 28 ++++++++++------------------
 lib/xe/xe_spin.h               | 19 ++++++++++++++++++-
 tests/intel/xe_dma_buf_sync.c  |  6 +++---
 tests/intel/xe_exec_balancer.c |  9 ++++-----
 tests/intel/xe_exec_reset.c    | 24 ++++++++++++++----------
 tests/intel/xe_exec_threads.c  |  7 ++++---
 tests/intel/xe_vm.c            |  7 ++++---
 7 files changed, 57 insertions(+), 43 deletions(-)

diff --git a/lib/xe/xe_spin.c b/lib/xe/xe_spin.c
index 7113972ee..27f837ef9 100644
--- a/lib/xe/xe_spin.c
+++ b/lib/xe/xe_spin.c
@@ -19,17 +19,13 @@
 /**
  * xe_spin_init:
  * @spin: pointer to mapped bo in which spinner code will be written
- * @addr: offset of spinner within vm
- * @preempt: allow spinner to be preempted or not
+ * @opts: pointer to spinner initialization options
  */
-void xe_spin_init(struct xe_spin *spin, uint64_t addr, bool preempt)
+void xe_spin_init(struct xe_spin *spin, struct xe_spin_opts *opts)
 {
-	uint64_t batch_offset = (char *)&spin->batch - (char *)spin;
-	uint64_t batch_addr = addr + batch_offset;
-	uint64_t start_offset = (char *)&spin->start - (char *)spin;
-	uint64_t start_addr = addr + start_offset;
-	uint64_t end_offset = (char *)&spin->end - (char *)spin;
-	uint64_t end_addr = addr + end_offset;
+	uint64_t loop_addr = opts->addr + offsetof(struct xe_spin, batch);
+	uint64_t start_addr = opts->addr + offsetof(struct xe_spin, start);
+	uint64_t end_addr = opts->addr + offsetof(struct xe_spin, end);
 	int b = 0;
 
 	spin->start = 0;
@@ -40,7 +36,7 @@ void xe_spin_init(struct xe_spin *spin, uint64_t addr, bool preempt)
 	spin->batch[b++] = start_addr >> 32;
 	spin->batch[b++] = 0xc0ffee;
 
-	if (preempt)
+	if (opts->preempt)
 		spin->batch[b++] = (0x5 << 23);
 
 	spin->batch[b++] = MI_COND_BATCH_BUFFER_END | MI_DO_COMPARE | 2;
@@ -49,8 +45,8 @@ void xe_spin_init(struct xe_spin *spin, uint64_t addr, bool preempt)
 	spin->batch[b++] = end_addr >> 32;
 
 	spin->batch[b++] = MI_BATCH_BUFFER_START | 1 << 8 | 1;
-	spin->batch[b++] = batch_addr;
-	spin->batch[b++] = batch_addr >> 32;
+	spin->batch[b++] = loop_addr;
+	spin->batch[b++] = loop_addr >> 32;
 
 	igt_assert(b <= ARRAY_SIZE(spin->batch));
 }
@@ -133,11 +129,7 @@ xe_spin_create(int fd, const struct igt_spin_factory *opt)
 	addr = intel_allocator_alloc_with_strategy(ahnd, spin->handle, bo_size, 0, ALLOC_STRATEGY_LOW_TO_HIGH);
 	xe_vm_bind_sync(fd, spin->vm, spin->handle, 0, addr, bo_size);
 
-	if (!(opt->flags & IGT_SPIN_NO_PREEMPTION))
-		xe_spin_init(xe_spin, addr, true);
-	else
-		xe_spin_init(xe_spin, addr, false);
-
+	xe_spin_init_opts(xe_spin, .addr = addr, .preempt = !(opt->flags & IGT_SPIN_NO_PREEMPTION));
 	exec.exec_queue_id = spin->engine;
 	exec.address = addr;
 	sync.handle = spin->syncobj;
@@ -219,7 +211,7 @@ void xe_cork_init(int fd, struct drm_xe_engine_class_instance *hwe,
 	exec_queue = xe_exec_queue_create(fd, vm, hwe, 0);
 	syncobj = syncobj_create(fd, 0);
 
-	xe_spin_init(spin, addr, true);
+	xe_spin_init_opts(spin, .addr = addr, .preempt = true);
 	exec.exec_queue_id = exec_queue;
 	exec.address = addr;
 	sync.handle = syncobj;
diff --git a/lib/xe/xe_spin.h b/lib/xe/xe_spin.h
index c84db175d..9f1d33294 100644
--- a/lib/xe/xe_spin.h
+++ b/lib/xe/xe_spin.h
@@ -15,6 +15,18 @@
 #include "xe_query.h"
 #include "lib/igt_dummyload.h"
 
+/** struct xe_spin_opts
+ *
+ * @addr: offset of spinner within vm
+ * @preempt: allow spinner to be preempted or not
+ *
+ * Used to initialize struct xe_spin spinner behavior.
+ */
+struct xe_spin_opts {
+	uint64_t addr;
+	bool preempt;
+};
+
 /* Mapped GPU object */
 struct xe_spin {
 	uint32_t batch[16];
@@ -22,8 +34,13 @@ struct xe_spin {
 	uint32_t start;
 	uint32_t end;
 };
+
 igt_spin_t *xe_spin_create(int fd, const struct igt_spin_factory *opt);
-void xe_spin_init(struct xe_spin *spin, uint64_t addr, bool preempt);
+void xe_spin_init(struct xe_spin *spin, struct xe_spin_opts *opts);
+
+#define xe_spin_init_opts(fd, ...) \
+	xe_spin_init(fd, &((struct xe_spin_opts){__VA_ARGS__}))
+
 bool xe_spin_started(struct xe_spin *spin);
 void xe_spin_sync_wait(int fd, struct igt_spin *spin);
 void xe_spin_wait_started(struct xe_spin *spin);
diff --git a/tests/intel/xe_dma_buf_sync.c b/tests/intel/xe_dma_buf_sync.c
index 29d675154..627f4c1e5 100644
--- a/tests/intel/xe_dma_buf_sync.c
+++ b/tests/intel/xe_dma_buf_sync.c
@@ -144,7 +144,6 @@ test_export_dma_buf(struct drm_xe_engine_class_instance *hwe0,
 		uint64_t sdi_offset = (char *)&data[i]->data - (char *)data[i];
 		uint64_t sdi_addr = addr + sdi_offset;
 		uint64_t spin_offset = (char *)&data[i]->spin - (char *)data[i];
-		uint64_t spin_addr = addr + spin_offset;
 		struct drm_xe_sync sync[2] = {
 			{ .flags = DRM_XE_SYNC_SYNCOBJ, },
 			{ .flags = DRM_XE_SYNC_SYNCOBJ | DRM_XE_SYNC_SIGNAL, },
@@ -153,14 +152,15 @@ test_export_dma_buf(struct drm_xe_engine_class_instance *hwe0,
 			.num_batch_buffer = 1,
 			.syncs = to_user_pointer(sync),
 		};
+		struct xe_spin_opts spin_opts = { .addr = addr + spin_offset, .preempt = true };
 		uint32_t syncobj;
 		int b = 0;
 		int sync_fd;
 
 		/* Write spinner on FD[0] */
-		xe_spin_init(&data[i]->spin, spin_addr, true);
+		xe_spin_init(&data[i]->spin, &spin_opts);
 		exec.exec_queue_id = exec_queue[0];
-		exec.address = spin_addr;
+		exec.address = spin_opts.addr;
 		xe_exec(fd[0], &exec);
 
 		/* Export prime BO as sync file and veify business */
diff --git a/tests/intel/xe_exec_balancer.c b/tests/intel/xe_exec_balancer.c
index f364a4b7a..d7d8dd8fb 100644
--- a/tests/intel/xe_exec_balancer.c
+++ b/tests/intel/xe_exec_balancer.c
@@ -52,6 +52,7 @@ static void test_all_active(int fd, int gt, int class)
 	struct {
 		struct xe_spin spin;
 	} *data;
+	struct xe_spin_opts spin_opts = { .preempt = false };
 	struct drm_xe_engine_class_instance *hwe;
 	struct drm_xe_engine_class_instance eci[MAX_INSTANCE];
 	int i, num_placements = 0;
@@ -90,16 +91,14 @@ static void test_all_active(int fd, int gt, int class)
 	xe_vm_bind_async(fd, vm, 0, bo, 0, addr, bo_size, sync, 1);
 
 	for (i = 0; i < num_placements; i++) {
-		uint64_t spin_offset = (char *)&data[i].spin - (char *)data;
-		uint64_t spin_addr = addr + spin_offset;
-
-		xe_spin_init(&data[i].spin, spin_addr, false);
+		spin_opts.addr = addr + (char *)&data[i].spin - (char *)data;
+		xe_spin_init(&data[i].spin, &spin_opts);
 		sync[0].flags &= ~DRM_XE_SYNC_SIGNAL;
 		sync[1].flags |= DRM_XE_SYNC_SIGNAL;
 		sync[1].handle = syncobjs[i];
 
 		exec.exec_queue_id = exec_queues[i];
-		exec.address = spin_addr;
+		exec.address = spin_opts.addr;
 		xe_exec(fd, &exec);
 		xe_spin_wait_started(&data[i].spin);
 	}
diff --git a/tests/intel/xe_exec_reset.c b/tests/intel/xe_exec_reset.c
index a2d33baf1..be6bbada6 100644
--- a/tests/intel/xe_exec_reset.c
+++ b/tests/intel/xe_exec_reset.c
@@ -44,6 +44,7 @@ static void test_spin(int fd, struct drm_xe_engine_class_instance *eci)
 	size_t bo_size;
 	uint32_t bo = 0;
 	struct xe_spin *spin;
+	struct xe_spin_opts spin_opts = { .addr = addr, .preempt = false };
 
 	vm = xe_vm_create(fd, DRM_XE_VM_CREATE_ASYNC_BIND_OPS, 0);
 	bo_size = sizeof(*spin);
@@ -60,7 +61,7 @@ static void test_spin(int fd, struct drm_xe_engine_class_instance *eci)
 	sync[0].handle = syncobj_create(fd, 0);
 	xe_vm_bind_async(fd, vm, 0, bo, 0, addr, bo_size, sync, 1);
 
-	xe_spin_init(spin, addr, false);
+	xe_spin_init(spin, &spin_opts);
 
 	sync[0].flags &= ~DRM_XE_SYNC_SIGNAL;
 	sync[1].flags |= DRM_XE_SYNC_SIGNAL;
@@ -165,6 +166,7 @@ test_balancer(int fd, int gt, int class, int n_exec_queues, int n_execs,
 		uint64_t pad;
 		uint32_t data;
 	} *data;
+	struct xe_spin_opts spin_opts = { .preempt = false };
 	struct drm_xe_engine_class_instance *hwe;
 	struct drm_xe_engine_class_instance eci[MAX_INSTANCE];
 	int i, j, b, num_placements = 0, bad_batches = 1;
@@ -236,7 +238,6 @@ test_balancer(int fd, int gt, int class, int n_exec_queues, int n_execs,
 		uint64_t batch_offset = (char *)&data[i].batch - (char *)data;
 		uint64_t batch_addr = base_addr + batch_offset;
 		uint64_t spin_offset = (char *)&data[i].spin - (char *)data;
-		uint64_t spin_addr = base_addr + spin_offset;
 		uint64_t sdi_offset = (char *)&data[i].data - (char *)data;
 		uint64_t sdi_addr = base_addr + sdi_offset;
 		uint64_t exec_addr;
@@ -247,8 +248,9 @@ test_balancer(int fd, int gt, int class, int n_exec_queues, int n_execs,
 			batches[j] = batch_addr;
 
 		if (i < bad_batches) {
-			xe_spin_init(&data[i].spin, spin_addr, false);
-			exec_addr = spin_addr;
+			spin_opts.addr = base_addr + spin_offset;
+			xe_spin_init(&data[i].spin, &spin_opts);
+			exec_addr = spin_opts.addr;
 		} else {
 			b = 0;
 			data[i].batch[b++] = MI_STORE_DWORD_IMM_GEN4;
@@ -368,6 +370,7 @@ test_legacy_mode(int fd, struct drm_xe_engine_class_instance *eci,
 		uint64_t pad;
 		uint32_t data;
 	} *data;
+	struct xe_spin_opts spin_opts = { .preempt = false };
 	int i, b;
 
 	igt_assert(n_exec_queues <= MAX_N_EXECQUEUES);
@@ -417,15 +420,15 @@ test_legacy_mode(int fd, struct drm_xe_engine_class_instance *eci,
 		uint64_t batch_offset = (char *)&data[i].batch - (char *)data;
 		uint64_t batch_addr = base_addr + batch_offset;
 		uint64_t spin_offset = (char *)&data[i].spin - (char *)data;
-		uint64_t spin_addr = base_addr + spin_offset;
 		uint64_t sdi_offset = (char *)&data[i].data - (char *)data;
 		uint64_t sdi_addr = base_addr + sdi_offset;
 		uint64_t exec_addr;
 		int e = i % n_exec_queues;
 
 		if (!i) {
-			xe_spin_init(&data[i].spin, spin_addr, false);
-			exec_addr = spin_addr;
+			spin_opts.addr = base_addr + spin_offset;
+			xe_spin_init(&data[i].spin, &spin_opts);
+			exec_addr = spin_opts.addr;
 		} else {
 			b = 0;
 			data[i].batch[b++] = MI_STORE_DWORD_IMM_GEN4;
@@ -539,6 +542,7 @@ test_compute_mode(int fd, struct drm_xe_engine_class_instance *eci,
 		uint64_t exec_sync;
 		uint32_t data;
 	} *data;
+	struct xe_spin_opts spin_opts = { .preempt = false };
 	int i, b;
 
 	igt_assert(n_exec_queues <= MAX_N_EXECQUEUES);
@@ -593,15 +597,15 @@ test_compute_mode(int fd, struct drm_xe_engine_class_instance *eci,
 		uint64_t batch_offset = (char *)&data[i].batch - (char *)data;
 		uint64_t batch_addr = base_addr + batch_offset;
 		uint64_t spin_offset = (char *)&data[i].spin - (char *)data;
-		uint64_t spin_addr = base_addr + spin_offset;
 		uint64_t sdi_offset = (char *)&data[i].data - (char *)data;
 		uint64_t sdi_addr = base_addr + sdi_offset;
 		uint64_t exec_addr;
 		int e = i % n_exec_queues;
 
 		if (!i) {
-			xe_spin_init(&data[i].spin, spin_addr, false);
-			exec_addr = spin_addr;
+			spin_opts.addr = base_addr + spin_offset;
+			xe_spin_init(&data[i].spin, &spin_opts);
+			exec_addr = spin_opts.addr;
 		} else {
 			b = 0;
 			data[i].batch[b++] = MI_STORE_DWORD_IMM_GEN4;
diff --git a/tests/intel/xe_exec_threads.c b/tests/intel/xe_exec_threads.c
index e64c1639a..ff4ebc280 100644
--- a/tests/intel/xe_exec_threads.c
+++ b/tests/intel/xe_exec_threads.c
@@ -486,6 +486,7 @@ test_legacy_mode(int fd, uint32_t vm, uint64_t addr, uint64_t userptr,
 		uint64_t pad;
 		uint32_t data;
 	} *data;
+	struct xe_spin_opts spin_opts = { .preempt = false };
 	int i, j, b, hang_exec_queue = n_exec_queues / 2;
 	bool owns_vm = false, owns_fd = false;
 
@@ -562,15 +563,15 @@ test_legacy_mode(int fd, uint32_t vm, uint64_t addr, uint64_t userptr,
 		uint64_t batch_offset = (char *)&data[i].batch - (char *)data;
 		uint64_t batch_addr = addr + batch_offset;
 		uint64_t spin_offset = (char *)&data[i].spin - (char *)data;
-		uint64_t spin_addr = addr + spin_offset;
 		uint64_t sdi_offset = (char *)&data[i].data - (char *)data;
 		uint64_t sdi_addr = addr + sdi_offset;
 		uint64_t exec_addr;
 		int e = i % n_exec_queues;
 
 		if (flags & HANG && e == hang_exec_queue && i == e) {
-			xe_spin_init(&data[i].spin, spin_addr, false);
-			exec_addr = spin_addr;
+			spin_opts.addr = addr + spin_offset;
+			xe_spin_init(&data[i].spin, &spin_opts);
+			exec_addr = spin_opts.addr;
 		} else {
 			b = 0;
 			data[i].batch[b++] = MI_STORE_DWORD_IMM_GEN4;
diff --git a/tests/intel/xe_vm.c b/tests/intel/xe_vm.c
index e42c04e33..dc1850338 100644
--- a/tests/intel/xe_vm.c
+++ b/tests/intel/xe_vm.c
@@ -727,6 +727,7 @@ test_bind_execqueues_independent(int fd, struct drm_xe_engine_class_instance *ec
 		uint64_t pad;
 		uint32_t data;
 	} *data;
+	struct xe_spin_opts spin_opts = { .preempt = true };
 	int i, b;
 
 	vm = xe_vm_create(fd, DRM_XE_VM_CREATE_ASYNC_BIND_OPS, 0);
@@ -755,14 +756,14 @@ test_bind_execqueues_independent(int fd, struct drm_xe_engine_class_instance *ec
 		uint64_t sdi_offset = (char *)&data[i].data - (char *)data;
 		uint64_t sdi_addr = addr + sdi_offset;
 		uint64_t spin_offset = (char *)&data[i].spin - (char *)data;
-		uint64_t spin_addr = addr + spin_offset;
 		int e = i;
 
 		if (i == 0) {
 			/* Cork 1st exec_queue with a spinner */
-			xe_spin_init(&data[i].spin, spin_addr, true);
+			spin_opts.addr = addr + spin_offset;
+			xe_spin_init(&data[i].spin, &spin_opts);
 			exec.exec_queue_id = exec_queues[e];
-			exec.address = spin_addr;
+			exec.address = spin_opts.addr;
 			sync[0].flags &= ~DRM_XE_SYNC_SIGNAL;
 			sync[1].flags |= DRM_XE_SYNC_SIGNAL;
 			sync[1].handle = syncobjs[e];
-- 
2.30.2


* [igt-dev] [PATCH i-g-t 2/8] lib/xe_spin: fixed duration xe_spin capability
  2023-09-06 15:51 [igt-dev] [PATCH i-g-t 0/8] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 1/8] lib/xe_spin: xe_spin_opts for xe_spin initialization Marcin Bernatowicz
@ 2023-09-06 15:51 ` Marcin Bernatowicz
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 3/8] lib/igt_device_scan: Xe get integrated/discrete card functions Marcin Bernatowicz
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Marcin Bernatowicz @ 2023-09-06 15:51 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Extended the spinner with a fixed-duration capability. It allows
preparing fixed-duration (e.g. 10 ms) workloads and taking workloads/second
measurements, which is handy for scheduling tests.
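
The duration-to-ticks conversion behind a fixed-duration spinner can be sketched roughly as below. This is a minimal standalone illustration, not the library code: the demo_* names are hypothetical, and the timestamp frequency is passed in by the caller rather than queried from a device.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000ull

/* Round-up division that asserts instead of silently overflowing. */
static uint64_t demo_div_round_up(uint64_t x, uint64_t y)
{
	assert(y > 0);
	assert(x <= UINT64_MAX - (y - 1));
	return (x + y - 1) / y;
}

/* freq_hz is a hypothetical timestamp frequency supplied by the caller. */
static uint64_t demo_ns_to_ticks(uint64_t duration_ns, uint32_t freq_hz)
{
	/* Guard the multiplication as well as the division. */
	assert(freq_hz == 0 || duration_ns <= UINT64_MAX / freq_hz);
	return demo_div_round_up(duration_ns * freq_hz, DEMO_NSEC_PER_SEC);
}
```

With an assumed 19.2 MHz timestamp clock, a 10 ms duration (10000000 ns) converts to 192000 ticks; rounding up means any non-zero duration yields at least one tick.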

v2: - added asserts in div64_u64_round_up, duration_to_ctx_ticks,
      simplified loop_addr (Zbyszek)
    - corrected patch title (Kamil)

v3: - div64_u64_round_up assert on overflow (Kamil)
    - enum indentation cleanup in xe_spin.c (Kamil)

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 lib/xe/xe_spin.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++-
 lib/xe/xe_spin.h |  8 +++-
 2 files changed, 103 insertions(+), 2 deletions(-)

diff --git a/lib/xe/xe_spin.c b/lib/xe/xe_spin.c
index 27f837ef9..54ae2d3ac 100644
--- a/lib/xe/xe_spin.c
+++ b/lib/xe/xe_spin.c
@@ -16,6 +16,50 @@
 #include "xe_ioctl.h"
 #include "xe_spin.h"
 
+static uint32_t read_timestamp_frequency(int fd, int gt_id)
+{
+	struct xe_device *dev = xe_device_get(fd);
+
+	igt_assert(dev && dev->gts && dev->gts->num_gt);
+	igt_assert(gt_id >= 0 && gt_id < dev->gts->num_gt);
+
+	return dev->gts->gts[gt_id].clock_freq;
+}
+
+static uint64_t div64_u64_round_up(const uint64_t x, const uint64_t y)
+{
+	igt_assert(y > 0);
+	igt_assert_lte_u64(x, UINT64_MAX - (y - 1));
+
+	return (x + y - 1) / y;
+}
+
+/**
+ * duration_to_ctx_ticks:
+ * @fd: opened device
+ * @gt_id: tile id
+ * @duration_ns: duration in nanoseconds to be converted to context timestamp ticks
+ * @return: duration converted to context timestamp ticks.
+ */
+uint32_t duration_to_ctx_ticks(int fd, int gt_id, uint64_t duration_ns)
+{
+	uint32_t f = read_timestamp_frequency(fd, gt_id);
+	uint64_t ctx_ticks = div64_u64_round_up(duration_ns * f, NSEC_PER_SEC);
+
+	igt_assert_lt_u64(ctx_ticks, XE_SPIN_MAX_CTX_TICKS);
+
+	return ctx_ticks;
+}
+
+#define MI_SRM_CS_MMIO				(1 << 19)
+#define MI_LRI_CS_MMIO				(1 << 19)
+#define MI_LRR_DST_CS_MMIO			(1 << 19)
+#define MI_LRR_SRC_CS_MMIO			(1 << 18)
+#define CTX_TIMESTAMP 0x3a8
+#define CS_GPR(x) (0x600 + 8 * (x))
+
+enum { START_TS, NOW_TS };
+
 /**
  * xe_spin_init:
  * @spin: pointer to mapped bo in which spinner code will be written
@@ -23,13 +67,28 @@
  */
 void xe_spin_init(struct xe_spin *spin, struct xe_spin_opts *opts)
 {
-	uint64_t loop_addr = opts->addr + offsetof(struct xe_spin, batch);
+	uint64_t loop_addr;
 	uint64_t start_addr = opts->addr + offsetof(struct xe_spin, start);
 	uint64_t end_addr = opts->addr + offsetof(struct xe_spin, end);
+	uint64_t ticks_delta_addr = opts->addr + offsetof(struct xe_spin, ticks_delta);
+	uint64_t pad_addr = opts->addr + offsetof(struct xe_spin, pad);
 	int b = 0;
 
 	spin->start = 0;
 	spin->end = 0xffffffff;
+	spin->ticks_delta = 0;
+
+	if (opts->ctx_ticks) {
+		/* store start timestamp */
+		spin->batch[b++] = MI_LOAD_REGISTER_IMM(1) | MI_LRI_CS_MMIO;
+		spin->batch[b++] = CS_GPR(START_TS) + 4;
+		spin->batch[b++] = 0;
+		spin->batch[b++] = MI_LOAD_REGISTER_REG | MI_LRR_DST_CS_MMIO | MI_LRR_SRC_CS_MMIO;
+		spin->batch[b++] = CTX_TIMESTAMP;
+		spin->batch[b++] = CS_GPR(START_TS);
+	}
+
+	loop_addr = opts->addr + b * sizeof(uint32_t);
 
 	spin->batch[b++] = MI_STORE_DWORD_IMM_GEN4;
 	spin->batch[b++] = start_addr;
@@ -39,6 +98,42 @@ void xe_spin_init(struct xe_spin *spin, struct xe_spin_opts *opts)
 	if (opts->preempt)
 		spin->batch[b++] = (0x5 << 23);
 
+	if (opts->ctx_ticks) {
+		spin->batch[b++] = MI_LOAD_REGISTER_IMM(1) | MI_LRI_CS_MMIO;
+		spin->batch[b++] = CS_GPR(NOW_TS) + 4;
+		spin->batch[b++] = 0;
+		spin->batch[b++] = MI_LOAD_REGISTER_REG | MI_LRR_DST_CS_MMIO | MI_LRR_SRC_CS_MMIO;
+		spin->batch[b++] = CTX_TIMESTAMP;
+		spin->batch[b++] = CS_GPR(NOW_TS);
+
+		/* delta = now - start; inverted to match COND_BBE */
+		spin->batch[b++] = MI_MATH(4);
+		spin->batch[b++] = MI_MATH_LOAD(MI_MATH_REG_SRCA, MI_MATH_REG(NOW_TS));
+		spin->batch[b++] = MI_MATH_LOAD(MI_MATH_REG_SRCB, MI_MATH_REG(START_TS));
+		spin->batch[b++] = MI_MATH_SUB;
+		spin->batch[b++] = MI_MATH_STOREINV(MI_MATH_REG(NOW_TS), MI_MATH_REG_ACCU);
+
+		/* Save delta for reading by COND_BBE */
+		spin->batch[b++] = MI_STORE_REGISTER_MEM | MI_SRM_CS_MMIO | 2;
+		spin->batch[b++] = CS_GPR(NOW_TS);
+		spin->batch[b++] = ticks_delta_addr;
+		spin->batch[b++] = ticks_delta_addr >> 32;
+
+		/* Delay between SRM and COND_BBE to post the writes */
+		for (int n = 0; n < 8; n++) {
+			spin->batch[b++] = MI_STORE_DWORD_IMM_GEN4;
+			spin->batch[b++] = pad_addr;
+			spin->batch[b++] = pad_addr >> 32;
+			spin->batch[b++] = 0xc0ffee;
+		}
+
+		/* Break if delta [time elapsed] > ns */
+		spin->batch[b++] = MI_COND_BATCH_BUFFER_END | MI_DO_COMPARE | 2;
+		spin->batch[b++] = ~(opts->ctx_ticks);
+		spin->batch[b++] = ticks_delta_addr;
+		spin->batch[b++] = ticks_delta_addr >> 32;
+	}
+
 	spin->batch[b++] = MI_COND_BATCH_BUFFER_END | MI_DO_COMPARE | 2;
 	spin->batch[b++] = 0;
 	spin->batch[b++] = end_addr;
diff --git a/lib/xe/xe_spin.h b/lib/xe/xe_spin.h
index 9f1d33294..f1abc1102 100644
--- a/lib/xe/xe_spin.h
+++ b/lib/xe/xe_spin.h
@@ -15,27 +15,33 @@
 #include "xe_query.h"
 #include "lib/igt_dummyload.h"
 
+#define XE_SPIN_MAX_CTX_TICKS (UINT32_MAX - 1000)
+
 /** struct xe_spin_opts
  *
  * @addr: offset of spinner within vm
  * @preempt: allow spinner to be preempted or not
+ * @ctx_ticks: number of ticks after which spinner is stopped, applied if > 0
  *
  * Used to initialize struct xe_spin spinner behavior.
  */
 struct xe_spin_opts {
 	uint64_t addr;
 	bool preempt;
+	uint32_t ctx_ticks;
 };
 
 /* Mapped GPU object */
 struct xe_spin {
-	uint32_t batch[16];
+	uint32_t batch[128];
 	uint64_t pad;
 	uint32_t start;
 	uint32_t end;
+	uint32_t ticks_delta;
 };
 
 igt_spin_t *xe_spin_create(int fd, const struct igt_spin_factory *opt);
+uint32_t duration_to_ctx_ticks(int fd, int gt_id, uint64_t ns);
 void xe_spin_init(struct xe_spin *spin, struct xe_spin_opts *opts);
 
 #define xe_spin_init_opts(fd, ...) \
-- 
2.30.2


* [igt-dev] [PATCH i-g-t 3/8] lib/igt_device_scan: Xe get integrated/discrete card functions
  2023-09-06 15:51 [igt-dev] [PATCH i-g-t 0/8] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 1/8] lib/xe_spin: xe_spin_opts for xe_spin initialization Marcin Bernatowicz
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 2/8] lib/xe_spin: fixed duration xe_spin capability Marcin Bernatowicz
@ 2023-09-06 15:51 ` Marcin Bernatowicz
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 4/8] benchmarks/gem_wsim: scale duration option fixes Marcin Bernatowicz
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Marcin Bernatowicz @ 2023-09-06 15:51 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Added Xe functions to get the first integrated/discrete card.

v2:
- renamed __find_first_i915_card to __find_first_intel_card_by_driver_name (Zbyszek)
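
The lookup by driver name can be sketched outside of IGT as below. This is an illustration under stated assumptions: the demo_* types are hypothetical, devices live in a plain array rather than the igt_devs list, and the integrated GPU is assumed to sit at PCI slot 0000:00:02.0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device record; not struct igt_device. */
struct demo_dev {
	const char *driver;
	const char *pci_slot;
};

/* Assumed slot of the integrated GPU, used only for this sketch. */
#define DEMO_INTEGRATED_SLOT "0000:00:02.0"

/* Returns the first device bound to drv_name of the requested kind, or NULL. */
static const struct demo_dev *
demo_find_first(const struct demo_dev *devs, size_t n,
		bool want_discrete, const char *drv_name)
{
	assert(drv_name);

	for (size_t i = 0; i < n; i++) {
		bool is_integrated;

		if (strcmp(devs[i].driver, drv_name))
			continue;

		is_integrated = !strcmp(devs[i].pci_slot, DEMO_INTEGRATED_SLOT);

		/* Discrete is wanted exactly when the device is not integrated. */
		if (want_discrete != is_integrated)
			return &devs[i];
	}

	return NULL;
}
```

Passing the driver name as a parameter is what lets one helper serve both the "i915" and "xe" wrappers.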

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 lib/igt_device_scan.c | 34 +++++++++++++++++++++++++---------
 lib/igt_device_scan.h |  2 ++
 2 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/lib/igt_device_scan.c b/lib/igt_device_scan.c
index ae69ed09f..151ce4593 100644
--- a/lib/igt_device_scan.c
+++ b/lib/igt_device_scan.c
@@ -769,25 +769,27 @@ __copy_dev_to_card(struct igt_device *dev, struct igt_device_card *card)
  * Iterate over all igt_devices array and find first discrete/integrated card.
  * card->pci_slot_name will be updated only if a card is found.
  */
-static bool __find_first_i915_card(struct igt_device_card *card, bool discrete)
+static bool __find_first_intel_card_by_driver_name(struct igt_device_card *card,
+				bool want_discrete, const char *drv_name)
 {
 	struct igt_device *dev;
-	int cmp;
+	int is_integrated;
 
+	igt_assert(drv_name);
 	memset(card, 0, sizeof(*card));
 
 	igt_list_for_each_entry(dev, &igt_devs.all, link) {
 
-		if (!is_pci_subsystem(dev) || strcmp(dev->driver, "i915"))
+		if (!is_pci_subsystem(dev) || strcmp(dev->driver, drv_name))
 			continue;
 
-		cmp = strncmp(dev->pci_slot_name, INTEGRATED_I915_GPU_PCI_ID,
-			      PCI_SLOT_NAME_SIZE);
+		is_integrated = !strncmp(dev->pci_slot_name, INTEGRATED_I915_GPU_PCI_ID,
+					 PCI_SLOT_NAME_SIZE);
 
-		if (discrete && cmp) {
+		if (want_discrete && !is_integrated) {
 			__copy_dev_to_card(dev, card);
 			return true;
-		} else if (!discrete && !cmp) {
+		} else if (!want_discrete && is_integrated) {
 			__copy_dev_to_card(dev, card);
 			return true;
 		}
@@ -800,14 +802,28 @@ bool igt_device_find_first_i915_discrete_card(struct igt_device_card *card)
 {
 	igt_assert(card);
 
-	return __find_first_i915_card(card, true);
+	return __find_first_intel_card_by_driver_name(card, true, "i915");
+}
+
+bool igt_device_find_first_xe_discrete_card(struct igt_device_card *card)
+{
+	igt_assert(card);
+
+	return __find_first_intel_card_by_driver_name(card, true, "xe");
 }
 
 bool igt_device_find_integrated_card(struct igt_device_card *card)
 {
 	igt_assert(card);
 
-	return __find_first_i915_card(card, false);
+	return __find_first_intel_card_by_driver_name(card, false, "i915");
+}
+
+bool igt_device_find_xe_integrated_card(struct igt_device_card *card)
+{
+	igt_assert(card);
+
+	return __find_first_intel_card_by_driver_name(card, false, "xe");
 }
 
 static struct igt_device *igt_device_from_syspath(const char *syspath)
diff --git a/lib/igt_device_scan.h b/lib/igt_device_scan.h
index e6b0f1b90..b8f6a843d 100644
--- a/lib/igt_device_scan.h
+++ b/lib/igt_device_scan.h
@@ -87,6 +87,8 @@ bool igt_device_card_match_pci(const char *filter,
 	struct igt_device_card *card);
 bool igt_device_find_first_i915_discrete_card(struct igt_device_card *card);
 bool igt_device_find_integrated_card(struct igt_device_card *card);
+bool igt_device_find_first_xe_discrete_card(struct igt_device_card *card);
+bool igt_device_find_xe_integrated_card(struct igt_device_card *card);
 char *igt_device_get_pretty_name(struct igt_device_card *card, bool numeric);
 int igt_open_card(struct igt_device_card *card);
 int igt_open_render(struct igt_device_card *card);
-- 
2.30.2


* [igt-dev] [PATCH i-g-t 4/8] benchmarks/gem_wsim: scale duration option fixes
  2023-09-06 15:51 [igt-dev] [PATCH i-g-t 0/8] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (2 preceding siblings ...)
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 3/8] lib/igt_device_scan: Xe get integrated/discrete card functions Marcin Bernatowicz
@ 2023-09-06 15:51 ` Marcin Bernatowicz
  2023-09-20 16:06   ` Tvrtko Ursulin
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 5/8] benchmarks/gem_wsim: cleanups Marcin Bernatowicz
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Marcin Bernatowicz @ 2023-09-06 15:51 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Fixed the duration range check when the scale duration (-f) command line
option is provided; the PERIOD step now takes scale duration into account.
Moved duration parsing code from parse_workload to a separate function.
Moved the wsim_err and __duration definitions before parse_duration.
Moved unbound_duration from struct w_step to struct duration.
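
The range check fix can be illustrated with a small standalone parser. This is a sketch, not the gem_wsim code: the demo_* names are hypothetical, the '*' (unbound) form is left out, and rounding is done without libm. The point it shows is that the upper bound is compared against the lower bound after both have been scaled.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

struct demo_duration {
	unsigned int min, max;
};

static long demo_scale(long dur, double scale)
{
	/* Round to nearest; inputs are positive here. */
	return (long)(scale * (double)dur + 0.5);
}

/* Parses "N" or "N-M"; returns 0 on success, -1 on a malformed or empty range. */
static int demo_parse_duration(const char *desc, double scale,
			       struct demo_duration *out)
{
	char *sep = NULL;
	long lo, hi;

	assert(desc && out);

	lo = strtol(desc, &sep, 10);
	if (lo <= 0 || lo == LONG_MAX)
		return -1;
	lo = demo_scale(lo, scale);

	if (*sep == '-') {
		hi = strtol(sep + 1, NULL, 10);
		if (hi <= 0 || hi == LONG_MAX)
			return -1;
		hi = demo_scale(hi, scale);
		/* Compare after scaling so both ends use the same units. */
		if (hi <= lo)
			return -1;
	} else {
		hi = lo;
	}

	out->min = (unsigned int)lo;
	out->max = (unsigned int)hi;
	return 0;
}
```

For example, "100-200" with a scale of 0.5 yields the range 50 to 100, while the same string with a scale of 0.001 is rejected because both ends round to the same value.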

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 109 +++++++++++++++++++++++-------------------
 1 file changed, 61 insertions(+), 48 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 7b5e62a3b..f4024deb1 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -73,6 +73,7 @@ enum intel_engine_id {
 
 struct duration {
 	unsigned int min, max;
+	bool unbound_duration;
 };
 
 enum w_type
@@ -145,7 +146,6 @@ struct w_step
 	unsigned int context;
 	unsigned int engine;
 	struct duration duration;
-	bool unbound_duration;
 	struct deps data_deps;
 	struct deps fence_deps;
 	int emit_fence;
@@ -240,6 +240,19 @@ static struct drm_i915_gem_context_param_sseu device_sseu = {
 #define DEPSYNC		(1<<2)
 #define SSEU		(1<<3)
 
+static void __attribute__((format(printf, 1, 2)))
+wsim_err(const char *fmt, ...)
+{
+	va_list ap;
+
+	if (!verbose)
+		return;
+
+	va_start(ap, fmt);
+	vfprintf(stderr, fmt, ap);
+	va_end(ap);
+}
+
 static const char *ring_str_map[NUM_ENGINES] = {
 	[DEFAULT] = "DEFAULT",
 	[RCS] = "RCS",
@@ -429,17 +442,43 @@ out:
 	return ret;
 }
 
-static void __attribute__((format(printf, 1, 2)))
-wsim_err(const char *fmt, ...)
+static long __duration(long dur, double scale)
 {
-	va_list ap;
+	return round(scale * dur);
+}
 
-	if (!verbose)
-		return;
+static int
+parse_duration(unsigned int nr_steps, struct duration *dur, double scale_dur, char *_desc)
+{
+	char *sep = NULL;
+	long tmpl;
 
-	va_start(ap, fmt);
-	vfprintf(stderr, fmt, ap);
-	va_end(ap);
+	if (_desc[0] == '*') {
+		if (intel_gen(intel_get_drm_devid(fd)) < 8) {
+			wsim_err("Infinite batch at step %u needs Gen8+!\n", nr_steps);
+			return -1;
+		}
+		dur->unbound_duration = true;
+	} else {
+		tmpl = strtol(_desc, &sep, 10);
+		if (tmpl <= 0 || tmpl == LONG_MIN || tmpl == LONG_MAX)
+			return -1;
+
+		dur->min = __duration(tmpl, scale_dur);
+
+		if (sep && *sep == '-') {
+			tmpl = strtol(sep + 1, NULL, 10);
+			if (tmpl <= 0 || __duration(tmpl, scale_dur) <= dur->min ||
+			    tmpl == LONG_MIN || tmpl == LONG_MAX)
+				return -1;
+
+			dur->max = __duration(tmpl, scale_dur);
+		} else {
+			dur->max = dur->min;
+		}
+	}
+
+	return 0;
 }
 
 #define check_arg(cond, fmt, ...) \
@@ -855,11 +894,6 @@ static uint64_t engine_list_mask(const char *_str)
 static unsigned long
 allocate_working_set(struct workload *wrk, struct working_set *set);
 
-static long __duration(long dur, double scale)
-{
-	return round(scale * dur);
-}
-
 #define int_field(_STEP_, _FIELD_, _COND_, _ERR_) \
 	if ((field = strtok_r(fstart, ".", &fctx))) { \
 		tmp = atoi(field); \
@@ -899,8 +933,14 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 				int_field(DELAY, delay, tmp <= 0,
 					  "Invalid delay at step %u!\n");
 			} else if (!strcmp(field, "p")) {
-				int_field(PERIOD, period, tmp <= 0,
-					  "Invalid period at step %u!\n");
+				field = strtok_r(fstart, ".", &fctx);
+				if (field) {
+					tmp = atoi(field);
+					check_arg(tmp <= 0, "Invalid period at step %u!\n", nr_steps);
+					step.type = PERIOD;
+					step.period = __duration(tmp, scale_dur);
+					goto add_step;
+				}
 			} else if (!strcmp(field, "P")) {
 				unsigned int nr = 0;
 				while ((field = strtok_r(fstart, ".", &fctx))) {
@@ -1121,38 +1161,11 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 		}
 
 		if ((field = strtok_r(fstart, ".", &fctx))) {
-			char *sep = NULL;
-			long int tmpl;
-
 			fstart = NULL;
 
-			if (field[0] == '*') {
-				check_arg(intel_gen(intel_get_drm_devid(fd)) < 8,
-					  "Infinite batch at step %u needs Gen8+!\n",
-					  nr_steps);
-				step.unbound_duration = true;
-			} else {
-				tmpl = strtol(field, &sep, 10);
-				check_arg(tmpl <= 0 || tmpl == LONG_MIN ||
-					  tmpl == LONG_MAX,
-					  "Invalid duration at step %u!\n",
-					  nr_steps);
-				step.duration.min = __duration(tmpl, scale_dur);
-
-				if (sep && *sep == '-') {
-					tmpl = strtol(sep + 1, NULL, 10);
-					check_arg(tmpl <= 0 ||
-						tmpl <= step.duration.min ||
-						tmpl == LONG_MIN ||
-						tmpl == LONG_MAX,
-						"Invalid duration range at step %u!\n",
-						nr_steps);
-					step.duration.max = __duration(tmpl,
-								       scale_dur);
-				} else {
-					step.duration.max = step.duration.min;
-				}
-			}
+			tmp = parse_duration(nr_steps, &step.duration, scale_dur, field);
+			check_arg(tmp < 0,
+				  "Invalid duration at step %u!\n", nr_steps);
 
 			valid++;
 		}
@@ -2172,7 +2185,7 @@ update_bb_start(struct workload *wrk, struct w_step *w)
 
 	/* ticks is inverted for MI_DO_COMPARE (less-than comparison) */
 	ticks = 0;
-	if (!w->unbound_duration)
+	if (!w->duration.unbound_duration)
 		ticks = ~ns_to_ctx_ticks(1000 * get_duration(wrk, w));
 
 	*w->bb_duration = ticks;
@@ -2349,7 +2362,7 @@ static void *run_workload(void *data)
 
 				igt_assert(t_idx >= 0 && t_idx < i);
 				igt_assert(wrk->steps[t_idx].type == BATCH);
-				igt_assert(wrk->steps[t_idx].unbound_duration);
+				igt_assert(wrk->steps[t_idx].duration.unbound_duration);
 
 				*wrk->steps[t_idx].bb_duration = 0xffffffff;
 				__sync_synchronize();
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread
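
The duration handling that the patch above factors out into parse_duration() can be sketched in isolation. This is a minimal, hedged illustration and not the patch itself: struct duration_sketch, scale_duration() and parse_duration_sketch() are hypothetical names, the Gen8+ check for '*' is left out because it needs a device, and rounding is done by hand instead of with round(). It shows the one behavioural point of the refactor: the upper bound of a range is scaled before it is compared with the already scaled minimum, whereas the removed inline code compared the unscaled upper bound.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical mirror of struct duration from the patch. */
struct duration_sketch {
	unsigned int min, max;
	bool unbound;
};

/* Scale and round to nearest; callers validate that dur is positive. */
static long scale_duration(long dur, double scale)
{
	return (long)(scale * (double)dur + 0.5);
}

/* Accepts "<uint>", "<uint>-<uint>" or "*". Returns 0 on success, -1 on error. */
static int parse_duration_sketch(const char *desc, double scale,
				 struct duration_sketch *dur)
{
	char *sep = NULL;
	long lo, hi;

	dur->min = dur->max = 0;
	dur->unbound = false;

	if (desc[0] == '*') {
		dur->unbound = true;
		return 0;
	}

	lo = strtol(desc, &sep, 10);
	if (lo <= 0 || lo == LONG_MAX)
		return -1;
	lo = scale_duration(lo, scale);
	hi = lo;

	if (*sep == '-') {
		hi = strtol(sep + 1, NULL, 10);
		if (hi <= 0 || hi == LONG_MAX)
			return -1;
		hi = scale_duration(hi, scale);
		/* Compare scaled against scaled, as the refactored code does. */
		if (hi <= lo)
			return -1;
	}

	dur->min = (unsigned int)lo;
	dur->max = (unsigned int)hi;
	return 0;
}
```

With a scale of 0.5, the field "100-500" yields min 50 and max 250, while an inverted range such as "500-100" is rejected.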

* [igt-dev] [PATCH i-g-t 5/8] benchmarks/gem_wsim: cleanups
  2023-09-06 15:51 [igt-dev] [PATCH i-g-t 0/8] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (3 preceding siblings ...)
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 4/8] benchmarks/gem_wsim: scale duration option fixes Marcin Bernatowicz
@ 2023-09-06 15:51 ` Marcin Bernatowicz
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 6/8] benchmarks/gem_wsim: allow comments in workload description files Marcin Bernatowicz
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Marcin Bernatowicz @ 2023-09-06 15:51 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Clean up warnings and errors reported by checkpatch.pl.
Remove the unused fence_signal field from struct w_step.
Use calloc instead of malloc for struct workload in parse_workload.
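
A short sketch of why the calloc change matters, under the assumption that the motivation is zero-initialisation of fields that are not set explicitly. struct workload_sketch and alloc_workload_sketch() are hypothetical stand-ins, not the real struct workload.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, much reduced stand-in for struct workload. */
struct workload_sketch {
	unsigned int nr_steps;
	void *steps;
	int max_working_set_id;
};

/*
 * calloc() returns zero-filled memory, so every field that is not
 * assigned below starts out as 0 / NULL on common platforms.
 * With malloc() those fields would hold indeterminate values.
 */
static struct workload_sketch *alloc_workload_sketch(unsigned int nr_steps)
{
	struct workload_sketch *wrk = calloc(1, sizeof(*wrk));

	if (!wrk)
		return NULL;

	wrk->nr_steps = nr_steps;
	return wrk;
}
```

Any field added to the structure later is then covered without touching the allocation site.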

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 56 ++++++++++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 22 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index f4024deb1..0c1b58727 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -1,3 +1,4 @@
+// SPDX-License-Identifier: MIT
 /*
  * Copyright © 2017 Intel Corporation
  *
@@ -76,8 +77,7 @@ struct duration {
 	bool unbound_duration;
 };
 
-enum w_type
-{
+enum w_type {
 	BATCH,
 	SYNC,
 	DELAY,
@@ -102,8 +102,7 @@ struct dep_entry {
 	int working_set; /* -1 = step dependecy, >= 0 working set id */
 };
 
-struct deps
-{
+struct deps {
 	int nr;
 	bool submit_fence;
 	struct dep_entry *list;
@@ -137,8 +136,7 @@ struct working_set {
 
 struct workload;
 
-struct w_step
-{
+struct w_step {
 	struct workload *wrk;
 
 	/* Workload step metadata */
@@ -155,7 +153,6 @@ struct w_step
 		int period;
 		int target;
 		int throttle;
-		int fence_signal;
 		int priority;
 		struct {
 			unsigned int engine_map_count;
@@ -194,8 +191,7 @@ struct ctx {
 	uint64_t sseu;
 };
 
-struct workload
-{
+struct workload {
 	unsigned int id;
 
 	unsigned int nr_steps;
@@ -846,6 +842,7 @@ static int add_buffers(struct working_set *set, char *str)
 
 	for (i = 0; i < add; i++) {
 		struct work_buffer_size *sz = &sizes[set->nr + i];
+
 		sz->min = min_sz;
 		sz->max = max_sz;
 		sz->size = 0;
@@ -895,13 +892,16 @@ static unsigned long
 allocate_working_set(struct workload *wrk, struct working_set *set);
 
 #define int_field(_STEP_, _FIELD_, _COND_, _ERR_) \
-	if ((field = strtok_r(fstart, ".", &fctx))) { \
-		tmp = atoi(field); \
-		check_arg(_COND_, _ERR_, nr_steps); \
-		step.type = _STEP_; \
-		step._FIELD_ = tmp; \
-		goto add_step; \
-	} \
+	do { \
+		field = strtok_r(fstart, ".", &fctx); \
+		if (field) { \
+			tmp = atoi(field); \
+			check_arg(_COND_, _ERR_, nr_steps); \
+			step.type = _STEP_; \
+			step._FIELD_ = tmp; \
+			goto add_step; \
+		} \
+	} while (0)
 
 static struct workload *
 parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
@@ -926,7 +926,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 		valid = 0;
 		memset(&step, 0, sizeof(step));
 
-		if ((field = strtok_r(fstart, ".", &fctx))) {
+		field = strtok_r(fstart, ".", &fctx);
+		if (field) {
 			fstart = NULL;
 
 			if (!strcmp(field, "d")) {
@@ -943,6 +944,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 				}
 			} else if (!strcmp(field, "P")) {
 				unsigned int nr = 0;
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(nr == 0 && tmp <= 0,
@@ -968,6 +970,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 					  "Invalid sync target at step %u!\n");
 			} else if (!strcmp(field, "S")) {
 				unsigned int nr = 0;
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(tmp <= 0 && nr == 0,
@@ -1004,6 +1007,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 				goto add_step;
 			} else if (!strcmp(field, "M")) {
 				unsigned int nr = 0;
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(nr == 0 && tmp <= 0,
@@ -1034,6 +1038,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 					  "Invalid terminate target at step %u!\n");
 			} else if (!strcmp(field, "X")) {
 				unsigned int nr = 0;
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(nr == 0 && tmp <= 0,
@@ -1058,6 +1063,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 				goto add_step;
 			} else if (!strcmp(field, "B")) {
 				unsigned int nr = 0;
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(nr == 0 && tmp <= 0,
@@ -1077,6 +1083,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 				goto add_step;
 			} else if (!strcmp(field, "b")) {
 				unsigned int nr = 0;
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					check_arg(nr > 2,
 						  "Invalid bond format at step %u!\n",
@@ -1148,7 +1155,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			valid++;
 		}
 
-		if ((field = strtok_r(fstart, ".", &fctx))) {
+		field = strtok_r(fstart, ".", &fctx);
+		if (field) {
 			fstart = NULL;
 
 			i = str_to_engine(field);
@@ -1160,7 +1168,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			step.engine = i;
 		}
 
-		if ((field = strtok_r(fstart, ".", &fctx))) {
+		field = strtok_r(fstart, ".", &fctx);
+		if (field) {
 			fstart = NULL;
 
 			tmp = parse_duration(nr_steps, &step.duration, scale_dur, field);
@@ -1170,7 +1179,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			valid++;
 		}
 
-		if ((field = strtok_r(fstart, ".", &fctx))) {
+		field = strtok_r(fstart, ".", &fctx);
+		if (field) {
 			fstart = NULL;
 
 			tmp = parse_dependencies(nr_steps, &step, field);
@@ -1180,7 +1190,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			valid++;
 		}
 
-		if ((field = strtok_r(fstart, ".", &fctx))) {
+		field = strtok_r(fstart, ".", &fctx);
+		if (field) {
 			fstart = NULL;
 
 			check_arg(strlen(field) != 1 ||
@@ -1224,7 +1235,7 @@ add_step:
 		nr_steps += app_w->nr_steps;
 	}
 
-	wrk = malloc(sizeof(*wrk));
+	wrk = calloc(1, sizeof(*wrk));
 	igt_assert(wrk);
 
 	wrk->nr_steps = nr_steps;
@@ -2717,6 +2728,7 @@ int main(int argc, char **argv)
 
 	if (append_workload_arg) {
 		struct w_arg arg = { NULL, append_workload_arg, 0 };
+
 		app_w = parse_workload(&arg, flags, scale_dur, scale_time,
 				       NULL);
 		if (!app_w) {
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [igt-dev] [PATCH i-g-t 6/8] benchmarks/gem_wsim: allow comments in workload description files
  2023-09-06 15:51 [igt-dev] [PATCH i-g-t 0/8] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (4 preceding siblings ...)
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 5/8] benchmarks/gem_wsim: cleanups Marcin Bernatowicz
@ 2023-09-06 15:51 ` Marcin Bernatowicz
  2023-09-20 16:13   ` Tvrtko Ursulin
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 7/8] benchmarks/gem_wsim: extract prepare_ctxs function, add w_sync Marcin Bernatowicz
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Marcin Bernatowicz @ 2023-09-06 15:51 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Lines starting with '#' are skipped.
If the command-line step separator (',') is encountered after '#',
it is replaced with ';' so that it does not break parsing.
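
The rewrite described above can be sketched as a standalone helper. This is an illustration modelled on the loop added to load_workload_descriptor() in this patch; flatten_descriptor_sketch() is a hypothetical name, and the extra bounds check before the newline test is an addition of the sketch, not something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Rewrite buf in place:
 *  - inside a '#' comment (up to end of line) turn ',' into ';'
 *  - turn every newline into the step separator ','
 */
static void flatten_descriptor_sketch(char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (buf[i] == '#') {
			while (++i < len && buf[i] != '\n')
				if (buf[i] == ',')
					buf[i] = ';';
		}

		/* i may equal len here if a comment ran to the end of the buffer. */
		if (i < len && buf[i] == '\n')
			buf[i] = ',';
	}
}
```

For the two-line input "# a, b" followed by "1.VCS.100.0.1", the result is the single string "# a; b,1.VCS.100.0.1,", so the comment stays one token, which the parser can then recognise by its leading '#' and skip.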

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c  | 41 ++++++++++++++++++++++++++++++++---------
 benchmarks/wsim/README |  2 ++
 2 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 0c1b58727..ec9fdc2d0 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -43,6 +43,7 @@
 #include <limits.h>
 #include <pthread.h>
 #include <math.h>
+#include <ctype.h>
 
 #include "drm.h"
 #include "drmtest.h"
@@ -94,6 +95,7 @@ enum w_type {
 	TERMINATE,
 	SSEU,
 	WORKINGSET,
+	SKIP,
 };
 
 struct dep_entry {
@@ -930,6 +932,12 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 		if (field) {
 			fstart = NULL;
 
+			/* line starting with # is a comment */
+			if (field[0] == '#') {
+				step.type = SKIP;
+				goto add_step;
+			}
+
 			if (!strcmp(field, "d")) {
 				int_field(DELAY, delay, tmp <= 0,
 					  "Invalid delay at step %u!\n");
@@ -1194,7 +1202,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 		if (field) {
 			fstart = NULL;
 
-			check_arg(strlen(field) != 1 ||
+			check_arg(!strlen(field) ||
+				  (strlen(field) > 1 && !isspace(field[1]) && field[1] != '#') ||
 				  (field[0] != '0' && field[0] != '1'),
 				  "Invalid wait boolean at step %u!\n",
 				  nr_steps);
@@ -1208,18 +1217,23 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 		step.type = BATCH;
 
 add_step:
-		if (step.type == DELAY)
-			step.delay = __duration(step.delay, scale_time);
+		if (step.type == SKIP) {
+			if (verbose > 3)
+				printf("skipped STEP: %s\n", _token);
+		} else {
+			if (step.type == DELAY)
+				step.delay = __duration(step.delay, scale_time);
 
-		step.idx = nr_steps++;
-		step.request = -1;
-		steps = realloc(steps, sizeof(step) * nr_steps);
-		igt_assert(steps);
+			step.idx = nr_steps++;
+			step.request = -1;
+			steps = realloc(steps, sizeof(step) * nr_steps);
+			igt_assert(steps);
 
-		memcpy(&steps[nr_steps - 1], &step, sizeof(step));
+			memcpy(&steps[nr_steps - 1], &step, sizeof(step));
+		}
 
 		free(token);
-	}
+	} // while ((_token = strtok_r(tstart, ",", &tctx))) {
 
 	if (app_w) {
 		steps = realloc(steps, sizeof(step) *
@@ -2304,6 +2318,8 @@ static void *run_workload(void *data)
 			enum intel_engine_id engine = w->engine;
 			int do_sleep = 0;
 
+			igt_assert(w->type != SKIP);
+
 			if (w->type == DELAY) {
 				do_sleep = w->delay;
 			} else if (w->type == PERIOD) {
@@ -2543,6 +2559,13 @@ static char *load_workload_descriptor(char *filename)
 	close(infd);
 
 	for (i = 0; i < len; i++) {
+		/* '#' starts comment till end of line */
+		if (buf[i] == '#')
+			/* replace ',' in comments to not break parsing */
+			while (++i < len && buf[i] != '\n')
+				if (buf[i] == ',')
+					buf[i] = ';';
+
 		if (buf[i] == '\n')
 			buf[i] = ',';
 	}
diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
index 8c71f2fe6..e4fd61645 100644
--- a/benchmarks/wsim/README
+++ b/benchmarks/wsim/README
@@ -1,6 +1,8 @@
 Workload descriptor format
 ==========================
 
+Lines starting with '#' are treated as comments (do not create work step).
+
 ctx.engine.duration_us.dependency.wait,...
 <uint>.<str>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
 B.<uint>
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [igt-dev] [PATCH i-g-t 7/8] benchmarks/gem_wsim: extract prepare_ctxs function, add w_sync
  2023-09-06 15:51 [igt-dev] [PATCH i-g-t 0/8] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (5 preceding siblings ...)
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 6/8] benchmarks/gem_wsim: allow comments in workload description files Marcin Bernatowicz
@ 2023-09-06 15:51 ` Marcin Bernatowicz
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 8/8] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
  2023-09-06 21:01 ` [igt-dev] ✗ Fi.CI.BAT: failure for benchmarks/gem_wsim: added basic xe support (rev2) Patchwork
  8 siblings, 0 replies; 22+ messages in thread
From: Marcin Bernatowicz @ 2023-09-06 15:51 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Code reorganization only, no functional changes.
Extracted the prepare_ctxs function from prepare_workload.
Added the w_sync abstraction for workload step synchronization.
These changes allow a cleaner xe integration.
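
The value of a single w_sync() entry point can be illustrated with a sketch in which the real blocking waits are replaced by counting stubs. Everything here is hypothetical (struct step_sketch, the fake_* functions, backend_is_xe); in the series itself the wrapper calls gem_sync(), and patch 8/8 adds a syncobj_wait() branch for xe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct w_step: only what the sketch needs. */
struct step_sketch {
	uint32_t bo_handle;	/* i915 style: wait on the batch buffer object */
	uint32_t syncobj;	/* xe style: wait on a sync object */
};

/* Counters replace the real blocking waits so the sketch runs anywhere. */
static unsigned int bo_waits, syncobj_waits;
static uint32_t last_waited;

static void fake_bo_wait(int fd, uint32_t handle)
{
	(void)fd;
	bo_waits++;
	last_waited = handle;
}

static void fake_syncobj_wait(int fd, uint32_t handle)
{
	(void)fd;
	syncobj_waits++;
	last_waited = handle;
}

static bool backend_is_xe;

/* All call sites go through this one function, so only it knows the backend. */
static void w_sync_sketch(int fd, struct step_sketch *w)
{
	if (backend_is_xe)
		fake_syncobj_wait(fd, w->syncobj);
	else
		fake_bo_wait(fd, w->bo_handle);
}
```

Because every caller uses the wrapper, adding a second backend touches one function instead of each of the call sites converted in this patch.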

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 145 ++++++++++++++++++++++++------------------
 1 file changed, 82 insertions(+), 63 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index ec9fdc2d0..d807a9d7d 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -261,6 +261,11 @@ static const char *ring_str_map[NUM_ENGINES] = {
 	[VECS] = "VECS",
 };
 
+static void w_sync(int fd_, struct w_step *w)
+{
+	gem_sync(fd_, w->obj[0].handle);
+}
+
 static int read_timestamp_frequency(int i915)
 {
 	int value = 0;
@@ -1886,20 +1891,13 @@ static void measure_active_set(struct workload *wrk)
 
 #define alloca0(sz) ({ size_t sz__ = (sz); memset(alloca(sz__), 0, sz__); })
 
-static int prepare_workload(unsigned int id, struct workload *wrk)
+static int prepare_ctxs(unsigned int id, struct workload *wrk)
 {
-	struct working_set **sets;
-	unsigned long total = 0;
 	uint32_t share_vm = 0;
 	int max_ctx = -1;
 	struct w_step *w;
 	int i, j;
 
-	wrk->id = id;
-	wrk->bb_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
-	wrk->bo_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
-	wrk->run = true;
-
 	/*
 	 * Pre-scan workload steps to allocate context list storage.
 	 */
@@ -2088,6 +2086,21 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
 	if (share_vm)
 		vm_destroy(fd, share_vm);
 
+	return 0;
+}
+
+static int prepare_workload(unsigned int id, struct workload *wrk)
+{
+	struct w_step *w;
+	int i, j;
+
+	wrk->id = id;
+	wrk->bb_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
+	wrk->bo_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
+	wrk->run = true;
+
+	prepare_ctxs(id, wrk);
+
 	/* Record default preemption. */
 	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
 		if (w->type == BATCH)
@@ -2108,75 +2121,81 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
 		for (j = i + 1; j < wrk->nr_steps; j++) {
 			w2 = &wrk->steps[j];
 
-			if (w2->context != w->context)
-				continue;
-			else if (w2->type == PREEMPTION)
+				if (w2->context != w->context)
+					continue;
+
+			if (w2->type == PREEMPTION)
 				break;
-			else if (w2->type != BATCH)
+			if (w2->type != BATCH)
 				continue;
 
 			w2->preempt_us = w->period;
 		}
 	}
 
-	/*
-	 * Scan for SSEU control steps.
-	 */
-	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
-		if (w->type == SSEU) {
-			get_device_sseu();
-			break;
+	{
+		struct working_set **sets;
+		unsigned long total = 0;
+
+		/*
+		 * Scan for SSEU control steps.
+		 */
+		for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+			if (w->type == SSEU) {
+				get_device_sseu();
+				break;
+			}
 		}
-	}
 
-	/*
-	 * Allocate working sets.
-	 */
-	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
-		if (w->type == WORKINGSET && !w->working_set.shared)
-			total += allocate_working_set(wrk, &w->working_set);
-	}
+		/*
+		 * Allocate working sets.
+		 */
+		for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+			if (w->type == WORKINGSET && !w->working_set.shared)
+				total += allocate_working_set(wrk, &w->working_set);
+		}
 
-	if (verbose > 2)
-		printf("%u: %lu bytes in working sets.\n", wrk->id, total);
+		if (verbose > 2)
+			printf("%u: %lu bytes in working sets.\n", wrk->id, total);
 
-	/*
-	 * Map of working set ids.
-	 */
-	wrk->max_working_set_id = -1;
-	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
-		if (w->type == WORKINGSET &&
-		    w->working_set.id > wrk->max_working_set_id)
-			wrk->max_working_set_id = w->working_set.id;
-	}
+		/*
+		 * Map of working set ids.
+		 */
+		wrk->max_working_set_id = -1;
+		for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+			if (w->type == WORKINGSET &&
+			w->working_set.id > wrk->max_working_set_id)
+				wrk->max_working_set_id = w->working_set.id;
+		}
 
-	sets = wrk->working_sets;
-	wrk->working_sets = calloc(wrk->max_working_set_id + 1,
-				   sizeof(*wrk->working_sets));
-	igt_assert(wrk->working_sets);
+		sets = wrk->working_sets;
+		wrk->working_sets = calloc(wrk->max_working_set_id + 1,
+					sizeof(*wrk->working_sets));
+		igt_assert(wrk->working_sets);
 
-	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
-		struct working_set *set;
+		for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+			struct working_set *set;
 
-		if (w->type != WORKINGSET)
-			continue;
+			if (w->type != WORKINGSET)
+				continue;
 
-		if (!w->working_set.shared) {
-			set = &w->working_set;
-		} else {
-			igt_assert(sets);
+			if (!w->working_set.shared) {
+				set = &w->working_set;
+			} else {
+				igt_assert(sets);
 
-			set = sets[w->working_set.id];
-			igt_assert(set->shared);
-			igt_assert(set->sizes);
+				set = sets[w->working_set.id];
+				igt_assert(set->shared);
+				igt_assert(set->sizes);
+			}
+
+			wrk->working_sets[w->working_set.id] = set;
 		}
 
-		wrk->working_sets[w->working_set.id] = set;
+		if (sets)
+			free(sets);
 	}
 
-	if (sets)
-		free(sets);
-
 	/*
 	 * Allocate batch buffers.
 	 */
@@ -2231,7 +2250,7 @@ static void w_sync_to(struct workload *wrk, struct w_step *w, int target)
 	igt_assert(target < wrk->nr_steps);
 	igt_assert(wrk->steps[target].type == BATCH);
 
-	gem_sync(fd, wrk->steps[target].obj[0].handle);
+	w_sync(fd, &wrk->steps[target]);
 }
 
 static void
@@ -2290,7 +2309,7 @@ static void sync_deps(struct workload *wrk, struct w_step *w)
 		igt_assert(dep_idx >= 0 && dep_idx < w->idx);
 		igt_assert(wrk->steps[dep_idx].type == BATCH);
 
-		gem_sync(fd, wrk->steps[dep_idx].obj[0].handle);
+		w_sync(fd, &wrk->steps[dep_idx]);
 	}
 }
 
@@ -2346,7 +2365,7 @@ static void *run_workload(void *data)
 
 				igt_assert(s_idx >= 0 && s_idx < i);
 				igt_assert(wrk->steps[s_idx].type == BATCH);
-				gem_sync(fd, wrk->steps[s_idx].obj[0].handle);
+				w_sync(fd, &wrk->steps[s_idx]);
 				continue;
 			} else if (w->type == THROTTLE) {
 				throttle = w->throttle;
@@ -2437,7 +2456,7 @@ static void *run_workload(void *data)
 				break;
 
 			if (w->sync)
-				gem_sync(fd, w->obj[0].handle);
+				w_sync(fd, w);
 
 			if (qd_throttle > 0) {
 				while (wrk->nrequest[engine] > qd_throttle) {
@@ -2446,7 +2465,7 @@ static void *run_workload(void *data)
 					s = igt_list_first_entry(&wrk->requests[engine],
 								 s, rq_link);
 
-					gem_sync(fd, s->obj[0].handle);
+						w_sync(fd, s);
 
 					s->request = -1;
 					igt_list_del(&s->rq_link);
@@ -2471,7 +2490,7 @@ static void *run_workload(void *data)
 				w->emit_fence = -1;
 			}
 		}
-	}
+	} // main loop
 
 	for (i = 0; i < NUM_ENGINES; i++) {
 		if (!wrk->nrequest[i])
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [igt-dev] [PATCH i-g-t 8/8] [RFC] benchmarks/gem_wsim: added basic xe support
  2023-09-06 15:51 [igt-dev] [PATCH i-g-t 0/8] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (6 preceding siblings ...)
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 7/8] benchmarks/gem_wsim: extract prepare_ctxs function, add w_sync Marcin Bernatowicz
@ 2023-09-06 15:51 ` Marcin Bernatowicz
  2023-09-21 15:57   ` Tvrtko Ursulin
  2023-09-06 21:01 ` [igt-dev] ✗ Fi.CI.BAT: failure for benchmarks/gem_wsim: added basic xe support (rev2) Patchwork
  8 siblings, 1 reply; 22+ messages in thread
From: Marcin Bernatowicz @ 2023-09-06 15:51 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Added basic xe support with a few examples.
A single binary handles both i915 and Xe devices,
but workload definitions differ between i915 and Xe.
Xe does not use the context abstraction; it introduces new VM and
Exec Queue steps, and the BATCH step references an exec queue.
For more details see wsim/README.
Some functionality is still missing: working sets and
load balancing (input is needed on if/how to do it in Xe; exec queue
width?).
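
As an illustration of the new Exec Queue step, the sketch below parses the field layout given in the comment above parse_exec_queue() in this patch: e.vm_idx.class.instance.compute_mode.job_timeout_ms. It is a hedged sketch, not the patch's parser: struct eq_sketch and parse_eq_sketch() are hypothetical, it uses strtol() rather than strtok_r()/atoi(), and it makes two assumptions of its own, namely that the first three fields are mandatory and that a negative timeout is an error. The patch's version appears to test `id` even when a field is absent, which the sketch avoids by rejecting short descriptors.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for struct exec_queue. */
struct eq_sketch {
	int vm_idx;
	int engine_class;
	int engine_instance;	/* -1 = virtual, >= 0 = instance id */
	bool compute_mode;
	int job_timeout_ms;	/* 0 = default */
};

/* Returns 0 on success, -1 on malformed input. */
static int parse_eq_sketch(const char *desc, struct eq_sketch *eq)
{
	long vals[5] = { 0, 0, 0, 0, 0 };
	const char *p;
	char *end = NULL;
	int n = 0;

	if (!desc || desc[0] != 'e' || desc[1] != '.')
		return -1;

	/* Read up to five '.'-separated integers after the "e." prefix. */
	p = desc + 2;
	while (n < 5) {
		vals[n] = strtol(p, &end, 10);
		if (end == p)
			return -1;	/* empty or non-numeric field */
		n++;
		if (*end != '.')
			break;
		p = end + 1;
	}

	/* Reject trailing text, extra fields and missing mandatory fields. */
	if (*end != '\0' || n < 3)
		return -1;
	if (vals[0] < 0 || vals[0] > INT_MAX)
		return -1;
	if (vals[1] < 0 || vals[1] > 255)
		return -1;
	if (vals[2] < -1 || vals[2] > 255)
		return -1;
	if (vals[4] < 0 || vals[4] > INT_MAX)
		return -1;

	eq->vm_idx = (int)vals[0];
	eq->engine_class = (int)vals[1];
	eq->engine_instance = (int)vals[2];
	eq->compute_mode = (vals[3] == 1);
	eq->job_timeout_ms = (int)vals[4];
	return 0;
}
```

A descriptor such as "e.0.2.1.0.500" then yields VM index 0, engine class 2, instance 1, compute mode off and a 500 ms job timeout; the numeric class values are passed through unchanged, and their mapping to engine names is not assumed here.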

The tool is handy for scheduling tests; we find it useful for verifying
vGPU profiles that define different execution quantum/preemption timeout
settings.

There is also some rationale for the tool in the following thread:
https://lore.kernel.org/dri-devel/a443495f-5d1b-52e1-9b2f-80167deb6d57@linux.intel.com/

With this patch it should be possible to run the following on an xe device:

gem_wsim -w benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim -c 36 -r 600

Best with drm debug logs disabled:

echo 0 > /sys/module/drm/parameters/debug

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c                         | 534 ++++++++++++++++--
 benchmarks/wsim/README                        |  85 ++-
 benchmarks/wsim/xe_cloud-gaming-60fps.wsim    |  25 +
 benchmarks/wsim/xe_example.wsim               |  28 +
 benchmarks/wsim/xe_example01.wsim             |  19 +
 benchmarks/wsim/xe_example_fence.wsim         |  23 +
 .../wsim/xe_media_load_balance_fhd26u7.wsim   |  63 +++
 7 files changed, 722 insertions(+), 55 deletions(-)
 create mode 100644 benchmarks/wsim/xe_cloud-gaming-60fps.wsim
 create mode 100644 benchmarks/wsim/xe_example.wsim
 create mode 100644 benchmarks/wsim/xe_example01.wsim
 create mode 100644 benchmarks/wsim/xe_example_fence.wsim
 create mode 100644 benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index d807a9d7d..fa36385ec 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -62,6 +62,12 @@
 #include "i915/gem_engine_topology.h"
 #include "i915/gem_mman.h"
 
+#include "igt_syncobj.h"
+#include "intel_allocator.h"
+#include "xe_drm.h"
+#include "xe/xe_ioctl.h"
+#include "xe/xe_spin.h"
+
 enum intel_engine_id {
 	DEFAULT,
 	RCS,
@@ -95,6 +101,8 @@ enum w_type {
 	TERMINATE,
 	SSEU,
 	WORKINGSET,
+	VM,
+	EXEC_QUEUE,
 	SKIP,
 };
 
@@ -110,6 +118,10 @@ struct deps {
 	struct dep_entry *list;
 };
 
+#define for_each_dep(__dep, __deps) \
+	for (int __i = 0; __i < __deps.nr && \
+	     (__dep = &__deps.list[__i]); ++__i)
+
 struct w_arg {
 	char *filename;
 	char *desc;
@@ -145,6 +157,7 @@ struct w_step {
 	enum w_type type;
 	unsigned int context;
 	unsigned int engine;
+	unsigned int eq_idx;
 	struct duration duration;
 	struct deps data_deps;
 	struct deps fence_deps;
@@ -167,6 +180,8 @@ struct w_step {
 		};
 		int sseu;
 		struct working_set working_set;
+		struct vm *vm;
+		struct exec_queue *eq;
 	};
 
 	/* Implementation details */
@@ -178,10 +193,35 @@ struct w_step {
 	struct drm_i915_gem_execbuffer2 eb;
 	struct drm_i915_gem_exec_object2 *obj;
 	struct drm_i915_gem_relocation_entry reloc[3];
+
+	struct drm_xe_exec exec;
+	size_t bb_size;
+	struct xe_spin *spin;
+	struct drm_xe_sync *syncs;
+
 	uint32_t bb_handle;
 	uint32_t *bb_duration;
 };
 
+struct vm {
+	uint32_t id;
+	bool compute_mode;
+	uint64_t ahnd;
+};
+
+struct exec_queue {
+	uint32_t id;
+	uint32_t vm_idx; /* index in workload.vm_list */
+	struct drm_xe_engine_class_instance hwe;
+	bool compute_mode; /* vm should also be in compute mode */
+	/* timeout applied when compute_mode == false*/
+	uint32_t job_timeout_ms;
+	/* todo: preempt, timeslice and other props */
+	/* for qd_throttle */
+	unsigned int nrequest;
+	struct igt_list_head requests;
+};
+
 struct ctx {
 	uint32_t id;
 	int priority;
@@ -216,7 +256,12 @@ struct workload {
 	unsigned int nr_ctxs;
 	struct ctx *ctx_list;
 
-	struct working_set **working_sets; /* array indexed by set id */
+	unsigned int nr_vms;
+	struct vm *vm_list;
+	unsigned int nr_eqs;
+	struct exec_queue *eq_list;
+
+	struct working_set **working_sets;
 	int max_working_set_id;
 
 	int sync_timeline;
@@ -226,6 +271,14 @@ struct workload {
 	unsigned int nrequest[NUM_ENGINES];
 };
 
+#define for_each_exec_queue(__eq, __wrk) \
+	for (int __i = 0; __i < (__wrk)->nr_eqs && \
+	     (__eq = &(__wrk)->eq_list[__i]); ++__i)
+
+#define for_each_vm(__vm, __wrk) \
+	for (int __i = 0; __i < (__wrk)->nr_vms && \
+	     (__vm = &(__wrk)->vm_list[__i]); ++__i)
+
 static unsigned int master_prng;
 
 static int verbose = 1;
@@ -234,6 +287,8 @@ static struct drm_i915_gem_context_param_sseu device_sseu = {
 	.slice_mask = -1 /* Force read on first use. */
 };
 
+static bool is_xe;
+
 #define SYNCEDCLIENTS	(1<<1)
 #define DEPSYNC		(1<<2)
 #define SSEU		(1<<3)
@@ -263,7 +318,10 @@ static const char *ring_str_map[NUM_ENGINES] = {
 
 static void w_sync(int fd_, struct w_step *w)
 {
-	gem_sync(fd_, w->obj[0].handle);
+	if (is_xe)
+		igt_assert(syncobj_wait(fd_, &w->syncs[0].handle, 1, INT64_MAX, 0, NULL));
+	else
+		gem_sync(fd_, w->obj[0].handle);
 }
 
 static int read_timestamp_frequency(int i915)
@@ -367,15 +425,23 @@ parse_dependency(unsigned int nr_steps, struct w_step *w, char *str)
 		if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
 			return -1;
 
-		add_dep(&w->data_deps, entry);
+		/* only fence deps in xe, let f-1 <==> -1 */
+		if (is_xe)
+			add_dep(&w->fence_deps, entry);
+		else
+			add_dep(&w->data_deps, entry);
 
 		break;
 	case 's':
-		submit_fence = true;
+		/* no submit fence in xe ? */
+		if (!is_xe)
+			submit_fence = true;
 		/* Fall-through. */
 	case 'f':
-		/* Multiple fences not yet supported. */
-		igt_assert_eq(w->fence_deps.nr, 0);
+		/* xe supports multiple fences */
+		if (!is_xe)
+			/* Multiple fences not yet supported. */
+			igt_assert_eq(w->fence_deps.nr, 0);
 
 		entry.target = atoi(++str);
 		if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
@@ -484,6 +550,89 @@ parse_duration(unsigned int nr_steps, struct duration *dur, double scale_dur, ch
 	return 0;
 }
 
+/* v.compute_mode - 0 | 1 */
+static int
+parse_vm(unsigned int nr_steps, struct w_step *w, char *_desc)
+{
+	struct vm _vm = {};
+	char *field, *ctx = NULL;
+
+	/* skip v. part */
+	igt_assert(_desc && _desc[0] == 'v' && _desc[1] == '.');
+
+	field = strtok_r(_desc + 2, ".", &ctx);
+	if (field)
+		_vm.compute_mode = (atoi(field) == 1);
+
+	w->vm = malloc(sizeof(_vm));
+	*w->vm = _vm;
+
+	return 0;
+}
+
+/* e.vm_idx.class.instance.compute_mode<0|1>.job_timeout_ms
+ *
+ * class - int - corresponding to RCS, BCS, VCS, VECS, CCS
+ * instance - int  -1 = virtual, >=0 instance id
+ */
+static int
+parse_exec_queue(unsigned int nr_steps, struct w_step *w, char *_desc)
+{
+	struct exec_queue eq = {};
+	int id;
+	char *field, *ctx = NULL;
+
+	/* skip e. part */
+	igt_assert(_desc && _desc[0] == 'e' && _desc[1] == '.');
+
+	/* vm_idx */
+	field = strtok_r(_desc + 2, ".", &ctx);
+	if (field)
+		id = atoi(field);
+
+	if (id < 0) {
+		wsim_err("Invalid vm index at step %u!\n", nr_steps);
+		return -1;
+	}
+	eq.vm_idx = id;
+
+	/* class */
+	field = strtok_r(0, ".", &ctx);
+	if (field)
+		id = atoi(field);
+
+	if (id < 0 || id > 255) {
+		wsim_err("Invalid engine class at step %u!\n", nr_steps);
+		return -1;
+	}
+	eq.hwe.engine_class = id;
+
+	/* instance -1 - virtual (TODO), >= 0 - instance id */
+	field = strtok_r(0, ".", &ctx);
+	if (field)
+		id = atoi(field);
+
+	if (id < -1 || id > 255) {
+		wsim_err("Invalid engine instance at step %u!\n", nr_steps);
+		return -1;
+	}
+	eq.hwe.engine_instance = id;
+
+	field = strtok_r(0, ".", &ctx);
+	if (field)
+		eq.compute_mode = (atoi(field) == 1);
+
+	/* 0 - default, > 0 timeout */
+	field = strtok_r(0, ".", &ctx);
+	if (field)
+		eq.job_timeout_ms = atoi(field);
+
+	w->eq = malloc(sizeof(eq));
+	*w->eq = eq;
+
+	return 0;
+}
+
 #define check_arg(cond, fmt, ...) \
 { \
 	if (cond) { \
@@ -943,7 +1092,17 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 				goto add_step;
 			}
 
-			if (!strcmp(field, "d")) {
+			if (!strcmp(field, "v")) {
+				tmp = parse_vm(nr_steps, &step, _token);
+				check_arg(tmp < 0, "Invalid vm at step %u!\n", nr_steps);
+				step.type = VM;
+				goto add_step;
+			} else if (!strcmp(field, "e")) {
+				tmp = parse_exec_queue(nr_steps, &step, _token);
+				check_arg(tmp < 0, "Invalid exec queue at step %u!\n", nr_steps);
+				step.type = EXEC_QUEUE;
+				goto add_step;
+			} else if (!strcmp(field, "d")) {
 				int_field(DELAY, delay, tmp <= 0,
 					  "Invalid delay at step %u!\n");
 			} else if (!strcmp(field, "p")) {
@@ -958,6 +1117,11 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			} else if (!strcmp(field, "P")) {
 				unsigned int nr = 0;
 
+				if (is_xe) {
+					step.type = SKIP;
+					goto add_step;
+				}
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(nr == 0 && tmp <= 0,
@@ -984,6 +1148,11 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			} else if (!strcmp(field, "S")) {
 				unsigned int nr = 0;
 
+				if (is_xe) {
+					step.type = SKIP;
+					goto add_step;
+				}
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(tmp <= 0 && nr == 0,
@@ -1021,6 +1190,10 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			} else if (!strcmp(field, "M")) {
 				unsigned int nr = 0;
 
+				if (is_xe) {
+					step.type = SKIP;
+					goto add_step;
+				}
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(nr == 0 && tmp <= 0,
@@ -1054,7 +1227,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
-					check_arg(nr == 0 && tmp <= 0,
+					check_arg(nr == 0 && (is_xe ? tmp < 0 : tmp <= 0),
 						  "Invalid context at step %u!\n",
 						  nr_steps);
 					check_arg(nr == 1 && tmp < 0,
@@ -1077,6 +1250,10 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			} else if (!strcmp(field, "B")) {
 				unsigned int nr = 0;
 
+				if (is_xe) {
+					step.type = SKIP;
+					goto add_step;
+				}
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(nr == 0 && tmp <= 0,
@@ -1097,6 +1274,10 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			} else if (!strcmp(field, "b")) {
 				unsigned int nr = 0;
 
+				if (is_xe) {
+					step.type = SKIP;
+					goto add_step;
+				}
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					check_arg(nr > 2,
 						  "Invalid bond format at step %u!\n",
@@ -1161,24 +1342,29 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			}
 
 			tmp = atoi(field);
-			check_arg(tmp < 0, "Invalid ctx id at step %u!\n",
+			check_arg(tmp < 0, "Invalid %s id at step %u!\n",
+				  (is_xe ? "exec queue" : "ctx"),
 				  nr_steps);
 			step.context = tmp;
+			step.eq_idx = tmp;
 
 			valid++;
 		}
 
-		field = strtok_r(fstart, ".", &fctx);
-		if (field) {
-			fstart = NULL;
+		/* engine desc in BATCH type is i915 specific */
+		if (!is_xe) {
+			field = strtok_r(fstart, ".", &fctx);
+			if (field) {
+				fstart = NULL;
 
-			i = str_to_engine(field);
-			check_arg(i < 0,
-				  "Invalid engine id at step %u!\n", nr_steps);
+				i = str_to_engine(field);
+				check_arg(i < 0,
+					"Invalid engine id at step %u!\n", nr_steps);
 
-			valid++;
+				valid++;
 
-			step.engine = i;
+				step.engine = i;
+			}
 		}
 
 		field = strtok_r(fstart, ".", &fctx);
@@ -1217,7 +1403,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			valid++;
 		}
 
-		check_arg(valid != 5, "Invalid record at step %u!\n", nr_steps);
+		check_arg(valid != (is_xe ? 4 : 5), "Invalid record at step %u!\n", nr_steps);
 
 		step.type = BATCH;
 
@@ -1413,6 +1599,24 @@ __get_ctx(struct workload *wrk, const struct w_step *w)
 	return &wrk->ctx_list[w->context];
 }
 
+static struct exec_queue *
+get_eq(struct workload *wrk, const struct w_step *w)
+{
+	igt_assert(w->eq_idx < wrk->nr_eqs);
+
+	return &wrk->eq_list[w->eq_idx];
+}
+
+static struct vm *
+get_vm(struct workload *wrk, const struct w_step *w)
+{
+	uint32_t vm_idx = get_eq(wrk, w)->vm_idx;
+
+	igt_assert(vm_idx < wrk->nr_vms);
+
+	return &wrk->vm_list[vm_idx];
+}
+
 static uint32_t mmio_base(int i915, enum intel_engine_id engine, int gen)
 {
 	const char *name;
@@ -1665,6 +1869,59 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
 #endif
 }
 
+static void
+xe_alloc_step_batch(struct workload *wrk, struct w_step *w)
+{
+	struct vm *vm = get_vm(wrk, w);
+	struct exec_queue *eq = get_eq(wrk, w);
+	struct dep_entry *dep;
+	int i;
+
+	w->bb_size = ALIGN(sizeof(*w->spin) + xe_cs_prefetch_size(fd), xe_get_default_alignment(fd));
+	w->bb_handle = xe_bo_create(fd, 0, vm->id, w->bb_size);
+	w->spin = xe_bo_map(fd, w->bb_handle, w->bb_size);
+	w->exec.address = intel_allocator_alloc_with_strategy(vm->ahnd, w->bb_handle, w->bb_size,
+							0, ALLOC_STRATEGY_LOW_TO_HIGH);
+	xe_vm_bind_sync(fd, vm->id, w->bb_handle, 0, w->exec.address, w->bb_size);
+	xe_spin_init_opts(w->spin, .addr = w->exec.address,
+				   .preempt = (w->preempt_us > 0),
+				   .ctx_ticks = duration_to_ctx_ticks(fd, eq->hwe.gt_id,
+								1000LL * get_duration(wrk, w)));
+	w->exec.exec_queue_id = eq->id;
+	w->exec.num_batch_buffer = 1;
+	/* always at least one out fence */
+	w->exec.num_syncs = 1;
+	/* count syncs */
+	igt_assert_eq(0, w->data_deps.nr);
+	for_each_dep(dep, w->fence_deps) {
+		int dep_idx = w->idx + dep->target;
+
+		igt_assert(dep_idx >= 0 && dep_idx < w->idx);
+		igt_assert(wrk->steps[dep_idx].type == SW_FENCE ||
+			   wrk->steps[dep_idx].type == BATCH);
+
+		w->exec.num_syncs++;
+	}
+	w->syncs = calloc(w->exec.num_syncs, sizeof(*w->syncs));
+	/* fill syncs */
+	i = 0;
+	/* out fence */
+	w->syncs[i].handle = syncobj_create(fd, 0);
+	w->syncs[i++].flags = DRM_XE_SYNC_SYNCOBJ | DRM_XE_SYNC_SIGNAL;
+	/* in fence(s) */
+	for_each_dep(dep, w->fence_deps) {
+		int dep_idx = w->idx + dep->target;
+
+		igt_assert(wrk->steps[dep_idx].type == SW_FENCE ||
+			   wrk->steps[dep_idx].type == BATCH);
+		igt_assert(wrk->steps[dep_idx].syncs && wrk->steps[dep_idx].syncs[0].handle);
+
+		w->syncs[i].handle = wrk->steps[dep_idx].syncs[0].handle;
+		w->syncs[i++].flags = DRM_XE_SYNC_SYNCOBJ;
+	}
+	w->exec.syncs = to_user_pointer(w->syncs);
+}
+
 static bool set_priority(uint32_t ctx_id, int prio)
 {
 	struct drm_i915_gem_context_param param = {
@@ -1891,6 +2148,70 @@ static void measure_active_set(struct workload *wrk)
 
 #define alloca0(sz) ({ size_t sz__ = (sz); memset(alloca(sz__), 0, sz__); })
 
+static int xe_prepare_vms_eqs(unsigned int id, struct workload *wrk)
+{
+	struct w_step *w;
+	int i, j;
+
+	/* Create vms - should be done before exec queues */
+	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+		if (w->type != VM)
+			continue;
+		wrk->nr_vms++;
+	}
+	igt_assert(wrk->nr_vms);
+	wrk->vm_list = calloc(wrk->nr_vms, sizeof(struct vm));
+
+	for (j = 0 /*vm_idx*/, i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+		struct vm *vm_;
+
+		if (w->type != VM)
+			continue;
+		vm_ = &wrk->vm_list[j];
+		*vm_ = *w->vm;
+		vm_->id = xe_vm_create(fd, 0 /*flags*/, 0 /*ext*/);
+		vm_->ahnd = intel_allocator_open(fd, vm_->id, INTEL_ALLOCATOR_RELOC);
+		j++;
+	}
+
+	/* Create exec queues */
+	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+		if (w->type != EXEC_QUEUE)
+			continue;
+		wrk->nr_eqs++;
+	}
+	igt_assert(wrk->nr_eqs);
+	wrk->eq_list = calloc(wrk->nr_eqs, sizeof(struct exec_queue));
+
+	for (j = 0 /*eq_idx*/, i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+		struct exec_queue *eq;
+		struct vm *vm_;
+
+		if (w->type != EXEC_QUEUE)
+			continue;
+		eq = &(wrk->eq_list[j]);
+		*eq = *w->eq;
+		vm_ = get_vm(wrk, w);
+		igt_assert(vm_);
+		igt_assert(eq->hwe.engine_instance >= 0);
+		eq->id = xe_exec_queue_create(fd, vm_->id, &eq->hwe, 0 /*ext*/);
+		/* init request list */
+		IGT_INIT_LIST_HEAD(&eq->requests);
+		eq->nrequest = 0;
+		j++;
+	}
+
+	/* create syncobjs for SW_FENCE */
+	for (j = 0, i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++)
+		if (w->type == SW_FENCE) {
+			w->syncs = calloc(1, sizeof(struct drm_xe_sync));
+			w->syncs[0].handle = syncobj_create(fd, 0);
+			w->syncs[0].flags = DRM_XE_SYNC_SYNCOBJ;
+		}
+
+	return 0;
+}
+
 static int prepare_ctxs(unsigned int id, struct workload *wrk)
 {
 	uint32_t share_vm = 0;
@@ -2099,7 +2420,10 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
 	wrk->bo_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
 	wrk->run = true;
 
-	prepare_ctxs(id, wrk);
+	if (is_xe)
+		xe_prepare_vms_eqs(id, wrk);
+	else
+		prepare_ctxs(id, wrk);
 
 	/* Record default preemption. */
 	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
@@ -2121,8 +2445,13 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
 		for (j = i + 1; j < wrk->nr_steps; j++) {
 			w2 = &wrk->steps[j];
 
+			if (is_xe) {
+				if (w2->eq_idx != w->eq_idx)
+					continue;
+			} else {
 				if (w2->context != w->context)
 					continue;
+			}
 
 			if (w2->type == PREEMPTION)
 				break;
@@ -2133,7 +2462,7 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
 		}
 	}
 
-	{
+	if (!is_xe) {
 		struct working_set **sets;
 		unsigned long total = 0;
 
@@ -2203,10 +2532,14 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
 		if (w->type != BATCH)
 			continue;
 
-		alloc_step_batch(wrk, w);
+		if (is_xe)
+			xe_alloc_step_batch(wrk, w);
+		else
+			alloc_step_batch(wrk, w);
 	}
 
-	measure_active_set(wrk);
+	if (!is_xe)
+		measure_active_set(wrk);
 
 	return 0;
 }
@@ -2253,6 +2586,31 @@ static void w_sync_to(struct workload *wrk, struct w_step *w, int target)
 	w_sync(fd, &wrk->steps[target]);
 }
 
+static void do_xe_exec(struct workload *wrk, struct w_step *w)
+{
+	struct exec_queue *eq = get_eq(wrk, w);
+
+	igt_assert(w->emit_fence <= 0);
+	if (w->emit_fence == -1)
+		syncobj_reset(fd, &w->syncs[0].handle, 1);
+
+	/* update duration if random */
+	if (w->duration.max != w->duration.min)
+		xe_spin_init_opts(w->spin, .addr = w->exec.address,
+					   .preempt = (w->preempt_us > 0),
+					   .ctx_ticks = duration_to_ctx_ticks(fd, eq->hwe.gt_id,
+								1000LL * get_duration(wrk, w)));
+	xe_exec(fd, &w->exec);
+
+	/* for qd_throttle */
+	if (w->rq_link.prev != NULL || w->rq_link.next != NULL) {
+		igt_list_del(&w->rq_link);
+		eq->nrequest--;
+	}
+	igt_list_add_tail(&w->rq_link, &eq->requests);
+	eq->nrequest++;
+}
+
 static void
 do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id engine)
 {
@@ -2379,6 +2737,9 @@ static void *run_workload(void *data)
 					sw_sync_timeline_create_fence(wrk->sync_timeline,
 								      cur_seqno + w->idx);
 				igt_assert(w->emit_fence > 0);
+				if (is_xe)
+					/* Convert sync file to syncobj */
+					syncobj_import_sync_file(fd, w->syncs[0].handle, w->emit_fence);
 				continue;
 			} else if (w->type == SW_FENCE_SIGNAL) {
 				int tgt = w->idx + w->target;
@@ -2410,7 +2771,10 @@ static void *run_workload(void *data)
 				igt_assert(wrk->steps[t_idx].type == BATCH);
 				igt_assert(wrk->steps[t_idx].duration.unbound_duration);
 
-				*wrk->steps[t_idx].bb_duration = 0xffffffff;
+				if (is_xe)
+					xe_spin_end(wrk->steps[t_idx].spin);
+				else
+					*wrk->steps[t_idx].bb_duration = 0xffffffff;
 				__sync_synchronize();
 				continue;
 			} else if (w->type == SSEU) {
@@ -2424,7 +2788,9 @@ static void *run_workload(void *data)
 				   w->type == ENGINE_MAP ||
 				   w->type == LOAD_BALANCE ||
 				   w->type == BOND ||
-				   w->type == WORKINGSET) {
+				   w->type == WORKINGSET ||
+				   w->type == VM ||
+				   w->type == EXEC_QUEUE) {
 				   /* No action for these at execution time. */
 				continue;
 			}
@@ -2442,15 +2808,19 @@ static void *run_workload(void *data)
 			if (throttle > 0)
 				w_sync_to(wrk, w, i - throttle);
 
-			do_eb(wrk, w, engine);
+			if (is_xe)
+				do_xe_exec(wrk, w);
+			else {
+				do_eb(wrk, w, engine);
 
-			if (w->request != -1) {
-				igt_list_del(&w->rq_link);
-				wrk->nrequest[w->request]--;
+				if (w->request != -1) {
+					igt_list_del(&w->rq_link);
+					wrk->nrequest[w->request]--;
+				}
+				w->request = engine;
+				igt_list_add_tail(&w->rq_link, &wrk->requests[engine]);
+				wrk->nrequest[engine]++;
 			}
-			w->request = engine;
-			igt_list_add_tail(&w->rq_link, &wrk->requests[engine]);
-			wrk->nrequest[engine]++;
 
 			if (!wrk->run)
 				break;
@@ -2459,17 +2829,33 @@ static void *run_workload(void *data)
 				w_sync(fd, w);
 
 			if (qd_throttle > 0) {
-				while (wrk->nrequest[engine] > qd_throttle) {
-					struct w_step *s;
+				if (is_xe) {
+					struct exec_queue *eq = get_eq(wrk, w);
 
-					s = igt_list_first_entry(&wrk->requests[engine],
-								 s, rq_link);
+					while (eq->nrequest > qd_throttle) {
+						struct w_step *s;
+
+						s = igt_list_first_entry(&eq->requests, s, rq_link);
+
+						w_sync(fd, s);
+
+						igt_list_del(&s->rq_link);
+						eq->nrequest--;
+					}
+				} else {
+					while (wrk->nrequest[engine] > qd_throttle) {
+						struct w_step *s;
+
+						s = igt_list_first_entry(&wrk->requests[engine],
+									s, rq_link);
 
 						w_sync(fd, s);
+						// gem_sync(fd, s->obj[0].handle);
 
-					s->request = -1;
-					igt_list_del(&s->rq_link);
-					wrk->nrequest[engine]--;
+						s->request = -1;
+						igt_list_del(&s->rq_link);
+						wrk->nrequest[engine]--;
+					}
 				}
 			}
 		}
@@ -2486,18 +2872,50 @@ static void *run_workload(void *data)
 		for (i = 0, w = wrk->steps; wrk->run && (i < wrk->nr_steps);
 		     i++, w++) {
 			if (w->emit_fence > 0) {
-				close(w->emit_fence);
-				w->emit_fence = -1;
+				if (is_xe) {
+					igt_assert(w->type == SW_FENCE);
+					close(w->emit_fence);
+					w->emit_fence = -1;
+					syncobj_reset(fd, &w->syncs[0].handle, 1);
+				} else {
+					close(w->emit_fence);
+					w->emit_fence = -1;
+				}
 			}
 		}
 	} // main loop
 
-	for (i = 0; i < NUM_ENGINES; i++) {
-		if (!wrk->nrequest[i])
-			continue;
+	if (is_xe) {
+		struct exec_queue *eq;
+
+		for_each_exec_queue(eq, wrk) {
+			if (eq->nrequest) {
+				w = igt_list_last_entry(&eq->requests, w, rq_link);
+				w_sync(fd, w);
+			}
+		}
+
+		for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+			if (w->type == BATCH) {
+				w_sync(fd, w);
+				syncobj_destroy(fd, w->syncs[0].handle);
+				free(w->syncs);
+				xe_vm_unbind_sync(fd, get_vm(wrk, w)->id, 0, w->exec.address, w->bb_size);
+				gem_munmap(w->spin, w->bb_size);
+				gem_close(fd, w->bb_handle);
+			} else if (w->type == SW_FENCE) {
+				syncobj_destroy(fd, w->syncs[0].handle);
+				free(w->syncs);
+			}
+		}
+	} else {
+		for (i = 0; i < NUM_ENGINES; i++) {
+			if (!wrk->nrequest[i])
+				continue;
 
-		w = igt_list_last_entry(&wrk->requests[i], w, rq_link);
-		gem_sync(fd, w->obj[0].handle);
+			w = igt_list_last_entry(&wrk->requests[i], w, rq_link);
+			w_sync(fd, w);
+		}
 	}
 
 	clock_gettime(CLOCK_MONOTONIC, &t_end);
@@ -2519,6 +2937,21 @@ static void *run_workload(void *data)
 
 static void fini_workload(struct workload *wrk)
 {
+	if (is_xe) {
+		struct exec_queue *eq;
+		struct vm *vm_;
+
+		for_each_exec_queue(eq, wrk)
+			xe_exec_queue_destroy(fd, eq->id);
+		free(wrk->eq_list);
+		wrk->nr_eqs = 0;
+		for_each_vm(vm_, wrk) {
+			put_ahnd(vm_->ahnd);
+			xe_vm_destroy(fd, vm_->id);
+		}
+		free(wrk->vm_list);
+		wrk->nr_vms = 0;
+	}
 	free(wrk->steps);
 	free(wrk);
 }
@@ -2726,8 +3159,12 @@ int main(int argc, char **argv)
 		ret = igt_device_find_first_i915_discrete_card(&card);
 		if (!ret)
 			ret = igt_device_find_integrated_card(&card);
+		if (!ret)
+			ret = igt_device_find_first_xe_discrete_card(&card);
+		if (!ret)
+			ret = igt_device_find_xe_integrated_card(&card);
 		if (!ret) {
-			wsim_err("No device filter specified and no i915 devices found!\n");
+			wsim_err("No device filter specified and no intel devices found!\n");
 			return EXIT_FAILURE;
 		}
 	}
@@ -2742,6 +3179,7 @@ int main(int argc, char **argv)
 	}
 
 	fd = open(drm_dev, O_RDWR);
+
 	if (fd < 0) {
 		wsim_err("Failed to open '%s'! (%s)\n",
 			 drm_dev, strerror(errno));
@@ -2750,6 +3188,10 @@ int main(int argc, char **argv)
 	if (verbose > 1)
 		printf("Using device %s\n", drm_dev);
 
+	is_xe = is_xe_device(fd);
+	if (is_xe)
+		xe_device_get(fd);
+
 	if (!nr_w_args) {
 		wsim_err("No workload descriptor(s)!\n");
 		goto err;
diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
index e4fd61645..ddfefff47 100644
--- a/benchmarks/wsim/README
+++ b/benchmarks/wsim/README
@@ -3,6 +3,7 @@ Workload descriptor format
 
 Lines starting with '#' are treated as comments (do not create work step).
 
+# i915
 ctx.engine.duration_us.dependency.wait,...
 <uint>.<str>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
 B.<uint>
@@ -13,6 +14,23 @@ b.<uint>.<str>[|<str>].<str>
 w|W.<uint>.<str>[/<str>]...
 f
 
+# xe
+Xe does not use the context abstraction; it adds work step types
+for VM (v.) and exec queue (e.) creation.
+Each v. and e. step creates an entry in the workload's VM or exec queue array.
+A batch step references the exec queue on which it is to be executed.
+The exec queue reference (eq_idx) is the 0-based index into the workload's exec queue array.
+The VM reference (vm_idx) is the 0-based index into the workload's VM array.
+
+v.compute_mode
+v.<0|1>
+e.vm_idx.class.instance.compute_mode.job_timeout_ms,...
+e.<uint>.<uint 0=RCS,1=BCS,2=VCS,3=VECS,4=CCS>.<int>.<0|1>.<uint>,...
+eq_idx.duration_us.dependency.wait,...
+<uint>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
+d|p|s|t|q|a|T.<int>,...
+f
+
 For duration a range can be given from which a random value will be picked
 before every submit. Since this and seqno management requires CPU access to
 objects, care needs to be taken in order to ensure the submit queue is deep
@@ -29,21 +47,22 @@ Additional workload steps are also supported:
  'q' - Throttle to n max queue depth.
  'f' - Create a sync fence.
  'a' - Advance the previously created sync fence.
- 'B' - Turn on context load balancing.
- 'b' - Set up engine bonds.
- 'M' - Set up engine map.
- 'P' - Context priority.
- 'S' - Context SSEU configuration.
+ 'B' - Turn on context load balancing. (i915 only)
+ 'b' - Set up engine bonds. (i915 only)
+ 'M' - Set up engine map. (i915 only)
+ 'P' - Context priority. (i915 only)
+ 'S' - Context SSEU configuration. (i915 only)
  'T' - Terminate an infinite batch.
- 'w' - Working set. (See Working sets section.)
- 'W' - Shared working set.
- 'X' - Context preemption control.
+ 'w' - Working set. (See Working sets section.) (i915 only)
+ 'W' - Shared working set. (i915 only)
+ 'X' - Context preemption control. (i915 only)
 
 Engine ids: DEFAULT, RCS, BCS, VCS, VCS1, VCS2, VECS
 
 Example (leading spaces must not be present in the actual file):
 ----------------------------------------------------------------
 
+# i915
   1.VCS1.3000.0.1
   1.RCS.500-1000.-1.0
   1.RCS.3700.0.0
@@ -53,6 +72,25 @@ Example (leading spaces must not be present in the actual file):
   1.VCS2.600.-1.1
   p.16000
 
+# xe equivalent
+  #VM: v.compute_mode
+  v.0
+  #EXEC_QUEUE: e.vm_idx.class.instance.compute_mode.job_timeout_ms
+  e.0.2.0.0.0 # VCS1
+  e.0.0.0.0.0 # RCS
+  e.0.2.1.0.0 # VCS2
+  e.0.0.0.0.0 # second RCS exec queue
+  #BATCH: eq_idx.duration.dependency.wait
+  0.3000.0.1       # 1.VCS1.3000.0.1
+  1.500-1000.-1.0  # 1.RCS.500-1000.-1.0
+  3.3700.0.0       # 1.RCS.3700.0.0
+  1.1000.-2.1      # 1.RCS.1000.-2.0
+  2.2300.-2.0      # 1.VCS2.2300.-2.0
+  3.4700.-1.0      # 1.RCS.4700.-1.0
+  2.600.-1.1       # 1.VCS2.600.-1.1
+  p.16000
+
+
 The above workload described in human language works like this:
 
   1.   A batch is sent to the VCS1 engine which will be executing for 3ms on the
@@ -78,16 +116,30 @@ Multiple dependencies can be given separated by forward slashes.
 
 Example:
 
+# i915
   1.VCS1.3000.0.1
   1.RCS.3700.0.0
   1.VCS2.2300.-1/-2.0
 
+# xe
+  v.0
+  e.0.2.0.0.0
+  e.0.0.0.0.0
+  e.0.2.1.0.0
+  0.3000.0.1
+  1.3700.0.0
+  2.2300.-1/-2.0
+
 I this case the last step has a data dependency on both first and second steps.
 
 Batch durations can also be specified as infinite by using the '*' in the
 duration field. Such batches must be ended by the terminate command ('T')
 otherwise they will cause a GPU hang to be reported.
 
+Note: on Xe, batch dependencies are expressed with syncobjs,
+so there is no difference between f-1 and -1;
+e.g. 1.1000.-2.0 is the same as 1.1000.f-2.0.
+
 Sync (fd) fences
 ----------------
 
@@ -116,6 +168,7 @@ VCS1 and VCS2 batches will have a sync fence dependency on the RCS batch.
 
 Example:
 
+# i915
   1.RCS.500-1000.0.0
   f
   2.VCS1.3000.f-1.0
@@ -125,13 +178,27 @@ Example:
   s.-4
   s.-4
 
+# xe equivalent
+  v.0
+  e.0.0.0.0.0    # RCS
+  e.0.2.0.0.0    # VCS1
+  e.0.2.1.0.0    # VCS2
+  0.500-1000.0.0
+  f
+  1.3000.f-1.0
+  2.3000.f-2.0
+  0.500-1000.0.1
+  a.-4
+  s.-4
+  s.-4
+
 VCS1 and VCS2 batches have an input sync fence dependecy on the standalone fence
 created at the second step. They are submitted ahead of time while still not
 runnable. When the second RCS batch completes the standalone fence is signaled
 which allows the two VCS batches to be executed. Finally we wait until the both
 VCS batches have completed before starting the (optional) next iteration.
 
-Submit fences
+Submit fences (i915 only?)
 -------------
 
 Submit fences are a type of input fence which are signalled when the originating
diff --git a/benchmarks/wsim/xe_cloud-gaming-60fps.wsim b/benchmarks/wsim/xe_cloud-gaming-60fps.wsim
new file mode 100644
index 000000000..9fdf15e27
--- /dev/null
+++ b/benchmarks/wsim/xe_cloud-gaming-60fps.wsim
@@ -0,0 +1,25 @@
+#w.1.10n8m
+#w.2.3n16m
+#1.RCS.500-1500.r1-0-4/w2-0.0
+#1.RCS.500-1500.r1-5-9/w2-1.0
+#1.RCS.500-1500.r2-0-1/w2-2.0
+#M.2.VCS
+#B.2
+#3.RCS.500-1500.r2-2.0
+#2.DEFAULT.2000-4000.-1.0
+#4.VCS1.250-750.-1.1
+#p.16667
+#
+#xe
+v.0
+e.0.0.0.0.0 # 1.RCS.500-1500.r1-0-4/w2-0.0
+e.0.2.0.0.0 # 2.DEFAULT.2000-4000.-1.0
+e.0.0.0.0.0 # 3.RCS.500-1500.r2-2.0
+e.0.2.1.0.0 # 4.VCS1.250-750.-1.1
+0.500-1500.0.0
+0.500-1500.0.0
+0.500-1500.0.0
+2.500-1500.-2.0 # #3.RCS.500-1500.r2-2.0
+1.2000-4000.-1.0
+3.250-750.-1.1
+p.16667
diff --git a/benchmarks/wsim/xe_example.wsim b/benchmarks/wsim/xe_example.wsim
new file mode 100644
index 000000000..3fa620932
--- /dev/null
+++ b/benchmarks/wsim/xe_example.wsim
@@ -0,0 +1,28 @@
+#i915
+#1.VCS1.3000.0.1
+#1.RCS.500-1000.-1.0
+#1.RCS.3700.0.0
+#1.RCS.1000.-2.0
+#1.VCS2.2300.-2.0
+#1.RCS.4700.-1.0
+#1.VCS2.600.-1.1
+#p.16000
+#
+#xe
+#
+#VM: v.compute_mode
+v.0
+#EXEC_QUEUE: e.vm_idx.class.instance.compute_mode.job_timeout_ms
+e.0.2.0.0.0 # VCS1
+e.0.0.0.0.0 # RCS
+e.0.2.1.0.0 # VCS2
+e.0.0.0.0.0 # second RCS exec_queue
+#BATCH: eq_idx.duration.dependency.wait
+0.3000.0.1       # 1.VCS1.3000.0.1
+1.500-1000.-1.0  # 1.RCS.500-1000.-1.0
+3.3700.0.0       # 1.RCS.3700.0.0
+1.1000.-2.1      # 1.RCS.1000.-2.0
+2.2300.-2.0      # 1.VCS2.2300.-2.0
+3.4700.-1.0      # 1.RCS.4700.-1.0
+2.600.-1.1       # 1.VCS2.600.-1.1
+p.16000
diff --git a/benchmarks/wsim/xe_example01.wsim b/benchmarks/wsim/xe_example01.wsim
new file mode 100644
index 000000000..496905371
--- /dev/null
+++ b/benchmarks/wsim/xe_example01.wsim
@@ -0,0 +1,19 @@
+#VM: v.compute_mode
+v.0
+#EXEC_QUEUE: e.vm_idx.class.instance.compute_mode.job_timeout_ms
+e.0.0.0.0.0
+e.0.2.0.0.0
+e.0.1.0.0.0
+#BATCH: eq_idx.duration.dependency.wait
+# B1 - 10ms batch on BCS0
+2.10000.0.0
+# B2 - 10ms batch on RCS0; waits on B1
+0.10000.0.0
+# B3 - 10ms batch on VCS0; waits on B2
+1.10000.0.0
+# B4 - 10ms batch on BCS0
+2.10000.0.0
+# B5 - 10ms batch on RCS0; waits on B4
+0.10000.-1.0
+# B6 - 10ms batch on VCS0; waits on B5; wait on batch fence out
+1.10000.-1.1
diff --git a/benchmarks/wsim/xe_example_fence.wsim b/benchmarks/wsim/xe_example_fence.wsim
new file mode 100644
index 000000000..4f810d64e
--- /dev/null
+++ b/benchmarks/wsim/xe_example_fence.wsim
@@ -0,0 +1,23 @@
+#i915
+#1.RCS.500-1000.0.0
+#f
+#2.VCS1.3000.f-1.0
+#2.VCS2.3000.f-2.0
+#1.RCS.500-1000.0.1
+#a.-4
+#s.-4
+#s.-4
+#
+#xe
+v.0
+e.0.0.0.0.0
+e.0.2.0.0.0
+e.0.2.1.0.0
+0.500-1000.0.0
+f
+1.3000.f-1.0
+2.3000.f-2.0
+0.500-1000.0.1
+a.-4
+s.-4
+s.-4
diff --git a/benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim b/benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim
new file mode 100644
index 000000000..2214914eb
--- /dev/null
+++ b/benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim
@@ -0,0 +1,63 @@
+# https://lore.kernel.org/dri-devel/a443495f-5d1b-52e1-9b2f-80167deb6d57@linux.intel.com/
+#i915
+#M.3.VCS
+#B.3
+#1.VCS1.1200-1800.0.0
+#1.VCS1.1900-2100.0.0
+#2.RCS.1500-2000.-1.0
+#3.VCS.1400-1800.-1.1
+#1.VCS1.1900-2100.-1.0
+#2.RCS.1500-2000.-1.0
+#3.VCS.1400-1800.-1.1
+#1.VCS1.1900-2100.-1.0
+#2.RCS.200-400.-1.0
+#2.RCS.1500-2000.0.0
+#3.VCS.1400-1800.-1.1
+#1.VCS1.1900-2100.-1.0
+#2.RCS.1500-2000.-1.0
+#3.VCS.1400-1800.-1.1
+#1.VCS1.1900-2100.-1.0
+#2.RCS.200-400.-1.0
+#2.RCS.1500-2000.0.0
+#3.VCS.1400-1800.-1.1
+#1.VCS1.1900-2100.-1.0
+#2.RCS.1500-2000.-1.0
+#3.VCS.1400-1800.-1.1
+#1.VCS1.1900-2100.-1.0
+#2.RCS.1500-2000.-1.0
+#2.RCS.1500-2000.0.0
+#3.VCS.1400-1800.-1.1
+#
+#xe
+#
+#M.3.VCS ??
+#B.3     ??
+v.0
+e.0.2.0.0.0 # 1.VCS1
+e.0.0.0.0.0 # 2.RCS
+e.0.2.1.0.0 # 3.VCS - no load balancing yet always VCS2
+0.1200-1800.0.0
+0.1900-2100.0.0
+1.1500-2000.-1.0
+2.1400-1800.-1.1
+0.1900-2100.-1.0
+1.1500-2000.-1.0
+2.1400-1800.-1.1
+0.1900-2100.-1.0
+1.200-400.-1.0
+1.1500-2000.0.0
+2.1400-1800.-1.1
+0.1900-2100.-1.0
+1.1500-2000.-1.0
+2.1400-1800.-1.1
+0.1900-2100.-1.0
+1.200-400.-1.0
+1.1500-2000.0.0
+2.1400-1800.-1.1
+0.1900-2100.-1.0
+1.1500-2000.-1.0
+2.1400-1800.-1.1
+0.1900-2100.-1.0
+1.1500-2000.-1.0
+1.1500-2000.0.0
+2.1400-1800.-1.1
-- 
2.30.2
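
For readers following the README changes above, here is a minimal standalone
sketch of how an exec queue step of the form
e.vm_idx.class.instance.compute_mode.job_timeout_ms can be tokenised with
strtok_r(). It is not the code from this patch: struct eq_desc and
parse_eq_desc() are hypothetical names, and, unlike parse_exec_queue() in the
diff (which treats the trailing fields as optional), this sketch assumes all
five fields are required and rejects anything else.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the fields carried by an "e." step. */
struct eq_desc {
	int vm_idx;          /* 0-based index into the VM array */
	int engine_class;    /* 0=RCS, 1=BCS, 2=VCS, 3=VECS, 4=CCS */
	int engine_instance; /* -1 = virtual (TODO in the patch), >= 0 = instance id */
	int compute_mode;    /* 0 or 1 */
	int job_timeout_ms;  /* 0 = default, > 0 = timeout */
};

/*
 * Parse "e.<vm_idx>.<class>.<instance>.<compute_mode>.<job_timeout_ms>".
 * Returns 0 on success, -1 on malformed input.
 * Assumption: all five fields must be present (stricter than the patch).
 */
static int parse_eq_desc(const char *desc, struct eq_desc *out)
{
	char buf[64], *ctx = NULL, *field;
	int vals[5], n = 0;

	assert(desc && out);

	/* strtok_r() modifies its input, so work on a private copy. */
	if (strlen(desc) >= sizeof(buf))
		return -1;
	strcpy(buf, desc);

	/* The first token selects the step type. */
	field = strtok_r(buf, ".", &ctx);
	if (!field || strcmp(field, "e"))
		return -1;

	/* Collect the numeric fields, rejecting extra ones. */
	while ((field = strtok_r(NULL, ".", &ctx))) {
		if (n == 5)
			return -1;
		vals[n++] = atoi(field);
	}
	if (n != 5)
		return -1;

	/* Range checks mirroring the limits described in the README. */
	if (vals[0] < 0 || vals[1] < 0 || vals[1] > 4)
		return -1;
	if (vals[2] < -1 || vals[2] > 255 || vals[4] < 0)
		return -1;

	out->vm_idx = vals[0];
	out->engine_class = vals[1];
	out->engine_instance = vals[2];
	out->compute_mode = (vals[3] == 1);
	out->job_timeout_ms = vals[4];

	return 0;
}
```

With this sketch, "e.0.2.1.0.0" parses to class 2 (VCS), instance 1, while a
descriptor with a sixth field or an unknown class is rejected rather than
silently accepted.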

* [igt-dev] ✗ Fi.CI.BAT: failure for benchmarks/gem_wsim: added basic xe support (rev2)
  2023-09-06 15:51 [igt-dev] [PATCH i-g-t 0/8] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (7 preceding siblings ...)
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 8/8] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
@ 2023-09-06 21:01 ` Patchwork
  2023-09-07  9:30   ` Bernatowicz, Marcin
  8 siblings, 1 reply; 22+ messages in thread
From: Patchwork @ 2023-09-06 21:01 UTC (permalink / raw)
  To: Marcin Bernatowicz; +Cc: igt-dev

== Series Details ==

Series: benchmarks/gem_wsim: added basic xe support (rev2)
URL   : https://patchwork.freedesktop.org/series/122920/
State : failure

== Summary ==

CI Bug Log - changes from IGT_7472 -> IGTPW_9733
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with IGTPW_9733 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in IGTPW_9733, please notify your bug team (lgci.bug.filing@intel.com) to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/index.html

Participating hosts (40 -> 39)
------------------------------

  Additional (1): bat-dg2-8 
  Missing    (2): fi-kbl-soraka fi-snb-2520m 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in IGTPW_9733:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_exec_suspend@basic-s3@smem:
    - bat-dg2-8:          NOTRUN -> [INCOMPLETE][1]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-dg2-8/igt@gem_exec_suspend@basic-s3@smem.html

  
Known issues
------------

  Here are the changes found in IGTPW_9733 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_mmap@basic:
    - bat-dg2-8:          NOTRUN -> [SKIP][2] ([i915#4083])
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-dg2-8/igt@gem_mmap@basic.html

  * igt@gem_mmap_gtt@basic:
    - bat-dg2-8:          NOTRUN -> [SKIP][3] ([i915#4077]) +2 other tests skip
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-dg2-8/igt@gem_mmap_gtt@basic.html

  * igt@gem_tiled_pread_basic:
    - bat-dg2-8:          NOTRUN -> [SKIP][4] ([i915#4079]) +1 other test skip
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-dg2-8/igt@gem_tiled_pread_basic.html

  * igt@i915_pm_backlight@basic-brightness:
    - bat-dg2-8:          NOTRUN -> [SKIP][5] ([i915#5354] / [i915#7561])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-dg2-8/igt@i915_pm_backlight@basic-brightness.html

  * igt@i915_pm_rps@basic-api:
    - bat-dg2-8:          NOTRUN -> [SKIP][6] ([i915#6621])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-dg2-8/igt@i915_pm_rps@basic-api.html

  * igt@i915_suspend@basic-s3-without-i915:
    - bat-dg2-8:          NOTRUN -> [SKIP][7] ([i915#6645])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-dg2-8/igt@i915_suspend@basic-s3-without-i915.html

  * igt@kms_addfb_basic@addfb25-y-tiled-small-legacy:
    - fi-hsw-4770:        NOTRUN -> [SKIP][8] ([fdo#109271]) +13 other tests skip
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/fi-hsw-4770/igt@kms_addfb_basic@addfb25-y-tiled-small-legacy.html
    - bat-dg2-8:          NOTRUN -> [SKIP][9] ([i915#5190])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-dg2-8/igt@kms_addfb_basic@addfb25-y-tiled-small-legacy.html

  * igt@kms_addfb_basic@basic-y-tiled-legacy:
    - bat-dg2-8:          NOTRUN -> [SKIP][10] ([i915#4215] / [i915#5190])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-dg2-8/igt@kms_addfb_basic@basic-y-tiled-legacy.html

  * igt@kms_addfb_basic@framebuffer-vs-set-tiling:
    - bat-dg2-8:          NOTRUN -> [SKIP][11] ([i915#4212]) +7 other tests skip
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-dg2-8/igt@kms_addfb_basic@framebuffer-vs-set-tiling.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy:
    - bat-dg2-8:          NOTRUN -> [SKIP][12] ([i915#4103] / [i915#4213]) +1 other test skip
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-dg2-8/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy.html

  * igt@kms_force_connector_basic@force-load-detect:
    - bat-dg2-8:          NOTRUN -> [SKIP][13] ([fdo#109285])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-dg2-8/igt@kms_force_connector_basic@force-load-detect.html

  * igt@kms_force_connector_basic@prune-stale-modes:
    - bat-dg2-8:          NOTRUN -> [SKIP][14] ([i915#5274])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-dg2-8/igt@kms_force_connector_basic@prune-stale-modes.html

  * igt@kms_pipe_crc_basic@read-crc-frame-sequence@pipe-b-dp-6:
    - bat-adlp-11:        [PASS][15] -> [ABORT][16] ([i915#8668])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/IGT_7472/bat-adlp-11/igt@kms_pipe_crc_basic@read-crc-frame-sequence@pipe-b-dp-6.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-adlp-11/igt@kms_pipe_crc_basic@read-crc-frame-sequence@pipe-b-dp-6.html

  * igt@kms_pipe_crc_basic@suspend-read-crc@pipe-c-vga-1:
    - fi-hsw-4770:        NOTRUN -> [DMESG-WARN][17] ([i915#8841]) +6 other tests dmesg-warn
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/fi-hsw-4770/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-c-vga-1.html

  * igt@kms_psr@cursor_plane_move:
    - bat-dg2-8:          NOTRUN -> [SKIP][18] ([i915#1072]) +3 other tests skip
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-dg2-8/igt@kms_psr@cursor_plane_move.html

  * igt@kms_psr@sprite_plane_onoff:
    - fi-hsw-4770:        NOTRUN -> [SKIP][19] ([fdo#109271] / [i915#1072]) +3 other tests skip
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/fi-hsw-4770/igt@kms_psr@sprite_plane_onoff.html

  * igt@kms_setmode@basic-clone-single-crtc:
    - bat-dg2-8:          NOTRUN -> [SKIP][20] ([i915#3555])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-dg2-8/igt@kms_setmode@basic-clone-single-crtc.html

  * igt@prime_vgem@basic-fence-flip:
    - bat-dg2-8:          NOTRUN -> [SKIP][21] ([i915#3708])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-dg2-8/igt@prime_vgem@basic-fence-flip.html

  * igt@prime_vgem@basic-fence-mmap:
    - bat-dg2-8:          NOTRUN -> [SKIP][22] ([i915#3708] / [i915#4077]) +1 other test skip
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-dg2-8/igt@prime_vgem@basic-fence-mmap.html

  * igt@prime_vgem@basic-write:
    - bat-dg2-8:          NOTRUN -> [SKIP][23] ([i915#3291] / [i915#3708]) +2 other tests skip
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-dg2-8/igt@prime_vgem@basic-write.html

  
#### Possible fixes ####

  * igt@kms_cursor_legacy@basic-flip-before-cursor-legacy:
    - bat-adlp-11:        [DMESG-WARN][24] ([i915#4309]) -> [PASS][25]
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/IGT_7472/bat-adlp-11/igt@kms_cursor_legacy@basic-flip-before-cursor-legacy.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-adlp-11/igt@kms_cursor_legacy@basic-flip-before-cursor-legacy.html

  * igt@kms_pipe_crc_basic@read-crc-frame-sequence@pipe-a-dp-5:
    - bat-adlp-11:        [ABORT][26] ([i915#8668]) -> [PASS][27]
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/IGT_7472/bat-adlp-11/igt@kms_pipe_crc_basic@read-crc-frame-sequence@pipe-a-dp-5.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-adlp-11/igt@kms_pipe_crc_basic@read-crc-frame-sequence@pipe-a-dp-5.html

  
#### Warnings ####

  * igt@kms_psr@sprite_plane_onoff:
    - bat-rplp-1:         [SKIP][28] ([i915#1072]) -> [ABORT][29] ([i915#8442] / [i915#8668] / [i915#8712])
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/IGT_7472/bat-rplp-1/igt@kms_psr@sprite_plane_onoff.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-rplp-1/igt@kms_psr@sprite_plane_onoff.html

  
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#3291]: https://gitlab.freedesktop.org/drm/intel/issues/3291
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212
  [i915#4213]: https://gitlab.freedesktop.org/drm/intel/issues/4213
  [i915#4215]: https://gitlab.freedesktop.org/drm/intel/issues/4215
  [i915#4309]: https://gitlab.freedesktop.org/drm/intel/issues/4309
  [i915#5190]: https://gitlab.freedesktop.org/drm/intel/issues/5190
  [i915#5274]: https://gitlab.freedesktop.org/drm/intel/issues/5274
  [i915#5354]: https://gitlab.freedesktop.org/drm/intel/issues/5354
  [i915#6621]: https://gitlab.freedesktop.org/drm/intel/issues/6621
  [i915#6645]: https://gitlab.freedesktop.org/drm/intel/issues/6645
  [i915#7561]: https://gitlab.freedesktop.org/drm/intel/issues/7561
  [i915#8442]: https://gitlab.freedesktop.org/drm/intel/issues/8442
  [i915#8668]: https://gitlab.freedesktop.org/drm/intel/issues/8668
  [i915#8712]: https://gitlab.freedesktop.org/drm/intel/issues/8712
  [i915#8841]: https://gitlab.freedesktop.org/drm/intel/issues/8841


Build changes
-------------

  * CI: CI-20190529 -> None
  * IGT: IGT_7472 -> IGTPW_9733

  CI-20190529: 20190529
  CI_DRM_13605: 5008076127a9599704e98fb4de3761743d943dd0 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_9733: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/index.html
  IGT_7472: b9d6f8dd0f69d0091e349ddc9d9f1425b2f36ec9 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/index.html


* Re: [igt-dev] ✗ Fi.CI.BAT: failure for benchmarks/gem_wsim: added basic xe support (rev2)
  2023-09-06 21:01 ` [igt-dev] ✗ Fi.CI.BAT: failure for benchmarks/gem_wsim: added basic xe support (rev2) Patchwork
@ 2023-09-07  9:30   ` Bernatowicz, Marcin
  0 siblings, 0 replies; 22+ messages in thread
From: Bernatowicz, Marcin @ 2023-09-07  9:30 UTC (permalink / raw)
  To: igt-dev



On 9/6/2023 11:01 PM, Patchwork wrote:
> *Patch Details*
> *Series:*	benchmarks/gem_wsim: added basic xe support (rev2)
> *URL:*	https://patchwork.freedesktop.org/series/122920/ 
> <https://patchwork.freedesktop.org/series/122920/>
> *State:*	failure
> *Details:* 
> https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/index.html 
> <https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/index.html>
> 
> 
>   CI Bug Log - changes from IGT_7472 -> IGTPW_9733
> 
> 
>     Summary
> 
> *FAILURE*
> 
> Serious unknown changes coming with IGTPW_9733 absolutely need to be
> verified manually.
> 
> If you think the reported changes have nothing to do with the changes
> introduced in IGTPW_9733, please notify your bug team 
> (lgci.bug.filing@intel.com) to allow them
> to document this new failure mode, which will reduce false positives in CI.
> 
> External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/index.html
> 
> 
>     Participating hosts (40 -> 39)
> 
> Additional (1): bat-dg2-8
> Missing (2): fi-kbl-soraka fi-snb-2520m
> 
> 
>     Possible new issues
> 
> Here are the unknown changes that may have been introduced in IGTPW_9733:
> 
> 
>       IGT changes
> 
> 
>         Possible regressions
> 
>   * igt@gem_exec_suspend@basic-s3@smem:
>       o bat-dg2-8: NOTRUN -> INCOMPLETE
>         <https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9733/bat-dg2-8/igt@gem_exec_suspend@basic-s3@smem.html>
> 

Unrelated to the change.

-- 
Marcin


* Re: [igt-dev] [PATCH i-g-t 4/8] benchmarks/gem_wsim: scale duration option fixes
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 4/8] benchmarks/gem_wsim: scale duration option fixes Marcin Bernatowicz
@ 2023-09-20 16:06   ` Tvrtko Ursulin
  0 siblings, 0 replies; 22+ messages in thread
From: Tvrtko Ursulin @ 2023-09-20 16:06 UTC (permalink / raw)
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


Hi,

On 06/09/2023 16:51, Marcin Bernatowicz wrote:
> Fixed range duration check when scale duration (-f) command line option
> is provided + PERIOD step takes scale duration into account.
> Moved duration parsing code from parse_workload to separate function.
> Moved wsim_err, __duration definitions before parse_duration.
> Moved unbound_duration from struct w_step to struct duration.

Kudos for managing to navigate through this tool! :)

One request I would have before looking in more detail is to split 
refactoring from functional changes. In this case, from the commit 
message at least, it seems to me these could be four patches:

1. Move wsim_err.
2. Reposition the unbound duration boolean.
3. Extract the duration parsing code to a new function.
4. Fix scaling of period steps.

Actually, have you noticed that there are two command line options:

"  -f <scale>        Scale factor for batch durations.\n"
"  -F <scale>        Scale factor for delays.\n"

-f is only supposed to affect batches, while -F works on delays. Nothing 
seems to act on periods, so that may indeed need changing, with -F 
covering them too.
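
A minimal sketch of that split, using hypothetical names (step_kind, scale_us, scaled_step_us) that are not in gem_wsim.c: -f feeds batch durations only, while -F feeds delays and, as suggested above, periods. Rounding is done by hand for positive values here, where the tool itself calls round().

```c
#include <assert.h>

/* Hypothetical step kinds, for illustration only. */
enum step_kind { STEP_BATCH, STEP_DELAY, STEP_PERIOD };

/* Round-to-nearest scaling of a positive microsecond value. */
static long scale_us(long us, double scale)
{
	assert(us > 0 && scale > 0.0);
	return (long)(scale * (double)us + 0.5);
}

/*
 * scale_dur stands for the -f factor, scale_time for the -F factor.
 * Routing STEP_PERIOD through scale_time is the proposal under
 * discussion, not something the tool is known to do today.
 */
static long scaled_step_us(enum step_kind kind, long us,
			   double scale_dur, double scale_time)
{
	switch (kind) {
	case STEP_BATCH:
		return scale_us(us, scale_dur);
	case STEP_DELAY:
	case STEP_PERIOD:
		return scale_us(us, scale_time);
	}
	return us;
}
```

With -f 2 and -F 0.5, a 1000us batch would then run for 2000us while a 1000us delay or period would shrink to 500us.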

Regards,

Tvrtko

> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c | 109 +++++++++++++++++++++++-------------------
>   1 file changed, 61 insertions(+), 48 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 7b5e62a3b..f4024deb1 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -73,6 +73,7 @@ enum intel_engine_id {
>   
>   struct duration {
>   	unsigned int min, max;
> +	bool unbound_duration;
>   };
>   
>   enum w_type
> @@ -145,7 +146,6 @@ struct w_step
>   	unsigned int context;
>   	unsigned int engine;
>   	struct duration duration;
> -	bool unbound_duration;
>   	struct deps data_deps;
>   	struct deps fence_deps;
>   	int emit_fence;
> @@ -240,6 +240,19 @@ static struct drm_i915_gem_context_param_sseu device_sseu = {
>   #define DEPSYNC		(1<<2)
>   #define SSEU		(1<<3)
>   
> +static void __attribute__((format(printf, 1, 2)))
> +wsim_err(const char *fmt, ...)
> +{
> +	va_list ap;
> +
> +	if (!verbose)
> +		return;
> +
> +	va_start(ap, fmt);
> +	vfprintf(stderr, fmt, ap);
> +	va_end(ap);
> +}
> +
>   static const char *ring_str_map[NUM_ENGINES] = {
>   	[DEFAULT] = "DEFAULT",
>   	[RCS] = "RCS",
> @@ -429,17 +442,43 @@ out:
>   	return ret;
>   }
>   
> -static void __attribute__((format(printf, 1, 2)))
> -wsim_err(const char *fmt, ...)
> +static long __duration(long dur, double scale)
>   {
> -	va_list ap;
> +	return round(scale * dur);
> +}
>   
> -	if (!verbose)
> -		return;
> +static int
> +parse_duration(unsigned int nr_steps, struct duration *dur, double scale_dur, char *_desc)
> +{
> +	char *sep = NULL;
> +	long tmpl;
>   
> -	va_start(ap, fmt);
> -	vfprintf(stderr, fmt, ap);
> -	va_end(ap);
> +	if (_desc[0] == '*') {
> +		if (intel_gen(intel_get_drm_devid(fd)) < 8) {
> +			wsim_err("Infinite batch at step %u needs Gen8+!\n", nr_steps);
> +			return -1;
> +		}
> +		dur->unbound_duration = true;
> +	} else {
> +		tmpl = strtol(_desc, &sep, 10);
> +		if (tmpl <= 0 || tmpl == LONG_MIN || tmpl == LONG_MAX)
> +			return -1;
> +
> +		dur->min = __duration(tmpl, scale_dur);
> +
> +		if (sep && *sep == '-') {
> +			tmpl = strtol(sep + 1, NULL, 10);
> +			if (tmpl <= 0 || __duration(tmpl, scale_dur) <= dur->min ||
> +			    tmpl == LONG_MIN || tmpl == LONG_MAX)
> +				return -1;
> +
> +			dur->max = __duration(tmpl, scale_dur);
> +		} else {
> +			dur->max = dur->min;
> +		}
> +	}
> +
> +	return 0;
>   }
>   
>   #define check_arg(cond, fmt, ...) \
> @@ -855,11 +894,6 @@ static uint64_t engine_list_mask(const char *_str)
>   static unsigned long
>   allocate_working_set(struct workload *wrk, struct working_set *set);
>   
> -static long __duration(long dur, double scale)
> -{
> -	return round(scale * dur);
> -}
> -
>   #define int_field(_STEP_, _FIELD_, _COND_, _ERR_) \
>   	if ((field = strtok_r(fstart, ".", &fctx))) { \
>   		tmp = atoi(field); \
> @@ -899,8 +933,14 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   				int_field(DELAY, delay, tmp <= 0,
>   					  "Invalid delay at step %u!\n");
>   			} else if (!strcmp(field, "p")) {
> -				int_field(PERIOD, period, tmp <= 0,
> -					  "Invalid period at step %u!\n");
> +				field = strtok_r(fstart, ".", &fctx);
> +				if (field) {
> +					tmp = atoi(field);
> +					check_arg(tmp <= 0, "Invalid period at step %u!\n", nr_steps);
> +					step.type = PERIOD;
> +					step.period = __duration(tmp, scale_dur);
> +					goto add_step;
> +				}
>   			} else if (!strcmp(field, "P")) {
>   				unsigned int nr = 0;
>   				while ((field = strtok_r(fstart, ".", &fctx))) {
> @@ -1121,38 +1161,11 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   		}
>   
>   		if ((field = strtok_r(fstart, ".", &fctx))) {
> -			char *sep = NULL;
> -			long int tmpl;
> -
>   			fstart = NULL;
>   
> -			if (field[0] == '*') {
> -				check_arg(intel_gen(intel_get_drm_devid(fd)) < 8,
> -					  "Infinite batch at step %u needs Gen8+!\n",
> -					  nr_steps);
> -				step.unbound_duration = true;
> -			} else {
> -				tmpl = strtol(field, &sep, 10);
> -				check_arg(tmpl <= 0 || tmpl == LONG_MIN ||
> -					  tmpl == LONG_MAX,
> -					  "Invalid duration at step %u!\n",
> -					  nr_steps);
> -				step.duration.min = __duration(tmpl, scale_dur);
> -
> -				if (sep && *sep == '-') {
> -					tmpl = strtol(sep + 1, NULL, 10);
> -					check_arg(tmpl <= 0 ||
> -						tmpl <= step.duration.min ||
> -						tmpl == LONG_MIN ||
> -						tmpl == LONG_MAX,
> -						"Invalid duration range at step %u!\n",
> -						nr_steps);
> -					step.duration.max = __duration(tmpl,
> -								       scale_dur);
> -				} else {
> -					step.duration.max = step.duration.min;
> -				}
> -			}
> +			tmp = parse_duration(nr_steps, &step.duration, scale_dur, field);
> +			check_arg(tmp < 0,
> +				  "Invalid duration at step %u!\n", nr_steps);
>   
>   			valid++;
>   		}
> @@ -2172,7 +2185,7 @@ update_bb_start(struct workload *wrk, struct w_step *w)
>   
>   	/* ticks is inverted for MI_DO_COMPARE (less-than comparison) */
>   	ticks = 0;
> -	if (!w->unbound_duration)
> +	if (!w->duration.unbound_duration)
>   		ticks = ~ns_to_ctx_ticks(1000 * get_duration(wrk, w));
>   
>   	*w->bb_duration = ticks;
> @@ -2349,7 +2362,7 @@ static void *run_workload(void *data)
>   
>   				igt_assert(t_idx >= 0 && t_idx < i);
>   				igt_assert(wrk->steps[t_idx].type == BATCH);
> -				igt_assert(wrk->steps[t_idx].unbound_duration);
> +				igt_assert(wrk->steps[t_idx].duration.unbound_duration);
>   
>   				*wrk->steps[t_idx].bb_duration = 0xffffffff;
>   				__sync_synchronize();
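
The range check in the quoted parse_duration hunk can be illustrated in isolation. The removed code compared the unscaled upper bound with the already scaled lower bound, so a range such as 100-200 would be rejected once the scale factor pushed the scaled minimum to 200 or more. The sketch below compares scaled against scaled, as the new hunk does. The names dur_range, scale_us and parse_range are hypothetical, and the '*' (unbound) case is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for the min/max pair of struct duration. */
struct dur_range {
	long min, max;
};

/* Round-to-nearest scaling of a positive value (the tool uses round()). */
static long scale_us(long us, double scale)
{
	return (long)(scale * (double)us + 0.5);
}

/* Parse "<min>" or "<min>-<max>"; return 0 on success, -1 on error. */
static int parse_range(const char *desc, double scale, struct dur_range *out)
{
	char *sep = NULL;
	long v;

	assert(desc && out && scale > 0.0);

	v = strtol(desc, &sep, 10);
	if (v <= 0 || v == LONG_MAX)
		return -1;

	out->min = scale_us(v, scale);
	out->max = out->min;

	if (*sep == '-') {
		v = strtol(sep + 1, NULL, 10);
		if (v <= 0 || v == LONG_MAX)
			return -1;
		/* Both sides of the comparison are scaled values. */
		if (scale_us(v, scale) <= out->min)
			return -1;
		out->max = scale_us(v, scale);
	}

	return 0;
}
```

For example, "100-200" with a scale of 10 yields min 1000 and max 2000, whereas the removed check would have seen 200 <= 1000 and reported an invalid range.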


* Re: [igt-dev] [PATCH i-g-t 6/8] benchmarks/gem_wsim: allow comments in workload description files
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 6/8] benchmarks/gem_wsim: allow comments in workload description files Marcin Bernatowicz
@ 2023-09-20 16:13   ` Tvrtko Ursulin
  2023-09-21 15:05     ` Bernatowicz, Marcin
  0 siblings, 1 reply; 22+ messages in thread
From: Tvrtko Ursulin @ 2023-09-20 16:13 UTC (permalink / raw)
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 06/09/2023 16:51, Marcin Bernatowicz wrote:
> Lines starting with '#' are skipped.
> If command line step separator (',') is encountered after '#'
> it is replaced with ';' to not break parsing.
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c  | 41 ++++++++++++++++++++++++++++++++---------
>   benchmarks/wsim/README |  2 ++
>   2 files changed, 34 insertions(+), 9 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 0c1b58727..ec9fdc2d0 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -43,6 +43,7 @@
>   #include <limits.h>
>   #include <pthread.h>
>   #include <math.h>
> +#include <ctype.h>
>   
>   #include "drm.h"
>   #include "drmtest.h"
> @@ -94,6 +95,7 @@ enum w_type {
>   	TERMINATE,
>   	SSEU,
>   	WORKINGSET,
> +	SKIP,
>   };
>   
>   struct dep_entry {
> @@ -930,6 +932,12 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   		if (field) {
>   			fstart = NULL;
>   
> +			/* line starting with # is a comment */
> +			if (field[0] == '#') {
> +				step.type = SKIP;
> +				goto add_step;
> +			}

Do they need to be recorded as steps, or could they simply be skipped 
silently while parsing?

How does relative step referencing work when comments are present? 
(Batch implicit dependencies and 'a' and 's' commands.)
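
To make the concern concrete, here is a small sketch under the assumption that comment tokens are dropped during parsing and never receive a step index. In that case a relative reference of -1 still lands on the previous real step, regardless of how many comment lines sit in between. The helpers record_steps and resolve_dep are hypothetical and only model the indexing, not the tool's parser.

```c
#include <assert.h>
#include <stddef.h>

/* Copy only non-comment tokens into steps[]; return how many were kept. */
static size_t record_steps(const char **tokens, size_t n, const char **steps)
{
	size_t nr_steps = 0;
	size_t i;

	assert(tokens && steps);

	for (i = 0; i < n; i++) {
		if (tokens[i][0] == '#')
			continue; /* comment: consumes no step index */
		steps[nr_steps++] = tokens[i];
	}

	return nr_steps;
}

/* Resolve a negative relative reference from step idx; -1 if out of range. */
static long resolve_dep(size_t idx, int rel)
{
	long target;

	assert(rel < 0);
	target = (long)idx + rel;

	return target >= 0 ? target : -1;
}
```

If comments were instead stored as steps, the same -1 would point at the comment entry, which is the case worth ruling out.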

Regards,

Tvrtko

> +
>   			if (!strcmp(field, "d")) {
>   				int_field(DELAY, delay, tmp <= 0,
>   					  "Invalid delay at step %u!\n");
> @@ -1194,7 +1202,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   		if (field) {
>   			fstart = NULL;
>   
> -			check_arg(strlen(field) != 1 ||
> +			check_arg(!strlen(field) ||
> +				  (strlen(field) > 1 && !isspace(field[1]) && field[1] != '#') ||
>   				  (field[0] != '0' && field[0] != '1'),
>   				  "Invalid wait boolean at step %u!\n",
>   				  nr_steps);
> @@ -1208,18 +1217,23 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   		step.type = BATCH;
>   
>   add_step:
> -		if (step.type == DELAY)
> -			step.delay = __duration(step.delay, scale_time);
> +		if (step.type == SKIP) {
> +			if (verbose > 3)
> +				printf("skipped STEP: %s\n", _token);
> +		} else {
> +			if (step.type == DELAY)
> +				step.delay = __duration(step.delay, scale_time);
>   
> -		step.idx = nr_steps++;
> -		step.request = -1;
> -		steps = realloc(steps, sizeof(step) * nr_steps);
> -		igt_assert(steps);
> +			step.idx = nr_steps++;
> +			step.request = -1;
> +			steps = realloc(steps, sizeof(step) * nr_steps);
> +			igt_assert(steps);
>   
> -		memcpy(&steps[nr_steps - 1], &step, sizeof(step));
> +			memcpy(&steps[nr_steps - 1], &step, sizeof(step));
> +		}
>   
>   		free(token);
> -	}
> +	} // while ((_token = strtok_r(tstart, ",", &tctx))) {
>   
>   	if (app_w) {
>   		steps = realloc(steps, sizeof(step) *
> @@ -2304,6 +2318,8 @@ static void *run_workload(void *data)
>   			enum intel_engine_id engine = w->engine;
>   			int do_sleep = 0;
>   
> +			igt_assert(w->type != SKIP);
> +
>   			if (w->type == DELAY) {
>   				do_sleep = w->delay;
>   			} else if (w->type == PERIOD) {
> @@ -2543,6 +2559,13 @@ static char *load_workload_descriptor(char *filename)
>   	close(infd);
>   
>   	for (i = 0; i < len; i++) {
> +		/* '#' starts comment till end of line */
> +		if (buf[i] == '#')
> +			/* replace ',' in comments to not break parsing */
> +			while (++i < len && buf[i] != '\n')
> +				if (buf[i] == ',')
> +					buf[i] = ';';
> +
>   		if (buf[i] == '\n')
>   			buf[i] = ',';
>   	}
> diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
> index 8c71f2fe6..e4fd61645 100644
> --- a/benchmarks/wsim/README
> +++ b/benchmarks/wsim/README
> @@ -1,6 +1,8 @@
>   Workload descriptor format
>   ==========================
>   
> +Lines starting with '#' are treated as comments (do not create work step).
> +
>   ctx.engine.duration_us.dependency.wait,...
>   <uint>.<str>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
>   B.<uint>
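
The descriptor flattening from the load_workload_descriptor hunk above can be tried on its own with a sketch like the following. It mirrors the quoted loop: inside a '#' comment every ',' becomes ';', and every newline then becomes the ',' step separator. The function name flatten_descriptor, the NUL-terminated input and the extra bounds guard before the newline test are assumptions of this sketch, not part of the patch.

```c
#include <assert.h>
#include <string.h>

/* Rewrite a NUL-terminated workload file buffer in place. */
static void flatten_descriptor(char *buf)
{
	size_t len, i;

	assert(buf);
	len = strlen(buf);

	for (i = 0; i < len; i++) {
		/* '#' starts a comment that runs to the end of the line. */
		if (buf[i] == '#') {
			while (++i < len && buf[i] != '\n')
				if (buf[i] == ',')
					buf[i] = ';';
		}

		/* Newlines become the command line step separator. */
		if (i < len && buf[i] == '\n')
			buf[i] = ',';
	}
}
```

Given the two-line input "# a, b" followed by "1.VCS.1000.0.0", the buffer becomes "# a; b,1.VCS.1000.0.0,", so the comment stays a single token that the parser can recognise by its leading '#'.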


* Re: [igt-dev] [PATCH i-g-t 1/8] lib/xe_spin: xe_spin_opts for xe_spin initialization
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 1/8] lib/xe_spin: xe_spin_opts for xe_spin initialization Marcin Bernatowicz
@ 2023-09-20 16:43   ` Kamil Konieczny
  2023-09-21 15:08     ` Bernatowicz, Marcin
  0 siblings, 1 reply; 22+ messages in thread
From: Kamil Konieczny @ 2023-09-20 16:43 UTC (permalink / raw)
  To: igt-dev

Hi Marcin,

could you drop the already merged patches and resend?

Regards,
Kamil

On 2023-09-06 at 15:51:01 +0000, Marcin Bernatowicz wrote:
> Introduced struct xe_spin_opts for xe_spin initialization,
> adjusted tests to new xe_spin_init signature.
> Added xe_spin_init_opts macro (Zbyszek).
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>  lib/xe/xe_spin.c               | 28 ++++++++++------------------
>  lib/xe/xe_spin.h               | 19 ++++++++++++++++++-
>  tests/intel/xe_dma_buf_sync.c  |  6 +++---
>  tests/intel/xe_exec_balancer.c |  9 ++++-----
>  tests/intel/xe_exec_reset.c    | 24 ++++++++++++++----------
>  tests/intel/xe_exec_threads.c  |  7 ++++---
>  tests/intel/xe_vm.c            |  7 ++++---
>  7 files changed, 57 insertions(+), 43 deletions(-)
> 
> diff --git a/lib/xe/xe_spin.c b/lib/xe/xe_spin.c
> index 7113972ee..27f837ef9 100644
> --- a/lib/xe/xe_spin.c
> +++ b/lib/xe/xe_spin.c
> @@ -19,17 +19,13 @@
>  /**
>   * xe_spin_init:
>   * @spin: pointer to mapped bo in which spinner code will be written
> - * @addr: offset of spinner within vm
> - * @preempt: allow spinner to be preempted or not
> + * @opts: pointer to spinner initialization options
>   */
> -void xe_spin_init(struct xe_spin *spin, uint64_t addr, bool preempt)
> +void xe_spin_init(struct xe_spin *spin, struct xe_spin_opts *opts)
>  {
> -	uint64_t batch_offset = (char *)&spin->batch - (char *)spin;
> -	uint64_t batch_addr = addr + batch_offset;
> -	uint64_t start_offset = (char *)&spin->start - (char *)spin;
> -	uint64_t start_addr = addr + start_offset;
> -	uint64_t end_offset = (char *)&spin->end - (char *)spin;
> -	uint64_t end_addr = addr + end_offset;
> +	uint64_t loop_addr = opts->addr + offsetof(struct xe_spin, batch);
> +	uint64_t start_addr = opts->addr + offsetof(struct xe_spin, start);
> +	uint64_t end_addr = opts->addr + offsetof(struct xe_spin, end);
>  	int b = 0;
>  
>  	spin->start = 0;
> @@ -40,7 +36,7 @@ void xe_spin_init(struct xe_spin *spin, uint64_t addr, bool preempt)
>  	spin->batch[b++] = start_addr >> 32;
>  	spin->batch[b++] = 0xc0ffee;
>  
> -	if (preempt)
> +	if (opts->preempt)
>  		spin->batch[b++] = (0x5 << 23);
>  
>  	spin->batch[b++] = MI_COND_BATCH_BUFFER_END | MI_DO_COMPARE | 2;
> @@ -49,8 +45,8 @@ void xe_spin_init(struct xe_spin *spin, uint64_t addr, bool preempt)
>  	spin->batch[b++] = end_addr >> 32;
>  
>  	spin->batch[b++] = MI_BATCH_BUFFER_START | 1 << 8 | 1;
> -	spin->batch[b++] = batch_addr;
> -	spin->batch[b++] = batch_addr >> 32;
> +	spin->batch[b++] = loop_addr;
> +	spin->batch[b++] = loop_addr >> 32;
>  
>  	igt_assert(b <= ARRAY_SIZE(spin->batch));
>  }
> @@ -133,11 +129,7 @@ xe_spin_create(int fd, const struct igt_spin_factory *opt)
>  	addr = intel_allocator_alloc_with_strategy(ahnd, spin->handle, bo_size, 0, ALLOC_STRATEGY_LOW_TO_HIGH);
>  	xe_vm_bind_sync(fd, spin->vm, spin->handle, 0, addr, bo_size);
>  
> -	if (!(opt->flags & IGT_SPIN_NO_PREEMPTION))
> -		xe_spin_init(xe_spin, addr, true);
> -	else
> -		xe_spin_init(xe_spin, addr, false);
> -
> +	xe_spin_init_opts(xe_spin, .addr = addr, .preempt = !(opt->flags & IGT_SPIN_NO_PREEMPTION));
>  	exec.exec_queue_id = spin->engine;
>  	exec.address = addr;
>  	sync.handle = spin->syncobj;
> @@ -219,7 +211,7 @@ void xe_cork_init(int fd, struct drm_xe_engine_class_instance *hwe,
>  	exec_queue = xe_exec_queue_create(fd, vm, hwe, 0);
>  	syncobj = syncobj_create(fd, 0);
>  
> -	xe_spin_init(spin, addr, true);
> +	xe_spin_init_opts(spin, .addr = addr, .preempt = true);
>  	exec.exec_queue_id = exec_queue;
>  	exec.address = addr;
>  	sync.handle = syncobj;
> diff --git a/lib/xe/xe_spin.h b/lib/xe/xe_spin.h
> index c84db175d..9f1d33294 100644
> --- a/lib/xe/xe_spin.h
> +++ b/lib/xe/xe_spin.h
> @@ -15,6 +15,18 @@
>  #include "xe_query.h"
>  #include "lib/igt_dummyload.h"
>  
> +/** struct xe_spin_opts
> + *
> + * @addr: offset of spinner within vm
> + * @preempt: allow spinner to be preempted or not
> + *
> + * Used to initialize struct xe_spin spinner behavior.
> + */
> +struct xe_spin_opts {
> +	uint64_t addr;
> +	bool preempt;
> +};
> +
>  /* Mapped GPU object */
>  struct xe_spin {
>  	uint32_t batch[16];
> @@ -22,8 +34,13 @@ struct xe_spin {
>  	uint32_t start;
>  	uint32_t end;
>  };
> +
>  igt_spin_t *xe_spin_create(int fd, const struct igt_spin_factory *opt);
> -void xe_spin_init(struct xe_spin *spin, uint64_t addr, bool preempt);
> +void xe_spin_init(struct xe_spin *spin, struct xe_spin_opts *opts);
> +
> +#define xe_spin_init_opts(fd, ...) \
> +	xe_spin_init(fd, &((struct xe_spin_opts){__VA_ARGS__}))
> +
>  bool xe_spin_started(struct xe_spin *spin);
>  void xe_spin_sync_wait(int fd, struct igt_spin *spin);
>  void xe_spin_wait_started(struct xe_spin *spin);
> diff --git a/tests/intel/xe_dma_buf_sync.c b/tests/intel/xe_dma_buf_sync.c
> index 29d675154..627f4c1e5 100644
> --- a/tests/intel/xe_dma_buf_sync.c
> +++ b/tests/intel/xe_dma_buf_sync.c
> @@ -144,7 +144,6 @@ test_export_dma_buf(struct drm_xe_engine_class_instance *hwe0,
>  		uint64_t sdi_offset = (char *)&data[i]->data - (char *)data[i];
>  		uint64_t sdi_addr = addr + sdi_offset;
>  		uint64_t spin_offset = (char *)&data[i]->spin - (char *)data[i];
> -		uint64_t spin_addr = addr + spin_offset;
>  		struct drm_xe_sync sync[2] = {
>  			{ .flags = DRM_XE_SYNC_SYNCOBJ, },
>  			{ .flags = DRM_XE_SYNC_SYNCOBJ | DRM_XE_SYNC_SIGNAL, },
> @@ -153,14 +152,15 @@ test_export_dma_buf(struct drm_xe_engine_class_instance *hwe0,
>  			.num_batch_buffer = 1,
>  			.syncs = to_user_pointer(sync),
>  		};
> +		struct xe_spin_opts spin_opts = { .addr = addr + spin_offset, .preempt = true };
>  		uint32_t syncobj;
>  		int b = 0;
>  		int sync_fd;
>  
>  		/* Write spinner on FD[0] */
> -		xe_spin_init(&data[i]->spin, spin_addr, true);
> +		xe_spin_init(&data[i]->spin, &spin_opts);
>  		exec.exec_queue_id = exec_queue[0];
> -		exec.address = spin_addr;
> +		exec.address = spin_opts.addr;
>  		xe_exec(fd[0], &exec);
>  
>  		/* Export prime BO as sync file and veify business */
> diff --git a/tests/intel/xe_exec_balancer.c b/tests/intel/xe_exec_balancer.c
> index f364a4b7a..d7d8dd8fb 100644
> --- a/tests/intel/xe_exec_balancer.c
> +++ b/tests/intel/xe_exec_balancer.c
> @@ -52,6 +52,7 @@ static void test_all_active(int fd, int gt, int class)
>  	struct {
>  		struct xe_spin spin;
>  	} *data;
> +	struct xe_spin_opts spin_opts = { .preempt = false };
>  	struct drm_xe_engine_class_instance *hwe;
>  	struct drm_xe_engine_class_instance eci[MAX_INSTANCE];
>  	int i, num_placements = 0;
> @@ -90,16 +91,14 @@ static void test_all_active(int fd, int gt, int class)
>  	xe_vm_bind_async(fd, vm, 0, bo, 0, addr, bo_size, sync, 1);
>  
>  	for (i = 0; i < num_placements; i++) {
> -		uint64_t spin_offset = (char *)&data[i].spin - (char *)data;
> -		uint64_t spin_addr = addr + spin_offset;
> -
> -		xe_spin_init(&data[i].spin, spin_addr, false);
> +		spin_opts.addr = addr + (char *)&data[i].spin - (char *)data;
> +		xe_spin_init(&data[i].spin, &spin_opts);
>  		sync[0].flags &= ~DRM_XE_SYNC_SIGNAL;
>  		sync[1].flags |= DRM_XE_SYNC_SIGNAL;
>  		sync[1].handle = syncobjs[i];
>  
>  		exec.exec_queue_id = exec_queues[i];
> -		exec.address = spin_addr;
> +		exec.address = spin_opts.addr;
>  		xe_exec(fd, &exec);
>  		xe_spin_wait_started(&data[i].spin);
>  	}
> diff --git a/tests/intel/xe_exec_reset.c b/tests/intel/xe_exec_reset.c
> index a2d33baf1..be6bbada6 100644
> --- a/tests/intel/xe_exec_reset.c
> +++ b/tests/intel/xe_exec_reset.c
> @@ -44,6 +44,7 @@ static void test_spin(int fd, struct drm_xe_engine_class_instance *eci)
>  	size_t bo_size;
>  	uint32_t bo = 0;
>  	struct xe_spin *spin;
> +	struct xe_spin_opts spin_opts = { .addr = addr, .preempt = false };
>  
>  	vm = xe_vm_create(fd, DRM_XE_VM_CREATE_ASYNC_BIND_OPS, 0);
>  	bo_size = sizeof(*spin);
> @@ -60,7 +61,7 @@ static void test_spin(int fd, struct drm_xe_engine_class_instance *eci)
>  	sync[0].handle = syncobj_create(fd, 0);
>  	xe_vm_bind_async(fd, vm, 0, bo, 0, addr, bo_size, sync, 1);
>  
> -	xe_spin_init(spin, addr, false);
> +	xe_spin_init(spin, &spin_opts);
>  
>  	sync[0].flags &= ~DRM_XE_SYNC_SIGNAL;
>  	sync[1].flags |= DRM_XE_SYNC_SIGNAL;
> @@ -165,6 +166,7 @@ test_balancer(int fd, int gt, int class, int n_exec_queues, int n_execs,
>  		uint64_t pad;
>  		uint32_t data;
>  	} *data;
> +	struct xe_spin_opts spin_opts = { .preempt = false };
>  	struct drm_xe_engine_class_instance *hwe;
>  	struct drm_xe_engine_class_instance eci[MAX_INSTANCE];
>  	int i, j, b, num_placements = 0, bad_batches = 1;
> @@ -236,7 +238,6 @@ test_balancer(int fd, int gt, int class, int n_exec_queues, int n_execs,
>  		uint64_t batch_offset = (char *)&data[i].batch - (char *)data;
>  		uint64_t batch_addr = base_addr + batch_offset;
>  		uint64_t spin_offset = (char *)&data[i].spin - (char *)data;
> -		uint64_t spin_addr = base_addr + spin_offset;
>  		uint64_t sdi_offset = (char *)&data[i].data - (char *)data;
>  		uint64_t sdi_addr = base_addr + sdi_offset;
>  		uint64_t exec_addr;
> @@ -247,8 +248,9 @@ test_balancer(int fd, int gt, int class, int n_exec_queues, int n_execs,
>  			batches[j] = batch_addr;
>  
>  		if (i < bad_batches) {
> -			xe_spin_init(&data[i].spin, spin_addr, false);
> -			exec_addr = spin_addr;
> +			spin_opts.addr = base_addr + spin_offset;
> +			xe_spin_init(&data[i].spin, &spin_opts);
> +			exec_addr = spin_opts.addr;
>  		} else {
>  			b = 0;
>  			data[i].batch[b++] = MI_STORE_DWORD_IMM_GEN4;
> @@ -368,6 +370,7 @@ test_legacy_mode(int fd, struct drm_xe_engine_class_instance *eci,
>  		uint64_t pad;
>  		uint32_t data;
>  	} *data;
> +	struct xe_spin_opts spin_opts = { .preempt = false };
>  	int i, b;
>  
>  	igt_assert(n_exec_queues <= MAX_N_EXECQUEUES);
> @@ -417,15 +420,15 @@ test_legacy_mode(int fd, struct drm_xe_engine_class_instance *eci,
>  		uint64_t batch_offset = (char *)&data[i].batch - (char *)data;
>  		uint64_t batch_addr = base_addr + batch_offset;
>  		uint64_t spin_offset = (char *)&data[i].spin - (char *)data;
> -		uint64_t spin_addr = base_addr + spin_offset;
>  		uint64_t sdi_offset = (char *)&data[i].data - (char *)data;
>  		uint64_t sdi_addr = base_addr + sdi_offset;
>  		uint64_t exec_addr;
>  		int e = i % n_exec_queues;
>  
>  		if (!i) {
> -			xe_spin_init(&data[i].spin, spin_addr, false);
> -			exec_addr = spin_addr;
> +			spin_opts.addr = base_addr + spin_offset;
> +			xe_spin_init(&data[i].spin, &spin_opts);
> +			exec_addr = spin_opts.addr;
>  		} else {
>  			b = 0;
>  			data[i].batch[b++] = MI_STORE_DWORD_IMM_GEN4;
> @@ -539,6 +542,7 @@ test_compute_mode(int fd, struct drm_xe_engine_class_instance *eci,
>  		uint64_t exec_sync;
>  		uint32_t data;
>  	} *data;
> +	struct xe_spin_opts spin_opts = { .preempt = false };
>  	int i, b;
>  
>  	igt_assert(n_exec_queues <= MAX_N_EXECQUEUES);
> @@ -593,15 +597,15 @@ test_compute_mode(int fd, struct drm_xe_engine_class_instance *eci,
>  		uint64_t batch_offset = (char *)&data[i].batch - (char *)data;
>  		uint64_t batch_addr = base_addr + batch_offset;
>  		uint64_t spin_offset = (char *)&data[i].spin - (char *)data;
> -		uint64_t spin_addr = base_addr + spin_offset;
>  		uint64_t sdi_offset = (char *)&data[i].data - (char *)data;
>  		uint64_t sdi_addr = base_addr + sdi_offset;
>  		uint64_t exec_addr;
>  		int e = i % n_exec_queues;
>  
>  		if (!i) {
> -			xe_spin_init(&data[i].spin, spin_addr, false);
> -			exec_addr = spin_addr;
> +			spin_opts.addr = base_addr + spin_offset;
> +			xe_spin_init(&data[i].spin, &spin_opts);
> +			exec_addr = spin_opts.addr;
>  		} else {
>  			b = 0;
>  			data[i].batch[b++] = MI_STORE_DWORD_IMM_GEN4;
> diff --git a/tests/intel/xe_exec_threads.c b/tests/intel/xe_exec_threads.c
> index e64c1639a..ff4ebc280 100644
> --- a/tests/intel/xe_exec_threads.c
> +++ b/tests/intel/xe_exec_threads.c
> @@ -486,6 +486,7 @@ test_legacy_mode(int fd, uint32_t vm, uint64_t addr, uint64_t userptr,
>  		uint64_t pad;
>  		uint32_t data;
>  	} *data;
> +	struct xe_spin_opts spin_opts = { .preempt = false };
>  	int i, j, b, hang_exec_queue = n_exec_queues / 2;
>  	bool owns_vm = false, owns_fd = false;
>  
> @@ -562,15 +563,15 @@ test_legacy_mode(int fd, uint32_t vm, uint64_t addr, uint64_t userptr,
>  		uint64_t batch_offset = (char *)&data[i].batch - (char *)data;
>  		uint64_t batch_addr = addr + batch_offset;
>  		uint64_t spin_offset = (char *)&data[i].spin - (char *)data;
> -		uint64_t spin_addr = addr + spin_offset;
>  		uint64_t sdi_offset = (char *)&data[i].data - (char *)data;
>  		uint64_t sdi_addr = addr + sdi_offset;
>  		uint64_t exec_addr;
>  		int e = i % n_exec_queues;
>  
>  		if (flags & HANG && e == hang_exec_queue && i == e) {
> -			xe_spin_init(&data[i].spin, spin_addr, false);
> -			exec_addr = spin_addr;
> +			spin_opts.addr = addr + spin_offset;
> +			xe_spin_init(&data[i].spin, &spin_opts);
> +			exec_addr = spin_opts.addr;
>  		} else {
>  			b = 0;
>  			data[i].batch[b++] = MI_STORE_DWORD_IMM_GEN4;
> diff --git a/tests/intel/xe_vm.c b/tests/intel/xe_vm.c
> index e42c04e33..dc1850338 100644
> --- a/tests/intel/xe_vm.c
> +++ b/tests/intel/xe_vm.c
> @@ -727,6 +727,7 @@ test_bind_execqueues_independent(int fd, struct drm_xe_engine_class_instance *ec
>  		uint64_t pad;
>  		uint32_t data;
>  	} *data;
> +	struct xe_spin_opts spin_opts = { .preempt = true };
>  	int i, b;
>  
>  	vm = xe_vm_create(fd, DRM_XE_VM_CREATE_ASYNC_BIND_OPS, 0);
> @@ -755,14 +756,14 @@ test_bind_execqueues_independent(int fd, struct drm_xe_engine_class_instance *ec
>  		uint64_t sdi_offset = (char *)&data[i].data - (char *)data;
>  		uint64_t sdi_addr = addr + sdi_offset;
>  		uint64_t spin_offset = (char *)&data[i].spin - (char *)data;
> -		uint64_t spin_addr = addr + spin_offset;
>  		int e = i;
>  
>  		if (i == 0) {
>  			/* Cork 1st exec_queue with a spinner */
> -			xe_spin_init(&data[i].spin, spin_addr, true);
> +			spin_opts.addr = addr + spin_offset;
> +			xe_spin_init(&data[i].spin, &spin_opts);
>  			exec.exec_queue_id = exec_queues[e];
> -			exec.address = spin_addr;
> +			exec.address = spin_opts.addr;
>  			sync[0].flags &= ~DRM_XE_SYNC_SIGNAL;
>  			sync[1].flags |= DRM_XE_SYNC_SIGNAL;
>  			sync[1].handle = syncobjs[e];
> -- 
> 2.30.2
> 


* Re: [igt-dev] [PATCH i-g-t 6/8] benchmarks/gem_wsim: allow comments in workload description files
  2023-09-20 16:13   ` Tvrtko Ursulin
@ 2023-09-21 15:05     ` Bernatowicz, Marcin
  2023-09-21 15:22       ` Tvrtko Ursulin
  0 siblings, 1 reply; 22+ messages in thread
From: Bernatowicz, Marcin @ 2023-09-21 15:05 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: chris.p.wilson

Hi,

On 9/20/2023 6:13 PM, Tvrtko Ursulin wrote:
> 
> On 06/09/2023 16:51, Marcin Bernatowicz wrote:
>> Lines starting with '#' are skipped.
>> If command line step separator (',') is encountered after '#'
>> it is replaced with ';' to not break parsing.
>>
>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>> ---
>>   benchmarks/gem_wsim.c  | 41 ++++++++++++++++++++++++++++++++---------
>>   benchmarks/wsim/README |  2 ++
>>   2 files changed, 34 insertions(+), 9 deletions(-)
>>
>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>> index 0c1b58727..ec9fdc2d0 100644
>> --- a/benchmarks/gem_wsim.c
>> +++ b/benchmarks/gem_wsim.c
>> @@ -43,6 +43,7 @@
>>   #include <limits.h>
>>   #include <pthread.h>
>>   #include <math.h>
>> +#include <ctype.h>
>>   #include "drm.h"
>>   #include "drmtest.h"
>> @@ -94,6 +95,7 @@ enum w_type {
>>       TERMINATE,
>>       SSEU,
>>       WORKINGSET,
>> +    SKIP,
>>   };
>>   struct dep_entry {
>> @@ -930,6 +932,12 @@ parse_workload(struct w_arg *arg, unsigned int 
>> flags, double scale_dur,
>>           if (field) {
>>               fstart = NULL;
>> +            /* line starting with # is a comment */
>> +            if (field[0] == '#') {
>> +                step.type = SKIP;
>> +                goto add_step;
>> +            }
> 
> Do they need to be recorded as steps and couldn't be simply silently 
> skipped over while parsing?

Indeed, it looks like a bool skip may be enough. It's a dummy step that
is not stored in the workload steps.
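
To illustrate the idea, here is a minimal sketch (not the gem_wsim code)
of skipping comment tokens silently while walking the ','-separated
description. The names is_comment() and count_steps() are hypothetical,
and a comment is assumed to be any token whose first character is '#':

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() and strtok_r() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: a token starting with '#' is a comment. */
static int is_comment(const char *token)
{
	return token[0] == '#';
}

/*
 * Hypothetical helper: count the steps in a ','-separated description,
 * silently skipping comment tokens instead of recording a SKIP step.
 */
static unsigned int count_steps(const char *desc)
{
	char *copy = strdup(desc);	/* strtok_r() modifies its input */
	char *tstart = copy, *tctx = NULL, *token;
	unsigned int nr_steps = 0;

	assert(copy);

	while ((token = strtok_r(tstart, ",", &tctx))) {
		tstart = NULL;

		if (is_comment(token))
			continue;	/* no step record, no index consumed */

		nr_steps++;
	}

	free(copy);

	return nr_steps;
}
```

With this shape a comment never reaches the add_step path, so there is
no need for a dedicated step type just to mark it.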

> 
> How does relative step referencing works when comments are present? 
> (Batch implicit dependencies and 'a' and 's' commands.)

Comments may be added to existing workloads without a problem, as the
SKIPs do not increment nr_steps (no step record is created).
One small source of confusion is that step.idx does not correspond to
the line number, so when parsing fails we get a step number rather than
a line number :/
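
As a rough sketch of how a step index could be mapped back to a line
(hypothetical helper step_to_line(), not part of gem_wsim, and assuming
one step per line with comment lines starting with '#'):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the 1-based line number holding the step
 * with index @idx, or 0 when the text has no such step. Empty lines and
 * lines starting with '#' do not consume a step index.
 */
static unsigned int step_to_line(const char *text, unsigned int idx)
{
	unsigned int line = 1, nr_steps = 0;
	const char *p = text;

	assert(text);

	while (*p) {
		const char *eol = strchr(p, '\n');
		size_t n = eol ? (size_t)(eol - p) : strlen(p);

		if (n && p[0] != '#') {
			if (nr_steps == idx)
				return line;
			nr_steps++;
		}

		if (!eol)
			break;

		p = eol + 1;
		line++;
	}

	return 0;
}
```

For a file with a comment on lines 1 and 3, step 0 maps to line 2 and
step 1 maps to line 4, which is the mismatch described above.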

Regards,
Marcin
> 
> Regards,
> 
> Tvrtko
> 
>> +
>>               if (!strcmp(field, "d")) {
>>                   int_field(DELAY, delay, tmp <= 0,
>>                         "Invalid delay at step %u!\n");
>> @@ -1194,7 +1202,8 @@ parse_workload(struct w_arg *arg, unsigned int 
>> flags, double scale_dur,
>>           if (field) {
>>               fstart = NULL;
>> -            check_arg(strlen(field) != 1 ||
>> +            check_arg(!strlen(field) ||
>> +                  (strlen(field) > 1 && !isspace(field[1]) && 
>> field[1] != '#') ||
>>                     (field[0] != '0' && field[0] != '1'),
>>                     "Invalid wait boolean at step %u!\n",
>>                     nr_steps);
>> @@ -1208,18 +1217,23 @@ parse_workload(struct w_arg *arg, unsigned int 
>> flags, double scale_dur,
>>           step.type = BATCH;
>>   add_step:
>> -        if (step.type == DELAY)
>> -            step.delay = __duration(step.delay, scale_time);
>> +        if (step.type == SKIP) {
>> +            if (verbose > 3)
>> +                printf("skipped STEP: %s\n", _token);
>> +        } else {
>> +            if (step.type == DELAY)
>> +                step.delay = __duration(step.delay, scale_time);
>> -        step.idx = nr_steps++;
>> -        step.request = -1;
>> -        steps = realloc(steps, sizeof(step) * nr_steps);
>> -        igt_assert(steps);
>> +            step.idx = nr_steps++;
>> +            step.request = -1;
>> +            steps = realloc(steps, sizeof(step) * nr_steps);
>> +            igt_assert(steps);
>> -        memcpy(&steps[nr_steps - 1], &step, sizeof(step));
>> +            memcpy(&steps[nr_steps - 1], &step, sizeof(step));
>> +        }
>>           free(token);
>> -    }
>> +    } // while ((_token = strtok_r(tstart, ",", &tctx))) {
>>       if (app_w) {
>>           steps = realloc(steps, sizeof(step) *
>> @@ -2304,6 +2318,8 @@ static void *run_workload(void *data)
>>               enum intel_engine_id engine = w->engine;
>>               int do_sleep = 0;
>> +            igt_assert(w->type != SKIP);
>> +
>>               if (w->type == DELAY) {
>>                   do_sleep = w->delay;
>>               } else if (w->type == PERIOD) {
>> @@ -2543,6 +2559,13 @@ static char *load_workload_descriptor(char 
>> *filename)
>>       close(infd);
>>       for (i = 0; i < len; i++) {
>> +        /* '#' starts comment till end of line */
>> +        if (buf[i] == '#')
>> +            /* replace ',' in comments to not break parsing */
>> +            while (++i < len && buf[i] != '\n')
>> +                if (buf[i] == ',')
>> +                    buf[i] = ';';
>> +
>>           if (buf[i] == '\n')
>>               buf[i] = ',';
>>       }
>> diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
>> index 8c71f2fe6..e4fd61645 100644
>> --- a/benchmarks/wsim/README
>> +++ b/benchmarks/wsim/README
>> @@ -1,6 +1,8 @@
>>   Workload descriptor format
>>   ==========================
>> +Lines starting with '#' are treated as comments (do not create work 
>> step).
>> +
>>   ctx.engine.duration_us.dependency.wait,...
>>   <uint>.<str>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
>>   B.<uint>


* Re: [igt-dev] [PATCH i-g-t 1/8] lib/xe_spin: xe_spin_opts for xe_spin initialization
  2023-09-20 16:43   ` Kamil Konieczny
@ 2023-09-21 15:08     ` Bernatowicz, Marcin
  0 siblings, 0 replies; 22+ messages in thread
From: Bernatowicz, Marcin @ 2023-09-21 15:08 UTC (permalink / raw)
  To: Kamil Konieczny, igt-dev

Hi,

On 9/20/2023 6:43 PM, Kamil Konieczny wrote:
> Hi Marcin,
> 
> could you drop already merged patches and resent?

Yes, will send a new series shortly.

Regards,
Marcin
> 
> Regards,
> Kamil
> 
> On 2023-09-06 at 15:51:01 +0000, Marcin Bernatowicz wrote:
>> Introduced struct xe_spin_opts for xe_spin initialization,
>> adjusted tests to new xe_spin_init signature.
>> Added xe_spin_init_opts macro (Zbyszek).
>>
>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>> ---
>>   lib/xe/xe_spin.c               | 28 ++++++++++------------------
>>   lib/xe/xe_spin.h               | 19 ++++++++++++++++++-
>>   tests/intel/xe_dma_buf_sync.c  |  6 +++---
>>   tests/intel/xe_exec_balancer.c |  9 ++++-----
>>   tests/intel/xe_exec_reset.c    | 24 ++++++++++++++----------
>>   tests/intel/xe_exec_threads.c  |  7 ++++---
>>   tests/intel/xe_vm.c            |  7 ++++---
>>   7 files changed, 57 insertions(+), 43 deletions(-)
>>
>> diff --git a/lib/xe/xe_spin.c b/lib/xe/xe_spin.c
>> index 7113972ee..27f837ef9 100644
>> --- a/lib/xe/xe_spin.c
>> +++ b/lib/xe/xe_spin.c
>> @@ -19,17 +19,13 @@
>>   /**
>>    * xe_spin_init:
>>    * @spin: pointer to mapped bo in which spinner code will be written
>> - * @addr: offset of spinner within vm
>> - * @preempt: allow spinner to be preempted or not
>> + * @opts: pointer to spinner initialization options
>>    */
>> -void xe_spin_init(struct xe_spin *spin, uint64_t addr, bool preempt)
>> +void xe_spin_init(struct xe_spin *spin, struct xe_spin_opts *opts)
>>   {
>> -	uint64_t batch_offset = (char *)&spin->batch - (char *)spin;
>> -	uint64_t batch_addr = addr + batch_offset;
>> -	uint64_t start_offset = (char *)&spin->start - (char *)spin;
>> -	uint64_t start_addr = addr + start_offset;
>> -	uint64_t end_offset = (char *)&spin->end - (char *)spin;
>> -	uint64_t end_addr = addr + end_offset;
>> +	uint64_t loop_addr = opts->addr + offsetof(struct xe_spin, batch);
>> +	uint64_t start_addr = opts->addr + offsetof(struct xe_spin, start);
>> +	uint64_t end_addr = opts->addr + offsetof(struct xe_spin, end);
>>   	int b = 0;
>>   
>>   	spin->start = 0;
>> @@ -40,7 +36,7 @@ void xe_spin_init(struct xe_spin *spin, uint64_t addr, bool preempt)
>>   	spin->batch[b++] = start_addr >> 32;
>>   	spin->batch[b++] = 0xc0ffee;
>>   
>> -	if (preempt)
>> +	if (opts->preempt)
>>   		spin->batch[b++] = (0x5 << 23);
>>   
>>   	spin->batch[b++] = MI_COND_BATCH_BUFFER_END | MI_DO_COMPARE | 2;
>> @@ -49,8 +45,8 @@ void xe_spin_init(struct xe_spin *spin, uint64_t addr, bool preempt)
>>   	spin->batch[b++] = end_addr >> 32;
>>   
>>   	spin->batch[b++] = MI_BATCH_BUFFER_START | 1 << 8 | 1;
>> -	spin->batch[b++] = batch_addr;
>> -	spin->batch[b++] = batch_addr >> 32;
>> +	spin->batch[b++] = loop_addr;
>> +	spin->batch[b++] = loop_addr >> 32;
>>   
>>   	igt_assert(b <= ARRAY_SIZE(spin->batch));
>>   }
>> @@ -133,11 +129,7 @@ xe_spin_create(int fd, const struct igt_spin_factory *opt)
>>   	addr = intel_allocator_alloc_with_strategy(ahnd, spin->handle, bo_size, 0, ALLOC_STRATEGY_LOW_TO_HIGH);
>>   	xe_vm_bind_sync(fd, spin->vm, spin->handle, 0, addr, bo_size);
>>   
>> -	if (!(opt->flags & IGT_SPIN_NO_PREEMPTION))
>> -		xe_spin_init(xe_spin, addr, true);
>> -	else
>> -		xe_spin_init(xe_spin, addr, false);
>> -
>> +	xe_spin_init_opts(xe_spin, .addr = addr, .preempt = !(opt->flags & IGT_SPIN_NO_PREEMPTION));
>>   	exec.exec_queue_id = spin->engine;
>>   	exec.address = addr;
>>   	sync.handle = spin->syncobj;
>> @@ -219,7 +211,7 @@ void xe_cork_init(int fd, struct drm_xe_engine_class_instance *hwe,
>>   	exec_queue = xe_exec_queue_create(fd, vm, hwe, 0);
>>   	syncobj = syncobj_create(fd, 0);
>>   
>> -	xe_spin_init(spin, addr, true);
>> +	xe_spin_init_opts(spin, .addr = addr, .preempt = true);
>>   	exec.exec_queue_id = exec_queue;
>>   	exec.address = addr;
>>   	sync.handle = syncobj;
>> diff --git a/lib/xe/xe_spin.h b/lib/xe/xe_spin.h
>> index c84db175d..9f1d33294 100644
>> --- a/lib/xe/xe_spin.h
>> +++ b/lib/xe/xe_spin.h
>> @@ -15,6 +15,18 @@
>>   #include "xe_query.h"
>>   #include "lib/igt_dummyload.h"
>>   
>> +/** struct xe_spin_opts
>> + *
>> + * @addr: offset of spinner within vm
>> + * @preempt: allow spinner to be preempted or not
>> + *
>> + * Used to initialize struct xe_spin spinner behavior.
>> + */
>> +struct xe_spin_opts {
>> +	uint64_t addr;
>> +	bool preempt;
>> +};
>> +
>>   /* Mapped GPU object */
>>   struct xe_spin {
>>   	uint32_t batch[16];
>> @@ -22,8 +34,13 @@ struct xe_spin {
>>   	uint32_t start;
>>   	uint32_t end;
>>   };
>> +
>>   igt_spin_t *xe_spin_create(int fd, const struct igt_spin_factory *opt);
>> -void xe_spin_init(struct xe_spin *spin, uint64_t addr, bool preempt);
>> +void xe_spin_init(struct xe_spin *spin, struct xe_spin_opts *opts);
>> +
>> +#define xe_spin_init_opts(fd, ...) \
>> +	xe_spin_init(fd, &((struct xe_spin_opts){__VA_ARGS__}))
>> +
>>   bool xe_spin_started(struct xe_spin *spin);
>>   void xe_spin_sync_wait(int fd, struct igt_spin *spin);
>>   void xe_spin_wait_started(struct xe_spin *spin);
>> diff --git a/tests/intel/xe_dma_buf_sync.c b/tests/intel/xe_dma_buf_sync.c
>> index 29d675154..627f4c1e5 100644
>> --- a/tests/intel/xe_dma_buf_sync.c
>> +++ b/tests/intel/xe_dma_buf_sync.c
>> @@ -144,7 +144,6 @@ test_export_dma_buf(struct drm_xe_engine_class_instance *hwe0,
>>   		uint64_t sdi_offset = (char *)&data[i]->data - (char *)data[i];
>>   		uint64_t sdi_addr = addr + sdi_offset;
>>   		uint64_t spin_offset = (char *)&data[i]->spin - (char *)data[i];
>> -		uint64_t spin_addr = addr + spin_offset;
>>   		struct drm_xe_sync sync[2] = {
>>   			{ .flags = DRM_XE_SYNC_SYNCOBJ, },
>>   			{ .flags = DRM_XE_SYNC_SYNCOBJ | DRM_XE_SYNC_SIGNAL, },
>> @@ -153,14 +152,15 @@ test_export_dma_buf(struct drm_xe_engine_class_instance *hwe0,
>>   			.num_batch_buffer = 1,
>>   			.syncs = to_user_pointer(sync),
>>   		};
>> +		struct xe_spin_opts spin_opts = { .addr = addr + spin_offset, .preempt = true };
>>   		uint32_t syncobj;
>>   		int b = 0;
>>   		int sync_fd;
>>   
>>   		/* Write spinner on FD[0] */
>> -		xe_spin_init(&data[i]->spin, spin_addr, true);
>> +		xe_spin_init(&data[i]->spin, &spin_opts);
>>   		exec.exec_queue_id = exec_queue[0];
>> -		exec.address = spin_addr;
>> +		exec.address = spin_opts.addr;
>>   		xe_exec(fd[0], &exec);
>>   
>>   		/* Export prime BO as sync file and veify business */
>> diff --git a/tests/intel/xe_exec_balancer.c b/tests/intel/xe_exec_balancer.c
>> index f364a4b7a..d7d8dd8fb 100644
>> --- a/tests/intel/xe_exec_balancer.c
>> +++ b/tests/intel/xe_exec_balancer.c
>> @@ -52,6 +52,7 @@ static void test_all_active(int fd, int gt, int class)
>>   	struct {
>>   		struct xe_spin spin;
>>   	} *data;
>> +	struct xe_spin_opts spin_opts = { .preempt = false };
>>   	struct drm_xe_engine_class_instance *hwe;
>>   	struct drm_xe_engine_class_instance eci[MAX_INSTANCE];
>>   	int i, num_placements = 0;
>> @@ -90,16 +91,14 @@ static void test_all_active(int fd, int gt, int class)
>>   	xe_vm_bind_async(fd, vm, 0, bo, 0, addr, bo_size, sync, 1);
>>   
>>   	for (i = 0; i < num_placements; i++) {
>> -		uint64_t spin_offset = (char *)&data[i].spin - (char *)data;
>> -		uint64_t spin_addr = addr + spin_offset;
>> -
>> -		xe_spin_init(&data[i].spin, spin_addr, false);
>> +		spin_opts.addr = addr + (char *)&data[i].spin - (char *)data;
>> +		xe_spin_init(&data[i].spin, &spin_opts);
>>   		sync[0].flags &= ~DRM_XE_SYNC_SIGNAL;
>>   		sync[1].flags |= DRM_XE_SYNC_SIGNAL;
>>   		sync[1].handle = syncobjs[i];
>>   
>>   		exec.exec_queue_id = exec_queues[i];
>> -		exec.address = spin_addr;
>> +		exec.address = spin_opts.addr;
>>   		xe_exec(fd, &exec);
>>   		xe_spin_wait_started(&data[i].spin);
>>   	}
>> diff --git a/tests/intel/xe_exec_reset.c b/tests/intel/xe_exec_reset.c
>> index a2d33baf1..be6bbada6 100644
>> --- a/tests/intel/xe_exec_reset.c
>> +++ b/tests/intel/xe_exec_reset.c
>> @@ -44,6 +44,7 @@ static void test_spin(int fd, struct drm_xe_engine_class_instance *eci)
>>   	size_t bo_size;
>>   	uint32_t bo = 0;
>>   	struct xe_spin *spin;
>> +	struct xe_spin_opts spin_opts = { .addr = addr, .preempt = false };
>>   
>>   	vm = xe_vm_create(fd, DRM_XE_VM_CREATE_ASYNC_BIND_OPS, 0);
>>   	bo_size = sizeof(*spin);
>> @@ -60,7 +61,7 @@ static void test_spin(int fd, struct drm_xe_engine_class_instance *eci)
>>   	sync[0].handle = syncobj_create(fd, 0);
>>   	xe_vm_bind_async(fd, vm, 0, bo, 0, addr, bo_size, sync, 1);
>>   
>> -	xe_spin_init(spin, addr, false);
>> +	xe_spin_init(spin, &spin_opts);
>>   
>>   	sync[0].flags &= ~DRM_XE_SYNC_SIGNAL;
>>   	sync[1].flags |= DRM_XE_SYNC_SIGNAL;
>> @@ -165,6 +166,7 @@ test_balancer(int fd, int gt, int class, int n_exec_queues, int n_execs,
>>   		uint64_t pad;
>>   		uint32_t data;
>>   	} *data;
>> +	struct xe_spin_opts spin_opts = { .preempt = false };
>>   	struct drm_xe_engine_class_instance *hwe;
>>   	struct drm_xe_engine_class_instance eci[MAX_INSTANCE];
>>   	int i, j, b, num_placements = 0, bad_batches = 1;
>> @@ -236,7 +238,6 @@ test_balancer(int fd, int gt, int class, int n_exec_queues, int n_execs,
>>   		uint64_t batch_offset = (char *)&data[i].batch - (char *)data;
>>   		uint64_t batch_addr = base_addr + batch_offset;
>>   		uint64_t spin_offset = (char *)&data[i].spin - (char *)data;
>> -		uint64_t spin_addr = base_addr + spin_offset;
>>   		uint64_t sdi_offset = (char *)&data[i].data - (char *)data;
>>   		uint64_t sdi_addr = base_addr + sdi_offset;
>>   		uint64_t exec_addr;
>> @@ -247,8 +248,9 @@ test_balancer(int fd, int gt, int class, int n_exec_queues, int n_execs,
>>   			batches[j] = batch_addr;
>>   
>>   		if (i < bad_batches) {
>> -			xe_spin_init(&data[i].spin, spin_addr, false);
>> -			exec_addr = spin_addr;
>> +			spin_opts.addr = base_addr + spin_offset;
>> +			xe_spin_init(&data[i].spin, &spin_opts);
>> +			exec_addr = spin_opts.addr;
>>   		} else {
>>   			b = 0;
>>   			data[i].batch[b++] = MI_STORE_DWORD_IMM_GEN4;
>> @@ -368,6 +370,7 @@ test_legacy_mode(int fd, struct drm_xe_engine_class_instance *eci,
>>   		uint64_t pad;
>>   		uint32_t data;
>>   	} *data;
>> +	struct xe_spin_opts spin_opts = { .preempt = false };
>>   	int i, b;
>>   
>>   	igt_assert(n_exec_queues <= MAX_N_EXECQUEUES);
>> @@ -417,15 +420,15 @@ test_legacy_mode(int fd, struct drm_xe_engine_class_instance *eci,
>>   		uint64_t batch_offset = (char *)&data[i].batch - (char *)data;
>>   		uint64_t batch_addr = base_addr + batch_offset;
>>   		uint64_t spin_offset = (char *)&data[i].spin - (char *)data;
>> -		uint64_t spin_addr = base_addr + spin_offset;
>>   		uint64_t sdi_offset = (char *)&data[i].data - (char *)data;
>>   		uint64_t sdi_addr = base_addr + sdi_offset;
>>   		uint64_t exec_addr;
>>   		int e = i % n_exec_queues;
>>   
>>   		if (!i) {
>> -			xe_spin_init(&data[i].spin, spin_addr, false);
>> -			exec_addr = spin_addr;
>> +			spin_opts.addr = base_addr + spin_offset;
>> +			xe_spin_init(&data[i].spin, &spin_opts);
>> +			exec_addr = spin_opts.addr;
>>   		} else {
>>   			b = 0;
>>   			data[i].batch[b++] = MI_STORE_DWORD_IMM_GEN4;
>> @@ -539,6 +542,7 @@ test_compute_mode(int fd, struct drm_xe_engine_class_instance *eci,
>>   		uint64_t exec_sync;
>>   		uint32_t data;
>>   	} *data;
>> +	struct xe_spin_opts spin_opts = { .preempt = false };
>>   	int i, b;
>>   
>>   	igt_assert(n_exec_queues <= MAX_N_EXECQUEUES);
>> @@ -593,15 +597,15 @@ test_compute_mode(int fd, struct drm_xe_engine_class_instance *eci,
>>   		uint64_t batch_offset = (char *)&data[i].batch - (char *)data;
>>   		uint64_t batch_addr = base_addr + batch_offset;
>>   		uint64_t spin_offset = (char *)&data[i].spin - (char *)data;
>> -		uint64_t spin_addr = base_addr + spin_offset;
>>   		uint64_t sdi_offset = (char *)&data[i].data - (char *)data;
>>   		uint64_t sdi_addr = base_addr + sdi_offset;
>>   		uint64_t exec_addr;
>>   		int e = i % n_exec_queues;
>>   
>>   		if (!i) {
>> -			xe_spin_init(&data[i].spin, spin_addr, false);
>> -			exec_addr = spin_addr;
>> +			spin_opts.addr = base_addr + spin_offset;
>> +			xe_spin_init(&data[i].spin, &spin_opts);
>> +			exec_addr = spin_opts.addr;
>>   		} else {
>>   			b = 0;
>>   			data[i].batch[b++] = MI_STORE_DWORD_IMM_GEN4;
>> diff --git a/tests/intel/xe_exec_threads.c b/tests/intel/xe_exec_threads.c
>> index e64c1639a..ff4ebc280 100644
>> --- a/tests/intel/xe_exec_threads.c
>> +++ b/tests/intel/xe_exec_threads.c
>> @@ -486,6 +486,7 @@ test_legacy_mode(int fd, uint32_t vm, uint64_t addr, uint64_t userptr,
>>   		uint64_t pad;
>>   		uint32_t data;
>>   	} *data;
>> +	struct xe_spin_opts spin_opts = { .preempt = false };
>>   	int i, j, b, hang_exec_queue = n_exec_queues / 2;
>>   	bool owns_vm = false, owns_fd = false;
>>   
>> @@ -562,15 +563,15 @@ test_legacy_mode(int fd, uint32_t vm, uint64_t addr, uint64_t userptr,
>>   		uint64_t batch_offset = (char *)&data[i].batch - (char *)data;
>>   		uint64_t batch_addr = addr + batch_offset;
>>   		uint64_t spin_offset = (char *)&data[i].spin - (char *)data;
>> -		uint64_t spin_addr = addr + spin_offset;
>>   		uint64_t sdi_offset = (char *)&data[i].data - (char *)data;
>>   		uint64_t sdi_addr = addr + sdi_offset;
>>   		uint64_t exec_addr;
>>   		int e = i % n_exec_queues;
>>   
>>   		if (flags & HANG && e == hang_exec_queue && i == e) {
>> -			xe_spin_init(&data[i].spin, spin_addr, false);
>> -			exec_addr = spin_addr;
>> +			spin_opts.addr = addr + spin_offset;
>> +			xe_spin_init(&data[i].spin, &spin_opts);
>> +			exec_addr = spin_opts.addr;
>>   		} else {
>>   			b = 0;
>>   			data[i].batch[b++] = MI_STORE_DWORD_IMM_GEN4;
>> diff --git a/tests/intel/xe_vm.c b/tests/intel/xe_vm.c
>> index e42c04e33..dc1850338 100644
>> --- a/tests/intel/xe_vm.c
>> +++ b/tests/intel/xe_vm.c
>> @@ -727,6 +727,7 @@ test_bind_execqueues_independent(int fd, struct drm_xe_engine_class_instance *ec
>>   		uint64_t pad;
>>   		uint32_t data;
>>   	} *data;
>> +	struct xe_spin_opts spin_opts = { .preempt = true };
>>   	int i, b;
>>   
>>   	vm = xe_vm_create(fd, DRM_XE_VM_CREATE_ASYNC_BIND_OPS, 0);
>> @@ -755,14 +756,14 @@ test_bind_execqueues_independent(int fd, struct drm_xe_engine_class_instance *ec
>>   		uint64_t sdi_offset = (char *)&data[i].data - (char *)data;
>>   		uint64_t sdi_addr = addr + sdi_offset;
>>   		uint64_t spin_offset = (char *)&data[i].spin - (char *)data;
>> -		uint64_t spin_addr = addr + spin_offset;
>>   		int e = i;
>>   
>>   		if (i == 0) {
>>   			/* Cork 1st exec_queue with a spinner */
>> -			xe_spin_init(&data[i].spin, spin_addr, true);
>> +			spin_opts.addr = addr + spin_offset;
>> +			xe_spin_init(&data[i].spin, &spin_opts);
>>   			exec.exec_queue_id = exec_queues[e];
>> -			exec.address = spin_addr;
>> +			exec.address = spin_opts.addr;
>>   			sync[0].flags &= ~DRM_XE_SYNC_SIGNAL;
>>   			sync[1].flags |= DRM_XE_SYNC_SIGNAL;
>>   			sync[1].handle = syncobjs[e];
>> -- 
>> 2.30.2
>>


* Re: [igt-dev] [PATCH i-g-t 6/8] benchmarks/gem_wsim: allow comments in workload description files
  2023-09-21 15:05     ` Bernatowicz, Marcin
@ 2023-09-21 15:22       ` Tvrtko Ursulin
  2023-09-21 16:20         ` Bernatowicz, Marcin
  0 siblings, 1 reply; 22+ messages in thread
From: Tvrtko Ursulin @ 2023-09-21 15:22 UTC (permalink / raw)
  To: Bernatowicz, Marcin, igt-dev; +Cc: chris.p.wilson


On 21/09/2023 16:05, Bernatowicz, Marcin wrote:
> Hi,
> 
> On 9/20/2023 6:13 PM, Tvrtko Ursulin wrote:
>>
>> On 06/09/2023 16:51, Marcin Bernatowicz wrote:
>>> Lines starting with '#' are skipped.
>>> If command line step separator (',') is encountered after '#'
>>> it is replaced with ';' to not break parsing.
>>>
>>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>>> ---
>>>   benchmarks/gem_wsim.c  | 41 ++++++++++++++++++++++++++++++++---------
>>>   benchmarks/wsim/README |  2 ++
>>>   2 files changed, 34 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>>> index 0c1b58727..ec9fdc2d0 100644
>>> --- a/benchmarks/gem_wsim.c
>>> +++ b/benchmarks/gem_wsim.c
>>> @@ -43,6 +43,7 @@
>>>   #include <limits.h>
>>>   #include <pthread.h>
>>>   #include <math.h>
>>> +#include <ctype.h>
>>>   #include "drm.h"
>>>   #include "drmtest.h"
>>> @@ -94,6 +95,7 @@ enum w_type {
>>>       TERMINATE,
>>>       SSEU,
>>>       WORKINGSET,
>>> +    SKIP,
>>>   };
>>>   struct dep_entry {
>>> @@ -930,6 +932,12 @@ parse_workload(struct w_arg *arg, unsigned int 
>>> flags, double scale_dur,
>>>           if (field) {
>>>               fstart = NULL;
>>> +            /* line starting with # is a comment */
>>> +            if (field[0] == '#') {
>>> +                step.type = SKIP;
>>> +                goto add_step;
>>> +            }
>>
>> Do they need to be recorded as steps and couldn't be simply silently 
>> skipped over while parsing?
> 
> Looks indeed a bool skip may be enough. It's a dummy step not stored in 
> workload steps.

Cool, that would be the best.

>>
>> How does relative step referencing works when comments are present? 
>> (Batch implicit dependencies and 'a' and 's' commands.)
> 
> Comments may be added to existing workloads without a problem as the 
> SKIPs are not incrementing nr_step (does not create a record).

Oookay, guess I was confused with the goto add_step. :)

> Maybe a small confusion is that step.idx does not correspond to line 
> number, so in case of failed parse we get a step number without a line :/

Hm? If parse fails the whole program fails so I didn't get this.

Regards,

Tvrtko

> 
> Regards,
> Marcin
>>
>> Regards,
>>
>> Tvrtko
>>
>>> +
>>>               if (!strcmp(field, "d")) {
>>>                   int_field(DELAY, delay, tmp <= 0,
>>>                         "Invalid delay at step %u!\n");
>>> @@ -1194,7 +1202,8 @@ parse_workload(struct w_arg *arg, unsigned int 
>>> flags, double scale_dur,
>>>           if (field) {
>>>               fstart = NULL;
>>> -            check_arg(strlen(field) != 1 ||
>>> +            check_arg(!strlen(field) ||
>>> +                  (strlen(field) > 1 && !isspace(field[1]) && 
>>> field[1] != '#') ||
>>>                     (field[0] != '0' && field[0] != '1'),
>>>                     "Invalid wait boolean at step %u!\n",
>>>                     nr_steps);
>>> @@ -1208,18 +1217,23 @@ parse_workload(struct w_arg *arg, unsigned 
>>> int flags, double scale_dur,
>>>           step.type = BATCH;
>>>   add_step:
>>> -        if (step.type == DELAY)
>>> -            step.delay = __duration(step.delay, scale_time);
>>> +        if (step.type == SKIP) {
>>> +            if (verbose > 3)
>>> +                printf("skipped STEP: %s\n", _token);
>>> +        } else {
>>> +            if (step.type == DELAY)
>>> +                step.delay = __duration(step.delay, scale_time);
>>> -        step.idx = nr_steps++;
>>> -        step.request = -1;
>>> -        steps = realloc(steps, sizeof(step) * nr_steps);
>>> -        igt_assert(steps);
>>> +            step.idx = nr_steps++;
>>> +            step.request = -1;
>>> +            steps = realloc(steps, sizeof(step) * nr_steps);
>>> +            igt_assert(steps);
>>> -        memcpy(&steps[nr_steps - 1], &step, sizeof(step));
>>> +            memcpy(&steps[nr_steps - 1], &step, sizeof(step));
>>> +        }
>>>           free(token);
>>> -    }
>>> +    } // while ((_token = strtok_r(tstart, ",", &tctx))) {
>>>       if (app_w) {
>>>           steps = realloc(steps, sizeof(step) *
>>> @@ -2304,6 +2318,8 @@ static void *run_workload(void *data)
>>>               enum intel_engine_id engine = w->engine;
>>>               int do_sleep = 0;
>>> +            igt_assert(w->type != SKIP);
>>> +
>>>               if (w->type == DELAY) {
>>>                   do_sleep = w->delay;
>>>               } else if (w->type == PERIOD) {
>>> @@ -2543,6 +2559,13 @@ static char *load_workload_descriptor(char 
>>> *filename)
>>>       close(infd);
>>>       for (i = 0; i < len; i++) {
>>> +        /* '#' starts comment till end of line */
>>> +        if (buf[i] == '#')
>>> +            /* replace ',' in comments to not break parsing */
>>> +            while (++i < len && buf[i] != '\n')
>>> +                if (buf[i] == ',')
>>> +                    buf[i] = ';';
>>> +
>>>           if (buf[i] == '\n')
>>>               buf[i] = ',';
>>>       }
>>> diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
>>> index 8c71f2fe6..e4fd61645 100644
>>> --- a/benchmarks/wsim/README
>>> +++ b/benchmarks/wsim/README
>>> @@ -1,6 +1,8 @@
>>>   Workload descriptor format
>>>   ==========================
>>> +Lines starting with '#' are treated as comments (do not create work 
>>> step).
>>> +
>>>   ctx.engine.duration_us.dependency.wait,...
>>>   <uint>.<str>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
>>>   B.<uint>

* Re: [igt-dev] [PATCH i-g-t 8/8] [RFC] benchmarks/gem_wsim: added basic xe support
  2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 8/8] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
@ 2023-09-21 15:57   ` Tvrtko Ursulin
  2023-09-21 19:39     ` Bernatowicz, Marcin
  0 siblings, 1 reply; 22+ messages in thread
From: Tvrtko Ursulin @ 2023-09-21 15:57 UTC (permalink / raw)
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 06/09/2023 16:51, Marcin Bernatowicz wrote:
> Added basic xe support with few examples.
> Single binary handles both i915 and Xe devices,
> but workload definitions differ between i915 and xe.
> Xe does not use context abstraction, introduces new VM and Exec Queue
> steps and BATCH step references exec queue.
> For more details see wsim/README.
> Some functionality is still missing: working sets,
> load balancing (need some input if/how to do it in Xe - exec queues
> width?).
> 
> The tool is handy for scheduling tests, we find it useful to verify vGPU
> profiles defining different execution quantum/preemption timeout
> settings.
> 
> There is also some rationale for the tool in following thread:
> https://lore.kernel.org/dri-devel/a443495f-5d1b-52e1-9b2f-80167deb6d57@linux.intel.com/
> 
> With this patch it should be possible to run following on xe device:
> 
> gem_wsim -w benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim -c 36 -r 600

For historical reference, there used to be a tool called media-bench.pl in IGT which was used to answer the question "how many streams of this workload can this load balancer handle". In simplified terms it worked by increasing the -c value above until engine busyness stopped growing, which meant saturation. With that we were able to compare load balancing strategies and some other things, like how many streams fit before frames start to drop.

These days, if resurrected, or resurrected in principle, it could answer the question of which driver can fit more streams of workload X, or whether a new GuC firmware regresses something.
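
The saturation search described above can be sketched roughly as follows. This is not media-bench.pl itself: `busyness_fn`, `find_saturation`, `min_gain` and the toy `linear_until_full` model are hypothetical names used only for illustration, and a real measurement callback would have to run gem_wsim with the given -c value and sample engine busyness.

```c
/*
 * Hypothetical callback: run the workload with `clients` parallel
 * clients (the -c value) and return engine busyness in percent.
 */
typedef double (*busyness_fn)(unsigned int clients, void *data);

/*
 * Increase the client count until busyness stops growing by at least
 * min_gain, then report the last count that still added busyness.
 */
static unsigned int find_saturation(busyness_fn measure, void *data,
				    unsigned int max_clients, double min_gain)
{
	double prev = 0.0;
	unsigned int c;

	for (c = 1; c <= max_clients; c++) {
		double busy = measure(c, data);

		/* No meaningful growth: the previous count saturated. */
		if (c > 1 && busy - prev < min_gain)
			return c - 1;

		prev = busy;
	}

	/* Never saturated within the allowed range. */
	return max_clients;
}

/* Toy model for illustration: 12.5% per client, capped at 100%. */
static double linear_until_full(unsigned int clients, void *data)
{
	double busy = 12.5 * clients;

	(void)data;
	return busy > 100.0 ? 100.0 : busy;
}
```

With the toy model the search stops at eight clients, since the ninth adds no busyness.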

> Best with drm debug logs disabled:
> 
> echo 0 > /sys/module/drm/parameters/debug
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c                         | 534 ++++++++++++++++--
>   benchmarks/wsim/README                        |  85 ++-
>   benchmarks/wsim/xe_cloud-gaming-60fps.wsim    |  25 +
>   benchmarks/wsim/xe_example.wsim               |  28 +
>   benchmarks/wsim/xe_example01.wsim             |  19 +
>   benchmarks/wsim/xe_example_fence.wsim         |  23 +
>   .../wsim/xe_media_load_balance_fhd26u7.wsim   |  63 +++
>   7 files changed, 722 insertions(+), 55 deletions(-)
>   create mode 100644 benchmarks/wsim/xe_cloud-gaming-60fps.wsim
>   create mode 100644 benchmarks/wsim/xe_example.wsim
>   create mode 100644 benchmarks/wsim/xe_example01.wsim
>   create mode 100644 benchmarks/wsim/xe_example_fence.wsim
>   create mode 100644 benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim

8<
> diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
> index e4fd61645..ddfefff47 100644
> --- a/benchmarks/wsim/README
> +++ b/benchmarks/wsim/README
> @@ -3,6 +3,7 @@ Workload descriptor format
>   
>   Lines starting with '#' are treated as comments (do not create work step).
>   
> +# i915
>   ctx.engine.duration_us.dependency.wait,...
>   <uint>.<str>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
>   B.<uint>
> @@ -13,6 +14,23 @@ b.<uint>.<str>[|<str>].<str>
>   w|W.<uint>.<str>[/<str>]...
>   f
>   
> +# xe
> +Xe does not use context abstraction and adds additional work step types
> +for VM (v.) and exec queue (e.) creation.
> +Each v. and e. step creates array entry (in workload's VM and Exec Queue arrays).
> +Batch step references the exec queue on which it is to be executed.
> +Exec queue reference (eq_idx) is the index (0-based) in workload's exec queue array.
> +VM reference (vm_idx) is the index (0-based) in workload's VM array.
> +
> +v.compute_mode
> +v.<0|1>
> +e.vm_idx.class.instance.compute_mode.job_timeout_ms,...
> +e.<uint>.<uint 0=RCS,1=BCS,2=VCS,3=VECS,4=CCS>.<int>.<0|1>.<uint>,...
> +eq_idx.duration_us.dependency.wait,...
> +<uint>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
> +d|p|s|t|q|a|T.<int>,...
> +f
> +
>   For duration a range can be given from which a random value will be picked
>   before every submit. Since this and seqno management requires CPU access to
>   objects, care needs to be taken in order to ensure the submit queue is deep
> @@ -29,21 +47,22 @@ Additional workload steps are also supported:
>    'q' - Throttle to n max queue depth.
>    'f' - Create a sync fence.
>    'a' - Advance the previously created sync fence.
> - 'B' - Turn on context load balancing.
> - 'b' - Set up engine bonds.
> - 'M' - Set up engine map.
> - 'P' - Context priority.
> - 'S' - Context SSEU configuration.
> + 'B' - Turn on context load balancing. (i915 only)
> + 'b' - Set up engine bonds. (i915 only)
> + 'M' - Set up engine map. (i915 only)
> + 'P' - Context priority. (i915 only)
> + 'S' - Context SSEU configuration. (i915 only)
>    'T' - Terminate an infinite batch.
> - 'w' - Working set. (See Working sets section.)
> - 'W' - Shared working set.
> - 'X' - Context preemption control.
> + 'w' - Working set. (See Working sets section.) (i915 only)
> + 'W' - Shared working set. (i915 only)
> + 'X' - Context preemption control. (i915 only)
>   
>   Engine ids: DEFAULT, RCS, BCS, VCS, VCS1, VCS2, VECS
>   
>   Example (leading spaces must not be present in the actual file):
>   ----------------------------------------------------------------
>   
> +# i915
>     1.VCS1.3000.0.1
>     1.RCS.500-1000.-1.0
>     1.RCS.3700.0.0
> @@ -53,6 +72,25 @@ Example (leading spaces must not be present in the actual file):
>     1.VCS2.600.-1.1
>     p.16000
>   
> +# xe equivalent
> +  #VM: v.compute_mode
> +  v.0
> +  #EXEC_QUEUE: e.vm_idx.class.intance.compute_mode.job_timeout_ms
> +  e.0.2.0.0.0 # VCS1

A minor digression - I would suggest using more symbolic names and fewer numbers. For instance, encode the class and instance in names.

> +  e.0.0.0.0.0 # RCS
> +  e.0.2.1.0.0 # VCS2
> +  e.0.0.0.0.0 # second RCS exec queue
> +  #BATCH: eq_idx.duration.dependency.wait
> +  0.3000.0.1       # 1.VCS1.3000.0.1
> +  1.500-1000.-1.0  # 1.RCS.500-1000.-1.0
> +  3.3700.0.0       # 1.RCS.3700.0.0
> +  1.1000.-2.1      # 1.RCS.1000.-2.0
> +  2.2300.-2.0      # 1.VCS2.2300.-2.0
> +  3.4700.-1.0      # 1.RCS.4700.-1.0
> +  2.600.-1.1       # 1.VCS2.600.-1.1
> +  p.16000

My initial feeling, which still holds after some thinking, is that it would be good to look for ways to minimise divergence. That means trying to avoid a completely different syntax, which leaves zero chance of workloads that can be run with either driver.

For instance the concept of a queue is relatively similar and in practice with xe ends up a little bit more limited. Which I think is solvable.

For instance I think this can be made to work with xe.

M.1.VCS1|VCS2
# or M.1.VCS - class names without numbers can keep being treated as VCS*
B.1
1.VCS.500-2000.0.0

On i915 this creates a load balancing context with the engine map populated. I think with xe you have the same concept when creating a queue - allowed engine mask - right?

The B.1 step you can skip with xe if it is not needed, I mean if multiple allowed engines imply load balancing there.

And then for the actual submission you know it is queue 1, and the VCS you can use to sanitize. If it doesn't match the queue configuration, error out; otherwise just submit to the queue.
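
As a rough sketch of that sanity check, assuming the queue simply remembers the engine names from its 'M' step: `struct queue_cfg`, `is_class_of` and `queue_accepts` are hypothetical and do not exist in gem_wsim. A bare class name such as "VCS" is taken to match any instance of that class.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical queue record: engine names allowed by an 'M' step. */
struct queue_cfg {
	const char *engines[8];
	unsigned int nr_engines;
};

/* True if cls is a bare class name ("VCS") and name is one of its instances ("VCS1"). */
static bool is_class_of(const char *cls, const char *name)
{
	size_t len = strlen(cls);

	if (!len || isdigit((unsigned char)cls[len - 1]))
		return false;

	return !strncmp(cls, name, len) && name[len] != '\0' &&
	       strspn(name + len, "0123456789") == strlen(name + len);
}

/* True if a batch step naming `requested` may be submitted to this queue. */
static bool queue_accepts(const struct queue_cfg *q, const char *requested)
{
	unsigned int i;

	for (i = 0; i < q->nr_engines; i++) {
		const char *allowed = q->engines[i];

		if (!strcmp(allowed, requested) ||
		    is_class_of(allowed, requested) ||
		    is_class_of(requested, allowed))
			return true;
	}

	return false;
}
```

A parser could call `queue_accepts()` for each batch step and fail the parse when it returns false, which keeps existing i915 descriptors usable as they are.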

VM management can be explicit steps, and AFAIR gem_wsim already shares the VM implicitly, so for xe you just need to add some commands to make it explicit:

V.1 			# create VM 1
M.1.VCS 		# create ctx/queue 1 with all VCS engines
v.1.1			# assign VM 1 to ctx/queue 1
B.1			# turn on load balancing for ctx 1
1.VCS.1000.0.0		# submit to ctx/queue 1

I think this could work with both i915 and xe as is.

Things like compute mode you add as extensions which i915 could then ignore.

V.1
c.1	# turn on compute mode on vm 1
M.1.VCS	# do you *need* to repeat the compute mode if vm carries the info?
v.1.1
B.1
1.VCS.1000.0.0

Still would work with both i915 and xe if I am not missing something.

I mean maybe you don't even need explicit VM management in the first go and can just do what the code currently does, which is to share the same VM for all contexts?

That much for now, let the brainstorming commence! :)

Regards,

Tvrtko

P.S.

Engine bonds could be used to validate and set up a parallel submission queue. For instance:

b.1.VCS2.VCS1

Is probably a no-op on xe with parallel queues. Or you use it to configure the engine map order, if that is important.

The problem will be converting multiple submissions into one. It is probably doable but not warranted to include. It is okay to error out for now on workloads which use the feature.

> +
> +
>   The above workload described in human language works like this:
>   
>     1.   A batch is sent to the VCS1 engine which will be executing for 3ms on the
> @@ -78,16 +116,30 @@ Multiple dependencies can be given separated by forward slashes.
>   
>   Example:
>   
> +# i915
>     1.VCS1.3000.0.1
>     1.RCS.3700.0.0
>     1.VCS2.2300.-1/-2.0
>   
> +# xe
> +  v.0
> +  e.0.2.0.0.0
> +  e.0.0.0.0.0
> +  e.0.2.1.0.0.0
> +  0.3000.0.1
> +  1.3700.0.0
> +  2.2300.-1/-2.0
> +
>   I this case the last step has a data dependency on both first and second steps.
>   
>   Batch durations can also be specified as infinite by using the '*' in the
>   duration field. Such batches must be ended by the terminate command ('T')
>   otherwise they will cause a GPU hang to be reported.
>   
> +Note: On Xe Batch dependencies are expressed with syncobjects,
> +so there is no difference between f-1 and -1
> +ex. 1.1000.-2.0 is same as 1.1000.f-2.0.
> +
>   Sync (fd) fences
>   ----------------
>   
> @@ -116,6 +168,7 @@ VCS1 and VCS2 batches will have a sync fence dependency on the RCS batch.
>   
>   Example:
>   
> +# i915
>     1.RCS.500-1000.0.0
>     f
>     2.VCS1.3000.f-1.0
> @@ -125,13 +178,27 @@ Example:
>     s.-4
>     s.-4
>   
> +# xe equivalent
> +  v.0
> +  e.0.0.0.0.0    # RCS
> +  e.0.2.0.0.0    # VCS1
> +  e.0.2.1.0.0    # VCS2
> +  0.500-1000.0.0
> +  f
> +  1.3000.f-1.0
> +  2.3000.f-2.0
> +  0.500-1000.0.1
> +  a.-4
> +  s.-4
> +  s.-4
> +
>   VCS1 and VCS2 batches have an input sync fence dependecy on the standalone fence
>   created at the second step. They are submitted ahead of time while still not
>   runnable. When the second RCS batch completes the standalone fence is signaled
>   which allows the two VCS batches to be executed. Finally we wait until the both
>   VCS batches have completed before starting the (optional) next iteration.
>   
> -Submit fences
> +Submit fences (i915 only?)
>   -------------
>   
>   Submit fences are a type of input fence which are signalled when the originating
> diff --git a/benchmarks/wsim/xe_cloud-gaming-60fps.wsim b/benchmarks/wsim/xe_cloud-gaming-60fps.wsim
> new file mode 100644
> index 000000000..9fdf15e27
> --- /dev/null
> +++ b/benchmarks/wsim/xe_cloud-gaming-60fps.wsim
> @@ -0,0 +1,25 @@
> +#w.1.10n8m
> +#w.2.3n16m
> +#1.RCS.500-1500.r1-0-4/w2-0.0
> +#1.RCS.500-1500.r1-5-9/w2-1.0
> +#1.RCS.500-1500.r2-0-1/w2-2.0
> +#M.2.VCS
> +#B.2
> +#3.RCS.500-1500.r2-2.0
> +#2.DEFAULT.2000-4000.-1.0
> +#4.VCS1.250-750.-1.1
> +#p.16667
> +#
> +#xe
> +v.0
> +e.0.0.0.0.0 # 1.RCS.500-1500.r1-0-4/w2-0.0
> +e.0.2.0.0.0 # 2.DEFAULT.2000-4000.-1.0
> +e.0.0.0.0.0 # 3.RCS.500-1500.r2-2.0
> +e.0.2.1.0.0 # 4.VCS1.250-750.-1.1
> +0.500-1500.0.0
> +0.500-1500.0.0
> +0.500-1500.0.0
> +2.500-1500.-2.0 # #3.RCS.500-1500.r2-2.0
> +1.2000-4000.-1.0
> +3.250-750.-1.1
> +p.16667
> diff --git a/benchmarks/wsim/xe_example.wsim b/benchmarks/wsim/xe_example.wsim
> new file mode 100644
> index 000000000..3fa620932
> --- /dev/null
> +++ b/benchmarks/wsim/xe_example.wsim
> @@ -0,0 +1,28 @@
> +#i915
> +#1.VCS1.3000.0.1
> +#1.RCS.500-1000.-1.0
> +#1.RCS.3700.0.0
> +#1.RCS.1000.-2.0
> +#1.VCS2.2300.-2.0
> +#1.RCS.4700.-1.0
> +#1.VCS2.600.-1.1
> +#p.16000
> +#
> +#xe
> +#
> +#VM: v.compute_mode
> +v.0
> +#EXEC_QUEUE: e.vm_idx.class.intance.compute_mode.job_timeout_ms
> +e.0.2.0.0.0 # VCS1
> +e.0.0.0.0.0 # RCS
> +e.0.2.1.0.0 # VCS2
> +e.0.0.0.0.0 # second RCS exec_queue
> +#BATCH: eq_idx.duration.dependency.wait
> +0.3000.0.1       # 1.VCS1.3000.0.1
> +1.500-1000.-1.0  # 1.RCS.500-1000.-1.0
> +3.3700.0.0       # 1.RCS.3700.0.0
> +1.1000.-2.1      # 1.RCS.1000.-2.0
> +2.2300.-2.0      # 1.VCS2.2300.-2.0
> +3.4700.-1.0      # 1.RCS.4700.-1.0
> +2.600.-1.1       # 1.VCS2.600.-1.1
> +p.16000
> diff --git a/benchmarks/wsim/xe_example01.wsim b/benchmarks/wsim/xe_example01.wsim
> new file mode 100644
> index 000000000..496905371
> --- /dev/null
> +++ b/benchmarks/wsim/xe_example01.wsim
> @@ -0,0 +1,19 @@
> +#VM: v.compute_mode
> +v.0
> +#EXEC_QUEUE: e.vm_idx.class.intance.compute_mode.job_timeout_ms
> +e.0.0.0.0.0
> +e.0.2.0.0.0
> +e.0.1.0.0.0
> +#BATCH: eq_idx.duration.dependency.wait
> +# B1 - 10ms batch on BCS0
> +2.10000.0.0
> +# B2 - 10ms batch on RCS0; waits on B1
> +0.10000.0.0
> +# B3 - 10ms batch on VECS0; waits on B2
> +1.10000.0.0
> +# B4 - 10ms batch on BCS0
> +2.10000.0.0
> +# B5 - 10ms batch on RCS0; waits on B4
> +0.10000.-1.0
> +# B6 - 10ms batch on VECS0; waits on B5; wait on batch fence out
> +1.10000.-1.1
> diff --git a/benchmarks/wsim/xe_example_fence.wsim b/benchmarks/wsim/xe_example_fence.wsim
> new file mode 100644
> index 000000000..4f810d64e
> --- /dev/null
> +++ b/benchmarks/wsim/xe_example_fence.wsim
> @@ -0,0 +1,23 @@
> +#i915
> +#1.RCS.500-1000.0.0
> +#f
> +#2.VCS1.3000.f-1.0
> +#2.VCS2.3000.f-2.0
> +#1.RCS.500-1000.0.1
> +#a.-4
> +#s.-4
> +#s.-4
> +#
> +#xe
> +v.0
> +e.0.0.0.0.0
> +e.0.2.0.0.0
> +e.0.2.1.0.0
> +0.500-1000.0.0
> +f
> +1.3000.f-1.0
> +2.3000.f-2.0
> +0.500-1000.0.1
> +a.-4
> +s.-4
> +s.-4
> diff --git a/benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim b/benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim
> new file mode 100644
> index 000000000..2214914eb
> --- /dev/null
> +++ b/benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim
> @@ -0,0 +1,63 @@
> +# https://lore.kernel.org/dri-devel/a443495f-5d1b-52e1-9b2f-80167deb6d57@linux.intel.com/
> +#i915
> +#M.3.VCS
> +#B.3
> +#1.VCS1.1200-1800.0.0
> +#1.VCS1.1900-2100.0.0
> +#2.RCS.1500-2000.-1.0
> +#3.VCS.1400-1800.-1.1
> +#1.VCS1.1900-2100.-1.0
> +#2.RCS.1500-2000.-1.0
> +#3.VCS.1400-1800.-1.1
> +#1.VCS1.1900-2100.-1.0
> +#2.RCS.200-400.-1.0
> +#2.RCS.1500-2000.0.0
> +#3.VCS.1400-1800.-1.1
> +#1.VCS1.1900-2100.-1.0
> +#2.RCS.1500-2000.-1.0
> +#3.VCS.1400-1800.-1.1
> +#1.VCS1.1900-2100.-1.0
> +#2.RCS.200-400.-1.0
> +#2.RCS.1500-2000.0.0
> +#3.VCS.1400-1800.-1.1
> +#1.VCS1.1900-2100.-1.0
> +#2.RCS.1500-2000.-1.0
> +#3.VCS.1400-1800.-1.1
> +#1.VCS1.1900-2100.-1.0
> +#2.RCS.1500-2000.-1.0
> +#2.RCS.1500-2000.0.0
> +#3.VCS.1400-1800.-1.1
> +#
> +#xe
> +#
> +#M.3.VCS ??
> +#B.3     ??
> +v.0
> +e.0.2.0.0.0 # 1.VCS1
> +e.0.0.0.0.0 # 2.RCS
> +e.0.2.1.0.0 # 3.VCS - no load balancing yet always VCS2
> +0.1200-1800.0.0
> +0.1900-2100.0.0
> +1.1500-2000.-1.0
> +2.1400-1800.-1.1
> +0.1900-2100.-1.0
> +1.1500-2000.-1.0
> +2.1400-1800.-1.1
> +0.1900-2100.-1.0
> +1.200-400.-1.0
> +1.1500-2000.0.0
> +2.1400-1800.-1.1
> +0.1900-2100.-1.0
> +1.1500-2000.-1.0
> +2.1400-1800.-1.1
> +0.1900-2100.-1.0
> +1.200-400.-1.0
> +1.1500-2000.0.0
> +2.1400-1800.-1.1
> +0.1900-2100.-1.0
> +1.1500-2000.-1.0
> +2.1400-1800.-1.1
> +0.1900-2100.-1.0
> +1.1500-2000.-1.0
> +1.1500-2000.0.0
> +2.1400-1800.-1.1

* Re: [igt-dev] [PATCH i-g-t 6/8] benchmarks/gem_wsim: allow comments in workload description files
  2023-09-21 15:22       ` Tvrtko Ursulin
@ 2023-09-21 16:20         ` Bernatowicz, Marcin
  2023-09-25  9:03           ` Tvrtko Ursulin
  0 siblings, 1 reply; 22+ messages in thread
From: Bernatowicz, Marcin @ 2023-09-21 16:20 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: chris.p.wilson



On 9/21/2023 5:22 PM, Tvrtko Ursulin wrote:
> 
> On 21/09/2023 16:05, Bernatowicz, Marcin wrote:
>> Hi,
>>
>> On 9/20/2023 6:13 PM, Tvrtko Ursulin wrote:
>>>
>>> On 06/09/2023 16:51, Marcin Bernatowicz wrote:
>>>> Lines starting with '#' are skipped.
>>>> If command line step separator (',') is encountered after '#'
>>>> it is replaced with ';' to not break parsing.
>>>>
>>>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>>>> ---
>>>>   benchmarks/gem_wsim.c  | 41 ++++++++++++++++++++++++++++++++---------
>>>>   benchmarks/wsim/README |  2 ++
>>>>   2 files changed, 34 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>>>> index 0c1b58727..ec9fdc2d0 100644
>>>> --- a/benchmarks/gem_wsim.c
>>>> +++ b/benchmarks/gem_wsim.c
>>>> @@ -43,6 +43,7 @@
>>>>   #include <limits.h>
>>>>   #include <pthread.h>
>>>>   #include <math.h>
>>>> +#include <ctype.h>
>>>>   #include "drm.h"
>>>>   #include "drmtest.h"
>>>> @@ -94,6 +95,7 @@ enum w_type {
>>>>       TERMINATE,
>>>>       SSEU,
>>>>       WORKINGSET,
>>>> +    SKIP,
>>>>   };
>>>>   struct dep_entry {
>>>> @@ -930,6 +932,12 @@ parse_workload(struct w_arg *arg, unsigned int 
>>>> flags, double scale_dur,
>>>>           if (field) {
>>>>               fstart = NULL;
>>>> +            /* line starting with # is a comment */
>>>> +            if (field[0] == '#') {
>>>> +                step.type = SKIP;
>>>> +                goto add_step;
>>>> +            }
>>>
>>> Do they need to be recorded as steps and couldn't be simply silently 
>>> skipped over while parsing?
>>
>> Looks indeed a bool skip may be enough. It's a dummy step not stored 
>> in workload steps.
> 
> Cool, that would be the best.
> 
>>>
>>> How does relative step referencing works when comments are present? 
>>> (Batch implicit dependencies and 'a' and 's' commands.)
>>
>> Comments may be added to existing workloads without a problem as the 
>> SKIPs are not incrementing nr_step (does not create a record).
> 
> Oookay, guess I was confused with the goto add_step. :)
> 
>> Maybe a small confusion is that step.idx does not correspond to line 
>> number, so in case of failed parse we get a step number without a line :/
> 
> Hm? If parse fails the whole program fails so I didn't get this.

Sorry, I created some confusion. A failure message does not point to the line
number but to the step number (0-based) - that didn't change.
When looking at a workload definition (a broken one, for example) in a text
editor, having the line number could make things a bit easier:

1  # Workload simulating transcoding session
2  # ex. gem_wsim -w benchmarks/wsim/media_load_balance_fhd26u7.wsim -c 36 -r 600
3  # will run 36 parallel transcoding session streams for 600 frames each
4  M.3.VCS
5  B.3
6  1.VCS1.1200-1800.0.0
7  1.VCS1.1900-2100.0.0
8  2.RCS.1500-2000.-1.0
9  3.VCS.1400-1800.-1.1
10 1.VCS1.1900-2100.-1.0
11 2.RCS.1500-2000.-1.0
12 3.VCS.1400-1800.-1.1
13 1.VCS1.1900-2100.-1.0
14 2.RCS.200-400.-1.0
15 2.RCS.1500-2000.0.0
16 3.VCS.1400-1800.-1.1
17 1.VCS1.1900-2100.-1.0
18 2.RCS.1500-2000.-1.0
19 3.VCS.1400-1800.-1.1
20 1.VCS1.1900-2100.-1.0
21 2.RCS.200-400.-1.0
22 2.RCS.1500-2000.0.0
23 3.VCS.1400-k1800.-1.1
24 1.VCS1.1900-2100.-1.0
25 2.RCS.1500-2000.-1.0
26 3.VCS.1400-1800.-1.1
27 1.VCS1.1900-2100.-1.0
28 2.RCS.1500-2000.-1.0
29 2.RCS.1500-2000.0.0
30 3.VCS.1400-1800.-1.1

The current parse failure message will say:

Invalid duration at step 19!
Failed to parse workload 0!

Previously that step was on line 20 (step number + 1); now it is on line 23
(plus the number of comment lines).
But there are more important matters :)
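
A minimal sketch of how a parse error could be mapped back to a file line. It is not part of the patch: `step_to_line` is a hypothetical helper, and it assumes one step per line in the raw file and that lines starting with '#' (as well as empty lines) create no step.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the 1-based line number holding the given 0-based step,
 * or 0 if the text has fewer steps than that.
 */
static unsigned int step_to_line(const char *text, unsigned int step)
{
	unsigned int line = 1, nr_steps = 0;

	while (*text) {
		size_t len = strcspn(text, "\n");

		/* Comment and empty lines do not advance the step count. */
		if (len && text[0] != '#') {
			if (nr_steps == step)
				return line;
			nr_steps++;
		}

		text += len;
		if (*text == '\n') {
			text++;
			line++;
		}
	}

	return 0;
}
```

For the listing above, which starts with three comment lines, step 19 maps to line 23.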

Regards,
Marcin

> 
> Regards,
> 
> Tvrtko
> 
>>
>> Regards,
>> Marcin
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> +
>>>>               if (!strcmp(field, "d")) {
>>>>                   int_field(DELAY, delay, tmp <= 0,
>>>>                         "Invalid delay at step %u!\n");
>>>> @@ -1194,7 +1202,8 @@ parse_workload(struct w_arg *arg, unsigned int 
>>>> flags, double scale_dur,
>>>>           if (field) {
>>>>               fstart = NULL;
>>>> -            check_arg(strlen(field) != 1 ||
>>>> +            check_arg(!strlen(field) ||
>>>> +                  (strlen(field) > 1 && !isspace(field[1]) && 
>>>> field[1] != '#') ||
>>>>                     (field[0] != '0' && field[0] != '1'),
>>>>                     "Invalid wait boolean at step %u!\n",
>>>>                     nr_steps);
>>>> @@ -1208,18 +1217,23 @@ parse_workload(struct w_arg *arg, unsigned 
>>>> int flags, double scale_dur,
>>>>           step.type = BATCH;
>>>>   add_step:
>>>> -        if (step.type == DELAY)
>>>> -            step.delay = __duration(step.delay, scale_time);
>>>> +        if (step.type == SKIP) {
>>>> +            if (verbose > 3)
>>>> +                printf("skipped STEP: %s\n", _token);
>>>> +        } else {
>>>> +            if (step.type == DELAY)
>>>> +                step.delay = __duration(step.delay, scale_time);
>>>> -        step.idx = nr_steps++;
>>>> -        step.request = -1;
>>>> -        steps = realloc(steps, sizeof(step) * nr_steps);
>>>> -        igt_assert(steps);
>>>> +            step.idx = nr_steps++;
>>>> +            step.request = -1;
>>>> +            steps = realloc(steps, sizeof(step) * nr_steps);
>>>> +            igt_assert(steps);
>>>> -        memcpy(&steps[nr_steps - 1], &step, sizeof(step));
>>>> +            memcpy(&steps[nr_steps - 1], &step, sizeof(step));
>>>> +        }
>>>>           free(token);
>>>> -    }
>>>> +    } // while ((_token = strtok_r(tstart, ",", &tctx))) {
>>>>       if (app_w) {
>>>>           steps = realloc(steps, sizeof(step) *
>>>> @@ -2304,6 +2318,8 @@ static void *run_workload(void *data)
>>>>               enum intel_engine_id engine = w->engine;
>>>>               int do_sleep = 0;
>>>> +            igt_assert(w->type != SKIP);
>>>> +
>>>>               if (w->type == DELAY) {
>>>>                   do_sleep = w->delay;
>>>>               } else if (w->type == PERIOD) {
>>>> @@ -2543,6 +2559,13 @@ static char *load_workload_descriptor(char 
>>>> *filename)
>>>>       close(infd);
>>>>       for (i = 0; i < len; i++) {
>>>> +        /* '#' starts comment till end of line */
>>>> +        if (buf[i] == '#')
>>>> +            /* replace ',' in comments to not break parsing */
>>>> +            while (++i < len && buf[i] != '\n')
>>>> +                if (buf[i] == ',')
>>>> +                    buf[i] = ';';
>>>> +
>>>>           if (buf[i] == '\n')
>>>>               buf[i] = ',';
>>>>       }
>>>> diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
>>>> index 8c71f2fe6..e4fd61645 100644
>>>> --- a/benchmarks/wsim/README
>>>> +++ b/benchmarks/wsim/README
>>>> @@ -1,6 +1,8 @@
>>>>   Workload descriptor format
>>>>   ==========================
>>>> +Lines starting with '#' are treated as comments (do not create work 
>>>> step).
>>>> +
>>>>   ctx.engine.duration_us.dependency.wait,...
>>>>   <uint>.<str>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
>>>>   B.<uint>

* Re: [igt-dev] [PATCH i-g-t 8/8] [RFC] benchmarks/gem_wsim: added basic xe support
  2023-09-21 15:57   ` Tvrtko Ursulin
@ 2023-09-21 19:39     ` Bernatowicz, Marcin
  2023-09-25  9:16       ` Tvrtko Ursulin
  0 siblings, 1 reply; 22+ messages in thread
From: Bernatowicz, Marcin @ 2023-09-21 19:39 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: chris.p.wilson

Hi,

On 9/21/2023 5:57 PM, Tvrtko Ursulin wrote:
> 
> On 06/09/2023 16:51, Marcin Bernatowicz wrote:
>> Added basic xe support with few examples.
>> Single binary handles both i915 and Xe devices,
>> but workload definitions differ between i915 and xe.
>> Xe does not use context abstraction, introduces new VM and Exec Queue
>> steps and BATCH step references exec queue.
>> For more details see wsim/README.
>> Some functionality is still missing: working sets,
>> load balancing (need some input if/how to do it in Xe - exec queues
>> width?).
>>
>> The tool is handy for scheduling tests, we find it useful to verify vGPU
>> profiles defining different execution quantum/preemption timeout
>> settings.
>>
>> There is also some rationale for the tool in following thread:
>> https://lore.kernel.org/dri-devel/a443495f-5d1b-52e1-9b2f-80167deb6d57@linux.intel.com/
>>
>> With this patch it should be possible to run following on xe device:
>>
>> gem_wsim -w benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim -c 36 
>> -r 600
> 
> For historical reference there used to be a tool called media-bench.pl 
> in IGT which was used to answer a question of "how many streams of this 
> can this load balancer do". In simplified terms it worked by increasing 
> the -c above until engine busyness would stop growing, which meant 
> saturation. With that we were able to compare load balancing strategies 
> and some other things. Like how many streams until starting to drop frames.
> 
> These days, if resurrected, or resurrected in principle, it could answer 
> the question of which driver can fit more streams of workload X, or does 
> the new GuC fw regress something.

interesting
> 
>> Best with drm debug logs disabled:
>>
>> echo 0 > /sys/module/drm/parameters/debug
>>
>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>> ---
>>   benchmarks/gem_wsim.c                         | 534 ++++++++++++++++--
>>   benchmarks/wsim/README                        |  85 ++-
>>   benchmarks/wsim/xe_cloud-gaming-60fps.wsim    |  25 +
>>   benchmarks/wsim/xe_example.wsim               |  28 +
>>   benchmarks/wsim/xe_example01.wsim             |  19 +
>>   benchmarks/wsim/xe_example_fence.wsim         |  23 +
>>   .../wsim/xe_media_load_balance_fhd26u7.wsim   |  63 +++
>>   7 files changed, 722 insertions(+), 55 deletions(-)
>>   create mode 100644 benchmarks/wsim/xe_cloud-gaming-60fps.wsim
>>   create mode 100644 benchmarks/wsim/xe_example.wsim
>>   create mode 100644 benchmarks/wsim/xe_example01.wsim
>>   create mode 100644 benchmarks/wsim/xe_example_fence.wsim
>>   create mode 100644 benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim
> 
> 8<
>> diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
>> index e4fd61645..ddfefff47 100644
>> --- a/benchmarks/wsim/README
>> +++ b/benchmarks/wsim/README
>> @@ -3,6 +3,7 @@ Workload descriptor format
>>   Lines starting with '#' are treated as comments (do not create work 
>> step).
>> +# i915
>>   ctx.engine.duration_us.dependency.wait,...
>>   <uint>.<str>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
>>   B.<uint>
>> @@ -13,6 +14,23 @@ b.<uint>.<str>[|<str>].<str>
>>   w|W.<uint>.<str>[/<str>]...
>>   f
>> +# xe
>> +Xe does not use context abstraction and adds additional work step types
>> +for VM (v.) and exec queue (e.) creation.
>> +Each v. and e. step creates array entry (in workload's VM and Exec 
>> Queue arrays).
>> +Batch step references the exec queue on which it is to be executed.
>> +Exec queue reference (eq_idx) is the index (0-based) in workload's 
>> exec queue array.
>> +VM reference (vm_idx) is the index (0-based) in workload's VM array.
>> +
>> +v.compute_mode
>> +v.<0|1>
>> +e.vm_idx.class.instance.compute_mode.job_timeout_ms,...
>> +e.<uint>.<uint 0=RCS,1=BCS,2=VCS,3=VECS,4=CCS>.<int>.<0|1>.<uint>,...
>> +eq_idx.duration_us.dependency.wait,...
>> +<uint>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
>> +d|p|s|t|q|a|T.<int>,...
>> +f
>> +
>>   For duration a range can be given from which a random value will be 
>> picked
>>   before every submit. Since this and seqno management requires CPU 
>> access to
>>   objects, care needs to be taken in order to ensure the submit queue 
>> is deep
>> @@ -29,21 +47,22 @@ Additional workload steps are also supported:
>>    'q' - Throttle to n max queue depth.
>>    'f' - Create a sync fence.
>>    'a' - Advance the previously created sync fence.
>> - 'B' - Turn on context load balancing.
>> - 'b' - Set up engine bonds.
>> - 'M' - Set up engine map.
>> - 'P' - Context priority.
>> - 'S' - Context SSEU configuration.
>> + 'B' - Turn on context load balancing. (i915 only)
>> + 'b' - Set up engine bonds. (i915 only)
>> + 'M' - Set up engine map. (i915 only)
>> + 'P' - Context priority. (i915 only)
>> + 'S' - Context SSEU configuration. (i915 only)
>>    'T' - Terminate an infinite batch.
>> - 'w' - Working set. (See Working sets section.)
>> - 'W' - Shared working set.
>> - 'X' - Context preemption control.
>> + 'w' - Working set. (See Working sets section.) (i915 only)
>> + 'W' - Shared working set. (i915 only)
>> + 'X' - Context preemption control. (i915 only)
>>   Engine ids: DEFAULT, RCS, BCS, VCS, VCS1, VCS2, VECS
>>   Example (leading spaces must not be present in the actual file):
>>   ----------------------------------------------------------------
>> +# i915
>>     1.VCS1.3000.0.1
>>     1.RCS.500-1000.-1.0
>>     1.RCS.3700.0.0
>> @@ -53,6 +72,25 @@ Example (leading spaces must not be present in the 
>> actual file):
>>     1.VCS2.600.-1.1
>>     p.16000
>> +# xe equivalent
>> +  #VM: v.compute_mode
>> +  v.0
>> +  #EXEC_QUEUE: e.vm_idx.class.instance.compute_mode.job_timeout_ms
>> +  e.0.2.0.0.0 # VCS1
> 
> A minor digression - I would suggest using more symbolic names and less 
> numbers. For instance encode class instance in names.

yes, it is just a first fast prototype ;/
Currently I have something like
e.vm_idx.class.instance[.jb=[uint].ts=[uint].pt=[uint].pr=[uint]]
where the job timeout, timeslice_us, preempt_timeout_us and priority 
properties are optional, so now it is: e.0.2.0 # VCS1
Introducing symbolic names is the next step; first I wanted some feedback.
> 
>> +  e.0.0.0.0.0 # RCS
>> +  e.0.2.1.0.0 # VCS2
>> +  e.0.0.0.0.0 # second RCS exec queue
>> +  #BATCH: eq_idx.duration.dependency.wait
>> +  0.3000.0.1       # 1.VCS1.3000.0.1
>> +  1.500-1000.-1.0  # 1.RCS.500-1000.-1.0
>> +  3.3700.0.0       # 1.RCS.3700.0.0
>> +  1.1000.-2.1      # 1.RCS.1000.-2.0
>> +  2.2300.-2.0      # 1.VCS2.2300.-2.0
>> +  3.4700.-1.0      # 1.RCS.4700.-1.0
>> +  2.600.-1.1       # 1.VCS2.600.-1.1
>> +  p.16000
> 
> My initial feeling, and also after some thinking, is that it would be 
> good to look for solutions for minimising divergence. That means try to 
> avoid having completely different syntax and zero chance of workloads 
> which can be run with either driver.

My first thought was to introduce new syntax to have xe uAPI granularity 
(if it exposes vm, exec_queue - make it accessible), but it was just 
first shot.

> 
> For instance the concept of a queue is relatively similar and in 
> practice with xe ends up a little bit more limited. Which I think is 
> solvable.
> 
> For instance I think this can be made to work with xe.
> 
> M.1.VCS1|VCS2
> # or M.1.VCS - class names without numbers can be kept considered VCS*
> B.1
> 1.VCS.500-2000.0.0
> 
> As for i915 this creates a load balancing context with engine map 
> populated, I think with xe you have the same concept when creating a 
> queue - allowed engine mask - right?

Yes, I think we have num_placements to allow exec queue represent a set 
of engines of given class.

> 
> B.1 step you can skip with xe if it is not needed, I mean if multiple 
> allowed engines imply load balancing there.

That is to be checked. I have a patch (not posted yet) which allows 
creating an exec queue with all engines of a given class (using num_placements):

     benchmarks/gem_wsim: use num_placements for exec queue creation

     Enable num_placement exec queue creation option.

     Tried following workload:

     gem_wsim -w xe_media_load_balance_fhd26u7.wsim -c 36 -r 25 -v

     with three versions of exec queue definitions
     (listed from worst to best in terms of workloads/s):

     e.0.2.0 # 1.VCS1
     e.0.0.0 # 2.RCS       -> ~83% of last one
     e.0.2.-1 # any VCS

     e.0.2.0 # 1.VCS1
     e.0.0.0 # 2.RCS       -> ~85% of last one
     e.0.2.1 # always VCS2

     e.0.2.-1 # any VCS
     e.0.0.0  # RCS        -> 100%
     e.0.2.-1 # any VCS

So it looks like the best results (load balancing?) happen when all exec 
queues of a given class are configured to use all engines of that class.
With two exec queues, one on VCS1 and the second on all VCS, it was even 
a bit worse than when the first was on VCS1 and the second on VCS2.
The full xe_media_load_balance_fhd26u7.wsim workload is at the end of this 
message (i915 and my xe version); more or less we have exec_queue 
<-> context equivalence there.
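
A sketch of the symbolic naming discussed above, assuming the class
numbering from the README (0=RCS, 1=BCS, 2=VCS, 3=VECS, 4=CCS), 1-based
name suffixes as in i915 workloads (VCS1 is instance 0) and -1 meaning
"any engine of the class", as in the e.0.2.-1 experiment. The struct and
function names are hypothetical and not part of gem_wsim; DEFAULT is
deliberately left unhandled:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type, for illustration only. */
struct xe_engine_ref {
	int engine_class; /* 0=RCS, 1=BCS, 2=VCS, 3=VECS, 4=CCS */
	int instance;     /* 0-based instance, or -1 for any engine of the class */
};

/*
 * Translate names such as "RCS", "VCS", "VCS1" or "VCS2".
 * A bare class name maps to instance -1; a numeric suffix is 1-based.
 * Returns 0 on success, -1 if the name is not recognised.
 */
static int parse_engine_name(const char *name, struct xe_engine_ref *out)
{
	static const struct {
		const char *prefix;
		int engine_class;
	} classes[] = {
		{ "RCS", 0 }, { "BCS", 1 }, { "VCS", 2 }, { "VECS", 3 }, { "CCS", 4 },
	};
	size_t i;

	for (i = 0; i < sizeof(classes) / sizeof(classes[0]); i++) {
		size_t len = strlen(classes[i].prefix);
		const char *suffix;
		char *end;
		long nr;

		if (strncmp(name, classes[i].prefix, len))
			continue;

		suffix = name + len;
		out->engine_class = classes[i].engine_class;
		if (*suffix == '\0') {
			/* Bare class name: any engine of that class. */
			out->instance = -1;
			return 0;
		}

		/* Only plain decimal suffixes are accepted. */
		if (!isdigit((unsigned char)*suffix))
			return -1;
		nr = strtol(suffix, &end, 10);
		if (*end != '\0' || nr < 1 || nr > 64)
			return -1;

		out->instance = (int)nr - 1;
		return 0;
	}

	return -1;
}
```

With this, "VCS1" would translate to the pair used by e.0.2.0 and a bare
"VCS" to the pair used by e.0.2.-1. The upper bound of 64 instances is an
arbitrary sanity limit chosen for the sketch.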

> And then for the actual submission you know it is queue 1, and VCS you 
> can use to sanitize. If it doesn't match the queue configuration error out, 
> otherwise just submit to the queue.
> 
> VM management can be explicit steps, and AFAIR gem_wsim already shares 
> the VM implicitly, so for xe you just need to add some commands to make 
> it explicit:
> 
> V.1             # create VM 1
> M.1.VCS         # create ctx/queue 1 with all VCS engines
> v.1.1            # assign VM 1 to ctx/queue 1
> B.1            # turn on load balancing for ctx 1
> 1.VCS.1000.0.0        # submit to ctx/queue 1
> 
> I think this could work with both i915 and xe as is.

will think on this.

> 
> Things like compute mode you add as extensions which i915 could then 
> ignore.
> 
> V.1
> c.1    # turn on compute mode on vm 1
> M.1.VCS    # do you *need* to repeat the compute mode if vm carries the 
> info?
I think not, recent changes to uAPI 
(https://patchwork.freedesktop.org/series/123916/) remove the need for 
COMPUTE_MODE property on exec queue.
> v.1.1
> B.1
> 1.VCS.1000.0.0
> 
> Still would work with both i915 and xe if I am not missing something.
> 
> I mean maybe even you don't need explicit VM management in the first go 
> and can just do what the code currently does which is shares the same VM 
> for all contexts?
> 
> That much for now, let the brainstorming commence! :)
> 
> Regards,
> 
> Tvrtko
> 
> P.S.
> 
> Engine bonds could be used to validate and set up parallel submission 
> queue. For instance:
> 
> b.1.VCS2.VCS1
> 
> Is probably a no-op on xe with parallel queues. Or you use it to 
> configure the engine map order, if that is important.
>
> Problem will be converting multiple submission into one. It is probably 
> doable but not warranted to include. It is okay to error out for now on 
> workloads which use the feature.

I don't get the parallel submission queue concept - is it submission on 
multiple engines of the same class at the same time (I think in xe it's a 
width parameter of the exec ioctl?)

Thanks a lot for valuable feedback
--
marcin
> 
>> +
>> +
>>   The above workload described in human language works like this:
>>     1.   A batch is sent to the VCS1 engine which will be executing 
>> for 3ms on the
>> @@ -78,16 +116,30 @@ Multiple dependencies can be given separated by 
>> forward slashes.
>>   Example:
>> +# i915
>>     1.VCS1.3000.0.1
>>     1.RCS.3700.0.0
>>     1.VCS2.2300.-1/-2.0
>> +# xe
>> +  v.0
>> +  e.0.2.0.0.0
>> +  e.0.0.0.0.0
>> +  e.0.2.1.0.0
>> +  0.3000.0.1
>> +  1.3700.0.0
>> +  2.2300.-1/-2.0
>> +
>>   I this case the last step has a data dependency on both first and 
>> second steps.
>>   Batch durations can also be specified as infinite by using the '*' 
>> in the
>>   duration field. Such batches must be ended by the terminate command 
>> ('T')
>>   otherwise they will cause a GPU hang to be reported.
>> +Note: On Xe Batch dependencies are expressed with syncobjects,
>> +so there is no difference between f-1 and -1
>> +ex. 1.1000.-2.0 is same as 1.1000.f-2.0.
>> +
>>   Sync (fd) fences
>>   ----------------
>> @@ -116,6 +168,7 @@ VCS1 and VCS2 batches will have a sync fence 
>> dependency on the RCS batch.
>>   Example:
>> +# i915
>>     1.RCS.500-1000.0.0
>>     f
>>     2.VCS1.3000.f-1.0
>> @@ -125,13 +178,27 @@ Example:
>>     s.-4
>>     s.-4
>> +# xe equivalent
>> +  v.0
>> +  e.0.0.0.0.0    # RCS
>> +  e.0.2.0.0.0    # VCS1
>> +  e.0.2.1.0.0    # VCS2
>> +  0.500-1000.0.0
>> +  f
>> +  1.3000.f-1.0
>> +  2.3000.f-2.0
>> +  0.500-1000.0.1
>> +  a.-4
>> +  s.-4
>> +  s.-4
>> +
>>   VCS1 and VCS2 batches have an input sync fence dependecy on the 
>> standalone fence
>>   created at the second step. They are submitted ahead of time while 
>> still not
>>   runnable. When the second RCS batch completes the standalone fence 
>> is signaled
>>   which allows the two VCS batches to be executed. Finally we wait 
>> until the both
>>   VCS batches have completed before starting the (optional) next 
>> iteration.
>> -Submit fences
>> +Submit fences (i915 only?)
>>   -------------
>>   Submit fences are a type of input fence which are signalled when the 
>> originating
>> diff --git a/benchmarks/wsim/xe_cloud-gaming-60fps.wsim 
>> b/benchmarks/wsim/xe_cloud-gaming-60fps.wsim
>> new file mode 100644
>> index 000000000..9fdf15e27
>> --- /dev/null
>> +++ b/benchmarks/wsim/xe_cloud-gaming-60fps.wsim
>> @@ -0,0 +1,25 @@
>> +#w.1.10n8m
>> +#w.2.3n16m
>> +#1.RCS.500-1500.r1-0-4/w2-0.0
>> +#1.RCS.500-1500.r1-5-9/w2-1.0
>> +#1.RCS.500-1500.r2-0-1/w2-2.0
>> +#M.2.VCS
>> +#B.2
>> +#3.RCS.500-1500.r2-2.0
>> +#2.DEFAULT.2000-4000.-1.0
>> +#4.VCS1.250-750.-1.1
>> +#p.16667
>> +#
>> +#xe
>> +v.0
>> +e.0.0.0.0.0 # 1.RCS.500-1500.r1-0-4/w2-0.0
>> +e.0.2.0.0.0 # 2.DEFAULT.2000-4000.-1.0
>> +e.0.0.0.0.0 # 3.RCS.500-1500.r2-2.0
>> +e.0.2.1.0.0 # 4.VCS1.250-750.-1.1
>> +0.500-1500.0.0
>> +0.500-1500.0.0
>> +0.500-1500.0.0
>> +2.500-1500.-2.0 # #3.RCS.500-1500.r2-2.0
>> +1.2000-4000.-1.0
>> +3.250-750.-1.1
>> +p.16667
>> diff --git a/benchmarks/wsim/xe_example.wsim 
>> b/benchmarks/wsim/xe_example.wsim
>> new file mode 100644
>> index 000000000..3fa620932
>> --- /dev/null
>> +++ b/benchmarks/wsim/xe_example.wsim
>> @@ -0,0 +1,28 @@
>> +#i915
>> +#1.VCS1.3000.0.1
>> +#1.RCS.500-1000.-1.0
>> +#1.RCS.3700.0.0
>> +#1.RCS.1000.-2.0
>> +#1.VCS2.2300.-2.0
>> +#1.RCS.4700.-1.0
>> +#1.VCS2.600.-1.1
>> +#p.16000
>> +#
>> +#xe
>> +#
>> +#VM: v.compute_mode
>> +v.0
>> +#EXEC_QUEUE: e.vm_idx.class.instance.compute_mode.job_timeout_ms
>> +e.0.2.0.0.0 # VCS1
>> +e.0.0.0.0.0 # RCS
>> +e.0.2.1.0.0 # VCS2
>> +e.0.0.0.0.0 # second RCS exec_queue
>> +#BATCH: eq_idx.duration.dependency.wait
>> +0.3000.0.1       # 1.VCS1.3000.0.1
>> +1.500-1000.-1.0  # 1.RCS.500-1000.-1.0
>> +3.3700.0.0       # 1.RCS.3700.0.0
>> +1.1000.-2.1      # 1.RCS.1000.-2.0
>> +2.2300.-2.0      # 1.VCS2.2300.-2.0
>> +3.4700.-1.0      # 1.RCS.4700.-1.0
>> +2.600.-1.1       # 1.VCS2.600.-1.1
>> +p.16000
>> diff --git a/benchmarks/wsim/xe_example01.wsim 
>> b/benchmarks/wsim/xe_example01.wsim
>> new file mode 100644
>> index 000000000..496905371
>> --- /dev/null
>> +++ b/benchmarks/wsim/xe_example01.wsim
>> @@ -0,0 +1,19 @@
>> +#VM: v.compute_mode
>> +v.0
>> +#EXEC_QUEUE: e.vm_idx.class.instance.compute_mode.job_timeout_ms
>> +e.0.0.0.0.0
>> +e.0.2.0.0.0
>> +e.0.1.0.0.0
>> +#BATCH: eq_idx.duration.dependency.wait
>> +# B1 - 10ms batch on BCS0
>> +2.10000.0.0
>> +# B2 - 10ms batch on RCS0; waits on B1
>> +0.10000.0.0
>> +# B3 - 10ms batch on VCS0; waits on B2
>> +1.10000.0.0
>> +# B4 - 10ms batch on BCS0
>> +2.10000.0.0
>> +# B5 - 10ms batch on RCS0; waits on B4
>> +0.10000.-1.0
>> +# B6 - 10ms batch on VCS0; waits on B5; wait on batch fence out
>> +1.10000.-1.1
>> diff --git a/benchmarks/wsim/xe_example_fence.wsim 
>> b/benchmarks/wsim/xe_example_fence.wsim
>> new file mode 100644
>> index 000000000..4f810d64e
>> --- /dev/null
>> +++ b/benchmarks/wsim/xe_example_fence.wsim
>> @@ -0,0 +1,23 @@
>> +#i915
>> +#1.RCS.500-1000.0.0
>> +#f
>> +#2.VCS1.3000.f-1.0
>> +#2.VCS2.3000.f-2.0
>> +#1.RCS.500-1000.0.1
>> +#a.-4
>> +#s.-4
>> +#s.-4
>> +#
>> +#xe
>> +v.0
>> +e.0.0.0.0.0
>> +e.0.2.0.0.0
>> +e.0.2.1.0.0
>> +0.500-1000.0.0
>> +f
>> +1.3000.f-1.0
>> +2.3000.f-2.0
>> +0.500-1000.0.1
>> +a.-4
>> +s.-4
>> +s.-4
>> diff --git a/benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim 
>> b/benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim
>> new file mode 100644
>> index 000000000..2214914eb
>> --- /dev/null
>> +++ b/benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim
>> @@ -0,0 +1,63 @@
>> +# 
>> https://lore.kernel.org/dri-devel/a443495f-5d1b-52e1-9b2f-80167deb6d57@linux.intel.com/
>> +#i915
>> +#M.3.VCS
>> +#B.3
>> +#1.VCS1.1200-1800.0.0
>> +#1.VCS1.1900-2100.0.0
>> +#2.RCS.1500-2000.-1.0
>> +#3.VCS.1400-1800.-1.1
>> +#1.VCS1.1900-2100.-1.0
>> +#2.RCS.1500-2000.-1.0
>> +#3.VCS.1400-1800.-1.1
>> +#1.VCS1.1900-2100.-1.0
>> +#2.RCS.200-400.-1.0
>> +#2.RCS.1500-2000.0.0
>> +#3.VCS.1400-1800.-1.1
>> +#1.VCS1.1900-2100.-1.0
>> +#2.RCS.1500-2000.-1.0
>> +#3.VCS.1400-1800.-1.1
>> +#1.VCS1.1900-2100.-1.0
>> +#2.RCS.200-400.-1.0
>> +#2.RCS.1500-2000.0.0
>> +#3.VCS.1400-1800.-1.1
>> +#1.VCS1.1900-2100.-1.0
>> +#2.RCS.1500-2000.-1.0
>> +#3.VCS.1400-1800.-1.1
>> +#1.VCS1.1900-2100.-1.0
>> +#2.RCS.1500-2000.-1.0
>> +#2.RCS.1500-2000.0.0
>> +#3.VCS.1400-1800.-1.1
>> +#
>> +#xe
>> +#
>> +#M.3.VCS ??
>> +#B.3     ??
>> +v.0
>> +e.0.2.0.0.0 # 1.VCS1
>> +e.0.0.0.0.0 # 2.RCS
>> +e.0.2.1.0.0 # 3.VCS - no load balancing yet always VCS2
>> +0.1200-1800.0.0
>> +0.1900-2100.0.0
>> +1.1500-2000.-1.0
>> +2.1400-1800.-1.1
>> +0.1900-2100.-1.0
>> +1.1500-2000.-1.0
>> +2.1400-1800.-1.1
>> +0.1900-2100.-1.0
>> +1.200-400.-1.0
>> +1.1500-2000.0.0
>> +2.1400-1800.-1.1
>> +0.1900-2100.-1.0
>> +1.1500-2000.-1.0
>> +2.1400-1800.-1.1
>> +0.1900-2100.-1.0
>> +1.200-400.-1.0
>> +1.1500-2000.0.0
>> +2.1400-1800.-1.1
>> +0.1900-2100.-1.0
>> +1.1500-2000.-1.0
>> +2.1400-1800.-1.1
>> +0.1900-2100.-1.0
>> +1.1500-2000.-1.0
>> +1.1500-2000.0.0
>> +2.1400-1800.-1.1


* Re: [igt-dev] [PATCH i-g-t 6/8] benchmarks/gem_wsim: allow comments in workload description files
  2023-09-21 16:20         ` Bernatowicz, Marcin
@ 2023-09-25  9:03           ` Tvrtko Ursulin
  0 siblings, 0 replies; 22+ messages in thread
From: Tvrtko Ursulin @ 2023-09-25  9:03 UTC (permalink / raw)
  To: Bernatowicz, Marcin, igt-dev; +Cc: chris.p.wilson


On 21/09/2023 17:20, Bernatowicz, Marcin wrote:
> 
> 
> On 9/21/2023 5:22 PM, Tvrtko Ursulin wrote:
>>
>> On 21/09/2023 16:05, Bernatowicz, Marcin wrote:
>>> Hi,
>>>
>>> On 9/20/2023 6:13 PM, Tvrtko Ursulin wrote:
>>>>
>>>> On 06/09/2023 16:51, Marcin Bernatowicz wrote:
>>>>> Lines starting with '#' are skipped.
>>>>> If command line step separator (',') is encountered after '#'
>>>>> it is replaced with ';' to not break parsing.
>>>>>
>>>>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>>>>> ---
>>>>>   benchmarks/gem_wsim.c  | 41 
>>>>> ++++++++++++++++++++++++++++++++---------
>>>>>   benchmarks/wsim/README |  2 ++
>>>>>   2 files changed, 34 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>>>>> index 0c1b58727..ec9fdc2d0 100644
>>>>> --- a/benchmarks/gem_wsim.c
>>>>> +++ b/benchmarks/gem_wsim.c
>>>>> @@ -43,6 +43,7 @@
>>>>>   #include <limits.h>
>>>>>   #include <pthread.h>
>>>>>   #include <math.h>
>>>>> +#include <ctype.h>
>>>>>   #include "drm.h"
>>>>>   #include "drmtest.h"
>>>>> @@ -94,6 +95,7 @@ enum w_type {
>>>>>       TERMINATE,
>>>>>       SSEU,
>>>>>       WORKINGSET,
>>>>> +    SKIP,
>>>>>   };
>>>>>   struct dep_entry {
>>>>> @@ -930,6 +932,12 @@ parse_workload(struct w_arg *arg, unsigned int 
>>>>> flags, double scale_dur,
>>>>>           if (field) {
>>>>>               fstart = NULL;
>>>>> +            /* line starting with # is a comment */
>>>>> +            if (field[0] == '#') {
>>>>> +                step.type = SKIP;
>>>>> +                goto add_step;
>>>>> +            }
>>>>
>>>> Do they need to be recorded as steps and couldn't be simply silently 
>>>> skipped over while parsing?
>>>
>>> Looks indeed a bool skip may be enough. It's a dummy step not stored 
>>> in workload steps.
>>
>> Cool, that would be the best.
>>
>>>>
>>>> How does relative step referencing work when comments are present? 
>>>> (Batch implicit dependencies and 'a' and 's' commands.)
>>>
>>> Comments may be added to existing workloads without a problem as the 
>>> SKIPs do not increment nr_steps (they do not create a record).
>>
>> Oookay, guess I was confused with the goto add_step. :)
>>
>>> Maybe a small confusion is that step.idx does not correspond to line 
>>> number, so in case of failed parse we get a step number without a 
>>> line :/
>>
>> Hm? If parse fails the whole program fails so I didn't get this.
> 
> Sorry, I created some confusion. A failure message does not point to the 
> line number but to the step number (0-based) - that didn't change.
> When looking at a workload definition (a broken one, for example) in a 
> text editor, having the line number could make things a bit easier:
> 
> 1  # Workload simulating transcoding session
> 2  # ex. gem_wsim -w benchmarks/wsim/media_load_balance_fhd26u7.wsim -c 
> 36 -r 600
> 3  # will run 36 parallel transcoding session streams for 600 frames each
> 4  M.3.VCS
> 5  B.3
> 6  1.VCS1.1200-1800.0.0
> 7  1.VCS1.1900-2100.0.0
> 8  2.RCS.1500-2000.-1.0
> 9  3.VCS.1400-1800.-1.1
> 10 1.VCS1.1900-2100.-1.0
> 11 2.RCS.1500-2000.-1.0
> 12 3.VCS.1400-1800.-1.1
> 13 1.VCS1.1900-2100.-1.0
> 14 2.RCS.200-400.-1.0
> 15 2.RCS.1500-2000.0.0
> 16 3.VCS.1400-1800.-1.1
> 17 1.VCS1.1900-2100.-1.0
> 18 2.RCS.1500-2000.-1.0
> 19 3.VCS.1400-1800.-1.1
> 20 1.VCS1.1900-2100.-1.0
> 21 2.RCS.200-400.-1.0
> 22 2.RCS.1500-2000.0.0
> 23 3.VCS.1400-k1800.-1.1
> 24 1.VCS1.1900-2100.-1.0
> 25 2.RCS.1500-2000.-1.0
> 26 3.VCS.1400-1800.-1.1
> 27 1.VCS1.1900-2100.-1.0
> 28 2.RCS.1500-2000.-1.0
> 29 2.RCS.1500-2000.0.0
> 30 3.VCS.1400-1800.-1.1
> 
> Current parse failure message will tell:
> 
> Invalid duration at step 19!
> Failed to parse workload 0!
> 
> Which was on line 20 (step no + 1) and now is on 23 (+ no of comment 
> lines).
> But there are more important matters :)

Ah right.. yes it would be handy to have better parsing error reporting. 
I think it is easy to do. Just store line numbers in struct w_step. 
Heck, we could strdup the whole line while parsing and go super fancy by 
showing it verbatim on error, like:

Invalid duration at step 19, line 23!
    >>> '23 3.VCS.1400-k1800.-1.1'
Failed to parse workload 0!
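
A minimal sketch of that idea, assuming steps are counted as non-empty
lines which do not start with '#'. find_step_line() and
format_parse_error() are hypothetical helpers, not existing gem_wsim
functions, and the sketch looks the line up after the fact instead of
storing it in struct w_step:

```c
#include <stdio.h>
#include <string.h>

/*
 * Locate the source line of the 0-based step number 'step' in 'text',
 * counting only non-empty lines that do not start with '#'.
 * On success stores the 1-based line number in 'line_nr', copies the line
 * (without the newline, truncated to fit) into 'line' and returns 0.
 * Returns -1 if there is no such step.
 */
static int find_step_line(const char *text, unsigned int step,
			  unsigned int *line_nr, char *line, size_t size)
{
	unsigned int nr = 0, seen = 0;

	if (!size)
		return -1;

	while (*text) {
		size_t len = strcspn(text, "\n");

		nr++;
		if (len && text[0] != '#' && seen++ == step) {
			if (len >= size)
				len = size - 1;
			memcpy(line, text, len);
			line[len] = '\0';
			*line_nr = nr;
			return 0;
		}

		text += len;
		if (*text == '\n')
			text++;
	}

	return -1;
}

/* Format an error that names both the step and the source line. */
static int format_parse_error(char *buf, size_t size, const char *what,
			      unsigned int step, unsigned int line_nr,
			      const char *line)
{
	return snprintf(buf, size, "%s at step %u, line %u!\n    >>> '%s'\n",
			what, step, line_nr, line);
}
```

For the workload quoted above (three comment lines followed by the steps),
step 19 would resolve to line 23.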

Regards,

Tvrtko

> 
> Regards,
> Marcin
> 
>>
>> Regards,
>>
>> Tvrtko
>>
>>>
>>> Regards,
>>> Marcin
>>>>
>>>> Regards,
>>>>
>>>> Tvrtko
>>>>
>>>>> +
>>>>>               if (!strcmp(field, "d")) {
>>>>>                   int_field(DELAY, delay, tmp <= 0,
>>>>>                         "Invalid delay at step %u!\n");
>>>>> @@ -1194,7 +1202,8 @@ parse_workload(struct w_arg *arg, unsigned 
>>>>> int flags, double scale_dur,
>>>>>           if (field) {
>>>>>               fstart = NULL;
>>>>> -            check_arg(strlen(field) != 1 ||
>>>>> +            check_arg(!strlen(field) ||
>>>>> +                  (strlen(field) > 1 && !isspace(field[1]) && 
>>>>> field[1] != '#') ||
>>>>>                     (field[0] != '0' && field[0] != '1'),
>>>>>                     "Invalid wait boolean at step %u!\n",
>>>>>                     nr_steps);
>>>>> @@ -1208,18 +1217,23 @@ parse_workload(struct w_arg *arg, unsigned 
>>>>> int flags, double scale_dur,
>>>>>           step.type = BATCH;
>>>>>   add_step:
>>>>> -        if (step.type == DELAY)
>>>>> -            step.delay = __duration(step.delay, scale_time);
>>>>> +        if (step.type == SKIP) {
>>>>> +            if (verbose > 3)
>>>>> +                printf("skipped STEP: %s\n", _token);
>>>>> +        } else {
>>>>> +            if (step.type == DELAY)
>>>>> +                step.delay = __duration(step.delay, scale_time);
>>>>> -        step.idx = nr_steps++;
>>>>> -        step.request = -1;
>>>>> -        steps = realloc(steps, sizeof(step) * nr_steps);
>>>>> -        igt_assert(steps);
>>>>> +            step.idx = nr_steps++;
>>>>> +            step.request = -1;
>>>>> +            steps = realloc(steps, sizeof(step) * nr_steps);
>>>>> +            igt_assert(steps);
>>>>> -        memcpy(&steps[nr_steps - 1], &step, sizeof(step));
>>>>> +            memcpy(&steps[nr_steps - 1], &step, sizeof(step));
>>>>> +        }
>>>>>           free(token);
>>>>> -    }
>>>>> +    } // while ((_token = strtok_r(tstart, ",", &tctx))) {
>>>>>       if (app_w) {
>>>>>           steps = realloc(steps, sizeof(step) *
>>>>> @@ -2304,6 +2318,8 @@ static void *run_workload(void *data)
>>>>>               enum intel_engine_id engine = w->engine;
>>>>>               int do_sleep = 0;
>>>>> +            igt_assert(w->type != SKIP);
>>>>> +
>>>>>               if (w->type == DELAY) {
>>>>>                   do_sleep = w->delay;
>>>>>               } else if (w->type == PERIOD) {
>>>>> @@ -2543,6 +2559,13 @@ static char *load_workload_descriptor(char 
>>>>> *filename)
>>>>>       close(infd);
>>>>>       for (i = 0; i < len; i++) {
>>>>> +        /* '#' starts comment till end of line */
>>>>> +        if (buf[i] == '#')
>>>>> +            /* replace ',' in comments to not break parsing */
>>>>> +            while (++i < len && buf[i] != '\n')
>>>>> +                if (buf[i] == ',')
>>>>> +                    buf[i] = ';';
>>>>> +
>>>>>           if (buf[i] == '\n')
>>>>>               buf[i] = ',';
>>>>>       }
>>>>> diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
>>>>> index 8c71f2fe6..e4fd61645 100644
>>>>> --- a/benchmarks/wsim/README
>>>>> +++ b/benchmarks/wsim/README
>>>>> @@ -1,6 +1,8 @@
>>>>>   Workload descriptor format
>>>>>   ==========================
>>>>> +Lines starting with '#' are treated as comments (do not create 
>>>>> work step).
>>>>> +
>>>>>   ctx.engine.duration_us.dependency.wait,...
>>>>>   <uint>.<str>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 
>>>>> 0>][...].<0|1>,...
>>>>>   B.<uint>
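
The comment handling in the load_workload_descriptor() hunk quoted above
can be sketched standalone as follows. This is an illustration with a
hypothetical function name, not the code from the patch. It uses a state
flag instead of a nested loop, because in the quoted loop the inner while
can leave i equal to len before the following buf[i] test, and whether
that read is in bounds depends on how buf was allocated, which the hunk
does not show:

```c
#include <stddef.h>

/*
 * Convert a workload file buffer to the comma separated form used on the
 * command line: every newline becomes ',', and any ',' inside a '#'
 * comment becomes ';' so it is not mistaken for a step separator.
 * A comment runs from '#' to the end of the line.
 */
static void flatten_descriptor(char *buf, size_t len)
{
	int in_comment = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		if (buf[i] == '\n') {
			buf[i] = ',';
			in_comment = 0;
		} else if (buf[i] == '#') {
			in_comment = 1;
		} else if (in_comment && buf[i] == ',') {
			buf[i] = ';';
		}
	}
}
```

Every access stays within the first len bytes of the buffer, and the
result for well-formed input matches what the quoted loop produces.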


* Re: [igt-dev] [PATCH i-g-t 8/8] [RFC] benchmarks/gem_wsim: added basic xe support
  2023-09-21 19:39     ` Bernatowicz, Marcin
@ 2023-09-25  9:16       ` Tvrtko Ursulin
  0 siblings, 0 replies; 22+ messages in thread
From: Tvrtko Ursulin @ 2023-09-25  9:16 UTC (permalink / raw)
  To: Bernatowicz, Marcin, igt-dev; +Cc: chris.p.wilson


On 21/09/2023 20:39, Bernatowicz, Marcin wrote:
> Hi,
> 
> On 9/21/2023 5:57 PM, Tvrtko Ursulin wrote:
>>
>> On 06/09/2023 16:51, Marcin Bernatowicz wrote:
>>> Added basic xe support with few examples.
>>> Single binary handles both i915 and Xe devices,
>>> but workload definitions differ between i915 and xe.
>>> Xe does not use context abstraction, introduces new VM and Exec Queue
>>> steps and BATCH step references exec queue.
>>> For more details see wsim/README.
>>> Some functionality is still missing: working sets,
>>> load balancing (need some input if/how to do it in Xe - exec queues
>>> width?).
>>>
>>> The tool is handy for scheduling tests, we find it useful to verify vGPU
>>> profiles defining different execution quantum/preemption timeout
>>> settings.
>>>
>>> There is also some rationale for the tool in following thread:
>>> https://lore.kernel.org/dri-devel/a443495f-5d1b-52e1-9b2f-80167deb6d57@linux.intel.com/
>>>
>>> With this patch it should be possible to run following on xe device:
>>>
>>> gem_wsim -w benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim -c 36 
>>> -r 600
>>
>> For historical reference there used to be a tool called media-bench.pl 
>> in IGT which was used to answer a question of "how many streams of 
>> this can this load balancer do". In simplified terms it worked by 
>> increasing the -c above until engine busyness would stop growing, 
>> which meant saturation. With that we were able to compare load 
>> balancing strategies and some other things. Like how many streams 
>> until starting to drop frames.
>>
>> These days, if resurrected, or resurrected in principle, it could 
>> answer the question of which driver can fit more streams of workload 
>> X, or does the new GuC fw regress something.
> 
> interesting
>>
>>> Best with drm debug logs disabled:
>>>
>>> echo 0 > /sys/module/drm/parameters/debug
>>>
>>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>>> ---
>>>   benchmarks/gem_wsim.c                         | 534 ++++++++++++++++--
>>>   benchmarks/wsim/README                        |  85 ++-
>>>   benchmarks/wsim/xe_cloud-gaming-60fps.wsim    |  25 +
>>>   benchmarks/wsim/xe_example.wsim               |  28 +
>>>   benchmarks/wsim/xe_example01.wsim             |  19 +
>>>   benchmarks/wsim/xe_example_fence.wsim         |  23 +
>>>   .../wsim/xe_media_load_balance_fhd26u7.wsim   |  63 +++
>>>   7 files changed, 722 insertions(+), 55 deletions(-)
>>>   create mode 100644 benchmarks/wsim/xe_cloud-gaming-60fps.wsim
>>>   create mode 100644 benchmarks/wsim/xe_example.wsim
>>>   create mode 100644 benchmarks/wsim/xe_example01.wsim
>>>   create mode 100644 benchmarks/wsim/xe_example_fence.wsim
>>>   create mode 100644 benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim
>>
>> 8<
>>> diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
>>> index e4fd61645..ddfefff47 100644
>>> --- a/benchmarks/wsim/README
>>> +++ b/benchmarks/wsim/README
>>> @@ -3,6 +3,7 @@ Workload descriptor format
>>>   Lines starting with '#' are treated as comments (do not create work 
>>> step).
>>> +# i915
>>>   ctx.engine.duration_us.dependency.wait,...
>>>   <uint>.<str>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
>>>   B.<uint>
>>> @@ -13,6 +14,23 @@ b.<uint>.<str>[|<str>].<str>
>>>   w|W.<uint>.<str>[/<str>]...
>>>   f
>>> +# xe
>>> +Xe does not use context abstraction and adds additional work step types
>>> +for VM (v.) and exec queue (e.) creation.
>>> +Each v. and e. step creates array entry (in workload's VM and Exec 
>>> Queue arrays).
>>> +Batch step references the exec queue on which it is to be executed.
>>> +Exec queue reference (eq_idx) is the index (0-based) in workload's 
>>> exec queue array.
>>> +VM reference (vm_idx) is the index (0-based) in workload's VM array.
>>> +
>>> +v.compute_mode
>>> +v.<0|1>
>>> +e.vm_idx.class.instance.compute_mode.job_timeout_ms,...
>>> +e.<uint>.<uint 0=RCS,1=BCS,2=VCS,3=VECS,4=CCS>.<int>.<0|1>.<uint>,...
>>> +eq_idx.duration_us.dependency.wait,...
>>> +<uint>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
>>> +d|p|s|t|q|a|T.<int>,...
>>> +f
>>> +
>>>   For duration a range can be given from which a random value will be 
>>> picked
>>>   before every submit. Since this and seqno management requires CPU 
>>> access to
>>>   objects, care needs to be taken in order to ensure the submit queue 
>>> is deep
>>> @@ -29,21 +47,22 @@ Additional workload steps are also supported:
>>>    'q' - Throttle to n max queue depth.
>>>    'f' - Create a sync fence.
>>>    'a' - Advance the previously created sync fence.
>>> - 'B' - Turn on context load balancing.
>>> - 'b' - Set up engine bonds.
>>> - 'M' - Set up engine map.
>>> - 'P' - Context priority.
>>> - 'S' - Context SSEU configuration.
>>> + 'B' - Turn on context load balancing. (i915 only)
>>> + 'b' - Set up engine bonds. (i915 only)
>>> + 'M' - Set up engine map. (i915 only)
>>> + 'P' - Context priority. (i915 only)
>>> + 'S' - Context SSEU configuration. (i915 only)
>>>    'T' - Terminate an infinite batch.
>>> - 'w' - Working set. (See Working sets section.)
>>> - 'W' - Shared working set.
>>> - 'X' - Context preemption control.
>>> + 'w' - Working set. (See Working sets section.) (i915 only)
>>> + 'W' - Shared working set. (i915 only)
>>> + 'X' - Context preemption control. (i915 only)
>>>   Engine ids: DEFAULT, RCS, BCS, VCS, VCS1, VCS2, VECS
>>>   Example (leading spaces must not be present in the actual file):
>>>   ----------------------------------------------------------------
>>> +# i915
>>>     1.VCS1.3000.0.1
>>>     1.RCS.500-1000.-1.0
>>>     1.RCS.3700.0.0
>>> @@ -53,6 +72,25 @@ Example (leading spaces must not be present in the 
>>> actual file):
>>>     1.VCS2.600.-1.1
>>>     p.16000
>>> +# xe equivalent
>>> +  #VM: v.compute_mode
>>> +  v.0
>>> +  #EXEC_QUEUE: e.vm_idx.class.instance.compute_mode.job_timeout_ms
>>> +  e.0.2.0.0.0 # VCS1
>>
>> A minor digression - I would suggest using more symbolic names and 
>> less numbers. For instance encode class instance in names.
> 
> yes, it is just a first fast prototype ;/
> Currently I have something like
> e.vm_idx.class.instance[.jb=[uint].ts=[uint].pt=[uint].pr=[uint]]
> where the job timeout, timeslice_us, preempt_timeout_us and priority 
> properties are optional, so now it is: e.0.2.0 # VCS1
> Introducing symbolic names is the next step; first I wanted some feedback.
>>
>>> +  e.0.0.0.0.0 # RCS
>>> +  e.0.2.1.0.0 # VCS2
>>> +  e.0.0.0.0.0 # second RCS exec queue
>>> +  #BATCH: eq_idx.duration.dependency.wait
>>> +  0.3000.0.1       # 1.VCS1.3000.0.1
>>> +  1.500-1000.-1.0  # 1.RCS.500-1000.-1.0
>>> +  3.3700.0.0       # 1.RCS.3700.0.0
>>> +  1.1000.-2.1      # 1.RCS.1000.-2.0
>>> +  2.2300.-2.0      # 1.VCS2.2300.-2.0
>>> +  3.4700.-1.0      # 1.RCS.4700.-1.0
>>> +  2.600.-1.1       # 1.VCS2.600.-1.1
>>> +  p.16000
>>
>> My initial feeling, and also after some thinking, is that it would be 
>> good to look for solutions for minimising divergence. That means try 
>> to avoid having completely different syntax and zero chance of 
>> workloads which can be run with either driver.
> 
> My first thought was to introduce new syntax to have xe uAPI granularity 
> (if it exposes vm, exec_queue - make it accessible), but it was just 
> first shot.
> 
>>
>> For instance the concept of a queue is relatively similar and in 
>> practice with xe ends up a little bit more limited. Which I think is 
>> solvable.
>>
>> For instance I think this can be made to work with xe.
>>
>> M.1.VCS1|VCS2
>> # or M.1.VCS - class names without numbers can be kept considered VCS*
>> B.1
>> 1.VCS.500-2000.0.0
>>
>> As for i915 this creates a load balancing context with engine map 
>> populated, I think with xe you have the same concept when creating a 
>> queue - allowed engine mask - right?
> 
> Yes, I think we have num_placements to allow exec queue represent a set 
> of engines of given class.
> 
>>
>> B.1 step you can skip with xe if it is not needed, I mean if multiple 
>> allowed engines imply load balancing there.
> 
> That is to be checked. I have a patch (not posted yet) which allows 
> creating an exec queue with all engines of a given class (using num_placements):
> 
>      benchmarks/gem_wsim: use num_placements for exec queue creation
> 
>      Enable num_placement exec queue creation option.
> 
>      Tried following workload:
> 
>      gem_wsim -w xe_media_load_balance_fhd26u7.wsim -c 36 -r 25 -v
> 
>      with three versions of exec queue definitions
>      (listed from worst to best in terms of workloads/s):
> 
>      e.0.2.0 # 1.VCS1
>      e.0.0.0 # 2.RCS       -> ~83% of last one
>      e.0.2.-1 # any VCS
> 
>      e.0.2.0 # 1.VCS1
>      e.0.0.0 # 2.RCS       -> ~85% of last one
>      e.0.2.1 # always VCS2
> 
>      e.0.2.-1 # any VCS
>      e.0.0.0  # RCS        -> 100%
>      e.0.2.-1 # any VCS
> 
> So it looks like the best results (load balancing?) happen when all exec 
> queues of a given class are configured to use all engines of that class.
> With two exec queues, one on VCS1 and the second on all VCS, it was even 
> a bit worse than when the first was on VCS1 and the second on VCS2.
> The full xe_media_load_balance_fhd26u7.wsim workload is at the end of this 
> message (i915 and my xe version); more or less we have exec_queue 
> <-> context equivalence there.

Note that fhd26u7 is an equivalent of a real transcoding session which 
was running on platforms where not all VCS engines were equal. Therefore 
some batches *had* to go to VCS1 only, while some could be load-balanced.

(See I915_VIDEO_CLASS_CAPABILITY_HEVC and 
I915_VIDEO_AND_ENHANCE_CLASS_CAPABILITY_SFC.)

Therefore if you would change it to any VCS it would no longer be 
representative of a real media workload. At least on those platforms. I 
am not up to speed if modern gens still have the same design or things 
got simplified.

>> And then for the actual submission you know it is queue 1, and VCS you 
>> can use to sanitize. If it doesn't match the queue configuration error 
>> out, otherwise just submit to the queue.
>>
>> VM management can be explicit steps, and AFAIR gem_wsim already shares 
>> the VM implicitly, so for xe you just need to add some commands to 
>> make it explicit:
>>
>> V.1             # create VM 1
>> M.1.VCS         # create ctx/queue 1 with all VCS engines
>> v.1.1            # assign VM 1 to ctx/queue 1
>> B.1            # turn on load balancing for ctx 1
>> 1.VCS.1000.0.0        # submit to ctx/queue 1
>>
>> I think this could work with both i915 and xe as is.
> 
> will think on this.
> 
>>
>> Things like compute mode you add as extensions which i915 could then 
>> ignore.
>>
>> V.1
>> c.1    # turn on compute mode on vm 1
>> M.1.VCS    # do you *need* to repeat the compute mode if vm carries 
>> the info?
> I think not, recent changes to uAPI 
> (https://patchwork.freedesktop.org/series/123916/) remove the need for 
> COMPUTE_MODE property on exec queue.
>> v.1.1
>> B.1
>> 1.VCS.1000.0.0
>>
>> Still would work with both i915 and xe if I am not missing something.
>>
>> I mean maybe you don't even need explicit VM management in the first 
>> go and can just do what the code currently does, which is to share the 
>> same VM for all contexts?
>>
>> That much for now, let the brainstorming commence! :)
>>
>> Regards,
>>
>> Tvrtko
>>
>> P.S.
>>
>> Engine bonds could be used to validate and set up parallel submission 
>> queue. For instance:
>>
>> b.1.VCS2.VCS1
>>
>> Is probably a no-op on xe with parallel queues. Or you use it to 
>> configure the engine map order, if that is important.
>>
>> Problem will be converting multiple submission into one. It is 
>> probably doable but not warranted to include. It is okay to error out 
>> for now on workloads which use the feature.
> 
> I don't get parallel submission queue concept - is it submission on 
> multiple engines of same class at same time (I think in xe it's a width 
> parameter of exec ioctl ?)

Yes, it is for example two VCS engines working in parallel on the same 
data block. And they have to run strictly in parallel.

I think the only example in wsim of that is frame-split-60fps.wsim:

... set up two contexts on two vcs engines and set up the "bond"...
f # create a fence
1.DEFAULT.*.f-1.0 # submit 1st batch blocked on the fence
2.DEFAULT.4000-6000.s-1.0 # submit 2nd batch with submit fence on 1st
a.-3 # unblock 1st batch (and therefore both)
...

But as said, even though it is used by media, I think it is best to 
leave this for the 2nd stage of gem_wsim refactors. We can then decide 
if it makes sense (or is even possible) to auto-magically convert this 
to an xe parallel queue or diverge.
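
For anyone following the relative references in the snippet above, here 
is a rough Python sketch that resolves them to absolute step indices. 
It is not how gem_wsim parses workloads; the function name and the 
output tuples are made up for illustration, and only the f-N, s-N and 
a.-N forms shown above are handled.

```python
def resolve_fence_refs(steps):
    """Map relative references (f-N, s-N, a.-N) to absolute step indices.

    Returns (step_index, kind, target_index) tuples. Hypothetical helper,
    written only to illustrate the relative addressing.
    """
    resolved = []
    for idx, step in enumerate(steps):
        fields = step.split(".")
        if fields[0] == "a":
            # 'a.-N' signals the standalone fence created N steps earlier.
            resolved.append((idx, "signal", idx + int(fields[1])))
        elif len(fields) == 5:
            # Batch step: ctx.engine.duration.dependency.wait
            for dep in fields[3].split("/"):
                if dep[0] in "fs":
                    kind = "sync fence" if dep[0] == "f" else "submit fence"
                    resolved.append((idx, kind, idx + int(dep[1:])))
    return resolved


steps = [
    "f",                          # step 0: create a fence
    "1.DEFAULT.*.f-1.0",          # step 1: blocked on the fence
    "2.DEFAULT.4000-6000.s-1.0",  # step 2: submit fence on step 1
    "a.-3",                       # step 3: unblock the fence from step 0
]
refs = resolve_fence_refs(steps)
for ref in refs:
    print(ref)
```

So a.-3 points back at the fence step, f-1 at the same fence, and s-1 
at the first batch, which is the pair an xe parallel queue would have 
to replace with a single submission.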

Regards,

Tvrtko

> 
> Thanks a lot for valuable feedback
> -- 
> marcin
>>
>>> +
>>> +
>>>   The above workload described in human language works like this:
>>>     1.   A batch is sent to the VCS1 engine which will be executing 
>>> for 3ms on the
>>> @@ -78,16 +116,30 @@ Multiple dependencies can be given separated by 
>>> forward slashes.
>>>   Example:
>>> +# i915
>>>     1.VCS1.3000.0.1
>>>     1.RCS.3700.0.0
>>>     1.VCS2.2300.-1/-2.0
>>> +# xe
>>> +  v.0
>>> +  e.0.2.0.0.0
>>> +  e.0.0.0.0.0
>>> +  e.0.2.1.0.0.0
>>> +  0.3000.0.1
>>> +  1.3700.0.0
>>> +  2.2300.-1/-2.0
>>> +
>>>   I this case the last step has a data dependency on both first and 
>>> second steps.
>>>   Batch durations can also be specified as infinite by using the '*' 
>>> in the
>>>   duration field. Such batches must be ended by the terminate command 
>>> ('T')
>>>   otherwise they will cause a GPU hang to be reported.
>>> +Note: On Xe Batch dependencies are expressed with syncobjects,
>>> +so there is no difference between f-1 and -1
>>> +ex. 1.1000.-2.0 is same as 1.1000.f-2.0.
>>> +
>>>   Sync (fd) fences
>>>   ----------------
>>> @@ -116,6 +168,7 @@ VCS1 and VCS2 batches will have a sync fence 
>>> dependency on the RCS batch.
>>>   Example:
>>> +# i915
>>>     1.RCS.500-1000.0.0
>>>     f
>>>     2.VCS1.3000.f-1.0
>>> @@ -125,13 +178,27 @@ Example:
>>>     s.-4
>>>     s.-4
>>> +# xe equivalent
>>> +  v.0
>>> +  e.0.0.0.0.0    # RCS
>>> +  e.0.2.0.0.0    # VCS1
>>> +  e.0.2.1.0.0    # VCS2
>>> +  0.500-1000.0.0
>>> +  f
>>> +  1.3000.f-1.0
>>> +  2.3000.f-2.0
>>> +  0.500-1000.0.1
>>> +  a.-4
>>> +  s.-4
>>> +  s.-4
>>> +
>>>   VCS1 and VCS2 batches have an input sync fence dependecy on the 
>>> standalone fence
>>>   created at the second step. They are submitted ahead of time while 
>>> still not
>>>   runnable. When the second RCS batch completes the standalone fence 
>>> is signaled
>>>   which allows the two VCS batches to be executed. Finally we wait 
>>> until the both
>>>   VCS batches have completed before starting the (optional) next 
>>> iteration.
>>> -Submit fences
>>> +Submit fences (i915 only?)
>>>   -------------
>>>   Submit fences are a type of input fence which are signalled when 
>>> the originating
>>> diff --git a/benchmarks/wsim/xe_cloud-gaming-60fps.wsim 
>>> b/benchmarks/wsim/xe_cloud-gaming-60fps.wsim
>>> new file mode 100644
>>> index 000000000..9fdf15e27
>>> --- /dev/null
>>> +++ b/benchmarks/wsim/xe_cloud-gaming-60fps.wsim
>>> @@ -0,0 +1,25 @@
>>> +#w.1.10n8m
>>> +#w.2.3n16m
>>> +#1.RCS.500-1500.r1-0-4/w2-0.0
>>> +#1.RCS.500-1500.r1-5-9/w2-1.0
>>> +#1.RCS.500-1500.r2-0-1/w2-2.0
>>> +#M.2.VCS
>>> +#B.2
>>> +#3.RCS.500-1500.r2-2.0
>>> +#2.DEFAULT.2000-4000.-1.0
>>> +#4.VCS1.250-750.-1.1
>>> +#p.16667
>>> +#
>>> +#xe
>>> +v.0
>>> +e.0.0.0.0.0 # 1.RCS.500-1500.r1-0-4/w2-0.0
>>> +e.0.2.0.0.0 # 2.DEFAULT.2000-4000.-1.0
>>> +e.0.0.0.0.0 # 3.RCS.500-1500.r2-2.0
>>> +e.0.2.1.0.0 # 4.VCS1.250-750.-1.1
>>> +0.500-1500.0.0
>>> +0.500-1500.0.0
>>> +0.500-1500.0.0
>>> +2.500-1500.-2.0 # #3.RCS.500-1500.r2-2.0
>>> +1.2000-4000.-1.0
>>> +3.250-750.-1.1
>>> +p.16667
>>> diff --git a/benchmarks/wsim/xe_example.wsim 
>>> b/benchmarks/wsim/xe_example.wsim
>>> new file mode 100644
>>> index 000000000..3fa620932
>>> --- /dev/null
>>> +++ b/benchmarks/wsim/xe_example.wsim
>>> @@ -0,0 +1,28 @@
>>> +#i915
>>> +#1.VCS1.3000.0.1
>>> +#1.RCS.500-1000.-1.0
>>> +#1.RCS.3700.0.0
>>> +#1.RCS.1000.-2.0
>>> +#1.VCS2.2300.-2.0
>>> +#1.RCS.4700.-1.0
>>> +#1.VCS2.600.-1.1
>>> +#p.16000
>>> +#
>>> +#xe
>>> +#
>>> +#VM: v.compute_mode
>>> +v.0
>>> +#EXEC_QUEUE: e.vm_idx.class.intance.compute_mode.job_timeout_ms
>>> +e.0.2.0.0.0 # VCS1
>>> +e.0.0.0.0.0 # RCS
>>> +e.0.2.1.0.0 # VCS2
>>> +e.0.0.0.0.0 # second RCS exec_queue
>>> +#BATCH: eq_idx.duration.dependency.wait
>>> +0.3000.0.1       # 1.VCS1.3000.0.1
>>> +1.500-1000.-1.0  # 1.RCS.500-1000.-1.0
>>> +3.3700.0.0       # 1.RCS.3700.0.0
>>> +1.1000.-2.1      # 1.RCS.1000.-2.0
>>> +2.2300.-2.0      # 1.VCS2.2300.-2.0
>>> +3.4700.-1.0      # 1.RCS.4700.-1.0
>>> +2.600.-1.1       # 1.VCS2.600.-1.1
>>> +p.16000
>>> diff --git a/benchmarks/wsim/xe_example01.wsim 
>>> b/benchmarks/wsim/xe_example01.wsim
>>> new file mode 100644
>>> index 000000000..496905371
>>> --- /dev/null
>>> +++ b/benchmarks/wsim/xe_example01.wsim
>>> @@ -0,0 +1,19 @@
>>> +#VM: v.compute_mode
>>> +v.0
>>> +#EXEC_QUEUE: e.vm_idx.class.intance.compute_mode.job_timeout_ms
>>> +e.0.0.0.0.0
>>> +e.0.2.0.0.0
>>> +e.0.1.0.0.0
>>> +#BATCH: eq_idx.duration.dependency.wait
>>> +# B1 - 10ms batch on BCS0
>>> +2.10000.0.0
>>> +# B2 - 10ms batch on RCS0; waits on B1
>>> +0.10000.0.0
>>> +# B3 - 10ms batch on VECS0; waits on B2
>>> +1.10000.0.0
>>> +# B4 - 10ms batch on BCS0
>>> +2.10000.0.0
>>> +# B5 - 10ms batch on RCS0; waits on B4
>>> +0.10000.-1.0
>>> +# B6 - 10ms batch on VECS0; waits on B5; wait on batch fence out
>>> +1.10000.-1.1
>>> diff --git a/benchmarks/wsim/xe_example_fence.wsim 
>>> b/benchmarks/wsim/xe_example_fence.wsim
>>> new file mode 100644
>>> index 000000000..4f810d64e
>>> --- /dev/null
>>> +++ b/benchmarks/wsim/xe_example_fence.wsim
>>> @@ -0,0 +1,23 @@
>>> +#i915
>>> +#1.RCS.500-1000.0.0
>>> +#f
>>> +#2.VCS1.3000.f-1.0
>>> +#2.VCS2.3000.f-2.0
>>> +#1.RCS.500-1000.0.1
>>> +#a.-4
>>> +#s.-4
>>> +#s.-4
>>> +#
>>> +#xe
>>> +v.0
>>> +e.0.0.0.0.0
>>> +e.0.2.0.0.0
>>> +e.0.2.1.0.0
>>> +0.500-1000.0.0
>>> +f
>>> +1.3000.f-1.0
>>> +2.3000.f-2.0
>>> +0.500-1000.0.1
>>> +a.-4
>>> +s.-4
>>> +s.-4
>>> diff --git a/benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim 
>>> b/benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim
>>> new file mode 100644
>>> index 000000000..2214914eb
>>> --- /dev/null
>>> +++ b/benchmarks/wsim/xe_media_load_balance_fhd26u7.wsim
>>> @@ -0,0 +1,63 @@
>>> +# 
>>> https://lore.kernel.org/dri-devel/a443495f-5d1b-52e1-9b2f-80167deb6d57@linux.intel.com/
>>> +#i915
>>> +#M.3.VCS
>>> +#B.3
>>> +#1.VCS1.1200-1800.0.0
>>> +#1.VCS1.1900-2100.0.0
>>> +#2.RCS.1500-2000.-1.0
>>> +#3.VCS.1400-1800.-1.1
>>> +#1.VCS1.1900-2100.-1.0
>>> +#2.RCS.1500-2000.-1.0
>>> +#3.VCS.1400-1800.-1.1
>>> +#1.VCS1.1900-2100.-1.0
>>> +#2.RCS.200-400.-1.0
>>> +#2.RCS.1500-2000.0.0
>>> +#3.VCS.1400-1800.-1.1
>>> +#1.VCS1.1900-2100.-1.0
>>> +#2.RCS.1500-2000.-1.0
>>> +#3.VCS.1400-1800.-1.1
>>> +#1.VCS1.1900-2100.-1.0
>>> +#2.RCS.200-400.-1.0
>>> +#2.RCS.1500-2000.0.0
>>> +#3.VCS.1400-1800.-1.1
>>> +#1.VCS1.1900-2100.-1.0
>>> +#2.RCS.1500-2000.-1.0
>>> +#3.VCS.1400-1800.-1.1
>>> +#1.VCS1.1900-2100.-1.0
>>> +#2.RCS.1500-2000.-1.0
>>> +#2.RCS.1500-2000.0.0
>>> +#3.VCS.1400-1800.-1.1
>>> +#
>>> +#xe
>>> +#
>>> +#M.3.VCS ??
>>> +#B.3     ??
>>> +v.0
>>> +e.0.2.0.0.0 # 1.VCS1
>>> +e.0.0.0.0.0 # 2.RCS
>>> +e.0.2.1.0.0 # 3.VCS - no load balancing yet always VCS2
>>> +0.1200-1800.0.0
>>> +0.1900-2100.0.0
>>> +1.1500-2000.-1.0
>>> +2.1400-1800.-1.1
>>> +0.1900-2100.-1.0
>>> +1.1500-2000.-1.0
>>> +2.1400-1800.-1.1
>>> +0.1900-2100.-1.0
>>> +1.200-400.-1.0
>>> +1.1500-2000.0.0
>>> +2.1400-1800.-1.1
>>> +0.1900-2100.-1.0
>>> +1.1500-2000.-1.0
>>> +2.1400-1800.-1.1
>>> +0.1900-2100.-1.0
>>> +1.200-400.-1.0
>>> +1.1500-2000.0.0
>>> +2.1400-1800.-1.1
>>> +0.1900-2100.-1.0
>>> +1.1500-2000.-1.0
>>> +2.1400-1800.-1.1
>>> +0.1900-2100.-1.0
>>> +1.1500-2000.-1.0
>>> +1.1500-2000.0.0
>>> +2.1400-1800.-1.1
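
As a reading aid for the xe workloads quoted above, here is a small 
Python sketch that splits the 'e' and batch steps into named fields, 
following the "#EXEC_QUEUE:" and "#BATCH:" comments in xe_example.wsim. 
It is only an illustration, not the gem_wsim parser; the tuple and 
function names are invented, and all other step types (v, f, a, s, p) 
are simply skipped.

```python
from collections import namedtuple

# Field names follow the "#EXEC_QUEUE:" and "#BATCH:" comments in the
# quoted xe_example.wsim; the tuple and function names are hypothetical.
ExecQueue = namedtuple(
    "ExecQueue", "vm_idx engine_class instance compute_mode job_timeout_ms"
)
Batch = namedtuple("Batch", "eq_idx duration dependency wait")


def parse_xe_workload(text):
    """Split 'e' steps and batch steps of an xe workload into tuples."""
    queues, batches = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split(".")
        if fields[0] == "e" and len(fields) == 6:
            queues.append(ExecQueue(*map(int, fields[1:])))
        elif fields[0].isdigit() and len(fields) == 4:
            # Duration and dependency stay as strings ("500-1000", "-1/-2").
            batches.append(
                Batch(int(fields[0]), fields[1], fields[2], int(fields[3]))
            )
    return queues, batches


sample = """\
v.0
e.0.2.0.0.0 # VCS1
e.0.0.0.0.0 # RCS
0.3000.0.1       # 1.VCS1.3000.0.1
1.500-1000.-1.0  # 1.RCS.500-1000.-1.0
p.16000
"""
queues, batches = parse_xe_workload(sample)
for batch in batches:
    print(batch, "->", queues[batch.eq_idx])
```

The batch's first field is an index into the list of exec queues in 
creation order, which is the exec_queue <-> context equivalence 
mentioned at the top of this message.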


end of thread, other threads:[~2023-09-25  9:17 UTC | newest]

Thread overview: 22+ messages
2023-09-06 15:51 [igt-dev] [PATCH i-g-t 0/8] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 1/8] lib/xe_spin: xe_spin_opts for xe_spin initialization Marcin Bernatowicz
2023-09-20 16:43   ` Kamil Konieczny
2023-09-21 15:08     ` Bernatowicz, Marcin
2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 2/8] lib/xe_spin: fixed duration xe_spin capability Marcin Bernatowicz
2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 3/8] lib/igt_device_scan: Xe get integrated/discrete card functions Marcin Bernatowicz
2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 4/8] benchmarks/gem_wsim: scale duration option fixes Marcin Bernatowicz
2023-09-20 16:06   ` Tvrtko Ursulin
2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 5/8] benchmarks/gem_wsim: cleanups Marcin Bernatowicz
2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 6/8] benchmarks/gem_wsim: allow comments in workload description files Marcin Bernatowicz
2023-09-20 16:13   ` Tvrtko Ursulin
2023-09-21 15:05     ` Bernatowicz, Marcin
2023-09-21 15:22       ` Tvrtko Ursulin
2023-09-21 16:20         ` Bernatowicz, Marcin
2023-09-25  9:03           ` Tvrtko Ursulin
2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 7/8] benchmarks/gem_wsim: extract prepare_ctxs function, add w_sync Marcin Bernatowicz
2023-09-06 15:51 ` [igt-dev] [PATCH i-g-t 8/8] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
2023-09-21 15:57   ` Tvrtko Ursulin
2023-09-21 19:39     ` Bernatowicz, Marcin
2023-09-25  9:16       ` Tvrtko Ursulin
2023-09-06 21:01 ` [igt-dev] ✗ Fi.CI.BAT: failure for benchmarks/gem_wsim: added basic xe support (rev2) Patchwork
2023-09-07  9:30   ` Bernatowicz, Marcin
