Igt-dev Archive on lore.kernel.org
* [igt-dev] [PATCH i-g-t 00/14] [RFC] benchmarks/gem_wsim: added basic xe support
@ 2023-09-26  8:44 Marcin Bernatowicz
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 01/14] lib/igt_device_scan: Xe get integrated/discrete card functions Marcin Bernatowicz
                   ` (16 more replies)
  0 siblings, 17 replies; 39+ messages in thread
From: Marcin Bernatowicz @ 2023-09-26  8:44 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Add basic Xe support. A single binary handles both i915 and Xe devices.

Some functionality is still missing: working sets, bonding.

The tool is handy for scheduling tests; we find it useful for verifying
vGPU profiles that define different execution quantum/preemption
timeout settings.

There is also some rationale for the tool in the following thread:
https://lore.kernel.org/dri-devel/a443495f-5d1b-52e1-9b2f-80167deb6d57@linux.intel.com/

With this series it should be possible to run the following on an Xe device:

gem_wsim -w benchmarks/wsim/media_load_balance_fhd26u7.wsim -c 36 -r 600

This works best with DRM debug logs disabled:

echo 0 > /sys/module/drm/parameters/debug

v2: 
- minimizing divergence - same workload syntax for both drivers,
  so most existing examples should run on Xe unmodified (Tvrtko).
  This version creates one common VM per workload.
  Explicit VM management, compute mode and improved engine handling
  are planned for the next patchset.
- split patches for easier review (Tvrtko)
- dropped already merged patches, added documentation to public
  lib functions, some code cleanups (Kamil)

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>

Marcin Bernatowicz (14):
  lib/igt_device_scan: Xe get integrated/discrete card functions
  benchmarks/gem_wsim: reposition the unbound duration boolean
  benchmarks/gem_wsim: fix scaling of period steps
  benchmarks/gem_wsim: fix duration range check
  benchmarks/gem_wsim: extract duration parsing code to new function
  benchmarks/gem_wsim: fix conflicting SSEU #define and enum
  benchmarks/gem_wsim: cleanups
  benchmarks/gem_wsim: reposition repeat_start variable
  benchmarks/gem_wsim: use lib code to query engines
  benchmarks/gem_wsim: allow comments in workload description files
  benchmarks/gem_wsim: introduce w_step_sync function
  benchmarks/gem_wsim: extract prepare contexts code to new function
  benchmarks/gem_wsim: extract prepare working sets code to new function
  benchmarks/gem_wsim: added basic xe support

 benchmarks/gem_wsim.c  | 963 +++++++++++++++++++++++++++++------------
 benchmarks/wsim/README |   8 +-
 lib/igt_device_scan.c  |  52 ++-
 lib/igt_device_scan.h  |   2 +
 4 files changed, 729 insertions(+), 296 deletions(-)

-- 
2.42.0


* [igt-dev] [PATCH i-g-t 01/14] lib/igt_device_scan: Xe get integrated/discrete card functions
  2023-09-26  8:44 [igt-dev] [PATCH i-g-t 00/14] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
@ 2023-09-26  8:44 ` Marcin Bernatowicz
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 02/14] benchmarks/gem_wsim: reposition the unbound duration boolean Marcin Bernatowicz
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 39+ messages in thread
From: Marcin Bernatowicz @ 2023-09-26  8:44 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Add Xe functions to find the first integrated/discrete card.
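The generalized lookup can be sketched outside IGT as follows. This is a minimal, self-contained sketch, not the library code: `struct fake_dev` and `find_first_by_driver_name()` are hypothetical stand-ins for the igt_devs list and the static helper, and the slot "0000:00:02.0" is an assumption about the value of INTEGRATED_I915_GPU_PCI_ID. A device counts as integrated when its PCI slot matches that value and as discrete otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed integrated-GPU slot; hypothetical stand-in for the IGT define. */
#define SKETCH_INTEGRATED_PCI_SLOT "0000:00:02.0"

/* Hypothetical stand-in for an entry of the IGT device list. */
struct fake_dev {
	const char *driver;        /* e.g. "i915" or "xe" */
	const char *pci_slot_name; /* e.g. "0000:03:00.0" */
};

/*
 * Return the first device bound to drv_name that is discrete
 * (want_discrete == true) or integrated (want_discrete == false),
 * or NULL when there is none.
 */
static const struct fake_dev *
find_first_by_driver_name(const struct fake_dev *devs, size_t n,
			  bool want_discrete, const char *drv_name)
{
	assert(drv_name);

	for (size_t i = 0; i < n; i++) {
		bool is_integrated;

		if (strcmp(devs[i].driver, drv_name))
			continue;

		/* strcmp here; the library uses strncmp with PCI_SLOT_NAME_SIZE */
		is_integrated = !strcmp(devs[i].pci_slot_name,
					SKETCH_INTEGRATED_PCI_SLOT);

		/* discrete wanted and not integrated, or integrated wanted and integrated */
		if (want_discrete != is_integrated)
			return &devs[i];
	}

	return NULL;
}
```

The single comparison `want_discrete != is_integrated` folds the patch's if/else-if pair into one test; both forms pick the same device.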

v2:
- renamed __find_first_i915_card to __find_first_intel_card_by_driver_name (Zbyszek)
v3:
- added documentation to public functions (Kamil)

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 lib/igt_device_scan.c | 52 +++++++++++++++++++++++++++++++++++--------
 lib/igt_device_scan.h |  2 ++
 2 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/lib/igt_device_scan.c b/lib/igt_device_scan.c
index ae69ed09f..f4f95fef3 100644
--- a/lib/igt_device_scan.c
+++ b/lib/igt_device_scan.c
@@ -769,25 +769,27 @@ __copy_dev_to_card(struct igt_device *dev, struct igt_device_card *card)
  * Iterate over all igt_devices array and find first discrete/integrated card.
  * card->pci_slot_name will be updated only if a card is found.
  */
-static bool __find_first_i915_card(struct igt_device_card *card, bool discrete)
+static bool __find_first_intel_card_by_driver_name(struct igt_device_card *card,
+				bool want_discrete, const char *drv_name)
 {
 	struct igt_device *dev;
-	int cmp;
+	int is_integrated;
 
+	igt_assert(drv_name);
 	memset(card, 0, sizeof(*card));
 
 	igt_list_for_each_entry(dev, &igt_devs.all, link) {
 
-		if (!is_pci_subsystem(dev) || strcmp(dev->driver, "i915"))
+		if (!is_pci_subsystem(dev) || strcmp(dev->driver, drv_name))
 			continue;
 
-		cmp = strncmp(dev->pci_slot_name, INTEGRATED_I915_GPU_PCI_ID,
-			      PCI_SLOT_NAME_SIZE);
+		is_integrated = !strncmp(dev->pci_slot_name, INTEGRATED_I915_GPU_PCI_ID,
+					 PCI_SLOT_NAME_SIZE);
 
-		if (discrete && cmp) {
+		if (want_discrete && !is_integrated) {
 			__copy_dev_to_card(dev, card);
 			return true;
-		} else if (!discrete && !cmp) {
+		} else if (!want_discrete && is_integrated) {
 			__copy_dev_to_card(dev, card);
 			return true;
 		}
@@ -800,14 +802,46 @@ bool igt_device_find_first_i915_discrete_card(struct igt_device_card *card)
 {
 	igt_assert(card);
 
-	return __find_first_i915_card(card, true);
+	return __find_first_intel_card_by_driver_name(card, true, "i915");
+}
+
+/**
+ * igt_device_find_first_xe_discrete_card
+ * @card: pointer to igt_device_card structure
+ *
+ * Iterate over all igt_devices array and find first xe discrete card.
+ * card will be updated only if a device is found.
+ *
+ * Returns: true if device is found, false otherwise.
+ */
+bool igt_device_find_first_xe_discrete_card(struct igt_device_card *card)
+{
+	igt_assert(card);
+
+	return __find_first_intel_card_by_driver_name(card, true, "xe");
 }
 
 bool igt_device_find_integrated_card(struct igt_device_card *card)
 {
 	igt_assert(card);
 
-	return __find_first_i915_card(card, false);
+	return __find_first_intel_card_by_driver_name(card, false, "i915");
+}
+
+/**
+ * igt_device_find_xe_integrated_card
+ * @card: pointer to igt_device_card structure
+ *
+ * Iterate over all igt_devices array and find first xe integrated card.
+ * card will be updated only if a device is found.
+ *
+ * Returns: true if device is found, false otherwise.
+ */
+bool igt_device_find_xe_integrated_card(struct igt_device_card *card)
+{
+	igt_assert(card);
+
+	return __find_first_intel_card_by_driver_name(card, false, "xe");
 }
 
 static struct igt_device *igt_device_from_syspath(const char *syspath)
diff --git a/lib/igt_device_scan.h b/lib/igt_device_scan.h
index e6b0f1b90..b8f6a843d 100644
--- a/lib/igt_device_scan.h
+++ b/lib/igt_device_scan.h
@@ -87,6 +87,8 @@ bool igt_device_card_match_pci(const char *filter,
 	struct igt_device_card *card);
 bool igt_device_find_first_i915_discrete_card(struct igt_device_card *card);
 bool igt_device_find_integrated_card(struct igt_device_card *card);
+bool igt_device_find_first_xe_discrete_card(struct igt_device_card *card);
+bool igt_device_find_xe_integrated_card(struct igt_device_card *card);
 char *igt_device_get_pretty_name(struct igt_device_card *card, bool numeric);
 int igt_open_card(struct igt_device_card *card);
 int igt_open_render(struct igt_device_card *card);
-- 
2.42.0


* [igt-dev] [PATCH i-g-t 02/14] benchmarks/gem_wsim: reposition the unbound duration boolean
  2023-09-26  8:44 [igt-dev] [PATCH i-g-t 00/14] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 01/14] lib/igt_device_scan: Xe get integrated/discrete card functions Marcin Bernatowicz
@ 2023-09-26  8:44 ` Marcin Bernatowicz
  2023-09-26 10:23   ` Tvrtko Ursulin
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 03/14] benchmarks/gem_wsim: fix scaling of period steps Marcin Bernatowicz
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 39+ messages in thread
From: Marcin Bernatowicz @ 2023-09-26  8:44 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

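Move unbound_duration into struct duration so that all duration info of
a w_step lives in one place.

While at it, multiply by 1000LL in update_bb_start() so the
microseconds-to-nanoseconds conversion is done in 64-bit arithmetic.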
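A rough sketch of how the grouped fields are consumed when the batch duration is written, loosely following update_bb_start(). The 80 ns tick (12.5 MHz) in the hypothetical `sketch_ns_to_ticks()` is a made-up value for illustration; the real ns_to_ctx_ticks() depends on the device. The 1000ULL factor keeps the microseconds-to-nanoseconds product in 64 bits, so a 3 s step (3e9 ns) does not overflow a 32-bit int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the patched struct: all duration info in one place. */
struct duration {
	unsigned int min, max;
	bool unbound_duration;
};

/* Hypothetical conversion assuming an 80 ns (12.5 MHz) context tick. */
static uint64_t sketch_ns_to_ticks(uint64_t ns)
{
	return ns / 80;
}

/*
 * Value stored in the batch for MI_DO_COMPARE: inverted ticks for a
 * bounded step, 0 for an unbound one (loosely mirrors update_bb_start()).
 */
static uint32_t bb_duration_value(const struct duration *dur, unsigned int us)
{
	uint32_t ticks = 0;

	if (!dur->unbound_duration)
		/* 1000ULL forces the us -> ns multiply into 64 bits */
		ticks = ~(uint32_t)sketch_ns_to_ticks(1000ULL * us);

	return ticks;
}
```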

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 7b5e62a3b..90a36f7de 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -73,6 +73,7 @@ enum intel_engine_id {
 
 struct duration {
 	unsigned int min, max;
+	bool unbound_duration;
 };
 
 enum w_type
@@ -145,7 +146,6 @@ struct w_step
 	unsigned int context;
 	unsigned int engine;
 	struct duration duration;
-	bool unbound_duration;
 	struct deps data_deps;
 	struct deps fence_deps;
 	int emit_fence;
@@ -1130,7 +1130,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 				check_arg(intel_gen(intel_get_drm_devid(fd)) < 8,
 					  "Infinite batch at step %u needs Gen8+!\n",
 					  nr_steps);
-				step.unbound_duration = true;
+				step.duration.unbound_duration = true;
 			} else {
 				tmpl = strtol(field, &sep, 10);
 				check_arg(tmpl <= 0 || tmpl == LONG_MIN ||
@@ -2172,8 +2172,8 @@ update_bb_start(struct workload *wrk, struct w_step *w)
 
 	/* ticks is inverted for MI_DO_COMPARE (less-than comparison) */
 	ticks = 0;
-	if (!w->unbound_duration)
-		ticks = ~ns_to_ctx_ticks(1000 * get_duration(wrk, w));
+	if (!w->duration.unbound_duration)
+		ticks = ~ns_to_ctx_ticks(1000LL * get_duration(wrk, w));
 
 	*w->bb_duration = ticks;
 }
@@ -2349,7 +2349,7 @@ static void *run_workload(void *data)
 
 				igt_assert(t_idx >= 0 && t_idx < i);
 				igt_assert(wrk->steps[t_idx].type == BATCH);
-				igt_assert(wrk->steps[t_idx].unbound_duration);
+				igt_assert(wrk->steps[t_idx].duration.unbound_duration);
 
 				*wrk->steps[t_idx].bb_duration = 0xffffffff;
 				__sync_synchronize();
-- 
2.42.0


* [igt-dev] [PATCH i-g-t 03/14] benchmarks/gem_wsim: fix scaling of period steps
  2023-09-26  8:44 [igt-dev] [PATCH i-g-t 00/14] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 01/14] lib/igt_device_scan: Xe get integrated/discrete card functions Marcin Bernatowicz
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 02/14] benchmarks/gem_wsim: reposition the unbound duration boolean Marcin Bernatowicz
@ 2023-09-26  8:44 ` Marcin Bernatowicz
  2023-09-26 10:28   ` Tvrtko Ursulin
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 04/14] benchmarks/gem_wsim: fix duration range check Marcin Bernatowicz
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 39+ messages in thread
From: Marcin Bernatowicz @ 2023-09-26  8:44 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Make period steps take the scale time (-F) command line option into
account. This allows scaling a workload without modifying the .wsim file.

For example, given the following example.wsim:

1.VCS1.3000.0.1
1.RCS.500-1000.-1.0
1.RCS.3700.0.0
1.RCS.1000.-2.0
1.VCS2.2300.-2.0
1.RCS.4700.-1.0
1.VCS2.600.-1.1
p.16000

we can scale the whole workload x10 with:

gem_wsim -w example.wsim -f 10 -F 10

-f scales batch duration steps, while -F scales period and delay steps.
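A minimal sketch of the scaled period parsing, assuming a hypothetical `parse_period()` helper that takes a whole "p.<us>" line (gem_wsim tokenizes with strtok_r() instead). `scale_value()` rounds to nearest for positive inputs, analogous to gem_wsim's __duration(), which uses round(). With -F 10, the `p.16000` step from the example becomes a 160000 us period.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Round-to-nearest scaling for positive values (sketch of __duration()). */
static long scale_value(long val, double scale)
{
	return (long)(scale * val + 0.5);
}

/*
 * Parse a "p.<us>" period line and scale it by scale_time (-F).
 * Returns 0 on success, -1 on a malformed or non-positive period.
 * Hypothetical helper, not part of gem_wsim.
 */
static int parse_period(const char *line, double scale_time, long *period)
{
	char *end;
	long val;

	if (strncmp(line, "p.", 2))
		return -1;

	val = strtol(line + 2, &end, 10);
	if (end == line + 2 || val <= 0)
		return -1;

	*period = scale_value(val, scale_time);
	return 0;
}
```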

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 90a36f7de..65061461d 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -899,8 +899,14 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 				int_field(DELAY, delay, tmp <= 0,
 					  "Invalid delay at step %u!\n");
 			} else if (!strcmp(field, "p")) {
-				int_field(PERIOD, period, tmp <= 0,
-					  "Invalid period at step %u!\n");
+				field = strtok_r(fstart, ".", &fctx);
+				if (field) {
+					tmp = atoi(field);
+					check_arg(tmp <= 0, "Invalid period at step %u!\n", nr_steps);
+					step.type = PERIOD;
+					step.period = __duration(tmp, scale_time);
+					goto add_step;
+				}
 			} else if (!strcmp(field, "P")) {
 				unsigned int nr = 0;
 				while ((field = strtok_r(fstart, ".", &fctx))) {
-- 
2.42.0


* [igt-dev] [PATCH i-g-t 04/14] benchmarks/gem_wsim: fix duration range check
  2023-09-26  8:44 [igt-dev] [PATCH i-g-t 00/14] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (2 preceding siblings ...)
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 03/14] benchmarks/gem_wsim: fix scaling of period steps Marcin Bernatowicz
@ 2023-09-26  8:44 ` Marcin Bernatowicz
  2023-09-26 10:40   ` Tvrtko Ursulin
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 05/14] benchmarks/gem_wsim: extract duration parsing code to new function Marcin Bernatowicz
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 39+ messages in thread
From: Marcin Bernatowicz @ 2023-09-26  8:44 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

When the scale duration (-f) command line option is given, the max
duration check does not take it into account. Fix it by scaling the
max before comparing it with the already scaled min.
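The effect of the fix on a concrete range, using hypothetical helpers `range_ok_old()` and `range_ok_fixed()` that reproduce only the relevant comparison. With -f 10 the step duration 500-1000 gets a scaled min of 5000; the old check compared the raw max (1000) against it and rejected a valid range, while the fixed check compares 10000 against 5000.

```c
#include <assert.h>
#include <stdbool.h>

/* Round-to-nearest scaling for positive values (sketch of __duration()). */
static long scale_value(long val, double scale)
{
	return (long)(scale * val + 0.5);
}

/* Before the fix: raw max compared against the already scaled min. */
static bool range_ok_old(long min_raw, long max_raw, double scale_dur)
{
	long min = scale_value(min_raw, scale_dur);

	return !(max_raw <= 0 || max_raw <= min);
}

/* After the fix: both ends are scaled before comparing. */
static bool range_ok_fixed(long min_raw, long max_raw, double scale_dur)
{
	long min = scale_value(min_raw, scale_dur);

	return !(max_raw <= 0 || scale_value(max_raw, scale_dur) <= min);
}
```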

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 65061461d..4f0deb095 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -1148,7 +1148,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 				if (sep && *sep == '-') {
 					tmpl = strtol(sep + 1, NULL, 10);
 					check_arg(tmpl <= 0 ||
-						tmpl <= step.duration.min ||
+						__duration(tmpl, scale_dur) <= step.duration.min ||
 						tmpl == LONG_MIN ||
 						tmpl == LONG_MAX,
 						"Invalid duration range at step %u!\n",
-- 
2.42.0


* [igt-dev] [PATCH i-g-t 05/14] benchmarks/gem_wsim: extract duration parsing code to new function
  2023-09-26  8:44 [igt-dev] [PATCH i-g-t 00/14] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (3 preceding siblings ...)
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 04/14] benchmarks/gem_wsim: fix duration range check Marcin Bernatowicz
@ 2023-09-26  8:44 ` Marcin Bernatowicz
  2023-09-26 10:48   ` Tvrtko Ursulin
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 06/14] benchmarks/gem_wsim: fix conflicting SSEU #define and enum Marcin Bernatowicz
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 39+ messages in thread
From: Marcin Bernatowicz @ 2023-09-26  8:44 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Move the duration parsing code from parse_workload() to a new
parse_duration() function.
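A standalone approximation of the extracted function, for trying out the accepted syntax ("<us>", "<min>-<max>" and "*"). It differs from the patch in two assumed ways: the Gen8+ requirement for infinite batches is passed in as a hypothetical `has_gen8` parameter instead of being read through the global fd, and rounding uses a positive-only round-to-nearest helper in place of __duration().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

struct duration {
	unsigned int min, max;
	bool unbound_duration;
};

/* Round-to-nearest scaling for positive values (sketch of __duration()). */
static long scale_value(long val, double scale)
{
	return (long)(scale * val + 0.5);
}

/*
 * Parse "<us>", "<min>-<max>" or "*" into dur, scaling by scale_dur.
 * Returns 0 on success, -1 on error.
 */
static int sketch_parse_duration(struct duration *dur, double scale_dur,
				 const char *desc, bool has_gen8)
{
	char *sep = NULL;
	long tmpl;

	if (desc[0] == '*') {
		if (!has_gen8)
			return -1; /* infinite batches need Gen8+ */
		dur->unbound_duration = true;
		return 0;
	}

	tmpl = strtol(desc, &sep, 10);
	if (tmpl <= 0 || tmpl == LONG_MIN || tmpl == LONG_MAX)
		return -1;
	dur->min = scale_value(tmpl, scale_dur);

	if (sep && *sep == '-') {
		tmpl = strtol(sep + 1, NULL, 10);
		/* the max is scaled before the comparison, as in patch 04 */
		if (tmpl <= 0 || tmpl == LONG_MIN || tmpl == LONG_MAX ||
		    scale_value(tmpl, scale_dur) <= (long)dur->min)
			return -1;
		dur->max = scale_value(tmpl, scale_dur);
	} else {
		dur->max = dur->min;
	}

	return 0;
}
```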

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 67 ++++++++++++++++++++++++-------------------
 1 file changed, 37 insertions(+), 30 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 4f0deb095..aeb959364 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -860,6 +860,40 @@ static long __duration(long dur, double scale)
 	return round(scale * dur);
 }
 
+static int
+parse_duration(unsigned int nr_steps, struct duration *dur, double scale_dur, char *_desc)
+{
+	char *sep = NULL;
+	long tmpl;
+
+	if (_desc[0] == '*') {
+		if (intel_gen(intel_get_drm_devid(fd)) < 8) {
+			wsim_err("Infinite batch at step %u needs Gen8+!\n", nr_steps);
+			return -1;
+		}
+		dur->unbound_duration = true;
+	} else {
+		tmpl = strtol(_desc, &sep, 10);
+		if (tmpl <= 0 || tmpl == LONG_MIN || tmpl == LONG_MAX)
+			return -1;
+
+		dur->min = __duration(tmpl, scale_dur);
+
+		if (sep && *sep == '-') {
+			tmpl = strtol(sep + 1, NULL, 10);
+			if (tmpl <= 0 || __duration(tmpl, scale_dur) <= dur->min ||
+			    tmpl == LONG_MIN || tmpl == LONG_MAX)
+				return -1;
+
+			dur->max = __duration(tmpl, scale_dur);
+		} else {
+			dur->max = dur->min;
+		}
+	}
+
+	return 0;
+}
+
 #define int_field(_STEP_, _FIELD_, _COND_, _ERR_) \
 	if ((field = strtok_r(fstart, ".", &fctx))) { \
 		tmp = atoi(field); \
@@ -1127,38 +1161,11 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 		}
 
 		if ((field = strtok_r(fstart, ".", &fctx))) {
-			char *sep = NULL;
-			long int tmpl;
-
 			fstart = NULL;
 
-			if (field[0] == '*') {
-				check_arg(intel_gen(intel_get_drm_devid(fd)) < 8,
-					  "Infinite batch at step %u needs Gen8+!\n",
-					  nr_steps);
-				step.duration.unbound_duration = true;
-			} else {
-				tmpl = strtol(field, &sep, 10);
-				check_arg(tmpl <= 0 || tmpl == LONG_MIN ||
-					  tmpl == LONG_MAX,
-					  "Invalid duration at step %u!\n",
-					  nr_steps);
-				step.duration.min = __duration(tmpl, scale_dur);
-
-				if (sep && *sep == '-') {
-					tmpl = strtol(sep + 1, NULL, 10);
-					check_arg(tmpl <= 0 ||
-						__duration(tmpl, scale_dur) <= step.duration.min ||
-						tmpl == LONG_MIN ||
-						tmpl == LONG_MAX,
-						"Invalid duration range at step %u!\n",
-						nr_steps);
-					step.duration.max = __duration(tmpl,
-								       scale_dur);
-				} else {
-					step.duration.max = step.duration.min;
-				}
-			}
+			tmp = parse_duration(nr_steps, &step.duration, scale_dur, field);
+			check_arg(tmp < 0,
+				  "Invalid duration at step %u!\n", nr_steps);
 
 			valid++;
 		}
-- 
2.42.0


* [igt-dev] [PATCH i-g-t 06/14] benchmarks/gem_wsim: fix conflicting SSEU #define and enum
  2023-09-26  8:44 [igt-dev] [PATCH i-g-t 00/14] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (4 preceding siblings ...)
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 05/14] benchmarks/gem_wsim: extract duration parsing code to new function Marcin Bernatowicz
@ 2023-09-26  8:44 ` Marcin Bernatowicz
  2023-09-26 10:51   ` Tvrtko Ursulin
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 07/14] benchmarks/gem_wsim: cleanups Marcin Bernatowicz
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 39+ messages in thread
From: Marcin Bernatowicz @ 2023-09-26  8:44 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

SSEU is defined both as a value of enum w_type and as
#define SSEU (1<<3). Rename the flag to FLAG_SSEU to resolve the conflict.
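Why the name clash matters: the preprocessor replaces every SSEU token that appears after the #define, including intended uses of the enumerator, without any diagnostic. The sketch below uses a hypothetical three-value enum (not gem_wsim's real enum w_type) to show the silent substitution, then the FLAG_ prefixed flag that avoids it.

```c
#include <assert.h>

/* Hypothetical three-value step enum; enumerator SSEU == 2. */
enum step_type { STEP_A, STEP_B, SSEU };

static int sseu_before_define(void)
{
	return SSEU; /* refers to the enumerator: 2 */
}

#define SSEU (1 << 3) /* the clashing flag, as before the fix */

static int sseu_after_define(void)
{
	return SSEU; /* silently expands to (1 << 3): 8 */
}

#undef SSEU /* from here on, SSEU is the enumerator again */

/* The fix: command-line flag bits get their own FLAG_ prefix. */
#define FLAG_SSEU (1 << 3)

static unsigned int toggle_sseu(unsigned int flags)
{
	return flags ^ FLAG_SSEU; /* what "-s" does in main() */
}
```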

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index aeb959364..3b01340bf 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -238,7 +238,7 @@ static struct drm_i915_gem_context_param_sseu device_sseu = {
 
 #define SYNCEDCLIENTS	(1<<1)
 #define DEPSYNC		(1<<2)
-#define SSEU		(1<<3)
+#define FLAG_SSEU	(1<<3)
 
 static const char *ring_str_map[NUM_ENGINES] = {
 	[DEFAULT] = "DEFAULT",
@@ -2597,7 +2597,7 @@ int main(int argc, char **argv)
 			/* Fall through */
 		case 'w':
 			w_args = add_workload_arg(w_args, ++nr_w_args, optarg,
-						  prio, flags & SSEU);
+						  prio, flags & FLAG_SSEU);
 			break;
 		case 'p':
 			prio = atoi(optarg);
@@ -2626,7 +2626,7 @@ int main(int argc, char **argv)
 			flags |= SYNCEDCLIENTS;
 			break;
 		case 's':
-			flags ^= SSEU;
+			flags ^= FLAG_SSEU;
 			break;
 		case 'd':
 			flags |= DEPSYNC;
-- 
2.42.0


* [igt-dev] [PATCH i-g-t 07/14] benchmarks/gem_wsim: cleanups
  2023-09-26  8:44 [igt-dev] [PATCH i-g-t 00/14] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (5 preceding siblings ...)
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 06/14] benchmarks/gem_wsim: fix conflicting SSEU #define and enum Marcin Bernatowicz
@ 2023-09-26  8:44 ` Marcin Bernatowicz
  2023-09-26 11:08   ` Tvrtko Ursulin
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 08/14] benchmarks/gem_wsim: reposition repeat_start variable Marcin Bernatowicz
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 39+ messages in thread
From: Marcin Bernatowicz @ 2023-09-26  8:44 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson


Fix warnings/errors reported by checkpatch.pl.
Remove the unused fence_signal field from struct w_step.
Use calloc instead of malloc for struct workload in parse_workload().
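The int_field() change follows the usual kernel macro idiom that checkpatch.pl asks for: wrapping the body in do { } while (0) turns the macro into a single statement, so `MACRO(...);` behaves like a function call even as the unbraced body of an if/else. A `goto` inside the wrapper still jumps out as before. A minimal sketch with a hypothetical SET_IF_POSITIVE() macro:

```c
#include <assert.h>

/* do { } while (0) makes the whole macro body exactly one statement. */
#define SET_IF_POSITIVE(dst, v)       \
	do {                          \
		if ((v) > 0)          \
			(dst) = (v);  \
	} while (0)

static int pick(int use_macro, int v)
{
	int dst = -1;

	/*
	 * With a bare "if" as the macro body, this else would bind to the
	 * macro's inner if; the wrapper keeps it bound to "if (use_macro)".
	 */
	if (use_macro)
		SET_IF_POSITIVE(dst, v);
	else
		dst = 0;

	return dst;
}
```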

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 56 ++++++++++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 22 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 3b01340bf..daa20fb8a 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -1,3 +1,4 @@
+// SPDX-License-Identifier: MIT
 /*
  * Copyright © 2017 Intel Corporation
  *
@@ -76,8 +77,7 @@ struct duration {
 	bool unbound_duration;
 };
 
-enum w_type
-{
+enum w_type {
 	BATCH,
 	SYNC,
 	DELAY,
@@ -102,8 +102,7 @@ struct dep_entry {
 	int working_set; /* -1 = step dependecy, >= 0 working set id */
 };
 
-struct deps
-{
+struct deps {
 	int nr;
 	bool submit_fence;
 	struct dep_entry *list;
@@ -137,8 +136,7 @@ struct working_set {
 
 struct workload;
 
-struct w_step
-{
+struct w_step {
 	struct workload *wrk;
 
 	/* Workload step metadata */
@@ -155,7 +153,6 @@ struct w_step
 		int period;
 		int target;
 		int throttle;
-		int fence_signal;
 		int priority;
 		struct {
 			unsigned int engine_map_count;
@@ -194,8 +191,7 @@ struct ctx {
 	uint64_t sseu;
 };
 
-struct workload
-{
+struct workload {
 	unsigned int id;
 
 	unsigned int nr_steps;
@@ -807,6 +803,7 @@ static int add_buffers(struct working_set *set, char *str)
 
 	for (i = 0; i < add; i++) {
 		struct work_buffer_size *sz = &sizes[set->nr + i];
+
 		sz->min = min_sz;
 		sz->max = max_sz;
 		sz->size = 0;
@@ -895,13 +892,16 @@ parse_duration(unsigned int nr_steps, struct duration *dur, double scale_dur, ch
 }
 
 #define int_field(_STEP_, _FIELD_, _COND_, _ERR_) \
-	if ((field = strtok_r(fstart, ".", &fctx))) { \
-		tmp = atoi(field); \
-		check_arg(_COND_, _ERR_, nr_steps); \
-		step.type = _STEP_; \
-		step._FIELD_ = tmp; \
-		goto add_step; \
-	} \
+	do { \
+		field = strtok_r(fstart, ".", &fctx); \
+		if (field) { \
+			tmp = atoi(field); \
+			check_arg(_COND_, _ERR_, nr_steps); \
+			step.type = _STEP_; \
+			step._FIELD_ = tmp; \
+			goto add_step; \
+		} \
+	} while (0)
 
 static struct workload *
 parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
@@ -926,7 +926,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 		valid = 0;
 		memset(&step, 0, sizeof(step));
 
-		if ((field = strtok_r(fstart, ".", &fctx))) {
+		field = strtok_r(fstart, ".", &fctx);
+		if (field) {
 			fstart = NULL;
 
 			if (!strcmp(field, "d")) {
@@ -943,6 +944,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 				}
 			} else if (!strcmp(field, "P")) {
 				unsigned int nr = 0;
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(nr == 0 && tmp <= 0,
@@ -968,6 +970,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 					  "Invalid sync target at step %u!\n");
 			} else if (!strcmp(field, "S")) {
 				unsigned int nr = 0;
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(tmp <= 0 && nr == 0,
@@ -1004,6 +1007,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 				goto add_step;
 			} else if (!strcmp(field, "M")) {
 				unsigned int nr = 0;
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(nr == 0 && tmp <= 0,
@@ -1034,6 +1038,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 					  "Invalid terminate target at step %u!\n");
 			} else if (!strcmp(field, "X")) {
 				unsigned int nr = 0;
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(nr == 0 && tmp <= 0,
@@ -1058,6 +1063,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 				goto add_step;
 			} else if (!strcmp(field, "B")) {
 				unsigned int nr = 0;
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(nr == 0 && tmp <= 0,
@@ -1077,6 +1083,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 				goto add_step;
 			} else if (!strcmp(field, "b")) {
 				unsigned int nr = 0;
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					check_arg(nr > 2,
 						  "Invalid bond format at step %u!\n",
@@ -1148,7 +1155,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			valid++;
 		}
 
-		if ((field = strtok_r(fstart, ".", &fctx))) {
+		field = strtok_r(fstart, ".", &fctx);
+		if (field) {
 			fstart = NULL;
 
 			i = str_to_engine(field);
@@ -1160,7 +1168,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			step.engine = i;
 		}
 
-		if ((field = strtok_r(fstart, ".", &fctx))) {
+		field = strtok_r(fstart, ".", &fctx);
+		if (field) {
 			fstart = NULL;
 
 			tmp = parse_duration(nr_steps, &step.duration, scale_dur, field);
@@ -1170,7 +1179,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			valid++;
 		}
 
-		if ((field = strtok_r(fstart, ".", &fctx))) {
+		field = strtok_r(fstart, ".", &fctx);
+		if (field) {
 			fstart = NULL;
 
 			tmp = parse_dependencies(nr_steps, &step, field);
@@ -1180,7 +1190,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			valid++;
 		}
 
-		if ((field = strtok_r(fstart, ".", &fctx))) {
+		field = strtok_r(fstart, ".", &fctx);
+		if (field) {
 			fstart = NULL;
 
 			check_arg(strlen(field) != 1 ||
@@ -1224,7 +1235,7 @@ add_step:
 		nr_steps += app_w->nr_steps;
 	}
 
-	wrk = malloc(sizeof(*wrk));
+	wrk = calloc(1, sizeof(*wrk));
 	igt_assert(wrk);
 
 	wrk->nr_steps = nr_steps;
@@ -2717,6 +2728,7 @@ int main(int argc, char **argv)
 
 	if (append_workload_arg) {
 		struct w_arg arg = { NULL, append_workload_arg, 0 };
+
 		app_w = parse_workload(&arg, flags, scale_dur, scale_time,
 				       NULL);
 		if (!app_w) {
-- 
2.42.0


* [igt-dev] [PATCH i-g-t 08/14] benchmarks/gem_wsim: reposition repeat_start variable
  2023-09-26  8:44 [igt-dev] [PATCH i-g-t 00/14] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (6 preceding siblings ...)
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 07/14] benchmarks/gem_wsim: cleanups Marcin Bernatowicz
@ 2023-09-26  8:44 ` Marcin Bernatowicz
  2023-09-26 11:10   ` Tvrtko Ursulin
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 09/14] benchmarks/gem_wsim: use lib code to query engines Marcin Bernatowicz
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 39+ messages in thread
From: Marcin Bernatowicz @ 2023-09-26  8:44 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

There is no need to keep repeat_start in struct workload. Make it a
local variable of run_workload().
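With repeat_start local to run_workload(), each repetition samples its own start time and period steps sleep for whatever remains of the period. A small sketch of that arithmetic, using hypothetical helpers (`sketch_elapsed_us()`, `period_sleep_us()`) and fixed timestamps instead of clock_gettime(CLOCK_MONOTONIC, ...) so the result is deterministic; clamping a late step to zero is a simplification made here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Microseconds between two CLOCK_MONOTONIC-style samples. */
static long sketch_elapsed_us(const struct timespec *start,
			      const struct timespec *end)
{
	return (end->tv_sec - start->tv_sec) * 1000000L +
	       (end->tv_nsec - start->tv_nsec) / 1000L;
}

/*
 * Time a period step still has to sleep: the period minus the time
 * spent since this repetition started, never negative.
 */
static long period_sleep_us(const struct timespec *repeat_start,
			    const struct timespec *now, long period_us)
{
	long remaining = period_us - sketch_elapsed_us(repeat_start, now);

	return remaining > 0 ? remaining : 0;
}
```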

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index daa20fb8a..2e6eb6388 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -209,8 +209,6 @@ struct workload {
 	uint32_t bb_prng;
 	uint32_t bo_prng;
 
-	struct timespec repeat_start;
-
 	unsigned int nr_ctxs;
 	struct ctx *ctx_list;
 
@@ -2283,7 +2281,7 @@ static void sync_deps(struct workload *wrk, struct w_step *w)
 static void *run_workload(void *data)
 {
 	struct workload *wrk = (struct workload *)data;
-	struct timespec t_start, t_end;
+	struct timespec t_start, t_end, repeat_start;
 	struct w_step *w;
 	int throttle = -1;
 	int qd_throttle = -1;
@@ -2297,7 +2295,7 @@ static void *run_workload(void *data)
 	     count++) {
 		unsigned int cur_seqno = wrk->sync_seqno;
 
-		clock_gettime(CLOCK_MONOTONIC, &wrk->repeat_start);
+		clock_gettime(CLOCK_MONOTONIC, &repeat_start);
 
 		for (i = 0, w = wrk->steps; wrk->run && (i < wrk->nr_steps);
 		     i++, w++) {
@@ -2311,7 +2309,7 @@ static void *run_workload(void *data)
 				int elapsed;
 
 				clock_gettime(CLOCK_MONOTONIC, &now);
-				elapsed = elapsed_us(&wrk->repeat_start, &now);
+				elapsed = elapsed_us(&repeat_start, &now);
 				do_sleep = w->period - elapsed;
 				time_tot += elapsed;
 				if (elapsed < time_min)
-- 
2.42.0


* [igt-dev] [PATCH i-g-t 09/14] benchmarks/gem_wsim: use lib code to query engines
  2023-09-26  8:44 [igt-dev] [PATCH i-g-t 00/14] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (7 preceding siblings ...)
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 08/14] benchmarks/gem_wsim: reposition repeat_start variable Marcin Bernatowicz
@ 2023-09-26  8:44 ` Marcin Bernatowicz
  2023-09-26 11:23   ` Tvrtko Ursulin
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 10/14] benchmarks/gem_wsim: allow comments in workload description files Marcin Bernatowicz
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 39+ messages in thread
From: Marcin Bernatowicz @ 2023-09-26  8:44 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Use the code in lib/i915/gem_engine_topology to query engines instead of
the local query implementation.
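The replacement query_engines() queries once and returns a cached copy on later calls. A self-contained sketch of that pattern, where `struct sketch_engine_data` and `sketch_query_physical()` are hypothetical stand-ins for struct intel_engine_data and intel_engine_list_of_physical(); the class values 0 and 2 are the i915 uAPI's I915_ENGINE_CLASS_RENDER and I915_ENGINE_CLASS_VIDEO.

```c
#include <assert.h>

#define SKETCH_CLASS_RENDER 0 /* I915_ENGINE_CLASS_RENDER */
#define SKETCH_CLASS_VIDEO  2 /* I915_ENGINE_CLASS_VIDEO */

/* Hypothetical stand-ins for the IGT engine list types. */
struct sketch_engine {
	unsigned int class, instance;
};

struct sketch_engine_data {
	unsigned int nengines;
	struct sketch_engine engines[8];
};

static int query_calls; /* counts how often the "query" actually ran */

/* Stand-in for the real engine query; returns a fixed topology. */
static struct sketch_engine_data sketch_query_physical(void)
{
	struct sketch_engine_data d = {
		.nengines = 3,
		.engines = { { SKETCH_CLASS_RENDER, 0 },
			     { SKETCH_CLASS_VIDEO, 0 },
			     { SKETCH_CLASS_VIDEO, 1 } },
	};

	query_calls++;
	return d;
}

/* Query once, then serve the cached copy (mirrors query_engines()). */
static const struct sketch_engine_data *sketch_query_engines(void)
{
	static struct sketch_engine_data engines;

	if (engines.nengines)
		return &engines;

	engines = sketch_query_physical();
	assert(engines.nengines);
	return &engines;
}

static unsigned int sketch_num_engines_in_class(unsigned int class)
{
	const struct sketch_engine_data *engines = sketch_query_engines();
	unsigned int count = 0;

	/* unsigned index to match the sketch's unsigned nengines */
	for (unsigned int i = 0; i < engines->nengines; i++)
		if (engines->engines[i].class == class)
			count++;

	return count;
}
```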

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 157 +++++-------------------------------------
 1 file changed, 19 insertions(+), 138 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 2e6eb6388..a3339e1b2 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -456,150 +456,31 @@ static int str_to_engine(const char *str)
 	return -1;
 }
 
-static bool __engines_queried;
-static unsigned int __num_engines;
-static struct i915_engine_class_instance *__engines;
-
-static int
-__i915_query(int i915, struct drm_i915_query *q)
+static struct intel_engine_data *query_engines(void)
 {
-	if (igt_ioctl(i915, DRM_IOCTL_I915_QUERY, q))
-		return -errno;
-	return 0;
-}
+	static struct intel_engine_data engines = {};
 
-static int
-__i915_query_items(int i915, struct drm_i915_query_item *items, uint32_t n_items)
-{
-	struct drm_i915_query q = {
-		.num_items = n_items,
-		.items_ptr = to_user_pointer(items),
-	};
-	return __i915_query(i915, &q);
-}
+	if (engines.nengines)
+		return &engines;
 
-static void
-i915_query_items(int i915, struct drm_i915_query_item *items, uint32_t n_items)
-{
-	igt_assert_eq(__i915_query_items(i915, items, n_items), 0);
-}
-
-static bool has_engine_query(int i915)
-{
-	struct drm_i915_query_item item = {
-		.query_id = DRM_I915_QUERY_ENGINE_INFO,
-	};
-
-	return __i915_query_items(i915, &item, 1) == 0 && item.length > 0;
-}
-
-static void query_engines(void)
-{
-	struct i915_engine_class_instance *engines;
-	unsigned int num;
-
-	if (__engines_queried)
-		return;
-
-	__engines_queried = true;
-
-	if (!has_engine_query(fd)) {
-		unsigned int num_bsd = gem_has_bsd(fd) + gem_has_bsd2(fd);
-		unsigned int i = 0;
-
-		igt_assert(num_bsd);
-
-		num = 1 + num_bsd;
-
-		if (gem_has_blt(fd))
-			num++;
-
-		if (gem_has_vebox(fd))
-			num++;
-
-		engines = calloc(num,
-				 sizeof(struct i915_engine_class_instance));
-		igt_assert(engines);
-
-		engines[i].engine_class = I915_ENGINE_CLASS_RENDER;
-		engines[i].engine_instance = 0;
-		i++;
-
-		if (gem_has_blt(fd)) {
-			engines[i].engine_class = I915_ENGINE_CLASS_COPY;
-			engines[i].engine_instance = 0;
-			i++;
-		}
-
-		if (gem_has_bsd(fd)) {
-			engines[i].engine_class = I915_ENGINE_CLASS_VIDEO;
-			engines[i].engine_instance = 0;
-			i++;
-		}
-
-		if (gem_has_bsd2(fd)) {
-			engines[i].engine_class = I915_ENGINE_CLASS_VIDEO;
-			engines[i].engine_instance = 1;
-			i++;
-		}
-
-		if (gem_has_vebox(fd)) {
-			engines[i].engine_class =
-				I915_ENGINE_CLASS_VIDEO_ENHANCE;
-			engines[i].engine_instance = 0;
-			i++;
-		}
-	} else {
-		struct drm_i915_query_engine_info *engine_info;
-		struct drm_i915_query_item item = {
-			.query_id = DRM_I915_QUERY_ENGINE_INFO,
-		};
-		const unsigned int sz = 4096;
-		unsigned int i;
-
-		engine_info = malloc(sz);
-		igt_assert(engine_info);
-		memset(engine_info, 0, sz);
-
-		item.data_ptr = to_user_pointer(engine_info);
-		item.length = sz;
-
-		i915_query_items(fd, &item, 1);
-		igt_assert(item.length > 0);
-		igt_assert(item.length <= sz);
-
-		num = engine_info->num_engines;
-
-		engines = calloc(num,
-				 sizeof(struct i915_engine_class_instance));
-		igt_assert(engines);
-
-		for (i = 0; i < num; i++) {
-			struct drm_i915_engine_info *engine =
-				(struct drm_i915_engine_info *)&engine_info->engines[i];
-
-			engines[i] = engine->engine;
-		}
-	}
-
-	__engines = engines;
-	__num_engines = num;
+	engines = intel_engine_list_of_physical(fd);
+	igt_assert(engines.nengines);
+	return &engines;
 }
 
 static unsigned int num_engines_in_class(enum intel_engine_id class)
 {
-	unsigned int i, count = 0;
+	const struct intel_engine_data *engines = query_engines();
+	unsigned int count = 0;
+	int i;
 
 	igt_assert(class == VCS);
 
-	query_engines();
-
-	for (i = 0; i < __num_engines; i++) {
-		if (__engines[i].engine_class == I915_ENGINE_CLASS_VIDEO)
+	for (i = 0; i < engines->nengines; i++) {
+		if (engines->engines[i].class == I915_ENGINE_CLASS_VIDEO)
 			count++;
 	}
 
-	igt_assert(count);
 	return count;
 }
 
@@ -607,16 +488,15 @@ static void
 fill_engines_id_class(enum intel_engine_id *list,
 		      enum intel_engine_id class)
 {
+	const struct intel_engine_data *engines = query_engines();
 	enum intel_engine_id engine = VCS1;
 	unsigned int i, j = 0;
 
 	igt_assert(class == VCS);
 	igt_assert(num_engines_in_class(VCS) <= 2);
 
-	query_engines();
-
-	for (i = 0; i < __num_engines; i++) {
-		if (__engines[i].engine_class != I915_ENGINE_CLASS_VIDEO)
+	for (i = 0; i < engines->nengines; i++) {
+		if (engines->engines[i].class != I915_ENGINE_CLASS_VIDEO)
 			continue;
 
 		list[j++] = engine++;
@@ -626,17 +506,18 @@ fill_engines_id_class(enum intel_engine_id *list,
 static unsigned int
 find_physical_instance(enum intel_engine_id class, unsigned int logical)
 {
+	const struct intel_engine_data *engines = query_engines();
 	unsigned int i, j = 0;
 
 	igt_assert(class == VCS);
 
-	for (i = 0; i < __num_engines; i++) {
-		if (__engines[i].engine_class != I915_ENGINE_CLASS_VIDEO)
+	for (i = 0; i < engines->nengines; i++) {
+		if (engines->engines[i].class != I915_ENGINE_CLASS_VIDEO)
 			continue;
 
 		/* Map logical to physical instances. */
 		if (logical == j++)
-			return __engines[i].engine_instance;
+			return engines->engines[i].instance;
 	}
 
 	igt_assert(0);
-- 
2.42.0
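
For reference, the class counting and logical-to-physical mapping that num_engines_in_class() and find_physical_instance() perform over the queried engine list can be modelled without a device. This is only a sketch: struct engine_ci and the SK_* class values are hypothetical stand-ins for the class/instance fields of the intel_engine_data entries, and the sketch returns -1 where the real code asserts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical engine classes, local to this sketch. */
enum sk_class { SK_RENDER, SK_COPY, SK_VIDEO, SK_VIDEO_ENHANCE };

/* Hypothetical stand-in for one entry of intel_engine_data.engines[]. */
struct engine_ci {
	enum sk_class class;
	unsigned int instance;
};

/* Count video engines, as num_engines_in_class(VCS) does. */
static unsigned int count_video(const struct engine_ci *e, size_t n)
{
	unsigned int count = 0;

	for (size_t i = 0; i < n; i++)
		if (e[i].class == SK_VIDEO)
			count++;

	return count;
}

/*
 * Return the hardware instance of the logical-th video engine, as
 * find_physical_instance() does; -1 instead of an assert if none.
 */
static int video_physical_instance(const struct engine_ci *e, size_t n,
				   unsigned int logical)
{
	unsigned int j = 0;

	for (size_t i = 0; i < n; i++) {
		if (e[i].class != SK_VIDEO)
			continue;

		/* Video engines are numbered logically in list order. */
		if (logical == j++)
			return (int)e[i].instance;
	}

	return -1;
}
```

For example, with a list of render, video instance 0, video instance 2 and video-enhance engines (as could happen when one video instance is fused off), count_video() returns 2 and logical engine 1 maps to physical instance 2.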

* [igt-dev] [PATCH i-g-t 10/14] benchmarks/gem_wsim: allow comments in workload description files
  2023-09-26  8:44 [igt-dev] [PATCH i-g-t 00/14] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (8 preceding siblings ...)
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 09/14] benchmarks/gem_wsim: use lib code to query engines Marcin Bernatowicz
@ 2023-09-26  8:44 ` Marcin Bernatowicz
  2023-09-26 11:33   ` Tvrtko Ursulin
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 11/14] benchmarks/gem_wsim: introduce w_step_sync function Marcin Bernatowicz
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 39+ messages in thread
From: Marcin Bernatowicz @ 2023-09-26  8:44 UTC
  To: igt-dev; +Cc: chris.p.wilson

Lines starting with '#' are skipped. Any step separator (',') that
appears after a '#' on the same line is replaced with ';' so that it
does not break parsing.

v2: dropped the SKIP step type as it is not needed (Tvrtko)

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c  | 19 ++++++++++++++++++-
 benchmarks/wsim/README |  2 ++
 2 files changed, 20 insertions(+), 1 deletion(-)
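
A standalone sketch of the comment handling this patch adds, assuming the descriptor has been read into a writable buffer as in load_workload_descriptor(). flatten_descriptor() and count_steps() are illustrative names, not gem_wsim functions; the sketch bounds every read by len and skips tokens starting with '#' the way parse_workload() now does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Inside a '#' comment every ',' becomes ';'; every newline becomes the
 * ',' step separator.
 */
static void flatten_descriptor(char *buf, size_t len)
{
	size_t i = 0;

	while (i < len) {
		if (buf[i] == '#') {
			/* Comment runs to end of line: protect the separator. */
			while (i < len && buf[i] != '\n') {
				if (buf[i] == ',')
					buf[i] = ';';
				i++;
			}
			continue; /* buf[i] is now '\n', or i == len */
		}
		if (buf[i] == '\n')
			buf[i] = ',';
		i++;
	}
}

/* Count tokens that create work steps; '#' tokens are comments. */
static unsigned int count_steps(char *desc)
{
	char *save = NULL;
	unsigned int n = 0;

	for (char *tok = strtok_r(desc, ",", &save); tok;
	     tok = strtok_r(NULL, ",", &save)) {
		if (tok[0] != '#')
			n++;
	}

	return n;
}
```

For the input "# two steps, one comment\n1.VCS.1000.0.0\n1.RCS.500.-1.1 # trailing, note\n" the flattened string is "# two steps; one comment,1.VCS.1000.0.0,1.RCS.500.-1.1 # trailing; note," and two work steps remain. The trailing comment stays attached to the last field, which is why the wait boolean check now accepts whitespace or '#' after the digit.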

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index a3339e1b2..0222c6c71 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -43,6 +43,7 @@
 #include <limits.h>
 #include <pthread.h>
 #include <math.h>
+#include <ctype.h>
 
 #include "drm.h"
 #include "drmtest.h"
@@ -809,6 +810,14 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 		if (field) {
 			fstart = NULL;
 
+			/* line starting with # is a comment */
+			if (field[0] == '#') {
+				if (verbose > 3)
+					printf("skipped line: %s\n", _token);
+				free(token);
+				continue;
+			}
+
 			if (!strcmp(field, "d")) {
 				int_field(DELAY, delay, tmp <= 0,
 					  "Invalid delay at step %u!\n");
@@ -1073,7 +1082,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 		if (field) {
 			fstart = NULL;
 
-			check_arg(strlen(field) != 1 ||
+			check_arg(!strlen(field) ||
+				  (strlen(field) > 1 && !isspace(field[1]) && field[1] != '#') ||
 				  (field[0] != '0' && field[0] != '1'),
 				  "Invalid wait boolean at step %u!\n",
 				  nr_steps);
@@ -2422,6 +2432,13 @@ static char *load_workload_descriptor(char *filename)
 	close(infd);
 
 	for (i = 0; i < len; i++) {
+		/* '#' starts comment till end of line */
+		if (buf[i] == '#')
+			/* replace ',' in comments to not break parsing */
+			while (++i < len && buf[i] != '\n')
+				if (buf[i] == ',')
+					buf[i] = ';';
+
 		if (buf[i] == '\n')
 			buf[i] = ',';
 	}
diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
index 8c71f2fe6..e4fd61645 100644
--- a/benchmarks/wsim/README
+++ b/benchmarks/wsim/README
@@ -1,6 +1,8 @@
 Workload descriptor format
 ==========================
 
+Lines starting with '#' are treated as comments (they do not create a work step).
+
 ctx.engine.duration_us.dependency.wait,...
 <uint>.<str>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
 B.<uint>
-- 
2.42.0

* [igt-dev] [PATCH i-g-t 11/14] benchmarks/gem_wsim: introduce w_step_sync function
  2023-09-26  8:44 [igt-dev] [PATCH i-g-t 00/14] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (9 preceding siblings ...)
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 10/14] benchmarks/gem_wsim: allow comments in workload description files Marcin Bernatowicz
@ 2023-09-26  8:44 ` Marcin Bernatowicz
  2023-09-26 11:37   ` Tvrtko Ursulin
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 12/14] benchmarks/gem_wsim: extract prepare contexts code to new function Marcin Bernatowicz
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 39+ messages in thread
From: Marcin Bernatowicz @ 2023-09-26  8:44 UTC
  To: igt-dev; +Cc: chris.p.wilson

Added a w_step_sync function for workload step synchronization.
This will allow a cleaner xe integration.

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 0222c6c71..2c6ccd3a9 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -245,6 +245,11 @@ static const char *ring_str_map[NUM_ENGINES] = {
 	[VECS] = "VECS",
 };
 
+static void w_step_sync(struct w_step *w)
+{
+	gem_sync(fd, w->obj[0].handle);
+}
+
 static int read_timestamp_frequency(int i915)
 {
 	int value = 0;
@@ -2106,7 +2111,7 @@ static void w_sync_to(struct workload *wrk, struct w_step *w, int target)
 	igt_assert(target < wrk->nr_steps);
 	igt_assert(wrk->steps[target].type == BATCH);
 
-	gem_sync(fd, wrk->steps[target].obj[0].handle);
+	w_step_sync(&wrk->steps[target]);
 }
 
 static void
@@ -2165,7 +2170,7 @@ static void sync_deps(struct workload *wrk, struct w_step *w)
 		igt_assert(dep_idx >= 0 && dep_idx < w->idx);
 		igt_assert(wrk->steps[dep_idx].type == BATCH);
 
-		gem_sync(fd, wrk->steps[dep_idx].obj[0].handle);
+		w_step_sync(&wrk->steps[dep_idx]);
 	}
 }
 
@@ -2219,7 +2224,7 @@ static void *run_workload(void *data)
 
 				igt_assert(s_idx >= 0 && s_idx < i);
 				igt_assert(wrk->steps[s_idx].type == BATCH);
-				gem_sync(fd, wrk->steps[s_idx].obj[0].handle);
+				w_step_sync(&wrk->steps[s_idx]);
 				continue;
 			} else if (w->type == THROTTLE) {
 				throttle = w->throttle;
@@ -2310,7 +2315,7 @@ static void *run_workload(void *data)
 				break;
 
 			if (w->sync)
-				gem_sync(fd, w->obj[0].handle);
+				w_step_sync(w);
 
 			if (qd_throttle > 0) {
 				while (wrk->nrequest[engine] > qd_throttle) {
@@ -2319,7 +2324,7 @@ static void *run_workload(void *data)
 					s = igt_list_first_entry(&wrk->requests[engine],
 								 s, rq_link);
 
-					gem_sync(fd, s->obj[0].handle);
+					w_step_sync(s);
 
 					s->request = -1;
 					igt_list_del(&s->rq_link);
@@ -2351,7 +2356,7 @@ static void *run_workload(void *data)
 			continue;
 
 		w = igt_list_last_entry(&wrk->requests[i], w, rq_link);
-		gem_sync(fd, w->obj[0].handle);
+		w_step_sync(w);
 	}
 
 	clock_gettime(CLOCK_MONOTONIC, &t_end);
-- 
2.42.0

* [igt-dev] [PATCH i-g-t 12/14] benchmarks/gem_wsim: extract prepare contexts code to new function
  2023-09-26  8:44 [igt-dev] [PATCH i-g-t 00/14] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (10 preceding siblings ...)
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 11/14] benchmarks/gem_wsim: introduce w_step_sync function Marcin Bernatowicz
@ 2023-09-26  8:44 ` Marcin Bernatowicz
  2023-09-26 11:43   ` Tvrtko Ursulin
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 13/14] benchmarks/gem_wsim: extract prepare working sets " Marcin Bernatowicz
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 39+ messages in thread
From: Marcin Bernatowicz @ 2023-09-26  8:44 UTC
  To: igt-dev; +Cc: chris.p.wilson

No functional changes.
Extracted the prepare_contexts function from prepare_workload.
Small code cleanup: no need for 'else' after continue/break (Kamil).

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 2c6ccd3a9..55f8d9b1b 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -1766,20 +1766,13 @@ static void measure_active_set(struct workload *wrk)
 
 #define alloca0(sz) ({ size_t sz__ = (sz); memset(alloca(sz__), 0, sz__); })
 
-static int prepare_workload(unsigned int id, struct workload *wrk)
+static int prepare_contexts(unsigned int id, struct workload *wrk)
 {
-	struct working_set **sets;
-	unsigned long total = 0;
 	uint32_t share_vm = 0;
 	int max_ctx = -1;
 	struct w_step *w;
 	int i, j;
 
-	wrk->id = id;
-	wrk->bb_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
-	wrk->bo_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
-	wrk->run = true;
-
 	/*
 	 * Pre-scan workload steps to allocate context list storage.
 	 */
@@ -1968,6 +1961,23 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
 	if (share_vm)
 		vm_destroy(fd, share_vm);
 
+	return 0;
+}
+
+static int prepare_workload(unsigned int id, struct workload *wrk)
+{
+	struct working_set **sets;
+	unsigned long total = 0;
+	struct w_step *w;
+	int i, j;
+
+	wrk->id = id;
+	wrk->bb_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
+	wrk->bo_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
+	wrk->run = true;
+
+	prepare_contexts(id, wrk);
+
 	/* Record default preemption. */
 	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
 		if (w->type == BATCH)
@@ -1990,9 +2000,9 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
 
 			if (w2->context != w->context)
 				continue;
-			else if (w2->type == PREEMPTION)
+			if (w2->type == PREEMPTION)
 				break;
-			else if (w2->type != BATCH)
+			if (w2->type != BATCH)
 				continue;
 
 			w2->preempt_us = w->period;
-- 
2.42.0
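
The loop touched by the 'else' cleanup propagates a context's preemption period to the BATCH steps that follow it. Below is a minimal model of that control flow, assuming (the full loop is not shown in the hunk) that the inner loop starts at the step after each PREEMPTION step. struct pr_step, its enum and the dflt parameter are hypothetical simplifications of struct w_step and of the default recorded by the "Record default preemption" loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced step model: only the fields the loop touches. */
enum pr_type { PR_BATCH, PR_PREEMPTION, PR_OTHER };

struct pr_step {
	enum pr_type type;
	int context;
	unsigned int period;     /* set on PREEMPTION steps */
	unsigned int preempt_us; /* result, meaningful for BATCH steps */
};

static void record_preemption(struct pr_step *s, size_t n, unsigned int dflt)
{
	/* Every batch starts with the default preemption period. */
	for (size_t i = 0; i < n; i++)
		if (s[i].type == PR_BATCH)
			s[i].preempt_us = dflt;

	for (size_t i = 0; i < n; i++) {
		if (s[i].type != PR_PREEMPTION)
			continue;

		/* Apply to later batches of the same context... */
		for (size_t j = i + 1; j < n; j++) {
			struct pr_step *w2 = &s[j];

			if (w2->context != s[i].context)
				continue;
			/* ...until that context's next PREEMPTION step. */
			if (w2->type == PR_PREEMPTION)
				break;
			if (w2->type != PR_BATCH)
				continue;

			w2->preempt_us = s[i].period;
		}
	}
}
```

With the 'else' keywords removed, each branch either leaves the iteration or falls through, so the three checks read as independent filters.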

* [igt-dev] [PATCH i-g-t 13/14] benchmarks/gem_wsim: extract prepare working sets code to new function
  2023-09-26  8:44 [igt-dev] [PATCH i-g-t 00/14] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (11 preceding siblings ...)
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 12/14] benchmarks/gem_wsim: extract prepare contexts code to new function Marcin Bernatowicz
@ 2023-09-26  8:44 ` Marcin Bernatowicz
  2023-09-26 11:46   ` Tvrtko Ursulin
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 14/14] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 39+ messages in thread
From: Marcin Bernatowicz @ 2023-09-26  8:44 UTC
  To: igt-dev; +Cc: chris.p.wilson

No functional changes.
Extracted prepare_working_sets function from prepare_workload.

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 106 +++++++++++++++++++++++-------------------
 1 file changed, 58 insertions(+), 48 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 55f8d9b1b..7703ca822 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -1964,10 +1964,66 @@ static int prepare_contexts(unsigned int id, struct workload *wrk)
 	return 0;
 }
 
-static int prepare_workload(unsigned int id, struct workload *wrk)
+static int prepare_working_sets(unsigned int id, struct workload *wrk)
 {
 	struct working_set **sets;
 	unsigned long total = 0;
+	struct w_step *w;
+	int i;
+
+	/*
+	 * Allocate working sets.
+	 */
+	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+		if (w->type == WORKINGSET && !w->working_set.shared)
+			total += allocate_working_set(wrk, &w->working_set);
+	}
+
+	if (verbose > 2)
+		printf("%u: %lu bytes in working sets.\n", wrk->id, total);
+
+	/*
+	 * Map of working set ids.
+	 */
+	wrk->max_working_set_id = -1;
+	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+		if (w->type == WORKINGSET &&
+		    w->working_set.id > wrk->max_working_set_id)
+			wrk->max_working_set_id = w->working_set.id;
+	}
+
+	sets = wrk->working_sets;
+	wrk->working_sets = calloc(wrk->max_working_set_id + 1,
+				   sizeof(*wrk->working_sets));
+	igt_assert(wrk->working_sets);
+
+	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+		struct working_set *set;
+
+		if (w->type != WORKINGSET)
+			continue;
+
+		if (!w->working_set.shared) {
+			set = &w->working_set;
+		} else {
+			igt_assert(sets);
+
+			set = sets[w->working_set.id];
+			igt_assert(set->shared);
+			igt_assert(set->sizes);
+		}
+
+		wrk->working_sets[w->working_set.id] = set;
+	}
+
+	if (sets)
+		free(sets);
+
+	return 0;
+}
+
+static int prepare_workload(unsigned int id, struct workload *wrk)
+{
 	struct w_step *w;
 	int i, j;
 
@@ -2019,53 +2075,7 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
 		}
 	}
 
-	/*
-	 * Allocate working sets.
-	 */
-	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
-		if (w->type == WORKINGSET && !w->working_set.shared)
-			total += allocate_working_set(wrk, &w->working_set);
-	}
-
-	if (verbose > 2)
-		printf("%u: %lu bytes in working sets.\n", wrk->id, total);
-
-	/*
-	 * Map of working set ids.
-	 */
-	wrk->max_working_set_id = -1;
-	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
-		if (w->type == WORKINGSET &&
-		    w->working_set.id > wrk->max_working_set_id)
-			wrk->max_working_set_id = w->working_set.id;
-	}
-
-	sets = wrk->working_sets;
-	wrk->working_sets = calloc(wrk->max_working_set_id + 1,
-				   sizeof(*wrk->working_sets));
-	igt_assert(wrk->working_sets);
-
-	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
-		struct working_set *set;
-
-		if (w->type != WORKINGSET)
-			continue;
-
-		if (!w->working_set.shared) {
-			set = &w->working_set;
-		} else {
-			igt_assert(sets);
-
-			set = sets[w->working_set.id];
-			igt_assert(set->shared);
-			igt_assert(set->sizes);
-		}
-
-		wrk->working_sets[w->working_set.id] = set;
-	}
-
-	if (sets)
-		free(sets);
+	prepare_working_sets(id, wrk);
 
 	/*
 	 * Allocate batch buffers.
-- 
2.42.0
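
The id map that prepare_working_sets() builds can be sketched as follows. struct ws_sketch and map_sets() are hypothetical; the input array holds only WORKINGSET steps, and prev plays the role of the sets pointer (the workload's previous working_sets array) through which shared sets are resolved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, reduced working set: just the id and the shared flag. */
struct ws_sketch {
	int id;
	bool shared;
};

/*
 * Return an array of max_id + 1 pointers indexed by working set id.
 * Private sets point at the step's own set, shared sets are resolved
 * through 'prev', and ids no step uses stay NULL.
 */
static struct ws_sketch **map_sets(struct ws_sketch *sets, int n,
				   struct ws_sketch **prev, int *max_id)
{
	struct ws_sketch **map;

	*max_id = -1;
	for (int i = 0; i < n; i++)
		if (sets[i].id > *max_id)
			*max_id = sets[i].id;

	assert(*max_id >= 0); /* sketch: at least one set is required */
	map = calloc(*max_id + 1, sizeof(*map));
	assert(map);

	for (int i = 0; i < n; i++) {
		struct ws_sketch *set = &sets[i];

		if (set->shared) {
			assert(prev);
			set = prev[set->id];
			assert(set && set->shared);
		}
		map[sets[i].id] = set;
	}

	return map;
}
```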

* [igt-dev] [PATCH i-g-t 14/14] benchmarks/gem_wsim: added basic xe support
  2023-09-26  8:44 [igt-dev] [PATCH i-g-t 00/14] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (12 preceding siblings ...)
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 13/14] benchmarks/gem_wsim: extract prepare working sets " Marcin Bernatowicz
@ 2023-09-26  8:44 ` Marcin Bernatowicz
  2023-09-26 13:10   ` Tvrtko Ursulin
  2023-09-26 10:03 ` [igt-dev] ✓ CI.xeBAT: success for benchmarks/gem_wsim: added basic xe support (rev3) Patchwork
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 39+ messages in thread
From: Marcin Bernatowicz @ 2023-09-26  8:44 UTC
  To: igt-dev; +Cc: chris.p.wilson

Added basic xe support. A single binary handles both i915 and Xe devices.

Some functionality is still missing: working sets and bonding.

The tool is handy for scheduling tests; we find it useful for verifying vGPU
profiles that define different execution quantum and preemption timeout
settings.

Further rationale for the tool can be found in the following thread:
https://lore.kernel.org/dri-devel/a443495f-5d1b-52e1-9b2f-80167deb6d57@linux.intel.com/

With this patch it should be possible to run the following on an Xe device:

gem_wsim -w benchmarks/wsim/media_load_balance_fhd26u7.wsim -c 36 -r 600

It is best run with DRM debug logs disabled:

echo 0 > /sys/module/drm/parameters/debug

v2: minimizing divergence - same workload syntax for both drivers,
    so most existing examples should run on xe unmodified (Tvrtko).
    This version creates one common VM per workload.
    Explicit VM management and compute mode will come in the next patchset.

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c  | 515 ++++++++++++++++++++++++++++++++++++++---
 benchmarks/wsim/README |   6 +-
 2 files changed, 485 insertions(+), 36 deletions(-)
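
On xe, each BATCH step signals its own syncobj and waits on the syncobjs of the steps it depends on. Because parse_dependency() records plain data dependencies ('-1') as fence dependencies on xe, both forms end up in the same list. The sync array layout built by xe_alloc_step_batch() can be modelled without a device as below; struct sk_sync, struct xe_step_sketch and SK_SYNC_SIGNAL are hypothetical stand-ins for struct drm_xe_sync and its DRM_XE_SYNC_SYNCOBJ/DRM_XE_SYNC_SIGNAL flags, and no ioctls are issued.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct drm_xe_sync and its flags. */
#define SK_SYNC_SIGNAL 0x1u

struct sk_sync {
	uint32_t handle; /* syncobj handle */
	uint32_t flags;  /* SK_SYNC_SIGNAL for the out-fence, 0 for waits */
};

struct xe_step_sketch {
	uint32_t out_fence; /* syncobj this step signals */
	int nr_deps;
	const int *deps;    /* relative targets, e.g. -1 = previous step */
};

static struct sk_sync *build_syncs(const struct xe_step_sketch *steps, int idx,
				   unsigned int *num_syncs)
{
	const struct xe_step_sketch *w = &steps[idx];
	struct sk_sync *syncs;
	unsigned int i = 0;

	/* Always at least one out-fence, plus one entry per dependency. */
	*num_syncs = 1 + (unsigned int)w->nr_deps;
	syncs = calloc(*num_syncs, sizeof(*syncs));
	assert(syncs);

	/* Slot 0: the step's own out-fence, signalled on completion. */
	syncs[i].handle = w->out_fence;
	syncs[i++].flags = SK_SYNC_SIGNAL;

	/* Remaining slots: wait on each earlier step's out-fence. */
	for (int d = 0; d < w->nr_deps; d++) {
		int dep_idx = idx + w->deps[d];

		assert(dep_idx >= 0 && dep_idx < idx);
		syncs[i].handle = steps[dep_idx].out_fence;
		syncs[i++].flags = 0;
	}

	return syncs;
}
```

With this layout, waiting for a step (w_step_sync() on xe) reduces to waiting on syncs[0].handle.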

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 7703ca822..c83ed4882 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -62,6 +62,12 @@
 #include "i915/gem_engine_topology.h"
 #include "i915/gem_mman.h"
 
+#include "igt_syncobj.h"
+#include "intel_allocator.h"
+#include "xe_drm.h"
+#include "xe/xe_ioctl.h"
+#include "xe/xe_spin.h"
+
 enum intel_engine_id {
 	DEFAULT,
 	RCS,
@@ -109,6 +115,10 @@ struct deps {
 	struct dep_entry *list;
 };
 
+#define for_each_dep(__dep, __deps) \
+	for (int __i = 0; __i < __deps.nr && \
+	     (__dep = &__deps.list[__i]); ++__i)
+
 struct w_arg {
 	char *filename;
 	char *desc;
@@ -177,10 +187,30 @@ struct w_step {
 	struct drm_i915_gem_execbuffer2 eb;
 	struct drm_i915_gem_exec_object2 *obj;
 	struct drm_i915_gem_relocation_entry reloc[3];
+
+	struct drm_xe_exec exec;
+	size_t bb_size;
+	struct xe_spin *spin;
+	struct drm_xe_sync *syncs;
+
 	uint32_t bb_handle;
 	uint32_t *bb_duration;
 };
 
+struct vm {
+	uint32_t id;
+	bool compute_mode;
+	uint64_t ahnd;
+};
+
+struct exec_queue {
+	uint32_t id;
+	struct drm_xe_engine_class_instance hwe;
+	/* for qd_throttle */
+	unsigned int nrequest;
+	struct igt_list_head requests;
+};
+
 struct ctx {
 	uint32_t id;
 	int priority;
@@ -190,6 +220,10 @@ struct ctx {
 	struct bond *bonds;
 	bool load_balance;
 	uint64_t sseu;
+	/* reference to vm */
+	struct vm *vm;
+	/* queue for each class */
+	struct exec_queue queues[NUM_ENGINES];
 };
 
 struct workload {
@@ -213,7 +247,10 @@ struct workload {
 	unsigned int nr_ctxs;
 	struct ctx *ctx_list;
 
-	struct working_set **working_sets; /* array indexed by set id */
+	unsigned int nr_vms;
+	struct vm *vm_list;
+
+	struct working_set **working_sets;
 	int max_working_set_id;
 
 	int sync_timeline;
@@ -223,6 +260,18 @@ struct workload {
 	unsigned int nrequest[NUM_ENGINES];
 };
 
+#define for_each_ctx(__ctx, __wrk) \
+	for (int __i = 0; __i < (__wrk)->nr_ctxs && \
+	     (__ctx = &(__wrk)->ctx_list[__i]); ++__i)
+
+#define for_each_exec_queue(__eq, __ctx) \
+		for (int __j = 0; __j < NUM_ENGINES && ((__eq) = &((__ctx)->queues[__j])); ++__j) \
+			for_if((__eq)->id > 0)
+
+#define for_each_vm(__vm, __wrk) \
+	for (int __i = 0; __i < (__wrk)->nr_vms && \
+	     (__vm = &(__wrk)->vm_list[__i]); ++__i)
+
 static unsigned int master_prng;
 
 static int verbose = 1;
@@ -231,6 +280,8 @@ static struct drm_i915_gem_context_param_sseu device_sseu = {
 	.slice_mask = -1 /* Force read on first use. */
 };
 
+static bool is_xe;
+
 #define SYNCEDCLIENTS	(1<<1)
 #define DEPSYNC		(1<<2)
 #define FLAG_SSEU	(1<<3)
@@ -247,7 +298,10 @@ static const char *ring_str_map[NUM_ENGINES] = {
 
 static void w_step_sync(struct w_step *w)
 {
-	gem_sync(fd, w->obj[0].handle);
+	if (is_xe)
+		igt_assert(syncobj_wait(fd, &w->syncs[0].handle, 1, INT64_MAX, 0, NULL));
+	else
+		gem_sync(fd, w->obj[0].handle);
 }
 
 static int read_timestamp_frequency(int i915)
@@ -351,15 +405,23 @@ parse_dependency(unsigned int nr_steps, struct w_step *w, char *str)
 		if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
 			return -1;
 
-		add_dep(&w->data_deps, entry);
+		/* xe has only fence deps, so treat -1 the same as f-1 */
+		if (is_xe)
+			add_dep(&w->fence_deps, entry);
+		else
+			add_dep(&w->data_deps, entry);
 
 		break;
 	case 's':
-		submit_fence = true;
+		/* xe: no submit fence support, handle 's' like 'f' */
+		if (!is_xe)
+			submit_fence = true;
 		/* Fall-through. */
 	case 'f':
-		/* Multiple fences not yet supported. */
-		igt_assert_eq(w->fence_deps.nr, 0);
+		/* xe supports multiple fences */
+		if (!is_xe)
+			/* Multiple fences not yet supported. */
+			igt_assert_eq(w->fence_deps.nr, 0);
 
 		entry.target = atoi(++str);
 		if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
@@ -469,7 +531,17 @@ static struct intel_engine_data *query_engines(void)
 	if (engines.nengines)
 		return &engines;
 
-	engines = intel_engine_list_of_physical(fd);
+	if (is_xe) {
+		struct drm_xe_engine_class_instance *hwe;
+
+		xe_for_each_hw_engine(fd, hwe) {
+			engines.engines[engines.nengines].class = hwe->engine_class;
+			engines.engines[engines.nengines].instance = hwe->engine_instance;
+			engines.nengines++;
+		}
+	} else
+		engines = intel_engine_list_of_physical(fd);
+
 	igt_assert(engines.nengines);
 	return &engines;
 }
@@ -562,6 +634,40 @@ get_engine(enum intel_engine_id engine)
 	return ci;
 }
 
+static struct drm_xe_engine_class_instance
+get_xe_engine(enum intel_engine_id engine)
+{
+	struct drm_xe_engine_class_instance ci;
+
+	switch (engine) {
+	case DEFAULT:
+	case RCS:
+		ci.engine_class = DRM_XE_ENGINE_CLASS_RENDER;
+		ci.engine_instance = 0;
+		break;
+	case BCS:
+		ci.engine_class = DRM_XE_ENGINE_CLASS_COPY;
+		ci.engine_instance = 0;
+		break;
+	case VCS1:
+		ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_DECODE;
+		ci.engine_instance = 0;
+		break;
+	case VCS2:
+		ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_DECODE;
+		ci.engine_instance = 1;
+		break;
+	case VECS:
+		ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE;
+		ci.engine_instance = 0;
+		break;
+	default:
+		igt_assert(0);
+	}
+
+	return ci;
+}
+
 static int parse_engine_map(struct w_step *step, const char *_str)
 {
 	char *token, *tctx = NULL, *tstart = (char *)_str;
@@ -838,6 +944,13 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			} else if (!strcmp(field, "P")) {
 				unsigned int nr = 0;
 
+				if (is_xe) {
+					if (verbose > 3)
+						printf("skipped line: %s\n", _token);
+					free(token);
+					continue;
+				}
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(nr == 0 && tmp <= 0,
@@ -864,6 +977,13 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			} else if (!strcmp(field, "S")) {
 				unsigned int nr = 0;
 
+				if (is_xe) {
+					if (verbose > 3)
+						printf("skipped line: %s\n", _token);
+					free(token);
+					continue;
+				}
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(tmp <= 0 && nr == 0,
@@ -977,6 +1097,12 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			} else if (!strcmp(field, "b")) {
 				unsigned int nr = 0;
 
+				if (is_xe) {
+					if (verbose > 3)
+						printf("skipped line: %s\n", _token);
+					free(token);
+					continue;
+				}
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					check_arg(nr > 2,
 						  "Invalid bond format at step %u!\n",
@@ -1041,7 +1167,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			}
 
 			tmp = atoi(field);
-			check_arg(tmp < 0, "Invalid ctx id at step %u!\n",
+			check_arg(tmp <= 0, "Invalid context id at step %u!\n",
 				  nr_steps);
 			step.context = tmp;
 
@@ -1054,7 +1180,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 
 			i = str_to_engine(field);
 			check_arg(i < 0,
-				  "Invalid engine id at step %u!\n", nr_steps);
+				"Invalid engine id at step %u!\n", nr_steps);
 
 			valid++;
 
@@ -1288,6 +1414,20 @@ __get_ctx(struct workload *wrk, const struct w_step *w)
 	return &wrk->ctx_list[w->context];
 }
 
+static struct exec_queue *
+get_eq(struct workload *wrk, const struct w_step *w)
+{
+	igt_assert(w->engine >= 0 && w->engine < NUM_ENGINES);
+
+	return &__get_ctx(wrk, w)->queues[w->engine];
+}
+
+static struct vm *
+get_vm(struct workload *wrk, const struct w_step *w)
+{
+	return wrk->vm_list;
+}
+
 static uint32_t mmio_base(int i915, enum intel_engine_id engine, int gen)
 {
 	const char *name;
@@ -1540,6 +1680,61 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
 #endif
 }
 
+static void
+xe_alloc_step_batch(struct workload *wrk, struct w_step *w)
+{
+	struct vm *vm = get_vm(wrk, w);
+	struct exec_queue *eq = get_eq(wrk, w);
+	struct dep_entry *dep;
+	int i;
+
+	w->bb_size = ALIGN(sizeof(*w->spin) + xe_cs_prefetch_size(fd),
+			   xe_get_default_alignment(fd));
+	w->bb_handle = xe_bo_create(fd, 0, vm->id, w->bb_size);
+	w->spin = xe_bo_map(fd, w->bb_handle, w->bb_size);
+	w->exec.address =
+		intel_allocator_alloc_with_strategy(vm->ahnd, w->bb_handle, w->bb_size,
+						    0, ALLOC_STRATEGY_LOW_TO_HIGH);
+	xe_vm_bind_sync(fd, vm->id, w->bb_handle, 0, w->exec.address, w->bb_size);
+	xe_spin_init_opts(w->spin, .addr = w->exec.address,
+				   .preempt = (w->preempt_us > 0),
+				   .ctx_ticks = duration_to_ctx_ticks(fd, eq->hwe.gt_id,
+								1000LL * get_duration(wrk, w)));
+	w->exec.exec_queue_id = eq->id;
+	w->exec.num_batch_buffer = 1;
+	/* always at least one out fence */
+	w->exec.num_syncs = 1;
+	/* count syncs */
+	igt_assert_eq(0, w->data_deps.nr);
+	for_each_dep(dep, w->fence_deps) {
+		int dep_idx = w->idx + dep->target;
+
+		igt_assert(dep_idx >= 0 && dep_idx < w->idx);
+		igt_assert(wrk->steps[dep_idx].type == SW_FENCE ||
+			   wrk->steps[dep_idx].type == BATCH);
+
+		w->exec.num_syncs++;
+	}
+	w->syncs = calloc(w->exec.num_syncs, sizeof(*w->syncs));
+	/* fill syncs */
+	i = 0;
+	/* out fence */
+	w->syncs[i].handle = syncobj_create(fd, 0);
+	w->syncs[i++].flags = DRM_XE_SYNC_SYNCOBJ | DRM_XE_SYNC_SIGNAL;
+	/* in fence(s) */
+	for_each_dep(dep, w->fence_deps) {
+		int dep_idx = w->idx + dep->target;
+
+		igt_assert(wrk->steps[dep_idx].type == SW_FENCE ||
+			   wrk->steps[dep_idx].type == BATCH);
+		igt_assert(wrk->steps[dep_idx].syncs && wrk->steps[dep_idx].syncs[0].handle);
+
+		w->syncs[i].handle = wrk->steps[dep_idx].syncs[0].handle;
+		w->syncs[i++].flags = DRM_XE_SYNC_SYNCOBJ;
+	}
+	w->exec.syncs = to_user_pointer(w->syncs);
+}
+
 static bool set_priority(uint32_t ctx_id, int prio)
 {
 	struct drm_i915_gem_context_param param = {
@@ -1766,6 +1961,61 @@ static void measure_active_set(struct workload *wrk)
 
 #define alloca0(sz) ({ size_t sz__ = (sz); memset(alloca(sz__), 0, sz__); })
 
+static void vm_create(struct vm *vm)
+{
+	uint32_t flags = 0;
+
+	if (vm->compute_mode)
+		flags |= DRM_XE_VM_CREATE_ASYNC_BIND_OPS |
+			 DRM_XE_VM_CREATE_COMPUTE_MODE;
+
+	vm->id = xe_vm_create(fd, flags, 0);
+}
+
+static void exec_queue_create(struct ctx *ctx, struct exec_queue *eq)
+{
+	struct drm_xe_exec_queue_create create = {
+		.vm_id = ctx->vm->id,
+		.width = 1,
+		.num_placements = 1,
+		.instances = to_user_pointer(&eq->hwe),
+	};
+	struct drm_xe_engine_class_instance *eci = NULL;
+
+	if (ctx->load_balance && eq->hwe.engine_class == DRM_XE_ENGINE_CLASS_VIDEO_DECODE) {
+		struct drm_xe_engine_class_instance *hwe;
+		int i;
+
+		for (i = 0; i < ctx->engine_map_count; ++i)
+			igt_assert(ctx->engine_map[i] == VCS || ctx->engine_map[i] == VCS1 ||
+				   ctx->engine_map[i] == VCS2);
+
+		eci = calloc(16, sizeof(struct drm_xe_engine_class_instance));
+		create.num_placements = 0;
+		xe_for_each_hw_engine(fd, hwe) {
+			if (hwe->engine_class != DRM_XE_ENGINE_CLASS_VIDEO_DECODE ||
+			    hwe->gt_id != 0)
+				continue;
+
+			igt_assert(create.num_placements < 16);
+			eci[create.num_placements++] = *hwe;
+		}
+		igt_assert(create.num_placements);
+		create.instances = to_user_pointer(eci);
+
+		if (verbose > 3)
+			printf("num_placements=%d class=%d gt=%d\n", create.num_placements,
+				eq->hwe.engine_class, eq->hwe.gt_id);
+	}
+
+	igt_assert_eq(igt_ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &create), 0);
+
+	if (eci)
+		free(eci);
+
+	eq->id = create.exec_queue_id;
+}
+
 static int prepare_contexts(unsigned int id, struct workload *wrk)
 {
 	uint32_t share_vm = 0;
@@ -1796,6 +2046,84 @@ static int prepare_contexts(unsigned int id, struct workload *wrk)
 		max_ctx = ctx;
 	}
 
+	if (is_xe) {
+		int engine_classes[NUM_ENGINES] = {};
+
+		/* shortcut, create one vm */
+		wrk->nr_vms = 1;
+		wrk->vm_list = calloc(wrk->nr_vms, sizeof(struct vm));
+		wrk->vm_list->compute_mode = false;
+		vm_create(wrk->vm_list);
+		wrk->vm_list->ahnd =
+			intel_allocator_open(fd, wrk->vm_list->id, INTEL_ALLOCATOR_RELOC);
+
+		/* create exec queues of each referenced engine class */
+		for (j = 0; j < wrk->nr_ctxs; j++) {
+			struct ctx *ctx = &wrk->ctx_list[j];
+
+			/* link with vm */
+			ctx->vm = wrk->vm_list;
+
+			for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+				if (w->context != j)
+					continue;
+
+				if (w->type == ENGINE_MAP) {
+					ctx->engine_map = w->engine_map;
+					ctx->engine_map_count = w->engine_map_count;
+				} else if (w->type == LOAD_BALANCE) {
+					if (!ctx->engine_map) {
+						wsim_err("Load balancing needs an engine map!\n");
+						return 1;
+					}
+					if (intel_gen(intel_get_drm_devid(fd)) < 11) {
+						wsim_err("Load balancing needs relative mmio support, gen11+!\n");
+						return 1;
+					}
+					ctx->load_balance = w->load_balance;
+				}
+			}
+
+			/* create exec queue for each referenced engine */
+			for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+				if (w->context != j)
+					continue;
+
+				if (w->type == BATCH)
+					engine_classes[w->engine]++;
+			}
+
+			for (i = 0; i < NUM_ENGINES; i++) {
+				if (engine_classes[i]) {
+					if (verbose > 3)
+						printf("%u ctx[%d] eq(%s) load_balance=%d\n",
+							id, j, ring_str_map[i], ctx->load_balance);
+					if (i == VCS) {
+						ctx->queues[i].hwe.engine_class =
+							get_xe_engine(VCS1).engine_class;
+						ctx->queues[i].hwe.engine_instance = 1;
+					} else
+						ctx->queues[i].hwe = get_xe_engine(i);
+					exec_queue_create(ctx, &ctx->queues[i]);
+					/* init request list */
+					IGT_INIT_LIST_HEAD(&ctx->queues[i].requests);
+					ctx->queues[i].nrequest = 0;
+				}
+				engine_classes[i] = 0;
+			}
+		}
+
+		/* create syncobjs for SW_FENCE */
+		for (j = 0, i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++)
+			if (w->type == SW_FENCE) {
+				w->syncs = calloc(1, sizeof(struct drm_xe_sync));
+				w->syncs[0].handle = syncobj_create(fd, 0);
+				w->syncs[0].flags = DRM_XE_SYNC_SYNCOBJ;
+			}
+
+		return 0;
+	}
+
 	/*
 	 * Transfer over engine map configuration from the workload step.
 	 */
@@ -2075,7 +2403,8 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
 		}
 	}
 
-	prepare_working_sets(id, wrk);
+	if (!is_xe)
+		prepare_working_sets(id, wrk);
 
 	/*
 	 * Allocate batch buffers.
@@ -2084,10 +2413,14 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
 		if (w->type != BATCH)
 			continue;
 
-		alloc_step_batch(wrk, w);
+		if (is_xe)
+			xe_alloc_step_batch(wrk, w);
+		else
+			alloc_step_batch(wrk, w);
 	}
 
-	measure_active_set(wrk);
+	if (!is_xe)
+		measure_active_set(wrk);
 
 	return 0;
 }
@@ -2134,6 +2467,31 @@ static void w_sync_to(struct workload *wrk, struct w_step *w, int target)
 	w_step_sync(&wrk->steps[target]);
 }
 
+static void do_xe_exec(struct workload *wrk, struct w_step *w)
+{
+	struct exec_queue *eq = get_eq(wrk, w);
+
+	igt_assert(w->emit_fence <= 0);
+	if (w->emit_fence == -1)
+		syncobj_reset(fd, &w->syncs[0].handle, 1);
+
+	/* update duration if random */
+	if (w->duration.max != w->duration.min)
+		xe_spin_init_opts(w->spin, .addr = w->exec.address,
+					   .preempt = (w->preempt_us > 0),
+					   .ctx_ticks = duration_to_ctx_ticks(fd, eq->hwe.gt_id,
+								1000LL * get_duration(wrk, w)));
+	xe_exec(fd, &w->exec);
+
+	/* for qd_throttle */
+	if (w->rq_link.prev != NULL || w->rq_link.next != NULL) {
+		igt_list_del(&w->rq_link);
+		eq->nrequest--;
+	}
+	igt_list_add_tail(&w->rq_link, &eq->requests);
+	eq->nrequest++;
+}
+
 static void
 do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id engine)
 {
@@ -2258,6 +2616,10 @@ static void *run_workload(void *data)
 					sw_sync_timeline_create_fence(wrk->sync_timeline,
 								      cur_seqno + w->idx);
 				igt_assert(w->emit_fence > 0);
+				if (is_xe)
+					/* Convert sync file to syncobj */
+					syncobj_import_sync_file(fd, w->syncs[0].handle,
+								 w->emit_fence);
 				continue;
 			} else if (w->type == SW_FENCE_SIGNAL) {
 				int tgt = w->idx + w->target;
@@ -2270,6 +2632,9 @@ static void *run_workload(void *data)
 				sw_sync_timeline_inc(wrk->sync_timeline, inc);
 				continue;
 			} else if (w->type == CTX_PRIORITY) {
+				if (is_xe)
+					continue;
+
 				if (w->priority != wrk->ctx_list[w->context].priority) {
 					struct drm_i915_gem_context_param param = {
 						.ctx_id = wrk->ctx_list[w->context].id,
@@ -2289,7 +2654,10 @@ static void *run_workload(void *data)
 				igt_assert(wrk->steps[t_idx].type == BATCH);
 				igt_assert(wrk->steps[t_idx].duration.unbound_duration);
 
-				*wrk->steps[t_idx].bb_duration = 0xffffffff;
+				if (is_xe)
+					xe_spin_end(wrk->steps[t_idx].spin);
+				else
+					*wrk->steps[t_idx].bb_duration = 0xffffffff;
 				__sync_synchronize();
 				continue;
 			} else if (w->type == SSEU) {
@@ -2321,15 +2689,19 @@ static void *run_workload(void *data)
 			if (throttle > 0)
 				w_sync_to(wrk, w, i - throttle);
 
-			do_eb(wrk, w, engine);
+			if (is_xe)
+				do_xe_exec(wrk, w);
+			else {
+				do_eb(wrk, w, engine);
 
-			if (w->request != -1) {
-				igt_list_del(&w->rq_link);
-				wrk->nrequest[w->request]--;
+				if (w->request != -1) {
+					igt_list_del(&w->rq_link);
+					wrk->nrequest[w->request]--;
+				}
+				w->request = engine;
+				igt_list_add_tail(&w->rq_link, &wrk->requests[engine]);
+				wrk->nrequest[engine]++;
 			}
-			w->request = engine;
-			igt_list_add_tail(&w->rq_link, &wrk->requests[engine]);
-			wrk->nrequest[engine]++;
 
 			if (!wrk->run)
 				break;
@@ -2338,17 +2710,32 @@ static void *run_workload(void *data)
 				w_step_sync(w);
 
 			if (qd_throttle > 0) {
-				while (wrk->nrequest[engine] > qd_throttle) {
-					struct w_step *s;
+				if (is_xe) {
+					struct exec_queue *eq = get_eq(wrk, w);
+
+					while (eq->nrequest > qd_throttle) {
+						struct w_step *s;
+
+						s = igt_list_first_entry(&eq->requests, s, rq_link);
+
+						w_step_sync(s);
 
-					s = igt_list_first_entry(&wrk->requests[engine],
-								 s, rq_link);
+						igt_list_del(&s->rq_link);
+						eq->nrequest--;
+					}
+				} else {
+					while (wrk->nrequest[engine] > qd_throttle) {
+						struct w_step *s;
+
+						s = igt_list_first_entry(&wrk->requests[engine],
+									s, rq_link);
 
 						w_step_sync(s);
 
-					s->request = -1;
-					igt_list_del(&s->rq_link);
-					wrk->nrequest[engine]--;
+						s->request = -1;
+						igt_list_del(&s->rq_link);
+						wrk->nrequest[engine]--;
+					}
 				}
 			}
 		}
@@ -2365,18 +2752,51 @@ static void *run_workload(void *data)
 		for (i = 0, w = wrk->steps; wrk->run && (i < wrk->nr_steps);
 		     i++, w++) {
 			if (w->emit_fence > 0) {
-				close(w->emit_fence);
-				w->emit_fence = -1;
+				if (is_xe) {
+					igt_assert(w->type == SW_FENCE);
+					close(w->emit_fence);
+					w->emit_fence = -1;
+					syncobj_reset(fd, &w->syncs[0].handle, 1);
+				} else {
+					close(w->emit_fence);
+					w->emit_fence = -1;
+				}
 			}
 		}
 	}
 
-	for (i = 0; i < NUM_ENGINES; i++) {
-		if (!wrk->nrequest[i])
-			continue;
+	if (is_xe) {
+		struct exec_queue *eq;
+		struct ctx *ctx;
 
-		w = igt_list_last_entry(&wrk->requests[i], w, rq_link);
-		w_step_sync(w);
+		for_each_ctx(ctx, wrk)
+			for_each_exec_queue(eq, ctx)
+				if (eq->nrequest) {
+					w = igt_list_last_entry(&eq->requests, w, rq_link);
+					w_step_sync(w);
+				}
+
+		for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+			if (w->type == BATCH) {
+				syncobj_destroy(fd, w->syncs[0].handle);
+				free(w->syncs);
+				xe_vm_unbind_sync(fd, get_vm(wrk, w)->id, 0, w->exec.address,
+						  w->bb_size);
+				gem_munmap(w->spin, w->bb_size);
+				gem_close(fd, w->bb_handle);
+			} else if (w->type == SW_FENCE) {
+				syncobj_destroy(fd, w->syncs[0].handle);
+				free(w->syncs);
+			}
+		}
+	} else {
+		for (i = 0; i < NUM_ENGINES; i++) {
+			if (!wrk->nrequest[i])
+				continue;
+
+			w = igt_list_last_entry(&wrk->requests[i], w, rq_link);
+			w_step_sync(w);
+		}
 	}
 
 	clock_gettime(CLOCK_MONOTONIC, &t_end);
@@ -2398,6 +2818,23 @@ static void *run_workload(void *data)
 
 static void fini_workload(struct workload *wrk)
 {
+	if (is_xe) {
+		struct exec_queue *eq;
+		struct ctx *ctx;
+		struct vm *vm;
+
+		for_each_ctx(ctx, wrk)
+			for_each_exec_queue(eq, ctx) {
+				xe_exec_queue_destroy(fd, eq->id);
+				eq->id = 0;
+			}
+		for_each_vm(vm, wrk) {
+			put_ahnd(vm->ahnd);
+			xe_vm_destroy(fd, vm->id);
+		}
+		free(wrk->vm_list);
+		wrk->nr_vms = 0;
+	}
 	free(wrk->steps);
 	free(wrk);
 }
@@ -2605,8 +3042,12 @@ int main(int argc, char **argv)
 		ret = igt_device_find_first_i915_discrete_card(&card);
 		if (!ret)
 			ret = igt_device_find_integrated_card(&card);
+		if (!ret)
+			ret = igt_device_find_first_xe_discrete_card(&card);
+		if (!ret)
+			ret = igt_device_find_xe_integrated_card(&card);
 		if (!ret) {
-			wsim_err("No device filter specified and no i915 devices found!\n");
+			wsim_err("No device filter specified and no intel devices found!\n");
 			return EXIT_FAILURE;
 		}
 	}
@@ -2629,6 +3070,10 @@ int main(int argc, char **argv)
 	if (verbose > 1)
 		printf("Using device %s\n", drm_dev);
 
+	is_xe = is_xe_device(fd);
+	if (is_xe)
+		xe_device_get(fd);
+
 	if (!nr_w_args) {
 		wsim_err("No workload descriptor(s)!\n");
 		goto err;
diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
index e4fd61645..f49a73989 100644
--- a/benchmarks/wsim/README
+++ b/benchmarks/wsim/README
@@ -88,6 +88,10 @@ Batch durations can also be specified as infinite by using the '*' in the
 duration field. Such batches must be ended by the terminate command ('T')
 otherwise they will cause a GPU hang to be reported.
 
+Note: On Xe, batch dependencies are expressed with syncobjs,
+so there is no difference between f-1 and -1;
+e.g. 1.1000.-2.0 is the same as 1.1000.f-2.0.
+
 Sync (fd) fences
 ----------------
 
@@ -131,7 +135,7 @@ runnable. When the second RCS batch completes the standalone fence is signaled
 which allows the two VCS batches to be executed. Finally we wait until the both
 VCS batches have completed before starting the (optional) next iteration.
 
-Submit fences
+Submit fences (i915 only?)
 -------------
 
 Submit fences are a type of input fence which are signalled when the originating
-- 
2.42.0

* [igt-dev] ✓ CI.xeBAT: success for benchmarks/gem_wsim: added basic xe support (rev3)
  2023-09-26  8:44 [igt-dev] [PATCH i-g-t 00/14] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (13 preceding siblings ...)
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 14/14] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
@ 2023-09-26 10:03 ` Patchwork
  2023-09-26 10:11 ` [igt-dev] ✗ Fi.CI.BAT: failure " Patchwork
  2023-09-26 11:56 ` [igt-dev] ✗ Fi.CI.BUILD: failure for benchmarks/gem_wsim: added basic xe support (rev4) Patchwork
  16 siblings, 0 replies; 39+ messages in thread
From: Patchwork @ 2023-09-26 10:03 UTC (permalink / raw)
  To: Marcin Bernatowicz; +Cc: igt-dev

== Series Details ==

Series: benchmarks/gem_wsim: added basic xe support (rev3)
URL   : https://patchwork.freedesktop.org/series/122920/
State : success

== Summary ==

CI Bug Log - changes from XEIGT_7503_BAT -> XEIGTPW_9874_BAT
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Participating hosts (4 -> 4)
------------------------------

  No changes in participating hosts

Known issues
------------

  Here are the changes found in XEIGTPW_9874_BAT that come from known issues:

### IGT changes ###

#### Possible fixes ####

  * {igt@xe_create@create-execqueues-noleak}:
    - bat-atsm-2:         [FAIL][1] ([Intel XE#524]) -> [PASS][2]
   [1]: https://intel-gfx-ci.01.org/tree/intel-xe/IGT_7503/bat-atsm-2/igt@xe_create@create-execqueues-noleak.html
   [2]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_9874/bat-atsm-2/igt@xe_create@create-execqueues-noleak.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [Intel XE#524]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/524


Build changes
-------------

  * IGT: IGT_7503 -> IGTPW_9874

  IGTPW_9874: 9874
  IGT_7503: 7503
  xe-396-fc8ec3c56efa5c15b630ddc17c89100440fe03ef: fc8ec3c56efa5c15b630ddc17c89100440fe03ef

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_9874/index.html

* [igt-dev] ✗ Fi.CI.BAT: failure for benchmarks/gem_wsim: added basic xe support (rev3)
  2023-09-26  8:44 [igt-dev] [PATCH i-g-t 00/14] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (14 preceding siblings ...)
  2023-09-26 10:03 ` [igt-dev] ✓ CI.xeBAT: success for benchmarks/gem_wsim: added basic xe support (rev3) Patchwork
@ 2023-09-26 10:11 ` Patchwork
  2023-09-26 11:56 ` [igt-dev] ✗ Fi.CI.BUILD: failure for benchmarks/gem_wsim: added basic xe support (rev4) Patchwork
  16 siblings, 0 replies; 39+ messages in thread
From: Patchwork @ 2023-09-26 10:11 UTC (permalink / raw)
  To: Marcin Bernatowicz; +Cc: igt-dev

== Series Details ==

Series: benchmarks/gem_wsim: added basic xe support (rev3)
URL   : https://patchwork.freedesktop.org/series/122920/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_13680 -> IGTPW_9874
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with IGTPW_9874 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in IGTPW_9874, please notify your bug team (lgci.bug.filing@intel.com) to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/index.html

Participating hosts (38 -> 40)
------------------------------

  Additional (4): bat-dg2-8 bat-adlm-1 bat-adlp-11 fi-hsw-4770 
  Missing    (2): fi-kbl-soraka fi-snb-2520m 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in IGTPW_9874:

### IGT changes ###

#### Possible regressions ####

  * igt@kms_pipe_crc_basic@suspend-read-crc@pipe-b-dp-1:
    - bat-dg2-8:          NOTRUN -> [INCOMPLETE][1]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-dg2-8/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-b-dp-1.html

  * igt@kms_pipe_crc_basic@suspend-read-crc@pipe-b-hdmi-a-3:
    - bat-dg2-11:         [PASS][2] -> [INCOMPLETE][3]
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13680/bat-dg2-11/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-b-hdmi-a-3.html
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-dg2-11/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-b-hdmi-a-3.html

  
Known issues
------------

  Here are the changes found in IGTPW_9874 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@debugfs_test@basic-hwmon:
    - bat-adlp-11:        NOTRUN -> [SKIP][4] ([i915#9318])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlp-11/igt@debugfs_test@basic-hwmon.html
    - fi-hsw-4770:        NOTRUN -> [SKIP][5] ([fdo#109271])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/fi-hsw-4770/igt@debugfs_test@basic-hwmon.html
    - bat-adlm-1:         NOTRUN -> [SKIP][6] ([i915#3826])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlm-1/igt@debugfs_test@basic-hwmon.html

  * igt@fbdev@eof:
    - bat-adlm-1:         NOTRUN -> [SKIP][7] ([i915#2582]) +3 other tests skip
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlm-1/igt@fbdev@eof.html

  * igt@fbdev@info:
    - bat-adlm-1:         NOTRUN -> [SKIP][8] ([i915#1849] / [i915#2582])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlm-1/igt@fbdev@info.html

  * igt@gem_lmem_swapping@parallel-random-engines:
    - bat-adlm-1:         NOTRUN -> [SKIP][9] ([i915#4613]) +3 other tests skip
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlm-1/igt@gem_lmem_swapping@parallel-random-engines.html

  * igt@gem_mmap@basic:
    - bat-dg2-8:          NOTRUN -> [SKIP][10] ([i915#4083])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-dg2-8/igt@gem_mmap@basic.html

  * igt@gem_mmap_gtt@basic:
    - bat-dg2-8:          NOTRUN -> [SKIP][11] ([i915#4077]) +2 other tests skip
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-dg2-8/igt@gem_mmap_gtt@basic.html

  * igt@gem_tiled_pread_basic:
    - bat-dg2-8:          NOTRUN -> [SKIP][12] ([i915#4079]) +1 other test skip
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-dg2-8/igt@gem_tiled_pread_basic.html
    - bat-adlm-1:         NOTRUN -> [SKIP][13] ([i915#3282])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlm-1/igt@gem_tiled_pread_basic.html
    - bat-adlp-11:        NOTRUN -> [SKIP][14] ([i915#3282])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlp-11/igt@gem_tiled_pread_basic.html

  * igt@i915_pm_rps@basic-api:
    - bat-dg2-8:          NOTRUN -> [SKIP][15] ([i915#6621])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-dg2-8/igt@i915_pm_rps@basic-api.html
    - bat-adlm-1:         NOTRUN -> [SKIP][16] ([i915#6621])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlm-1/igt@i915_pm_rps@basic-api.html

  * igt@i915_selftest@live@requests:
    - bat-atsm-1:         [PASS][17] -> [INCOMPLETE][18] ([i915#7913])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13680/bat-atsm-1/igt@i915_selftest@live@requests.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-atsm-1/igt@i915_selftest@live@requests.html

  * igt@i915_suspend@basic-s3-without-i915:
    - bat-dg2-8:          NOTRUN -> [SKIP][19] ([i915#6645])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-dg2-8/igt@i915_suspend@basic-s3-without-i915.html

  * igt@kms_addfb_basic@addfb25-y-tiled-small-legacy:
    - bat-dg2-8:          NOTRUN -> [SKIP][20] ([i915#5190])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-dg2-8/igt@kms_addfb_basic@addfb25-y-tiled-small-legacy.html

  * igt@kms_addfb_basic@basic-y-tiled-legacy:
    - bat-dg2-8:          NOTRUN -> [SKIP][21] ([i915#4215] / [i915#5190])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-dg2-8/igt@kms_addfb_basic@basic-y-tiled-legacy.html

  * igt@kms_addfb_basic@framebuffer-vs-set-tiling:
    - bat-dg2-8:          NOTRUN -> [SKIP][22] ([i915#4212]) +6 other tests skip
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-dg2-8/igt@kms_addfb_basic@framebuffer-vs-set-tiling.html

  * igt@kms_addfb_basic@tile-pitch-mismatch:
    - bat-dg2-8:          NOTRUN -> [SKIP][23] ([i915#4212] / [i915#5608])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-dg2-8/igt@kms_addfb_basic@tile-pitch-mismatch.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy:
    - bat-adlp-11:        NOTRUN -> [SKIP][24] ([i915#4103] / [i915#5608]) +1 other test skip
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlp-11/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy.html
    - bat-dg2-8:          NOTRUN -> [SKIP][25] ([i915#4103] / [i915#4213] / [i915#5608]) +1 other test skip
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-dg2-8/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy.html

  * igt@kms_cursor_legacy@basic-flip-after-cursor-varying-size:
    - bat-adlm-1:         NOTRUN -> [SKIP][26] ([i915#1845]) +17 other tests skip
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlm-1/igt@kms_cursor_legacy@basic-flip-after-cursor-varying-size.html

  * igt@kms_dsc@dsc-basic:
    - bat-adlp-11:        NOTRUN -> [SKIP][27] ([i915#3555] / [i915#3840])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlp-11/igt@kms_dsc@dsc-basic.html

  * igt@kms_flip@basic-plain-flip:
    - bat-adlm-1:         NOTRUN -> [SKIP][28] ([i915#3637]) +3 other tests skip
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlm-1/igt@kms_flip@basic-plain-flip.html

  * igt@kms_force_connector_basic@force-load-detect:
    - bat-adlm-1:         NOTRUN -> [SKIP][29] ([fdo#109285])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlm-1/igt@kms_force_connector_basic@force-load-detect.html
    - bat-dg2-8:          NOTRUN -> [SKIP][30] ([fdo#109285])
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-dg2-8/igt@kms_force_connector_basic@force-load-detect.html

  * igt@kms_force_connector_basic@prune-stale-modes:
    - bat-adlp-11:        NOTRUN -> [SKIP][31] ([i915#4093]) +3 other tests skip
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlp-11/igt@kms_force_connector_basic@prune-stale-modes.html
    - bat-dg2-8:          NOTRUN -> [SKIP][32] ([i915#5274])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-dg2-8/igt@kms_force_connector_basic@prune-stale-modes.html

  * igt@kms_frontbuffer_tracking@basic:
    - bat-adlm-1:         NOTRUN -> [SKIP][33] ([i915#1849])
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlm-1/igt@kms_frontbuffer_tracking@basic.html

  * igt@kms_hdmi_inject@inject-audio:
    - bat-adlp-11:        NOTRUN -> [SKIP][34] ([i915#4369])
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlp-11/igt@kms_hdmi_inject@inject-audio.html

  * igt@kms_pipe_crc_basic@read-crc-frame-sequence@pipe-d-dp-5:
    - bat-adlp-11:        NOTRUN -> [ABORT][35] ([i915#8668])
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlp-11/igt@kms_pipe_crc_basic@read-crc-frame-sequence@pipe-d-dp-5.html

  * igt@kms_psr@cursor_plane_move:
    - bat-dg2-8:          NOTRUN -> [SKIP][36] ([i915#1072]) +3 other tests skip
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-dg2-8/igt@kms_psr@cursor_plane_move.html
    - bat-adlm-1:         NOTRUN -> [SKIP][37] ([i915#1072]) +3 other tests skip
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlm-1/igt@kms_psr@cursor_plane_move.html

  * igt@kms_setmode@basic-clone-single-crtc:
    - bat-dg2-8:          NOTRUN -> [SKIP][38] ([i915#3555])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-dg2-8/igt@kms_setmode@basic-clone-single-crtc.html
    - bat-adlm-1:         NOTRUN -> [SKIP][39] ([i915#3555])
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlm-1/igt@kms_setmode@basic-clone-single-crtc.html

  * igt@prime_vgem@basic-fence-flip:
    - bat-dg2-8:          NOTRUN -> [SKIP][40] ([i915#3708])
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-dg2-8/igt@prime_vgem@basic-fence-flip.html
    - bat-adlm-1:         NOTRUN -> [SKIP][41] ([i915#1845] / [i915#3708])
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlm-1/igt@prime_vgem@basic-fence-flip.html

  * igt@prime_vgem@basic-fence-mmap:
    - bat-dg2-8:          NOTRUN -> [SKIP][42] ([i915#3708] / [i915#4077]) +1 other test skip
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-dg2-8/igt@prime_vgem@basic-fence-mmap.html

  * igt@prime_vgem@basic-write:
    - bat-dg2-8:          NOTRUN -> [SKIP][43] ([i915#3291] / [i915#3708]) +2 other tests skip
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-dg2-8/igt@prime_vgem@basic-write.html
    - bat-adlm-1:         NOTRUN -> [SKIP][44] ([i915#3708]) +2 other tests skip
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-adlm-1/igt@prime_vgem@basic-write.html

  
#### Possible fixes ####

  * igt@kms_pipe_crc_basic@read-crc-frame-sequence@pipe-d-edp-1:
    - bat-rplp-1:         [ABORT][45] ([i915#8668]) -> [PASS][46]
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13680/bat-rplp-1/igt@kms_pipe_crc_basic@read-crc-frame-sequence@pipe-d-edp-1.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/bat-rplp-1/igt@kms_pipe_crc_basic@read-crc-frame-sequence@pipe-d-edp-1.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#1849]: https://gitlab.freedesktop.org/drm/intel/issues/1849
  [i915#2582]: https://gitlab.freedesktop.org/drm/intel/issues/2582
  [i915#3282]: https://gitlab.freedesktop.org/drm/intel/issues/3282
  [i915#3291]: https://gitlab.freedesktop.org/drm/intel/issues/3291
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3637]: https://gitlab.freedesktop.org/drm/intel/issues/3637
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#3826]: https://gitlab.freedesktop.org/drm/intel/issues/3826
  [i915#3840]: https://gitlab.freedesktop.org/drm/intel/issues/3840
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4093]: https://gitlab.freedesktop.org/drm/intel/issues/4093
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212
  [i915#4213]: https://gitlab.freedesktop.org/drm/intel/issues/4213
  [i915#4215]: https://gitlab.freedesktop.org/drm/intel/issues/4215
  [i915#4369]: https://gitlab.freedesktop.org/drm/intel/issues/4369
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#5190]: https://gitlab.freedesktop.org/drm/intel/issues/5190
  [i915#5274]: https://gitlab.freedesktop.org/drm/intel/issues/5274
  [i915#5354]: https://gitlab.freedesktop.org/drm/intel/issues/5354
  [i915#5608]: https://gitlab.freedesktop.org/drm/intel/issues/5608
  [i915#6621]: https://gitlab.freedesktop.org/drm/intel/issues/6621
  [i915#6645]: https://gitlab.freedesktop.org/drm/intel/issues/6645
  [i915#7913]: https://gitlab.freedesktop.org/drm/intel/issues/7913
  [i915#8668]: https://gitlab.freedesktop.org/drm/intel/issues/8668
  [i915#9318]: https://gitlab.freedesktop.org/drm/intel/issues/9318


Build changes
-------------

  * CI: CI-20190529 -> None
  * IGT: IGT_7503 -> IGTPW_9874

  CI-20190529: 20190529
  CI_DRM_13680: d60e6a65cc963bc77b655a3ed21c6989bbaa9cbf @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_9874: 9874
  IGT_7503: 7503

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9874/index.html

* Re: [igt-dev] [PATCH i-g-t 02/14] benchmarks/gem_wsim: reposition the unbound duration boolean
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 02/14] benchmarks/gem_wsim: reposition the unbound duration boolean Marcin Bernatowicz
@ 2023-09-26 10:23   ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2023-09-26 10:23 UTC (permalink / raw)
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 26/09/2023 09:44, Marcin Bernatowicz wrote:
> All duration info is now in struct duration of w_step.
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c | 10 +++++-----
>   1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 7b5e62a3b..90a36f7de 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -73,6 +73,7 @@ enum intel_engine_id {
>   
>   struct duration {
>   	unsigned int min, max;
> +	bool unbound_duration;

I guess '_duration' suffix is now redundant so with it removed:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

P.S. I trust the patch is useful for later in the series, haven't gotten 
that far yet.

>   };
>   
>   enum w_type
> @@ -145,7 +146,6 @@ struct w_step
>   	unsigned int context;
>   	unsigned int engine;
>   	struct duration duration;
> -	bool unbound_duration;
>   	struct deps data_deps;
>   	struct deps fence_deps;
>   	int emit_fence;
> @@ -1130,7 +1130,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   				check_arg(intel_gen(intel_get_drm_devid(fd)) < 8,
>   					  "Infinite batch at step %u needs Gen8+!\n",
>   					  nr_steps);
> -				step.unbound_duration = true;
> +				step.duration.unbound_duration = true;
>   			} else {
>   				tmpl = strtol(field, &sep, 10);
>   				check_arg(tmpl <= 0 || tmpl == LONG_MIN ||
> @@ -2172,8 +2172,8 @@ update_bb_start(struct workload *wrk, struct w_step *w)
>   
>   	/* ticks is inverted for MI_DO_COMPARE (less-than comparison) */
>   	ticks = 0;
> -	if (!w->unbound_duration)
> -		ticks = ~ns_to_ctx_ticks(1000 * get_duration(wrk, w));
> +	if (!w->duration.unbound_duration)
> +		ticks = ~ns_to_ctx_ticks(1000LL * get_duration(wrk, w));
>   
>   	*w->bb_duration = ticks;
>   }
> @@ -2349,7 +2349,7 @@ static void *run_workload(void *data)
>   
>   				igt_assert(t_idx >= 0 && t_idx < i);
>   				igt_assert(wrk->steps[t_idx].type == BATCH);
> -				igt_assert(wrk->steps[t_idx].unbound_duration);
> +				igt_assert(wrk->steps[t_idx].duration.unbound_duration);
>   
>   				*wrk->steps[t_idx].bb_duration = 0xffffffff;
>   				__sync_synchronize();

* Re: [igt-dev] [PATCH i-g-t 03/14] benchmarks/gem_wsim: fix scaling of period steps
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 03/14] benchmarks/gem_wsim: fix scaling of period steps Marcin Bernatowicz
@ 2023-09-26 10:28   ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2023-09-26 10:28 UTC (permalink / raw)
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 26/09/2023 09:44, Marcin Bernatowicz wrote:
> Period steps take the scale time (-F) command line option into account.
> This allows scaling a workload without the need to modify the .wsim file.
> 
> For example, given the following example.wsim:
> 
> 1.VCS1.3000.0.1
> 1.RCS.500-1000.-1.0
> 1.RCS.3700.0.0
> 1.RCS.1000.-2.0
> 1.VCS2.2300.-2.0
> 1.RCS.4700.-1.0
> 1.VCS2.600.-1.1
> p.16000
> 
> we can scale the whole workload x10 with:
> 
> gem_wsim -w example.wsim -f 10 -F 10
> 
> -f scales batch duration steps, -F scales period and delay steps.
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 90a36f7de..65061461d 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -899,8 +899,14 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   				int_field(DELAY, delay, tmp <= 0,
>   					  "Invalid delay at step %u!\n");
>   			} else if (!strcmp(field, "p")) {
> -				int_field(PERIOD, period, tmp <= 0,
> -					  "Invalid period at step %u!\n");
> +				field = strtok_r(fstart, ".", &fctx);
> +				if (field) {
> +					tmp = atoi(field);
> +					check_arg(tmp <= 0, "Invalid period at step %u!\n", nr_steps);
> +					step.type = PERIOD;
> +					step.period = __duration(tmp, scale_time);
> +					goto add_step;
> +				}

Why not do it with fewer added lines of code where the delay steps are currently scaled?

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 7b5e62a3be53..486ab0124063 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -1186,6 +1186,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
  add_step:
                 if (step.type == DELAY)
                         step.delay = __duration(step.delay, scale_time);
+               else if (step.type == PERIOD)
+                       step.period = __duration(step.period, scale_time);
  
                 step.idx = nr_steps++;
                 step.request = -1;

Regards,

Tvrtko

>   			} else if (!strcmp(field, "P")) {
>   				unsigned int nr = 0;
>   				while ((field = strtok_r(fstart, ".", &fctx))) {

* Re: [igt-dev] [PATCH i-g-t 04/14] benchmarks/gem_wsim: fix duration range check
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 04/14] benchmarks/gem_wsim: fix duration range check Marcin Bernatowicz
@ 2023-09-26 10:40   ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2023-09-26 10:40 UTC (permalink / raw)
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson



On 26/09/2023 09:44, Marcin Bernatowicz wrote:
> When the scale duration (-f) command line option is provided,
> the max duration check does not take it into account; fix it.
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 65061461d..4f0deb095 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -1148,7 +1148,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   				if (sep && *sep == '-') {
>   					tmpl = strtol(sep + 1, NULL, 10);
>   					check_arg(tmpl <= 0 ||
> -						tmpl <= step.duration.min ||
> +						__duration(tmpl, scale_dur) <= step.duration.min ||

Right!

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

>   						tmpl == LONG_MIN ||
>   						tmpl == LONG_MAX,
>   						"Invalid duration range at step %u!\n",

Could improve the error message with 's/duration range/maximum 
duration/' while at it if you want.

Regards,

Tvrtko


* Re: [igt-dev] [PATCH i-g-t 05/14] benchmarks/gem_wsim: extract duration parsing code to new function
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 05/14] benchmarks/gem_wsim: extract duration parsing code to new function Marcin Bernatowicz
@ 2023-09-26 10:48   ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2023-09-26 10:48 UTC
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 26/09/2023 09:44, Marcin Bernatowicz wrote:
> Moved code from parse_workload to separate function.
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c | 67 ++++++++++++++++++++++++-------------------
>   1 file changed, 37 insertions(+), 30 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 4f0deb095..aeb959364 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -860,6 +860,40 @@ static long __duration(long dur, double scale)
>   	return round(scale * dur);
>   }
>   
> +static int
> +parse_duration(unsigned int nr_steps, struct duration *dur, double scale_dur, char *_desc)

Nitpick: the leading underscore in _desc reads a bit odd when there is no
aliasing or anything like that. If you kept the name 'field', the code
movement would be better preserved. FWIW.

> +{
> +	char *sep = NULL;
> +	long tmpl;
> +
> +	if (_desc[0] == '*') {
> +		if (intel_gen(intel_get_drm_devid(fd)) < 8) {
> +			wsim_err("Infinite batch at step %u needs Gen8+!\n", nr_steps);
> +			return -1;
> +		}
> +		dur->unbound_duration = true;
> +	} else {
> +		tmpl = strtol(_desc, &sep, 10);
> +		if (tmpl <= 0 || tmpl == LONG_MIN || tmpl == LONG_MAX)
> +			return -1;

Hm, I see now that the suggestion from the previous patch to improve the
error message would be lost here.

Would it work to emit wsim_err directly from here and below? Then make 
the caller omit the check_arg and just return NULL.

Regards,

Tvrtko

> +
> +		dur->min = __duration(tmpl, scale_dur);
> +
> +		if (sep && *sep == '-') {
> +			tmpl = strtol(sep + 1, NULL, 10);
> +			if (tmpl <= 0 || __duration(tmpl, scale_dur) <= dur->min ||
> +			    tmpl == LONG_MIN || tmpl == LONG_MAX)
> +				return -1;
> +
> +			dur->max = __duration(tmpl, scale_dur);
> +		} else {
> +			dur->max = dur->min;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>   #define int_field(_STEP_, _FIELD_, _COND_, _ERR_) \
>   	if ((field = strtok_r(fstart, ".", &fctx))) { \
>   		tmp = atoi(field); \
> @@ -1127,38 +1161,11 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   		}
>   
>   		if ((field = strtok_r(fstart, ".", &fctx))) {
> -			char *sep = NULL;
> -			long int tmpl;
> -
>   			fstart = NULL;
>   
> -			if (field[0] == '*') {
> -				check_arg(intel_gen(intel_get_drm_devid(fd)) < 8,
> -					  "Infinite batch at step %u needs Gen8+!\n",
> -					  nr_steps);
> -				step.duration.unbound_duration = true;
> -			} else {
> -				tmpl = strtol(field, &sep, 10);
> -				check_arg(tmpl <= 0 || tmpl == LONG_MIN ||
> -					  tmpl == LONG_MAX,
> -					  "Invalid duration at step %u!\n",
> -					  nr_steps);
> -				step.duration.min = __duration(tmpl, scale_dur);
> -
> -				if (sep && *sep == '-') {
> -					tmpl = strtol(sep + 1, NULL, 10);
> -					check_arg(tmpl <= 0 ||
> -						__duration(tmpl, scale_dur) <= step.duration.min ||
> -						tmpl == LONG_MIN ||
> -						tmpl == LONG_MAX,
> -						"Invalid duration range at step %u!\n",
> -						nr_steps);
> -					step.duration.max = __duration(tmpl,
> -								       scale_dur);
> -				} else {
> -					step.duration.max = step.duration.min;
> -				}
> -			}
> +			tmp = parse_duration(nr_steps, &step.duration, scale_dur, field);
> +			check_arg(tmp < 0,
> +				  "Invalid duration at step %u!\n", nr_steps);
>   
>   			valid++;
>   		}


* Re: [igt-dev] [PATCH i-g-t 06/14] benchmarks/gem_wsim: fix conflicting SSEU #define and enum
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 06/14] benchmarks/gem_wsim: fix conflicting SSEU #define and enum Marcin Bernatowicz
@ 2023-09-26 10:51   ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2023-09-26 10:51 UTC
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 26/09/2023 09:44, Marcin Bernatowicz wrote:
> One SSEU is in enum w_step and then as #define SSEU (1 << 3).
> Fix this.
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index aeb959364..3b01340bf 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -238,7 +238,7 @@ static struct drm_i915_gem_context_param_sseu device_sseu = {
>   
>   #define SYNCEDCLIENTS	(1<<1)
>   #define DEPSYNC		(1<<2)
> -#define SSEU		(1<<3)
> +#define FLAG_SSEU	(1<<3)

Right, it worked with no side effects because enum SSEU is above the 
highest flag bit.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Prefix all flags with FLAG_ while at it?

Regards,

Tvrtko

>   
>   static const char *ring_str_map[NUM_ENGINES] = {
>   	[DEFAULT] = "DEFAULT",
> @@ -2597,7 +2597,7 @@ int main(int argc, char **argv)
>   			/* Fall through */
>   		case 'w':
>   			w_args = add_workload_arg(w_args, ++nr_w_args, optarg,
> -						  prio, flags & SSEU);
> +						  prio, flags & FLAG_SSEU);
>   			break;
>   		case 'p':
>   			prio = atoi(optarg);
> @@ -2626,7 +2626,7 @@ int main(int argc, char **argv)
>   			flags |= SYNCEDCLIENTS;
>   			break;
>   		case 's':
> -			flags ^= SSEU;
> +			flags ^= FLAG_SSEU;
>   			break;
>   		case 'd':
>   			flags |= DEPSYNC;


* Re: [igt-dev] [PATCH i-g-t 07/14] benchmarks/gem_wsim: cleanups
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 07/14] benchmarks/gem_wsim: cleanups Marcin Bernatowicz
@ 2023-09-26 11:08   ` Tvrtko Ursulin
  2023-09-27 19:03     ` Bernatowicz, Marcin
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2023-09-26 11:08 UTC
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 26/09/2023 09:44, Marcin Bernatowicz wrote:
> Cleaning checkpatch.pl reported warnings/errors.
> Removed unused fence_signal field from struct w_step.
> calloc vs malloc in parse_workload for struct workload.
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c | 56 ++++++++++++++++++++++++++-----------------
>   1 file changed, 34 insertions(+), 22 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 3b01340bf..daa20fb8a 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -1,3 +1,4 @@
> +// SPDX-License-Identifier: MIT
>   /*
>    * Copyright © 2017 Intel Corporation
>    *
> @@ -76,8 +77,7 @@ struct duration {
>   	bool unbound_duration;
>   };
>   
> -enum w_type
> -{
> +enum w_type {
>   	BATCH,
>   	SYNC,
>   	DELAY,
> @@ -102,8 +102,7 @@ struct dep_entry {
>   	int working_set; /* -1 = step dependecy, >= 0 working set id */
>   };
>   
> -struct deps
> -{
> +struct deps {
>   	int nr;
>   	bool submit_fence;
>   	struct dep_entry *list;
> @@ -137,8 +136,7 @@ struct working_set {
>   
>   struct workload;
>   
> -struct w_step
> -{
> +struct w_step {
>   	struct workload *wrk;
>   
>   	/* Workload step metadata */
> @@ -155,7 +153,6 @@ struct w_step
>   		int period;
>   		int target;
>   		int throttle;
> -		int fence_signal;
>   		int priority;
>   		struct {
>   			unsigned int engine_map_count;
> @@ -194,8 +191,7 @@ struct ctx {
>   	uint64_t sseu;
>   };
>   
> -struct workload
> -{
> +struct workload {
>   	unsigned int id;
>   
>   	unsigned int nr_steps;
> @@ -807,6 +803,7 @@ static int add_buffers(struct working_set *set, char *str)
>   
>   	for (i = 0; i < add; i++) {
>   		struct work_buffer_size *sz = &sizes[set->nr + i];
> +
>   		sz->min = min_sz;
>   		sz->max = max_sz;
>   		sz->size = 0;
> @@ -895,13 +892,16 @@ parse_duration(unsigned int nr_steps, struct duration *dur, double scale_dur, ch
>   }
>   
>   #define int_field(_STEP_, _FIELD_, _COND_, _ERR_) \
> -	if ((field = strtok_r(fstart, ".", &fctx))) { \
> -		tmp = atoi(field); \
> -		check_arg(_COND_, _ERR_, nr_steps); \
> -		step.type = _STEP_; \
> -		step._FIELD_ = tmp; \
> -		goto add_step; \
> -	} \
> +	do { \
> +		field = strtok_r(fstart, ".", &fctx); \
> +		if (field) { \
> +			tmp = atoi(field); \
> +			check_arg(_COND_, _ERR_, nr_steps); \
> +			step.type = _STEP_; \
> +			step._FIELD_ = tmp; \
> +			goto add_step; \
> +		} \
> +	} while (0)
>   
>   static struct workload *
>   parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
> @@ -926,7 +926,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   		valid = 0;
>   		memset(&step, 0, sizeof(step));
>   
> -		if ((field = strtok_r(fstart, ".", &fctx))) {
> +		field = strtok_r(fstart, ".", &fctx);
> +		if (field) {
>   			fstart = NULL;
>   
>   			if (!strcmp(field, "d")) {
> @@ -943,6 +944,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   				}
>   			} else if (!strcmp(field, "P")) {
>   				unsigned int nr = 0;
> +
>   				while ((field = strtok_r(fstart, ".", &fctx))) {
>   					tmp = atoi(field);
>   					check_arg(nr == 0 && tmp <= 0,
> @@ -968,6 +970,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   					  "Invalid sync target at step %u!\n");
>   			} else if (!strcmp(field, "S")) {
>   				unsigned int nr = 0;
> +
>   				while ((field = strtok_r(fstart, ".", &fctx))) {
>   					tmp = atoi(field);
>   					check_arg(tmp <= 0 && nr == 0,
> @@ -1004,6 +1007,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   				goto add_step;
>   			} else if (!strcmp(field, "M")) {
>   				unsigned int nr = 0;
> +
>   				while ((field = strtok_r(fstart, ".", &fctx))) {
>   					tmp = atoi(field);
>   					check_arg(nr == 0 && tmp <= 0,
> @@ -1034,6 +1038,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   					  "Invalid terminate target at step %u!\n");
>   			} else if (!strcmp(field, "X")) {
>   				unsigned int nr = 0;
> +
>   				while ((field = strtok_r(fstart, ".", &fctx))) {
>   					tmp = atoi(field);
>   					check_arg(nr == 0 && tmp <= 0,
> @@ -1058,6 +1063,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   				goto add_step;
>   			} else if (!strcmp(field, "B")) {
>   				unsigned int nr = 0;
> +
>   				while ((field = strtok_r(fstart, ".", &fctx))) {
>   					tmp = atoi(field);
>   					check_arg(nr == 0 && tmp <= 0,
> @@ -1077,6 +1083,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   				goto add_step;
>   			} else if (!strcmp(field, "b")) {
>   				unsigned int nr = 0;
> +
>   				while ((field = strtok_r(fstart, ".", &fctx))) {
>   					check_arg(nr > 2,
>   						  "Invalid bond format at step %u!\n",
> @@ -1148,7 +1155,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   			valid++;
>   		}
>   
> -		if ((field = strtok_r(fstart, ".", &fctx))) {
> +		field = strtok_r(fstart, ".", &fctx);
> +		if (field) {
>   			fstart = NULL;
>   
>   			i = str_to_engine(field);
> @@ -1160,7 +1168,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   			step.engine = i;
>   		}
>   
> -		if ((field = strtok_r(fstart, ".", &fctx))) {
> +		field = strtok_r(fstart, ".", &fctx);
> +		if (field) {
>   			fstart = NULL;
>   
>   			tmp = parse_duration(nr_steps, &step.duration, scale_dur, field);
> @@ -1170,7 +1179,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   			valid++;
>   		}
>   
> -		if ((field = strtok_r(fstart, ".", &fctx))) {
> +		field = strtok_r(fstart, ".", &fctx);
> +		if (field) {
>   			fstart = NULL;
>   
>   			tmp = parse_dependencies(nr_steps, &step, field);
> @@ -1180,7 +1190,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   			valid++;
>   		}
>   
> -		if ((field = strtok_r(fstart, ".", &fctx))) {
> +		field = strtok_r(fstart, ".", &fctx);
> +		if (field) {
>   			fstart = NULL;
>   
>   			check_arg(strlen(field) != 1 ||
> @@ -1224,7 +1235,7 @@ add_step:
>   		nr_steps += app_w->nr_steps;
>   	}
>   
> -	wrk = malloc(sizeof(*wrk));
> +	wrk = calloc(1, sizeof(*wrk));

The rest looks fine, but for this change I don't know what checkpatch has
against malloc or why calloc(1, ...) is better. Is that the kernel's
checkpatch.pl? I don't get that warning when I run it.

Regards,

Tvrtko

>   	igt_assert(wrk);
>   
>   	wrk->nr_steps = nr_steps;
> @@ -2717,6 +2728,7 @@ int main(int argc, char **argv)
>   
>   	if (append_workload_arg) {
>   		struct w_arg arg = { NULL, append_workload_arg, 0 };
> +
>   		app_w = parse_workload(&arg, flags, scale_dur, scale_time,
>   				       NULL);
>   		if (!app_w) {


* Re: [igt-dev] [PATCH i-g-t 08/14] benchmarks/gem_wsim: reposition repeat_start variable
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 08/14] benchmarks/gem_wsim: reposition repeat_start variable Marcin Bernatowicz
@ 2023-09-26 11:10   ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2023-09-26 11:10 UTC
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 26/09/2023 09:44, Marcin Bernatowicz wrote:
> No need for repeat_start in struct workload.
> It's now a variable in run_workload function scope.
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c | 8 +++-----
>   1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index daa20fb8a..2e6eb6388 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -209,8 +209,6 @@ struct workload {
>   	uint32_t bb_prng;
>   	uint32_t bo_prng;
>   
> -	struct timespec repeat_start;
> -
>   	unsigned int nr_ctxs;
>   	struct ctx *ctx_list;
>   
> @@ -2283,7 +2281,7 @@ static void sync_deps(struct workload *wrk, struct w_step *w)
>   static void *run_workload(void *data)
>   {
>   	struct workload *wrk = (struct workload *)data;
> -	struct timespec t_start, t_end;
> +	struct timespec t_start, t_end, repeat_start;
>   	struct w_step *w;
>   	int throttle = -1;
>   	int qd_throttle = -1;
> @@ -2297,7 +2295,7 @@ static void *run_workload(void *data)
>   	     count++) {
>   		unsigned int cur_seqno = wrk->sync_seqno;
>   
> -		clock_gettime(CLOCK_MONOTONIC, &wrk->repeat_start);
> +		clock_gettime(CLOCK_MONOTONIC, &repeat_start);
>   
>   		for (i = 0, w = wrk->steps; wrk->run && (i < wrk->nr_steps);
>   		     i++, w++) {
> @@ -2311,7 +2309,7 @@ static void *run_workload(void *data)
>   				int elapsed;
>   
>   				clock_gettime(CLOCK_MONOTONIC, &now);
> -				elapsed = elapsed_us(&wrk->repeat_start, &now);
> +				elapsed = elapsed_us(&repeat_start, &now);
>   				do_sleep = w->period - elapsed;
>   				time_tot += elapsed;
>   				if (elapsed < time_min)

Looks like an innocent cleanup indeed.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko


* Re: [igt-dev] [PATCH i-g-t 09/14] benchmarks/gem_wsim: use lib code to query engines
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 09/14] benchmarks/gem_wsim: use lib code to query engines Marcin Bernatowicz
@ 2023-09-26 11:23   ` Tvrtko Ursulin
  2023-09-27 19:09     ` Bernatowicz, Marcin
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2023-09-26 11:23 UTC
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 26/09/2023 09:44, Marcin Bernatowicz wrote:
> Use code in lib/i915/gem_engine_topology to query engines.
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c | 157 +++++-------------------------------------
>   1 file changed, 19 insertions(+), 138 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 2e6eb6388..a3339e1b2 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -456,150 +456,31 @@ static int str_to_engine(const char *str)
>   	return -1;
>   }
>   
> -static bool __engines_queried;
> -static unsigned int __num_engines;
> -static struct i915_engine_class_instance *__engines;
> -
> -static int
> -__i915_query(int i915, struct drm_i915_query *q)
> +static struct intel_engine_data *query_engines(void)
>   {
> -	if (igt_ioctl(i915, DRM_IOCTL_I915_QUERY, q))
> -		return -errno;
> -	return 0;
> -}
> +	static struct intel_engine_data engines = {};
>   
> -static int
> -__i915_query_items(int i915, struct drm_i915_query_item *items, uint32_t n_items)
> -{
> -	struct drm_i915_query q = {
> -		.num_items = n_items,
> -		.items_ptr = to_user_pointer(items),
> -	};
> -	return __i915_query(i915, &q);
> -}
> +	if (engines.nengines)
> +		return &engines;
>   
> -static void
> -i915_query_items(int i915, struct drm_i915_query_item *items, uint32_t n_items)
> -{
> -	igt_assert_eq(__i915_query_items(i915, items, n_items), 0);
> -}
> -
> -static bool has_engine_query(int i915)
> -{
> -	struct drm_i915_query_item item = {
> -		.query_id = DRM_I915_QUERY_ENGINE_INFO,
> -	};
> -
> -	return __i915_query_items(i915, &item, 1) == 0 && item.length > 0;
> -}
> -
> -static void query_engines(void)
> -{
> -	struct i915_engine_class_instance *engines;
> -	unsigned int num;
> -
> -	if (__engines_queried)
> -		return;
> -
> -	__engines_queried = true;
> -
> -	if (!has_engine_query(fd)) {
> -		unsigned int num_bsd = gem_has_bsd(fd) + gem_has_bsd2(fd);
> -		unsigned int i = 0;
> -
> -		igt_assert(num_bsd);
> -
> -		num = 1 + num_bsd;
> -
> -		if (gem_has_blt(fd))
> -			num++;
> -
> -		if (gem_has_vebox(fd))
> -			num++;
> -
> -		engines = calloc(num,
> -				 sizeof(struct i915_engine_class_instance));
> -		igt_assert(engines);
> -
> -		engines[i].engine_class = I915_ENGINE_CLASS_RENDER;
> -		engines[i].engine_instance = 0;
> -		i++;
> -
> -		if (gem_has_blt(fd)) {
> -			engines[i].engine_class = I915_ENGINE_CLASS_COPY;
> -			engines[i].engine_instance = 0;
> -			i++;
> -		}
> -
> -		if (gem_has_bsd(fd)) {
> -			engines[i].engine_class = I915_ENGINE_CLASS_VIDEO;
> -			engines[i].engine_instance = 0;
> -			i++;
> -		}
> -
> -		if (gem_has_bsd2(fd)) {
> -			engines[i].engine_class = I915_ENGINE_CLASS_VIDEO;
> -			engines[i].engine_instance = 1;
> -			i++;
> -		}
> -
> -		if (gem_has_vebox(fd)) {
> -			engines[i].engine_class =
> -				I915_ENGINE_CLASS_VIDEO_ENHANCE;
> -			engines[i].engine_instance = 0;
> -			i++;
> -		}
> -	} else {
> -		struct drm_i915_query_engine_info *engine_info;
> -		struct drm_i915_query_item item = {
> -			.query_id = DRM_I915_QUERY_ENGINE_INFO,
> -		};
> -		const unsigned int sz = 4096;
> -		unsigned int i;
> -
> -		engine_info = malloc(sz);
> -		igt_assert(engine_info);
> -		memset(engine_info, 0, sz);
> -
> -		item.data_ptr = to_user_pointer(engine_info);
> -		item.length = sz;
> -
> -		i915_query_items(fd, &item, 1);
> -		igt_assert(item.length > 0);
> -		igt_assert(item.length <= sz);
> -
> -		num = engine_info->num_engines;
> -
> -		engines = calloc(num,
> -				 sizeof(struct i915_engine_class_instance));
> -		igt_assert(engines);
> -
> -		for (i = 0; i < num; i++) {
> -			struct drm_i915_engine_info *engine =
> -				(struct drm_i915_engine_info *)&engine_info->engines[i];
> -
> -			engines[i] = engine->engine;
> -		}
> -	}
> -
> -	__engines = engines;
> -	__num_engines = num;
> +	engines = intel_engine_list_of_physical(fd);
> +	igt_assert(engines.nengines);
> +	return &engines;
>   }
>   
>   static unsigned int num_engines_in_class(enum intel_engine_id class)
>   {
> -	unsigned int i, count = 0;
> +	const struct intel_engine_data *engines = query_engines();
> +	unsigned int count = 0;
> +	int i;
>   
>   	igt_assert(class == VCS);
>   
> -	query_engines();
> -
> -	for (i = 0; i < __num_engines; i++) {
> -		if (__engines[i].engine_class == I915_ENGINE_CLASS_VIDEO)
> +	for (i = 0; i < engines->nengines; i++) {

nengines is uint32_t, so it's probably best to keep i unsigned.

> +		if (engines->engines[i].class == I915_ENGINE_CLASS_VIDEO)
>   			count++;
>   	}
>   
> -	igt_assert(count);

Why drop this?

Regards,

Tvrtko

>   	return count;
>   }
>   
> @@ -607,16 +488,15 @@ static void
>   fill_engines_id_class(enum intel_engine_id *list,
>   		      enum intel_engine_id class)
>   {
> +	const struct intel_engine_data *engines = query_engines();
>   	enum intel_engine_id engine = VCS1;
>   	unsigned int i, j = 0;
>   
>   	igt_assert(class == VCS);
>   	igt_assert(num_engines_in_class(VCS) <= 2);
>   
> -	query_engines();
> -
> -	for (i = 0; i < __num_engines; i++) {
> -		if (__engines[i].engine_class != I915_ENGINE_CLASS_VIDEO)
> +	for (i = 0; i < engines->nengines; i++) {
> +		if (engines->engines[i].class != I915_ENGINE_CLASS_VIDEO)
>   			continue;
>   
>   		list[j++] = engine++;
> @@ -626,17 +506,18 @@ fill_engines_id_class(enum intel_engine_id *list,
>   static unsigned int
>   find_physical_instance(enum intel_engine_id class, unsigned int logical)
>   {
> +	const struct intel_engine_data *engines = query_engines();
>   	unsigned int i, j = 0;
>   
>   	igt_assert(class == VCS);
>   
> -	for (i = 0; i < __num_engines; i++) {
> -		if (__engines[i].engine_class != I915_ENGINE_CLASS_VIDEO)
> +	for (i = 0; i < engines->nengines; i++) {
> +		if (engines->engines[i].class != I915_ENGINE_CLASS_VIDEO)
>   			continue;
>   
>   		/* Map logical to physical instances. */
>   		if (logical == j++)
> -			return __engines[i].engine_instance;
> +			return engines->engines[i].instance;
>   	}
>   
>   	igt_assert(0);


* Re: [igt-dev] [PATCH i-g-t 10/14] benchmarks/gem_wsim: allow comments in workload description files
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 10/14] benchmarks/gem_wsim: allow comments in workload description files Marcin Bernatowicz
@ 2023-09-26 11:33   ` Tvrtko Ursulin
  2023-09-26 11:48     ` Bernatowicz, Marcin
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2023-09-26 11:33 UTC
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 26/09/2023 09:44, Marcin Bernatowicz wrote:
> Lines starting with '#' are skipped.
> If command line step separator (',') is encountered after '#'
> it is replaced with ';' to not break parsing.
> 
> v2: SKIP step type is not needed (Tvrtko)
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c  | 19 ++++++++++++++++++-
>   benchmarks/wsim/README |  2 ++
>   2 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index a3339e1b2..0222c6c71 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -43,6 +43,7 @@
>   #include <limits.h>
>   #include <pthread.h>
>   #include <math.h>
> +#include <ctype.h>
>   
>   #include "drm.h"
>   #include "drmtest.h"
> @@ -809,6 +810,14 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   		if (field) {
>   			fstart = NULL;
>   
> +			/* line starting with # is a comment */
> +			if (field[0] == '#') {
> +				if (verbose > 3)
> +					printf("skipped line: %s\n", _token);
> +				free(token);
> +				continue;
> +			}
> +
>   			if (!strcmp(field, "d")) {
>   				int_field(DELAY, delay, tmp <= 0,
>   					  "Invalid delay at step %u!\n");
> @@ -1073,7 +1082,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   		if (field) {
>   			fstart = NULL;
>   
> -			check_arg(strlen(field) != 1 ||
> +			check_arg(!strlen(field) ||
> +				  (strlen(field) > 1 && !isspace(field[1]) && field[1] != '#') ||

Help me out here, please: why is this needed for this specific field?

>   				  (field[0] != '0' && field[0] != '1'),
>   				  "Invalid wait boolean at step %u!\n",
>   				  nr_steps);
> @@ -2422,6 +2432,13 @@ static char *load_workload_descriptor(char *filename)
>   	close(infd);
>   
>   	for (i = 0; i < len; i++) {
> +		/* '#' starts comment till end of line */
> +		if (buf[i] == '#')
> +			/* replace ',' in comments to not break parsing */
> +			while (++i < len && buf[i] != '\n')
> +				if (buf[i] == ',')
> +					buf[i] = ';';
> +
>   		if (buf[i] == '\n')
>   			buf[i] = ',';
>   	}
> diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
> index 8c71f2fe6..e4fd61645 100644
> --- a/benchmarks/wsim/README
> +++ b/benchmarks/wsim/README
> @@ -1,6 +1,8 @@
>   Workload descriptor format
>   ==========================
>   
> +Lines starting with '#' are treated as comments (do not create work step).

Maybe it reads better as "..as comments and will not create a work step."

Regards,

Tvrtko

> +
>   ctx.engine.duration_us.dependency.wait,...
>   <uint>.<str>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
>   B.<uint>


* Re: [igt-dev] [PATCH i-g-t 11/14] benchmarks/gem_wsim: introduce w_step_sync function
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 11/14] benchmarks/gem_wsim: introduce w_step_sync function Marcin Bernatowicz
@ 2023-09-26 11:37   ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2023-09-26 11:37 UTC
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 26/09/2023 09:44, Marcin Bernatowicz wrote:
> Added w_step_sync function for workload step synchronization.
> Change will allow cleaner xe integration.
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c | 17 +++++++++++------
>   1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 0222c6c71..2c6ccd3a9 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -245,6 +245,11 @@ static const char *ring_str_map[NUM_ENGINES] = {
>   	[VECS] = "VECS",
>   };
>   
> +static void w_step_sync(struct w_step *w)
> +{
> +	gem_sync(fd, w->obj[0].handle);
> +}
> +
>   static int read_timestamp_frequency(int i915)
>   {
>   	int value = 0;
> @@ -2106,7 +2111,7 @@ static void w_sync_to(struct workload *wrk, struct w_step *w, int target)
>   	igt_assert(target < wrk->nr_steps);
>   	igt_assert(wrk->steps[target].type == BATCH);
>   
> -	gem_sync(fd, wrk->steps[target].obj[0].handle);
> +	w_step_sync(&wrk->steps[target]);
>   }
>   
>   static void
> @@ -2165,7 +2170,7 @@ static void sync_deps(struct workload *wrk, struct w_step *w)
>   		igt_assert(dep_idx >= 0 && dep_idx < w->idx);
>   		igt_assert(wrk->steps[dep_idx].type == BATCH);
>   
> -		gem_sync(fd, wrk->steps[dep_idx].obj[0].handle);
> +		w_step_sync(&wrk->steps[dep_idx]);
>   	}
>   }
>   
> @@ -2219,7 +2224,7 @@ static void *run_workload(void *data)
>   
>   				igt_assert(s_idx >= 0 && s_idx < i);
>   				igt_assert(wrk->steps[s_idx].type == BATCH);
> -				gem_sync(fd, wrk->steps[s_idx].obj[0].handle);
> +				w_step_sync(&wrk->steps[s_idx]);
>   				continue;
>   			} else if (w->type == THROTTLE) {
>   				throttle = w->throttle;
> @@ -2310,7 +2315,7 @@ static void *run_workload(void *data)
>   				break;
>   
>   			if (w->sync)
> -				gem_sync(fd, w->obj[0].handle);
> +				w_step_sync(w);
>   
>   			if (qd_throttle > 0) {
>   				while (wrk->nrequest[engine] > qd_throttle) {
> @@ -2319,7 +2324,7 @@ static void *run_workload(void *data)
>   					s = igt_list_first_entry(&wrk->requests[engine],
>   								 s, rq_link);
>   
> -					gem_sync(fd, s->obj[0].handle);
> +						w_step_sync(s);

Indentation looks broken here.

Otherwise the cleanup looks good, so with this fixed:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

>   
>   					s->request = -1;
>   					igt_list_del(&s->rq_link);
> @@ -2351,7 +2356,7 @@ static void *run_workload(void *data)
>   			continue;
>   
>   		w = igt_list_last_entry(&wrk->requests[i], w, rq_link);
> -		gem_sync(fd, w->obj[0].handle);
> +		w_step_sync(w);
>   	}
>   
>   	clock_gettime(CLOCK_MONOTONIC, &t_end);


* Re: [igt-dev] [PATCH i-g-t 12/14] benchmarks/gem_wsim: extract prepare contexts code to new function
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 12/14] benchmarks/gem_wsim: extract prepare contexts code to new function Marcin Bernatowicz
@ 2023-09-26 11:43   ` Tvrtko Ursulin
  2023-09-26 11:58     ` Bernatowicz, Marcin
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2023-09-26 11:43 UTC
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 26/09/2023 09:44, Marcin Bernatowicz wrote:
> No functional changes.
> Extracted prepare_contexts function from prepare_workload.
> Small code cleanup for "No need for 'else' after continue/break". (Kamil)
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c | 30 ++++++++++++++++++++----------
>   1 file changed, 20 insertions(+), 10 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 2c6ccd3a9..55f8d9b1b 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -1766,20 +1766,13 @@ static void measure_active_set(struct workload *wrk)
>   
>   #define alloca0(sz) ({ size_t sz__ = (sz); memset(alloca(sz__), 0, sz__); })
>   
> -static int prepare_workload(unsigned int id, struct workload *wrk)
> +static int prepare_contexts(unsigned int id, struct workload *wrk)
>   {
> -	struct working_set **sets;
> -	unsigned long total = 0;
>   	uint32_t share_vm = 0;
>   	int max_ctx = -1;
>   	struct w_step *w;
>   	int i, j;
>   
> -	wrk->id = id;
> -	wrk->bb_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
> -	wrk->bo_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
> -	wrk->run = true;
> -
>   	/*
>   	 * Pre-scan workload steps to allocate context list storage.
>   	 */
> @@ -1968,6 +1961,23 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
>   	if (share_vm)
>   		vm_destroy(fd, share_vm);
>   
> +	return 0;
> +}
> +
> +static int prepare_workload(unsigned int id, struct workload *wrk)
> +{
> +	struct working_set **sets;
> +	unsigned long total = 0;
> +	struct w_step *w;
> +	int i, j;
> +
> +	wrk->id = id;
> +	wrk->bb_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
> +	wrk->bo_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
> +	wrk->run = true;
> +
> +	prepare_contexts(id, wrk);
> +
>   	/* Record default preemption. */
>   	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
>   		if (w->type == BATCH)
> @@ -1990,9 +2000,9 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
>   
>   			if (w2->context != w->context)
>   				continue;
> -			else if (w2->type == PREEMPTION)
> +			if (w2->type == PREEMPTION)
>   				break;
> -			else if (w2->type != BATCH)
> +			if (w2->type != BATCH)

To me it was more readable the way it was. Is there some general rule
that 'else if' should not be used in such cases?

Either way, I'd be happiest if extracting code into functions wasn't
mixed with unrelated changes.

Otherwise code extraction looks fine.

Regards,

Tvrtko

>   				continue;
>   
>   			w2->preempt_us = w->period;


* Re: [igt-dev] [PATCH i-g-t 13/14] benchmarks/gem_wsim: extract prepare working sets code to new function
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 13/14] benchmarks/gem_wsim: extract prepare working sets " Marcin Bernatowicz
@ 2023-09-26 11:46   ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2023-09-26 11:46 UTC (permalink / raw)
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 26/09/2023 09:44, Marcin Bernatowicz wrote:
> No functional changes.
> Extracted prepare_working_sets function from prepare_workload.
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c | 106 +++++++++++++++++++++++-------------------
>   1 file changed, 58 insertions(+), 48 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 55f8d9b1b..7703ca822 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -1964,10 +1964,66 @@ static int prepare_contexts(unsigned int id, struct workload *wrk)
>   	return 0;
>   }
>   
> -static int prepare_workload(unsigned int id, struct workload *wrk)
> +static int prepare_working_sets(unsigned int id, struct workload *wrk)

Can it return void? Come to think of it, the previous patch does not use
the return value either.

Otherwise code movement looks 1:1.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

>   {
>   	struct working_set **sets;
>   	unsigned long total = 0;
> +	struct w_step *w;
> +	int i;
> +
> +	/*
> +	 * Allocate working sets.
> +	 */
> +	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
> +		if (w->type == WORKINGSET && !w->working_set.shared)
> +			total += allocate_working_set(wrk, &w->working_set);
> +	}
> +
> +	if (verbose > 2)
> +		printf("%u: %lu bytes in working sets.\n", wrk->id, total);
> +
> +	/*
> +	 * Map of working set ids.
> +	 */
> +	wrk->max_working_set_id = -1;
> +	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
> +		if (w->type == WORKINGSET &&
> +		    w->working_set.id > wrk->max_working_set_id)
> +			wrk->max_working_set_id = w->working_set.id;
> +	}
> +
> +	sets = wrk->working_sets;
> +	wrk->working_sets = calloc(wrk->max_working_set_id + 1,
> +				   sizeof(*wrk->working_sets));
> +	igt_assert(wrk->working_sets);
> +
> +	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
> +		struct working_set *set;
> +
> +		if (w->type != WORKINGSET)
> +			continue;
> +
> +		if (!w->working_set.shared) {
> +			set = &w->working_set;
> +		} else {
> +			igt_assert(sets);
> +
> +			set = sets[w->working_set.id];
> +			igt_assert(set->shared);
> +			igt_assert(set->sizes);
> +		}
> +
> +		wrk->working_sets[w->working_set.id] = set;
> +	}
> +
> +	if (sets)
> +		free(sets);
> +
> +	return 0;
> +}
> +
> +static int prepare_workload(unsigned int id, struct workload *wrk)
> +{
>   	struct w_step *w;
>   	int i, j;
>   
> @@ -2019,53 +2075,7 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
>   		}
>   	}
>   
> -	/*
> -	 * Allocate working sets.
> -	 */
> -	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
> -		if (w->type == WORKINGSET && !w->working_set.shared)
> -			total += allocate_working_set(wrk, &w->working_set);
> -	}
> -
> -	if (verbose > 2)
> -		printf("%u: %lu bytes in working sets.\n", wrk->id, total);
> -
> -	/*
> -	 * Map of working set ids.
> -	 */
> -	wrk->max_working_set_id = -1;
> -	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
> -		if (w->type == WORKINGSET &&
> -		    w->working_set.id > wrk->max_working_set_id)
> -			wrk->max_working_set_id = w->working_set.id;
> -	}
> -
> -	sets = wrk->working_sets;
> -	wrk->working_sets = calloc(wrk->max_working_set_id + 1,
> -				   sizeof(*wrk->working_sets));
> -	igt_assert(wrk->working_sets);
> -
> -	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
> -		struct working_set *set;
> -
> -		if (w->type != WORKINGSET)
> -			continue;
> -
> -		if (!w->working_set.shared) {
> -			set = &w->working_set;
> -		} else {
> -			igt_assert(sets);
> -
> -			set = sets[w->working_set.id];
> -			igt_assert(set->shared);
> -			igt_assert(set->sizes);
> -		}
> -
> -		wrk->working_sets[w->working_set.id] = set;
> -	}
> -
> -	if (sets)
> -		free(sets);
> +	prepare_working_sets(id, wrk);
>   
>   	/*
>   	 * Allocate batch buffers.


* Re: [igt-dev] [PATCH i-g-t 10/14] benchmarks/gem_wsim: allow comments in workload description files
  2023-09-26 11:33   ` Tvrtko Ursulin
@ 2023-09-26 11:48     ` Bernatowicz, Marcin
  2023-09-26 12:10       ` Tvrtko Ursulin
  0 siblings, 1 reply; 39+ messages in thread
From: Bernatowicz, Marcin @ 2023-09-26 11:48 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: chris.p.wilson

Hi

On 9/26/2023 1:33 PM, Tvrtko Ursulin wrote:
> 
> On 26/09/2023 09:44, Marcin Bernatowicz wrote:
>> Lines starting with '#' are skipped.
>> If command line step separator (',') is encountered after '#'
>> it is replaced with ';' to not break parsing.
>>
>> v2: SKIP step type is not needed (Tvrtko)
>>
>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>> ---
>>   benchmarks/gem_wsim.c  | 19 ++++++++++++++++++-
>>   benchmarks/wsim/README |  2 ++
>>   2 files changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>> index a3339e1b2..0222c6c71 100644
>> --- a/benchmarks/gem_wsim.c
>> +++ b/benchmarks/gem_wsim.c
>> @@ -43,6 +43,7 @@
>>   #include <limits.h>
>>   #include <pthread.h>
>>   #include <math.h>
>> +#include <ctype.h>
>>   #include "drm.h"
>>   #include "drmtest.h"
>> @@ -809,6 +810,14 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>           if (field) {
>>               fstart = NULL;
>> +            /* line starting with # is a comment */
>> +            if (field[0] == '#') {
>> +                if (verbose > 3)
>> +                    printf("skipped line: %s\n", _token);
>> +                free(token);
>> +                continue;
>> +            }
>> +
>>               if (!strcmp(field, "d")) {
>>                   int_field(DELAY, delay, tmp <= 0,
>>                         "Invalid delay at step %u!\n");
>> @@ -1073,7 +1082,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>           if (field) {
>>               fstart = NULL;
>> -            check_arg(strlen(field) != 1 ||
>> +            check_arg(!strlen(field) ||
>> +                  (strlen(field) > 1 && !isspace(field[1]) && field[1] != '#') ||
> 
> Help me out here please - why is this needed for this specific field?

I loosened the condition a bit to allow a comment on the same line as a
BATCH step. It may look a bit weird, but it allows:

1.RCS.100.0.1 # a comment
1.RCS.100.0.1# a comment
1.RCS.100.0.1 this is also accepted, but I didn't want to add more loops :/


> 
>>                     (field[0] != '0' && field[0] != '1'),
>>                     "Invalid wait boolean at step %u!\n",
>>                     nr_steps);
>> @@ -2422,6 +2432,13 @@ static char *load_workload_descriptor(char *filename)
>>       close(infd);
>>       for (i = 0; i < len; i++) {
>> +        /* '#' starts comment till end of line */
>> +        if (buf[i] == '#')
>> +            /* replace ',' in comments to not break parsing */
>> +            while (++i < len && buf[i] != '\n')
>> +                if (buf[i] == ',')
>> +                    buf[i] = ';';
>> +
>>           if (buf[i] == '\n')
>>               buf[i] = ',';
>>       }
>> diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
>> index 8c71f2fe6..e4fd61645 100644
>> --- a/benchmarks/wsim/README
>> +++ b/benchmarks/wsim/README
>> @@ -1,6 +1,8 @@
>>   Workload descriptor format
>>   ==========================
>> +Lines starting with '#' are treated as comments (do not create work step).
> 
> Maybe reads better as "..as comments and will not create a work step."
> 
> Regards,
> 
> Tvrtko
> 
>> +
>>   ctx.engine.duration_us.dependency.wait,...
>>   <uint>.<str>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
>>   B.<uint>

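The preprocessing described in the commit message can be sketched as a standalone function. This is a minimal sketch: `preprocess_descriptor()` is a hypothetical name for the loop inside `load_workload_descriptor()`. It also adds an `i < len` check after the inner loop, because the quoted hunk reads `buf[i]` again when a comment runs to the end of the buffer. Whether that read is actually out of bounds depends on how `buf` is allocated, which is not shown here.

```c
#include <stddef.h>

/*
 * Hypothetical standalone version of the descriptor preprocessing:
 * '#' starts a comment that runs to the end of the line, any ',' inside
 * it becomes ';' so it cannot split steps, and every newline becomes
 * the ',' step separator.
 */
static void preprocess_descriptor(char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (buf[i] == '#') {
			/* Rewrite separators until the end of the comment line. */
			while (++i < len && buf[i] != '\n')
				if (buf[i] == ',')
					buf[i] = ';';
		}

		/* Guard: the comment may have ended at the end of the buffer. */
		if (i < len && buf[i] == '\n')
			buf[i] = ',';
	}
}
```

For example, `1.RCS.100.0.0\n# note, with comma\n1.VCS.200.0.0` becomes `1.RCS.100.0.0,# note; with comma,1.VCS.200.0.0`. The comment then arrives at the parser as a single step token starting with '#', which is skipped.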

* [igt-dev] ✗ Fi.CI.BUILD: failure for benchmarks/gem_wsim: added basic xe support (rev4)
  2023-09-26  8:44 [igt-dev] [PATCH i-g-t 00/14] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (15 preceding siblings ...)
  2023-09-26 10:11 ` [igt-dev] ✗ Fi.CI.BAT: failure " Patchwork
@ 2023-09-26 11:56 ` Patchwork
  16 siblings, 0 replies; 39+ messages in thread
From: Patchwork @ 2023-09-26 11:56 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: igt-dev

== Series Details ==

Series: benchmarks/gem_wsim: added basic xe support (rev4)
URL   : https://patchwork.freedesktop.org/series/122920/
State : failure

== Summary ==

Applying: lib/igt_device_scan: Xe get integrated/discrete card functions
Applying: benchmarks/gem_wsim: reposition the unbound duration boolean
Applying: benchmarks/gem_wsim: fix scaling of period steps
Using index info to reconstruct a base tree...
M	benchmarks/gem_wsim.c
Patch failed at 0003 benchmarks/gem_wsim: fix scaling of period steps
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



* Re: [igt-dev] [PATCH i-g-t 12/14] benchmarks/gem_wsim: extract prepare contexts code to new function
  2023-09-26 11:43   ` Tvrtko Ursulin
@ 2023-09-26 11:58     ` Bernatowicz, Marcin
  0 siblings, 0 replies; 39+ messages in thread
From: Bernatowicz, Marcin @ 2023-09-26 11:58 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: chris.p.wilson

Hi,

On 9/26/2023 1:43 PM, Tvrtko Ursulin wrote:
> 
> On 26/09/2023 09:44, Marcin Bernatowicz wrote:
>> No functional changes.
>> Extracted prepare_contexts function from prepare_workload.
>> Small code cleanup for "No need for 'else' after continue/break". (Kamil)
>>
>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>> ---
>>   benchmarks/gem_wsim.c | 30 ++++++++++++++++++++----------
>>   1 file changed, 20 insertions(+), 10 deletions(-)
>>
>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>> index 2c6ccd3a9..55f8d9b1b 100644
>> --- a/benchmarks/gem_wsim.c
>> +++ b/benchmarks/gem_wsim.c
>> @@ -1766,20 +1766,13 @@ static void measure_active_set(struct workload *wrk)
>>   #define alloca0(sz) ({ size_t sz__ = (sz); memset(alloca(sz__), 0, sz__); })
>> -static int prepare_workload(unsigned int id, struct workload *wrk)
>> +static int prepare_contexts(unsigned int id, struct workload *wrk)
>>   {
>> -    struct working_set **sets;
>> -    unsigned long total = 0;
>>       uint32_t share_vm = 0;
>>       int max_ctx = -1;
>>       struct w_step *w;
>>       int i, j;
>> -    wrk->id = id;
>> -    wrk->bb_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
>> -    wrk->bo_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
>> -    wrk->run = true;
>> -
>>       /*
>>        * Pre-scan workload steps to allocate context list storage.
>>        */
>> @@ -1968,6 +1961,23 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
>>       if (share_vm)
>>           vm_destroy(fd, share_vm);
>> +    return 0;
>> +}
>> +
>> +static int prepare_workload(unsigned int id, struct workload *wrk)
>> +{
>> +    struct working_set **sets;
>> +    unsigned long total = 0;
>> +    struct w_step *w;
>> +    int i, j;
>> +
>> +    wrk->id = id;
>> +    wrk->bb_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
>> +    wrk->bo_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
>> +    wrk->run = true;
>> +
>> +    prepare_contexts(id, wrk);
>> +
>>       /* Record default preemption. */
>>       for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
>>           if (w->type == BATCH)
>> @@ -1990,9 +2000,9 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
>>               if (w2->context != w->context)
>>                   continue;
>> -            else if (w2->type == PREEMPTION)
>> +            if (w2->type == PREEMPTION)
>>                   break;
>> -            else if (w2->type != BATCH)
>> +            if (w2->type != BATCH)
> 
> To me it is more readable like it was. Is this some general rule that 
> else if should not be used in such cases?

I'm OK with both. Kamil raised this, so I changed it,
but I don't recall any warnings/errors about it.

Kamil, can we live with the original version?

> 
> In either case I'd be happiest if extracting code into functions wasn't
> mixed with unrelated changes.
> 
> Otherwise code extraction looks fine.
> 
> Regards,
> 
> Tvrtko
> 
>>                   continue;
>>               w2->preempt_us = w->period;


* Re: [igt-dev] [PATCH i-g-t 10/14] benchmarks/gem_wsim: allow comments in workload description files
  2023-09-26 11:48     ` Bernatowicz, Marcin
@ 2023-09-26 12:10       ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2023-09-26 12:10 UTC (permalink / raw)
  To: Bernatowicz, Marcin, igt-dev; +Cc: chris.p.wilson


On 26/09/2023 12:48, Bernatowicz, Marcin wrote:
> Hi
> 
> On 9/26/2023 1:33 PM, Tvrtko Ursulin wrote:
>>
>> On 26/09/2023 09:44, Marcin Bernatowicz wrote:
>>> Lines starting with '#' are skipped.
>>> If command line step separator (',') is encountered after '#'
>>> it is replaced with ';' to not break parsing.
>>>
>>> v2: SKIP step type is not needed (Tvrtko)
>>>
>>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>>> ---
>>>   benchmarks/gem_wsim.c  | 19 ++++++++++++++++++-
>>>   benchmarks/wsim/README |  2 ++
>>>   2 files changed, 20 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>>> index a3339e1b2..0222c6c71 100644
>>> --- a/benchmarks/gem_wsim.c
>>> +++ b/benchmarks/gem_wsim.c
>>> @@ -43,6 +43,7 @@
>>>   #include <limits.h>
>>>   #include <pthread.h>
>>>   #include <math.h>
>>> +#include <ctype.h>
>>>   #include "drm.h"
>>>   #include "drmtest.h"
>>> @@ -809,6 +810,14 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>>           if (field) {
>>>               fstart = NULL;
>>> +            /* line starting with # is a comment */
>>> +            if (field[0] == '#') {
>>> +                if (verbose > 3)
>>> +                    printf("skipped line: %s\n", _token);
>>> +                free(token);
>>> +                continue;
>>> +            }
>>> +
>>>               if (!strcmp(field, "d")) {
>>>                   int_field(DELAY, delay, tmp <= 0,
>>>                         "Invalid delay at step %u!\n");
>>> @@ -1073,7 +1082,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>>           if (field) {
>>>               fstart = NULL;
>>> -            check_arg(strlen(field) != 1 ||
>>> +            check_arg(!strlen(field) ||
>>> +                  (strlen(field) > 1 && !isspace(field[1]) && field[1] != '#') ||
>>
>> Help me out here please - why is this needed for this specific field?
> 
> I loosened the condition a bit to allow a comment on the same line as a
> BATCH step. It may look a bit weird, but it allows:
> 
> 1.RCS.100.0.1 # a comment
> 1.RCS.100.0.1# a comment
> 1.RCS.100.0.1 this is also accepted, but I didn't want to add more loops :/

Right, but only on batch steps? It's probably then worth supporting 
trailing comments with any command/line. Either that or drop this hunk 
I'd say.

Regards,

Tvrtko

> 
> 
>>
>>>                     (field[0] != '0' && field[0] != '1'),
>>>                     "Invalid wait boolean at step %u!\n",
>>>                     nr_steps);
>>> @@ -2422,6 +2432,13 @@ static char *load_workload_descriptor(char *filename)
>>>       close(infd);
>>>       for (i = 0; i < len; i++) {
>>> +        /* '#' starts comment till end of line */
>>> +        if (buf[i] == '#')
>>> +            /* replace ',' in comments to not break parsing */
>>> +            while (++i < len && buf[i] != '\n')
>>> +                if (buf[i] == ',')
>>> +                    buf[i] = ';';
>>> +
>>>           if (buf[i] == '\n')
>>>               buf[i] = ',';
>>>       }
>>> diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
>>> index 8c71f2fe6..e4fd61645 100644
>>> --- a/benchmarks/wsim/README
>>> +++ b/benchmarks/wsim/README
>>> @@ -1,6 +1,8 @@
>>>   Workload descriptor format
>>>   ==========================
>>> +Lines starting with '#' are treated as comments (do not create work step).
>>
>> Maybe reads better as "..as comments and will not create a work step."
>>
>> Regards,
>>
>> Tvrtko
>>
>>> +
>>>   ctx.engine.duration_us.dependency.wait,...
>>>   <uint>.<str>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
>>>   B.<uint>

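One way to support trailing comments on any line is to strip them from each step token before any field is parsed, rather than special-casing the BATCH wait field. This is a minimal sketch. `strip_trailing_comment()` is a hypothetical helper, not part of gem_wsim, and it assumes the caller skips a token once nothing is left after stripping.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: cut a step token at the first '#' and drop the
 * whitespace before it. Returns true if anything is left to parse.
 */
static bool strip_trailing_comment(char *token)
{
	char *hash = strchr(token, '#');
	size_t len;

	if (hash)
		*hash = '\0';

	len = strlen(token);
	while (len && isspace((unsigned char)token[len - 1]))
		token[--len] = '\0';

	return len > 0;
}
```

With this approach, the third example (a step followed by free text without '#') would no longer be accepted. Only text after '#' is discarded, and the remainder still goes through normal field validation.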

* Re: [igt-dev] [PATCH i-g-t 14/14] benchmarks/gem_wsim: added basic xe support
  2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 14/14] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
@ 2023-09-26 13:10   ` Tvrtko Ursulin
  2023-09-26 18:52     ` Bernatowicz, Marcin
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2023-09-26 13:10 UTC (permalink / raw)
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 26/09/2023 09:44, Marcin Bernatowicz wrote:
> Added basic xe support. Single binary handles both i915 and Xe devices.
> 
> Some functionality is still missing: working sets, bonding.
> 
> The tool is handy for scheduling tests, we find it useful to verify vGPU
> profiles defining different execution quantum/preemption timeout
> settings.
> 
> There is also some rationale for the tool in the following thread:
> https://lore.kernel.org/dri-devel/a443495f-5d1b-52e1-9b2f-80167deb6d57@linux.intel.com/
> 
> With this patch it should be possible to run the following on an xe device:
> 
> gem_wsim -w benchmarks/wsim/media_load_balance_fhd26u7.wsim -c 36 -r 600
> 
> Best with drm debug logs disabled:
> 
> echo 0 > /sys/module/drm/parameters/debug
> 
> v2: minimizing divergence - same workload syntax for both drivers,
>      so most existing examples should run on xe unmodified (Tvrtko)

Awesome!

>      This version creates one common VM per workload.
>      Explicit VM management, compute mode will come in next patchset.

I think this is going quite well, and it looks promising that we will end
up with something clean.

The only thing I feel needs to be said ahead of time is that I am not 
convinced we should be merging any xe specific changes until xe arrives 
upstream.

But much good progress with refactoring, cleanup and review can still be 
made.

> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c  | 515 ++++++++++++++++++++++++++++++++++++++---
>   benchmarks/wsim/README |   6 +-
>   2 files changed, 485 insertions(+), 36 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 7703ca822..c83ed4882 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -62,6 +62,12 @@
>   #include "i915/gem_engine_topology.h"
>   #include "i915/gem_mman.h"
>   
> +#include "igt_syncobj.h"
> +#include "intel_allocator.h"
> +#include "xe_drm.h"
> +#include "xe/xe_ioctl.h"
> +#include "xe/xe_spin.h"
> +
>   enum intel_engine_id {
>   	DEFAULT,
>   	RCS,
> @@ -109,6 +115,10 @@ struct deps {
>   	struct dep_entry *list;
>   };
>   
> +#define for_each_dep(__dep, __deps) \
> +	for (int __i = 0; __i < __deps.nr && \
> +	     (__dep = &__deps.list[__i]); ++__i)

Could you make use of this macro outside xe too? Like in do_eb()? If so 
you could extract it and merge ahead of time.

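To illustrate reusing the macro outside the xe path, here is a standalone version with minimal stand-ins for `struct dep_entry` and `struct deps`, keeping only the fields the example needs. It parenthesizes the macro arguments, a small change from the quoted version, so that an expression such as `*deps` also works. `count_valid_deps()` is a hypothetical caller that only demonstrates the iteration.

```c
#include <stddef.h>

/* Minimal stand-ins; only the fields used here. */
struct dep_entry {
	int target; /* relative step index: -1 is the previous step */
};

struct deps {
	int nr;
	struct dep_entry *list;
};

/* Same loop as the patch, with the macro arguments parenthesized. */
#define for_each_dep(__dep, __deps) \
	for (int __i = 0; __i < (__deps).nr && \
	     ((__dep) = &(__deps).list[__i]); ++__i)

/* Hypothetical caller: count deps that point at an earlier, existing step. */
static int count_valid_deps(const struct deps *deps, int step_idx)
{
	const struct dep_entry *dep;
	int valid = 0;

	for_each_dep(dep, *deps) {
		int idx = step_idx + dep->target;

		if (idx >= 0 && idx < step_idx)
			valid++;
	}

	return valid;
}
```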
> +
>   struct w_arg {
>   	char *filename;
>   	char *desc;
> @@ -177,10 +187,30 @@ struct w_step {
>   	struct drm_i915_gem_execbuffer2 eb;
>   	struct drm_i915_gem_exec_object2 *obj;
>   	struct drm_i915_gem_relocation_entry reloc[3];
> +
> +	struct drm_xe_exec exec;
> +	size_t bb_size;
> +	struct xe_spin *spin;
> +	struct drm_xe_sync *syncs;

Let's think about how to create backend-specific containers here and make
them a union, so it is clear what gets used in each case and that they are
mutually exclusive. It may end up requiring separate refactoring patch(es)
which move the i915 bits into the i915 union/namespace.

I know gem_wsim has not had much focus on clean design, but now that a
second backend is coming I think it matters much more, for ease of
maintenance in the future.

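Here is a minimal sketch of one possible layout. It assumes the per-backend fields are grouped into named structs inside an anonymous union, which requires C11. The stub types stand in for `struct drm_i915_gem_execbuffer2` and `struct drm_xe_exec`. All field and helper names are hypothetical, and the xe member follows the xe prefix convention requested in this review.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the real uapi structs; fields are illustrative only. */
struct i915_eb_stub {
	uint64_t batch_offset;
	uint32_t buffer_count;
};

struct xe_exec_stub {
	uint64_t address;
	uint32_t num_syncs;
};

enum wsim_backend { BACKEND_I915, BACKEND_XE };

struct w_step_sketch {
	/* Fields used by both backends stay outside the union. */
	uint32_t bb_handle;

	/* Exactly one of these is live, depending on the device. */
	union {
		struct {
			struct i915_eb_stub eb;
			void *obj;
		} i915;
		struct {
			struct xe_exec_stub exec;
			size_t bb_size;
			void *syncs;
		} xe;
	};
};

/* Hypothetical accessor that picks the active backend's field. */
static uint64_t step_batch_address(const struct w_step_sketch *w,
				   enum wsim_backend backend)
{
	return backend == BACKEND_XE ? w->xe.exec.address
				     : w->i915.eb.batch_offset;
}
```

With this layout, code that touches `w->xe` on an i915 device (or the reverse) stands out in review, and the union overlaps the storage of the two backend sets.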
> +
>   	uint32_t bb_handle;
>   	uint32_t *bb_duration;
>   };
>   
> +struct vm {

Everything xe specific please prefix with xe_. Structs, functions, etc.

> +	uint32_t id;
> +	bool compute_mode;
> +	uint64_t ahnd;
> +};
> +
> +struct exec_queue {
> +	uint32_t id;
> +	struct drm_xe_engine_class_instance hwe;
> +	/* for qd_throttle */
> +	unsigned int nrequest;
> +	struct igt_list_head requests;
> +};
> +
>   struct ctx {
>   	uint32_t id;
>   	int priority;
> @@ -190,6 +220,10 @@ struct ctx {
>   	struct bond *bonds;
>   	bool load_balance;
>   	uint64_t sseu;
> +	/* reference to vm */
> +	struct vm *vm;
> +	/* queue for each class */
> +	struct exec_queue queues[NUM_ENGINES];
>   };
>   
>   struct workload {
> @@ -213,7 +247,10 @@ struct workload {
>   	unsigned int nr_ctxs;
>   	struct ctx *ctx_list;
>   
> -	struct working_set **working_sets; /* array indexed by set id */

Comment got lost.

> +	unsigned int nr_vms;
> +	struct vm *vm_list;
> +
> +	struct working_set **working_sets;
>   	int max_working_set_id;
>   
>   	int sync_timeline;
> @@ -223,6 +260,18 @@ struct workload {
>   	unsigned int nrequest[NUM_ENGINES];
>   };
>   
> +#define for_each_ctx(__ctx, __wrk) \
> +	for (int __i = 0; __i < (__wrk)->nr_ctxs && \
> +	     (__ctx = &(__wrk)->ctx_list[__i]); ++__i)
> +
> +#define for_each_exec_queue(__eq, __ctx) \
> +		for (int __j = 0; __j < NUM_ENGINES && ((__eq) = &((__ctx)->queues[__j])); ++__j) \
> +			for_if((__eq)->id > 0)
> +
> +#define for_each_vm(__vm, __wrk) \
> +	for (int __i = 0; __i < (__wrk)->nr_vms && \
> +	     (__vm = &(__wrk)->vm_list[__i]); ++__i)
> +
>   static unsigned int master_prng;
>   
>   static int verbose = 1;
> @@ -231,6 +280,8 @@ static struct drm_i915_gem_context_param_sseu device_sseu = {
>   	.slice_mask = -1 /* Force read on first use. */
>   };
>   
> +static bool is_xe;

Put it next to global 'int fd'.

> +
>   #define SYNCEDCLIENTS	(1<<1)
>   #define DEPSYNC		(1<<2)
>   #define FLAG_SSEU	(1<<3)
> @@ -247,7 +298,10 @@ static const char *ring_str_map[NUM_ENGINES] = {
>   
>   static void w_step_sync(struct w_step *w)
>   {
> -	gem_sync(fd, w->obj[0].handle);
> +	if (is_xe)
> +		igt_assert(syncobj_wait(fd, &w->syncs[0].handle, 1, INT64_MAX, 0, NULL));
> +	else
> +		gem_sync(fd, w->obj[0].handle);
>   }
>   
>   static int read_timestamp_frequency(int i915)
> @@ -351,15 +405,23 @@ parse_dependency(unsigned int nr_steps, struct w_step *w, char *str)
>   		if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
>   			return -1;
>   
> -		add_dep(&w->data_deps, entry);
> +		/* only fence deps in xe, let f-1 <==> -1 */
> +		if (is_xe)
> +			add_dep(&w->fence_deps, entry);
> +		else
> +			add_dep(&w->data_deps, entry);
>   
>   		break;
>   	case 's':
> -		submit_fence = true;
> +		/* no submit fence in xe ? */
> +		if (!is_xe)
> +			submit_fence = true;
>   		/* Fall-through. */
>   	case 'f':
> -		/* Multiple fences not yet supported. */
> -		igt_assert_eq(w->fence_deps.nr, 0);
> +		/* xe supports multiple fences */
> +		if (!is_xe)
> +			/* Multiple fences not yet supported. */
> +			igt_assert_eq(w->fence_deps.nr, 0);
>   
>   		entry.target = atoi(++str);
>   		if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
> @@ -469,7 +531,17 @@ static struct intel_engine_data *query_engines(void)
>   	if (engines.nengines)
>   		return &engines;
>   
> -	engines = intel_engine_list_of_physical(fd);
> +	if (is_xe) {
> +		struct drm_xe_engine_class_instance *hwe;
> +
> +		xe_for_each_hw_engine(fd, hwe) {
> +			engines.engines[engines.nengines].class = hwe->engine_class;
> +			engines.engines[engines.nengines].instance = hwe->engine_instance;
> +			engines.nengines++;
> +		}
> +	} else
> +		engines = intel_engine_list_of_physical(fd);
> +
>   	igt_assert(engines.nengines);
>   	return &engines;
>   }
> @@ -562,6 +634,40 @@ get_engine(enum intel_engine_id engine)
>   	return ci;
>   }
>   
> +static struct drm_xe_engine_class_instance
> +get_xe_engine(enum intel_engine_id engine)
> +{
> +	struct drm_xe_engine_class_instance ci;
> +
> +	switch (engine) {
> +	case DEFAULT:
> +	case RCS:
> +		ci.engine_class = DRM_XE_ENGINE_CLASS_RENDER;
> +		ci.engine_instance = 0;
> +		break;
> +	case BCS:
> +		ci.engine_class = DRM_XE_ENGINE_CLASS_COPY;
> +		ci.engine_instance = 0;
> +		break;
> +	case VCS1:
> +		ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_DECODE;
> +		ci.engine_instance = 0;
> +		break;
> +	case VCS2:
> +		ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_DECODE;
> +		ci.engine_instance = 1;
> +		break;
> +	case VECS:
> +		ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE;
> +		ci.engine_instance = 0;
> +		break;
> +	default:
> +		igt_assert(0);
> +	};
> +
> +	return ci;
> +}
> +
>   static int parse_engine_map(struct w_step *step, const char *_str)
>   {
>   	char *token, *tctx = NULL, *tstart = (char *)_str;
> @@ -838,6 +944,13 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   			} else if (!strcmp(field, "P")) {
>   				unsigned int nr = 0;
>   
> +				if (is_xe) {
> +					if (verbose > 3)
> +						printf("skipped line: %s\n", _token);

There are no priorities at all in xe?

I'd make this a warning level print during parsing, that is printed if 
verbose > 0.

> +					free(token);
> +					continue;
> +				}
> +
>   				while ((field = strtok_r(fstart, ".", &fctx))) {
>   					tmp = atoi(field);
>   					check_arg(nr == 0 && tmp <= 0,
> @@ -864,6 +977,13 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   			} else if (!strcmp(field, "S")) {
>   				unsigned int nr = 0;
>   
> +				if (is_xe) {
> +					if (verbose > 3)
> +						printf("skipped line: %s\n", _token);

This one is probably best handled by failing parsing with a user-friendly message.

> +					free(token);
> +					continue;
> +				}
> +
>   				while ((field = strtok_r(fstart, ".", &fctx))) {
>   					tmp = atoi(field);
>   					check_arg(tmp <= 0 && nr == 0,
> @@ -977,6 +1097,12 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   			} else if (!strcmp(field, "b")) {
>   				unsigned int nr = 0;
>   
> +				if (is_xe) {
> +					if (verbose > 3)
> +						printf("skipped line: %s\n", _token);

Ditto.

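Here is a minimal sketch of the two behaviours suggested in these comments. Unsupported priority steps are ignored with a warning shown at the default verbosity, and SSEU and bond steps are rejected with a readable error. `xe_unsupported_step()` and its return convention are hypothetical, `verbose` is a local stand-in for the gem_wsim global, and the tokens in the usage below are just sample strings.

```c
#include <stdio.h>

static int verbose = 1; /* stand-in for the gem_wsim global */

/*
 * Hypothetical helper for steps xe does not support.
 * Returns 0 to skip the step, -1 to abort parsing.
 */
static int xe_unsupported_step(char cmd, unsigned int step, const char *token)
{
	if (cmd == 'P') {
		/* Priorities: drop the step, but tell the user by default. */
		if (verbose > 0)
			fprintf(stderr,
				"Warning: priority step %u ignored on xe: %s\n",
				step, token);
		return 0;
	}

	/* SSEU ('S') and bonds ('b'): fail with a readable message. */
	fprintf(stderr, "Step %u ('%s') is not supported on xe!\n", step, token);
	return -1;
}
```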
> +					free(token);
> +					continue;
> +				}
>   				while ((field = strtok_r(fstart, ".", &fctx))) {
>   					check_arg(nr > 2,
>   						  "Invalid bond format at step %u!\n",
> @@ -1041,7 +1167,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   			}
>   
>   			tmp = atoi(field);
> -			check_arg(tmp < 0, "Invalid ctx id at step %u!\n",
> +			check_arg(tmp <= 0, "Invalid context id at step %u!\n",

If context id 0 (e.g. '0.RCS.1000.0.0') works today, please make this a
separate patch which adds a new restriction. If it doesn't work, then
still make it a separate patch which fixes the validation bug.

>   				  nr_steps);
>   			step.context = tmp;
>   
> @@ -1054,7 +1180,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   
>   			i = str_to_engine(field);
>   			check_arg(i < 0,
> -				  "Invalid engine id at step %u!\n", nr_steps);
> +				"Invalid engine id at step %u!\n", nr_steps);

Noise which breaks indentation. :)

>   
>   			valid++;
>   
> @@ -1288,6 +1414,20 @@ __get_ctx(struct workload *wrk, const struct w_step *w)
>   	return &wrk->ctx_list[w->context];
>   }
>   
> +static struct exec_queue *
> +get_eq(struct workload *wrk, const struct w_step *w)
> +{
> +	igt_assert(w->engine >= 0 && w->engine < NUM_ENGINES);
> +
> +	return &__get_ctx(wrk, w)->queues[w->engine];
> +}
> +
> +static struct vm *
> +get_vm(struct workload *wrk, const struct w_step *w)
> +{
> +	return wrk->vm_list;
> +}
> +
>   static uint32_t mmio_base(int i915, enum intel_engine_id engine, int gen)
>   {
>   	const char *name;
> @@ -1540,6 +1680,61 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
>   #endif
>   }
>   
> +static void
> +xe_alloc_step_batch(struct workload *wrk, struct w_step *w)
> +{
> +	struct vm *vm = get_vm(wrk, w);
> +	struct exec_queue *eq = get_eq(wrk, w);
> +	struct dep_entry *dep;
> +	int i;
> +
> +	w->bb_size = ALIGN(sizeof(*w->spin) + xe_cs_prefetch_size(fd),
> +			   xe_get_default_alignment(fd));
> +	w->bb_handle = xe_bo_create(fd, 0, vm->id, w->bb_size);
> +	w->spin = xe_bo_map(fd, w->bb_handle, w->bb_size);
> +	w->exec.address =
> +		intel_allocator_alloc_with_strategy(vm->ahnd, w->bb_handle, w->bb_size,
> +						    0, ALLOC_STRATEGY_LOW_TO_HIGH);
> +	xe_vm_bind_sync(fd, vm->id, w->bb_handle, 0, w->exec.address, w->bb_size);
> +	xe_spin_init_opts(w->spin, .addr = w->exec.address,
> +				   .preempt = (w->preempt_us > 0),
> +				   .ctx_ticks = duration_to_ctx_ticks(fd, eq->hwe.gt_id,
> +								1000LL * get_duration(wrk, w)));
> +	w->exec.exec_queue_id = eq->id;
> +	w->exec.num_batch_buffer = 1;
> +	/* always at least one out fence */
> +	w->exec.num_syncs = 1;
> +	/* count syncs */
> +	igt_assert_eq(0, w->data_deps.nr);
> +	for_each_dep(dep, w->fence_deps) {
> +		int dep_idx = w->idx + dep->target;
> +
> +		igt_assert(dep_idx >= 0 && dep_idx < w->idx);
> +		igt_assert(wrk->steps[dep_idx].type == SW_FENCE ||
> +			   wrk->steps[dep_idx].type == BATCH);
> +
> +		w->exec.num_syncs++;
> +	}
> +	w->syncs = calloc(w->exec.num_syncs, sizeof(*w->syncs));
> +	/* fill syncs */
> +	i = 0;
> +	/* out fence */
> +	w->syncs[i].handle = syncobj_create(fd, 0);
> +	w->syncs[i++].flags = DRM_XE_SYNC_SYNCOBJ | DRM_XE_SYNC_SIGNAL;
> +	/* in fence(s) */
> +	for_each_dep(dep, w->fence_deps) {
> +		int dep_idx = w->idx + dep->target;
> +
> +		igt_assert(wrk->steps[dep_idx].type == SW_FENCE ||
> +			   wrk->steps[dep_idx].type == BATCH);
> +		igt_assert(wrk->steps[dep_idx].syncs && wrk->steps[dep_idx].syncs[0].handle);
> +
> +		w->syncs[i].handle = wrk->steps[dep_idx].syncs[0].handle;
> +		w->syncs[i++].flags = DRM_XE_SYNC_SYNCOBJ;
> +	}
> +	w->exec.syncs = to_user_pointer(w->syncs);
> +}
> +
>   static bool set_priority(uint32_t ctx_id, int prio)
>   {
>   	struct drm_i915_gem_context_param param = {
> @@ -1766,6 +1961,61 @@ static void measure_active_set(struct workload *wrk)
>   
>   #define alloca0(sz) ({ size_t sz__ = (sz); memset(alloca(sz__), 0, sz__); })
>   
> +static void vm_create(struct vm *vm)
> +{
> +	uint32_t flags = 0;
> +
> +	if (vm->compute_mode)
> +		flags |= DRM_XE_VM_CREATE_ASYNC_BIND_OPS |
> +			 DRM_XE_VM_CREATE_COMPUTE_MODE;
> +
> +	vm->id = xe_vm_create(fd, flags, 0);
> +}
> +
> +static void exec_queue_create(struct ctx *ctx, struct exec_queue *eq)
> +{
> +	struct drm_xe_exec_queue_create create = {
> +		.vm_id = ctx->vm->id,
> +		.width = 1,
> +		.num_placements = 1,
> +		.instances = to_user_pointer(&eq->hwe),
> +	};
> +	struct drm_xe_engine_class_instance *eci = NULL;
> +
> +	if (ctx->load_balance && eq->hwe.engine_class == DRM_XE_ENGINE_CLASS_VIDEO_DECODE) {
> +		struct drm_xe_engine_class_instance *hwe;
> +		int i;
> +
> +		for (i = 0; i < ctx->engine_map_count; ++i)
> +			igt_assert(ctx->engine_map[i] == VCS || ctx->engine_map[i] == VCS1 ||
> +				   ctx->engine_map[i] == VCS2);
> +
> +		eci = calloc(16, sizeof(struct drm_xe_engine_class_instance));
> +		create.num_placements = 0;
> +		xe_for_each_hw_engine(fd, hwe) {
> +			if (hwe->engine_class != DRM_XE_ENGINE_CLASS_VIDEO_DECODE ||
> +			    hwe->gt_id != 0)
> +				continue;
> +
> +			igt_assert(create.num_placements < 16);
> +			eci[create.num_placements++] = *hwe;
> +		}
> +		igt_assert(create.num_placements);
> +		create.instances = to_user_pointer(eci);
> +
> +		if (verbose > 3)
> +			printf("num_placements=%d class=%d gt=%d\n", create.num_placements,
> +				eq->hwe.engine_class, eq->hwe.gt_id);
> +	}
> +
> +	igt_assert_eq(igt_ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &create), 0);
> +
> +	if (eci)
> +		free(eci);
> +
> +	eq->id = create.exec_queue_id;
> +}
> +
>   static int prepare_contexts(unsigned int id, struct workload *wrk)
>   {
>   	uint32_t share_vm = 0;
> @@ -1796,6 +2046,84 @@ static int prepare_contexts(unsigned int id, struct workload *wrk)
>   		max_ctx = ctx;
>   	}
>   
> +	if (is_xe) {

Shouldn't the i915 and xe parts be mutually exclusive? Or does the xe ctx
setup depend on some parts of the existing setup having run?

> +		int engine_classes[NUM_ENGINES] = {};
> +
> +		/* shortcut, create one vm */
> +		wrk->nr_vms = 1;
> +		wrk->vm_list = calloc(wrk->nr_vms, sizeof(struct vm));
> +		wrk->vm_list->compute_mode = false;
> +		vm_create(wrk->vm_list);
> +		wrk->vm_list->ahnd =
> +			intel_allocator_open(fd, wrk->vm_list->id, INTEL_ALLOCATOR_RELOC);
> +
> +		/* create exec queues of each referenced engine class */
> +		for (j = 0; j < wrk->nr_ctxs; j++) {
> +			struct ctx *ctx = &wrk->ctx_list[j];
> +
> +			/* link with vm */
> +			ctx->vm = wrk->vm_list;
> +
> +			for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
> +				if (w->context != j)
> +					continue;
> +
> +				if (w->type == ENGINE_MAP) {
> +					ctx->engine_map = w->engine_map;
> +					ctx->engine_map_count = w->engine_map_count;
> +				} else if (w->type == LOAD_BALANCE) {
> +					if (!ctx->engine_map) {
> +						wsim_err("Load balancing needs an engine map!\n");
> +						return 1;
> +					}
> +					if (intel_gen(intel_get_drm_devid(fd)) < 11) {
> +						wsim_err("Load balancing needs relative mmio support, gen11+!\n");
> +						return 1;
> +					}
> +					ctx->load_balance = w->load_balance;
> +				}
> +			}
> +
> +			/* create exec queue for each referenced engine */
> +			for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
> +				if (w->context != j)
> +					continue;
> +
> +				if (w->type == BATCH)
> +					engine_classes[w->engine]++;
> +			}
> +
> +			for (i = 0; i < NUM_ENGINES; i++) {
> +				if (engine_classes[i]) {
> +					if (verbose > 3)
> +						printf("%u ctx[%d] eq(%s) load_balance=%d\n",
> +							id, j, ring_str_map[i], ctx->load_balance);
> +					if (i == VCS) {
> +						ctx->queues[i].hwe.engine_class =
> +							get_xe_engine(VCS1).engine_class;
> +						ctx->queues[i].hwe.engine_instance = 1;
> +					} else
> +						ctx->queues[i].hwe = get_xe_engine(i);
> +					exec_queue_create(ctx, &ctx->queues[i]);
> +					/* init request list */
> +					IGT_INIT_LIST_HEAD(&ctx->queues[i].requests);
> +					ctx->queues[i].nrequest = 0;
> +				}
> +				engine_classes[i] = 0;
> +			}
> +		}
> +
> +		/* create syncobjs for SW_FENCE */
> +		for (j = 0, i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++)
> +			if (w->type == SW_FENCE) {
> +				w->syncs = calloc(1, sizeof(struct drm_xe_sync));
> +				w->syncs[0].handle = syncobj_create(fd, 0);
> +				w->syncs[0].flags = DRM_XE_SYNC_SYNCOBJ;
> +			}
> +
> +		return 0;
> +	}
> +
>   	/*
>   	 * Transfer over engine map configuration from the workload step.
>   	 */
> @@ -2075,7 +2403,8 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
>   		}
>   	}
>   
> -	prepare_working_sets(id, wrk);
> +	if (!is_xe)
> +		prepare_working_sets(id, wrk);

Let's make it error out during parsing, with a user-friendly message,
when working sets are used.
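For illustration, a minimal sketch of what such a parse-time check could look like. The helper xe_step_supported() and the trimmed step-type enum are hypothetical stand-ins, not code from the patch; the priority case follows the "warn if verbose > 0" suggestion made elsewhere in this review.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, trimmed stand-in for gem_wsim's step types. */
enum step_type { BATCH, SW_FENCE, WORKINGSET, BOND, SSEU, CTX_PRIORITY };

/*
 * Hypothetical parse-time check for the xe backend. Returns false (after
 * printing a user-friendly message) when the step cannot run on xe, so the
 * parser can bail out instead of silently skipping the step later.
 */
static bool xe_step_supported(enum step_type type, unsigned int step, int verbose)
{
	switch (type) {
	case WORKINGSET:
		fprintf(stderr, "Working sets are not supported on xe (step %u)!\n", step);
		return false;
	case BOND:
		fprintf(stderr, "Bonding is not supported on xe (step %u)!\n", step);
		return false;
	case SSEU:
		fprintf(stderr, "SSEU is not supported on xe (step %u)!\n", step);
		return false;
	case CTX_PRIORITY:
		/* Not fatal: warn and let parsing continue. */
		if (verbose > 0)
			fprintf(stderr, "Warning: context priority ignored on xe (step %u)\n", step);
		return true;
	default:
		return true;
	}
}
```

In parse_workload() the caller would then do something like "if (is_xe && !xe_step_supported(type, nr_steps, verbose)) return NULL;".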

>   
>   	/*
>   	 * Allocate batch buffers.
> @@ -2084,10 +2413,14 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
>   		if (w->type != BATCH)
>   			continue;
>   
> -		alloc_step_batch(wrk, w);
> +		if (is_xe)
> +			xe_alloc_step_batch(wrk, w);
> +		else
> +			alloc_step_batch(wrk, w);
>   	}
>   
> -	measure_active_set(wrk);
> +	if (!is_xe)
> +		measure_active_set(wrk);
>   
>   	return 0;
>   }
> @@ -2134,6 +2467,31 @@ static void w_sync_to(struct workload *wrk, struct w_step *w, int target)
>   	w_step_sync(&wrk->steps[target]);
>   }
>   
> +static void do_xe_exec(struct workload *wrk, struct w_step *w)
> +{
> +	struct exec_queue *eq = get_eq(wrk, w);
> +
> +	igt_assert(w->emit_fence <= 0);
> +	if (w->emit_fence == -1)
> +		syncobj_reset(fd, &w->syncs[0].handle, 1);
> +
> +	/* update duration if random */
> +	if (w->duration.max != w->duration.min)
> +		xe_spin_init_opts(w->spin, .addr = w->exec.address,
> +					   .preempt = (w->preempt_us > 0),
> +					   .ctx_ticks = duration_to_ctx_ticks(fd, eq->hwe.gt_id,
> +								1000LL * get_duration(wrk, w)));
> +	xe_exec(fd, &w->exec);
> +
> +	/* for qd_throttle */
> +	if (w->rq_link.prev != NULL || w->rq_link.next != NULL) {
> +		igt_list_del(&w->rq_link);
> +		eq->nrequest--;
> +	}
> +	igt_list_add_tail(&w->rq_link, &eq->requests);
> +	eq->nrequest++;
> +}
> +
>   static void
>   do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id engine)
>   {
> @@ -2258,6 +2616,10 @@ static void *run_workload(void *data)
>   					sw_sync_timeline_create_fence(wrk->sync_timeline,
>   								      cur_seqno + w->idx);
>   				igt_assert(w->emit_fence > 0);
> +				if (is_xe)
> +					/* Convert sync file to syncobj */
> +					syncobj_import_sync_file(fd, w->syncs[0].handle,
> +								 w->emit_fence);
>   				continue;
>   			} else if (w->type == SW_FENCE_SIGNAL) {
>   				int tgt = w->idx + w->target;
> @@ -2270,6 +2632,9 @@ static void *run_workload(void *data)
>   				sw_sync_timeline_inc(wrk->sync_timeline, inc);
>   				continue;
>   			} else if (w->type == CTX_PRIORITY) {
> +				if (is_xe)
> +					continue;
> +
>   				if (w->priority != wrk->ctx_list[w->context].priority) {
>   					struct drm_i915_gem_context_param param = {
>   						.ctx_id = wrk->ctx_list[w->context].id,
> @@ -2289,7 +2654,10 @@ static void *run_workload(void *data)
>   				igt_assert(wrk->steps[t_idx].type == BATCH);
>   				igt_assert(wrk->steps[t_idx].duration.unbound_duration);
>   
> -				*wrk->steps[t_idx].bb_duration = 0xffffffff;
> +				if (is_xe)
> +					xe_spin_end(wrk->steps[t_idx].spin);
> +				else
> +					*wrk->steps[t_idx].bb_duration = 0xffffffff;
>   				__sync_synchronize();
>   				continue;
>   			} else if (w->type == SSEU) {
> @@ -2321,15 +2689,19 @@ static void *run_workload(void *data)
>   			if (throttle > 0)
>   				w_sync_to(wrk, w, i - throttle);
>   
> -			do_eb(wrk, w, engine);
> +			if (is_xe)
> +				do_xe_exec(wrk, w);
> +			else {
> +				do_eb(wrk, w, engine);
>   
> -			if (w->request != -1) {
> -				igt_list_del(&w->rq_link);
> -				wrk->nrequest[w->request]--;
> +				if (w->request != -1) {
> +					igt_list_del(&w->rq_link);
> +					wrk->nrequest[w->request]--;
> +				}
> +				w->request = engine;
> +				igt_list_add_tail(&w->rq_link, &wrk->requests[engine]);
> +				wrk->nrequest[engine]++;

Is the rq list management the same here and in do_xe_exec? If so, please
consolidate it into a common wrapper, which can then branch off into the
i915- and xe-specific parts.
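A minimal sketch of such a wrapper, assuming each step remembers the queue it is currently tracked on. The tiny list implementation stands in for igt_list_head, and struct rq_queue / w_step_track_request() are hypothetical names; the backend-specific submit (do_eb() or do_xe_exec()) would happen before the call.

```c
#include <stddef.h>

/* Minimal circular doubly-linked list, standing in for igt_list_head. */
struct list_head {
	struct list_head *prev, *next;
};

static void list_init(struct list_head *h)
{
	h->prev = h->next = h;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical per-queue bookkeeping shared by both backends. */
struct rq_queue {
	struct list_head requests;
	unsigned int nrequest;
};

struct step {
	struct list_head rq_link;
	struct rq_queue *queue; /* queue the step is tracked on, or NULL */
};

/*
 * Hypothetical common wrapper: if the step is still tracked from a previous
 * iteration, unlink it from its old queue, then append it to the queue it
 * was just submitted to.
 */
static void w_step_track_request(struct step *w, struct rq_queue *q)
{
	if (w->queue) {
		list_del(&w->rq_link);
		w->queue->nrequest--;
	}
	list_add_tail(&w->rq_link, &q->requests);
	q->nrequest++;
	w->queue = q;
}
```

Storing a queue pointer replaces both the i915 "w->request = engine" index and the xe "rq_link.prev != NULL" test with a single check.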

>   			}
> -			w->request = engine;
> -			igt_list_add_tail(&w->rq_link, &wrk->requests[engine]);
> -			wrk->nrequest[engine]++;
>   
>   			if (!wrk->run)
>   				break;
> @@ -2338,17 +2710,32 @@ static void *run_workload(void *data)
>   				w_step_sync(w);
>   
>   			if (qd_throttle > 0) {
> -				while (wrk->nrequest[engine] > qd_throttle) {
> -					struct w_step *s;
> +				if (is_xe) {
> +					struct exec_queue *eq = get_eq(wrk, w);
> +
> +					while (eq->nrequest > qd_throttle) {
> +						struct w_step *s;
> +
> +						s = igt_list_first_entry(&eq->requests, s, rq_link);
> +
> +						w_step_sync(s);
>   
> -					s = igt_list_first_entry(&wrk->requests[engine],
> -								 s, rq_link);
> +						igt_list_del(&s->rq_link);
> +						eq->nrequest--;
> +					}
> +				} else {
> +					while (wrk->nrequest[engine] > qd_throttle) {
> +						struct w_step *s;
> +
> +						s = igt_list_first_entry(&wrk->requests[engine],
> +									s, rq_link);
>   
>   						w_step_sync(s);
>   
> -					s->request = -1;
> -					igt_list_del(&s->rq_link);
> -					wrk->nrequest[engine]--;
> +						s->request = -1;
> +						igt_list_del(&s->rq_link);
> +						wrk->nrequest[engine]--;
> +					}

Hm, okay, the throttling is very similar but not exactly the same.
What is the conceptual difference?
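Going by the quoted diff, the i915 path throttles on a per-engine list shared by all contexts (wrk->requests[engine]), while the xe path throttles on a per-exec-queue list, i.e. per context and engine (eq->requests). A sketch under that reading, with hypothetical names and a stand-in list, shows one throttle loop serving both if the caller passes the right queue:

```c
#include <stddef.h>

struct list_head {
	struct list_head *prev, *next;
};

static void list_init(struct list_head *h)
{
	h->prev = h->next = h;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

struct rq_queue {
	struct list_head requests;
	unsigned int nrequest;
};

struct step {
	struct list_head rq_link;
	struct rq_queue *queue;
	int synced;
};

#define step_of(link) \
	((struct step *)((char *)(link) - offsetof(struct step, rq_link)))

/* Stand-in for w_step_sync(); the real code waits on a bo or a syncobj. */
static void step_sync(struct step *s)
{
	s->synced = 1;
}

/*
 * Hypothetical backend-agnostic throttle: wait for the oldest requests until
 * at most qd_throttle remain in flight on this queue. Only the choice of
 * queue differs: per engine for i915, per exec queue for xe.
 */
static void throttle_queue(struct rq_queue *q, unsigned int qd_throttle)
{
	while (q->nrequest > qd_throttle) {
		struct step *s = step_of(q->requests.next);

		step_sync(s);
		list_del(&s->rq_link);
		s->queue = NULL;
		q->nrequest--;
	}
}
```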

>   				}
>   			}
>   		}
> @@ -2365,18 +2752,51 @@ static void *run_workload(void *data)
>   		for (i = 0, w = wrk->steps; wrk->run && (i < wrk->nr_steps);
>   		     i++, w++) {
>   			if (w->emit_fence > 0) {
> -				close(w->emit_fence);
> -				w->emit_fence = -1;
> +				if (is_xe) {
> +					igt_assert(w->type == SW_FENCE);
> +					close(w->emit_fence);
> +					w->emit_fence = -1;
> +					syncobj_reset(fd, &w->syncs[0].handle, 1);
> +				} else {
> +					close(w->emit_fence);
> +					w->emit_fence = -1;
> +				}
>   			}
>   		}
>   	}
>   
> -	for (i = 0; i < NUM_ENGINES; i++) {
> -		if (!wrk->nrequest[i])
> -			continue;
> +	if (is_xe) {
> +		struct exec_queue *eq;
> +		struct ctx *ctx;
>   
> -		w = igt_list_last_entry(&wrk->requests[i], w, rq_link);
> -		w_step_sync(w);
> +		for_each_ctx(ctx, wrk)
> +			for_each_exec_queue(eq, ctx)
> +				if (eq->nrequest) {
> +					w = igt_list_last_entry(&eq->requests, w, rq_link);
> +					w_step_sync(w);
> +				}
> +
> +		for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
> +			if (w->type == BATCH) {
> +				syncobj_destroy(fd, w->syncs[0].handle);
> +				free(w->syncs);
> +				xe_vm_unbind_sync(fd, get_vm(wrk, w)->id, 0, w->exec.address,
> +						  w->bb_size);
> +				gem_munmap(w->spin, w->bb_size);
> +				gem_close(fd, w->bb_handle);
> +			} else if (w->type == SW_FENCE) {
> +				syncobj_destroy(fd, w->syncs[0].handle);
> +				free(w->syncs);
> +			}
> +		}
> +	} else {
> +		for (i = 0; i < NUM_ENGINES; i++) {
> +			if (!wrk->nrequest[i])
> +				continue;
> +
> +			w = igt_list_last_entry(&wrk->requests[i], w, rq_link);
> +			w_step_sync(w);
> +		}
>   	}
>   
>   	clock_gettime(CLOCK_MONOTONIC, &t_end);
> @@ -2398,6 +2818,23 @@ static void *run_workload(void *data)
>   
>   static void fini_workload(struct workload *wrk)
>   {
> +	if (is_xe) {
> +		struct exec_queue *eq;
> +		struct ctx *ctx;
> +		struct vm *vm;
> +
> +		for_each_ctx(ctx, wrk)
> +			for_each_exec_queue(eq, ctx) {
> +				xe_exec_queue_destroy(fd, eq->id);
> +				eq->id = 0;
> +			}
> +		for_each_vm(vm, wrk) {
> +			put_ahnd(vm->ahnd);
> +			xe_vm_destroy(fd, vm->id);
> +		}
> +		free(wrk->vm_list);
> +		wrk->nr_vms = 0;
> +	}
>   	free(wrk->steps);
>   	free(wrk);
>   }
> @@ -2605,8 +3042,12 @@ int main(int argc, char **argv)
>   		ret = igt_device_find_first_i915_discrete_card(&card);
>   		if (!ret)
>   			ret = igt_device_find_integrated_card(&card);
> +		if (!ret)
> +			ret = igt_device_find_first_xe_discrete_card(&card);
> +		if (!ret)
> +			ret = igt_device_find_xe_integrated_card(&card);
>   		if (!ret) {
> -			wsim_err("No device filter specified and no i915 devices found!\n");
> +			wsim_err("No device filter specified and no intel devices found!\n");
>   			return EXIT_FAILURE;
>   		}
>   	}
> @@ -2629,6 +3070,10 @@ int main(int argc, char **argv)
>   	if (verbose > 1)
>   		printf("Using device %s\n", drm_dev);
>   
> +	is_xe = is_xe_device(fd);
> +	if (is_xe)
> +		xe_device_get(fd);

What does this do, out of curiosity? There is no put AFAICT.

> +
>   	if (!nr_w_args) {
>   		wsim_err("No workload descriptor(s)!\n");
>   		goto err;
> diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
> index e4fd61645..f49a73989 100644
> --- a/benchmarks/wsim/README
> +++ b/benchmarks/wsim/README
> @@ -88,6 +88,10 @@ Batch durations can also be specified as infinite by using the '*' in the
>   duration field. Such batches must be ended by the terminate command ('T')
>   otherwise they will cause a GPU hang to be reported.
>   
> +Note: On Xe, batch dependencies are expressed with sync objects,
> +so there is no difference between f-1 and -1,
> +e.g. 1.1000.-2.0 is the same as 1.1000.f-2.0.
> +

Maybe add a "chapter" describing the differences between i915 and xe,
for all the ones which may be relevant. This one in particular may not
have any practical effect, I don't know. Presumably mixing explicit fence
creation with implicit, simulated data dependencies all works fine?

Or maybe on top of that we will end up needing two chapters to list the 
commands only available for each backend.

>   Sync (fd) fences
>   ----------------
>   
> @@ -131,7 +135,7 @@ runnable. When the second RCS batch completes the standalone fence is signaled
>   which allows the two VCS batches to be executed. Finally we wait until the both
>   VCS batches have completed before starting the (optional) next iteration.
>   
> -Submit fences
> +Submit fences (i915 only?)

s/?//

>   -------------
>   
>   Submit fences are a type of input fence which are signalled when the originating

Regards,

Tvrtko


* Re: [igt-dev] [PATCH i-g-t 14/14] benchmarks/gem_wsim: added basic xe support
  2023-09-26 13:10   ` Tvrtko Ursulin
@ 2023-09-26 18:52     ` Bernatowicz, Marcin
  2023-09-27 13:17       ` Tvrtko Ursulin
  0 siblings, 1 reply; 39+ messages in thread
From: Bernatowicz, Marcin @ 2023-09-26 18:52 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: chris.p.wilson

Hi,

On 9/26/2023 3:10 PM, Tvrtko Ursulin wrote:
> 
> On 26/09/2023 09:44, Marcin Bernatowicz wrote:
>> Added basic xe support. Single binary handles both i915 and Xe devices.
>>
>> Some functionality is still missing: working sets, bonding.
>>
>> The tool is handy for scheduling tests, we find it useful to verify vGPU
>> profiles defining different execution quantum/preemption timeout
>> settings.
>>
>> There is also some rationale for the tool in following thread:
>> https://lore.kernel.org/dri-devel/a443495f-5d1b-52e1-9b2f-80167deb6d57@linux.intel.com/
>>
>> With this patch it should be possible to run following on xe device:
>>
>> gem_wsim -w benchmarks/wsim/media_load_balance_fhd26u7.wsim -c 36 -r 600
>>
>> Best with drm debug logs disabled:
>>
>> echo 0 > /sys/module/drm/parameters/debug
>>
>> v2: minimizing divergence - same workload syntax for both drivers,
>>      so most existing examples should run on xe unmodified (Tvrtko)
> 
> Awesome!
> 
>>      This version creates one common VM per workload.
>>      Explicit VM management, compute mode will come in next patchset.
> 
> I think this is going quite well and is looking promising we will end up 
> with something clean.
> 
> The only thing I feel needs to be said ahead of time is that I am not 
> convinced we should be merging any xe specific changes until xe arrives 
> upstream.

I will create a patchset "benchmarks/gem_wsim: fixes and improvements" 
with code not related to xe and then some xe specific ones in 
"benchmarks/gem_wsim: added basic xe support".

> 
> But much good progress with refactoring, cleanup and review can still be 
> made.
> 
I have two bigger refactors for parse_workload and run_workload (to split
both big loops). In short, I introduced w_step_xxx_parse and w_step_xxx_run
for each step type, plus w_step_parse and w_step_run functions called
from parse_workload and run_workload respectively. It may ease
maintenance a bit if one of the backends needs different handling.
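One way such a split could look, sketched with per-backend tables of per-step-type handlers. Only the name w_step_run comes from the message above; its signature, the handler names and the trimmed enum are hypothetical. A NULL slot doubles as "step not supported on this backend".

```c
#include <stdbool.h>
#include <stdio.h>

/* Trimmed step types; SSEU stands for a step the xe backend lacks. */
enum step_type { BATCH, DELAY, SSEU, NUM_STEP_TYPES };

struct step {
	enum step_type type;
	int ran_by; /* records which handler ran, for illustration only */
};

typedef int (*step_run_fn)(struct step *w);

static int w_step_batch_run_i915(struct step *w) { w->ran_by = 1; return 0; }
static int w_step_batch_run_xe(struct step *w) { w->ran_by = 2; return 0; }
static int w_step_common_run(struct step *w) { w->ran_by = 0; return 0; }

/* One handler table per backend; NULL means "not supported here". */
static const step_run_fn i915_run[NUM_STEP_TYPES] = {
	[BATCH] = w_step_batch_run_i915,
	[DELAY] = w_step_common_run,
	[SSEU]  = w_step_common_run,
};

static const step_run_fn xe_run[NUM_STEP_TYPES] = {
	[BATCH] = w_step_batch_run_xe,
	[DELAY] = w_step_common_run,
	/* [SSEU] intentionally left NULL */
};

/* Single entry point for the run loop. */
static int w_step_run(bool is_xe, struct step *w)
{
	const step_run_fn *tbl = is_xe ? xe_run : i915_run;

	if (!tbl[w->type]) {
		fprintf(stderr, "Step type %d not supported on %s\n",
			w->type, is_xe ? "xe" : "i915");
		return -1;
	}

	return tbl[w->type](w);
}
```

Steps shared by both backends point at the same handler, so only the genuinely divergent ones need two implementations.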

>>
>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>> ---
>>   benchmarks/gem_wsim.c  | 515 ++++++++++++++++++++++++++++++++++++++---
>>   benchmarks/wsim/README |   6 +-
>>   2 files changed, 485 insertions(+), 36 deletions(-)
>>
>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>> index 7703ca822..c83ed4882 100644
>> --- a/benchmarks/gem_wsim.c
>> +++ b/benchmarks/gem_wsim.c
>> @@ -62,6 +62,12 @@
>>   #include "i915/gem_engine_topology.h"
>>   #include "i915/gem_mman.h"
>> +#include "igt_syncobj.h"
>> +#include "intel_allocator.h"
>> +#include "xe_drm.h"
>> +#include "xe/xe_ioctl.h"
>> +#include "xe/xe_spin.h"
>> +
>>   enum intel_engine_id {
>>       DEFAULT,
>>       RCS,
>> @@ -109,6 +115,10 @@ struct deps {
>>       struct dep_entry *list;
>>   };
>> +#define for_each_dep(__dep, __deps) \
>> +    for (int __i = 0; __i < __deps.nr && \
>> +         (__dep = &__deps.list[__i]); ++__i)
> 
> Could you make use of this macro outside xe too? Like in do_eb()? If so 
> you could extract it and merge ahead of time.
> 
sure
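A small sketch of the macro used outside xe code, here resolving relative dependency targets to absolute step indices (the same "w->idx + dep->target" arithmetic both backends use). The structs are trimmed stand-ins. One assumed tweak over the quoted macro: __deps is parenthesized, because with the original form passing *deps would expand to *deps.nr and fail to compile.

```c
#include <stddef.h>

/* Trimmed stand-ins for gem_wsim's dependency structs. */
struct dep_entry {
	int target; /* relative step offset, e.g. -1 */
};

struct deps {
	int nr;
	struct dep_entry *list;
};

/* Same shape as the patch's macro, with __deps parenthesized. */
#define for_each_dep(__dep, __deps) \
	for (int __i = 0; __i < (__deps).nr && \
	     ((__dep) = &(__deps).list[__i]); ++__i)

/* Convert relative dependency targets into absolute step indices. */
static int resolve_deps(const struct deps *deps, int idx, int *out)
{
	const struct dep_entry *dep;
	int n = 0;

	for_each_dep(dep, *deps)
		out[n++] = idx + dep->target;

	return n;
}
```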
>> +
>>   struct w_arg {
>>       char *filename;
>>       char *desc;
>> @@ -177,10 +187,30 @@ struct w_step {
>>       struct drm_i915_gem_execbuffer2 eb;
>>       struct drm_i915_gem_exec_object2 *obj;
>>       struct drm_i915_gem_relocation_entry reloc[3];
>> +
>> +    struct drm_xe_exec exec;
>> +    size_t bb_size;
>> +    struct xe_spin *spin;
>> +    struct drm_xe_sync *syncs;
> 
> Let's think about how to create backend-specific containers here and make
> them a union, so it is clear what gets used in each case and that they are
> mutually exclusive. It may end up requiring separate refactoring
> patch(-es) which move the i915 bits into the i915 unions/namespace.

> 
> I know gem_wsim did not have much focus on clean design, but now
> that a 2nd backend is coming I think it is much more important for ease of
> maintenance in the future.

good point
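A rough sketch of the layout being suggested: shared fields stay in struct w_step, backend-specific ones move into an anonymous union of per-backend structs. The stub types only stand in for drm_i915_gem_execbuffer2 and drm_xe_exec and do not reflect their real layouts; which fields end up shared is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the uAPI structs gem_wsim embeds (not the real layouts). */
struct i915_eb_stub { uint64_t buffers_ptr; uint32_t buffer_count; };
struct xe_exec_stub { uint64_t address; uint32_t exec_queue_id; };

struct w_step {
	int type;            /* fields used by both backends stay outside */
	uint32_t bb_handle;

	union {              /* backend-specific state; only one is live */
		struct {
			struct i915_eb_stub eb;
			uint32_t *bb_duration;
		} i915;
		struct {
			struct xe_exec_stub exec;
			size_t bb_size;
			void *spin;
		} xe;
	};
};
```

Accesses then read as w->i915.eb or w->xe.exec, which makes an accidental cross-backend use stand out in review.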
> 
>> +
>>       uint32_t bb_handle;
>>       uint32_t *bb_duration;
>>   };
>> +struct vm {
> 
> Please prefix everything xe-specific with xe_: structs, functions, etc.
> 
ok
>> +    uint32_t id;
>> +    bool compute_mode;
>> +    uint64_t ahnd;
>> +};
>> +
>> +struct exec_queue {
>> +    uint32_t id;
>> +    struct drm_xe_engine_class_instance hwe;
>> +    /* for qd_throttle */
>> +    unsigned int nrequest;
>> +    struct igt_list_head requests;
>> +};
>> +
>>   struct ctx {
>>       uint32_t id;
>>       int priority;
>> @@ -190,6 +220,10 @@ struct ctx {
>>       struct bond *bonds;
>>       bool load_balance;
>>       uint64_t sseu;
>> +    /* reference to vm */
>> +    struct vm *vm;
>> +    /* queue for each class */
>> +    struct exec_queue queues[NUM_ENGINES];
>>   };
>>   struct workload {
>> @@ -213,7 +247,10 @@ struct workload {
>>       unsigned int nr_ctxs;
>>       struct ctx *ctx_list;
>> -    struct working_set **working_sets; /* array indexed by set id */
> 
> Comment got lost.
> 
oops
>> +    unsigned int nr_vms;
>> +    struct vm *vm_list;
>> +
>> +    struct working_set **working_sets;
>>       int max_working_set_id;
>>       int sync_timeline;
>> @@ -223,6 +260,18 @@ struct workload {
>>       unsigned int nrequest[NUM_ENGINES];
>>   };
>> +#define for_each_ctx(__ctx, __wrk) \
>> +    for (int __i = 0; __i < (__wrk)->nr_ctxs && \
>> +         (__ctx = &(__wrk)->ctx_list[__i]); ++__i)
>> +
>> +#define for_each_exec_queue(__eq, __ctx) \
>> +        for (int __j = 0; __j < NUM_ENGINES && ((__eq) = &((__ctx)->queues[__j])); ++__j) \
>> +            for_if((__eq)->id > 0)
>> +
>> +#define for_each_vm(__vm, __wrk) \
>> +    for (int __i = 0; __i < (__wrk)->nr_vms && \
>> +         (__vm = &(__wrk)->vm_list[__i]); ++__i)
>> +
>>   static unsigned int master_prng;
>>   static int verbose = 1;
>> @@ -231,6 +280,8 @@ static struct drm_i915_gem_context_param_sseu device_sseu = {
>>       .slice_mask = -1 /* Force read on first use. */
>>   };
>> +static bool is_xe;
> 
> Put it next to global 'int fd'.

ok
> 
>> +
>>   #define SYNCEDCLIENTS    (1<<1)
>>   #define DEPSYNC        (1<<2)
>>   #define FLAG_SSEU    (1<<3)
>> @@ -247,7 +298,10 @@ static const char *ring_str_map[NUM_ENGINES] = {
>>   static void w_step_sync(struct w_step *w)
>>   {
>> -    gem_sync(fd, w->obj[0].handle);
>> +    if (is_xe)
>> +        igt_assert(syncobj_wait(fd, &w->syncs[0].handle, 1, INT64_MAX, 0, NULL));
>> +    else
>> +        gem_sync(fd, w->obj[0].handle);
>>   }
>>   static int read_timestamp_frequency(int i915)
>> @@ -351,15 +405,23 @@ parse_dependency(unsigned int nr_steps, struct w_step *w, char *str)
>>           if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
>>               return -1;
>> -        add_dep(&w->data_deps, entry);
>> +        /* only fence deps in xe, let f-1 <==> -1 */
>> +        if (is_xe)
>> +            add_dep(&w->fence_deps, entry);
>> +        else
>> +            add_dep(&w->data_deps, entry);
>>           break;
>>       case 's':
>> -        submit_fence = true;
>> +        /* no submit fence in xe ? */
>> +        if (!is_xe)
>> +            submit_fence = true;
>>           /* Fall-through. */
>>       case 'f':
>> -        /* Multiple fences not yet supported. */
>> -        igt_assert_eq(w->fence_deps.nr, 0);
>> +        /* xe supports multiple fences */
>> +        if (!is_xe)
>> +            /* Multiple fences not yet supported. */
>> +            igt_assert_eq(w->fence_deps.nr, 0);
>>           entry.target = atoi(++str);
>>           if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
>> @@ -469,7 +531,17 @@ static struct intel_engine_data *query_engines(void)
>>       if (engines.nengines)
>>           return &engines;
>> -    engines = intel_engine_list_of_physical(fd);
>> +    if (is_xe) {
>> +        struct drm_xe_engine_class_instance *hwe;
>> +
>> +        xe_for_each_hw_engine(fd, hwe) {
>> +            engines.engines[engines.nengines].class = hwe->engine_class;
>> +            engines.engines[engines.nengines].instance = hwe->engine_instance;
>> +            engines.nengines++;
>> +        }
>> +    } else
>> +        engines = intel_engine_list_of_physical(fd);
>> +
>>       igt_assert(engines.nengines);
>>       return &engines;
>>   }
>> @@ -562,6 +634,40 @@ get_engine(enum intel_engine_id engine)
>>       return ci;
>>   }
>> +static struct drm_xe_engine_class_instance
>> +get_xe_engine(enum intel_engine_id engine)
>> +{
>> +    struct drm_xe_engine_class_instance ci;
>> +
>> +    switch (engine) {
>> +    case DEFAULT:
>> +    case RCS:
>> +        ci.engine_class = DRM_XE_ENGINE_CLASS_RENDER;
>> +        ci.engine_instance = 0;
>> +        break;
>> +    case BCS:
>> +        ci.engine_class = DRM_XE_ENGINE_CLASS_COPY;
>> +        ci.engine_instance = 0;
>> +        break;
>> +    case VCS1:
>> +        ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_DECODE;
>> +        ci.engine_instance = 0;
>> +        break;
>> +    case VCS2:
>> +        ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_DECODE;
>> +        ci.engine_instance = 1;
>> +        break;
>> +    case VECS:
>> +        ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE;
>> +        ci.engine_instance = 0;
>> +        break;
>> +    default:
>> +        igt_assert(0);
>> +    };
>> +
>> +    return ci;
>> +}
>> +
>>   static int parse_engine_map(struct w_step *step, const char *_str)
>>   {
>>       char *token, *tctx = NULL, *tstart = (char *)_str;
>> @@ -838,6 +944,13 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>               } else if (!strcmp(field, "P")) {
>>                   unsigned int nr = 0;
>> +                if (is_xe) {
>> +                    if (verbose > 3)
>> +                        printf("skipped line: %s\n", _token);
> 
> There are no priorities at all in xe?

There are exec queue properties like 
job_timeout_ms/timeslice_us/preemption_timeout_us/persistence/priority 
(DRM_SCHED_PRIORITY_MIN, DRM_SCHED_PRIORITY_NORMAL, 
DRM_SCHED_PRIORITY_HIGH).

I'm thinking of extending the step to allow engine granularity, if
provided, like P.1.value[.engine_map], otherwise applying to all engines,
or
having something more generic, like a Property step with the syntax:
P.ctx.[jt=value.ts=value.pt=value.pr=value.mp=VCS|RCS|..]
Properties would be optional, so we can specify only the ones we need to
change; for example, to modify priority on all exec queues in the 1st context:
P.1.pr=-1
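A sketch of how the proposed Property step might be parsed, under the assumption that the keys are jt, ts, pt and pr with integer values. The struct, the function name and the exact rules are hypothetical, and the mp (engine map) key is left out for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of "P.<ctx>.<key>=<value>[.<key>=<value>...]". */
struct xe_props {
	int ctx;
	bool has_jt, has_ts, has_pt, has_pr;
	long jt; /* job timeout, ms */
	long ts; /* timeslice, us */
	long pt; /* preemption timeout, us */
	long pr; /* priority */
};

/* Returns 0 on success, -1 on a malformed line or unknown key. */
static int parse_props(const char *line, struct xe_props *p)
{
	char *buf, *save = NULL, *tok;
	int ret = -1;

	memset(p, 0, sizeof(*p));
	buf = strdup(line);
	if (!buf)
		return -1;

	tok = strtok_r(buf, ".", &save);
	if (!tok || strcmp(tok, "P"))
		goto out;

	tok = strtok_r(NULL, ".", &save);
	if (!tok || (p->ctx = atoi(tok)) <= 0)
		goto out;

	/* Every remaining token is an optional key=value pair. */
	while ((tok = strtok_r(NULL, ".", &save))) {
		char *eq = strchr(tok, '=');
		char *end;
		long val;

		if (!eq)
			goto out;
		*eq = '\0';
		val = strtol(eq + 1, &end, 10);
		if (end == eq + 1 || *end)
			goto out;

		if (!strcmp(tok, "jt")) {
			p->jt = val; p->has_jt = true;
		} else if (!strcmp(tok, "ts")) {
			p->ts = val; p->has_ts = true;
		} else if (!strcmp(tok, "pt")) {
			p->pt = val; p->has_pt = true;
		} else if (!strcmp(tok, "pr")) {
			p->pr = val; p->has_pr = true;
		} else {
			goto out;
		}
	}
	ret = 0;
out:
	free(buf);
	return ret;
}
```

Tracking which keys were present (the has_* flags) is what makes "only change the ones we need" possible when applying the step.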

> 
> I'd make this a warning-level print during parsing, printed if
> verbose > 0.

ok
> 
>> +                    free(token);
>> +                    continue;
>> +                }
>> +
>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>                       tmp = atoi(field);
>>                       check_arg(nr == 0 && tmp <= 0,
>> @@ -864,6 +977,13 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>               } else if (!strcmp(field, "S")) {
>>                   unsigned int nr = 0;
>> +                if (is_xe) {
>> +                    if (verbose > 3)
>> +                        printf("skipped line: %s\n", _token);
> 
> This one is probably best made to fail parsing with a user-friendly message.
ok
>> +                    free(token);
>> +                    continue;
>> +                }
>> +
>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>                       tmp = atoi(field);
>>                       check_arg(tmp <= 0 && nr == 0,
>> @@ -977,6 +1097,12 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>               } else if (!strcmp(field, "b")) {
>>                   unsigned int nr = 0;
>> +                if (is_xe) {
>> +                    if (verbose > 3)
>> +                        printf("skipped line: %s\n", _token);
> 
> Ditto.
> 
>> +                    free(token);
>> +                    continue;
>> +                }
>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>                       check_arg(nr > 2,
>>                             "Invalid bond format at step %u!\n",
>> @@ -1041,7 +1167,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>               }
>>               tmp = atoi(field);
>> -            check_arg(tmp < 0, "Invalid ctx id at step %u!\n",
>> +            check_arg(tmp <= 0, "Invalid context id at step %u!\n",
> 
> If context id 0, e.g. '0.RCS.1000.0.0', works today, please make this a 
> separate patch which adds a new restriction. If it doesn't work then 
> still make it a separate patch which fixes the validation bug.

Looks like my mistake (previously exec_queues started at 0).

> 
>>                     nr_steps);
>>               step.context = tmp;
>> @@ -1054,7 +1180,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>               i = str_to_engine(field);
>>               check_arg(i < 0,
>> -                  "Invalid engine id at step %u!\n", nr_steps);
>> +                "Invalid engine id at step %u!\n", nr_steps);
> 
> Noise which breaks indentation. :)
> 
>>               valid++;
>> @@ -1288,6 +1414,20 @@ __get_ctx(struct workload *wrk, const struct w_step *w)
>>       return &wrk->ctx_list[w->context];
>>   }
>> +static struct exec_queue *
>> +get_eq(struct workload *wrk, const struct w_step *w)
>> +{
>> +    igt_assert(w->engine >= 0 && w->engine < NUM_ENGINES);
>> +
>> +    return &__get_ctx(wrk, w)->queues[w->engine];
>> +}
>> +
>> +static struct vm *
>> +get_vm(struct workload *wrk, const struct w_step *w)
>> +{
>> +    return wrk->vm_list;
>> +}
>> +
>>   static uint32_t mmio_base(int i915, enum intel_engine_id engine, int gen)
>>   {
>>       const char *name;
>> @@ -1540,6 +1680,61 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
>>   #endif
>>   }
>> +static void
>> +xe_alloc_step_batch(struct workload *wrk, struct w_step *w)
>> +{
>> +    struct vm *vm = get_vm(wrk, w);
>> +    struct exec_queue *eq = get_eq(wrk, w);
>> +    struct dep_entry *dep;
>> +    int i;
>> +
>> +    w->bb_size = ALIGN(sizeof(*w->spin) + xe_cs_prefetch_size(fd),
>> +               xe_get_default_alignment(fd));
>> +    w->bb_handle = xe_bo_create(fd, 0, vm->id, w->bb_size);
>> +    w->spin = xe_bo_map(fd, w->bb_handle, w->bb_size);
>> +    w->exec.address =
>> +        intel_allocator_alloc_with_strategy(vm->ahnd, w->bb_handle, w->bb_size,
>> +                            0, ALLOC_STRATEGY_LOW_TO_HIGH);
>> +    xe_vm_bind_sync(fd, vm->id, w->bb_handle, 0, w->exec.address, w->bb_size);
>> +    xe_spin_init_opts(w->spin, .addr = w->exec.address,
>> +                   .preempt = (w->preempt_us > 0),
>> +                   .ctx_ticks = duration_to_ctx_ticks(fd, eq->hwe.gt_id,
>> +                                1000LL * get_duration(wrk, w)));
>> +    w->exec.exec_queue_id = eq->id;
>> +    w->exec.num_batch_buffer = 1;
>> +    /* always at least one out fence */
>> +    w->exec.num_syncs = 1;
>> +    /* count syncs */
>> +    igt_assert_eq(0, w->data_deps.nr);
>> +    for_each_dep(dep, w->fence_deps) {
>> +        int dep_idx = w->idx + dep->target;
>> +
>> +        igt_assert(dep_idx >= 0 && dep_idx < w->idx);
>> +        igt_assert(wrk->steps[dep_idx].type == SW_FENCE ||
>> +               wrk->steps[dep_idx].type == BATCH);
>> +
>> +        w->exec.num_syncs++;
>> +    }
>> +    w->syncs = calloc(w->exec.num_syncs, sizeof(*w->syncs));
>> +    /* fill syncs */
>> +    i = 0;
>> +    /* out fence */
>> +    w->syncs[i].handle = syncobj_create(fd, 0);
>> +    w->syncs[i++].flags = DRM_XE_SYNC_SYNCOBJ | DRM_XE_SYNC_SIGNAL;
>> +    /* in fence(s) */
>> +    for_each_dep(dep, w->fence_deps) {
>> +        int dep_idx = w->idx + dep->target;
>> +
>> +        igt_assert(wrk->steps[dep_idx].type == SW_FENCE ||
>> +               wrk->steps[dep_idx].type == BATCH);
>> +        igt_assert(wrk->steps[dep_idx].syncs && wrk->steps[dep_idx].syncs[0].handle);
>> +
>> +        w->syncs[i].handle = wrk->steps[dep_idx].syncs[0].handle;
>> +        w->syncs[i++].flags = DRM_XE_SYNC_SYNCOBJ;
>> +    }
>> +    w->exec.syncs = to_user_pointer(w->syncs);
>> +}
>> +
>>   static bool set_priority(uint32_t ctx_id, int prio)
>>   {
>>       struct drm_i915_gem_context_param param = {
>> @@ -1766,6 +1961,61 @@ static void measure_active_set(struct workload *wrk)
>>   #define alloca0(sz) ({ size_t sz__ = (sz); memset(alloca(sz__), 0, sz__); })
>> +static void vm_create(struct vm *vm)
>> +{
>> +    uint32_t flags = 0;
>> +
>> +    if (vm->compute_mode)
>> +        flags |= DRM_XE_VM_CREATE_ASYNC_BIND_OPS |
>> +             DRM_XE_VM_CREATE_COMPUTE_MODE;
>> +
>> +    vm->id = xe_vm_create(fd, flags, 0);
>> +}
>> +
>> +static void exec_queue_create(struct ctx *ctx, struct exec_queue *eq)
>> +{
>> +    struct drm_xe_exec_queue_create create = {
>> +        .vm_id = ctx->vm->id,
>> +        .width = 1,
>> +        .num_placements = 1,
>> +        .instances = to_user_pointer(&eq->hwe),
>> +    };
>> +    struct drm_xe_engine_class_instance *eci = NULL;
>> +
>> +    if (ctx->load_balance && eq->hwe.engine_class == DRM_XE_ENGINE_CLASS_VIDEO_DECODE) {
>> +        struct drm_xe_engine_class_instance *hwe;
>> +        int i;
>> +
>> +        for (i = 0; i < ctx->engine_map_count; ++i)
>> +            igt_assert(ctx->engine_map[i] == VCS || ctx->engine_map[i] == VCS1 ||
>> +                   ctx->engine_map[i] == VCS2);
>> +
>> +        eci = calloc(16, sizeof(struct drm_xe_engine_class_instance));
>> +        create.num_placements = 0;
>> +        xe_for_each_hw_engine(fd, hwe) {
>> +            if (hwe->engine_class != DRM_XE_ENGINE_CLASS_VIDEO_DECODE ||
>> +                hwe->gt_id != 0)
>> +                continue;
>> +
>> +            igt_assert(create.num_placements < 16);
>> +            eci[create.num_placements++] = *hwe;
>> +        }
>> +        igt_assert(create.num_placements);
>> +        create.instances = to_user_pointer(eci);
>> +
>> +        if (verbose > 3)
>> +            printf("num_placements=%d class=%d gt=%d\n", create.num_placements,
>> +                eq->hwe.engine_class, eq->hwe.gt_id);
>> +    }
>> +
>> +    igt_assert_eq(igt_ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &create), 0);
>> +
>> +    if (eci)
>> +        free(eci);
>> +
>> +    eq->id = create.exec_queue_id;
>> +}
>> +
>>   static int prepare_contexts(unsigned int id, struct workload *wrk)
>>   {
>>       uint32_t share_vm = 0;
>> @@ -1796,6 +2046,84 @@ static int prepare_contexts(unsigned int id, struct workload *wrk)
>>           max_ctx = ctx;
>>       }
>> +    if (is_xe) {
> 
> Shouldn't the i915 and xe parts be mutually exclusive? Or xe ctx setup 
> depends on some parts of the existing setup run?

I probably need to split this better; the common step is allocate_contexts.

> 
>> +        int engine_classes[NUM_ENGINES] = {};
>> +
>> +        /* shortcut, create one vm */
>> +        wrk->nr_vms = 1;
>> +        wrk->vm_list = calloc(wrk->nr_vms, sizeof(struct vm));
>> +        wrk->vm_list->compute_mode = false;
>> +        vm_create(wrk->vm_list);
>> +        wrk->vm_list->ahnd =
>> +            intel_allocator_open(fd, wrk->vm_list->id, 
>> INTEL_ALLOCATOR_RELOC);
>> +
>> +        /* create exec queues of each referenced engine class */
>> +        for (j = 0; j < wrk->nr_ctxs; j++) {
>> +            struct ctx *ctx = &wrk->ctx_list[j];
>> +
>> +            /* link with vm */
>> +            ctx->vm = wrk->vm_list;
>> +
>> +            for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
>> +                if (w->context != j)
>> +                    continue;
>> +
>> +                if (w->type == ENGINE_MAP) {
>> +                    ctx->engine_map = w->engine_map;
>> +                    ctx->engine_map_count = w->engine_map_count;
>> +                } else if (w->type == LOAD_BALANCE) {
>> +                    if (!ctx->engine_map) {
>> +                        wsim_err("Load balancing needs an engine 
>> map!\n");
>> +                        return 1;
>> +                    }
>> +                    if (intel_gen(intel_get_drm_devid(fd)) < 11) {
>> +                        wsim_err("Load balancing needs relative mmio 
>> support, gen11+!\n");
>> +                        return 1;
>> +                    }
>> +                    ctx->load_balance = w->load_balance;
>> +                }
>> +            }
>> +
>> +            /* create exec queue for each referenced engine */
>> +            for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
>> +                if (w->context != j)
>> +                    continue;
>> +
>> +                if (w->type == BATCH)
>> +                    engine_classes[w->engine]++;
>> +            }
>> +
>> +            for (i = 0; i < NUM_ENGINES; i++) {
>> +                if (engine_classes[i]) {
>> +                    if (verbose > 3)
>> +                        printf("%u ctx[%d] eq(%s) load_balance=%d\n",
>> +                            id, j, ring_str_map[i], ctx->load_balance);
>> +                    if (i == VCS) {
>> +                        ctx->queues[i].hwe.engine_class =
>> +                            get_xe_engine(VCS1).engine_class;
>> +                        ctx->queues[i].hwe.engine_instance = 1;
>> +                    } else
>> +                        ctx->queues[i].hwe = get_xe_engine(i);
>> +                    exec_queue_create(ctx, &ctx->queues[i]);
>> +                    /* init request list */
>> +                    IGT_INIT_LIST_HEAD(&ctx->queues[i].requests);
>> +                    ctx->queues[i].nrequest = 0;
>> +                }
>> +                engine_classes[i] = 0;
>> +            }
>> +        }
>> +
>> +        /* create syncobjs for SW_FENCE */
>> +        for (j = 0, i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++)
>> +            if (w->type == SW_FENCE) {
>> +                w->syncs = calloc(1, sizeof(struct drm_xe_sync));
>> +                w->syncs[0].handle = syncobj_create(fd, 0);
>> +                w->syncs[0].flags = DRM_XE_SYNC_SYNCOBJ;
>> +            }
>> +
>> +        return 0;
>> +    }
>> +
>>       /*
>>        * Transfer over engine map configuration from the workload step.
>>        */
>> @@ -2075,7 +2403,8 @@ static int prepare_workload(unsigned int id, 
>> struct workload *wrk)
>>           }
>>       }
>> -    prepare_working_sets(id, wrk);
>> +    if (!is_xe)
>> +        prepare_working_sets(id, wrk);
> 
> Let's make it error out during parsing, with a user-friendly message,
> when working sets are used.
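To illustrate the suggestion above, here is a minimal sketch of a parse-time guard that rejects i915-only step commands on xe with a readable message instead of silently skipping them. The helper name check_step_supported() is hypothetical. 'b' (bonds) and 'S' (SSEU) are the commands this patch currently skips; the 'w'/'W' letters for working sets are an assumption taken from the wsim README.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical parse-time guard: returns 0 if the step command can be used
 * with the selected backend, or -1 after printing a user-friendly error.
 */
static int check_step_supported(const char *cmd, bool is_xe, unsigned int step)
{
	static const struct {
		const char *cmd;
		const char *what;
	} i915_only[] = {
		{ "w", "working set" },		/* assumed letter, see README */
		{ "W", "working set" },		/* assumed letter, see README */
		{ "b", "bond" },
		{ "S", "SSEU" },
	};

	assert(cmd);
	if (!is_xe)
		return 0;

	for (size_t i = 0; i < sizeof(i915_only) / sizeof(i915_only[0]); i++) {
		if (!strcmp(cmd, i915_only[i].cmd)) {
			fprintf(stderr,
				"Step %u: '%s' (%s) is not supported on xe!\n",
				step, cmd, i915_only[i].what);
			return -1;
		}
	}

	return 0;
}
```

The caller in parse_workload() would then bail out on -1 the same way the existing check_arg() failures do.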
> 
>>       /*
>>        * Allocate batch buffers.
>> @@ -2084,10 +2413,14 @@ static int prepare_workload(unsigned int id, 
>> struct workload *wrk)
>>           if (w->type != BATCH)
>>               continue;
>> -        alloc_step_batch(wrk, w);
>> +        if (is_xe)
>> +            xe_alloc_step_batch(wrk, w);
>> +        else
>> +            alloc_step_batch(wrk, w);
>>       }
>> -    measure_active_set(wrk);
>> +    if (!is_xe)
>> +        measure_active_set(wrk);
>>       return 0;
>>   }
>> @@ -2134,6 +2467,31 @@ static void w_sync_to(struct workload *wrk, 
>> struct w_step *w, int target)
>>       w_step_sync(&wrk->steps[target]);
>>   }
>> +static void do_xe_exec(struct workload *wrk, struct w_step *w)
>> +{
>> +    struct exec_queue *eq = get_eq(wrk, w);
>> +
>> +    igt_assert(w->emit_fence <= 0);
>> +    if (w->emit_fence == -1)
>> +        syncobj_reset(fd, &w->syncs[0].handle, 1);
>> +
>> +    /* update duration if random */
>> +    if (w->duration.max != w->duration.min)
>> +        xe_spin_init_opts(w->spin, .addr = w->exec.address,
>> +                       .preempt = (w->preempt_us > 0),
>> +                       .ctx_ticks = duration_to_ctx_ticks(fd, 
>> eq->hwe.gt_id,
>> +                                1000LL * get_duration(wrk, w)));
>> +    xe_exec(fd, &w->exec);
>> +
>> +    /* for qd_throttle */
>> +    if (w->rq_link.prev != NULL || w->rq_link.next != NULL) {
>> +        igt_list_del(&w->rq_link);
>> +        eq->nrequest--;
>> +    }
>> +    igt_list_add_tail(&w->rq_link, &eq->requests);
>> +    eq->nrequest++;
>> +}
>> +
>>   static void
>>   do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id 
>> engine)
>>   {
>> @@ -2258,6 +2616,10 @@ static void *run_workload(void *data)
>>                       sw_sync_timeline_create_fence(wrk->sync_timeline,
>>                                         cur_seqno + w->idx);
>>                   igt_assert(w->emit_fence > 0);
>> +                if (is_xe)
>> +                    /* Convert sync file to syncobj */
>> +                    syncobj_import_sync_file(fd, w->syncs[0].handle,
>> +                                 w->emit_fence);
>>                   continue;
>>               } else if (w->type == SW_FENCE_SIGNAL) {
>>                   int tgt = w->idx + w->target;
>> @@ -2270,6 +2632,9 @@ static void *run_workload(void *data)
>>                   sw_sync_timeline_inc(wrk->sync_timeline, inc);
>>                   continue;
>>               } else if (w->type == CTX_PRIORITY) {
>> +                if (is_xe)
>> +                    continue;
>> +
>>                   if (w->priority != 
>> wrk->ctx_list[w->context].priority) {
>>                       struct drm_i915_gem_context_param param = {
>>                           .ctx_id = wrk->ctx_list[w->context].id,
>> @@ -2289,7 +2654,10 @@ static void *run_workload(void *data)
>>                   igt_assert(wrk->steps[t_idx].type == BATCH);
>>                   
>> igt_assert(wrk->steps[t_idx].duration.unbound_duration);
>> -                *wrk->steps[t_idx].bb_duration = 0xffffffff;
>> +                if (is_xe)
>> +                    xe_spin_end(wrk->steps[t_idx].spin);
>> +                else
>> +                    *wrk->steps[t_idx].bb_duration = 0xffffffff;
>>                   __sync_synchronize();
>>                   continue;
>>               } else if (w->type == SSEU) {
>> @@ -2321,15 +2689,19 @@ static void *run_workload(void *data)
>>               if (throttle > 0)
>>                   w_sync_to(wrk, w, i - throttle);
>> -            do_eb(wrk, w, engine);
>> +            if (is_xe)
>> +                do_xe_exec(wrk, w);
>> +            else {
>> +                do_eb(wrk, w, engine);
>> -            if (w->request != -1) {
>> -                igt_list_del(&w->rq_link);
>> -                wrk->nrequest[w->request]--;
>> +                if (w->request != -1) {
>> +                    igt_list_del(&w->rq_link);
>> +                    wrk->nrequest[w->request]--;
>> +                }
>> +                w->request = engine;
>> +                igt_list_add_tail(&w->rq_link, &wrk->requests[engine]);
>> +                wrk->nrequest[engine]++;
> 
> Is the rq list management the same in here and do_xe_exec? If so please 
> consolidate into a common wrapper, which can then branch off into i915 
> and xe specific parts.

I need to revisit this.
> 
>>               }
>> -            w->request = engine;
>> -            igt_list_add_tail(&w->rq_link, &wrk->requests[engine]);
>> -            wrk->nrequest[engine]++;
>>               if (!wrk->run)
>>                   break;
>> @@ -2338,17 +2710,32 @@ static void *run_workload(void *data)
>>                   w_step_sync(w);
>>               if (qd_throttle > 0) {
>> -                while (wrk->nrequest[engine] > qd_throttle) {
>> -                    struct w_step *s;
>> +                if (is_xe) {
>> +                    struct exec_queue *eq = get_eq(wrk, w);
>> +
>> +                    while (eq->nrequest > qd_throttle) {
>> +                        struct w_step *s;
>> +
>> +                        s = igt_list_first_entry(&eq->requests, s, 
>> rq_link);
>> +
>> +                        w_step_sync(s);
>> -                    s = igt_list_first_entry(&wrk->requests[engine],
>> -                                 s, rq_link);
>> +                        igt_list_del(&s->rq_link);
>> +                        eq->nrequest--;
>> +                    }
>> +                } else {
>> +                    while (wrk->nrequest[engine] > qd_throttle) {
>> +                        struct w_step *s;
>> +
>> +                        s = igt_list_first_entry(&wrk->requests[engine],
>> +                                    s, rq_link);
>>                           w_step_sync(s);
>> -                    s->request = -1;
>> -                    igt_list_del(&s->rq_link);
>> -                    wrk->nrequest[engine]--;
>> +                        s->request = -1;
>> +                        igt_list_del(&s->rq_link);
>> +                        wrk->nrequest[engine]--;
>> +                    }
> 
> Hm, okay, throttling is very similar but not exactly the same.
> What is the conceptual difference?

I need to revisit this code. We can probably unify it now.
We want to throttle the number of requests on all exec queues of a given type.
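One way to consolidate the two throttling paths, sketched under the assumption that only the choice of queue differs between backends (wrk->requests[engine] on i915, the per-context eq->requests on xe): keep a FIFO of submitted steps per queue and wait on the oldest entries until at most qd remain. The list type, rq_throttle() and stub_sync() are hypothetical stand-ins for igt_list and w_step_sync().

```c
#include <assert.h>
#include <stddef.h>

/* Minimal singly linked FIFO standing in for igt_list + nrequest. */
struct req {
	struct req *next;
	int id;
};

struct req_queue {
	struct req *head, *tail;
	unsigned int nrequest;
};

static void rq_add_tail(struct req_queue *q, struct req *r)
{
	r->next = NULL;
	if (q->tail)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
	q->nrequest++;
}

static struct req *rq_pop_head(struct req_queue *q)
{
	struct req *r = q->head;

	if (!r)
		return NULL;
	q->head = r->next;
	if (!q->head)
		q->tail = NULL;
	q->nrequest--;
	return r;
}

/* Stand-in for w_step_sync(); the real one blocks on the step's fence. */
static void stub_sync(struct req *r)
{
	(void)r;
}

/* Wait on the oldest requests until at most qd remain; returns how many. */
static unsigned int rq_throttle(struct req_queue *q, unsigned int qd,
				void (*sync_fn)(struct req *))
{
	unsigned int waited = 0;

	while (q->nrequest > qd) {
		struct req *r = rq_pop_head(q);

		assert(r);
		sync_fn(r);
		waited++;
	}
	return waited;
}
```

With this shape the backend-specific part shrinks to picking the queue for a step; the conceptual difference Tvrtko asks about is that i915 throttles per engine across contexts, while the xe path throttles per exec queue.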
> >>                   }
>>               }
>>           }
>> @@ -2365,18 +2752,51 @@ static void *run_workload(void *data)
>>           for (i = 0, w = wrk->steps; wrk->run && (i < wrk->nr_steps);
>>                i++, w++) {
>>               if (w->emit_fence > 0) {
>> -                close(w->emit_fence);
>> -                w->emit_fence = -1;
>> +                if (is_xe) {
>> +                    igt_assert(w->type == SW_FENCE);
>> +                    close(w->emit_fence);
>> +                    w->emit_fence = -1;
>> +                    syncobj_reset(fd, &w->syncs[0].handle, 1);
>> +                } else {
>> +                    close(w->emit_fence);
>> +                    w->emit_fence = -1;
>> +                }
>>               }
>>           }
>>       }
>> -    for (i = 0; i < NUM_ENGINES; i++) {
>> -        if (!wrk->nrequest[i])
>> -            continue;
>> +    if (is_xe) {
>> +        struct exec_queue *eq;
>> +        struct ctx *ctx;
>> -        w = igt_list_last_entry(&wrk->requests[i], w, rq_link);
>> -        w_step_sync(w);
>> +        for_each_ctx(ctx, wrk)
>> +            for_each_exec_queue(eq, ctx)
>> +                if (eq->nrequest) {
>> +                    w = igt_list_last_entry(&eq->requests, w, rq_link);
>> +                    w_step_sync(w);
>> +                }
>> +
>> +        for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
>> +            if (w->type == BATCH) {
>> +                syncobj_destroy(fd, w->syncs[0].handle);
>> +                free(w->syncs);
>> +                xe_vm_unbind_sync(fd, get_vm(wrk, w)->id, 0, 
>> w->exec.address,
>> +                          w->bb_size);
>> +                gem_munmap(w->spin, w->bb_size);
>> +                gem_close(fd, w->bb_handle);
>> +            } else if (w->type == SW_FENCE) {
>> +                syncobj_destroy(fd, w->syncs[0].handle);
>> +                free(w->syncs);
>> +            }
>> +        }
>> +    } else {
>> +        for (i = 0; i < NUM_ENGINES; i++) {
>> +            if (!wrk->nrequest[i])
>> +                continue;
>> +
>> +            w = igt_list_last_entry(&wrk->requests[i], w, rq_link);
>> +            w_step_sync(w);
>> +        }
>>       }
>>       clock_gettime(CLOCK_MONOTONIC, &t_end);
>> @@ -2398,6 +2818,23 @@ static void *run_workload(void *data)
>>   static void fini_workload(struct workload *wrk)
>>   {
>> +    if (is_xe) {
>> +        struct exec_queue *eq;
>> +        struct ctx *ctx;
>> +        struct vm *vm;
>> +
>> +        for_each_ctx(ctx, wrk)
>> +            for_each_exec_queue(eq, ctx) {
>> +                xe_exec_queue_destroy(fd, eq->id);
>> +                eq->id = 0;
>> +            }
>> +        for_each_vm(vm, wrk) {
>> +            put_ahnd(vm->ahnd);
>> +            xe_vm_destroy(fd, vm->id);
>> +        }
>> +        free(wrk->vm_list);
>> +        wrk->nr_vms = 0;
>> +    }
>>       free(wrk->steps);
>>       free(wrk);
>>   }
>> @@ -2605,8 +3042,12 @@ int main(int argc, char **argv)
>>           ret = igt_device_find_first_i915_discrete_card(&card);
>>           if (!ret)
>>               ret = igt_device_find_integrated_card(&card);
>> +        if (!ret)
>> +            ret = igt_device_find_first_xe_discrete_card(&card);
>> +        if (!ret)
>> +            ret = igt_device_find_xe_integrated_card(&card);
>>           if (!ret) {
>> -            wsim_err("No device filter specified and no i915 devices 
>> found!\n");
>> +            wsim_err("No device filter specified and no intel devices 
>> found!\n");
>>               return EXIT_FAILURE;
>>           }
>>       }
>> @@ -2629,6 +3070,10 @@ int main(int argc, char **argv)
>>       if (verbose > 1)
>>           printf("Using device %s\n", drm_dev);
>> +    is_xe = is_xe_device(fd);
>> +    if (is_xe)
>> +        xe_device_get(fd);
> 
> What does this do, out of curiosity? There is no put AFAICT.

It queries (via ioctls) many device properties (GTs, engines,
topology, ...) and caches them in a map keyed by fd, so calls like

xe_for_each_hw_engine(fd, hwe) {
  ...
}

xe_has_vram, xe_number_hw_engines... (in lib/xe_query)

use that cached data.

And indeed the put is missing and should be next to close(fd).
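A minimal sketch of the get/put pairing being discussed, assuming a refcounted per-fd cache. This is not lib/xe_query's actual implementation: dev_get(), dev_put() and struct dev_cache are hypothetical, and the fixed engine count stands in for the results of the query ioctls.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_FDS 64

struct dev_cache {
	int refcount;
	unsigned int nr_engines;	/* stand-in for the cached query results */
};

static struct dev_cache *cache[MAX_FDS];

/* First get populates the entry; later gets just take a reference. */
static struct dev_cache *dev_get(int fd)
{
	assert(fd >= 0 && fd < MAX_FDS);
	if (!cache[fd]) {
		cache[fd] = calloc(1, sizeof(*cache[fd]));
		assert(cache[fd]);
		cache[fd]->nr_engines = 4;	/* pretend the ioctls reported 4 engines */
	}
	cache[fd]->refcount++;
	return cache[fd];
}

/* Drops a reference; the last put frees the cached data. */
static void dev_put(int fd)
{
	assert(fd >= 0 && fd < MAX_FDS && cache[fd]);
	if (--cache[fd]->refcount == 0) {
		free(cache[fd]);
		cache[fd] = NULL;
	}
}
```

In gem_wsim the matching put (presumably xe_device_put()) would sit immediately before close(fd), mirroring the xe_device_get() call after is_xe_device().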
> 
>> +
>>       if (!nr_w_args) {
>>           wsim_err("No workload descriptor(s)!\n");
>>           goto err;
>> diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
>> index e4fd61645..f49a73989 100644
>> --- a/benchmarks/wsim/README
>> +++ b/benchmarks/wsim/README
>> @@ -88,6 +88,10 @@ Batch durations can also be specified as infinite 
>> by using the '*' in the
>>   duration field. Such batches must be ended by the terminate command 
>> ('T')
>>   otherwise they will cause a GPU hang to be reported.
>> +Note: On Xe Batch dependencies are expressed with syncobjects,
>> +so there is no difference between f-1 and -1
>> +ex. 1.1000.-2.0 is same as 1.1000.f-2.0.
>> +
> 
> Maybe add a "chapter" talking about the differences between i915 and xe, 
> for all the ones which may be relevant. This one in particular may not 
> have any practical effect, I don't know. Presumably mixing explicit fence
> creating with implicit, simulated data dependencies all works fine?
> 
> Or maybe on top of that we will end up needing two chapters to list the 
> commands only available for each backend.

Sounds reasonable.
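A simplified sketch of how the dependency field could be classified per backend, following the parse_dependency() hunk in this patch: on xe a plain "-N" data dependency becomes a fence dependency, and "s-N" falls through to a fence as well. classify_dep() and enum dep_kind are hypothetical; the patch's bound check against nr_steps is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum dep_kind { DEP_DATA, DEP_FENCE, DEP_SUBMIT, DEP_INVALID };

/*
 * Classify one non-zero dependency token: "-N" (data), "f-N" (fence) or
 * "s-N" (submit). On xe every dependency is a syncobj, so "-N" and "s-N"
 * are both treated as fences.
 */
static enum dep_kind classify_dep(const char *tok, bool is_xe, int *target)
{
	enum dep_kind kind;

	assert(tok && target);

	switch (tok[0]) {
	case '-':
		kind = is_xe ? DEP_FENCE : DEP_DATA;
		*target = atoi(tok);
		break;
	case 's':
		kind = is_xe ? DEP_FENCE : DEP_SUBMIT;
		*target = atoi(tok + 1);
		break;
	case 'f':
		kind = DEP_FENCE;
		*target = atoi(tok + 1);
		break;
	default:
		return DEP_INVALID;
	}

	/* Dependencies must point at an earlier step. */
	return *target < 0 ? kind : DEP_INVALID;
}
```

This is the behaviour the README note describes: on xe, "1.1000.-2.0" and "1.1000.f-2.0" end up with the same syncobj wait.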

> 
>>   Sync (fd) fences
>>   ----------------
>> @@ -131,7 +135,7 @@ runnable. When the second RCS batch completes the 
>> standalone fence is signaled
>>   which allows the two VCS batches to be executed. Finally we wait 
>> until the both
>>   VCS batches have completed before starting the (optional) next 
>> iteration.
>> -Submit fences
>> +Submit fences (i915 only?)
> 
> s/?//
> 
>>   -------------
>>   Submit fences are a type of input fence which are signalled when the 
>> originating
> 
> Regards,
> 
> Tvrtko

Thanks a lot for the review
--
marcin


* Re: [igt-dev] [PATCH i-g-t 14/14] benchmarks/gem_wsim: added basic xe support
  2023-09-26 18:52     ` Bernatowicz, Marcin
@ 2023-09-27 13:17       ` Tvrtko Ursulin
From: Tvrtko Ursulin @ 2023-09-27 13:17 UTC
  To: Bernatowicz, Marcin, igt-dev; +Cc: chris.p.wilson


On 26/09/2023 19:52, Bernatowicz, Marcin wrote:
> Hi,
> 
> On 9/26/2023 3:10 PM, Tvrtko Ursulin wrote:
>>
>> On 26/09/2023 09:44, Marcin Bernatowicz wrote:
>>> Added basic xe support. Single binary handles both i915 and Xe devices.
>>>
>>> Some functionality is still missing: working sets, bonding.
>>>
>>> The tool is handy for scheduling tests, we find it useful to verify vGPU
>>> profiles defining different execution quantum/preemption timeout
>>> settings.
>>>
>>> There is also some rationale for the tool in following thread:
>>> https://lore.kernel.org/dri-devel/a443495f-5d1b-52e1-9b2f-80167deb6d57@linux.intel.com/
>>>
>>> With this patch it should be possible to run following on xe device:
>>>
>>> gem_wsim -w benchmarks/wsim/media_load_balance_fhd26u7.wsim -c 36 -r 600
>>>
>>> Best with drm debug logs disabled:
>>>
>>> echo 0 > /sys/module/drm/parameters/debug
>>>
>>> v2: minimizing divergence - same workload syntax for both drivers,
>>>      so most existing examples should run on xe unmodified (Tvrtko)
>>
>> Awesome!
>>
>>>      This version creates one common VM per workload.
>>>      Explicit VM management, compute mode will come in next patchset.
>>
>> I think this is going quite well and is looking promising we will end 
>> up with something clean.
>>
>> The only thing I feel needs to be said ahead of time is that I am not 
>> convinced we should be merging any xe specific changes until xe 
>> arrives upstream.
> 
> I will create a patchset "benchmarks/gem_wsim: fixes and improvements" 
> with code not related to xe and then some xe specific ones in 
> "benchmarks/gem_wsim: added basic xe support".

It is probably easier to have a single patch series, and then you merge 
patches from the beginning of it as we review them.

>> But much good progress with refactoring, cleanup and review can still 
>> be made.
>>
> I have two bigger refactors for parse_workload and run_workload (to split
> both big loops). In short, I introduced w_step_xxx_parse and w_step_xxx_run
> for each step type, plus w_step_parse and w_step_run functions called
> from parse_workload and run_workload respectively. It may ease
> maintenance a bit if one of the backends needs different handling.
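The per-step-type split described above can be sketched as a dispatch table, assuming one run handler per step type and a separate table per backend, so that only the entries that differ (here BATCH) need backend-specific code. The handler names and the trimmed-down enum w_type are hypothetical; the comments mark where do_eb() and do_xe_exec() would be called.

```c
#include <assert.h>

/* Trimmed-down stand-ins for gem_wsim's step types and struct w_step. */
enum w_type { BATCH, DELAY, NUM_W_TYPES };

struct w_step {
	enum w_type type;
	int last_handler;	/* records which handler ran, for illustration */
};

typedef int (*w_step_run_fn)(struct w_step *w);

static int i915_batch_run(struct w_step *w)
{
	w->last_handler = 1;	/* would call do_eb() */
	return 0;
}

static int xe_batch_run(struct w_step *w)
{
	w->last_handler = 2;	/* would call do_xe_exec() */
	return 0;
}

static int delay_run(struct w_step *w)
{
	w->last_handler = 3;	/* backend-neutral, e.g. usleep() */
	return 0;
}

/* One handler per step type; only BATCH differs between the backends. */
static const w_step_run_fn i915_run_ops[NUM_W_TYPES] = {
	[BATCH] = i915_batch_run,
	[DELAY] = delay_run,
};

static const w_step_run_fn xe_run_ops[NUM_W_TYPES] = {
	[BATCH] = xe_batch_run,
	[DELAY] = delay_run,
};

static int w_step_run(const w_step_run_fn *ops, struct w_step *w)
{
	assert(w->type < NUM_W_TYPES && ops[w->type]);
	return ops[w->type](w);
}
```

Selecting the table once, right after is_xe_device(), would remove most of the scattered "if (is_xe)" branches from run_workload().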
> 
>>>
>>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>>> ---
>>>   benchmarks/gem_wsim.c  | 515 ++++++++++++++++++++++++++++++++++++++---
>>>   benchmarks/wsim/README |   6 +-
>>>   2 files changed, 485 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>>> index 7703ca822..c83ed4882 100644
>>> --- a/benchmarks/gem_wsim.c
>>> +++ b/benchmarks/gem_wsim.c
>>> @@ -62,6 +62,12 @@
>>>   #include "i915/gem_engine_topology.h"
>>>   #include "i915/gem_mman.h"
>>> +#include "igt_syncobj.h"
>>> +#include "intel_allocator.h"
>>> +#include "xe_drm.h"
>>> +#include "xe/xe_ioctl.h"
>>> +#include "xe/xe_spin.h"
>>> +
>>>   enum intel_engine_id {
>>>       DEFAULT,
>>>       RCS,
>>> @@ -109,6 +115,10 @@ struct deps {
>>>       struct dep_entry *list;
>>>   };
>>> +#define for_each_dep(__dep, __deps) \
>>> +    for (int __i = 0; __i < __deps.nr && \
>>> +         (__dep = &__deps.list[__i]); ++__i)
>>
>> Could you make use of this macro outside xe too? Like in do_eb()? If 
>> so you could extract it and merge ahead of time.
>>
> sure
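A sketch of for_each_dep() used outside the xe path, with trimmed-down stand-ins for struct dep_entry and struct deps. The macro keeps the patch's shape but parenthesizes its arguments so that an expression such as *deps expands correctly; oldest_dep_target() is a hypothetical caller in the spirit of the do_eb() loops.

```c
#include <assert.h>

/* Trimmed-down stand-ins for gem_wsim's dependency types. */
struct dep_entry {
	int target;
};

struct deps {
	int nr;
	struct dep_entry *list;
};

/* Same shape as the patch's macro, with the arguments parenthesized. */
#define for_each_dep(__dep, __deps) \
	for (int __i = 0; __i < (__deps).nr && \
	     ((__dep) = &(__deps).list[__i]); ++__i)

/* Hypothetical non-xe caller: the oldest (most negative) dependency target. */
static int oldest_dep_target(const struct deps *deps)
{
	const struct dep_entry *dep;
	int oldest = 0;

	assert(deps);
	for_each_dep(dep, *deps)
		if (dep->target < oldest)
			oldest = dep->target;

	return oldest;
}
```

Without the parentheses, for_each_dep(dep, *deps) would expand to *deps.nr and fail to compile, which is easy to hit once the macro is used outside xe_alloc_step_batch().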
>>> +
>>>   struct w_arg {
>>>       char *filename;
>>>       char *desc;
>>> @@ -177,10 +187,30 @@ struct w_step {
>>>       struct drm_i915_gem_execbuffer2 eb;
>>>       struct drm_i915_gem_exec_object2 *obj;
>>>       struct drm_i915_gem_relocation_entry reloc[3];
>>> +
>>> +    struct drm_xe_exec exec;
>>> +    size_t bb_size;
>>> +    struct xe_spin *spin;
>>> +    struct drm_xe_sync *syncs;
>>
>> Let's think about how to create backend-specific containers here and make
>> them a union, so it is clear what gets used in each case and that they
>> are mutually exclusive. It may end up requiring separate refactoring
>> patch(es) which move the i915 bits into the i915 union/namespace.
> 
>>
>> I know gem_wsim did not have much focus on clean design, but now that
>> a second backend is coming I think it is much more important for ease
>> of maintenance in the future.
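A minimal sketch of the union layout suggested above, assuming the i915 and xe fields are never used together. The fake_* structs are stand-ins for drm_i915_gem_execbuffer2 and drm_xe_exec so the snippet compiles on its own, only a subset of each backend's fields is shown, and the anonymous union requires C11.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins so the sketch compiles without the uapi headers. */
struct fake_execbuffer2 {
	uint64_t buffers_ptr;
	uint32_t buffer_count;
};

struct fake_xe_exec {
	uint32_t exec_queue_id;
	uint64_t address;
};

struct w_step {
	int type;
	uint32_t bb_handle;

	/* Backend-specific state: exactly one member is valid per device. */
	union {
		struct {
			struct fake_execbuffer2 eb;
			uint32_t *bb_duration;
		} i915;
		struct {
			struct fake_xe_exec exec;
			size_t bb_size;
		} xe;
	};
};

/* Accessor that makes the xe-only contract explicit at the call site. */
static struct fake_xe_exec *w_step_xe_exec(struct w_step *w, bool is_xe)
{
	assert(is_xe);
	return &w->xe.exec;
}
```

Naming the members i915 and xe also lines up with the request to prefix everything xe-specific with xe_.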
> 
> good point
>>
>>> +
>>>       uint32_t bb_handle;
>>>       uint32_t *bb_duration;
>>>   };
>>> +struct vm {
>>
>> Everything xe specific please prefix with xe_. Structs, functions, etc.
>>
> ok
>>> +    uint32_t id;
>>> +    bool compute_mode;
>>> +    uint64_t ahnd;
>>> +};
>>> +
>>> +struct exec_queue {
>>> +    uint32_t id;
>>> +    struct drm_xe_engine_class_instance hwe;
>>> +    /* for qd_throttle */
>>> +    unsigned int nrequest;
>>> +    struct igt_list_head requests;
>>> +};
>>> +
>>>   struct ctx {
>>>       uint32_t id;
>>>       int priority;
>>> @@ -190,6 +220,10 @@ struct ctx {
>>>       struct bond *bonds;
>>>       bool load_balance;
>>>       uint64_t sseu;
>>> +    /* reference to vm */
>>> +    struct vm *vm;
>>> +    /* queue for each class */
>>> +    struct exec_queue queues[NUM_ENGINES];
>>>   };
>>>   struct workload {
>>> @@ -213,7 +247,10 @@ struct workload {
>>>       unsigned int nr_ctxs;
>>>       struct ctx *ctx_list;
>>> -    struct working_set **working_sets; /* array indexed by set id */
>>
>> Comment got lost.
>>
> oops
>>> +    unsigned int nr_vms;
>>> +    struct vm *vm_list;
>>> +
>>> +    struct working_set **working_sets;
>>>       int max_working_set_id;
>>>       int sync_timeline;
>>> @@ -223,6 +260,18 @@ struct workload {
>>>       unsigned int nrequest[NUM_ENGINES];
>>>   };
>>> +#define for_each_ctx(__ctx, __wrk) \
>>> +    for (int __i = 0; __i < (__wrk)->nr_ctxs && \
>>> +         (__ctx = &(__wrk)->ctx_list[__i]); ++__i)
>>> +
>>> +#define for_each_exec_queue(__eq, __ctx) \
>>> +        for (int __j = 0; __j < NUM_ENGINES && ((__eq) = &((__ctx)->queues[__j])); ++__j) \
>>> +            for_if((__eq)->id > 0)
>>> +
>>> +#define for_each_vm(__vm, __wrk) \
>>> +    for (int __i = 0; __i < (__wrk)->nr_vms && \
>>> +         (__vm = &(__wrk)->vm_list[__i]); ++__i)
>>> +
>>>   static unsigned int master_prng;
>>>   static int verbose = 1;
>>> @@ -231,6 +280,8 @@ static struct drm_i915_gem_context_param_sseu device_sseu = {
>>>       .slice_mask = -1 /* Force read on first use. */
>>>   };
>>> +static bool is_xe;
>>
>> Put it next to global 'int fd'.
> 
> ok
>>
>>> +
>>>   #define SYNCEDCLIENTS    (1<<1)
>>>   #define DEPSYNC        (1<<2)
>>>   #define FLAG_SSEU    (1<<3)
>>> @@ -247,7 +298,10 @@ static const char *ring_str_map[NUM_ENGINES] = {
>>>   static void w_step_sync(struct w_step *w)
>>>   {
>>> -    gem_sync(fd, w->obj[0].handle);
>>> +    if (is_xe)
>>> +        igt_assert(syncobj_wait(fd, &w->syncs[0].handle, 1, INT64_MAX, 0, NULL));
>>> +    else
>>> +        gem_sync(fd, w->obj[0].handle);
>>>   }
>>>   static int read_timestamp_frequency(int i915)
>>> @@ -351,15 +405,23 @@ parse_dependency(unsigned int nr_steps, struct w_step *w, char *str)
>>>           if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
>>>               return -1;
>>> -        add_dep(&w->data_deps, entry);
>>> +        /* only fence deps in xe, let f-1 <==> -1 */
>>> +        if (is_xe)
>>> +            add_dep(&w->fence_deps, entry);
>>> +        else
>>> +            add_dep(&w->data_deps, entry);
>>>           break;
>>>       case 's':
>>> -        submit_fence = true;
>>> +        /* no submit fence in xe ? */
>>> +        if (!is_xe)
>>> +            submit_fence = true;
>>>           /* Fall-through. */
>>>       case 'f':
>>> -        /* Multiple fences not yet supported. */
>>> -        igt_assert_eq(w->fence_deps.nr, 0);
>>> +        /* xe supports multiple fences */
>>> +        if (!is_xe)
>>> +            /* Multiple fences not yet supported. */
>>> +            igt_assert_eq(w->fence_deps.nr, 0);
>>>           entry.target = atoi(++str);
>>>           if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
>>> @@ -469,7 +531,17 @@ static struct intel_engine_data *query_engines(void)
>>>       if (engines.nengines)
>>>           return &engines;
>>> -    engines = intel_engine_list_of_physical(fd);
>>> +    if (is_xe) {
>>> +        struct drm_xe_engine_class_instance *hwe;
>>> +
>>> +        xe_for_each_hw_engine(fd, hwe) {
>>> +            engines.engines[engines.nengines].class = hwe->engine_class;
>>> +            engines.engines[engines.nengines].instance = hwe->engine_instance;
>>> +            engines.nengines++;
>>> +        }
>>> +    } else
>>> +        engines = intel_engine_list_of_physical(fd);
>>> +
>>>       igt_assert(engines.nengines);
>>>       return &engines;
>>>   }
>>> @@ -562,6 +634,40 @@ get_engine(enum intel_engine_id engine)
>>>       return ci;
>>>   }
>>> +static struct drm_xe_engine_class_instance
>>> +get_xe_engine(enum intel_engine_id engine)
>>> +{
>>> +    struct drm_xe_engine_class_instance ci;
>>> +
>>> +    switch (engine) {
>>> +    case DEFAULT:
>>> +    case RCS:
>>> +        ci.engine_class = DRM_XE_ENGINE_CLASS_RENDER;
>>> +        ci.engine_instance = 0;
>>> +        break;
>>> +    case BCS:
>>> +        ci.engine_class = DRM_XE_ENGINE_CLASS_COPY;
>>> +        ci.engine_instance = 0;
>>> +        break;
>>> +    case VCS1:
>>> +        ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_DECODE;
>>> +        ci.engine_instance = 0;
>>> +        break;
>>> +    case VCS2:
>>> +        ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_DECODE;
>>> +        ci.engine_instance = 1;
>>> +        break;
>>> +    case VECS:
>>> +        ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE;
>>> +        ci.engine_instance = 0;
>>> +        break;
>>> +    default:
>>> +        igt_assert(0);
>>> +    };
>>> +
>>> +    return ci;
>>> +}
>>> +
>>>   static int parse_engine_map(struct w_step *step, const char *_str)
>>>   {
>>>       char *token, *tctx = NULL, *tstart = (char *)_str;
>>> @@ -838,6 +944,13 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>>               } else if (!strcmp(field, "P")) {
>>>                   unsigned int nr = 0;
>>> +                if (is_xe) {
>>> +                    if (verbose > 3)
>>> +                        printf("skipped line: %s\n", _token);
>>
>> There are no priorities at all in xe?
> 
> There are exec queue properties like 
> job_timeout_ms/timeslice_us/preemption_timeout_us/persistence/priority 
> (DRM_SCHED_PRIORITY_MIN, DRM_SCHED_PRIORITY_NORMAL, 
> DRM_SCHED_PRIORITY_HIGH).

So you just left implementing that for later? Or you were not sure about 
how to map the priority integers?
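If the question is only how to map the integers, one possible bucketing is sketched below. It assumes gem_wsim keeps i915's convention (I915_CONTEXT_MIN_USER_PRIORITY..I915_CONTEXT_MAX_USER_PRIORITY, i.e. -1023..1023, default 0) and that xe exposes just the three scheduler levels Marcin lists. The enum values are placeholders, not the kernel's DRM_SCHED_PRIORITY_* constants, and the thresholds are a guess.

```c
#include <assert.h>

/* Placeholders for the three levels listed above, not the kernel's values. */
enum sched_prio { PRIO_MIN, PRIO_NORMAL, PRIO_HIGH };

/* Hypothetical bucketing of an i915-style user priority (default 0). */
static enum sched_prio wsim_prio_to_sched(int prio)
{
	/* I915_CONTEXT_MIN_USER_PRIORITY..I915_CONTEXT_MAX_USER_PRIORITY */
	assert(prio >= -1023 && prio <= 1023);

	if (prio < 0)
		return PRIO_MIN;
	if (prio > 0)
		return PRIO_HIGH;
	return PRIO_NORMAL;
}
```

Collapsing the range this way loses ordering between, say, 100 and 500, so workloads that rely on fine-grained priorities would behave differently on xe.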

> I'm thinking of extending the step to allow engine granularity when
> provided, like P.1.value[.engine_map], and otherwise apply it to all engines,
> or
> having something more generic, like a Property step with the syntax:
> P.ctx.[jt=value.ts=value.pt=value.pr=value.mp=VCS|RCS|..]
> with all properties optional, so we specify only the ones we need to
> change. For example, to modify priority on all exec queues in the 1st context:
> P.1.pr=-1
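A sketch of a parser for the proposed key=value form. It assumes numeric values for jt (job timeout, ms), ts (timeslice, us), pt (preemption timeout, us) and pr (priority), and it leaves out the mp=<engine map> key. struct prop_step and parse_prop_step() are hypothetical; nothing here exists in gem_wsim today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of the proposed "P.<ctx>.<key>=<value>..." step. */
struct prop_step {
	int ctx;
	bool has_jt, has_ts, has_pt, has_pr;
	long jt;	/* job timeout, ms */
	long ts;	/* timeslice, us */
	long pt;	/* preemption timeout, us */
	long pr;	/* priority */
};

/* Parses e.g. "P.1.pr=-1" or "P.2.ts=500.pt=1000"; returns 0 on success. */
static int parse_prop_step(const char *s, struct prop_step *p)
{
	char *end;

	assert(s && p);
	memset(p, 0, sizeof(*p));

	if (strncmp(s, "P.", 2))
		return -1;
	p->ctx = (int)strtol(s + 2, &end, 10);
	if (end == s + 2 || p->ctx <= 0)
		return -1;

	/* Each property is ".<two-letter key>=<integer>". */
	while (*end == '.') {
		const char *key = end + 1;
		const char *eq = strchr(key, '=');
		long val;

		if (!eq || eq - key != 2)
			return -1;
		val = strtol(eq + 1, &end, 10);
		if (end == eq + 1)
			return -1;

		if (!strncmp(key, "jt", 2)) {
			p->jt = val;
			p->has_jt = true;
		} else if (!strncmp(key, "ts", 2)) {
			p->ts = val;
			p->has_ts = true;
		} else if (!strncmp(key, "pt", 2)) {
			p->pt = val;
			p->has_pt = true;
		} else if (!strncmp(key, "pr", 2)) {
			p->pr = val;
			p->has_pr = true;
		} else {
			return -1;
		}
	}

	return *end == '\0' ? 0 : -1;
}
```

The has_* flags are what make the properties optional: only the ones present would be applied to the exec queues of the given context.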

Okay, I don't have an opinion right now. I think leave it for the end of 
the series, or even for later. For starters see if you can make the 
existing P.<ctx>.<priority> work, if that makes sense.

Regards,

Tvrtko

> 
>>
>> I'd make this a warning level print during parsing, that is printed if 
>> verbose > 0.
> 
> ok
>>
>>> +                    free(token);
>>> +                    continue;
>>> +                }
>>> +
>>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>>                       tmp = atoi(field);
>>>                       check_arg(nr == 0 && tmp <= 0,
>>> @@ -864,6 +977,13 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>>               } else if (!strcmp(field, "S")) {
>>>                   unsigned int nr = 0;
>>> +                if (is_xe) {
>>> +                    if (verbose > 3)
>>> +                        printf("skipped line: %s\n", _token);
>>
>> For this one it is probably best to fail parsing with a user-friendly message.
> ok
>>
>>> +                    free(token);
>>> +                    continue;
>>> +                }
>>> +
>>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>>                       tmp = atoi(field);
>>>                       check_arg(tmp <= 0 && nr == 0,
>>> @@ -977,6 +1097,12 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>>               } else if (!strcmp(field, "b")) {
>>>                   unsigned int nr = 0;
>>> +                if (is_xe) {
>>> +                    if (verbose > 3)
>>> +                        printf("skipped line: %s\n", _token);
>>
>> Ditto.
>>
>>> +                    free(token);
>>> +                    continue;
>>> +                }
>>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>>                       check_arg(nr > 2,
>>>                             "Invalid bond format at step %u!\n",
>>> @@ -1041,7 +1167,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>>               }
>>>               tmp = atoi(field);
>>> -            check_arg(tmp < 0, "Invalid ctx id at step %u!\n",
>>> +            check_arg(tmp <= 0, "Invalid context id at step %u!\n",
>>
>> If context id 0, e.g. '0.RCS.1000.0.0', works today, please make this a
>> separate patch which adds a new restriction. If it doesn't work then 
>> still make it a separate patch which fixes the validation bug.
> 
> Looks like my mistake (previously exec_queues started with 0).
> 
>>
>>>                     nr_steps);
>>>               step.context = tmp;
>>> @@ -1054,7 +1180,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>>               i = str_to_engine(field);
>>>               check_arg(i < 0,
>>> -                  "Invalid engine id at step %u!\n", nr_steps);
>>> +                "Invalid engine id at step %u!\n", nr_steps);
>>
>> Noise which breaks indentation. :)
>>
>>>               valid++;
>>> @@ -1288,6 +1414,20 @@ __get_ctx(struct workload *wrk, const struct w_step *w)
>>>       return &wrk->ctx_list[w->context];
>>>   }
>>> +static struct exec_queue *
>>> +get_eq(struct workload *wrk, const struct w_step *w)
>>> +{
>>> +    igt_assert(w->engine >= 0 && w->engine < NUM_ENGINES);
>>> +
>>> +    return &__get_ctx(wrk, w)->queues[w->engine];
>>> +}
>>> +
>>> +static struct vm *
>>> +get_vm(struct workload *wrk, const struct w_step *w)
>>> +{
>>> +    return wrk->vm_list;
>>> +}
>>> +
>>>   static uint32_t mmio_base(int i915, enum intel_engine_id engine, int gen)
>>>   {
>>>       const char *name;
>>> @@ -1540,6 +1680,61 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
>>>   #endif
>>>   }
>>> +static void
>>> +xe_alloc_step_batch(struct workload *wrk, struct w_step *w)
>>> +{
>>> +    struct vm *vm = get_vm(wrk, w);
>>> +    struct exec_queue *eq = get_eq(wrk, w);
>>> +    struct dep_entry *dep;
>>> +    int i;
>>> +
>>> +    w->bb_size = ALIGN(sizeof(*w->spin) + xe_cs_prefetch_size(fd),
>>> +               xe_get_default_alignment(fd));
>>> +    w->bb_handle = xe_bo_create(fd, 0, vm->id, w->bb_size);
>>> +    w->spin = xe_bo_map(fd, w->bb_handle, w->bb_size);
>>> +    w->exec.address =
>>> +        intel_allocator_alloc_with_strategy(vm->ahnd, w->bb_handle, w->bb_size,
>>> +                            0, ALLOC_STRATEGY_LOW_TO_HIGH);
>>> +    xe_vm_bind_sync(fd, vm->id, w->bb_handle, 0, w->exec.address, 
>>> w->bb_size);
>>> +    xe_spin_init_opts(w->spin, .addr = w->exec.address,
>>> +                   .preempt = (w->preempt_us > 0),
>>> +                   .ctx_ticks = duration_to_ctx_ticks(fd, 
>>> eq->hwe.gt_id,
>>> +                                1000LL * get_duration(wrk, w)));
>>> +    w->exec.exec_queue_id = eq->id;
>>> +    w->exec.num_batch_buffer = 1;
>>> +    /* always at least one out fence */
>>> +    w->exec.num_syncs = 1;
>>> +    /* count syncs */
>>> +    igt_assert_eq(0, w->data_deps.nr);
>>> +    for_each_dep(dep, w->fence_deps) {
>>> +        int dep_idx = w->idx + dep->target;
>>> +
>>> +        igt_assert(dep_idx >= 0 && dep_idx < w->idx);
>>> +        igt_assert(wrk->steps[dep_idx].type == SW_FENCE ||
>>> +               wrk->steps[dep_idx].type == BATCH);
>>> +
>>> +        w->exec.num_syncs++;
>>> +    }
>>> +    w->syncs = calloc(w->exec.num_syncs, sizeof(*w->syncs));
>>> +    /* fill syncs */
>>> +    i = 0;
>>> +    /* out fence */
>>> +    w->syncs[i].handle = syncobj_create(fd, 0);
>>> +    w->syncs[i++].flags = DRM_XE_SYNC_SYNCOBJ | DRM_XE_SYNC_SIGNAL;
>>> +    /* in fence(s) */
>>> +    for_each_dep(dep, w->fence_deps) {
>>> +        int dep_idx = w->idx + dep->target;
>>> +
>>> +        igt_assert(wrk->steps[dep_idx].type == SW_FENCE ||
>>> +               wrk->steps[dep_idx].type == BATCH);
>>> +        igt_assert(wrk->steps[dep_idx].syncs && wrk->steps[dep_idx].syncs[0].handle);
>>> +
>>> +        w->syncs[i].handle = wrk->steps[dep_idx].syncs[0].handle;
>>> +        w->syncs[i++].flags = DRM_XE_SYNC_SYNCOBJ;
>>> +    }
>>> +    w->exec.syncs = to_user_pointer(w->syncs);
>>> +}
>>> +
>>>   static bool set_priority(uint32_t ctx_id, int prio)
>>>   {
>>>       struct drm_i915_gem_context_param param = {
>>> @@ -1766,6 +1961,61 @@ static void measure_active_set(struct workload *wrk)
>>>   #define alloca0(sz) ({ size_t sz__ = (sz); memset(alloca(sz__), 0, sz__); })
>>> +static void vm_create(struct vm *vm)
>>> +{
>>> +    uint32_t flags = 0;
>>> +
>>> +    if (vm->compute_mode)
>>> +        flags |= DRM_XE_VM_CREATE_ASYNC_BIND_OPS |
>>> +             DRM_XE_VM_CREATE_COMPUTE_MODE;
>>> +
>>> +    vm->id = xe_vm_create(fd, flags, 0);
>>> +}
>>> +
>>> +static void exec_queue_create(struct ctx *ctx, struct exec_queue *eq)
>>> +{
>>> +    struct drm_xe_exec_queue_create create = {
>>> +        .vm_id = ctx->vm->id,
>>> +        .width = 1,
>>> +        .num_placements = 1,
>>> +        .instances = to_user_pointer(&eq->hwe),
>>> +    };
>>> +    struct drm_xe_engine_class_instance *eci = NULL;
>>> +
>>> +    if (ctx->load_balance && eq->hwe.engine_class == DRM_XE_ENGINE_CLASS_VIDEO_DECODE) {
>>> +        struct drm_xe_engine_class_instance *hwe;
>>> +        int i;
>>> +
>>> +        for (i = 0; i < ctx->engine_map_count; ++i)
>>> +            igt_assert(ctx->engine_map[i] == VCS || ctx->engine_map[i] == VCS1 ||
>>> +                   ctx->engine_map[i] == VCS2);
>>> +
>>> +        eci = calloc(16, sizeof(struct drm_xe_engine_class_instance));
>>> +        create.num_placements = 0;
>>> +        xe_for_each_hw_engine(fd, hwe) {
>>> +            if (hwe->engine_class != DRM_XE_ENGINE_CLASS_VIDEO_DECODE ||
>>> +                hwe->gt_id != 0)
>>> +                continue;
>>> +
>>> +            igt_assert(create.num_placements < 16);
>>> +            eci[create.num_placements++] = *hwe;
>>> +        }
>>> +        igt_assert(create.num_placements);
>>> +        create.instances = to_user_pointer(eci);
>>> +
>>> +        if (verbose > 3)
>>> +            printf("num_placements=%d class=%d gt=%d\n", create.num_placements,
>>> +                eq->hwe.engine_class, eq->hwe.gt_id);
>>> +    }
>>> +
>>> +    igt_assert_eq(igt_ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &create), 0);
>>> +
>>> +    if (eci)
>>> +        free(eci);
>>> +
>>> +    eq->id = create.exec_queue_id;
>>> +}
>>> +
>>>   static int prepare_contexts(unsigned int id, struct workload *wrk)
>>>   {
>>>       uint32_t share_vm = 0;
>>> @@ -1796,6 +2046,84 @@ static int prepare_contexts(unsigned int id, struct workload *wrk)
>>>           max_ctx = ctx;
>>>       }
>>> +    if (is_xe) {
>>
>> Shouldn't the i915 and xe parts be mutually exclusive? Or does the xe ctx
>> setup depend on some parts of the existing setup having run?
> 
> I probably need to split this better; the common step is allocate_contexts.
> 
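As an illustration of the split discussed above (one common allocation step, then mutually exclusive i915 and Xe setup), here is a minimal sketch. `allocate_contexts()` is named in the reply, but its signature here is assumed, and `xe_prepare_contexts()` / `i915_prepare_contexts()` are hypothetical stand-ins that only record which phase ran.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical trace of which setup phases ran, for illustration only. */
static char trace[64];

static int allocate_contexts(void)     { strcat(trace, "alloc,"); return 0; }
static int xe_prepare_contexts(void)   { strcat(trace, "xe");     return 0; }
static int i915_prepare_contexts(void) { strcat(trace, "i915");   return 0; }

/*
 * Shared step first, then exactly one backend-specific branch, so the Xe
 * path never depends on i915-only setup having run (and vice versa).
 */
static int prepare_contexts(bool is_xe)
{
	int ret;

	trace[0] = '\0';
	ret = allocate_contexts();
	if (ret)
		return ret;

	return is_xe ? xe_prepare_contexts() : i915_prepare_contexts();
}
```

With this shape the `if (is_xe) { ... return 0; }` early exit in the middle of the i915 code goes away, and each backend's setup can be read on its own.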
>>
>>> +        int engine_classes[NUM_ENGINES] = {};
>>> +
>>> +        /* shortcut, create one vm */
>>> +        wrk->nr_vms = 1;
>>> +        wrk->vm_list = calloc(wrk->nr_vms, sizeof(struct vm));
>>> +        wrk->vm_list->compute_mode = false;
>>> +        vm_create(wrk->vm_list);
>>> +        wrk->vm_list->ahnd =
>>> +            intel_allocator_open(fd, wrk->vm_list->id, INTEL_ALLOCATOR_RELOC);
>>> +
>>> +        /* create exec queues of each referenced engine class */
>>> +        for (j = 0; j < wrk->nr_ctxs; j++) {
>>> +            struct ctx *ctx = &wrk->ctx_list[j];
>>> +
>>> +            /* link with vm */
>>> +            ctx->vm = wrk->vm_list;
>>> +
>>> +            for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
>>> +                if (w->context != j)
>>> +                    continue;
>>> +
>>> +                if (w->type == ENGINE_MAP) {
>>> +                    ctx->engine_map = w->engine_map;
>>> +                    ctx->engine_map_count = w->engine_map_count;
>>> +                } else if (w->type == LOAD_BALANCE) {
>>> +                    if (!ctx->engine_map) {
>>> +                        wsim_err("Load balancing needs an engine map!\n");
>>> +                        return 1;
>>> +                    }
>>> +                    if (intel_gen(intel_get_drm_devid(fd)) < 11) {
>>> +                        wsim_err("Load balancing needs relative mmio support, gen11+!\n");
>>> +                        return 1;
>>> +                    }
>>> +                    ctx->load_balance = w->load_balance;
>>> +                }
>>> +            }
>>> +
>>> +            /* create exec queue for each referenced engine */
>>> +            for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
>>> +                if (w->context != j)
>>> +                    continue;
>>> +
>>> +                if (w->type == BATCH)
>>> +                    engine_classes[w->engine]++;
>>> +            }
>>> +
>>> +            for (i = 0; i < NUM_ENGINES; i++) {
>>> +                if (engine_classes[i]) {
>>> +                    if (verbose > 3)
>>> +                        printf("%u ctx[%d] eq(%s) load_balance=%d\n",
>>> +                            id, j, ring_str_map[i], ctx->load_balance);
>>> +                    if (i == VCS) {
>>> +                        ctx->queues[i].hwe.engine_class =
>>> +                            get_xe_engine(VCS1).engine_class;
>>> +                        ctx->queues[i].hwe.engine_instance = 1;
>>> +                    } else
>>> +                        ctx->queues[i].hwe = get_xe_engine(i);
>>> +                    exec_queue_create(ctx, &ctx->queues[i]);
>>> +                    /* init request list */
>>> +                    IGT_INIT_LIST_HEAD(&ctx->queues[i].requests);
>>> +                    ctx->queues[i].nrequest = 0;
>>> +                }
>>> +                engine_classes[i] = 0;
>>> +            }
>>> +        }
>>> +
>>> +        /* create syncobjs for SW_FENCE */
>>> +        for (j = 0, i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++)
>>> +            if (w->type == SW_FENCE) {
>>> +                w->syncs = calloc(1, sizeof(struct drm_xe_sync));
>>> +                w->syncs[0].handle = syncobj_create(fd, 0);
>>> +                w->syncs[0].flags = DRM_XE_SYNC_SYNCOBJ;
>>> +            }
>>> +
>>> +        return 0;
>>> +    }
>>> +
>>>       /*
>>>        * Transfer over engine map configuration from the workload step.
>>>        */
>>> @@ -2075,7 +2403,8 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
>>>           }
>>>       }
>>> -    prepare_working_sets(id, wrk);
>>> +    if (!is_xe)
>>> +        prepare_working_sets(id, wrk);
>>
>> Let's make it error out during parsing, with a user-friendly message,
>> when working sets are used.
>>
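A sketch of the parse-time rejection suggested above. It assumes working sets are introduced by a dedicated step type (called `STEP_WORKINGSET` here; the real `enum w_type` is larger) and that the backend is already known when the workload is parsed. `check_step_supported()` is a hypothetical helper; in gem_wsim the message would go through `wsim_err()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical subset of step types, for illustration only. */
enum step_type { STEP_BATCH, STEP_WORKINGSET };

/*
 * Returns 0 if the step is usable on the selected backend, or -1 after
 * printing a user-facing message. Called from the parser, so unsupported
 * workloads fail early instead of working sets being silently skipped.
 */
static int check_step_supported(enum step_type type, bool is_xe,
				unsigned int nr_steps)
{
	if (is_xe && type == STEP_WORKINGSET) {
		fprintf(stderr,
			"Working sets are not supported on Xe (step %u)!\n",
			nr_steps);
		return -1;
	}
	return 0;
}
```

Failing in the parser also removes the need for the `if (!is_xe)` guard around `prepare_working_sets()`.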
>>>       /*
>>>        * Allocate batch buffers.
>>> @@ -2084,10 +2413,14 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
>>>           if (w->type != BATCH)
>>>               continue;
>>> -        alloc_step_batch(wrk, w);
>>> +        if (is_xe)
>>> +            xe_alloc_step_batch(wrk, w);
>>> +        else
>>> +            alloc_step_batch(wrk, w);
>>>       }
>>> -    measure_active_set(wrk);
>>> +    if (!is_xe)
>>> +        measure_active_set(wrk);
>>>       return 0;
>>>   }
>>> @@ -2134,6 +2467,31 @@ static void w_sync_to(struct workload *wrk, struct w_step *w, int target)
>>>       w_step_sync(&wrk->steps[target]);
>>>   }
>>> +static void do_xe_exec(struct workload *wrk, struct w_step *w)
>>> +{
>>> +    struct exec_queue *eq = get_eq(wrk, w);
>>> +
>>> +    igt_assert(w->emit_fence <= 0);
>>> +    if (w->emit_fence == -1)
>>> +        syncobj_reset(fd, &w->syncs[0].handle, 1);
>>> +
>>> +    /* update duration if random */
>>> +    if (w->duration.max != w->duration.min)
>>> +        xe_spin_init_opts(w->spin, .addr = w->exec.address,
>>> +                       .preempt = (w->preempt_us > 0),
>>> +                       .ctx_ticks = duration_to_ctx_ticks(fd, eq->hwe.gt_id,
>>> +                                1000LL * get_duration(wrk, w)));
>>> +    xe_exec(fd, &w->exec);
>>> +
>>> +    /* for qd_throttle */
>>> +    if (w->rq_link.prev != NULL || w->rq_link.next != NULL) {
>>> +        igt_list_del(&w->rq_link);
>>> +        eq->nrequest--;
>>> +    }
>>> +    igt_list_add_tail(&w->rq_link, &eq->requests);
>>> +    eq->nrequest++;
>>> +}
>>> +
>>>   static void
>>>   do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id engine)
>>>   {
>>> @@ -2258,6 +2616,10 @@ static void *run_workload(void *data)
>>>                       sw_sync_timeline_create_fence(wrk->sync_timeline,
>>>                                         cur_seqno + w->idx);
>>>                   igt_assert(w->emit_fence > 0);
>>> +                if (is_xe)
>>> +                    /* Convert sync file to syncobj */
>>> +                    syncobj_import_sync_file(fd, w->syncs[0].handle,
>>> +                                 w->emit_fence);
>>>                   continue;
>>>               } else if (w->type == SW_FENCE_SIGNAL) {
>>>                   int tgt = w->idx + w->target;
>>> @@ -2270,6 +2632,9 @@ static void *run_workload(void *data)
>>>                   sw_sync_timeline_inc(wrk->sync_timeline, inc);
>>>                   continue;
>>>               } else if (w->type == CTX_PRIORITY) {
>>> +                if (is_xe)
>>> +                    continue;
>>> +
>>>                   if (w->priority != wrk->ctx_list[w->context].priority) {
>>>                       struct drm_i915_gem_context_param param = {
>>>                           .ctx_id = wrk->ctx_list[w->context].id,
>>> @@ -2289,7 +2654,10 @@ static void *run_workload(void *data)
>>>                   igt_assert(wrk->steps[t_idx].type == BATCH);
>>>                   igt_assert(wrk->steps[t_idx].duration.unbound_duration);
>>> -                *wrk->steps[t_idx].bb_duration = 0xffffffff;
>>> +                if (is_xe)
>>> +                    xe_spin_end(wrk->steps[t_idx].spin);
>>> +                else
>>> +                    *wrk->steps[t_idx].bb_duration = 0xffffffff;
>>>                   __sync_synchronize();
>>>                   continue;
>>>               } else if (w->type == SSEU) {
>>> @@ -2321,15 +2689,19 @@ static void *run_workload(void *data)
>>>               if (throttle > 0)
>>>                   w_sync_to(wrk, w, i - throttle);
>>> -            do_eb(wrk, w, engine);
>>> +            if (is_xe)
>>> +                do_xe_exec(wrk, w);
>>> +            else {
>>> +                do_eb(wrk, w, engine);
>>> -            if (w->request != -1) {
>>> -                igt_list_del(&w->rq_link);
>>> -                wrk->nrequest[w->request]--;
>>> +                if (w->request != -1) {
>>> +                    igt_list_del(&w->rq_link);
>>> +                    wrk->nrequest[w->request]--;
>>> +                }
>>> +                w->request = engine;
>>> +                igt_list_add_tail(&w->rq_link, &wrk->requests[engine]);
>>> +                wrk->nrequest[engine]++;
>>
>> Is the rq list management the same here and in do_xe_exec? If so,
>> please consolidate it into a common wrapper, which can then branch off
>> into i915 and xe specific parts.
> 
> I need to revisit this.
>>
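One possible shape for the common wrapper requested above: the backend-specific submission is a callback, and the request-list bookkeeping is shared. This is a sketch with a hand-rolled intrusive list standing in for `igt_list_head`; `submit_step()`, `struct queue` and the `owner` pointer (which plays the role of `w->request` in the i915 path) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive list, standing in for igt_list_head. */
struct link { struct link *prev, *next; };

struct queue {
	struct link requests;      /* in-flight steps, oldest first */
	unsigned int nrequest;
};

struct step {
	struct link rq_link;
	struct queue *owner;       /* queue whose list holds this step, or NULL */
	int submitted;             /* counts backend submissions (demo only) */
};

static void list_init(struct link *h) { h->prev = h->next = h; }

static void list_del(struct link *l)
{
	l->prev->next = l->next;
	l->next->prev = l->prev;
	l->prev = l->next = NULL;
}

static void list_add_tail(struct link *l, struct link *h)
{
	l->prev = h->prev;
	l->next = h;
	h->prev->next = l;
	h->prev = l;
}

/* Backend hook: the i915 path would do execbuf here, the Xe path xe_exec. */
typedef void (*submit_fn)(struct step *w);

/*
 * Backend-specific submission, then shared bookkeeping. A step submitted
 * again in a later iteration is first unlinked from the list it is still
 * on, so each step appears at most once and the counters stay balanced.
 */
static void submit_step(struct step *w, struct queue *q, submit_fn backend)
{
	backend(w);

	if (w->owner) {
		list_del(&w->rq_link);
		w->owner->nrequest--;
	}
	list_add_tail(&w->rq_link, &q->requests);
	w->owner = q;
	q->nrequest++;
}

static void fake_backend(struct step *w) { w->submitted++; }
```

Whether the list is keyed per engine (i915) or per exec queue (Xe) then only affects which `struct queue` the caller passes in.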
>>>               }
>>> -            w->request = engine;
>>> -            igt_list_add_tail(&w->rq_link, &wrk->requests[engine]);
>>> -            wrk->nrequest[engine]++;
>>>               if (!wrk->run)
>>>                   break;
>>> @@ -2338,17 +2710,32 @@ static void *run_workload(void *data)
>>>                   w_step_sync(w);
>>>               if (qd_throttle > 0) {
>>> -                while (wrk->nrequest[engine] > qd_throttle) {
>>> -                    struct w_step *s;
>>> +                if (is_xe) {
>>> +                    struct exec_queue *eq = get_eq(wrk, w);
>>> +
>>> +                    while (eq->nrequest > qd_throttle) {
>>> +                        struct w_step *s;
>>> +
>>> +                        s = igt_list_first_entry(&eq->requests, s, rq_link);
>>> +
>>> +                        w_step_sync(s);
>>> -                    s = igt_list_first_entry(&wrk->requests[engine],
>>> -                                 s, rq_link);
>>> +                        igt_list_del(&s->rq_link);
>>> +                        eq->nrequest--;
>>> +                    }
>>> +                } else {
>>> +                    while (wrk->nrequest[engine] > qd_throttle) {
>>> +                        struct w_step *s;
>>> +
>>> +                        s = igt_list_first_entry(&wrk->requests[engine],
>>> +                                    s, rq_link);
>>>                           w_step_sync(s);
>>> -                    s->request = -1;
>>> -                    igt_list_del(&s->rq_link);
>>> -                    wrk->nrequest[engine]--;
>>> +                        s->request = -1;
>>> +                        igt_list_del(&s->rq_link);
>>> +                        wrk->nrequest[engine]--;
>>> +                    }
>>
>> Hm, okay, throttling is very similar but not exactly the same.
>> What is the conceptual difference?
> 
> I need to revisit this code. Probably we can unify it now.
> We want to throttle the number of requests on all exec queues of a given
> type.
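The throttling described above, a queue-depth limit shared by all exec queues of one engine type, can be modelled with a single oldest-first FIFO per type, similar to how the i915 path keys `wrk->requests[]` by engine. The sketch is hypothetical: `track()`, `throttle()` and the ring buffer are illustrative, and `wait_request()` only records the id where the real tool would call `w_step_sync()`.

```c
#include <assert.h>

#define NUM_TYPES 4     /* hypothetical number of engine types */
#define MAX_INFLIGHT 16 /* ring capacity for this demo */

/* Oldest-first ring of in-flight request ids for one engine type. */
struct type_fifo {
	int ids[MAX_INFLIGHT];
	unsigned int head, count;
};

static struct type_fifo fifo[NUM_TYPES];
static int waited[64];      /* ids we "waited" on, in order */
static unsigned int nwaited;

/* Stand-in for w_step_sync(): the real tool blocks on the request here. */
static void wait_request(int id)
{
	assert(nwaited < 64);
	waited[nwaited++] = id;
}

/* Record a submitted request against its engine type, whatever its queue. */
static void track(unsigned int type, int id)
{
	struct type_fifo *f;

	assert(type < NUM_TYPES);
	f = &fifo[type];
	assert(f->count < MAX_INFLIGHT);
	f->ids[(f->head + f->count) % MAX_INFLIGHT] = id;
	f->count++;
}

/* Wait for the oldest requests of a type until at most qd remain queued. */
static void throttle(unsigned int type, unsigned int qd)
{
	struct type_fifo *f = &fifo[type];

	while (f->count > qd) {
		wait_request(f->ids[f->head]);
		f->head = (f->head + 1) % MAX_INFLIGHT;
		f->count--;
	}
}
```

Keying the budget by type rather than by exec queue is what would let the i915 and Xe throttle loops collapse into one.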
>>>                   }
>>>               }
>>>           }
>>> @@ -2365,18 +2752,51 @@ static void *run_workload(void *data)
>>>           for (i = 0, w = wrk->steps; wrk->run && (i < wrk->nr_steps);
>>>                i++, w++) {
>>>               if (w->emit_fence > 0) {
>>> -                close(w->emit_fence);
>>> -                w->emit_fence = -1;
>>> +                if (is_xe) {
>>> +                    igt_assert(w->type == SW_FENCE);
>>> +                    close(w->emit_fence);
>>> +                    w->emit_fence = -1;
>>> +                    syncobj_reset(fd, &w->syncs[0].handle, 1);
>>> +                } else {
>>> +                    close(w->emit_fence);
>>> +                    w->emit_fence = -1;
>>> +                }
>>>               }
>>>           }
>>>       }
>>> -    for (i = 0; i < NUM_ENGINES; i++) {
>>> -        if (!wrk->nrequest[i])
>>> -            continue;
>>> +    if (is_xe) {
>>> +        struct exec_queue *eq;
>>> +        struct ctx *ctx;
>>> -        w = igt_list_last_entry(&wrk->requests[i], w, rq_link);
>>> -        w_step_sync(w);
>>> +        for_each_ctx(ctx, wrk)
>>> +            for_each_exec_queue(eq, ctx)
>>> +                if (eq->nrequest) {
>>> +                    w = igt_list_last_entry(&eq->requests, w, rq_link);
>>> +                    w_step_sync(w);
>>> +                }
>>> +
>>> +        for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
>>> +            if (w->type == BATCH) {
>>> +                syncobj_destroy(fd, w->syncs[0].handle);
>>> +                free(w->syncs);
>>> +                xe_vm_unbind_sync(fd, get_vm(wrk, w)->id, 0, w->exec.address,
>>> +                          w->bb_size);
>>> +                gem_munmap(w->spin, w->bb_size);
>>> +                gem_close(fd, w->bb_handle);
>>> +            } else if (w->type == SW_FENCE) {
>>> +                syncobj_destroy(fd, w->syncs[0].handle);
>>> +                free(w->syncs);
>>> +            }
>>> +        }
>>> +    } else {
>>> +        for (i = 0; i < NUM_ENGINES; i++) {
>>> +            if (!wrk->nrequest[i])
>>> +                continue;
>>> +
>>> +            w = igt_list_last_entry(&wrk->requests[i], w, rq_link);
>>> +            w_step_sync(w);
>>> +        }
>>>       }
>>>       clock_gettime(CLOCK_MONOTONIC, &t_end);
>>> @@ -2398,6 +2818,23 @@ static void *run_workload(void *data)
>>>   static void fini_workload(struct workload *wrk)
>>>   {
>>> +    if (is_xe) {
>>> +        struct exec_queue *eq;
>>> +        struct ctx *ctx;
>>> +        struct vm *vm;
>>> +
>>> +        for_each_ctx(ctx, wrk)
>>> +            for_each_exec_queue(eq, ctx) {
>>> +                xe_exec_queue_destroy(fd, eq->id);
>>> +                eq->id = 0;
>>> +            }
>>> +        for_each_vm(vm, wrk) {
>>> +            put_ahnd(vm->ahnd);
>>> +            xe_vm_destroy(fd, vm->id);
>>> +        }
>>> +        free(wrk->vm_list);
>>> +        wrk->nr_vms = 0;
>>> +    }
>>>       free(wrk->steps);
>>>       free(wrk);
>>>   }
>>> @@ -2605,8 +3042,12 @@ int main(int argc, char **argv)
>>>           ret = igt_device_find_first_i915_discrete_card(&card);
>>>           if (!ret)
>>>               ret = igt_device_find_integrated_card(&card);
>>> +        if (!ret)
>>> +            ret = igt_device_find_first_xe_discrete_card(&card);
>>> +        if (!ret)
>>> +            ret = igt_device_find_xe_integrated_card(&card);
>>>           if (!ret) {
>>> -            wsim_err("No device filter specified and no i915 devices found!\n");
>>> +            wsim_err("No device filter specified and no intel devices found!\n");
>>>               return EXIT_FAILURE;
>>>           }
>>>       }
>>> @@ -2629,6 +3070,10 @@ int main(int argc, char **argv)
>>>       if (verbose > 1)
>>>           printf("Using device %s\n", drm_dev);
>>> +    is_xe = is_xe_device(fd);
>>> +    if (is_xe)
>>> +        xe_device_get(fd);
>>
>> What does this do, out of curiosity? There is no put AFAICT.
> 
> It reads (via ioctls) many device properties (GTs, engines,
> topology...) and caches them in a map keyed by fd. So calls like
> 
> xe_for_each_hw_engine(fd, hwe) {
>   ...
> }
> 
> xe_has_vram, xe_number_hw_engines... (in lib/xe_query)
> 
> use that cached data.
> 
> And indeed the put is missing and should be next to close(fd).
>>
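The caching described above can be pictured as a small fd-keyed table with a get/put pair. This is not the `lib/xe/xe_query` implementation: `dev_get()`, `dev_put()` and the reference count are hypothetical, and the query result is a placeholder. In gem_wsim itself the fix is simply calling `xe_device_put(fd)` next to `close(fd)`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cached device info; xe_query caches GTs, engines, etc. */
struct dev_info {
	int fd;
	unsigned int refcount;
	unsigned int nr_engines;   /* placeholder for real query results */
	struct dev_info *next;
};

static struct dev_info *cache;  /* singly linked list keyed by fd */
static unsigned int nr_queries; /* counts expensive query rounds */

static struct dev_info *lookup(int fd)
{
	for (struct dev_info *d = cache; d; d = d->next)
		if (d->fd == fd)
			return d;
	return NULL;
}

/* First get for an fd runs the queries; later gets reuse the cached copy. */
static struct dev_info *dev_get(int fd)
{
	struct dev_info *d = lookup(fd);

	if (!d) {
		d = calloc(1, sizeof(*d));
		assert(d);
		d->fd = fd;
		d->nr_engines = 8;  /* stands in for ioctl results */
		nr_queries++;
		d->next = cache;
		cache = d;
	}
	d->refcount++;
	return d;
}

/* Drop a reference; free the entry once the last user is gone. */
static void dev_put(int fd)
{
	struct dev_info **pp = &cache;

	while (*pp && (*pp)->fd != fd)
		pp = &(*pp)->next;
	if (*pp && --(*pp)->refcount == 0) {
		struct dev_info *d = *pp;

		*pp = d->next;
		free(d);
	}
}
```

Without the put, the entry outlives the fd, and a later device opened on a reused fd number could pick up stale cached data.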
>>> +
>>>       if (!nr_w_args) {
>>>           wsim_err("No workload descriptor(s)!\n");
>>>           goto err;
>>> diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
>>> index e4fd61645..f49a73989 100644
>>> --- a/benchmarks/wsim/README
>>> +++ b/benchmarks/wsim/README
>>> @@ -88,6 +88,10 @@ Batch durations can also be specified as infinite by using the '*' in the
>>>   duration field. Such batches must be ended by the terminate command ('T')
>>>   otherwise they will cause a GPU hang to be reported.
>>> +Note: On Xe Batch dependencies are expressed with syncobjects,
>>> +so there is no difference between f-1 and -1
>>> +ex. 1.1000.-2.0 is same as 1.1000.f-2.0.
>>> +
>>
>> Maybe add a "chapter" talking about the differences between i915 and 
>> xe, for all the ones which may be relevant. This one in particular may 
>> not have any practical effect, don't know. Presumably mixing explicit
>> fence creation with implicit, simulated data dependencies all works fine?
>>
>> Or maybe on top of that we will end up needing two chapters to list 
>> the commands only available for each backend.
> 
> Sounds reasonable.
> 
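To make the README note above concrete: a dependency token is either a plain negative offset (data dependency), `f` plus an offset (fence) or `s` plus an offset (submit fence). The sketch below is a hypothetical parser (`parse_dep()` is not the gem_wsim function) that folds data dependencies into fence dependencies on Xe, which is why `1.1000.-2.0` and `1.1000.f-2.0` behave the same there. Fields listing several dependencies and the `0` (no dependency) case are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical classification of one dependency token. */
enum dep_kind { DEP_DATA, DEP_FENCE, DEP_SUBMIT, DEP_INVALID };

/*
 * Parses "-N" (data), "f-N" (fence) or "s-N" (submit fence), where the
 * number is a negative step offset. On Xe both data and fence dependencies
 * become a wait on the target step's out-syncobj, so they are folded.
 */
static enum dep_kind parse_dep(const char *tok, bool is_xe, int *offset)
{
	enum dep_kind kind = DEP_DATA;
	char *end;
	long v;

	if (*tok == 'f') {
		kind = DEP_FENCE;
		tok++;
	} else if (*tok == 's') {
		kind = DEP_SUBMIT;
		tok++;
	}

	v = strtol(tok, &end, 10);
	if (end == tok || *end != '\0' || v >= 0)
		return DEP_INVALID;

	*offset = (int)v;
	if (is_xe && kind == DEP_DATA)
		kind = DEP_FENCE;
	return kind;
}
```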
>>
>>>   Sync (fd) fences
>>>   ----------------
>>> @@ -131,7 +135,7 @@ runnable. When the second RCS batch completes the standalone fence is signaled
>>>   which allows the two VCS batches to be executed. Finally we wait until the both
>>>   VCS batches have completed before starting the (optional) next iteration.
>>> -Submit fences
>>> +Submit fences (i915 only?)
>>
>> s/?//
>>
>>>   -------------
>>>   Submit fences are a type of input fence which are signalled when the originating
>>
>> Regards,
>>
>> Tvrtko
> 
> Thanks a lot for the review
> -- 
> marcin


* Re: [igt-dev] [PATCH i-g-t 07/14] benchmarks/gem_wsim: cleanups
  2023-09-26 11:08   ` Tvrtko Ursulin
@ 2023-09-27 19:03     ` Bernatowicz, Marcin
  2023-09-28  8:37       ` Bernatowicz, Marcin
  0 siblings, 1 reply; 39+ messages in thread
From: Bernatowicz, Marcin @ 2023-09-27 19:03 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: chris.p.wilson



On 9/26/2023 1:08 PM, Tvrtko Ursulin wrote:
> 
> On 26/09/2023 09:44, Marcin Bernatowicz wrote:
>> Cleaning checkpatch.pl reported warnings/errors.
>> Removed unused fence_signal field from struct w_step.
>> calloc vs malloc in parse_workload for struct workload.
>>
>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>> ---
>>   benchmarks/gem_wsim.c | 56 ++++++++++++++++++++++++++-----------------
>>   1 file changed, 34 insertions(+), 22 deletions(-)
>>
>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>> index 3b01340bf..daa20fb8a 100644
>> --- a/benchmarks/gem_wsim.c
>> +++ b/benchmarks/gem_wsim.c
>> @@ -1,3 +1,4 @@
>> +// SPDX-License-Identifier: MIT
>>   /*
>>    * Copyright © 2017 Intel Corporation
>>    *
>> @@ -76,8 +77,7 @@ struct duration {
>>       bool unbound_duration;
>>   };
>> -enum w_type
>> -{
>> +enum w_type {
>>       BATCH,
>>       SYNC,
>>       DELAY,
>> @@ -102,8 +102,7 @@ struct dep_entry {
>>       int working_set; /* -1 = step dependecy, >= 0 working set id */
>>   };
>> -struct deps
>> -{
>> +struct deps {
>>       int nr;
>>       bool submit_fence;
>>       struct dep_entry *list;
>> @@ -137,8 +136,7 @@ struct working_set {
>>   struct workload;
>> -struct w_step
>> -{
>> +struct w_step {
>>       struct workload *wrk;
>>       /* Workload step metadata */
>> @@ -155,7 +153,6 @@ struct w_step
>>           int period;
>>           int target;
>>           int throttle;
>> -        int fence_signal;
>>           int priority;
>>           struct {
>>               unsigned int engine_map_count;
>> @@ -194,8 +191,7 @@ struct ctx {
>>       uint64_t sseu;
>>   };
>> -struct workload
>> -{
>> +struct workload {
>>       unsigned int id;
>>       unsigned int nr_steps;
>> @@ -807,6 +803,7 @@ static int add_buffers(struct working_set *set, char *str)
>>       for (i = 0; i < add; i++) {
>>           struct work_buffer_size *sz = &sizes[set->nr + i];
>> +
>>           sz->min = min_sz;
>>           sz->max = max_sz;
>>           sz->size = 0;
>> @@ -895,13 +892,16 @@ parse_duration(unsigned int nr_steps, struct duration *dur, double scale_dur, ch
>>   }
>>   #define int_field(_STEP_, _FIELD_, _COND_, _ERR_) \
>> -    if ((field = strtok_r(fstart, ".", &fctx))) { \
>> -        tmp = atoi(field); \
>> -        check_arg(_COND_, _ERR_, nr_steps); \
>> -        step.type = _STEP_; \
>> -        step._FIELD_ = tmp; \
>> -        goto add_step; \
>> -    } \
>> +    do { \
>> +        field = strtok_r(fstart, ".", &fctx); \
>> +        if (field) { \
>> +            tmp = atoi(field); \
>> +            check_arg(_COND_, _ERR_, nr_steps); \
>> +            step.type = _STEP_; \
>> +            step._FIELD_ = tmp; \
>> +            goto add_step; \
>> +        } \
>> +    } while (0)
>>   static struct workload *
>>   parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>> @@ -926,7 +926,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>           valid = 0;
>>           memset(&step, 0, sizeof(step));
>> -        if ((field = strtok_r(fstart, ".", &fctx))) {
>> +        field = strtok_r(fstart, ".", &fctx);
>> +        if (field) {
>>               fstart = NULL;
>>               if (!strcmp(field, "d")) {
>> @@ -943,6 +944,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>                   }
>>               } else if (!strcmp(field, "P")) {
>>                   unsigned int nr = 0;
>> +
>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>                       tmp = atoi(field);
>>                       check_arg(nr == 0 && tmp <= 0,
>> @@ -968,6 +970,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>                         "Invalid sync target at step %u!\n");
>>               } else if (!strcmp(field, "S")) {
>>                   unsigned int nr = 0;
>> +
>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>                       tmp = atoi(field);
>>                       check_arg(tmp <= 0 && nr == 0,
>> @@ -1004,6 +1007,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>                   goto add_step;
>>               } else if (!strcmp(field, "M")) {
>>                   unsigned int nr = 0;
>> +
>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>                       tmp = atoi(field);
>>                       check_arg(nr == 0 && tmp <= 0,
>> @@ -1034,6 +1038,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>                         "Invalid terminate target at step %u!\n");
>>               } else if (!strcmp(field, "X")) {
>>                   unsigned int nr = 0;
>> +
>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>                       tmp = atoi(field);
>>                       check_arg(nr == 0 && tmp <= 0,
>> @@ -1058,6 +1063,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>                   goto add_step;
>>               } else if (!strcmp(field, "B")) {
>>                   unsigned int nr = 0;
>> +
>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>                       tmp = atoi(field);
>>                       check_arg(nr == 0 && tmp <= 0,
>> @@ -1077,6 +1083,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>                   goto add_step;
>>               } else if (!strcmp(field, "b")) {
>>                   unsigned int nr = 0;
>> +
>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>                       check_arg(nr > 2,
>>                             "Invalid bond format at step %u!\n",
>> @@ -1148,7 +1155,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>               valid++;
>>           }
>> -        if ((field = strtok_r(fstart, ".", &fctx))) {
>> +        field = strtok_r(fstart, ".", &fctx);
>> +        if (field) {
>>               fstart = NULL;
>>               i = str_to_engine(field);
>> @@ -1160,7 +1168,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>               step.engine = i;
>>           }
>> -        if ((field = strtok_r(fstart, ".", &fctx))) {
>> +        field = strtok_r(fstart, ".", &fctx);
>> +        if (field) {
>>               fstart = NULL;
>>               tmp = parse_duration(nr_steps, &step.duration, scale_dur, field);
>> @@ -1170,7 +1179,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>               valid++;
>>           }
>> -        if ((field = strtok_r(fstart, ".", &fctx))) {
>> +        field = strtok_r(fstart, ".", &fctx);
>> +        if (field) {
>>               fstart = NULL;
>>               tmp = parse_dependencies(nr_steps, &step, field);
>> @@ -1180,7 +1190,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>               valid++;
>>           }
>> -        if ((field = strtok_r(fstart, ".", &fctx))) {
>> +        field = strtok_r(fstart, ".", &fctx);
>> +        if (field) {
>>               fstart = NULL;
>>               check_arg(strlen(field) != 1 ||
>> @@ -1224,7 +1235,7 @@ add_step:
>>           nr_steps += app_w->nr_steps;
>>       }
>> -    wrk = malloc(sizeof(*wrk));
>> +    wrk = calloc(1, sizeof(*wrk));
> 
> Rest looks fine, but for this change I don't know what checkpatch has
> against it and why calloc(1, ...) is better. Is that the kernel's
> checkpatch.pl? I don't get it when I run it.

I'll restore it in the next version.
My previous patch changes contained fields which were zeroed here; now
that's not required and only creates confusion. I agree it's not a
cleanup/issue and it's not reported by any tool.
--
marcin
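For reference, the difference being discussed: `malloc()` returns memory with indeterminate contents, so every field must be written explicitly, while `calloc(1, size)` zero-fills, so fields nobody sets start out as 0 (and, on all common platforms, NULL pointers). The struct below is a hypothetical stand-in for `struct workload`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct workload with a few fields. */
struct wl {
	unsigned int nr_steps;
	unsigned int nr_vms;
	void *vm_list;
};

/* malloc'ed memory is indeterminate, so every field is set by hand. */
static struct wl *wl_alloc_malloc(unsigned int nr_steps)
{
	struct wl *w = malloc(sizeof(*w));

	assert(w);
	w->nr_steps = nr_steps;
	w->nr_vms = 0;
	w->vm_list = NULL;
	return w;
}

/* calloc zero-fills, so fields the caller never sets start out zeroed. */
static struct wl *wl_alloc_calloc(unsigned int nr_steps)
{
	struct wl *w = calloc(1, sizeof(*w));

	assert(w);
	w->nr_steps = nr_steps;
	return w;
}
```

The calloc form only matters when new fields are added that rely on starting at zero; if every field is assigned explicitly, both are equivalent.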
> 
> Regards,
> 
> Tvrtko
> 
>>       igt_assert(wrk);
>>       wrk->nr_steps = nr_steps;
>> @@ -2717,6 +2728,7 @@ int main(int argc, char **argv)
>>       if (append_workload_arg) {
>>           struct w_arg arg = { NULL, append_workload_arg, 0 };
>> +
>>           app_w = parse_workload(&arg, flags, scale_dur, scale_time,
>>                          NULL);
>>           if (!app_w) {
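Two of the checkpatch.pl items fixed in this patch are worth a short illustration: moving assignments out of `if` conditions, and wrapping multi-statement macros such as `int_field` in `do { } while (0)`. The wrapper makes the macro expand to a single statement, so it also works as the body of an unbraced `if` followed by `else`. `PARSE_INT` and `parse_or_default()` are hypothetical examples, not gem_wsim code.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Multi-statement macro wrapped in do { } while (0): it expands to one
 * statement and needs the trailing semicolon at the call site.
 */
#define PARSE_INT(_str, _out, _ok) \
	do { \
		char *end__; \
		long v__ = strtol((_str), &end__, 10); \
		(_ok) = (end__ != (_str) && *end__ == '\0'); \
		if (_ok) \
			(_out) = (int)v__; \
	} while (0)

static int parse_or_default(const char *s, int def)
{
	int val = def;
	int ok = 0;

	/*
	 * With a plain { } body instead of do/while(0), the ';' after the
	 * macro would terminate the if, and this else would not compile.
	 */
	if (s)
		PARSE_INT(s, val, ok);
	else
		return -1;

	return ok ? val : def;
}
```

A `goto` inside the wrapped body, as `int_field` uses, still jumps out of the `do { } while (0)` normally.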


* Re: [igt-dev] [PATCH i-g-t 09/14] benchmarks/gem_wsim: use lib code to query engines
  2023-09-26 11:23   ` Tvrtko Ursulin
@ 2023-09-27 19:09     ` Bernatowicz, Marcin
  0 siblings, 0 replies; 39+ messages in thread
From: Bernatowicz, Marcin @ 2023-09-27 19:09 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: chris.p.wilson



On 9/26/2023 1:23 PM, Tvrtko Ursulin wrote:
> 
> On 26/09/2023 09:44, Marcin Bernatowicz wrote:
>> Use code in lib/i915/gem_engine_topology to query engines.
>>
>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>> ---
>>   benchmarks/gem_wsim.c | 157 +++++-------------------------------------
>>   1 file changed, 19 insertions(+), 138 deletions(-)
>>
>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>> index 2e6eb6388..a3339e1b2 100644
>> --- a/benchmarks/gem_wsim.c
>> +++ b/benchmarks/gem_wsim.c
>> @@ -456,150 +456,31 @@ static int str_to_engine(const char *str)
>>       return -1;
>>   }
>> -static bool __engines_queried;
>> -static unsigned int __num_engines;
>> -static struct i915_engine_class_instance *__engines;
>> -
>> -static int
>> -__i915_query(int i915, struct drm_i915_query *q)
>> +static struct intel_engine_data *query_engines(void)
>>   {
>> -    if (igt_ioctl(i915, DRM_IOCTL_I915_QUERY, q))
>> -        return -errno;
>> -    return 0;
>> -}
>> +    static struct intel_engine_data engines = {};
>> -static int
>> -__i915_query_items(int i915, struct drm_i915_query_item *items, uint32_t n_items)
>> -{
>> -    struct drm_i915_query q = {
>> -        .num_items = n_items,
>> -        .items_ptr = to_user_pointer(items),
>> -    };
>> -    return __i915_query(i915, &q);
>> -}
>> +    if (engines.nengines)
>> +        return &engines;
>> -static void
>> -i915_query_items(int i915, struct drm_i915_query_item *items, uint32_t n_items)
>> -{
>> -    igt_assert_eq(__i915_query_items(i915, items, n_items), 0);
>> -}
>> -
>> -static bool has_engine_query(int i915)
>> -{
>> -    struct drm_i915_query_item item = {
>> -        .query_id = DRM_I915_QUERY_ENGINE_INFO,
>> -    };
>> -
>> -    return __i915_query_items(i915, &item, 1) == 0 && item.length > 0;
>> -}
>> -
>> -static void query_engines(void)
>> -{
>> -    struct i915_engine_class_instance *engines;
>> -    unsigned int num;
>> -
>> -    if (__engines_queried)
>> -        return;
>> -
>> -    __engines_queried = true;
>> -
>> -    if (!has_engine_query(fd)) {
>> -        unsigned int num_bsd = gem_has_bsd(fd) + gem_has_bsd2(fd);
>> -        unsigned int i = 0;
>> -
>> -        igt_assert(num_bsd);
>> -
>> -        num = 1 + num_bsd;
>> -
>> -        if (gem_has_blt(fd))
>> -            num++;
>> -
>> -        if (gem_has_vebox(fd))
>> -            num++;
>> -
>> -        engines = calloc(num,
>> -                 sizeof(struct i915_engine_class_instance));
>> -        igt_assert(engines);
>> -
>> -        engines[i].engine_class = I915_ENGINE_CLASS_RENDER;
>> -        engines[i].engine_instance = 0;
>> -        i++;
>> -
>> -        if (gem_has_blt(fd)) {
>> -            engines[i].engine_class = I915_ENGINE_CLASS_COPY;
>> -            engines[i].engine_instance = 0;
>> -            i++;
>> -        }
>> -
>> -        if (gem_has_bsd(fd)) {
>> -            engines[i].engine_class = I915_ENGINE_CLASS_VIDEO;
>> -            engines[i].engine_instance = 0;
>> -            i++;
>> -        }
>> -
>> -        if (gem_has_bsd2(fd)) {
>> -            engines[i].engine_class = I915_ENGINE_CLASS_VIDEO;
>> -            engines[i].engine_instance = 1;
>> -            i++;
>> -        }
>> -
>> -        if (gem_has_vebox(fd)) {
>> -            engines[i].engine_class =
>> -                I915_ENGINE_CLASS_VIDEO_ENHANCE;
>> -            engines[i].engine_instance = 0;
>> -            i++;
>> -        }
>> -    } else {
>> -        struct drm_i915_query_engine_info *engine_info;
>> -        struct drm_i915_query_item item = {
>> -            .query_id = DRM_I915_QUERY_ENGINE_INFO,
>> -        };
>> -        const unsigned int sz = 4096;
>> -        unsigned int i;
>> -
>> -        engine_info = malloc(sz);
>> -        igt_assert(engine_info);
>> -        memset(engine_info, 0, sz);
>> -
>> -        item.data_ptr = to_user_pointer(engine_info);
>> -        item.length = sz;
>> -
>> -        i915_query_items(fd, &item, 1);
>> -        igt_assert(item.length > 0);
>> -        igt_assert(item.length <= sz);
>> -
>> -        num = engine_info->num_engines;
>> -
>> -        engines = calloc(num,
>> -                 sizeof(struct i915_engine_class_instance));
>> -        igt_assert(engines);
>> -
>> -        for (i = 0; i < num; i++) {
>> -            struct drm_i915_engine_info *engine =
>> -                (struct drm_i915_engine_info *)&engine_info->engines[i];
>> -
>> -            engines[i] = engine->engine;
>> -        }
>> -    }
>> -
>> -    __engines = engines;
>> -    __num_engines = num;
>> +    engines = intel_engine_list_of_physical(fd);
>> +    igt_assert(engines.nengines);
>> +    return &engines;
>>   }
>>   static unsigned int num_engines_in_class(enum intel_engine_id class)
>>   {
>> -    unsigned int i, count = 0;
>> +    const struct intel_engine_data *engines = query_engines();
>> +    unsigned int count = 0;
>> +    int i;
>>       igt_assert(class == VCS);
>> -    query_engines();
>> -
>> -    for (i = 0; i < __num_engines; i++) {
>> -        if (__engines[i].engine_class == I915_ENGINE_CLASS_VIDEO)
>> +    for (i = 0; i < engines->nengines; i++) {
> 
> nengines is uint32_t so probably best to keep i unsigned.
> 
>> +        if (engines->engines[i].class == I915_ENGINE_CLASS_VIDEO)
>>               count++;
>>       }
>> -    igt_assert(count);
> 
> Why drop this?
I messed up; I will restore the unsigned int and the assert in the next version.
> 
> Regards,
> 
> Tvrtko
> 
>>       return count;
>>   }
>> @@ -607,16 +488,15 @@ static void
>>   fill_engines_id_class(enum intel_engine_id *list,
>>                 enum intel_engine_id class)
>>   {
>> +    const struct intel_engine_data *engines = query_engines();
>>       enum intel_engine_id engine = VCS1;
>>       unsigned int i, j = 0;
>>       igt_assert(class == VCS);
>>       igt_assert(num_engines_in_class(VCS) <= 2);
>> -    query_engines();
>> -
>> -    for (i = 0; i < __num_engines; i++) {
>> -        if (__engines[i].engine_class != I915_ENGINE_CLASS_VIDEO)
>> +    for (i = 0; i < engines->nengines; i++) {
>> +        if (engines->engines[i].class != I915_ENGINE_CLASS_VIDEO)
>>               continue;
>>           list[j++] = engine++;
>> @@ -626,17 +506,18 @@ fill_engines_id_class(enum intel_engine_id *list,
>>   static unsigned int
>>   find_physical_instance(enum intel_engine_id class, unsigned int logical)
>>   {
>> +    const struct intel_engine_data *engines = query_engines();
>>       unsigned int i, j = 0;
>>       igt_assert(class == VCS);
>> -    for (i = 0; i < __num_engines; i++) {
>> -        if (__engines[i].engine_class != I915_ENGINE_CLASS_VIDEO)
>> +    for (i = 0; i < engines->nengines; i++) {
>> +        if (engines->engines[i].class != I915_ENGINE_CLASS_VIDEO)
>>               continue;
>>           /* Map logical to physical instances. */
>>           if (logical == j++)
>> -            return __engines[i].engine_instance;
>> +            return engines->engines[i].instance;
>>       }
>>       igt_assert(0);

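For reference, a minimal standalone sketch of the engine helpers with both review points applied: an unsigned loop index to match the uint32_t nengines field, and the restored assert on the video engine count. The structs, the CLASS_* values and fake_engine_list() are hypothetical stand-ins modeled only on the fields this patch touches (nengines, engines[i].class, engines[i].instance); they are not the real lib/i915/gem_engine_topology definitions. fake_engine_list() stands in for intel_engine_list_of_physical(fd), and plain assert() stands in for igt_assert().

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative class values (assumed to mirror i915 uapi I915_ENGINE_CLASS_*). */
enum {
	CLASS_RENDER = 0,
	CLASS_COPY = 1,
	CLASS_VIDEO = 2,
	CLASS_VIDEO_ENHANCE = 3,
};

#define MAX_ENGINES 16

/* Stand-in for one engine entry (only the fields used here). */
struct engine_ci {
	uint16_t class;
	uint16_t instance;
};

/* Stand-in for struct intel_engine_data. */
struct engine_data {
	uint32_t nengines;
	struct engine_ci engines[MAX_ENGINES];
};

/* Hypothetical stand-in for intel_engine_list_of_physical(fd):
 * RCS0, BCS0, VCS0, VCS1, VECS0. */
static struct engine_data fake_engine_list(void)
{
	struct engine_data d = {
		.nengines = 5,
		.engines = {
			{ CLASS_RENDER, 0 },
			{ CLASS_COPY, 0 },
			{ CLASS_VIDEO, 0 },
			{ CLASS_VIDEO, 1 },
			{ CLASS_VIDEO_ENHANCE, 0 },
		},
	};

	return d;
}

static unsigned int query_count; /* how often the backend was queried */

/* Query lazily once, then serve the cached copy, as the patch does. */
static const struct engine_data *query_engines(void)
{
	static struct engine_data engines; /* static storage: starts zeroed */

	if (engines.nengines)
		return &engines;

	engines = fake_engine_list();
	query_count++;
	assert(engines.nengines);
	return &engines;
}

static unsigned int num_video_engines(void)
{
	const struct engine_data *engines = query_engines();
	unsigned int count = 0;
	unsigned int i; /* unsigned, to match uint32_t nengines */

	for (i = 0; i < engines->nengines; i++) {
		if (engines->engines[i].class == CLASS_VIDEO)
			count++;
	}

	assert(count); /* the check the review asked to keep */
	return count;
}

/* Map a logical video engine index to its physical instance. */
static unsigned int find_physical_instance(unsigned int logical)
{
	const struct engine_data *engines = query_engines();
	unsigned int i, j = 0;

	for (i = 0; i < engines->nengines; i++) {
		if (engines->engines[i].class != CLASS_VIDEO)
			continue;
		if (logical == j++)
			return engines->engines[i].instance;
	}

	assert(0); /* logical index out of range */
	return 0;
}
```

With the fake topology above, num_video_engines() reports two engines, logical indices 0 and 1 map to physical instances 0 and 1, and the backend is queried only once thanks to the static cache.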

* Re: [igt-dev] [PATCH i-g-t 07/14] benchmarks/gem_wsim: cleanups
  2023-09-27 19:03     ` Bernatowicz, Marcin
@ 2023-09-28  8:37       ` Bernatowicz, Marcin
  0 siblings, 0 replies; 39+ messages in thread
From: Bernatowicz, Marcin @ 2023-09-28  8:37 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: chris.p.wilson



On 9/27/2023 9:03 PM, Bernatowicz, Marcin wrote:
> 
> 
> On 9/26/2023 1:08 PM, Tvrtko Ursulin wrote:
>>
>> On 26/09/2023 09:44, Marcin Bernatowicz wrote:
>>> Cleaning checkpatch.pl reported warnings/errors.
>>> Removed unused fence_signal field from struct w_step.
>>> calloc vs malloc in parse_workload for struct workload.
>>>
>>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>>> ---
>>>   benchmarks/gem_wsim.c | 56 ++++++++++++++++++++++++++-----------------
>>>   1 file changed, 34 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>>> index 3b01340bf..daa20fb8a 100644
>>> --- a/benchmarks/gem_wsim.c
>>> +++ b/benchmarks/gem_wsim.c
>>> @@ -1,3 +1,4 @@
>>> +// SPDX-License-Identifier: MIT
>>>   /*
>>>    * Copyright © 2017 Intel Corporation
>>>    *
>>> @@ -76,8 +77,7 @@ struct duration {
>>>       bool unbound_duration;
>>>   };
>>> -enum w_type
>>> -{
>>> +enum w_type {
>>>       BATCH,
>>>       SYNC,
>>>       DELAY,
>>> @@ -102,8 +102,7 @@ struct dep_entry {
>>>       int working_set; /* -1 = step dependecy, >= 0 working set id */
>>>   };
>>> -struct deps
>>> -{
>>> +struct deps {
>>>       int nr;
>>>       bool submit_fence;
>>>       struct dep_entry *list;
>>> @@ -137,8 +136,7 @@ struct working_set {
>>>   struct workload;
>>> -struct w_step
>>> -{
>>> +struct w_step {
>>>       struct workload *wrk;
>>>       /* Workload step metadata */
>>> @@ -155,7 +153,6 @@ struct w_step
>>>           int period;
>>>           int target;
>>>           int throttle;
>>> -        int fence_signal;
>>>           int priority;
>>>           struct {
>>>               unsigned int engine_map_count;
>>> @@ -194,8 +191,7 @@ struct ctx {
>>>       uint64_t sseu;
>>>   };
>>> -struct workload
>>> -{
>>> +struct workload {
>>>       unsigned int id;
>>>       unsigned int nr_steps;
>>> @@ -807,6 +803,7 @@ static int add_buffers(struct working_set *set, char *str)
>>>       for (i = 0; i < add; i++) {
>>>           struct work_buffer_size *sz = &sizes[set->nr + i];
>>> +
>>>           sz->min = min_sz;
>>>           sz->max = max_sz;
>>>           sz->size = 0;
>>> @@ -895,13 +892,16 @@ parse_duration(unsigned int nr_steps, struct duration *dur, double scale_dur, ch
>>>   }
>>>   #define int_field(_STEP_, _FIELD_, _COND_, _ERR_) \
>>> -    if ((field = strtok_r(fstart, ".", &fctx))) { \
>>> -        tmp = atoi(field); \
>>> -        check_arg(_COND_, _ERR_, nr_steps); \
>>> -        step.type = _STEP_; \
>>> -        step._FIELD_ = tmp; \
>>> -        goto add_step; \
>>> -    } \
>>> +    do { \
>>> +        field = strtok_r(fstart, ".", &fctx); \
>>> +        if (field) { \
>>> +            tmp = atoi(field); \
>>> +            check_arg(_COND_, _ERR_, nr_steps); \
>>> +            step.type = _STEP_; \
>>> +            step._FIELD_ = tmp; \
>>> +            goto add_step; \
>>> +        } \
>>> +    } while (0)
>>>   static struct workload *
>>>   parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>> @@ -926,7 +926,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>>           valid = 0;
>>>           memset(&step, 0, sizeof(step));
>>> -        if ((field = strtok_r(fstart, ".", &fctx))) {
>>> +        field = strtok_r(fstart, ".", &fctx);
>>> +        if (field) {
>>>               fstart = NULL;
>>>               if (!strcmp(field, "d")) {
>>> @@ -943,6 +944,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>>                   }
>>>               } else if (!strcmp(field, "P")) {
>>>                   unsigned int nr = 0;
>>> +
>>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>>                       tmp = atoi(field);
>>>                       check_arg(nr == 0 && tmp <= 0,
>>> @@ -968,6 +970,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>>                         "Invalid sync target at step %u!\n");
>>>               } else if (!strcmp(field, "S")) {
>>>                   unsigned int nr = 0;
>>> +
>>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>>                       tmp = atoi(field);
>>>                       check_arg(tmp <= 0 && nr == 0,
>>> @@ -1004,6 +1007,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>>                   goto add_step;
>>>               } else if (!strcmp(field, "M")) {
>>>                   unsigned int nr = 0;
>>> +
>>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>>                       tmp = atoi(field);
>>>                       check_arg(nr == 0 && tmp <= 0,
>>> @@ -1034,6 +1038,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>>                         "Invalid terminate target at step %u!\n");
>>>               } else if (!strcmp(field, "X")) {
>>>                   unsigned int nr = 0;
>>> +
>>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>>                       tmp = atoi(field);
>>>                       check_arg(nr == 0 && tmp <= 0,
>>> @@ -1058,6 +1063,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>>                   goto add_step;
>>>               } else if (!strcmp(field, "B")) {
>>>                   unsigned int nr = 0;
>>> +
>>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>>                       tmp = atoi(field);
>>>                       check_arg(nr == 0 && tmp <= 0,
>>> @@ -1077,6 +1083,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>>                   goto add_step;
>>>               } else if (!strcmp(field, "b")) {
>>>                   unsigned int nr = 0;
>>> +
>>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>>                       check_arg(nr > 2,
>>>                             "Invalid bond format at step %u!\n",
>>> @@ -1148,7 +1155,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>>               valid++;
>>>           }
>>> -        if ((field = strtok_r(fstart, ".", &fctx))) {
>>> +        field = strtok_r(fstart, ".", &fctx);
>>> +        if (field) {
>>>               fstart = NULL;
>>>               i = str_to_engine(field);
>>> @@ -1160,7 +1168,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>>               step.engine = i;
>>>           }
>>> -        if ((field = strtok_r(fstart, ".", &fctx))) {
>>> +        field = strtok_r(fstart, ".", &fctx);
>>> +        if (field) {
>>>               fstart = NULL;
>>>               tmp = parse_duration(nr_steps, &step.duration, scale_dur, field);
>>> @@ -1170,7 +1179,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>>               valid++;
>>>           }
>>> -        if ((field = strtok_r(fstart, ".", &fctx))) {
>>> +        field = strtok_r(fstart, ".", &fctx);
>>> +        if (field) {
>>>               fstart = NULL;
>>>               tmp = parse_dependencies(nr_steps, &step, field);
>>> @@ -1180,7 +1190,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>>>               valid++;
>>>           }
>>> -        if ((field = strtok_r(fstart, ".", &fctx))) {
>>> +        field = strtok_r(fstart, ".", &fctx);
>>> +        if (field) {
>>>               fstart = NULL;
>>>               check_arg(strlen(field) != 1 ||
>>> @@ -1224,7 +1235,7 @@ add_step:
>>>           nr_steps += app_w->nr_steps;
>>>       }
>>> -    wrk = malloc(sizeof(*wrk));
>>> +    wrk = calloc(1, sizeof(*wrk));
>>
>> The rest looks fine, but for this change I don't know what checkpatch has
>> against it, or why calloc(1, ...) is better. Is that the kernel's
>> checkpatch.pl? I don't see it when I run it.
> 
> I'll restore it in the next version.
> My earlier changes added fields that had to be zeroed here; that is no
> longer required and only creates confusion. I agree it's not a cleanup
> item and it is not reported by any tool.
> -- 
> marcin

I remember now why I added it: to zero the nr_ctxs value so that
fini_workload() can do the cleanup. It is not a problem in the present
code, as we do not touch (e.g. destroy) contexts in fini_workload(), but
with upcoming changes we will need nr_ctxs to be zero-initialized (either
explicitly or via calloc), because of:

	for (i = 0; i < clients; i++)
		fini_workload(w[i]); /* clones: nr_ctxs is correct, as clones are calloc'ed and prepare_workload() was called on them */
	free(w);
	for (i = 0; i < nr_w_args; i++)
		fini_workload(wrk[i]); /* parsed workloads: nr_ctxs may contain garbage, causing issues */

But that belongs in a later patch, together with those changes.
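A minimal sketch of the failure mode described above. It assumes a simplified struct workload holding only nr_steps, nr_ctxs and a hypothetical ctx_list pointer, and a hypothetical fini_workload() that walks nr_ctxs entries; none of this is the actual gem_wsim code. With malloc(), nr_ctxs is indeterminate in a workload that was parsed but never prepared, so such a cleanup loop would run a garbage number of times; calloc() (or an explicit wrk->nr_ctxs = 0) gives it a well-defined zero.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for gem_wsim's struct workload; ctx_list is hypothetical. */
struct ctx {
	int id;
};

struct workload {
	unsigned int nr_steps;
	unsigned int nr_ctxs;  /* only set when contexts are prepared */
	struct ctx *ctx_list;  /* only allocated when contexts are prepared */
};

static unsigned int ctxs_destroyed; /* lets callers observe the cleanup */

/* Hypothetical cleanup that trusts nr_ctxs, like the planned fini_workload(). */
static void fini_workload(struct workload *wrk)
{
	for (unsigned int i = 0; i < wrk->nr_ctxs; i++)
		ctxs_destroyed++; /* real code would destroy wrk->ctx_list[i] */

	free(wrk->ctx_list);
	free(wrk);
}

/* A parsed-but-never-prepared workload, like parse_workload() returns. */
static struct workload *parse_only(unsigned int nr_steps)
{
	/* calloc: nr_ctxs starts at 0 and ctx_list is all-bits-zero (NULL). */
	struct workload *wrk = calloc(1, sizeof(*wrk));

	assert(wrk);
	wrk->nr_steps = nr_steps;
	return wrk;
}

/* A prepared clone: contexts allocated and nr_ctxs set. */
static struct workload *prepare_clone(const struct workload *src,
				      unsigned int nr_ctxs)
{
	struct workload *w = calloc(1, sizeof(*w));

	assert(w);
	w->nr_steps = src->nr_steps;
	w->ctx_list = calloc(nr_ctxs, sizeof(*w->ctx_list));
	assert(nr_ctxs == 0 || w->ctx_list);
	w->nr_ctxs = nr_ctxs;
	return w;
}
```

Tearing down in the order from the snippet above (clones first, then the parsed workloads) destroys exactly the clone's contexts; the parsed workload contributes nothing because its nr_ctxs is zero rather than garbage.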

>>
>> Regards,
>>
>> Tvrtko
>>
>>>       igt_assert(wrk);
>>>       wrk->nr_steps = nr_steps;
>>> @@ -2717,6 +2728,7 @@ int main(int argc, char **argv)
>>>       if (append_workload_arg) {
>>>           struct w_arg arg = { NULL, append_workload_arg, 0 };
>>> +
>>>           app_w = parse_workload(&arg, flags, scale_dur, scale_time,
>>>                          NULL);
>>>           if (!app_w) {



Thread overview: 39+ messages
2023-09-26  8:44 [igt-dev] [PATCH i-g-t 00/14] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 01/14] lib/igt_device_scan: Xe get integrated/discrete card functions Marcin Bernatowicz
2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 02/14] benchmarks/gem_wsim: reposition the unbound duration boolean Marcin Bernatowicz
2023-09-26 10:23   ` Tvrtko Ursulin
2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 03/14] benchmarks/gem_wsim: fix scaling of period steps Marcin Bernatowicz
2023-09-26 10:28   ` Tvrtko Ursulin
2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 04/14] benchmarks/gem_wsim: fix duration range check Marcin Bernatowicz
2023-09-26 10:40   ` Tvrtko Ursulin
2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 05/14] benchmarks/gem_wsim: extract duration parsing code to new function Marcin Bernatowicz
2023-09-26 10:48   ` Tvrtko Ursulin
2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 06/14] benchmarks/gem_wsim: fix conflicting SSEU #define and enum Marcin Bernatowicz
2023-09-26 10:51   ` Tvrtko Ursulin
2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 07/14] benchmarks/gem_wsim: cleanups Marcin Bernatowicz
2023-09-26 11:08   ` Tvrtko Ursulin
2023-09-27 19:03     ` Bernatowicz, Marcin
2023-09-28  8:37       ` Bernatowicz, Marcin
2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 08/14] benchmarks/gem_wsim: reposition repeat_start variable Marcin Bernatowicz
2023-09-26 11:10   ` Tvrtko Ursulin
2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 09/14] benchmarks/gem_wsim: use lib code to query engines Marcin Bernatowicz
2023-09-26 11:23   ` Tvrtko Ursulin
2023-09-27 19:09     ` Bernatowicz, Marcin
2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 10/14] benchmarks/gem_wsim: allow comments in workload description files Marcin Bernatowicz
2023-09-26 11:33   ` Tvrtko Ursulin
2023-09-26 11:48     ` Bernatowicz, Marcin
2023-09-26 12:10       ` Tvrtko Ursulin
2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 11/14] benchmarks/gem_wsim: introduce w_step_sync function Marcin Bernatowicz
2023-09-26 11:37   ` Tvrtko Ursulin
2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 12/14] benchmarks/gem_wsim: extract prepare contexts code to new function Marcin Bernatowicz
2023-09-26 11:43   ` Tvrtko Ursulin
2023-09-26 11:58     ` Bernatowicz, Marcin
2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 13/14] benchmarks/gem_wsim: extract prepare working sets " Marcin Bernatowicz
2023-09-26 11:46   ` Tvrtko Ursulin
2023-09-26  8:44 ` [igt-dev] [PATCH i-g-t 14/14] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
2023-09-26 13:10   ` Tvrtko Ursulin
2023-09-26 18:52     ` Bernatowicz, Marcin
2023-09-27 13:17       ` Tvrtko Ursulin
2023-09-26 10:03 ` [igt-dev] ✓ CI.xeBAT: success for benchmarks/gem_wsim: added basic xe support (rev3) Patchwork
2023-09-26 10:11 ` [igt-dev] ✗ Fi.CI.BAT: failure " Patchwork
2023-09-26 11:56 ` [igt-dev] ✗ Fi.CI.BUILD: failure for benchmarks/gem_wsim: added basic xe support (rev4) Patchwork
