[igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem

Igt-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support
@ 2023-09-28 17:45 Marcin Bernatowicz
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 01/17] lib/igt_device_scan: Xe get integrated/discrete card functions Marcin Bernatowicz
                   ` (18 more replies)
  0 siblings, 19 replies; 40+ messages in thread
From: Marcin Bernatowicz @ 2023-09-28 17:45 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Added basic xe support. Single binary handles both i915 and Xe devices.

Some functionality is still missing: working sets, bonding.

The tool is handy for scheduling tests, we find it useful to verify vGPU
profiles defining different execution quantum/preemption timeout
settings.

There is also some rationale for the tool in following thread:
https://lore.kernel.org/dri-devel/a443495f-5d1b-52e1-9b2f-80167deb6d57@linux.intel.com/

With this patch it should be possible to run following on xe device:

gem_wsim -w benchmarks/wsim/media_load_balance_fhd26u7.wsim -c 36 -r 600

Best with drm debug logs disabled:

echo 0 > /sys/module/drm/parameters/debug

v2:
- splitted patches to easy review (Kamil), 
  all benchmarks/gem_wsim patches before [RFC] one
  contain fixes (for scale duration option), cleanups (checkpatch.pl),
  refactors (some code moved to functions),
  not related to xe and ready to be applied
- lib/xe_spin is under review in other thread https://patchwork.freedesktop.org/series/122624/

v3: 
- minimizing divergence - same workload syntax for both drivers,
  so most existing examples should run on xe unmodified (Tvrtko)
  This version creates one common VM per workload.
  Explicit VM management, compute mode, improved engine handling
  to come in next patchset.
- split patches to easy review (Tvrtko)
- dropped already merged patches, added documentation to public
  lib functions, some code cleanups (Kamil)

v4:
- addressed review comments in previous patches
- added patch with for_each_dep macro (Tvrtko)
- added patch with bb_size field in w_step
- grouped xe specific fields (Tvrtko)
- moved is_xe boolean next to fd (Tvrtko)
- use xe_ prefix for Xe specific things (Tvrtko)
- left throttling untouched (Tvrtko)
- parse errors vs silent skips on not implemented steps (Tvrtko)
- add 'Xe and i915 differences' section to README (Tvrtko)
  for now no data dependency implemented, left -1 <=> f-1
  to not modify examples (maybe too optimistic assumption?)
- need to think on better engine handling in next version

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>

Marcin Bernatowicz (17):
  lib/igt_device_scan: Xe get integrated/discrete card functions
  benchmarks/gem_wsim: reposition the unbound duration boolean
  benchmarks/gem_wsim: fix scaling of period steps
  benchmarks/gem_wsim: fix duration range check
  benchmarks/gem_wsim: extract duration parsing code to new function
  benchmarks/gem_wsim: fix conflicting SSEU #define and enum
  benchmarks/gem_wsim: cleanups
  benchmarks/gem_wsim: reposition repeat_start variable
  benchmarks/gem_wsim: use lib code to query engines
  benchmarks/gem_wsim: allow comments in workload description files
  benchmarks/gem_wsim: introduce w_step_sync function
  benchmarks/gem_wsim: extract allocate and prepare contexts code to new
    functions
  benchmarks/gem_wsim: extract prepare working sets code to new function
  benchmarks/gem_wsim: group i915 fields
  benchmarks/gem_wsim: introduce bb_size in w_step
  benchmarks/gem_wsim: for_each_dep macro
  benchmarks/gem_wsim: added basic xe support

 benchmarks/gem_wsim.c  | 1020 +++++++++++++++++++++++++++-------------
 benchmarks/wsim/README |   17 +-
 lib/igt_device_scan.c  |   52 +-
 lib/igt_device_scan.h  |    2 +
 4 files changed, 756 insertions(+), 335 deletions(-)

-- 
2.42.0

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [igt-dev] [PATCH i-g-t 01/17] lib/igt_device_scan: Xe get integrated/discrete card functions
  2023-09-28 17:45 [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
@ 2023-09-28 17:45 ` Marcin Bernatowicz
  2023-10-05 12:17   ` Kamil Konieczny
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 02/17] benchmarks/gem_wsim: reposition the unbound duration boolean Marcin Bernatowicz
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 40+ messages in thread
From: Marcin Bernatowicz @ 2023-09-28 17:45 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Xe functions to get integrated/discrete card.

v2:
- renamed __find_first_i915_card to __find_first_intel_card_by_driver_name (Zbyszek)
v3:
- added documentation to public functions (Kamil)

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 lib/igt_device_scan.c | 52 +++++++++++++++++++++++++++++++++++--------
 lib/igt_device_scan.h |  2 ++
 2 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/lib/igt_device_scan.c b/lib/igt_device_scan.c
index ae69ed09f..f4f95fef3 100644
--- a/lib/igt_device_scan.c
+++ b/lib/igt_device_scan.c
@@ -769,25 +769,27 @@ __copy_dev_to_card(struct igt_device *dev, struct igt_device_card *card)
  * Iterate over all igt_devices array and find first discrete/integrated card.
  * card->pci_slot_name will be updated only if a card is found.
  */
-static bool __find_first_i915_card(struct igt_device_card *card, bool discrete)
+static bool __find_first_intel_card_by_driver_name(struct igt_device_card *card,
+				bool want_discrete, const char *drv_name)
 {
 	struct igt_device *dev;
-	int cmp;
+	int is_integrated;
 
+	igt_assert(drv_name);
 	memset(card, 0, sizeof(*card));
 
 	igt_list_for_each_entry(dev, &igt_devs.all, link) {
 
-		if (!is_pci_subsystem(dev) || strcmp(dev->driver, "i915"))
+		if (!is_pci_subsystem(dev) || strcmp(dev->driver, drv_name))
 			continue;
 
-		cmp = strncmp(dev->pci_slot_name, INTEGRATED_I915_GPU_PCI_ID,
-			      PCI_SLOT_NAME_SIZE);
+		is_integrated = !strncmp(dev->pci_slot_name, INTEGRATED_I915_GPU_PCI_ID,
+					 PCI_SLOT_NAME_SIZE);
 
-		if (discrete && cmp) {
+		if (want_discrete && !is_integrated) {
 			__copy_dev_to_card(dev, card);
 			return true;
-		} else if (!discrete && !cmp) {
+		} else if (!want_discrete && is_integrated) {
 			__copy_dev_to_card(dev, card);
 			return true;
 		}
@@ -800,14 +802,46 @@ bool igt_device_find_first_i915_discrete_card(struct igt_device_card *card)
 {
 	igt_assert(card);
 
-	return __find_first_i915_card(card, true);
+	return __find_first_intel_card_by_driver_name(card, true, "i915");
+}
+
+/**
+ * igt_device_find_first_xe_discrete_card
+ * @card: pointer to igt_device_card structure
+ *
+ * Iterate over all igt_devices array and find first xe discrete card.
+ * card will be updated only if a device is found.
+ *
+ * Returns: true if device is found, false otherwise.
+ */
+bool igt_device_find_first_xe_discrete_card(struct igt_device_card *card)
+{
+	igt_assert(card);
+
+	return __find_first_intel_card_by_driver_name(card, true, "xe");
 }
 
 bool igt_device_find_integrated_card(struct igt_device_card *card)
 {
 	igt_assert(card);
 
-	return __find_first_i915_card(card, false);
+	return __find_first_intel_card_by_driver_name(card, false, "i915");
+}
+
+/**
+ * igt_device_find_xe_integrated_card
+ * @card: pointer to igt_device_card structure
+ *
+ * Iterate over all igt_devices array and find first xe integrated card.
+ * card will be updated only if a device is found.
+ *
+ * Returns: true if device is found, false otherwise.
+ */
+bool igt_device_find_xe_integrated_card(struct igt_device_card *card)
+{
+	igt_assert(card);
+
+	return __find_first_intel_card_by_driver_name(card, false, "xe");
 }
 
 static struct igt_device *igt_device_from_syspath(const char *syspath)
diff --git a/lib/igt_device_scan.h b/lib/igt_device_scan.h
index e6b0f1b90..b8f6a843d 100644
--- a/lib/igt_device_scan.h
+++ b/lib/igt_device_scan.h
@@ -87,6 +87,8 @@ bool igt_device_card_match_pci(const char *filter,
 	struct igt_device_card *card);
 bool igt_device_find_first_i915_discrete_card(struct igt_device_card *card);
 bool igt_device_find_integrated_card(struct igt_device_card *card);
+bool igt_device_find_first_xe_discrete_card(struct igt_device_card *card);
+bool igt_device_find_xe_integrated_card(struct igt_device_card *card);
 char *igt_device_get_pretty_name(struct igt_device_card *card, bool numeric);
 int igt_open_card(struct igt_device_card *card);
 int igt_open_render(struct igt_device_card *card);
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 01/17] lib/igt_device_scan: Xe get integrated/discrete card functions
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 01/17] lib/igt_device_scan: Xe get integrated/discrete card functions Marcin Bernatowicz
@ 2023-10-05 12:17   ` Kamil Konieczny
  0 siblings, 0 replies; 40+ messages in thread
From: Kamil Konieczny @ 2023-10-05 12:17 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Hi Marcin,
On 2023-09-28 at 17:45:16 +0000, Marcin Bernatowicz wrote:
> Xe functions to get integrated/discrete card.
--------------------------------^
s!/discrete! or first discrete!

With that
Reviewed-by: Kamil Konieczny <kamil.konieczny@linux.intel.com>

> 
> v2:
> - renamed __find_first_i915_card to __find_first_intel_card_by_driver_name (Zbyszek)
> v3:
> - added documentation to public functions (Kamil)
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>  lib/igt_device_scan.c | 52 +++++++++++++++++++++++++++++++++++--------
>  lib/igt_device_scan.h |  2 ++
>  2 files changed, 45 insertions(+), 9 deletions(-)
> 
> diff --git a/lib/igt_device_scan.c b/lib/igt_device_scan.c
> index ae69ed09f..f4f95fef3 100644
> --- a/lib/igt_device_scan.c
> +++ b/lib/igt_device_scan.c
> @@ -769,25 +769,27 @@ __copy_dev_to_card(struct igt_device *dev, struct igt_device_card *card)
>   * Iterate over all igt_devices array and find first discrete/integrated card.
>   * card->pci_slot_name will be updated only if a card is found.
>   */
> -static bool __find_first_i915_card(struct igt_device_card *card, bool discrete)
> +static bool __find_first_intel_card_by_driver_name(struct igt_device_card *card,
> +				bool want_discrete, const char *drv_name)
>  {
>  	struct igt_device *dev;
> -	int cmp;
> +	int is_integrated;
>  
> +	igt_assert(drv_name);
>  	memset(card, 0, sizeof(*card));
>  
>  	igt_list_for_each_entry(dev, &igt_devs.all, link) {
>  
> -		if (!is_pci_subsystem(dev) || strcmp(dev->driver, "i915"))
> +		if (!is_pci_subsystem(dev) || strcmp(dev->driver, drv_name))
>  			continue;
>  
> -		cmp = strncmp(dev->pci_slot_name, INTEGRATED_I915_GPU_PCI_ID,
> -			      PCI_SLOT_NAME_SIZE);
> +		is_integrated = !strncmp(dev->pci_slot_name, INTEGRATED_I915_GPU_PCI_ID,
> +					 PCI_SLOT_NAME_SIZE);
>  
> -		if (discrete && cmp) {
> +		if (want_discrete && !is_integrated) {
>  			__copy_dev_to_card(dev, card);
>  			return true;
> -		} else if (!discrete && !cmp) {
> +		} else if (!want_discrete && is_integrated) {
>  			__copy_dev_to_card(dev, card);
>  			return true;
>  		}
> @@ -800,14 +802,46 @@ bool igt_device_find_first_i915_discrete_card(struct igt_device_card *card)
>  {
>  	igt_assert(card);
>  
> -	return __find_first_i915_card(card, true);
> +	return __find_first_intel_card_by_driver_name(card, true, "i915");
> +}
> +
> +/**
> + * igt_device_find_first_xe_discrete_card
> + * @card: pointer to igt_device_card structure
> + *
> + * Iterate over all igt_devices array and find first xe discrete card.
> + * card will be updated only if a device is found.
> + *
> + * Returns: true if device is found, false otherwise.
> + */
> +bool igt_device_find_first_xe_discrete_card(struct igt_device_card *card)
> +{
> +	igt_assert(card);
> +
> +	return __find_first_intel_card_by_driver_name(card, true, "xe");
>  }
>  
>  bool igt_device_find_integrated_card(struct igt_device_card *card)
>  {
>  	igt_assert(card);
>  
> -	return __find_first_i915_card(card, false);
> +	return __find_first_intel_card_by_driver_name(card, false, "i915");
> +}
> +
> +/**
> + * igt_device_find_xe_integrated_card
> + * @card: pointer to igt_device_card structure
> + *
> + * Iterate over all igt_devices array and find first xe integrated card.
> + * card will be updated only if a device is found.
> + *
> + * Returns: true if device is found, false otherwise.
> + */
> +bool igt_device_find_xe_integrated_card(struct igt_device_card *card)
> +{
> +	igt_assert(card);
> +
> +	return __find_first_intel_card_by_driver_name(card, false, "xe");
>  }
>  
>  static struct igt_device *igt_device_from_syspath(const char *syspath)
> diff --git a/lib/igt_device_scan.h b/lib/igt_device_scan.h
> index e6b0f1b90..b8f6a843d 100644
> --- a/lib/igt_device_scan.h
> +++ b/lib/igt_device_scan.h
> @@ -87,6 +87,8 @@ bool igt_device_card_match_pci(const char *filter,
>  	struct igt_device_card *card);
>  bool igt_device_find_first_i915_discrete_card(struct igt_device_card *card);
>  bool igt_device_find_integrated_card(struct igt_device_card *card);
> +bool igt_device_find_first_xe_discrete_card(struct igt_device_card *card);
> +bool igt_device_find_xe_integrated_card(struct igt_device_card *card);
>  char *igt_device_get_pretty_name(struct igt_device_card *card, bool numeric);
>  int igt_open_card(struct igt_device_card *card);
>  int igt_open_render(struct igt_device_card *card);
> -- 
> 2.42.0
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [igt-dev] [PATCH i-g-t 02/17] benchmarks/gem_wsim: reposition the unbound duration boolean
  2023-09-28 17:45 [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 01/17] lib/igt_device_scan: Xe get integrated/discrete card functions Marcin Bernatowicz
@ 2023-09-28 17:45 ` Marcin Bernatowicz
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 03/17] benchmarks/gem_wsim: fix scaling of period steps Marcin Bernatowicz
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 40+ messages in thread
From: Marcin Bernatowicz @ 2023-09-28 17:45 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

All duration info is now in struct duration of w_step.

v2:
- rename unbound_duration to unbound (Tvrtko)

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 7b5e62a3b..42690d3d0 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -73,6 +73,7 @@ enum intel_engine_id {
 
 struct duration {
 	unsigned int min, max;
+	bool unbound;
 };
 
 enum w_type
@@ -145,7 +146,6 @@ struct w_step
 	unsigned int context;
 	unsigned int engine;
 	struct duration duration;
-	bool unbound_duration;
 	struct deps data_deps;
 	struct deps fence_deps;
 	int emit_fence;
@@ -1130,7 +1130,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 				check_arg(intel_gen(intel_get_drm_devid(fd)) < 8,
 					  "Infinite batch at step %u needs Gen8+!\n",
 					  nr_steps);
-				step.unbound_duration = true;
+				step.duration.unbound = true;
 			} else {
 				tmpl = strtol(field, &sep, 10);
 				check_arg(tmpl <= 0 || tmpl == LONG_MIN ||
@@ -2172,8 +2172,8 @@ update_bb_start(struct workload *wrk, struct w_step *w)
 
 	/* ticks is inverted for MI_DO_COMPARE (less-than comparison) */
 	ticks = 0;
-	if (!w->unbound_duration)
-		ticks = ~ns_to_ctx_ticks(1000 * get_duration(wrk, w));
+	if (!w->duration.unbound)
+		ticks = ~ns_to_ctx_ticks(1000LL * get_duration(wrk, w));
 
 	*w->bb_duration = ticks;
 }
@@ -2349,7 +2349,7 @@ static void *run_workload(void *data)
 
 				igt_assert(t_idx >= 0 && t_idx < i);
 				igt_assert(wrk->steps[t_idx].type == BATCH);
-				igt_assert(wrk->steps[t_idx].unbound_duration);
+				igt_assert(wrk->steps[t_idx].duration.unbound);
 
 				*wrk->steps[t_idx].bb_duration = 0xffffffff;
 				__sync_synchronize();
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [igt-dev] [PATCH i-g-t 03/17] benchmarks/gem_wsim: fix scaling of period steps
  2023-09-28 17:45 [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 01/17] lib/igt_device_scan: Xe get integrated/discrete card functions Marcin Bernatowicz
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 02/17] benchmarks/gem_wsim: reposition the unbound duration boolean Marcin Bernatowicz
@ 2023-09-28 17:45 ` Marcin Bernatowicz
  2023-09-29  8:01   ` Tvrtko Ursulin
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 04/17] benchmarks/gem_wsim: fix duration range check Marcin Bernatowicz
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 40+ messages in thread
From: Marcin Bernatowicz @ 2023-09-28 17:45 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Period steps take scale time (-F) command line option into account.
This allows to scale workload without need to modify .wsim file

ex. having following example.wsim

1.VCS1.3000.0.1
1.RCS.500-1000.-1.0
1.RCS.3700.0.0
1.RCS.1000.-2.0
1.VCS2.2300.-2.0
1.RCS.4700.-1.0
1.VCS2.600.-1.1
p.16000

we can scale the whole workload x10 with:

gem_wsim -w example.wsim -f 10 -F 10

-f is for batch duration steps, -F for period and delay steps

v2:
- apply same approach as with DELAY step (Tvrtko)

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 42690d3d0..41557517c 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -1186,6 +1186,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 add_step:
 		if (step.type == DELAY)
 			step.delay = __duration(step.delay, scale_time);
+		else if (step.type == PERIOD)
+			step.period = __duration(step.period, scale_time);
 
 		step.idx = nr_steps++;
 		step.request = -1;
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 03/17] benchmarks/gem_wsim: fix scaling of period steps
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 03/17] benchmarks/gem_wsim: fix scaling of period steps Marcin Bernatowicz
@ 2023-09-29  8:01   ` Tvrtko Ursulin
  2023-09-29  9:31     ` Bernatowicz, Marcin
  0 siblings, 1 reply; 40+ messages in thread
From: Tvrtko Ursulin @ 2023-09-29  8:01 UTC (permalink / raw)
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 28/09/2023 18:45, Marcin Bernatowicz wrote:
> Period steps take scale time (-F) command line option into account.

"Make period steps.."?

"Periods steps should take.."?

> This allows to scale workload without need to modify .wsim file
> 
> ex. having following example.wsim
> 
> 1.VCS1.3000.0.1
> 1.RCS.500-1000.-1.0
> 1.RCS.3700.0.0
> 1.RCS.1000.-2.0
> 1.VCS2.2300.-2.0
> 1.RCS.4700.-1.0
> 1.VCS2.600.-1.1
> p.16000
> 
> we can scale the whole workload x10 with:
> 
> gem_wsim -w example.wsim -f 10 -F 10
> 
> -f is for batch duration steps, -F for period and delay steps

Actually I am having a little bit of a second thought here. Thinking 
that perhaps it was deliberate to not scale periods.

Think of it like this. -f 0.5 simulates a twice as fast GPU. -F 2 
simulates a twice as slow CPU.

In both cases if something wants to hit 60 fps, it still wants to hit 60 
fps. What use case for scaling the period do you have in mind?

Regards,

Tvrtko

> v2:
> - apply same approach as with DELAY step (Tvrtko)
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 42690d3d0..41557517c 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -1186,6 +1186,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   add_step:
>   		if (step.type == DELAY)
>   			step.delay = __duration(step.delay, scale_time);
> +		else if (step.type == PERIOD)
> +			step.period = __duration(step.period, scale_time);
>   
>   		step.idx = nr_steps++;
>   		step.request = -1;

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 03/17] benchmarks/gem_wsim: fix scaling of period steps
  2023-09-29  8:01   ` Tvrtko Ursulin
@ 2023-09-29  9:31     ` Bernatowicz, Marcin
  2023-09-29 10:52       ` Tvrtko Ursulin
  0 siblings, 1 reply; 40+ messages in thread
From: Bernatowicz, Marcin @ 2023-09-29  9:31 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: chris.p.wilson

Hi,

On 9/29/2023 10:01 AM, Tvrtko Ursulin wrote:
> 
> On 28/09/2023 18:45, Marcin Bernatowicz wrote:
>> Period steps take scale time (-F) command line option into account.
> 
> "Make period steps.."?
> 
> "Periods steps should take.."?
> 
>> This allows to scale workload without need to modify .wsim file
>>
>> ex. having following example.wsim
>>
>> 1.VCS1.3000.0.1
>> 1.RCS.500-1000.-1.0
>> 1.RCS.3700.0.0
>> 1.RCS.1000.-2.0
>> 1.VCS2.2300.-2.0
>> 1.RCS.4700.-1.0
>> 1.VCS2.600.-1.1
>> p.16000
>>
>> we can scale the whole workload x10 with:
>>
>> gem_wsim -w example.wsim -f 10 -F 10
>>
>> -f is for batch duration steps, -F for period and delay steps
> 
> Actually I am having a little bit of a second thought here. Thinking 
> that perhaps it was deliberate to not scale periods.
> 
> Think of it like this. -f 0.5 simulates a twice as fast GPU. -F 2 
> simulates a twice as slow CPU.
> 
> In both cases if something wants to hit 60 fps, it still wants to hit 60 
> fps. What use case for scaling the period do you have in mind?
> 

That gives another view on the matter.

I thought about it more like having a common unit, so giving -F 1000 -> 
makes all CPU values (Period/Duration) are given in ms.
-f option may be used to calibrate for difference between GPU and CPU 
ex. if wrongly reported freq makes a real GPU duration 10x faster than 
CPU measured (ex. 10ms specified batch duration takes 1ms in reality) we 
can provide -f 10 and still have GPU durations correspond to CPU time.

Regards,
marcin


> Regards,
> 
> Tvrtko
> 
>> v2:
>> - apply same approach as with DELAY step (Tvrtko)
>>
>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>> ---
>>   benchmarks/gem_wsim.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>> index 42690d3d0..41557517c 100644
>> --- a/benchmarks/gem_wsim.c
>> +++ b/benchmarks/gem_wsim.c
>> @@ -1186,6 +1186,8 @@ parse_workload(struct w_arg *arg, unsigned int 
>> flags, double scale_dur,
>>   add_step:
>>           if (step.type == DELAY)
>>               step.delay = __duration(step.delay, scale_time);
>> +        else if (step.type == PERIOD)
>> +            step.period = __duration(step.period, scale_time);
>>           step.idx = nr_steps++;
>>           step.request = -1;

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 03/17] benchmarks/gem_wsim: fix scaling of period steps
  2023-09-29  9:31     ` Bernatowicz, Marcin
@ 2023-09-29 10:52       ` Tvrtko Ursulin
  2023-09-29 11:30         ` Bernatowicz, Marcin
  0 siblings, 1 reply; 40+ messages in thread
From: Tvrtko Ursulin @ 2023-09-29 10:52 UTC (permalink / raw)
  To: Bernatowicz, Marcin, igt-dev; +Cc: chris.p.wilson


On 29/09/2023 10:31, Bernatowicz, Marcin wrote:
> Hi,
> 
> On 9/29/2023 10:01 AM, Tvrtko Ursulin wrote:
>>
>> On 28/09/2023 18:45, Marcin Bernatowicz wrote:
>>> Period steps take scale time (-F) command line option into account.
>>
>> "Make period steps.."?
>>
>> "Periods steps should take.."?
>>
>>> This allows to scale workload without need to modify .wsim file
>>>
>>> ex. having following example.wsim
>>>
>>> 1.VCS1.3000.0.1
>>> 1.RCS.500-1000.-1.0
>>> 1.RCS.3700.0.0
>>> 1.RCS.1000.-2.0
>>> 1.VCS2.2300.-2.0
>>> 1.RCS.4700.-1.0
>>> 1.VCS2.600.-1.1
>>> p.16000
>>>
>>> we can scale the whole workload x10 with:
>>>
>>> gem_wsim -w example.wsim -f 10 -F 10
>>>
>>> -f is for batch duration steps, -F for period and delay steps
>>
>> Actually I am having a little bit of a second thought here. Thinking 
>> that perhaps it was deliberate to not scale periods.
>>
>> Think of it like this. -f 0.5 simulates a twice as fast GPU. -F 2 
>> simulates a twice as slow CPU.
>>
>> In both cases if something wants to hit 60 fps, it still wants to hit 
>> 60 fps. What use case for scaling the period do you have in mind?
>>
> 
> That gives another view on the matter.
> 
> I thought about it more like having a common unit, so giving -F 1000 -> 
> makes all CPU values (Period/Duration) are given in ms.

I lost you here. -F is also a scaling factor and not a time value.

> -f option may be used to calibrate for difference between GPU and CPU 
> ex. if wrongly reported freq makes a real GPU duration 10x faster than 
> CPU measured (ex. 10ms specified batch duration takes 1ms in reality) we 
> can provide -f 10 and still have GPU durations correspond to CPU time.

Hm but in either case nothing of this relates to framerate.

My current thinking is to drop this patch unless you can think of a good 
use case for scaling period. Or we need a new command option for scaling 
only periods.

Regards,

Tvrtko

> 
> Regards,
> marcin
> 
> 
>> Regards,
>>
>> Tvrtko
>>
>>> v2:
>>> - apply same approach as with DELAY step (Tvrtko)
>>>
>>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>>> ---
>>>   benchmarks/gem_wsim.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>>> index 42690d3d0..41557517c 100644
>>> --- a/benchmarks/gem_wsim.c
>>> +++ b/benchmarks/gem_wsim.c
>>> @@ -1186,6 +1186,8 @@ parse_workload(struct w_arg *arg, unsigned int 
>>> flags, double scale_dur,
>>>   add_step:
>>>           if (step.type == DELAY)
>>>               step.delay = __duration(step.delay, scale_time);
>>> +        else if (step.type == PERIOD)
>>> +            step.period = __duration(step.period, scale_time);
>>>           step.idx = nr_steps++;
>>>           step.request = -1;

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 03/17] benchmarks/gem_wsim: fix scaling of period steps
  2023-09-29 10:52       ` Tvrtko Ursulin
@ 2023-09-29 11:30         ` Bernatowicz, Marcin
  0 siblings, 0 replies; 40+ messages in thread
From: Bernatowicz, Marcin @ 2023-09-29 11:30 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: chris.p.wilson



On 9/29/2023 12:52 PM, Tvrtko Ursulin wrote:
> 
> On 29/09/2023 10:31, Bernatowicz, Marcin wrote:
>> Hi,
>>
>> On 9/29/2023 10:01 AM, Tvrtko Ursulin wrote:
>>>
>>> On 28/09/2023 18:45, Marcin Bernatowicz wrote:
>>>> Period steps take scale time (-F) command line option into account.
>>>
>>> "Make period steps.."?
>>>
>>> "Periods steps should take.."?
>>>
>>>> This allows to scale workload without need to modify .wsim file
>>>>
>>>> ex. having following example.wsim
>>>>
>>>> 1.VCS1.3000.0.1
>>>> 1.RCS.500-1000.-1.0
>>>> 1.RCS.3700.0.0
>>>> 1.RCS.1000.-2.0
>>>> 1.VCS2.2300.-2.0
>>>> 1.RCS.4700.-1.0
>>>> 1.VCS2.600.-1.1
>>>> p.16000
>>>>
>>>> we can scale the whole workload x10 with:
>>>>
>>>> gem_wsim -w example.wsim -f 10 -F 10
>>>>
>>>> -f is for batch duration steps, -F for period and delay steps
>>>
>>> Actually I am having a little bit of a second thought here. Thinking 
>>> that perhaps it was deliberate to not scale periods.
>>>
>>> Think of it like this. -f 0.5 simulates a twice as fast GPU. -F 2 
>>> simulates a twice as slow CPU.
>>>
>>> In both cases if something wants to hit 60 fps, it still wants to hit 
>>> 60 fps. What use case for scaling the period do you have in mind?
>>>
>>
>> That gives another view on the matter.
>>
>> I thought about it more like having a common unit, so giving -F 1000 
>> -> makes all CPU values (Period/Duration) are given in ms.
> 
> I lost you here. -F is also a scaling factor and not a time value.
> 
>> -f option may be used to calibrate for difference between GPU and CPU 
>> ex. if wrongly reported freq makes a real GPU duration 10x faster than 
>> CPU measured (ex. 10ms specified batch duration takes 1ms in reality) 
>> we can provide -f 10 and still have GPU durations correspond to CPU time.
> 
> Hm but in either case nothing of this relates to framerate.

true

> 
> My current thinking is to drop this patch unless you can think of a good 
> use case for scaling period. Or we need a new command option for scaling 
> only periods.

I don't see a use case for it, I will drop the patch.
--
marcin

> 
> Regards,
> 
> Tvrtko
> 
>>
>> Regards,
>> marcin
>>
>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> v2:
>>>> - apply same approach as with DELAY step (Tvrtko)
>>>>
>>>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>>>> ---
>>>>   benchmarks/gem_wsim.c | 2 ++
>>>>   1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>>>> index 42690d3d0..41557517c 100644
>>>> --- a/benchmarks/gem_wsim.c
>>>> +++ b/benchmarks/gem_wsim.c
>>>> @@ -1186,6 +1186,8 @@ parse_workload(struct w_arg *arg, unsigned int 
>>>> flags, double scale_dur,
>>>>   add_step:
>>>>           if (step.type == DELAY)
>>>>               step.delay = __duration(step.delay, scale_time);
>>>> +        else if (step.type == PERIOD)
>>>> +            step.period = __duration(step.period, scale_time);
>>>>           step.idx = nr_steps++;
>>>>           step.request = -1;

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [igt-dev] [PATCH i-g-t 04/17] benchmarks/gem_wsim: fix duration range check
  2023-09-28 17:45 [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (2 preceding siblings ...)
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 03/17] benchmarks/gem_wsim: fix scaling of period steps Marcin Bernatowicz
@ 2023-09-28 17:45 ` Marcin Bernatowicz
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 05/17] benchmarks/gem_wsim: extract duration parsing code to new function Marcin Bernatowicz
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 40+ messages in thread
From: Marcin Bernatowicz @ 2023-09-28 17:45 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

When scale duration (-f) command line option is provided,
the max duration check does not take it into account, fix it.

v2:
- improve error message (Tvrtko)

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 41557517c..3c3779b72 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -1142,10 +1142,10 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 				if (sep && *sep == '-') {
 					tmpl = strtol(sep + 1, NULL, 10);
 					check_arg(tmpl <= 0 ||
-						tmpl <= step.duration.min ||
+						__duration(tmpl, scale_dur) <= step.duration.min ||
 						tmpl == LONG_MIN ||
 						tmpl == LONG_MAX,
-						"Invalid duration range at step %u!\n",
+						"Invalid maximum duration at step %u!\n",
 						nr_steps);
 					step.duration.max = __duration(tmpl,
 								       scale_dur);
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [igt-dev] [PATCH i-g-t 05/17] benchmarks/gem_wsim: extract duration parsing code to new function
  2023-09-28 17:45 [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (3 preceding siblings ...)
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 04/17] benchmarks/gem_wsim: fix duration range check Marcin Bernatowicz
@ 2023-09-28 17:45 ` Marcin Bernatowicz
  2023-09-29  8:08   ` Tvrtko Ursulin
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 06/17] benchmarks/gem_wsim: fix conflicting SSEU #define and enum Marcin Bernatowicz
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 40+ messages in thread
From: Marcin Bernatowicz @ 2023-09-28 17:45 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Moved code from parse_workload to separate function.

v2:
- keep "field" parameter name (instead of "_desc") (Tvrtko)
- emit error messages from parse_duration, caller returns NULL
  on parse_duration failure  (Tvrtko)

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 70 ++++++++++++++++++++++++-------------------
 1 file changed, 40 insertions(+), 30 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 3c3779b72..71d1c55ae 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -860,6 +860,44 @@ static long __duration(long dur, double scale)
 	return round(scale * dur);
 }
 
+static int
+parse_duration(unsigned int nr_steps, struct duration *dur, double scale_dur, char *field)
+{
+	char *sep = NULL;
+	long tmpl;
+
+	if (field[0] == '*') {
+		if (intel_gen(intel_get_drm_devid(fd)) < 8) {
+			wsim_err("Infinite batch at step %u needs Gen8+!\n", nr_steps);
+			return -1;
+		}
+		dur->unbound = true;
+	} else {
+		tmpl = strtol(field, &sep, 10);
+		if (tmpl <= 0 || tmpl == LONG_MIN || tmpl == LONG_MAX) {
+			wsim_err("Invalid duration at step %u!\n", nr_steps);
+			return -1;
+		}
+
+		dur->min = __duration(tmpl, scale_dur);
+
+		if (sep && *sep == '-') {
+			tmpl = strtol(sep + 1, NULL, 10);
+			if (tmpl <= 0 || __duration(tmpl, scale_dur) <= dur->min ||
+			    tmpl == LONG_MIN || tmpl == LONG_MAX) {
+				wsim_err("Invalid maximum duration at step %u!\n", nr_steps);
+				return -1;
+			}
+
+			dur->max = __duration(tmpl, scale_dur);
+		} else {
+			dur->max = dur->min;
+		}
+	}
+
+	return 0;
+}
+
 #define int_field(_STEP_, _FIELD_, _COND_, _ERR_) \
 	if ((field = strtok_r(fstart, ".", &fctx))) { \
 		tmp = atoi(field); \
@@ -1121,38 +1159,10 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 		}
 
 		if ((field = strtok_r(fstart, ".", &fctx))) {
-			char *sep = NULL;
-			long int tmpl;
-
 			fstart = NULL;
 
-			if (field[0] == '*') {
-				check_arg(intel_gen(intel_get_drm_devid(fd)) < 8,
-					  "Infinite batch at step %u needs Gen8+!\n",
-					  nr_steps);
-				step.duration.unbound = true;
-			} else {
-				tmpl = strtol(field, &sep, 10);
-				check_arg(tmpl <= 0 || tmpl == LONG_MIN ||
-					  tmpl == LONG_MAX,
-					  "Invalid duration at step %u!\n",
-					  nr_steps);
-				step.duration.min = __duration(tmpl, scale_dur);
-
-				if (sep && *sep == '-') {
-					tmpl = strtol(sep + 1, NULL, 10);
-					check_arg(tmpl <= 0 ||
-						__duration(tmpl, scale_dur) <= step.duration.min ||
-						tmpl == LONG_MIN ||
-						tmpl == LONG_MAX,
-						"Invalid maximum duration at step %u!\n",
-						nr_steps);
-					step.duration.max = __duration(tmpl,
-								       scale_dur);
-				} else {
-					step.duration.max = step.duration.min;
-				}
-			}
+			if (parse_duration(nr_steps, &step.duration, scale_dur, field))
+				return NULL;
 
 			valid++;
 		}
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 05/17] benchmarks/gem_wsim: extract duration parsing code to new function
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 05/17] benchmarks/gem_wsim: extract duration parsing code to new function Marcin Bernatowicz
@ 2023-09-29  8:08   ` Tvrtko Ursulin
  0 siblings, 0 replies; 40+ messages in thread
From: Tvrtko Ursulin @ 2023-09-29  8:08 UTC (permalink / raw)
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 28/09/2023 18:45, Marcin Bernatowicz wrote:
> Moved code from parse_workload to separate function.
> 
> v2:
> - keep "field" parameter name (instead of "_desc") (Tvrtko)
> - emit error messages from parse_duration, caller returns NULL
>    on parse_duration failure  (Tvrtko)
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c | 70 ++++++++++++++++++++++++-------------------
>   1 file changed, 40 insertions(+), 30 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 3c3779b72..71d1c55ae 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -860,6 +860,44 @@ static long __duration(long dur, double scale)
>   	return round(scale * dur);
>   }
>   
> +static int
> +parse_duration(unsigned int nr_steps, struct duration *dur, double scale_dur, char *field)
> +{
> +	char *sep = NULL;
> +	long tmpl;
> +
> +	if (field[0] == '*') {
> +		if (intel_gen(intel_get_drm_devid(fd)) < 8) {
> +			wsim_err("Infinite batch at step %u needs Gen8+!\n", nr_steps);
> +			return -1;
> +		}
> +		dur->unbound = true;
> +	} else {
> +		tmpl = strtol(field, &sep, 10);
> +		if (tmpl <= 0 || tmpl == LONG_MIN || tmpl == LONG_MAX) {
> +			wsim_err("Invalid duration at step %u!\n", nr_steps);
> +			return -1;
> +		}
> +
> +		dur->min = __duration(tmpl, scale_dur);
> +
> +		if (sep && *sep == '-') {
> +			tmpl = strtol(sep + 1, NULL, 10);
> +			if (tmpl <= 0 || __duration(tmpl, scale_dur) <= dur->min ||
> +			    tmpl == LONG_MIN || tmpl == LONG_MAX) {
> +				wsim_err("Invalid maximum duration at step %u!\n", nr_steps);
> +				return -1;
> +			}
> +
> +			dur->max = __duration(tmpl, scale_dur);
> +		} else {
> +			dur->max = dur->min;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>   #define int_field(_STEP_, _FIELD_, _COND_, _ERR_) \
>   	if ((field = strtok_r(fstart, ".", &fctx))) { \
>   		tmp = atoi(field); \
> @@ -1121,38 +1159,10 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   		}
>   
>   		if ((field = strtok_r(fstart, ".", &fctx))) {
> -			char *sep = NULL;
> -			long int tmpl;
> -
>   			fstart = NULL;
>   
> -			if (field[0] == '*') {
> -				check_arg(intel_gen(intel_get_drm_devid(fd)) < 8,
> -					  "Infinite batch at step %u needs Gen8+!\n",
> -					  nr_steps);
> -				step.duration.unbound = true;
> -			} else {
> -				tmpl = strtol(field, &sep, 10);
> -				check_arg(tmpl <= 0 || tmpl == LONG_MIN ||
> -					  tmpl == LONG_MAX,
> -					  "Invalid duration at step %u!\n",
> -					  nr_steps);
> -				step.duration.min = __duration(tmpl, scale_dur);
> -
> -				if (sep && *sep == '-') {
> -					tmpl = strtol(sep + 1, NULL, 10);
> -					check_arg(tmpl <= 0 ||
> -						__duration(tmpl, scale_dur) <= step.duration.min ||
> -						tmpl == LONG_MIN ||
> -						tmpl == LONG_MAX,
> -						"Invalid maximum duration at step %u!\n",
> -						nr_steps);
> -					step.duration.max = __duration(tmpl,
> -								       scale_dur);
> -				} else {
> -					step.duration.max = step.duration.min;
> -				}
> -			}
> +			if (parse_duration(nr_steps, &step.duration, scale_dur, field))
> +				return NULL;
>   
>   			valid++;
>   		}

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [igt-dev] [PATCH i-g-t 06/17] benchmarks/gem_wsim: fix conflicting SSEU #define and enum
  2023-09-28 17:45 [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (4 preceding siblings ...)
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 05/17] benchmarks/gem_wsim: extract duration parsing code to new function Marcin Bernatowicz
@ 2023-09-28 17:45 ` Marcin Bernatowicz
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 07/17] benchmarks/gem_wsim: cleanups Marcin Bernatowicz
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 40+ messages in thread
From: Marcin Bernatowicz @ 2023-09-28 17:45 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

One SSEU is in enum w_step and then as #define SSEU (1 << 3).
Fix this.

v2:
- add FLAG_ prefix to all flags (Tvrtko)

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 71d1c55ae..3c25d801c 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -236,9 +236,9 @@ static struct drm_i915_gem_context_param_sseu device_sseu = {
 	.slice_mask = -1 /* Force read on first use. */
 };
 
-#define SYNCEDCLIENTS	(1<<1)
-#define DEPSYNC		(1<<2)
-#define SSEU		(1<<3)
+#define FLAG_SYNCEDCLIENTS	(1<<1)
+#define FLAG_DEPSYNC		(1<<2)
+#define FLAG_SSEU		(1<<3)
 
 static const char *ring_str_map[NUM_ENGINES] = {
 	[DEFAULT] = "DEFAULT",
@@ -1232,7 +1232,7 @@ add_step:
 	wrk->sseu = arg->sseu;
 	wrk->max_working_set_id = -1;
 	wrk->working_sets = NULL;
-	wrk->bo_prng = (flags & SYNCEDCLIENTS) ? master_prng : rand();
+	wrk->bo_prng = (flags & FLAG_SYNCEDCLIENTS) ? master_prng : rand();
 
 	free(desc);
 
@@ -1870,8 +1870,8 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
 	int i, j;
 
 	wrk->id = id;
-	wrk->bb_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
-	wrk->bo_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
+	wrk->bb_prng = (wrk->flags & FLAG_SYNCEDCLIENTS) ? master_prng : rand();
+	wrk->bo_prng = (wrk->flags & FLAG_SYNCEDCLIENTS) ? master_prng : rand();
 	wrk->run = true;
 
 	/*
@@ -2389,7 +2389,7 @@ static void *run_workload(void *data)
 
 			igt_assert(w->type == BATCH);
 
-			if (wrk->flags & DEPSYNC)
+			if (wrk->flags & FLAG_DEPSYNC)
 				sync_deps(wrk, w);
 
 			if (throttle > 0)
@@ -2596,7 +2596,7 @@ int main(int argc, char **argv)
 			/* Fall through */
 		case 'w':
 			w_args = add_workload_arg(w_args, ++nr_w_args, optarg,
-						  prio, flags & SSEU);
+						  prio, flags & FLAG_SSEU);
 			break;
 		case 'p':
 			prio = atoi(optarg);
@@ -2622,13 +2622,13 @@ int main(int argc, char **argv)
 			verbose++;
 			break;
 		case 'S':
-			flags |= SYNCEDCLIENTS;
+			flags |= FLAG_SYNCEDCLIENTS;
 			break;
 		case 's':
-			flags ^= SSEU;
+			flags ^= FLAG_SSEU;
 			break;
 		case 'd':
-			flags |= DEPSYNC;
+			flags |= FLAG_DEPSYNC;
 			break;
 		case 'I':
 			master_prng = strtol(optarg, NULL, 0);
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [igt-dev] [PATCH i-g-t 07/17] benchmarks/gem_wsim: cleanups
  2023-09-28 17:45 [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (5 preceding siblings ...)
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 06/17] benchmarks/gem_wsim: fix conflicting SSEU #define and enum Marcin Bernatowicz
@ 2023-09-28 17:45 ` Marcin Bernatowicz
  2023-09-29  8:09   ` Tvrtko Ursulin
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 08/17] benchmarks/gem_wsim: reposition repeat_start variable Marcin Bernatowicz
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 40+ messages in thread
From: Marcin Bernatowicz @ 2023-09-28 17:45 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=y, Size: 5860 bytes --]

Cleaning checkpatch.pl reported warnings/errors.
Removed unused fence_signal field from struct w_step.

v2:
- restored unnecessarily changed malloc (Tvrtko)

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 54 ++++++++++++++++++++++++++-----------------
 1 file changed, 33 insertions(+), 21 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 3c25d801c..f91e126f1 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -1,3 +1,4 @@
+// SPDX-License-Identifier: MIT
 /*
  * Copyright © 2017 Intel Corporation
  *
@@ -76,8 +77,7 @@ struct duration {
 	bool unbound;
 };
 
-enum w_type
-{
+enum w_type {
 	BATCH,
 	SYNC,
 	DELAY,
@@ -102,8 +102,7 @@ struct dep_entry {
 	int working_set; /* -1 = step dependecy, >= 0 working set id */
 };
 
-struct deps
-{
+struct deps {
 	int nr;
 	bool submit_fence;
 	struct dep_entry *list;
@@ -137,8 +136,7 @@ struct working_set {
 
 struct workload;
 
-struct w_step
-{
+struct w_step {
 	struct workload *wrk;
 
 	/* Workload step metadata */
@@ -155,7 +153,6 @@ struct w_step
 		int period;
 		int target;
 		int throttle;
-		int fence_signal;
 		int priority;
 		struct {
 			unsigned int engine_map_count;
@@ -194,8 +191,7 @@ struct ctx {
 	uint64_t sseu;
 };
 
-struct workload
-{
+struct workload {
 	unsigned int id;
 
 	unsigned int nr_steps;
@@ -807,6 +803,7 @@ static int add_buffers(struct working_set *set, char *str)
 
 	for (i = 0; i < add; i++) {
 		struct work_buffer_size *sz = &sizes[set->nr + i];
+
 		sz->min = min_sz;
 		sz->max = max_sz;
 		sz->size = 0;
@@ -899,13 +896,16 @@ parse_duration(unsigned int nr_steps, struct duration *dur, double scale_dur, ch
 }
 
 #define int_field(_STEP_, _FIELD_, _COND_, _ERR_) \
-	if ((field = strtok_r(fstart, ".", &fctx))) { \
-		tmp = atoi(field); \
-		check_arg(_COND_, _ERR_, nr_steps); \
-		step.type = _STEP_; \
-		step._FIELD_ = tmp; \
-		goto add_step; \
-	} \
+	do { \
+		field = strtok_r(fstart, ".", &fctx); \
+		if (field) { \
+			tmp = atoi(field); \
+			check_arg(_COND_, _ERR_, nr_steps); \
+			step.type = _STEP_; \
+			step._FIELD_ = tmp; \
+			goto add_step; \
+		} \
+	} while (0)
 
 static struct workload *
 parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
@@ -930,7 +930,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 		valid = 0;
 		memset(&step, 0, sizeof(step));
 
-		if ((field = strtok_r(fstart, ".", &fctx))) {
+		field = strtok_r(fstart, ".", &fctx);
+		if (field) {
 			fstart = NULL;
 
 			if (!strcmp(field, "d")) {
@@ -941,6 +942,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 					  "Invalid period at step %u!\n");
 			} else if (!strcmp(field, "P")) {
 				unsigned int nr = 0;
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(nr == 0 && tmp <= 0,
@@ -966,6 +968,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 					  "Invalid sync target at step %u!\n");
 			} else if (!strcmp(field, "S")) {
 				unsigned int nr = 0;
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(tmp <= 0 && nr == 0,
@@ -1002,6 +1005,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 				goto add_step;
 			} else if (!strcmp(field, "M")) {
 				unsigned int nr = 0;
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(nr == 0 && tmp <= 0,
@@ -1032,6 +1036,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 					  "Invalid terminate target at step %u!\n");
 			} else if (!strcmp(field, "X")) {
 				unsigned int nr = 0;
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(nr == 0 && tmp <= 0,
@@ -1056,6 +1061,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 				goto add_step;
 			} else if (!strcmp(field, "B")) {
 				unsigned int nr = 0;
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(nr == 0 && tmp <= 0,
@@ -1075,6 +1081,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 				goto add_step;
 			} else if (!strcmp(field, "b")) {
 				unsigned int nr = 0;
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					check_arg(nr > 2,
 						  "Invalid bond format at step %u!\n",
@@ -1146,7 +1153,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			valid++;
 		}
 
-		if ((field = strtok_r(fstart, ".", &fctx))) {
+		field = strtok_r(fstart, ".", &fctx);
+		if (field) {
 			fstart = NULL;
 
 			i = str_to_engine(field);
@@ -1158,7 +1166,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			step.engine = i;
 		}
 
-		if ((field = strtok_r(fstart, ".", &fctx))) {
+		field = strtok_r(fstart, ".", &fctx);
+		if (field) {
 			fstart = NULL;
 
 			if (parse_duration(nr_steps, &step.duration, scale_dur, field))
@@ -1167,7 +1176,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			valid++;
 		}
 
-		if ((field = strtok_r(fstart, ".", &fctx))) {
+		field = strtok_r(fstart, ".", &fctx);
+		if (field) {
 			fstart = NULL;
 
 			tmp = parse_dependencies(nr_steps, &step, field);
@@ -1177,7 +1187,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			valid++;
 		}
 
-		if ((field = strtok_r(fstart, ".", &fctx))) {
+		field = strtok_r(fstart, ".", &fctx);
+		if (field) {
 			fstart = NULL;
 
 			check_arg(strlen(field) != 1 ||
@@ -2716,6 +2727,7 @@ int main(int argc, char **argv)
 
 	if (append_workload_arg) {
 		struct w_arg arg = { NULL, append_workload_arg, 0 };
+
 		app_w = parse_workload(&arg, flags, scale_dur, scale_time,
 				       NULL);
 		if (!app_w) {
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 07/17] benchmarks/gem_wsim: cleanups
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 07/17] benchmarks/gem_wsim: cleanups Marcin Bernatowicz
@ 2023-09-29  8:09   ` Tvrtko Ursulin
  0 siblings, 0 replies; 40+ messages in thread
From: Tvrtko Ursulin @ 2023-09-29  8:09 UTC (permalink / raw)
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 28/09/2023 18:45, Marcin Bernatowicz wrote:
> Cleaning checkpatch.pl reported warnings/errors.
> Removed unused fence_signal field from struct w_step.
> 
> v2:
> - restored unnecessarily changed malloc (Tvrtko)
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c | 54 ++++++++++++++++++++++++++-----------------
>   1 file changed, 33 insertions(+), 21 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 3c25d801c..f91e126f1 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -1,3 +1,4 @@
> +// SPDX-License-Identifier: MIT
>   /*
>    * Copyright © 2017 Intel Corporation
>    *
> @@ -76,8 +77,7 @@ struct duration {
>   	bool unbound;
>   };
>   
> -enum w_type
> -{
> +enum w_type {
>   	BATCH,
>   	SYNC,
>   	DELAY,
> @@ -102,8 +102,7 @@ struct dep_entry {
>   	int working_set; /* -1 = step dependecy, >= 0 working set id */
>   };
>   
> -struct deps
> -{
> +struct deps {
>   	int nr;
>   	bool submit_fence;
>   	struct dep_entry *list;
> @@ -137,8 +136,7 @@ struct working_set {
>   
>   struct workload;
>   
> -struct w_step
> -{
> +struct w_step {
>   	struct workload *wrk;
>   
>   	/* Workload step metadata */
> @@ -155,7 +153,6 @@ struct w_step
>   		int period;
>   		int target;
>   		int throttle;
> -		int fence_signal;
>   		int priority;
>   		struct {
>   			unsigned int engine_map_count;
> @@ -194,8 +191,7 @@ struct ctx {
>   	uint64_t sseu;
>   };
>   
> -struct workload
> -{
> +struct workload {
>   	unsigned int id;
>   
>   	unsigned int nr_steps;
> @@ -807,6 +803,7 @@ static int add_buffers(struct working_set *set, char *str)
>   
>   	for (i = 0; i < add; i++) {
>   		struct work_buffer_size *sz = &sizes[set->nr + i];
> +
>   		sz->min = min_sz;
>   		sz->max = max_sz;
>   		sz->size = 0;
> @@ -899,13 +896,16 @@ parse_duration(unsigned int nr_steps, struct duration *dur, double scale_dur, ch
>   }
>   
>   #define int_field(_STEP_, _FIELD_, _COND_, _ERR_) \
> -	if ((field = strtok_r(fstart, ".", &fctx))) { \
> -		tmp = atoi(field); \
> -		check_arg(_COND_, _ERR_, nr_steps); \
> -		step.type = _STEP_; \
> -		step._FIELD_ = tmp; \
> -		goto add_step; \
> -	} \
> +	do { \
> +		field = strtok_r(fstart, ".", &fctx); \
> +		if (field) { \
> +			tmp = atoi(field); \
> +			check_arg(_COND_, _ERR_, nr_steps); \
> +			step.type = _STEP_; \
> +			step._FIELD_ = tmp; \
> +			goto add_step; \
> +		} \
> +	} while (0)
>   
>   static struct workload *
>   parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
> @@ -930,7 +930,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   		valid = 0;
>   		memset(&step, 0, sizeof(step));
>   
> -		if ((field = strtok_r(fstart, ".", &fctx))) {
> +		field = strtok_r(fstart, ".", &fctx);
> +		if (field) {
>   			fstart = NULL;
>   
>   			if (!strcmp(field, "d")) {
> @@ -941,6 +942,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   					  "Invalid period at step %u!\n");
>   			} else if (!strcmp(field, "P")) {
>   				unsigned int nr = 0;
> +
>   				while ((field = strtok_r(fstart, ".", &fctx))) {
>   					tmp = atoi(field);
>   					check_arg(nr == 0 && tmp <= 0,
> @@ -966,6 +968,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   					  "Invalid sync target at step %u!\n");
>   			} else if (!strcmp(field, "S")) {
>   				unsigned int nr = 0;
> +
>   				while ((field = strtok_r(fstart, ".", &fctx))) {
>   					tmp = atoi(field);
>   					check_arg(tmp <= 0 && nr == 0,
> @@ -1002,6 +1005,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   				goto add_step;
>   			} else if (!strcmp(field, "M")) {
>   				unsigned int nr = 0;
> +
>   				while ((field = strtok_r(fstart, ".", &fctx))) {
>   					tmp = atoi(field);
>   					check_arg(nr == 0 && tmp <= 0,
> @@ -1032,6 +1036,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   					  "Invalid terminate target at step %u!\n");
>   			} else if (!strcmp(field, "X")) {
>   				unsigned int nr = 0;
> +
>   				while ((field = strtok_r(fstart, ".", &fctx))) {
>   					tmp = atoi(field);
>   					check_arg(nr == 0 && tmp <= 0,
> @@ -1056,6 +1061,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   				goto add_step;
>   			} else if (!strcmp(field, "B")) {
>   				unsigned int nr = 0;
> +
>   				while ((field = strtok_r(fstart, ".", &fctx))) {
>   					tmp = atoi(field);
>   					check_arg(nr == 0 && tmp <= 0,
> @@ -1075,6 +1081,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   				goto add_step;
>   			} else if (!strcmp(field, "b")) {
>   				unsigned int nr = 0;
> +
>   				while ((field = strtok_r(fstart, ".", &fctx))) {
>   					check_arg(nr > 2,
>   						  "Invalid bond format at step %u!\n",
> @@ -1146,7 +1153,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   			valid++;
>   		}
>   
> -		if ((field = strtok_r(fstart, ".", &fctx))) {
> +		field = strtok_r(fstart, ".", &fctx);
> +		if (field) {
>   			fstart = NULL;
>   
>   			i = str_to_engine(field);
> @@ -1158,7 +1166,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   			step.engine = i;
>   		}
>   
> -		if ((field = strtok_r(fstart, ".", &fctx))) {
> +		field = strtok_r(fstart, ".", &fctx);
> +		if (field) {
>   			fstart = NULL;
>   
>   			if (parse_duration(nr_steps, &step.duration, scale_dur, field))
> @@ -1167,7 +1176,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   			valid++;
>   		}
>   
> -		if ((field = strtok_r(fstart, ".", &fctx))) {
> +		field = strtok_r(fstart, ".", &fctx);
> +		if (field) {
>   			fstart = NULL;
>   
>   			tmp = parse_dependencies(nr_steps, &step, field);
> @@ -1177,7 +1187,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   			valid++;
>   		}
>   
> -		if ((field = strtok_r(fstart, ".", &fctx))) {
> +		field = strtok_r(fstart, ".", &fctx);
> +		if (field) {
>   			fstart = NULL;
>   
>   			check_arg(strlen(field) != 1 ||
> @@ -2716,6 +2727,7 @@ int main(int argc, char **argv)
>   
>   	if (append_workload_arg) {
>   		struct w_arg arg = { NULL, append_workload_arg, 0 };
> +
>   		app_w = parse_workload(&arg, flags, scale_dur, scale_time,
>   				       NULL);
>   		if (!app_w) {

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [igt-dev] [PATCH i-g-t 08/17] benchmarks/gem_wsim: reposition repeat_start variable
  2023-09-28 17:45 [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (6 preceding siblings ...)
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 07/17] benchmarks/gem_wsim: cleanups Marcin Bernatowicz
@ 2023-09-28 17:45 ` Marcin Bernatowicz
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 09/17] benchmarks/gem_wsim: use lib code to query engines Marcin Bernatowicz
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 40+ messages in thread
From: Marcin Bernatowicz @ 2023-09-28 17:45 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

No need for repeat_start in struct workload.
It's now a variable in run_workload function scope.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index f91e126f1..10ad6cf75 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -209,8 +209,6 @@ struct workload {
 	uint32_t bb_prng;
 	uint32_t bo_prng;
 
-	struct timespec repeat_start;
-
 	unsigned int nr_ctxs;
 	struct ctx *ctx_list;
 
@@ -2282,7 +2280,7 @@ static void sync_deps(struct workload *wrk, struct w_step *w)
 static void *run_workload(void *data)
 {
 	struct workload *wrk = (struct workload *)data;
-	struct timespec t_start, t_end;
+	struct timespec t_start, t_end, repeat_start;
 	struct w_step *w;
 	int throttle = -1;
 	int qd_throttle = -1;
@@ -2296,7 +2294,7 @@ static void *run_workload(void *data)
 	     count++) {
 		unsigned int cur_seqno = wrk->sync_seqno;
 
-		clock_gettime(CLOCK_MONOTONIC, &wrk->repeat_start);
+		clock_gettime(CLOCK_MONOTONIC, &repeat_start);
 
 		for (i = 0, w = wrk->steps; wrk->run && (i < wrk->nr_steps);
 		     i++, w++) {
@@ -2310,7 +2308,7 @@ static void *run_workload(void *data)
 				int elapsed;
 
 				clock_gettime(CLOCK_MONOTONIC, &now);
-				elapsed = elapsed_us(&wrk->repeat_start, &now);
+				elapsed = elapsed_us(&repeat_start, &now);
 				do_sleep = w->period - elapsed;
 				time_tot += elapsed;
 				if (elapsed < time_min)
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [igt-dev] [PATCH i-g-t 09/17] benchmarks/gem_wsim: use lib code to query engines
  2023-09-28 17:45 [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (7 preceding siblings ...)
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 08/17] benchmarks/gem_wsim: reposition repeat_start variable Marcin Bernatowicz
@ 2023-09-28 17:45 ` Marcin Bernatowicz
  2023-09-29  8:11   ` Tvrtko Ursulin
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 10/17] benchmarks/gem_wsim: allow comments in workload description files Marcin Bernatowicz
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 40+ messages in thread
From: Marcin Bernatowicz @ 2023-09-28 17:45 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Use code in lib/i915/gem_engine_topology to query engines.

v2:
- keep i unsigned, restore igt_assert(count)
  in num_engines_in_class (Tvrtko)

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 153 +++++-------------------------------------
 1 file changed, 17 insertions(+), 136 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 10ad6cf75..8cd68058d 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -456,146 +456,27 @@ static int str_to_engine(const char *str)
 	return -1;
 }
 
-static bool __engines_queried;
-static unsigned int __num_engines;
-static struct i915_engine_class_instance *__engines;
-
-static int
-__i915_query(int i915, struct drm_i915_query *q)
-{
-	if (igt_ioctl(i915, DRM_IOCTL_I915_QUERY, q))
-		return -errno;
-	return 0;
-}
-
-static int
-__i915_query_items(int i915, struct drm_i915_query_item *items, uint32_t n_items)
-{
-	struct drm_i915_query q = {
-		.num_items = n_items,
-		.items_ptr = to_user_pointer(items),
-	};
-	return __i915_query(i915, &q);
-}
-
-static void
-i915_query_items(int i915, struct drm_i915_query_item *items, uint32_t n_items)
-{
-	igt_assert_eq(__i915_query_items(i915, items, n_items), 0);
-}
-
-static bool has_engine_query(int i915)
-{
-	struct drm_i915_query_item item = {
-		.query_id = DRM_I915_QUERY_ENGINE_INFO,
-	};
-
-	return __i915_query_items(i915, &item, 1) == 0 && item.length > 0;
-}
-
-static void query_engines(void)
+static struct intel_engine_data *query_engines(void)
 {
-	struct i915_engine_class_instance *engines;
-	unsigned int num;
-
-	if (__engines_queried)
-		return;
-
-	__engines_queried = true;
-
-	if (!has_engine_query(fd)) {
-		unsigned int num_bsd = gem_has_bsd(fd) + gem_has_bsd2(fd);
-		unsigned int i = 0;
-
-		igt_assert(num_bsd);
-
-		num = 1 + num_bsd;
+	static struct intel_engine_data engines = {};
 
-		if (gem_has_blt(fd))
-			num++;
+	if (engines.nengines)
+		return &engines;
 
-		if (gem_has_vebox(fd))
-			num++;
-
-		engines = calloc(num,
-				 sizeof(struct i915_engine_class_instance));
-		igt_assert(engines);
-
-		engines[i].engine_class = I915_ENGINE_CLASS_RENDER;
-		engines[i].engine_instance = 0;
-		i++;
-
-		if (gem_has_blt(fd)) {
-			engines[i].engine_class = I915_ENGINE_CLASS_COPY;
-			engines[i].engine_instance = 0;
-			i++;
-		}
-
-		if (gem_has_bsd(fd)) {
-			engines[i].engine_class = I915_ENGINE_CLASS_VIDEO;
-			engines[i].engine_instance = 0;
-			i++;
-		}
-
-		if (gem_has_bsd2(fd)) {
-			engines[i].engine_class = I915_ENGINE_CLASS_VIDEO;
-			engines[i].engine_instance = 1;
-			i++;
-		}
-
-		if (gem_has_vebox(fd)) {
-			engines[i].engine_class =
-				I915_ENGINE_CLASS_VIDEO_ENHANCE;
-			engines[i].engine_instance = 0;
-			i++;
-		}
-	} else {
-		struct drm_i915_query_engine_info *engine_info;
-		struct drm_i915_query_item item = {
-			.query_id = DRM_I915_QUERY_ENGINE_INFO,
-		};
-		const unsigned int sz = 4096;
-		unsigned int i;
-
-		engine_info = malloc(sz);
-		igt_assert(engine_info);
-		memset(engine_info, 0, sz);
-
-		item.data_ptr = to_user_pointer(engine_info);
-		item.length = sz;
-
-		i915_query_items(fd, &item, 1);
-		igt_assert(item.length > 0);
-		igt_assert(item.length <= sz);
-
-		num = engine_info->num_engines;
-
-		engines = calloc(num,
-				 sizeof(struct i915_engine_class_instance));
-		igt_assert(engines);
-
-		for (i = 0; i < num; i++) {
-			struct drm_i915_engine_info *engine =
-				(struct drm_i915_engine_info *)&engine_info->engines[i];
-
-			engines[i] = engine->engine;
-		}
-	}
-
-	__engines = engines;
-	__num_engines = num;
+	engines = intel_engine_list_of_physical(fd);
+	igt_assert(engines.nengines);
+	return &engines;
 }
 
 static unsigned int num_engines_in_class(enum intel_engine_id class)
 {
+	const struct intel_engine_data *engines = query_engines();
 	unsigned int i, count = 0;
 
 	igt_assert(class == VCS);
 
-	query_engines();
-
-	for (i = 0; i < __num_engines; i++) {
-		if (__engines[i].engine_class == I915_ENGINE_CLASS_VIDEO)
+	for (i = 0; i < engines->nengines; i++) {
+		if (engines->engines[i].class == I915_ENGINE_CLASS_VIDEO)
 			count++;
 	}
 
@@ -607,16 +488,15 @@ static void
 fill_engines_id_class(enum intel_engine_id *list,
 		      enum intel_engine_id class)
 {
+	const struct intel_engine_data *engines = query_engines();
 	enum intel_engine_id engine = VCS1;
 	unsigned int i, j = 0;
 
 	igt_assert(class == VCS);
 	igt_assert(num_engines_in_class(VCS) <= 2);
 
-	query_engines();
-
-	for (i = 0; i < __num_engines; i++) {
-		if (__engines[i].engine_class != I915_ENGINE_CLASS_VIDEO)
+	for (i = 0; i < engines->nengines; i++) {
+		if (engines->engines[i].class != I915_ENGINE_CLASS_VIDEO)
 			continue;
 
 		list[j++] = engine++;
@@ -626,17 +506,18 @@ fill_engines_id_class(enum intel_engine_id *list,
 static unsigned int
 find_physical_instance(enum intel_engine_id class, unsigned int logical)
 {
+	const struct intel_engine_data *engines = query_engines();
 	unsigned int i, j = 0;
 
 	igt_assert(class == VCS);
 
-	for (i = 0; i < __num_engines; i++) {
-		if (__engines[i].engine_class != I915_ENGINE_CLASS_VIDEO)
+	for (i = 0; i < engines->nengines; i++) {
+		if (engines->engines[i].class != I915_ENGINE_CLASS_VIDEO)
 			continue;
 
 		/* Map logical to physical instances. */
 		if (logical == j++)
-			return __engines[i].engine_instance;
+			return engines->engines[i].instance;
 	}
 
 	igt_assert(0);
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 09/17] benchmarks/gem_wsim: use lib code to query engines
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 09/17] benchmarks/gem_wsim: use lib code to query engines Marcin Bernatowicz
@ 2023-09-29  8:11   ` Tvrtko Ursulin
  2023-09-29 10:35     ` Bernatowicz, Marcin
  0 siblings, 1 reply; 40+ messages in thread
From: Tvrtko Ursulin @ 2023-09-29  8:11 UTC (permalink / raw)
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 28/09/2023 18:45, Marcin Bernatowicz wrote:
> Use code in lib/i915/gem_engine_topology to query engines.
> 
> v2:
> - keep i unsigned, restore igt_assert(count)
>    in num_engines_in_class (Tvrtko)
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c | 153 +++++-------------------------------------
>   1 file changed, 17 insertions(+), 136 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 10ad6cf75..8cd68058d 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -456,146 +456,27 @@ static int str_to_engine(const char *str)
>   	return -1;
>   }
>   
> -static bool __engines_queried;
> -static unsigned int __num_engines;
> -static struct i915_engine_class_instance *__engines;
> -
> -static int
> -__i915_query(int i915, struct drm_i915_query *q)
> -{
> -	if (igt_ioctl(i915, DRM_IOCTL_I915_QUERY, q))
> -		return -errno;
> -	return 0;
> -}
> -
> -static int
> -__i915_query_items(int i915, struct drm_i915_query_item *items, uint32_t n_items)
> -{
> -	struct drm_i915_query q = {
> -		.num_items = n_items,
> -		.items_ptr = to_user_pointer(items),
> -	};
> -	return __i915_query(i915, &q);
> -}
> -
> -static void
> -i915_query_items(int i915, struct drm_i915_query_item *items, uint32_t n_items)
> -{
> -	igt_assert_eq(__i915_query_items(i915, items, n_items), 0);
> -}
> -
> -static bool has_engine_query(int i915)
> -{
> -	struct drm_i915_query_item item = {
> -		.query_id = DRM_I915_QUERY_ENGINE_INFO,
> -	};
> -
> -	return __i915_query_items(i915, &item, 1) == 0 && item.length > 0;
> -}
> -
> -static void query_engines(void)
> +static struct intel_engine_data *query_engines(void)
>   {
> -	struct i915_engine_class_instance *engines;
> -	unsigned int num;
> -
> -	if (__engines_queried)
> -		return;
> -
> -	__engines_queried = true;
> -
> -	if (!has_engine_query(fd)) {
> -		unsigned int num_bsd = gem_has_bsd(fd) + gem_has_bsd2(fd);
> -		unsigned int i = 0;
> -
> -		igt_assert(num_bsd);
> -
> -		num = 1 + num_bsd;
> +	static struct intel_engine_data engines = {};
>   
> -		if (gem_has_blt(fd))
> -			num++;
> +	if (engines.nengines)
> +		return &engines;
>   
> -		if (gem_has_vebox(fd))
> -			num++;
> -
> -		engines = calloc(num,
> -				 sizeof(struct i915_engine_class_instance));
> -		igt_assert(engines);
> -
> -		engines[i].engine_class = I915_ENGINE_CLASS_RENDER;
> -		engines[i].engine_instance = 0;
> -		i++;
> -
> -		if (gem_has_blt(fd)) {
> -			engines[i].engine_class = I915_ENGINE_CLASS_COPY;
> -			engines[i].engine_instance = 0;
> -			i++;
> -		}
> -
> -		if (gem_has_bsd(fd)) {
> -			engines[i].engine_class = I915_ENGINE_CLASS_VIDEO;
> -			engines[i].engine_instance = 0;
> -			i++;
> -		}
> -
> -		if (gem_has_bsd2(fd)) {
> -			engines[i].engine_class = I915_ENGINE_CLASS_VIDEO;
> -			engines[i].engine_instance = 1;
> -			i++;
> -		}
> -
> -		if (gem_has_vebox(fd)) {
> -			engines[i].engine_class =
> -				I915_ENGINE_CLASS_VIDEO_ENHANCE;
> -			engines[i].engine_instance = 0;
> -			i++;
> -		}
> -	} else {
> -		struct drm_i915_query_engine_info *engine_info;
> -		struct drm_i915_query_item item = {
> -			.query_id = DRM_I915_QUERY_ENGINE_INFO,
> -		};
> -		const unsigned int sz = 4096;
> -		unsigned int i;
> -
> -		engine_info = malloc(sz);
> -		igt_assert(engine_info);
> -		memset(engine_info, 0, sz);
> -
> -		item.data_ptr = to_user_pointer(engine_info);
> -		item.length = sz;
> -
> -		i915_query_items(fd, &item, 1);
> -		igt_assert(item.length > 0);
> -		igt_assert(item.length <= sz);
> -
> -		num = engine_info->num_engines;
> -
> -		engines = calloc(num,
> -				 sizeof(struct i915_engine_class_instance));
> -		igt_assert(engines);
> -
> -		for (i = 0; i < num; i++) {
> -			struct drm_i915_engine_info *engine =
> -				(struct drm_i915_engine_info *)&engine_info->engines[i];
> -
> -			engines[i] = engine->engine;
> -		}
> -	}
> -
> -	__engines = engines;
> -	__num_engines = num;
> +	engines = intel_engine_list_of_physical(fd);
> +	igt_assert(engines.nengines);
> +	return &engines;
>   }
>   
>   static unsigned int num_engines_in_class(enum intel_engine_id class)
>   {
> +	const struct intel_engine_data *engines = query_engines();
>   	unsigned int i, count = 0;
>   
>   	igt_assert(class == VCS);
>   
> -	query_engines();
> -
> -	for (i = 0; i < __num_engines; i++) {
> -		if (__engines[i].engine_class == I915_ENGINE_CLASS_VIDEO)
> +	for (i = 0; i < engines->nengines; i++) {
> +		if (engines->engines[i].class == I915_ENGINE_CLASS_VIDEO)
>   			count++;
>   	}
>   
> @@ -607,16 +488,15 @@ static void
>   fill_engines_id_class(enum intel_engine_id *list,
>   		      enum intel_engine_id class)
>   {
> +	const struct intel_engine_data *engines = query_engines();
>   	enum intel_engine_id engine = VCS1;
>   	unsigned int i, j = 0;
>   
>   	igt_assert(class == VCS);
>   	igt_assert(num_engines_in_class(VCS) <= 2);
>   
> -	query_engines();
> -
> -	for (i = 0; i < __num_engines; i++) {
> -		if (__engines[i].engine_class != I915_ENGINE_CLASS_VIDEO)
> +	for (i = 0; i < engines->nengines; i++) {
> +		if (engines->engines[i].class != I915_ENGINE_CLASS_VIDEO)
>   			continue;
>   
>   		list[j++] = engine++;
> @@ -626,17 +506,18 @@ fill_engines_id_class(enum intel_engine_id *list,
>   static unsigned int
>   find_physical_instance(enum intel_engine_id class, unsigned int logical)
>   {
> +	const struct intel_engine_data *engines = query_engines();
>   	unsigned int i, j = 0;
>   
>   	igt_assert(class == VCS);
>   
> -	for (i = 0; i < __num_engines; i++) {
> -		if (__engines[i].engine_class != I915_ENGINE_CLASS_VIDEO)
> +	for (i = 0; i < engines->nengines; i++) {
> +		if (engines->engines[i].class != I915_ENGINE_CLASS_VIDEO)
>   			continue;
>   
>   		/* Map logical to physical instances. */
>   		if (logical == j++)
> -			return __engines[i].engine_instance;
> +			return engines->engines[i].instance;
>   	}
>   
>   	igt_assert(0);

Looks fine - I couldn't spot a problem when I looked at the details a 
little bit more during the last round of review. I trust you are smoke 
testing against both drivers at least a little bit.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 09/17] benchmarks/gem_wsim: use lib code to query engines
  2023-09-29  8:11   ` Tvrtko Ursulin
@ 2023-09-29 10:35     ` Bernatowicz, Marcin
  0 siblings, 0 replies; 40+ messages in thread
From: Bernatowicz, Marcin @ 2023-09-29 10:35 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: chris.p.wilson



On 9/29/2023 10:11 AM, Tvrtko Ursulin wrote:
> 
> On 28/09/2023 18:45, Marcin Bernatowicz wrote:
>> Use code in lib/i915/gem_engine_topology to query engines.
>>
>> v2:
>> - keep i unsigned, restore igt_assert(count)
>>    in num_engines_in_class (Tvrtko)
>>
>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>> ---
>>   benchmarks/gem_wsim.c | 153 +++++-------------------------------------
>>   1 file changed, 17 insertions(+), 136 deletions(-)
>>
>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>> index 10ad6cf75..8cd68058d 100644
>> --- a/benchmarks/gem_wsim.c
>> +++ b/benchmarks/gem_wsim.c
>> @@ -456,146 +456,27 @@ static int str_to_engine(const char *str)
>>       return -1;
>>   }
>> -static bool __engines_queried;
>> -static unsigned int __num_engines;
>> -static struct i915_engine_class_instance *__engines;
>> -
>> -static int
>> -__i915_query(int i915, struct drm_i915_query *q)
>> -{
>> -    if (igt_ioctl(i915, DRM_IOCTL_I915_QUERY, q))
>> -        return -errno;
>> -    return 0;
>> -}
>> -
>> -static int
>> -__i915_query_items(int i915, struct drm_i915_query_item *items, 
>> uint32_t n_items)
>> -{
>> -    struct drm_i915_query q = {
>> -        .num_items = n_items,
>> -        .items_ptr = to_user_pointer(items),
>> -    };
>> -    return __i915_query(i915, &q);
>> -}
>> -
>> -static void
>> -i915_query_items(int i915, struct drm_i915_query_item *items, 
>> uint32_t n_items)
>> -{
>> -    igt_assert_eq(__i915_query_items(i915, items, n_items), 0);
>> -}
>> -
>> -static bool has_engine_query(int i915)
>> -{
>> -    struct drm_i915_query_item item = {
>> -        .query_id = DRM_I915_QUERY_ENGINE_INFO,
>> -    };
>> -
>> -    return __i915_query_items(i915, &item, 1) == 0 && item.length > 0;
>> -}
>> -
>> -static void query_engines(void)
>> +static struct intel_engine_data *query_engines(void)
>>   {
>> -    struct i915_engine_class_instance *engines;
>> -    unsigned int num;
>> -
>> -    if (__engines_queried)
>> -        return;
>> -
>> -    __engines_queried = true;
>> -
>> -    if (!has_engine_query(fd)) {
>> -        unsigned int num_bsd = gem_has_bsd(fd) + gem_has_bsd2(fd);
>> -        unsigned int i = 0;
>> -
>> -        igt_assert(num_bsd);
>> -
>> -        num = 1 + num_bsd;
>> +    static struct intel_engine_data engines = {};
>> -        if (gem_has_blt(fd))
>> -            num++;
>> +    if (engines.nengines)
>> +        return &engines;
>> -        if (gem_has_vebox(fd))
>> -            num++;
>> -
>> -        engines = calloc(num,
>> -                 sizeof(struct i915_engine_class_instance));
>> -        igt_assert(engines);
>> -
>> -        engines[i].engine_class = I915_ENGINE_CLASS_RENDER;
>> -        engines[i].engine_instance = 0;
>> -        i++;
>> -
>> -        if (gem_has_blt(fd)) {
>> -            engines[i].engine_class = I915_ENGINE_CLASS_COPY;
>> -            engines[i].engine_instance = 0;
>> -            i++;
>> -        }
>> -
>> -        if (gem_has_bsd(fd)) {
>> -            engines[i].engine_class = I915_ENGINE_CLASS_VIDEO;
>> -            engines[i].engine_instance = 0;
>> -            i++;
>> -        }
>> -
>> -        if (gem_has_bsd2(fd)) {
>> -            engines[i].engine_class = I915_ENGINE_CLASS_VIDEO;
>> -            engines[i].engine_instance = 1;
>> -            i++;
>> -        }
>> -
>> -        if (gem_has_vebox(fd)) {
>> -            engines[i].engine_class =
>> -                I915_ENGINE_CLASS_VIDEO_ENHANCE;
>> -            engines[i].engine_instance = 0;
>> -            i++;
>> -        }
>> -    } else {
>> -        struct drm_i915_query_engine_info *engine_info;
>> -        struct drm_i915_query_item item = {
>> -            .query_id = DRM_I915_QUERY_ENGINE_INFO,
>> -        };
>> -        const unsigned int sz = 4096;
>> -        unsigned int i;
>> -
>> -        engine_info = malloc(sz);
>> -        igt_assert(engine_info);
>> -        memset(engine_info, 0, sz);
>> -
>> -        item.data_ptr = to_user_pointer(engine_info);
>> -        item.length = sz;
>> -
>> -        i915_query_items(fd, &item, 1);
>> -        igt_assert(item.length > 0);
>> -        igt_assert(item.length <= sz);
>> -
>> -        num = engine_info->num_engines;
>> -
>> -        engines = calloc(num,
>> -                 sizeof(struct i915_engine_class_instance));
>> -        igt_assert(engines);
>> -
>> -        for (i = 0; i < num; i++) {
>> -            struct drm_i915_engine_info *engine =
>> -                (struct drm_i915_engine_info *)&engine_info->engines[i];
>> -
>> -            engines[i] = engine->engine;
>> -        }
>> -    }
>> -
>> -    __engines = engines;
>> -    __num_engines = num;
>> +    engines = intel_engine_list_of_physical(fd);
>> +    igt_assert(engines.nengines);
>> +    return &engines;
>>   }
>>   static unsigned int num_engines_in_class(enum intel_engine_id class)
>>   {
>> +    const struct intel_engine_data *engines = query_engines();
>>       unsigned int i, count = 0;
>>       igt_assert(class == VCS);
>> -    query_engines();
>> -
>> -    for (i = 0; i < __num_engines; i++) {
>> -        if (__engines[i].engine_class == I915_ENGINE_CLASS_VIDEO)
>> +    for (i = 0; i < engines->nengines; i++) {
>> +        if (engines->engines[i].class == I915_ENGINE_CLASS_VIDEO)
>>               count++;
>>       }
>> @@ -607,16 +488,15 @@ static void
>>   fill_engines_id_class(enum intel_engine_id *list,
>>                 enum intel_engine_id class)
>>   {
>> +    const struct intel_engine_data *engines = query_engines();
>>       enum intel_engine_id engine = VCS1;
>>       unsigned int i, j = 0;
>>       igt_assert(class == VCS);
>>       igt_assert(num_engines_in_class(VCS) <= 2);
>> -    query_engines();
>> -
>> -    for (i = 0; i < __num_engines; i++) {
>> -        if (__engines[i].engine_class != I915_ENGINE_CLASS_VIDEO)
>> +    for (i = 0; i < engines->nengines; i++) {
>> +        if (engines->engines[i].class != I915_ENGINE_CLASS_VIDEO)
>>               continue;
>>           list[j++] = engine++;
>> @@ -626,17 +506,18 @@ fill_engines_id_class(enum intel_engine_id *list,
>>   static unsigned int
>>   find_physical_instance(enum intel_engine_id class, unsigned int 
>> logical)
>>   {
>> +    const struct intel_engine_data *engines = query_engines();
>>       unsigned int i, j = 0;
>>       igt_assert(class == VCS);
>> -    for (i = 0; i < __num_engines; i++) {
>> -        if (__engines[i].engine_class != I915_ENGINE_CLASS_VIDEO)
>> +    for (i = 0; i < engines->nengines; i++) {
>> +        if (engines->engines[i].class != I915_ENGINE_CLASS_VIDEO)
>>               continue;
>>           /* Map logical to physical instances. */
>>           if (logical == j++)
>> -            return __engines[i].engine_instance;
>> +            return engines->engines[i].instance;
>>       }
>>       igt_assert(0);
> 
> Looks fine - I couldn't spot a problem when I looked at the details a 
> little bit more during the last round of review. I trust you are smoke 
> testing against both drivers at least a little bit.

Yes, I'm checking few workloads: media_nn_1080p_s2.wsim, 
media_load_balance_fhd26u7.wsim and some artificial test_syntax.wsim
on both drivers.
--
marcin
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Regards,
> 
> Tvrtko

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [igt-dev] [PATCH i-g-t 10/17] benchmarks/gem_wsim: allow comments in workload description files
  2023-09-28 17:45 [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (8 preceding siblings ...)
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 09/17] benchmarks/gem_wsim: use lib code to query engines Marcin Bernatowicz
@ 2023-09-28 17:45 ` Marcin Bernatowicz
  2023-09-29  8:28   ` Tvrtko Ursulin
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 11/17] benchmarks/gem_wsim: introduce w_step_sync function Marcin Bernatowicz
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 40+ messages in thread
From: Marcin Bernatowicz @ 2023-09-28 17:45 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Lines starting with '#' are skipped.
If command line step separator (',') is encountered after '#'
it is replaced with ';' to not break parsing.

v2:
- SKIP step type is not needed (Tvrtko)
v3:
- correct README comment (Tvrtko)
- removed hunk for trailing comments after BATCH step,
  as some other steps do not support it either (Tvrtko)

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c  | 16 ++++++++++++++++
 benchmarks/wsim/README |  2 ++
 2 files changed, 18 insertions(+)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 8cd68058d..87c68316d 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -43,6 +43,7 @@
 #include <limits.h>
 #include <pthread.h>
 #include <math.h>
+#include <ctype.h>
 
 #include "drm.h"
 #include "drmtest.h"
@@ -813,6 +814,14 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 		if (field) {
 			fstart = NULL;
 
+			/* line starting with # is a comment */
+			if (field[0] == '#') {
+				if (verbose > 3)
+					printf("skipped line: %s\n", _token);
+				free(token);
+				continue;
+			}
+
 			if (!strcmp(field, "d")) {
 				int_field(DELAY, delay, tmp <= 0,
 					  "Invalid delay at step %u!\n");
@@ -2421,6 +2430,13 @@ static char *load_workload_descriptor(char *filename)
 	close(infd);
 
 	for (i = 0; i < len; i++) {
+		/* '#' starts comment till end of line */
+		if (buf[i] == '#')
+			/* replace ',' in comments to not break parsing */
+			while (++i < len && buf[i] != '\n')
+				if (buf[i] == ',')
+					buf[i] = ';';
+
 		if (buf[i] == '\n')
 			buf[i] = ',';
 	}
diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
index 8c71f2fe6..d4f3dd8ae 100644
--- a/benchmarks/wsim/README
+++ b/benchmarks/wsim/README
@@ -1,6 +1,8 @@
 Workload descriptor format
 ==========================
 
+Lines starting with '#' are treated as comments and will not create a work step.
+
 ctx.engine.duration_us.dependency.wait,...
 <uint>.<str>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
 B.<uint>
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 10/17] benchmarks/gem_wsim: allow comments in workload description files
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 10/17] benchmarks/gem_wsim: allow comments in workload description files Marcin Bernatowicz
@ 2023-09-29  8:28   ` Tvrtko Ursulin
  0 siblings, 0 replies; 40+ messages in thread
From: Tvrtko Ursulin @ 2023-09-29  8:28 UTC (permalink / raw)
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 28/09/2023 18:45, Marcin Bernatowicz wrote:
> Lines starting with '#' are skipped.
> If command line step separator (',') is encountered after '#'
> it is replaced with ';' to not break parsing.
> 
> v2:
> - SKIP step type is not needed (Tvrtko)
> v3:
> - correct README comment (Tvrtko)
> - removed hunk for trailing comments after BATCH step,
>    as some other steps do not support it either (Tvrtko)
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c  | 16 ++++++++++++++++
>   benchmarks/wsim/README |  2 ++
>   2 files changed, 18 insertions(+)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 8cd68058d..87c68316d 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -43,6 +43,7 @@
>   #include <limits.h>
>   #include <pthread.h>
>   #include <math.h>
> +#include <ctype.h>
>   
>   #include "drm.h"
>   #include "drmtest.h"
> @@ -813,6 +814,14 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   		if (field) {
>   			fstart = NULL;
>   
> +			/* line starting with # is a comment */
> +			if (field[0] == '#') {
> +				if (verbose > 3)
> +					printf("skipped line: %s\n", _token);
> +				free(token);
> +				continue;
> +			}
> +
>   			if (!strcmp(field, "d")) {
>   				int_field(DELAY, delay, tmp <= 0,
>   					  "Invalid delay at step %u!\n");
> @@ -2421,6 +2430,13 @@ static char *load_workload_descriptor(char *filename)
>   	close(infd);
>   
>   	for (i = 0; i < len; i++) {
> +		/* '#' starts comment till end of line */
> +		if (buf[i] == '#')
> +			/* replace ',' in comments to not break parsing */
> +			while (++i < len && buf[i] != '\n')
> +				if (buf[i] == ',')
> +					buf[i] = ';';
> +
>   		if (buf[i] == '\n')
>   			buf[i] = ',';

Is it a out of bounds access here if ++i < len was hit above? Would 
happen if file ends with a hash, I think.

Maybe:

if (buf[i] == '#') {
	in_comment = true;
} else if (buf[i] == '\n')
	buf[i] = ',';
	in_comment = false;
} else if (in_comment && buf[i] == ',')
	buf[i] = ';';
}

But please triple check. :)

Regards,

Tvrtko

>   	}
> diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
> index 8c71f2fe6..d4f3dd8ae 100644
> --- a/benchmarks/wsim/README
> +++ b/benchmarks/wsim/README
> @@ -1,6 +1,8 @@
>   Workload descriptor format
>   ==========================
>   
> +Lines starting with '#' are treated as comments and will not create a work step.
> +
>   ctx.engine.duration_us.dependency.wait,...
>   <uint>.<str>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
>   B.<uint>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [igt-dev] [PATCH i-g-t 11/17] benchmarks/gem_wsim: introduce w_step_sync function
  2023-09-28 17:45 [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (9 preceding siblings ...)
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 10/17] benchmarks/gem_wsim: allow comments in workload description files Marcin Bernatowicz
@ 2023-09-28 17:45 ` Marcin Bernatowicz
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 12/17] benchmarks/gem_wsim: extract allocate and prepare contexts code to new functions Marcin Bernatowicz
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 40+ messages in thread
From: Marcin Bernatowicz @ 2023-09-28 17:45 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Added w_step_sync function for workload step synchronization.
Change will allow cleaner xe integration.

v2:
- correct indentation (Tvrtko)

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 87c68316d..7495ab297 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -245,6 +245,11 @@ static const char *ring_str_map[NUM_ENGINES] = {
 	[VECS] = "VECS",
 };
 
+static void w_step_sync(struct w_step *w)
+{
+	gem_sync(fd, w->obj[0].handle);
+}
+
 static int read_timestamp_frequency(int i915)
 {
 	int value = 0;
@@ -2104,7 +2109,7 @@ static void w_sync_to(struct workload *wrk, struct w_step *w, int target)
 	igt_assert(target < wrk->nr_steps);
 	igt_assert(wrk->steps[target].type == BATCH);
 
-	gem_sync(fd, wrk->steps[target].obj[0].handle);
+	w_step_sync(&wrk->steps[target]);
 }
 
 static void
@@ -2163,7 +2168,7 @@ static void sync_deps(struct workload *wrk, struct w_step *w)
 		igt_assert(dep_idx >= 0 && dep_idx < w->idx);
 		igt_assert(wrk->steps[dep_idx].type == BATCH);
 
-		gem_sync(fd, wrk->steps[dep_idx].obj[0].handle);
+		w_step_sync(&wrk->steps[dep_idx]);
 	}
 }
 
@@ -2217,7 +2222,7 @@ static void *run_workload(void *data)
 
 				igt_assert(s_idx >= 0 && s_idx < i);
 				igt_assert(wrk->steps[s_idx].type == BATCH);
-				gem_sync(fd, wrk->steps[s_idx].obj[0].handle);
+				w_step_sync(&wrk->steps[s_idx]);
 				continue;
 			} else if (w->type == THROTTLE) {
 				throttle = w->throttle;
@@ -2308,7 +2313,7 @@ static void *run_workload(void *data)
 				break;
 
 			if (w->sync)
-				gem_sync(fd, w->obj[0].handle);
+				w_step_sync(w);
 
 			if (qd_throttle > 0) {
 				while (wrk->nrequest[engine] > qd_throttle) {
@@ -2317,7 +2322,7 @@ static void *run_workload(void *data)
 					s = igt_list_first_entry(&wrk->requests[engine],
 								 s, rq_link);
 
-					gem_sync(fd, s->obj[0].handle);
+					w_step_sync(s);
 
 					s->request = -1;
 					igt_list_del(&s->rq_link);
@@ -2349,7 +2354,7 @@ static void *run_workload(void *data)
 			continue;
 
 		w = igt_list_last_entry(&wrk->requests[i], w, rq_link);
-		gem_sync(fd, w->obj[0].handle);
+		w_step_sync(w);
 	}
 
 	clock_gettime(CLOCK_MONOTONIC, &t_end);
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [igt-dev] [PATCH i-g-t 12/17] benchmarks/gem_wsim: extract allocate and prepare contexts code to new functions
  2023-09-28 17:45 [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (10 preceding siblings ...)
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 11/17] benchmarks/gem_wsim: introduce w_step_sync function Marcin Bernatowicz
@ 2023-09-28 17:45 ` Marcin Bernatowicz
  2023-09-29  9:26   ` Tvrtko Ursulin
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 13/17] benchmarks/gem_wsim: extract prepare working sets code to new function Marcin Bernatowicz
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 40+ messages in thread
From: Marcin Bernatowicz @ 2023-09-28 17:45 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

No functional changes.
Extracted allocate_contexts and prepare_contexts functions
from prepare_workload.

v2:
- propagate error code from prepare_contexts (Tvrtko)
- don't mix unrelated changes (Tvrtko)

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 43 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 7495ab297..a344fea9e 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -1764,19 +1764,11 @@ static void measure_active_set(struct workload *wrk)
 
 #define alloca0(sz) ({ size_t sz__ = (sz); memset(alloca(sz__), 0, sz__); })
 
-static int prepare_workload(unsigned int id, struct workload *wrk)
+static void allocate_contexts(unsigned int id, struct workload *wrk)
 {
-	struct working_set **sets;
-	unsigned long total = 0;
-	uint32_t share_vm = 0;
 	int max_ctx = -1;
 	struct w_step *w;
-	int i, j;
-
-	wrk->id = id;
-	wrk->bb_prng = (wrk->flags & FLAG_SYNCEDCLIENTS) ? master_prng : rand();
-	wrk->bo_prng = (wrk->flags & FLAG_SYNCEDCLIENTS) ? master_prng : rand();
-	wrk->run = true;
+	int i;
 
 	/*
 	 * Pre-scan workload steps to allocate context list storage.
@@ -1800,6 +1792,13 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
 
 		max_ctx = ctx;
 	}
+}
+
+static int prepare_contexts(unsigned int id, struct workload *wrk)
+{
+	uint32_t share_vm = 0;
+	struct w_step *w;
+	int i, j;
 
 	/*
 	 * Transfer over engine map configuration from the workload step.
@@ -1966,6 +1965,28 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
 	if (share_vm)
 		vm_destroy(fd, share_vm);
 
+	return 0;
+}
+
+static int prepare_workload(unsigned int id, struct workload *wrk)
+{
+	struct working_set **sets;
+	unsigned long total = 0;
+	struct w_step *w;
+	int i, j;
+	int ret = 0;
+
+	wrk->id = id;
+	wrk->bb_prng = (wrk->flags & FLAG_SYNCEDCLIENTS) ? master_prng : rand();
+	wrk->bo_prng = (wrk->flags & FLAG_SYNCEDCLIENTS) ? master_prng : rand();
+	wrk->run = true;
+
+	allocate_contexts(id, wrk);
+
+	ret = prepare_contexts(id, wrk);
+	if (ret)
+		return ret;
+
 	/* Record default preemption. */
 	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
 		if (w->type == BATCH)
@@ -2067,7 +2088,7 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
 
 	measure_active_set(wrk);
 
-	return 0;
+	return ret;
 }
 
 static double elapsed(const struct timespec *start, const struct timespec *end)
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 12/17] benchmarks/gem_wsim: extract allocate and prepare contexts code to new functions
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 12/17] benchmarks/gem_wsim: extract allocate and prepare contexts code to new functions Marcin Bernatowicz
@ 2023-09-29  9:26   ` Tvrtko Ursulin
  0 siblings, 0 replies; 40+ messages in thread
From: Tvrtko Ursulin @ 2023-09-29  9:26 UTC (permalink / raw)
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 28/09/2023 18:45, Marcin Bernatowicz wrote:
> No functional changes.
> Extracted allocate_contexts and prepare_contexts functions
> from prepare_workload.
> 
> v2:
> - propagate error code from prepare_contexts (Tvrtko)
> - don't mix unrelated changes (Tvrtko)
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c | 43 ++++++++++++++++++++++++++++++++-----------
>   1 file changed, 32 insertions(+), 11 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 7495ab297..a344fea9e 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -1764,19 +1764,11 @@ static void measure_active_set(struct workload *wrk)
>   
>   #define alloca0(sz) ({ size_t sz__ = (sz); memset(alloca(sz__), 0, sz__); })
>   
> -static int prepare_workload(unsigned int id, struct workload *wrk)
> +static void allocate_contexts(unsigned int id, struct workload *wrk)
>   {
> -	struct working_set **sets;
> -	unsigned long total = 0;
> -	uint32_t share_vm = 0;
>   	int max_ctx = -1;
>   	struct w_step *w;
> -	int i, j;
> -
> -	wrk->id = id;
> -	wrk->bb_prng = (wrk->flags & FLAG_SYNCEDCLIENTS) ? master_prng : rand();
> -	wrk->bo_prng = (wrk->flags & FLAG_SYNCEDCLIENTS) ? master_prng : rand();
> -	wrk->run = true;
> +	int i;
>   
>   	/*
>   	 * Pre-scan workload steps to allocate context list storage.
> @@ -1800,6 +1792,13 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
>   
>   		max_ctx = ctx;
>   	}
> +}
> +
> +static int prepare_contexts(unsigned int id, struct workload *wrk)
> +{
> +	uint32_t share_vm = 0;
> +	struct w_step *w;
> +	int i, j;
>   
>   	/*
>   	 * Transfer over engine map configuration from the workload step.
> @@ -1966,6 +1965,28 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
>   	if (share_vm)
>   		vm_destroy(fd, share_vm);
>   
> +	return 0;
> +}
> +
> +static int prepare_workload(unsigned int id, struct workload *wrk)
> +{
> +	struct working_set **sets;
> +	unsigned long total = 0;
> +	struct w_step *w;
> +	int i, j;
> +	int ret = 0;
> +
> +	wrk->id = id;
> +	wrk->bb_prng = (wrk->flags & FLAG_SYNCEDCLIENTS) ? master_prng : rand();
> +	wrk->bo_prng = (wrk->flags & FLAG_SYNCEDCLIENTS) ? master_prng : rand();
> +	wrk->run = true;
> +
> +	allocate_contexts(id, wrk);
> +
> +	ret = prepare_contexts(id, wrk);
> +	if (ret)
> +		return ret;
> +
>   	/* Record default preemption. */
>   	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
>   		if (w->type == BATCH)
> @@ -2067,7 +2088,7 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
>   
>   	measure_active_set(wrk);
>   
> -	return 0;
> +	return ret;
>   }
>   
>   static double elapsed(const struct timespec *start, const struct timespec *end)

Looks fine.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [igt-dev] [PATCH i-g-t 13/17] benchmarks/gem_wsim: extract prepare working sets code to new function
  2023-09-28 17:45 [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (11 preceding siblings ...)
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 12/17] benchmarks/gem_wsim: extract allocate and prepare contexts code to new functions Marcin Bernatowicz
@ 2023-09-28 17:45 ` Marcin Bernatowicz
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 14/17] benchmarks/gem_wsim: group i915 fields Marcin Bernatowicz
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 40+ messages in thread
From: Marcin Bernatowicz @ 2023-09-28 17:45 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

No functional changes.
Extracted prepare_working_sets function from prepare_workload.

v2:
- return void (Tvrtko)

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 104 +++++++++++++++++++++++-------------------
 1 file changed, 56 insertions(+), 48 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index a344fea9e..e875ddaa7 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -1968,10 +1968,64 @@ static int prepare_contexts(unsigned int id, struct workload *wrk)
 	return 0;
 }
 
-static int prepare_workload(unsigned int id, struct workload *wrk)
+static void prepare_working_sets(unsigned int id, struct workload *wrk)
 {
 	struct working_set **sets;
 	unsigned long total = 0;
+	struct w_step *w;
+	int i;
+
+	/*
+	 * Allocate working sets.
+	 */
+	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+		if (w->type == WORKINGSET && !w->working_set.shared)
+			total += allocate_working_set(wrk, &w->working_set);
+	}
+
+	if (verbose > 2)
+		printf("%u: %lu bytes in working sets.\n", wrk->id, total);
+
+	/*
+	 * Map of working set ids.
+	 */
+	wrk->max_working_set_id = -1;
+	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+		if (w->type == WORKINGSET &&
+		    w->working_set.id > wrk->max_working_set_id)
+			wrk->max_working_set_id = w->working_set.id;
+	}
+
+	sets = wrk->working_sets;
+	wrk->working_sets = calloc(wrk->max_working_set_id + 1,
+				   sizeof(*wrk->working_sets));
+	igt_assert(wrk->working_sets);
+
+	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+		struct working_set *set;
+
+		if (w->type != WORKINGSET)
+			continue;
+
+		if (!w->working_set.shared) {
+			set = &w->working_set;
+		} else {
+			igt_assert(sets);
+
+			set = sets[w->working_set.id];
+			igt_assert(set->shared);
+			igt_assert(set->sizes);
+		}
+
+		wrk->working_sets[w->working_set.id] = set;
+	}
+
+	if (sets)
+		free(sets);
+}
+
+static int prepare_workload(unsigned int id, struct workload *wrk)
+{
 	struct w_step *w;
 	int i, j;
 	int ret = 0;
@@ -2028,53 +2082,7 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
 		}
 	}
 
-	/*
-	 * Allocate working sets.
-	 */
-	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
-		if (w->type == WORKINGSET && !w->working_set.shared)
-			total += allocate_working_set(wrk, &w->working_set);
-	}
-
-	if (verbose > 2)
-		printf("%u: %lu bytes in working sets.\n", wrk->id, total);
-
-	/*
-	 * Map of working set ids.
-	 */
-	wrk->max_working_set_id = -1;
-	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
-		if (w->type == WORKINGSET &&
-		    w->working_set.id > wrk->max_working_set_id)
-			wrk->max_working_set_id = w->working_set.id;
-	}
-
-	sets = wrk->working_sets;
-	wrk->working_sets = calloc(wrk->max_working_set_id + 1,
-				   sizeof(*wrk->working_sets));
-	igt_assert(wrk->working_sets);
-
-	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
-		struct working_set *set;
-
-		if (w->type != WORKINGSET)
-			continue;
-
-		if (!w->working_set.shared) {
-			set = &w->working_set;
-		} else {
-			igt_assert(sets);
-
-			set = sets[w->working_set.id];
-			igt_assert(set->shared);
-			igt_assert(set->sizes);
-		}
-
-		wrk->working_sets[w->working_set.id] = set;
-	}
-
-	if (sets)
-		free(sets);
+	prepare_working_sets(id, wrk);
 
 	/*
 	 * Allocate batch buffers.
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [igt-dev] [PATCH i-g-t 14/17] benchmarks/gem_wsim: group i915 fields
  2023-09-28 17:45 [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (12 preceding siblings ...)
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 13/17] benchmarks/gem_wsim: extract prepare working sets code to new function Marcin Bernatowicz
@ 2023-09-28 17:45 ` Marcin Bernatowicz
  2023-09-29  9:33   ` Tvrtko Ursulin
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 15/17] benchmarks/gem_wsim: introduce bb_size in w_step Marcin Bernatowicz
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 40+ messages in thread
From: Marcin Bernatowicz @ 2023-09-28 17:45 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Group i915 specific fields.

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 100 ++++++++++++++++++++++--------------------
 1 file changed, 52 insertions(+), 48 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index e875ddaa7..4618509ab 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -174,11 +174,15 @@ struct w_step {
 	unsigned int request;
 	unsigned int preempt_us;
 
-	struct drm_i915_gem_execbuffer2 eb;
-	struct drm_i915_gem_exec_object2 *obj;
-	struct drm_i915_gem_relocation_entry reloc[3];
+	union {
+		struct {
+			struct drm_i915_gem_execbuffer2 eb;
+			struct drm_i915_gem_exec_object2 *obj;
+			struct drm_i915_gem_relocation_entry reloc[3];
+			uint32_t *bb_duration;
+		} i915;
+	};
 	uint32_t bb_handle;
-	uint32_t *bb_duration;
 };
 
 struct ctx {
@@ -247,7 +251,7 @@ static const char *ring_str_map[NUM_ENGINES] = {
 
 static void w_step_sync(struct w_step *w)
 {
-	gem_sync(fd, w->obj[0].handle);
+	gem_sync(fd, w->i915.obj[0].handle);
 }
 
 static int read_timestamp_frequency(int i915)
@@ -1374,9 +1378,9 @@ static unsigned int create_bb(struct w_step *w, int self)
 	/* Save delta for indirect read by COND_BBE */
 	*cs++ = MI_STORE_REGISTER_MEM_CMD | (1 + use_64b) | MI_CS_MMIO_DST;
 	*cs++ = CS_GPR(NOW_TS);
-	w->reloc[r].target_handle = self;
-	w->reloc[r].offset = offset_in_page(cs);
-	*cs++ = w->reloc[r].delta = 4000;
+	w->i915.reloc[r].target_handle = self;
+	w->i915.reloc[r].offset = offset_in_page(cs);
+	*cs++ = w->i915.reloc[r].delta = 4000;
 	*cs++ = 0;
 	r++;
 
@@ -1389,19 +1393,19 @@ static unsigned int create_bb(struct w_step *w, int self)
 
 	/* Break if delta [time elapsed] > target ns (target filled in later) */
 	*cs++ = MI_COND_BATCH_BUFFER_END | MI_DO_COMPARE | (1 + use_64b);
-	w->bb_duration = cs;
+	w->i915.bb_duration = cs;
 	*cs++ = 0;
-	w->reloc[r].target_handle = self;
-	w->reloc[r].offset = offset_in_page(cs);
-	*cs++ = w->reloc[r].delta = 4000;
+	w->i915.reloc[r].target_handle = self;
+	w->i915.reloc[r].offset = offset_in_page(cs);
+	*cs++ = w->i915.reloc[r].delta = 4000;
 	*cs++ = 0;
 	r++;
 
 	/* Otherwise back to recalculating delta */
 	*cs++ = MI_BATCH_BUFFER_START | 1 << 8 | use_64b;
-	w->reloc[r].target_handle = self;
-	w->reloc[r].offset = offset_in_page(cs);
-	*cs++ = w->reloc[r].delta = offset_in_page(jmp);
+	w->i915.reloc[r].target_handle = self;
+	w->i915.reloc[r].offset = offset_in_page(cs);
+	*cs++ = w->i915.reloc[r].delta = offset_in_page(jmp);
 	*cs++ = 0;
 	r++;
 
@@ -1446,16 +1450,16 @@ eb_update_flags(struct workload *wrk, struct w_step *w,
 	struct ctx *ctx = __get_ctx(wrk, w);
 
 	if (ctx->engine_map)
-		w->eb.flags = find_engine_in_map(ctx, engine);
+		w->i915.eb.flags = find_engine_in_map(ctx, engine);
 	else
-		eb_set_engine(&w->eb, engine);
+		eb_set_engine(&w->i915.eb, engine);
 
-	w->eb.flags |= I915_EXEC_HANDLE_LUT;
-	w->eb.flags |= I915_EXEC_NO_RELOC;
+	w->i915.eb.flags |= I915_EXEC_HANDLE_LUT;
+	w->i915.eb.flags |= I915_EXEC_NO_RELOC;
 
 	igt_assert(w->emit_fence <= 0);
 	if (w->emit_fence)
-		w->eb.flags |= I915_EXEC_FENCE_OUT;
+		w->i915.eb.flags |= I915_EXEC_FENCE_OUT;
 }
 
 static uint32_t
@@ -1477,11 +1481,11 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
 	unsigned int nr_obj = 2 + w->data_deps.nr;
 	unsigned int i;
 
-	w->obj = calloc(nr_obj, sizeof(*w->obj));
-	igt_assert(w->obj);
+	w->i915.obj = calloc(nr_obj, sizeof(*w->i915.obj));
+	igt_assert(w->i915.obj);
 
-	w->obj[j].handle = alloc_bo(fd, 4096);
-	w->obj[j].flags = EXEC_OBJECT_WRITE;
+	w->i915.obj[j].handle = alloc_bo(fd, 4096);
+	w->i915.obj[j].flags = EXEC_OBJECT_WRITE;
 	j++;
 	igt_assert(j < nr_obj);
 
@@ -1496,7 +1500,7 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
 			igt_assert(dep_idx >= 0 && dep_idx < w->idx);
 			igt_assert(wrk->steps[dep_idx].type == BATCH);
 
-			dep_handle = wrk->steps[dep_idx].obj[0].handle;
+			dep_handle = wrk->steps[dep_idx].i915.obj[0].handle;
 		} else {
 			struct working_set *set;
 
@@ -1512,28 +1516,28 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
 			dep_handle = set->handles[entry->target];
 		}
 
-		w->obj[j].flags = entry->write ? EXEC_OBJECT_WRITE : 0;
-		w->obj[j].handle = dep_handle;
+		w->i915.obj[j].flags = entry->write ? EXEC_OBJECT_WRITE : 0;
+		w->i915.obj[j].handle = dep_handle;
 		j++;
 		igt_assert(j < nr_obj);
 	}
 
-	w->bb_handle = w->obj[j].handle = gem_create(fd, 4096);
-	w->obj[j].relocation_count = create_bb(w, j);
-	igt_assert(w->obj[j].relocation_count <= ARRAY_SIZE(w->reloc));
-	w->obj[j].relocs_ptr = to_user_pointer(&w->reloc);
+	w->bb_handle = w->i915.obj[j].handle = gem_create(fd, 4096);
+	w->i915.obj[j].relocation_count = create_bb(w, j);
+	igt_assert(w->i915.obj[j].relocation_count <= ARRAY_SIZE(w->i915.reloc));
+	w->i915.obj[j].relocs_ptr = to_user_pointer(&w->i915.reloc);
 
-	w->eb.buffers_ptr = to_user_pointer(w->obj);
-	w->eb.buffer_count = j + 1;
-	w->eb.rsvd1 = get_ctxid(wrk, w);
+	w->i915.eb.buffers_ptr = to_user_pointer(w->i915.obj);
+	w->i915.eb.buffer_count = j + 1;
+	w->i915.eb.rsvd1 = get_ctxid(wrk, w);
 
 	eb_update_flags(wrk, w, engine);
 #ifdef DEBUG
-	printf("%u: %u:|", w->idx, w->eb.buffer_count);
+	printf("%u: %u:|", w->idx, w->i915.eb.buffer_count);
 	for (i = 0; i <= j; i++)
-		printf("%x|", w->obj[i].handle);
+		printf("%x|", w->i915.obj[i].handle);
 	printf(" flags=%llx bb=%x[%u] ctx[%u]=%u\n",
-		w->eb.flags, w->bb_handle, j, w->context,
+		w->i915.eb.flags, w->bb_handle, j, w->context,
 		get_ctxid(wrk, w));
 #endif
 }
@@ -1730,7 +1734,7 @@ static void measure_active_set(struct workload *wrk)
 				igt_assert(idx >= 0 && idx < w->idx);
 				igt_assert(wrk->steps[idx].type == BATCH);
 
-				_dep.target = wrk->steps[idx].obj[0].handle;
+				_dep.target = wrk->steps[idx].i915.obj[0].handle;
 			}
 
 			if (!find_dep(deps, nr, _dep)) {
@@ -2120,7 +2124,7 @@ update_bb_start(struct workload *wrk, struct w_step *w)
 	if (!w->duration.unbound)
 		ticks = ~ns_to_ctx_ticks(1000LL * get_duration(wrk, w));
 
-	*w->bb_duration = ticks;
+	*w->i915.bb_duration = ticks;
 }
 
 static void w_sync_to(struct workload *wrk, struct w_step *w, int target)
@@ -2158,20 +2162,20 @@ do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id engine)
 		igt_assert(wrk->steps[tgt].emit_fence > 0);
 
 		if (w->fence_deps.submit_fence)
-			w->eb.flags |= I915_EXEC_FENCE_SUBMIT;
+			w->i915.eb.flags |= I915_EXEC_FENCE_SUBMIT;
 		else
-			w->eb.flags |= I915_EXEC_FENCE_IN;
+			w->i915.eb.flags |= I915_EXEC_FENCE_IN;
 
-		w->eb.rsvd2 = wrk->steps[tgt].emit_fence;
+		w->i915.eb.rsvd2 = wrk->steps[tgt].emit_fence;
 	}
 
-	if (w->eb.flags & I915_EXEC_FENCE_OUT)
-		gem_execbuf_wr(fd, &w->eb);
+	if (w->i915.eb.flags & I915_EXEC_FENCE_OUT)
+		gem_execbuf_wr(fd, &w->i915.eb);
 	else
-		gem_execbuf(fd, &w->eb);
+		gem_execbuf(fd, &w->i915.eb);
 
-	if (w->eb.flags & I915_EXEC_FENCE_OUT) {
-		w->emit_fence = w->eb.rsvd2 >> 32;
+	if (w->i915.eb.flags & I915_EXEC_FENCE_OUT) {
+		w->emit_fence = w->i915.eb.rsvd2 >> 32;
 		igt_assert(w->emit_fence > 0);
 	}
 }
@@ -2296,7 +2300,7 @@ static void *run_workload(void *data)
 				igt_assert(wrk->steps[t_idx].type == BATCH);
 				igt_assert(wrk->steps[t_idx].duration.unbound);
 
-				*wrk->steps[t_idx].bb_duration = 0xffffffff;
+				*wrk->steps[t_idx].i915.bb_duration = 0xffffffff;
 				__sync_synchronize();
 				continue;
 			} else if (w->type == SSEU) {
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 14/17] benchmarks/gem_wsim: group i915 fields
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 14/17] benchmarks/gem_wsim: group i915 fields Marcin Bernatowicz
@ 2023-09-29  9:33   ` Tvrtko Ursulin
  0 siblings, 0 replies; 40+ messages in thread
From: Tvrtko Ursulin @ 2023-09-29  9:33 UTC (permalink / raw)
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 28/09/2023 18:45, Marcin Bernatowicz wrote:
> Group i915 specific fields.
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c | 100 ++++++++++++++++++++++--------------------
>   1 file changed, 52 insertions(+), 48 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index e875ddaa7..4618509ab 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -174,11 +174,15 @@ struct w_step {
>   	unsigned int request;
>   	unsigned int preempt_us;
>   
> -	struct drm_i915_gem_execbuffer2 eb;
> -	struct drm_i915_gem_exec_object2 *obj;
> -	struct drm_i915_gem_relocation_entry reloc[3];
> +	union {
> +		struct {
> +			struct drm_i915_gem_execbuffer2 eb;
> +			struct drm_i915_gem_exec_object2 *obj;
> +			struct drm_i915_gem_relocation_entry reloc[3];
> +			uint32_t *bb_duration;

That bb_duration was i915 specific caught my attention, but then I 
figured out xe spinner supports this functionality. Nice. Maybe as a 
follow up task someone can move the same into i915 spinner, although 
return on investment might be low. Perf_pmu tests *could* use it, but 
perhaps it is not a simple task. Anyway, digression over..

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

> +		} i915;
> +	};
>   	uint32_t bb_handle;
> -	uint32_t *bb_duration;
>   };
>   
>   struct ctx {
> @@ -247,7 +251,7 @@ static const char *ring_str_map[NUM_ENGINES] = {
>   
>   static void w_step_sync(struct w_step *w)
>   {
> -	gem_sync(fd, w->obj[0].handle);
> +	gem_sync(fd, w->i915.obj[0].handle);
>   }
>   
>   static int read_timestamp_frequency(int i915)
> @@ -1374,9 +1378,9 @@ static unsigned int create_bb(struct w_step *w, int self)
>   	/* Save delta for indirect read by COND_BBE */
>   	*cs++ = MI_STORE_REGISTER_MEM_CMD | (1 + use_64b) | MI_CS_MMIO_DST;
>   	*cs++ = CS_GPR(NOW_TS);
> -	w->reloc[r].target_handle = self;
> -	w->reloc[r].offset = offset_in_page(cs);
> -	*cs++ = w->reloc[r].delta = 4000;
> +	w->i915.reloc[r].target_handle = self;
> +	w->i915.reloc[r].offset = offset_in_page(cs);
> +	*cs++ = w->i915.reloc[r].delta = 4000;
>   	*cs++ = 0;
>   	r++;
>   
> @@ -1389,19 +1393,19 @@ static unsigned int create_bb(struct w_step *w, int self)
>   
>   	/* Break if delta [time elapsed] > target ns (target filled in later) */
>   	*cs++ = MI_COND_BATCH_BUFFER_END | MI_DO_COMPARE | (1 + use_64b);
> -	w->bb_duration = cs;
> +	w->i915.bb_duration = cs;
>   	*cs++ = 0;
> -	w->reloc[r].target_handle = self;
> -	w->reloc[r].offset = offset_in_page(cs);
> -	*cs++ = w->reloc[r].delta = 4000;
> +	w->i915.reloc[r].target_handle = self;
> +	w->i915.reloc[r].offset = offset_in_page(cs);
> +	*cs++ = w->i915.reloc[r].delta = 4000;
>   	*cs++ = 0;
>   	r++;
>   
>   	/* Otherwise back to recalculating delta */
>   	*cs++ = MI_BATCH_BUFFER_START | 1 << 8 | use_64b;
> -	w->reloc[r].target_handle = self;
> -	w->reloc[r].offset = offset_in_page(cs);
> -	*cs++ = w->reloc[r].delta = offset_in_page(jmp);
> +	w->i915.reloc[r].target_handle = self;
> +	w->i915.reloc[r].offset = offset_in_page(cs);
> +	*cs++ = w->i915.reloc[r].delta = offset_in_page(jmp);
>   	*cs++ = 0;
>   	r++;
>   
> @@ -1446,16 +1450,16 @@ eb_update_flags(struct workload *wrk, struct w_step *w,
>   	struct ctx *ctx = __get_ctx(wrk, w);
>   
>   	if (ctx->engine_map)
> -		w->eb.flags = find_engine_in_map(ctx, engine);
> +		w->i915.eb.flags = find_engine_in_map(ctx, engine);
>   	else
> -		eb_set_engine(&w->eb, engine);
> +		eb_set_engine(&w->i915.eb, engine);
>   
> -	w->eb.flags |= I915_EXEC_HANDLE_LUT;
> -	w->eb.flags |= I915_EXEC_NO_RELOC;
> +	w->i915.eb.flags |= I915_EXEC_HANDLE_LUT;
> +	w->i915.eb.flags |= I915_EXEC_NO_RELOC;
>   
>   	igt_assert(w->emit_fence <= 0);
>   	if (w->emit_fence)
> -		w->eb.flags |= I915_EXEC_FENCE_OUT;
> +		w->i915.eb.flags |= I915_EXEC_FENCE_OUT;
>   }
>   
>   static uint32_t
> @@ -1477,11 +1481,11 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
>   	unsigned int nr_obj = 2 + w->data_deps.nr;
>   	unsigned int i;
>   
> -	w->obj = calloc(nr_obj, sizeof(*w->obj));
> -	igt_assert(w->obj);
> +	w->i915.obj = calloc(nr_obj, sizeof(*w->i915.obj));
> +	igt_assert(w->i915.obj);
>   
> -	w->obj[j].handle = alloc_bo(fd, 4096);
> -	w->obj[j].flags = EXEC_OBJECT_WRITE;
> +	w->i915.obj[j].handle = alloc_bo(fd, 4096);
> +	w->i915.obj[j].flags = EXEC_OBJECT_WRITE;
>   	j++;
>   	igt_assert(j < nr_obj);
>   
> @@ -1496,7 +1500,7 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
>   			igt_assert(dep_idx >= 0 && dep_idx < w->idx);
>   			igt_assert(wrk->steps[dep_idx].type == BATCH);
>   
> -			dep_handle = wrk->steps[dep_idx].obj[0].handle;
> +			dep_handle = wrk->steps[dep_idx].i915.obj[0].handle;
>   		} else {
>   			struct working_set *set;
>   
> @@ -1512,28 +1516,28 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
>   			dep_handle = set->handles[entry->target];
>   		}
>   
> -		w->obj[j].flags = entry->write ? EXEC_OBJECT_WRITE : 0;
> -		w->obj[j].handle = dep_handle;
> +		w->i915.obj[j].flags = entry->write ? EXEC_OBJECT_WRITE : 0;
> +		w->i915.obj[j].handle = dep_handle;
>   		j++;
>   		igt_assert(j < nr_obj);
>   	}
>   
> -	w->bb_handle = w->obj[j].handle = gem_create(fd, 4096);
> -	w->obj[j].relocation_count = create_bb(w, j);
> -	igt_assert(w->obj[j].relocation_count <= ARRAY_SIZE(w->reloc));
> -	w->obj[j].relocs_ptr = to_user_pointer(&w->reloc);
> +	w->bb_handle = w->i915.obj[j].handle = gem_create(fd, 4096);
> +	w->i915.obj[j].relocation_count = create_bb(w, j);
> +	igt_assert(w->i915.obj[j].relocation_count <= ARRAY_SIZE(w->i915.reloc));
> +	w->i915.obj[j].relocs_ptr = to_user_pointer(&w->i915.reloc);
>   
> -	w->eb.buffers_ptr = to_user_pointer(w->obj);
> -	w->eb.buffer_count = j + 1;
> -	w->eb.rsvd1 = get_ctxid(wrk, w);
> +	w->i915.eb.buffers_ptr = to_user_pointer(w->i915.obj);
> +	w->i915.eb.buffer_count = j + 1;
> +	w->i915.eb.rsvd1 = get_ctxid(wrk, w);
>   
>   	eb_update_flags(wrk, w, engine);
>   #ifdef DEBUG
> -	printf("%u: %u:|", w->idx, w->eb.buffer_count);
> +	printf("%u: %u:|", w->idx, w->i915.eb.buffer_count);
>   	for (i = 0; i <= j; i++)
> -		printf("%x|", w->obj[i].handle);
> +		printf("%x|", w->i915.obj[i].handle);
>   	printf(" flags=%llx bb=%x[%u] ctx[%u]=%u\n",
> -		w->eb.flags, w->bb_handle, j, w->context,
> +		w->i915.eb.flags, w->bb_handle, j, w->context,
>   		get_ctxid(wrk, w));
>   #endif
>   }
> @@ -1730,7 +1734,7 @@ static void measure_active_set(struct workload *wrk)
>   				igt_assert(idx >= 0 && idx < w->idx);
>   				igt_assert(wrk->steps[idx].type == BATCH);
>   
> -				_dep.target = wrk->steps[idx].obj[0].handle;
> +				_dep.target = wrk->steps[idx].i915.obj[0].handle;
>   			}
>   
>   			if (!find_dep(deps, nr, _dep)) {
> @@ -2120,7 +2124,7 @@ update_bb_start(struct workload *wrk, struct w_step *w)
>   	if (!w->duration.unbound)
>   		ticks = ~ns_to_ctx_ticks(1000LL * get_duration(wrk, w));
>   
> -	*w->bb_duration = ticks;
> +	*w->i915.bb_duration = ticks;
>   }
>   
>   static void w_sync_to(struct workload *wrk, struct w_step *w, int target)
> @@ -2158,20 +2162,20 @@ do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id engine)
>   		igt_assert(wrk->steps[tgt].emit_fence > 0);
>   
>   		if (w->fence_deps.submit_fence)
> -			w->eb.flags |= I915_EXEC_FENCE_SUBMIT;
> +			w->i915.eb.flags |= I915_EXEC_FENCE_SUBMIT;
>   		else
> -			w->eb.flags |= I915_EXEC_FENCE_IN;
> +			w->i915.eb.flags |= I915_EXEC_FENCE_IN;
>   
> -		w->eb.rsvd2 = wrk->steps[tgt].emit_fence;
> +		w->i915.eb.rsvd2 = wrk->steps[tgt].emit_fence;
>   	}
>   
> -	if (w->eb.flags & I915_EXEC_FENCE_OUT)
> -		gem_execbuf_wr(fd, &w->eb);
> +	if (w->i915.eb.flags & I915_EXEC_FENCE_OUT)
> +		gem_execbuf_wr(fd, &w->i915.eb);
>   	else
> -		gem_execbuf(fd, &w->eb);
> +		gem_execbuf(fd, &w->i915.eb);
>   
> -	if (w->eb.flags & I915_EXEC_FENCE_OUT) {
> -		w->emit_fence = w->eb.rsvd2 >> 32;
> +	if (w->i915.eb.flags & I915_EXEC_FENCE_OUT) {
> +		w->emit_fence = w->i915.eb.rsvd2 >> 32;
>   		igt_assert(w->emit_fence > 0);
>   	}
>   }
> @@ -2296,7 +2300,7 @@ static void *run_workload(void *data)
>   				igt_assert(wrk->steps[t_idx].type == BATCH);
>   				igt_assert(wrk->steps[t_idx].duration.unbound);
>   
> -				*wrk->steps[t_idx].bb_duration = 0xffffffff;
> +				*wrk->steps[t_idx].i915.bb_duration = 0xffffffff;
>   				__sync_synchronize();
>   				continue;
>   			} else if (w->type == SSEU) {

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [igt-dev] [PATCH i-g-t 15/17] benchmarks/gem_wsim: introduce bb_size in w_step
  2023-09-28 17:45 [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (13 preceding siblings ...)
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 14/17] benchmarks/gem_wsim: group i915 fields Marcin Bernatowicz
@ 2023-09-28 17:45 ` Marcin Bernatowicz
  2023-09-29  9:35   ` Tvrtko Ursulin
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 16/17] benchmarks/gem_wsim: for_each_dep macro Marcin Bernatowicz
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 40+ messages in thread
From: Marcin Bernatowicz @ 2023-09-28 17:45 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Put it next to bb_handle.
Use it in alloc_step_batch and measure_active_set.

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 4618509ab..d22d66aeb 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -183,6 +183,7 @@ struct w_step {
 		} i915;
 	};
 	uint32_t bb_handle;
+	size_t bb_size;
 };
 
 struct ctx {
@@ -1481,6 +1482,7 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
 	unsigned int nr_obj = 2 + w->data_deps.nr;
 	unsigned int i;
 
+	w->bb_size = 4096;
 	w->i915.obj = calloc(nr_obj, sizeof(*w->i915.obj));
 	igt_assert(w->i915.obj);
 
@@ -1522,7 +1524,7 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
 		igt_assert(j < nr_obj);
 	}
 
-	w->bb_handle = w->i915.obj[j].handle = gem_create(fd, 4096);
+	w->bb_handle = w->i915.obj[j].handle = gem_create(fd, w->bb_size);
 	w->i915.obj[j].relocation_count = create_bb(w, j);
 	igt_assert(w->i915.obj[j].relocation_count <= ARRAY_SIZE(w->i915.reloc));
 	w->i915.obj[j].relocs_ptr = to_user_pointer(&w->i915.reloc);
@@ -1722,7 +1724,7 @@ static void measure_active_set(struct workload *wrk)
 		if (w->type != BATCH)
 			continue;
 
-		batch_sizes += 4096;
+		batch_sizes += w->bb_size;
 
 		for (j = 0; j < w->data_deps.nr; j++) {
 			struct dep_entry *dep = &w->data_deps.list[j];
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 15/17] benchmarks/gem_wsim: introduce bb_size in w_step
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 15/17] benchmarks/gem_wsim: introduce bb_size in w_step Marcin Bernatowicz
@ 2023-09-29  9:35   ` Tvrtko Ursulin
  2023-09-29 10:08     ` Bernatowicz, Marcin
  0 siblings, 1 reply; 40+ messages in thread
From: Tvrtko Ursulin @ 2023-09-29  9:35 UTC (permalink / raw)
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 28/09/2023 18:45, Marcin Bernatowicz wrote:
> Put it next to bb_handle.
> Use it in alloc_step_batch and measure_active_set.

Could say why.

Like xe might need more than 4k? Might not be able to allocate only 4k? 
(Guessing only.)

Regards,

Tvrtko

> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 4618509ab..d22d66aeb 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -183,6 +183,7 @@ struct w_step {
>   		} i915;
>   	};
>   	uint32_t bb_handle;
> +	size_t bb_size;
>   };
>   
>   struct ctx {
> @@ -1481,6 +1482,7 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
>   	unsigned int nr_obj = 2 + w->data_deps.nr;
>   	unsigned int i;
>   
> +	w->bb_size = 4096;
>   	w->i915.obj = calloc(nr_obj, sizeof(*w->i915.obj));
>   	igt_assert(w->i915.obj);
>   
> @@ -1522,7 +1524,7 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
>   		igt_assert(j < nr_obj);
>   	}
>   
> -	w->bb_handle = w->i915.obj[j].handle = gem_create(fd, 4096);
> +	w->bb_handle = w->i915.obj[j].handle = gem_create(fd, w->bb_size);
>   	w->i915.obj[j].relocation_count = create_bb(w, j);
>   	igt_assert(w->i915.obj[j].relocation_count <= ARRAY_SIZE(w->i915.reloc));
>   	w->i915.obj[j].relocs_ptr = to_user_pointer(&w->i915.reloc);
> @@ -1722,7 +1724,7 @@ static void measure_active_set(struct workload *wrk)
>   		if (w->type != BATCH)
>   			continue;
>   
> -		batch_sizes += 4096;
> +		batch_sizes += w->bb_size;
>   
>   		for (j = 0; j < w->data_deps.nr; j++) {
>   			struct dep_entry *dep = &w->data_deps.list[j];

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 15/17] benchmarks/gem_wsim: introduce bb_size in w_step
  2023-09-29  9:35   ` Tvrtko Ursulin
@ 2023-09-29 10:08     ` Bernatowicz, Marcin
  2023-09-29 10:49       ` Tvrtko Ursulin
  0 siblings, 1 reply; 40+ messages in thread
From: Bernatowicz, Marcin @ 2023-09-29 10:08 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: chris.p.wilson



On 9/29/2023 11:35 AM, Tvrtko Ursulin wrote:
> 
> On 28/09/2023 18:45, Marcin Bernatowicz wrote:
>> Put it next to bb_handle.
>> Use it in alloc_step_batch and measure_active_set.
> 
> Could say why.
> 
> Like xe might need more than 4k? Might not be able to allocate only 4k? 
> (Guessing only.)

Xe uses following formula:

w->bb_size = ALIGN(sizeof(*w->xe.data) + xe_cs_prefetch_size(fd),
			   xe_get_default_alignment(fd));

which equaled 4096 on platform I tested.
I didn't want to put bb_size inside xe specifics as it is connected with 
bb_handle.

Regards,
marcin

> 
> Regards,
> 
> Tvrtko
> 
>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>> ---
>>   benchmarks/gem_wsim.c | 6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>> index 4618509ab..d22d66aeb 100644
>> --- a/benchmarks/gem_wsim.c
>> +++ b/benchmarks/gem_wsim.c
>> @@ -183,6 +183,7 @@ struct w_step {
>>           } i915;
>>       };
>>       uint32_t bb_handle;
>> +    size_t bb_size;
>>   };
>>   struct ctx {
>> @@ -1481,6 +1482,7 @@ alloc_step_batch(struct workload *wrk, struct 
>> w_step *w)
>>       unsigned int nr_obj = 2 + w->data_deps.nr;
>>       unsigned int i;
>> +    w->bb_size = 4096;
>>       w->i915.obj = calloc(nr_obj, sizeof(*w->i915.obj));
>>       igt_assert(w->i915.obj);
>> @@ -1522,7 +1524,7 @@ alloc_step_batch(struct workload *wrk, struct 
>> w_step *w)
>>           igt_assert(j < nr_obj);
>>       }
>> -    w->bb_handle = w->i915.obj[j].handle = gem_create(fd, 4096);
>> +    w->bb_handle = w->i915.obj[j].handle = gem_create(fd, w->bb_size);
>>       w->i915.obj[j].relocation_count = create_bb(w, j);
>>       igt_assert(w->i915.obj[j].relocation_count <= 
>> ARRAY_SIZE(w->i915.reloc));
>>       w->i915.obj[j].relocs_ptr = to_user_pointer(&w->i915.reloc);
>> @@ -1722,7 +1724,7 @@ static void measure_active_set(struct workload 
>> *wrk)
>>           if (w->type != BATCH)
>>               continue;
>> -        batch_sizes += 4096;
>> +        batch_sizes += w->bb_size;
>>           for (j = 0; j < w->data_deps.nr; j++) {
>>               struct dep_entry *dep = &w->data_deps.list[j];

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 15/17] benchmarks/gem_wsim: introduce bb_size in w_step
  2023-09-29 10:08     ` Bernatowicz, Marcin
@ 2023-09-29 10:49       ` Tvrtko Ursulin
  2023-10-05 10:52         ` Bernatowicz, Marcin
  0 siblings, 1 reply; 40+ messages in thread
From: Tvrtko Ursulin @ 2023-09-29 10:49 UTC (permalink / raw)
  To: Bernatowicz, Marcin, igt-dev; +Cc: chris.p.wilson


On 29/09/2023 11:08, Bernatowicz, Marcin wrote:
> 
> 
> On 9/29/2023 11:35 AM, Tvrtko Ursulin wrote:
>>
>> On 28/09/2023 18:45, Marcin Bernatowicz wrote:
>>> Put it next to bb_handle.
>>> Use it in alloc_step_batch and measure_active_set.
>>
>> Could say why.
>>
>> Like xe might need more than 4k? Might not be able to allocate only 
>> 4k? (Guessing only.)
> 
> Xe uses following formula:
> 
> w->bb_size = ALIGN(sizeof(*w->xe.data) + xe_cs_prefetch_size(fd),
>                 xe_get_default_alignment(fd));
> 
> which equaled 4096 on platform I tested.
> I didn't want to put bb_size inside xe specifics as it is connected with 
> bb_handle.

Hmmm could you dig a bit to figure out if sometimes this can be larger 
than 4k and if so why only xe and not i915. Because things like prefetch 
and alignment sound like should be more hardware than driver dependent.

Regards,

Tvrtko

> 
> Regards,
> marcin
> 
>>
>> Regards,
>>
>> Tvrtko
>>
>>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>>> ---
>>>   benchmarks/gem_wsim.c | 6 ++++--
>>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>>> index 4618509ab..d22d66aeb 100644
>>> --- a/benchmarks/gem_wsim.c
>>> +++ b/benchmarks/gem_wsim.c
>>> @@ -183,6 +183,7 @@ struct w_step {
>>>           } i915;
>>>       };
>>>       uint32_t bb_handle;
>>> +    size_t bb_size;
>>>   };
>>>   struct ctx {
>>> @@ -1481,6 +1482,7 @@ alloc_step_batch(struct workload *wrk, struct 
>>> w_step *w)
>>>       unsigned int nr_obj = 2 + w->data_deps.nr;
>>>       unsigned int i;
>>> +    w->bb_size = 4096;
>>>       w->i915.obj = calloc(nr_obj, sizeof(*w->i915.obj));
>>>       igt_assert(w->i915.obj);
>>> @@ -1522,7 +1524,7 @@ alloc_step_batch(struct workload *wrk, struct 
>>> w_step *w)
>>>           igt_assert(j < nr_obj);
>>>       }
>>> -    w->bb_handle = w->i915.obj[j].handle = gem_create(fd, 4096);
>>> +    w->bb_handle = w->i915.obj[j].handle = gem_create(fd, w->bb_size);
>>>       w->i915.obj[j].relocation_count = create_bb(w, j);
>>>       igt_assert(w->i915.obj[j].relocation_count <= 
>>> ARRAY_SIZE(w->i915.reloc));
>>>       w->i915.obj[j].relocs_ptr = to_user_pointer(&w->i915.reloc);
>>> @@ -1722,7 +1724,7 @@ static void measure_active_set(struct workload 
>>> *wrk)
>>>           if (w->type != BATCH)
>>>               continue;
>>> -        batch_sizes += 4096;
>>> +        batch_sizes += w->bb_size;
>>>           for (j = 0; j < w->data_deps.nr; j++) {
>>>               struct dep_entry *dep = &w->data_deps.list[j];

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 15/17] benchmarks/gem_wsim: introduce bb_size in w_step
  2023-09-29 10:49       ` Tvrtko Ursulin
@ 2023-10-05 10:52         ` Bernatowicz, Marcin
  2023-10-05 12:30           ` Tvrtko Ursulin
  0 siblings, 1 reply; 40+ messages in thread
From: Bernatowicz, Marcin @ 2023-10-05 10:52 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: chris.p.wilson



On 9/29/2023 12:49 PM, Tvrtko Ursulin wrote:
> 
> On 29/09/2023 11:08, Bernatowicz, Marcin wrote:
>>
>>
>> On 9/29/2023 11:35 AM, Tvrtko Ursulin wrote:
>>>
>>> On 28/09/2023 18:45, Marcin Bernatowicz wrote:
>>>> Put it next to bb_handle.
>>>> Use it in alloc_step_batch and measure_active_set.
>>>
>>> Could say why.
>>>
>>> Like xe might need more than 4k? Might not be able to allocate only 
>>> 4k? (Guessing only.)
>>
>> Xe uses following formula:
>>
>> w->bb_size = ALIGN(sizeof(*w->xe.data) + xe_cs_prefetch_size(fd),
>>                 xe_get_default_alignment(fd));
>>
>> which equaled 4096 on platform I tested.
>> I didn't want to put bb_size inside xe specifics as it is connected 
>> with bb_handle.
> 
> Hmmm could you dig a bit to figure out if sometimes this can be larger 
> than 4k and if so why only xe and not i915. Because things like prefetch 
> and alignment sound like should be more hardware than driver dependent.

I got information there may be a case a prefetch size 4096 on some 
platform, but I did not get a clear answer why/if above calculation is 
needed/redundant. So I assume it's redundant and I will not copy/paste 
it from igt tests.

There is one concern I have related to DRM_IOCTL_XE_GEM_CREATE ioctl and 
size field, suggesting a need for bb_size according to description:

struct drm_xe_gem_create {
         ...
	/**
	 * @size: Requested size for the object
	 *
	 * The (page-aligned) allocated size for the object will be returned.
	 */
	__u64 size;

It suggests size could be adjusted after a call, but I think there is 
some discussion ongoing to that, so will wait. (and the 
xe_bo_create_flags does not take it into account and does not update 
size param)

I will drop the patch, leave 4096 and I'm expecting a xe_bo_create 
failure if not possible.

Regards,
Marcin

> 
> Regards,
> 
> Tvrtko
> 
>>
>> Regards,
>> marcin
>>
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>>>> ---
>>>>   benchmarks/gem_wsim.c | 6 ++++--
>>>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>>>> index 4618509ab..d22d66aeb 100644
>>>> --- a/benchmarks/gem_wsim.c
>>>> +++ b/benchmarks/gem_wsim.c
>>>> @@ -183,6 +183,7 @@ struct w_step {
>>>>           } i915;
>>>>       };
>>>>       uint32_t bb_handle;
>>>> +    size_t bb_size;
>>>>   };
>>>>   struct ctx {
>>>> @@ -1481,6 +1482,7 @@ alloc_step_batch(struct workload *wrk, struct 
>>>> w_step *w)
>>>>       unsigned int nr_obj = 2 + w->data_deps.nr;
>>>>       unsigned int i;
>>>> +    w->bb_size = 4096;
>>>>       w->i915.obj = calloc(nr_obj, sizeof(*w->i915.obj));
>>>>       igt_assert(w->i915.obj);
>>>> @@ -1522,7 +1524,7 @@ alloc_step_batch(struct workload *wrk, struct 
>>>> w_step *w)
>>>>           igt_assert(j < nr_obj);
>>>>       }
>>>> -    w->bb_handle = w->i915.obj[j].handle = gem_create(fd, 4096);
>>>> +    w->bb_handle = w->i915.obj[j].handle = gem_create(fd, w->bb_size);
>>>>       w->i915.obj[j].relocation_count = create_bb(w, j);
>>>>       igt_assert(w->i915.obj[j].relocation_count <= 
>>>> ARRAY_SIZE(w->i915.reloc));
>>>>       w->i915.obj[j].relocs_ptr = to_user_pointer(&w->i915.reloc);
>>>> @@ -1722,7 +1724,7 @@ static void measure_active_set(struct workload 
>>>> *wrk)
>>>>           if (w->type != BATCH)
>>>>               continue;
>>>> -        batch_sizes += 4096;
>>>> +        batch_sizes += w->bb_size;
>>>>           for (j = 0; j < w->data_deps.nr; j++) {
>>>>               struct dep_entry *dep = &w->data_deps.list[j];

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 15/17] benchmarks/gem_wsim: introduce bb_size in w_step
  2023-10-05 10:52         ` Bernatowicz, Marcin
@ 2023-10-05 12:30           ` Tvrtko Ursulin
  0 siblings, 0 replies; 40+ messages in thread
From: Tvrtko Ursulin @ 2023-10-05 12:30 UTC (permalink / raw)
  To: Bernatowicz, Marcin, igt-dev; +Cc: chris.p.wilson


On 05/10/2023 11:52, Bernatowicz, Marcin wrote:
> 
> 
> On 9/29/2023 12:49 PM, Tvrtko Ursulin wrote:
>>
>> On 29/09/2023 11:08, Bernatowicz, Marcin wrote:
>>>
>>>
>>> On 9/29/2023 11:35 AM, Tvrtko Ursulin wrote:
>>>>
>>>> On 28/09/2023 18:45, Marcin Bernatowicz wrote:
>>>>> Put it next to bb_handle.
>>>>> Use it in alloc_step_batch and measure_active_set.
>>>>
>>>> Could say why.
>>>>
>>>> Like xe might need more than 4k? Might not be able to allocate only 
>>>> 4k? (Guessing only.)
>>>
>>> Xe uses following formula:
>>>
>>> w->bb_size = ALIGN(sizeof(*w->xe.data) + xe_cs_prefetch_size(fd),
>>>                 xe_get_default_alignment(fd));
>>>
>>> which equaled 4096 on platform I tested.
>>> I didn't want to put bb_size inside xe specifics as it is connected 
>>> with bb_handle.
>>
>> Hmmm could you dig a bit to figure out if sometimes this can be larger 
>> than 4k and if so why only xe and not i915. Because things like 
>> prefetch and alignment sound like should be more hardware than driver 
>> dependent.
> 
> I got information there may be a case a prefetch size 4096 on some 
> platform, but I did not get a clear answer why/if above calculation is 
> needed/redundant. So I assume it's redundant and I will not copy/paste 
> it from igt tests.
> 
> There is one concern I have related to DRM_IOCTL_XE_GEM_CREATE ioctl and 
> size field, suggesting a need for bb_size according to description:
> 
> struct drm_xe_gem_create {
>          ...
>      /**
>       * @size: Requested size for the object
>       *
>       * The (page-aligned) allocated size for the object will be returned.
>       */
>      __u64 size;
> 
> It suggests size could be adjusted after a call, but I think there is 
> some discussion ongoing to that, so will wait. (and the 
> xe_bo_create_flags does not take it into account and does not update 
> size param)

i915 gem_create also rounds up the size and reports it back btw.

Regards,

Tvrtko

> 
> I will drop the patch, leave 4096 and I'm expecting a xe_bo_create 
> failure if not possible.
> 
> Regards,
> Marcin
> 
>>
>> Regards,
>>
>> Tvrtko
>>
>>>
>>> Regards,
>>> marcin
>>>
>>>>
>>>> Regards,
>>>>
>>>> Tvrtko
>>>>
>>>>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>>>>> ---
>>>>>   benchmarks/gem_wsim.c | 6 ++++--
>>>>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>>>>> index 4618509ab..d22d66aeb 100644
>>>>> --- a/benchmarks/gem_wsim.c
>>>>> +++ b/benchmarks/gem_wsim.c
>>>>> @@ -183,6 +183,7 @@ struct w_step {
>>>>>           } i915;
>>>>>       };
>>>>>       uint32_t bb_handle;
>>>>> +    size_t bb_size;
>>>>>   };
>>>>>   struct ctx {
>>>>> @@ -1481,6 +1482,7 @@ alloc_step_batch(struct workload *wrk, struct 
>>>>> w_step *w)
>>>>>       unsigned int nr_obj = 2 + w->data_deps.nr;
>>>>>       unsigned int i;
>>>>> +    w->bb_size = 4096;
>>>>>       w->i915.obj = calloc(nr_obj, sizeof(*w->i915.obj));
>>>>>       igt_assert(w->i915.obj);
>>>>> @@ -1522,7 +1524,7 @@ alloc_step_batch(struct workload *wrk, struct 
>>>>> w_step *w)
>>>>>           igt_assert(j < nr_obj);
>>>>>       }
>>>>> -    w->bb_handle = w->i915.obj[j].handle = gem_create(fd, 4096);
>>>>> +    w->bb_handle = w->i915.obj[j].handle = gem_create(fd, 
>>>>> w->bb_size);
>>>>>       w->i915.obj[j].relocation_count = create_bb(w, j);
>>>>>       igt_assert(w->i915.obj[j].relocation_count <= 
>>>>> ARRAY_SIZE(w->i915.reloc));
>>>>>       w->i915.obj[j].relocs_ptr = to_user_pointer(&w->i915.reloc);
>>>>> @@ -1722,7 +1724,7 @@ static void measure_active_set(struct 
>>>>> workload *wrk)
>>>>>           if (w->type != BATCH)
>>>>>               continue;
>>>>> -        batch_sizes += 4096;
>>>>> +        batch_sizes += w->bb_size;
>>>>>           for (j = 0; j < w->data_deps.nr; j++) {
>>>>>               struct dep_entry *dep = &w->data_deps.list[j];

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [igt-dev] [PATCH i-g-t 16/17] benchmarks/gem_wsim: for_each_dep macro
  2023-09-28 17:45 [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (14 preceding siblings ...)
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 15/17] benchmarks/gem_wsim: introduce bb_size in w_step Marcin Bernatowicz
@ 2023-09-28 17:45 ` Marcin Bernatowicz
  2023-09-29  9:37   ` Tvrtko Ursulin
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 17/17] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 40+ messages in thread
From: Marcin Bernatowicz @ 2023-09-28 17:45 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Utility macro to easy traverse dependencies.

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c | 45 ++++++++++++++++++++++++-------------------
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index d22d66aeb..d447dced4 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -109,6 +109,10 @@ struct deps {
 	struct dep_entry *list;
 };
 
+#define for_each_dep(__dep, __deps) \
+	for (int __i = 0; __i < __deps.nr && \
+	     (__dep = &__deps.list[__i]); ++__i)
+
 struct w_arg {
 	char *filename;
 	char *desc;
@@ -1150,8 +1154,10 @@ add_step:
 	 * referencing them as a sync fence dependency.
 	 */
 	for (i = 0; i < nr_steps; i++) {
-		for (j = 0; j < steps[i].fence_deps.nr; j++) {
-			tmp = steps[i].idx + steps[i].fence_deps.list[j].target;
+		struct dep_entry *dep;
+
+		for_each_dep(dep, steps[i].fence_deps) {
+			tmp = steps[i].idx + dep->target;
 			check_arg(tmp < 0 || tmp >= i ||
 				  (steps[tmp].type != BATCH &&
 				   steps[tmp].type != SW_FENCE),
@@ -1478,9 +1484,9 @@ static void
 alloc_step_batch(struct workload *wrk, struct w_step *w)
 {
 	enum intel_engine_id engine = w->engine;
+	struct dep_entry *dep;
 	unsigned int j = 0;
 	unsigned int nr_obj = 2 + w->data_deps.nr;
-	unsigned int i;
 
 	w->bb_size = 4096;
 	w->i915.obj = calloc(nr_obj, sizeof(*w->i915.obj));
@@ -1491,14 +1497,13 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
 	j++;
 	igt_assert(j < nr_obj);
 
-	for (i = 0; i < w->data_deps.nr; i++) {
-		struct dep_entry *entry = &w->data_deps.list[i];
+	for_each_dep(dep, w->data_deps) {
 		uint32_t dep_handle;
 
-		if (entry->working_set == -1) {
-			int dep_idx = w->idx + entry->target;
+		if (dep->working_set == -1) {
+			int dep_idx = w->idx + dep->target;
 
-			igt_assert(entry->target <= 0);
+			igt_assert(dep->target <= 0);
 			igt_assert(dep_idx >= 0 && dep_idx < w->idx);
 			igt_assert(wrk->steps[dep_idx].type == BATCH);
 
@@ -1506,19 +1511,19 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
 		} else {
 			struct working_set *set;
 
-			igt_assert(entry->working_set <=
+			igt_assert(dep->working_set <=
 				   wrk->max_working_set_id);
 
-			set = wrk->working_sets[entry->working_set];
+			set = wrk->working_sets[dep->working_set];
 
 			igt_assert(set->nr);
-			igt_assert(entry->target < set->nr);
-			igt_assert(set->sizes[entry->target].size);
+			igt_assert(dep->target < set->nr);
+			igt_assert(set->sizes[dep->target].size);
 
-			dep_handle = set->handles[entry->target];
+			dep_handle = set->handles[dep->target];
 		}
 
-		w->i915.obj[j].flags = entry->write ? EXEC_OBJECT_WRITE : 0;
+		w->i915.obj[j].flags = dep->write ? EXEC_OBJECT_WRITE : 0;
 		w->i915.obj[j].handle = dep_handle;
 		j++;
 		igt_assert(j < nr_obj);
@@ -1713,8 +1718,8 @@ find_dep(struct dep_entry *deps, unsigned int nr, struct dep_entry dep)
 static void measure_active_set(struct workload *wrk)
 {
 	unsigned long total = 0, batch_sizes = 0;
-	struct dep_entry *deps = NULL;
-	unsigned int nr = 0, i, j;
+	struct dep_entry *dep, *deps = NULL;
+	unsigned int nr = 0, i;
 	struct w_step *w;
 
 	if (verbose < 3)
@@ -1726,8 +1731,7 @@ static void measure_active_set(struct workload *wrk)
 
 		batch_sizes += w->bb_size;
 
-		for (j = 0; j < w->data_deps.nr; j++) {
-			struct dep_entry *dep = &w->data_deps.list[j];
+		for_each_dep(dep, w->data_deps) {
 			struct dep_entry _dep = *dep;
 
 			if (dep->working_set == -1 && dep->target < 0) {
@@ -2150,13 +2154,14 @@ static void w_sync_to(struct workload *wrk, struct w_step *w, int target)
 static void
 do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id engine)
 {
+	struct dep_entry *dep;
 	unsigned int i;
 
 	eb_update_flags(wrk, w, engine);
 	update_bb_start(wrk, w);
 
-	for (i = 0; i < w->fence_deps.nr; i++) {
-		int tgt = w->idx + w->fence_deps.list[i].target;
+	for_each_dep(dep, w->fence_deps) {
+		int tgt = w->idx + dep->target;
 
 		/* TODO: fence merging needed to support multiple inputs */
 		igt_assert(i == 0);
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 16/17] benchmarks/gem_wsim: for_each_dep macro
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 16/17] benchmarks/gem_wsim: for_each_dep macro Marcin Bernatowicz
@ 2023-09-29  9:37   ` Tvrtko Ursulin
  0 siblings, 0 replies; 40+ messages in thread
From: Tvrtko Ursulin @ 2023-09-29  9:37 UTC (permalink / raw)
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 28/09/2023 18:45, Marcin Bernatowicz wrote:
> Utility macro to easy traverse dependencies.
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c | 45 ++++++++++++++++++++++++-------------------
>   1 file changed, 25 insertions(+), 20 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index d22d66aeb..d447dced4 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -109,6 +109,10 @@ struct deps {
>   	struct dep_entry *list;
>   };
>   
> +#define for_each_dep(__dep, __deps) \
> +	for (int __i = 0; __i < __deps.nr && \
> +	     (__dep = &__deps.list[__i]); ++__i)
> +
>   struct w_arg {
>   	char *filename;
>   	char *desc;
> @@ -1150,8 +1154,10 @@ add_step:
>   	 * referencing them as a sync fence dependency.
>   	 */
>   	for (i = 0; i < nr_steps; i++) {
> -		for (j = 0; j < steps[i].fence_deps.nr; j++) {
> -			tmp = steps[i].idx + steps[i].fence_deps.list[j].target;
> +		struct dep_entry *dep;
> +
> +		for_each_dep(dep, steps[i].fence_deps) {
> +			tmp = steps[i].idx + dep->target;
>   			check_arg(tmp < 0 || tmp >= i ||
>   				  (steps[tmp].type != BATCH &&
>   				   steps[tmp].type != SW_FENCE),
> @@ -1478,9 +1484,9 @@ static void
>   alloc_step_batch(struct workload *wrk, struct w_step *w)
>   {
>   	enum intel_engine_id engine = w->engine;
> +	struct dep_entry *dep;
>   	unsigned int j = 0;
>   	unsigned int nr_obj = 2 + w->data_deps.nr;
> -	unsigned int i;
>   
>   	w->bb_size = 4096;
>   	w->i915.obj = calloc(nr_obj, sizeof(*w->i915.obj));
> @@ -1491,14 +1497,13 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
>   	j++;
>   	igt_assert(j < nr_obj);
>   
> -	for (i = 0; i < w->data_deps.nr; i++) {
> -		struct dep_entry *entry = &w->data_deps.list[i];
> +	for_each_dep(dep, w->data_deps) {
>   		uint32_t dep_handle;
>   
> -		if (entry->working_set == -1) {
> -			int dep_idx = w->idx + entry->target;
> +		if (dep->working_set == -1) {
> +			int dep_idx = w->idx + dep->target;
>   
> -			igt_assert(entry->target <= 0);
> +			igt_assert(dep->target <= 0);
>   			igt_assert(dep_idx >= 0 && dep_idx < w->idx);
>   			igt_assert(wrk->steps[dep_idx].type == BATCH);
>   
> @@ -1506,19 +1511,19 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
>   		} else {
>   			struct working_set *set;
>   
> -			igt_assert(entry->working_set <=
> +			igt_assert(dep->working_set <=
>   				   wrk->max_working_set_id);
>   
> -			set = wrk->working_sets[entry->working_set];
> +			set = wrk->working_sets[dep->working_set];
>   
>   			igt_assert(set->nr);
> -			igt_assert(entry->target < set->nr);
> -			igt_assert(set->sizes[entry->target].size);
> +			igt_assert(dep->target < set->nr);
> +			igt_assert(set->sizes[dep->target].size);
>   
> -			dep_handle = set->handles[entry->target];
> +			dep_handle = set->handles[dep->target];
>   		}
>   
> -		w->i915.obj[j].flags = entry->write ? EXEC_OBJECT_WRITE : 0;
> +		w->i915.obj[j].flags = dep->write ? EXEC_OBJECT_WRITE : 0;
>   		w->i915.obj[j].handle = dep_handle;
>   		j++;
>   		igt_assert(j < nr_obj);
> @@ -1713,8 +1718,8 @@ find_dep(struct dep_entry *deps, unsigned int nr, struct dep_entry dep)
>   static void measure_active_set(struct workload *wrk)
>   {
>   	unsigned long total = 0, batch_sizes = 0;
> -	struct dep_entry *deps = NULL;
> -	unsigned int nr = 0, i, j;
> +	struct dep_entry *dep, *deps = NULL;
> +	unsigned int nr = 0, i;
>   	struct w_step *w;
>   
>   	if (verbose < 3)
> @@ -1726,8 +1731,7 @@ static void measure_active_set(struct workload *wrk)
>   
>   		batch_sizes += w->bb_size;
>   
> -		for (j = 0; j < w->data_deps.nr; j++) {
> -			struct dep_entry *dep = &w->data_deps.list[j];
> +		for_each_dep(dep, w->data_deps) {
>   			struct dep_entry _dep = *dep;
>   
>   			if (dep->working_set == -1 && dep->target < 0) {
> @@ -2150,13 +2154,14 @@ static void w_sync_to(struct workload *wrk, struct w_step *w, int target)
>   static void
>   do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id engine)
>   {
> +	struct dep_entry *dep;
>   	unsigned int i;
>   
>   	eb_update_flags(wrk, w, engine);
>   	update_bb_start(wrk, w);
>   
> -	for (i = 0; i < w->fence_deps.nr; i++) {
> -		int tgt = w->idx + w->fence_deps.list[i].target;
> +	for_each_dep(dep, w->fence_deps) {
> +		int tgt = w->idx + dep->target;
>   
>   		/* TODO: fence merging needed to support multiple inputs */
>   		igt_assert(i == 0);

Smooth sailing.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [igt-dev] [PATCH i-g-t 17/17] benchmarks/gem_wsim: added basic xe support
  2023-09-28 17:45 [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (15 preceding siblings ...)
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 16/17] benchmarks/gem_wsim: for_each_dep macro Marcin Bernatowicz
@ 2023-09-28 17:45 ` Marcin Bernatowicz
  2023-09-29 10:45   ` Tvrtko Ursulin
  2023-09-28 18:59 ` [igt-dev] ✓ CI.xeBAT: success for benchmarks/gem_wsim: added basic xe support (rev5) Patchwork
  2023-09-28 19:10 ` [igt-dev] ✗ Fi.CI.BAT: failure " Patchwork
  18 siblings, 1 reply; 40+ messages in thread
From: Marcin Bernatowicz @ 2023-09-28 17:45 UTC (permalink / raw)
  To: igt-dev; +Cc: chris.p.wilson

Added basic xe support. Single binary handles both i915 and Xe devices.

Some functionality is still missing: working sets, bonding.

The tool is handy for scheduling tests, we find it useful to verify vGPU
profiles defining different execution quantum/preemption timeout
settings.

There is also some rationale for the tool in following thread:
https://lore.kernel.org/dri-devel/a443495f-5d1b-52e1-9b2f-80167deb6d57@linux.intel.com/

With this patch it should be possible to run following on xe device:

gem_wsim -w benchmarks/wsim/media_load_balance_fhd26u7.wsim -c 36 -r 600

Best with drm debug logs disabled:

echo 0 > /sys/module/drm/parameters/debug

v2: minimizing divergence - same workload syntax for both drivers,
    so most existing examples should run on xe unmodified (Tvrtko)
    This version creates one common VM per workload.
    Explicit VM management, compute mode will come in next patchset.

v3:
- use calloc in parse_workload for struct workload,
  to allow cleanups in fini_workload
- grouped xe specific fields (Tvrtko)
- moved is_xe boolean next to fd (Tvrtko)
- use xe_ prefix for Xe specific things (Tvrtko)
- left throttling untouched (Tvrtko)
- parse errors vs silent skips on not implemented steps (Tvrtko)
- need to think on better engine handling in next version
- add 'Xe and i915 differences' section to README (Tvrtko)
  for now no data dependency implemented, left -1 <=> f-1
  to not modify examples (maybe too optimistic assumption?)

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
---
 benchmarks/gem_wsim.c  | 430 +++++++++++++++++++++++++++++++++++++++--
 benchmarks/wsim/README |  15 +-
 2 files changed, 432 insertions(+), 13 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index d447dced4..68c3b2cb0 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -62,6 +62,12 @@
 #include "i915/gem_engine_topology.h"
 #include "i915/gem_mman.h"
 
+#include "igt_syncobj.h"
+#include "intel_allocator.h"
+#include "xe_drm.h"
+#include "xe/xe_ioctl.h"
+#include "xe/xe_spin.h"
+
 enum intel_engine_id {
 	DEFAULT,
 	RCS,
@@ -185,11 +191,31 @@ struct w_step {
 			struct drm_i915_gem_relocation_entry reloc[3];
 			uint32_t *bb_duration;
 		} i915;
+		struct {
+			struct drm_xe_exec exec;
+			struct {
+				struct xe_spin spin;
+				uint64_t vm_sync;
+				uint64_t exec_sync;
+			} *data;
+			struct drm_xe_sync *syncs;
+		} xe;
 	};
 	uint32_t bb_handle;
 	size_t bb_size;
 };
 
+struct xe_vm {
+	uint32_t id;
+	bool compute_mode;
+	uint64_t ahnd;
+};
+
+struct xe_exec_queue {
+	uint32_t id;
+	struct drm_xe_engine_class_instance hwe;
+};
+
 struct ctx {
 	uint32_t id;
 	int priority;
@@ -199,6 +225,12 @@ struct ctx {
 	struct bond *bonds;
 	bool load_balance;
 	uint64_t sseu;
+	struct {
+		/* reference to vm */
+		struct xe_vm *vm;
+		/* queue for each class */
+		struct xe_exec_queue queues[NUM_ENGINES];
+	} xe;
 };
 
 struct workload {
@@ -222,6 +254,11 @@ struct workload {
 	unsigned int nr_ctxs;
 	struct ctx *ctx_list;
 
+	struct {
+		unsigned int nr_vms;
+		struct xe_vm *vm_list;
+	} xe;
+
 	struct working_set **working_sets; /* array indexed by set id */
 	int max_working_set_id;
 
@@ -232,10 +269,24 @@ struct workload {
 	unsigned int nrequest[NUM_ENGINES];
 };
 
+#define for_each_ctx(__ctx, __wrk) \
+	for (int __i = 0; __i < (__wrk)->nr_ctxs && \
+	     (__ctx = &(__wrk)->ctx_list[__i]); ++__i)
+
+#define for_each_xe_exec_queue(__eq, __ctx) \
+		for (int __j = 0; __j < NUM_ENGINES && \
+		     ((__eq) = &((__ctx)->xe.queues[__j])); ++__j) \
+			for_if((__eq)->id > 0)
+
+#define for_each_xe_vm(__vm, __wrk) \
+	for (int __i = 0; __i < (__wrk)->xe.nr_vms && \
+	     (__vm = &(__wrk)->xe.vm_list[__i]); ++__i)
+
 static unsigned int master_prng;
 
 static int verbose = 1;
 static int fd;
+static bool is_xe;
 static struct drm_i915_gem_context_param_sseu device_sseu = {
 	.slice_mask = -1 /* Force read on first use. */
 };
@@ -256,7 +307,10 @@ static const char *ring_str_map[NUM_ENGINES] = {
 
 static void w_step_sync(struct w_step *w)
 {
-	gem_sync(fd, w->i915.obj[0].handle);
+	if (is_xe)
+		igt_assert(syncobj_wait(fd, &w->xe.syncs[0].handle, 1, INT64_MAX, 0, NULL));
+	else
+		gem_sync(fd, w->i915.obj[0].handle);
 }
 
 static int read_timestamp_frequency(int i915)
@@ -360,15 +414,23 @@ parse_dependency(unsigned int nr_steps, struct w_step *w, char *str)
 		if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
 			return -1;
 
-		add_dep(&w->data_deps, entry);
+		/* only fence deps in xe, let f-1 <==> -1 */
+		if (is_xe)
+			add_dep(&w->fence_deps, entry);
+		else
+			add_dep(&w->data_deps, entry);
 
 		break;
 	case 's':
-		submit_fence = true;
+		/* no submit fence in xe ? */
+		if (!is_xe)
+			submit_fence = true;
 		/* Fall-through. */
 	case 'f':
-		/* Multiple fences not yet supported. */
-		igt_assert_eq(w->fence_deps.nr, 0);
+		/* xe supports multiple fences */
+		if (!is_xe)
+			/* Multiple fences not yet supported. */
+			igt_assert_eq(w->fence_deps.nr, 0);
 
 		entry.target = atoi(++str);
 		if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
@@ -478,7 +540,17 @@ static struct intel_engine_data *query_engines(void)
 	if (engines.nengines)
 		return &engines;
 
-	engines = intel_engine_list_of_physical(fd);
+	if (is_xe) {
+		struct drm_xe_engine_class_instance *hwe;
+
+		xe_for_each_hw_engine(fd, hwe) {
+			engines.engines[engines.nengines].class = hwe->engine_class;
+			engines.engines[engines.nengines].instance = hwe->engine_instance;
+			engines.nengines++;
+		}
+	} else
+		engines = intel_engine_list_of_physical(fd);
+
 	igt_assert(engines.nengines);
 	return &engines;
 }
@@ -571,6 +643,40 @@ get_engine(enum intel_engine_id engine)
 	return ci;
 }
 
+static struct drm_xe_engine_class_instance
+xe_get_engine(enum intel_engine_id engine)
+{
+	struct drm_xe_engine_class_instance ci = {};
+
+	switch (engine) {
+	case DEFAULT:
+	case RCS:
+		ci.engine_class = DRM_XE_ENGINE_CLASS_RENDER;
+		ci.engine_instance = 0;
+		break;
+	case BCS:
+		ci.engine_class = DRM_XE_ENGINE_CLASS_COPY;
+		ci.engine_instance = 0;
+		break;
+	case VCS1:
+		ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_DECODE;
+		ci.engine_instance = 0;
+		break;
+	case VCS2:
+		ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_DECODE;
+		ci.engine_instance = 1;
+		break;
+	case VECS:
+		ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE;
+		ci.engine_instance = 0;
+		break;
+	default:
+		igt_assert(0);
+	};
+
+	return ci;
+}
+
 static int parse_engine_map(struct w_step *step, const char *_str)
 {
 	char *token, *tctx = NULL, *tstart = (char *)_str;
@@ -845,6 +951,12 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			} else if (!strcmp(field, "P")) {
 				unsigned int nr = 0;
 
+				if (is_xe) {
+					wsim_err("Priority step is not implemented yet :(\n");
+					free(token);
+					return NULL;
+				}
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(nr == 0 && tmp <= 0,
@@ -871,6 +983,12 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			} else if (!strcmp(field, "S")) {
 				unsigned int nr = 0;
 
+				if (is_xe) {
+					wsim_err("SSEU step is not implemented yet :(\n");
+					free(token);
+					return NULL;
+				}
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(tmp <= 0 && nr == 0,
@@ -984,6 +1102,12 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			} else if (!strcmp(field, "b")) {
 				unsigned int nr = 0;
 
+				if (is_xe) {
+					wsim_err("Bonding is not implemented yet :(\n");
+					free(token);
+					return NULL;
+				}
+
 				while ((field = strtok_r(fstart, ".", &fctx))) {
 					check_arg(nr > 2,
 						  "Invalid bond format at step %u!\n",
@@ -1018,6 +1142,12 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
 			} else if (!strcmp(field, "w") || !strcmp(field, "W")) {
 				unsigned int nr = 0;
 
+				if (is_xe) {
+					wsim_err("Working sets are not implemented yet :(\n");
+					free(token);
+					return NULL;
+				}
+
 				step.working_set.shared = field[0] == 'W';
 
 				while ((field = strtok_r(fstart, ".", &fctx))) {
@@ -1136,7 +1266,7 @@ add_step:
 		nr_steps += app_w->nr_steps;
 	}
 
-	wrk = malloc(sizeof(*wrk));
+	wrk = calloc(1, sizeof(*wrk));
 	igt_assert(wrk);
 
 	wrk->nr_steps = nr_steps;
@@ -1297,6 +1427,20 @@ __get_ctx(struct workload *wrk, const struct w_step *w)
 	return &wrk->ctx_list[w->context];
 }
 
+static struct xe_exec_queue *
+xe_get_eq(struct workload *wrk, const struct w_step *w)
+{
+	igt_assert(w->engine >= 0 && w->engine < NUM_ENGINES);
+
+	return &__get_ctx(wrk, w)->xe.queues[w->engine];
+}
+
+static struct xe_vm *
+xe_get_vm(struct workload *wrk, const struct w_step *w)
+{
+	return wrk->xe.vm_list;
+}
+
 static uint32_t mmio_base(int i915, enum intel_engine_id engine, int gen)
 {
 	const char *name;
@@ -1549,6 +1693,62 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
 #endif
 }
 
+static void
+xe_alloc_step_batch(struct workload *wrk, struct w_step *w)
+{
+	struct xe_vm *vm = xe_get_vm(wrk, w);
+	struct xe_exec_queue *eq = xe_get_eq(wrk, w);
+	struct dep_entry *dep;
+	int i;
+
+	w->bb_size = ALIGN(sizeof(*w->xe.data) + xe_cs_prefetch_size(fd),
+			   xe_get_default_alignment(fd));
+	w->bb_handle = xe_bo_create_flags(fd, vm->id, w->bb_size,
+				visible_vram_if_possible(fd, eq->hwe.gt_id));
+	w->xe.data = xe_bo_map(fd, w->bb_handle, w->bb_size);
+	w->xe.exec.address =
+		intel_allocator_alloc_with_strategy(vm->ahnd, w->bb_handle, w->bb_size,
+						    0, ALLOC_STRATEGY_LOW_TO_HIGH);
+	xe_vm_bind_sync(fd, vm->id, w->bb_handle, 0, w->xe.exec.address, w->bb_size);
+	xe_spin_init_opts(&w->xe.data->spin, .addr = w->xe.exec.address,
+				   .preempt = (w->preempt_us > 0),
+				   .ctx_ticks = duration_to_ctx_ticks(fd, eq->hwe.gt_id,
+								1000LL * get_duration(wrk, w)));
+	w->xe.exec.exec_queue_id = eq->id;
+	w->xe.exec.num_batch_buffer = 1;
+	/* always at least one out fence */
+	w->xe.exec.num_syncs = 1;
+	/* count syncs */
+	igt_assert_eq(0, w->data_deps.nr);
+	for_each_dep(dep, w->fence_deps) {
+		int dep_idx = w->idx + dep->target;
+
+		igt_assert(dep_idx >= 0 && dep_idx < w->idx);
+		igt_assert(wrk->steps[dep_idx].type == SW_FENCE ||
+			   wrk->steps[dep_idx].type == BATCH);
+
+		w->xe.exec.num_syncs++;
+	}
+	w->xe.syncs = calloc(w->xe.exec.num_syncs, sizeof(*w->xe.syncs));
+	/* fill syncs */
+	i = 0;
+	/* out fence */
+	w->xe.syncs[i].handle = syncobj_create(fd, 0);
+	w->xe.syncs[i++].flags = DRM_XE_SYNC_SYNCOBJ | DRM_XE_SYNC_SIGNAL;
+	/* in fence(s) */
+	for_each_dep(dep, w->fence_deps) {
+		int dep_idx = w->idx + dep->target;
+
+		igt_assert(wrk->steps[dep_idx].type == SW_FENCE ||
+			   wrk->steps[dep_idx].type == BATCH);
+		igt_assert(wrk->steps[dep_idx].xe.syncs && wrk->steps[dep_idx].xe.syncs[0].handle);
+
+		w->xe.syncs[i].handle = wrk->steps[dep_idx].xe.syncs[0].handle;
+		w->xe.syncs[i++].flags = DRM_XE_SYNC_SYNCOBJ;
+	}
+	w->xe.exec.syncs = to_user_pointer(w->xe.syncs);
+}
+
 static bool set_priority(uint32_t ctx_id, int prio)
 {
 	struct drm_i915_gem_context_param param = {
@@ -1774,6 +1974,61 @@ static void measure_active_set(struct workload *wrk)
 
 #define alloca0(sz) ({ size_t sz__ = (sz); memset(alloca(sz__), 0, sz__); })
 
+static void xe_vm_create_(struct xe_vm *vm)
+{
+	uint32_t flags = 0;
+
+	if (vm->compute_mode)
+		flags |= DRM_XE_VM_CREATE_ASYNC_BIND_OPS |
+			 DRM_XE_VM_CREATE_COMPUTE_MODE;
+
+	vm->id = xe_vm_create(fd, flags, 0);
+}
+
+static void xe_exec_queue_create_(struct ctx *ctx, struct xe_exec_queue *eq)
+{
+	struct drm_xe_exec_queue_create create = {
+		.vm_id = ctx->xe.vm->id,
+		.width = 1,
+		.num_placements = 1,
+		.instances = to_user_pointer(&eq->hwe),
+	};
+	struct drm_xe_engine_class_instance *eci = NULL;
+
+	if (ctx->load_balance && eq->hwe.engine_class == DRM_XE_ENGINE_CLASS_VIDEO_DECODE) {
+		struct drm_xe_engine_class_instance *hwe;
+		int i;
+
+		for (i = 0; i < ctx->engine_map_count; ++i)
+			igt_assert(ctx->engine_map[i] == VCS || ctx->engine_map[i] == VCS1 ||
+				   ctx->engine_map[i] == VCS2);
+
+		eci = calloc(16, sizeof(struct drm_xe_engine_class_instance));
+		create.num_placements = 0;
+		xe_for_each_hw_engine(fd, hwe) {
+			if (hwe->engine_class != DRM_XE_ENGINE_CLASS_VIDEO_DECODE ||
+			    hwe->gt_id != 0)
+				continue;
+
+			igt_assert(create.num_placements < 16);
+			eci[create.num_placements++] = *hwe;
+		}
+		igt_assert(create.num_placements);
+		create.instances = to_user_pointer(eci);
+
+		if (verbose > 3)
+			printf("num_placements=%d class=%d gt=%d\n", create.num_placements,
+				eq->hwe.engine_class, eq->hwe.gt_id);
+	}
+
+	igt_assert_eq(igt_ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &create), 0);
+
+	if (eci)
+		free(eci);
+
+	eq->id = create.exec_queue_id;
+}
+
 static void allocate_contexts(unsigned int id, struct workload *wrk)
 {
 	int max_ctx = -1;
@@ -1978,6 +2233,77 @@ static int prepare_contexts(unsigned int id, struct workload *wrk)
 	return 0;
 }
 
+static int xe_prepare_vms_contexts_syncobjs(unsigned int id, struct workload *wrk)
+{
+	int engine_classes[NUM_ENGINES] = {};
+	struct w_step *w;
+	int i, j;
+
+	/* shortcut, create one vm */
+	wrk->xe.nr_vms = 1;
+	wrk->xe.vm_list = calloc(wrk->xe.nr_vms, sizeof(struct xe_vm));
+	wrk->xe.vm_list->compute_mode = false;
+	xe_vm_create_(wrk->xe.vm_list);
+	wrk->xe.vm_list->ahnd = intel_allocator_open(fd, wrk->xe.vm_list->id, INTEL_ALLOCATOR_RELOC);
+
+	/* create exec queues of each referenced engine class */
+	for (j = 0; j < wrk->nr_ctxs; j++) {
+		struct ctx *ctx = &wrk->ctx_list[j];
+		/* link with vm */
+		ctx->xe.vm = wrk->xe.vm_list;
+		for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+			if (w->context != j)
+				continue;
+			if (w->type == ENGINE_MAP) {
+				ctx->engine_map = w->engine_map;
+				ctx->engine_map_count = w->engine_map_count;
+			} else if (w->type == LOAD_BALANCE) {
+				if (!ctx->engine_map) {
+					wsim_err("Load balancing needs an engine map!\n");
+					return 1;
+				}
+				if (intel_gen(intel_get_drm_devid(fd)) < 11) {
+					wsim_err("Load balancing needs relative mmio support, gen11+!\n");
+					return 1;
+				}
+				ctx->load_balance = w->load_balance;
+			}
+		}
+		/* create exec queue for each referenced engine */
+		for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+			if (w->context != j)
+				continue;
+			if (w->type == BATCH)
+				engine_classes[w->engine]++;
+		}
+		for (i = 0; i < NUM_ENGINES; i++) {
+			if (engine_classes[i]) {
+				if (verbose > 3)
+					printf("%u ctx[%d] eq(%s) load_balance=%d\n",
+						id, j, ring_str_map[i], ctx->load_balance);
+				if (i == VCS) {
+					ctx->xe.queues[i].hwe.engine_class =
+						xe_get_engine(VCS1).engine_class;
+					ctx->xe.queues[i].hwe.engine_instance = 1;
+				} else
+					ctx->xe.queues[i].hwe = xe_get_engine(i);
+				xe_exec_queue_create_(ctx, &ctx->xe.queues[i]);
+			}
+			engine_classes[i] = 0;
+		}
+	}
+
+	/* create syncobjs for SW_FENCE */
+	for (j = 0, i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++)
+		if (w->type == SW_FENCE) {
+			w->xe.syncs = calloc(1, sizeof(struct drm_xe_sync));
+			w->xe.syncs[0].handle = syncobj_create(fd, 0);
+			w->xe.syncs[0].flags = DRM_XE_SYNC_SYNCOBJ;
+		}
+
+	return 0;
+}
+
 static void prepare_working_sets(unsigned int id, struct workload *wrk)
 {
 	struct working_set **sets;
@@ -2047,7 +2373,11 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
 
 	allocate_contexts(id, wrk);
 
-	ret = prepare_contexts(id, wrk);
+	if (is_xe)
+		ret = xe_prepare_vms_contexts_syncobjs(id, wrk);
+	else
+		ret = prepare_contexts(id, wrk);
+
 	if (ret)
 		return ret;
 
@@ -2101,7 +2431,10 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
 		if (w->type != BATCH)
 			continue;
 
-		alloc_step_batch(wrk, w);
+		if (is_xe)
+			xe_alloc_step_batch(wrk, w);
+		else
+			alloc_step_batch(wrk, w);
 	}
 
 	measure_active_set(wrk);
@@ -2151,6 +2484,23 @@ static void w_sync_to(struct workload *wrk, struct w_step *w, int target)
 	w_step_sync(&wrk->steps[target]);
 }
 
+static void do_xe_exec(struct workload *wrk, struct w_step *w)
+{
+	struct xe_exec_queue *eq = xe_get_eq(wrk, w);
+
+	igt_assert(w->emit_fence <= 0);
+	if (w->emit_fence == -1)
+		syncobj_reset(fd, &w->xe.syncs[0].handle, 1);
+
+	/* update duration if random */
+	if (w->duration.max != w->duration.min)
+		xe_spin_init_opts(&w->xe.data->spin, .addr = w->xe.exec.address,
+					   .preempt = (w->preempt_us > 0),
+					   .ctx_ticks = duration_to_ctx_ticks(fd, eq->hwe.gt_id,
+								1000LL * get_duration(wrk, w)));
+	xe_exec(fd, &w->xe.exec);
+}
+
 static void
 do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id engine)
 {
@@ -2276,6 +2626,10 @@ static void *run_workload(void *data)
 					sw_sync_timeline_create_fence(wrk->sync_timeline,
 								      cur_seqno + w->idx);
 				igt_assert(w->emit_fence > 0);
+				if (is_xe)
+					/* Convert sync file to syncobj */
+					syncobj_import_sync_file(fd, w->xe.syncs[0].handle,
+								 w->emit_fence);
 				continue;
 			} else if (w->type == SW_FENCE_SIGNAL) {
 				int tgt = w->idx + w->target;
@@ -2307,7 +2661,10 @@ static void *run_workload(void *data)
 				igt_assert(wrk->steps[t_idx].type == BATCH);
 				igt_assert(wrk->steps[t_idx].duration.unbound);
 
-				*wrk->steps[t_idx].i915.bb_duration = 0xffffffff;
+				if (is_xe)
+					xe_spin_end(&wrk->steps[t_idx].xe.data->spin);
+				else
+					*wrk->steps[t_idx].i915.bb_duration = 0xffffffff;
 				__sync_synchronize();
 				continue;
 			} else if (w->type == SSEU) {
@@ -2339,7 +2696,10 @@ static void *run_workload(void *data)
 			if (throttle > 0)
 				w_sync_to(wrk, w, i - throttle);
 
-			do_eb(wrk, w, engine);
+			if (is_xe)
+				do_xe_exec(wrk, w);
+			else
+				do_eb(wrk, w, engine);
 
 			if (w->request != -1) {
 				igt_list_del(&w->rq_link);
@@ -2383,6 +2743,10 @@ static void *run_workload(void *data)
 		for (i = 0, w = wrk->steps; wrk->run && (i < wrk->nr_steps);
 		     i++, w++) {
 			if (w->emit_fence > 0) {
+				if (is_xe) {
+					igt_assert(w->type == SW_FENCE);
+					syncobj_reset(fd, &w->xe.syncs[0].handle, 1);
+				}
 				close(w->emit_fence);
 				w->emit_fence = -1;
 			}
@@ -2397,6 +2761,23 @@ static void *run_workload(void *data)
 		w_step_sync(w);
 	}
 
+	if (is_xe) {
+		for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+			if (w->type == BATCH) {
+				w_step_sync(w);
+				syncobj_destroy(fd, w->xe.syncs[0].handle);
+				free(w->xe.syncs);
+				xe_vm_unbind_sync(fd, xe_get_vm(wrk, w)->id, 0, w->xe.exec.address,
+						  w->bb_size);
+				gem_munmap(w->xe.data, w->bb_size);
+				gem_close(fd, w->bb_handle);
+			} else if (w->type == SW_FENCE) {
+				syncobj_destroy(fd, w->xe.syncs[0].handle);
+				free(w->xe.syncs);
+			}
+		}
+	}
+
 	clock_gettime(CLOCK_MONOTONIC, &t_end);
 
 	if (wrk->print_stats) {
@@ -2416,6 +2797,23 @@ static void *run_workload(void *data)
 
 static void fini_workload(struct workload *wrk)
 {
+	if (is_xe) {
+		struct xe_exec_queue *eq;
+		struct xe_vm *vm;
+		struct ctx *ctx;
+
+		for_each_ctx(ctx, wrk)
+			for_each_xe_exec_queue(eq, ctx) {
+				xe_exec_queue_destroy(fd, eq->id);
+				eq->id = 0;
+			}
+		for_each_xe_vm(vm, wrk) {
+			put_ahnd(vm->ahnd);
+			xe_vm_destroy(fd, vm->id);
+		}
+		free(wrk->xe.vm_list);
+		wrk->xe.nr_vms = 0;
+	}
 	free(wrk->steps);
 	free(wrk);
 }
@@ -2623,8 +3021,12 @@ int main(int argc, char **argv)
 		ret = igt_device_find_first_i915_discrete_card(&card);
 		if (!ret)
 			ret = igt_device_find_integrated_card(&card);
+		if (!ret)
+			ret = igt_device_find_first_xe_discrete_card(&card);
+		if (!ret)
+			ret = igt_device_find_xe_integrated_card(&card);
 		if (!ret) {
-			wsim_err("No device filter specified and no i915 devices found!\n");
+			wsim_err("No device filter specified and no intel devices found!\n");
 			return EXIT_FAILURE;
 		}
 	}
@@ -2647,6 +3049,10 @@ int main(int argc, char **argv)
 	if (verbose > 1)
 		printf("Using device %s\n", drm_dev);
 
+	is_xe = is_xe_device(fd);
+	if (is_xe)
+		xe_device_get(fd);
+
 	if (!nr_w_args) {
 		wsim_err("No workload descriptor(s)!\n");
 		goto err;
diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
index d4f3dd8ae..38f305ba0 100644
--- a/benchmarks/wsim/README
+++ b/benchmarks/wsim/README
@@ -88,6 +88,19 @@ Batch durations can also be specified as infinite by using the '*' in the
 duration field. Such batches must be ended by the terminate command ('T')
 otherwise they will cause a GPU hang to be reported.
 
+Xe and i915 differences
+------------------------
+
+There are differences between Xe and i915, like not allowing a BO list to
+be passed to an exec (and create implicit syncs). For more details see:
+https://gitlab.freedesktop.org/drm/xe/kernel/-/blob/drm-xe-next/drivers/gpu/drm/xe/xe_exec.c
+
+Currently following batch steps are equal on Xe:
+1.1000.-2.0 <==> 1.1000.f-2.0
+and will create explicit sync fence dependency (via syncobjects).
+
+The data dependency need to wait for working sets implementation.
+
 Sync (fd) fences
 ----------------
 
@@ -131,7 +144,7 @@ runnable. When the second RCS batch completes the standalone fence is signaled
 which allows the two VCS batches to be executed. Finally we wait until the both
 VCS batches have completed before starting the (optional) next iteration.
 
-Submit fences
+Submit fences (i915 only)
 -------------
 
 Submit fences are a type of input fence which are signalled when the originating
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 17/17] benchmarks/gem_wsim: added basic xe support
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 17/17] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
@ 2023-09-29 10:45   ` Tvrtko Ursulin
  2023-09-29 15:53     ` Bernatowicz, Marcin
  0 siblings, 1 reply; 40+ messages in thread
From: Tvrtko Ursulin @ 2023-09-29 10:45 UTC (permalink / raw)
  To: Marcin Bernatowicz, igt-dev; +Cc: chris.p.wilson


On 28/09/2023 18:45, Marcin Bernatowicz wrote:
> Added basic xe support. Single binary handles both i915 and Xe devices.
> 
> Some functionality is still missing: working sets, bonding.
> 
> The tool is handy for scheduling tests, we find it useful to verify vGPU
> profiles defining different execution quantum/preemption timeout
> settings.
> 
> There is also some rationale for the tool in following thread:
> https://lore.kernel.org/dri-devel/a443495f-5d1b-52e1-9b2f-80167deb6d57@linux.intel.com/
> 
> With this patch it should be possible to run following on xe device:
> 
> gem_wsim -w benchmarks/wsim/media_load_balance_fhd26u7.wsim -c 36 -r 600
> 
> Best with drm debug logs disabled:
> 
> echo 0 > /sys/module/drm/parameters/debug
> 
> v2: minimizing divergence - same workload syntax for both drivers,
>      so most existing examples should run on xe unmodified (Tvrtko)
>      This version creates one common VM per workload.
>      Explicit VM management, compute mode will come in next patchset.
> 
> v3:
> - use calloc in parse_workload for struct workload,
>    to allow cleanups in fini_workload
> - grouped xe specific fields (Tvrtko)
> - moved is_xe boolean next to fd (Tvrtko)
> - use xe_ prefix for Xe specific things (Tvrtko)
> - left throttling untouched (Tvrtko)
> - parse errors vs silent skips on not implemented steps (Tvrtko)
> - need to think on better engine handling in next version
> - add 'Xe and i915 differences' section to README (Tvrtko)
>    for now no data dependency implemented, left -1 <=> f-1
>    to not modify examples (maybe too optimistic assumption?)
> 
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> ---
>   benchmarks/gem_wsim.c  | 430 +++++++++++++++++++++++++++++++++++++++--
>   benchmarks/wsim/README |  15 +-
>   2 files changed, 432 insertions(+), 13 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index d447dced4..68c3b2cb0 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -62,6 +62,12 @@
>   #include "i915/gem_engine_topology.h"
>   #include "i915/gem_mman.h"
>   
> +#include "igt_syncobj.h"
> +#include "intel_allocator.h"
> +#include "xe_drm.h"
> +#include "xe/xe_ioctl.h"
> +#include "xe/xe_spin.h"
> +
>   enum intel_engine_id {
>   	DEFAULT,
>   	RCS,
> @@ -185,11 +191,31 @@ struct w_step {
>   			struct drm_i915_gem_relocation_entry reloc[3];
>   			uint32_t *bb_duration;
>   		} i915;
> +		struct {
> +			struct drm_xe_exec exec;
> +			struct {
> +				struct xe_spin spin;
> +				uint64_t vm_sync;
> +				uint64_t exec_sync;
> +			} *data;
> +			struct drm_xe_sync *syncs;
> +		} xe;
>   	};
>   	uint32_t bb_handle;
>   	size_t bb_size;
>   };
>   
> +struct xe_vm {
> +	uint32_t id;
> +	bool compute_mode;
> +	uint64_t ahnd;
> +};
> +
> +struct xe_exec_queue {
> +	uint32_t id;
> +	struct drm_xe_engine_class_instance hwe;
> +};
> +
>   struct ctx {
>   	uint32_t id;
>   	int priority;
> @@ -199,6 +225,12 @@ struct ctx {
>   	struct bond *bonds;
>   	bool load_balance;
>   	uint64_t sseu;
> +	struct {
> +		/* reference to vm */
> +		struct xe_vm *vm;
> +		/* queue for each class */
> +		struct xe_exec_queue queues[NUM_ENGINES];
> +	} xe;
>   };
>   
>   struct workload {
> @@ -222,6 +254,11 @@ struct workload {
>   	unsigned int nr_ctxs;
>   	struct ctx *ctx_list;
>   
> +	struct {
> +		unsigned int nr_vms;
> +		struct xe_vm *vm_list;
> +	} xe;
> +
>   	struct working_set **working_sets; /* array indexed by set id */
>   	int max_working_set_id;
>   
> @@ -232,10 +269,24 @@ struct workload {
>   	unsigned int nrequest[NUM_ENGINES];
>   };
>   
> +#define for_each_ctx(__ctx, __wrk) \
> +	for (int __i = 0; __i < (__wrk)->nr_ctxs && \
> +	     (__ctx = &(__wrk)->ctx_list[__i]); ++__i)

This one looks you could also extract and make use of ahead of xe support.

> +
> +#define for_each_xe_exec_queue(__eq, __ctx) \
> +		for (int __j = 0; __j < NUM_ENGINES && \
> +		     ((__eq) = &((__ctx)->xe.queues[__j])); ++__j) \
> +			for_if((__eq)->id > 0)
> +
> +#define for_each_xe_vm(__vm, __wrk) \
> +	for (int __i = 0; __i < (__wrk)->xe.nr_vms && \
> +	     (__vm = &(__wrk)->xe.vm_list[__i]); ++__i)
> +
>   static unsigned int master_prng;
>   
>   static int verbose = 1;
>   static int fd;
> +static bool is_xe;
>   static struct drm_i915_gem_context_param_sseu device_sseu = {
>   	.slice_mask = -1 /* Force read on first use. */
>   };
> @@ -256,7 +307,10 @@ static const char *ring_str_map[NUM_ENGINES] = {
>   
>   static void w_step_sync(struct w_step *w)
>   {
> -	gem_sync(fd, w->i915.obj[0].handle);
> +	if (is_xe)
> +		igt_assert(syncobj_wait(fd, &w->xe.syncs[0].handle, 1, INT64_MAX, 0, NULL));
> +	else
> +		gem_sync(fd, w->i915.obj[0].handle);
>   }
>   
>   static int read_timestamp_frequency(int i915)
> @@ -360,15 +414,23 @@ parse_dependency(unsigned int nr_steps, struct w_step *w, char *str)
>   		if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
>   			return -1;
>   
> -		add_dep(&w->data_deps, entry);
> +		/* only fence deps in xe, let f-1 <==> -1 */
> +		if (is_xe)
> +			add_dep(&w->fence_deps, entry);
> +		else
> +			add_dep(&w->data_deps, entry);
>   
>   		break;
>   	case 's':
> -		submit_fence = true;
> +		/* no submit fence in xe ? */
> +		if (!is_xe)
> +			submit_fence = true;

I think best to make parsing fail with an user facing message like 
"Submit fences are not supported with xe" or so.

>   		/* Fall-through. */
>   	case 'f':
> -		/* Multiple fences not yet supported. */
> -		igt_assert_eq(w->fence_deps.nr, 0);
> +		/* xe supports multiple fences */
> +		if (!is_xe)
> +			/* Multiple fences not yet supported. */
> +			igt_assert_eq(w->fence_deps.nr, 0);
>   
>   		entry.target = atoi(++str);
>   		if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
> @@ -478,7 +540,17 @@ static struct intel_engine_data *query_engines(void)
>   	if (engines.nengines)
>   		return &engines;
>   
> -	engines = intel_engine_list_of_physical(fd);
> +	if (is_xe) {
> +		struct drm_xe_engine_class_instance *hwe;
> +
> +		xe_for_each_hw_engine(fd, hwe) {
> +			engines.engines[engines.nengines].class = hwe->engine_class;
> +			engines.engines[engines.nengines].instance = hwe->engine_instance;
> +			engines.nengines++;
> +		}
> +	} else
> +		engines = intel_engine_list_of_physical(fd);
> +
>   	igt_assert(engines.nengines);
>   	return &engines;
>   }
> @@ -571,6 +643,40 @@ get_engine(enum intel_engine_id engine)
>   	return ci;
>   }
>   
> +static struct drm_xe_engine_class_instance
> +xe_get_engine(enum intel_engine_id engine)
> +{
> +	struct drm_xe_engine_class_instance ci = {};
> +
> +	switch (engine) {
> +	case DEFAULT:
> +	case RCS:
> +		ci.engine_class = DRM_XE_ENGINE_CLASS_RENDER;
> +		ci.engine_instance = 0;
> +		break;

Workload such as:

M.1.VCS
1.DEFAULT.1000.0.0

Keep working and submit to VCS queue with xe?

> +	case BCS:
> +		ci.engine_class = DRM_XE_ENGINE_CLASS_COPY;
> +		ci.engine_instance = 0;
> +		break;
> +	case VCS1:
> +		ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_DECODE;
> +		ci.engine_instance = 0;
> +		break;
> +	case VCS2:
> +		ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_DECODE;
> +		ci.engine_instance = 1;
> +		break;
> +	case VECS:
> +		ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE;
> +		ci.engine_instance = 0;
> +		break;
> +	default:
> +		igt_assert(0);
> +	};
> +
> +	return ci;
> +}
> +
>   static int parse_engine_map(struct w_step *step, const char *_str)
>   {
>   	char *token, *tctx = NULL, *tstart = (char *)_str;
> @@ -845,6 +951,12 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   			} else if (!strcmp(field, "P")) {
>   				unsigned int nr = 0;
>   
> +				if (is_xe) {
> +					wsim_err("Priority step is not implemented yet :(\n");

Clarify error message by adding "with xe" or something please. Here and 
the bunch below.

> +					free(token);
> +					return NULL;
> +				}
> +
>   				while ((field = strtok_r(fstart, ".", &fctx))) {
>   					tmp = atoi(field);
>   					check_arg(nr == 0 && tmp <= 0,
> @@ -871,6 +983,12 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   			} else if (!strcmp(field, "S")) {
>   				unsigned int nr = 0;
>   
> +				if (is_xe) {
> +					wsim_err("SSEU step is not implemented yet :(\n");
> +					free(token);
> +					return NULL;
> +				}
> +
>   				while ((field = strtok_r(fstart, ".", &fctx))) {
>   					tmp = atoi(field);
>   					check_arg(tmp <= 0 && nr == 0,
> @@ -984,6 +1102,12 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   			} else if (!strcmp(field, "b")) {
>   				unsigned int nr = 0;
>   
> +				if (is_xe) {
> +					wsim_err("Bonding is not implemented yet :(\n");
> +					free(token);
> +					return NULL;
> +				}
> +
>   				while ((field = strtok_r(fstart, ".", &fctx))) {
>   					check_arg(nr > 2,
>   						  "Invalid bond format at step %u!\n",
> @@ -1018,6 +1142,12 @@ parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
>   			} else if (!strcmp(field, "w") || !strcmp(field, "W")) {
>   				unsigned int nr = 0;
>   
> +				if (is_xe) {
> +					wsim_err("Working sets are not implemented yet :(\n");
> +					free(token);
> +					return NULL;
> +				}
> +
>   				step.working_set.shared = field[0] == 'W';
>   
>   				while ((field = strtok_r(fstart, ".", &fctx))) {
> @@ -1136,7 +1266,7 @@ add_step:
>   		nr_steps += app_w->nr_steps;
>   	}
>   
> -	wrk = malloc(sizeof(*wrk));
> +	wrk = calloc(1, sizeof(*wrk));
>   	igt_assert(wrk);
>   
>   	wrk->nr_steps = nr_steps;
> @@ -1297,6 +1427,20 @@ __get_ctx(struct workload *wrk, const struct w_step *w)
>   	return &wrk->ctx_list[w->context];
>   }
>   
> +static struct xe_exec_queue *
> +xe_get_eq(struct workload *wrk, const struct w_step *w)
> +{
> +	igt_assert(w->engine >= 0 && w->engine < NUM_ENGINES);
> +
> +	return &__get_ctx(wrk, w)->xe.queues[w->engine];
> +}
> +
> +static struct xe_vm *
> +xe_get_vm(struct workload *wrk, const struct w_step *w)
> +{
> +	return wrk->xe.vm_list;
> +}
> +
>   static uint32_t mmio_base(int i915, enum intel_engine_id engine, int gen)
>   {
>   	const char *name;
> @@ -1549,6 +1693,62 @@ alloc_step_batch(struct workload *wrk, struct w_step *w)
>   #endif
>   }
>   
> +static void
> +xe_alloc_step_batch(struct workload *wrk, struct w_step *w)
> +{
> +	struct xe_vm *vm = xe_get_vm(wrk, w);
> +	struct xe_exec_queue *eq = xe_get_eq(wrk, w);
> +	struct dep_entry *dep;
> +	int i;
> +
> +	w->bb_size = ALIGN(sizeof(*w->xe.data) + xe_cs_prefetch_size(fd),
> +			   xe_get_default_alignment(fd));
> +	w->bb_handle = xe_bo_create_flags(fd, vm->id, w->bb_size,
> +				visible_vram_if_possible(fd, eq->hwe.gt_id));
> +	w->xe.data = xe_bo_map(fd, w->bb_handle, w->bb_size);
> +	w->xe.exec.address =
> +		intel_allocator_alloc_with_strategy(vm->ahnd, w->bb_handle, w->bb_size,
> +						    0, ALLOC_STRATEGY_LOW_TO_HIGH);
> +	xe_vm_bind_sync(fd, vm->id, w->bb_handle, 0, w->xe.exec.address, w->bb_size);
> +	xe_spin_init_opts(&w->xe.data->spin, .addr = w->xe.exec.address,
> +				   .preempt = (w->preempt_us > 0),
> +				   .ctx_ticks = duration_to_ctx_ticks(fd, eq->hwe.gt_id,
> +								1000LL * get_duration(wrk, w)));
> +	w->xe.exec.exec_queue_id = eq->id;
> +	w->xe.exec.num_batch_buffer = 1;
> +	/* always at least one out fence */
> +	w->xe.exec.num_syncs = 1;
> +	/* count syncs */
> +	igt_assert_eq(0, w->data_deps.nr);
> +	for_each_dep(dep, w->fence_deps) {
> +		int dep_idx = w->idx + dep->target;
> +
> +		igt_assert(dep_idx >= 0 && dep_idx < w->idx);
> +		igt_assert(wrk->steps[dep_idx].type == SW_FENCE ||
> +			   wrk->steps[dep_idx].type == BATCH);
> +
> +		w->xe.exec.num_syncs++;
> +	}
> +	w->xe.syncs = calloc(w->xe.exec.num_syncs, sizeof(*w->xe.syncs));
> +	/* fill syncs */
> +	i = 0;
> +	/* out fence */
> +	w->xe.syncs[i].handle = syncobj_create(fd, 0);
> +	w->xe.syncs[i++].flags = DRM_XE_SYNC_SYNCOBJ | DRM_XE_SYNC_SIGNAL;
> +	/* in fence(s) */
> +	for_each_dep(dep, w->fence_deps) {
> +		int dep_idx = w->idx + dep->target;
> +
> +		igt_assert(wrk->steps[dep_idx].type == SW_FENCE ||
> +			   wrk->steps[dep_idx].type == BATCH);
> +		igt_assert(wrk->steps[dep_idx].xe.syncs && wrk->steps[dep_idx].xe.syncs[0].handle);
> +
> +		w->xe.syncs[i].handle = wrk->steps[dep_idx].xe.syncs[0].handle;
> +		w->xe.syncs[i++].flags = DRM_XE_SYNC_SYNCOBJ;
> +	}
> +	w->xe.exec.syncs = to_user_pointer(w->xe.syncs);
> +}
> +
>   static bool set_priority(uint32_t ctx_id, int prio)
>   {
>   	struct drm_i915_gem_context_param param = {
> @@ -1774,6 +1974,61 @@ static void measure_active_set(struct workload *wrk)
>   
>   #define alloca0(sz) ({ size_t sz__ = (sz); memset(alloca(sz__), 0, sz__); })
>   
> +static void xe_vm_create_(struct xe_vm *vm)
> +{
> +	uint32_t flags = 0;
> +
> +	if (vm->compute_mode)
> +		flags |= DRM_XE_VM_CREATE_ASYNC_BIND_OPS |
> +			 DRM_XE_VM_CREATE_COMPUTE_MODE;
> +
> +	vm->id = xe_vm_create(fd, flags, 0);
> +}
> +
> +static void xe_exec_queue_create_(struct ctx *ctx, struct xe_exec_queue *eq)
> +{
> +	struct drm_xe_exec_queue_create create = {
> +		.vm_id = ctx->xe.vm->id,
> +		.width = 1,
> +		.num_placements = 1,
> +		.instances = to_user_pointer(&eq->hwe),
> +	};
> +	struct drm_xe_engine_class_instance *eci = NULL;
> +
> +	if (ctx->load_balance && eq->hwe.engine_class == DRM_XE_ENGINE_CLASS_VIDEO_DECODE) {
> +		struct drm_xe_engine_class_instance *hwe;
> +		int i;
> +
> +		for (i = 0; i < ctx->engine_map_count; ++i)
> +			igt_assert(ctx->engine_map[i] == VCS || ctx->engine_map[i] == VCS1 ||
> +				   ctx->engine_map[i] == VCS2);

ctx->engine_map should not contain VCS at this point, AFAICT the meta 
"all engines" token is expanded to individual instances during parsing 
in parse_engine_map().

So adjust this assert and double-check it really is as I say please.

> +
> +		eci = calloc(16, sizeof(struct drm_xe_engine_class_instance));
> +		create.num_placements = 0;
> +		xe_for_each_hw_engine(fd, hwe) {
> +			if (hwe->engine_class != DRM_XE_ENGINE_CLASS_VIDEO_DECODE ||
> +			    hwe->gt_id != 0)
> +				continue;

1)

Why the gt_id check - does it work on Meteorlake like that?

2)

You seem to be adding all vcs engines irrespective of the configure 
engine map? Make sure that:

M.1.VCS - maps to all vcs instances
M.2.VCS1|VCS2 - maps to vcs0 & vcs1
M.3.VCS1 - maps to vcs0 only
M.3.VCS2 - maps to vcs1 only

> +
> +			igt_assert(create.num_placements < 16);
> +			eci[create.num_placements++] = *hwe;
> +		}
> +		igt_assert(create.num_placements);
> +		create.instances = to_user_pointer(eci);
> +
> +		if (verbose > 3)
> +			printf("num_placements=%d class=%d gt=%d\n", create.num_placements,
> +				eq->hwe.engine_class, eq->hwe.gt_id);
> +	}
> +
> +	igt_assert_eq(igt_ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &create), 0);
> +
> +	if (eci)
> +		free(eci);
> +
> +	eq->id = create.exec_queue_id;
> +}
> +
>   static void allocate_contexts(unsigned int id, struct workload *wrk)
>   {
>   	int max_ctx = -1;
> @@ -1978,6 +2233,77 @@ static int prepare_contexts(unsigned int id, struct workload *wrk)
>   	return 0;
>   }
>   
> +static int xe_prepare_vms_contexts_syncobjs(unsigned int id, struct workload *wrk)
> +{
> +	int engine_classes[NUM_ENGINES] = {};
> +	struct w_step *w;
> +	int i, j;
> +
> +	/* shortcut, create one vm */
> +	wrk->xe.nr_vms = 1;
> +	wrk->xe.vm_list = calloc(wrk->xe.nr_vms, sizeof(struct xe_vm));
> +	wrk->xe.vm_list->compute_mode = false;
> +	xe_vm_create_(wrk->xe.vm_list);
> +	wrk->xe.vm_list->ahnd = intel_allocator_open(fd, wrk->xe.vm_list->id, INTEL_ALLOCATOR_RELOC);
> +
> +	/* create exec queues of each referenced engine class */
> +	for (j = 0; j < wrk->nr_ctxs; j++) {
> +		struct ctx *ctx = &wrk->ctx_list[j];
> +		/* link with vm */
> +		ctx->xe.vm = wrk->xe.vm_list;
> +		for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
> +			if (w->context != j)
> +				continue;
> +			if (w->type == ENGINE_MAP) {
> +				ctx->engine_map = w->engine_map;
> +				ctx->engine_map_count = w->engine_map_count;
> +			} else if (w->type == LOAD_BALANCE) {
> +				if (!ctx->engine_map) {
> +					wsim_err("Load balancing needs an engine map!\n");
> +					return 1;
> +				}
> +				if (intel_gen(intel_get_drm_devid(fd)) < 11) {
> +					wsim_err("Load balancing needs relative mmio support, gen11+!\n");
> +					return 1;
> +				}
> +				ctx->load_balance = w->load_balance;
> +			}
> +		}
> +		/* create exec queue for each referenced engine */
> +		for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
> +			if (w->context != j)
> +				continue;
> +			if (w->type == BATCH)
> +				engine_classes[w->engine]++;
> +		}
> +		for (i = 0; i < NUM_ENGINES; i++) {
> +			if (engine_classes[i]) {
> +				if (verbose > 3)
> +					printf("%u ctx[%d] eq(%s) load_balance=%d\n",
> +						id, j, ring_str_map[i], ctx->load_balance);
> +				if (i == VCS) {
> +					ctx->xe.queues[i].hwe.engine_class =
> +						xe_get_engine(VCS1).engine_class;
> +					ctx->xe.queues[i].hwe.engine_instance = 1;
> +				} else
> +					ctx->xe.queues[i].hwe = xe_get_engine(i);
> +				xe_exec_queue_create_(ctx, &ctx->xe.queues[i]);
> +			}
> +			engine_classes[i] = 0;
> +		}
> +	}
> +
> +	/* create syncobjs for SW_FENCE */
> +	for (j = 0, i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++)
> +		if (w->type == SW_FENCE) {
> +			w->xe.syncs = calloc(1, sizeof(struct drm_xe_sync));
> +			w->xe.syncs[0].handle = syncobj_create(fd, 0);
> +			w->xe.syncs[0].flags = DRM_XE_SYNC_SYNCOBJ;
> +		}
> +
> +	return 0;
> +}
> +
>   static void prepare_working_sets(unsigned int id, struct workload *wrk)
>   {
>   	struct working_set **sets;
> @@ -2047,7 +2373,11 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
>   
>   	allocate_contexts(id, wrk);
>   
> -	ret = prepare_contexts(id, wrk);
> +	if (is_xe)
> +		ret = xe_prepare_vms_contexts_syncobjs(id, wrk);
> +	else
> +		ret = prepare_contexts(id, wrk);

Should this then be more obviously named like:

if (is_xe)
	ret = xe_prepare_contexts(..);
else
	ret = i915_prepare_contexts(..);

?

i915_ prefix optional I guess for less churn, since it would otherwise 
apply to alloc_step_batch too. Whatever you prefer.

> +
>   	if (ret)
>   		return ret;
>   
> @@ -2101,7 +2431,10 @@ static int prepare_workload(unsigned int id, struct workload *wrk)
>   		if (w->type != BATCH)
>   			continue;
>   
> -		alloc_step_batch(wrk, w);
> +		if (is_xe)
> +			xe_alloc_step_batch(wrk, w);
> +		else
> +			alloc_step_batch(wrk, w);
>   	}
>   
>   	measure_active_set(wrk);
> @@ -2151,6 +2484,23 @@ static void w_sync_to(struct workload *wrk, struct w_step *w, int target)
>   	w_step_sync(&wrk->steps[target]);
>   }
>   
> +static void do_xe_exec(struct workload *wrk, struct w_step *w)
> +{
> +	struct xe_exec_queue *eq = xe_get_eq(wrk, w);
> +
> +	igt_assert(w->emit_fence <= 0);
> +	if (w->emit_fence == -1)
> +		syncobj_reset(fd, &w->xe.syncs[0].handle, 1);
> +
> +	/* update duration if random */
> +	if (w->duration.max != w->duration.min)
> +		xe_spin_init_opts(&w->xe.data->spin, .addr = w->xe.exec.address,
> +					   .preempt = (w->preempt_us > 0),
> +					   .ctx_ticks = duration_to_ctx_ticks(fd, eq->hwe.gt_id,
> +								1000LL * get_duration(wrk, w)));
> +	xe_exec(fd, &w->xe.exec);
> +}
> +
>   static void
>   do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id engine)
>   {
> @@ -2276,6 +2626,10 @@ static void *run_workload(void *data)
>   					sw_sync_timeline_create_fence(wrk->sync_timeline,
>   								      cur_seqno + w->idx);
>   				igt_assert(w->emit_fence > 0);
> +				if (is_xe)
> +					/* Convert sync file to syncobj */
> +					syncobj_import_sync_file(fd, w->xe.syncs[0].handle,
> +								 w->emit_fence);
>   				continue;
>   			} else if (w->type == SW_FENCE_SIGNAL) {
>   				int tgt = w->idx + w->target;
> @@ -2307,7 +2661,10 @@ static void *run_workload(void *data)
>   				igt_assert(wrk->steps[t_idx].type == BATCH);
>   				igt_assert(wrk->steps[t_idx].duration.unbound);
>   
> -				*wrk->steps[t_idx].i915.bb_duration = 0xffffffff;
> +				if (is_xe)
> +					xe_spin_end(&wrk->steps[t_idx].xe.data->spin);
> +				else
> +					*wrk->steps[t_idx].i915.bb_duration = 0xffffffff;
>   				__sync_synchronize();
>   				continue;
>   			} else if (w->type == SSEU) {
> @@ -2339,7 +2696,10 @@ static void *run_workload(void *data)
>   			if (throttle > 0)
>   				w_sync_to(wrk, w, i - throttle);
>   
> -			do_eb(wrk, w, engine);
> +			if (is_xe)
> +				do_xe_exec(wrk, w);
> +			else
> +				do_eb(wrk, w, engine);
>   
>   			if (w->request != -1) {
>   				igt_list_del(&w->rq_link);
> @@ -2383,6 +2743,10 @@ static void *run_workload(void *data)
>   		for (i = 0, w = wrk->steps; wrk->run && (i < wrk->nr_steps);
>   		     i++, w++) {
>   			if (w->emit_fence > 0) {
> +				if (is_xe) {
> +					igt_assert(w->type == SW_FENCE);
> +					syncobj_reset(fd, &w->xe.syncs[0].handle, 1);
> +				}
>   				close(w->emit_fence);
>   				w->emit_fence = -1;
>   			}
> @@ -2397,6 +2761,23 @@ static void *run_workload(void *data)
>   		w_step_sync(w);
>   	}
>   
> +	if (is_xe) {
> +		for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
> +			if (w->type == BATCH) {
> +				w_step_sync(w);
> +				syncobj_destroy(fd, w->xe.syncs[0].handle);
> +				free(w->xe.syncs);
> +				xe_vm_unbind_sync(fd, xe_get_vm(wrk, w)->id, 0, w->xe.exec.address,
> +						  w->bb_size);
> +				gem_munmap(w->xe.data, w->bb_size);
> +				gem_close(fd, w->bb_handle);
> +			} else if (w->type == SW_FENCE) {
> +				syncobj_destroy(fd, w->xe.syncs[0].handle);
> +				free(w->xe.syncs);
> +			}
> +		}
> +	}
> +
>   	clock_gettime(CLOCK_MONOTONIC, &t_end);
>   
>   	if (wrk->print_stats) {
> @@ -2416,6 +2797,23 @@ static void *run_workload(void *data)
>   
>   static void fini_workload(struct workload *wrk)
>   {
> +	if (is_xe) {
> +		struct xe_exec_queue *eq;
> +		struct xe_vm *vm;
> +		struct ctx *ctx;
> +
> +		for_each_ctx(ctx, wrk)
> +			for_each_xe_exec_queue(eq, ctx) {
> +				xe_exec_queue_destroy(fd, eq->id);
> +				eq->id = 0;
> +			}
> +		for_each_xe_vm(vm, wrk) {
> +			put_ahnd(vm->ahnd);
> +			xe_vm_destroy(fd, vm->id);
> +		}
> +		free(wrk->xe.vm_list);
> +		wrk->xe.nr_vms = 0;
> +	}

Does it really matter since device fd is getting closed anyway? But 
apart from making the patch bigger it doesn't harm anything so up to you.

>   	free(wrk->steps);
>   	free(wrk);
>   }
> @@ -2623,8 +3021,12 @@ int main(int argc, char **argv)
>   		ret = igt_device_find_first_i915_discrete_card(&card);
>   		if (!ret)
>   			ret = igt_device_find_integrated_card(&card);
> +		if (!ret)
> +			ret = igt_device_find_first_xe_discrete_card(&card);
> +		if (!ret)
> +			ret = igt_device_find_xe_integrated_card(&card);
>   		if (!ret) {
> -			wsim_err("No device filter specified and no i915 devices found!\n");
> +			wsim_err("No device filter specified and no intel devices found!\n");
>   			return EXIT_FAILURE;
>   		}
>   	}
> @@ -2647,6 +3049,10 @@ int main(int argc, char **argv)
>   	if (verbose > 1)
>   		printf("Using device %s\n", drm_dev);
>   
> +	is_xe = is_xe_device(fd);
> +	if (is_xe)
> +		xe_device_get(fd);

Still no put, but meh on that and meh on the get/put naming to begin 
with. :)

> +
>   	if (!nr_w_args) {
>   		wsim_err("No workload descriptor(s)!\n");
>   		goto err;
> diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
> index d4f3dd8ae..38f305ba0 100644
> --- a/benchmarks/wsim/README
> +++ b/benchmarks/wsim/README
> @@ -88,6 +88,19 @@ Batch durations can also be specified as infinite by using the '*' in the
>   duration field. Such batches must be ended by the terminate command ('T')
>   otherwise they will cause a GPU hang to be reported.
>   
> +Xe and i915 differences
> +------------------------
> +
> +There are differences between Xe and i915, like not allowing a BO list to
> +be passed to an exec (and create implicit syncs). For more details see:
> +https://gitlab.freedesktop.org/drm/xe/kernel/-/blob/drm-xe-next/drivers/gpu/drm/xe/xe_exec.c
> +
> +Currently following batch steps are equal on Xe:
> +1.1000.-2.0 <==> 1.1000.f-2.0
> +and will create explicit sync fence dependency (via syncobjects).

I don't remember if I asked if possible misuse combinations of implicit 
fence creation and data deps keep working. Like:

f
1.DEFAULT.1000.-1.0 # should fail

f
1.DEFAULT.1000.f-1.0
1.DEFAULT.1000.f-1.0 # should fail, but without 'f' should work

At least I think this is what should happen. Basically we want to make 
sure existing workloads work but perversions like above should be 
detected as something the auto-magic i915<->xe compatibility translation 
does not want to start allowing.

> +
> +The data dependency need to wait for working sets implementation.

For explicit working set dependencies or also for implicits?

> +
>   Sync (fd) fences
>   ----------------
>   
> @@ -131,7 +144,7 @@ runnable. When the second RCS batch completes the standalone fence is signaled
>   which allows the two VCS batches to be executed. Finally we wait until the both
>   VCS batches have completed before starting the (optional) next iteration.
>   
> -Submit fences
> +Submit fences (i915 only)

Mark priorities, SSEU, working sets and bonds as well please.

>   -------------
>   
>   Submit fences are a type of input fence which are signalled when the originating

Overall - not too much code to add xe support, nice!

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 17/17] benchmarks/gem_wsim: added basic xe support
  2023-09-29 10:45   ` Tvrtko Ursulin
@ 2023-09-29 15:53     ` Bernatowicz, Marcin
  0 siblings, 0 replies; 40+ messages in thread
From: Bernatowicz, Marcin @ 2023-09-29 15:53 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: chris.p.wilson



On 9/29/2023 12:45 PM, Tvrtko Ursulin wrote:
> 
> On 28/09/2023 18:45, Marcin Bernatowicz wrote:
>> Added basic xe support. Single binary handles both i915 and Xe devices.
>>
>> Some functionality is still missing: working sets, bonding.
>>
>> The tool is handy for scheduling tests, we find it useful to verify vGPU
>> profiles defining different execution quantum/preemption timeout
>> settings.
>>
>> There is also some rationale for the tool in following thread:
>> https://lore.kernel.org/dri-devel/a443495f-5d1b-52e1-9b2f-80167deb6d57@linux.intel.com/
>>
>> With this patch it should be possible to run following on xe device:
>>
>> gem_wsim -w benchmarks/wsim/media_load_balance_fhd26u7.wsim -c 36 -r 600
>>
>> Best with drm debug logs disabled:
>>
>> echo 0 > /sys/module/drm/parameters/debug
>>
>> v2: minimizing divergence - same workload syntax for both drivers,
>>      so most existing examples should run on xe unmodified (Tvrtko)
>>      This version creates one common VM per workload.
>>      Explicit VM management, compute mode will come in next patchset.
>>
>> v3:
>> - use calloc in parse_workload for struct workload,
>>    to allow cleanups in fini_workload
>> - grouped xe specific fields (Tvrtko)
>> - moved is_xe boolean next to fd (Tvrtko)
>> - use xe_ prefix for Xe specific things (Tvrtko)
>> - left throttling untouched (Tvrtko)
>> - parse errors vs silent skips on not implemented steps (Tvrtko)
>> - need to think on better engine handling in next version
>> - add 'Xe and i915 differences' section to README (Tvrtko)
>>    for now no data dependency implemented, left -1 <=> f-1
>>    to not modify examples (maybe too optimistic assumption?)
>>
>> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
>> ---
>>   benchmarks/gem_wsim.c  | 430 +++++++++++++++++++++++++++++++++++++++--
>>   benchmarks/wsim/README |  15 +-
>>   2 files changed, 432 insertions(+), 13 deletions(-)
>>
>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>> index d447dced4..68c3b2cb0 100644
>> --- a/benchmarks/gem_wsim.c
>> +++ b/benchmarks/gem_wsim.c
>> @@ -62,6 +62,12 @@
>>   #include "i915/gem_engine_topology.h"
>>   #include "i915/gem_mman.h"
>> +#include "igt_syncobj.h"
>> +#include "intel_allocator.h"
>> +#include "xe_drm.h"
>> +#include "xe/xe_ioctl.h"
>> +#include "xe/xe_spin.h"
>> +
>>   enum intel_engine_id {
>>       DEFAULT,
>>       RCS,
>> @@ -185,11 +191,31 @@ struct w_step {
>>               struct drm_i915_gem_relocation_entry reloc[3];
>>               uint32_t *bb_duration;
>>           } i915;
>> +        struct {
>> +            struct drm_xe_exec exec;
>> +            struct {
>> +                struct xe_spin spin;
>> +                uint64_t vm_sync;
>> +                uint64_t exec_sync;
>> +            } *data;
>> +            struct drm_xe_sync *syncs;
>> +        } xe;
>>       };
>>       uint32_t bb_handle;
>>       size_t bb_size;
>>   };
>> +struct xe_vm {
>> +    uint32_t id;
>> +    bool compute_mode;
>> +    uint64_t ahnd;
>> +};
>> +
>> +struct xe_exec_queue {
>> +    uint32_t id;
>> +    struct drm_xe_engine_class_instance hwe;
>> +};
>> +
>>   struct ctx {
>>       uint32_t id;
>>       int priority;
>> @@ -199,6 +225,12 @@ struct ctx {
>>       struct bond *bonds;
>>       bool load_balance;
>>       uint64_t sseu;
>> +    struct {
>> +        /* reference to vm */
>> +        struct xe_vm *vm;
>> +        /* queue for each class */
>> +        struct xe_exec_queue queues[NUM_ENGINES];
>> +    } xe;
>>   };
>>   struct workload {
>> @@ -222,6 +254,11 @@ struct workload {
>>       unsigned int nr_ctxs;
>>       struct ctx *ctx_list;
>> +    struct {
>> +        unsigned int nr_vms;
>> +        struct xe_vm *vm_list;
>> +    } xe;
>> +
>>       struct working_set **working_sets; /* array indexed by set id */
>>       int max_working_set_id;
>> @@ -232,10 +269,24 @@ struct workload {
>>       unsigned int nrequest[NUM_ENGINES];
>>   };
>> +#define for_each_ctx(__ctx, __wrk) \
>> +    for (int __i = 0; __i < (__wrk)->nr_ctxs && \
>> +         (__ctx = &(__wrk)->ctx_list[__i]); ++__i)
> 
> This one looks you could also extract and make use of ahead of xe support.

ok

> 
>> +
>> +#define for_each_xe_exec_queue(__eq, __ctx) \
>> +        for (int __j = 0; __j < NUM_ENGINES && \
>> +             ((__eq) = &((__ctx)->xe.queues[__j])); ++__j) \
>> +            for_if((__eq)->id > 0)
>> +
>> +#define for_each_xe_vm(__vm, __wrk) \
>> +    for (int __i = 0; __i < (__wrk)->xe.nr_vms && \
>> +         (__vm = &(__wrk)->xe.vm_list[__i]); ++__i)
>> +
>>   static unsigned int master_prng;
>>   static int verbose = 1;
>>   static int fd;
>> +static bool is_xe;
>>   static struct drm_i915_gem_context_param_sseu device_sseu = {
>>       .slice_mask = -1 /* Force read on first use. */
>>   };
>> @@ -256,7 +307,10 @@ static const char *ring_str_map[NUM_ENGINES] = {
>>   static void w_step_sync(struct w_step *w)
>>   {
>> -    gem_sync(fd, w->i915.obj[0].handle);
>> +    if (is_xe)
>> +        igt_assert(syncobj_wait(fd, &w->xe.syncs[0].handle, 1, 
>> INT64_MAX, 0, NULL));
>> +    else
>> +        gem_sync(fd, w->i915.obj[0].handle);
>>   }
>>   static int read_timestamp_frequency(int i915)
>> @@ -360,15 +414,23 @@ parse_dependency(unsigned int nr_steps, struct 
>> w_step *w, char *str)
>>           if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
>>               return -1;
>> -        add_dep(&w->data_deps, entry);
>> +        /* only fence deps in xe, let f-1 <==> -1 */
>> +        if (is_xe)
>> +            add_dep(&w->fence_deps, entry);
>> +        else
>> +            add_dep(&w->data_deps, entry);
>>           break;
>>       case 's':
>> -        submit_fence = true;
>> +        /* no submit fence in xe ? */
>> +        if (!is_xe)
>> +            submit_fence = true;
> 
> I think best to make parsing fail with an user facing message like 
> "Submit fences are not supported with xe" or so.

ok

> 
>>           /* Fall-through. */
>>       case 'f':
>> -        /* Multiple fences not yet supported. */
>> -        igt_assert_eq(w->fence_deps.nr, 0);
>> +        /* xe supports multiple fences */
>> +        if (!is_xe)
>> +            /* Multiple fences not yet supported. */
>> +            igt_assert_eq(w->fence_deps.nr, 0);
>>           entry.target = atoi(++str);
>>           if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
>> @@ -478,7 +540,17 @@ static struct intel_engine_data *query_engines(void)
>>       if (engines.nengines)
>>           return &engines;
>> -    engines = intel_engine_list_of_physical(fd);
>> +    if (is_xe) {
>> +        struct drm_xe_engine_class_instance *hwe;
>> +
>> +        xe_for_each_hw_engine(fd, hwe) {
>> +            engines.engines[engines.nengines].class = hwe->engine_class;
>> +            engines.engines[engines.nengines].instance = 
>> hwe->engine_instance;
>> +            engines.nengines++;
>> +        }
>> +    } else
>> +        engines = intel_engine_list_of_physical(fd);
>> +
>>       igt_assert(engines.nengines);
>>       return &engines;
>>   }
>> @@ -571,6 +643,40 @@ get_engine(enum intel_engine_id engine)
>>       return ci;
>>   }
>> +static struct drm_xe_engine_class_instance
>> +xe_get_engine(enum intel_engine_id engine)
>> +{
>> +    struct drm_xe_engine_class_instance ci = {};
>> +
>> +    switch (engine) {
>> +    case DEFAULT:
>> +    case RCS:
>> +        ci.engine_class = DRM_XE_ENGINE_CLASS_RENDER;
>> +        ci.engine_instance = 0;
>> +        break;
> 
> Workload such as:
> 
> M.1.VCS
> 1.DEFAULT.1000.0.0
> 
> Keep working and submit to VCS queue with xe?
> 
>> +    case BCS:
>> +        ci.engine_class = DRM_XE_ENGINE_CLASS_COPY;
>> +        ci.engine_instance = 0;
>> +        break;
>> +    case VCS1:
>> +        ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_DECODE;
>> +        ci.engine_instance = 0;
>> +        break;
>> +    case VCS2:
>> +        ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_DECODE;
>> +        ci.engine_instance = 1;
>> +        break;
>> +    case VECS:
>> +        ci.engine_class = DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE;
>> +        ci.engine_instance = 0;
>> +        break;
>> +    default:
>> +        igt_assert(0);
>> +    };
>> +
>> +    return ci;
>> +}
>> +
>>   static int parse_engine_map(struct w_step *step, const char *_str)
>>   {
>>       char *token, *tctx = NULL, *tstart = (char *)_str;
>> @@ -845,6 +951,12 @@ parse_workload(struct w_arg *arg, unsigned int 
>> flags, double scale_dur,
>>               } else if (!strcmp(field, "P")) {
>>                   unsigned int nr = 0;
>> +                if (is_xe) {
>> +                    wsim_err("Priority step is not implemented yet 
>> :(\n");
> 
> Clarify error message by adding "with xe" or something please. Here and 
> the bunch below.

ok
> 
>> +                    free(token);
>> +                    return NULL;
>> +                }
>> +
>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>                       tmp = atoi(field);
>>                       check_arg(nr == 0 && tmp <= 0,
>> @@ -871,6 +983,12 @@ parse_workload(struct w_arg *arg, unsigned int 
>> flags, double scale_dur,
>>               } else if (!strcmp(field, "S")) {
>>                   unsigned int nr = 0;
>> +                if (is_xe) {
>> +                    wsim_err("SSEU step is not implemented yet :(\n");
>> +                    free(token);
>> +                    return NULL;
>> +                }
>> +
>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>                       tmp = atoi(field);
>>                       check_arg(tmp <= 0 && nr == 0,
>> @@ -984,6 +1102,12 @@ parse_workload(struct w_arg *arg, unsigned int 
>> flags, double scale_dur,
>>               } else if (!strcmp(field, "b")) {
>>                   unsigned int nr = 0;
>> +                if (is_xe) {
>> +                    wsim_err("Bonding is not implemented yet :(\n");
>> +                    free(token);
>> +                    return NULL;
>> +                }
>> +
>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>>                       check_arg(nr > 2,
>>                             "Invalid bond format at step %u!\n",
>> @@ -1018,6 +1142,12 @@ parse_workload(struct w_arg *arg, unsigned int 
>> flags, double scale_dur,
>>               } else if (!strcmp(field, "w") || !strcmp(field, "W")) {
>>                   unsigned int nr = 0;
>> +                if (is_xe) {
>> +                    wsim_err("Working sets are not implemented yet 
>> :(\n");
>> +                    free(token);
>> +                    return NULL;
>> +                }
>> +
>>                   step.working_set.shared = field[0] == 'W';
>>                   while ((field = strtok_r(fstart, ".", &fctx))) {
>> @@ -1136,7 +1266,7 @@ add_step:
>>           nr_steps += app_w->nr_steps;
>>       }
>> -    wrk = malloc(sizeof(*wrk));
>> +    wrk = calloc(1, sizeof(*wrk));
>>       igt_assert(wrk);
>>       wrk->nr_steps = nr_steps;
>> @@ -1297,6 +1427,20 @@ __get_ctx(struct workload *wrk, const struct 
>> w_step *w)
>>       return &wrk->ctx_list[w->context];
>>   }
>> +static struct xe_exec_queue *
>> +xe_get_eq(struct workload *wrk, const struct w_step *w)
>> +{
>> +    igt_assert(w->engine >= 0 && w->engine < NUM_ENGINES);
>> +
>> +    return &__get_ctx(wrk, w)->xe.queues[w->engine];
>> +}
>> +
>> +static struct xe_vm *
>> +xe_get_vm(struct workload *wrk, const struct w_step *w)
>> +{
>> +    return wrk->xe.vm_list;
>> +}
>> +
>>   static uint32_t mmio_base(int i915, enum intel_engine_id engine, int 
>> gen)
>>   {
>>       const char *name;
>> @@ -1549,6 +1693,62 @@ alloc_step_batch(struct workload *wrk, struct 
>> w_step *w)
>>   #endif
>>   }
>> +static void
>> +xe_alloc_step_batch(struct workload *wrk, struct w_step *w)
>> +{
>> +    struct xe_vm *vm = xe_get_vm(wrk, w);
>> +    struct xe_exec_queue *eq = xe_get_eq(wrk, w);
>> +    struct dep_entry *dep;
>> +    int i;
>> +
>> +    w->bb_size = ALIGN(sizeof(*w->xe.data) + xe_cs_prefetch_size(fd),
>> +               xe_get_default_alignment(fd));
>> +    w->bb_handle = xe_bo_create_flags(fd, vm->id, w->bb_size,
>> +                visible_vram_if_possible(fd, eq->hwe.gt_id));
>> +    w->xe.data = xe_bo_map(fd, w->bb_handle, w->bb_size);
>> +    w->xe.exec.address =
>> +        intel_allocator_alloc_with_strategy(vm->ahnd, w->bb_handle, 
>> w->bb_size,
>> +                            0, ALLOC_STRATEGY_LOW_TO_HIGH);
>> +    xe_vm_bind_sync(fd, vm->id, w->bb_handle, 0, w->xe.exec.address, 
>> w->bb_size);
>> +    xe_spin_init_opts(&w->xe.data->spin, .addr = w->xe.exec.address,
>> +                   .preempt = (w->preempt_us > 0),
>> +                   .ctx_ticks = duration_to_ctx_ticks(fd, eq->hwe.gt_id,
>> +                                1000LL * get_duration(wrk, w)));
>> +    w->xe.exec.exec_queue_id = eq->id;
>> +    w->xe.exec.num_batch_buffer = 1;
>> +    /* always at least one out fence */
>> +    w->xe.exec.num_syncs = 1;
>> +    /* count syncs */
>> +    igt_assert_eq(0, w->data_deps.nr);
>> +    for_each_dep(dep, w->fence_deps) {
>> +        int dep_idx = w->idx + dep->target;
>> +
>> +        igt_assert(dep_idx >= 0 && dep_idx < w->idx);
>> +        igt_assert(wrk->steps[dep_idx].type == SW_FENCE ||
>> +               wrk->steps[dep_idx].type == BATCH);
>> +
>> +        w->xe.exec.num_syncs++;
>> +    }
>> +    w->xe.syncs = calloc(w->xe.exec.num_syncs, sizeof(*w->xe.syncs));
>> +    /* fill syncs */
>> +    i = 0;
>> +    /* out fence */
>> +    w->xe.syncs[i].handle = syncobj_create(fd, 0);
>> +    w->xe.syncs[i++].flags = DRM_XE_SYNC_SYNCOBJ | DRM_XE_SYNC_SIGNAL;
>> +    /* in fence(s) */
>> +    for_each_dep(dep, w->fence_deps) {
>> +        int dep_idx = w->idx + dep->target;
>> +
>> +        igt_assert(wrk->steps[dep_idx].type == SW_FENCE ||
>> +               wrk->steps[dep_idx].type == BATCH);
>> +        igt_assert(wrk->steps[dep_idx].xe.syncs && 
>> wrk->steps[dep_idx].xe.syncs[0].handle);
>> +
>> +        w->xe.syncs[i].handle = wrk->steps[dep_idx].xe.syncs[0].handle;
>> +        w->xe.syncs[i++].flags = DRM_XE_SYNC_SYNCOBJ;
>> +    }
>> +    w->xe.exec.syncs = to_user_pointer(w->xe.syncs);
>> +}
>> +
>>   static bool set_priority(uint32_t ctx_id, int prio)
>>   {
>>       struct drm_i915_gem_context_param param = {
>> @@ -1774,6 +1974,61 @@ static void measure_active_set(struct workload 
>> *wrk)
>>   #define alloca0(sz) ({ size_t sz__ = (sz); memset(alloca(sz__), 0, 
>> sz__); })
>> +static void xe_vm_create_(struct xe_vm *vm)
>> +{
>> +    uint32_t flags = 0;
>> +
>> +    if (vm->compute_mode)
>> +        flags |= DRM_XE_VM_CREATE_ASYNC_BIND_OPS |
>> +             DRM_XE_VM_CREATE_COMPUTE_MODE;
>> +
>> +    vm->id = xe_vm_create(fd, flags, 0);
>> +}
>> +
>> +static void xe_exec_queue_create_(struct ctx *ctx, struct 
>> xe_exec_queue *eq)
>> +{
>> +    struct drm_xe_exec_queue_create create = {
>> +        .vm_id = ctx->xe.vm->id,
>> +        .width = 1,
>> +        .num_placements = 1,
>> +        .instances = to_user_pointer(&eq->hwe),
>> +    };
>> +    struct drm_xe_engine_class_instance *eci = NULL;
>> +
>> +    if (ctx->load_balance && eq->hwe.engine_class == 
>> DRM_XE_ENGINE_CLASS_VIDEO_DECODE) {
>> +        struct drm_xe_engine_class_instance *hwe;
>> +        int i;
>> +
>> +        for (i = 0; i < ctx->engine_map_count; ++i)
>> +            igt_assert(ctx->engine_map[i] == VCS || 
>> ctx->engine_map[i] == VCS1 ||
>> +                   ctx->engine_map[i] == VCS2);
> 
> ctx->engine_map should not contain VCS at this point, AFAICT the meta 
> "all engines" token is expanded to individual instances during parsing 
> in parse_engine_map().
> 
> So adjust this assert and double-check it really is as I say please.
> 
>> +
>> +        eci = calloc(16, sizeof(struct drm_xe_engine_class_instance));
>> +        create.num_placements = 0;
>> +        xe_for_each_hw_engine(fd, hwe) {
>> +            if (hwe->engine_class != DRM_XE_ENGINE_CLASS_VIDEO_DECODE ||
>> +                hwe->gt_id != 0)
>> +                continue;
> 
> 1)
> 
> Why the gt_id check - does it work on Meteorlake like that?

Good catch.
> 
> 2)
> 
> You seem to be adding all vcs engines irrespective of the configure 
> engine map? Make sure that:
> 
> M.1.VCS - maps to all vcs instances
> M.2.VCS1|VCS2 - maps to vcs0 & vcs1
> M.3.VCS1 - maps to vcs0 only
> M.3.VCS2 - maps to vcs1 only

I need to figure out how to do correct mappings, in Xe we have 
class.instance.gt_id tuple to identify engine (CCS needs to be added).

> 
>> +
>> +            igt_assert(create.num_placements < 16);
>> +            eci[create.num_placements++] = *hwe;
>> +        }
>> +        igt_assert(create.num_placements);
>> +        create.instances = to_user_pointer(eci);
>> +
>> +        if (verbose > 3)
>> +            printf("num_placements=%d class=%d gt=%d\n", 
>> create.num_placements,
>> +                eq->hwe.engine_class, eq->hwe.gt_id);
>> +    }
>> +
>> +    igt_assert_eq(igt_ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, 
>> &create), 0);
>> +
>> +    if (eci)
>> +        free(eci);
>> +
>> +    eq->id = create.exec_queue_id;
>> +}
>> +
>>   static void allocate_contexts(unsigned int id, struct workload *wrk)
>>   {
>>       int max_ctx = -1;
>> @@ -1978,6 +2233,77 @@ static int prepare_contexts(unsigned int id, 
>> struct workload *wrk)
>>       return 0;
>>   }
>> +static int xe_prepare_vms_contexts_syncobjs(unsigned int id, struct 
>> workload *wrk)
>> +{
>> +    int engine_classes[NUM_ENGINES] = {};
>> +    struct w_step *w;
>> +    int i, j;
>> +
>> +    /* shortcut, create one vm */
>> +    wrk->xe.nr_vms = 1;
>> +    wrk->xe.vm_list = calloc(wrk->xe.nr_vms, sizeof(struct xe_vm));
>> +    wrk->xe.vm_list->compute_mode = false;
>> +    xe_vm_create_(wrk->xe.vm_list);
>> +    wrk->xe.vm_list->ahnd = intel_allocator_open(fd, 
>> wrk->xe.vm_list->id, INTEL_ALLOCATOR_RELOC);
>> +
>> +    /* create exec queues of each referenced engine class */
>> +    for (j = 0; j < wrk->nr_ctxs; j++) {
>> +        struct ctx *ctx = &wrk->ctx_list[j];
>> +        /* link with vm */
>> +        ctx->xe.vm = wrk->xe.vm_list;
>> +        for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
>> +            if (w->context != j)
>> +                continue;
>> +            if (w->type == ENGINE_MAP) {
>> +                ctx->engine_map = w->engine_map;
>> +                ctx->engine_map_count = w->engine_map_count;
>> +            } else if (w->type == LOAD_BALANCE) {
>> +                if (!ctx->engine_map) {
>> +                    wsim_err("Load balancing needs an engine map!\n");
>> +                    return 1;
>> +                }
>> +                if (intel_gen(intel_get_drm_devid(fd)) < 11) {
>> +                    wsim_err("Load balancing needs relative mmio 
>> support, gen11+!\n");
>> +                    return 1;
>> +                }
>> +                ctx->load_balance = w->load_balance;
>> +            }
>> +        }
>> +        /* create exec queue for each referenced engine */
>> +        for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
>> +            if (w->context != j)
>> +                continue;
>> +            if (w->type == BATCH)
>> +                engine_classes[w->engine]++;
>> +        }
>> +        for (i = 0; i < NUM_ENGINES; i++) {
>> +            if (engine_classes[i]) {
>> +                if (verbose > 3)
>> +                    printf("%u ctx[%d] eq(%s) load_balance=%d\n",
>> +                        id, j, ring_str_map[i], ctx->load_balance);
>> +                if (i == VCS) {
>> +                    ctx->xe.queues[i].hwe.engine_class =
>> +                        xe_get_engine(VCS1).engine_class;
>> +                    ctx->xe.queues[i].hwe.engine_instance = 1;
>> +                } else
>> +                    ctx->xe.queues[i].hwe = xe_get_engine(i);
>> +                xe_exec_queue_create_(ctx, &ctx->xe.queues[i]);
>> +            }
>> +            engine_classes[i] = 0;
>> +        }
>> +    }
>> +
>> +    /* create syncobjs for SW_FENCE */
>> +    for (j = 0, i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++)
>> +        if (w->type == SW_FENCE) {
>> +            w->xe.syncs = calloc(1, sizeof(struct drm_xe_sync));
>> +            w->xe.syncs[0].handle = syncobj_create(fd, 0);
>> +            w->xe.syncs[0].flags = DRM_XE_SYNC_SYNCOBJ;
>> +        }
>> +
>> +    return 0;
>> +}
>> +
>>   static void prepare_working_sets(unsigned int id, struct workload *wrk)
>>   {
>>       struct working_set **sets;
>> @@ -2047,7 +2373,11 @@ static int prepare_workload(unsigned int id, 
>> struct workload *wrk)
>>       allocate_contexts(id, wrk);
>> -    ret = prepare_contexts(id, wrk);
>> +    if (is_xe)
>> +        ret = xe_prepare_vms_contexts_syncobjs(id, wrk);
>> +    else
>> +        ret = prepare_contexts(id, wrk);
> 
> Should this then be more obviously named like:
> 
> if (is_xe)
>      ret = xe_prepare_contexts(..);
> else
>      ret = i915_prepare_contexts(..);
> 
> ?
> 
> i915_ prefix optional I guess for less churn, since it would otherwise 
> apply to alloc_step_batch too. Whatever you prefer.
> 

makes sense

>> +
>>       if (ret)
>>           return ret;
>> @@ -2101,7 +2431,10 @@ static int prepare_workload(unsigned int id, 
>> struct workload *wrk)
>>           if (w->type != BATCH)
>>               continue;
>> -        alloc_step_batch(wrk, w);
>> +        if (is_xe)
>> +            xe_alloc_step_batch(wrk, w);
>> +        else
>> +            alloc_step_batch(wrk, w);
>>       }
>>       measure_active_set(wrk);
>> @@ -2151,6 +2484,23 @@ static void w_sync_to(struct workload *wrk, 
>> struct w_step *w, int target)
>>       w_step_sync(&wrk->steps[target]);
>>   }
>> +static void do_xe_exec(struct workload *wrk, struct w_step *w)
>> +{
>> +    struct xe_exec_queue *eq = xe_get_eq(wrk, w);
>> +
>> +    igt_assert(w->emit_fence <= 0);
>> +    if (w->emit_fence == -1)
>> +        syncobj_reset(fd, &w->xe.syncs[0].handle, 1);
>> +
>> +    /* update duration if random */
>> +    if (w->duration.max != w->duration.min)
>> +        xe_spin_init_opts(&w->xe.data->spin, .addr = w->xe.exec.address,
>> +                       .preempt = (w->preempt_us > 0),
>> +                       .ctx_ticks = duration_to_ctx_ticks(fd, 
>> eq->hwe.gt_id,
>> +                                1000LL * get_duration(wrk, w)));
>> +    xe_exec(fd, &w->xe.exec);
>> +}
>> +
>>   static void
>>   do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id 
>> engine)
>>   {
>> @@ -2276,6 +2626,10 @@ static void *run_workload(void *data)
>>                       sw_sync_timeline_create_fence(wrk->sync_timeline,
>>                                         cur_seqno + w->idx);
>>                   igt_assert(w->emit_fence > 0);
>> +                if (is_xe)
>> +                    /* Convert sync file to syncobj */
>> +                    syncobj_import_sync_file(fd, w->xe.syncs[0].handle,
>> +                                 w->emit_fence);
>>                   continue;
>>               } else if (w->type == SW_FENCE_SIGNAL) {
>>                   int tgt = w->idx + w->target;
>> @@ -2307,7 +2661,10 @@ static void *run_workload(void *data)
>>                   igt_assert(wrk->steps[t_idx].type == BATCH);
>>                   igt_assert(wrk->steps[t_idx].duration.unbound);
>> -                *wrk->steps[t_idx].i915.bb_duration = 0xffffffff;
>> +                if (is_xe)
>> +                    xe_spin_end(&wrk->steps[t_idx].xe.data->spin);
>> +                else
>> +                    *wrk->steps[t_idx].i915.bb_duration = 0xffffffff;
>>                   __sync_synchronize();
>>                   continue;
>>               } else if (w->type == SSEU) {
>> @@ -2339,7 +2696,10 @@ static void *run_workload(void *data)
>>               if (throttle > 0)
>>                   w_sync_to(wrk, w, i - throttle);
>> -            do_eb(wrk, w, engine);
>> +            if (is_xe)
>> +                do_xe_exec(wrk, w);
>> +            else
>> +                do_eb(wrk, w, engine);
>>               if (w->request != -1) {
>>                   igt_list_del(&w->rq_link);
>> @@ -2383,6 +2743,10 @@ static void *run_workload(void *data)
>>           for (i = 0, w = wrk->steps; wrk->run && (i < wrk->nr_steps);
>>                i++, w++) {
>>               if (w->emit_fence > 0) {
>> +                if (is_xe) {
>> +                    igt_assert(w->type == SW_FENCE);
>> +                    syncobj_reset(fd, &w->xe.syncs[0].handle, 1);
>> +                }
>>                   close(w->emit_fence);
>>                   w->emit_fence = -1;
>>               }
>> @@ -2397,6 +2761,23 @@ static void *run_workload(void *data)
>>           w_step_sync(w);
>>       }
>> +    if (is_xe) {
>> +        for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
>> +            if (w->type == BATCH) {
>> +                w_step_sync(w);
>> +                syncobj_destroy(fd, w->xe.syncs[0].handle);
>> +                free(w->xe.syncs);
>> +                xe_vm_unbind_sync(fd, xe_get_vm(wrk, w)->id, 0, 
>> w->xe.exec.address,
>> +                          w->bb_size);
>> +                gem_munmap(w->xe.data, w->bb_size);
>> +                gem_close(fd, w->bb_handle);
>> +            } else if (w->type == SW_FENCE) {
>> +                syncobj_destroy(fd, w->xe.syncs[0].handle);
>> +                free(w->xe.syncs);
>> +            }
>> +        }
>> +    }
>> +
>>       clock_gettime(CLOCK_MONOTONIC, &t_end);
>>       if (wrk->print_stats) {
>> @@ -2416,6 +2797,23 @@ static void *run_workload(void *data)
>>   static void fini_workload(struct workload *wrk)
>>   {
>> +    if (is_xe) {
>> +        struct xe_exec_queue *eq;
>> +        struct xe_vm *vm;
>> +        struct ctx *ctx;
>> +
>> +        for_each_ctx(ctx, wrk)
>> +            for_each_xe_exec_queue(eq, ctx) {
>> +                xe_exec_queue_destroy(fd, eq->id);
>> +                eq->id = 0;
>> +            }
>> +        for_each_xe_vm(vm, wrk) {
>> +            put_ahnd(vm->ahnd);
>> +            xe_vm_destroy(fd, vm->id);
>> +        }
>> +        free(wrk->xe.vm_list);
>> +        wrk->xe.nr_vms = 0;
>> +    }
> 
> Does it really matter since device fd is getting closed anyway? But 
> apart from making the patch bigger it doesn't harm anything so up to you.
> 

ok, will shorten the patch then, if needed I will add a cleanup in 
separate patch.

>>       free(wrk->steps);
>>       free(wrk);
>>   }
>> @@ -2623,8 +3021,12 @@ int main(int argc, char **argv)
>>           ret = igt_device_find_first_i915_discrete_card(&card);
>>           if (!ret)
>>               ret = igt_device_find_integrated_card(&card);
>> +        if (!ret)
>> +            ret = igt_device_find_first_xe_discrete_card(&card);
>> +        if (!ret)
>> +            ret = igt_device_find_xe_integrated_card(&card);
>>           if (!ret) {
>> -            wsim_err("No device filter specified and no i915 devices 
>> found!\n");
>> +            wsim_err("No device filter specified and no intel devices 
>> found!\n");
>>               return EXIT_FAILURE;
>>           }
>>       }
>> @@ -2647,6 +3049,10 @@ int main(int argc, char **argv)
>>       if (verbose > 1)
>>           printf("Using device %s\n", drm_dev);
>> +    is_xe = is_xe_device(fd);
>> +    if (is_xe)
>> +        xe_device_get(fd);
> 
> Still no put, but meh on that and meh on the get/put naming to begin 
> with. :)
> 
will put next to close(fd) :)
>> +
>>       if (!nr_w_args) {
>>           wsim_err("No workload descriptor(s)!\n");
>>           goto err;
>> diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
>> index d4f3dd8ae..38f305ba0 100644
>> --- a/benchmarks/wsim/README
>> +++ b/benchmarks/wsim/README
>> @@ -88,6 +88,19 @@ Batch durations can also be specified as infinite 
>> by using the '*' in the
>>   duration field. Such batches must be ended by the terminate command 
>> ('T')
>>   otherwise they will cause a GPU hang to be reported.
>> +Xe and i915 differences
>> +------------------------
>> +
>> +There are differences between Xe and i915, like not allowing a BO 
>> list to
>> +be passed to an exec (and create implicit syncs). For more details see:
>> +https://gitlab.freedesktop.org/drm/xe/kernel/-/blob/drm-xe-next/drivers/gpu/drm/xe/xe_exec.c
>> +
>> +Currently following batch steps are equal on Xe:
>> +1.1000.-2.0 <==> 1.1000.f-2.0
>> +and will create explicit sync fence dependency (via syncobjects).
> 
> I don't remember if I asked if possible misuse combinations of implicit 
> fence creation and data deps keep working. Like:
> 
> f
> 1.DEFAULT.1000.-1.0 # should fail
> 
> f
> 1.DEFAULT.1000.f-1.0
> 1.DEFAULT.1000.f-1.0 # should fail, but without 'f' should work
> 
> At least I think this is what should happen. Basically we want to make 
> sure existing workloads work but perversions like above should be 
> detected as something the auto-magic i915<->xe compatibility translation 
> does not want to start allowing.
> 
will check and correct
>> +
>> +The data dependency need to wait for working sets implementation.
> 
> For explicit working set dependencies or also for implicits?

If my understanding is correct, there are no implicit data dependencies 
in Xe.
For i915 alloc_step_batch is creating one BO with EXEC_OBJECT_WRITE 
flag, used to create implicit dependencies (and synchronization points) 
by passing it's handle to BO list of dependee execs.
I guess to achieve same in Xe we can create BO, call VM_BIND and have 
explicit syncobject synchronization between execs.
So looks in current implementation we only have explicit synchronization 
and no additional BO with VM_BIND calls. I think that can be a part for 
working set dependencies implementation.

> 
>> +
>>   Sync (fd) fences
>>   ----------------
>> @@ -131,7 +144,7 @@ runnable. When the second RCS batch completes the 
>> standalone fence is signaled
>>   which allows the two VCS batches to be executed. Finally we wait 
>> until the both
>>   VCS batches have completed before starting the (optional) next 
>> iteration.
>> -Submit fences
>> +Submit fences (i915 only)
> 
> Mark priorities, SSEU, working sets and bonds as well please.
> 
ok

>>   -------------
>>   Submit fences are a type of input fence which are signalled when the 
>> originating
> 
> Overall - not too much code to add xe support, nice!

yes, at least from scheduling point of view it's already testable (let 
me figure out those engine mappings/fence corrections).

Regards,
Marcin

> 
> Regards,
> 
> Tvrtko

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [igt-dev] ✓ CI.xeBAT: success for benchmarks/gem_wsim: added basic xe support (rev5)
  2023-09-28 17:45 [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (16 preceding siblings ...)
  2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 17/17] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
@ 2023-09-28 18:59 ` Patchwork
  2023-09-28 19:10 ` [igt-dev] ✗ Fi.CI.BAT: failure " Patchwork
  18 siblings, 0 replies; 40+ messages in thread
From: Patchwork @ 2023-09-28 18:59 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: igt-dev

[-- Attachment #1: Type: text/plain, Size: 6218 bytes --]

== Series Details ==

Series: benchmarks/gem_wsim: added basic xe support (rev5)
URL   : https://patchwork.freedesktop.org/series/122920/
State : success

== Summary ==

CI Bug Log - changes from XEIGT_7506_BAT -> XEIGTPW_9892_BAT
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Participating hosts (3 -> 4)
------------------------------

  Additional (1): bat-pvc-2 

Known issues
------------

  Here are the changes found in XEIGTPW_9892_BAT that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_addfb_basic@addfb25-x-tiled-legacy:
    - bat-pvc-2:          NOTRUN -> [SKIP][1] ([Intel XE#538]) +33 other tests skip
   [1]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_9892/bat-pvc-2/igt@kms_addfb_basic@addfb25-x-tiled-legacy.html

  * igt@kms_cursor_legacy@basic-flip-after-cursor-atomic:
    - bat-pvc-2:          NOTRUN -> [SKIP][2] ([Intel XE#539]) +7 other tests skip
   [2]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_9892/bat-pvc-2/igt@kms_cursor_legacy@basic-flip-after-cursor-atomic.html

  * igt@kms_flip@basic-flip-vs-wf_vblank:
    - bat-pvc-2:          NOTRUN -> [SKIP][3] ([Intel XE#275] / [Intel XE#541]) +3 other tests skip
   [3]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_9892/bat-pvc-2/igt@kms_flip@basic-flip-vs-wf_vblank.html

  * igt@kms_force_connector_basic@force-connector-state:
    - bat-pvc-2:          NOTRUN -> [SKIP][4] ([Intel XE#540]) +3 other tests skip
   [4]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_9892/bat-pvc-2/igt@kms_force_connector_basic@force-connector-state.html

  * igt@kms_pipe_crc_basic@nonblocking-crc:
    - bat-pvc-2:          NOTRUN -> [SKIP][5] ([Intel XE#537]) +6 other tests skip
   [5]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_9892/bat-pvc-2/igt@kms_pipe_crc_basic@nonblocking-crc.html

  * igt@kms_prop_blob@basic:
    - bat-pvc-2:          NOTRUN -> [SKIP][6] ([Intel XE#536])
   [6]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_9892/bat-pvc-2/igt@kms_prop_blob@basic.html

  * igt@kms_psr@primary_page_flip:
    - bat-pvc-2:          NOTRUN -> [SKIP][7] ([Intel XE#535]) +2 other tests skip
   [7]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_9892/bat-pvc-2/igt@kms_psr@primary_page_flip.html

  * igt@xe_evict@evict-beng-small-external:
    - bat-pvc-2:          NOTRUN -> [FAIL][8] ([Intel XE#389]) +3 other tests fail
   [8]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_9892/bat-pvc-2/igt@xe_evict@evict-beng-small-external.html

  * igt@xe_evict@evict-small-external-cm:
    - bat-pvc-2:          NOTRUN -> [DMESG-FAIL][9] ([Intel XE#482]) +3 other tests dmesg-fail
   [9]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_9892/bat-pvc-2/igt@xe_evict@evict-small-external-cm.html

  * igt@xe_guc_pc@freq_fixed_idle:
    - bat-pvc-2:          NOTRUN -> [SKIP][10] ([Intel XE#533]) +1 other test skip
   [10]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_9892/bat-pvc-2/igt@xe_guc_pc@freq_fixed_idle.html

  * igt@xe_huc_copy@huc_copy:
    - bat-pvc-2:          NOTRUN -> [SKIP][11] ([Intel XE#255])
   [11]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_9892/bat-pvc-2/igt@xe_huc_copy@huc_copy.html

  * igt@xe_intel_bb@render:
    - bat-pvc-2:          NOTRUN -> [SKIP][12] ([Intel XE#532])
   [12]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_9892/bat-pvc-2/igt@xe_intel_bb@render.html

  * igt@xe_module_load@load:
    - bat-pvc-2:          NOTRUN -> [SKIP][13] ([Intel XE#378])
   [13]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_9892/bat-pvc-2/igt@xe_module_load@load.html

  * igt@xe_pm_residency@gt-c6-on-idle:
    - bat-pvc-2:          NOTRUN -> [SKIP][14] ([Intel XE#531])
   [14]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_9892/bat-pvc-2/igt@xe_pm_residency@gt-c6-on-idle.html

  
#### Possible fixes ####

  * igt@kms_flip@basic-flip-vs-wf_vblank@b-edp1:
    - bat-adlp-7:         [FAIL][15] ([Intel XE#480]) -> [PASS][16] +1 other test pass
   [15]: https://intel-gfx-ci.01.org/tree/intel-xe/IGT_7506/bat-adlp-7/igt@kms_flip@basic-flip-vs-wf_vblank@b-edp1.html
   [16]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_9892/bat-adlp-7/igt@kms_flip@basic-flip-vs-wf_vblank@b-edp1.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [Intel XE#255]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/255
  [Intel XE#275]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/275
  [Intel XE#378]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/378
  [Intel XE#389]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/389
  [Intel XE#480]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/480
  [Intel XE#482]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/482
  [Intel XE#524]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/524
  [Intel XE#531]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/531
  [Intel XE#532]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/532
  [Intel XE#533]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/533
  [Intel XE#535]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/535
  [Intel XE#536]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/536
  [Intel XE#537]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/537
  [Intel XE#538]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/538
  [Intel XE#539]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/539
  [Intel XE#540]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/540
  [Intel XE#541]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/541


Build changes
-------------

  * IGT: IGT_7506 -> IGTPW_9892
  * Linux: xe-397-bce60b0ff2937cb2ea51841a479bc1a2da65052b -> xe-399-7c58b58522cf124dd324fbf95d9dd838fac36bcb

  IGTPW_9892: 9892
  IGT_7506: 4fdf544bd0a38c5a100ef43c30171827e1c8c442 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  xe-397-bce60b0ff2937cb2ea51841a479bc1a2da65052b: bce60b0ff2937cb2ea51841a479bc1a2da65052b
  xe-399-7c58b58522cf124dd324fbf95d9dd838fac36bcb: 7c58b58522cf124dd324fbf95d9dd838fac36bcb

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_9892/index.html

[-- Attachment #2: Type: text/html, Size: 7066 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [igt-dev] ✗ Fi.CI.BAT: failure for benchmarks/gem_wsim: added basic xe support (rev5)
  2023-09-28 17:45 [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
                   ` (17 preceding siblings ...)
  2023-09-28 18:59 ` [igt-dev] ✓ CI.xeBAT: success for benchmarks/gem_wsim: added basic xe support (rev5) Patchwork
@ 2023-09-28 19:10 ` Patchwork
  18 siblings, 0 replies; 40+ messages in thread
From: Patchwork @ 2023-09-28 19:10 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: igt-dev

[-- Attachment #1: Type: text/plain, Size: 15446 bytes --]

== Series Details ==

Series: benchmarks/gem_wsim: added basic xe support (rev5)
URL   : https://patchwork.freedesktop.org/series/122920/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_13689 -> IGTPW_9892
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with IGTPW_9892 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in IGTPW_9892, please notify your bug team (lgci.bug.filing@intel.com) to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/index.html

Participating hosts (34 -> 40)
------------------------------

  Additional (8): fi-kbl-soraka bat-kbl-2 bat-dg2-8 fi-cfl-8700k fi-apl-guc fi-kbl-guc fi-ivb-3770 fi-skl-6600u 
  Missing    (2): fi-snb-2520m fi-pnv-d510 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in IGTPW_9892:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@sanitycheck:
    - bat-dg1-5:          [PASS][1] -> [ABORT][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13689/bat-dg1-5/igt@i915_selftest@live@sanitycheck.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-dg1-5/igt@i915_selftest@live@sanitycheck.html

  
Known issues
------------

  Here are the changes found in IGTPW_9892 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@fbdev@info:
    - bat-kbl-2:          NOTRUN -> [SKIP][3] ([fdo#109271] / [i915#1849])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-kbl-2/igt@fbdev@info.html
    - fi-kbl-guc:         NOTRUN -> [SKIP][4] ([fdo#109271] / [i915#1849])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/fi-kbl-guc/igt@fbdev@info.html

  * igt@gem_huc_copy@huc-copy:
    - fi-cfl-8700k:       NOTRUN -> [SKIP][5] ([fdo#109271] / [i915#2190])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/fi-cfl-8700k/igt@gem_huc_copy@huc-copy.html
    - fi-skl-6600u:       NOTRUN -> [SKIP][6] ([fdo#109271] / [i915#2190])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/fi-skl-6600u/igt@gem_huc_copy@huc-copy.html
    - fi-kbl-soraka:      NOTRUN -> [SKIP][7] ([fdo#109271] / [i915#2190])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/fi-kbl-soraka/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@basic:
    - fi-apl-guc:         NOTRUN -> [SKIP][8] ([fdo#109271] / [i915#4613]) +3 other tests skip
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/fi-apl-guc/igt@gem_lmem_swapping@basic.html
    - fi-kbl-soraka:      NOTRUN -> [SKIP][9] ([fdo#109271] / [i915#4613]) +3 other tests skip
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/fi-kbl-soraka/igt@gem_lmem_swapping@basic.html
    - fi-cfl-8700k:       NOTRUN -> [SKIP][10] ([fdo#109271] / [i915#4613]) +3 other tests skip
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/fi-cfl-8700k/igt@gem_lmem_swapping@basic.html
    - fi-kbl-guc:         NOTRUN -> [SKIP][11] ([fdo#109271] / [i915#4613]) +3 other tests skip
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/fi-kbl-guc/igt@gem_lmem_swapping@basic.html

  * igt@gem_lmem_swapping@parallel-random-engines:
    - bat-kbl-2:          NOTRUN -> [SKIP][12] ([fdo#109271]) +39 other tests skip
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-kbl-2/igt@gem_lmem_swapping@parallel-random-engines.html

  * igt@gem_lmem_swapping@random-engines:
    - fi-skl-6600u:       NOTRUN -> [SKIP][13] ([fdo#109271] / [i915#4613]) +3 other tests skip
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/fi-skl-6600u/igt@gem_lmem_swapping@random-engines.html

  * igt@gem_mmap@basic:
    - bat-dg2-8:          NOTRUN -> [SKIP][14] ([i915#4083])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-dg2-8/igt@gem_mmap@basic.html

  * igt@gem_render_tiled_blits@basic:
    - bat-dg2-8:          NOTRUN -> [SKIP][15] ([i915#4079]) +1 other test skip
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-dg2-8/igt@gem_render_tiled_blits@basic.html

  * igt@gem_tiled_fence_blits@basic:
    - bat-dg2-8:          NOTRUN -> [SKIP][16] ([i915#4077]) +2 other tests skip
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-dg2-8/igt@gem_tiled_fence_blits@basic.html

  * igt@i915_pm_rps@basic-api:
    - bat-dg2-8:          NOTRUN -> [SKIP][17] ([i915#6621])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-dg2-8/igt@i915_pm_rps@basic-api.html

  * igt@i915_selftest@live@gt_pm:
    - fi-kbl-soraka:      NOTRUN -> [DMESG-FAIL][18] ([i915#1886] / [i915#7913])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/fi-kbl-soraka/igt@i915_selftest@live@gt_pm.html

  * igt@i915_selftest@live@requests:
    - bat-mtlp-8:         [PASS][19] -> [ABORT][20] ([i915#9414])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13689/bat-mtlp-8/igt@i915_selftest@live@requests.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-mtlp-8/igt@i915_selftest@live@requests.html

  * igt@i915_suspend@basic-s3-without-i915:
    - bat-dg2-8:          NOTRUN -> [SKIP][21] ([i915#6645])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-dg2-8/igt@i915_suspend@basic-s3-without-i915.html

  * igt@kms_addfb_basic@addfb25-x-tiled-mismatch-legacy:
    - bat-dg2-8:          NOTRUN -> [SKIP][22] ([i915#4212]) +6 other tests skip
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-dg2-8/igt@kms_addfb_basic@addfb25-x-tiled-mismatch-legacy.html

  * igt@kms_addfb_basic@addfb25-y-tiled-small-legacy:
    - bat-dg2-8:          NOTRUN -> [SKIP][23] ([i915#5190])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-dg2-8/igt@kms_addfb_basic@addfb25-y-tiled-small-legacy.html

  * igt@kms_addfb_basic@basic-y-tiled-legacy:
    - bat-dg2-8:          NOTRUN -> [SKIP][24] ([i915#4215] / [i915#5190])
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-dg2-8/igt@kms_addfb_basic@basic-y-tiled-legacy.html

  * igt@kms_addfb_basic@tile-pitch-mismatch:
    - bat-dg2-8:          NOTRUN -> [SKIP][25] ([i915#4212] / [i915#5608])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-dg2-8/igt@kms_addfb_basic@tile-pitch-mismatch.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][26] ([fdo#109271]) +9 other tests skip
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/fi-kbl-soraka/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy:
    - bat-dg2-8:          NOTRUN -> [SKIP][27] ([i915#4103] / [i915#4213] / [i915#5608]) +1 other test skip
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-dg2-8/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy.html

  * igt@kms_cursor_legacy@basic-flip-before-cursor-legacy:
    - fi-kbl-guc:         NOTRUN -> [SKIP][28] ([fdo#109271] / [i915#1845]) +8 other tests skip
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/fi-kbl-guc/igt@kms_cursor_legacy@basic-flip-before-cursor-legacy.html

  * igt@kms_dsc@dsc-basic:
    - fi-skl-6600u:       NOTRUN -> [SKIP][29] ([fdo#109271]) +8 other tests skip
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/fi-skl-6600u/igt@kms_dsc@dsc-basic.html

  * igt@kms_force_connector_basic@force-load-detect:
    - fi-cfl-8700k:       NOTRUN -> [SKIP][30] ([fdo#109271]) +10 other tests skip
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/fi-cfl-8700k/igt@kms_force_connector_basic@force-load-detect.html
    - bat-dg2-8:          NOTRUN -> [SKIP][31] ([fdo#109285])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-dg2-8/igt@kms_force_connector_basic@force-load-detect.html

  * igt@kms_force_connector_basic@prune-stale-modes:
    - bat-dg2-8:          NOTRUN -> [SKIP][32] ([i915#5274])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-dg2-8/igt@kms_force_connector_basic@prune-stale-modes.html

  * igt@kms_hdmi_inject@inject-audio:
    - fi-apl-guc:         NOTRUN -> [SKIP][33] ([fdo#109271]) +16 other tests skip
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/fi-apl-guc/igt@kms_hdmi_inject@inject-audio.html
    - fi-kbl-guc:         NOTRUN -> [FAIL][34] ([IGT#3])
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/fi-kbl-guc/igt@kms_hdmi_inject@inject-audio.html

  * igt@kms_pipe_crc_basic@suspend-read-crc@pipe-c-vga-1:
    - fi-ivb-3770:        NOTRUN -> [DMESG-WARN][35] ([i915#8841]) +6 other tests dmesg-warn
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/fi-ivb-3770/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-c-vga-1.html

  * igt@kms_psr@cursor_plane_move:
    - fi-ivb-3770:        NOTRUN -> [SKIP][36] ([fdo#109271]) +21 other tests skip
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/fi-ivb-3770/igt@kms_psr@cursor_plane_move.html
    - bat-dg2-8:          NOTRUN -> [SKIP][37] ([i915#1072]) +3 other tests skip
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-dg2-8/igt@kms_psr@cursor_plane_move.html
    - fi-kbl-guc:         NOTRUN -> [SKIP][38] ([fdo#109271]) +25 other tests skip
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/fi-kbl-guc/igt@kms_psr@cursor_plane_move.html

  * igt@kms_setmode@basic-clone-single-crtc:
    - bat-dg2-8:          NOTRUN -> [SKIP][39] ([i915#3555])
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-dg2-8/igt@kms_setmode@basic-clone-single-crtc.html

  * igt@prime_vgem@basic-fence-flip:
    - bat-dg2-8:          NOTRUN -> [SKIP][40] ([i915#3708])
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-dg2-8/igt@prime_vgem@basic-fence-flip.html

  * igt@prime_vgem@basic-gtt:
    - bat-dg2-8:          NOTRUN -> [SKIP][41] ([i915#3708] / [i915#4077]) +1 other test skip
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-dg2-8/igt@prime_vgem@basic-gtt.html

  * igt@prime_vgem@basic-write:
    - bat-dg2-8:          NOTRUN -> [SKIP][42] ([i915#3291] / [i915#3708]) +2 other tests skip
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/bat-dg2-8/igt@prime_vgem@basic-write.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [IGT#3]: https://gitlab.freedesktop.org/drm/igt-gpu-tools/issues/3
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#1849]: https://gitlab.freedesktop.org/drm/intel/issues/1849
  [i915#1886]: https://gitlab.freedesktop.org/drm/intel/issues/1886
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#3291]: https://gitlab.freedesktop.org/drm/intel/issues/3291
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212
  [i915#4213]: https://gitlab.freedesktop.org/drm/intel/issues/4213
  [i915#4215]: https://gitlab.freedesktop.org/drm/intel/issues/4215
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#5190]: https://gitlab.freedesktop.org/drm/intel/issues/5190
  [i915#5274]: https://gitlab.freedesktop.org/drm/intel/issues/5274
  [i915#5354]: https://gitlab.freedesktop.org/drm/intel/issues/5354
  [i915#5608]: https://gitlab.freedesktop.org/drm/intel/issues/5608
  [i915#6621]: https://gitlab.freedesktop.org/drm/intel/issues/6621
  [i915#6645]: https://gitlab.freedesktop.org/drm/intel/issues/6645
  [i915#7913]: https://gitlab.freedesktop.org/drm/intel/issues/7913
  [i915#8841]: https://gitlab.freedesktop.org/drm/intel/issues/8841
  [i915#9414]: https://gitlab.freedesktop.org/drm/intel/issues/9414


Build changes
-------------

  * CI: CI-20190529 -> None
  * IGT: IGT_7506 -> IGTPW_9892

  CI-20190529: 20190529
  CI_DRM_13689: 5933eb0a0717a28e668d33e01a707311d31cebbb @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_9892: 9892
  IGT_7506: 4fdf544bd0a38c5a100ef43c30171827e1c8c442 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git


Testlist changes
----------------

+igt@kms_universal_plane@cursor-fb-leak-pipe-a
+igt@kms_universal_plane@cursor-fb-leak-pipe-b
+igt@kms_universal_plane@cursor-fb-leak-pipe-c
+igt@kms_universal_plane@cursor-fb-leak-pipe-d
+igt@kms_universal_plane@cursor-fb-leak-pipe-e
+igt@kms_universal_plane@cursor-fb-leak-pipe-f
+igt@kms_universal_plane@cursor-fb-leak-pipe-g
+igt@kms_universal_plane@cursor-fb-leak-pipe-h
+igt@kms_universal_plane@disable-primary-vs-flip-pipe-a
+igt@kms_universal_plane@disable-primary-vs-flip-pipe-b
+igt@kms_universal_plane@disable-primary-vs-flip-pipe-c
+igt@kms_universal_plane@disable-primary-vs-flip-pipe-d
+igt@kms_universal_plane@disable-primary-vs-flip-pipe-e
+igt@kms_universal_plane@disable-primary-vs-flip-pipe-f
+igt@kms_universal_plane@disable-primary-vs-flip-pipe-g
+igt@kms_universal_plane@disable-primary-vs-flip-pipe-h
+igt@kms_universal_plane@universal-plane-pageflip-windowed-pipe-a
+igt@kms_universal_plane@universal-plane-pageflip-windowed-pipe-b
+igt@kms_universal_plane@universal-plane-pageflip-windowed-pipe-c
+igt@kms_universal_plane@universal-plane-pageflip-windowed-pipe-d
+igt@kms_universal_plane@universal-plane-pageflip-windowed-pipe-e
+igt@kms_universal_plane@universal-plane-pageflip-windowed-pipe-f
+igt@kms_universal_plane@universal-plane-pageflip-windowed-pipe-g
+igt@kms_universal_plane@universal-plane-pageflip-windowed-pipe-h
+igt@kms_universal_plane@universal-plane-pipe-a-functional
+igt@kms_universal_plane@universal-plane-pipe-a-sanity
+igt@kms_universal_plane@universal-plane-pipe-b-functional
+igt@kms_universal_plane@universal-plane-pipe-b-sanity
+igt@kms_universal_plane@universal-plane-pipe-c-functional
+igt@kms_universal_plane@universal-plane-pipe-c-sanity
+igt@kms_universal_plane@universal-plane-pipe-d-functional
+igt@kms_universal_plane@universal-plane-pipe-d-sanity
+igt@kms_universal_plane@universal-plane-pipe-e-functional
+igt@kms_universal_plane@universal-plane-pipe-e-sanity
+igt@kms_universal_plane@universal-plane-pipe-f-functional
+igt@kms_universal_plane@universal-plane-pipe-f-sanity
+igt@kms_universal_plane@universal-plane-pipe-g-functional
+igt@kms_universal_plane@universal-plane-pipe-g-sanity
+igt@kms_universal_plane@universal-plane-pipe-h-functional
+igt@kms_universal_plane@universal-plane-pipe-h-sanity
-igt@kms_universal_plane@cursor-fb-leak
-igt@kms_universal_plane@disable-primary-vs-flip
-igt@kms_universal_plane@universal-plane-functional
-igt@kms_universal_plane@universal-plane-pageflip-windowed
-igt@kms_universal_plane@universal-plane-sanity

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9892/index.html

[-- Attachment #2: Type: text/html, Size: 19163 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2023-10-05 12:30 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-28 17:45 [igt-dev] [PATCH i-g-t 00/17] [RFC] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 01/17] lib/igt_device_scan: Xe get integrated/discrete card functions Marcin Bernatowicz
2023-10-05 12:17   ` Kamil Konieczny
2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 02/17] benchmarks/gem_wsim: reposition the unbound duration boolean Marcin Bernatowicz
2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 03/17] benchmarks/gem_wsim: fix scaling of period steps Marcin Bernatowicz
2023-09-29  8:01   ` Tvrtko Ursulin
2023-09-29  9:31     ` Bernatowicz, Marcin
2023-09-29 10:52       ` Tvrtko Ursulin
2023-09-29 11:30         ` Bernatowicz, Marcin
2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 04/17] benchmarks/gem_wsim: fix duration range check Marcin Bernatowicz
2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 05/17] benchmarks/gem_wsim: extract duration parsing code to new function Marcin Bernatowicz
2023-09-29  8:08   ` Tvrtko Ursulin
2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 06/17] benchmarks/gem_wsim: fix conflicting SSEU #define and enum Marcin Bernatowicz
2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 07/17] benchmarks/gem_wsim: cleanups Marcin Bernatowicz
2023-09-29  8:09   ` Tvrtko Ursulin
2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 08/17] benchmarks/gem_wsim: reposition repeat_start variable Marcin Bernatowicz
2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 09/17] benchmarks/gem_wsim: use lib code to query engines Marcin Bernatowicz
2023-09-29  8:11   ` Tvrtko Ursulin
2023-09-29 10:35     ` Bernatowicz, Marcin
2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 10/17] benchmarks/gem_wsim: allow comments in workload description files Marcin Bernatowicz
2023-09-29  8:28   ` Tvrtko Ursulin
2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 11/17] benchmarks/gem_wsim: introduce w_step_sync function Marcin Bernatowicz
2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 12/17] benchmarks/gem_wsim: extract allocate and prepare contexts code to new functions Marcin Bernatowicz
2023-09-29  9:26   ` Tvrtko Ursulin
2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 13/17] benchmarks/gem_wsim: extract prepare working sets code to new function Marcin Bernatowicz
2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 14/17] benchmarks/gem_wsim: group i915 fields Marcin Bernatowicz
2023-09-29  9:33   ` Tvrtko Ursulin
2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 15/17] benchmarks/gem_wsim: introduce bb_size in w_step Marcin Bernatowicz
2023-09-29  9:35   ` Tvrtko Ursulin
2023-09-29 10:08     ` Bernatowicz, Marcin
2023-09-29 10:49       ` Tvrtko Ursulin
2023-10-05 10:52         ` Bernatowicz, Marcin
2023-10-05 12:30           ` Tvrtko Ursulin
2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 16/17] benchmarks/gem_wsim: for_each_dep macro Marcin Bernatowicz
2023-09-29  9:37   ` Tvrtko Ursulin
2023-09-28 17:45 ` [igt-dev] [PATCH i-g-t 17/17] benchmarks/gem_wsim: added basic xe support Marcin Bernatowicz
2023-09-29 10:45   ` Tvrtko Ursulin
2023-09-29 15:53     ` Bernatowicz, Marcin
2023-09-28 18:59 ` [igt-dev] ✓ CI.xeBAT: success for benchmarks/gem_wsim: added basic xe support (rev5) Patchwork
2023-09-28 19:10 ` [igt-dev] ✗ Fi.CI.BAT: failure " Patchwork

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox