public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/4] Disable automatic load CCS load balancing
@ 2024-03-08 20:22 Andi Shyti
  2024-03-08 20:22 ` [PATCH v5 1/4] drm/i915/gt: Disable HW load balancing for CCS Andi Shyti
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Andi Shyti @ 2024-03-08 20:22 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Chris Wilson, Joonas Lahtinen, Matt Roper, John Harrison, stable,
	Andi Shyti, Andi Shyti, Tvrtko Ursulin

Hi,

this series does basically two things:

1. Disables automatic load balancing as adviced by the hardware
   workaround.

2. Assigns all the CCS slices to one single user engine. The user
   will then be able to query only one CCS engine

In this v5 I have created a new file, gt/intel_gt_ccs_mode.c
where I added the intel_gt_apply_ccs_mode(). In the upcoming
patches, this file will contain the implementation for dynamic
CCS mode setting.

I saw also necessary the creation of a new mechanism fro looping
through engines in order to exclude the CCS's that are merged
into one single stream. It's called for_each_available_engine()
and I started using it in the hangcheck sefltest. I might still
need to iterate a few CI runs in order to cover more cases when
this call is needed.

I'm using here the "Requires: " tag, but I'm not sure the commit
id will be valid, on the other hand, I don't know what commit id
I should use.

Thanks Tvrtko, Matt, John and Joonas for your reviews!

Andi

Changelog
=========
v4 -> v5
 - Use the workaround framework to do all the CCS balancing
   settings in order to always apply the modes also when the
   engine resets. Put everything in its own specific function to
   be executed for the first CCS engine encountered. (Thanks
   Matt)
 - Calculate the CCS ID for the CCS mode as the first available
   CCS among all the engines (Thanks Matt)
 - create the intel_gt_ccs_mode.c function to host the CCS
   configuration. We will have it ready for the next series.
 - Fix a selftest that was failing because could not set CCS2.
 - Add the for_each_available_engine() macro to exclude CCS1+ and
   start using it in the hangcheck selftest.

v3 -> v4
 - Reword correctly the comment in the workaround
 - Fix a buffer overflow (Thanks Joonas)
 - Handle properly the fused engines when setting the CCS mode.

v2 -> v3
 - Simplified the algorithm for creating the list of the exported
   uabi engines. (Patch 1) (Thanks, Tvrtko)
 - Consider the fused engines when creating the uabi engine list
   (Patch 2) (Thanks, Matt)
 - Patch 4 now uses a the refactoring from patch 1, in a cleaner
   outcome.

v1 -> v2
 - In Patch 1 use the correct workaround number (thanks Matt).
 - In Patch 2 do not add the extra CCS engines to the exposed
   UABI engine list and adapt the engine counting accordingly
   (thanks Tvrtko).
 - Reword the commit of Patch 2 (thanks John).

Andi Shyti (4):
  drm/i915/gt: Disable HW load balancing for CCS
  drm/i915/gt: Refactor uabi engine class/instance list creation
  drm/i915/gt: Disable tests for CCS engines beyond the first
  drm/i915/gt: Enable only one CCS for compute workload

 drivers/gpu/drm/i915/Makefile                |  1 +
 drivers/gpu/drm/i915/gt/intel_engine_user.c  | 40 ++++++++++++++------
 drivers/gpu/drm/i915/gt/intel_gt.h           | 13 +++++++
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c  | 39 +++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h  | 13 +++++++
 drivers/gpu/drm/i915/gt/intel_gt_regs.h      |  6 +++
 drivers/gpu/drm/i915/gt/intel_workarounds.c  | 30 ++++++++++++++-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 22 +++++------
 8 files changed, 139 insertions(+), 25 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
 create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h

-- 
2.43.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v5 1/4] drm/i915/gt: Disable HW load balancing for CCS
  2024-03-08 20:22 [PATCH v5 0/4] Disable automatic load CCS load balancing Andi Shyti
@ 2024-03-08 20:22 ` Andi Shyti
  2024-03-12 16:58   ` Matt Roper
  2024-03-08 20:22 ` [PATCH v5 2/4] drm/i915/gt: Refactor uabi engine class/instance list creation Andi Shyti
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Andi Shyti @ 2024-03-08 20:22 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Chris Wilson, Joonas Lahtinen, Matt Roper, John Harrison, stable,
	Andi Shyti, Andi Shyti, Tvrtko Ursulin

The hardware should not dynamically balance the load between CCS
engines. Wa_14019159160 recommends disabling it across all
platforms.

Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: <stable@vger.kernel.org> # v6.2+
---
 drivers/gpu/drm/i915/gt/intel_gt_regs.h     |  1 +
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 23 +++++++++++++++++++--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index 50962cfd1353..cf709f6c05ae 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1478,6 +1478,7 @@
 
 #define GEN12_RCU_MODE				_MMIO(0x14800)
 #define   GEN12_RCU_MODE_CCS_ENABLE		REG_BIT(0)
+#define   XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE	REG_BIT(1)
 
 #define CHV_FUSE_GT				_MMIO(VLV_GUNIT_BASE + 0x2168)
 #define   CHV_FGT_DISABLE_SS0			(1 << 10)
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 25413809b9dc..4865eb5ca9c9 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -51,7 +51,8 @@
  *   registers belonging to BCS, VCS or VECS should be implemented in
  *   xcs_engine_wa_init(). Workarounds for registers not belonging to a specific
  *   engine's MMIO range but that are part of of the common RCS/CCS reset domain
- *   should be implemented in general_render_compute_wa_init().
+ *   should be implemented in general_render_compute_wa_init(). The settings
+ *   about the CCS load balancing should be added in ccs_engine_wa_mode().
  *
  * - GT workarounds: the list of these WAs is applied whenever these registers
  *   revert to their default values: on GPU reset, suspend/resume [1]_, etc.
@@ -2854,6 +2855,22 @@ add_render_compute_tuning_settings(struct intel_gt *gt,
 		wa_write_clr(wal, GEN8_GARBCNTL, GEN12_BUS_HASH_CTL_BIT_EXC);
 }
 
+static void ccs_engine_wa_mode(struct intel_engine_cs *engine, struct i915_wa_list *wal)
+{
+	struct intel_gt *gt = engine->gt;
+
+	if (!IS_DG2(gt->i915))
+		return;
+
+	/*
+	 * Wa_14019159160: This workaround, along with others, leads to
+	 * significant challenges in utilizing load balancing among the
+	 * CCS slices. Consequently, an architectural decision has been
+	 * made to completely disable automatic CCS load balancing.
+	 */
+	wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE);
+}
+
 /*
  * The workarounds in this function apply to shared registers in
  * the general render reset domain that aren't tied to a
@@ -3004,8 +3021,10 @@ engine_init_workarounds(struct intel_engine_cs *engine, struct i915_wa_list *wal
 	 * to a single RCS/CCS engine's workaround list since
 	 * they're reset as part of the general render domain reset.
 	 */
-	if (engine->flags & I915_ENGINE_FIRST_RENDER_COMPUTE)
+	if (engine->flags & I915_ENGINE_FIRST_RENDER_COMPUTE) {
 		general_render_compute_wa_init(engine, wal);
+		ccs_engine_wa_mode(engine, wal);
+	}
 
 	if (engine->class == COMPUTE_CLASS)
 		ccs_engine_wa_init(engine, wal);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 2/4] drm/i915/gt: Refactor uabi engine class/instance list creation
  2024-03-08 20:22 [PATCH v5 0/4] Disable automatic load CCS load balancing Andi Shyti
  2024-03-08 20:22 ` [PATCH v5 1/4] drm/i915/gt: Disable HW load balancing for CCS Andi Shyti
@ 2024-03-08 20:22 ` Andi Shyti
  2024-03-12 17:08   ` Matt Roper
  2024-03-08 20:22 ` [PATCH v5 3/4] drm/i915/gt: Disable tests for CCS engines beyond the first Andi Shyti
  2024-03-08 20:22 ` [PATCH v5 4/4] drm/i915/gt: Enable only one CCS for compute workload Andi Shyti
  3 siblings, 1 reply; 10+ messages in thread
From: Andi Shyti @ 2024-03-08 20:22 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Chris Wilson, Joonas Lahtinen, Matt Roper, John Harrison, stable,
	Andi Shyti, Andi Shyti, Tvrtko Ursulin

For the upcoming changes we need a cleaner way to build the list
of uabi engines.

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.2+
---
 drivers/gpu/drm/i915/gt/intel_engine_user.c | 29 ++++++++++++---------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 833987015b8b..11cc06c0c785 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -203,7 +203,7 @@ static void engine_rename(struct intel_engine_cs *engine, const char *name, u16
 
 void intel_engines_driver_register(struct drm_i915_private *i915)
 {
-	u16 name_instance, other_instance = 0;
+	u16 class_instance[I915_LAST_UABI_ENGINE_CLASS + 2] = { };
 	struct legacy_ring ring = {};
 	struct list_head *it, *next;
 	struct rb_node **p, *prev;
@@ -214,6 +214,8 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
 	prev = NULL;
 	p = &i915->uabi_engines.rb_node;
 	list_for_each_safe(it, next, &engines) {
+		u16 uabi_class;
+
 		struct intel_engine_cs *engine =
 			container_of(it, typeof(*engine), uabi_list);
 
@@ -222,15 +224,14 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
 
 		GEM_BUG_ON(engine->class >= ARRAY_SIZE(uabi_classes));
 		engine->uabi_class = uabi_classes[engine->class];
-		if (engine->uabi_class == I915_NO_UABI_CLASS) {
-			name_instance = other_instance++;
-		} else {
-			GEM_BUG_ON(engine->uabi_class >=
-				   ARRAY_SIZE(i915->engine_uabi_class_count));
-			name_instance =
-				i915->engine_uabi_class_count[engine->uabi_class]++;
-		}
-		engine->uabi_instance = name_instance;
+
+		if (engine->uabi_class == I915_NO_UABI_CLASS)
+			uabi_class = I915_LAST_UABI_ENGINE_CLASS + 1;
+		else
+			uabi_class = engine->uabi_class;
+
+		GEM_BUG_ON(uabi_class >= ARRAY_SIZE(class_instance));
+		engine->uabi_instance = class_instance[uabi_class]++;
 
 		/*
 		 * Replace the internal name with the final user and log facing
@@ -238,11 +239,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
 		 */
 		engine_rename(engine,
 			      intel_engine_class_repr(engine->class),
-			      name_instance);
+			      engine->uabi_instance);
 
-		if (engine->uabi_class == I915_NO_UABI_CLASS)
+		if (uabi_class > I915_LAST_UABI_ENGINE_CLASS)
 			continue;
 
+		GEM_BUG_ON(uabi_class >=
+			   ARRAY_SIZE(i915->engine_uabi_class_count));
+		i915->engine_uabi_class_count[uabi_class]++;
+
 		rb_link_node(&engine->uabi_node, prev, p);
 		rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 3/4] drm/i915/gt: Disable tests for CCS engines beyond the first
  2024-03-08 20:22 [PATCH v5 0/4] Disable automatic load CCS load balancing Andi Shyti
  2024-03-08 20:22 ` [PATCH v5 1/4] drm/i915/gt: Disable HW load balancing for CCS Andi Shyti
  2024-03-08 20:22 ` [PATCH v5 2/4] drm/i915/gt: Refactor uabi engine class/instance list creation Andi Shyti
@ 2024-03-08 20:22 ` Andi Shyti
  2024-03-08 20:25   ` kernel test robot
  2024-03-08 20:22 ` [PATCH v5 4/4] drm/i915/gt: Enable only one CCS for compute workload Andi Shyti
  3 siblings, 1 reply; 10+ messages in thread
From: Andi Shyti @ 2024-03-08 20:22 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Chris Wilson, Joonas Lahtinen, Matt Roper, John Harrison, stable,
	Andi Shyti, Andi Shyti, Tvrtko Ursulin

In anticipation of the upcoming commit that will operate with
only one CCS stream, when more than one CCS slice is present,
create a new for_each_available_engine() that excludes CCS
engines beyond the forst. Begin using it in the hangcheck
selftest.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt.h           | 13 ++++++++++++
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 22 ++++++++++----------
 2 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index 6e7cab60834c..d3ee7aee9c7c 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -188,6 +188,19 @@ void intel_gt_release_all(struct drm_i915_private *i915);
 	     (id__)++) \
 		for_each_if ((engine__) = (gt__)->engine[(id__)])
 
+/*
+ * Simple iterator over all initialised engines that
+ * takes into account CCS fixed mode load balancing
+ */
+#define for_each_available_engine(engine__, gt__, id__) \
+	for ((id__) = 0; \
+	     (id__) < I915_NUM_ENGINES; \
+	     (id__)++) \
+		for_each_if \
+			(((engine__) = (gt__)->engine[(id__)]) && \
+			 !(engine__->class == COMPUTE_CLASS && \
+			   engine__->instance))
+
 /* Iterator over subset of engines selected by mask */
 #define for_each_engine_masked(engine__, gt__, mask__, tmp__) \
 	for ((tmp__) = (mask__) & (gt__)->info.engine_mask; \
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 9ce8ff1c04fe..f1e684987ddb 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -296,7 +296,7 @@ static int igt_hang_sanitycheck(void *arg)
 	if (err)
 		return err;
 
-	for_each_engine(engine, gt, id) {
+	for_each_available_engine(engine, gt, id) {
 		struct intel_wedge_me w;
 		long timeout;
 
@@ -360,7 +360,7 @@ static int igt_reset_nop(void *arg)
 	reset_count = i915_reset_count(global);
 	count = 0;
 	do {
-		for_each_engine(engine, gt, id) {
+		for_each_available_engine(engine, gt, id) {
 			struct intel_context *ce;
 			int i;
 
@@ -433,7 +433,7 @@ static int igt_reset_nop_engine(void *arg)
 	if (!intel_has_reset_engine(gt))
 		return 0;
 
-	for_each_engine(engine, gt, id) {
+	for_each_available_engine(engine, gt, id) {
 		unsigned int reset_count, reset_engine_count, count;
 		struct intel_context *ce;
 		IGT_TIMEOUT(end_time);
@@ -553,7 +553,7 @@ static int igt_reset_fail_engine(void *arg)
 	if (!intel_has_reset_engine(gt))
 		return 0;
 
-	for_each_engine(engine, gt, id) {
+	for_each_available_engine(engine, gt, id) {
 		unsigned int count;
 		struct intel_context *ce;
 		IGT_TIMEOUT(end_time);
@@ -700,7 +700,7 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
 			return err;
 	}
 
-	for_each_engine(engine, gt, id) {
+	for_each_available_engine(engine, gt, id) {
 		unsigned int reset_count, reset_engine_count;
 		unsigned long count;
 		bool using_guc = intel_engine_uses_guc(engine);
@@ -990,7 +990,7 @@ static int __igt_reset_engines(struct intel_gt *gt,
 	if (!threads)
 		return -ENOMEM;
 
-	for_each_engine(engine, gt, id) {
+	for_each_available_engine(engine, gt, id) {
 		unsigned long device = i915_reset_count(global);
 		unsigned long count = 0, reported;
 		bool using_guc = intel_engine_uses_guc(engine);
@@ -1010,7 +1010,7 @@ static int __igt_reset_engines(struct intel_gt *gt,
 		}
 
 		memset(threads, 0, sizeof(*threads) * I915_NUM_ENGINES);
-		for_each_engine(other, gt, tmp) {
+		for_each_available_engine(other, gt, tmp) {
 			struct kthread_worker *worker;
 
 			threads[tmp].resets =
@@ -1185,7 +1185,7 @@ static int __igt_reset_engines(struct intel_gt *gt,
 		}
 
 unwind:
-		for_each_engine(other, gt, tmp) {
+		for_each_available_engine(other, gt, tmp) {
 			int ret;
 
 			if (!threads[tmp].worker)
@@ -1621,7 +1621,7 @@ static int wait_for_others(struct intel_gt *gt,
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	for_each_engine(engine, gt, id) {
+	for_each_available_engine(engine, gt, id) {
 		if (engine == exclude)
 			continue;
 
@@ -1649,7 +1649,7 @@ static int igt_reset_queue(void *arg)
 	if (err)
 		goto unlock;
 
-	for_each_engine(engine, gt, id) {
+	for_each_available_engine(engine, gt, id) {
 		struct intel_selftest_saved_policy saved;
 		struct i915_request *prev;
 		IGT_TIMEOUT(end_time);
@@ -1982,7 +1982,7 @@ static int igt_reset_engines_atomic(void *arg)
 		struct intel_engine_cs *engine;
 		enum intel_engine_id id;
 
-		for_each_engine(engine, gt, id) {
+		for_each_available_engine(engine, gt, id) {
 			err = igt_atomic_reset_engine(engine, p);
 			if (err)
 				goto out;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 4/4] drm/i915/gt: Enable only one CCS for compute workload
  2024-03-08 20:22 [PATCH v5 0/4] Disable automatic load CCS load balancing Andi Shyti
                   ` (2 preceding siblings ...)
  2024-03-08 20:22 ` [PATCH v5 3/4] drm/i915/gt: Disable tests for CCS engines beyond the first Andi Shyti
@ 2024-03-08 20:22 ` Andi Shyti
  3 siblings, 0 replies; 10+ messages in thread
From: Andi Shyti @ 2024-03-08 20:22 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Chris Wilson, Joonas Lahtinen, Matt Roper, John Harrison, stable,
	Andi Shyti, Andi Shyti, Tvrtko Ursulin

Enable only one CCS engine by default with all the compute sices
allocated to it.

While generating the list of UABI engines to be exposed to the
user, exclude any additional CCS engines beyond the first
instance.

This change can be tested with igt i915_query.

Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
Requires: 075e003a9e22 ("drm/i915/gt: Refactor uabi engine class/instance list creation")
Requires: 58b935268238 ("drm/i915/gt: Disable tests for CCS engines beyond the first")
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: <stable@vger.kernel.org> # v6.2+
---
 drivers/gpu/drm/i915/Makefile               |  1 +
 drivers/gpu/drm/i915/gt/intel_engine_user.c | 11 ++++++
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 39 +++++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 13 +++++++
 drivers/gpu/drm/i915/gt/intel_gt_regs.h     |  5 +++
 drivers/gpu/drm/i915/gt/intel_workarounds.c |  7 ++++
 6 files changed, 76 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
 create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 3ef6ed41e62b..a6885a1d41a1 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -118,6 +118,7 @@ gt-y += \
 	gt/intel_ggtt_fencing.o \
 	gt/intel_gt.o \
 	gt/intel_gt_buffer_pool.o \
+	gt/intel_gt_ccs_mode.o \
 	gt/intel_gt_clock_utils.o \
 	gt/intel_gt_debugfs.o \
 	gt/intel_gt_engines_debugfs.o \
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 11cc06c0c785..9ef1c4ce252d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -208,6 +208,7 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
 	struct list_head *it, *next;
 	struct rb_node **p, *prev;
 	LIST_HEAD(engines);
+	u16 uabi_ccs = 0;
 
 	sort_engines(i915, &engines);
 
@@ -244,6 +245,16 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
 		if (uabi_class > I915_LAST_UABI_ENGINE_CLASS)
 			continue;
 
+		/*
+		 * The load is balanced among all the available compute
+		 * slices. Expose only the first instance of the compute
+		 * engine.
+		 */
+		if (IS_DG2(i915) &&
+		    uabi_class == I915_ENGINE_CLASS_COMPUTE &&
+		    uabi_ccs++)
+			continue;
+
 		GEM_BUG_ON(uabi_class >=
 			   ARRAY_SIZE(i915->engine_uabi_class_count));
 		i915->engine_uabi_class_count[uabi_class]++;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
new file mode 100644
index 000000000000..044219c5960a
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#include "i915_drv.h"
+#include "intel_gt.h"
+#include "intel_gt_ccs_mode.h"
+#include "intel_gt_regs.h"
+
+void intel_gt_apply_ccs_mode(struct intel_gt *gt)
+{
+	int cslice;
+	u32 mode = 0;
+	int first_ccs = __ffs(CCS_MASK(gt));
+
+	if (!IS_DG2(gt->i915))
+		return;
+
+	/* Build the value for the fixed CCS load balancing */
+	for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
+		if (CCS_MASK(gt) & BIT(cslice))
+			/*
+			 * If available, assign the cslice
+			 * to the first available engine...
+			 */
+			mode |= XEHP_CCS_MODE_CSLICE(cslice, first_ccs);
+
+		else
+			/*
+			 * ... otherwise, mark the cslice as
+			 * unavailable if no CCS dispatches here
+			 */
+			mode |= XEHP_CCS_MODE_CSLICE(cslice,
+						     XEHP_CCS_MODE_CSLICE_MASK);
+	}
+
+	intel_uncore_write(gt->uncore, XEHP_CCS_MODE, mode);
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
new file mode 100644
index 000000000000..9e5549caeb26
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#ifndef __INTEL_GT_CCS_MODE_H__
+#define __INTEL_GT_CCS_MODE_H__
+
+struct intel_gt;
+
+void intel_gt_apply_ccs_mode(struct intel_gt *gt);
+
+#endif /* __INTEL_GT_CCS_MODE_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index cf709f6c05ae..8224dd99c7d7 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1480,6 +1480,11 @@
 #define   GEN12_RCU_MODE_CCS_ENABLE		REG_BIT(0)
 #define   XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE	REG_BIT(1)
 
+#define XEHP_CCS_MODE				_MMIO(0x14804)
+#define   XEHP_CCS_MODE_CSLICE_MASK		REG_GENMASK(2, 0) /* CCS0-3 + rsvd */
+#define   XEHP_CCS_MODE_CSLICE_WIDTH		ilog2(XEHP_CCS_MODE_CSLICE_MASK + 1)
+#define   XEHP_CCS_MODE_CSLICE(cslice, ccs)	(ccs << (cslice * XEHP_CCS_MODE_CSLICE_WIDTH))
+
 #define CHV_FUSE_GT				_MMIO(VLV_GUNIT_BASE + 0x2168)
 #define   CHV_FGT_DISABLE_SS0			(1 << 10)
 #define   CHV_FGT_DISABLE_SS1			(1 << 11)
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 4865eb5ca9c9..6ec3582c9735 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -10,6 +10,7 @@
 #include "intel_engine_regs.h"
 #include "intel_gpu_commands.h"
 #include "intel_gt.h"
+#include "intel_gt_ccs_mode.h"
 #include "intel_gt_mcr.h"
 #include "intel_gt_print.h"
 #include "intel_gt_regs.h"
@@ -2869,6 +2870,12 @@ static void ccs_engine_wa_mode(struct intel_engine_cs *engine, struct i915_wa_li
 	 * made to completely disable automatic CCS load balancing.
 	 */
 	wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE);
+
+	/*
+	 * After having disabled automatic load balancing we need to
+	 * assign all slices to a single CCS. We will call it CCS mode 1
+	 */
+	intel_gt_apply_ccs_mode(gt);
 }
 
 /*
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v5 3/4] drm/i915/gt: Disable tests for CCS engines beyond the first
  2024-03-08 20:22 ` [PATCH v5 3/4] drm/i915/gt: Disable tests for CCS engines beyond the first Andi Shyti
@ 2024-03-08 20:25   ` kernel test robot
  0 siblings, 0 replies; 10+ messages in thread
From: kernel test robot @ 2024-03-08 20:25 UTC (permalink / raw)
  To: Andi Shyti; +Cc: stable, oe-kbuild-all

Hi,

Thanks for your patch.

FYI: kernel test robot notices the stable kernel rule is not satisfied.

The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#option-1

Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree.
Subject: [PATCH v5 3/4] drm/i915/gt: Disable tests for CCS engines beyond the first
Link: https://lore.kernel.org/stable/20240308202223.406384-4-andi.shyti%40linux.intel.com

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5 1/4] drm/i915/gt: Disable HW load balancing for CCS
  2024-03-08 20:22 ` [PATCH v5 1/4] drm/i915/gt: Disable HW load balancing for CCS Andi Shyti
@ 2024-03-12 16:58   ` Matt Roper
  2024-03-12 20:37     ` Andi Shyti
  0 siblings, 1 reply; 10+ messages in thread
From: Matt Roper @ 2024-03-12 16:58 UTC (permalink / raw)
  To: Andi Shyti
  Cc: intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen,
	John Harrison, stable, Andi Shyti, Tvrtko Ursulin

On Fri, Mar 08, 2024 at 09:22:16PM +0100, Andi Shyti wrote:
> The hardware should not dynamically balance the load between CCS
> engines. Wa_14019159160 recommends disabling it across all
> platforms.
> 
> Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: <stable@vger.kernel.org> # v6.2+
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_regs.h     |  1 +
>  drivers/gpu/drm/i915/gt/intel_workarounds.c | 23 +++++++++++++++++++--
>  2 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> index 50962cfd1353..cf709f6c05ae 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> @@ -1478,6 +1478,7 @@
>  
>  #define GEN12_RCU_MODE				_MMIO(0x14800)
>  #define   GEN12_RCU_MODE_CCS_ENABLE		REG_BIT(0)
> +#define   XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE	REG_BIT(1)

Nitpick: we usually order register bits in descending order.  Aside from
that,

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>

although I still hope our architects will push through a formal
documentation update for this.


Matt

>  
>  #define CHV_FUSE_GT				_MMIO(VLV_GUNIT_BASE + 0x2168)
>  #define   CHV_FGT_DISABLE_SS0			(1 << 10)
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index 25413809b9dc..4865eb5ca9c9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -51,7 +51,8 @@
>   *   registers belonging to BCS, VCS or VECS should be implemented in
>   *   xcs_engine_wa_init(). Workarounds for registers not belonging to a specific
>   *   engine's MMIO range but that are part of of the common RCS/CCS reset domain
> - *   should be implemented in general_render_compute_wa_init().
> + *   should be implemented in general_render_compute_wa_init(). The settings
> + *   about the CCS load balancing should be added in ccs_engine_wa_mode().
>   *
>   * - GT workarounds: the list of these WAs is applied whenever these registers
>   *   revert to their default values: on GPU reset, suspend/resume [1]_, etc.
> @@ -2854,6 +2855,22 @@ add_render_compute_tuning_settings(struct intel_gt *gt,
>  		wa_write_clr(wal, GEN8_GARBCNTL, GEN12_BUS_HASH_CTL_BIT_EXC);
>  }
>  
> +static void ccs_engine_wa_mode(struct intel_engine_cs *engine, struct i915_wa_list *wal)
> +{
> +	struct intel_gt *gt = engine->gt;
> +
> +	if (!IS_DG2(gt->i915))
> +		return;
> +
> +	/*
> +	 * Wa_14019159160: This workaround, along with others, leads to
> +	 * significant challenges in utilizing load balancing among the
> +	 * CCS slices. Consequently, an architectural decision has been
> +	 * made to completely disable automatic CCS load balancing.
> +	 */
> +	wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE);
> +}
> +
>  /*
>   * The workarounds in this function apply to shared registers in
>   * the general render reset domain that aren't tied to a
> @@ -3004,8 +3021,10 @@ engine_init_workarounds(struct intel_engine_cs *engine, struct i915_wa_list *wal
>  	 * to a single RCS/CCS engine's workaround list since
>  	 * they're reset as part of the general render domain reset.
>  	 */
> -	if (engine->flags & I915_ENGINE_FIRST_RENDER_COMPUTE)
> +	if (engine->flags & I915_ENGINE_FIRST_RENDER_COMPUTE) {
>  		general_render_compute_wa_init(engine, wal);
> +		ccs_engine_wa_mode(engine, wal);
> +	}
>  
>  	if (engine->class == COMPUTE_CLASS)
>  		ccs_engine_wa_init(engine, wal);
> -- 
> 2.43.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5 2/4] drm/i915/gt: Refactor uabi engine class/instance list creation
  2024-03-08 20:22 ` [PATCH v5 2/4] drm/i915/gt: Refactor uabi engine class/instance list creation Andi Shyti
@ 2024-03-12 17:08   ` Matt Roper
  2024-03-12 20:49     ` Andi Shyti
  0 siblings, 1 reply; 10+ messages in thread
From: Matt Roper @ 2024-03-12 17:08 UTC (permalink / raw)
  To: Andi Shyti
  Cc: intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen,
	John Harrison, stable, Andi Shyti, Tvrtko Ursulin

On Fri, Mar 08, 2024 at 09:22:17PM +0100, Andi Shyti wrote:
> For the upcoming changes we need a cleaner way to build the list
> of uabi engines.
> 
> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: <stable@vger.kernel.org> # v6.2+

I don't really see why we need patches 2 & 3 in this series.  If we want
to restrict the platform to a single CCS engine for now (and give that
single engine access to all of the cslices), it would be much simpler to
only create a single intel_engine_cs which which would then cause both
i915 and userspace to only consider a single engine, even if more than
one is physically present.  That could be done with a simple adjustment
to engine_mask_apply_compute_fuses() to mask off extra bits from the
engine mask such that only a single CCS can get returned rather than the
mask of all CCSs that are present.

Managing all of the engines in the KMD but only exposing one (some) of
them to userspace might be something we need if you want to add extra
functionality down to road to "hotplug" extra engines, or to allow
userspace to explicitly request multi-CCS mode.  But none of that seems
necessary for this series, especially for something you're backporting
to stable kernels.


Matt

> ---
>  drivers/gpu/drm/i915/gt/intel_engine_user.c | 29 ++++++++++++---------
>  1 file changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> index 833987015b8b..11cc06c0c785 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> @@ -203,7 +203,7 @@ static void engine_rename(struct intel_engine_cs *engine, const char *name, u16
>  
>  void intel_engines_driver_register(struct drm_i915_private *i915)
>  {
> -	u16 name_instance, other_instance = 0;
> +	u16 class_instance[I915_LAST_UABI_ENGINE_CLASS + 2] = { };
>  	struct legacy_ring ring = {};
>  	struct list_head *it, *next;
>  	struct rb_node **p, *prev;
> @@ -214,6 +214,8 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
>  	prev = NULL;
>  	p = &i915->uabi_engines.rb_node;
>  	list_for_each_safe(it, next, &engines) {
> +		u16 uabi_class;
> +
>  		struct intel_engine_cs *engine =
>  			container_of(it, typeof(*engine), uabi_list);
>  
> @@ -222,15 +224,14 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
>  
>  		GEM_BUG_ON(engine->class >= ARRAY_SIZE(uabi_classes));
>  		engine->uabi_class = uabi_classes[engine->class];
> -		if (engine->uabi_class == I915_NO_UABI_CLASS) {
> -			name_instance = other_instance++;
> -		} else {
> -			GEM_BUG_ON(engine->uabi_class >=
> -				   ARRAY_SIZE(i915->engine_uabi_class_count));
> -			name_instance =
> -				i915->engine_uabi_class_count[engine->uabi_class]++;
> -		}
> -		engine->uabi_instance = name_instance;
> +
> +		if (engine->uabi_class == I915_NO_UABI_CLASS)
> +			uabi_class = I915_LAST_UABI_ENGINE_CLASS + 1;
> +		else
> +			uabi_class = engine->uabi_class;
> +
> +		GEM_BUG_ON(uabi_class >= ARRAY_SIZE(class_instance));
> +		engine->uabi_instance = class_instance[uabi_class]++;
>  
>  		/*
>  		 * Replace the internal name with the final user and log facing
> @@ -238,11 +239,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
>  		 */
>  		engine_rename(engine,
>  			      intel_engine_class_repr(engine->class),
> -			      name_instance);
> +			      engine->uabi_instance);
>  
> -		if (engine->uabi_class == I915_NO_UABI_CLASS)
> +		if (uabi_class > I915_LAST_UABI_ENGINE_CLASS)
>  			continue;
>  
> +		GEM_BUG_ON(uabi_class >=
> +			   ARRAY_SIZE(i915->engine_uabi_class_count));
> +		i915->engine_uabi_class_count[uabi_class]++;
> +
>  		rb_link_node(&engine->uabi_node, prev, p);
>  		rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
>  
> -- 
> 2.43.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5 1/4] drm/i915/gt: Disable HW load balancing for CCS
  2024-03-12 16:58   ` Matt Roper
@ 2024-03-12 20:37     ` Andi Shyti
  0 siblings, 0 replies; 10+ messages in thread
From: Andi Shyti @ 2024-03-12 20:37 UTC (permalink / raw)
  To: Matt Roper
  Cc: Andi Shyti, intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen,
	John Harrison, stable, Andi Shyti, Tvrtko Ursulin

Hi Matt,

...

> >  #define GEN12_RCU_MODE				_MMIO(0x14800)
> >  #define   GEN12_RCU_MODE_CCS_ENABLE		REG_BIT(0)
> > +#define   XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE	REG_BIT(1)
> 
> Nitpick: we usually order register bits in descending order.  Aside from
> that,

I can take care of it.

> Reviewed-by: Matt Roper <matthew.d.roper@intel.com>

Thanks!
Andi

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5 2/4] drm/i915/gt: Refactor uabi engine class/instance list creation
  2024-03-12 17:08   ` Matt Roper
@ 2024-03-12 20:49     ` Andi Shyti
  0 siblings, 0 replies; 10+ messages in thread
From: Andi Shyti @ 2024-03-12 20:49 UTC (permalink / raw)
  To: Matt Roper
  Cc: Andi Shyti, intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen,
	John Harrison, stable, Andi Shyti, Tvrtko Ursulin

On Tue, Mar 12, 2024 at 10:08:33AM -0700, Matt Roper wrote:
> On Fri, Mar 08, 2024 at 09:22:17PM +0100, Andi Shyti wrote:
> > For the upcoming changes we need a cleaner way to build the list
> > of uabi engines.
> > 
> > Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> > Cc: <stable@vger.kernel.org> # v6.2+
> 
> I don't really see why we need patches 2 & 3 in this series. 

For patch number '2' We had a round of review with Tvrtko and we
wanted to avoid the change I pasted at the bottom[*], which would
decrease something that was increased earlier.

> If we want
> to restrict the platform to a single CCS engine for now (and give that
> single engine access to all of the cslices), it would be much simpler to
> only create a single intel_engine_cs which which would then cause both
> i915 and userspace to only consider a single engine, even if more than
> one is physically present.  That could be done with a simple adjustment
> to engine_mask_apply_compute_fuses() to mask off extra bits from the
> engine mask such that only a single CCS can get returned rather than the
> mask of all CCSs that are present.
> 
> Managing all of the engines in the KMD but only exposing one (some) of
> them to userspace might be something we need if you want to add extra
> functionality down to road to "hotplug" extra engines, or to allow
> userspace to explicitly request multi-CCS mode.  But none of that seems
> necessary for this series, especially for something you're backporting
> to stable kernels.

It's true, it would even be easier to mask out all the CCS
engines after the first. I thought of this.

On one hand hand, adding a for_each_available_engine() throught
the stable path its a bit of abusing, but it's functional to the
single CCS mode.

I was aiming for a longer term solution. If I add a patch to mask
off CCS engines, then I will need to revert it quite soon for
the stable release.

I'm not sure which one is better, though.

Thanks,
Andi

[*]
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 833987015b8b..7041acc77810 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -243,6 +243,15 @@  void intel_engines_driver_register(struct drm_i915_private *i915)
 		if (engine->uabi_class == I915_NO_UABI_CLASS)
 			continue;

+		/*
+		 * Do not list and do not count CCS engines other than the first
+		 */
+		if (engine->uabi_class == I915_ENGINE_CLASS_COMPUTE &&
+		    engine->uabi_instance > 0) {
+			i915->engine_uabi_class_count[engine->uabi_class]--;
+			continue;
+		}
+
 		rb_link_node(&engine->uabi_node, prev, p);
 		rb_insert_color(&engine->uabi_node, &i915->uabi_engines);

^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-03-12 20:50 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-08 20:22 [PATCH v5 0/4] Disable automatic load CCS load balancing Andi Shyti
2024-03-08 20:22 ` [PATCH v5 1/4] drm/i915/gt: Disable HW load balancing for CCS Andi Shyti
2024-03-12 16:58   ` Matt Roper
2024-03-12 20:37     ` Andi Shyti
2024-03-08 20:22 ` [PATCH v5 2/4] drm/i915/gt: Refactor uabi engine class/instance list creation Andi Shyti
2024-03-12 17:08   ` Matt Roper
2024-03-12 20:49     ` Andi Shyti
2024-03-08 20:22 ` [PATCH v5 3/4] drm/i915/gt: Disable tests for CCS engines beyond the first Andi Shyti
2024-03-08 20:25   ` kernel test robot
2024-03-08 20:22 ` [PATCH v5 4/4] drm/i915/gt: Enable only one CCS for compute workload Andi Shyti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox