* [PATCH v4 0/3] Disable automatic CCS load balancing
@ 2024-03-06 1:22 Andi Shyti
2024-03-06 1:22 ` [PATCH v4 1/3] drm/i915/gt: Disable HW load balancing for CCS Andi Shyti
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Andi Shyti @ 2024-03-06 1:22 UTC (permalink / raw)
To: intel-gfx, dri-devel
Cc: Chris Wilson, Joonas Lahtinen, Matt Roper, John Harrison, stable,
Andi Shyti, Andi Shyti, Tvrtko Ursulin
Hi,
I have to admit that v3 was a lazy attempt. This one should be on
the right path.
This series does basically two things:
1. Disables automatic CCS load balancing, as advised by the
hardware workaround.
2. Assigns all the CCS slices to a single user engine. The user
will then be able to query only one CCS engine.
I'm using the "Requires: " tag here, but I'm not sure the commit
id will remain valid; on the other hand, I don't know which
commit id I should use instead.
Thanks Tvrtko, Matt, John and Joonas for your reviews!
Andi
Changelog
=========
v3 -> v4
- Correctly reword the comment in the workaround
- Fix a buffer overflow (thanks, Joonas)
- Properly handle the fused engines when setting the CCS mode
v2 -> v3
- Simplified the algorithm for creating the list of the exported
uabi engines (Patch 1) (thanks, Tvrtko)
- Consider the fused engines when creating the uabi engine list
(Patch 2) (thanks, Matt)
- Patch 4 now uses the refactoring from Patch 1, with a cleaner
outcome
v1 -> v2
- In Patch 1 use the correct workaround number (thanks Matt).
- In Patch 2 do not add the extra CCS engines to the exposed UABI
engine list and adapt the engine counting accordingly (thanks
Tvrtko).
- Reword the commit of Patch 2 (thanks John).
Andi Shyti (3):
drm/i915/gt: Disable HW load balancing for CCS
drm/i915/gt: Refactor uabi engine class/instance list creation
drm/i915/gt: Enable only one CCS for compute workload
drivers/gpu/drm/i915/gt/intel_engine_user.c | 40 ++++++++++++++-------
drivers/gpu/drm/i915/gt/intel_gt.c | 23 ++++++++++++
drivers/gpu/drm/i915/gt/intel_gt_regs.h | 6 ++++
drivers/gpu/drm/i915/gt/intel_workarounds.c | 5 +++
4 files changed, 62 insertions(+), 12 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v4 1/3] drm/i915/gt: Disable HW load balancing for CCS
2024-03-06 1:22 [PATCH v4 0/3] Disable automatic CCS load balancing Andi Shyti
@ 2024-03-06 1:22 ` Andi Shyti
2024-03-06 23:46 ` Matt Roper
2024-03-06 1:22 ` [PATCH v4 2/3] drm/i915/gt: Refactor uabi engine class/instance list creation Andi Shyti
2024-03-06 1:22 ` [PATCH v4 3/3] drm/i915/gt: Enable only one CCS for compute workload Andi Shyti
2 siblings, 1 reply; 10+ messages in thread
From: Andi Shyti @ 2024-03-06 1:22 UTC (permalink / raw)
To: intel-gfx, dri-devel
Cc: Chris Wilson, Joonas Lahtinen, Matt Roper, John Harrison, stable,
Andi Shyti, Andi Shyti, Tvrtko Ursulin
The hardware should not dynamically balance the load between CCS
engines. Wa_14019159160 recommends disabling it across all
platforms.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: <stable@vger.kernel.org> # v6.2+
---
drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 +
drivers/gpu/drm/i915/gt/intel_workarounds.c | 5 +++++
2 files changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index 50962cfd1353..cf709f6c05ae 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1478,6 +1478,7 @@
#define GEN12_RCU_MODE _MMIO(0x14800)
#define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0)
+#define XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1)
#define CHV_FUSE_GT _MMIO(VLV_GUNIT_BASE + 0x2168)
#define CHV_FGT_DISABLE_SS0 (1 << 10)
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index d67d44611c28..a2e78cf0b5f5 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -2945,6 +2945,11 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li
/* Wa_18028616096 */
wa_mcr_write_or(wal, LSC_CHICKEN_BIT_0_UDW, UGM_FRAGMENT_THRESHOLD_TO_3);
+
+ /*
+ * Wa_14019159160: disable the automatic CCS load balancing
+ */
+ wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE);
}
if (IS_DG2_G11(i915)) {
--
2.43.0
* [PATCH v4 2/3] drm/i915/gt: Refactor uabi engine class/instance list creation
2024-03-06 1:22 [PATCH v4 0/3] Disable automatic CCS load balancing Andi Shyti
2024-03-06 1:22 ` [PATCH v4 1/3] drm/i915/gt: Disable HW load balancing for CCS Andi Shyti
@ 2024-03-06 1:22 ` Andi Shyti
2024-03-06 1:22 ` [PATCH v4 3/3] drm/i915/gt: Enable only one CCS for compute workload Andi Shyti
2 siblings, 0 replies; 10+ messages in thread
From: Andi Shyti @ 2024-03-06 1:22 UTC (permalink / raw)
To: intel-gfx, dri-devel
Cc: Chris Wilson, Joonas Lahtinen, Matt Roper, John Harrison, stable,
Andi Shyti, Andi Shyti, Tvrtko Ursulin
For the upcoming changes we need a cleaner way to build the list
of uabi engines.
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.2+
---
drivers/gpu/drm/i915/gt/intel_engine_user.c | 29 ++++++++++++---------
1 file changed, 17 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 833987015b8b..11cc06c0c785 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -203,7 +203,7 @@ static void engine_rename(struct intel_engine_cs *engine, const char *name, u16
void intel_engines_driver_register(struct drm_i915_private *i915)
{
- u16 name_instance, other_instance = 0;
+ u16 class_instance[I915_LAST_UABI_ENGINE_CLASS + 2] = { };
struct legacy_ring ring = {};
struct list_head *it, *next;
struct rb_node **p, *prev;
@@ -214,6 +214,8 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
prev = NULL;
p = &i915->uabi_engines.rb_node;
list_for_each_safe(it, next, &engines) {
+ u16 uabi_class;
+
struct intel_engine_cs *engine =
container_of(it, typeof(*engine), uabi_list);
@@ -222,15 +224,14 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
GEM_BUG_ON(engine->class >= ARRAY_SIZE(uabi_classes));
engine->uabi_class = uabi_classes[engine->class];
- if (engine->uabi_class == I915_NO_UABI_CLASS) {
- name_instance = other_instance++;
- } else {
- GEM_BUG_ON(engine->uabi_class >=
- ARRAY_SIZE(i915->engine_uabi_class_count));
- name_instance =
- i915->engine_uabi_class_count[engine->uabi_class]++;
- }
- engine->uabi_instance = name_instance;
+
+ if (engine->uabi_class == I915_NO_UABI_CLASS)
+ uabi_class = I915_LAST_UABI_ENGINE_CLASS + 1;
+ else
+ uabi_class = engine->uabi_class;
+
+ GEM_BUG_ON(uabi_class >= ARRAY_SIZE(class_instance));
+ engine->uabi_instance = class_instance[uabi_class]++;
/*
* Replace the internal name with the final user and log facing
@@ -238,11 +239,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
*/
engine_rename(engine,
intel_engine_class_repr(engine->class),
- name_instance);
+ engine->uabi_instance);
- if (engine->uabi_class == I915_NO_UABI_CLASS)
+ if (uabi_class > I915_LAST_UABI_ENGINE_CLASS)
continue;
+ GEM_BUG_ON(uabi_class >=
+ ARRAY_SIZE(i915->engine_uabi_class_count));
+ i915->engine_uabi_class_count[uabi_class]++;
+
rb_link_node(&engine->uabi_node, prev, p);
rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
--
2.43.0
* [PATCH v4 3/3] drm/i915/gt: Enable only one CCS for compute workload
2024-03-06 1:22 [PATCH v4 0/3] Disable automatic CCS load balancing Andi Shyti
2024-03-06 1:22 ` [PATCH v4 1/3] drm/i915/gt: Disable HW load balancing for CCS Andi Shyti
2024-03-06 1:22 ` [PATCH v4 2/3] drm/i915/gt: Refactor uabi engine class/instance list creation Andi Shyti
@ 2024-03-06 1:22 ` Andi Shyti
2024-03-07 0:14 ` Matt Roper
2 siblings, 1 reply; 10+ messages in thread
From: Andi Shyti @ 2024-03-06 1:22 UTC (permalink / raw)
To: intel-gfx, dri-devel
Cc: Chris Wilson, Joonas Lahtinen, Matt Roper, John Harrison, stable,
Andi Shyti, Andi Shyti, Tvrtko Ursulin
Enable only one CCS engine by default, with all the compute slices
allocated to it.
While generating the list of UABI engines to be exposed to the
user, exclude any additional CCS engines beyond the first
instance.
This change can be tested with igt i915_query.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
Requires: 97aba5e46038 ("drm/i915/gt: Refactor uabi engine class/instance list creation")
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: <stable@vger.kernel.org> # v6.2+
---
drivers/gpu/drm/i915/gt/intel_engine_user.c | 11 ++++++++++
drivers/gpu/drm/i915/gt/intel_gt.c | 23 +++++++++++++++++++++
drivers/gpu/drm/i915/gt/intel_gt_regs.h | 5 +++++
3 files changed, 39 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 11cc06c0c785..9ef1c4ce252d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -208,6 +208,7 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
struct list_head *it, *next;
struct rb_node **p, *prev;
LIST_HEAD(engines);
+ u16 uabi_ccs = 0;
sort_engines(i915, &engines);
@@ -244,6 +245,16 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
if (uabi_class > I915_LAST_UABI_ENGINE_CLASS)
continue;
+ /*
+ * The load is balanced among all the available compute
+ * slices. Expose only the first instance of the compute
+ * engine.
+ */
+ if (IS_DG2(i915) &&
+ uabi_class == I915_ENGINE_CLASS_COMPUTE &&
+ uabi_ccs++)
+ continue;
+
GEM_BUG_ON(uabi_class >=
ARRAY_SIZE(i915->engine_uabi_class_count));
i915->engine_uabi_class_count[uabi_class]++;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index a425db5ed3a2..0aac97439552 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -168,6 +168,26 @@ static void init_unused_rings(struct intel_gt *gt)
}
}
+static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
+{
+ u32 mode;
+ int cslice;
+
+ if (!IS_DG2(gt->i915))
+ return;
+
+ /* Set '0' as a default CCS id to all the cslices */
+ mode = 0;
+
+ for (cslice = 0; cslice < hweight32(CCS_MASK(gt)); cslice++)
+ /* Write 0x7 if no CCS context dispatches to this cslice */
+ if (!(CCS_MASK(gt) & BIT(cslice)))
+ mode |= XEHP_CCS_MODE_CSLICE(cslice,
+ XEHP_CCS_MODE_CSLICE_MASK);
+
+ intel_uncore_write(gt->uncore, XEHP_CCS_MODE, mode);
+}
+
int intel_gt_init_hw(struct intel_gt *gt)
{
struct drm_i915_private *i915 = gt->i915;
@@ -195,6 +215,9 @@ int intel_gt_init_hw(struct intel_gt *gt)
intel_gt_init_swizzling(gt);
+ /* Configure CCS mode */
+ intel_gt_apply_ccs_mode(gt);
+
/*
* At least 830 can leave some of the unused rings
* "active" (ie. head != tail) after resume which
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index cf709f6c05ae..8224dd99c7d7 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1480,6 +1480,11 @@
#define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0)
#define XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1)
+#define XEHP_CCS_MODE _MMIO(0x14804)
+#define XEHP_CCS_MODE_CSLICE_MASK REG_GENMASK(2, 0) /* CCS0-3 + rsvd */
+#define XEHP_CCS_MODE_CSLICE_WIDTH ilog2(XEHP_CCS_MODE_CSLICE_MASK + 1)
+#define XEHP_CCS_MODE_CSLICE(cslice, ccs) (ccs << (cslice * XEHP_CCS_MODE_CSLICE_WIDTH))
+
#define CHV_FUSE_GT _MMIO(VLV_GUNIT_BASE + 0x2168)
#define CHV_FGT_DISABLE_SS0 (1 << 10)
#define CHV_FGT_DISABLE_SS1 (1 << 11)
--
2.43.0
* Re: [PATCH v4 1/3] drm/i915/gt: Disable HW load balancing for CCS
2024-03-06 1:22 ` [PATCH v4 1/3] drm/i915/gt: Disable HW load balancing for CCS Andi Shyti
@ 2024-03-06 23:46 ` Matt Roper
2024-03-07 20:02 ` Andi Shyti
0 siblings, 1 reply; 10+ messages in thread
From: Matt Roper @ 2024-03-06 23:46 UTC (permalink / raw)
To: Andi Shyti
Cc: intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen,
John Harrison, stable, Andi Shyti, Tvrtko Ursulin
On Wed, Mar 06, 2024 at 02:22:45AM +0100, Andi Shyti wrote:
> The hardware should not dynamically balance the load between CCS
> engines. Wa_14019159160 recommends disabling it across all
> platforms.
>
> Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: <stable@vger.kernel.org> # v6.2+
> ---
> drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 +
> drivers/gpu/drm/i915/gt/intel_workarounds.c | 5 +++++
> 2 files changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> index 50962cfd1353..cf709f6c05ae 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> @@ -1478,6 +1478,7 @@
>
> #define GEN12_RCU_MODE _MMIO(0x14800)
> #define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0)
> +#define XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1)
>
> #define CHV_FUSE_GT _MMIO(VLV_GUNIT_BASE + 0x2168)
> #define CHV_FGT_DISABLE_SS0 (1 << 10)
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index d67d44611c28..a2e78cf0b5f5 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -2945,6 +2945,11 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li
>
> /* Wa_18028616096 */
> wa_mcr_write_or(wal, LSC_CHICKEN_BIT_0_UDW, UGM_FRAGMENT_THRESHOLD_TO_3);
> +
> + /*
> + * Wa_14019159160: disable the automatic CCS load balancing
I'm still a bit concerned that this doesn't really match what this
specific workaround is asking us to do. There seems to be an agreement
on various internal email threads that we need to disable load
balancing, but there's no single specific workaround that officially
documents that decision.
This specific workaround asks us to do a bunch of different things, and
the third item it asks for is to disable load balancing in very specific
cases (i.e., while the RCS is active at the same time as one or more CCS
engines). Taking this workaround in isolation, it would be valid to
keep load balancing active if you were just using the CCS engines and
leaving the RCS idle, or if balancing was turned on/off by the GuC
scheduler according to engine use at the moment, as the documented
workaround seems to assume will be the case.
So in general I think we do need to disable load balancing based on
other offline discussion, but blaming that entire change on
Wa_14019159160 seems a bit questionable since it's not really what this
specific workaround is asking us to do and someone may come back and try
to "correct" the implementation of this workaround in the future without
realizing there are other factors too. It would be great if we could
get hardware teams to properly document this expectation somewhere
(either in a separate dedicated workaround, or in the MMIO tuning guide)
so that we'll have a more direct and authoritative source for such a
large behavioral change.
Matt
> + */
> + wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE);
> }
>
> if (IS_DG2_G11(i915)) {
> --
> 2.43.0
>
--
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
* Re: [PATCH v4 3/3] drm/i915/gt: Enable only one CCS for compute workload
2024-03-06 1:22 ` [PATCH v4 3/3] drm/i915/gt: Enable only one CCS for compute workload Andi Shyti
@ 2024-03-07 0:14 ` Matt Roper
2024-03-07 23:52 ` Andi Shyti
0 siblings, 1 reply; 10+ messages in thread
From: Matt Roper @ 2024-03-07 0:14 UTC (permalink / raw)
To: Andi Shyti
Cc: intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen,
John Harrison, stable, Andi Shyti, Tvrtko Ursulin
On Wed, Mar 06, 2024 at 02:22:47AM +0100, Andi Shyti wrote:
> Enable only one CCS engine by default, with all the compute slices
> allocated to it.
>
> While generating the list of UABI engines to be exposed to the
> user, exclude any additional CCS engines beyond the first
> instance.
>
> This change can be tested with igt i915_query.
>
> Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
> Requires: 97aba5e46038 ("drm/i915/gt: Refactor uabi engine class/instance list creation")
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: <stable@vger.kernel.org> # v6.2+
> ---
> drivers/gpu/drm/i915/gt/intel_engine_user.c | 11 ++++++++++
> drivers/gpu/drm/i915/gt/intel_gt.c | 23 +++++++++++++++++++++
> drivers/gpu/drm/i915/gt/intel_gt_regs.h | 5 +++++
> 3 files changed, 39 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> index 11cc06c0c785..9ef1c4ce252d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> @@ -208,6 +208,7 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
> struct list_head *it, *next;
> struct rb_node **p, *prev;
> LIST_HEAD(engines);
> + u16 uabi_ccs = 0;
>
> sort_engines(i915, &engines);
>
> @@ -244,6 +245,16 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
> if (uabi_class > I915_LAST_UABI_ENGINE_CLASS)
> continue;
>
> + /*
> + * The load is balanced among all the available compute
> + * slices. Expose only the first instance of the compute
> + * engine.
> + */
> + if (IS_DG2(i915) &&
> + uabi_class == I915_ENGINE_CLASS_COMPUTE &&
> + uabi_ccs++)
> + continue;
> +
> GEM_BUG_ON(uabi_class >=
> ARRAY_SIZE(i915->engine_uabi_class_count));
> i915->engine_uabi_class_count[uabi_class]++;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index a425db5ed3a2..0aac97439552 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -168,6 +168,26 @@ static void init_unused_rings(struct intel_gt *gt)
> }
> }
>
> +static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
> +{
> + u32 mode;
> + int cslice;
> +
> + if (!IS_DG2(gt->i915))
> + return;
> +
> + /* Set '0' as a default CCS id to all the cslices */
> + mode = 0;
> +
> + for (cslice = 0; cslice < hweight32(CCS_MASK(gt)); cslice++)
> + /* Write 0x7 if no CCS context dispatches to this cslice */
> + if (!(CCS_MASK(gt) & BIT(cslice)))
> + mode |= XEHP_CCS_MODE_CSLICE(cslice,
> + XEHP_CCS_MODE_CSLICE_MASK);
> +
> + intel_uncore_write(gt->uncore, XEHP_CCS_MODE, mode);
This is still going to hook all available cslices up to hardware engine
ccs0. But what you actually want is to hook them all up to what
userspace sees as CCS0 (i.e., the first CCS engine that wasn't fused
off). Hardware's engine numbering and userspace's numbering aren't the
same.
Also, if you have a part that only has hardware ccs1/cslice1 for
example, you're not going to set cslices 2 & 3 to 0x7 properly.
So probably what you want is something like this (untested):
static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
{
u32 mode = 0;
int first_ccs = __ffs(CCS_MASK(gt));
/*
* Re-assign every present cslice to the first available CCS
* engine; mark unavailable cslices as unused.
*/
for (int cslice = 0; cslice < 4; cslice++) {
if (CCS_MASK(gt) & BIT(cslice))
mode |= XEHP_CCS_MODE_CSLICE(cslice, first_ccs);
else
mode |= XEHP_CCS_MODE_CSLICE(cslice,
XEHP_CCS_MODE_CSLICE_MASK);
}
intel_uncore_write(gt->uncore, XEHP_CCS_MODE, mode);
}
> +}
> +
> int intel_gt_init_hw(struct intel_gt *gt)
> {
> struct drm_i915_private *i915 = gt->i915;
> @@ -195,6 +215,9 @@ int intel_gt_init_hw(struct intel_gt *gt)
>
> intel_gt_init_swizzling(gt);
>
> + /* Configure CCS mode */
> + intel_gt_apply_ccs_mode(gt);
This is only setting this once during init. The value gets lost on
every RCS/CCS reset, so we need to make sure it gets reapplied when
necessary. That means you either need to add this to the GuC regset, or
you need to implement the programming as a "fake workaround" so that the
workaround framework will take care of the re-application for you.
Matt
> +
> /*
> * At least 830 can leave some of the unused rings
> * "active" (ie. head != tail) after resume which
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> index cf709f6c05ae..8224dd99c7d7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> @@ -1480,6 +1480,11 @@
> #define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0)
> #define XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1)
>
> +#define XEHP_CCS_MODE _MMIO(0x14804)
> +#define XEHP_CCS_MODE_CSLICE_MASK REG_GENMASK(2, 0) /* CCS0-3 + rsvd */
> +#define XEHP_CCS_MODE_CSLICE_WIDTH ilog2(XEHP_CCS_MODE_CSLICE_MASK + 1)
> +#define XEHP_CCS_MODE_CSLICE(cslice, ccs) (ccs << (cslice * XEHP_CCS_MODE_CSLICE_WIDTH))
> +
> #define CHV_FUSE_GT _MMIO(VLV_GUNIT_BASE + 0x2168)
> #define CHV_FGT_DISABLE_SS0 (1 << 10)
> #define CHV_FGT_DISABLE_SS1 (1 << 11)
> --
> 2.43.0
>
--
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
* Re: [PATCH v4 1/3] drm/i915/gt: Disable HW load balancing for CCS
2024-03-06 23:46 ` Matt Roper
@ 2024-03-07 20:02 ` Andi Shyti
2024-03-07 20:18 ` John Harrison
0 siblings, 1 reply; 10+ messages in thread
From: Andi Shyti @ 2024-03-07 20:02 UTC (permalink / raw)
To: Matt Roper
Cc: Andi Shyti, intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen,
John Harrison, stable, Andi Shyti, Tvrtko Ursulin
Hi Matt,
On Wed, Mar 06, 2024 at 03:46:09PM -0800, Matt Roper wrote:
> On Wed, Mar 06, 2024 at 02:22:45AM +0100, Andi Shyti wrote:
> > The hardware should not dynamically balance the load between CCS
> > engines. Wa_14019159160 recommends disabling it across all
> > platforms.
> >
> > Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
> > Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> > Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Matt Roper <matthew.d.roper@intel.com>
> > Cc: <stable@vger.kernel.org> # v6.2+
> > ---
> > drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 +
> > drivers/gpu/drm/i915/gt/intel_workarounds.c | 5 +++++
> > 2 files changed, 6 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > index 50962cfd1353..cf709f6c05ae 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > @@ -1478,6 +1478,7 @@
> >
> > #define GEN12_RCU_MODE _MMIO(0x14800)
> > #define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0)
> > +#define XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1)
> >
> > #define CHV_FUSE_GT _MMIO(VLV_GUNIT_BASE + 0x2168)
> > #define CHV_FGT_DISABLE_SS0 (1 << 10)
> > diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > index d67d44611c28..a2e78cf0b5f5 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > @@ -2945,6 +2945,11 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li
> >
> > /* Wa_18028616096 */
> > wa_mcr_write_or(wal, LSC_CHICKEN_BIT_0_UDW, UGM_FRAGMENT_THRESHOLD_TO_3);
> > +
> > + /*
> > + * Wa_14019159160: disable the automatic CCS load balancing
>
> I'm still a bit concerned that this doesn't really match what this
> specific workaround is asking us to do. There seems to be an agreement
> on various internal email threads that we need to disable load
> balancing, but there's no single specific workaround that officially
> documents that decision.
>
> This specific workaround asks us to do a bunch of different things, and
> the third item it asks for is to disable load balancing in very specific
> cases (i.e., while the RCS is active at the same time as one or more CCS
> engines). Taking this workaround in isolation, it would be valid to
> keep load balancing active if you were just using the CCS engines and
> leaving the RCS idle, or if balancing was turned on/off by the GuC
> scheduler according to engine use at the moment, as the documented
> workaround seems to assume will be the case.
>
> So in general I think we do need to disable load balancing based on
> other offline discussion, but blaming that entire change on
> Wa_14019159160 seems a bit questionable since it's not really what this
> specific workaround is asking us to do and someone may come back and try
> to "correct" the implementation of this workaround in the future without
> realizing there are other factors too. It would be great if we could
> get hardware teams to properly document this expectation somewhere
> (either in a separate dedicated workaround, or in the MMIO tuning guide)
> so that we'll have a more direct and authoritative source for such a
> large behavioral change.
On one hand I think you are right; on the other hand, I think this
workaround has not properly evolved into what we have been
describing here.
Perhaps, one solution would be to create a new generic workaround
for all platforms with more than one CCS and put everyone at
peace. But I don't know the process.
Are you able to help here? Or Joonas?
Thanks, Matt!
Andi
* Re: [PATCH v4 1/3] drm/i915/gt: Disable HW load balancing for CCS
2024-03-07 20:02 ` Andi Shyti
@ 2024-03-07 20:18 ` John Harrison
2024-03-07 23:53 ` Andi Shyti
0 siblings, 1 reply; 10+ messages in thread
From: John Harrison @ 2024-03-07 20:18 UTC (permalink / raw)
To: Andi Shyti, Matt Roper
Cc: intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen, stable,
Andi Shyti, Tvrtko Ursulin
On 3/7/2024 12:02, Andi Shyti wrote:
> Hi Matt,
>
> On Wed, Mar 06, 2024 at 03:46:09PM -0800, Matt Roper wrote:
>> On Wed, Mar 06, 2024 at 02:22:45AM +0100, Andi Shyti wrote:
>>> The hardware should not dynamically balance the load between CCS
>>> engines. Wa_14019159160 recommends disabling it across all
>>> platforms.
>>>
>>> Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
>>> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>> Cc: Matt Roper <matthew.d.roper@intel.com>
>>> Cc: <stable@vger.kernel.org> # v6.2+
>>> ---
>>> drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 +
>>> drivers/gpu/drm/i915/gt/intel_workarounds.c | 5 +++++
>>> 2 files changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
>>> index 50962cfd1353..cf709f6c05ae 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
>>> @@ -1478,6 +1478,7 @@
>>>
>>> #define GEN12_RCU_MODE _MMIO(0x14800)
>>> #define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0)
>>> +#define XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1)
>>>
>>> #define CHV_FUSE_GT _MMIO(VLV_GUNIT_BASE + 0x2168)
>>> #define CHV_FGT_DISABLE_SS0 (1 << 10)
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
>>> index d67d44611c28..a2e78cf0b5f5 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
>>> @@ -2945,6 +2945,11 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li
>>>
>>> /* Wa_18028616096 */
>>> wa_mcr_write_or(wal, LSC_CHICKEN_BIT_0_UDW, UGM_FRAGMENT_THRESHOLD_TO_3);
>>> +
>>> + /*
>>> + * Wa_14019159160: disable the automatic CCS load balancing
>> I'm still a bit concerned that this doesn't really match what this
>> specific workaround is asking us to do. There seems to be an agreement
>> on various internal email threads that we need to disable load
>> balancing, but there's no single specific workaround that officially
>> documents that decision.
>>
>> This specific workaround asks us to do a bunch of different things, and
>> the third item it asks for is to disable load balancing in very specific
>> cases (i.e., while the RCS is active at the same time as one or more CCS
>> engines). Taking this workaround in isolation, it would be valid to
>> keep load balancing active if you were just using the CCS engines and
>> leaving the RCS idle, or if balancing was turned on/off by the GuC
>> scheduler according to engine use at the moment, as the documented
>> workaround seems to assume will be the case.
>>
>> So in general I think we do need to disable load balancing based on
>> other offline discussion, but blaming that entire change on
>> Wa_14019159160 seems a bit questionable since it's not really what this
>> specific workaround is asking us to do and someone may come back and try
>> to "correct" the implementation of this workaround in the future without
>> realizing there are other factors too. It would be great if we could
>> get hardware teams to properly document this expectation somewhere
>> (either in a separate dedicated workaround, or in the MMIO tuning guide)
>> so that we'll have a more direct and authoritative source for such a
>> large behavioral change.
> On one hand I think you are right; on the other hand, I think this
> workaround has not properly evolved into what we have been
> describing here.
I think it is not so much that the w/a is 'not properly developed'. It's
more that this w/a, taken in combination with others and with knowledge
of future directions, has led to an architectural decision that is beyond
the scope of the w/a.
As such, I think Matt is definitely correct. Tagging a code change with
a w/a number when that change does something very different to what is
described in the w/a is wrong and a maintenance issue waiting to happen.
At the very least, you should just put in a comment explaining the
situation. E.g.:
/*
* Wa_14019159160: This w/a plus others cause significant issues with the use of
* load balancing. Hence an architectural-level decision was taken to simply
* disable automatic CCS load balancing completely.
*/
Ideally yes, we would get an officially trackable software only
workaround number or something created and just use that. But in the
meantime, just clearly explaining the situation seems reasonable to me.
John.
>
> Perhaps, one solution would be to create a new generic workaround
> for all platforms with more than one CCS and put everyone at
> peace. But I don't know the process.
>
> Are you able to help here? Or Joonas?
>
> Thanks, Matt!
> Andi
* Re: [PATCH v4 3/3] drm/i915/gt: Enable only one CCS for compute workload
2024-03-07 0:14 ` Matt Roper
@ 2024-03-07 23:52 ` Andi Shyti
0 siblings, 0 replies; 10+ messages in thread
From: Andi Shyti @ 2024-03-07 23:52 UTC (permalink / raw)
To: Matt Roper
Cc: Andi Shyti, intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen,
John Harrison, stable, Andi Shyti, Tvrtko Ursulin
Hi Matt,
> > +static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
> > +{
> > + u32 mode;
> > + int cslice;
> > +
> > + if (!IS_DG2(gt->i915))
> > + return;
> > +
> > + /* Set '0' as a default CCS id to all the cslices */
> > + mode = 0;
> > +
> > + for (cslice = 0; cslice < hweight32(CCS_MASK(gt)); cslice++)
> > + /* Write 0x7 if no CCS context dispatches to this cslice */
> > + if (!(CCS_MASK(gt) & BIT(cslice)))
> > + mode |= XEHP_CCS_MODE_CSLICE(cslice,
> > + XEHP_CCS_MODE_CSLICE_MASK);
> > +
> > + intel_uncore_write(gt->uncore, XEHP_CCS_MODE, mode);
>
> This is still going to hook all available cslices up to hardware engine
> ccs0. But what you actually want is to hook them all up to what
> userspace sees as CCS0 (i.e., the first CCS engine that wasn't fused
> off). Hardware's engine numbering and userspace's numbering aren't the
> same.
Yes, correct... we had so many discussions and I forgot about it :-)
> Also, if you have a part that only has hardware ccs1/cslice1 for
> example, you're not going to set cslices 2 & 3 to 0x7 properly.
Good point here as well: the XEHP_CCS_MODE register is indeed
generic to all platforms.
> So probably what you want is something like this (untested):
>
> static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
> {
> u32 mode = 0;
> int first_ccs = __ffs(CCS_MASK(gt));
>
> /*
> * Re-assign every present cslice to the first available CCS
> * engine; mark unavailable cslices as unused.
> */
> for (int cslice = 0; cslice < 4; cslice++) {
> if (CCS_MASK(gt) & BIT(cslice))
> mode |= XEHP_CCS_MODE_CSLICE(cslice, first_ccs);
> else
> mode |= XEHP_CCS_MODE_CSLICE(cslice,
> XEHP_CCS_MODE_CSLICE_MASK);
> }
>
> intel_uncore_write(gt->uncore, XEHP_CCS_MODE, mode);
> }
>
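To convince myself of the packing this implies, here is a standalone
sketch of the same logic. The macros are defined locally purely for
illustration (the real definitions live in intel_gt_regs.h); the 3-bit
field width is inferred from the 0x7 "unused" mask, so treat the exact
layout as an assumption:

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h> /* ffs() */

/* Illustrative stand-ins for the i915 definitions: the 0x7 "disabled"
 * value implies a 3-bit field per cslice. */
#define CCS_MODE_CSLICE_WIDTH	3
#define CCS_MODE_CSLICE_MASK	0x7
#define CCS_MODE_CSLICE(cslice, ccs) \
	((uint32_t)(ccs) << ((cslice) * CCS_MODE_CSLICE_WIDTH))

/* Compute the CCS mode value for a given fuse mask: every present
 * cslice is pointed at the first unfused CCS engine, every absent
 * cslice is marked unused (0x7). */
static uint32_t compute_ccs_mode(uint32_t ccs_mask)
{
	uint32_t mode = 0;
	int first_ccs = ffs(ccs_mask) - 1; /* lowest set bit, like __ffs() */

	for (int cslice = 0; cslice < 4; cslice++) {
		if (ccs_mask & (1u << cslice))
			mode |= CCS_MODE_CSLICE(cslice, first_ccs);
		else
			mode |= CCS_MODE_CSLICE(cslice,
						CCS_MODE_CSLICE_MASK);
	}

	return mode;
}
```

So for a part with only hardware ccs1 present (mask 0x2), cslice1 is
assigned engine 1 and cslices 0, 2 and 3 are all parked at 0x7, which
is exactly the case the loop in the posted patch got wrong.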
> > +}
> > +
> > int intel_gt_init_hw(struct intel_gt *gt)
> > {
> > struct drm_i915_private *i915 = gt->i915;
> > @@ -195,6 +215,9 @@ int intel_gt_init_hw(struct intel_gt *gt)
> >
> > intel_gt_init_swizzling(gt);
> >
> > + /* Configure CCS mode */
> > + intel_gt_apply_ccs_mode(gt);
>
> This is only setting this once during init. The value gets lost on
> every RCS/CCS reset, so we need to make sure it gets reapplied when
> necessary. That means you either need to add this to the GuC regset, or
> you need to implement the programming as a "fake workaround" so that the
> workaround framework will take care of the re-application for you.
OK, I'll hook everything up in the ccs_engine_wa_init().
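For reference, the point of the "fake workaround" route is simply that
the register/value pair lives in a list that the framework re-applies
after every reset, instead of being a one-shot write at init. A
standalone model of that pattern (all names here are illustrative, not
the real i915 API in intel_workarounds.c):

```c
#include <assert.h>
#include <stdint.h>

/* One recorded workaround: clear/set bits on a register. */
struct wa {
	uint32_t reg;   /* register offset (index into the model regfile) */
	uint32_t clear; /* bits to clear */
	uint32_t set;   /* bits to set */
};

struct wa_list {
	struct wa list[16];
	int count;
};

/* Record a workaround once, at init time. */
static void wa_add(struct wa_list *wal, uint32_t reg,
		   uint32_t clear, uint32_t set)
{
	wal->list[wal->count++] =
		(struct wa){ .reg = reg, .clear = clear, .set = set };
}

/* Called after init *and* after every engine reset: the CCS mode
 * value survives resets because it is part of the list, not a
 * one-off write in intel_gt_init_hw(). */
static void wa_list_apply(const struct wa_list *wal, uint32_t *regs)
{
	for (int i = 0; i < wal->count; i++)
		regs[wal->list[i].reg] =
			(regs[wal->list[i].reg] & ~wal->list[i].clear) |
			wal->list[i].set;
}
```

Hooking the CCS mode value into ccs_engine_wa_init() buys exactly this:
the re-application on reset comes for free from the existing framework.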
Thanks,
Andi
* Re: [PATCH v4 1/3] drm/i915/gt: Disable HW load balancing for CCS
2024-03-07 20:18 ` John Harrison
@ 2024-03-07 23:53 ` Andi Shyti
0 siblings, 0 replies; 10+ messages in thread
From: Andi Shyti @ 2024-03-07 23:53 UTC (permalink / raw)
To: John Harrison
Cc: Andi Shyti, Matt Roper, intel-gfx, dri-devel, Chris Wilson,
Joonas Lahtinen, stable, Andi Shyti, Tvrtko Ursulin
Hi John,
...
> > > > +
> > > > + /*
> > > > + * Wa_14019159160: disable the automatic CCS load balancing
> > > I'm still a bit concerned that this doesn't really match what this
> > > specific workaround is asking us to do. There seems to be an agreement
> > > on various internal email threads that we need to disable load
> > > balancing, but there's no single specific workaround that officially
> > > documents that decision.
> > >
> > > This specific workaround asks us to do a bunch of different things, and
> > > the third item it asks for is to disable load balancing in very specific
> > > cases (i.e., while the RCS is active at the same time as one or more CCS
> > > engines). Taking this workaround in isolation, it would be valid to
> > > keep load balancing active if you were just using the CCS engines and
> > > leaving the RCS idle, or if balancing was turned on/off by the GuC
> > > scheduler according to engine use at the moment, as the documented
> > > workaround seems to assume will be the case.
> > >
> > > So in general I think we do need to disable load balancing based on
> > > other offline discussion, but blaming that entire change on
> > > Wa_14019159160 seems a bit questionable since it's not really what this
> > > specific workaround is asking us to do and someone may come back and try
> > > to "correct" the implementation of this workaround in the future without
> > > realizing there are other factors too. It would be great if we could
> > > get hardware teams to properly document this expectation somewhere
> > > (either in a separate dedicated workaround, or in the MMIO tuning guide)
> > > so that we'll have a more direct and authoritative source for such a
> > > large behavioral change.
> > On one hand I think you are right; on the other hand I think this
> > workaround was never properly developed into what we have been
> > describing here.
> I think it is not so much that the w/a is 'not properly developed'. It's
> more that this w/a plus others when taken in combination plus knowledge of
> future directions has led to an architectural decision that is beyond the
> scope of the w/a.
>
> As such, I think Matt is definitely correct. Tagging a code change with a
> w/a number when that change does something very different to what is
> described in the w/a is wrong and a maintenance issue waiting to happen.
>
> At the very least, you should just put in a comment explaining the
> situation. E.g.:
>
> /*
> * Wa_14019159160: This w/a plus others cause significant issues with the use of
> * load balancing. Hence an architectural-level decision was taken to simply
> * disable automatic CCS load balancing completely.
> */
Good suggestion! I will anyway check tomorrow with Joonas if it's
worth the effort to set up a new "software" workaround.
Thanks,
Andi
end of thread, other threads:[~2024-03-07 23:53 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-06 1:22 [PATCH v4 0/3] Disable automatic load CCS load balancing Andi Shyti
2024-03-06 1:22 ` [PATCH v4 1/3] drm/i915/gt: Disable HW load balancing for CCS Andi Shyti
2024-03-06 23:46 ` Matt Roper
2024-03-07 20:02 ` Andi Shyti
2024-03-07 20:18 ` John Harrison
2024-03-07 23:53 ` Andi Shyti
2024-03-06 1:22 ` [PATCH v4 2/3] drm/i915/gt: Refactor uabi engine class/instance list creation Andi Shyti
2024-03-06 1:22 ` [PATCH v4 3/3] drm/i915/gt: Enable only one CCS for compute workload Andi Shyti
2024-03-07 0:14 ` Matt Roper
2024-03-07 23:52 ` Andi Shyti