[PATCH v4 00/15] CCS static load balance

dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v4 00/15] CCS static load balance
@ 2025-03-24 13:29 Andi Shyti
  2025-03-24 13:29 ` [PATCH v4 01/15] drm/i915/gt: Avoid using masked workaround for CCS_MODE setting Andi Shyti
                   ` (18 more replies)
  0 siblings, 19 replies; 24+ messages in thread
From: Andi Shyti @ 2025-03-24 13:29 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Tvrtko Ursulin, Joonas Lahtinen, Chris Wilson, Simona Vetter,
	Arshad Mehmood, Michal Mrozek, Andi Shyti, Andi Shyti

Hi,

Back in v3, this patch series was turned down due to community
policies regarding i915 GEM development. Since then, I have
received several requests from userspace developers, which I
initially declined in order to respect those policies.

However, with the latest request from UMD users, I decided to
give this series another chance. I believe that when a feature
is genuinely needed, our goal should be to support it, not to
dismiss user and customer needs blindly.

Here is the link to the userspace counterpart, which depends on
this series to function properly[*].

I've been refreshing and testing the series together with Arshad.

This patchset introduces static load balancing for GPUs with
multiple compute engines. It's a relatively long series.

To help with review, I've broken the work down as much as
possible in multiple patches.

To summarise:
- Patches 1 to 14 introduce no functional changes, aside from
  adding the 'num_cslices' interface.
- Patch 15 contains the core of the CCS mode setting, building
  on the earlier groundwork.

The updated approach focuses on managing the UABI engine list,
which controls which engines are exposed to userspace. Instead
of manipulating physical engines and their memory directly, we
now control exposure via this list.

Since v3, I've kept the changes in v4 to a minimum because there
wasn't a real technical review on the previous posting. I would
really appreciate it if this time all technical concerns could be
raised and discussed on the mailing list.

IGT tests for this work exist but haven't been submitted yet.

Thanks to Chris for the reviews, to Arshad for the work we've
done together over the past few weeks, and to Michal for his
invaluable input from the userspace side.

Thanks,  
Andi

[*] https://github.com/intel/compute-runtime

Changelog:
==========
PATCHv3 -> PATCHv4
------------------
 - Rebase on top of the latest drm-tip
 - Do not call functions inside GEM_BUG_ONs, but call them
   explicitly (thanks Arshad).

PATCHv2 -> PATCHv3
------------------
 - Fix a NULL pointer dereference during module unload.
   In i915_gem_driver_remove() I was accessing the gt after the
   gt was removed. Use the dev_priv, instead (obviously!).
 - Fix a lockdep issue: Some of the uabi_engines_mutex unlocks
   were not correctly placed in the exit paths.
 - Fix a checkpatch error for spaces after and before parenthesis
   in the for_each_enabled_engine() definition.

PATCHv1 -> PATCHv2
------------------
 - Use uabi_mutex to protect the uabi_engines, not the engine
   itself. Rename it to uabi_engines_mutex.
 - Use kobject_add/kobject_del for adding and removing
   interfaces, this way we don't need to destroy and recreate the
   engines, anymore. Refactor intel_engine_add_single_sysfs() to
   reflect this scenario.
 - After adding engines to the rb_tree check that they have been
   added correctly.
 - Fix rb_find_add() compare function to take into accoung also
   the class, not just the instance.

RFCv2 -> PATCHv1
----------------
 - Removed gt->ccs.mutex
 - Rename m -> width, ccs_id -> engine in
   intel_gt_apply_ccs_mode().
 - In the CCS register value calculation
   (intel_gt_apply_ccs_mode()) the engine (ccs_id) needs to move
   along the ccs_mask (set by the user) instead of the
   cslice_mask.
 - Add GEM_BUG_ON after calculating the new ccs_mask
   (update_ccs_mask()) to make sure all angines have been
   evaluated (i.e. ccs_mask must be '0' at the end of the
   algorithm).
 - move wakeref lock before evaluating intel_gt_pm_is_awake() and
   fix exit path accordingly.
 - Use a more compact form in intel_gt_sysfs_ccs_init() and
   add_uabi_ccs_engines() when evaluating sysfs_create_file(): no
   need to store the return value to the err variable which is
   unused. Get rid of err.
 - Print a warnging instead of a debug message if we fail to
   create the sysfs files.
 - If engine files creation fails in
   intel_engine_add_single_sysfs(), print a warning, not an
   error.
 - Rename gt->ccs.ccs_mask to gt->ccs.id_mask and add a comment
   to explain its purpose.
 - During uabi engine creation, in
   intel_engines_driver_register(), the uabi_ccs_instance is
   redundant because the ccs_instances is already tracked in
   engine->uabi_instance.
 - Mark add_uabi_ccs_engines() and remove_uabi_ccs_engines() as
   __maybe_unused not to break bisectability. They wouldn't
   compile in their own commit. They will be used in the next
   patch and the __maybe_unused is removed.
 - Update engine's workaround every time a new mode is set in
   update_ccs_mask().
 - Mark engines as valid or invalid using their status as
   rb_node. Invalid engines are marked as invalid using
   RB_CLEAR_NODE(). Execbufs will check for their validity when
   selecting the engine to be combined to a context.
 - Create for_each_enabled_engine() which skips the non valid
   engines and use it in selftests.

RFCv1 -> RFCv2
--------------
Compared to the first version I've taken a completely different
approach to adding and removing engines. in v1 physical engines
were directly added and removed, along with the memory allocated
to them, each time the user changed the CCS mode (from the
previous cover letter).

Andi Shyti (15):
  drm/i915/gt: Avoid using masked workaround for CCS_MODE setting
  drm/i915/gt: Move the CCS mode variable to a global position
  drm/i915/gt: Allow the creation of multi-mode CCS masks
  drm/i915/gt: Refactor uabi engine class/instance list creation
  drm/i915/gem: Mark and verify UABI engine validity
  drm/i915/gt: Introduce for_each_enabled_engine() and apply it in
    selftests
  drm/i915/gt: Manage CCS engine creation within UABI exposure
  drm/i915/gt: Remove cslices mask value from the CCS structure
  drm/i915/gt: Expose the number of total CCS slices
  drm/i915/gt: Store engine-related sysfs kobjects
  drm/i915/gt: Store active CCS mask
  drm/i915: Protect access to the UABI engines list with a mutex
  drm/i915/gt: Isolate single sysfs engine file creation
  drm/i915/gt: Implement creation and removal routines for CCS engines
  drm/i915/gt: Allow the user to change the CCS mode through sysfs

 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   3 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  28 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  23 --
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   2 +
 drivers/gpu/drm/i915/gt/intel_engine_user.c   |  62 ++-
 drivers/gpu/drm/i915/gt/intel_gt.c            |   3 +
 drivers/gpu/drm/i915/gt/intel_gt.h            |  12 +
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c   | 357 +++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h   |   5 +-
 drivers/gpu/drm/i915/gt/intel_gt_sysfs.c      |   2 +
 drivers/gpu/drm/i915/gt/intel_gt_types.h      |  19 +-
 drivers/gpu/drm/i915/gt/intel_workarounds.c   |   8 +-
 drivers/gpu/drm/i915/gt/selftest_context.c    |   6 +-
 drivers/gpu/drm/i915/gt/selftest_engine_cs.c  |   4 +-
 .../drm/i915/gt/selftest_engine_heartbeat.c   |   6 +-
 drivers/gpu/drm/i915/gt/selftest_engine_pm.c  |   6 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |  52 +--
 drivers/gpu/drm/i915/gt/selftest_gt_pm.c      |   2 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |  22 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |  18 +-
 drivers/gpu/drm/i915/gt/selftest_mocs.c       |   6 +-
 drivers/gpu/drm/i915/gt/selftest_rc6.c        |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      |   8 +-
 .../drm/i915/gt/selftest_ring_submission.c    |   2 +-
 drivers/gpu/drm/i915/gt/selftest_rps.c        |  14 +-
 drivers/gpu/drm/i915/gt/selftest_timeline.c   |  14 +-
 drivers/gpu/drm/i915/gt/selftest_tlb.c        |   2 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c    |  14 +-
 drivers/gpu/drm/i915/gt/sysfs_engines.c       |  80 ++--
 drivers/gpu/drm/i915/gt/sysfs_engines.h       |   2 +
 drivers/gpu/drm/i915/i915_cmd_parser.c        |   2 +
 drivers/gpu/drm/i915/i915_debugfs.c           |   4 +
 drivers/gpu/drm/i915/i915_drv.h               |   5 +
 drivers/gpu/drm/i915/i915_gem.c               |   4 +
 drivers/gpu/drm/i915/i915_perf.c              |   8 +-
 drivers/gpu/drm/i915/i915_pmu.c               |  11 +-
 drivers/gpu/drm/i915/i915_query.c             |  21 +-
 37 files changed, 648 insertions(+), 193 deletions(-)

-- 
2.47.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 01/15] drm/i915/gt: Avoid using masked workaround for CCS_MODE setting
  2025-03-24 13:29 [PATCH v4 00/15] CCS static load balance Andi Shyti
@ 2025-03-24 13:29 ` Andi Shyti
  2025-04-23 14:28   ` Lucas De Marchi
  2025-03-24 13:29 ` [PATCH v4 02/15] drm/i915/gt: Move the CCS mode variable to a global position Andi Shyti
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 24+ messages in thread
From: Andi Shyti @ 2025-03-24 13:29 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Tvrtko Ursulin, Joonas Lahtinen, Chris Wilson, Simona Vetter,
	Arshad Mehmood, Michal Mrozek, Andi Shyti, Andi Shyti

When setting the CCS mode, we mistakenly used wa_masked_en() to
apply the workaround, which reads from the register and masks the
existing value with the new one.

Our intention was to write the value directly, without masking
it.

So far, this hasn't caused issues because we've been using a
register value that only enables a single CCS engine, typically
with an ID of '0'.

However, in upcoming patches, we will be utilizing multiple
engines, and it's crucial that we write the new value directly
without any masking.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 116683ebe074..b3dd8a077660 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -2760,7 +2760,7 @@ static void ccs_engine_wa_mode(struct intel_engine_cs *engine, struct i915_wa_li
 	 * assign all slices to a single CCS. We will call it CCS mode 1
 	 */
 	mode = intel_gt_apply_ccs_mode(gt);
-	wa_masked_en(wal, XEHP_CCS_MODE, mode);
+	wa_add(wal, XEHP_CCS_MODE, 0, mode, mode, false);
 }
 
 /*
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 01/15] drm/i915/gt: Avoid using masked workaround for CCS_MODE setting
  2025-03-24 13:29 ` [PATCH v4 01/15] drm/i915/gt: Avoid using masked workaround for CCS_MODE setting Andi Shyti
@ 2025-04-23 14:28   ` Lucas De Marchi
  0 siblings, 0 replies; 24+ messages in thread
From: Lucas De Marchi @ 2025-04-23 14:28 UTC (permalink / raw)
  To: Andi Shyti
  Cc: intel-gfx, dri-devel, Tvrtko Ursulin, Joonas Lahtinen,
	Chris Wilson, Simona Vetter, Arshad Mehmood, Michal Mrozek,
	Andi Shyti

On Mon, Mar 24, 2025 at 02:29:37PM +0100, Andi Shyti wrote:
>When setting the CCS mode, we mistakenly used wa_masked_en() to
>apply the workaround, which reads from the register and masks the
>existing value with the new one.

That's not what wa_masked_* does. The use of wa_masked* depends if the
register is a "masked register", which depends only on the HW IP, it's
not a sw thing.

On the xe side we tried to clarify this by making sure the
"masked" annotation is on the register definition rather than simply
using a different function that receives the same type:

drivers/gpu/drm/xe/regs/xe_gt_regs.h:#define CCS_MODE XE_REG(0x14804, XE_REG_OPTION_MASKED)

Copy and paste of the comment from drivers/gpu/drm/i915/gt/intel_workarounds.c
that explains what it actually is:

/*
  * WA operations on "masked register". A masked register has the upper 16 bits
  * documented as "masked" in b-spec. Its purpose is to allow writing to just a
  * portion of the register without a rmw: you simply write in the upper 16 bits
  * the mask of bits you are going to modify.
  *
  * The wa_masked_* family of functions already does the necessary operations to
  * calculate the mask based on the parameters passed, so user only has to
  * provide the lower 16 bits of that register.
  */

If you don't set the bit on the upper nibble, it's the same as not
writing anything, so this patch basically breaks the setting to
CCS_MODE: none of the writes will go through.

bspec 46034 shows this register as a masked register:

	Access: Masked(R/W)

and documentation for bits 31:16 shows the mask.

Lucas De Marchi

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 02/15] drm/i915/gt: Move the CCS mode variable to a global position
  2025-03-24 13:29 [PATCH v4 00/15] CCS static load balance Andi Shyti
  2025-03-24 13:29 ` [PATCH v4 01/15] drm/i915/gt: Avoid using masked workaround for CCS_MODE setting Andi Shyti
@ 2025-03-24 13:29 ` Andi Shyti
  2025-03-24 13:29 ` [PATCH v4 03/15] drm/i915/gt: Allow the creation of multi-mode CCS masks Andi Shyti
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Andi Shyti @ 2025-03-24 13:29 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Tvrtko Ursulin, Joonas Lahtinen, Chris Wilson, Simona Vetter,
	Arshad Mehmood, Michal Mrozek, Andi Shyti, Andi Shyti

Store the CCS mode value in the intel_gt->ccs structure to make
it available for future instances that may need to change its
value.

Name it mode_reg_val because it holds the value that will
be written into the CCS_MODE register, determining the CCS
balancing and, consequently, the number of engines generated.

No functional changes intended.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt.c          |  3 +++
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 16 +++++++++++-----
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt_types.h    | 11 +++++++++++
 drivers/gpu/drm/i915/gt/intel_workarounds.c |  6 ++++--
 5 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 3d3b1ba76e2b..bf09297f92c1 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -18,6 +18,7 @@
 #include "intel_ggtt_gmch.h"
 #include "intel_gt.h"
 #include "intel_gt_buffer_pool.h"
+#include "intel_gt_ccs_mode.h"
 #include "intel_gt_clock_utils.h"
 #include "intel_gt_debugfs.h"
 #include "intel_gt_mcr.h"
@@ -136,6 +137,8 @@ int intel_gt_init_mmio(struct intel_gt *gt)
 	intel_sseu_info_init(gt);
 	intel_gt_mcr_init(gt);
 
+	intel_gt_ccs_mode_init(gt);
+
 	return intel_engines_init_mmio(gt);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index 3c62a44e9106..fcd07eb4728b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -8,15 +8,12 @@
 #include "intel_gt_ccs_mode.h"
 #include "intel_gt_regs.h"
 
-unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt)
+static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
 	int cslice;
 	u32 mode = 0;
 	int first_ccs = __ffs(CCS_MASK(gt));
 
-	if (!IS_DG2(gt->i915))
-		return 0;
-
 	/* Build the value for the fixed CCS load balancing */
 	for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
 		if (gt->ccs.cslices & BIT(cslice))
@@ -35,5 +32,14 @@ unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt)
 						     XEHP_CCS_MODE_CSLICE_MASK);
 	}
 
-	return mode;
+	gt->ccs.mode_reg_val = mode;
+}
+
+void intel_gt_ccs_mode_init(struct intel_gt *gt)
+{
+	if (!IS_DG2(gt->i915))
+		return;
+
+	/* Initialize the CCS mode setting */
+	intel_gt_apply_ccs_mode(gt);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
index 55547f2ff426..0f2506586a41 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
@@ -8,6 +8,6 @@
 
 struct intel_gt;
 
-unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt);
+void intel_gt_ccs_mode_init(struct intel_gt *gt);
 
 #endif /* __INTEL_GT_CCS_MODE_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index bcee084b1f27..9e257f34d05b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -207,12 +207,23 @@ struct intel_gt {
 					    [MAX_ENGINE_INSTANCE + 1];
 	enum intel_submission_method submission_method;
 
+	/*
+	 * Track fixed mapping between CCS engines and compute slices.
+	 *
+	 * In order to w/a HW that has the inability to dynamically load
+	 * balance between CCS engines and EU in the compute slices, we have to
+	 * reconfigure a static mapping on the fly.
+	 *
+	 * The mode variable is set by the user and sets the balancing mode,
+	 * i.e. how the CCS streams are distributed amongs the slices.
+	 */
 	struct {
 		/*
 		 * Mask of the non fused CCS slices
 		 * to be used for the load balancing
 		 */
 		intel_engine_mask_t cslices;
+		u32 mode_reg_val;
 	} ccs;
 
 	/*
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index b3dd8a077660..bec70294fc5c 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -2742,7 +2742,7 @@ add_render_compute_tuning_settings(struct intel_gt *gt,
 static void ccs_engine_wa_mode(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 {
 	struct intel_gt *gt = engine->gt;
-	u32 mode;
+	u32 mode = gt->ccs.mode_reg_val;
 
 	if (!IS_DG2(gt->i915))
 		return;
@@ -2758,8 +2758,10 @@ static void ccs_engine_wa_mode(struct intel_engine_cs *engine, struct i915_wa_li
 	/*
 	 * After having disabled automatic load balancing we need to
 	 * assign all slices to a single CCS. We will call it CCS mode 1
+	 *
+	 * The gt->ccs.mode_reg_val has already been set previously during
+	 * initialization.
 	 */
-	mode = intel_gt_apply_ccs_mode(gt);
 	wa_add(wal, XEHP_CCS_MODE, 0, mode, mode, false);
 }
 
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 03/15] drm/i915/gt: Allow the creation of multi-mode CCS masks
  2025-03-24 13:29 [PATCH v4 00/15] CCS static load balance Andi Shyti
  2025-03-24 13:29 ` [PATCH v4 01/15] drm/i915/gt: Avoid using masked workaround for CCS_MODE setting Andi Shyti
  2025-03-24 13:29 ` [PATCH v4 02/15] drm/i915/gt: Move the CCS mode variable to a global position Andi Shyti
@ 2025-03-24 13:29 ` Andi Shyti
  2025-03-24 13:29 ` [PATCH v4 04/15] drm/i915/gt: Refactor uabi engine class/instance list creation Andi Shyti
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Andi Shyti @ 2025-03-24 13:29 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Tvrtko Ursulin, Joonas Lahtinen, Chris Wilson, Simona Vetter,
	Arshad Mehmood, Michal Mrozek, Andi Shyti, Andi Shyti

Until now, we have only set CCS mode balancing to 1, which means
that only one compute engine is exposed to the user. The stream
of compute commands submitted to that engine is then shared among
all the dedicated execution units.

This is done by calling the 'intel_gt_apply_ccs_mode(); function.

With this change, the aforementioned function takes an additional
parameter called 'mode' that specifies the desired mode to be set
for the CCS engines balancing. The mode parameter can have the
following values:

 - mode = 0: CCS load balancing mode 1 (1 CCS engine exposed)
 - mode = 1: CCS load balancing mode 2 (2 CCS engines exposed)
 - mode = 3: CCS load balancing mode 4 (4 CCS engines exposed)

This allows us to generate the appropriate register value to be
written to CCS_MODE, configuring how the exposed engine streams
will be submitted to the execution units.

No functional changes are intended yet, as no mode higher than
'0' is currently being set.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 85 +++++++++++++++++----
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h |  2 +-
 2 files changed, 72 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index fcd07eb4728b..a6c33b471567 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -4,35 +4,92 @@
  */
 
 #include "i915_drv.h"
-#include "intel_gt.h"
 #include "intel_gt_ccs_mode.h"
 #include "intel_gt_regs.h"
 
 static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
+	unsigned long cslices_mask = gt->ccs.cslices;
+	u32 mode_val = 0;
+	/* CCS engine id, i.e. the engines position in the engine's bitmask */
+	int engine;
 	int cslice;
-	u32 mode = 0;
-	int first_ccs = __ffs(CCS_MASK(gt));
 
-	/* Build the value for the fixed CCS load balancing */
+	/*
+	 * The mode has two bit dedicated for each engine
+	 * that will be used for the CCS balancing algorithm:
+	 *
+	 *    BIT | CCS slice
+	 *   ------------------
+	 *     0  | CCS slice
+	 *     1  |     0
+	 *   ------------------
+	 *     2  | CCS slice
+	 *     3  |     1
+	 *   ------------------
+	 *     4  | CCS slice
+	 *     5  |     2
+	 *   ------------------
+	 *     6  | CCS slice
+	 *     7  |     3
+	 *   ------------------
+	 *
+	 * When a CCS slice is not available, then we will write 0x7,
+	 * oterwise we will write the user engine id which load will
+	 * be forwarded to that slice.
+	 *
+	 * The possible configurations are:
+	 *
+	 * 1 engine (ccs0):
+	 *   slice 0, 1, 2, 3: ccs0
+	 *
+	 * 2 engines (ccs0, ccs1):
+	 *   slice 0, 2: ccs0
+	 *   slice 1, 3: ccs1
+	 *
+	 * 4 engines (ccs0, ccs1, ccs2, ccs3):
+	 *   slice 0: ccs0
+	 *   slice 1: ccs1
+	 *   slice 2: ccs2
+	 *   slice 3: ccs3
+	 */
+	engine = __ffs(cslices_mask);
+
 	for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
-		if (gt->ccs.cslices & BIT(cslice))
+		if (!(cslices_mask & BIT(cslice))) {
 			/*
-			 * If available, assign the cslice
-			 * to the first available engine...
+			 * If not available, mark the slice as unavailable
+			 * and no task will be dispatched here.
 			 */
-			mode |= XEHP_CCS_MODE_CSLICE(cslice, first_ccs);
+			mode_val |= XEHP_CCS_MODE_CSLICE(cslice,
+						     XEHP_CCS_MODE_CSLICE_MASK);
+			continue;
+		}
 
-		else
+		mode_val |= XEHP_CCS_MODE_CSLICE(cslice, engine);
+
+		engine = find_next_bit(&cslices_mask, I915_MAX_CCS, engine + 1);
+		/*
+		 * If "engine" has reached the I915_MAX_CCS value it means that
+		 * we have gone through all the unfused engines and now we need
+		 * to reset its value to the first engine.
+		 *
+		 * From the find_next_bit() description:
+		 *
+		 * "Returns the bit number for the next set bit
+		 * If no bits are set, returns @size."
+		 */
+		if (engine == I915_MAX_CCS) {
 			/*
-			 * ... otherwise, mark the cslice as
-			 * unavailable if no CCS dispatches here
+			 * CCS mode, will be used later to
+			 * reset to a flexible value
 			 */
-			mode |= XEHP_CCS_MODE_CSLICE(cslice,
-						     XEHP_CCS_MODE_CSLICE_MASK);
+			engine = __ffs(cslices_mask);
+			continue;
+		}
 	}
 
-	gt->ccs.mode_reg_val = mode;
+	gt->ccs.mode_reg_val = mode_val;
 }
 
 void intel_gt_ccs_mode_init(struct intel_gt *gt)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
index 0f2506586a41..4a6763b95a78 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
@@ -6,7 +6,7 @@
 #ifndef __INTEL_GT_CCS_MODE_H__
 #define __INTEL_GT_CCS_MODE_H__
 
-struct intel_gt;
+#include "intel_gt.h"
 
 void intel_gt_ccs_mode_init(struct intel_gt *gt);
 
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 04/15] drm/i915/gt: Refactor uabi engine class/instance list creation
  2025-03-24 13:29 [PATCH v4 00/15] CCS static load balance Andi Shyti
                   ` (2 preceding siblings ...)
  2025-03-24 13:29 ` [PATCH v4 03/15] drm/i915/gt: Allow the creation of multi-mode CCS masks Andi Shyti
@ 2025-03-24 13:29 ` Andi Shyti
  2025-03-24 13:29 ` [PATCH v4 05/15] drm/i915/gem: Mark and verify UABI engine validity Andi Shyti
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Andi Shyti @ 2025-03-24 13:29 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Tvrtko Ursulin, Joonas Lahtinen, Chris Wilson, Simona Vetter,
	Arshad Mehmood, Michal Mrozek, Andi Shyti, Andi Shyti,
	Tvrtko Ursulin

For the upcoming changes we need a cleaner way to build the list
of uabi engines.

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_user.c | 29 ++++++++++++---------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 833987015b8b..11cc06c0c785 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -203,7 +203,7 @@ static void engine_rename(struct intel_engine_cs *engine, const char *name, u16
 
 void intel_engines_driver_register(struct drm_i915_private *i915)
 {
-	u16 name_instance, other_instance = 0;
+	u16 class_instance[I915_LAST_UABI_ENGINE_CLASS + 2] = { };
 	struct legacy_ring ring = {};
 	struct list_head *it, *next;
 	struct rb_node **p, *prev;
@@ -214,6 +214,8 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
 	prev = NULL;
 	p = &i915->uabi_engines.rb_node;
 	list_for_each_safe(it, next, &engines) {
+		u16 uabi_class;
+
 		struct intel_engine_cs *engine =
 			container_of(it, typeof(*engine), uabi_list);
 
@@ -222,15 +224,14 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
 
 		GEM_BUG_ON(engine->class >= ARRAY_SIZE(uabi_classes));
 		engine->uabi_class = uabi_classes[engine->class];
-		if (engine->uabi_class == I915_NO_UABI_CLASS) {
-			name_instance = other_instance++;
-		} else {
-			GEM_BUG_ON(engine->uabi_class >=
-				   ARRAY_SIZE(i915->engine_uabi_class_count));
-			name_instance =
-				i915->engine_uabi_class_count[engine->uabi_class]++;
-		}
-		engine->uabi_instance = name_instance;
+
+		if (engine->uabi_class == I915_NO_UABI_CLASS)
+			uabi_class = I915_LAST_UABI_ENGINE_CLASS + 1;
+		else
+			uabi_class = engine->uabi_class;
+
+		GEM_BUG_ON(uabi_class >= ARRAY_SIZE(class_instance));
+		engine->uabi_instance = class_instance[uabi_class]++;
 
 		/*
 		 * Replace the internal name with the final user and log facing
@@ -238,11 +239,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
 		 */
 		engine_rename(engine,
 			      intel_engine_class_repr(engine->class),
-			      name_instance);
+			      engine->uabi_instance);
 
-		if (engine->uabi_class == I915_NO_UABI_CLASS)
+		if (uabi_class > I915_LAST_UABI_ENGINE_CLASS)
 			continue;
 
+		GEM_BUG_ON(uabi_class >=
+			   ARRAY_SIZE(i915->engine_uabi_class_count));
+		i915->engine_uabi_class_count[uabi_class]++;
+
 		rb_link_node(&engine->uabi_node, prev, p);
 		rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
 
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 05/15] drm/i915/gem: Mark and verify UABI engine validity
  2025-03-24 13:29 [PATCH v4 00/15] CCS static load balance Andi Shyti
                   ` (3 preceding siblings ...)
  2025-03-24 13:29 ` [PATCH v4 04/15] drm/i915/gt: Refactor uabi engine class/instance list creation Andi Shyti
@ 2025-03-24 13:29 ` Andi Shyti
  2025-03-24 13:29 ` [PATCH v4 06/15] drm/i915/gt: Introduce for_each_enabled_engine() and apply it in selftests Andi Shyti
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Andi Shyti @ 2025-03-24 13:29 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Tvrtko Ursulin, Joonas Lahtinen, Chris Wilson, Simona Vetter,
	Arshad Mehmood, Michal Mrozek, Andi Shyti, Andi Shyti

Mark engines as invalid when they are not added to the UABI list
to prevent accidental assignment of batch buffers.

Currently, this change is mostly precautionary with minimal
impact. However, in the future, when CCS engines will be
dynamically added and removed by the user, this mechanism will
be used for determining engine validity.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 28 +++++++++++++++++--
 drivers/gpu/drm/i915/gt/intel_engine_user.c   |  9 ++++--
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 7796c4119ef5..a6448f6c8f6a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2680,6 +2680,22 @@ eb_select_legacy_ring(struct i915_execbuffer *eb)
 	return user_ring_map[user_ring_id];
 }
 
+static bool engine_valid(struct intel_context *ce)
+{
+	if (!intel_engine_is_virtual(ce->engine))
+		return !RB_EMPTY_NODE(&ce->engine->uabi_node);
+
+	/*
+	 * TODO: check virtual sibilings; we need to walk through all the
+	 * virtual engines and ask whether the physical engine where it is based
+	 * is still valid. For each of them we need to check with
+	 * RB_EMPTY_NODE(...)
+	 *
+	 * This can be a placed in a new ce_ops.
+	 */
+	return true;
+}
+
 static int
 eb_select_engine(struct i915_execbuffer *eb)
 {
@@ -2710,8 +2726,6 @@ eb_select_engine(struct i915_execbuffer *eb)
 	eb->num_batches = ce->parallel.number_children + 1;
 	gt = ce->engine->gt;
 
-	for_each_child(ce, child)
-		intel_context_get(child);
 	eb->wakeref = intel_gt_pm_get(ce->engine->gt);
 	/*
 	 * Keep GT0 active on MTL so that i915_vma_parked() doesn't
@@ -2720,6 +2734,16 @@ eb_select_engine(struct i915_execbuffer *eb)
 	if (gt->info.id)
 		eb->wakeref_gt0 = intel_gt_pm_get(to_gt(gt->i915));
 
+	/* We need to hold the wakeref to stabilize i915->uabi_engines */
+	if (!engine_valid(ce)) {
+		intel_context_put(ce);
+		err = -ENODEV;
+		goto err;
+	}
+
+	for_each_child(ce, child)
+		intel_context_get(child);
+
 	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
 		err = intel_context_alloc_state(ce);
 		if (err)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 11cc06c0c785..cd7662b1ad59 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -220,7 +220,7 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
 			container_of(it, typeof(*engine), uabi_list);
 
 		if (intel_gt_has_unrecoverable_error(engine->gt))
-			continue; /* ignore incomplete engines */
+			goto clear_node_continue; /* ignore incomplete engines */
 
 		GEM_BUG_ON(engine->class >= ARRAY_SIZE(uabi_classes));
 		engine->uabi_class = uabi_classes[engine->class];
@@ -242,7 +242,7 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
 			      engine->uabi_instance);
 
 		if (uabi_class > I915_LAST_UABI_ENGINE_CLASS)
-			continue;
+			goto clear_node_continue;
 
 		GEM_BUG_ON(uabi_class >=
 			   ARRAY_SIZE(i915->engine_uabi_class_count));
@@ -260,6 +260,11 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
 
 		prev = &engine->uabi_node;
 		p = &prev->rb_right;
+
+		continue;
+
+clear_node_continue:
+		RB_CLEAR_NODE(&engine->uabi_node);
 	}
 
 	if (IS_ENABLED(CONFIG_DRM_I915_SELFTESTS) &&
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 06/15] drm/i915/gt: Introduce for_each_enabled_engine() and apply it in selftests
  2025-03-24 13:29 [PATCH v4 00/15] CCS static load balance Andi Shyti
                   ` (4 preceding siblings ...)
  2025-03-24 13:29 ` [PATCH v4 05/15] drm/i915/gem: Mark and verify UABI engine validity Andi Shyti
@ 2025-03-24 13:29 ` Andi Shyti
  2025-03-24 13:29 ` [PATCH v4 07/15] drm/i915/gt: Manage CCS engine creation within UABI exposure Andi Shyti
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Andi Shyti @ 2025-03-24 13:29 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Tvrtko Ursulin, Joonas Lahtinen, Chris Wilson, Simona Vetter,
	Arshad Mehmood, Michal Mrozek, Andi Shyti, Andi Shyti

Selftests should run only on enabled engines, as disabled engines
are not intended for use. A practical example is when, on DG2
machines, the user chooses to utilize only one CCS stream instead
of all four.

To address this, introduce the for_each_enabled_engine() loop,
which will skip engines when they are marked as RB_EMPTY.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt.h            | 12 +++++
 drivers/gpu/drm/i915/gt/selftest_context.c    |  6 +--
 drivers/gpu/drm/i915/gt/selftest_engine_cs.c  |  4 +-
 .../drm/i915/gt/selftest_engine_heartbeat.c   |  6 +--
 drivers/gpu/drm/i915/gt/selftest_engine_pm.c  |  6 +--
 drivers/gpu/drm/i915/gt/selftest_execlists.c  | 52 +++++++++----------
 drivers/gpu/drm/i915/gt/selftest_gt_pm.c      |  2 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  | 22 ++++----
 drivers/gpu/drm/i915/gt/selftest_lrc.c        | 18 +++----
 drivers/gpu/drm/i915/gt/selftest_mocs.c       |  6 +--
 drivers/gpu/drm/i915/gt/selftest_rc6.c        |  4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      |  8 +--
 .../drm/i915/gt/selftest_ring_submission.c    |  2 +-
 drivers/gpu/drm/i915/gt/selftest_rps.c        | 14 ++---
 drivers/gpu/drm/i915/gt/selftest_timeline.c   | 14 ++---
 drivers/gpu/drm/i915/gt/selftest_tlb.c        |  2 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c    | 14 ++---
 17 files changed, 102 insertions(+), 90 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index 998ca029b73a..1c9d861241ad 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -188,6 +188,18 @@ int intel_gt_tiles_init(struct drm_i915_private *i915);
 	     (id__)++) \
 		for_each_if ((engine__) = (gt__)->engine[(id__)])
 
+/*
+ * Iterator over all initialized and enabled engines. Some engines, like CCS,
+ * may be "disabled" (i.e., not exposed to the user). Disabling is indicated
+ * by marking the rb_node as empty.
+ */
+#define for_each_enabled_engine(engine__, gt__, id__) \
+	for ((id__) = 0; \
+	     (id__) < I915_NUM_ENGINES; \
+	     (id__)++) \
+		for_each_if (((engine__) = (gt__)->engine[(id__)]) && \
+			     (!RB_EMPTY_NODE(&(engine__)->uabi_node)))
+
 /* Iterator over subset of engines selected by mask */
 #define for_each_engine_masked(engine__, gt__, mask__, tmp__) \
 	for ((tmp__) = (mask__) & (gt__)->info.engine_mask; \
diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c
index 5eb46700dc4e..9976e231248d 100644
--- a/drivers/gpu/drm/i915/gt/selftest_context.c
+++ b/drivers/gpu/drm/i915/gt/selftest_context.c
@@ -157,7 +157,7 @@ static int live_context_size(void *arg)
 	 * HW tries to write past the end of one.
 	 */
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct file *saved;
 
 		if (!engine->context_size)
@@ -311,7 +311,7 @@ static int live_active_context(void *arg)
 	enum intel_engine_id id;
 	int err = 0;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		err = __live_active_context(engine);
 		if (err)
 			break;
@@ -424,7 +424,7 @@ static int live_remote_context(void *arg)
 	enum intel_engine_id id;
 	int err = 0;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		err = __live_remote_context(engine);
 		if (err)
 			break;
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
index 5ffa5e30f419..038723a401df 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
@@ -142,7 +142,7 @@ static int perf_mi_bb_start(void *arg)
 		return 0;
 
 	wakeref = perf_begin(gt);
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct intel_context *ce = engine->kernel_context;
 		struct i915_vma *batch;
 		u32 cycles[COUNT];
@@ -270,7 +270,7 @@ static int perf_mi_noop(void *arg)
 		return 0;
 
 	wakeref = perf_begin(gt);
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct intel_context *ce = engine->kernel_context;
 		struct i915_vma *base, *nop;
 		u32 cycles[COUNT];
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
index 9e4f0e417b3b..74d4c2dc69cf 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
@@ -160,7 +160,7 @@ static int live_idle_flush(void *arg)
 
 	/* Check that we can flush the idle barriers */
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		st_engine_heartbeat_disable(engine);
 		err = __live_idle_pulse(engine, intel_engine_flush_barriers);
 		st_engine_heartbeat_enable(engine);
@@ -180,7 +180,7 @@ static int live_idle_pulse(void *arg)
 
 	/* Check that heartbeat pulses flush the idle barriers */
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		st_engine_heartbeat_disable(engine);
 		err = __live_idle_pulse(engine, intel_engine_pulse);
 		st_engine_heartbeat_enable(engine);
@@ -246,7 +246,7 @@ static int live_heartbeat_off(void *arg)
 	if (!CONFIG_DRM_I915_HEARTBEAT_INTERVAL)
 		return 0;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		if (!intel_engine_has_preemption(engine))
 			continue;
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
index 10e556a7eac4..1da3bddbf02e 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
@@ -203,7 +203,7 @@ static int live_engine_timestamps(void *arg)
 	if (GRAPHICS_VER(gt->i915) < 8)
 		return 0;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		int err;
 
 		st_engine_heartbeat_disable(engine);
@@ -257,7 +257,7 @@ static int live_engine_busy_stats(void *arg)
 		return -ENOMEM;
 
 	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct i915_request *rq;
 		ktime_t busyness, dummy;
 		ktime_t de, dt;
@@ -363,7 +363,7 @@ static int live_engine_pm(void *arg)
 	}
 
 	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		const typeof(*igt_atomic_phases) *p;
 
 		for (p = igt_atomic_phases; p->name; p++) {
diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
index d7717de17ecc..e47411c05b31 100644
--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
@@ -120,7 +120,7 @@ static int live_sanitycheck(void *arg)
 	if (igt_spinner_init(&spin, gt))
 		return -ENOMEM;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct intel_context *ce;
 		struct i915_request *rq;
 
@@ -177,7 +177,7 @@ static int live_unlite_restore(struct intel_gt *gt, int prio)
 		return err;
 
 	err = 0;
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct intel_context *ce[2] = {};
 		struct i915_request *rq[2];
 		struct igt_live_test t;
@@ -339,7 +339,7 @@ static int live_unlite_ring(void *arg)
 	if (igt_spinner_init(&spin, gt))
 		return -ENOMEM;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct intel_context *ce[2] = {};
 		struct i915_request *rq;
 		struct igt_live_test t;
@@ -488,7 +488,7 @@ static int live_pin_rewind(void *arg)
 	 * To simulate this, let's apply a bit of deliberate sabotague.
 	 */
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct intel_context *ce;
 		struct i915_request *rq;
 		struct intel_ring *ring;
@@ -596,7 +596,7 @@ static int live_hold_reset(void *arg)
 	if (igt_spinner_init(&spin, gt))
 		return -ENOMEM;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct intel_context *ce;
 		struct i915_request *rq;
 
@@ -703,7 +703,7 @@ static int live_error_interrupt(void *arg)
 	if (!intel_has_reset_engine(gt))
 		return 0;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		const struct error_phase *p;
 		int err = 0;
 
@@ -938,7 +938,7 @@ slice_semaphore_queue(struct intel_engine_cs *outer,
 	if (IS_ERR(head))
 		return PTR_ERR(head);
 
-	for_each_engine(engine, outer->gt, id) {
+	for_each_enabled_engine(engine, outer->gt, id) {
 		if (!intel_engine_has_preemption(engine))
 			continue;
 
@@ -1018,7 +1018,7 @@ static int live_timeslice_preempt(void *arg)
 	if (err)
 		goto err_pin;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		if (!intel_engine_has_preemption(engine))
 			continue;
 
@@ -1124,7 +1124,7 @@ static int live_timeslice_rewind(void *arg)
 	if (!CONFIG_DRM_I915_TIMESLICE_DURATION)
 		return 0;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		enum { A1, A2, B1 };
 		enum { X = 1, Z, Y };
 		struct i915_request *rq[3] = {};
@@ -1325,7 +1325,7 @@ static int live_timeslice_queue(void *arg)
 	if (err)
 		goto err_pin;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct i915_sched_attr attr = { .priority = I915_PRIORITY_MAX };
 		struct i915_request *rq, *nop;
 
@@ -1425,7 +1425,7 @@ static int live_timeslice_nopreempt(void *arg)
 	if (igt_spinner_init(&spin, gt))
 		return -ENOMEM;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct intel_context *ce;
 		struct i915_request *rq;
 		unsigned long timeslice;
@@ -1578,7 +1578,7 @@ static int live_busywait_preempt(void *arg)
 	if (err)
 		goto err_vma;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct i915_request *lo, *hi;
 		struct igt_live_test t;
 		u32 *cs;
@@ -1754,7 +1754,7 @@ static int live_preempt(void *arg)
 	if (igt_spinner_init(&spin_lo, gt))
 		goto err_spin_hi;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct igt_live_test t;
 		struct i915_request *rq;
 
@@ -1847,7 +1847,7 @@ static int live_late_preempt(void *arg)
 	/* Make sure ctx_lo stays before ctx_hi until we trigger preemption. */
 	ctx_lo->sched.priority = 1;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct igt_live_test t;
 		struct i915_request *rq;
 
@@ -1969,7 +1969,7 @@ static int live_nopreempt(void *arg)
 		goto err_client_a;
 	b.ctx->sched.priority = I915_PRIORITY_MAX;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct i915_request *rq_a, *rq_b;
 
 		if (!intel_engine_has_preemption(engine))
@@ -2396,7 +2396,7 @@ static int live_preempt_cancel(void *arg)
 	if (preempt_client_init(gt, &data.b))
 		goto err_client_a;
 
-	for_each_engine(data.engine, gt, id) {
+	for_each_enabled_engine(data.engine, gt, id) {
 		if (!intel_engine_has_preemption(data.engine))
 			continue;
 
@@ -2463,7 +2463,7 @@ static int live_suppress_self_preempt(void *arg)
 	if (preempt_client_init(gt, &b))
 		goto err_client_a;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct i915_request *rq_a, *rq_b;
 		int depth;
 
@@ -2570,7 +2570,7 @@ static int live_chain_preempt(void *arg)
 	if (preempt_client_init(gt, &lo))
 		goto err_client_hi;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct i915_sched_attr attr = { .priority = I915_PRIORITY_MAX };
 		struct igt_live_test t;
 		struct i915_request *rq;
@@ -2928,7 +2928,7 @@ static int live_preempt_ring(void *arg)
 	if (igt_spinner_init(&spin, gt))
 		return -ENOMEM;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		int n;
 
 		if (!intel_engine_has_preemption(engine))
@@ -2971,7 +2971,7 @@ static int live_preempt_gang(void *arg)
 	 * high priority levels into execution order.
 	 */
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct i915_request *rq = NULL;
 		struct igt_live_test t;
 		IGT_TIMEOUT(end_time);
@@ -3277,7 +3277,7 @@ static int live_preempt_user(void *arg)
 		return PTR_ERR(result);
 	}
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct i915_request *client[3] = {};
 		struct igt_live_test t;
 		int i;
@@ -3393,7 +3393,7 @@ static int live_preempt_timeout(void *arg)
 	if (igt_spinner_init(&spin_lo, gt))
 		goto err_ctx_lo;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		unsigned long saved_timeout;
 		struct i915_request *rq;
 
@@ -3567,7 +3567,7 @@ static int smoke_crescendo(struct preempt_smoke *smoke, unsigned int flags)
 
 	memset(arg, 0, I915_NUM_ENGINES * sizeof(*arg));
 
-	for_each_engine(engine, smoke->gt, id) {
+	for_each_enabled_engine(engine, smoke->gt, id) {
 		arg[id] = *smoke;
 		arg[id].engine = engine;
 		if (!(flags & BATCH))
@@ -3585,7 +3585,7 @@ static int smoke_crescendo(struct preempt_smoke *smoke, unsigned int flags)
 	}
 
 	count = 0;
-	for_each_engine(engine, smoke->gt, id) {
+	for_each_enabled_engine(engine, smoke->gt, id) {
 		if (IS_ERR_OR_NULL(worker[id]))
 			continue;
 
@@ -3613,7 +3613,7 @@ static int smoke_random(struct preempt_smoke *smoke, unsigned int flags)
 
 	count = 0;
 	do {
-		for_each_engine(smoke->engine, smoke->gt, id) {
+		for_each_enabled_engine(smoke->engine, smoke->gt, id) {
 			struct i915_gem_context *ctx = smoke_context(smoke);
 			int err;
 
@@ -3876,7 +3876,7 @@ static int live_virtual_engine(void *arg)
 	if (intel_uc_uses_guc_submission(&gt->uc))
 		return 0;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		err = nop_virtual_engine(gt, &engine, 1, 1, 0);
 		if (err) {
 			pr_err("Failed to wrap engine %s: err=%d\n",
diff --git a/drivers/gpu/drm/i915/gt/selftest_gt_pm.c b/drivers/gpu/drm/i915/gt/selftest_gt_pm.c
index 33351deeea4f..ddc4b5623f19 100644
--- a/drivers/gpu/drm/i915/gt/selftest_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/selftest_gt_pm.c
@@ -95,7 +95,7 @@ static int live_gt_clocks(void *arg)
 	wakeref = intel_gt_pm_get(gt);
 	intel_uncore_forcewake_get(gt->uncore, FORCEWAKE_ALL);
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		u32 cycles;
 		u32 expected;
 		u64 time;
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index f057c16410e7..7a486a650e3e 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -296,7 +296,7 @@ static int igt_hang_sanitycheck(void *arg)
 	if (err)
 		return err;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct intel_wedge_me w;
 		long timeout;
 
@@ -360,7 +360,7 @@ static int igt_reset_nop(void *arg)
 	reset_count = i915_reset_count(global);
 	count = 0;
 	do {
-		for_each_engine(engine, gt, id) {
+		for_each_enabled_engine(engine, gt, id) {
 			struct intel_context *ce;
 			int i;
 
@@ -433,7 +433,7 @@ static int igt_reset_nop_engine(void *arg)
 	if (!intel_has_reset_engine(gt))
 		return 0;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		unsigned int reset_count, reset_engine_count, count;
 		struct intel_context *ce;
 		IGT_TIMEOUT(end_time);
@@ -553,7 +553,7 @@ static int igt_reset_fail_engine(void *arg)
 	if (!intel_has_reset_engine(gt))
 		return 0;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		unsigned int count;
 		struct intel_context *ce;
 		IGT_TIMEOUT(end_time);
@@ -700,7 +700,7 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
 			return err;
 	}
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		unsigned int reset_count, reset_engine_count;
 		unsigned long count;
 		bool using_guc = intel_engine_uses_guc(engine);
@@ -990,7 +990,7 @@ static int __igt_reset_engines(struct intel_gt *gt,
 	if (!threads)
 		return -ENOMEM;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		unsigned long device = i915_reset_count(global);
 		unsigned long count = 0, reported;
 		bool using_guc = intel_engine_uses_guc(engine);
@@ -1010,7 +1010,7 @@ static int __igt_reset_engines(struct intel_gt *gt,
 		}
 
 		memset(threads, 0, sizeof(*threads) * I915_NUM_ENGINES);
-		for_each_engine(other, gt, tmp) {
+		for_each_enabled_engine(other, gt, tmp) {
 			struct kthread_worker *worker;
 
 			threads[tmp].resets =
@@ -1185,7 +1185,7 @@ static int __igt_reset_engines(struct intel_gt *gt,
 		}
 
 unwind:
-		for_each_engine(other, gt, tmp) {
+		for_each_enabled_engine(other, gt, tmp) {
 			int ret;
 
 			if (!threads[tmp].worker)
@@ -1621,7 +1621,7 @@ static int wait_for_others(struct intel_gt *gt,
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		if (engine == exclude)
 			continue;
 
@@ -1649,7 +1649,7 @@ static int igt_reset_queue(void *arg)
 	if (err)
 		goto unlock;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct intel_selftest_saved_policy saved;
 		struct i915_request *prev;
 		IGT_TIMEOUT(end_time);
@@ -1982,7 +1982,7 @@ static int igt_reset_engines_atomic(void *arg)
 		struct intel_engine_cs *engine;
 		enum intel_engine_id id;
 
-		for_each_engine(engine, gt, id) {
+		for_each_enabled_engine(engine, gt, id) {
 			err = igt_atomic_reset_engine(engine, p);
 			if (err)
 				goto out;
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 23f04f6f8fba..8c18e3f11991 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -172,7 +172,7 @@ static int live_lrc_layout(void *arg)
 	GEM_BUG_ON(offset_in_page(lrc));
 
 	err = 0;
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		u32 *hw;
 		int dw;
 
@@ -295,7 +295,7 @@ static int live_lrc_fixed(void *arg)
 	 * the context image.
 	 */
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		const struct {
 			u32 reg;
 			u32 offset;
@@ -517,7 +517,7 @@ static int live_lrc_state(void *arg)
 	if (IS_ERR(scratch))
 		return PTR_ERR(scratch);
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		err = __live_lrc_state(engine, scratch);
 		if (err)
 			break;
@@ -711,7 +711,7 @@ static int live_lrc_gpr(void *arg)
 	if (IS_ERR(scratch))
 		return PTR_ERR(scratch);
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		st_engine_heartbeat_disable(engine);
 
 		err = __live_lrc_gpr(engine, scratch, false);
@@ -876,7 +876,7 @@ static int live_lrc_timestamp(void *arg)
 	 * with a second request (carrying more poison into the timestamp).
 	 */
 
-	for_each_engine(data.engine, gt, id) {
+	for_each_enabled_engine(data.engine, gt, id) {
 		int i, err = 0;
 
 		st_engine_heartbeat_disable(data.engine);
@@ -1534,7 +1534,7 @@ static int live_lrc_isolation(void *arg)
 	 * context image and attempt to modify that list from a remote context.
 	 */
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		int i;
 
 		/* Just don't even ask */
@@ -1722,7 +1722,7 @@ static int lrc_wabb_ctx(void *arg, bool per_ctx)
 	enum intel_engine_id id;
 	int err = 0;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		intel_engine_pm_get(engine);
 		err = __lrc_wabb_ctx(engine, per_ctx);
 		intel_engine_pm_put(engine);
@@ -1858,7 +1858,7 @@ static int live_lrc_garbage(void *arg)
 	if (!IS_ENABLED(CONFIG_DRM_I915_SELFTEST_BROKEN))
 		return 0;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		I915_RND_STATE(prng);
 		int err = 0, i;
 
@@ -1960,7 +1960,7 @@ static int live_pphwsp_runtime(void *arg)
 	 * is monotonic.
 	 */
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		err = __live_pphwsp_runtime(engine);
 		if (err)
 			break;
diff --git a/drivers/gpu/drm/i915/gt/selftest_mocs.c b/drivers/gpu/drm/i915/gt/selftest_mocs.c
index d73e438fb85f..6fd9fb0cd9f6 100644
--- a/drivers/gpu/drm/i915/gt/selftest_mocs.c
+++ b/drivers/gpu/drm/i915/gt/selftest_mocs.c
@@ -271,7 +271,7 @@ static int live_mocs_kernel(void *arg)
 	if (err)
 		return err;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		intel_engine_pm_get(engine);
 		err = check_mocs_engine(&mocs, engine->kernel_context);
 		intel_engine_pm_put(engine);
@@ -297,7 +297,7 @@ static int live_mocs_clean(void *arg)
 	if (err)
 		return err;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct intel_context *ce;
 
 		ce = mocs_context_create(engine);
@@ -400,7 +400,7 @@ static int live_mocs_reset(void *arg)
 		return err;
 
 	igt_global_reset_lock(gt);
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		bool using_guc = intel_engine_uses_guc(engine);
 		struct intel_selftest_saved_policy saved;
 		struct intel_context *ce;
diff --git a/drivers/gpu/drm/i915/gt/selftest_rc6.c b/drivers/gpu/drm/i915/gt/selftest_rc6.c
index 99de5d85a096..805942864100 100644
--- a/drivers/gpu/drm/i915/gt/selftest_rc6.c
+++ b/drivers/gpu/drm/i915/gt/selftest_rc6.c
@@ -177,7 +177,7 @@ randomised_engines(struct intel_gt *gt,
 	int n;
 
 	n = 0;
-	for_each_engine(engine, gt, id)
+	for_each_enabled_engine(engine, gt, id)
 		n++;
 	if (!n)
 		return NULL;
@@ -187,7 +187,7 @@ randomised_engines(struct intel_gt *gt,
 		return NULL;
 
 	n = 0;
-	for_each_engine(engine, gt, id)
+	for_each_enabled_engine(engine, gt, id)
 		engines[n++] = engine;
 
 	i915_prandom_shuffle(engines, sizeof(*engines), n, prng);
diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c
index 2cfc23c58e90..548e00ec47bd 100644
--- a/drivers/gpu/drm/i915/gt/selftest_reset.c
+++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
@@ -55,7 +55,7 @@ __igt_reset_stolen(struct intel_gt *gt,
 	if (err)
 		goto err_lock;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct intel_context *ce;
 		struct i915_request *rq;
 
@@ -113,7 +113,7 @@ __igt_reset_stolen(struct intel_gt *gt,
 	if (mask == ALL_ENGINES) {
 		intel_gt_reset(gt, mask, NULL);
 	} else {
-		for_each_engine(engine, gt, id) {
+		for_each_enabled_engine(engine, gt, id) {
 			if (mask & engine->mask)
 				intel_engine_reset(engine, NULL);
 		}
@@ -197,7 +197,7 @@ static int igt_reset_engines_stolen(void *arg)
 	if (!intel_has_reset_engine(gt))
 		return 0;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		err = __igt_reset_stolen(gt, engine->mask, engine->name);
 		if (err)
 			return err;
@@ -326,7 +326,7 @@ static int igt_atomic_engine_reset(void *arg)
 	if (!igt_force_reset(gt))
 		goto out_unlock;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct tasklet_struct *t = &engine->sched_engine->tasklet;
 
 		if (t->func)
diff --git a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
index 87ceb0f374b6..a447fec027e1 100644
--- a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
@@ -259,7 +259,7 @@ static int live_ctx_switch_wa(void *arg)
 	 * and equally important it was wasn't run when we don't!
 	 */
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct i915_vma *saved_wa;
 		int err;
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c b/drivers/gpu/drm/i915/gt/selftest_rps.c
index 73bc91c6ea07..d77a95acb400 100644
--- a/drivers/gpu/drm/i915/gt/selftest_rps.c
+++ b/drivers/gpu/drm/i915/gt/selftest_rps.c
@@ -242,7 +242,7 @@ int live_rps_clock_interval(void *arg)
 
 	intel_gt_check_clock_frequency(gt);
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct i915_request *rq;
 		u32 cycles;
 		u64 dt;
@@ -401,7 +401,7 @@ int live_rps_control(void *arg)
 	rps->work.func = dummy_rps_work;
 
 	wakeref = intel_gt_pm_get(gt);
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct i915_request *rq;
 		ktime_t min_dt, max_dt;
 		int f, limit;
@@ -630,7 +630,7 @@ int live_rps_frequency_cs(void *arg)
 	saved_work = rps->work.func;
 	rps->work.func = dummy_rps_work;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct i915_request *rq;
 		struct i915_vma *vma;
 		u32 *cancel, *cntr;
@@ -769,7 +769,7 @@ int live_rps_frequency_srm(void *arg)
 	saved_work = rps->work.func;
 	rps->work.func = dummy_rps_work;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct i915_request *rq;
 		struct i915_vma *vma;
 		u32 *cancel, *cntr;
@@ -1052,7 +1052,7 @@ int live_rps_interrupt(void *arg)
 	saved_work = rps->work.func;
 	rps->work.func = dummy_rps_work;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		/* Keep the engine busy with a spinner; expect an UP! */
 		if (pm_events & GEN6_PM_RP_UP_THRESHOLD) {
 			intel_gt_pm_wait_for_idle(engine->gt);
@@ -1159,7 +1159,7 @@ int live_rps_power(void *arg)
 	saved_work = rps->work.func;
 	rps->work.func = dummy_rps_work;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct i915_request *rq;
 		struct {
 			u64 power;
@@ -1261,7 +1261,7 @@ int live_rps_dynamic(void *arg)
 	if (intel_rps_uses_timer(rps))
 		pr_info("RPS has timer support\n");
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct i915_request *rq;
 		struct {
 			ktime_t dt;
diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c b/drivers/gpu/drm/i915/gt/selftest_timeline.c
index fa36cf920bde..47d6f02808ba 100644
--- a/drivers/gpu/drm/i915/gt/selftest_timeline.c
+++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c
@@ -543,7 +543,7 @@ static int live_hwsp_engine(void *arg)
 		return -ENOMEM;
 
 	count = 0;
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		if (!intel_engine_can_store_dword(engine))
 			continue;
 
@@ -619,7 +619,7 @@ static int live_hwsp_alternate(void *arg)
 
 	count = 0;
 	for (n = 0; n < NUM_TIMELINES; n++) {
-		for_each_engine(engine, gt, id) {
+		for_each_enabled_engine(engine, gt, id) {
 			struct intel_timeline *tl;
 			struct i915_request *rq;
 
@@ -691,7 +691,7 @@ static int live_hwsp_wrap(void *arg)
 	if (err)
 		goto out_free;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		const u32 *hwsp_seqno[2];
 		struct i915_request *rq;
 		u32 seqno[2];
@@ -1016,7 +1016,7 @@ static int live_hwsp_read(void *arg)
 			goto out;
 	}
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct intel_context *ce;
 		unsigned long count = 0;
 		IGT_TIMEOUT(end_time);
@@ -1188,7 +1188,7 @@ static int live_hwsp_rollover_kernel(void *arg)
 	 * see a seqno rollover.
 	 */
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct intel_context *ce = engine->kernel_context;
 		struct intel_timeline *tl = ce->timeline;
 		struct i915_request *rq[3] = {};
@@ -1266,7 +1266,7 @@ static int live_hwsp_rollover_user(void *arg)
 	 * on the user's timeline.
 	 */
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct i915_request *rq[3] = {};
 		struct intel_timeline *tl;
 		struct intel_context *ce;
@@ -1357,7 +1357,7 @@ static int live_hwsp_recycle(void *arg)
 	 */
 
 	count = 0;
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		IGT_TIMEOUT(end_time);
 
 		if (!intel_engine_can_store_dword(engine))
diff --git a/drivers/gpu/drm/i915/gt/selftest_tlb.c b/drivers/gpu/drm/i915/gt/selftest_tlb.c
index 69ed946a39e5..12526d17177f 100644
--- a/drivers/gpu/drm/i915/gt/selftest_tlb.c
+++ b/drivers/gpu/drm/i915/gt/selftest_tlb.c
@@ -293,7 +293,7 @@ mem_tlbinv(struct intel_gt *gt,
 	}
 
 	err = 0;
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct i915_gem_ww_ctx ww;
 		struct intel_context *ce;
 		int bit;
diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
index 14a8b25b6204..55f9f5c556c3 100644
--- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
@@ -70,7 +70,7 @@ reference_lists_init(struct intel_gt *gt, struct wa_lists *lists)
 	gt_init_workarounds(gt, &lists->gt_wa_list);
 	wa_init_finish(&lists->gt_wa_list);
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct i915_wa_list *wal = &lists->engine[id].wa_list;
 
 		wa_init_start(wal, gt, "REF", engine->name);
@@ -89,7 +89,7 @@ reference_lists_fini(struct intel_gt *gt, struct wa_lists *lists)
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	for_each_engine(engine, gt, id)
+	for_each_enabled_engine(engine, gt, id)
 		intel_wa_list_free(&lists->engine[id].wa_list);
 
 	intel_wa_list_free(&lists->gt_wa_list);
@@ -764,7 +764,7 @@ static int live_dirty_whitelist(void *arg)
 	if (GRAPHICS_VER(gt->i915) < 7) /* minimum requirement for LRI, SRM, LRM */
 		return 0;
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct intel_context *ce;
 		int err;
 
@@ -794,7 +794,7 @@ static int live_reset_whitelist(void *arg)
 	/* If we reset the gpu, we should not lose the RING_NONPRIV */
 	igt_global_reset_lock(gt);
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		if (engine->whitelist.count == 0)
 			continue;
 
@@ -1089,7 +1089,7 @@ static int live_isolated_whitelist(void *arg)
 		}
 	}
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct intel_context *ce[2];
 
 		if (!engine->kernel_context->vm)
@@ -1172,7 +1172,7 @@ verify_wa_lists(struct intel_gt *gt, struct wa_lists *lists,
 
 	ok &= wa_list_verify(gt, &lists->gt_wa_list, str);
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct intel_context *ce;
 
 		ce = intel_context_create(engine);
@@ -1257,7 +1257,7 @@ live_engine_reset_workarounds(void *arg)
 
 	reference_lists_init(gt, lists);
 
-	for_each_engine(engine, gt, id) {
+	for_each_enabled_engine(engine, gt, id) {
 		struct intel_selftest_saved_policy saved;
 		bool using_guc = intel_engine_uses_guc(engine);
 		bool ok;
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 07/15] drm/i915/gt: Manage CCS engine creation within UABI exposure
  2025-03-24 13:29 [PATCH v4 00/15] CCS static load balance Andi Shyti
                   ` (5 preceding siblings ...)
  2025-03-24 13:29 ` [PATCH v4 06/15] drm/i915/gt: Introduce for_each_enabled_engine() and apply it in selftests Andi Shyti
@ 2025-03-24 13:29 ` Andi Shyti
  2025-03-24 13:29 ` [PATCH v4 08/15] drm/i915/gt: Remove cslices mask value from the CCS structure Andi Shyti
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Andi Shyti @ 2025-03-24 13:29 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Tvrtko Ursulin, Joonas Lahtinen, Chris Wilson, Simona Vetter,
	Arshad Mehmood, Michal Mrozek, Andi Shyti, Andi Shyti

In commit ea315f98e5d6 ("drm/i915/gt: Do not generate the command
streamer for all the CCS"), we restricted the creation of
physical CCS engines to only one stream. This allowed the user to
submit a single compute workload, with all CCS slices sharing the
workload from that stream.

This patch removes that limitation but still exposes only one
stream to the user. The physical memory for each engine remains
allocated but unused, however the user will only see one engine
exposed.

Do this by adding only one engine to the UABI list, ensuring that
only one engine is visible to the user.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 23 ---------------------
 drivers/gpu/drm/i915/gt/intel_engine_user.c | 17 ++++++++++++---
 2 files changed, 14 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index b721bbd23356..d2e2461e09d1 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -875,29 +875,6 @@ static intel_engine_mask_t init_engine_mask(struct intel_gt *gt)
 		info->engine_mask &= ~BIT(GSC0);
 	}
 
-	/*
-	 * Do not create the command streamer for CCS slices beyond the first.
-	 * All the workload submitted to the first engine will be shared among
-	 * all the slices.
-	 *
-	 * Once the user will be allowed to customize the CCS mode, then this
-	 * check needs to be removed.
-	 */
-	if (IS_DG2(gt->i915)) {
-		u8 first_ccs = __ffs(CCS_MASK(gt));
-
-		/*
-		 * Store the number of active cslices before
-		 * changing the CCS engine configuration
-		 */
-		gt->ccs.cslices = CCS_MASK(gt);
-
-		/* Mask off all the CCS engine */
-		info->engine_mask &= ~GENMASK(CCS3, CCS0);
-		/* Put back in the first CCS engine */
-		info->engine_mask |= BIT(_CCS(first_ccs));
-	}
-
 	return info->engine_mask;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index cd7662b1ad59..8e5284af8335 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -246,6 +246,20 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
 
 		GEM_BUG_ON(uabi_class >=
 			   ARRAY_SIZE(i915->engine_uabi_class_count));
+
+		/* Fix up the mapping to match default execbuf::user_map[] */
+		add_legacy_ring(&ring, engine);
+
+		/*
+		 * Do not create the command streamer for CCS slices beyond the
+		 * first. All the workload submitted to the first engine will be
+		 * shared among all the slices.
+		 */
+		if (IS_DG2(i915) &&
+		    uabi_class == I915_ENGINE_CLASS_COMPUTE &&
+		    engine->uabi_instance)
+			goto clear_node_continue;
+
 		i915->engine_uabi_class_count[uabi_class]++;
 
 		rb_link_node(&engine->uabi_node, prev, p);
@@ -255,9 +269,6 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
 						    engine->uabi_class,
 						    engine->uabi_instance) != engine);
 
-		/* Fix up the mapping to match default execbuf::user_map[] */
-		add_legacy_ring(&ring, engine);
-
 		prev = &engine->uabi_node;
 		p = &prev->rb_right;
 
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 08/15] drm/i915/gt: Remove cslices mask value from the CCS structure
  2025-03-24 13:29 [PATCH v4 00/15] CCS static load balance Andi Shyti
                   ` (6 preceding siblings ...)
  2025-03-24 13:29 ` [PATCH v4 07/15] drm/i915/gt: Manage CCS engine creation within UABI exposure Andi Shyti
@ 2025-03-24 13:29 ` Andi Shyti
  2025-03-24 13:29 ` [PATCH v4 09/15] drm/i915/gt: Expose the number of total CCS slices Andi Shyti
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Andi Shyti @ 2025-03-24 13:29 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Tvrtko Ursulin, Joonas Lahtinen, Chris Wilson, Simona Vetter,
	Arshad Mehmood, Michal Mrozek, Andi Shyti, Andi Shyti

Following the decision to manage CCS engine creation within UABI
engines, the "cslices" variable in the "ccs" structure in the
"gt" is no longer needed. Remove it is now redundant.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 2 +-
 drivers/gpu/drm/i915/gt/intel_gt_types.h    | 5 -----
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index a6c33b471567..fc8a23fc28b6 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -9,7 +9,7 @@
 
 static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
-	unsigned long cslices_mask = gt->ccs.cslices;
+	unsigned long cslices_mask = CCS_MASK(gt);
 	u32 mode_val = 0;
 	/* CCS engine id, i.e. the engines position in the engine's bitmask */
 	int engine;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index 9e257f34d05b..71e43071da0b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -218,11 +218,6 @@ struct intel_gt {
 	 * i.e. how the CCS streams are distributed amongs the slices.
 	 */
 	struct {
-		/*
-		 * Mask of the non fused CCS slices
-		 * to be used for the load balancing
-		 */
-		intel_engine_mask_t cslices;
 		u32 mode_reg_val;
 	} ccs;
 
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 09/15] drm/i915/gt: Expose the number of total CCS slices
  2025-03-24 13:29 [PATCH v4 00/15] CCS static load balance Andi Shyti
                   ` (7 preceding siblings ...)
  2025-03-24 13:29 ` [PATCH v4 08/15] drm/i915/gt: Remove cslices mask value from the CCS structure Andi Shyti
@ 2025-03-24 13:29 ` Andi Shyti
  2025-03-24 13:29 ` [PATCH v4 10/15] drm/i915/gt: Store engine-related sysfs kobjects Andi Shyti
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Andi Shyti @ 2025-03-24 13:29 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Tvrtko Ursulin, Joonas Lahtinen, Chris Wilson, Simona Vetter,
	Arshad Mehmood, Michal Mrozek, Andi Shyti, Andi Shyti

Implement a sysfs interface to show the number of available CCS
slices. The displayed number does not take into account the CCS
balancing mode.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 21 +++++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h |  1 +
 drivers/gpu/drm/i915/gt/intel_gt_sysfs.c    |  2 ++
 3 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index fc8a23fc28b6..edb6a4b63826 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -5,7 +5,9 @@
 
 #include "i915_drv.h"
 #include "intel_gt_ccs_mode.h"
+#include "intel_gt_print.h"
 #include "intel_gt_regs.h"
+#include "intel_gt_sysfs.h"
 
 static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
@@ -100,3 +102,22 @@ void intel_gt_ccs_mode_init(struct intel_gt *gt)
 	/* Initialize the CCS mode setting */
 	intel_gt_apply_ccs_mode(gt);
 }
+
+static ssize_t num_cslices_show(struct device *dev,
+				struct device_attribute *attr,
+				char *buff)
+{
+	struct intel_gt *gt = kobj_to_gt(&dev->kobj);
+	u32 num_slices;
+
+	num_slices = hweight32(CCS_MASK(gt));
+
+	return sysfs_emit(buff, "%u\n", num_slices);
+}
+static DEVICE_ATTR_RO(num_cslices);
+
+void intel_gt_sysfs_ccs_init(struct intel_gt *gt)
+{
+	if (sysfs_create_file(&gt->sysfs_gt, &dev_attr_num_cslices.attr))
+		gt_warn(gt, "Failed to create sysfs num_cslices files\n");
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
index 4a6763b95a78..9696cc9017f6 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
@@ -9,5 +9,6 @@
 #include "intel_gt.h"
 
 void intel_gt_ccs_mode_init(struct intel_gt *gt);
+void intel_gt_sysfs_ccs_init(struct intel_gt *gt);
 
 #endif /* __INTEL_GT_CCS_MODE_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c b/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
index 33cba406b569..895eedc402ae 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
@@ -12,6 +12,7 @@
 #include "i915_drv.h"
 #include "i915_sysfs.h"
 #include "intel_gt.h"
+#include "intel_gt_ccs_mode.h"
 #include "intel_gt_print.h"
 #include "intel_gt_sysfs.h"
 #include "intel_gt_sysfs_pm.h"
@@ -101,6 +102,7 @@ void intel_gt_sysfs_register(struct intel_gt *gt)
 		goto exit_fail;
 
 	intel_gt_sysfs_pm_init(gt, &gt->sysfs_gt);
+	intel_gt_sysfs_ccs_init(gt);
 
 	return;
 
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 10/15] drm/i915/gt: Store engine-related sysfs kobjects
  2025-03-24 13:29 [PATCH v4 00/15] CCS static load balance Andi Shyti
                   ` (8 preceding siblings ...)
  2025-03-24 13:29 ` [PATCH v4 09/15] drm/i915/gt: Expose the number of total CCS slices Andi Shyti
@ 2025-03-24 13:29 ` Andi Shyti
  2025-03-24 13:29 ` [PATCH v4 11/15] drm/i915/gt: Store active CCS mask Andi Shyti
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Andi Shyti @ 2025-03-24 13:29 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Tvrtko Ursulin, Joonas Lahtinen, Chris Wilson, Simona Vetter,
	Arshad Mehmood, Michal Mrozek, Andi Shyti, Andi Shyti

Upcoming commits will need to access engine-related kobjects to
enable the creation and destruction of sysfs interfaces at
runtime.

For this, store the "engine" directory (i915->sysfs_engine), the
engine files (gt->kobj), and the default data
(gt->kobj_defaults).

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 ++
 drivers/gpu/drm/i915/gt/sysfs_engines.c      | 4 ++++
 drivers/gpu/drm/i915/i915_drv.h              | 1 +
 3 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 155b6255a63e..be8f1eb77b29 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -393,6 +393,8 @@ struct intel_engine_cs {
 	u32 context_size;
 	u32 mmio_base;
 
+	struct kobject *kobj;
+
 	struct intel_engine_tlb_inv tlb_inv;
 
 	/*
diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c b/drivers/gpu/drm/i915/gt/sysfs_engines.c
index aab2759067d2..f70f0a2983f1 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
@@ -506,6 +506,8 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
 	if (!dir)
 		return;
 
+	i915->sysfs_engine = dir;
+
 	for_each_uabi_engine(engine, i915) {
 		struct kobject *kobj;
 
@@ -526,6 +528,8 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
 
 		add_defaults(container_of(kobj, struct kobj_engine, base));
 
+		engine->kobj = kobj;
+
 		if (0) {
 err_object:
 			kobject_put(kobj);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ffc346379cc2..97806e44429c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -319,6 +319,7 @@ struct drm_i915_private {
 	struct intel_gt *gt[I915_MAX_GT];
 
 	struct kobject *sysfs_gt;
+	struct kobject *sysfs_engine;
 
 	/* Quick lookup of media GT (current platforms only have one) */
 	struct intel_gt *media_gt;
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 11/15] drm/i915/gt: Store active CCS mask
  2025-03-24 13:29 [PATCH v4 00/15] CCS static load balance Andi Shyti
                   ` (9 preceding siblings ...)
  2025-03-24 13:29 ` [PATCH v4 10/15] drm/i915/gt: Store engine-related sysfs kobjects Andi Shyti
@ 2025-03-24 13:29 ` Andi Shyti
  2025-03-24 13:29 ` [PATCH v4 12/15] drm/i915: Protect access to the UABI engines list with a mutex Andi Shyti
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Andi Shyti @ 2025-03-24 13:29 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Tvrtko Ursulin, Joonas Lahtinen, Chris Wilson, Simona Vetter,
	Arshad Mehmood, Michal Mrozek, Andi Shyti, Andi Shyti

To support upcoming patches, we need to store the current mask
for active CCS engines.

Active engines refer to those exposed to userspace via the UABI
engine list.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 41 +++++++++++++++++++--
 drivers/gpu/drm/i915/gt/intel_gt_types.h    |  7 ++++
 2 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index edb6a4b63826..5eead7b18f57 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -12,6 +12,7 @@
 static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
 	unsigned long cslices_mask = CCS_MASK(gt);
+	unsigned long ccs_mask = gt->ccs.id_mask;
 	u32 mode_val = 0;
 	/* CCS engine id, i.e. the engines position in the engine's bitmask */
 	int engine;
@@ -55,7 +56,7 @@ static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 	 *   slice 2: ccs2
 	 *   slice 3: ccs3
 	 */
-	engine = __ffs(cslices_mask);
+	engine = __ffs(ccs_mask);
 
 	for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
 		if (!(cslices_mask & BIT(cslice))) {
@@ -86,7 +87,7 @@ static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 			 * CCS mode, will be used later to
 			 * reset to a flexible value
 			 */
-			engine = __ffs(cslices_mask);
+			engine = __ffs(ccs_mask);
 			continue;
 		}
 	}
@@ -94,13 +95,45 @@ static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 	gt->ccs.mode_reg_val = mode_val;
 }
 
+static void __update_ccs_mask(struct intel_gt *gt, u32 ccs_mode)
+{
+	unsigned long cslices_mask = CCS_MASK(gt);
+	int i;
+
+	/* Mask off all the CCS engines */
+	gt->ccs.id_mask = 0;
+
+	for_each_set_bit(i, &cslices_mask, I915_MAX_CCS) {
+		gt->ccs.id_mask |= BIT(i);
+
+		ccs_mode--;
+		if (!ccs_mode)
+			break;
+	}
+
+	/*
+	 * It's impossible for 'ccs_mode' to be zero at this point.
+	 * This scenario would only occur if the 'ccs_mode' provided by
+	 * the caller exceeded the total number of CCS engines, a condition
+	 * we check before calling the 'update_ccs_mask()' function.
+	 */
+	GEM_BUG_ON(ccs_mode);
+
+	/* Initialize the CCS mode setting */
+	intel_gt_apply_ccs_mode(gt);
+}
+
 void intel_gt_ccs_mode_init(struct intel_gt *gt)
 {
 	if (!IS_DG2(gt->i915))
 		return;
 
-	/* Initialize the CCS mode setting */
-	intel_gt_apply_ccs_mode(gt);
+	/*
+	 * Set CCS balance mode 1 in the ccs_mask.
+	 *
+	 * During init the workaround are not set up yet.
+	 */
+	__update_ccs_mask(gt, 1);
 }
 
 static ssize_t num_cslices_show(struct device *dev,
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index 71e43071da0b..641be69016e1 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -219,6 +219,13 @@ struct intel_gt {
 	 */
 	struct {
 		u32 mode_reg_val;
+
+		/*
+		 * CCS id_mask is the command streamer instance
+		 * exposed to the user. While the CCS_MASK(gt)
+		 * is the available unfused compute slices.
+		 */
+		intel_engine_mask_t id_mask;
 	} ccs;
 
 	/*
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 12/15] drm/i915: Protect access to the UABI engines list with a mutex
  2025-03-24 13:29 [PATCH v4 00/15] CCS static load balance Andi Shyti
                   ` (10 preceding siblings ...)
  2025-03-24 13:29 ` [PATCH v4 11/15] drm/i915/gt: Store active CCS mask Andi Shyti
@ 2025-03-24 13:29 ` Andi Shyti
  2025-03-24 13:29 ` [PATCH v4 13/15] drm/i915/gt: Isolate single sysfs engine file creation Andi Shyti
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Andi Shyti @ 2025-03-24 13:29 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Tvrtko Ursulin, Joonas Lahtinen, Chris Wilson, Simona Vetter,
	Arshad Mehmood, Michal Mrozek, Andi Shyti, Andi Shyti

Until now, the UABI engines list has been accessed in read-only
mode, as it was created once during boot and destroyed upon
module unload.

In upcoming commits, we will be modifying this list by changing
the CCS mode, allowing compute engines to be dynamically added
and removed at runtime based on user whims.

To ensure thread safety and prevent race conditions, we need to
protect the engine list with a mutex, thereby serializing access
to it.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c |  3 +++
 drivers/gpu/drm/i915/gt/intel_engine_user.c |  7 +++++++
 drivers/gpu/drm/i915/gt/sysfs_engines.c     |  5 +++++
 drivers/gpu/drm/i915/i915_cmd_parser.c      |  2 ++
 drivers/gpu/drm/i915/i915_debugfs.c         |  4 ++++
 drivers/gpu/drm/i915/i915_drv.h             |  4 ++++
 drivers/gpu/drm/i915/i915_gem.c             |  4 ++++
 drivers/gpu/drm/i915/i915_perf.c            |  8 +++++---
 drivers/gpu/drm/i915/i915_pmu.c             | 11 +++++++++--
 drivers/gpu/drm/i915/i915_query.c           | 21 ++++++++++++++++-----
 10 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index ab1af978911b..4263d3eb2557 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1124,6 +1124,7 @@ static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx,
 	if (!e)
 		return ERR_PTR(-ENOMEM);
 
+	mutex_lock(&ctx->i915->uabi_engines_mutex);
 	for_each_uabi_engine(engine, ctx->i915) {
 		struct intel_context *ce;
 		struct intel_sseu sseu = {};
@@ -1155,9 +1156,11 @@ static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx,
 
 	}
 
+	mutex_unlock(&ctx->i915->uabi_engines_mutex);
 	return e;
 
 free_engines:
+	mutex_unlock(&ctx->i915->uabi_engines_mutex);
 	free_engines(e);
 	return err;
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 8e5284af8335..209d5badbd3d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -210,6 +210,13 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
 	LIST_HEAD(engines);
 
 	sort_engines(i915, &engines);
+	mutex_init(&i915->uabi_engines_mutex);
+
+	/*
+	 * We are still booting i915 and we are sure we are running
+	 * single-threaded. We don't need at this point to protect the
+	 * uabi_engines access list with the mutex.
+	 */
 
 	prev = NULL;
 	p = &i915->uabi_engines.rb_node;
diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c b/drivers/gpu/drm/i915/gt/sysfs_engines.c
index f70f0a2983f1..d3d3c67edf34 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
@@ -508,6 +508,11 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
 
 	i915->sysfs_engine = dir;
 
+	/*
+	 * We are still booting i915 and we are sure we are running
+	 * single-threaded. We don't need at this point to protect the
+	 * uabi_engines access list with the mutex.
+	 */
 	for_each_uabi_engine(engine, i915) {
 		struct kobject *kobj;
 
diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 2905df83e180..12987ece6f8e 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -1592,12 +1592,14 @@ int i915_cmd_parser_get_version(struct drm_i915_private *dev_priv)
 	bool active = false;
 
 	/* If the command parser is not enabled, report 0 - unsupported */
+	mutex_lock(&dev_priv->uabi_engines_mutex);
 	for_each_uabi_engine(engine, dev_priv) {
 		if (intel_engine_using_cmd_parser(engine)) {
 			active = true;
 			break;
 		}
 	}
+	mutex_unlock(&dev_priv->uabi_engines_mutex);
 	if (!active)
 		return 0;
 
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 0d9e263913ff..f2957435b529 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -451,8 +451,10 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 		   to_gt(i915)->clock_period_ns);
 
 	p = drm_seq_file_printer(m);
+	mutex_lock(&i915->uabi_engines_mutex);
 	for_each_uabi_engine(engine, i915)
 		intel_engine_dump(engine, &p, "%s\n", engine->name);
+	mutex_unlock(&i915->uabi_engines_mutex);
 
 	intel_gt_show_timelines(to_gt(i915), &p, i915_request_show_with_schedule);
 
@@ -466,6 +468,7 @@ static int i915_wa_registers(struct seq_file *m, void *unused)
 	struct drm_i915_private *i915 = node_to_i915(m->private);
 	struct intel_engine_cs *engine;
 
+	mutex_lock(&i915->uabi_engines_mutex);
 	for_each_uabi_engine(engine, i915) {
 		const struct i915_wa_list *wal = &engine->ctx_wa_list;
 		const struct i915_wa *wa;
@@ -485,6 +488,7 @@ static int i915_wa_registers(struct seq_file *m, void *unused)
 
 		seq_printf(m, "\n");
 	}
+	mutex_unlock(&i915->uabi_engines_mutex);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 97806e44429c..fcfab2ad2908 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -223,6 +223,10 @@ struct drm_i915_private {
 		struct rb_root uabi_engines;
 	};
 	unsigned int engine_uabi_class_count[I915_LAST_UABI_ENGINE_CLASS + 1];
+	/*
+	 * Protect access to the uabi_engines list.
+	 */
+	struct mutex uabi_engines_mutex;
 
 	/* protects the irq masks */
 	spinlock_t irq_lock;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8c8d43451f35..56b796c0e06b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1261,7 +1261,11 @@ void i915_gem_driver_remove(struct drm_i915_private *dev_priv)
 	i915_gem_suspend_late(dev_priv);
 	for_each_gt(gt, dev_priv, i)
 		intel_gt_driver_remove(gt);
+
+	/* Let's make sure no one is using the uabi_engines list */
+	mutex_lock(&dev_priv->uabi_engines_mutex);
 	dev_priv->uabi_engines = RB_ROOT;
+	mutex_unlock(&dev_priv->uabi_engines_mutex);
 
 	/* Flush any outstanding unpin_work. */
 	i915_gem_drain_workqueue(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index bec164e884ae..7c8dc42a3623 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -2691,7 +2691,7 @@ oa_configure_all_contexts(struct i915_perf_stream *stream,
 	struct intel_engine_cs *engine;
 	struct intel_gt *gt = stream->engine->gt;
 	struct i915_gem_context *ctx, *cn;
-	int err;
+	int err = 0;
 
 	lockdep_assert_held(&gt->perf.lock);
 
@@ -2735,6 +2735,7 @@ oa_configure_all_contexts(struct i915_perf_stream *stream,
 	 * If we don't modify the kernel_context, we do not get events while
 	 * idle.
 	 */
+	mutex_lock(&i915->uabi_engines_mutex);
 	for_each_uabi_engine(engine, i915) {
 		struct intel_context *ce = engine->kernel_context;
 
@@ -2745,10 +2746,11 @@ oa_configure_all_contexts(struct i915_perf_stream *stream,
 
 		err = gen8_modify_self(ce, regs, num_regs, active);
 		if (err)
-			return err;
+			break;
 	}
+	mutex_unlock(&i915->uabi_engines_mutex);
 
-	return 0;
+	return err;
 }
 
 static int
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 69a109d02116..047588aba524 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1017,6 +1017,7 @@ create_event_attributes(struct i915_pmu *pmu)
 		}
 	}
 
+	mutex_lock(&i915->uabi_engines_mutex);
 	for_each_uabi_engine(engine, i915) {
 		for (i = 0; i < ARRAY_SIZE(engine_events); i++) {
 			if (!engine_event_status(engine,
@@ -1024,6 +1025,7 @@ create_event_attributes(struct i915_pmu *pmu)
 				count++;
 		}
 	}
+	mutex_unlock(&i915->uabi_engines_mutex);
 
 	/* Allocate attribute objects and table. */
 	i915_attr = kcalloc(count, sizeof(*i915_attr), GFP_KERNEL);
@@ -1081,6 +1083,7 @@ create_event_attributes(struct i915_pmu *pmu)
 	}
 
 	/* Initialize supported engine counters. */
+	mutex_lock(&i915->uabi_engines_mutex);
 	for_each_uabi_engine(engine, i915) {
 		for (i = 0; i < ARRAY_SIZE(engine_events); i++) {
 			char *str;
@@ -1092,7 +1095,7 @@ create_event_attributes(struct i915_pmu *pmu)
 			str = kasprintf(GFP_KERNEL, "%s-%s",
 					engine->name, engine_events[i].name);
 			if (!str)
-				goto err;
+				goto err_unlock;
 
 			*attr_iter++ = &i915_iter->attr.attr;
 			i915_iter =
@@ -1104,18 +1107,22 @@ create_event_attributes(struct i915_pmu *pmu)
 			str = kasprintf(GFP_KERNEL, "%s-%s.unit",
 					engine->name, engine_events[i].name);
 			if (!str)
-				goto err;
+				goto err_unlock;
 
 			*attr_iter++ = &pmu_iter->attr.attr;
 			pmu_iter = add_pmu_attr(pmu_iter, str, "ns");
 		}
 	}
+	mutex_unlock(&i915->uabi_engines_mutex);
 
 	pmu->i915_attr = i915_attr;
 	pmu->pmu_attr = pmu_attr;
 
 	return attr;
 
+err_unlock:
+	mutex_unlock(&i915->uabi_engines_mutex);
+
 err:;
 	for (attr_iter = attr; *attr_iter; attr_iter++)
 		kfree((*attr_iter)->name);
diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
index 14d9ec0ed777..7c6669cc4c96 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -140,6 +140,7 @@ query_engine_info(struct drm_i915_private *i915,
 	if (query_item->flags)
 		return -EINVAL;
 
+	mutex_lock(&i915->uabi_engines_mutex);
 	for_each_uabi_engine(engine, i915)
 		num_uabi_engines++;
 
@@ -147,11 +148,13 @@ query_engine_info(struct drm_i915_private *i915,
 
 	ret = copy_query_item(&query, sizeof(query), len, query_item);
 	if (ret != 0)
-		return ret;
+		goto err;
 
 	if (query.num_engines || query.rsvd[0] || query.rsvd[1] ||
-	    query.rsvd[2])
-		return -EINVAL;
+	    query.rsvd[2]) {
+		ret = -EINVAL;
+		goto err;
+	}
 
 	info_ptr = &query_ptr->engines[0];
 
@@ -162,17 +165,25 @@ query_engine_info(struct drm_i915_private *i915,
 		info.capabilities = engine->uabi_capabilities;
 		info.logical_instance = ilog2(engine->logical_mask);
 
-		if (copy_to_user(info_ptr, &info, sizeof(info)))
-			return -EFAULT;
+		if (copy_to_user(info_ptr, &info, sizeof(info))) {
+			ret = -EFAULT;
+			goto err;
+		}
 
 		query.num_engines++;
 		info_ptr++;
 	}
+	mutex_unlock(&i915->uabi_engines_mutex);
 
 	if (copy_to_user(query_ptr, &query, sizeof(query)))
 		return -EFAULT;
 
 	return len;
+
+err:
+	mutex_unlock(&i915->uabi_engines_mutex);
+
+	return ret;
 }
 
 static int can_copy_perf_config_registers_or_number(u32 user_n_regs,
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 13/15] drm/i915/gt: Isolate single sysfs engine file creation
  2025-03-24 13:29 [PATCH v4 00/15] CCS static load balance Andi Shyti
                   ` (11 preceding siblings ...)
  2025-03-24 13:29 ` [PATCH v4 12/15] drm/i915: Protect access to the UABI engines list with a mutex Andi Shyti
@ 2025-03-24 13:29 ` Andi Shyti
  2025-03-24 13:29 ` [PATCH v4 14/15] drm/i915/gt: Implement creation and removal routines for CCS engines Andi Shyti
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Andi Shyti @ 2025-03-24 13:29 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Tvrtko Ursulin, Joonas Lahtinen, Chris Wilson, Simona Vetter,
	Arshad Mehmood, Michal Mrozek, Andi Shyti, Andi Shyti

In preparation for upcoming patches, we need the ability to
create and remove individual sysfs files. To facilitate this,
extract from the intel_engines_add_sysfs() function the creation
of individual files.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/sysfs_engines.c | 75 ++++++++++++++++---------
 drivers/gpu/drm/i915/gt/sysfs_engines.h |  2 +
 2 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c b/drivers/gpu/drm/i915/gt/sysfs_engines.c
index d3d3c67edf34..ef2eda72ac7f 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
@@ -9,6 +9,7 @@
 #include "i915_drv.h"
 #include "intel_engine.h"
 #include "intel_engine_heartbeat.h"
+#include "intel_gt_print.h"
 #include "sysfs_engines.h"
 
 struct kobj_engine {
@@ -481,7 +482,7 @@ static void add_defaults(struct kobj_engine *parent)
 		return;
 }
 
-void intel_engines_add_sysfs(struct drm_i915_private *i915)
+int intel_engine_add_single_sysfs(struct intel_engine_cs *engine)
 {
 	static const struct attribute * const files[] = {
 		&name_attr.attr,
@@ -497,7 +498,48 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
 #endif
 		NULL
 	};
+	struct kobject *dir = engine->i915->sysfs_engine;
+	struct kobject *kobj = engine->kobj;
+	int err;
+
+	kobj = kobj_engine(dir, engine);
+	if (!kobj) {
+		err = -EFAULT;
+		goto err_engine;
+	}
+
+	err = sysfs_create_files(kobj, files);
+	if (err)
+		goto err_object;
+
+	if (intel_engine_has_timeslices(engine)) {
+		err = sysfs_create_file(kobj, &timeslice_duration_attr.attr);
+		if (err)
+			goto err_object;
+	}
 
+	if (intel_engine_has_preempt_reset(engine)) {
+		err = sysfs_create_file(kobj, &preempt_timeout_attr.attr);
+		if (err)
+			goto err_object;
+	}
+
+	add_defaults(container_of(kobj, struct kobj_engine, base));
+
+	engine->kobj = kobj;
+
+	return 0;
+
+err_object:
+	kobject_put(kobj);
+err_engine:
+	gt_warn(engine->gt, "Failed to add sysfs engine '%s'\n", engine->name);
+
+	return err;
+}
+
+void intel_engines_add_sysfs(struct drm_i915_private *i915)
+{
 	struct device *kdev = i915->drm.primary->kdev;
 	struct intel_engine_cs *engine;
 	struct kobject *dir;
@@ -514,33 +556,10 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
 	 * uabi_engines access list with the mutex.
 	 */
 	for_each_uabi_engine(engine, i915) {
-		struct kobject *kobj;
-
-		kobj = kobj_engine(dir, engine);
-		if (!kobj)
-			goto err_engine;
-
-		if (sysfs_create_files(kobj, files))
-			goto err_object;
+		int err;
 
-		if (intel_engine_has_timeslices(engine) &&
-		    sysfs_create_file(kobj, &timeslice_duration_attr.attr))
-			goto err_engine;
-
-		if (intel_engine_has_preempt_reset(engine) &&
-		    sysfs_create_file(kobj, &preempt_timeout_attr.attr))
-			goto err_engine;
-
-		add_defaults(container_of(kobj, struct kobj_engine, base));
-
-		engine->kobj = kobj;
-
-		if (0) {
-err_object:
-			kobject_put(kobj);
-err_engine:
-			dev_warn(kdev, "Failed to add sysfs engine '%s'\n",
-				 engine->name);
-		}
+		err = intel_engine_add_single_sysfs(engine);
+		if (err)
+			break;
 	}
 }
diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.h b/drivers/gpu/drm/i915/gt/sysfs_engines.h
index 9546fffe03a7..2e3ec2df14a9 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.h
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.h
@@ -7,7 +7,9 @@
 #define INTEL_ENGINE_SYSFS_H
 
 struct drm_i915_private;
+struct intel_engine_cs;
 
 void intel_engines_add_sysfs(struct drm_i915_private *i915);
+int intel_engine_add_single_sysfs(struct intel_engine_cs *engine);
 
 #endif /* INTEL_ENGINE_SYSFS_H */
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 14/15] drm/i915/gt: Implement creation and removal routines for CCS engines
  2025-03-24 13:29 [PATCH v4 00/15] CCS static load balance Andi Shyti
                   ` (12 preceding siblings ...)
  2025-03-24 13:29 ` [PATCH v4 13/15] drm/i915/gt: Isolate single sysfs engine file creation Andi Shyti
@ 2025-03-24 13:29 ` Andi Shyti
  2025-03-24 13:29 ` [PATCH v4 15/15] drm/i915/gt: Allow the user to change the CCS mode through sysfs Andi Shyti
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Andi Shyti @ 2025-03-24 13:29 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Tvrtko Ursulin, Joonas Lahtinen, Chris Wilson, Simona Vetter,
	Arshad Mehmood, Michal Mrozek, Andi Shyti, Andi Shyti

In preparation for upcoming patches, we need routines to
dynamically create and destroy CCS engines based on the CCS mode
that the user wants to set.

The process begins by calculating the engine mask for the engines
that need to be added or removed. We then update the UABI list of
exposed engines and create or destroy the corresponding sysfs
interfaces accordingly.

These functions are not yet in use, so no functional changes are
intended at this stage.

Mark the functions 'add_uabi_ccs_engines()' and
'remove_uabi_ccs_engines()' as '__maybe_unused' to ensure
successful compilation and maintain bisectability. This
annotation will be removed in subsequent commits.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 128 ++++++++++++++++++++
 1 file changed, 128 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index 5eead7b18f57..cbabeb503d3b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -4,10 +4,12 @@
  */
 
 #include "i915_drv.h"
+#include "intel_engine_user.h"
 #include "intel_gt_ccs_mode.h"
 #include "intel_gt_print.h"
 #include "intel_gt_regs.h"
 #include "intel_gt_sysfs.h"
+#include "sysfs_engines.h"
 
 static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
@@ -123,6 +125,29 @@ static void __update_ccs_mask(struct intel_gt *gt, u32 ccs_mode)
 	intel_gt_apply_ccs_mode(gt);
 }
 
+static void update_ccs_mask(struct intel_gt *gt, u32 ccs_mode)
+{
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp;
+
+	__update_ccs_mask(gt, ccs_mode);
+
+	/* Update workaround values */
+	for_each_engine_masked(engine, gt, gt->ccs.id_mask, tmp) {
+		struct i915_wa_list *wal = &engine->wa_list;
+		struct i915_wa *wa;
+		int i;
+
+		for (i = 0, wa = wal->list; i < wal->count; i++, wa++) {
+			if (!i915_mmio_reg_equal(wa->reg, XEHP_CCS_MODE))
+				continue;
+
+			wa->set = gt->ccs.mode_reg_val;
+			wa->read = gt->ccs.mode_reg_val;
+		}
+	}
+}
+
 void intel_gt_ccs_mode_init(struct intel_gt *gt)
 {
 	if (!IS_DG2(gt->i915))
@@ -136,6 +161,109 @@ void intel_gt_ccs_mode_init(struct intel_gt *gt)
 	__update_ccs_mask(gt, 1);
 }
 
+static int rb_engine_cmp(struct rb_node *rb_new, const struct rb_node *rb_old)
+{
+	struct intel_engine_cs *new = rb_to_uabi_engine(rb_new);
+	struct intel_engine_cs *old = rb_to_uabi_engine(rb_old);
+
+	if (new->uabi_class - old->uabi_class == 0)
+		return new->uabi_instance - old->uabi_instance;
+
+	return new->uabi_class - old->uabi_class;
+}
+
+static void __maybe_unused add_uabi_ccs_engines(struct intel_gt *gt, u32 ccs_mode)
+{
+	struct drm_i915_private *i915 = gt->i915;
+	intel_engine_mask_t new_ccs_mask, tmp;
+	struct intel_engine_cs *e;
+
+	/* Store the current ccs mask */
+	new_ccs_mask = gt->ccs.id_mask;
+	update_ccs_mask(gt, ccs_mode);
+
+	/*
+	 * Store only the mask of the CCS engines that need to be added by
+	 * removing from the new mask the engines that are already active
+	 */
+	new_ccs_mask = gt->ccs.id_mask & ~new_ccs_mask;
+	new_ccs_mask <<= CCS0;
+
+	mutex_lock(&i915->uabi_engines_mutex);
+	for_each_engine_masked(e, gt, new_ccs_mask, tmp) {
+		int err;
+		struct rb_node *n;
+		struct intel_engine_cs *__e;
+
+		i915->engine_uabi_class_count[I915_ENGINE_CLASS_COMPUTE]++;
+
+		/*
+		 * The engine is now inserted and marked as valid.
+		 *
+		 * rb_find_add() should always return NULL. If it returns a
+		 * pointer to an rb_node it means that it found the engine we
+		 * are trying to insert which means that something is really
+		 * wrong.
+		 */
+		n = rb_find_add(&e->uabi_node,
+				&i915->uabi_engines, rb_engine_cmp);
+		GEM_BUG_ON(n);
+
+		/* We inserted the engine, let's check if now we can find it */
+		__e = intel_engine_lookup_user(i915, e->uabi_class,
+					       e->uabi_instance) != e;
+		GEM_BUG_ON(__e != e);
+
+		/*
+		 * If the engine has never been used before (e.g. we are moving
+		 * for the first time from CCS mode 1 to CCS mode 2 or 4), then
+		 * also its sysfs entry has never been created. In this case its
+		 * value will be null and we need to allocate it.
+		 */
+		if (!e->kobj)
+			err = intel_engine_add_single_sysfs(e);
+		else
+			err = kobject_add(e->kobj,
+					  i915->sysfs_engine, "%s", e->name);
+
+		if (err)
+			gt_warn(gt,
+				"Unable to create sysfs entries for %s engine",
+				e->name);
+	}
+	mutex_unlock(&i915->uabi_engines_mutex);
+}
+
+static void __maybe_unused remove_uabi_ccs_engines(struct intel_gt *gt, u8 ccs_mode)
+{
+	struct drm_i915_private *i915 = gt->i915;
+	intel_engine_mask_t new_ccs_mask, tmp;
+	struct intel_engine_cs *e;
+
+	/* Store the current ccs mask */
+	new_ccs_mask = gt->ccs.id_mask;
+	update_ccs_mask(gt, ccs_mode);
+
+	/*
+	 * Store only the mask of the CCS engines that need to be removed by
+	 * unmasking them from the new mask the engines that are already active
+	 */
+	new_ccs_mask = new_ccs_mask & ~gt->ccs.id_mask;
+	new_ccs_mask <<= CCS0;
+
+	mutex_lock(&i915->uabi_engines_mutex);
+	for_each_engine_masked(e, gt, new_ccs_mask, tmp) {
+		i915->engine_uabi_class_count[I915_ENGINE_CLASS_COMPUTE]--;
+
+		rb_erase(&e->uabi_node, &i915->uabi_engines);
+		RB_CLEAR_NODE(&e->uabi_node);
+
+		/* Remove sysfs entries */
+		kobject_del(e->kobj);
+	}
+	mutex_unlock(&i915->uabi_engines_mutex);
+}
+
 static ssize_t num_cslices_show(struct device *dev,
 				struct device_attribute *attr,
 				char *buff)
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 15/15] drm/i915/gt: Allow the user to change the CCS mode through sysfs
  2025-03-24 13:29 [PATCH v4 00/15] CCS static load balance Andi Shyti
                   ` (13 preceding siblings ...)
  2025-03-24 13:29 ` [PATCH v4 14/15] drm/i915/gt: Implement creation and removal routines for CCS engines Andi Shyti
@ 2025-03-24 13:29 ` Andi Shyti
  2025-03-24 13:59 ` [PATCH v4 00/15] CCS static load balance Mrozek, Michal
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Andi Shyti @ 2025-03-24 13:29 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Tvrtko Ursulin, Joonas Lahtinen, Chris Wilson, Simona Vetter,
	Arshad Mehmood, Michal Mrozek, Andi Shyti, Andi Shyti

Create the 'ccs_mode' file under

/sys/class/drm/cardX/gt/gt0/ccs_mode

This file allows the user to read and set the current CCS mode.

 - Reading: The user can read the current CCS mode, which can be
   1, 2, or 4. This value is derived from the current engine
   mask.

 - Writing: The user can set the CCS mode to 1, 2, or 4,
   depending on the desired number of exposed engines and the
   required load balancing.

The interface will return -EBUSY if other clients are connected
to i915, or -EINVAL if an invalid value is set.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 82 ++++++++++++++++++++-
 1 file changed, 80 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index cbabeb503d3b..8364523f2730 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -6,6 +6,7 @@
 #include "i915_drv.h"
 #include "intel_engine_user.h"
 #include "intel_gt_ccs_mode.h"
+#include "intel_gt_pm.h"
 #include "intel_gt_print.h"
 #include "intel_gt_regs.h"
 #include "intel_gt_sysfs.h"
@@ -172,7 +173,7 @@ static int rb_engine_cmp(struct rb_node *rb_new, const struct rb_node *rb_old)
 	return new->uabi_class - old->uabi_class;
 }
 
-static void __maybe_unused add_uabi_ccs_engines(struct intel_gt *gt, u32 ccs_mode)
+static void add_uabi_ccs_engines(struct intel_gt *gt, u32 ccs_mode)
 {
 	struct drm_i915_private *i915 = gt->i915;
 	intel_engine_mask_t new_ccs_mask, tmp;
@@ -234,7 +235,7 @@ static void __maybe_unused add_uabi_ccs_engines(struct intel_gt *gt, u32 ccs_mod
 	mutex_unlock(&i915->uabi_engines_mutex);
 }
 
-static void __maybe_unused remove_uabi_ccs_engines(struct intel_gt *gt, u8 ccs_mode)
+static void remove_uabi_ccs_engines(struct intel_gt *gt, u8 ccs_mode)
 {
 	struct drm_i915_private *i915 = gt->i915;
 	intel_engine_mask_t new_ccs_mask, tmp;
@@ -277,8 +278,85 @@ static ssize_t num_cslices_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(num_cslices);
 
+static ssize_t ccs_mode_show(struct device *dev,
+			     struct device_attribute *attr, char *buff)
+{
+	struct intel_gt *gt = kobj_to_gt(&dev->kobj);
+	u32 ccs_mode;
+
+	ccs_mode = hweight32(gt->ccs.id_mask);
+
+	return sysfs_emit(buff, "%u\n", ccs_mode);
+}
+
+static ssize_t ccs_mode_store(struct device *dev,
+			      struct device_attribute *attr,
+			      const char *buff, size_t count)
+{
+	struct intel_gt *gt = kobj_to_gt(&dev->kobj);
+	int num_cslices = hweight32(CCS_MASK(gt));
+	int ccs_mode = hweight32(gt->ccs.id_mask);
+	ssize_t ret;
+	u32 val;
+
+	ret = kstrtou32(buff, 0, &val);
+	if (ret)
+		return ret;
+
+	/*
+	 * As of now possible values to be set are 1, 2, 4,
+	 * up to the maximum number of available slices
+	 */
+	if (!val || val > num_cslices || (num_cslices % val))
+		return -EINVAL;
+
+	/* Let's wait until the GT is no longer in use */
+	ret = intel_gt_pm_wait_for_idle(gt);
+	if (ret)
+		return ret;
+
+	mutex_lock(&gt->wakeref.mutex);
+
+	/*
+	 * Let's check again that the GT is idle,
+	 * we don't want to change the CCS mode
+	 * while someone is using the GT
+	 */
+	if (intel_gt_pm_is_awake(gt)) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	/*
+	 * Nothing to do if the requested setting
+	 * is the same as the current one
+	 */
+	if (val == ccs_mode)
+		goto out;
+	else if (val > ccs_mode)
+		add_uabi_ccs_engines(gt, val);
+	else
+		remove_uabi_ccs_engines(gt, val);
+
+out:
+	mutex_unlock(&gt->wakeref.mutex);
+
+	return ret ?: count;
+}
+static DEVICE_ATTR_RW(ccs_mode);
+
 void intel_gt_sysfs_ccs_init(struct intel_gt *gt)
 {
 	if (sysfs_create_file(&gt->sysfs_gt, &dev_attr_num_cslices.attr))
 		gt_warn(gt, "Failed to create sysfs num_cslices files\n");
+
+	/*
+	 * Do not create the ccs_mode file for non DG2 platforms
+	 * because they don't need it as they have only one CCS engine
+	 */
+	if (!IS_DG2(gt->i915))
+		return;
+
+	if (sysfs_create_file(&gt->sysfs_gt, &dev_attr_ccs_mode.attr))
+		gt_warn(gt, "Failed to create sysfs ccs_mode files\n");
 }
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* RE: [PATCH v4 00/15] CCS static load balance
  2025-03-24 13:29 [PATCH v4 00/15] CCS static load balance Andi Shyti
                   ` (14 preceding siblings ...)
  2025-03-24 13:29 ` [PATCH v4 15/15] drm/i915/gt: Allow the user to change the CCS mode through sysfs Andi Shyti
@ 2025-03-24 13:59 ` Mrozek, Michal
  2025-03-25  8:24 ` Joonas Lahtinen
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Mrozek, Michal @ 2025-03-24 13:59 UTC (permalink / raw)
  To: Andi Shyti, intel-gfx, dri-devel
  Cc: Tvrtko Ursulin, Joonas Lahtinen, Chris Wilson, Simona Vetter,
	Mehmood, Arshad, Andi Shyti

Acked-by: Michal Mrozek <michal.mrozek@intel.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 00/15] CCS static load balance
  2025-03-24 13:29 [PATCH v4 00/15] CCS static load balance Andi Shyti
                   ` (15 preceding siblings ...)
  2025-03-24 13:59 ` [PATCH v4 00/15] CCS static load balance Mrozek, Michal
@ 2025-03-25  8:24 ` Joonas Lahtinen
  2025-03-25 10:52   ` Andi Shyti
  2025-03-25 10:36 ` Mehmood, Arshad
  2025-03-27 13:44 ` Ayyalasomayajula, Usharani
  18 siblings, 1 reply; 24+ messages in thread
From: Joonas Lahtinen @ 2025-03-25  8:24 UTC (permalink / raw)
  To: Andi Shyti, dri-devel, intel-gfx
  Cc: Tvrtko Ursulin, Chris Wilson, Simona Vetter, Arshad Mehmood,
	Michal Mrozek, Andi Shyti, Andi Shyti

Quoting Andi Shyti (2025-03-24 15:29:36)
> Hi,
> 
> Back in v3, this patch series was turned down due to community
> policies regarding i915 GEM development. Since then, I have
> received several requests from userspace developers, which I
> initially declined in order to respect those policies.
> 
> However, with the latest request from UMD users, I decided to
> give this series another chance. I believe that when a feature
> is genuinely needed, our goal should be to support it, not to
> dismiss user and customer needs blindly.

We had plenty of community bug reports when the move to fixed CCS mode
was initially implemented with some bugs.

After those bugs were fixed, nobody was reporting impactful performance
regressions.

Do you have a reference to some GitLab issues or maybe some external
project issues where regressions around here are discussed?

Regards, Joonas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 00/15] CCS static load balance
  2025-03-25  8:24 ` Joonas Lahtinen
@ 2025-03-25 10:52   ` Andi Shyti
  2025-03-27  6:49     ` Joonas Lahtinen
  0 siblings, 1 reply; 24+ messages in thread
From: Andi Shyti @ 2025-03-25 10:52 UTC (permalink / raw)
  To: Joonas Lahtinen
  Cc: Andi Shyti, dri-devel, intel-gfx, Tvrtko Ursulin, Chris Wilson,
	Simona Vetter, Arshad Mehmood, Michal Mrozek, Andi Shyti,
	Usharani Ayyalasomayajula

Hi Joonas,

thanks a lot for your reply!

On Tue, Mar 25, 2025 at 10:24:42AM +0200, Joonas Lahtinen wrote:
> Quoting Andi Shyti (2025-03-24 15:29:36)
> > Back in v3, this patch series was turned down due to community
> > policies regarding i915 GEM development. Since then, I have
> > received several requests from userspace developers, which I
> > initially declined in order to respect those policies.
> > 
> > However, with the latest request from UMD users, I decided to
> > give this series another chance. I believe that when a feature
> > is genuinely needed, our goal should be to support it, not to
> > dismiss user and customer needs blindly.
> 
> We had plenty of community bug reports when the move to fixed CCS mode
> was initially implemented with some bugs.
> 
> After those bugs were fixed, nobody was reporting impactful performance
> regressions.
> 
> Do you have a reference to some GitLab issues or maybe some external
> project issues where regressions around here are discussed?

AFAIK, there's no GitLab issue for this because we're not fixing
a bug here; we're adding a new sysfs interface.

All known issues and reports related to CCS load balancing have
already been addressed.

What we're still missing is a way for compute applications to
tweak CCS load balancing settings. I already shared the link [1],
but if you take a look at that code, you'll find
'execution_environment_drm.cpp' [2], where the new interface is
used.

If you're feeling lazy, I can point out the relevant parts,
otherwise, feel free to skip to the final greetings :-)

In 'void ExecutionEnvironment::configureCcsMode()', the app sets
up the path like this:

    const std::string drmPath = "/sys/class/drm";
    const std::string expectedFilePrefix = drmPath + "/card";

    ...

    auto gtFiles = Directory::getFiles(gtPath.c_str());
    auto expectedGtFilePrefix = gtPath + "/gt";

    ...

    std::string ccsFile = gtFile + "/ccs_mode";

Then it writes the desired CCS mode value:

    uint32_t ccsValue = 0;
    ssize_t ret = SysCalls::read(fd, &ccsValue, sizeof(uint32_t));

    ...

    do {
        ret = SysCalls::write(fd, &ccsMode, sizeof(uint32_t));
    } while (ret == -1 && errno == -EBUSY);

Arshad and Usha can definitely help if there are any technical
questions about how the application uses the interface.

Usha, would you please be able to share your use case?

Thanks,
Andi

[1] https://github.com/intel/compute-runtime
[2] https://github.com/intel/compute-runtime/blob/master/shared/source/execution_environment/execution_environment_drm.cpp

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 00/15] CCS static load balance
  2025-03-25 10:52   ` Andi Shyti
@ 2025-03-27  6:49     ` Joonas Lahtinen
  2025-03-27 15:10       ` Mehmood, Arshad
  0 siblings, 1 reply; 24+ messages in thread
From: Joonas Lahtinen @ 2025-03-27  6:49 UTC (permalink / raw)
  To: Andi Shyti
  Cc: Andi Shyti, dri-devel, intel-gfx, Tvrtko Ursulin, Chris Wilson,
	Simona Vetter, Arshad Mehmood, Michal Mrozek, Andi Shyti,
	Usharani Ayyalasomayajula

Quoting Andi Shyti (2025-03-25 12:52:58)
> On Tue, Mar 25, 2025 at 10:24:42AM +0200, Joonas Lahtinen wrote:

<SNIP>

> > Do you have a reference to some GitLab issues or maybe some external
> > project issues where regressions around here are discussed?
> 
> AFAIK, there's no GitLab issue for this because we're not fixing
> a bug here; we're adding a new sysfs interface.

This sysfs interface was exactly designed to address performance
regressions coming from limiting the number of CCS to 1.

So unless we have a specific workload and end-user reporting a 
regression on it, there's no incentive to spend any further time here.

<SNIP>

> Arshad and Usha can definitely help if there are any technical
> questions about how the application uses the interface.

I don't have any technical questions as I specified the interface
initially :)

This is not about technical opens about how the interface works.
To recap, when we initially implemented the 1CCS mode, we got active
feedback on the community on regressions.

We were careful to verify that all userspace would cleanly fall back to
using 1CCS mode after it was implemented. And indeed, nobody has been
asking for the 4CCS mode back after the 1CCS mode bugs were fixed.

So as far as I see it, there are no users for this interface in
upstream, and thus we should not spend the time on it.

Regards, Joonas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 00/15] CCS static load balance
  2025-03-27  6:49     ` Joonas Lahtinen
@ 2025-03-27 15:10       ` Mehmood, Arshad
  0 siblings, 0 replies; 24+ messages in thread
From: Mehmood, Arshad @ 2025-03-27 15:10 UTC (permalink / raw)
  To: Joonas Lahtinen, Andi Shyti
  Cc: dri-devel, intel-gfx, Tvrtko Ursulin, Chris Wilson, Simona Vetter,
	Mrozek, Michal, Andi Shyti, Ayyalasomayajula, Usharani

[-- Attachment #1: Type: text/plain, Size: 3281 bytes --]

I’d like to provide additional context regarding the necessity of these patches.
The shift from dynamic load balancing mode to fixed mode, with CCS usage restricted to a single unit, has led to a notable performance regression, with workloads experiencing an approximately 10% FPS drop.

For example, on DG2, the ResNet-50 inference benchmark previously achieved ~10,500 FPS in dynamic load balancing mode. However, after limiting CCS to 1 in fixed mode, performance dropped to ~9,200 FPS. With these patches, enabling all 4 CCS units via sysfs (in fixed mode) restores performance back to nearly 10,500 FPS, effectively matching the previous dynamic mode results.

Given customer expectations to maintain prior performance levels, these patches are essential to ensuring workloads utilizing multiple CCS units do not experience unnecessary degradation. The proposed sysfs interface provides configurability, allowing controlled re-enablement of all 4 CCS units while keeping fixed mode intact. Since fixed mode is now in use, having a configurable approach ensures flexibility to address different scenarios that may arise.

Let me know if you need further details.

Regards,
Arshad
________________________________
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Sent: Thursday, March 27, 2025 2:49 PM
To: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>; dri-devel <dri-devel@lists.freedesktop.org>; intel-gfx <intel-gfx@lists.freedesktop.org>; Tvrtko Ursulin <tursulin@ursulin.net>; Chris Wilson <chris.p.wilson@linux.intel.com>; Simona Vetter <simona.vetter@ffwll.ch>; Mehmood, Arshad <arshad.mehmood@intel.com>; Mrozek, Michal <michal.mrozek@intel.com>; Andi Shyti <andi.shyti@kernel.org>; Ayyalasomayajula, Usharani <usharani.ayyalasomayajula@intel.com>
Subject: Re: [PATCH v4 00/15] CCS static load balance

Quoting Andi Shyti (2025-03-25 12:52:58)
> On Tue, Mar 25, 2025 at 10:24:42AM +0200, Joonas Lahtinen wrote:

<SNIP>

> > Do you have a reference to some GitLab issues or maybe some external
> > project issues where regressions around here are discussed?
>
> AFAIK, there's no GitLab issue for this because we're not fixing
> a bug here; we're adding a new sysfs interface.

This sysfs interface was exactly designed to address performance
regressions coming from limiting the number of CCS to 1.

So unless we have a specific workload and end-user reporting a
regression on it, there's no incentive to spend any further time here.

<SNIP>

> Arshad and Usha can definitely help if there are any technical
> questions about how the application uses the interface.

I don't have any technical questions as I specified the interface
initially :)

This is not about technical opens about how the interface works.
To recap, when we initially implemented the 1CCS mode, we got active
feedback on the community on regressions.

We were careful to verify that all userspace would cleanly fall back to
using 1CCS mode after it was implemented. And indeed, nobody has been
asking for the 4CCS mode back after the 1CCS mode bugs were fixed.

So as far as I see it, there are no users for this interface in
upstream, and thus we should not spend the time on it.

Regards, Joonas

[-- Attachment #2: Type: text/html, Size: 6135 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 00/15] CCS static load balance
  2025-03-24 13:29 [PATCH v4 00/15] CCS static load balance Andi Shyti
                   ` (16 preceding siblings ...)
  2025-03-25  8:24 ` Joonas Lahtinen
@ 2025-03-25 10:36 ` Mehmood, Arshad
  2025-03-27 13:44 ` Ayyalasomayajula, Usharani
  18 siblings, 0 replies; 24+ messages in thread
From: Mehmood, Arshad @ 2025-03-25 10:36 UTC (permalink / raw)
  To: Andi Shyti, intel-gfx, dri-devel
  Cc: Tvrtko Ursulin, Joonas Lahtinen, Chris Wilson, Simona Vetter,
	Mrozek, Michal, Andi Shyti

[-- Attachment #1: Type: text/plain, Size: 8902 bytes --]

The ccs mode setting support via sysfs is required by our customer.

Acked-by: Arshad Mehmood <arshad.mehmood@intel.com>

________________________________
From: Andi Shyti
Sent: Monday, March 24, 2025 9:29 PM
To: intel-gfx; dri-devel
Cc: Tvrtko Ursulin; Joonas Lahtinen; Chris Wilson; Simona Vetter; Mehmood, Arshad; Mrozek, Michal; Andi Shyti; Andi Shyti
Subject: [PATCH v4 00/15] CCS static load balance

Hi,

Back in v3, this patch series was turned down due to community
policies regarding i915 GEM development. Since then, I have
received several requests from userspace developers, which I
initially declined in order to respect those policies.

However, with the latest request from UMD users, I decided to
give this series another chance. I believe that when a feature
is genuinely needed, our goal should be to support it, not to
dismiss user and customer needs blindly.

Here is the link to the userspace counterpart, which depends on
this series to function properly[*].

I've been refreshing and testing the series together with Arshad.

This patchset introduces static load balancing for GPUs with
multiple compute engines. It's a relatively long series.

To help with review, I've broken the work down as much as
possible in multiple patches.

To summarise:
- Patches 1 to 14 introduce no functional changes, aside from
  adding the 'num_cslices' interface.
- Patch 15 contains the core of the CCS mode setting, building
  on the earlier groundwork.

The updated approach focuses on managing the UABI engine list,
which controls which engines are exposed to userspace. Instead
of manipulating physical engines and their memory directly, we
now control exposure via this list.

Since v3, I've kept the changes in v4 to a minimum because there
wasn't a real technical review on the previous posting. I would
really appreciate it if this time all technical concerns could be
raised and discussed on the mailing list.

IGT tests for this work exist but haven't been submitted yet.

Thanks to Chris for the reviews, to Arshad for the work we've
done together over the past few weeks, and to Michal for his
invaluable input from the userspace side.

Thanks,
Andi

[*] https://github.com/intel/compute-runtime

Changelog:
==========
PATCHv3 -> PATCHv4
------------------
 - Rebase on top of the latest drm-tip
 - Do not call functions inside GEM_BUG_ONs, but call them
   explicitly (thanks Arshad).

PATCHv2 -> PATCHv3
------------------
 - Fix a NULL pointer dereference during module unload.
   In i915_gem_driver_remove() I was accessing the gt after the
   gt was removed. Use the dev_priv, instead (obviously!).
 - Fix a lockdep issue: Some of the uabi_engines_mutex unlocks
   were not correctly placed in the exit paths.
 - Fix a checkpatch error for spaces after and before parenthesis
   in the for_each_enabled_engine() definition.

PATCHv1 -> PATCHv2
------------------
 - Use uabi_mutex to protect the uabi_engines, not the engine
   itself. Rename it to uabi_engines_mutex.
 - Use kobject_add/kobject_del for adding and removing
   interfaces, this way we don't need to destroy and recreate the
   engines, anymore. Refactor intel_engine_add_single_sysfs() to
   reflect this scenario.
 - After adding engines to the rb_tree check that they have been
   added correctly.
 - Fix rb_find_add() compare function to take into accoung also
   the class, not just the instance.

RFCv2 -> PATCHv1
----------------
 - Removed gt->ccs.mutex
 - Rename m -> width, ccs_id -> engine in
   intel_gt_apply_ccs_mode().
 - In the CCS register value calculation
   (intel_gt_apply_ccs_mode()) the engine (ccs_id) needs to move
   along the ccs_mask (set by the user) instead of the
   cslice_mask.
 - Add GEM_BUG_ON after calculating the new ccs_mask
   (update_ccs_mask()) to make sure all angines have been
   evaluated (i.e. ccs_mask must be '0' at the end of the
   algorithm).
 - move wakeref lock before evaluating intel_gt_pm_is_awake() and
   fix exit path accordingly.
 - Use a more compact form in intel_gt_sysfs_ccs_init() and
   add_uabi_ccs_engines() when evaluating sysfs_create_file(): no
   need to store the return value to the err variable which is
   unused. Get rid of err.
 - Print a warnging instead of a debug message if we fail to
   create the sysfs files.
 - If engine files creation fails in
   intel_engine_add_single_sysfs(), print a warning, not an
   error.
 - Rename gt->ccs.ccs_mask to gt->ccs.id_mask and add a comment
   to explain its purpose.
 - During uabi engine creation, in
   intel_engines_driver_register(), the uabi_ccs_instance is
   redundant because the ccs_instances is already tracked in
   engine->uabi_instance.
 - Mark add_uabi_ccs_engines() and remove_uabi_ccs_engines() as
   __maybe_unused not to break bisectability. They wouldn't
   compile in their own commit. They will be used in the next
   patch and the __maybe_unused is removed.
 - Update engine's workaround every time a new mode is set in
   update_ccs_mask().
 - Mark engines as valid or invalid using their status as
   rb_node. Invalid engines are marked as invalid using
   RB_CLEAR_NODE(). Execbufs will check for their validity when
   selecting the engine to be combined to a context.
 - Create for_each_enabled_engine() which skips the non valid
   engines and use it in selftests.

RFCv1 -> RFCv2
--------------
Compared to the first version I've taken a completely different
approach to adding and removing engines. in v1 physical engines
were directly added and removed, along with the memory allocated
to them, each time the user changed the CCS mode (from the
previous cover letter).

Andi Shyti (15):
  drm/i915/gt: Avoid using masked workaround for CCS_MODE setting
  drm/i915/gt: Move the CCS mode variable to a global position
  drm/i915/gt: Allow the creation of multi-mode CCS masks
  drm/i915/gt: Refactor uabi engine class/instance list creation
  drm/i915/gem: Mark and verify UABI engine validity
  drm/i915/gt: Introduce for_each_enabled_engine() and apply it in
    selftests
  drm/i915/gt: Manage CCS engine creation within UABI exposure
  drm/i915/gt: Remove cslices mask value from the CCS structure
  drm/i915/gt: Expose the number of total CCS slices
  drm/i915/gt: Store engine-related sysfs kobjects
  drm/i915/gt: Store active CCS mask
  drm/i915: Protect access to the UABI engines list with a mutex
  drm/i915/gt: Isolate single sysfs engine file creation
  drm/i915/gt: Implement creation and removal routines for CCS engines
  drm/i915/gt: Allow the user to change the CCS mode through sysfs

 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   3 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  28 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  23 --
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   2 +
 drivers/gpu/drm/i915/gt/intel_engine_user.c   |  62 ++-
 drivers/gpu/drm/i915/gt/intel_gt.c            |   3 +
 drivers/gpu/drm/i915/gt/intel_gt.h            |  12 +
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c   | 357 +++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h   |   5 +-
 drivers/gpu/drm/i915/gt/intel_gt_sysfs.c      |   2 +
 drivers/gpu/drm/i915/gt/intel_gt_types.h      |  19 +-
 drivers/gpu/drm/i915/gt/intel_workarounds.c   |   8 +-
 drivers/gpu/drm/i915/gt/selftest_context.c    |   6 +-
 drivers/gpu/drm/i915/gt/selftest_engine_cs.c  |   4 +-
 .../drm/i915/gt/selftest_engine_heartbeat.c   |   6 +-
 drivers/gpu/drm/i915/gt/selftest_engine_pm.c  |   6 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |  52 +--
 drivers/gpu/drm/i915/gt/selftest_gt_pm.c      |   2 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |  22 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |  18 +-
 drivers/gpu/drm/i915/gt/selftest_mocs.c       |   6 +-
 drivers/gpu/drm/i915/gt/selftest_rc6.c        |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      |   8 +-
 .../drm/i915/gt/selftest_ring_submission.c    |   2 +-
 drivers/gpu/drm/i915/gt/selftest_rps.c        |  14 +-
 drivers/gpu/drm/i915/gt/selftest_timeline.c   |  14 +-
 drivers/gpu/drm/i915/gt/selftest_tlb.c        |   2 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c    |  14 +-
 drivers/gpu/drm/i915/gt/sysfs_engines.c       |  80 ++--
 drivers/gpu/drm/i915/gt/sysfs_engines.h       |   2 +
 drivers/gpu/drm/i915/i915_cmd_parser.c        |   2 +
 drivers/gpu/drm/i915/i915_debugfs.c           |   4 +
 drivers/gpu/drm/i915/i915_drv.h               |   5 +
 drivers/gpu/drm/i915/i915_gem.c               |   4 +
 drivers/gpu/drm/i915/i915_perf.c              |   8 +-
 drivers/gpu/drm/i915/i915_pmu.c               |  11 +-
 drivers/gpu/drm/i915/i915_query.c             |  21 +-
 37 files changed, 648 insertions(+), 193 deletions(-)

--
2.47.2


[-- Attachment #2: Type: text/html, Size: 13071 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 00/15] CCS static load balance
  2025-03-24 13:29 [PATCH v4 00/15] CCS static load balance Andi Shyti
                   ` (17 preceding siblings ...)
  2025-03-25 10:36 ` Mehmood, Arshad
@ 2025-03-27 13:44 ` Ayyalasomayajula, Usharani
  18 siblings, 0 replies; 24+ messages in thread
From: Ayyalasomayajula, Usharani @ 2025-03-27 13:44 UTC (permalink / raw)
  To: Andi Shyti, intel-gfx, dri-devel
  Cc: Tvrtko Ursulin, Joonas Lahtinen, Chris Wilson, Simona Vetter,
	Mehmood, Arshad, Mrozek, Michal, Andi Shyti

[-- Attachment #1: Type: text/plain, Size: 9340 bytes --]

Justification: To address a hardware bug causing stability issues when RCS and multiple CCS operate simultaneously in dynamic load balancing mode, CCG limited the CCS count to 1 as a software workaround. Many ECG customers run compute-only workloads without requiring rendering tasks. Therefore, it is important to provide customers with a runtime configuration option to increase the CCS count for compute-only workloads, in order to meet the performance requirements.


Acked-by: Usharani Ayyalasomayajula <usharani.ayyalasomayajula@intel.com>

Thanks,
Usha.
________________________________
From: Andi Shyti
Sent: Monday, March 24, 2025 18:59
To: intel-gfx; dri-devel
Cc: Tvrtko Ursulin; Joonas Lahtinen; Chris Wilson; Simona Vetter; Mehmood, Arshad; Mrozek, Michal; Andi Shyti; Andi Shyti
Subject: [PATCH v4 00/15] CCS static load balance

Hi,

Back in v3, this patch series was turned down due to community
policies regarding i915 GEM development. Since then, I have
received several requests from userspace developers, which I
initially declined in order to respect those policies.

However, with the latest request from UMD users, I decided to
give this series another chance. I believe that when a feature
is genuinely needed, our goal should be to support it, not to
dismiss user and customer needs blindly.

Here is the link to the userspace counterpart, which depends on
this series to function properly[*].

I've been refreshing and testing the series together with Arshad.

This patchset introduces static load balancing for GPUs with
multiple compute engines. It's a relatively long series.

To help with review, I've broken the work down as much as
possible in multiple patches.

To summarise:
- Patches 1 to 14 introduce no functional changes, aside from
  adding the 'num_cslices' interface.
- Patch 15 contains the core of the CCS mode setting, building
  on the earlier groundwork.

The updated approach focuses on managing the UABI engine list,
which controls which engines are exposed to userspace. Instead
of manipulating physical engines and their memory directly, we
now control exposure via this list.

Since v3, I've kept the changes in v4 to a minimum because there
wasn't a real technical review on the previous posting. I would
really appreciate it if this time all technical concerns could be
raised and discussed on the mailing list.

IGT tests for this work exist but haven't been submitted yet.

Thanks to Chris for the reviews, to Arshad for the work we've
done together over the past few weeks, and to Michal for his
invaluable input from the userspace side.

Thanks,
Andi

[*] https://github.com/intel/compute-runtime

Changelog:
==========
PATCHv3 -> PATCHv4
------------------
 - Rebase on top of the latest drm-tip
 - Do not call functions inside GEM_BUG_ONs, but call them
   explicitly (thanks Arshad).

PATCHv2 -> PATCHv3
------------------
 - Fix a NULL pointer dereference during module unload.
   In i915_gem_driver_remove() I was accessing the gt after the
   gt was removed. Use the dev_priv, instead (obviously!).
 - Fix a lockdep issue: Some of the uabi_engines_mutex unlocks
   were not correctly placed in the exit paths.
 - Fix a checkpatch error for spaces after and before parenthesis
   in the for_each_enabled_engine() definition.

PATCHv1 -> PATCHv2
------------------
 - Use uabi_mutex to protect the uabi_engines, not the engine
   itself. Rename it to uabi_engines_mutex.
 - Use kobject_add/kobject_del for adding and removing
   interfaces, this way we don't need to destroy and recreate the
   engines, anymore. Refactor intel_engine_add_single_sysfs() to
   reflect this scenario.
 - After adding engines to the rb_tree check that they have been
   added correctly.
 - Fix rb_find_add() compare function to take into accoung also
   the class, not just the instance.

RFCv2 -> PATCHv1
----------------
 - Removed gt->ccs.mutex
 - Rename m -> width, ccs_id -> engine in
   intel_gt_apply_ccs_mode().
 - In the CCS register value calculation
   (intel_gt_apply_ccs_mode()) the engine (ccs_id) needs to move
   along the ccs_mask (set by the user) instead of the
   cslice_mask.
 - Add GEM_BUG_ON after calculating the new ccs_mask
   (update_ccs_mask()) to make sure all angines have been
   evaluated (i.e. ccs_mask must be '0' at the end of the
   algorithm).
 - move wakeref lock before evaluating intel_gt_pm_is_awake() and
   fix exit path accordingly.
 - Use a more compact form in intel_gt_sysfs_ccs_init() and
   add_uabi_ccs_engines() when evaluating sysfs_create_file(): no
   need to store the return value to the err variable which is
   unused. Get rid of err.
 - Print a warnging instead of a debug message if we fail to
   create the sysfs files.
 - If engine files creation fails in
   intel_engine_add_single_sysfs(), print a warning, not an
   error.
 - Rename gt->ccs.ccs_mask to gt->ccs.id_mask and add a comment
   to explain its purpose.
 - During uabi engine creation, in
   intel_engines_driver_register(), the uabi_ccs_instance is
   redundant because the ccs_instances is already tracked in
   engine->uabi_instance.
 - Mark add_uabi_ccs_engines() and remove_uabi_ccs_engines() as
   __maybe_unused not to break bisectability. They wouldn't
   compile in their own commit. They will be used in the next
   patch and the __maybe_unused is removed.
 - Update engine's workaround every time a new mode is set in
   update_ccs_mask().
 - Mark engines as valid or invalid using their status as
   rb_node. Invalid engines are marked as invalid using
   RB_CLEAR_NODE(). Execbufs will check for their validity when
   selecting the engine to be combined to a context.
 - Create for_each_enabled_engine() which skips the non valid
   engines and use it in selftests.

RFCv1 -> RFCv2
--------------
Compared to the first version I've taken a completely different
approach to adding and removing engines. in v1 physical engines
were directly added and removed, along with the memory allocated
to them, each time the user changed the CCS mode (from the
previous cover letter).

Andi Shyti (15):
  drm/i915/gt: Avoid using masked workaround for CCS_MODE setting
  drm/i915/gt: Move the CCS mode variable to a global position
  drm/i915/gt: Allow the creation of multi-mode CCS masks
  drm/i915/gt: Refactor uabi engine class/instance list creation
  drm/i915/gem: Mark and verify UABI engine validity
  drm/i915/gt: Introduce for_each_enabled_engine() and apply it in
    selftests
  drm/i915/gt: Manage CCS engine creation within UABI exposure
  drm/i915/gt: Remove cslices mask value from the CCS structure
  drm/i915/gt: Expose the number of total CCS slices
  drm/i915/gt: Store engine-related sysfs kobjects
  drm/i915/gt: Store active CCS mask
  drm/i915: Protect access to the UABI engines list with a mutex
  drm/i915/gt: Isolate single sysfs engine file creation
  drm/i915/gt: Implement creation and removal routines for CCS engines
  drm/i915/gt: Allow the user to change the CCS mode through sysfs

 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   3 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  28 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  23 --
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   2 +
 drivers/gpu/drm/i915/gt/intel_engine_user.c   |  62 ++-
 drivers/gpu/drm/i915/gt/intel_gt.c            |   3 +
 drivers/gpu/drm/i915/gt/intel_gt.h            |  12 +
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c   | 357 +++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h   |   5 +-
 drivers/gpu/drm/i915/gt/intel_gt_sysfs.c      |   2 +
 drivers/gpu/drm/i915/gt/intel_gt_types.h      |  19 +-
 drivers/gpu/drm/i915/gt/intel_workarounds.c   |   8 +-
 drivers/gpu/drm/i915/gt/selftest_context.c    |   6 +-
 drivers/gpu/drm/i915/gt/selftest_engine_cs.c  |   4 +-
 .../drm/i915/gt/selftest_engine_heartbeat.c   |   6 +-
 drivers/gpu/drm/i915/gt/selftest_engine_pm.c  |   6 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |  52 +--
 drivers/gpu/drm/i915/gt/selftest_gt_pm.c      |   2 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |  22 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |  18 +-
 drivers/gpu/drm/i915/gt/selftest_mocs.c       |   6 +-
 drivers/gpu/drm/i915/gt/selftest_rc6.c        |   4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      |   8 +-
 .../drm/i915/gt/selftest_ring_submission.c    |   2 +-
 drivers/gpu/drm/i915/gt/selftest_rps.c        |  14 +-
 drivers/gpu/drm/i915/gt/selftest_timeline.c   |  14 +-
 drivers/gpu/drm/i915/gt/selftest_tlb.c        |   2 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c    |  14 +-
 drivers/gpu/drm/i915/gt/sysfs_engines.c       |  80 ++--
 drivers/gpu/drm/i915/gt/sysfs_engines.h       |   2 +
 drivers/gpu/drm/i915/i915_cmd_parser.c        |   2 +
 drivers/gpu/drm/i915/i915_debugfs.c           |   4 +
 drivers/gpu/drm/i915/i915_drv.h               |   5 +
 drivers/gpu/drm/i915/i915_gem.c               |   4 +
 drivers/gpu/drm/i915/i915_perf.c              |   8 +-
 drivers/gpu/drm/i915/i915_pmu.c               |  11 +-
 drivers/gpu/drm/i915/i915_query.c             |  21 +-
 37 files changed, 648 insertions(+), 193 deletions(-)

--
2.47.2

[-- Attachment #2: Type: text/html, Size: 14175 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2025-04-23 14:29 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-24 13:29 [PATCH v4 00/15] CCS static load balance Andi Shyti
2025-03-24 13:29 ` [PATCH v4 01/15] drm/i915/gt: Avoid using masked workaround for CCS_MODE setting Andi Shyti
2025-04-23 14:28   ` Lucas De Marchi
2025-03-24 13:29 ` [PATCH v4 02/15] drm/i915/gt: Move the CCS mode variable to a global position Andi Shyti
2025-03-24 13:29 ` [PATCH v4 03/15] drm/i915/gt: Allow the creation of multi-mode CCS masks Andi Shyti
2025-03-24 13:29 ` [PATCH v4 04/15] drm/i915/gt: Refactor uabi engine class/instance list creation Andi Shyti
2025-03-24 13:29 ` [PATCH v4 05/15] drm/i915/gem: Mark and verify UABI engine validity Andi Shyti
2025-03-24 13:29 ` [PATCH v4 06/15] drm/i915/gt: Introduce for_each_enabled_engine() and apply it in selftests Andi Shyti
2025-03-24 13:29 ` [PATCH v4 07/15] drm/i915/gt: Manage CCS engine creation within UABI exposure Andi Shyti
2025-03-24 13:29 ` [PATCH v4 08/15] drm/i915/gt: Remove cslices mask value from the CCS structure Andi Shyti
2025-03-24 13:29 ` [PATCH v4 09/15] drm/i915/gt: Expose the number of total CCS slices Andi Shyti
2025-03-24 13:29 ` [PATCH v4 10/15] drm/i915/gt: Store engine-related sysfs kobjects Andi Shyti
2025-03-24 13:29 ` [PATCH v4 11/15] drm/i915/gt: Store active CCS mask Andi Shyti
2025-03-24 13:29 ` [PATCH v4 12/15] drm/i915: Protect access to the UABI engines list with a mutex Andi Shyti
2025-03-24 13:29 ` [PATCH v4 13/15] drm/i915/gt: Isolate single sysfs engine file creation Andi Shyti
2025-03-24 13:29 ` [PATCH v4 14/15] drm/i915/gt: Implement creation and removal routines for CCS engines Andi Shyti
2025-03-24 13:29 ` [PATCH v4 15/15] drm/i915/gt: Allow the user to change the CCS mode through sysfs Andi Shyti
2025-03-24 13:59 ` [PATCH v4 00/15] CCS static load balance Mrozek, Michal
2025-03-25  8:24 ` Joonas Lahtinen
2025-03-25 10:52   ` Andi Shyti
2025-03-27  6:49     ` Joonas Lahtinen
2025-03-27 15:10       ` Mehmood, Arshad
2025-03-25 10:36 ` Mehmood, Arshad
2025-03-27 13:44 ` Ayyalasomayajula, Usharani

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).