[PATCH 00/10] Introduce SRIOV scheduler groups

Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 00/10] Introduce SRIOV scheduler groups
@ 2025-11-27  1:45 Daniele Ceraolo Spurio
  2025-11-27  1:45 ` [PATCH 01/10] drm/xe/gt: Add engine masks for each class Daniele Ceraolo Spurio
                   ` (13 more replies)
  0 siblings, 14 replies; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-11-27  1:45 UTC (permalink / raw)
  To: intel-xe; +Cc: Daniele Ceraolo Spurio, Michal Wajdeczko

The normal SRIOV setup timeslices the whole GT across VFs. While this is
fine in the great majority of cases, in some cases the admin knows that
a VF is not going to use all the GT HW and that some engines are going
to be permanently idle.
To increase HW utilization in such a scenario, starting from v70.53.0 the
GuC supports scheduler groups (a.k.a. Engine Group Scheduling or EGS);
this feature allows the driver to subdivide a GT into groups of engines,
which the GuC will then independently timeslice across VFs, thus allowing
multiple VF to access the HW at the same time. Given that each group is
independently scheduled, execution quantums and preemption timeouts are
settable per-group-per-VF.

While the GuC supports any group assignment (as long as each engine
belongs to only one group), we only allow specific tested configuration
to be set by the admin that are tailored to specific use-cases. This
series introduces one of those use cases: if each VF is doing a frame
rendering + encoding at a not-too-high resolution (e.g 1080p@30fps),
like it happens e.g. with a simple remote desktop, the render engine
can produce frames faster than the video engine can encode them.
However, our HW can have multiple video engines, so while one of them is
encoding a frame for a VF the other ones can be used for encoding frames
for other VFs. Given that media slices share some resources (e.g. SFC),
to obtain this parallel execution we can simply assign each media slice
to a different group.

This series only allows enabling/disbling of this feature via debugfs
for now (like several other SRIOV features). Sysfs will be implemented
as a follow up, after the review of this series and the proposed
interface is complete.

While the feature is disabled, only one debugfs file is available.
If any configs are supported on the GT, reading this file will dump the
available configs and which one is selected, e.g:

#cat sriov/pf/tile0/gt1/sched_groups_mode
[disabled] media_slices

Writing the config name to the file will enable that configuration and
create a new debugfs folder for PF and each VF called sched_groups, inside
of which are debugfs files that allow listing which engines are assigned
to each group and setting of the EQ/PT values, either for all groups in
one go (by providing a list of values) or for each group individually:

	:
	├── vf1
	:   ├── tile0
	    :   ├── gt1
		:   ├── sched_groups
		    :   ├── exec_quantums_ms
			├── preempt_timeouts_us
			├── group0
			:   ├── engines
			    ├── exec_quantum_ms
			    └── preempt_timeout_us
			:
			└── groupN
			    ├── engines
			    ├── exec_quantum_ms
			    └── preempt_timeout_us

Refer to the individual patches for more details about the debugfs
usage.

Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>

Daniele Ceraolo Spurio (10):
  drm/xe/gt: Add engine masks for each class
  drm/xe/sriov: Initialize scheduler groups
  drm/xe/sriov: Add support for enabling scheduler groups
  drm/xe/sriov: Scheduler groups are incompatible with multi-lrc
  drm/xe/sriov: Add debugfs to enable scheduler groups
  drm/xe/sriov: Add debugfs with scheduler groups information
  drm/xe/sriov: Prep for multiple exec quantums and preemption timeouts
  drm/xe/sriov: Add functions to set exec quantums for each group
  drm/xe/sriov: Add functions to set preempt timeouts for each group
  drm/xe/sriov: Add debugfs to set EQ and PT for scheduler groups

 drivers/gpu/drm/xe/abi/guc_klvs_abi.h         |  62 ++-
 drivers/gpu/drm/xe/xe_exec_queue.c            |  19 +
 drivers/gpu/drm/xe/xe_gt.h                    |   9 +-
 drivers/gpu/drm/xe/xe_gt_sriov_pf.c           |   5 +
 drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c    | 418 ++++++++++++++++--
 drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h    |  16 +
 .../gpu/drm/xe/xe_gt_sriov_pf_config_types.h  |   5 +-
 drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c   | 417 +++++++++++++++++
 drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c    | 294 ++++++++++++
 drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h    |   5 +
 .../gpu/drm/xe/xe_gt_sriov_pf_policy_types.h  |  15 +
 drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h     |   4 +
 drivers/gpu/drm/xe/xe_gt_sriov_vf.c           |  59 +++
 drivers/gpu/drm/xe/xe_gt_sriov_vf.h           |   3 +
 drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h     |   2 +
 .../drm/xe/xe_guc_klv_thresholds_set_types.h  |   1 +
 drivers/gpu/drm/xe/xe_sriov.c                 |  18 +
 drivers/gpu/drm/xe/xe_sriov.h                 |   1 +
 18 files changed, 1323 insertions(+), 30 deletions(-)

-- 
2.43.0

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 01/10] drm/xe/gt: Add engine masks for each class
  2025-11-27  1:45 [PATCH 00/10] Introduce SRIOV scheduler groups Daniele Ceraolo Spurio
@ 2025-11-27  1:45 ` Daniele Ceraolo Spurio
  2025-12-01 16:52   ` Michal Wajdeczko
  2025-11-27  1:45 ` [PATCH 02/10] drm/xe/sriov: Initialize scheduler groups Daniele Ceraolo Spurio
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-11-27  1:45 UTC (permalink / raw)
  To: intel-xe; +Cc: Daniele Ceraolo Spurio

Follow up patches will need the engine masks for VCS and VECS engines.
Since we already have a macro for the CCS engines, just extend the same
approach to all classes.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/xe/xe_gt.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
index 9d710049da45..e70789dfac6e 100644
--- a/drivers/gpu/drm/xe/xe_gt.h
+++ b/drivers/gpu/drm/xe/xe_gt.h
@@ -20,7 +20,14 @@
 		for_each_if(((hwe__) = (gt__)->hw_engines + (id__)) && \
 			  xe_hw_engine_is_valid((hwe__)))
 
-#define CCS_MASK(gt) (((gt)->info.engine_mask & XE_HW_ENGINE_CCS_MASK) >> XE_HW_ENGINE_CCS0)
+#define __ENGINE_CLASS_MASK(gt, name) \
+	(((gt)->info.engine_mask & XE_HW_ENGINE_##name##_MASK) >> XE_HW_ENGINE_##name##0)
+
+#define RCS_MASK(gt) __ENGINE_CLASS_MASK(gt, RCS)
+#define VCS_MASK(gt) __ENGINE_CLASS_MASK(gt, VCS)
+#define VECS_MASK(gt) __ENGINE_CLASS_MASK(gt, VECS)
+#define CCS_MASK(gt) __ENGINE_CLASS_MASK(gt, CCS)
+#define GSCCS_MASK(gt) __ENGINE_CLASS_MASK(gt, GSCCS)
 
 #define GT_VER(gt) ({ \
 	typeof(gt) gt_ = (gt); \
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 01/10] drm/xe/gt: Add engine masks for each class
  2025-11-27  1:45 ` [PATCH 01/10] drm/xe/gt: Add engine masks for each class Daniele Ceraolo Spurio
@ 2025-12-01 16:52   ` Michal Wajdeczko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Wajdeczko @ 2025-12-01 16:52 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-xe



On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
> Follow up patches will need the engine masks for VCS and VECS engines.
> Since we already have a macro for the CCS engines, just extend the same
> approach to all classes.

but the problem is that this existing macro is already little confusing as we have:

	XE_HW_ENGINE_CCS_MASK
and
	CCS_MASK

where only former is a real engine mask, while latter is not masking anything

maybe we should rename the CCS_MASK to

	CCS_INSTANCES

and then extend that naming to

	RCS_INSTANCES
	VCS_INSTANCES
	...


> 
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt.h | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
> index 9d710049da45..e70789dfac6e 100644
> --- a/drivers/gpu/drm/xe/xe_gt.h
> +++ b/drivers/gpu/drm/xe/xe_gt.h
> @@ -20,7 +20,14 @@
>  		for_each_if(((hwe__) = (gt__)->hw_engines + (id__)) && \
>  			  xe_hw_engine_is_valid((hwe__)))
>  
> -#define CCS_MASK(gt) (((gt)->info.engine_mask & XE_HW_ENGINE_CCS_MASK) >> XE_HW_ENGINE_CCS0)
> +#define __ENGINE_CLASS_MASK(gt, name) \
> +	(((gt)->info.engine_mask & XE_HW_ENGINE_##name##_MASK) >> XE_HW_ENGINE_##name##0)
> +
> +#define RCS_MASK(gt) __ENGINE_CLASS_MASK(gt, RCS)
> +#define VCS_MASK(gt) __ENGINE_CLASS_MASK(gt, VCS)
> +#define VECS_MASK(gt) __ENGINE_CLASS_MASK(gt, VECS)
> +#define CCS_MASK(gt) __ENGINE_CLASS_MASK(gt, CCS)
> +#define GSCCS_MASK(gt) __ENGINE_CLASS_MASK(gt, GSCCS)
>  
>  #define GT_VER(gt) ({ \
>  	typeof(gt) gt_ = (gt); \


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 02/10] drm/xe/sriov: Initialize scheduler groups
  2025-11-27  1:45 [PATCH 00/10] Introduce SRIOV scheduler groups Daniele Ceraolo Spurio
  2025-11-27  1:45 ` [PATCH 01/10] drm/xe/gt: Add engine masks for each class Daniele Ceraolo Spurio
@ 2025-11-27  1:45 ` Daniele Ceraolo Spurio
  2025-12-01 22:37   ` Michal Wajdeczko
  2025-11-27  1:45 ` [PATCH 03/10] drm/xe/sriov: Add support for enabling " Daniele Ceraolo Spurio
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-11-27  1:45 UTC (permalink / raw)
  To: intel-xe; +Cc: Daniele Ceraolo Spurio, Michal Wajdeczko

Scheduler groups (a.k.a. Engine Groups Scheduling, or EGS) is a GuC
feature that allows the driver to define groups of engines that are
independently scheduled across VFs, which allows different VFs to be
active on the HW at the same time on different groups.

This is intended for specific scenarios where the admin knows that the
VFs are not going to fully utilize the HW and therefore assigning all of
it to a single VF would lead to part of it being permanently idle.
We do not allow the admin to decide how to divide the engines across
groups, but we instead support specific configurations that are designed
for specific use-cases. During PF initialization we detect which
configurations are possible on a given GT and create the relevant
groups. Since the GuC expect a mask for each class for each group, that
is what we save when we init the configs.

Right now we only have one use-case on the media GT. If the VFs are
running a frame render + encoding at a not-too-high resolution (e.g.
1080@30fps) the render can produce frames faster than the video engine
can encode them, which means that the maximum number of parallel VFs is
limited by the VCS bandwidth. Since our products can have multiple VCS
engines, allowing multiple VFs to be active on the different VCS engines
at the same time allows us to run more parallel VFs on the same HW.
Given that engines in the same media slice share some resources (e.g.
SFC), we assign each media slice to a different scheduling group. We
refer to this configuration as "media_slices", given that each slice
gets its own group.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_sriov_pf.c           |   5 +
 drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c    | 135 ++++++++++++++++++
 drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h    |   1 +
 .../gpu/drm/xe/xe_gt_sriov_pf_policy_types.h  |  14 ++
 4 files changed, 155 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
index 0714c758b9c1..62dda1c24e77 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
@@ -14,6 +14,7 @@
 #include "xe_gt_sriov_pf_control.h"
 #include "xe_gt_sriov_pf_helpers.h"
 #include "xe_gt_sriov_pf_migration.h"
+#include "xe_gt_sriov_pf_policy.h"
 #include "xe_gt_sriov_pf_service.h"
 #include "xe_gt_sriov_printk.h"
 #include "xe_guc_submit.h"
@@ -123,6 +124,10 @@ int xe_gt_sriov_pf_init(struct xe_gt *gt)
 	if (err)
 		return err;
 
+	err = xe_gt_sriov_pf_policy_init(gt);
+	if (err)
+		return err;
+
 	err = xe_gt_sriov_pf_migration_init(gt);
 	if (err)
 		return err;
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
index 4445f660e6d1..9b878578ea90 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
@@ -5,11 +5,14 @@
 
 #include "abi/guc_actions_sriov_abi.h"
 
+#include <drm/drm_managed.h>
+
 #include "xe_bo.h"
 #include "xe_gt.h"
 #include "xe_gt_sriov_pf_helpers.h"
 #include "xe_gt_sriov_pf_policy.h"
 #include "xe_gt_sriov_printk.h"
+#include "xe_guc.h"
 #include "xe_guc_buf.h"
 #include "xe_guc_ct.h"
 #include "xe_guc_klv_helpers.h"
@@ -351,6 +354,125 @@ u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt)
 	return value;
 }
 
+#define MAX_MEDIA_SLICES (hweight32(XE_HW_ENGINE_VECS_MASK))
+static int pf_sched_group_media_slices(struct xe_gt *gt, u32 **masks, u32 *num_masks)
+{
+	u8 slice_to_group[MAX_MEDIA_SLICES];
+	struct xe_hw_engine *hwe = NULL;
+	enum xe_hw_engine_id id;
+	u32 vcs_mask = VCS_MASK(gt);
+	u32 vecs_mask = VECS_MASK(gt);
+	u32 gsc_mask = GSCCS_MASK(gt);
+	u32 *values;
+	u8 slice;
+	u8 groups;
+
+	xe_gt_assert(gt, xe_gt_is_media_type(gt));
+
+	/* A media slice has 2 VCS and a VECS. We bundle the GSC with the first slice */
+	for (slice = 0, groups = 0;
+	     gsc_mask || vcs_mask || vecs_mask;
+	     slice++, gsc_mask = 0, vcs_mask >>= 2, vecs_mask >>= 1) {
+		if (unlikely(slice >= MAX_MEDIA_SLICES)) {
+			xe_gt_sriov_err(gt, "Too many media slices (%u) during EGS setup\n",
+					slice);
+			return -EINVAL;
+		}
+
+		if ((vcs_mask & 0x3) || (vecs_mask & 0x1) || (gsc_mask & 0x1))
+			slice_to_group[slice] = groups++;
+	}
+
+	/* We need at least 2 slices to split them up */
+	if (groups < 2) {
+		*masks = NULL;
+		*num_masks = 0;
+		return 0;
+	}
+
+	/*
+	 * The GuC expects and array with GUC_MAX_ENGINE_CLASSES entries for
+	 * each group.
+	 */
+	values = drmm_kzalloc(&gt_to_xe(gt)->drm,
+			      GUC_MAX_ENGINE_CLASSES * groups * sizeof(u32),
+			      GFP_KERNEL);
+	if (!values)
+		return -ENOMEM;
+
+	for_each_hw_engine(hwe, gt, id) {
+		u8 guc_class = xe_engine_class_to_guc_class(hwe->class);
+		u8 entry;
+
+		switch (hwe->class) {
+		case XE_ENGINE_CLASS_VIDEO_DECODE:
+			slice = hwe->instance / 2;
+			break;
+		case XE_ENGINE_CLASS_VIDEO_ENHANCE:
+			slice = hwe->instance;
+			break;
+		case XE_ENGINE_CLASS_OTHER:
+			slice = 0;
+			break;
+		default:
+			xe_gt_sriov_err(gt, "unknown media gt class %u (%s) during EGS setup\n",
+					hwe->class, hwe->name);
+			drmm_kfree(&gt_to_xe(gt)->drm, values);
+			return -EINVAL;
+		}
+
+		entry = (slice_to_group[slice] * GUC_MAX_ENGINE_CLASSES) + guc_class;
+		values[entry] |= BIT(hwe->logical_instance);
+	}
+
+	*masks = values;
+	*num_masks = GUC_MAX_ENGINE_CLASSES * groups;
+
+	return 0;
+}
+
+static int pf_init_sched_groups(struct xe_gt *gt)
+{
+	int err;
+	int m;
+
+	xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
+
+	if (GUC_SUBMIT_VER(&gt->uc.guc) < MAKE_GUC_VER(1, 26, 0))
+		return 0;
+
+	for (m = 0; m < XE_SRIOV_SCHED_GROUPS_MODES_COUNT; m++) {
+		u32 *masks = NULL;
+		u32 num_masks = 0;
+
+		switch (m) {
+		case XE_SRIOV_SCHED_GROUPS_NONE:
+			break;
+		case XE_SRIOV_SCHED_GROUPS_MEDIA_SLICES:
+			/* this mode only has groups on the media GT */
+			if (xe_gt_is_media_type(gt)) {
+				err = pf_sched_group_media_slices(gt, &masks, &num_masks);
+				if (err)
+					return err;
+			}
+			break;
+		default:
+			xe_gt_sriov_err(gt, "unknown sched group mode %u\n", m);
+			return -EINVAL;
+		}
+
+		xe_gt_assert(gt, (num_masks % GUC_MAX_ENGINE_CLASSES) == 0);
+
+		if ((m == XE_SRIOV_SCHED_GROUPS_NONE) || num_masks)
+			gt->sriov.pf.policy.guc.sched_groups.supported_modes |= BIT(m);
+
+		gt->sriov.pf.policy.guc.sched_groups.modes[m].masks = masks;
+		gt->sriov.pf.policy.guc.sched_groups.modes[m].num_masks = num_masks;
+	}
+
+	return 0;
+}
+
 static void pf_sanitize_guc_policies(struct xe_gt *gt)
 {
 	pf_sanitize_sched_if_idle(gt);
@@ -401,6 +523,19 @@ int xe_gt_sriov_pf_policy_reprovision(struct xe_gt *gt, bool reset)
 	return err ? -ENXIO : 0;
 }
 
+/**
+ * xe_gt_sriov_pf_policy_init - Initializes the SW state of the PF policies.
+ * @gt: the &xe_gt
+ *
+ * This function can only be called on PF. This function does not touch the HW.
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int xe_gt_sriov_pf_policy_init(struct xe_gt *gt)
+{
+	return pf_init_sched_groups(gt);
+}
+
 static void print_guc_policies(struct drm_printer *p, struct xe_gt_sriov_guc_policies *policy)
 {
 	drm_printf(p, "%s:\t%s\n",
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
index 2a5dc33dc6d7..c9c04d1b7f50 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
@@ -18,6 +18,7 @@ bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
 int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
 u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
 
+int xe_gt_sriov_pf_policy_init(struct xe_gt *gt);
 void xe_gt_sriov_pf_policy_sanitize(struct xe_gt *gt);
 int xe_gt_sriov_pf_policy_reprovision(struct xe_gt *gt, bool reset);
 int xe_gt_sriov_pf_policy_print(struct xe_gt *gt, struct drm_printer *p);
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
index 4de532af135e..3b915801c01b 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
@@ -8,16 +8,30 @@
 
 #include <linux/types.h>
 
+enum xe_sriov_sched_group_modes {
+	XE_SRIOV_SCHED_GROUPS_NONE = 0, /* disabled */
+	XE_SRIOV_SCHED_GROUPS_MEDIA_SLICES, /* separate groups for each media slice */
+	XE_SRIOV_SCHED_GROUPS_MODES_COUNT
+};
+
 /**
  * struct xe_gt_sriov_guc_policies - GuC SR-IOV policies.
  * @sched_if_idle: controls strict scheduling policy.
  * @reset_engine: controls engines reset on VF switch policy.
  * @sample_period: adverse events sampling period (in milliseconds).
+ * @sched_groups: available scheduling group configurations and current mode.
  */
 struct xe_gt_sriov_guc_policies {
 	bool sched_if_idle;
 	bool reset_engine;
 	u32 sample_period;
+	struct {
+		u32 supported_modes;
+		struct {
+			u32 *masks;
+			u32 num_masks;
+		} modes[XE_SRIOV_SCHED_GROUPS_MODES_COUNT];
+	} sched_groups;
 };
 
 /**
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 02/10] drm/xe/sriov: Initialize scheduler groups
  2025-11-27  1:45 ` [PATCH 02/10] drm/xe/sriov: Initialize scheduler groups Daniele Ceraolo Spurio
@ 2025-12-01 22:37   ` Michal Wajdeczko
  2025-12-01 23:33     ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 44+ messages in thread
From: Michal Wajdeczko @ 2025-12-01 22:37 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-xe

Hi Daniele,

some initial comments to keep you busy ;)

On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
> Scheduler groups (a.k.a. Engine Groups Scheduling, or EGS) is a GuC
> feature that allows the driver to define groups of engines that are
> independently scheduled across VFs, which allows different VFs to be
> active on the HW at the same time on different groups.
> 
> This is intended for specific scenarios where the admin knows that the
> VFs are not going to fully utilize the HW and therefore assigning all of
> it to a single VF would lead to part of it being permanently idle.
> We do not allow the admin to decide how to divide the engines across
> groups, but we instead support specific configurations that are designed
> for specific use-cases. During PF initialization we detect which
> configurations are possible on a given GT and create the relevant
> groups. Since the GuC expect a mask for each class for each group, that
> is what we save when we init the configs.
> 
> Right now we only have one use-case on the media GT. If the VFs are
> running a frame render + encoding at a not-too-high resolution (e.g.
> 1080@30fps) the render can produce frames faster than the video engine
> can encode them, which means that the maximum number of parallel VFs is
> limited by the VCS bandwidth. Since our products can have multiple VCS
> engines, allowing multiple VFs to be active on the different VCS engines
> at the same time allows us to run more parallel VFs on the same HW.
> Given that engines in the same media slice share some resources (e.g.
> SFC), we assign each media slice to a different scheduling group. We
> refer to this configuration as "media_slices", given that each slice
> gets its own group.
> 
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt_sriov_pf.c           |   5 +
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c    | 135 ++++++++++++++++++
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h    |   1 +
>  .../gpu/drm/xe/xe_gt_sriov_pf_policy_types.h  |  14 ++
>  4 files changed, 155 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
> index 0714c758b9c1..62dda1c24e77 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
> @@ -14,6 +14,7 @@
>  #include "xe_gt_sriov_pf_control.h"
>  #include "xe_gt_sriov_pf_helpers.h"
>  #include "xe_gt_sriov_pf_migration.h"
> +#include "xe_gt_sriov_pf_policy.h"
>  #include "xe_gt_sriov_pf_service.h"
>  #include "xe_gt_sriov_printk.h"
>  #include "xe_guc_submit.h"
> @@ -123,6 +124,10 @@ int xe_gt_sriov_pf_init(struct xe_gt *gt)
>  	if (err)
>  		return err;
>  
> +	err = xe_gt_sriov_pf_policy_init(gt);
> +	if (err)
> +		return err;
> +
>  	err = xe_gt_sriov_pf_migration_init(gt);
>  	if (err)
>  		return err;
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
> index 4445f660e6d1..9b878578ea90 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
> @@ -5,11 +5,14 @@
>  
>  #include "abi/guc_actions_sriov_abi.h"
>  
> +#include <drm/drm_managed.h>

system headers shall be included before our local headers

> +
>  #include "xe_bo.h"
>  #include "xe_gt.h"
>  #include "xe_gt_sriov_pf_helpers.h"
>  #include "xe_gt_sriov_pf_policy.h"
>  #include "xe_gt_sriov_printk.h"
> +#include "xe_guc.h"
>  #include "xe_guc_buf.h"
>  #include "xe_guc_ct.h"
>  #include "xe_guc_klv_helpers.h"
> @@ -351,6 +354,125 @@ u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt)
>  	return value;
>  }
>  
> +#define MAX_MEDIA_SLICES (hweight32(XE_HW_ENGINE_VECS_MASK))

redundant ( )

and maybe such macro should be placed in xe_hw_engine_types.h
with proper comment why it works:

	/*
	 * Each media slice has 1x VECS
	 * Max number of VECS instances gives us a max number of slices
	 */

> +static int pf_sched_group_media_slices(struct xe_gt *gt, u32 **masks, u32 *num_masks)
> +{
> +	u8 slice_to_group[MAX_MEDIA_SLICES];
> +	struct xe_hw_engine *hwe = NULL;

do we need to initialize this here?
it will be initialized by for_each_hw_engine, no?

> +	enum xe_hw_engine_id id;
> +	u32 vcs_mask = VCS_MASK(gt);
> +	u32 vecs_mask = VECS_MASK(gt);
> +	u32 gsc_mask = GSCCS_MASK(gt);

we try to define vars in rev-xmas-tree order

> +	u32 *values;
> +	u8 slice;
> +	u8 groups;

I guess for those two generic counters we don't need to use explicitly sized int

> +
> +	xe_gt_assert(gt, xe_gt_is_media_type(gt));
> +
> +	/* A media slice has 2 VCS and a VECS. We bundle the GSC with the first slice */
> +	for (slice = 0, groups = 0;
> +	     gsc_mask || vcs_mask || vecs_mask;
> +	     slice++, gsc_mask = 0, vcs_mask >>= 2, vecs_mask >>= 1) {
> +		if (unlikely(slice >= MAX_MEDIA_SLICES)) {

maybe this 'for' loop should be just for 'slice = [0, MAX)'  

> +			xe_gt_sriov_err(gt, "Too many media slices (%u) during EGS setup\n",
> +					slice);

then after this loop we can have asserts that no engine instances are left behind

> +			return -EINVAL;
> +		}
> +
> +		if ((vcs_mask & 0x3) || (vecs_mask & 0x1) || (gsc_mask & 0x1))
> +			slice_to_group[slice] = groups++;
> +	}

like this:

	for (slice = 0; slice < MAX_MEDIA_SLICES; slice++ ) {

		if ((vcs_mask & 0x3) || (vecs_mask & 0x1) || (gsc_mask & 0x1))
			slice_to_group[slice] = groups++;

		vcs_mask >>= 2;
		vecs_mask >>= 1;
		gsc_mask >>= 1;
	}

	xe_gt_assert(gt, !vcs_mask)
	xe_gt_assert(gt, !vecs_mask)
	xe_gt_assert(gt, !gsc_mask)

> +
> +	/* We need at least 2 slices to split them up */

shouldn't we also check that each group has both VCS and VECS?

> +	if (groups < 2) {
> +		*masks = NULL;
> +		*num_masks = 0;
> +		return 0;
> +	}
> +
> +	/*
> +	 * The GuC expects and array with GUC_MAX_ENGINE_CLASSES entries for

typo: an array ?

> +	 * each group.
> +	 */
> +	values = drmm_kzalloc(&gt_to_xe(gt)->drm,
> +			      GUC_MAX_ENGINE_CLASSES * groups * sizeof(u32),
> +			      GFP_KERNEL);
> +	if (!values)
> +		return -ENOMEM;
> +
> +	for_each_hw_engine(hwe, gt, id) {
> +		u8 guc_class = xe_engine_class_to_guc_class(hwe->class);
> +		u8 entry;
> +
> +		switch (hwe->class) {
> +		case XE_ENGINE_CLASS_VIDEO_DECODE:
> +			slice = hwe->instance / 2;

hmm, this seems to duplicate previous loop where we had:

			vcs_mask >>= 2;

maybe these two loops can be combined?

> +			break;
> +		case XE_ENGINE_CLASS_VIDEO_ENHANCE:
> +			slice = hwe->instance;
> +			break;
> +		case XE_ENGINE_CLASS_OTHER:
> +			slice = 0;
> +			break;
> +		default:
> +			xe_gt_sriov_err(gt, "unknown media gt class %u (%s) during EGS setup\n",
> +					hwe->class, hwe->name);

this also seems like an attempt to catch our coding error,
so plain assert should be sufficient

> +			drmm_kfree(&gt_to_xe(gt)->drm, values);

do we need to bother with this kfree ?
it's managed allocation and any error from here will lead to abort the probe anyway

> +			return -EINVAL;
> +		}
> +
> +		entry = (slice_to_group[slice] * GUC_MAX_ENGINE_CLASSES) + guc_class;

redundant ( )

> +		values[entry] |= BIT(hwe->logical_instance);
> +	}
> +
> +	*masks = values;
> +	*num_masks = GUC_MAX_ENGINE_CLASSES * groups;
> +
> +	return 0;
> +}
> +
> +static int pf_init_sched_groups(struct xe_gt *gt)
> +{
> +	int err;
> +	int m;
> +
> +	xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
> +
> +	if (GUC_SUBMIT_VER(&gt->uc.guc) < MAKE_GUC_VER(1, 26, 0))
> +		return 0;
> +
> +	for (m = 0; m < XE_SRIOV_SCHED_GROUPS_MODES_COUNT; m++) {
> +		u32 *masks = NULL;
> +		u32 num_masks = 0;
> +
> +		switch (m) {
> +		case XE_SRIOV_SCHED_GROUPS_NONE:
> +			break;
> +		case XE_SRIOV_SCHED_GROUPS_MEDIA_SLICES:
> +			/* this mode only has groups on the media GT */

then having array of all modes on every GT seems non-optimal

> +			if (xe_gt_is_media_type(gt)) {
> +				err = pf_sched_group_media_slices(gt, &masks, &num_masks);
> +				if (err)
> +					return err;
> +			}
> +			break;
> +		default:
> +			xe_gt_sriov_err(gt, "unknown sched group mode %u\n", m);

this looks more like a coding error, so maybe use just assert?

> +			return -EINVAL;
> +		}
> +
> +		xe_gt_assert(gt, (num_masks % GUC_MAX_ENGINE_CLASSES) == 0);
> +
> +		if ((m == XE_SRIOV_SCHED_GROUPS_NONE) || num_masks)
> +			gt->sriov.pf.policy.guc.sched_groups.supported_modes |= BIT(m);
> +
> +		gt->sriov.pf.policy.guc.sched_groups.modes[m].masks = masks;
> +		gt->sriov.pf.policy.guc.sched_groups.modes[m].num_masks = num_masks;
> +	}
> +
> +	return 0;
> +}
> +
>  static void pf_sanitize_guc_policies(struct xe_gt *gt)
>  {
>  	pf_sanitize_sched_if_idle(gt);
> @@ -401,6 +523,19 @@ int xe_gt_sriov_pf_policy_reprovision(struct xe_gt *gt, bool reset)
>  	return err ? -ENXIO : 0;
>  }
>  
> +/**
> + * xe_gt_sriov_pf_policy_init - Initializes the SW state of the PF policies.
> + * @gt: the &xe_gt
> + *
> + * This function can only be called on PF. This function does not touch the HW.

but must be called after checking HW engine fuses, right?

> + *
> + * Return: 0 on success or a negative error code on failure.
> + */
> +int xe_gt_sriov_pf_policy_init(struct xe_gt *gt)
> +{
> +	return pf_init_sched_groups(gt);
> +}
> +
>  static void print_guc_policies(struct drm_printer *p, struct xe_gt_sriov_guc_policies *policy)
>  {
>  	drm_printf(p, "%s:\t%s\n",
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
> index 2a5dc33dc6d7..c9c04d1b7f50 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
> @@ -18,6 +18,7 @@ bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
>  int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
>  u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
>  
> +int xe_gt_sriov_pf_policy_init(struct xe_gt *gt);
>  void xe_gt_sriov_pf_policy_sanitize(struct xe_gt *gt);
>  int xe_gt_sriov_pf_policy_reprovision(struct xe_gt *gt, bool reset);
>  int xe_gt_sriov_pf_policy_print(struct xe_gt *gt, struct drm_printer *p);
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
> index 4de532af135e..3b915801c01b 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
> @@ -8,16 +8,30 @@
>  
>  #include <linux/types.h>
>  

we expect all enums (and enumerators) to have proper kernel-doc

> +enum xe_sriov_sched_group_modes {
> +	XE_SRIOV_SCHED_GROUPS_NONE = 0, /* disabled */

do we really need this?

> +	XE_SRIOV_SCHED_GROUPS_MEDIA_SLICES, /* separate groups for each media slice */
> +	XE_SRIOV_SCHED_GROUPS_MODES_COUNT

while it looks handy solution to count enumerators,
now COUNT pretends to be a valid mode enumerator while it is not

maybe just provide separate #define for it?

> +};
> +
>  /**
>   * struct xe_gt_sriov_guc_policies - GuC SR-IOV policies.
>   * @sched_if_idle: controls strict scheduling policy.
>   * @reset_engine: controls engines reset on VF switch policy.
>   * @sample_period: adverse events sampling period (in milliseconds).
> + * @sched_groups: available scheduling group configurations and current mode.

don't forget to describe inner members

and there is no 'current' now

>   */
>  struct xe_gt_sriov_guc_policies {
>  	bool sched_if_idle;
>  	bool reset_engine;
>  	u32 sample_period;
> +	struct {
> +		u32 supported_modes;
> +		struct {
> +			u32 *masks;
> +			u32 num_masks;
> +		} modes[XE_SRIOV_SCHED_GROUPS_MODES_COUNT];

hmm, for the NONE mode, all groups must be 0, so why bother with .masks/.num fields for it?

maybe all we need is:

	struct {
		const char *name;
		u32 *config;
	} modes[];

then:

	supported_modes := iterate over modes array and look for non-null config/name

	none := supported_modes is empty

	current := pointer to modes[] or NULL

> +	} sched_groups;
>  };
>  
>  /**


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 02/10] drm/xe/sriov: Initialize scheduler groups
  2025-12-01 22:37   ` Michal Wajdeczko
@ 2025-12-01 23:33     ` Daniele Ceraolo Spurio
  2025-12-02 21:08       ` Michal Wajdeczko
  0 siblings, 1 reply; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-12-01 23:33 UTC (permalink / raw)
  To: Michal Wajdeczko, intel-xe



On 12/1/2025 2:37 PM, Michal Wajdeczko wrote:
> Hi Daniele,
>
> some initial comments to keep you busy ;)
>
> On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
>> Scheduler groups (a.k.a. Engine Groups Scheduling, or EGS) is a GuC
>> feature that allows the driver to define groups of engines that are
>> independently scheduled across VFs, which allows different VFs to be
>> active on the HW at the same time on different groups.
>>
>> This is intended for specific scenarios where the admin knows that the
>> VFs are not going to fully utilize the HW and therefore assigning all of
>> it to a single VF would lead to part of it being permanently idle.
>> We do not allow the admin to decide how to divide the engines across
>> groups, but we instead support specific configurations that are designed
>> for specific use-cases. During PF initialization we detect which
>> configurations are possible on a given GT and create the relevant
>> groups. Since the GuC expect a mask for each class for each group, that
>> is what we save when we init the configs.
>>
>> Right now we only have one use-case on the media GT. If the VFs are
>> running a frame render + encoding at a not-too-high resolution (e.g.
>> 1080@30fps) the render can produce frames faster than the video engine
>> can encode them, which means that the maximum number of parallel VFs is
>> limited by the VCS bandwidth. Since our products can have multiple VCS
>> engines, allowing multiple VFs to be active on the different VCS engines
>> at the same time allows us to run more parallel VFs on the same HW.
>> Given that engines in the same media slice share some resources (e.g.
>> SFC), we assign each media slice to a different scheduling group. We
>> refer to this configuration as "media_slices", given that each slice
>> gets its own group.
>>
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_gt_sriov_pf.c           |   5 +
>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c    | 135 ++++++++++++++++++
>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h    |   1 +
>>   .../gpu/drm/xe/xe_gt_sriov_pf_policy_types.h  |  14 ++
>>   4 files changed, 155 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
>> index 0714c758b9c1..62dda1c24e77 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
>> @@ -14,6 +14,7 @@
>>   #include "xe_gt_sriov_pf_control.h"
>>   #include "xe_gt_sriov_pf_helpers.h"
>>   #include "xe_gt_sriov_pf_migration.h"
>> +#include "xe_gt_sriov_pf_policy.h"
>>   #include "xe_gt_sriov_pf_service.h"
>>   #include "xe_gt_sriov_printk.h"
>>   #include "xe_guc_submit.h"
>> @@ -123,6 +124,10 @@ int xe_gt_sriov_pf_init(struct xe_gt *gt)
>>   	if (err)
>>   		return err;
>>   
>> +	err = xe_gt_sriov_pf_policy_init(gt);
>> +	if (err)
>> +		return err;
>> +
>>   	err = xe_gt_sriov_pf_migration_init(gt);
>>   	if (err)
>>   		return err;
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>> index 4445f660e6d1..9b878578ea90 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>> @@ -5,11 +5,14 @@
>>   
>>   #include "abi/guc_actions_sriov_abi.h"
>>   
>> +#include <drm/drm_managed.h>
> system headers shall be included before our local headers

ok

>
>> +
>>   #include "xe_bo.h"
>>   #include "xe_gt.h"
>>   #include "xe_gt_sriov_pf_helpers.h"
>>   #include "xe_gt_sriov_pf_policy.h"
>>   #include "xe_gt_sriov_printk.h"
>> +#include "xe_guc.h"
>>   #include "xe_guc_buf.h"
>>   #include "xe_guc_ct.h"
>>   #include "xe_guc_klv_helpers.h"
>> @@ -351,6 +354,125 @@ u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt)
>>   	return value;
>>   }
>>   
>> +#define MAX_MEDIA_SLICES (hweight32(XE_HW_ENGINE_VECS_MASK))
> redundant ( )
>
> and maybe such macro should be placed in xe_hw_engine_types.h
> with proper comment why it works:
>
> 	/*
> 	 * Each media slice has 1x VECS
> 	 * Max number of VECS instances gives us a max number of slices
> 	 */

ok

>> +static int pf_sched_group_media_slices(struct xe_gt *gt, u32 **masks, u32 *num_masks)
>> +{
>> +	u8 slice_to_group[MAX_MEDIA_SLICES];
>> +	struct xe_hw_engine *hwe = NULL;
> do we need to initialize this here?
> it will be initialized by for_each_hw_engine, no?
>
>> +	enum xe_hw_engine_id id;
>> +	u32 vcs_mask = VCS_MASK(gt);
>> +	u32 vecs_mask = VECS_MASK(gt);
>> +	u32 gsc_mask = GSCCS_MASK(gt);
> we try to define vars in rev-xmas-tree order
>
>> +	u32 *values;
>> +	u8 slice;
>> +	u8 groups;
> I guess for those two generic counters we don't need to use explicitly sized int
>
>> +
>> +	xe_gt_assert(gt, xe_gt_is_media_type(gt));
>> +
>> +	/* A media slice has 2 VCS and a VECS. We bundle the GSC with the first slice */
>> +	for (slice = 0, groups = 0;
>> +	     gsc_mask || vcs_mask || vecs_mask;
>> +	     slice++, gsc_mask = 0, vcs_mask >>= 2, vecs_mask >>= 1) {
>> +		if (unlikely(slice >= MAX_MEDIA_SLICES)) {
> maybe this 'for' loop should be just for 'slice = [0, MAX)'
>
>> +			xe_gt_sriov_err(gt, "Too many media slices (%u) during EGS setup\n",
>> +					slice);
> then after this loop we can have asserts that no engine instances are left behind
>
>> +			return -EINVAL;
>> +		}
>> +
>> +		if ((vcs_mask & 0x3) || (vecs_mask & 0x1) || (gsc_mask & 0x1))
>> +			slice_to_group[slice] = groups++;
>> +	}
> like this:
>
> 	for (slice = 0; slice < MAX_MEDIA_SLICES; slice++ ) {
>
> 		if ((vcs_mask & 0x3) || (vecs_mask & 0x1) || (gsc_mask & 0x1))
> 			slice_to_group[slice] = groups++;
>
> 		vcs_mask >>= 2;
> 		vecs_mask >>= 1;
> 		gsc_mask >>= 1;
> 	}
>
> 	xe_gt_assert(gt, !vcs_mask)
> 	xe_gt_assert(gt, !vecs_mask)
> 	xe_gt_assert(gt, !gsc_mask)

sure, will change

>> +
>> +	/* We need at least 2 slices to split them up */
> shouldn't we also check that each group has both VCS and VECS?

It's not really necessary, having a group that only has a VCS or a VECS 
is a valid scenario. For example, the main use-case we have right now 
(remote desktop) only cares about VCS, so not having a VECS in a group 
wouldn't impact it.

>
>> +	if (groups < 2) {
>> +		*masks = NULL;
>> +		*num_masks = 0;
>> +		return 0;
>> +	}
>> +
>> +	/*
>> +	 * The GuC expects and array with GUC_MAX_ENGINE_CLASSES entries for
> typo: an array ?

yup

>> +	 * each group.
>> +	 */
>> +	values = drmm_kzalloc(&gt_to_xe(gt)->drm,
>> +			      GUC_MAX_ENGINE_CLASSES * groups * sizeof(u32),
>> +			      GFP_KERNEL);
>> +	if (!values)
>> +		return -ENOMEM;
>> +
>> +	for_each_hw_engine(hwe, gt, id) {
>> +		u8 guc_class = xe_engine_class_to_guc_class(hwe->class);
>> +		u8 entry;
>> +
>> +		switch (hwe->class) {
>> +		case XE_ENGINE_CLASS_VIDEO_DECODE:
>> +			slice = hwe->instance / 2;
> hmm, this seems to duplicate previous loop where we had:
>
> 			vcs_mask >>= 2;
>
> maybe these two loops can be combined?

The issue is that there is an allocation in the middle, which is 
required in the second loop and sized based on the results of the first 
loop. I could always allocate enough memory for all the possible groups 
(i.e. MAX_MEDIA_SLICES), no matter how many there actually are, but I'm 
not sure if that is better than the double loop. I'm ok with switching 
if you think that's a better solution.

>
>> +			break;
>> +		case XE_ENGINE_CLASS_VIDEO_ENHANCE:
>> +			slice = hwe->instance;
>> +			break;
>> +		case XE_ENGINE_CLASS_OTHER:
>> +			slice = 0;
>> +			break;
>> +		default:
>> +			xe_gt_sriov_err(gt, "unknown media gt class %u (%s) during EGS setup\n",
>> +					hwe->class, hwe->name);
> this also seems like an attempt to catch our coding error,
> so plain assert should be sufficient

ok

>
>> +			drmm_kfree(&gt_to_xe(gt)->drm, values);
> do we need to bother with this kfree ?
> it's managed allocation and any error from here will lead to abort the probe anyway

will drop

>
>> +			return -EINVAL;
>> +		}
>> +
>> +		entry = (slice_to_group[slice] * GUC_MAX_ENGINE_CLASSES) + guc_class;
> redundant ( )
>
>> +		values[entry] |= BIT(hwe->logical_instance);
>> +	}
>> +
>> +	*masks = values;
>> +	*num_masks = GUC_MAX_ENGINE_CLASSES * groups;
>> +
>> +	return 0;
>> +}
>> +
>> +static int pf_init_sched_groups(struct xe_gt *gt)
>> +{
>> +	int err;
>> +	int m;
>> +
>> +	xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
>> +
>> +	if (GUC_SUBMIT_VER(&gt->uc.guc) < MAKE_GUC_VER(1, 26, 0))
>> +		return 0;
>> +
>> +	for (m = 0; m < XE_SRIOV_SCHED_GROUPS_MODES_COUNT; m++) {
>> +		u32 *masks = NULL;
>> +		u32 num_masks = 0;
>> +
>> +		switch (m) {
>> +		case XE_SRIOV_SCHED_GROUPS_NONE:
>> +			break;
>> +		case XE_SRIOV_SCHED_GROUPS_MEDIA_SLICES:
>> +			/* this mode only has groups on the media GT */
> then having array of all modes on every GT seems non-optimal

My thought here comes from the fact that the sysfs controls are 
per-device, not per-gt, so we need a list of modes that will be usable 
in that interface. My decision was to have a single list of modes that 
applies to both GTs.
To have separate lists of modes per GT we'll then also need to have a 
device-level list of modes that maps to the per-gt lists, which IMO 
makes things more complicated. It's not like we're wasting a huge amount 
of memory anyway.

>
>> +			if (xe_gt_is_media_type(gt)) {
>> +				err = pf_sched_group_media_slices(gt, &masks, &num_masks);
>> +				if (err)
>> +					return err;
>> +			}
>> +			break;
>> +		default:
>> +			xe_gt_sriov_err(gt, "unknown sched group mode %u\n", m);
> this looks more like a coding error, so maybe use just assert?

ok

>
>> +			return -EINVAL;
>> +		}
>> +
>> +		xe_gt_assert(gt, (num_masks % GUC_MAX_ENGINE_CLASSES) == 0);
>> +
>> +		if ((m == XE_SRIOV_SCHED_GROUPS_NONE) || num_masks)
>> +			gt->sriov.pf.policy.guc.sched_groups.supported_modes |= BIT(m);
>> +
>> +		gt->sriov.pf.policy.guc.sched_groups.modes[m].masks = masks;
>> +		gt->sriov.pf.policy.guc.sched_groups.modes[m].num_masks = num_masks;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>   static void pf_sanitize_guc_policies(struct xe_gt *gt)
>>   {
>>   	pf_sanitize_sched_if_idle(gt);
>> @@ -401,6 +523,19 @@ int xe_gt_sriov_pf_policy_reprovision(struct xe_gt *gt, bool reset)
>>   	return err ? -ENXIO : 0;
>>   }
>>   
>> +/**
>> + * xe_gt_sriov_pf_policy_init - Initializes the SW state of the PF policies.
>> + * @gt: the &xe_gt
>> + *
>> + * This function can only be called on PF. This function does not touch the HW.
> but must be called after checking HW engine fuses, right?

yes, I'll expand the doc.

>
>> + *
>> + * Return: 0 on success or a negative error code on failure.
>> + */
>> +int xe_gt_sriov_pf_policy_init(struct xe_gt *gt)
>> +{
>> +	return pf_init_sched_groups(gt);
>> +}
>> +
>>   static void print_guc_policies(struct drm_printer *p, struct xe_gt_sriov_guc_policies *policy)
>>   {
>>   	drm_printf(p, "%s:\t%s\n",
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>> index 2a5dc33dc6d7..c9c04d1b7f50 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>> @@ -18,6 +18,7 @@ bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
>>   int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
>>   u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
>>   
>> +int xe_gt_sriov_pf_policy_init(struct xe_gt *gt);
>>   void xe_gt_sriov_pf_policy_sanitize(struct xe_gt *gt);
>>   int xe_gt_sriov_pf_policy_reprovision(struct xe_gt *gt, bool reset);
>>   int xe_gt_sriov_pf_policy_print(struct xe_gt *gt, struct drm_printer *p);
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
>> index 4de532af135e..3b915801c01b 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
>> @@ -8,16 +8,30 @@
>>   
>>   #include <linux/types.h>
>>   
> we expect all enums (and enumerators) to have proper kernel-doc

will add

>
>> +enum xe_sriov_sched_group_modes {
>> +	XE_SRIOV_SCHED_GROUPS_NONE = 0, /* disabled */
> do we really need this?

This makes things much easier in the follow up patches, because we don't 
have to treat the disabling as a special case. If the user want to 
disabled the feature we map it to this enum and then just handle it as 
any other mode.

>
>> +	XE_SRIOV_SCHED_GROUPS_MEDIA_SLICES, /* separate groups for each media slice */
>> +	XE_SRIOV_SCHED_GROUPS_MODES_COUNT
> while it looks handy solution to count enumerators,
> now COUNT pretends to be a valid mode enumerator while it is not
>
> maybe just provide separate #define for it?

IMO this is safer for future enum extensions. Also, we do it this way 
all over the driver, including in SRIOV code (e.g. 
XE_SRIOV_VF_CCS_CTX_COUNT).

>
>> +};
>> +
>>   /**
>>    * struct xe_gt_sriov_guc_policies - GuC SR-IOV policies.
>>    * @sched_if_idle: controls strict scheduling policy.
>>    * @reset_engine: controls engines reset on VF switch policy.
>>    * @sample_period: adverse events sampling period (in milliseconds).
>> + * @sched_groups: available scheduling group configurations and current mode.
> don't forget to describe inner members
>
> and there is no 'current' now

oops, that comes later

>
>>    */
>>   struct xe_gt_sriov_guc_policies {
>>   	bool sched_if_idle;
>>   	bool reset_engine;
>>   	u32 sample_period;
>> +	struct {
>> +		u32 supported_modes;
>> +		struct {
>> +			u32 *masks;
>> +			u32 num_masks;
>> +		} modes[XE_SRIOV_SCHED_GROUPS_MODES_COUNT];
> hmm, for the NONE mode, all groups must be 0, so why bother with .masks/.num fields for it?

Again for follow up. To disable the groups we need to send the KLV with 
no data. The way I've done it is that we just set the mode to 
XE_SRIOV_SCHED_GROUPS_NONE and the underlying code will automatically 
handle it because the mask is null, without having to know that this 
particular setting is actually turning the feature off. See 
__pf_provision_sched_groups in the next patch.
Also, I was thinking that for sysfs we could allow the code to set 
XE_SRIOV_SCHED_GROUPS_MEDIA_SLICES on both GTs and since on GT0 that's 
not a supported mode it will automatically map to empty masks/num_masks, 
which will tell the GuC to keep the feature disabled.

Daniele

>
> maybe all we need is:
>
> 	struct {
> 		const char *name;
> 		u32 *config;
> 	} modes[];
>
> then:
>
> 	supported_modes := iterate over modes array and look for non-null config/name
>
> 	none := supported_modes is empty
>
> 	current := pointer to modes[] or NULL
>
>> +	} sched_groups;
>>   };
>>   
>>   /**


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 02/10] drm/xe/sriov: Initialize scheduler groups
  2025-12-01 23:33     ` Daniele Ceraolo Spurio
@ 2025-12-02 21:08       ` Michal Wajdeczko
  2025-12-02 23:02         ` Daniele Ceraolo Spurio
  2025-12-03  1:15         ` Daniele Ceraolo Spurio
  0 siblings, 2 replies; 44+ messages in thread
From: Michal Wajdeczko @ 2025-12-02 21:08 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-xe



On 12/2/2025 12:33 AM, Daniele Ceraolo Spurio wrote:
> 
> 
> On 12/1/2025 2:37 PM, Michal Wajdeczko wrote:
>> Hi Daniele,
>>
>> some initial comments to keep you busy ;)
>>
>> On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
>>> Scheduler groups (a.k.a. Engine Groups Scheduling, or EGS) is a GuC
>>> feature that allows the driver to define groups of engines that are
>>> independently scheduled across VFs, which allows different VFs to be
>>> active on the HW at the same time on different groups.
>>>
>>> This is intended for specific scenarios where the admin knows that the
>>> VFs are not going to fully utilize the HW and therefore assigning all of
>>> it to a single VF would lead to part of it being permanently idle.
>>> We do not allow the admin to decide how to divide the engines across
>>> groups, but we instead support specific configurations that are designed
>>> for specific use-cases. During PF initialization we detect which
>>> configurations are possible on a given GT and create the relevant
>>> groups. Since the GuC expect a mask for each class for each group, that
>>> is what we save when we init the configs.
>>>
>>> Right now we only have one use-case on the media GT. If the VFs are
>>> running a frame render + encoding at a not-too-high resolution (e.g.
>>> 1080@30fps) the render can produce frames faster than the video engine
>>> can encode them, which means that the maximum number of parallel VFs is
>>> limited by the VCS bandwidth. Since our products can have multiple VCS
>>> engines, allowing multiple VFs to be active on the different VCS engines
>>> at the same time allows us to run more parallel VFs on the same HW.
>>> Given that engines in the same media slice share some resources (e.g.
>>> SFC), we assign each media slice to a different scheduling group. We
>>> refer to this configuration as "media_slices", given that each slice
>>> gets its own group.
>>>
>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>> ---
>>>   drivers/gpu/drm/xe/xe_gt_sriov_pf.c           |   5 +
>>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c    | 135 ++++++++++++++++++
>>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h    |   1 +
>>>   .../gpu/drm/xe/xe_gt_sriov_pf_policy_types.h  |  14 ++
>>>   4 files changed, 155 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
>>> index 0714c758b9c1..62dda1c24e77 100644
>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
>>> @@ -14,6 +14,7 @@
>>>   #include "xe_gt_sriov_pf_control.h"
>>>   #include "xe_gt_sriov_pf_helpers.h"
>>>   #include "xe_gt_sriov_pf_migration.h"
>>> +#include "xe_gt_sriov_pf_policy.h"
>>>   #include "xe_gt_sriov_pf_service.h"
>>>   #include "xe_gt_sriov_printk.h"
>>>   #include "xe_guc_submit.h"
>>> @@ -123,6 +124,10 @@ int xe_gt_sriov_pf_init(struct xe_gt *gt)
>>>       if (err)
>>>           return err;
>>>   +    err = xe_gt_sriov_pf_policy_init(gt);
>>> +    if (err)
>>> +        return err;
>>> +
>>>       err = xe_gt_sriov_pf_migration_init(gt);
>>>       if (err)
>>>           return err;
>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>> index 4445f660e6d1..9b878578ea90 100644
>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>> @@ -5,11 +5,14 @@
>>>     #include "abi/guc_actions_sriov_abi.h"
>>>   +#include <drm/drm_managed.h>
>> system headers shall be included before our local headers
> 
> ok
> 
>>
>>> +
>>>   #include "xe_bo.h"
>>>   #include "xe_gt.h"
>>>   #include "xe_gt_sriov_pf_helpers.h"
>>>   #include "xe_gt_sriov_pf_policy.h"
>>>   #include "xe_gt_sriov_printk.h"
>>> +#include "xe_guc.h"
>>>   #include "xe_guc_buf.h"
>>>   #include "xe_guc_ct.h"
>>>   #include "xe_guc_klv_helpers.h"
>>> @@ -351,6 +354,125 @@ u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt)
>>>       return value;
>>>   }
>>>   +#define MAX_MEDIA_SLICES (hweight32(XE_HW_ENGINE_VECS_MASK))
>> redundant ( )
>>
>> and maybe such macro should be placed in xe_hw_engine_types.h
>> with proper comment why it works:
>>
>>     /*
>>      * Each media slice has 1x VECS
>>      * Max number of VECS instances gives us a max number of slices
>>      */
> 
> ok
> 
>>> +static int pf_sched_group_media_slices(struct xe_gt *gt, u32 **masks, u32 *num_masks)
>>> +{
>>> +    u8 slice_to_group[MAX_MEDIA_SLICES];
>>> +    struct xe_hw_engine *hwe = NULL;
>> do we need to initialize this here?
>> it will be initialized by for_each_hw_engine, no?
>>
>>> +    enum xe_hw_engine_id id;
>>> +    u32 vcs_mask = VCS_MASK(gt);
>>> +    u32 vecs_mask = VECS_MASK(gt);
>>> +    u32 gsc_mask = GSCCS_MASK(gt);
>> we try to define vars in rev-xmas-tree order
>>
>>> +    u32 *values;
>>> +    u8 slice;
>>> +    u8 groups;
>> I guess for those two generic counters we don't need to use explicitly sized int
>>
>>> +
>>> +    xe_gt_assert(gt, xe_gt_is_media_type(gt));
>>> +
>>> +    /* A media slice has 2 VCS and a VECS. We bundle the GSC with the first slice */
>>> +    for (slice = 0, groups = 0;
>>> +         gsc_mask || vcs_mask || vecs_mask;
>>> +         slice++, gsc_mask = 0, vcs_mask >>= 2, vecs_mask >>= 1) {
>>> +        if (unlikely(slice >= MAX_MEDIA_SLICES)) {
>> maybe this 'for' loop should be just for 'slice = [0, MAX)'
>>
>>> +            xe_gt_sriov_err(gt, "Too many media slices (%u) during EGS setup\n",
>>> +                    slice);
>> then after this loop we can have asserts that no engine instances are left behind
>>
>>> +            return -EINVAL;
>>> +        }
>>> +
>>> +        if ((vcs_mask & 0x3) || (vecs_mask & 0x1) || (gsc_mask & 0x1))
>>> +            slice_to_group[slice] = groups++;
>>> +    }
>> like this:
>>
>>     for (slice = 0; slice < MAX_MEDIA_SLICES; slice++ ) {
>>
>>         if ((vcs_mask & 0x3) || (vecs_mask & 0x1) || (gsc_mask & 0x1))
>>             slice_to_group[slice] = groups++;
>>
>>         vcs_mask >>= 2;
>>         vecs_mask >>= 1;
>>         gsc_mask >>= 1;
>>     }
>>
>>     xe_gt_assert(gt, !vcs_mask)
>>     xe_gt_assert(gt, !vecs_mask)
>>     xe_gt_assert(gt, !gsc_mask)
> 
> sure, will change
> 
>>> +
>>> +    /* We need at least 2 slices to split them up */
>> shouldn't we also check that each group has both VCS and VECS?
> 
> It's not really necessary, having a group that only has a VCS or a VECS is a valid scenario. For example, the main use-case we have right now (remote desktop) only cares about VCS, so not having a VECS in a group wouldn't impact it.
> 
>>
>>> +    if (groups < 2) {
>>> +        *masks = NULL;
>>> +        *num_masks = 0;
>>> +        return 0;
>>> +    }
>>> +
>>> +    /*
>>> +     * The GuC expects and array with GUC_MAX_ENGINE_CLASSES entries for
>> typo: an array ?
> 
> yup
> 
>>> +     * each group.
>>> +     */
>>> +    values = drmm_kzalloc(&gt_to_xe(gt)->drm,
>>> +                  GUC_MAX_ENGINE_CLASSES * groups * sizeof(u32),
>>> +                  GFP_KERNEL);
>>> +    if (!values)
>>> +        return -ENOMEM;
>>> +
>>> +    for_each_hw_engine(hwe, gt, id) {
>>> +        u8 guc_class = xe_engine_class_to_guc_class(hwe->class);
>>> +        u8 entry;
>>> +
>>> +        switch (hwe->class) {
>>> +        case XE_ENGINE_CLASS_VIDEO_DECODE:
>>> +            slice = hwe->instance / 2;
>> hmm, this seems to duplicate previous loop where we had:
>>
>>             vcs_mask >>= 2;
>>
>> maybe these two loops can be combined?
> 
> The issue is that there is an allocation in the middle, which is required in the second loop and sized based on the results of the first loop. I could always allocate enough memory for all the possible groups (i.e. MAX_MEDIA_SLICES), no matter how many there actually are, but I'm not sure if that is better than the double loop. I'm ok with switching if you think that's a better solution.

if you can try, then yes please, but if that's too much - we can rework this later if needed

> 
>>
>>> +            break;
>>> +        case XE_ENGINE_CLASS_VIDEO_ENHANCE:
>>> +            slice = hwe->instance;
>>> +            break;
>>> +        case XE_ENGINE_CLASS_OTHER:
>>> +            slice = 0;
>>> +            break;
>>> +        default:
>>> +            xe_gt_sriov_err(gt, "unknown media gt class %u (%s) during EGS setup\n",
>>> +                    hwe->class, hwe->name);
>> this also seems like an attempt to catch our coding error,
>> so plain assert should be sufficient
> 
> ok
> 
>>
>>> +            drmm_kfree(&gt_to_xe(gt)->drm, values);
>> do we need to bother with this kfree ?
>> it's managed allocation and any error from here will lead to abort the probe anyway
> 
> will drop
> 
>>
>>> +            return -EINVAL;
>>> +        }
>>> +
>>> +        entry = (slice_to_group[slice] * GUC_MAX_ENGINE_CLASSES) + guc_class;
>> redundant ( )
>>
>>> +        values[entry] |= BIT(hwe->logical_instance);
>>> +    }
>>> +
>>> +    *masks = values;
>>> +    *num_masks = GUC_MAX_ENGINE_CLASSES * groups;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int pf_init_sched_groups(struct xe_gt *gt)
>>> +{
>>> +    int err;
>>> +    int m;
>>> +
>>> +    xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
>>> +
>>> +    if (GUC_SUBMIT_VER(&gt->uc.guc) < MAKE_GUC_VER(1, 26, 0))
>>> +        return 0;
>>> +
>>> +    for (m = 0; m < XE_SRIOV_SCHED_GROUPS_MODES_COUNT; m++) {
>>> +        u32 *masks = NULL;
>>> +        u32 num_masks = 0;
>>> +
>>> +        switch (m) {
>>> +        case XE_SRIOV_SCHED_GROUPS_NONE:
>>> +            break;
>>> +        case XE_SRIOV_SCHED_GROUPS_MEDIA_SLICES:
>>> +            /* this mode only has groups on the media GT */
>> then having array of all modes on every GT seems non-optimal
> 
> My thought here comes from the fact that the sysfs controls are per-device, not per-gt, so we need a list of modes that will be usable in that interface. My decision was to have a single list of modes that applies to both GTs.
> To have separate lists of modes per GT we'll then also need to have a device-level list of modes that maps to the per-gt lists, which IMO makes things more complicated. It's not like we're wasting a huge amount of memory anyway.

but how this sysfs device-level uabi could look like?

for now we have just two options: either EGS is not avail/used, or it's configured to single fixed preset config (if one is available)
we can't have or select any more configs per-GT as VF will not be able to detect any other variant (it can only see if EGS is on/off)

but that's on the GT level, so in theory we can combine individual GT-modes into some device level combo

my understanding is that on the sysfs we will expose something like this (if EGS config is available):

for current media_slice only case:

├── .scheduling_groups
    ├── mode				# RW: none [media_slice]
    ├── current -> available/media_slice
    ├── available
        ├── media_slice
        │   ├── group0			# RO: rcs0 ccs0 ...
        │   ├── group1			# RO: vcs0 vcs1 vecs0
        │   └── group2			# RO: vcs2 vcs2 vecs1

but if we could create some EGS config also on the GT0 then this could be expanded to:

├── .scheduling_groups
    ├── mode				# RW: none compute [media] compute_media
    ├── current -> available/media
    ├── available
        ├── compute
        │   ├── group0			# RO: ccs0 ...
        │   ├── group1			# RO: ccs1 ...
        │   └── group2			# RO: vcs0 vcs1 vecs0 vcs2 vcs2 vecs1
        ├── media
        │   ├── group0			# RO: rcs0 ccs0 ...
        │   ├── group1			# RO: vcs0 vcs1 vecs0
        │   └── group2			# RO: vcs2 vcs2 vecs1
        ├── compute_media
        │   ├── group0			# RO: ccs0 ...
        │   ├── group1			# RO: ccs1 ...
        │   ├── group2			# RO: vcs0 vcs1 vecs0
        │   └── group3			# RO: vcs2 vcs2 vecs1


but maybe it's too early to discuss this ...


> 
>>
>>> +            if (xe_gt_is_media_type(gt)) {
>>> +                err = pf_sched_group_media_slices(gt, &masks, &num_masks);
>>> +                if (err)
>>> +                    return err;
>>> +            }
>>> +            break;
>>> +        default:
>>> +            xe_gt_sriov_err(gt, "unknown sched group mode %u\n", m);
>> this looks more like a coding error, so maybe use just assert?
> 
> ok
> 
>>
>>> +            return -EINVAL;
>>> +        }
>>> +
>>> +        xe_gt_assert(gt, (num_masks % GUC_MAX_ENGINE_CLASSES) == 0);
>>> +
>>> +        if ((m == XE_SRIOV_SCHED_GROUPS_NONE) || num_masks)
>>> +            gt->sriov.pf.policy.guc.sched_groups.supported_modes |= BIT(m);
>>> +
>>> +        gt->sriov.pf.policy.guc.sched_groups.modes[m].masks = masks;
>>> +        gt->sriov.pf.policy.guc.sched_groups.modes[m].num_masks = num_masks;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>   static void pf_sanitize_guc_policies(struct xe_gt *gt)
>>>   {
>>>       pf_sanitize_sched_if_idle(gt);
>>> @@ -401,6 +523,19 @@ int xe_gt_sriov_pf_policy_reprovision(struct xe_gt *gt, bool reset)
>>>       return err ? -ENXIO : 0;
>>>   }
>>>   +/**
>>> + * xe_gt_sriov_pf_policy_init - Initializes the SW state of the PF policies.
>>> + * @gt: the &xe_gt
>>> + *
>>> + * This function can only be called on PF. This function does not touch the HW.
>> but must be called after checking HW engine fuses, right?
> 
> yes, I'll expand the doc.
> 
>>
>>> + *
>>> + * Return: 0 on success or a negative error code on failure.
>>> + */
>>> +int xe_gt_sriov_pf_policy_init(struct xe_gt *gt)
>>> +{
>>> +    return pf_init_sched_groups(gt);
>>> +}
>>> +
>>>   static void print_guc_policies(struct drm_printer *p, struct xe_gt_sriov_guc_policies *policy)
>>>   {
>>>       drm_printf(p, "%s:\t%s\n",
>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>> index 2a5dc33dc6d7..c9c04d1b7f50 100644
>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>> @@ -18,6 +18,7 @@ bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
>>>   int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
>>>   u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
>>>   +int xe_gt_sriov_pf_policy_init(struct xe_gt *gt);
>>>   void xe_gt_sriov_pf_policy_sanitize(struct xe_gt *gt);
>>>   int xe_gt_sriov_pf_policy_reprovision(struct xe_gt *gt, bool reset);
>>>   int xe_gt_sriov_pf_policy_print(struct xe_gt *gt, struct drm_printer *p);
>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
>>> index 4de532af135e..3b915801c01b 100644
>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
>>> @@ -8,16 +8,30 @@
>>>     #include <linux/types.h>
>>>   
>> we expect all enums (and enumerators) to have proper kernel-doc
> 
> will add
> 
>>
>>> +enum xe_sriov_sched_group_modes {
>>> +    XE_SRIOV_SCHED_GROUPS_NONE = 0, /* disabled */
>> do we really need this?
> 
> This makes things much easier in the follow up patches, because we don't have to treat the disabling as a special case. If the user want to disabled the feature we map it to this enum and then just handle it as any other mode.
> 
>>
>>> +    XE_SRIOV_SCHED_GROUPS_MEDIA_SLICES, /* separate groups for each media slice */
>>> +    XE_SRIOV_SCHED_GROUPS_MODES_COUNT
>> while it looks handy solution to count enumerators,
>> now COUNT pretends to be a valid mode enumerator while it is not
>>
>> maybe just provide separate #define for it?
> 
> IMO this is safer for future enum extensions. 

missing enumerators in 'switch' can be captured by the compiler

> Also, we do it this way all over the driver, including in SRIOV code (e.g. XE_SRIOV_VF_CCS_CTX_COUNT).

it wasn't me to let this in ;)

> 
>>
>>> +};
>>> +
>>>   /**
>>>    * struct xe_gt_sriov_guc_policies - GuC SR-IOV policies.
>>>    * @sched_if_idle: controls strict scheduling policy.
>>>    * @reset_engine: controls engines reset on VF switch policy.
>>>    * @sample_period: adverse events sampling period (in milliseconds).
>>> + * @sched_groups: available scheduling group configurations and current mode.
>> don't forget to describe inner members
>>
>> and there is no 'current' now
> 
> oops, that comes later
> 
>>
>>>    */
>>>   struct xe_gt_sriov_guc_policies {
>>>       bool sched_if_idle;
>>>       bool reset_engine;
>>>       u32 sample_period;
>>> +    struct {
>>> +        u32 supported_modes;
>>> +        struct {
>>> +            u32 *masks;
>>> +            u32 num_masks;
>>> +        } modes[XE_SRIOV_SCHED_GROUPS_MODES_COUNT];
>> hmm, for the NONE mode, all groups must be 0, so why bother with .masks/.num fields for it?
> 
> Again for follow up. To disable the groups we need to send the KLV with no data. The way I've done it is that we just set the mode to XE_SRIOV_SCHED_GROUPS_NONE and the underlying code will automatically handle it because the mask is null, without having to know that this particular setting is actually turning the feature off. See __pf_provision_sched_groups in the next patch.

but why make it so hidden/secret (and to some extend error-prone)

disabling EGS is an explicit activity, IMO it shouldn't be done due to lack of some bits (that might be not intentional)

> Also, I was thinking that for sysfs we could allow the code to set XE_SRIOV_SCHED_GROUPS_MEDIA_SLICES on both GTs and since on GT0 that's not a supported mode it will automatically map to empty masks/num_masks, which will tell the GuC to keep the feature disabled.

not quite happy, but if you insist, will not block on that

> 
> Daniele
> 
>>
>> maybe all we need is:
>>
>>     struct {
>>         const char *name;
>>         u32 *config;
>>     } modes[];
>>
>> then:
>>
>>     supported_modes := iterate over modes array and look for non-null config/name
>>
>>     none := supported_modes is empty
>>
>>     current := pointer to modes[] or NULL
>>
>>> +    } sched_groups;
>>>   };
>>>     /**
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 02/10] drm/xe/sriov: Initialize scheduler groups
  2025-12-02 21:08       ` Michal Wajdeczko
@ 2025-12-02 23:02         ` Daniele Ceraolo Spurio
  2025-12-03  1:15         ` Daniele Ceraolo Spurio
  1 sibling, 0 replies; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-12-02 23:02 UTC (permalink / raw)
  To: Michal Wajdeczko, intel-xe



On 12/2/2025 1:08 PM, Michal Wajdeczko wrote:
>
> On 12/2/2025 12:33 AM, Daniele Ceraolo Spurio wrote:
>>
>> On 12/1/2025 2:37 PM, Michal Wajdeczko wrote:
>>> Hi Daniele,
>>>
>>> some initial comments to keep you busy ;)
>>>
>>> On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
>>>> Scheduler groups (a.k.a. Engine Groups Scheduling, or EGS) is a GuC
>>>> feature that allows the driver to define groups of engines that are
>>>> independently scheduled across VFs, which allows different VFs to be
>>>> active on the HW at the same time on different groups.
>>>>
>>>> This is intended for specific scenarios where the admin knows that the
>>>> VFs are not going to fully utilize the HW and therefore assigning all of
>>>> it to a single VF would lead to part of it being permanently idle.
>>>> We do not allow the admin to decide how to divide the engines across
>>>> groups, but we instead support specific configurations that are designed
>>>> for specific use-cases. During PF initialization we detect which
>>>> configurations are possible on a given GT and create the relevant
>>>> groups. Since the GuC expect a mask for each class for each group, that
>>>> is what we save when we init the configs.
>>>>
>>>> Right now we only have one use-case on the media GT. If the VFs are
>>>> running a frame render + encoding at a not-too-high resolution (e.g.
>>>> 1080@30fps) the render can produce frames faster than the video engine
>>>> can encode them, which means that the maximum number of parallel VFs is
>>>> limited by the VCS bandwidth. Since our products can have multiple VCS
>>>> engines, allowing multiple VFs to be active on the different VCS engines
>>>> at the same time allows us to run more parallel VFs on the same HW.
>>>> Given that engines in the same media slice share some resources (e.g.
>>>> SFC), we assign each media slice to a different scheduling group. We
>>>> refer to this configuration as "media_slices", given that each slice
>>>> gets its own group.
>>>>
>>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/xe/xe_gt_sriov_pf.c           |   5 +
>>>>    drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c    | 135 ++++++++++++++++++
>>>>    drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h    |   1 +
>>>>    .../gpu/drm/xe/xe_gt_sriov_pf_policy_types.h  |  14 ++
>>>>    4 files changed, 155 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
>>>> index 0714c758b9c1..62dda1c24e77 100644
>>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
>>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
>>>> @@ -14,6 +14,7 @@
>>>>    #include "xe_gt_sriov_pf_control.h"
>>>>    #include "xe_gt_sriov_pf_helpers.h"
>>>>    #include "xe_gt_sriov_pf_migration.h"
>>>> +#include "xe_gt_sriov_pf_policy.h"
>>>>    #include "xe_gt_sriov_pf_service.h"
>>>>    #include "xe_gt_sriov_printk.h"
>>>>    #include "xe_guc_submit.h"
>>>> @@ -123,6 +124,10 @@ int xe_gt_sriov_pf_init(struct xe_gt *gt)
>>>>        if (err)
>>>>            return err;
>>>>    +    err = xe_gt_sriov_pf_policy_init(gt);
>>>> +    if (err)
>>>> +        return err;
>>>> +
>>>>        err = xe_gt_sriov_pf_migration_init(gt);
>>>>        if (err)
>>>>            return err;
>>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>>> index 4445f660e6d1..9b878578ea90 100644
>>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>>> @@ -5,11 +5,14 @@
>>>>      #include "abi/guc_actions_sriov_abi.h"
>>>>    +#include <drm/drm_managed.h>
>>> system headers shall be included before our local headers
>> ok
>>
>>>> +
>>>>    #include "xe_bo.h"
>>>>    #include "xe_gt.h"
>>>>    #include "xe_gt_sriov_pf_helpers.h"
>>>>    #include "xe_gt_sriov_pf_policy.h"
>>>>    #include "xe_gt_sriov_printk.h"
>>>> +#include "xe_guc.h"
>>>>    #include "xe_guc_buf.h"
>>>>    #include "xe_guc_ct.h"
>>>>    #include "xe_guc_klv_helpers.h"
>>>> @@ -351,6 +354,125 @@ u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt)
>>>>        return value;
>>>>    }
>>>>    +#define MAX_MEDIA_SLICES (hweight32(XE_HW_ENGINE_VECS_MASK))
>>> redundant ( )
>>>
>>> and maybe such macro should be placed in xe_hw_engine_types.h
>>> with proper comment why it works:
>>>
>>>      /*
>>>       * Each media slice has 1x VECS
>>>       * Max number of VECS instances gives us a max number of slices
>>>       */
>> ok
>>
>>>> +static int pf_sched_group_media_slices(struct xe_gt *gt, u32 **masks, u32 *num_masks)
>>>> +{
>>>> +    u8 slice_to_group[MAX_MEDIA_SLICES];
>>>> +    struct xe_hw_engine *hwe = NULL;
>>> do we need to initialize this here?
>>> it will be initialized by for_each_hw_engine, no?
>>>
>>>> +    enum xe_hw_engine_id id;
>>>> +    u32 vcs_mask = VCS_MASK(gt);
>>>> +    u32 vecs_mask = VECS_MASK(gt);
>>>> +    u32 gsc_mask = GSCCS_MASK(gt);
>>> we try to define vars in rev-xmas-tree order
>>>
>>>> +    u32 *values;
>>>> +    u8 slice;
>>>> +    u8 groups;
>>> I guess for those two generic counters we don't need to use explicitly sized int
>>>
>>>> +
>>>> +    xe_gt_assert(gt, xe_gt_is_media_type(gt));
>>>> +
>>>> +    /* A media slice has 2 VCS and a VECS. We bundle the GSC with the first slice */
>>>> +    for (slice = 0, groups = 0;
>>>> +         gsc_mask || vcs_mask || vecs_mask;
>>>> +         slice++, gsc_mask = 0, vcs_mask >>= 2, vecs_mask >>= 1) {
>>>> +        if (unlikely(slice >= MAX_MEDIA_SLICES)) {
>>> maybe this 'for' loop should be just for 'slice = [0, MAX)'
>>>
>>>> +            xe_gt_sriov_err(gt, "Too many media slices (%u) during EGS setup\n",
>>>> +                    slice);
>>> then after this loop we can have asserts that no engine instances are left behind
>>>
>>>> +            return -EINVAL;
>>>> +        }
>>>> +
>>>> +        if ((vcs_mask & 0x3) || (vecs_mask & 0x1) || (gsc_mask & 0x1))
>>>> +            slice_to_group[slice] = groups++;
>>>> +    }
>>> like this:
>>>
>>>      for (slice = 0; slice < MAX_MEDIA_SLICES; slice++ ) {
>>>
>>>          if ((vcs_mask & 0x3) || (vecs_mask & 0x1) || (gsc_mask & 0x1))
>>>              slice_to_group[slice] = groups++;
>>>
>>>          vcs_mask >>= 2;
>>>          vecs_mask >>= 1;
>>>          gsc_mask >>= 1;
>>>      }
>>>
>>>      xe_gt_assert(gt, !vcs_mask)
>>>      xe_gt_assert(gt, !vecs_mask)
>>>      xe_gt_assert(gt, !gsc_mask)
>> sure, will change
>>
>>>> +
>>>> +    /* We need at least 2 slices to split them up */
>>> shouldn't we also check that each group has both VCS and VECS?
>> It's not really necessary, having a group that only has a VCS or a VECS is a valid scenario. For example, the main use-case we have right now (remote desktop) only cares about VCS, so not having a VECS in a group wouldn't impact it.
>>
>>>> +    if (groups < 2) {
>>>> +        *masks = NULL;
>>>> +        *num_masks = 0;
>>>> +        return 0;
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * The GuC expects and array with GUC_MAX_ENGINE_CLASSES entries for
>>> typo: an array ?
>> yup
>>
>>>> +     * each group.
>>>> +     */
>>>> +    values = drmm_kzalloc(&gt_to_xe(gt)->drm,
>>>> +                  GUC_MAX_ENGINE_CLASSES * groups * sizeof(u32),
>>>> +                  GFP_KERNEL);
>>>> +    if (!values)
>>>> +        return -ENOMEM;
>>>> +
>>>> +    for_each_hw_engine(hwe, gt, id) {
>>>> +        u8 guc_class = xe_engine_class_to_guc_class(hwe->class);
>>>> +        u8 entry;
>>>> +
>>>> +        switch (hwe->class) {
>>>> +        case XE_ENGINE_CLASS_VIDEO_DECODE:
>>>> +            slice = hwe->instance / 2;
>>> hmm, this seems to duplicate previous loop where we had:
>>>
>>>              vcs_mask >>= 2;
>>>
>>> maybe these two loops can be combined?
>> The issue is that there is an allocation in the middle, which is required in the second loop and sized based on the results of the first loop. I could always allocate enough memory for all the possible groups (i.e. MAX_MEDIA_SLICES), no matter how many there actually are, but I'm not sure if that is better than the double loop. I'm ok with switching if you think that's a better solution.
> if you can try, then yes please, but if that's too much - we can rework this later if needed
>
>>>> +            break;
>>>> +        case XE_ENGINE_CLASS_VIDEO_ENHANCE:
>>>> +            slice = hwe->instance;
>>>> +            break;
>>>> +        case XE_ENGINE_CLASS_OTHER:
>>>> +            slice = 0;
>>>> +            break;
>>>> +        default:
>>>> +            xe_gt_sriov_err(gt, "unknown media gt class %u (%s) during EGS setup\n",
>>>> +                    hwe->class, hwe->name);
>>> this also seems like an attempt to catch our coding error,
>>> so plain assert should be sufficient
>> ok
>>
>>>> +            drmm_kfree(&gt_to_xe(gt)->drm, values);
>>> do we need to bother with this kfree ?
>>> it's managed allocation and any error from here will lead to abort the probe anyway
>> will drop
>>
>>>> +            return -EINVAL;
>>>> +        }
>>>> +
>>>> +        entry = (slice_to_group[slice] * GUC_MAX_ENGINE_CLASSES) + guc_class;
>>> redundant ( )
>>>
>>>> +        values[entry] |= BIT(hwe->logical_instance);
>>>> +    }
>>>> +
>>>> +    *masks = values;
>>>> +    *num_masks = GUC_MAX_ENGINE_CLASSES * groups;
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static int pf_init_sched_groups(struct xe_gt *gt)
>>>> +{
>>>> +    int err;
>>>> +    int m;
>>>> +
>>>> +    xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
>>>> +
>>>> +    if (GUC_SUBMIT_VER(&gt->uc.guc) < MAKE_GUC_VER(1, 26, 0))
>>>> +        return 0;
>>>> +
>>>> +    for (m = 0; m < XE_SRIOV_SCHED_GROUPS_MODES_COUNT; m++) {
>>>> +        u32 *masks = NULL;
>>>> +        u32 num_masks = 0;
>>>> +
>>>> +        switch (m) {
>>>> +        case XE_SRIOV_SCHED_GROUPS_NONE:
>>>> +            break;
>>>> +        case XE_SRIOV_SCHED_GROUPS_MEDIA_SLICES:
>>>> +            /* this mode only has groups on the media GT */
>>> then having array of all modes on every GT seems non-optimal
>> My thought here comes from the fact that the sysfs controls are per-device, not per-gt, so we need a list of modes that will be usable in that interface. My decision was to have a single list of modes that applies to both GTs.
>> To have separate lists of modes per GT we'll then also need to have a device-level list of modes that maps to the per-gt lists, which IMO makes things more complicated. It's not like we're wasting a huge amount of memory anyway.
> but how this sysfs device-level uabi could look like?
>
> for now we have just two options: either EGS is not avail/used, or it's configured to single fixed preset config (if one is available)
> we can't have or select any more configs per-GT as VF will not be able to detect any other variant (it can only see if EGS is on/off)

Regarding the variants, the idea is that a VF will be curated to run on 
the specific mode. So the VF doesn't actually need to check the mode, 
the check is only there to catch programming errors where a VF 
incorrectly tries to enable multi-lrc (which is disabled for all EGS modes).

>
> but that's on the GT level, so in theory we can combine individual GT-modes into some device level combo
>
> my understanding is that on the sysfs we will expose something like this (if EGS config is available):
>
> for current media_slice only case:
>
> ├── .scheduling_groups
>      ├── mode				# RW: none [media_slice]
>      ├── current -> available/media_slice
>      ├── available
>          ├── media_slice
>          │   ├── group0			# RO: rcs0 ccs0 ...
>          │   ├── group1			# RO: vcs0 vcs1 vecs0
>          │   └── group2			# RO: vcs2 vcs2 vecs1
>
> but if we could create some EGS config also on the GT0 then this could be expanded to:
>
> ├── .scheduling_groups
>      ├── mode				# RW: none compute [media] compute_media
>      ├── current -> available/media
>      ├── available
>          ├── compute
>          │   ├── group0			# RO: ccs0 ...
>          │   ├── group1			# RO: ccs1 ...
>          │   └── group2			# RO: vcs0 vcs1 vecs0 vcs2 vcs2 vecs1
>          ├── media
>          │   ├── group0			# RO: rcs0 ccs0 ...
>          │   ├── group1			# RO: vcs0 vcs1 vecs0
>          │   └── group2			# RO: vcs2 vcs2 vecs1
>          ├── compute_media
>          │   ├── group0			# RO: ccs0 ...
>          │   ├── group1			# RO: ccs1 ...
>          │   ├── group2			# RO: vcs0 vcs1 vecs0
>          │   └── group3			# RO: vcs2 vcs2 vecs1
>
>
> but maybe it's too early to discuss this ...

My idea was that the groups would always apply to both GTs, so the 
media_slices group has an empty config on GT0 (which automatically maps 
to all engines) while on GT1 it has a valid config.
But anyway I agree that it is too early, if we ever need to split the 
GT0 and GT1 configs we can do it at a later point when we do actually 
have known use-cases.

>
>
>>>> +            if (xe_gt_is_media_type(gt)) {
>>>> +                err = pf_sched_group_media_slices(gt, &masks, &num_masks);
>>>> +                if (err)
>>>> +                    return err;
>>>> +            }
>>>> +            break;
>>>> +        default:
>>>> +            xe_gt_sriov_err(gt, "unknown sched group mode %u\n", m);
>>> this looks more like a coding error, so maybe use just assert?
>> ok
>>
>>>> +            return -EINVAL;
>>>> +        }
>>>> +
>>>> +        xe_gt_assert(gt, (num_masks % GUC_MAX_ENGINE_CLASSES) == 0);
>>>> +
>>>> +        if ((m == XE_SRIOV_SCHED_GROUPS_NONE) || num_masks)
>>>> +            gt->sriov.pf.policy.guc.sched_groups.supported_modes |= BIT(m);
>>>> +
>>>> +        gt->sriov.pf.policy.guc.sched_groups.modes[m].masks = masks;
>>>> +        gt->sriov.pf.policy.guc.sched_groups.modes[m].num_masks = num_masks;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>>    static void pf_sanitize_guc_policies(struct xe_gt *gt)
>>>>    {
>>>>        pf_sanitize_sched_if_idle(gt);
>>>> @@ -401,6 +523,19 @@ int xe_gt_sriov_pf_policy_reprovision(struct xe_gt *gt, bool reset)
>>>>        return err ? -ENXIO : 0;
>>>>    }
>>>>    +/**
>>>> + * xe_gt_sriov_pf_policy_init - Initializes the SW state of the PF policies.
>>>> + * @gt: the &xe_gt
>>>> + *
>>>> + * This function can only be called on PF. This function does not touch the HW.
>>> but must be called after checking HW engine fuses, right?
>> yes, I'll expand the doc.
>>
>>>> + *
>>>> + * Return: 0 on success or a negative error code on failure.
>>>> + */
>>>> +int xe_gt_sriov_pf_policy_init(struct xe_gt *gt)
>>>> +{
>>>> +    return pf_init_sched_groups(gt);
>>>> +}
>>>> +
>>>>    static void print_guc_policies(struct drm_printer *p, struct xe_gt_sriov_guc_policies *policy)
>>>>    {
>>>>        drm_printf(p, "%s:\t%s\n",
>>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>>> index 2a5dc33dc6d7..c9c04d1b7f50 100644
>>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>>> @@ -18,6 +18,7 @@ bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
>>>>    int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
>>>>    u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
>>>>    +int xe_gt_sriov_pf_policy_init(struct xe_gt *gt);
>>>>    void xe_gt_sriov_pf_policy_sanitize(struct xe_gt *gt);
>>>>    int xe_gt_sriov_pf_policy_reprovision(struct xe_gt *gt, bool reset);
>>>>    int xe_gt_sriov_pf_policy_print(struct xe_gt *gt, struct drm_printer *p);
>>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
>>>> index 4de532af135e..3b915801c01b 100644
>>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
>>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
>>>> @@ -8,16 +8,30 @@
>>>>      #include <linux/types.h>
>>>>    
>>> we expect all enums (and enumerators) to have proper kernel-doc
>> will add
>>
>>>> +enum xe_sriov_sched_group_modes {
>>>> +    XE_SRIOV_SCHED_GROUPS_NONE = 0, /* disabled */
>>> do we really need this?
>> This makes things much easier in the follow up patches, because we don't have to treat the disabling as a special case. If the user want to disabled the feature we map it to this enum and then just handle it as any other mode.
>>
>>>> +    XE_SRIOV_SCHED_GROUPS_MEDIA_SLICES, /* separate groups for each media slice */
>>>> +    XE_SRIOV_SCHED_GROUPS_MODES_COUNT
>>> while it looks handy solution to count enumerators,
>>> now COUNT pretends to be a valid mode enumerator while it is not
>>>
>>> maybe just provide separate #define for it?
>> IMO this is safer for future enum extensions.
> missing enumerators in 'switch' can be captured by the compiler

I was more worried about the for loops.

>
>> Also, we do it this way all over the driver, including in SRIOV code (e.g. XE_SRIOV_VF_CCS_CTX_COUNT).
> it wasn't me to let this in ;)

it is still the standard in the driver though.

>
>>>> +};
>>>> +
>>>>    /**
>>>>     * struct xe_gt_sriov_guc_policies - GuC SR-IOV policies.
>>>>     * @sched_if_idle: controls strict scheduling policy.
>>>>     * @reset_engine: controls engines reset on VF switch policy.
>>>>     * @sample_period: adverse events sampling period (in milliseconds).
>>>> + * @sched_groups: available scheduling group configurations and current mode.
>>> don't forget to describe inner members
>>>
>>> and there is no 'current' now
>> oops, that comes later
>>
>>>>     */
>>>>    struct xe_gt_sriov_guc_policies {
>>>>        bool sched_if_idle;
>>>>        bool reset_engine;
>>>>        u32 sample_period;
>>>> +    struct {
>>>> +        u32 supported_modes;
>>>> +        struct {
>>>> +            u32 *masks;
>>>> +            u32 num_masks;
>>>> +        } modes[XE_SRIOV_SCHED_GROUPS_MODES_COUNT];
>>> hmm, for the NONE mode, all groups must be 0, so why bother with .masks/.num fields for it?
>> Again for follow up. To disable the groups we need to send the KLV with no data. The way I've done it is that we just set the mode to XE_SRIOV_SCHED_GROUPS_NONE and the underlying code will automatically handle it because the mask is null, without having to know that this particular setting is actually turning the feature off. See __pf_provision_sched_groups in the next patch.
> but why make it so hidden/secret (and to some extend error-prone)
>
> disabling EGS is an explicit activity, IMO it shouldn't be done due to lack of some bits (that might be not intentional)

I don't think this is hidden/secret. The GuC specs says that to disable 
EGS we need to send the same KLV we do to enable it, just without any 
data. IMO this means that we can use the exact same code paths for both 
enable and disable, just with NULL data in the disable case. The only 
way we can end up disabling the feature by mistake is if a mode has 
empty masks when it shouldn't, in which case the whole thing is 
completely broken because that mode wouldn't work at all and we should 
fix that.

>
>> Also, I was thinking that for sysfs we could allow the code to set XE_SRIOV_SCHED_GROUPS_MEDIA_SLICES on both GTs and since on GT0 that's not a supported mode it will automatically map to empty masks/num_masks, which will tell the GuC to keep the feature disabled.
> not quite happy, but if you insist, will not block on that

Thanks! we can rework later if needed, but I believe that at least for 
now it makes the code simpler.

Daniele

>
>> Daniele
>>
>>> maybe all we need is:
>>>
>>>      struct {
>>>          const char *name;
>>>          u32 *config;
>>>      } modes[];
>>>
>>> then:
>>>
>>>      supported_modes := iterate over modes array and look for non-null config/name
>>>
>>>      none := supported_modes is empty
>>>
>>>      current := pointer to modes[] or NULL
>>>
>>>> +    } sched_groups;
>>>>    };
>>>>      /**


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 02/10] drm/xe/sriov: Initialize scheduler groups
  2025-12-02 21:08       ` Michal Wajdeczko
  2025-12-02 23:02         ` Daniele Ceraolo Spurio
@ 2025-12-03  1:15         ` Daniele Ceraolo Spurio
  1 sibling, 0 replies; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-12-03  1:15 UTC (permalink / raw)
  To: Michal Wajdeczko, intel-xe

<snip>

>>>> +static int pf_sched_group_media_slices(struct xe_gt *gt, u32 **masks, u32 *num_masks)
>>>> +{
>>>> +    u8 slice_to_group[MAX_MEDIA_SLICES];
>>>> +    struct xe_hw_engine *hwe = NULL;
>>> do we need to initialize this here?
>>> it will be initialized by for_each_hw_engine, no?
>>>
>>>> +    enum xe_hw_engine_id id;
>>>> +    u32 vcs_mask = VCS_MASK(gt);
>>>> +    u32 vecs_mask = VECS_MASK(gt);
>>>> +    u32 gsc_mask = GSCCS_MASK(gt);
>>> we try to define vars in rev-xmas-tree order
>>>
>>>> +    u32 *values;
>>>> +    u8 slice;
>>>> +    u8 groups;
>>> I guess for those two generic counters we don't need to use explicitly sized int
>>>
>>>> +
>>>> +    xe_gt_assert(gt, xe_gt_is_media_type(gt));
>>>> +
>>>> +    /* A media slice has 2 VCS and a VECS. We bundle the GSC with the first slice */
>>>> +    for (slice = 0, groups = 0;
>>>> +         gsc_mask || vcs_mask || vecs_mask;
>>>> +         slice++, gsc_mask = 0, vcs_mask >>= 2, vecs_mask >>= 1) {
>>>> +        if (unlikely(slice >= MAX_MEDIA_SLICES)) {
>>> maybe this 'for' loop should be just for 'slice = [0, MAX)'
>>>
>>>> +            xe_gt_sriov_err(gt, "Too many media slices (%u) during EGS setup\n",
>>>> +                    slice);
>>> then after this loop we can have asserts that no engine instances are left behind
>>>
>>>> +            return -EINVAL;
>>>> +        }
>>>> +
>>>> +        if ((vcs_mask & 0x3) || (vecs_mask & 0x1) || (gsc_mask & 0x1))
>>>> +            slice_to_group[slice] = groups++;
>>>> +    }
>>> like this:
>>>
>>>      for (slice = 0; slice < MAX_MEDIA_SLICES; slice++ ) {
>>>
>>>          if ((vcs_mask & 0x3) || (vecs_mask & 0x1) || (gsc_mask & 0x1))
>>>              slice_to_group[slice] = groups++;
>>>
>>>          vcs_mask >>= 2;
>>>          vecs_mask >>= 1;
>>>          gsc_mask >>= 1;
>>>      }
>>>
>>>      xe_gt_assert(gt, !vcs_mask)
>>>      xe_gt_assert(gt, !vecs_mask)
>>>      xe_gt_assert(gt, !gsc_mask)
>> sure, will change
>>
>>>> +
>>>> +    /* We need at least 2 slices to split them up */
>>> shouldn't we also check that each group has both VCS and VECS?
>> It's not really necessary, having a group that only has a VCS or a VECS is a valid scenario. For example, the main use-case we have right now (remote desktop) only cares about VCS, so not having a VECS in a group wouldn't impact it.
>>
>>>> +    if (groups < 2) {
>>>> +        *masks = NULL;
>>>> +        *num_masks = 0;
>>>> +        return 0;
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * The GuC expects and array with GUC_MAX_ENGINE_CLASSES entries for
>>> typo: an array ?
>> yup
>>
>>>> +     * each group.
>>>> +     */
>>>> +    values = drmm_kzalloc(&gt_to_xe(gt)->drm,
>>>> +                  GUC_MAX_ENGINE_CLASSES * groups * sizeof(u32),
>>>> +                  GFP_KERNEL);
>>>> +    if (!values)
>>>> +        return -ENOMEM;
>>>> +
>>>> +    for_each_hw_engine(hwe, gt, id) {
>>>> +        u8 guc_class = xe_engine_class_to_guc_class(hwe->class);
>>>> +        u8 entry;
>>>> +
>>>> +        switch (hwe->class) {
>>>> +        case XE_ENGINE_CLASS_VIDEO_DECODE:
>>>> +            slice = hwe->instance / 2;
>>> hmm, this seems to duplicate previous loop where we had:
>>>
>>>              vcs_mask >>= 2;
>>>
>>> maybe these two loops can be combined?
>> The issue is that there is an allocation in the middle, which is required in the second loop and sized based on the results of the first loop. I could always allocate enough memory for all the possible groups (i.e. MAX_MEDIA_SLICES), no matter how many there actually are, but I'm not sure if that is better than the double loop. I'm ok with switching if you think that's a better solution.
> if you can try, then yes please, but if that's too much - we can rework this later if needed

I tried this and it ends up a bit ugly, because in the first loop we 
have physical instances masks, while in the second loop we need masks of 
logical instances. To convert from one to the other we need to split up 
the physical mask, get the hwe structures for each engine and then get 
the logical values from there. I ended up with 3 nested loops, which IMO 
makes things much less readable. I'll stick with the current version for 
now and we can rework later as you mentioned if needed.

Daniele

>
>>>> +            break;
>>>> +        case XE_ENGINE_CLASS_VIDEO_ENHANCE:
>>>> +            slice = hwe->instance;
>>>> +            break;
>>>> +        case XE_ENGINE_CLASS_OTHER:
>>>> +            slice = 0;
>>>> +            break;
>>>> +        default:
>>>> +            xe_gt_sriov_err(gt, "unknown media gt class %u (%s) during EGS setup\n",
>>>> +                    hwe->class, hwe->name);
>>> this also seems like an attempt to catch our coding error,
>>> so plain assert should be sufficient
>> ok
>>
>>>> +            drmm_kfree(&gt_to_xe(gt)->drm, values);
>>> do we need to bother with this kfree ?
>>> it's managed allocation and any error from here will lead to abort the probe anyway
>> will drop
>>
>>>> +            return -EINVAL;
>>>> +        }
>>>> +
>>>> +        entry = (slice_to_group[slice] * GUC_MAX_ENGINE_CLASSES) + guc_class;
>>> redundant ( )
>>>
>>>> +        values[entry] |= BIT(hwe->logical_instance);
>>>> +    }
>>>> +
>>>> +    *masks = values;
>>>> +    *num_masks = GUC_MAX_ENGINE_CLASSES * groups;
>>>> +
>>>> +    return 0;
>>>> +}


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 03/10] drm/xe/sriov: Add support for enabling scheduler groups
  2025-11-27  1:45 [PATCH 00/10] Introduce SRIOV scheduler groups Daniele Ceraolo Spurio
  2025-11-27  1:45 ` [PATCH 01/10] drm/xe/gt: Add engine masks for each class Daniele Ceraolo Spurio
  2025-11-27  1:45 ` [PATCH 02/10] drm/xe/sriov: Initialize scheduler groups Daniele Ceraolo Spurio
@ 2025-11-27  1:45 ` Daniele Ceraolo Spurio
  2025-12-02 11:49   ` Michal Wajdeczko
  2025-11-27  1:45 ` [PATCH 04/10] drm/xe/sriov: Scheduler groups are incompatible with multi-lrc Daniele Ceraolo Spurio
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-11-27  1:45 UTC (permalink / raw)
  To: intel-xe; +Cc: Daniele Ceraolo Spurio, Michal Wajdeczko

Schedler groups are enabled by sending a specific policy configuration
KLV to the GuC. We don't allow changing this policy if there are VF
active, since the expectation is that the VF will only check if the
feature is enabled during driver initialization.

The functions added by this patch will be used by sysfs/debugfs, coming
in follow up patches.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/xe/abi/guc_klvs_abi.h         |  17 +++
 drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c    | 129 ++++++++++++++++++
 drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h    |   1 +
 .../gpu/drm/xe/xe_gt_sriov_pf_policy_types.h  |   1 +
 4 files changed, 148 insertions(+)

diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
index 265a135e7061..274f1b1ec37f 100644
--- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
+++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
@@ -200,6 +200,20 @@ enum  {
  *      :0: adverse events are not counted (default)
  *      :n: sample period in milliseconds
  *
+ * _`GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG` : 0x8004
+ *      Ths config allows the PF to split the engines across scheduling groups.
+ *      Each group is independently timesliced across VFs, allowing different
+ *      VFs to be active on the HW at the same time. When enabling this feature,
+ *      all engines must be assigned to a group (and only one group), or they
+ *      will be excluded from scheduling after this KLV is sent. To enable
+ *      the groups, the driver must provide a masks array with
+ *      GUC_MAX_ENGINE_CLASSES entries for each group, with each mask indicating
+ *      which logical instances of that class belong to the group. Therefore,
+ *      the length of this KLV when enabling groups is
+ *      num_groups * GUC_MAX_ENGINE_CLASSES. To disable the groups, the driver
+ *      must send the KLV without any payload (i.e. len = 0). The maximum
+ *      number of groups is 8.
+ *
  * _`GUC_KLV_VGT_POLICY_RESET_AFTER_VF_SWITCH` : 0x8D00
  *      This enum is to reset utilized HW engine after VF Switch (i.e to clean
  *      up Stale HW register left behind by previous VF)
@@ -214,6 +228,9 @@ enum  {
 #define GUC_KLV_VGT_POLICY_ADVERSE_SAMPLE_PERIOD_KEY	0x8002
 #define GUC_KLV_VGT_POLICY_ADVERSE_SAMPLE_PERIOD_LEN	1u
 
+#define GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG_KEY	0x8004
+#define GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT	8
+
 #define GUC_KLV_VGT_POLICY_RESET_AFTER_VF_SWITCH_KEY	0x8D00
 #define GUC_KLV_VGT_POLICY_RESET_AFTER_VF_SWITCH_LEN	1u
 
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
index 9b878578ea90..48f250ae0d0d 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
@@ -97,6 +97,25 @@ static int pf_push_policy_u32(struct xe_gt *gt, u16 key, u32 value)
 	return pf_push_policy_klvs(gt, 1, klv, ARRAY_SIZE(klv));
 }
 
+static int pf_push_policy_payload(struct xe_gt *gt, u16 key, u32 *payload, u32 num_dwords)
+{
+	u32 *klv;
+	int err;
+
+	klv = kzalloc((num_dwords + 1) * sizeof(u32), GFP_KERNEL);
+	if (!klv)
+		return -ENOMEM;
+
+	klv[0] = PREP_GUC_KLV(key, num_dwords);
+	if (num_dwords)
+		memcpy(&klv[1], payload, num_dwords * sizeof(u32));
+
+	err = pf_push_policy_klvs(gt, 1, klv, num_dwords + 1);
+
+	kfree(klv);
+	return err;
+}
+
 static int pf_update_policy_bool(struct xe_gt *gt, u16 key, bool *policy, bool value)
 {
 	int err;
@@ -444,6 +463,7 @@ static int pf_init_sched_groups(struct xe_gt *gt)
 	for (m = 0; m < XE_SRIOV_SCHED_GROUPS_MODES_COUNT; m++) {
 		u32 *masks = NULL;
 		u32 num_masks = 0;
+		u32 num_groups = 0;
 
 		switch (m) {
 		case XE_SRIOV_SCHED_GROUPS_NONE:
@@ -463,6 +483,13 @@ static int pf_init_sched_groups(struct xe_gt *gt)
 
 		xe_gt_assert(gt, (num_masks % GUC_MAX_ENGINE_CLASSES) == 0);
 
+		num_groups = num_masks / GUC_MAX_ENGINE_CLASSES;
+		if (num_groups > GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT) {
+			xe_gt_sriov_err(gt, "too many groups (%u) for sched group mode %u\n",
+					num_groups, m);
+			return -EINVAL;
+		}
+
 		if ((m == XE_SRIOV_SCHED_GROUPS_NONE) || num_masks)
 			gt->sriov.pf.policy.guc.sched_groups.supported_modes |= BIT(m);
 
@@ -473,11 +500,112 @@ static int pf_init_sched_groups(struct xe_gt *gt)
 	return 0;
 }
 
+static bool
+pf_policy_has_sched_group_modes(struct xe_gt *gt, unsigned long mask)
+{
+	return gt->sriov.pf.policy.guc.sched_groups.supported_modes & mask;
+}
+
+static bool pf_policy_has_valid_sched_group_modes(struct xe_gt *gt)
+{
+	return pf_policy_has_sched_group_modes(gt, ~BIT(XE_SRIOV_SCHED_GROUPS_NONE));
+}
+
+static bool pf_policy_has_sched_group_mode(struct xe_gt *gt, u32 mode)
+{
+	return pf_policy_has_sched_group_modes(gt, BIT(mode));
+}
+
+static int __pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
+{
+	u32 *masks = gt->sriov.pf.policy.guc.sched_groups.modes[mode].masks;
+	u32 num_masks = gt->sriov.pf.policy.guc.sched_groups.modes[mode].num_masks;
+
+	xe_gt_assert(gt, (num_masks % GUC_MAX_ENGINE_CLASSES) == 0);
+
+	return pf_push_policy_payload(gt, GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG_KEY,
+				      masks, num_masks);
+}
+
+static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
+{
+	int err;
+
+	xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
+	lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
+
+	if (!pf_policy_has_sched_group_mode(gt, mode))
+		return -EINVAL;
+
+	/* already in the desired mode */
+	if (gt->sriov.pf.policy.guc.sched_groups.current_mode == mode)
+		return 0;
+
+	/*
+	 * We don't allow changing this with VFs active since it is hard for
+	 * VFs to check.
+	 */
+	if (xe_sriov_pf_num_vfs(gt_to_xe(gt)))
+		return -EPERM;
+
+	err = __pf_provision_sched_groups(gt, mode);
+	if (err)
+		return err;
+
+	gt->sriov.pf.policy.guc.sched_groups.current_mode = mode;
+
+	return 0;
+}
+
+static int pf_reprovision_sched_groups(struct xe_gt *gt)
+{
+	xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
+	lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
+
+	/* We only have something to provision if we have possible groups */
+	if (!pf_policy_has_valid_sched_group_modes(gt))
+		return 0;
+
+	return __pf_provision_sched_groups(gt, gt->sriov.pf.policy.guc.sched_groups.current_mode);
+}
+
+static void pf_sanitize_sched_groups(struct xe_gt *gt)
+{
+	xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
+	lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
+
+	gt->sriov.pf.policy.guc.sched_groups.current_mode = XE_SRIOV_SCHED_GROUPS_NONE;
+}
+
+/**
+ * xe_gt_sriov_pf_policy_set_sched_groups_mode - Control the 'sched_groups' policy.
+ * @gt: the &xe_gt where to apply the policy
+ * @value: the sched_group mode to be activated (see enum xe_sriov_sched_group_modes)
+ *
+ * This function can only be called on PF.
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value)
+{
+	int err;
+
+	if (!(pf_policy_has_valid_sched_group_modes(gt)))
+		return -ENODEV;
+
+	mutex_lock(xe_gt_sriov_pf_master_mutex(gt));
+	err = pf_provision_sched_groups(gt, value);
+	mutex_unlock(xe_gt_sriov_pf_master_mutex(gt));
+
+	return err;
+}
+
 static void pf_sanitize_guc_policies(struct xe_gt *gt)
 {
 	pf_sanitize_sched_if_idle(gt);
 	pf_sanitize_reset_engine(gt);
 	pf_sanitize_sample_period(gt);
+	pf_sanitize_sched_groups(gt);
 }
 
 /**
@@ -516,6 +644,7 @@ int xe_gt_sriov_pf_policy_reprovision(struct xe_gt *gt, bool reset)
 	err |= pf_reprovision_sched_if_idle(gt);
 	err |= pf_reprovision_reset_engine(gt);
 	err |= pf_reprovision_sample_period(gt);
+	err |= pf_reprovision_sched_groups(gt);
 	mutex_unlock(xe_gt_sriov_pf_master_mutex(gt));
 
 	xe_pm_runtime_put(gt_to_xe(gt));
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
index c9c04d1b7f50..36680996f2bd 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
@@ -17,6 +17,7 @@ int xe_gt_sriov_pf_policy_set_reset_engine(struct xe_gt *gt, bool enable);
 bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
 int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
 u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
+int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value);
 
 int xe_gt_sriov_pf_policy_init(struct xe_gt *gt);
 void xe_gt_sriov_pf_policy_sanitize(struct xe_gt *gt);
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
index 3b915801c01b..5d44d23a5ed4 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
@@ -27,6 +27,7 @@ struct xe_gt_sriov_guc_policies {
 	u32 sample_period;
 	struct {
 		u32 supported_modes;
+		enum xe_sriov_sched_group_modes current_mode;
 		struct {
 			u32 *masks;
 			u32 num_masks;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 03/10] drm/xe/sriov: Add support for enabling scheduler groups
  2025-11-27  1:45 ` [PATCH 03/10] drm/xe/sriov: Add support for enabling " Daniele Ceraolo Spurio
@ 2025-12-02 11:49   ` Michal Wajdeczko
  2025-12-02 17:39     ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 44+ messages in thread
From: Michal Wajdeczko @ 2025-12-02 11:49 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-xe



On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
> Schedler groups are enabled by sending a specific policy configuration

typo: Scheduler ?

> KLV to the GuC. We don't allow changing this policy if there are VF
> active, since the expectation is that the VF will only check if the
> feature is enabled during driver initialization.
> 
> The functions added by this patch will be used by sysfs/debugfs, coming
> in follow up patches.
> 
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> ---
>  drivers/gpu/drm/xe/abi/guc_klvs_abi.h         |  17 +++
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c    | 129 ++++++++++++++++++
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h    |   1 +
>  .../gpu/drm/xe/xe_gt_sriov_pf_policy_types.h  |   1 +
>  4 files changed, 148 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
> index 265a135e7061..274f1b1ec37f 100644
> --- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
> +++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
> @@ -200,6 +200,20 @@ enum  {
>   *      :0: adverse events are not counted (default)
>   *      :n: sample period in milliseconds
>   *
> + * _`GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG` : 0x8004
> + *      Ths config allows the PF to split the engines across scheduling groups.

typo: This

> + *      Each group is independently timesliced across VFs, allowing different
> + *      VFs to be active on the HW at the same time. When enabling this feature,
> + *      all engines must be assigned to a group (and only one group), or they
> + *      will be excluded from scheduling after this KLV is sent. To enable
> + *      the groups, the driver must provide a masks array with
> + *      GUC_MAX_ENGINE_CLASSES entries for each group, with each mask indicating
> + *      which logical instances of that class belong to the group. Therefore,
> + *      the length of this KLV when enabling groups is
> + *      num_groups * GUC_MAX_ENGINE_CLASSES. To disable the groups, the driver
> + *      must send the KLV without any payload (i.e. len = 0). The maximum
> + *      number of groups is 8.

don't forget to update xe_guc_klv_key_to_string() to recognize this new KEY

> + *
>   * _`GUC_KLV_VGT_POLICY_RESET_AFTER_VF_SWITCH` : 0x8D00
>   *      This enum is to reset utilized HW engine after VF Switch (i.e to clean
>   *      up Stale HW register left behind by previous VF)
> @@ -214,6 +228,9 @@ enum  {
>  #define GUC_KLV_VGT_POLICY_ADVERSE_SAMPLE_PERIOD_KEY	0x8002
>  #define GUC_KLV_VGT_POLICY_ADVERSE_SAMPLE_PERIOD_LEN	1u
>  
> +#define GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG_KEY	0x8004

maybe we should add some _LEN macros for completeness?

   #define GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG_MIN_LEN	0u
   #define GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG_MAX_LEN \
	(GUC_MAX_ENGINE_CLASSES * GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT)

which then can be used in some asserts where we prepare KLV payloads

> +#define GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT	8> +
>  #define GUC_KLV_VGT_POLICY_RESET_AFTER_VF_SWITCH_KEY	0x8D00
>  #define GUC_KLV_VGT_POLICY_RESET_AFTER_VF_SWITCH_LEN	1u
>  
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
> index 9b878578ea90..48f250ae0d0d 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
> @@ -97,6 +97,25 @@ static int pf_push_policy_u32(struct xe_gt *gt, u16 key, u32 value)
>  	return pf_push_policy_klvs(gt, 1, klv, ARRAY_SIZE(klv));
>  }
>  
> +static int pf_push_policy_payload(struct xe_gt *gt, u16 key, u32 *payload, u32 num_dwords)
> +{
> +	u32 *klv;
> +	int err;
> +
> +	klv = kzalloc((num_dwords + 1) * sizeof(u32), GFP_KERNEL);

no need for extra alloc, use 

	CLASS(xe_guc_buf, buf)(&gt->uc.guc.buf, GUC_KLV_LEN_MIN + num_dwords);

> +	if (!klv)
> +		return -ENOMEM;
> +
> +	klv[0] = PREP_GUC_KLV(key, num_dwords);
> +	if (num_dwords)
> +		memcpy(&klv[1], payload, num_dwords * sizeof(u32));
> +
> +	err = pf_push_policy_klvs(gt, 1, klv, num_dwords + 1);

and then

	return pf_push_policy_buf_klvs(gt, 1, buf, GUC_KLV_LEN_MIN + num_dwords);

> +
> +	kfree(klv);
> +	return err;
> +}
> +
>  static int pf_update_policy_bool(struct xe_gt *gt, u16 key, bool *policy, bool value)
>  {
>  	int err;
> @@ -444,6 +463,7 @@ static int pf_init_sched_groups(struct xe_gt *gt)
>  	for (m = 0; m < XE_SRIOV_SCHED_GROUPS_MODES_COUNT; m++) {
>  		u32 *masks = NULL;
>  		u32 num_masks = 0;
> +		u32 num_groups = 0;
>  
>  		switch (m) {
>  		case XE_SRIOV_SCHED_GROUPS_NONE:
> @@ -463,6 +483,13 @@ static int pf_init_sched_groups(struct xe_gt *gt)
>  
>  		xe_gt_assert(gt, (num_masks % GUC_MAX_ENGINE_CLASSES) == 0);
>  
> +		num_groups = num_masks / GUC_MAX_ENGINE_CLASSES;
> +		if (num_groups > GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT) {
> +			xe_gt_sriov_err(gt, "too many groups (%u) for sched group mode %u\n",
> +					num_groups, m);

likely can be replaced by xe_gt_assert

> +			return -EINVAL;
> +		}
> +
>  		if ((m == XE_SRIOV_SCHED_GROUPS_NONE) || num_masks)
>  			gt->sriov.pf.policy.guc.sched_groups.supported_modes |= BIT(m);
>  
> @@ -473,11 +500,112 @@ static int pf_init_sched_groups(struct xe_gt *gt)
>  	return 0;
>  }
>  
> +static bool
> +pf_policy_has_sched_group_modes(struct xe_gt *gt, unsigned long mask)
> +{
> +	return gt->sriov.pf.policy.guc.sched_groups.supported_modes & mask;
> +}
> +
> +static bool pf_policy_has_valid_sched_group_modes(struct xe_gt *gt)
> +{
> +	return pf_policy_has_sched_group_modes(gt, ~BIT(XE_SRIOV_SCHED_GROUPS_NONE));

hmm, I still don't buy that NONE must be represented as valid BIT
IMO supported_modes shall only hold bits for valid configs/modes
and supported_modes == 0 would indicate no support for EGS

> +}
> +
> +static bool pf_policy_has_sched_group_mode(struct xe_gt *gt, u32 mode)
> +{
> +	return pf_policy_has_sched_group_modes(gt, BIT(mode));
> +}
> +
> +static int __pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
> +{
> +	u32 *masks = gt->sriov.pf.policy.guc.sched_groups.modes[mode].masks;
> +	u32 num_masks = gt->sriov.pf.policy.guc.sched_groups.modes[mode].num_masks;
> +
> +	xe_gt_assert(gt, (num_masks % GUC_MAX_ENGINE_CLASSES) == 0);
> +
> +	return pf_push_policy_payload(gt, GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG_KEY,
> +				      masks, num_masks);

having helper for explicit disabling EGS would be nice:

	return pf_push_policy_payload(gt, GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG_KEY, 0, 0);

> +}
> +
> +static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
> +{
> +	int err;
> +
> +	xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
> +	lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
> +
> +	if (!pf_policy_has_sched_group_mode(gt, mode))
> +		return -EINVAL;
> +
> +	/* already in the desired mode */
> +	if (gt->sriov.pf.policy.guc.sched_groups.current_mode == mode)
> +		return 0;
> +
> +	/*
> +	 * We don't allow changing this with VFs active since it is hard for
> +	 * VFs to check.
> +	 */
> +	if (xe_sriov_pf_num_vfs(gt_to_xe(gt)))
> +		return -EPERM;

maybe -EBUSY instead?

> +
> +	err = __pf_provision_sched_groups(gt, mode);
> +	if (err)
> +		return err;
> +
> +	gt->sriov.pf.policy.guc.sched_groups.current_mode = mode;
> +
> +	return 0;
> +}
> +
> +static int pf_reprovision_sched_groups(struct xe_gt *gt)
> +{
> +	xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
> +	lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
> +
> +	/* We only have something to provision if we have possible groups */
> +	if (!pf_policy_has_valid_sched_group_modes(gt))
> +		return 0;
> +
> +	return __pf_provision_sched_groups(gt, gt->sriov.pf.policy.guc.sched_groups.current_mode);
> +}
> +
> +static void pf_sanitize_sched_groups(struct xe_gt *gt)
> +{
> +	xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
> +	lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
> +
> +	gt->sriov.pf.policy.guc.sched_groups.current_mode = XE_SRIOV_SCHED_GROUPS_NONE;
> +}
> +
> +/**
> + * xe_gt_sriov_pf_policy_set_sched_groups_mode - Control the 'sched_groups' policy.

new BKM is to add () after function name

    * xe_gt_sriov_pf_policy_set_sched_groups_mode() - Control ...

> + * @gt: the &xe_gt where to apply the policy
> + * @value: the sched_group mode to be activated (see enum xe_sriov_sched_group_modes)

maybe at this point we should already use enum instead u32 ?

> + *
> + * This function can only be called on PF.
> + *
> + * Return: 0 on success or a negative error code on failure.
> + */
> +int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value)
> +{
> +	int err;
> +
> +	if (!(pf_policy_has_valid_sched_group_modes(gt)))
> +		return -ENODEV;
> +
> +	mutex_lock(xe_gt_sriov_pf_master_mutex(gt));

in Xe we started converting driver to use

	guard(mutex)(...)

> +	err = pf_provision_sched_groups(gt, value);
> +	mutex_unlock(xe_gt_sriov_pf_master_mutex(gt));
> +
> +	return err;
> +}
> +
>  static void pf_sanitize_guc_policies(struct xe_gt *gt)
>  {
>  	pf_sanitize_sched_if_idle(gt);
>  	pf_sanitize_reset_engine(gt);
>  	pf_sanitize_sample_period(gt);
> +	pf_sanitize_sched_groups(gt);
>  }
>  
>  /**
> @@ -516,6 +644,7 @@ int xe_gt_sriov_pf_policy_reprovision(struct xe_gt *gt, bool reset)
>  	err |= pf_reprovision_sched_if_idle(gt);
>  	err |= pf_reprovision_reset_engine(gt);
>  	err |= pf_reprovision_sample_period(gt);
> +	err |= pf_reprovision_sched_groups(gt);
>  	mutex_unlock(xe_gt_sriov_pf_master_mutex(gt));
>  
>  	xe_pm_runtime_put(gt_to_xe(gt));
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
> index c9c04d1b7f50..36680996f2bd 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
> @@ -17,6 +17,7 @@ int xe_gt_sriov_pf_policy_set_reset_engine(struct xe_gt *gt, bool enable);
>  bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
>  int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
>  u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
> +int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value);
>  
>  int xe_gt_sriov_pf_policy_init(struct xe_gt *gt);
>  void xe_gt_sriov_pf_policy_sanitize(struct xe_gt *gt);
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
> index 3b915801c01b..5d44d23a5ed4 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
> @@ -27,6 +27,7 @@ struct xe_gt_sriov_guc_policies {
>  	u32 sample_period;
>  	struct {
>  		u32 supported_modes;
> +		enum xe_sriov_sched_group_modes current_mode;
>  		struct {
>  			u32 *masks;
>  			u32 num_masks;


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 03/10] drm/xe/sriov: Add support for enabling scheduler groups
  2025-12-02 11:49   ` Michal Wajdeczko
@ 2025-12-02 17:39     ` Daniele Ceraolo Spurio
  2025-12-04 22:06       ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-12-02 17:39 UTC (permalink / raw)
  To: Michal Wajdeczko, intel-xe



On 12/2/2025 3:49 AM, Michal Wajdeczko wrote:
>
> On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
>> Schedler groups are enabled by sending a specific policy configuration
> typo: Scheduler ?
>
>> KLV to the GuC. We don't allow changing this policy if there are VF
>> active, since the expectation is that the VF will only check if the
>> feature is enabled during driver initialization.
>>
>> The functions added by this patch will be used by sysfs/debugfs, coming
>> in follow up patches.
>>
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> ---
>>   drivers/gpu/drm/xe/abi/guc_klvs_abi.h         |  17 +++
>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c    | 129 ++++++++++++++++++
>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h    |   1 +
>>   .../gpu/drm/xe/xe_gt_sriov_pf_policy_types.h  |   1 +
>>   4 files changed, 148 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>> index 265a135e7061..274f1b1ec37f 100644
>> --- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>> +++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>> @@ -200,6 +200,20 @@ enum  {
>>    *      :0: adverse events are not counted (default)
>>    *      :n: sample period in milliseconds
>>    *
>> + * _`GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG` : 0x8004
>> + *      Ths config allows the PF to split the engines across scheduling groups.
> typo: This
>
>> + *      Each group is independently timesliced across VFs, allowing different
>> + *      VFs to be active on the HW at the same time. When enabling this feature,
>> + *      all engines must be assigned to a group (and only one group), or they
>> + *      will be excluded from scheduling after this KLV is sent. To enable
>> + *      the groups, the driver must provide a masks array with
>> + *      GUC_MAX_ENGINE_CLASSES entries for each group, with each mask indicating
>> + *      which logical instances of that class belong to the group. Therefore,
>> + *      the length of this KLV when enabling groups is
>> + *      num_groups * GUC_MAX_ENGINE_CLASSES. To disable the groups, the driver
>> + *      must send the KLV without any payload (i.e. len = 0). The maximum
>> + *      number of groups is 8.
> don't forget to update xe_guc_klv_key_to_string() to recognize this new KEY

ok

>
>> + *
>>    * _`GUC_KLV_VGT_POLICY_RESET_AFTER_VF_SWITCH` : 0x8D00
>>    *      This enum is to reset utilized HW engine after VF Switch (i.e to clean
>>    *      up Stale HW register left behind by previous VF)
>> @@ -214,6 +228,9 @@ enum  {
>>   #define GUC_KLV_VGT_POLICY_ADVERSE_SAMPLE_PERIOD_KEY	0x8002
>>   #define GUC_KLV_VGT_POLICY_ADVERSE_SAMPLE_PERIOD_LEN	1u
>>   
>> +#define GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG_KEY	0x8004
> maybe we should add some _LEN macros for completeness?
>
>     #define GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG_MIN_LEN	0u
>     #define GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG_MAX_LEN \
> 	(GUC_MAX_ENGINE_CLASSES * GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT)
>
> which then can be used in some asserts where we prepare KLV payloads

ok

>
>> +#define GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT	8> +
>>   #define GUC_KLV_VGT_POLICY_RESET_AFTER_VF_SWITCH_KEY	0x8D00
>>   #define GUC_KLV_VGT_POLICY_RESET_AFTER_VF_SWITCH_LEN	1u
>>   
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>> index 9b878578ea90..48f250ae0d0d 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>> @@ -97,6 +97,25 @@ static int pf_push_policy_u32(struct xe_gt *gt, u16 key, u32 value)
>>   	return pf_push_policy_klvs(gt, 1, klv, ARRAY_SIZE(klv));
>>   }
>>   
>> +static int pf_push_policy_payload(struct xe_gt *gt, u16 key, u32 *payload, u32 num_dwords)
>> +{
>> +	u32 *klv;
>> +	int err;
>> +
>> +	klv = kzalloc((num_dwords + 1) * sizeof(u32), GFP_KERNEL);
> no need for extra alloc, use
>
> 	CLASS(xe_guc_buf, buf)(&gt->uc.guc.buf, GUC_KLV_LEN_MIN + num_dwords);
>
>> +	if (!klv)
>> +		return -ENOMEM;
>> +
>> +	klv[0] = PREP_GUC_KLV(key, num_dwords);
>> +	if (num_dwords)
>> +		memcpy(&klv[1], payload, num_dwords * sizeof(u32));
>> +
>> +	err = pf_push_policy_klvs(gt, 1, klv, num_dwords + 1);
> and then
>
> 	return pf_push_policy_buf_klvs(gt, 1, buf, GUC_KLV_LEN_MIN + num_dwords);

ok

>
>> +
>> +	kfree(klv);
>> +	return err;
>> +}
>> +
>>   static int pf_update_policy_bool(struct xe_gt *gt, u16 key, bool *policy, bool value)
>>   {
>>   	int err;
>> @@ -444,6 +463,7 @@ static int pf_init_sched_groups(struct xe_gt *gt)
>>   	for (m = 0; m < XE_SRIOV_SCHED_GROUPS_MODES_COUNT; m++) {
>>   		u32 *masks = NULL;
>>   		u32 num_masks = 0;
>> +		u32 num_groups = 0;
>>   
>>   		switch (m) {
>>   		case XE_SRIOV_SCHED_GROUPS_NONE:
>> @@ -463,6 +483,13 @@ static int pf_init_sched_groups(struct xe_gt *gt)
>>   
>>   		xe_gt_assert(gt, (num_masks % GUC_MAX_ENGINE_CLASSES) == 0);
>>   
>> +		num_groups = num_masks / GUC_MAX_ENGINE_CLASSES;
>> +		if (num_groups > GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT) {
>> +			xe_gt_sriov_err(gt, "too many groups (%u) for sched group mode %u\n",
>> +					num_groups, m);
> likely can be replaced by xe_gt_assert
>
>> +			return -EINVAL;
>> +		}
>> +
>>   		if ((m == XE_SRIOV_SCHED_GROUPS_NONE) || num_masks)
>>   			gt->sriov.pf.policy.guc.sched_groups.supported_modes |= BIT(m);
>>   
>> @@ -473,11 +500,112 @@ static int pf_init_sched_groups(struct xe_gt *gt)
>>   	return 0;
>>   }
>>   
>> +static bool
>> +pf_policy_has_sched_group_modes(struct xe_gt *gt, unsigned long mask)
>> +{
>> +	return gt->sriov.pf.policy.guc.sched_groups.supported_modes & mask;
>> +}
>> +
>> +static bool pf_policy_has_valid_sched_group_modes(struct xe_gt *gt)
>> +{
>> +	return pf_policy_has_sched_group_modes(gt, ~BIT(XE_SRIOV_SCHED_GROUPS_NONE));
> hmm, I still don't buy that NONE must be represented as valid BIT
> IMO supported_modes shall only hold bits for valid configs/modes
> and supported_modes == 0 would indicate no support for EGS

I can change that to not have a bit set for XE_SRIOV_SCHED_GROUPS_NONE, 
but I'd still like to keep that as an enum value as it makes everything 
easier.

>
>> +}
>> +
>> +static bool pf_policy_has_sched_group_mode(struct xe_gt *gt, u32 mode)
>> +{
>> +	return pf_policy_has_sched_group_modes(gt, BIT(mode));
>> +}
>> +
>> +static int __pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>> +{
>> +	u32 *masks = gt->sriov.pf.policy.guc.sched_groups.modes[mode].masks;
>> +	u32 num_masks = gt->sriov.pf.policy.guc.sched_groups.modes[mode].num_masks;
>> +
>> +	xe_gt_assert(gt, (num_masks % GUC_MAX_ENGINE_CLASSES) == 0);
>> +
>> +	return pf_push_policy_payload(gt, GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG_KEY,
>> +				      masks, num_masks);
> having helper for explicit disabling EGS would be nice:
>
> 	return pf_push_policy_payload(gt, GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG_KEY, 0, 0);

IMO that's not really useful. If we have this as a special case then in 
the debugfs/sysfs we need to explicitly check against "disabled" and map 
it to the disabling call, while right now I just have it as part of the 
loop to map string to enum and call the same function.

>
>> +}
>> +
>> +static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>> +{
>> +	int err;
>> +
>> +	xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
>> +	lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
>> +
>> +	if (!pf_policy_has_sched_group_mode(gt, mode))
>> +		return -EINVAL;
>> +
>> +	/* already in the desired mode */
>> +	if (gt->sriov.pf.policy.guc.sched_groups.current_mode == mode)
>> +		return 0;
>> +
>> +	/*
>> +	 * We don't allow changing this with VFs active since it is hard for
>> +	 * VFs to check.
>> +	 */
>> +	if (xe_sriov_pf_num_vfs(gt_to_xe(gt)))
>> +		return -EPERM;
> maybe -EBUSY instead?

ok

>
>> +
>> +	err = __pf_provision_sched_groups(gt, mode);
>> +	if (err)
>> +		return err;
>> +
>> +	gt->sriov.pf.policy.guc.sched_groups.current_mode = mode;
>> +
>> +	return 0;
>> +}
>> +
>> +static int pf_reprovision_sched_groups(struct xe_gt *gt)
>> +{
>> +	xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
>> +	lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
>> +
>> +	/* We only have something to provision if we have possible groups */
>> +	if (!pf_policy_has_valid_sched_group_modes(gt))
>> +		return 0;
>> +
>> +	return __pf_provision_sched_groups(gt, gt->sriov.pf.policy.guc.sched_groups.current_mode);
>> +}
>> +
>> +static void pf_sanitize_sched_groups(struct xe_gt *gt)
>> +{
>> +	xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
>> +	lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
>> +
>> +	gt->sriov.pf.policy.guc.sched_groups.current_mode = XE_SRIOV_SCHED_GROUPS_NONE;
>> +}
>> +
>> +/**
>> + * xe_gt_sriov_pf_policy_set_sched_groups_mode - Control the 'sched_groups' policy.
> new BKM is to add () after function name
>
>      * xe_gt_sriov_pf_policy_set_sched_groups_mode() - Control ...
>
>> + * @gt: the &xe_gt where to apply the policy
>> + * @value: the sched_group mode to be activated (see enum xe_sriov_sched_group_modes)
> maybe at this point we should already use enum instead u32 ?

ok

>
>> + *
>> + * This function can only be called on PF.
>> + *
>> + * Return: 0 on success or a negative error code on failure.
>> + */
>> +int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value)
>> +{
>> +	int err;
>> +
>> +	if (!(pf_policy_has_valid_sched_group_modes(gt)))
>> +		return -ENODEV;
>> +
>> +	mutex_lock(xe_gt_sriov_pf_master_mutex(gt));
> in Xe we started converting driver to use
>
> 	guard(mutex)(...)

ok

Daniele

>
>> +	err = pf_provision_sched_groups(gt, value);
>> +	mutex_unlock(xe_gt_sriov_pf_master_mutex(gt));
>> +
>> +	return err;
>> +}
>> +
>>   static void pf_sanitize_guc_policies(struct xe_gt *gt)
>>   {
>>   	pf_sanitize_sched_if_idle(gt);
>>   	pf_sanitize_reset_engine(gt);
>>   	pf_sanitize_sample_period(gt);
>> +	pf_sanitize_sched_groups(gt);
>>   }
>>   
>>   /**
>> @@ -516,6 +644,7 @@ int xe_gt_sriov_pf_policy_reprovision(struct xe_gt *gt, bool reset)
>>   	err |= pf_reprovision_sched_if_idle(gt);
>>   	err |= pf_reprovision_reset_engine(gt);
>>   	err |= pf_reprovision_sample_period(gt);
>> +	err |= pf_reprovision_sched_groups(gt);
>>   	mutex_unlock(xe_gt_sriov_pf_master_mutex(gt));
>>   
>>   	xe_pm_runtime_put(gt_to_xe(gt));
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>> index c9c04d1b7f50..36680996f2bd 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>> @@ -17,6 +17,7 @@ int xe_gt_sriov_pf_policy_set_reset_engine(struct xe_gt *gt, bool enable);
>>   bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
>>   int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
>>   u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
>> +int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value);
>>   
>>   int xe_gt_sriov_pf_policy_init(struct xe_gt *gt);
>>   void xe_gt_sriov_pf_policy_sanitize(struct xe_gt *gt);
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
>> index 3b915801c01b..5d44d23a5ed4 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
>> @@ -27,6 +27,7 @@ struct xe_gt_sriov_guc_policies {
>>   	u32 sample_period;
>>   	struct {
>>   		u32 supported_modes;
>> +		enum xe_sriov_sched_group_modes current_mode;
>>   		struct {
>>   			u32 *masks;
>>   			u32 num_masks;


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 03/10] drm/xe/sriov: Add support for enabling scheduler groups
  2025-12-02 17:39     ` Daniele Ceraolo Spurio
@ 2025-12-04 22:06       ` Daniele Ceraolo Spurio
  0 siblings, 0 replies; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-12-04 22:06 UTC (permalink / raw)
  To: Michal Wajdeczko, intel-xe

<snip>

>>> @@ -214,6 +228,9 @@ enum  {
>>>   #define GUC_KLV_VGT_POLICY_ADVERSE_SAMPLE_PERIOD_KEY 0x8002
>>>   #define GUC_KLV_VGT_POLICY_ADVERSE_SAMPLE_PERIOD_LEN    1u
>>>   +#define GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG_KEY 0x8004
>> maybe we should add some _LEN macros for completeness?
>>
>>     #define GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG_MIN_LEN    0u
>>     #define GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG_MAX_LEN \
>>     (GUC_MAX_ENGINE_CLASSES * GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT)
>>
>> which then can be used in some asserts where we prepare KLV payloads
>
> ok
>
Can't actually do this in an easy way because GUC_MAX_ENGINE_CLASSES is 
defined in guc_fwif.h, which already includes guc_klvs_abi.h . Easier to 
just do the multiplication in the .c file.

Daniele

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 04/10] drm/xe/sriov: Scheduler groups are incompatible with multi-lrc
  2025-11-27  1:45 [PATCH 00/10] Introduce SRIOV scheduler groups Daniele Ceraolo Spurio
                   ` (2 preceding siblings ...)
  2025-11-27  1:45 ` [PATCH 03/10] drm/xe/sriov: Add support for enabling " Daniele Ceraolo Spurio
@ 2025-11-27  1:45 ` Daniele Ceraolo Spurio
  2025-12-02 13:32   ` Michal Wajdeczko
  2025-11-27  1:45 ` [PATCH 05/10] drm/xe/sriov: Add debugfs to enable scheduler groups Daniele Ceraolo Spurio
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-11-27  1:45 UTC (permalink / raw)
  To: intel-xe; +Cc: Daniele Ceraolo Spurio, Michal Wajdeczko

Since engines in the same class can be divided across multiple groups,
the GuC does not allow scheduler groups to be active if there are
multi-lrc contexts. This means that:

1) if a mlrc context is registered when we enable scheduler groups, the
   GuC will silently ignore the configuration
2) if a mlrc context is registered after scheduler groups are enabled,
   the GuC will disable the groups and generate an adverse event.

We therefore need to block mlrc context creation when scheduler groups
are enabled.
An adverse event threshold is available for the new adverse event.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/xe/abi/guc_klvs_abi.h         | 14 +++++
 drivers/gpu/drm/xe/xe_exec_queue.c            | 19 ++++++
 drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c    | 30 ++++++++++
 drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h    |  1 +
 drivers/gpu/drm/xe/xe_gt_sriov_vf.c           | 59 +++++++++++++++++++
 drivers/gpu/drm/xe/xe_gt_sriov_vf.h           |  3 +
 drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h     |  2 +
 .../drm/xe/xe_guc_klv_thresholds_set_types.h  |  1 +
 8 files changed, 129 insertions(+)

diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
index 274f1b1ec37f..a6dce9da339f 100644
--- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
+++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
@@ -46,11 +46,18 @@
  *      Refers to 32 bit architecture version as reported by the HW IP.
  *      This key is supported on MTL+ platforms only.
  *      Requires GuC ABI 1.2+.
+ *
+ * _`GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE` : 0x3001
+ *      Tells the driver whether scheduler groups are enabled or not.
+ *      Requres GuC ABI 1.26+
  */
 
 #define GUC_KLV_GLOBAL_CFG_GMD_ID_KEY			0x3000u
 #define GUC_KLV_GLOBAL_CFG_GMD_ID_LEN			1u
 
+#define GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_KEY	0x3001u
+#define GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_LEN	1u
+
 /**
  * DOC: GuC Self Config KLVs
  *
@@ -369,6 +376,10 @@ enum  {
  *      :1: NORMAL = schedule VF always, irrespective of whether it has work or not
  *      :2: HIGH = schedule VF in the next time-slice after current active
  *          time-slice completes if it has active work
+ *
+ * _`GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT` : 0x8A0D
+ *      This config sets the threshold for LRCA context registration when SRIOV
+ *      scheduler groups are enabled.
  */
 
 #define GUC_KLV_VF_CFG_GGTT_START_KEY		0x0001
@@ -427,6 +438,9 @@ enum  {
 #define   GUC_SCHED_PRIORITY_NORMAL		1u
 #define   GUC_SCHED_PRIORITY_HIGH		2u
 
+#define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_KEY	0x8a0d
+#define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_LEN	1u
+
 /*
  * Workaround keys:
  */
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 8724f8de67e2..e59c41c913b4 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -16,6 +16,7 @@
 #include "xe_dep_scheduler.h"
 #include "xe_device.h"
 #include "xe_gt.h"
+#include "xe_gt_sriov_pf_policy.h"
 #include "xe_gt_sriov_vf.h"
 #include "xe_hw_engine_class_sysfs.h"
 #include "xe_hw_engine_group.h"
@@ -698,6 +699,17 @@ static u32 calc_validate_logical_mask(struct xe_device *xe,
 	return return_mask;
 }
 
+static bool has_sched_groups(struct xe_gt *gt)
+{
+	if (IS_SRIOV_PF(gt_to_xe(gt)) && xe_gt_sriov_pf_policy_sched_groups_enabled(gt))
+		return true;
+
+	if (IS_SRIOV_VF(gt_to_xe(gt)) && xe_gt_sriov_vf_sched_groups_enabled(gt))
+		return true;
+
+	return false;
+}
+
 int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
 			       struct drm_file *file)
 {
@@ -790,6 +802,13 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
 			return -ENOENT;
 		}
 
+		/* SRIOV sched groups are not compatible with multi-lrc */
+		if (XE_IOCTL_DBG(xe, args->width > 1 && has_sched_groups(hwe->gt))) {
+			up_read(&vm->lock);
+			xe_vm_put(vm);
+			return -EINVAL;
+		}
+
 		q = xe_exec_queue_create(xe, vm, logical_mask,
 					 args->width, hwe, flags,
 					 args->extensions);
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
index 48f250ae0d0d..c7f1ea8eb9c5 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
@@ -8,6 +8,7 @@
 #include <drm/drm_managed.h>
 
 #include "xe_bo.h"
+#include "xe_exec_queue_types.h"
 #include "xe_gt.h"
 #include "xe_gt_sriov_pf_helpers.h"
 #include "xe_gt_sriov_pf_policy.h"
@@ -527,6 +528,24 @@ static int __pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
 				      masks, num_masks);
 }
 
+static bool guc_has_mlrc_queue(struct xe_guc *guc)
+{
+	struct xe_exec_queue *q;
+	unsigned long index;
+	bool found = false;
+
+	mutex_lock(&guc->submission_state.lock);
+	xa_for_each(&guc->submission_state.exec_queue_lookup, index, q) {
+		if (q->width > 1) {
+			found = true;
+			break;
+		}
+	}
+	mutex_unlock(&guc->submission_state.lock);
+
+	return found;
+}
+
 static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
 {
 	int err;
@@ -548,6 +567,12 @@ static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
 	if (xe_sriov_pf_num_vfs(gt_to_xe(gt)))
 		return -EPERM;
 
+	/* The GuC silently ignores the setting if any mlrc contexts are registered */
+	if ((mode != XE_SRIOV_SCHED_GROUPS_NONE) && guc_has_mlrc_queue(&gt->uc.guc)) {
+		xe_gt_sriov_notice(gt, "can't enable sched groups with active mlrc queues\n");
+		return -EPERM;
+	}
+
 	err = __pf_provision_sched_groups(gt, mode);
 	if (err)
 		return err;
@@ -600,6 +625,11 @@ int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value)
 	return err;
 }
 
+bool xe_gt_sriov_pf_policy_sched_groups_enabled(struct xe_gt *gt)
+{
+	return gt->sriov.pf.policy.guc.sched_groups.current_mode != XE_SRIOV_SCHED_GROUPS_NONE;
+}
+
 static void pf_sanitize_guc_policies(struct xe_gt *gt)
 {
 	pf_sanitize_sched_if_idle(gt);
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
index 36680996f2bd..89aa3af6cc7d 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
@@ -18,6 +18,7 @@ bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
 int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
 u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
 int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value);
+bool xe_gt_sriov_pf_policy_sched_groups_enabled(struct xe_gt *gt);
 
 int xe_gt_sriov_pf_policy_init(struct xe_gt *gt);
 void xe_gt_sriov_pf_policy_sanitize(struct xe_gt *gt);
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
index 4c73a077d314..7a180c947032 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
@@ -438,6 +438,30 @@ u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt)
 	return value;
 }
 
+static int query_vf_sched_groups(struct xe_gt *gt)
+{
+	struct xe_guc *guc = &gt->uc.guc;
+	u32 value = 0;
+	int err;
+
+	xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
+
+	if (MAKE_GUC_VER_STRUCT(gt->sriov.vf.guc_version) < MAKE_GUC_VER(1, 26, 0))
+		return 0;
+
+	err = guc_action_query_single_klv32(guc,
+					    GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_KEY,
+					    &value);
+	if (unlikely(err)) {
+		xe_gt_sriov_err(gt, "Failed to obtain sched groups status (%pe)\n",
+				ERR_PTR(err));
+		return err;
+	}
+
+	xe_gt_sriov_dbg(gt, "sched groups %s\n", value ? "enabled" : "disabled");
+	return value;
+}
+
 static int vf_get_ggtt_info(struct xe_gt *gt)
 {
 	struct xe_tile *tile = gt_to_tile(gt);
@@ -564,6 +588,21 @@ static void vf_cache_gmdid(struct xe_gt *gt)
 	gt->sriov.vf.runtime.gmdid = xe_gt_sriov_vf_gmdid(gt);
 }
 
+static int vf_cache_sched_groups_status(struct xe_gt *gt)
+{
+	int ret;
+
+	xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
+
+	ret = query_vf_sched_groups(gt);
+	if (ret < 0)
+		return ret;
+
+	gt->sriov.vf.runtime.uses_sched_groups = ret;
+
+	return 0;
+}
+
 /**
  * xe_gt_sriov_vf_query_config - Query SR-IOV config data over MMIO.
  * @gt: the &xe_gt
@@ -593,12 +632,32 @@ int xe_gt_sriov_vf_query_config(struct xe_gt *gt)
 	if (unlikely(err))
 		return err;
 
+	err = vf_cache_sched_groups_status(gt);
+	if (unlikely(err))
+		return err;
+
 	if (has_gmdid(xe))
 		vf_cache_gmdid(gt);
 
 	return 0;
 }
 
+/**
+ * xe_gt_sriov_vf_sched_groups_enabled - Check if PF has enabled sched groups
+ * @gt: the &xe_gt
+ *
+ * This function is for VF use only.
+ *
+ * Return: true if shed groups were enabled, false otherwise.
+ */
+bool xe_gt_sriov_vf_sched_groups_enabled(struct xe_gt *gt)
+{
+	xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
+	xe_gt_assert(gt, gt->sriov.vf.guc_version.major);
+
+	return gt->sriov.vf.runtime.uses_sched_groups;
+}
+
 /**
  * xe_gt_sriov_vf_guc_ids - VF GuC context IDs configuration.
  * @gt: the &xe_gt
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
index af40276790fa..2e1d34c0397f 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
@@ -23,11 +23,14 @@ int xe_gt_sriov_vf_connect(struct xe_gt *gt);
 int xe_gt_sriov_vf_query_runtime(struct xe_gt *gt);
 void xe_gt_sriov_vf_migrated_event_handler(struct xe_gt *gt);
 
+bool xe_gt_sriov_vf_sched_groups_enabled(struct xe_gt *gt);
+
 int xe_gt_sriov_vf_init_early(struct xe_gt *gt);
 int xe_gt_sriov_vf_init(struct xe_gt *gt);
 bool xe_gt_sriov_vf_recovery_pending(struct xe_gt *gt);
 
 u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt);
+u32 xe_gt_sriov_vf_sched_groups(struct xe_gt *gt);
 u16 xe_gt_sriov_vf_guc_ids(struct xe_gt *gt);
 u64 xe_gt_sriov_vf_lmem(struct xe_gt *gt);
 
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
index 420b0e6089de..5267c097ecd0 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
@@ -27,6 +27,8 @@ struct xe_gt_sriov_vf_selfconfig {
 struct xe_gt_sriov_vf_runtime {
 	/** @gmdid: cached value of the GDMID register. */
 	u32 gmdid;
+	/** @uses_sched_groups: whether PF enabled sched groups or not. */
+	bool uses_sched_groups;
 	/** @regs_size: size of runtime register array. */
 	u32 regs_size;
 	/** @num_regs: number of runtime registers in the array. */
diff --git a/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h b/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
index 0a028c94756d..3e55d9302855 100644
--- a/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
+++ b/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
@@ -32,6 +32,7 @@
 	define(H2G_STORM, guc_time_us)			\
 	define(IRQ_STORM, irq_time_us)			\
 	define(DOORBELL_STORM, doorbell_time_us)	\
+	define(MULTI_LRC_COUNT, multi_lrc_count)	\
 	/* end */
 
 /**
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 04/10] drm/xe/sriov: Scheduler groups are incompatible with multi-lrc
  2025-11-27  1:45 ` [PATCH 04/10] drm/xe/sriov: Scheduler groups are incompatible with multi-lrc Daniele Ceraolo Spurio
@ 2025-12-02 13:32   ` Michal Wajdeczko
  2025-12-02 17:57     ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 44+ messages in thread
From: Michal Wajdeczko @ 2025-12-02 13:32 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-xe



On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
> Since engines in the same class can be divided across multiple groups,
> the GuC does not allow scheduler groups to be active if there are
> multi-lrc contexts. This means that:
> 
> 1) if a mlrc context is registered when we enable scheduler groups, the
>    GuC will silently ignore the configuration
> 2) if a mlrc context is registered after scheduler groups are enabled,
>    the GuC will disable the groups and generate an adverse event.
> 
> We therefore need to block mlrc context creation when scheduler groups
> are enabled.

s/mlrc/MLRC

> An adverse event threshold is available for the new adverse event.

changes related to introduction of new threshold deserves separate patch
(as we need to handle GuC FW version checks)

> 
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> ---
>  drivers/gpu/drm/xe/abi/guc_klvs_abi.h         | 14 +++++
>  drivers/gpu/drm/xe/xe_exec_queue.c            | 19 ++++++
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c    | 30 ++++++++++
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h    |  1 +
>  drivers/gpu/drm/xe/xe_gt_sriov_vf.c           | 59 +++++++++++++++++++
>  drivers/gpu/drm/xe/xe_gt_sriov_vf.h           |  3 +
>  drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h     |  2 +
>  .../drm/xe/xe_guc_klv_thresholds_set_types.h  |  1 +
>  8 files changed, 129 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
> index 274f1b1ec37f..a6dce9da339f 100644
> --- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
> +++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
> @@ -46,11 +46,18 @@
>   *      Refers to 32 bit architecture version as reported by the HW IP.
>   *      This key is supported on MTL+ platforms only.
>   *      Requires GuC ABI 1.2+.
> + *
> + * _`GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE` : 0x3001
> + *      Tells the driver whether scheduler groups are enabled or not.
> + *      Requres GuC ABI 1.26+

typo: Requires

and don't forget to update xe_guc_klv_key_to_string() with new KEY

>   */
>  
>  #define GUC_KLV_GLOBAL_CFG_GMD_ID_KEY			0x3000u
>  #define GUC_KLV_GLOBAL_CFG_GMD_ID_LEN			1u
>  
> +#define GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_KEY	0x3001u
> +#define GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_LEN	1u
> +
>  /**
>   * DOC: GuC Self Config KLVs
>   *
> @@ -369,6 +376,10 @@ enum  {
>   *      :1: NORMAL = schedule VF always, irrespective of whether it has work or not
>   *      :2: HIGH = schedule VF in the next time-slice after current active
>   *          time-slice completes if it has active work
> + *
> + * _`GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT` : 0x8A0D
> + *      This config sets the threshold for LRCA context registration when SRIOV

... threshold for _Multi_ LRCA context registrations ...

> + *      scheduler groups are enabled.

"This is allows PF to monitor VFs' behavior when EGS is enabled."

>   */
>  
>  #define GUC_KLV_VF_CFG_GGTT_START_KEY		0x0001
> @@ -427,6 +438,9 @@ enum  {
>  #define   GUC_SCHED_PRIORITY_NORMAL		1u
>  #define   GUC_SCHED_PRIORITY_HIGH		2u
>  
> +#define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_KEY	0x8a0d
> +#define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_LEN	1u
> +
>  /*
>   * Workaround keys:
>   */
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 8724f8de67e2..e59c41c913b4 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -16,6 +16,7 @@
>  #include "xe_dep_scheduler.h"
>  #include "xe_device.h"
>  #include "xe_gt.h"
> +#include "xe_gt_sriov_pf_policy.h"
>  #include "xe_gt_sriov_vf.h"
>  #include "xe_hw_engine_class_sysfs.h"
>  #include "xe_hw_engine_group.h"
> @@ -698,6 +699,17 @@ static u32 calc_validate_logical_mask(struct xe_device *xe,
>  	return return_mask;
>  }
>  
> +static bool has_sched_groups(struct xe_gt *gt)
> +{
> +	if (IS_SRIOV_PF(gt_to_xe(gt)) && xe_gt_sriov_pf_policy_sched_groups_enabled(gt))

hmm, usually we don't want core code to look so deeply into PF subcomponent

also, do we want to work in hybrid mode where one GT is using MLRC and other is not?

> +		return true;
> +> +	if (IS_SRIOV_VF(gt_to_xe(gt)) && xe_gt_sriov_vf_sched_groups_enabled(gt))
> +		return true;
> +
> +	return false;
> +}
> +
>  int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>  			       struct drm_file *file)
>  {
> @@ -790,6 +802,13 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>  			return -ENOENT;
>  		}
>  
> +		/* SRIOV sched groups are not compatible with multi-lrc */
> +		if (XE_IOCTL_DBG(xe, args->width > 1 && has_sched_groups(hwe->gt))) {
> +			up_read(&vm->lock);
> +			xe_vm_put(vm);
> +			return -EINVAL;
> +		}
> +
>  		q = xe_exec_queue_create(xe, vm, logical_mask,
>  					 args->width, hwe, flags,
>  					 args->extensions);
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
> index 48f250ae0d0d..c7f1ea8eb9c5 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
> @@ -8,6 +8,7 @@
>  #include <drm/drm_managed.h>
>  
>  #include "xe_bo.h"
> +#include "xe_exec_queue_types.h"
>  #include "xe_gt.h"
>  #include "xe_gt_sriov_pf_helpers.h"
>  #include "xe_gt_sriov_pf_policy.h"
> @@ -527,6 +528,24 @@ static int __pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>  				      masks, num_masks);
>  }
>  
> +static bool guc_has_mlrc_queue(struct xe_guc *guc)

this is all GuC stuff, so export it from xe_guc_submission.c

> +{
> +	struct xe_exec_queue *q;
> +	unsigned long index;
> +	bool found = false;
> +
> +	mutex_lock(&guc->submission_state.lock);

guard(mutex) ?

> +	xa_for_each(&guc->submission_state.exec_queue_lookup, index, q) {
> +		if (q->width > 1) {
> +			found = true;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&guc->submission_state.lock);
> +
> +	return found;

what if new MLRC is created right now?

maybe we should use xe_guard to lockdown one feature or the other?

> +}
> +
>  static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>  {
>  	int err;
> @@ -548,6 +567,12 @@ static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>  	if (xe_sriov_pf_num_vfs(gt_to_xe(gt)))
>  		return -EPERM;
>  
> +	/* The GuC silently ignores the setting if any mlrc contexts are registered */
> +	if ((mode != XE_SRIOV_SCHED_GROUPS_NONE) && guc_has_mlrc_queue(&gt->uc.guc)) {
> +		xe_gt_sriov_notice(gt, "can't enable sched groups with active mlrc queues\n");
> +		return -EPERM;
> +	}
> +
>  	err = __pf_provision_sched_groups(gt, mode);
>  	if (err)
>  		return err;
> @@ -600,6 +625,11 @@ int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value)
>  	return err;
>  }
>  

every public function needs to have kernel-doc

> +bool xe_gt_sriov_pf_policy_sched_groups_enabled(struct xe_gt *gt)
> +{
> +	return gt->sriov.pf.policy.guc.sched_groups.current_mode != XE_SRIOV_SCHED_GROUPS_NONE;
> +}
> +
>  static void pf_sanitize_guc_policies(struct xe_gt *gt)
>  {
>  	pf_sanitize_sched_if_idle(gt);
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
> index 36680996f2bd..89aa3af6cc7d 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
> @@ -18,6 +18,7 @@ bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
>  int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
>  u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
>  int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value);
> +bool xe_gt_sriov_pf_policy_sched_groups_enabled(struct xe_gt *gt);
>  
>  int xe_gt_sriov_pf_policy_init(struct xe_gt *gt);
>  void xe_gt_sriov_pf_policy_sanitize(struct xe_gt *gt);
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> index 4c73a077d314..7a180c947032 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> @@ -438,6 +438,30 @@ u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt)
>  	return value;
>  }
>  
> +static int query_vf_sched_groups(struct xe_gt *gt)
> +{
> +	struct xe_guc *guc = &gt->uc.guc;
> +	u32 value = 0;
> +	int err;
> +
> +	xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
> +
> +	if (MAKE_GUC_VER_STRUCT(gt->sriov.vf.guc_version) < MAKE_GUC_VER(1, 26, 0))
> +		return 0;
> +
> +	err = guc_action_query_single_klv32(guc,
> +					    GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_KEY,
> +					    &value);
> +	if (unlikely(err)) {
> +		xe_gt_sriov_err(gt, "Failed to obtain sched groups status (%pe)\n",
> +				ERR_PTR(err));
> +		return err;
> +	}
> +
> +	xe_gt_sriov_dbg(gt, "sched groups %s\n", value ? "enabled" : "disabled");

str_enabled_disabled(value)

> +	return value;
> +}
> +
>  static int vf_get_ggtt_info(struct xe_gt *gt)
>  {
>  	struct xe_tile *tile = gt_to_tile(gt);
> @@ -564,6 +588,21 @@ static void vf_cache_gmdid(struct xe_gt *gt)
>  	gt->sriov.vf.runtime.gmdid = xe_gt_sriov_vf_gmdid(gt);
>  }
>  
> +static int vf_cache_sched_groups_status(struct xe_gt *gt)
> +{
> +	int ret;
> +
> +	xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
> +
> +	ret = query_vf_sched_groups(gt);
> +	if (ret < 0)
> +		return ret;
> +
> +	gt->sriov.vf.runtime.uses_sched_groups = ret;
> +
> +	return 0;
> +}
> +
>  /**
>   * xe_gt_sriov_vf_query_config - Query SR-IOV config data over MMIO.
>   * @gt: the &xe_gt
> @@ -593,12 +632,32 @@ int xe_gt_sriov_vf_query_config(struct xe_gt *gt)
>  	if (unlikely(err))
>  		return err;
>  
> +	err = vf_cache_sched_groups_status(gt);
> +	if (unlikely(err))
> +		return err;
> +
>  	if (has_gmdid(xe))
>  		vf_cache_gmdid(gt);
>  
>  	return 0;
>  }
>  
> +/**
> + * xe_gt_sriov_vf_sched_groups_enabled - Check if PF has enabled sched groups

    * xe_gt_sriov_vf_sched_groups_enabled() - Check ...


> + * @gt: the &xe_gt
> + *
> + * This function is for VF use only.
> + *
> + * Return: true if shed groups were enabled, false otherwise.

typo: s/shed/scheduler

> + */
> +bool xe_gt_sriov_vf_sched_groups_enabled(struct xe_gt *gt)
> +{
> +	xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
> +	xe_gt_assert(gt, gt->sriov.vf.guc_version.major);
> +
> +	return gt->sriov.vf.runtime.uses_sched_groups;
> +}
> +
>  /**
>   * xe_gt_sriov_vf_guc_ids - VF GuC context IDs configuration.
>   * @gt: the &xe_gt
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
> index af40276790fa..2e1d34c0397f 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
> @@ -23,11 +23,14 @@ int xe_gt_sriov_vf_connect(struct xe_gt *gt);
>  int xe_gt_sriov_vf_query_runtime(struct xe_gt *gt);
>  void xe_gt_sriov_vf_migrated_event_handler(struct xe_gt *gt);
>  
> +bool xe_gt_sriov_vf_sched_groups_enabled(struct xe_gt *gt);

move it there [1]

> +
>  int xe_gt_sriov_vf_init_early(struct xe_gt *gt);
>  int xe_gt_sriov_vf_init(struct xe_gt *gt);
>  bool xe_gt_sriov_vf_recovery_pending(struct xe_gt *gt);
>  
>  u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt);
> +u32 xe_gt_sriov_vf_sched_groups(struct xe_gt *gt);

unused ?

>  u16 xe_gt_sriov_vf_guc_ids(struct xe_gt *gt);
>  u64 xe_gt_sriov_vf_lmem(struct xe_gt *gt);

[1] here

>  
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
> index 420b0e6089de..5267c097ecd0 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
> @@ -27,6 +27,8 @@ struct xe_gt_sriov_vf_selfconfig {
>  struct xe_gt_sriov_vf_runtime {
>  	/** @gmdid: cached value of the GDMID register. */
>  	u32 gmdid;
> +	/** @uses_sched_groups: whether PF enabled sched groups or not. */
> +	bool uses_sched_groups;
>  	/** @regs_size: size of runtime register array. */
>  	u32 regs_size;
>  	/** @num_regs: number of runtime registers in the array. */
> diff --git a/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h b/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
> index 0a028c94756d..3e55d9302855 100644
> --- a/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
> +++ b/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
> @@ -32,6 +32,7 @@
>  	define(H2G_STORM, guc_time_us)			\
>  	define(IRQ_STORM, irq_time_us)			\
>  	define(DOORBELL_STORM, doorbell_time_us)	\
> +	define(MULTI_LRC_COUNT, multi_lrc_count)	\

this needs to be defined with some version info, maybe:

	define(MULTI_LRC_COUNT, multi_lrc_count, (70, 53, 0))	\

and then in encode_config() have two variants of code generators:

#define encode_threshold_config2(TAG, ...) ({					\
	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_THRESHOLD_##TAG);			\
	cfg[n++] = config->thresholds[MAKE_XE_GUC_KLV_THRESHOLD_INDEX(TAG)];	\
});
#define encode_threshold_config3(TAG, NAME, VERSION) ({				\
	if (GUC_FIRMWARE_VER(&gt->uc.guc) >= MAKE_GUC_VER VERSION) {		\
		encode_threshold_config2(TAG, NAME);				\
	}									\
});
#define encode_threshold_config(ARGS...) \
	CALL_ARGS(CONCATENATE(encode_threshold_config, COUNT_ARGS(ARGS)), ARGS)

this should fix the issues already spotted by the CI on ADLP:

<6> [133.603624] xe 0000:00:02.0: [drm] Tile0: GT0: { key 0x8a0d : 32b value 0 } # multi_lrc_count
<6> [133.603625] xe 0000:00:02.0: [drm] Tile0: GT0: { key 0x0001 : 64b value 0x200000 } # ggtt_start
<6> [133.603626] xe 0000:00:02.0: [drm] Tile0: GT0: { key 0x0002 : 64b value 0xfec00000 } # ggtt_size
<3> [133.603653] xe 0000:00:02.0: [drm] *ERROR* PF: Tile0: GT0: Failed to push self configuration (-ENOKEY)


>  	/* end */
>  
>  /**


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 04/10] drm/xe/sriov: Scheduler groups are incompatible with multi-lrc
  2025-12-02 13:32   ` Michal Wajdeczko
@ 2025-12-02 17:57     ` Daniele Ceraolo Spurio
  2025-12-02 21:17       ` Michal Wajdeczko
  0 siblings, 1 reply; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-12-02 17:57 UTC (permalink / raw)
  To: Michal Wajdeczko, intel-xe



On 12/2/2025 5:32 AM, Michal Wajdeczko wrote:
>
> On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
>> Since engines in the same class can be divided across multiple groups,
>> the GuC does not allow scheduler groups to be active if there are
>> multi-lrc contexts. This means that:
>>
>> 1) if a mlrc context is registered when we enable scheduler groups, the
>>     GuC will silently ignore the configuration
>> 2) if a mlrc context is registered after scheduler groups are enabled,
>>     the GuC will disable the groups and generate an adverse event.
>>
>> We therefore need to block mlrc context creation when scheduler groups
>> are enabled.
> s/mlrc/MLRC
>
>> An adverse event threshold is available for the new adverse event.
> changes related to introduction of new threshold deserves separate patch
> (as we need to handle GuC FW version checks)

ok

>
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> ---
>>   drivers/gpu/drm/xe/abi/guc_klvs_abi.h         | 14 +++++
>>   drivers/gpu/drm/xe/xe_exec_queue.c            | 19 ++++++
>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c    | 30 ++++++++++
>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h    |  1 +
>>   drivers/gpu/drm/xe/xe_gt_sriov_vf.c           | 59 +++++++++++++++++++
>>   drivers/gpu/drm/xe/xe_gt_sriov_vf.h           |  3 +
>>   drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h     |  2 +
>>   .../drm/xe/xe_guc_klv_thresholds_set_types.h  |  1 +
>>   8 files changed, 129 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>> index 274f1b1ec37f..a6dce9da339f 100644
>> --- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>> +++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>> @@ -46,11 +46,18 @@
>>    *      Refers to 32 bit architecture version as reported by the HW IP.
>>    *      This key is supported on MTL+ platforms only.
>>    *      Requires GuC ABI 1.2+.
>> + *
>> + * _`GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE` : 0x3001
>> + *      Tells the driver whether scheduler groups are enabled or not.
>> + *      Requres GuC ABI 1.26+
> typo: Requires
>
> and don't forget to update xe_guc_klv_key_to_string() with new KEY

ok

>
>>    */
>>   
>>   #define GUC_KLV_GLOBAL_CFG_GMD_ID_KEY			0x3000u
>>   #define GUC_KLV_GLOBAL_CFG_GMD_ID_LEN			1u
>>   
>> +#define GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_KEY	0x3001u
>> +#define GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_LEN	1u
>> +
>>   /**
>>    * DOC: GuC Self Config KLVs
>>    *
>> @@ -369,6 +376,10 @@ enum  {
>>    *      :1: NORMAL = schedule VF always, irrespective of whether it has work or not
>>    *      :2: HIGH = schedule VF in the next time-slice after current active
>>    *          time-slice completes if it has active work
>> + *
>> + * _`GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT` : 0x8A0D
>> + *      This config sets the threshold for LRCA context registration when SRIOV
> ... threshold for _Multi_ LRCA context registrations ...

ooops :)

>
>> + *      scheduler groups are enabled.
> "This is allows PF to monitor VFs' behavior when EGS is enabled."
>
>>    */
>>   
>>   #define GUC_KLV_VF_CFG_GGTT_START_KEY		0x0001
>> @@ -427,6 +438,9 @@ enum  {
>>   #define   GUC_SCHED_PRIORITY_NORMAL		1u
>>   #define   GUC_SCHED_PRIORITY_HIGH		2u
>>   
>> +#define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_KEY	0x8a0d
>> +#define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_LEN	1u
>> +
>>   /*
>>    * Workaround keys:
>>    */
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>> index 8724f8de67e2..e59c41c913b4 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>> @@ -16,6 +16,7 @@
>>   #include "xe_dep_scheduler.h"
>>   #include "xe_device.h"
>>   #include "xe_gt.h"
>> +#include "xe_gt_sriov_pf_policy.h"
>>   #include "xe_gt_sriov_vf.h"
>>   #include "xe_hw_engine_class_sysfs.h"
>>   #include "xe_hw_engine_group.h"
>> @@ -698,6 +699,17 @@ static u32 calc_validate_logical_mask(struct xe_device *xe,
>>   	return return_mask;
>>   }
>>   
>> +static bool has_sched_groups(struct xe_gt *gt)
>> +{
>> +	if (IS_SRIOV_PF(gt_to_xe(gt)) && xe_gt_sriov_pf_policy_sched_groups_enabled(gt))
> hmm, usually we don't want core code to look so deeply into PF subcomponent

I thought about that, but I didn't know which sriov file would be the 
right one, since almost all of them are either pf-only or vf-only. 
should I just move it to xe_sriov.c?

> also, do we want to work in hybrid mode where one GT is using MLRC and other is not?

The GuCs allow it. I didn't want to block it in case we do get a 
scenario later where this is needed (e.g. we might get a split primary 
GT but still want to do mlrc on media)

>
>> +		return true;
>> +> +	if (IS_SRIOV_VF(gt_to_xe(gt)) && xe_gt_sriov_vf_sched_groups_enabled(gt))
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>>   int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>>   			       struct drm_file *file)
>>   {
>> @@ -790,6 +802,13 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>>   			return -ENOENT;
>>   		}
>>   
>> +		/* SRIOV sched groups are not compatible with multi-lrc */
>> +		if (XE_IOCTL_DBG(xe, args->width > 1 && has_sched_groups(hwe->gt))) {
>> +			up_read(&vm->lock);
>> +			xe_vm_put(vm);
>> +			return -EINVAL;
>> +		}
>> +
>>   		q = xe_exec_queue_create(xe, vm, logical_mask,
>>   					 args->width, hwe, flags,
>>   					 args->extensions);
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>> index 48f250ae0d0d..c7f1ea8eb9c5 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>> @@ -8,6 +8,7 @@
>>   #include <drm/drm_managed.h>
>>   
>>   #include "xe_bo.h"
>> +#include "xe_exec_queue_types.h"
>>   #include "xe_gt.h"
>>   #include "xe_gt_sriov_pf_helpers.h"
>>   #include "xe_gt_sriov_pf_policy.h"
>> @@ -527,6 +528,24 @@ static int __pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>>   				      masks, num_masks);
>>   }
>>   
>> +static bool guc_has_mlrc_queue(struct xe_guc *guc)
> this is all GuC stuff, so export it from xe_guc_submission.c

ok

>
>> +{
>> +	struct xe_exec_queue *q;
>> +	unsigned long index;
>> +	bool found = false;
>> +
>> +	mutex_lock(&guc->submission_state.lock);
> guard(mutex) ?
>
>> +	xa_for_each(&guc->submission_state.exec_queue_lookup, index, q) {
>> +		if (q->width > 1) {
>> +			found = true;
>> +			break;
>> +		}
>> +	}
>> +	mutex_unlock(&guc->submission_state.lock);
>> +
>> +	return found;
> what if new MLRC is created right now?
>
> maybe we should use xe_guard to lockdown one feature or the other?

The idea is that the admin is responsible for enabling EGS when the 
system is in the correct state, which is why the GuC doesn't return an 
error and just ignore the KLV if the system is not in the correct state. 
This check is there to catch the case where the admin has closed all 
apps in preparation to enabling EGS but the driver still hasn't 
processed all the context de-registrations. Therefore, I don't really 
want to over-complicate it with extra locking. I'll add a comment to 
better explain.

>
>> +}
>> +
>>   static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>>   {
>>   	int err;
>> @@ -548,6 +567,12 @@ static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>>   	if (xe_sriov_pf_num_vfs(gt_to_xe(gt)))
>>   		return -EPERM;
>>   
>> +	/* The GuC silently ignores the setting if any mlrc contexts are registered */
>> +	if ((mode != XE_SRIOV_SCHED_GROUPS_NONE) && guc_has_mlrc_queue(&gt->uc.guc)) {
>> +		xe_gt_sriov_notice(gt, "can't enable sched groups with active mlrc queues\n");
>> +		return -EPERM;
>> +	}
>> +
>>   	err = __pf_provision_sched_groups(gt, mode);
>>   	if (err)
>>   		return err;
>> @@ -600,6 +625,11 @@ int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value)
>>   	return err;
>>   }
>>   
> every public function needs to have kernel-doc

ok

>
>> +bool xe_gt_sriov_pf_policy_sched_groups_enabled(struct xe_gt *gt)
>> +{
>> +	return gt->sriov.pf.policy.guc.sched_groups.current_mode != XE_SRIOV_SCHED_GROUPS_NONE;
>> +}
>> +
>>   static void pf_sanitize_guc_policies(struct xe_gt *gt)
>>   {
>>   	pf_sanitize_sched_if_idle(gt);
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>> index 36680996f2bd..89aa3af6cc7d 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>> @@ -18,6 +18,7 @@ bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
>>   int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
>>   u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
>>   int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value);
>> +bool xe_gt_sriov_pf_policy_sched_groups_enabled(struct xe_gt *gt);
>>   
>>   int xe_gt_sriov_pf_policy_init(struct xe_gt *gt);
>>   void xe_gt_sriov_pf_policy_sanitize(struct xe_gt *gt);
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>> index 4c73a077d314..7a180c947032 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>> @@ -438,6 +438,30 @@ u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt)
>>   	return value;
>>   }
>>   
>> +static int query_vf_sched_groups(struct xe_gt *gt)
>> +{
>> +	struct xe_guc *guc = &gt->uc.guc;
>> +	u32 value = 0;
>> +	int err;
>> +
>> +	xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
>> +
>> +	if (MAKE_GUC_VER_STRUCT(gt->sriov.vf.guc_version) < MAKE_GUC_VER(1, 26, 0))
>> +		return 0;
>> +
>> +	err = guc_action_query_single_klv32(guc,
>> +					    GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_KEY,
>> +					    &value);
>> +	if (unlikely(err)) {
>> +		xe_gt_sriov_err(gt, "Failed to obtain sched groups status (%pe)\n",
>> +				ERR_PTR(err));
>> +		return err;
>> +	}
>> +
>> +	xe_gt_sriov_dbg(gt, "sched groups %s\n", value ? "enabled" : "disabled");
> str_enabled_disabled(value)
>
>> +	return value;
>> +}
>> +
>>   static int vf_get_ggtt_info(struct xe_gt *gt)
>>   {
>>   	struct xe_tile *tile = gt_to_tile(gt);
>> @@ -564,6 +588,21 @@ static void vf_cache_gmdid(struct xe_gt *gt)
>>   	gt->sriov.vf.runtime.gmdid = xe_gt_sriov_vf_gmdid(gt);
>>   }
>>   
>> +static int vf_cache_sched_groups_status(struct xe_gt *gt)
>> +{
>> +	int ret;
>> +
>> +	xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
>> +
>> +	ret = query_vf_sched_groups(gt);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	gt->sriov.vf.runtime.uses_sched_groups = ret;
>> +
>> +	return 0;
>> +}
>> +
>>   /**
>>    * xe_gt_sriov_vf_query_config - Query SR-IOV config data over MMIO.
>>    * @gt: the &xe_gt
>> @@ -593,12 +632,32 @@ int xe_gt_sriov_vf_query_config(struct xe_gt *gt)
>>   	if (unlikely(err))
>>   		return err;
>>   
>> +	err = vf_cache_sched_groups_status(gt);
>> +	if (unlikely(err))
>> +		return err;
>> +
>>   	if (has_gmdid(xe))
>>   		vf_cache_gmdid(gt);
>>   
>>   	return 0;
>>   }
>>   
>> +/**
>> + * xe_gt_sriov_vf_sched_groups_enabled - Check if PF has enabled sched groups
>      * xe_gt_sriov_vf_sched_groups_enabled() - Check ...
>
>
>> + * @gt: the &xe_gt
>> + *
>> + * This function is for VF use only.
>> + *
>> + * Return: true if shed groups were enabled, false otherwise.
> typo: s/shed/scheduler
>
>> + */
>> +bool xe_gt_sriov_vf_sched_groups_enabled(struct xe_gt *gt)
>> +{
>> +	xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
>> +	xe_gt_assert(gt, gt->sriov.vf.guc_version.major);
>> +
>> +	return gt->sriov.vf.runtime.uses_sched_groups;
>> +}
>> +
>>   /**
>>    * xe_gt_sriov_vf_guc_ids - VF GuC context IDs configuration.
>>    * @gt: the &xe_gt
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>> index af40276790fa..2e1d34c0397f 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>> @@ -23,11 +23,14 @@ int xe_gt_sriov_vf_connect(struct xe_gt *gt);
>>   int xe_gt_sriov_vf_query_runtime(struct xe_gt *gt);
>>   void xe_gt_sriov_vf_migrated_event_handler(struct xe_gt *gt);
>>   
>> +bool xe_gt_sriov_vf_sched_groups_enabled(struct xe_gt *gt);
> move it there [1]
>
>> +
>>   int xe_gt_sriov_vf_init_early(struct xe_gt *gt);
>>   int xe_gt_sriov_vf_init(struct xe_gt *gt);
>>   bool xe_gt_sriov_vf_recovery_pending(struct xe_gt *gt);
>>   
>>   u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt);
>> +u32 xe_gt_sriov_vf_sched_groups(struct xe_gt *gt);
> unused ?

oops, old function name that I forgot to remove.

>
>>   u16 xe_gt_sriov_vf_guc_ids(struct xe_gt *gt);
>>   u64 xe_gt_sriov_vf_lmem(struct xe_gt *gt);
> [1] here
>
>>   
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
>> index 420b0e6089de..5267c097ecd0 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
>> @@ -27,6 +27,8 @@ struct xe_gt_sriov_vf_selfconfig {
>>   struct xe_gt_sriov_vf_runtime {
>>   	/** @gmdid: cached value of the GDMID register. */
>>   	u32 gmdid;
>> +	/** @uses_sched_groups: whether PF enabled sched groups or not. */
>> +	bool uses_sched_groups;
>>   	/** @regs_size: size of runtime register array. */
>>   	u32 regs_size;
>>   	/** @num_regs: number of runtime registers in the array. */
>> diff --git a/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h b/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
>> index 0a028c94756d..3e55d9302855 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
>> +++ b/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
>> @@ -32,6 +32,7 @@
>>   	define(H2G_STORM, guc_time_us)			\
>>   	define(IRQ_STORM, irq_time_us)			\
>>   	define(DOORBELL_STORM, doorbell_time_us)	\
>> +	define(MULTI_LRC_COUNT, multi_lrc_count)	\
> this needs to be defined with some version info, maybe:
>
> 	define(MULTI_LRC_COUNT, multi_lrc_count, (70, 53, 0))	\
>
> and then in encode_config() have two variants of code generators:
>
> #define encode_threshold_config2(TAG, ...) ({					\
> 	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_THRESHOLD_##TAG);			\
> 	cfg[n++] = config->thresholds[MAKE_XE_GUC_KLV_THRESHOLD_INDEX(TAG)];	\
> });
> #define encode_threshold_config3(TAG, NAME, VERSION) ({				\
> 	if (GUC_FIRMWARE_VER(&gt->uc.guc) >= MAKE_GUC_VER VERSION) {		\
> 		encode_threshold_config2(TAG, NAME);				\
> 	}									\
> });
> #define encode_threshold_config(ARGS...) \
> 	CALL_ARGS(CONCATENATE(encode_threshold_config, COUNT_ARGS(ARGS)), ARGS)
>
> this should fix the issues already spotted by the CI on ADLP:
>
> <6> [133.603624] xe 0000:00:02.0: [drm] Tile0: GT0: { key 0x8a0d : 32b value 0 } # multi_lrc_count
> <6> [133.603625] xe 0000:00:02.0: [drm] Tile0: GT0: { key 0x0001 : 64b value 0x200000 } # ggtt_start
> <6> [133.603626] xe 0000:00:02.0: [drm] Tile0: GT0: { key 0x0002 : 64b value 0xfec00000 } # ggtt_size
> <3> [133.603653] xe 0000:00:02.0: [drm] *ERROR* PF: Tile0: GT0: Failed to push self configuration (-ENOKEY)

I think it works better if we add the version for all thresholds, so we 
don't have to define multiple macros every time we need to check the GuC 
version (which I believe needs to also be done at least in 
register_threshold_attribute() and define_threshold_key_to_provision_case().
I can just use 70.29.2 (which is the minimum supported version for the 
Xe driver) for all the older defines.

Daniele

>
>
>>   	/* end */
>>   
>>   /**


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 04/10] drm/xe/sriov: Scheduler groups are incompatible with multi-lrc
  2025-12-02 17:57     ` Daniele Ceraolo Spurio
@ 2025-12-02 21:17       ` Michal Wajdeczko
  2025-12-02 21:25         ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 44+ messages in thread
From: Michal Wajdeczko @ 2025-12-02 21:17 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-xe



On 12/2/2025 6:57 PM, Daniele Ceraolo Spurio wrote:
> 
> 
> On 12/2/2025 5:32 AM, Michal Wajdeczko wrote:
>>
>> On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
>>> Since engines in the same class can be divided across multiple groups,
>>> the GuC does not allow scheduler groups to be active if there are
>>> multi-lrc contexts. This means that:
>>>
>>> 1) if a mlrc context is registered when we enable scheduler groups, the
>>>     GuC will silently ignore the configuration
>>> 2) if a mlrc context is registered after scheduler groups are enabled,
>>>     the GuC will disable the groups and generate an adverse event.
>>>
>>> We therefore need to block mlrc context creation when scheduler groups
>>> are enabled.
>> s/mlrc/MLRC
>>
>>> An adverse event threshold is available for the new adverse event.
>> changes related to introduction of new threshold deserves separate patch
>> (as we need to handle GuC FW version checks)
> 
> ok
> 
>>
>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>> ---
>>>   drivers/gpu/drm/xe/abi/guc_klvs_abi.h         | 14 +++++
>>>   drivers/gpu/drm/xe/xe_exec_queue.c            | 19 ++++++
>>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c    | 30 ++++++++++
>>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h    |  1 +
>>>   drivers/gpu/drm/xe/xe_gt_sriov_vf.c           | 59 +++++++++++++++++++
>>>   drivers/gpu/drm/xe/xe_gt_sriov_vf.h           |  3 +
>>>   drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h     |  2 +
>>>   .../drm/xe/xe_guc_klv_thresholds_set_types.h  |  1 +
>>>   8 files changed, 129 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>>> index 274f1b1ec37f..a6dce9da339f 100644
>>> --- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>>> +++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>>> @@ -46,11 +46,18 @@
>>>    *      Refers to 32 bit architecture version as reported by the HW IP.
>>>    *      This key is supported on MTL+ platforms only.
>>>    *      Requires GuC ABI 1.2+.
>>> + *
>>> + * _`GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE` : 0x3001
>>> + *      Tells the driver whether scheduler groups are enabled or not.
>>> + *      Requres GuC ABI 1.26+
>> typo: Requires
>>
>> and don't forget to update xe_guc_klv_key_to_string() with new KEY
> 
> ok
> 
>>
>>>    */
>>>     #define GUC_KLV_GLOBAL_CFG_GMD_ID_KEY            0x3000u
>>>   #define GUC_KLV_GLOBAL_CFG_GMD_ID_LEN            1u
>>>   +#define GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_KEY    0x3001u
>>> +#define GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_LEN    1u
>>> +
>>>   /**
>>>    * DOC: GuC Self Config KLVs
>>>    *
>>> @@ -369,6 +376,10 @@ enum  {
>>>    *      :1: NORMAL = schedule VF always, irrespective of whether it has work or not
>>>    *      :2: HIGH = schedule VF in the next time-slice after current active
>>>    *          time-slice completes if it has active work
>>> + *
>>> + * _`GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT` : 0x8A0D
>>> + *      This config sets the threshold for LRCA context registration when SRIOV
>> ... threshold for _Multi_ LRCA context registrations ...
> 
> ooops :)
> 
>>
>>> + *      scheduler groups are enabled.
>> "This is allows PF to monitor VFs' behavior when EGS is enabled."
>>
>>>    */
>>>     #define GUC_KLV_VF_CFG_GGTT_START_KEY        0x0001
>>> @@ -427,6 +438,9 @@ enum  {
>>>   #define   GUC_SCHED_PRIORITY_NORMAL        1u
>>>   #define   GUC_SCHED_PRIORITY_HIGH        2u
>>>   +#define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_KEY    0x8a0d
>>> +#define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_LEN    1u
>>> +
>>>   /*
>>>    * Workaround keys:
>>>    */
>>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>>> index 8724f8de67e2..e59c41c913b4 100644
>>> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
>>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>>> @@ -16,6 +16,7 @@
>>>   #include "xe_dep_scheduler.h"
>>>   #include "xe_device.h"
>>>   #include "xe_gt.h"
>>> +#include "xe_gt_sriov_pf_policy.h"
>>>   #include "xe_gt_sriov_vf.h"
>>>   #include "xe_hw_engine_class_sysfs.h"
>>>   #include "xe_hw_engine_group.h"
>>> @@ -698,6 +699,17 @@ static u32 calc_validate_logical_mask(struct xe_device *xe,
>>>       return return_mask;
>>>   }
>>>   +static bool has_sched_groups(struct xe_gt *gt)
>>> +{
>>> +    if (IS_SRIOV_PF(gt_to_xe(gt)) && xe_gt_sriov_pf_policy_sched_groups_enabled(gt))
>> hmm, usually we don't want core code to look so deeply into PF subcomponent
> 
> I thought about that, but I didn't know which sriov file would be the right one, since almost all of them are either pf-only or vf-only. should I just move it to xe_sriov.c?

hmm, but xe_sriov.c is device level function

we might add helper to xe_gt_sriov_pf.h where we can also provide stub there:

#ifdef CONFIG_PCI_IOV
bool xe_gt_sriov_pf_sched_groups_enabled(gt);
#else
static inline bool xe_gt_sriov_pf_sched_groups_enabled(gt) { return false; }
#endif


> 
>> also, do we want to work in hybrid mode where one GT is using MLRC and other is not?
> 
> The GuCs allow it. I didn't want to block it in case we do get a scenario later where this is needed (e.g. we might get a split primary GT but still want to do mlrc on media)
> 
>>
>>> +        return true;
>>> +> +    if (IS_SRIOV_VF(gt_to_xe(gt)) && xe_gt_sriov_vf_sched_groups_enabled(gt))
>>> +        return true;
>>> +
>>> +    return false;
>>> +}
>>> +
>>>   int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>>>                      struct drm_file *file)
>>>   {
>>> @@ -790,6 +802,13 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>>>               return -ENOENT;
>>>           }
>>>   +        /* SRIOV sched groups are not compatible with multi-lrc */
>>> +        if (XE_IOCTL_DBG(xe, args->width > 1 && has_sched_groups(hwe->gt))) {
>>> +            up_read(&vm->lock);
>>> +            xe_vm_put(vm);
>>> +            return -EINVAL;
>>> +        }
>>> +
>>>           q = xe_exec_queue_create(xe, vm, logical_mask,
>>>                        args->width, hwe, flags,
>>>                        args->extensions);
>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>> index 48f250ae0d0d..c7f1ea8eb9c5 100644
>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>> @@ -8,6 +8,7 @@
>>>   #include <drm/drm_managed.h>
>>>     #include "xe_bo.h"
>>> +#include "xe_exec_queue_types.h"
>>>   #include "xe_gt.h"
>>>   #include "xe_gt_sriov_pf_helpers.h"
>>>   #include "xe_gt_sriov_pf_policy.h"
>>> @@ -527,6 +528,24 @@ static int __pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>>>                         masks, num_masks);
>>>   }
>>>   +static bool guc_has_mlrc_queue(struct xe_guc *guc)
>> this is all GuC stuff, so export it from xe_guc_submission.c
> 
> ok
> 
>>
>>> +{
>>> +    struct xe_exec_queue *q;
>>> +    unsigned long index;
>>> +    bool found = false;
>>> +
>>> +    mutex_lock(&guc->submission_state.lock);
>> guard(mutex) ?
>>
>>> +    xa_for_each(&guc->submission_state.exec_queue_lookup, index, q) {
>>> +        if (q->width > 1) {
>>> +            found = true;
>>> +            break;
>>> +        }
>>> +    }
>>> +    mutex_unlock(&guc->submission_state.lock);
>>> +
>>> +    return found;
>> what if new MLRC is created right now?
>>
>> maybe we should use xe_guard to lockdown one feature or the other?
> 
> The idea is that the admin is responsible for enabling EGS when the system is in the correct state, which is why the GuC doesn't return an error and just ignore the KLV if the system is not in the correct state. This check is there to catch the case where the admin has closed all apps in preparation to enabling EGS but the driver still hasn't processed all the context de-registrations. Therefore, I don't really want to over-complicate it with extra locking. I'll add a comment to better explain.

so it doesn't have to be bullet-proof ?

> 
>>
>>> +}
>>> +
>>>   static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>>>   {
>>>       int err;
>>> @@ -548,6 +567,12 @@ static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>>>       if (xe_sriov_pf_num_vfs(gt_to_xe(gt)))
>>>           return -EPERM;
>>>   +    /* The GuC silently ignores the setting if any mlrc contexts are registered */
>>> +    if ((mode != XE_SRIOV_SCHED_GROUPS_NONE) && guc_has_mlrc_queue(&gt->uc.guc)) {
>>> +        xe_gt_sriov_notice(gt, "can't enable sched groups with active mlrc queues\n");
>>> +        return -EPERM;
>>> +    }
>>> +
>>>       err = __pf_provision_sched_groups(gt, mode);
>>>       if (err)
>>>           return err;
>>> @@ -600,6 +625,11 @@ int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value)
>>>       return err;
>>>   }
>>>   
>> every public function needs to have kernel-doc
> 
> ok
> 
>>
>>> +bool xe_gt_sriov_pf_policy_sched_groups_enabled(struct xe_gt *gt)
>>> +{
>>> +    return gt->sriov.pf.policy.guc.sched_groups.current_mode != XE_SRIOV_SCHED_GROUPS_NONE;
>>> +}
>>> +
>>>   static void pf_sanitize_guc_policies(struct xe_gt *gt)
>>>   {
>>>       pf_sanitize_sched_if_idle(gt);
>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>> index 36680996f2bd..89aa3af6cc7d 100644
>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>> @@ -18,6 +18,7 @@ bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
>>>   int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
>>>   u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
>>>   int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value);
>>> +bool xe_gt_sriov_pf_policy_sched_groups_enabled(struct xe_gt *gt);
>>>     int xe_gt_sriov_pf_policy_init(struct xe_gt *gt);
>>>   void xe_gt_sriov_pf_policy_sanitize(struct xe_gt *gt);
>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>>> index 4c73a077d314..7a180c947032 100644
>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>>> @@ -438,6 +438,30 @@ u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt)
>>>       return value;
>>>   }
>>>   +static int query_vf_sched_groups(struct xe_gt *gt)
>>> +{
>>> +    struct xe_guc *guc = &gt->uc.guc;
>>> +    u32 value = 0;
>>> +    int err;
>>> +
>>> +    xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
>>> +
>>> +    if (MAKE_GUC_VER_STRUCT(gt->sriov.vf.guc_version) < MAKE_GUC_VER(1, 26, 0))
>>> +        return 0;
>>> +
>>> +    err = guc_action_query_single_klv32(guc,
>>> +                        GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_KEY,
>>> +                        &value);
>>> +    if (unlikely(err)) {
>>> +        xe_gt_sriov_err(gt, "Failed to obtain sched groups status (%pe)\n",
>>> +                ERR_PTR(err));
>>> +        return err;
>>> +    }
>>> +
>>> +    xe_gt_sriov_dbg(gt, "sched groups %s\n", value ? "enabled" : "disabled");
>> str_enabled_disabled(value)
>>
>>> +    return value;
>>> +}
>>> +
>>>   static int vf_get_ggtt_info(struct xe_gt *gt)
>>>   {
>>>       struct xe_tile *tile = gt_to_tile(gt);
>>> @@ -564,6 +588,21 @@ static void vf_cache_gmdid(struct xe_gt *gt)
>>>       gt->sriov.vf.runtime.gmdid = xe_gt_sriov_vf_gmdid(gt);
>>>   }
>>>   +static int vf_cache_sched_groups_status(struct xe_gt *gt)
>>> +{
>>> +    int ret;
>>> +
>>> +    xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
>>> +
>>> +    ret = query_vf_sched_groups(gt);
>>> +    if (ret < 0)
>>> +        return ret;
>>> +
>>> +    gt->sriov.vf.runtime.uses_sched_groups = ret;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>   /**
>>>    * xe_gt_sriov_vf_query_config - Query SR-IOV config data over MMIO.
>>>    * @gt: the &xe_gt
>>> @@ -593,12 +632,32 @@ int xe_gt_sriov_vf_query_config(struct xe_gt *gt)
>>>       if (unlikely(err))
>>>           return err;
>>>   +    err = vf_cache_sched_groups_status(gt);
>>> +    if (unlikely(err))
>>> +        return err;
>>> +
>>>       if (has_gmdid(xe))
>>>           vf_cache_gmdid(gt);
>>>         return 0;
>>>   }
>>>   +/**
>>> + * xe_gt_sriov_vf_sched_groups_enabled - Check if PF has enabled sched groups
>>      * xe_gt_sriov_vf_sched_groups_enabled() - Check ...
>>
>>
>>> + * @gt: the &xe_gt
>>> + *
>>> + * This function is for VF use only.
>>> + *
>>> + * Return: true if shed groups were enabled, false otherwise.
>> typo: s/shed/scheduler
>>
>>> + */
>>> +bool xe_gt_sriov_vf_sched_groups_enabled(struct xe_gt *gt)
>>> +{
>>> +    xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
>>> +    xe_gt_assert(gt, gt->sriov.vf.guc_version.major);
>>> +
>>> +    return gt->sriov.vf.runtime.uses_sched_groups;
>>> +}
>>> +
>>>   /**
>>>    * xe_gt_sriov_vf_guc_ids - VF GuC context IDs configuration.
>>>    * @gt: the &xe_gt
>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>>> index af40276790fa..2e1d34c0397f 100644
>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>>> @@ -23,11 +23,14 @@ int xe_gt_sriov_vf_connect(struct xe_gt *gt);
>>>   int xe_gt_sriov_vf_query_runtime(struct xe_gt *gt);
>>>   void xe_gt_sriov_vf_migrated_event_handler(struct xe_gt *gt);
>>>   +bool xe_gt_sriov_vf_sched_groups_enabled(struct xe_gt *gt);
>> move it there [1]
>>
>>> +
>>>   int xe_gt_sriov_vf_init_early(struct xe_gt *gt);
>>>   int xe_gt_sriov_vf_init(struct xe_gt *gt);
>>>   bool xe_gt_sriov_vf_recovery_pending(struct xe_gt *gt);
>>>     u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt);
>>> +u32 xe_gt_sriov_vf_sched_groups(struct xe_gt *gt);
>> unused ?
> 
> oops, old function name that I forgot to remove.
> 
>>
>>>   u16 xe_gt_sriov_vf_guc_ids(struct xe_gt *gt);
>>>   u64 xe_gt_sriov_vf_lmem(struct xe_gt *gt);
>> [1] here
>>
>>>   diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
>>> index 420b0e6089de..5267c097ecd0 100644
>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
>>> @@ -27,6 +27,8 @@ struct xe_gt_sriov_vf_selfconfig {
>>>   struct xe_gt_sriov_vf_runtime {
>>>       /** @gmdid: cached value of the GDMID register. */
>>>       u32 gmdid;
>>> +    /** @uses_sched_groups: whether PF enabled sched groups or not. */
>>> +    bool uses_sched_groups;
>>>       /** @regs_size: size of runtime register array. */
>>>       u32 regs_size;
>>>       /** @num_regs: number of runtime registers in the array. */
>>> diff --git a/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h b/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
>>> index 0a028c94756d..3e55d9302855 100644
>>> --- a/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
>>> +++ b/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
>>> @@ -32,6 +32,7 @@
>>>       define(H2G_STORM, guc_time_us)            \
>>>       define(IRQ_STORM, irq_time_us)            \
>>>       define(DOORBELL_STORM, doorbell_time_us)    \
>>> +    define(MULTI_LRC_COUNT, multi_lrc_count)    \
>> this needs to be defined with some version info, maybe:
>>
>>     define(MULTI_LRC_COUNT, multi_lrc_count, (70, 53, 0))    \
>>
>> and then in encode_config() have two variants of code generators:
>>
>> #define encode_threshold_config2(TAG, ...) ({                    \
>>     cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_THRESHOLD_##TAG);            \
>>     cfg[n++] = config->thresholds[MAKE_XE_GUC_KLV_THRESHOLD_INDEX(TAG)];    \
>> });
>> #define encode_threshold_config3(TAG, NAME, VERSION) ({                \
>>     if (GUC_FIRMWARE_VER(&gt->uc.guc) >= MAKE_GUC_VER VERSION) {        \
>>         encode_threshold_config2(TAG, NAME);                \
>>     }                                    \
>> });
>> #define encode_threshold_config(ARGS...) \
>>     CALL_ARGS(CONCATENATE(encode_threshold_config, COUNT_ARGS(ARGS)), ARGS)
>>
>> this should fix the issues already spotted by the CI on ADLP:
>>
>> <6> [133.603624] xe 0000:00:02.0: [drm] Tile0: GT0: { key 0x8a0d : 32b value 0 } # multi_lrc_count
>> <6> [133.603625] xe 0000:00:02.0: [drm] Tile0: GT0: { key 0x0001 : 64b value 0x200000 } # ggtt_start
>> <6> [133.603626] xe 0000:00:02.0: [drm] Tile0: GT0: { key 0x0002 : 64b value 0xfec00000 } # ggtt_size
>> <3> [133.603653] xe 0000:00:02.0: [drm] *ERROR* PF: Tile0: GT0: Failed to push self configuration (-ENOKEY)
> 
> I think it works better if we add the version for all thresholds, so we don't have to define multiple macros every time we need to check the GuC version (which I believe needs to also be done at least in register_threshold_attribute() and define_threshold_key_to_provision_case().
> I can just use 70.29.2 (which is the minimum supported version for the Xe driver) for all the older defines.

but then the code for version check will be generated for all thresholds, even those that we are sure are supported

I will try to find something simpler, or more reusable, but no promise

> 
> Daniele
> 
>>
>>
>>>       /* end */
>>>     /**
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 04/10] drm/xe/sriov: Scheduler groups are incompatible with multi-lrc
  2025-12-02 21:17       ` Michal Wajdeczko
@ 2025-12-02 21:25         ` Daniele Ceraolo Spurio
  2025-12-02 21:37           ` Michal Wajdeczko
  0 siblings, 1 reply; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-12-02 21:25 UTC (permalink / raw)
  To: Michal Wajdeczko, intel-xe



On 12/2/2025 1:17 PM, Michal Wajdeczko wrote:
>
> On 12/2/2025 6:57 PM, Daniele Ceraolo Spurio wrote:
>>
>> On 12/2/2025 5:32 AM, Michal Wajdeczko wrote:
>>> On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
>>>> Since engines in the same class can be divided across multiple groups,
>>>> the GuC does not allow scheduler groups to be active if there are
>>>> multi-lrc contexts. This means that:
>>>>
>>>> 1) if a mlrc context is registered when we enable scheduler groups, the
>>>>      GuC will silently ignore the configuration
>>>> 2) if a mlrc context is registered after scheduler groups are enabled,
>>>>      the GuC will disable the groups and generate an adverse event.
>>>>
>>>> We therefore need to block mlrc context creation when scheduler groups
>>>> are enabled.
>>> s/mlrc/MLRC
>>>
>>>> An adverse event threshold is available for the new adverse event.
>>> changes related to introduction of new threshold deserves separate patch
>>> (as we need to handle GuC FW version checks)
>> ok
>>
>>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/xe/abi/guc_klvs_abi.h         | 14 +++++
>>>>    drivers/gpu/drm/xe/xe_exec_queue.c            | 19 ++++++
>>>>    drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c    | 30 ++++++++++
>>>>    drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h    |  1 +
>>>>    drivers/gpu/drm/xe/xe_gt_sriov_vf.c           | 59 +++++++++++++++++++
>>>>    drivers/gpu/drm/xe/xe_gt_sriov_vf.h           |  3 +
>>>>    drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h     |  2 +
>>>>    .../drm/xe/xe_guc_klv_thresholds_set_types.h  |  1 +
>>>>    8 files changed, 129 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>>>> index 274f1b1ec37f..a6dce9da339f 100644
>>>> --- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>>>> +++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>>>> @@ -46,11 +46,18 @@
>>>>     *      Refers to 32 bit architecture version as reported by the HW IP.
>>>>     *      This key is supported on MTL+ platforms only.
>>>>     *      Requires GuC ABI 1.2+.
>>>> + *
>>>> + * _`GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE` : 0x3001
>>>> + *      Tells the driver whether scheduler groups are enabled or not.
>>>> + *      Requres GuC ABI 1.26+
>>> typo: Requires
>>>
>>> and don't forget to update xe_guc_klv_key_to_string() with new KEY
>> ok
>>
>>>>     */
>>>>      #define GUC_KLV_GLOBAL_CFG_GMD_ID_KEY            0x3000u
>>>>    #define GUC_KLV_GLOBAL_CFG_GMD_ID_LEN            1u
>>>>    +#define GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_KEY    0x3001u
>>>> +#define GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_LEN    1u
>>>> +
>>>>    /**
>>>>     * DOC: GuC Self Config KLVs
>>>>     *
>>>> @@ -369,6 +376,10 @@ enum  {
>>>>     *      :1: NORMAL = schedule VF always, irrespective of whether it has work or not
>>>>     *      :2: HIGH = schedule VF in the next time-slice after current active
>>>>     *          time-slice completes if it has active work
>>>> + *
>>>> + * _`GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT` : 0x8A0D
>>>> + *      This config sets the threshold for LRCA context registration when SRIOV
>>> ... threshold for _Multi_ LRCA context registrations ...
>> ooops :)
>>
>>>> + *      scheduler groups are enabled.
>>> "This is allows PF to monitor VFs' behavior when EGS is enabled."
>>>
>>>>     */
>>>>      #define GUC_KLV_VF_CFG_GGTT_START_KEY        0x0001
>>>> @@ -427,6 +438,9 @@ enum  {
>>>>    #define   GUC_SCHED_PRIORITY_NORMAL        1u
>>>>    #define   GUC_SCHED_PRIORITY_HIGH        2u
>>>>    +#define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_KEY    0x8a0d
>>>> +#define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_LEN    1u
>>>> +
>>>>    /*
>>>>     * Workaround keys:
>>>>     */
>>>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>>>> index 8724f8de67e2..e59c41c913b4 100644
>>>> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
>>>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>>>> @@ -16,6 +16,7 @@
>>>>    #include "xe_dep_scheduler.h"
>>>>    #include "xe_device.h"
>>>>    #include "xe_gt.h"
>>>> +#include "xe_gt_sriov_pf_policy.h"
>>>>    #include "xe_gt_sriov_vf.h"
>>>>    #include "xe_hw_engine_class_sysfs.h"
>>>>    #include "xe_hw_engine_group.h"
>>>> @@ -698,6 +699,17 @@ static u32 calc_validate_logical_mask(struct xe_device *xe,
>>>>        return return_mask;
>>>>    }
>>>>    +static bool has_sched_groups(struct xe_gt *gt)
>>>> +{
>>>> +    if (IS_SRIOV_PF(gt_to_xe(gt)) && xe_gt_sriov_pf_policy_sched_groups_enabled(gt))
>>> hmm, usually we don't want core code to look so deeply into PF subcomponent
>> I thought about that, but I didn't know which sriov file would be the right one, since almost all of them are either pf-only or vf-only. should I just move it to xe_sriov.c?
> hmm, but xe_sriov.c is device level function
>
> we might add helper to xe_gt_sriov_pf.h where we can also provide stub there:
>
> #ifdef CONFIG_PCI_IOV
> bool xe_gt_sriov_pf_sched_groups_enabled(gt);
> #else
> static inline bool xe_gt_sriov_pf_sched_groups_enabled(gt) { return false; }
> #endif

Do you mean to check for both PF and VF from that function, and just 
have it in the PF file for convenience?
I could also just create a new xe_gt_sriov.h file and place it there, 
making the function an inline.

>
>>> also, do we want to work in hybrid mode where one GT is using MLRC and other is not?
>> The GuCs allow it. I didn't want to block it in case we do get a scenario later where this is needed (e.g. we might get a split primary GT but still want to do mlrc on media)
>>
>>>> +        return true;
>>>> +> +    if (IS_SRIOV_VF(gt_to_xe(gt)) && xe_gt_sriov_vf_sched_groups_enabled(gt))
>>>> +        return true;
>>>> +
>>>> +    return false;
>>>> +}
>>>> +
>>>>    int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>>>>                       struct drm_file *file)
>>>>    {
>>>> @@ -790,6 +802,13 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>>>>                return -ENOENT;
>>>>            }
>>>>    +        /* SRIOV sched groups are not compatible with multi-lrc */
>>>> +        if (XE_IOCTL_DBG(xe, args->width > 1 && has_sched_groups(hwe->gt))) {
>>>> +            up_read(&vm->lock);
>>>> +            xe_vm_put(vm);
>>>> +            return -EINVAL;
>>>> +        }
>>>> +
>>>>            q = xe_exec_queue_create(xe, vm, logical_mask,
>>>>                         args->width, hwe, flags,
>>>>                         args->extensions);
>>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>>> index 48f250ae0d0d..c7f1ea8eb9c5 100644
>>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>>> @@ -8,6 +8,7 @@
>>>>    #include <drm/drm_managed.h>
>>>>      #include "xe_bo.h"
>>>> +#include "xe_exec_queue_types.h"
>>>>    #include "xe_gt.h"
>>>>    #include "xe_gt_sriov_pf_helpers.h"
>>>>    #include "xe_gt_sriov_pf_policy.h"
>>>> @@ -527,6 +528,24 @@ static int __pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>>>>                          masks, num_masks);
>>>>    }
>>>>    +static bool guc_has_mlrc_queue(struct xe_guc *guc)
>>> this is all GuC stuff, so export it from xe_guc_submission.c
>> ok
>>
>>>> +{
>>>> +    struct xe_exec_queue *q;
>>>> +    unsigned long index;
>>>> +    bool found = false;
>>>> +
>>>> +    mutex_lock(&guc->submission_state.lock);
>>> guard(mutex) ?
>>>
>>>> +    xa_for_each(&guc->submission_state.exec_queue_lookup, index, q) {
>>>> +        if (q->width > 1) {
>>>> +            found = true;
>>>> +            break;
>>>> +        }
>>>> +    }
>>>> +    mutex_unlock(&guc->submission_state.lock);
>>>> +
>>>> +    return found;
>>> what if new MLRC is created right now?
>>>
>>> maybe we should use xe_guard to lockdown one feature or the other?
>> The idea is that the admin is responsible for enabling EGS when the system is in the correct state, which is why the GuC doesn't return an error and just ignore the KLV if the system is not in the correct state. This check is there to catch the case where the admin has closed all apps in preparation to enabling EGS but the driver still hasn't processed all the context de-registrations. Therefore, I don't really want to over-complicate it with extra locking. I'll add a comment to better explain.
> so it doesn't have to be bullet-proof ?

Yup. If we had stricter requirements then the GuC would enforce them.

Daniele

>
>>>> +}
>>>> +
>>>>    static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>>>>    {
>>>>        int err;
>>>> @@ -548,6 +567,12 @@ static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>>>>        if (xe_sriov_pf_num_vfs(gt_to_xe(gt)))
>>>>            return -EPERM;
>>>>    +    /* The GuC silently ignores the setting if any mlrc contexts are registered */
>>>> +    if ((mode != XE_SRIOV_SCHED_GROUPS_NONE) && guc_has_mlrc_queue(&gt->uc.guc)) {
>>>> +        xe_gt_sriov_notice(gt, "can't enable sched groups with active mlrc queues\n");
>>>> +        return -EPERM;
>>>> +    }
>>>> +
>>>>        err = __pf_provision_sched_groups(gt, mode);
>>>>        if (err)
>>>>            return err;
>>>> @@ -600,6 +625,11 @@ int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value)
>>>>        return err;
>>>>    }
>>>>    
>>> every public function needs to have kernel-doc
>> ok
>>
>>>> +bool xe_gt_sriov_pf_policy_sched_groups_enabled(struct xe_gt *gt)
>>>> +{
>>>> +    return gt->sriov.pf.policy.guc.sched_groups.current_mode != XE_SRIOV_SCHED_GROUPS_NONE;
>>>> +}
>>>> +
>>>>    static void pf_sanitize_guc_policies(struct xe_gt *gt)
>>>>    {
>>>>        pf_sanitize_sched_if_idle(gt);
>>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>>> index 36680996f2bd..89aa3af6cc7d 100644
>>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>>> @@ -18,6 +18,7 @@ bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
>>>>    int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
>>>>    u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
>>>>    int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value);
>>>> +bool xe_gt_sriov_pf_policy_sched_groups_enabled(struct xe_gt *gt);
>>>>      int xe_gt_sriov_pf_policy_init(struct xe_gt *gt);
>>>>    void xe_gt_sriov_pf_policy_sanitize(struct xe_gt *gt);
>>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>>>> index 4c73a077d314..7a180c947032 100644
>>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>>>> @@ -438,6 +438,30 @@ u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt)
>>>>        return value;
>>>>    }
>>>>    +static int query_vf_sched_groups(struct xe_gt *gt)
>>>> +{
>>>> +    struct xe_guc *guc = &gt->uc.guc;
>>>> +    u32 value = 0;
>>>> +    int err;
>>>> +
>>>> +    xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
>>>> +
>>>> +    if (MAKE_GUC_VER_STRUCT(gt->sriov.vf.guc_version) < MAKE_GUC_VER(1, 26, 0))
>>>> +        return 0;
>>>> +
>>>> +    err = guc_action_query_single_klv32(guc,
>>>> +                        GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_KEY,
>>>> +                        &value);
>>>> +    if (unlikely(err)) {
>>>> +        xe_gt_sriov_err(gt, "Failed to obtain sched groups status (%pe)\n",
>>>> +                ERR_PTR(err));
>>>> +        return err;
>>>> +    }
>>>> +
>>>> +    xe_gt_sriov_dbg(gt, "sched groups %s\n", value ? "enabled" : "disabled");
>>> str_enabled_disabled(value)
>>>
>>>> +    return value;
>>>> +}
>>>> +
>>>>    static int vf_get_ggtt_info(struct xe_gt *gt)
>>>>    {
>>>>        struct xe_tile *tile = gt_to_tile(gt);
>>>> @@ -564,6 +588,21 @@ static void vf_cache_gmdid(struct xe_gt *gt)
>>>>        gt->sriov.vf.runtime.gmdid = xe_gt_sriov_vf_gmdid(gt);
>>>>    }
>>>>    +static int vf_cache_sched_groups_status(struct xe_gt *gt)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
>>>> +
>>>> +    ret = query_vf_sched_groups(gt);
>>>> +    if (ret < 0)
>>>> +        return ret;
>>>> +
>>>> +    gt->sriov.vf.runtime.uses_sched_groups = ret;
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>>    /**
>>>>     * xe_gt_sriov_vf_query_config - Query SR-IOV config data over MMIO.
>>>>     * @gt: the &xe_gt
>>>> @@ -593,12 +632,32 @@ int xe_gt_sriov_vf_query_config(struct xe_gt *gt)
>>>>        if (unlikely(err))
>>>>            return err;
>>>>    +    err = vf_cache_sched_groups_status(gt);
>>>> +    if (unlikely(err))
>>>> +        return err;
>>>> +
>>>>        if (has_gmdid(xe))
>>>>            vf_cache_gmdid(gt);
>>>>          return 0;
>>>>    }
>>>>    +/**
>>>> + * xe_gt_sriov_vf_sched_groups_enabled - Check if PF has enabled sched groups
>>>       * xe_gt_sriov_vf_sched_groups_enabled() - Check ...
>>>
>>>
>>>> + * @gt: the &xe_gt
>>>> + *
>>>> + * This function is for VF use only.
>>>> + *
>>>> + * Return: true if shed groups were enabled, false otherwise.
>>> typo: s/shed/scheduler
>>>
>>>> + */
>>>> +bool xe_gt_sriov_vf_sched_groups_enabled(struct xe_gt *gt)
>>>> +{
>>>> +    xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
>>>> +    xe_gt_assert(gt, gt->sriov.vf.guc_version.major);
>>>> +
>>>> +    return gt->sriov.vf.runtime.uses_sched_groups;
>>>> +}
>>>> +
>>>>    /**
>>>>     * xe_gt_sriov_vf_guc_ids - VF GuC context IDs configuration.
>>>>     * @gt: the &xe_gt
>>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>>>> index af40276790fa..2e1d34c0397f 100644
>>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>>>> @@ -23,11 +23,14 @@ int xe_gt_sriov_vf_connect(struct xe_gt *gt);
>>>>    int xe_gt_sriov_vf_query_runtime(struct xe_gt *gt);
>>>>    void xe_gt_sriov_vf_migrated_event_handler(struct xe_gt *gt);
>>>>    +bool xe_gt_sriov_vf_sched_groups_enabled(struct xe_gt *gt);
>>> move it there [1]
>>>
>>>> +
>>>>    int xe_gt_sriov_vf_init_early(struct xe_gt *gt);
>>>>    int xe_gt_sriov_vf_init(struct xe_gt *gt);
>>>>    bool xe_gt_sriov_vf_recovery_pending(struct xe_gt *gt);
>>>>      u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt);
>>>> +u32 xe_gt_sriov_vf_sched_groups(struct xe_gt *gt);
>>> unused ?
>> oops, old function name that I forgot to remove.
>>
>>>>    u16 xe_gt_sriov_vf_guc_ids(struct xe_gt *gt);
>>>>    u64 xe_gt_sriov_vf_lmem(struct xe_gt *gt);
>>> [1] here
>>>
>>>>    diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
>>>> index 420b0e6089de..5267c097ecd0 100644
>>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
>>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
>>>> @@ -27,6 +27,8 @@ struct xe_gt_sriov_vf_selfconfig {
>>>>    struct xe_gt_sriov_vf_runtime {
>>>>        /** @gmdid: cached value of the GDMID register. */
>>>>        u32 gmdid;
>>>> +    /** @uses_sched_groups: whether PF enabled sched groups or not. */
>>>> +    bool uses_sched_groups;
>>>>        /** @regs_size: size of runtime register array. */
>>>>        u32 regs_size;
>>>>        /** @num_regs: number of runtime registers in the array. */
>>>> diff --git a/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h b/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
>>>> index 0a028c94756d..3e55d9302855 100644
>>>> --- a/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
>>>> +++ b/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
>>>> @@ -32,6 +32,7 @@
>>>>        define(H2G_STORM, guc_time_us)            \
>>>>        define(IRQ_STORM, irq_time_us)            \
>>>>        define(DOORBELL_STORM, doorbell_time_us)    \
>>>> +    define(MULTI_LRC_COUNT, multi_lrc_count)    \
>>> this needs to be defined with some version info, maybe:
>>>
>>>      define(MULTI_LRC_COUNT, multi_lrc_count, (70, 53, 0))    \
>>>
>>> and then in encode_config() have two variants of code generators:
>>>
>>> #define encode_threshold_config2(TAG, ...) ({                    \
>>>      cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_THRESHOLD_##TAG);            \
>>>      cfg[n++] = config->thresholds[MAKE_XE_GUC_KLV_THRESHOLD_INDEX(TAG)];    \
>>> });
>>> #define encode_threshold_config3(TAG, NAME, VERSION) ({                \
>>>      if (GUC_FIRMWARE_VER(&gt->uc.guc) >= MAKE_GUC_VER VERSION) {        \
>>>          encode_threshold_config2(TAG, NAME);                \
>>>      }                                    \
>>> });
>>> #define encode_threshold_config(ARGS...) \
>>>      CALL_ARGS(CONCATENATE(encode_threshold_config, COUNT_ARGS(ARGS)), ARGS)
>>>
>>> this should fix the issues already spotted by the CI on ADLP:
>>>
>>> <6> [133.603624] xe 0000:00:02.0: [drm] Tile0: GT0: { key 0x8a0d : 32b value 0 } # multi_lrc_count
>>> <6> [133.603625] xe 0000:00:02.0: [drm] Tile0: GT0: { key 0x0001 : 64b value 0x200000 } # ggtt_start
>>> <6> [133.603626] xe 0000:00:02.0: [drm] Tile0: GT0: { key 0x0002 : 64b value 0xfec00000 } # ggtt_size
>>> <3> [133.603653] xe 0000:00:02.0: [drm] *ERROR* PF: Tile0: GT0: Failed to push self configuration (-ENOKEY)
>> I think it works better if we add the version for all thresholds, so we don't have to define multiple macros every time we need to check the GuC version (which I believe needs to also be done at least in register_threshold_attribute() and define_threshold_key_to_provision_case().
>> I can just use 70.29.2 (which is the minimum supported version for the Xe driver) for all the older defines.
> but then the code for version check will be generated for all thresholds, even those that we are sure are supported
>
> I will try to find something simpler, or more reusable, but no promise
>
>> Daniele
>>
>>>
>>>>        /* end */
>>>>      /**


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 04/10] drm/xe/sriov: Scheduler groups are incompatible with multi-lrc
  2025-12-02 21:25         ` Daniele Ceraolo Spurio
@ 2025-12-02 21:37           ` Michal Wajdeczko
  2025-12-02 21:42             ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 44+ messages in thread
From: Michal Wajdeczko @ 2025-12-02 21:37 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-xe



On 12/2/2025 10:25 PM, Daniele Ceraolo Spurio wrote:
> 
> 
> On 12/2/2025 1:17 PM, Michal Wajdeczko wrote:
>>
>> On 12/2/2025 6:57 PM, Daniele Ceraolo Spurio wrote:
>>>
>>> On 12/2/2025 5:32 AM, Michal Wajdeczko wrote:
>>>> On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
>>>>> Since engines in the same class can be divided across multiple groups,
>>>>> the GuC does not allow scheduler groups to be active if there are
>>>>> multi-lrc contexts. This means that:
>>>>>
>>>>> 1) if a mlrc context is registered when we enable scheduler groups, the
>>>>>      GuC will silently ignore the configuration
>>>>> 2) if a mlrc context is registered after scheduler groups are enabled,
>>>>>      the GuC will disable the groups and generate an adverse event.
>>>>>
>>>>> We therefore need to block mlrc context creation when scheduler groups
>>>>> are enabled.
>>>> s/mlrc/MLRC
>>>>
>>>>> An adverse event threshold is available for the new adverse event.
>>>> changes related to introduction of new threshold deserves separate patch
>>>> (as we need to handle GuC FW version checks)
>>> ok
>>>
>>>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>>>> ---
>>>>>    drivers/gpu/drm/xe/abi/guc_klvs_abi.h         | 14 +++++
>>>>>    drivers/gpu/drm/xe/xe_exec_queue.c            | 19 ++++++
>>>>>    drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c    | 30 ++++++++++
>>>>>    drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h    |  1 +
>>>>>    drivers/gpu/drm/xe/xe_gt_sriov_vf.c           | 59 +++++++++++++++++++
>>>>>    drivers/gpu/drm/xe/xe_gt_sriov_vf.h           |  3 +
>>>>>    drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h     |  2 +
>>>>>    .../drm/xe/xe_guc_klv_thresholds_set_types.h  |  1 +
>>>>>    8 files changed, 129 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>>>>> index 274f1b1ec37f..a6dce9da339f 100644
>>>>> --- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>>>>> +++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>>>>> @@ -46,11 +46,18 @@
>>>>>     *      Refers to 32 bit architecture version as reported by the HW IP.
>>>>>     *      This key is supported on MTL+ platforms only.
>>>>>     *      Requires GuC ABI 1.2+.
>>>>> + *
>>>>> + * _`GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE` : 0x3001
>>>>> + *      Tells the driver whether scheduler groups are enabled or not.
>>>>> + *      Requres GuC ABI 1.26+
>>>> typo: Requires
>>>>
>>>> and don't forget to update xe_guc_klv_key_to_string() with new KEY
>>> ok
>>>
>>>>>     */
>>>>>      #define GUC_KLV_GLOBAL_CFG_GMD_ID_KEY            0x3000u
>>>>>    #define GUC_KLV_GLOBAL_CFG_GMD_ID_LEN            1u
>>>>>    +#define GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_KEY    0x3001u
>>>>> +#define GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_LEN    1u
>>>>> +
>>>>>    /**
>>>>>     * DOC: GuC Self Config KLVs
>>>>>     *
>>>>> @@ -369,6 +376,10 @@ enum  {
>>>>>     *      :1: NORMAL = schedule VF always, irrespective of whether it has work or not
>>>>>     *      :2: HIGH = schedule VF in the next time-slice after current active
>>>>>     *          time-slice completes if it has active work
>>>>> + *
>>>>> + * _`GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT` : 0x8A0D
>>>>> + *      This config sets the threshold for LRCA context registration when SRIOV
>>>> ... threshold for _Multi_ LRCA context registrations ...
>>> ooops :)
>>>
>>>>> + *      scheduler groups are enabled.
>>>> "This is allows PF to monitor VFs' behavior when EGS is enabled."
>>>>
>>>>>     */
>>>>>      #define GUC_KLV_VF_CFG_GGTT_START_KEY        0x0001
>>>>> @@ -427,6 +438,9 @@ enum  {
>>>>>    #define   GUC_SCHED_PRIORITY_NORMAL        1u
>>>>>    #define   GUC_SCHED_PRIORITY_HIGH        2u
>>>>>    +#define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_KEY    0x8a0d
>>>>> +#define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_LEN    1u
>>>>> +
>>>>>    /*
>>>>>     * Workaround keys:
>>>>>     */
>>>>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>>>>> index 8724f8de67e2..e59c41c913b4 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>>>>> @@ -16,6 +16,7 @@
>>>>>    #include "xe_dep_scheduler.h"
>>>>>    #include "xe_device.h"
>>>>>    #include "xe_gt.h"
>>>>> +#include "xe_gt_sriov_pf_policy.h"
>>>>>    #include "xe_gt_sriov_vf.h"
>>>>>    #include "xe_hw_engine_class_sysfs.h"
>>>>>    #include "xe_hw_engine_group.h"
>>>>> @@ -698,6 +699,17 @@ static u32 calc_validate_logical_mask(struct xe_device *xe,
>>>>>        return return_mask;
>>>>>    }
>>>>>    +static bool has_sched_groups(struct xe_gt *gt)
>>>>> +{
>>>>> +    if (IS_SRIOV_PF(gt_to_xe(gt)) && xe_gt_sriov_pf_policy_sched_groups_enabled(gt))
>>>> hmm, usually we don't want core code to look so deeply into PF subcomponent
>>> I thought about that, but I didn't know which sriov file would be the right one, since almost all of them are either pf-only or vf-only. should I just move it to xe_sriov.c?
>> hmm, but xe_sriov.c is device level function
>>
>> we might add helper to xe_gt_sriov_pf.h where we can also provide stub there:
>>
>> #ifdef CONFIG_PCI_IOV
>> bool xe_gt_sriov_pf_sched_groups_enabled(gt);
>> #else
>> static inline bool xe_gt_sriov_pf_sched_groups_enabled(gt) { return false; }
>> #endif
> 
> Do you mean to check for both PF and VF from that function, and just have it in the PF file for convenience?

this is PF file and only PF functions might require stubs (when PCI_IOV=n)
so it is just to hide PF internals

VF function is already exposed at GT level

> I could also just create a new xe_gt_sriov.h file and place it there, making the function an inline.

maybe later ;) we already have many files

> 
>>
>>>> also, do we want to work in hybrid mode where one GT is using MLRC and other is not?
>>> The GuCs allow it. I didn't want to block it in case we do get a scenario later where this is needed (e.g. we might get a split primary GT but still want to do mlrc on media)
>>>
>>>>> +        return true;
>>>>> +> +    if (IS_SRIOV_VF(gt_to_xe(gt)) && xe_gt_sriov_vf_sched_groups_enabled(gt))
>>>>> +        return true;
>>>>> +
>>>>> +    return false;
>>>>> +}
>>>>> +
>>>>>    int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>>>>>                       struct drm_file *file)
>>>>>    {
>>>>> @@ -790,6 +802,13 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>>>>>                return -ENOENT;
>>>>>            }
>>>>>    +        /* SRIOV sched groups are not compatible with multi-lrc */
>>>>> +        if (XE_IOCTL_DBG(xe, args->width > 1 && has_sched_groups(hwe->gt))) {
>>>>> +            up_read(&vm->lock);
>>>>> +            xe_vm_put(vm);
>>>>> +            return -EINVAL;
>>>>> +        }
>>>>> +
>>>>>            q = xe_exec_queue_create(xe, vm, logical_mask,
>>>>>                         args->width, hwe, flags,
>>>>>                         args->extensions);
>>>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>>>> index 48f250ae0d0d..c7f1ea8eb9c5 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>>>> @@ -8,6 +8,7 @@
>>>>>    #include <drm/drm_managed.h>
>>>>>      #include "xe_bo.h"
>>>>> +#include "xe_exec_queue_types.h"
>>>>>    #include "xe_gt.h"
>>>>>    #include "xe_gt_sriov_pf_helpers.h"
>>>>>    #include "xe_gt_sriov_pf_policy.h"
>>>>> @@ -527,6 +528,24 @@ static int __pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>>>>>                          masks, num_masks);
>>>>>    }
>>>>>    +static bool guc_has_mlrc_queue(struct xe_guc *guc)
>>>> this is all GuC stuff, so export it from xe_guc_submission.c
>>> ok
>>>
>>>>> +{
>>>>> +    struct xe_exec_queue *q;
>>>>> +    unsigned long index;
>>>>> +    bool found = false;
>>>>> +
>>>>> +    mutex_lock(&guc->submission_state.lock);
>>>> guard(mutex) ?
>>>>
>>>>> +    xa_for_each(&guc->submission_state.exec_queue_lookup, index, q) {
>>>>> +        if (q->width > 1) {
>>>>> +            found = true;
>>>>> +            break;
>>>>> +        }
>>>>> +    }
>>>>> +    mutex_unlock(&guc->submission_state.lock);
>>>>> +
>>>>> +    return found;
>>>> what if new MLRC is created right now?
>>>>
>>>> maybe we should use xe_guard to lockdown one feature or the other?
>>> The idea is that the admin is responsible for enabling EGS when the system is in the correct state, which is why the GuC doesn't return an error and just ignore the KLV if the system is not in the correct state. This check is there to catch the case where the admin has closed all apps in preparation to enabling EGS but the driver still hasn't processed all the context de-registrations. Therefore, I don't really want to over-complicate it with extra locking. I'll add a comment to better explain.
>> so it doesn't have to be bullet-proof ?
> 
> Yup. If we had stricter requirements then the GuC would enforce them.
> 
> Daniele
> 
>>
>>>>> +}
>>>>> +
>>>>>    static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>>>>>    {
>>>>>        int err;
>>>>> @@ -548,6 +567,12 @@ static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>>>>>        if (xe_sriov_pf_num_vfs(gt_to_xe(gt)))
>>>>>            return -EPERM;
>>>>>    +    /* The GuC silently ignores the setting if any mlrc contexts are registered */
>>>>> +    if ((mode != XE_SRIOV_SCHED_GROUPS_NONE) && guc_has_mlrc_queue(&gt->uc.guc)) {
>>>>> +        xe_gt_sriov_notice(gt, "can't enable sched groups with active mlrc queues\n");
>>>>> +        return -EPERM;
>>>>> +    }
>>>>> +
>>>>>        err = __pf_provision_sched_groups(gt, mode);
>>>>>        if (err)
>>>>>            return err;
>>>>> @@ -600,6 +625,11 @@ int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value)
>>>>>        return err;
>>>>>    }
>>>>>    
>>>> every public function needs to have kernel-doc
>>> ok
>>>
>>>>> +bool xe_gt_sriov_pf_policy_sched_groups_enabled(struct xe_gt *gt)
>>>>> +{
>>>>> +    return gt->sriov.pf.policy.guc.sched_groups.current_mode != XE_SRIOV_SCHED_GROUPS_NONE;
>>>>> +}
>>>>> +
>>>>>    static void pf_sanitize_guc_policies(struct xe_gt *gt)
>>>>>    {
>>>>>        pf_sanitize_sched_if_idle(gt);
>>>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>>>> index 36680996f2bd..89aa3af6cc7d 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>>>> @@ -18,6 +18,7 @@ bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
>>>>>    int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
>>>>>    u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
>>>>>    int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value);
>>>>> +bool xe_gt_sriov_pf_policy_sched_groups_enabled(struct xe_gt *gt);
>>>>>      int xe_gt_sriov_pf_policy_init(struct xe_gt *gt);
>>>>>    void xe_gt_sriov_pf_policy_sanitize(struct xe_gt *gt);
>>>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>>>>> index 4c73a077d314..7a180c947032 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>>>>> @@ -438,6 +438,30 @@ u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt)
>>>>>        return value;
>>>>>    }
>>>>>    +static int query_vf_sched_groups(struct xe_gt *gt)
>>>>> +{
>>>>> +    struct xe_guc *guc = &gt->uc.guc;
>>>>> +    u32 value = 0;
>>>>> +    int err;
>>>>> +
>>>>> +    xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
>>>>> +
>>>>> +    if (MAKE_GUC_VER_STRUCT(gt->sriov.vf.guc_version) < MAKE_GUC_VER(1, 26, 0))
>>>>> +        return 0;
>>>>> +
>>>>> +    err = guc_action_query_single_klv32(guc,
>>>>> +                        GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_KEY,
>>>>> +                        &value);
>>>>> +    if (unlikely(err)) {
>>>>> +        xe_gt_sriov_err(gt, "Failed to obtain sched groups status (%pe)\n",
>>>>> +                ERR_PTR(err));
>>>>> +        return err;
>>>>> +    }
>>>>> +
>>>>> +    xe_gt_sriov_dbg(gt, "sched groups %s\n", value ? "enabled" : "disabled");
>>>> str_enabled_disabled(value)
>>>>
>>>>> +    return value;
>>>>> +}
>>>>> +
>>>>>    static int vf_get_ggtt_info(struct xe_gt *gt)
>>>>>    {
>>>>>        struct xe_tile *tile = gt_to_tile(gt);
>>>>> @@ -564,6 +588,21 @@ static void vf_cache_gmdid(struct xe_gt *gt)
>>>>>        gt->sriov.vf.runtime.gmdid = xe_gt_sriov_vf_gmdid(gt);
>>>>>    }
>>>>>    +static int vf_cache_sched_groups_status(struct xe_gt *gt)
>>>>> +{
>>>>> +    int ret;
>>>>> +
>>>>> +    xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
>>>>> +
>>>>> +    ret = query_vf_sched_groups(gt);
>>>>> +    if (ret < 0)
>>>>> +        return ret;
>>>>> +
>>>>> +    gt->sriov.vf.runtime.uses_sched_groups = ret;
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>>    /**
>>>>>     * xe_gt_sriov_vf_query_config - Query SR-IOV config data over MMIO.
>>>>>     * @gt: the &xe_gt
>>>>> @@ -593,12 +632,32 @@ int xe_gt_sriov_vf_query_config(struct xe_gt *gt)
>>>>>        if (unlikely(err))
>>>>>            return err;
>>>>>    +    err = vf_cache_sched_groups_status(gt);
>>>>> +    if (unlikely(err))
>>>>> +        return err;
>>>>> +
>>>>>        if (has_gmdid(xe))
>>>>>            vf_cache_gmdid(gt);
>>>>>          return 0;
>>>>>    }
>>>>>    +/**
>>>>> + * xe_gt_sriov_vf_sched_groups_enabled - Check if PF has enabled sched groups
>>>>       * xe_gt_sriov_vf_sched_groups_enabled() - Check ...
>>>>
>>>>
>>>>> + * @gt: the &xe_gt
>>>>> + *
>>>>> + * This function is for VF use only.
>>>>> + *
>>>>> + * Return: true if shed groups were enabled, false otherwise.
>>>> typo: s/shed/scheduler
>>>>
>>>>> + */
>>>>> +bool xe_gt_sriov_vf_sched_groups_enabled(struct xe_gt *gt)
>>>>> +{
>>>>> +    xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
>>>>> +    xe_gt_assert(gt, gt->sriov.vf.guc_version.major);
>>>>> +
>>>>> +    return gt->sriov.vf.runtime.uses_sched_groups;
>>>>> +}
>>>>> +
>>>>>    /**
>>>>>     * xe_gt_sriov_vf_guc_ids - VF GuC context IDs configuration.
>>>>>     * @gt: the &xe_gt
>>>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>>>>> index af40276790fa..2e1d34c0397f 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>>>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>>>>> @@ -23,11 +23,14 @@ int xe_gt_sriov_vf_connect(struct xe_gt *gt);
>>>>>    int xe_gt_sriov_vf_query_runtime(struct xe_gt *gt);
>>>>>    void xe_gt_sriov_vf_migrated_event_handler(struct xe_gt *gt);
>>>>>    +bool xe_gt_sriov_vf_sched_groups_enabled(struct xe_gt *gt);
>>>> move it there [1]
>>>>
>>>>> +
>>>>>    int xe_gt_sriov_vf_init_early(struct xe_gt *gt);
>>>>>    int xe_gt_sriov_vf_init(struct xe_gt *gt);
>>>>>    bool xe_gt_sriov_vf_recovery_pending(struct xe_gt *gt);
>>>>>      u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt);
>>>>> +u32 xe_gt_sriov_vf_sched_groups(struct xe_gt *gt);
>>>> unused ?
>>> oops, old function name that I forgot to remove.
>>>
>>>>>    u16 xe_gt_sriov_vf_guc_ids(struct xe_gt *gt);
>>>>>    u64 xe_gt_sriov_vf_lmem(struct xe_gt *gt);
>>>> [1] here
>>>>
>>>>>    diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
>>>>> index 420b0e6089de..5267c097ecd0 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
>>>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
>>>>> @@ -27,6 +27,8 @@ struct xe_gt_sriov_vf_selfconfig {
>>>>>    struct xe_gt_sriov_vf_runtime {
>>>>>        /** @gmdid: cached value of the GDMID register. */
>>>>>        u32 gmdid;
>>>>> +    /** @uses_sched_groups: whether PF enabled sched groups or not. */
>>>>> +    bool uses_sched_groups;
>>>>>        /** @regs_size: size of runtime register array. */
>>>>>        u32 regs_size;
>>>>>        /** @num_regs: number of runtime registers in the array. */
>>>>> diff --git a/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h b/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
>>>>> index 0a028c94756d..3e55d9302855 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
>>>>> +++ b/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
>>>>> @@ -32,6 +32,7 @@
>>>>>        define(H2G_STORM, guc_time_us)            \
>>>>>        define(IRQ_STORM, irq_time_us)            \
>>>>>        define(DOORBELL_STORM, doorbell_time_us)    \
>>>>> +    define(MULTI_LRC_COUNT, multi_lrc_count)    \
>>>> this needs to be defined with some version info, maybe:
>>>>
>>>>      define(MULTI_LRC_COUNT, multi_lrc_count, (70, 53, 0))    \
>>>>
>>>> and then in encode_config() have two variants of code generators:
>>>>
>>>> #define encode_threshold_config2(TAG, ...) ({                    \
>>>>      cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_THRESHOLD_##TAG);            \
>>>>      cfg[n++] = config->thresholds[MAKE_XE_GUC_KLV_THRESHOLD_INDEX(TAG)];    \
>>>> });
>>>> #define encode_threshold_config3(TAG, NAME, VERSION) ({                \
>>>>      if (GUC_FIRMWARE_VER(&gt->uc.guc) >= MAKE_GUC_VER VERSION) {        \
>>>>          encode_threshold_config2(TAG, NAME);                \
>>>>      }                                    \
>>>> });
>>>> #define encode_threshold_config(ARGS...) \
>>>>      CALL_ARGS(CONCATENATE(encode_threshold_config, COUNT_ARGS(ARGS)), ARGS)
>>>>
>>>> this should fix the issues already spotted by the CI on ADLP:
>>>>
>>>> <6> [133.603624] xe 0000:00:02.0: [drm] Tile0: GT0: { key 0x8a0d : 32b value 0 } # multi_lrc_count
>>>> <6> [133.603625] xe 0000:00:02.0: [drm] Tile0: GT0: { key 0x0001 : 64b value 0x200000 } # ggtt_start
>>>> <6> [133.603626] xe 0000:00:02.0: [drm] Tile0: GT0: { key 0x0002 : 64b value 0xfec00000 } # ggtt_size
>>>> <3> [133.603653] xe 0000:00:02.0: [drm] *ERROR* PF: Tile0: GT0: Failed to push self configuration (-ENOKEY)
>>> I think it works better if we add the version for all thresholds, so we don't have to define multiple macros every time we need to check the GuC version (which I believe needs to also be done at least in register_threshold_attribute() and define_threshold_key_to_provision_case().
>>> I can just use 70.29.2 (which is the minimum supported version for the Xe driver) for all the older defines.
>> but then the code for version check will be generated for all thresholds, even those that we are sure are supported
>>
>> I will try to find something simpler, or more reusable, but no promise
>>
>>> Daniele
>>>
>>>>
>>>>>        /* end */
>>>>>      /**
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 04/10] drm/xe/sriov: Scheduler groups are incompatible with multi-lrc
  2025-12-02 21:37           ` Michal Wajdeczko
@ 2025-12-02 21:42             ` Daniele Ceraolo Spurio
  0 siblings, 0 replies; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-12-02 21:42 UTC (permalink / raw)
  To: Michal Wajdeczko, intel-xe



On 12/2/2025 1:37 PM, Michal Wajdeczko wrote:
>
> On 12/2/2025 10:25 PM, Daniele Ceraolo Spurio wrote:
>>
>> On 12/2/2025 1:17 PM, Michal Wajdeczko wrote:
>>> On 12/2/2025 6:57 PM, Daniele Ceraolo Spurio wrote:
>>>> On 12/2/2025 5:32 AM, Michal Wajdeczko wrote:
>>>>> On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
>>>>>> Since engines in the same class can be divided across multiple groups,
>>>>>> the GuC does not allow scheduler groups to be active if there are
>>>>>> multi-lrc contexts. This means that:
>>>>>>
>>>>>> 1) if a mlrc context is registered when we enable scheduler groups, the
>>>>>>       GuC will silently ignore the configuration
>>>>>> 2) if a mlrc context is registered after scheduler groups are enabled,
>>>>>>       the GuC will disable the groups and generate an adverse event.
>>>>>>
>>>>>> We therefore need to block mlrc context creation when scheduler groups
>>>>>> are enabled.
>>>>> s/mlrc/MLRC
>>>>>
>>>>>> An adverse event threshold is available for the new adverse event.
>>>>> changes related to introduction of new threshold deserves separate patch
>>>>> (as we need to handle GuC FW version checks)
>>>> ok
>>>>
>>>>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>>>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/xe/abi/guc_klvs_abi.h         | 14 +++++
>>>>>>     drivers/gpu/drm/xe/xe_exec_queue.c            | 19 ++++++
>>>>>>     drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c    | 30 ++++++++++
>>>>>>     drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h    |  1 +
>>>>>>     drivers/gpu/drm/xe/xe_gt_sriov_vf.c           | 59 +++++++++++++++++++
>>>>>>     drivers/gpu/drm/xe/xe_gt_sriov_vf.h           |  3 +
>>>>>>     drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h     |  2 +
>>>>>>     .../drm/xe/xe_guc_klv_thresholds_set_types.h  |  1 +
>>>>>>     8 files changed, 129 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>>>>>> index 274f1b1ec37f..a6dce9da339f 100644
>>>>>> --- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>>>>>> +++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>>>>>> @@ -46,11 +46,18 @@
>>>>>>      *      Refers to 32 bit architecture version as reported by the HW IP.
>>>>>>      *      This key is supported on MTL+ platforms only.
>>>>>>      *      Requires GuC ABI 1.2+.
>>>>>> + *
>>>>>> + * _`GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE` : 0x3001
>>>>>> + *      Tells the driver whether scheduler groups are enabled or not.
>>>>>> + *      Requres GuC ABI 1.26+
>>>>> typo: Requires
>>>>>
>>>>> and don't forget to update xe_guc_klv_key_to_string() with new KEY
>>>> ok
>>>>
>>>>>>      */
>>>>>>       #define GUC_KLV_GLOBAL_CFG_GMD_ID_KEY            0x3000u
>>>>>>     #define GUC_KLV_GLOBAL_CFG_GMD_ID_LEN            1u
>>>>>>     +#define GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_KEY    0x3001u
>>>>>> +#define GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_LEN    1u
>>>>>> +
>>>>>>     /**
>>>>>>      * DOC: GuC Self Config KLVs
>>>>>>      *
>>>>>> @@ -369,6 +376,10 @@ enum  {
>>>>>>      *      :1: NORMAL = schedule VF always, irrespective of whether it has work or not
>>>>>>      *      :2: HIGH = schedule VF in the next time-slice after current active
>>>>>>      *          time-slice completes if it has active work
>>>>>> + *
>>>>>> + * _`GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT` : 0x8A0D
>>>>>> + *      This config sets the threshold for LRCA context registration when SRIOV
>>>>> ... threshold for _Multi_ LRCA context registrations ...
>>>> ooops :)
>>>>
>>>>>> + *      scheduler groups are enabled.
>>>>> "This is allows PF to monitor VFs' behavior when EGS is enabled."
>>>>>
>>>>>>      */
>>>>>>       #define GUC_KLV_VF_CFG_GGTT_START_KEY        0x0001
>>>>>> @@ -427,6 +438,9 @@ enum  {
>>>>>>     #define   GUC_SCHED_PRIORITY_NORMAL        1u
>>>>>>     #define   GUC_SCHED_PRIORITY_HIGH        2u
>>>>>>     +#define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_KEY    0x8a0d
>>>>>> +#define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_LEN    1u
>>>>>> +
>>>>>>     /*
>>>>>>      * Workaround keys:
>>>>>>      */
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>>>>>> index 8724f8de67e2..e59c41c913b4 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
>>>>>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>>>>>> @@ -16,6 +16,7 @@
>>>>>>     #include "xe_dep_scheduler.h"
>>>>>>     #include "xe_device.h"
>>>>>>     #include "xe_gt.h"
>>>>>> +#include "xe_gt_sriov_pf_policy.h"
>>>>>>     #include "xe_gt_sriov_vf.h"
>>>>>>     #include "xe_hw_engine_class_sysfs.h"
>>>>>>     #include "xe_hw_engine_group.h"
>>>>>> @@ -698,6 +699,17 @@ static u32 calc_validate_logical_mask(struct xe_device *xe,
>>>>>>         return return_mask;
>>>>>>     }
>>>>>>     +static bool has_sched_groups(struct xe_gt *gt)
>>>>>> +{
>>>>>> +    if (IS_SRIOV_PF(gt_to_xe(gt)) && xe_gt_sriov_pf_policy_sched_groups_enabled(gt))
>>>>> hmm, usually we don't want core code to look so deeply into PF subcomponent
>>>> I thought about that, but I didn't know which sriov file would be the right one, since almost all of them are either pf-only or vf-only. should I just move it to xe_sriov.c?
>>> hmm, but xe_sriov.c is device level function
>>>
>>> we might add helper to xe_gt_sriov_pf.h where we can also provide stub there:
>>>
>>> #ifdef CONFIG_PCI_IOV
>>> bool xe_gt_sriov_pf_sched_groups_enabled(gt);
>>> #else
>>> static inline bool xe_gt_sriov_pf_sched_groups_enabled(gt) { return false; }
>>> #endif
>> Do you mean to check for both PF and VF from that function, and just have it in the PF file for convenience?
> this is PF file and only PF functions might require stubs (when PCI_IOV=n)
> so it is just to hide PF internals
>
> VF function is already exposed at GT level

ok, I hadn't understood your first comment. I thought you wanted me to 
show the whole has_sched_groups() function into an SRIOV file, while you 
where just asking to hide xe_gt_sriov_pf_policy_sched_groups_enabled to 
avoid calling a "policy" func directly. Having just that in a PF file 
makes more sense :)

Daniele

>
>> I could also just create a new xe_gt_sriov.h file and place it there, making the function an inline.
> maybe later ;) we already have many files
>
>>>>> also, do we want to work in hybrid mode where one GT is using MLRC and other is not?
>>>> The GuCs allow it. I didn't want to block it in case we do get a scenario later where this is needed (e.g. we might get a split primary GT but still want to do mlrc on media)
>>>>
>>>>>> +        return true;
>>>>>> +> +    if (IS_SRIOV_VF(gt_to_xe(gt)) && xe_gt_sriov_vf_sched_groups_enabled(gt))
>>>>>> +        return true;
>>>>>> +
>>>>>> +    return false;
>>>>>> +}
>>>>>> +
>>>>>>     int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>>>>>>                        struct drm_file *file)
>>>>>>     {
>>>>>> @@ -790,6 +802,13 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>>>>>>                 return -ENOENT;
>>>>>>             }
>>>>>>     +        /* SRIOV sched groups are not compatible with multi-lrc */
>>>>>> +        if (XE_IOCTL_DBG(xe, args->width > 1 && has_sched_groups(hwe->gt))) {
>>>>>> +            up_read(&vm->lock);
>>>>>> +            xe_vm_put(vm);
>>>>>> +            return -EINVAL;
>>>>>> +        }
>>>>>> +
>>>>>>             q = xe_exec_queue_create(xe, vm, logical_mask,
>>>>>>                          args->width, hwe, flags,
>>>>>>                          args->extensions);
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>>>>> index 48f250ae0d0d..c7f1ea8eb9c5 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>>>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>>>>> @@ -8,6 +8,7 @@
>>>>>>     #include <drm/drm_managed.h>
>>>>>>       #include "xe_bo.h"
>>>>>> +#include "xe_exec_queue_types.h"
>>>>>>     #include "xe_gt.h"
>>>>>>     #include "xe_gt_sriov_pf_helpers.h"
>>>>>>     #include "xe_gt_sriov_pf_policy.h"
>>>>>> @@ -527,6 +528,24 @@ static int __pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>>>>>>                           masks, num_masks);
>>>>>>     }
>>>>>>     +static bool guc_has_mlrc_queue(struct xe_guc *guc)
>>>>> this is all GuC stuff, so export it from xe_guc_submission.c
>>>> ok
>>>>
>>>>>> +{
>>>>>> +    struct xe_exec_queue *q;
>>>>>> +    unsigned long index;
>>>>>> +    bool found = false;
>>>>>> +
>>>>>> +    mutex_lock(&guc->submission_state.lock);
>>>>> guard(mutex) ?
>>>>>
>>>>>> +    xa_for_each(&guc->submission_state.exec_queue_lookup, index, q) {
>>>>>> +        if (q->width > 1) {
>>>>>> +            found = true;
>>>>>> +            break;
>>>>>> +        }
>>>>>> +    }
>>>>>> +    mutex_unlock(&guc->submission_state.lock);
>>>>>> +
>>>>>> +    return found;
>>>>> what if new MLRC is created right now?
>>>>>
>>>>> maybe we should use xe_guard to lockdown one feature or the other?
>>>> The idea is that the admin is responsible for enabling EGS when the system is in the correct state, which is why the GuC doesn't return an error and just ignore the KLV if the system is not in the correct state. This check is there to catch the case where the admin has closed all apps in preparation to enabling EGS but the driver still hasn't processed all the context de-registrations. Therefore, I don't really want to over-complicate it with extra locking. I'll add a comment to better explain.
>>> so it doesn't have to be bullet-proof ?
>> Yup. If we had stricter requirements then the GuC would enforce them.
>>
>> Daniele
>>
>>>>>> +}
>>>>>> +
>>>>>>     static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>>>>>>     {
>>>>>>         int err;
>>>>>> @@ -548,6 +567,12 @@ static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>>>>>>         if (xe_sriov_pf_num_vfs(gt_to_xe(gt)))
>>>>>>             return -EPERM;
>>>>>>     +    /* The GuC silently ignores the setting if any mlrc contexts are registered */
>>>>>> +    if ((mode != XE_SRIOV_SCHED_GROUPS_NONE) && guc_has_mlrc_queue(&gt->uc.guc)) {
>>>>>> +        xe_gt_sriov_notice(gt, "can't enable sched groups with active mlrc queues\n");
>>>>>> +        return -EPERM;
>>>>>> +    }
>>>>>> +
>>>>>>         err = __pf_provision_sched_groups(gt, mode);
>>>>>>         if (err)
>>>>>>             return err;
>>>>>> @@ -600,6 +625,11 @@ int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value)
>>>>>>         return err;
>>>>>>     }
>>>>>>     
>>>>> every public function needs to have kernel-doc
>>>> ok
>>>>
>>>>>> +bool xe_gt_sriov_pf_policy_sched_groups_enabled(struct xe_gt *gt)
>>>>>> +{
>>>>>> +    return gt->sriov.pf.policy.guc.sched_groups.current_mode != XE_SRIOV_SCHED_GROUPS_NONE;
>>>>>> +}
>>>>>> +
>>>>>>     static void pf_sanitize_guc_policies(struct xe_gt *gt)
>>>>>>     {
>>>>>>         pf_sanitize_sched_if_idle(gt);
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>>>>> index 36680996f2bd..89aa3af6cc7d 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>>>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>>>>> @@ -18,6 +18,7 @@ bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
>>>>>>     int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
>>>>>>     u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
>>>>>>     int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value);
>>>>>> +bool xe_gt_sriov_pf_policy_sched_groups_enabled(struct xe_gt *gt);
>>>>>>       int xe_gt_sriov_pf_policy_init(struct xe_gt *gt);
>>>>>>     void xe_gt_sriov_pf_policy_sanitize(struct xe_gt *gt);
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>>>>>> index 4c73a077d314..7a180c947032 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>>>>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>>>>>> @@ -438,6 +438,30 @@ u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt)
>>>>>>         return value;
>>>>>>     }
>>>>>>     +static int query_vf_sched_groups(struct xe_gt *gt)
>>>>>> +{
>>>>>> +    struct xe_guc *guc = &gt->uc.guc;
>>>>>> +    u32 value = 0;
>>>>>> +    int err;
>>>>>> +
>>>>>> +    xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
>>>>>> +
>>>>>> +    if (MAKE_GUC_VER_STRUCT(gt->sriov.vf.guc_version) < MAKE_GUC_VER(1, 26, 0))
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    err = guc_action_query_single_klv32(guc,
>>>>>> +                        GUC_KLV_GLOBAL_CFG_GROUP_SCHEDULING_AVAILABLE_KEY,
>>>>>> +                        &value);
>>>>>> +    if (unlikely(err)) {
>>>>>> +        xe_gt_sriov_err(gt, "Failed to obtain sched groups status (%pe)\n",
>>>>>> +                ERR_PTR(err));
>>>>>> +        return err;
>>>>>> +    }
>>>>>> +
>>>>>> +    xe_gt_sriov_dbg(gt, "sched groups %s\n", value ? "enabled" : "disabled");
>>>>> str_enabled_disabled(value)
>>>>>
>>>>>> +    return value;
>>>>>> +}
>>>>>> +
>>>>>>     static int vf_get_ggtt_info(struct xe_gt *gt)
>>>>>>     {
>>>>>>         struct xe_tile *tile = gt_to_tile(gt);
>>>>>> @@ -564,6 +588,21 @@ static void vf_cache_gmdid(struct xe_gt *gt)
>>>>>>         gt->sriov.vf.runtime.gmdid = xe_gt_sriov_vf_gmdid(gt);
>>>>>>     }
>>>>>>     +static int vf_cache_sched_groups_status(struct xe_gt *gt)
>>>>>> +{
>>>>>> +    int ret;
>>>>>> +
>>>>>> +    xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
>>>>>> +
>>>>>> +    ret = query_vf_sched_groups(gt);
>>>>>> +    if (ret < 0)
>>>>>> +        return ret;
>>>>>> +
>>>>>> +    gt->sriov.vf.runtime.uses_sched_groups = ret;
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>>     /**
>>>>>>      * xe_gt_sriov_vf_query_config - Query SR-IOV config data over MMIO.
>>>>>>      * @gt: the &xe_gt
>>>>>> @@ -593,12 +632,32 @@ int xe_gt_sriov_vf_query_config(struct xe_gt *gt)
>>>>>>         if (unlikely(err))
>>>>>>             return err;
>>>>>>     +    err = vf_cache_sched_groups_status(gt);
>>>>>> +    if (unlikely(err))
>>>>>> +        return err;
>>>>>> +
>>>>>>         if (has_gmdid(xe))
>>>>>>             vf_cache_gmdid(gt);
>>>>>>           return 0;
>>>>>>     }
>>>>>>     +/**
>>>>>> + * xe_gt_sriov_vf_sched_groups_enabled - Check if PF has enabled sched groups
>>>>>        * xe_gt_sriov_vf_sched_groups_enabled() - Check ...
>>>>>
>>>>>
>>>>>> + * @gt: the &xe_gt
>>>>>> + *
>>>>>> + * This function is for VF use only.
>>>>>> + *
>>>>>> + * Return: true if shed groups were enabled, false otherwise.
>>>>> typo: s/shed/scheduler
>>>>>
>>>>>> + */
>>>>>> +bool xe_gt_sriov_vf_sched_groups_enabled(struct xe_gt *gt)
>>>>>> +{
>>>>>> +    xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
>>>>>> +    xe_gt_assert(gt, gt->sriov.vf.guc_version.major);
>>>>>> +
>>>>>> +    return gt->sriov.vf.runtime.uses_sched_groups;
>>>>>> +}
>>>>>> +
>>>>>>     /**
>>>>>>      * xe_gt_sriov_vf_guc_ids - VF GuC context IDs configuration.
>>>>>>      * @gt: the &xe_gt
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>>>>>> index af40276790fa..2e1d34c0397f 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>>>>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>>>>>> @@ -23,11 +23,14 @@ int xe_gt_sriov_vf_connect(struct xe_gt *gt);
>>>>>>     int xe_gt_sriov_vf_query_runtime(struct xe_gt *gt);
>>>>>>     void xe_gt_sriov_vf_migrated_event_handler(struct xe_gt *gt);
>>>>>>     +bool xe_gt_sriov_vf_sched_groups_enabled(struct xe_gt *gt);
>>>>> move it there [1]
>>>>>
>>>>>> +
>>>>>>     int xe_gt_sriov_vf_init_early(struct xe_gt *gt);
>>>>>>     int xe_gt_sriov_vf_init(struct xe_gt *gt);
>>>>>>     bool xe_gt_sriov_vf_recovery_pending(struct xe_gt *gt);
>>>>>>       u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt);
>>>>>> +u32 xe_gt_sriov_vf_sched_groups(struct xe_gt *gt);
>>>>> unused ?
>>>> oops, old function name that I forgot to remove.
>>>>
>>>>>>     u16 xe_gt_sriov_vf_guc_ids(struct xe_gt *gt);
>>>>>>     u64 xe_gt_sriov_vf_lmem(struct xe_gt *gt);
>>>>> [1] here
>>>>>
>>>>>>     diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
>>>>>> index 420b0e6089de..5267c097ecd0 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
>>>>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
>>>>>> @@ -27,6 +27,8 @@ struct xe_gt_sriov_vf_selfconfig {
>>>>>>     struct xe_gt_sriov_vf_runtime {
>>>>>>         /** @gmdid: cached value of the GDMID register. */
>>>>>>         u32 gmdid;
>>>>>> +    /** @uses_sched_groups: whether PF enabled sched groups or not. */
>>>>>> +    bool uses_sched_groups;
>>>>>>         /** @regs_size: size of runtime register array. */
>>>>>>         u32 regs_size;
>>>>>>         /** @num_regs: number of runtime registers in the array. */
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h b/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
>>>>>> index 0a028c94756d..3e55d9302855 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
>>>>>> +++ b/drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
>>>>>> @@ -32,6 +32,7 @@
>>>>>>         define(H2G_STORM, guc_time_us)            \
>>>>>>         define(IRQ_STORM, irq_time_us)            \
>>>>>>         define(DOORBELL_STORM, doorbell_time_us)    \
>>>>>> +    define(MULTI_LRC_COUNT, multi_lrc_count)    \
>>>>> this needs to be defined with some version info, maybe:
>>>>>
>>>>>       define(MULTI_LRC_COUNT, multi_lrc_count, (70, 53, 0))    \
>>>>>
>>>>> and then in encode_config() have two variants of code generators:
>>>>>
>>>>> #define encode_threshold_config2(TAG, ...) ({                    \
>>>>>       cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_THRESHOLD_##TAG);            \
>>>>>       cfg[n++] = config->thresholds[MAKE_XE_GUC_KLV_THRESHOLD_INDEX(TAG)];    \
>>>>> });
>>>>> #define encode_threshold_config3(TAG, NAME, VERSION) ({                \
>>>>>       if (GUC_FIRMWARE_VER(&gt->uc.guc) >= MAKE_GUC_VER VERSION) {        \
>>>>>           encode_threshold_config2(TAG, NAME);                \
>>>>>       }                                    \
>>>>> });
>>>>> #define encode_threshold_config(ARGS...) \
>>>>>       CALL_ARGS(CONCATENATE(encode_threshold_config, COUNT_ARGS(ARGS)), ARGS)
>>>>>
>>>>> this should fix the issues already spotted by the CI on ADLP:
>>>>>
>>>>> <6> [133.603624] xe 0000:00:02.0: [drm] Tile0: GT0: { key 0x8a0d : 32b value 0 } # multi_lrc_count
>>>>> <6> [133.603625] xe 0000:00:02.0: [drm] Tile0: GT0: { key 0x0001 : 64b value 0x200000 } # ggtt_start
>>>>> <6> [133.603626] xe 0000:00:02.0: [drm] Tile0: GT0: { key 0x0002 : 64b value 0xfec00000 } # ggtt_size
>>>>> <3> [133.603653] xe 0000:00:02.0: [drm] *ERROR* PF: Tile0: GT0: Failed to push self configuration (-ENOKEY)
>>>> I think it works better if we add the version for all thresholds, so we don't have to define multiple macros every time we need to check the GuC version (which I believe needs to also be done at least in register_threshold_attribute() and define_threshold_key_to_provision_case().
>>>> I can just use 70.29.2 (which is the minimum supported version for the Xe driver) for all the older defines.
>>> but then the code for version check will be generated for all thresholds, even those that we are sure are supported
>>>
>>> I will try to find something simpler, or more reusable, but no promise
>>>
>>>> Daniele
>>>>
>>>>>>         /* end */
>>>>>>       /**


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 05/10] drm/xe/sriov: Add debugfs to enable scheduler groups
  2025-11-27  1:45 [PATCH 00/10] Introduce SRIOV scheduler groups Daniele Ceraolo Spurio
                   ` (3 preceding siblings ...)
  2025-11-27  1:45 ` [PATCH 04/10] drm/xe/sriov: Scheduler groups are incompatible with multi-lrc Daniele Ceraolo Spurio
@ 2025-11-27  1:45 ` Daniele Ceraolo Spurio
  2025-12-02 15:52   ` Michal Wajdeczko
  2025-11-27  1:45 ` [PATCH 06/10] drm/xe/sriov: Add debugfs with scheduler groups information Daniele Ceraolo Spurio
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-11-27  1:45 UTC (permalink / raw)
  To: intel-xe; +Cc: Daniele Ceraolo Spurio, Michal Wajdeczko

Reading the debugfs file lists the available configurations by name.
Writing the name of a configuration to the file will enable it.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c | 116 ++++++++++++++++++++
 drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c  |  10 +-
 drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h  |   2 +
 3 files changed, 123 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
index 0fd863609848..2953ef21a5ad 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
@@ -155,6 +155,121 @@ static void pf_add_policy_attrs(struct xe_gt *gt, struct dentry *parent)
 	debugfs_create_file_unsafe("sample_period_ms", 0644, parent, parent, &sample_period_fops);
 }
 
+/*
+ *      /sys/kernel/debug/dri/BDF/
+ *      ├── sriov
+ *      :   ├── pf
+ *          :   ├── tile0
+ *              :   ├── gt0
+ *                  :   ├── sched_groups_mode
+ */
+
+static const char *sched_group_mode_to_string(enum xe_sriov_sched_group_modes mode)
+{
+	switch (mode) {
+	case XE_SRIOV_SCHED_GROUPS_NONE:
+		return "disabled";
+	case XE_SRIOV_SCHED_GROUPS_MEDIA_SLICES:
+		return "media_slices";
+	default:
+		return "unknown";
+	}
+}
+
+static int sched_groups_info(struct seq_file *m, void *data)
+{
+	struct drm_printer p = drm_seq_file_printer(m);
+	struct xe_gt *gt = extract_gt(m->private);
+	u32 current_mode = gt->sriov.pf.policy.guc.sched_groups.current_mode;
+	int mode = 0;
+
+	if (!xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt)) {
+		drm_printf(&p, "no groups available\n");
+		return 0;
+	}
+
+	for (mode = 0; mode < XE_SRIOV_SCHED_GROUPS_MODES_COUNT; mode++) {
+		if (!xe_sriov_gt_pf_policy_has_sched_group_mode(gt, mode))
+			continue;
+
+		if (mode)
+			drm_printf(&p, " ");
+
+		if (mode == current_mode)
+			drm_printf(&p, "[");
+
+		drm_printf(&p, "%s", sched_group_mode_to_string(mode));
+
+		if (mode == current_mode)
+			drm_printf(&p, "]");
+	}
+
+	drm_printf(&p, "\n");
+
+	return 0;
+}
+
+static int sched_groups_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, sched_groups_info, inode->i_private);
+}
+
+static ssize_t sched_groups_write(struct file *file, const char __user *ubuf,
+				  size_t size, loff_t *pos)
+{
+	struct xe_gt *gt = extract_gt(file_inode(file)->i_private);
+	char name[32];
+	int ret;
+	int m;
+
+	if (*pos)
+		return -ESPIPE;
+
+	if (!size)
+		return -ENODATA;
+
+	if (!xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt))
+		return -ENODEV;
+
+	if (size > sizeof(name) - 1)
+		return -EINVAL;
+
+	ret = simple_write_to_buffer(name, sizeof(name) - 1, pos, ubuf, size);
+	if (ret < 0)
+		return ret;
+	name[ret] = '\0';
+
+	for (m = 0; m < XE_SRIOV_SCHED_GROUPS_MODES_COUNT; m++)
+		if (sysfs_streq(name, sched_group_mode_to_string(m)))
+			break;
+
+	if (m == XE_SRIOV_SCHED_GROUPS_MODES_COUNT)
+		return -EINVAL;
+
+	xe_pm_runtime_get(gt_to_xe(gt));
+	ret = xe_gt_sriov_pf_policy_set_sched_groups_mode(gt, m);
+	xe_pm_runtime_put(gt_to_xe(gt));
+
+	return (ret < 0) ? ret : size;
+}
+
+static const struct file_operations sched_groups_fops = {
+	.owner = THIS_MODULE,
+	.open = sched_groups_open,
+	.read = seq_read,
+	.write = sched_groups_write,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
+
+static void pf_add_sched_groups(struct xe_gt *gt, struct dentry *parent)
+{
+	xe_gt_assert(gt, gt == extract_gt(parent));
+	xe_gt_assert(gt, PFID == extract_vfid(parent));
+
+	debugfs_create_file("sched_groups_mode", 0644, parent, parent, &sched_groups_fops);
+}
+
 /*
  *      /sys/kernel/debug/dri/BDF/
  *      ├── sriov
@@ -528,6 +643,7 @@ static void pf_populate_gt(struct xe_gt *gt, struct dentry *dent, unsigned int v
 	} else {
 		pf_add_config_attrs(gt, dent, PFID);
 		pf_add_policy_attrs(gt, dent);
+		pf_add_sched_groups(gt, dent);
 
 		drm_debugfs_create_files(pf_info, ARRAY_SIZE(pf_info), dent, minor);
 	}
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
index c7f1ea8eb9c5..3c5fc1b5f281 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
@@ -507,12 +507,12 @@ pf_policy_has_sched_group_modes(struct xe_gt *gt, unsigned long mask)
 	return gt->sriov.pf.policy.guc.sched_groups.supported_modes & mask;
 }
 
-static bool pf_policy_has_valid_sched_group_modes(struct xe_gt *gt)
+bool xe_sriov_gt_pf_policy_has_valid_sched_group_modes(struct xe_gt *gt)
 {
 	return pf_policy_has_sched_group_modes(gt, ~BIT(XE_SRIOV_SCHED_GROUPS_NONE));
 }
 
-static bool pf_policy_has_sched_group_mode(struct xe_gt *gt, u32 mode)
+bool xe_sriov_gt_pf_policy_has_sched_group_mode(struct xe_gt *gt, u32 mode)
 {
 	return pf_policy_has_sched_group_modes(gt, BIT(mode));
 }
@@ -553,7 +553,7 @@ static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
 	xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
 	lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
 
-	if (!pf_policy_has_sched_group_mode(gt, mode))
+	if (!xe_sriov_gt_pf_policy_has_sched_group_mode(gt, mode))
 		return -EINVAL;
 
 	/* already in the desired mode */
@@ -588,7 +588,7 @@ static int pf_reprovision_sched_groups(struct xe_gt *gt)
 	lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
 
 	/* We only have something to provision if we have possible groups */
-	if (!pf_policy_has_valid_sched_group_modes(gt))
+	if (!xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt))
 		return 0;
 
 	return __pf_provision_sched_groups(gt, gt->sriov.pf.policy.guc.sched_groups.current_mode);
@@ -615,7 +615,7 @@ int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value)
 {
 	int err;
 
-	if (!(pf_policy_has_valid_sched_group_modes(gt)))
+	if (!(xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt)))
 		return -ENODEV;
 
 	mutex_lock(xe_gt_sriov_pf_master_mutex(gt));
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
index 89aa3af6cc7d..13550cff7c00 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
@@ -17,6 +17,8 @@ int xe_gt_sriov_pf_policy_set_reset_engine(struct xe_gt *gt, bool enable);
 bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
 int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
 u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
+bool xe_sriov_gt_pf_policy_has_valid_sched_group_modes(struct xe_gt *gt);
+bool xe_sriov_gt_pf_policy_has_sched_group_mode(struct xe_gt *gt, u32 mode);
 int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value);
 bool xe_gt_sriov_pf_policy_sched_groups_enabled(struct xe_gt *gt);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 05/10] drm/xe/sriov: Add debugfs to enable scheduler groups
  2025-11-27  1:45 ` [PATCH 05/10] drm/xe/sriov: Add debugfs to enable scheduler groups Daniele Ceraolo Spurio
@ 2025-12-02 15:52   ` Michal Wajdeczko
  2025-12-02 18:03     ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 44+ messages in thread
From: Michal Wajdeczko @ 2025-12-02 15:52 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-xe



On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
> Reading the debugfs file lists the available configurations by name.
> Writing the name of a configuration to the file will enable it.
> 
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c | 116 ++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c  |  10 +-
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h  |   2 +
>  3 files changed, 123 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
> index 0fd863609848..2953ef21a5ad 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
> @@ -155,6 +155,121 @@ static void pf_add_policy_attrs(struct xe_gt *gt, struct dentry *parent)
>  	debugfs_create_file_unsafe("sample_period_ms", 0644, parent, parent, &sample_period_fops);
>  }
>  
> +/*
> + *      /sys/kernel/debug/dri/BDF/
> + *      ├── sriov
> + *      :   ├── pf
> + *          :   ├── tile0
> + *              :   ├── gt0
> + *                  :   ├── sched_groups_mode
> + */
> +
> +static const char *sched_group_mode_to_string(enum xe_sriov_sched_group_modes mode)
> +{
> +	switch (mode) {
> +	case XE_SRIOV_SCHED_GROUPS_NONE:
> +		return "disabled";
> +	case XE_SRIOV_SCHED_GROUPS_MEDIA_SLICES:
> +		return "media_slices";
> +	default:
> +		return "unknown";
> +	}
> +}
> +
> +static int sched_groups_info(struct seq_file *m, void *data)
> +{
> +	struct drm_printer p = drm_seq_file_printer(m);
> +	struct xe_gt *gt = extract_gt(m->private);
> +	u32 current_mode = gt->sriov.pf.policy.guc.sched_groups.current_mode;
> +	int mode = 0;
> +
> +	if (!xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt)) {
> +		drm_printf(&p, "no groups available\n");

since this will be used by the file read operation and user expects

	"the available configurations by name."

then IMO we should just return empty string

and if we check for EGS support earlier, see below,
then maybe this could be just an assert?

> +		return 0;
> +	}
> +
> +	for (mode = 0; mode < XE_SRIOV_SCHED_GROUPS_MODES_COUNT; mode++) {
> +		if (!xe_sriov_gt_pf_policy_has_sched_group_mode(gt, mode))
> +			continue;
> +
> +		if (mode)
> +			drm_printf(&p, " ");
> +
> +		if (mode == current_mode)
> +			drm_printf(&p, "[");
> +
> +		drm_printf(&p, "%s", sched_group_mode_to_string(mode));
> +
> +		if (mode == current_mode)
> +			drm_printf(&p, "]");
> +	}
> +
> +	drm_printf(&p, "\n");
> +
> +	return 0;
> +}
> +
> +static int sched_groups_open(struct inode *inode, struct file *file)
> +{
> +	return single_open(file, sched_groups_info, inode->i_private);
> +}
> +
> +static ssize_t sched_groups_write(struct file *file, const char __user *ubuf,
> +				  size_t size, loff_t *pos)
> +{
> +	struct xe_gt *gt = extract_gt(file_inode(file)->i_private);
> +	char name[32];
> +	int ret;
> +	int m;
> +
> +	if (*pos)
> +		return -ESPIPE;
> +
> +	if (!size)
> +		return -ENODATA;
> +
> +	if (!xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt))
> +		return -ENODEV;

maybe not needed - see below

> +
> +	if (size > sizeof(name) - 1)
> +		return -EINVAL;
> +
> +	ret = simple_write_to_buffer(name, sizeof(name) - 1, pos, ubuf, size);
> +	if (ret < 0)
> +		return ret;
> +	name[ret] = '\0';
> +
> +	for (m = 0; m < XE_SRIOV_SCHED_GROUPS_MODES_COUNT; m++)
> +		if (sysfs_streq(name, sched_group_mode_to_string(m)))
> +			break;
> +
> +	if (m == XE_SRIOV_SCHED_GROUPS_MODES_COUNT)
> +		return -EINVAL;
> +
> +	xe_pm_runtime_get(gt_to_xe(gt));

	guard(xe_pm_runtime)(xe);

> +	ret = xe_gt_sriov_pf_policy_set_sched_groups_mode(gt, m);
> +	xe_pm_runtime_put(gt_to_xe(gt));
> +
> +	return (ret < 0) ? ret : size;
> +}
> +
> +static const struct file_operations sched_groups_fops = {
> +	.owner = THIS_MODULE,
> +	.open = sched_groups_open,
> +	.read = seq_read,
> +	.write = sched_groups_write,
> +	.llseek = seq_lseek,
> +	.release = single_release,
> +};
> +
> +static void pf_add_sched_groups(struct xe_gt *gt, struct dentry *parent)
> +{
> +	xe_gt_assert(gt, gt == extract_gt(parent));
> +	xe_gt_assert(gt, PFID == extract_vfid(parent));
> +
> +	debugfs_create_file("sched_groups_mode", 0644, parent, parent, &sched_groups_fops);
> +}
> +
>  /*
>   *      /sys/kernel/debug/dri/BDF/
>   *      ├── sriov
> @@ -528,6 +643,7 @@ static void pf_populate_gt(struct xe_gt *gt, struct dentry *dent, unsigned int v
>  	} else {
>  		pf_add_config_attrs(gt, dent, PFID);
>  		pf_add_policy_attrs(gt, dent);
> +		pf_add_sched_groups(gt, dent);

at this point we should know whether we support EGS or not,
so we should create EGS files only if EGS is supported

>  
>  		drm_debugfs_create_files(pf_info, ARRAY_SIZE(pf_info), dent, minor);
>  	}
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
> index c7f1ea8eb9c5..3c5fc1b5f281 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
> @@ -507,12 +507,12 @@ pf_policy_has_sched_group_modes(struct xe_gt *gt, unsigned long mask)
>  	return gt->sriov.pf.policy.guc.sched_groups.supported_modes & mask;
>  }
>  
> -static bool pf_policy_has_valid_sched_group_modes(struct xe_gt *gt)
> +bool xe_sriov_gt_pf_policy_has_valid_sched_group_modes(struct xe_gt *gt)
>  {
>  	return pf_policy_has_sched_group_modes(gt, ~BIT(XE_SRIOV_SCHED_GROUPS_NONE));
>  }
>  
> -static bool pf_policy_has_sched_group_mode(struct xe_gt *gt, u32 mode)

public function needs kernel-doc

> +bool xe_sriov_gt_pf_policy_has_sched_group_mode(struct xe_gt *gt, u32 mode)

if in the single series like this one, we know that we will need some function,
I guess it is ok to define it as public on the first use, even if it was initially
used only locally - this will make smaller diff on next patch like this one

>  {
>  	return pf_policy_has_sched_group_modes(gt, BIT(mode));
>  }
> @@ -553,7 +553,7 @@ static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>  	xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
>  	lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
>  
> -	if (!pf_policy_has_sched_group_mode(gt, mode))
> +	if (!xe_sriov_gt_pf_policy_has_sched_group_mode(gt, mode))
>  		return -EINVAL;
>  
>  	/* already in the desired mode */
> @@ -588,7 +588,7 @@ static int pf_reprovision_sched_groups(struct xe_gt *gt)
>  	lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
>  
>  	/* We only have something to provision if we have possible groups */
> -	if (!pf_policy_has_valid_sched_group_modes(gt))
> +	if (!xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt))
>  		return 0;
>  
>  	return __pf_provision_sched_groups(gt, gt->sriov.pf.policy.guc.sched_groups.current_mode);
> @@ -615,7 +615,7 @@ int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value)
>  {
>  	int err;
>  
> -	if (!(pf_policy_has_valid_sched_group_modes(gt)))
> +	if (!(xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt)))
>  		return -ENODEV;
>  
>  	mutex_lock(xe_gt_sriov_pf_master_mutex(gt));
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
> index 89aa3af6cc7d..13550cff7c00 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
> @@ -17,6 +17,8 @@ int xe_gt_sriov_pf_policy_set_reset_engine(struct xe_gt *gt, bool enable);
>  bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
>  int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
>  u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
> +bool xe_sriov_gt_pf_policy_has_valid_sched_group_modes(struct xe_gt *gt);
> +bool xe_sriov_gt_pf_policy_has_sched_group_mode(struct xe_gt *gt, u32 mode);
>  int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value);
>  bool xe_gt_sriov_pf_policy_sched_groups_enabled(struct xe_gt *gt);
>  


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 05/10] drm/xe/sriov: Add debugfs to enable scheduler groups
  2025-12-02 15:52   ` Michal Wajdeczko
@ 2025-12-02 18:03     ` Daniele Ceraolo Spurio
  2025-12-02 21:24       ` Michal Wajdeczko
  0 siblings, 1 reply; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-12-02 18:03 UTC (permalink / raw)
  To: Michal Wajdeczko, intel-xe



On 12/2/2025 7:52 AM, Michal Wajdeczko wrote:
>
> On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
>> Reading the debugfs file lists the available configurations by name.
>> Writing the name of a configuration to the file will enable it.
>>
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c | 116 ++++++++++++++++++++
>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c  |  10 +-
>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h  |   2 +
>>   3 files changed, 123 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
>> index 0fd863609848..2953ef21a5ad 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
>> @@ -155,6 +155,121 @@ static void pf_add_policy_attrs(struct xe_gt *gt, struct dentry *parent)
>>   	debugfs_create_file_unsafe("sample_period_ms", 0644, parent, parent, &sample_period_fops);
>>   }
>>   
>> +/*
>> + *      /sys/kernel/debug/dri/BDF/
>> + *      ├── sriov
>> + *      :   ├── pf
>> + *          :   ├── tile0
>> + *              :   ├── gt0
>> + *                  :   ├── sched_groups_mode
>> + */
>> +
>> +static const char *sched_group_mode_to_string(enum xe_sriov_sched_group_modes mode)
>> +{
>> +	switch (mode) {
>> +	case XE_SRIOV_SCHED_GROUPS_NONE:
>> +		return "disabled";
>> +	case XE_SRIOV_SCHED_GROUPS_MEDIA_SLICES:
>> +		return "media_slices";
>> +	default:
>> +		return "unknown";
>> +	}
>> +}
>> +
>> +static int sched_groups_info(struct seq_file *m, void *data)
>> +{
>> +	struct drm_printer p = drm_seq_file_printer(m);
>> +	struct xe_gt *gt = extract_gt(m->private);
>> +	u32 current_mode = gt->sriov.pf.policy.guc.sched_groups.current_mode;
>> +	int mode = 0;
>> +
>> +	if (!xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt)) {
>> +		drm_printf(&p, "no groups available\n");
> since this will be used by the file read operation and user expects
>
> 	"the available configurations by name."
>
> then IMO we should just return empty string

ok

>
> and if we check for EGS support earlier, see below,
> then maybe this could be just an assert?

sure

>
>> +		return 0;
>> +	}
>> +
>> +	for (mode = 0; mode < XE_SRIOV_SCHED_GROUPS_MODES_COUNT; mode++) {
>> +		if (!xe_sriov_gt_pf_policy_has_sched_group_mode(gt, mode))
>> +			continue;
>> +
>> +		if (mode)
>> +			drm_printf(&p, " ");
>> +
>> +		if (mode == current_mode)
>> +			drm_printf(&p, "[");
>> +
>> +		drm_printf(&p, "%s", sched_group_mode_to_string(mode));
>> +
>> +		if (mode == current_mode)
>> +			drm_printf(&p, "]");
>> +	}
>> +
>> +	drm_printf(&p, "\n");
>> +
>> +	return 0;
>> +}
>> +
>> +static int sched_groups_open(struct inode *inode, struct file *file)
>> +{
>> +	return single_open(file, sched_groups_info, inode->i_private);
>> +}
>> +
>> +static ssize_t sched_groups_write(struct file *file, const char __user *ubuf,
>> +				  size_t size, loff_t *pos)
>> +{
>> +	struct xe_gt *gt = extract_gt(file_inode(file)->i_private);
>> +	char name[32];
>> +	int ret;
>> +	int m;
>> +
>> +	if (*pos)
>> +		return -ESPIPE;
>> +
>> +	if (!size)
>> +		return -ENODATA;
>> +
>> +	if (!xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt))
>> +		return -ENODEV;
> maybe not needed - see below
>
>> +
>> +	if (size > sizeof(name) - 1)
>> +		return -EINVAL;
>> +
>> +	ret = simple_write_to_buffer(name, sizeof(name) - 1, pos, ubuf, size);
>> +	if (ret < 0)
>> +		return ret;
>> +	name[ret] = '\0';
>> +
>> +	for (m = 0; m < XE_SRIOV_SCHED_GROUPS_MODES_COUNT; m++)
>> +		if (sysfs_streq(name, sched_group_mode_to_string(m)))
>> +			break;
>> +
>> +	if (m == XE_SRIOV_SCHED_GROUPS_MODES_COUNT)
>> +		return -EINVAL;
>> +
>> +	xe_pm_runtime_get(gt_to_xe(gt));
> 	guard(xe_pm_runtime)(xe);
>
>> +	ret = xe_gt_sriov_pf_policy_set_sched_groups_mode(gt, m);
>> +	xe_pm_runtime_put(gt_to_xe(gt));
>> +
>> +	return (ret < 0) ? ret : size;
>> +}
>> +
>> +static const struct file_operations sched_groups_fops = {
>> +	.owner = THIS_MODULE,
>> +	.open = sched_groups_open,
>> +	.read = seq_read,
>> +	.write = sched_groups_write,
>> +	.llseek = seq_lseek,
>> +	.release = single_release,
>> +};
>> +
>> +static void pf_add_sched_groups(struct xe_gt *gt, struct dentry *parent)
>> +{
>> +	xe_gt_assert(gt, gt == extract_gt(parent));
>> +	xe_gt_assert(gt, PFID == extract_vfid(parent));
>> +
>> +	debugfs_create_file("sched_groups_mode", 0644, parent, parent, &sched_groups_fops);
>> +}
>> +
>>   /*
>>    *      /sys/kernel/debug/dri/BDF/
>>    *      ├── sriov
>> @@ -528,6 +643,7 @@ static void pf_populate_gt(struct xe_gt *gt, struct dentry *dent, unsigned int v
>>   	} else {
>>   		pf_add_config_attrs(gt, dent, PFID);
>>   		pf_add_policy_attrs(gt, dent);
>> +		pf_add_sched_groups(gt, dent);
> at this point we should know whether we support EGS or not,
> so we should create EGS files only if EGS is supported

We actually don't. xe_sriov_init_late() (which is where the EGS init is 
called from) happens after xe_debugfs_register. xe_sriov_init() happens 
too early, so that's also not a good choice.
I thought about moving xe_sriov_init_late to an earlier point, but I 
didn't want to mess with the general SRIOV flows. Thoughts?

>
>>   
>>   		drm_debugfs_create_files(pf_info, ARRAY_SIZE(pf_info), dent, minor);
>>   	}
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>> index c7f1ea8eb9c5..3c5fc1b5f281 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>> @@ -507,12 +507,12 @@ pf_policy_has_sched_group_modes(struct xe_gt *gt, unsigned long mask)
>>   	return gt->sriov.pf.policy.guc.sched_groups.supported_modes & mask;
>>   }
>>   
>> -static bool pf_policy_has_valid_sched_group_modes(struct xe_gt *gt)
>> +bool xe_sriov_gt_pf_policy_has_valid_sched_group_modes(struct xe_gt *gt)
>>   {
>>   	return pf_policy_has_sched_group_modes(gt, ~BIT(XE_SRIOV_SCHED_GROUPS_NONE));
>>   }
>>   
>> -static bool pf_policy_has_sched_group_mode(struct xe_gt *gt, u32 mode)
> public function needs kernel-doc

ok

>
>> +bool xe_sriov_gt_pf_policy_has_sched_group_mode(struct xe_gt *gt, u32 mode)
> if in the single series like this one, we know that we will need some function,
> I guess it is ok to define it as public on the first use, even if it was initially
> used only locally - this will make smaller diff on next patch like this one

ok

Daniele

>
>>   {
>>   	return pf_policy_has_sched_group_modes(gt, BIT(mode));
>>   }
>> @@ -553,7 +553,7 @@ static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>>   	xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
>>   	lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
>>   
>> -	if (!pf_policy_has_sched_group_mode(gt, mode))
>> +	if (!xe_sriov_gt_pf_policy_has_sched_group_mode(gt, mode))
>>   		return -EINVAL;
>>   
>>   	/* already in the desired mode */
>> @@ -588,7 +588,7 @@ static int pf_reprovision_sched_groups(struct xe_gt *gt)
>>   	lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
>>   
>>   	/* We only have something to provision if we have possible groups */
>> -	if (!pf_policy_has_valid_sched_group_modes(gt))
>> +	if (!xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt))
>>   		return 0;
>>   
>>   	return __pf_provision_sched_groups(gt, gt->sriov.pf.policy.guc.sched_groups.current_mode);
>> @@ -615,7 +615,7 @@ int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value)
>>   {
>>   	int err;
>>   
>> -	if (!(pf_policy_has_valid_sched_group_modes(gt)))
>> +	if (!(xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt)))
>>   		return -ENODEV;
>>   
>>   	mutex_lock(xe_gt_sriov_pf_master_mutex(gt));
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>> index 89aa3af6cc7d..13550cff7c00 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>> @@ -17,6 +17,8 @@ int xe_gt_sriov_pf_policy_set_reset_engine(struct xe_gt *gt, bool enable);
>>   bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
>>   int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
>>   u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
>> +bool xe_sriov_gt_pf_policy_has_valid_sched_group_modes(struct xe_gt *gt);
>> +bool xe_sriov_gt_pf_policy_has_sched_group_mode(struct xe_gt *gt, u32 mode);
>>   int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value);
>>   bool xe_gt_sriov_pf_policy_sched_groups_enabled(struct xe_gt *gt);
>>   


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 05/10] drm/xe/sriov: Add debugfs to enable scheduler groups
  2025-12-02 18:03     ` Daniele Ceraolo Spurio
@ 2025-12-02 21:24       ` Michal Wajdeczko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Wajdeczko @ 2025-12-02 21:24 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-xe



On 12/2/2025 7:03 PM, Daniele Ceraolo Spurio wrote:
> 
> 
> On 12/2/2025 7:52 AM, Michal Wajdeczko wrote:
>>
>> On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
>>> Reading the debugfs file lists the available configurations by name.
>>> Writing the name of a configuration to the file will enable it.
>>>
>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>> ---
>>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c | 116 ++++++++++++++++++++
>>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c  |  10 +-
>>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h  |   2 +
>>>   3 files changed, 123 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
>>> index 0fd863609848..2953ef21a5ad 100644
>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
>>> @@ -155,6 +155,121 @@ static void pf_add_policy_attrs(struct xe_gt *gt, struct dentry *parent)
>>>       debugfs_create_file_unsafe("sample_period_ms", 0644, parent, parent, &sample_period_fops);
>>>   }
>>>   +/*
>>> + *      /sys/kernel/debug/dri/BDF/
>>> + *      ├── sriov
>>> + *      :   ├── pf
>>> + *          :   ├── tile0
>>> + *              :   ├── gt0
>>> + *                  :   ├── sched_groups_mode
>>> + */
>>> +
>>> +static const char *sched_group_mode_to_string(enum xe_sriov_sched_group_modes mode)
>>> +{
>>> +    switch (mode) {
>>> +    case XE_SRIOV_SCHED_GROUPS_NONE:
>>> +        return "disabled";
>>> +    case XE_SRIOV_SCHED_GROUPS_MEDIA_SLICES:
>>> +        return "media_slices";
>>> +    default:
>>> +        return "unknown";
>>> +    }
>>> +}
>>> +
>>> +static int sched_groups_info(struct seq_file *m, void *data)
>>> +{
>>> +    struct drm_printer p = drm_seq_file_printer(m);
>>> +    struct xe_gt *gt = extract_gt(m->private);
>>> +    u32 current_mode = gt->sriov.pf.policy.guc.sched_groups.current_mode;
>>> +    int mode = 0;
>>> +
>>> +    if (!xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt)) {
>>> +        drm_printf(&p, "no groups available\n");
>> since this will be used by the file read operation and user expects
>>
>>     "the available configurations by name."
>>
>> then IMO we should just return empty string
> 
> ok
> 
>>
>> and if we check for EGS support earlier, see below,
>> then maybe this could be just an assert?
> 
> sure
> 
>>
>>> +        return 0;
>>> +    }
>>> +
>>> +    for (mode = 0; mode < XE_SRIOV_SCHED_GROUPS_MODES_COUNT; mode++) {
>>> +        if (!xe_sriov_gt_pf_policy_has_sched_group_mode(gt, mode))
>>> +            continue;
>>> +
>>> +        if (mode)
>>> +            drm_printf(&p, " ");
>>> +
>>> +        if (mode == current_mode)
>>> +            drm_printf(&p, "[");
>>> +
>>> +        drm_printf(&p, "%s", sched_group_mode_to_string(mode));
>>> +
>>> +        if (mode == current_mode)
>>> +            drm_printf(&p, "]");
>>> +    }
>>> +
>>> +    drm_printf(&p, "\n");
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int sched_groups_open(struct inode *inode, struct file *file)
>>> +{
>>> +    return single_open(file, sched_groups_info, inode->i_private);
>>> +}
>>> +
>>> +static ssize_t sched_groups_write(struct file *file, const char __user *ubuf,
>>> +                  size_t size, loff_t *pos)
>>> +{
>>> +    struct xe_gt *gt = extract_gt(file_inode(file)->i_private);
>>> +    char name[32];
>>> +    int ret;
>>> +    int m;
>>> +
>>> +    if (*pos)
>>> +        return -ESPIPE;
>>> +
>>> +    if (!size)
>>> +        return -ENODATA;
>>> +
>>> +    if (!xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt))
>>> +        return -ENODEV;
>> maybe not needed - see below
>>
>>> +
>>> +    if (size > sizeof(name) - 1)
>>> +        return -EINVAL;
>>> +
>>> +    ret = simple_write_to_buffer(name, sizeof(name) - 1, pos, ubuf, size);
>>> +    if (ret < 0)
>>> +        return ret;
>>> +    name[ret] = '\0';
>>> +
>>> +    for (m = 0; m < XE_SRIOV_SCHED_GROUPS_MODES_COUNT; m++)
>>> +        if (sysfs_streq(name, sched_group_mode_to_string(m)))
>>> +            break;
>>> +
>>> +    if (m == XE_SRIOV_SCHED_GROUPS_MODES_COUNT)
>>> +        return -EINVAL;
>>> +
>>> +    xe_pm_runtime_get(gt_to_xe(gt));
>>     guard(xe_pm_runtime)(xe);
>>
>>> +    ret = xe_gt_sriov_pf_policy_set_sched_groups_mode(gt, m);
>>> +    xe_pm_runtime_put(gt_to_xe(gt));
>>> +
>>> +    return (ret < 0) ? ret : size;
>>> +}
>>> +
>>> +static const struct file_operations sched_groups_fops = {
>>> +    .owner = THIS_MODULE,
>>> +    .open = sched_groups_open,
>>> +    .read = seq_read,
>>> +    .write = sched_groups_write,
>>> +    .llseek = seq_lseek,
>>> +    .release = single_release,
>>> +};
>>> +
>>> +static void pf_add_sched_groups(struct xe_gt *gt, struct dentry *parent)
>>> +{
>>> +    xe_gt_assert(gt, gt == extract_gt(parent));
>>> +    xe_gt_assert(gt, PFID == extract_vfid(parent));
>>> +
>>> +    debugfs_create_file("sched_groups_mode", 0644, parent, parent, &sched_groups_fops);
>>> +}
>>> +
>>>   /*
>>>    *      /sys/kernel/debug/dri/BDF/
>>>    *      ├── sriov
>>> @@ -528,6 +643,7 @@ static void pf_populate_gt(struct xe_gt *gt, struct dentry *dent, unsigned int v
>>>       } else {
>>>           pf_add_config_attrs(gt, dent, PFID);
>>>           pf_add_policy_attrs(gt, dent);
>>> +        pf_add_sched_groups(gt, dent);
>> at this point we should know whether we support EGS or not,
>> so we should create EGS files only if EGS is supported
> 
> We actually don't. xe_sriov_init_late() (which is where the EGS init is called from) happens after xe_debugfs_register. xe_sriov_init() happens too early, so that's also not a good choice.
> I thought about moving xe_sriov_init_late to an earlier point, but I didn't want to mess with the general SRIOV flows. Thoughts?

add dent at xe_gt (like we have on xe_tile)

and move PF GT debugfs initialization from xe_gt_debugfs_register to xe_gt_sriov_pf_init() ?

but will take a closer look if this is safe ;)

> 
>>
>>>             drm_debugfs_create_files(pf_info, ARRAY_SIZE(pf_info), dent, minor);
>>>       }
>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>> index c7f1ea8eb9c5..3c5fc1b5f281 100644
>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>>> @@ -507,12 +507,12 @@ pf_policy_has_sched_group_modes(struct xe_gt *gt, unsigned long mask)
>>>       return gt->sriov.pf.policy.guc.sched_groups.supported_modes & mask;
>>>   }
>>>   -static bool pf_policy_has_valid_sched_group_modes(struct xe_gt *gt)
>>> +bool xe_sriov_gt_pf_policy_has_valid_sched_group_modes(struct xe_gt *gt)
>>>   {
>>>       return pf_policy_has_sched_group_modes(gt, ~BIT(XE_SRIOV_SCHED_GROUPS_NONE));
>>>   }
>>>   -static bool pf_policy_has_sched_group_mode(struct xe_gt *gt, u32 mode)
>> public function needs kernel-doc
> 
> ok
> 
>>
>>> +bool xe_sriov_gt_pf_policy_has_sched_group_mode(struct xe_gt *gt, u32 mode)
>> if in the single series like this one, we know that we will need some function,
>> I guess it is ok to define it as public on the first use, even if it was initially
>> used only locally - this will make smaller diff on next patch like this one
> 
> ok
> 
> Daniele
> 
>>
>>>   {
>>>       return pf_policy_has_sched_group_modes(gt, BIT(mode));
>>>   }
>>> @@ -553,7 +553,7 @@ static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>>>       xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
>>>       lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
>>>   -    if (!pf_policy_has_sched_group_mode(gt, mode))
>>> +    if (!xe_sriov_gt_pf_policy_has_sched_group_mode(gt, mode))
>>>           return -EINVAL;
>>>         /* already in the desired mode */
>>> @@ -588,7 +588,7 @@ static int pf_reprovision_sched_groups(struct xe_gt *gt)
>>>       lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
>>>         /* We only have something to provision if we have possible groups */
>>> -    if (!pf_policy_has_valid_sched_group_modes(gt))
>>> +    if (!xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt))
>>>           return 0;
>>>         return __pf_provision_sched_groups(gt, gt->sriov.pf.policy.guc.sched_groups.current_mode);
>>> @@ -615,7 +615,7 @@ int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value)
>>>   {
>>>       int err;
>>>   -    if (!(pf_policy_has_valid_sched_group_modes(gt)))
>>> +    if (!(xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt)))
>>>           return -ENODEV;
>>>         mutex_lock(xe_gt_sriov_pf_master_mutex(gt));
>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>> index 89aa3af6cc7d..13550cff7c00 100644
>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>>> @@ -17,6 +17,8 @@ int xe_gt_sriov_pf_policy_set_reset_engine(struct xe_gt *gt, bool enable);
>>>   bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
>>>   int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
>>>   u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
>>> +bool xe_sriov_gt_pf_policy_has_valid_sched_group_modes(struct xe_gt *gt);
>>> +bool xe_sriov_gt_pf_policy_has_sched_group_mode(struct xe_gt *gt, u32 mode);
>>>   int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value);
>>>   bool xe_gt_sriov_pf_policy_sched_groups_enabled(struct xe_gt *gt);
>>>   
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 06/10] drm/xe/sriov: Add debugfs with scheduler groups information
  2025-11-27  1:45 [PATCH 00/10] Introduce SRIOV scheduler groups Daniele Ceraolo Spurio
                   ` (4 preceding siblings ...)
  2025-11-27  1:45 ` [PATCH 05/10] drm/xe/sriov: Add debugfs to enable scheduler groups Daniele Ceraolo Spurio
@ 2025-11-27  1:45 ` Daniele Ceraolo Spurio
  2025-12-02 16:24   ` Michal Wajdeczko
  2025-11-27  1:45 ` [PATCH 07/10] drm/xe/sriov: Prep for multiple exec quantums and preemption timeouts Daniele Ceraolo Spurio
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-11-27  1:45 UTC (permalink / raw)
  To: intel-xe; +Cc: Daniele Ceraolo Spurio, Michal Wajdeczko

When schedulers groups are enabled, we dynamically create new debugfs
folders under the GT folder for the PF and each VF. The aim is to have
all the info for each VF under the sched_groups folder, with individual
folders per-group under it. Right now the only info is the engine list,
but follow up patches will add execution quantum and preemption timeout
as well (which are configurable per-group-per-VF).

Note that the engine lists are not configurable per-VF and therefore not
strictly needed in the VF folders. However, it is still useful to have
them there because an admin might want to check the engine list while
configuring the EQ/PT values for a specific VF.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c | 140 +++++++++++++++++++-
 drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h   |   4 +
 2 files changed, 143 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
index 2953ef21a5ad..947e2b92d58a 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
@@ -5,6 +5,7 @@
 
 #include <linux/debugfs.h>
 
+#include <drm/drm_managed.h>
 #include <drm/drm_print.h>
 #include <drm/drm_debugfs.h>
 
@@ -21,6 +22,7 @@
 #include "xe_gt_sriov_pf_monitor.h"
 #include "xe_gt_sriov_pf_policy.h"
 #include "xe_gt_sriov_pf_service.h"
+#include "xe_guc.h"
 #include "xe_pm.h"
 #include "xe_sriov_pf.h"
 #include "xe_sriov_pf_provision.h"
@@ -162,8 +164,127 @@ static void pf_add_policy_attrs(struct xe_gt *gt, struct dentry *parent)
  *          :   ├── tile0
  *              :   ├── gt0
  *                  :   ├── sched_groups_mode
+ *                      ├── sched_groups
+ *                      :   ├── group0
+ *                          :   └── engines
+ *                          :
+ *                          └── groupN
+ *          :                   └── engines
+ *          ├── vf1
+ *          :   ├── tile0
+ *              :   ├── gt0
+ *                  :   ├── sched_groups
+ *                      :   ├── group0
+ *                          :   └── engines
  */
 
+struct sched_group_info {
+	struct xe_gt *gt;
+	u32 *masks;
+};
+
+static int sched_group_engines_info(struct seq_file *m, void *data)
+{
+	struct drm_printer p = drm_seq_file_printer(m);
+	struct sched_group_info *gi = m->private;
+	struct xe_gt *gt = gi->gt;
+	struct xe_hw_engine *hwe;
+	enum xe_hw_engine_id id;
+	bool first = true;
+
+	for_each_hw_engine(hwe, gt, id) {
+		u8 guc_class = xe_engine_class_to_guc_class(hwe->class);
+		u32 mask = gi->masks[guc_class];
+
+		if (mask & BIT(hwe->logical_instance)) {
+			drm_printf(&p, "%s%s", first ? "" : " ", hwe->name);
+
+			first = false;
+		}
+	}
+
+	drm_printf(&p, "\n");
+
+	return 0;
+}
+
+static int sched_group_engines_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, sched_group_engines_info, inode->i_private);
+}
+
+static const struct file_operations sched_group_engines_fops = {
+	.owner = THIS_MODULE,
+	.open = sched_group_engines_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
+
+static void __sched_group_info_cleanup(struct xe_gt *gt, struct dentry *dent)
+{
+	if (dent->d_inode->i_private)
+		drmm_kfree(&gt_to_xe(gt)->drm, dent->d_inode->i_private);
+
+	debugfs_remove_recursive(dent);
+}
+
+#define GROUP_INFO_ROOT "sched_groups"
+static void sched_group_info_register(struct xe_gt *gt, unsigned int vfid)
+{
+	struct xe_gt_sriov_pf_policy *policy = &gt->sriov.pf.policy;
+	u32 mode = policy->guc.sched_groups.current_mode;
+	u8 num_groups = policy->guc.sched_groups.modes[mode].num_masks / GUC_MAX_ENGINE_CLASSES;
+	struct sched_group_info *infos;
+	struct dentry *parent;
+	struct dentry *old;
+	u8 g;
+
+	if (!gt->sriov.pf.debugfs_roots)
+		return;
+
+	/* remove existing debugfs entries for old groups */
+	old = debugfs_lookup(GROUP_INFO_ROOT, gt->sriov.pf.debugfs_roots[vfid]);
+	if (old)
+		__sched_group_info_cleanup(gt, old);
+
+	/* re-create debugfs for new groups (if any)*/
+	if (!num_groups)
+		return;
+
+	parent = debugfs_create_dir(GROUP_INFO_ROOT, gt->sriov.pf.debugfs_roots[vfid]);
+	if (IS_ERR(parent))
+		return;
+
+	infos = drmm_kzalloc(&gt_to_xe(gt)->drm, sizeof(*infos) * num_groups, GFP_KERNEL);
+	if (!infos)
+		goto out_err;
+	parent->d_inode->i_private = infos;
+
+	for (g = 0; g < num_groups; g++) {
+		struct sched_group_info *info = &infos[g];
+		u32 base = g * GUC_MAX_ENGINE_CLASSES;
+		struct dentry *dent;
+		char name[10];
+
+		snprintf(name, sizeof(name), "group%u", g);
+		dent = debugfs_create_dir(name, parent);
+		if (IS_ERR(dent))
+			goto out_err;
+
+		info->gt = gt;
+		info->masks = &policy->guc.sched_groups.modes[mode].masks[base];
+
+		dent->d_inode->i_private = info;
+		debugfs_create_file("engines", 0644, dent, info, &sched_group_engines_fops);
+	}
+
+	return;
+
+out_err:
+	__sched_group_info_cleanup(gt, parent);
+}
+
 static const char *sched_group_mode_to_string(enum xe_sriov_sched_group_modes mode)
 {
 	switch (mode) {
@@ -220,7 +341,7 @@ static ssize_t sched_groups_write(struct file *file, const char __user *ubuf,
 	struct xe_gt *gt = extract_gt(file_inode(file)->i_private);
 	char name[32];
 	int ret;
-	int m;
+	int m, i;
 
 	if (*pos)
 		return -ESPIPE;
@@ -250,6 +371,10 @@ static ssize_t sched_groups_write(struct file *file, const char __user *ubuf,
 	ret = xe_gt_sriov_pf_policy_set_sched_groups_mode(gt, m);
 	xe_pm_runtime_put(gt_to_xe(gt));
 
+	if (!ret)
+		for (i = 0; i <= xe_sriov_pf_get_totalvfs(gt_to_xe(gt)); i++)
+			sched_group_info_register(gt, i);
+
 	return (ret < 0) ? ret : size;
 }
 
@@ -693,6 +818,19 @@ void xe_gt_sriov_pf_debugfs_populate(struct xe_gt *gt, struct dentry *parent, un
 		return;
 	dent->d_inode->i_private = gt;
 
+	/*
+	 * we allocate an array to store the GT-level dentries for PF and all
+	 * VFs when creating the PF folder. Failing to create this allocation
+	 * is not fatal.
+	 */
+	if (vfid == 0)
+		gt->sriov.pf.debugfs_roots =
+			drmm_kcalloc(&gt_to_xe(gt)->drm,
+				     1 + xe_sriov_pf_get_totalvfs(gt_to_xe(gt)),
+				     sizeof(struct dentry *), GFP_KERNEL);
+	if (gt->sriov.pf.debugfs_roots)
+		gt->sriov.pf.debugfs_roots[vfid] = dent;
+
 	xe_gt_assert(gt, extract_gt(dent) == gt);
 	xe_gt_assert(gt, extract_vfid(dent) == vfid);
 
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h
index 667b8310478d..747ec5dae652 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h
@@ -15,6 +15,8 @@
 #include "xe_gt_sriov_pf_policy_types.h"
 #include "xe_gt_sriov_pf_service_types.h"
 
+struct dentry;
+
 /**
  * struct xe_gt_sriov_metadata - GT level per-VF metadata.
  */
@@ -52,6 +54,7 @@ struct xe_gt_sriov_pf_workers {
  * @migration: migration data.
  * @spare: PF-only provisioning configuration.
  * @vfs: metadata for all VFs.
+ * @debugfs_root: GT debugfs root for all VFs.
  */
 struct xe_gt_sriov_pf {
 	struct xe_gt_sriov_pf_workers workers;
@@ -60,6 +63,7 @@ struct xe_gt_sriov_pf {
 	struct xe_gt_sriov_pf_policy policy;
 	struct xe_gt_sriov_spare_config spare;
 	struct xe_gt_sriov_metadata *vfs;
+	struct dentry **debugfs_roots;
 };
 
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 06/10] drm/xe/sriov: Add debugfs with scheduler groups information
  2025-11-27  1:45 ` [PATCH 06/10] drm/xe/sriov: Add debugfs with scheduler groups information Daniele Ceraolo Spurio
@ 2025-12-02 16:24   ` Michal Wajdeczko
  2025-12-02 18:20     ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 44+ messages in thread
From: Michal Wajdeczko @ 2025-12-02 16:24 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-xe



On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
> When schedulers groups are enabled, we dynamically create new debugfs
> folders under the GT folder for the PF and each VF. The aim is to have

hmm, but since this is debugfs and groups (even uninitialized) still exist
then maybe it is easier (and safer) to create static list of groups (8 or 2)
and just print their configurations: either empty (if uninitialized) or with
engine names (after selecting predefined EGS mode)

then debugfs files (not directories) will reflect currently used configuration

> all the info for each VF under the sched_groups folder, with individual
> folders per-group under it. Right now the only info is the engine list,
> but follow up patches will add execution quantum and preemption timeout
> as well (which are configurable per-group-per-VF).

but groups EQ/PT configuration is not per-group, but only per-GT (GuC)
so it should be OK to have them level up

> 
> Note that the engine lists are not configurable per-VF and therefore not
> strictly needed in the VF folders. However, it is still useful to have
> them there because an admin might want to check the engine list while
> configuring the EQ/PT values for a specific VF.

duplicating RO entries per-VF seems odd, maybe it's better to use symlinks:


/sys/kernel/debug/dri/BDF/
├── sriov
:   ├── pf
    :   ├── tile0
        :   ├── gt0
            :   ├── sched_groups_mode			# disabled [media_slices]
                ├── sched_groups_exec_quantums_ms	# 20 20
                ├── sched_groups_preempt_timeouts_us	# 2000 2000
                ├── sched_groups
                :   ├── group0				# vcs0 vcs1 vecs0
                :   ├── group1				# vcs2 vcs3 vecs1
                :   ├── group2				#
                    :
                    └── group7
    ├── vf1
    :   ├── tile0
        :   ├── gt0
                ├── sched_groups_exec_quantums_ms	# 30 30
                ├── sched_groups_preempt_timeouts_us	# 3000 3000
                ├── sched_groups --> ../../../pf/tile0/gt0/sched_groups

    ├── vf2
    :   ├── tile0
        :   ├── gt0
                ├── sched_groups_exec_quantums_ms	# 40 40
                ├── sched_groups_preempt_timeouts_us	# 4000 4000
                ├── sched_groups --> ../../../pf/tile0/gt0/sched_groups

> 
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c | 140 +++++++++++++++++++-
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h   |   4 +
>  2 files changed, 143 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
> index 2953ef21a5ad..947e2b92d58a 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
> @@ -5,6 +5,7 @@
>  
>  #include <linux/debugfs.h>
>  
> +#include <drm/drm_managed.h>
>  #include <drm/drm_print.h>
>  #include <drm/drm_debugfs.h>
>  
> @@ -21,6 +22,7 @@
>  #include "xe_gt_sriov_pf_monitor.h"
>  #include "xe_gt_sriov_pf_policy.h"
>  #include "xe_gt_sriov_pf_service.h"
> +#include "xe_guc.h"
>  #include "xe_pm.h"
>  #include "xe_sriov_pf.h"
>  #include "xe_sriov_pf_provision.h"
> @@ -162,8 +164,127 @@ static void pf_add_policy_attrs(struct xe_gt *gt, struct dentry *parent)
>   *          :   ├── tile0
>   *              :   ├── gt0
>   *                  :   ├── sched_groups_mode
> + *                      ├── sched_groups
> + *                      :   ├── group0
> + *                          :   └── engines
> + *                          :
> + *                          └── groupN
> + *          :                   └── engines
> + *          ├── vf1
> + *          :   ├── tile0
> + *              :   ├── gt0
> + *                  :   ├── sched_groups
> + *                      :   ├── group0
> + *                          :   └── engines
>   */
>  
> +struct sched_group_info {
> +	struct xe_gt *gt;

we should be able to get gt and vfid from parents d_inode->i_private

> +	u32 *masks;
> +};
> +
> +static int sched_group_engines_info(struct seq_file *m, void *data)
> +{
> +	struct drm_printer p = drm_seq_file_printer(m);
> +	struct sched_group_info *gi = m->private;
> +	struct xe_gt *gt = gi->gt;
> +	struct xe_hw_engine *hwe;
> +	enum xe_hw_engine_id id;
> +	bool first = true;
> +
> +	for_each_hw_engine(hwe, gt, id) {
> +		u8 guc_class = xe_engine_class_to_guc_class(hwe->class);
> +		u32 mask = gi->masks[guc_class];
> +
> +		if (mask & BIT(hwe->logical_instance)) {
> +			drm_printf(&p, "%s%s", first ? "" : " ", hwe->name);
> +
> +			first = false;
> +		}
> +	}
> +
> +	drm_printf(&p, "\n");
> +
> +	return 0;
> +}
> +
> +static int sched_group_engines_open(struct inode *inode, struct file *file)
> +{
> +	return single_open(file, sched_group_engines_info, inode->i_private);
> +}
> +
> +static const struct file_operations sched_group_engines_fops = {
> +	.owner = THIS_MODULE,
> +	.open = sched_group_engines_open,
> +	.read = seq_read,
> +	.llseek = seq_lseek,
> +	.release = single_release,
> +};
> +
> +static void __sched_group_info_cleanup(struct xe_gt *gt, struct dentry *dent)
> +{
> +	if (dent->d_inode->i_private)
> +		drmm_kfree(&gt_to_xe(gt)->drm, dent->d_inode->i_private);
> +
> +	debugfs_remove_recursive(dent);
> +}
> +
> +#define GROUP_INFO_ROOT "sched_groups"
> +static void sched_group_info_register(struct xe_gt *gt, unsigned int vfid)
> +{
> +	struct xe_gt_sriov_pf_policy *policy = &gt->sriov.pf.policy;
> +	u32 mode = policy->guc.sched_groups.current_mode;
> +	u8 num_groups = policy->guc.sched_groups.modes[mode].num_masks / GUC_MAX_ENGINE_CLASSES;
> +	struct sched_group_info *infos;
> +	struct dentry *parent;
> +	struct dentry *old;
> +	u8 g;
> +
> +	if (!gt->sriov.pf.debugfs_roots)
> +		return;
> +
> +	/* remove existing debugfs entries for old groups */
> +	old = debugfs_lookup(GROUP_INFO_ROOT, gt->sriov.pf.debugfs_roots[vfid]);
> +	if (old)
> +		__sched_group_info_cleanup(gt, old);
> +
> +	/* re-create debugfs for new groups (if any)*/
> +	if (!num_groups)
> +		return;
> +
> +	parent = debugfs_create_dir(GROUP_INFO_ROOT, gt->sriov.pf.debugfs_roots[vfid]);
> +	if (IS_ERR(parent))
> +		return;
> +
> +	infos = drmm_kzalloc(&gt_to_xe(gt)->drm, sizeof(*infos) * num_groups, GFP_KERNEL);
> +	if (!infos)
> +		goto out_err;
> +	parent->d_inode->i_private = infos;
> +
> +	for (g = 0; g < num_groups; g++) {
> +		struct sched_group_info *info = &infos[g];
> +		u32 base = g * GUC_MAX_ENGINE_CLASSES;
> +		struct dentry *dent;
> +		char name[10];
> +
> +		snprintf(name, sizeof(name), "group%u", g);
> +		dent = debugfs_create_dir(name, parent);
> +		if (IS_ERR(dent))
> +			goto out_err;
> +
> +		info->gt = gt;
> +		info->masks = &policy->guc.sched_groups.modes[mode].masks[base];
> +
> +		dent->d_inode->i_private = info;
> +		debugfs_create_file("engines", 0644, dent, info, &sched_group_engines_fops);
> +	}
> +
> +	return;
> +
> +out_err:
> +	__sched_group_info_cleanup(gt, parent);
> +}
> +
>  static const char *sched_group_mode_to_string(enum xe_sriov_sched_group_modes mode)
>  {
>  	switch (mode) {
> @@ -220,7 +341,7 @@ static ssize_t sched_groups_write(struct file *file, const char __user *ubuf,
>  	struct xe_gt *gt = extract_gt(file_inode(file)->i_private);
>  	char name[32];
>  	int ret;
> -	int m;
> +	int m, i;
>  
>  	if (*pos)
>  		return -ESPIPE;
> @@ -250,6 +371,10 @@ static ssize_t sched_groups_write(struct file *file, const char __user *ubuf,
>  	ret = xe_gt_sriov_pf_policy_set_sched_groups_mode(gt, m);
>  	xe_pm_runtime_put(gt_to_xe(gt));
>  
> +	if (!ret)
> +		for (i = 0; i <= xe_sriov_pf_get_totalvfs(gt_to_xe(gt)); i++)
> +			sched_group_info_register(gt, i);
> +
>  	return (ret < 0) ? ret : size;
>  }
>  
> @@ -693,6 +818,19 @@ void xe_gt_sriov_pf_debugfs_populate(struct xe_gt *gt, struct dentry *parent, un
>  		return;
>  	dent->d_inode->i_private = gt;
>  
> +	/*
> +	 * we allocate an array to store the GT-level dentries for PF and all
> +	 * VFs when creating the PF folder. Failing to create this allocation
> +	 * is not fatal.
> +	 */
> +	if (vfid == 0)
> +		gt->sriov.pf.debugfs_roots =
> +			drmm_kcalloc(&gt_to_xe(gt)->drm,
> +				     1 + xe_sriov_pf_get_totalvfs(gt_to_xe(gt)),
> +				     sizeof(struct dentry *), GFP_KERNEL);
> +	if (gt->sriov.pf.debugfs_roots)
> +		gt->sriov.pf.debugfs_roots[vfid] = dent;

for per-VF data we have struct xe_gt_sriov_metadata that can be accessed from
struct xe_gt_sriov_pf

but maybe it shouldn't be needed at all if we create static egs files as
suggested above


> +
>  	xe_gt_assert(gt, extract_gt(dent) == gt);
>  	xe_gt_assert(gt, extract_vfid(dent) == vfid);
>  
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h
> index 667b8310478d..747ec5dae652 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h
> @@ -15,6 +15,8 @@
>  #include "xe_gt_sriov_pf_policy_types.h"
>  #include "xe_gt_sriov_pf_service_types.h"
>  
> +struct dentry;
> +
>  /**
>   * struct xe_gt_sriov_metadata - GT level per-VF metadata.
>   */
> @@ -52,6 +54,7 @@ struct xe_gt_sriov_pf_workers {
>   * @migration: migration data.
>   * @spare: PF-only provisioning configuration.
>   * @vfs: metadata for all VFs.
> + * @debugfs_root: GT debugfs root for all VFs.
>   */
>  struct xe_gt_sriov_pf {
>  	struct xe_gt_sriov_pf_workers workers;
> @@ -60,6 +63,7 @@ struct xe_gt_sriov_pf {
>  	struct xe_gt_sriov_pf_policy policy;
>  	struct xe_gt_sriov_spare_config spare;
>  	struct xe_gt_sriov_metadata *vfs;
> +	struct dentry **debugfs_roots;
>  };
>  
>  #endif


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 06/10] drm/xe/sriov: Add debugfs with scheduler groups information
  2025-12-02 16:24   ` Michal Wajdeczko
@ 2025-12-02 18:20     ` Daniele Ceraolo Spurio
  2025-12-02 21:31       ` Michal Wajdeczko
  0 siblings, 1 reply; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-12-02 18:20 UTC (permalink / raw)
  To: Michal Wajdeczko, intel-xe



On 12/2/2025 8:24 AM, Michal Wajdeczko wrote:
>
> On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
>> When schedulers groups are enabled, we dynamically create new debugfs
>> folders under the GT folder for the PF and each VF. The aim is to have
> hmm, but since this is debugfs and groups (even uninitialized) still exist
> then maybe it is easier (and safer) to create static list of groups (8 or 2)
> and just print their configurations: either empty (if uninitialized) or with
> engine names (after selecting predefined EGS mode)

ok

>
> then debugfs files (not directories) will reflect currently used configuration

See below about the directories

>
>> all the info for each VF under the sched_groups folder, with individual
>> folders per-group under it. Right now the only info is the engine list,
>> but follow up patches will add execution quantum and preemption timeout
>> as well (which are configurable per-group-per-VF).
> but groups EQ/PT configuration is not per-group, but only per-GT (GuC)
> so it should be OK to have them level up

ok

>
>> Note that the engine lists are not configurable per-VF and therefore not
>> strictly needed in the VF folders. However, it is still useful to have
>> them there because an admin might want to check the engine list while
>> configuring the EQ/PT values for a specific VF.
> duplicating RO entries per-VF seems odd, maybe it's better to use symlinks:
>
>
> /sys/kernel/debug/dri/BDF/
> ├── sriov
> :   ├── pf
>      :   ├── tile0
>          :   ├── gt0
>              :   ├── sched_groups_mode			# disabled [media_slices]
>                  ├── sched_groups_exec_quantums_ms	# 20 20
>                  ├── sched_groups_preempt_timeouts_us	# 2000 2000
>                  ├── sched_groups
>                  :   ├── group0				# vcs0 vcs1 vecs0
>                  :   ├── group1				# vcs2 vcs3 vecs1
>                  :   ├── group2				#
>                      :
>                      └── group7
>      ├── vf1
>      :   ├── tile0
>          :   ├── gt0
>                  ├── sched_groups_exec_quantums_ms	# 30 30
>                  ├── sched_groups_preempt_timeouts_us	# 3000 3000
>                  ├── sched_groups --> ../../../pf/tile0/gt0/sched_groups
>
>      ├── vf2
>      :   ├── tile0
>          :   ├── gt0
>                  ├── sched_groups_exec_quantums_ms	# 40 40
>                  ├── sched_groups_preempt_timeouts_us	# 4000 4000
>                  ├── sched_groups --> ../../../pf/tile0/gt0/sched_groups

My idea here was to have exec quantum and preempt timeout configurable 
for a single group, so the end result would be something like this for 
PF and each VF:

	    :   ├── gt1
		:   ├── sched_groups
		    :   ├── group0
			:   ├── engines
			    ├── exec_quantum_ms
			    └── preempt_timeout_us
			:
			└── groupN
			    ├── engines
			    ├── exec_quantum_ms
			    └── preempt_timeout_us


Which only works if groups are folders (which is also why I can't 
symlink the whole sched_groups folder). If you think that's overkill and 
that we should just keep the option to set all the groups at once then I 
can do that and not have folders. I can also just drop the engine lists 
from VF debufs, my idea here was that if an admin is setting the EQ/PT 
e.g. for VF1 group 1 from within the VF debugfs folder it might be 
useful to have a way to check which engines are in the group without 
having to go back to the PF folder, but it's not really an essential thing.


>
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c | 140 +++++++++++++++++++-
>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h   |   4 +
>>   2 files changed, 143 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
>> index 2953ef21a5ad..947e2b92d58a 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
>> @@ -5,6 +5,7 @@
>>   
>>   #include <linux/debugfs.h>
>>   
>> +#include <drm/drm_managed.h>
>>   #include <drm/drm_print.h>
>>   #include <drm/drm_debugfs.h>
>>   
>> @@ -21,6 +22,7 @@
>>   #include "xe_gt_sriov_pf_monitor.h"
>>   #include "xe_gt_sriov_pf_policy.h"
>>   #include "xe_gt_sriov_pf_service.h"
>> +#include "xe_guc.h"
>>   #include "xe_pm.h"
>>   #include "xe_sriov_pf.h"
>>   #include "xe_sriov_pf_provision.h"
>> @@ -162,8 +164,127 @@ static void pf_add_policy_attrs(struct xe_gt *gt, struct dentry *parent)
>>    *          :   ├── tile0
>>    *              :   ├── gt0
>>    *                  :   ├── sched_groups_mode
>> + *                      ├── sched_groups
>> + *                      :   ├── group0
>> + *                          :   └── engines
>> + *                          :
>> + *                          └── groupN
>> + *          :                   └── engines
>> + *          ├── vf1
>> + *          :   ├── tile0
>> + *              :   ├── gt0
>> + *                  :   ├── sched_groups
>> + *                      :   ├── group0
>> + *                          :   └── engines
>>    */
>>   
>> +struct sched_group_info {
>> +	struct xe_gt *gt;
> we should be able to get gt and vfid from parents d_inode->i_private

In this implementation the parent of the engines folder is groupX, and 
that one doesn't have that info in d_inode->i_private. Might change if 
we drop the subfolders.

Daniele

>
>> +	u32 *masks;
>> +};
>> +
>> +static int sched_group_engines_info(struct seq_file *m, void *data)
>> +{
>> +	struct drm_printer p = drm_seq_file_printer(m);
>> +	struct sched_group_info *gi = m->private;
>> +	struct xe_gt *gt = gi->gt;
>> +	struct xe_hw_engine *hwe;
>> +	enum xe_hw_engine_id id;
>> +	bool first = true;
>> +
>> +	for_each_hw_engine(hwe, gt, id) {
>> +		u8 guc_class = xe_engine_class_to_guc_class(hwe->class);
>> +		u32 mask = gi->masks[guc_class];
>> +
>> +		if (mask & BIT(hwe->logical_instance)) {
>> +			drm_printf(&p, "%s%s", first ? "" : " ", hwe->name);
>> +
>> +			first = false;
>> +		}
>> +	}
>> +
>> +	drm_printf(&p, "\n");
>> +
>> +	return 0;
>> +}
>> +
>> +static int sched_group_engines_open(struct inode *inode, struct file *file)
>> +{
>> +	return single_open(file, sched_group_engines_info, inode->i_private);
>> +}
>> +
>> +static const struct file_operations sched_group_engines_fops = {
>> +	.owner = THIS_MODULE,
>> +	.open = sched_group_engines_open,
>> +	.read = seq_read,
>> +	.llseek = seq_lseek,
>> +	.release = single_release,
>> +};
>> +
>> +static void __sched_group_info_cleanup(struct xe_gt *gt, struct dentry *dent)
>> +{
>> +	if (dent->d_inode->i_private)
>> +		drmm_kfree(&gt_to_xe(gt)->drm, dent->d_inode->i_private);
>> +
>> +	debugfs_remove_recursive(dent);
>> +}
>> +
>> +#define GROUP_INFO_ROOT "sched_groups"
>> +static void sched_group_info_register(struct xe_gt *gt, unsigned int vfid)
>> +{
>> +	struct xe_gt_sriov_pf_policy *policy = &gt->sriov.pf.policy;
>> +	u32 mode = policy->guc.sched_groups.current_mode;
>> +	u8 num_groups = policy->guc.sched_groups.modes[mode].num_masks / GUC_MAX_ENGINE_CLASSES;
>> +	struct sched_group_info *infos;
>> +	struct dentry *parent;
>> +	struct dentry *old;
>> +	u8 g;
>> +
>> +	if (!gt->sriov.pf.debugfs_roots)
>> +		return;
>> +
>> +	/* remove existing debugfs entries for old groups */
>> +	old = debugfs_lookup(GROUP_INFO_ROOT, gt->sriov.pf.debugfs_roots[vfid]);
>> +	if (old)
>> +		__sched_group_info_cleanup(gt, old);
>> +
>> +	/* re-create debugfs for new groups (if any)*/
>> +	if (!num_groups)
>> +		return;
>> +
>> +	parent = debugfs_create_dir(GROUP_INFO_ROOT, gt->sriov.pf.debugfs_roots[vfid]);
>> +	if (IS_ERR(parent))
>> +		return;
>> +
>> +	infos = drmm_kzalloc(&gt_to_xe(gt)->drm, sizeof(*infos) * num_groups, GFP_KERNEL);
>> +	if (!infos)
>> +		goto out_err;
>> +	parent->d_inode->i_private = infos;
>> +
>> +	for (g = 0; g < num_groups; g++) {
>> +		struct sched_group_info *info = &infos[g];
>> +		u32 base = g * GUC_MAX_ENGINE_CLASSES;
>> +		struct dentry *dent;
>> +		char name[10];
>> +
>> +		snprintf(name, sizeof(name), "group%u", g);
>> +		dent = debugfs_create_dir(name, parent);
>> +		if (IS_ERR(dent))
>> +			goto out_err;
>> +
>> +		info->gt = gt;
>> +		info->masks = &policy->guc.sched_groups.modes[mode].masks[base];
>> +
>> +		dent->d_inode->i_private = info;
>> +		debugfs_create_file("engines", 0644, dent, info, &sched_group_engines_fops);
>> +	}
>> +
>> +	return;
>> +
>> +out_err:
>> +	__sched_group_info_cleanup(gt, parent);
>> +}
>> +
>>   static const char *sched_group_mode_to_string(enum xe_sriov_sched_group_modes mode)
>>   {
>>   	switch (mode) {
>> @@ -220,7 +341,7 @@ static ssize_t sched_groups_write(struct file *file, const char __user *ubuf,
>>   	struct xe_gt *gt = extract_gt(file_inode(file)->i_private);
>>   	char name[32];
>>   	int ret;
>> -	int m;
>> +	int m, i;
>>   
>>   	if (*pos)
>>   		return -ESPIPE;
>> @@ -250,6 +371,10 @@ static ssize_t sched_groups_write(struct file *file, const char __user *ubuf,
>>   	ret = xe_gt_sriov_pf_policy_set_sched_groups_mode(gt, m);
>>   	xe_pm_runtime_put(gt_to_xe(gt));
>>   
>> +	if (!ret)
>> +		for (i = 0; i <= xe_sriov_pf_get_totalvfs(gt_to_xe(gt)); i++)
>> +			sched_group_info_register(gt, i);
>> +
>>   	return (ret < 0) ? ret : size;
>>   }
>>   
>> @@ -693,6 +818,19 @@ void xe_gt_sriov_pf_debugfs_populate(struct xe_gt *gt, struct dentry *parent, un
>>   		return;
>>   	dent->d_inode->i_private = gt;
>>   
>> +	/*
>> +	 * we allocate an array to store the GT-level dentries for PF and all
>> +	 * VFs when creating the PF folder. Failing to create this allocation
>> +	 * is not fatal.
>> +	 */
>> +	if (vfid == 0)
>> +		gt->sriov.pf.debugfs_roots =
>> +			drmm_kcalloc(&gt_to_xe(gt)->drm,
>> +				     1 + xe_sriov_pf_get_totalvfs(gt_to_xe(gt)),
>> +				     sizeof(struct dentry *), GFP_KERNEL);
>> +	if (gt->sriov.pf.debugfs_roots)
>> +		gt->sriov.pf.debugfs_roots[vfid] = dent;
> for per-VF data we have struct xe_gt_sriov_metadata that can be accessed from
> struct xe_gt_sriov_pf
>
> but maybe it shouldn't be needed at all if we create static egs files as
> suggested above
>
>
>> +
>>   	xe_gt_assert(gt, extract_gt(dent) == gt);
>>   	xe_gt_assert(gt, extract_vfid(dent) == vfid);
>>   
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h
>> index 667b8310478d..747ec5dae652 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h
>> @@ -15,6 +15,8 @@
>>   #include "xe_gt_sriov_pf_policy_types.h"
>>   #include "xe_gt_sriov_pf_service_types.h"
>>   
>> +struct dentry;
>> +
>>   /**
>>    * struct xe_gt_sriov_metadata - GT level per-VF metadata.
>>    */
>> @@ -52,6 +54,7 @@ struct xe_gt_sriov_pf_workers {
>>    * @migration: migration data.
>>    * @spare: PF-only provisioning configuration.
>>    * @vfs: metadata for all VFs.
>> + * @debugfs_root: GT debugfs root for all VFs.
>>    */
>>   struct xe_gt_sriov_pf {
>>   	struct xe_gt_sriov_pf_workers workers;
>> @@ -60,6 +63,7 @@ struct xe_gt_sriov_pf {
>>   	struct xe_gt_sriov_pf_policy policy;
>>   	struct xe_gt_sriov_spare_config spare;
>>   	struct xe_gt_sriov_metadata *vfs;
>> +	struct dentry **debugfs_roots;
>>   };
>>   
>>   #endif


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 06/10] drm/xe/sriov: Add debugfs with scheduler groups information
  2025-12-02 18:20     ` Daniele Ceraolo Spurio
@ 2025-12-02 21:31       ` Michal Wajdeczko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Wajdeczko @ 2025-12-02 21:31 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-xe



On 12/2/2025 7:20 PM, Daniele Ceraolo Spurio wrote:
> 
> 
> On 12/2/2025 8:24 AM, Michal Wajdeczko wrote:
>>
>> On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
>>> When schedulers groups are enabled, we dynamically create new debugfs
>>> folders under the GT folder for the PF and each VF. The aim is to have
>> hmm, but since this is debugfs and groups (even uninitialized) still exist
>> then maybe it is easier (and safer) to create static list of groups (8 or 2)
>> and just print their configurations: either empty (if uninitialized) or with
>> engine names (after selecting predefined EGS mode)
> 
> ok
> 
>>
>> then debugfs files (not directories) will reflect currently used configuration
> 
> See below about the directories
> 
>>
>>> all the info for each VF under the sched_groups folder, with individual
>>> folders per-group under it. Right now the only info is the engine list,
>>> but follow up patches will add execution quantum and preemption timeout
>>> as well (which are configurable per-group-per-VF).
>> but groups EQ/PT configuration is not per-group, but only per-GT (GuC)
>> so it should be OK to have them level up
> 
> ok
> 
>>
>>> Note that the engine lists are not configurable per-VF and therefore not
>>> strictly needed in the VF folders. However, it is still useful to have
>>> them there because an admin might want to check the engine list while
>>> configuring the EQ/PT values for a specific VF.
>> duplicating RO entries per-VF seems odd, maybe it's better to use symlinks:
>>
>>
>> /sys/kernel/debug/dri/BDF/
>> ├── sriov
>> :   ├── pf
>>      :   ├── tile0
>>          :   ├── gt0
>>              :   ├── sched_groups_mode            # disabled [media_slices]
>>                  ├── sched_groups_exec_quantums_ms    # 20 20
>>                  ├── sched_groups_preempt_timeouts_us    # 2000 2000
>>                  ├── sched_groups
>>                  :   ├── group0                # vcs0 vcs1 vecs0
>>                  :   ├── group1                # vcs2 vcs3 vecs1
>>                  :   ├── group2                #
>>                      :
>>                      └── group7
>>      ├── vf1
>>      :   ├── tile0
>>          :   ├── gt0
>>                  ├── sched_groups_exec_quantums_ms    # 30 30
>>                  ├── sched_groups_preempt_timeouts_us    # 3000 3000
>>                  ├── sched_groups --> ../../../pf/tile0/gt0/sched_groups
>>
>>      ├── vf2
>>      :   ├── tile0
>>          :   ├── gt0
>>                  ├── sched_groups_exec_quantums_ms    # 40 40
>>                  ├── sched_groups_preempt_timeouts_us    # 4000 4000
>>                  ├── sched_groups --> ../../../pf/tile0/gt0/sched_groups
> 
> My idea here was to have exec quantum and preempt timeout configurable for a single group, so the end result would be something like this for PF and each VF:
> 
>         :   ├── gt1
>         :   ├── sched_groups
>             :   ├── group0
>             :   ├── engines
>                 ├── exec_quantum_ms
>                 └── preempt_timeout_us
>             :
>             └── groupN
>                 ├── engines
>                 ├── exec_quantum_ms
>                 └── preempt_timeout_us
> 
> 
> Which only works if groups are folders (which is also why I can't symlink the whole sched_groups folder). If you think that's overkill

yes, since GuC does not expose it, so why bother?

this whole EGS feature requires some experienced admin who should already know configs for all groups

> and that we should just keep the option to set all the groups at once then I can do that and not have folders. I can also just drop the engine lists from VF debufs, my idea here was that if an admin is setting the EQ/PT e.g. for VF1 group 1 from within the VF debugfs folder it might be useful to have a way to check which engines are in the group without having to go back to the PF folder, but it's not really an essential thing.

this useful hint how the groups look like can be done by symlink (which is also to some extend overkill, since it points to well know location)

my point is that we should expose only minimal set of files that are actually needed to exposes all required information

creating files with redundant info will just cause more test code to be developed ;)

> 
> 
>>
>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>> ---
>>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c | 140 +++++++++++++++++++-
>>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h   |   4 +
>>>   2 files changed, 143 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
>>> index 2953ef21a5ad..947e2b92d58a 100644
>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
>>> @@ -5,6 +5,7 @@
>>>     #include <linux/debugfs.h>
>>>   +#include <drm/drm_managed.h>
>>>   #include <drm/drm_print.h>
>>>   #include <drm/drm_debugfs.h>
>>>   @@ -21,6 +22,7 @@
>>>   #include "xe_gt_sriov_pf_monitor.h"
>>>   #include "xe_gt_sriov_pf_policy.h"
>>>   #include "xe_gt_sriov_pf_service.h"
>>> +#include "xe_guc.h"
>>>   #include "xe_pm.h"
>>>   #include "xe_sriov_pf.h"
>>>   #include "xe_sriov_pf_provision.h"
>>> @@ -162,8 +164,127 @@ static void pf_add_policy_attrs(struct xe_gt *gt, struct dentry *parent)
>>>    *          :   ├── tile0
>>>    *              :   ├── gt0
>>>    *                  :   ├── sched_groups_mode
>>> + *                      ├── sched_groups
>>> + *                      :   ├── group0
>>> + *                          :   └── engines
>>> + *                          :
>>> + *                          └── groupN
>>> + *          :                   └── engines
>>> + *          ├── vf1
>>> + *          :   ├── tile0
>>> + *              :   ├── gt0
>>> + *                  :   ├── sched_groups
>>> + *                      :   ├── group0
>>> + *                          :   └── engines
>>>    */
>>>   +struct sched_group_info {
>>> +    struct xe_gt *gt;
>> we should be able to get gt and vfid from parents d_inode->i_private
> 
> In this implementation the parent of the engines folder is groupX, and that one doesn't have that info in d_inode->i_private. Might change if we drop the subfolders.

but grand-parent or grand-grand parent will eventually have expected priv data (and since we know the structure we should know how deep to search)

> 
> Daniele
> 
>>
>>> +    u32 *masks;
>>> +};
>>> +
>>> +static int sched_group_engines_info(struct seq_file *m, void *data)
>>> +{
>>> +    struct drm_printer p = drm_seq_file_printer(m);
>>> +    struct sched_group_info *gi = m->private;
>>> +    struct xe_gt *gt = gi->gt;
>>> +    struct xe_hw_engine *hwe;
>>> +    enum xe_hw_engine_id id;
>>> +    bool first = true;
>>> +
>>> +    for_each_hw_engine(hwe, gt, id) {
>>> +        u8 guc_class = xe_engine_class_to_guc_class(hwe->class);
>>> +        u32 mask = gi->masks[guc_class];
>>> +
>>> +        if (mask & BIT(hwe->logical_instance)) {
>>> +            drm_printf(&p, "%s%s", first ? "" : " ", hwe->name);
>>> +
>>> +            first = false;
>>> +        }
>>> +    }
>>> +
>>> +    drm_printf(&p, "\n");
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int sched_group_engines_open(struct inode *inode, struct file *file)
>>> +{
>>> +    return single_open(file, sched_group_engines_info, inode->i_private);
>>> +}
>>> +
>>> +static const struct file_operations sched_group_engines_fops = {
>>> +    .owner = THIS_MODULE,
>>> +    .open = sched_group_engines_open,
>>> +    .read = seq_read,
>>> +    .llseek = seq_lseek,
>>> +    .release = single_release,
>>> +};
>>> +
>>> +static void __sched_group_info_cleanup(struct xe_gt *gt, struct dentry *dent)
>>> +{
>>> +    if (dent->d_inode->i_private)
>>> +        drmm_kfree(&gt_to_xe(gt)->drm, dent->d_inode->i_private);
>>> +
>>> +    debugfs_remove_recursive(dent);
>>> +}
>>> +
>>> +#define GROUP_INFO_ROOT "sched_groups"
>>> +static void sched_group_info_register(struct xe_gt *gt, unsigned int vfid)
>>> +{
>>> +    struct xe_gt_sriov_pf_policy *policy = &gt->sriov.pf.policy;
>>> +    u32 mode = policy->guc.sched_groups.current_mode;
>>> +    u8 num_groups = policy->guc.sched_groups.modes[mode].num_masks / GUC_MAX_ENGINE_CLASSES;
>>> +    struct sched_group_info *infos;
>>> +    struct dentry *parent;
>>> +    struct dentry *old;
>>> +    u8 g;
>>> +
>>> +    if (!gt->sriov.pf.debugfs_roots)
>>> +        return;
>>> +
>>> +    /* remove existing debugfs entries for old groups */
>>> +    old = debugfs_lookup(GROUP_INFO_ROOT, gt->sriov.pf.debugfs_roots[vfid]);
>>> +    if (old)
>>> +        __sched_group_info_cleanup(gt, old);
>>> +
>>> +    /* re-create debugfs for new groups (if any)*/
>>> +    if (!num_groups)
>>> +        return;
>>> +
>>> +    parent = debugfs_create_dir(GROUP_INFO_ROOT, gt->sriov.pf.debugfs_roots[vfid]);
>>> +    if (IS_ERR(parent))
>>> +        return;
>>> +
>>> +    infos = drmm_kzalloc(&gt_to_xe(gt)->drm, sizeof(*infos) * num_groups, GFP_KERNEL);
>>> +    if (!infos)
>>> +        goto out_err;
>>> +    parent->d_inode->i_private = infos;
>>> +
>>> +    for (g = 0; g < num_groups; g++) {
>>> +        struct sched_group_info *info = &infos[g];
>>> +        u32 base = g * GUC_MAX_ENGINE_CLASSES;
>>> +        struct dentry *dent;
>>> +        char name[10];
>>> +
>>> +        snprintf(name, sizeof(name), "group%u", g);
>>> +        dent = debugfs_create_dir(name, parent);
>>> +        if (IS_ERR(dent))
>>> +            goto out_err;
>>> +
>>> +        info->gt = gt;
>>> +        info->masks = &policy->guc.sched_groups.modes[mode].masks[base];
>>> +
>>> +        dent->d_inode->i_private = info;
>>> +        debugfs_create_file("engines", 0644, dent, info, &sched_group_engines_fops);
>>> +    }
>>> +
>>> +    return;
>>> +
>>> +out_err:
>>> +    __sched_group_info_cleanup(gt, parent);
>>> +}
>>> +
>>>   static const char *sched_group_mode_to_string(enum xe_sriov_sched_group_modes mode)
>>>   {
>>>       switch (mode) {
>>> @@ -220,7 +341,7 @@ static ssize_t sched_groups_write(struct file *file, const char __user *ubuf,
>>>       struct xe_gt *gt = extract_gt(file_inode(file)->i_private);
>>>       char name[32];
>>>       int ret;
>>> -    int m;
>>> +    int m, i;
>>>         if (*pos)
>>>           return -ESPIPE;
>>> @@ -250,6 +371,10 @@ static ssize_t sched_groups_write(struct file *file, const char __user *ubuf,
>>>       ret = xe_gt_sriov_pf_policy_set_sched_groups_mode(gt, m);
>>>       xe_pm_runtime_put(gt_to_xe(gt));
>>>   +    if (!ret)
>>> +        for (i = 0; i <= xe_sriov_pf_get_totalvfs(gt_to_xe(gt)); i++)
>>> +            sched_group_info_register(gt, i);
>>> +
>>>       return (ret < 0) ? ret : size;
>>>   }
>>>   @@ -693,6 +818,19 @@ void xe_gt_sriov_pf_debugfs_populate(struct xe_gt *gt, struct dentry *parent, un
>>>           return;
>>>       dent->d_inode->i_private = gt;
>>>   +    /*
>>> +     * we allocate an array to store the GT-level dentries for PF and all
>>> +     * VFs when creating the PF folder. Failing to create this allocation
>>> +     * is not fatal.
>>> +     */
>>> +    if (vfid == 0)
>>> +        gt->sriov.pf.debugfs_roots =
>>> +            drmm_kcalloc(&gt_to_xe(gt)->drm,
>>> +                     1 + xe_sriov_pf_get_totalvfs(gt_to_xe(gt)),
>>> +                     sizeof(struct dentry *), GFP_KERNEL);
>>> +    if (gt->sriov.pf.debugfs_roots)
>>> +        gt->sriov.pf.debugfs_roots[vfid] = dent;
>> for per-VF data we have struct xe_gt_sriov_metadata that can be accessed from
>> struct xe_gt_sriov_pf
>>
>> but maybe it shouldn't be needed at all if we create static egs files as
>> suggested above
>>
>>
>>> +
>>>       xe_gt_assert(gt, extract_gt(dent) == gt);
>>>       xe_gt_assert(gt, extract_vfid(dent) == vfid);
>>>   diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h
>>> index 667b8310478d..747ec5dae652 100644
>>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h
>>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_types.h
>>> @@ -15,6 +15,8 @@
>>>   #include "xe_gt_sriov_pf_policy_types.h"
>>>   #include "xe_gt_sriov_pf_service_types.h"
>>>   +struct dentry;
>>> +
>>>   /**
>>>    * struct xe_gt_sriov_metadata - GT level per-VF metadata.
>>>    */
>>> @@ -52,6 +54,7 @@ struct xe_gt_sriov_pf_workers {
>>>    * @migration: migration data.
>>>    * @spare: PF-only provisioning configuration.
>>>    * @vfs: metadata for all VFs.
>>> + * @debugfs_root: GT debugfs root for all VFs.
>>>    */
>>>   struct xe_gt_sriov_pf {
>>>       struct xe_gt_sriov_pf_workers workers;
>>> @@ -60,6 +63,7 @@ struct xe_gt_sriov_pf {
>>>       struct xe_gt_sriov_pf_policy policy;
>>>       struct xe_gt_sriov_spare_config spare;
>>>       struct xe_gt_sriov_metadata *vfs;
>>> +    struct dentry **debugfs_roots;
>>>   };
>>>     #endif
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 07/10] drm/xe/sriov: Prep for multiple exec quantums and preemption timeouts
  2025-11-27  1:45 [PATCH 00/10] Introduce SRIOV scheduler groups Daniele Ceraolo Spurio
                   ` (5 preceding siblings ...)
  2025-11-27  1:45 ` [PATCH 06/10] drm/xe/sriov: Add debugfs with scheduler groups information Daniele Ceraolo Spurio
@ 2025-11-27  1:45 ` Daniele Ceraolo Spurio
  2025-12-02 16:42   ` Michal Wajdeczko
  2025-11-27  1:45 ` [PATCH 08/10] drm/xe/sriov: Add functions to set exec quantums for each group Daniele Ceraolo Spurio
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-11-27  1:45 UTC (permalink / raw)
  To: intel-xe; +Cc: Daniele Ceraolo Spurio, Michal Wajdeczko

Each scheduler group can be independently configured with its own exec
quantum and preemption timeouts. The existing KLVs to configure those
parameter will apply the value to all groups (even if they're not
enabled at the moment).

When scheduler groups are disable the GuC used the values from Group 0.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/xe/abi/guc_klvs_abi.h         |  7 ++++--
 drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c    | 25 +++++++++++++------
 .../gpu/drm/xe/xe_gt_sriov_pf_config_types.h  |  5 ++--
 3 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
index a6dce9da339f..48f47e26132d 100644
--- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
+++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
@@ -290,7 +290,9 @@ enum  {
  *      infinitely long compute or shader kernel. In such a scenario, the
  *      PF would need to trigger a VM PAUSE and then change the KLV to force
  *      it to take effect. Such cases might typically happen on a 1PF+1VF
- *      Virtualization config enabled for heavier workloads like AI/ML.
+ *      Virtualization config enabled for heavier workloads like AI/ML. If
+ *      scheduling groups are supported, the provided value is applied to all
+ *      groups (even if they've not yet been enabled).
  *
  *      The max value for this KLV is 100 seconds, anything exceeding that
  *      will be clamped to the max.
@@ -312,7 +314,8 @@ enum  {
  *      In this case, the PF would need to trigger a VM PAUSE and then change
  *      the KLV to force it to take effect. Such cases might typically happen
  *      on a 1PF+1VF Virtualization config enabled for heavier workloads like
- *      AI/ML.
+ *      AI/ML. If scheduling groups are supported, the provided value is applied
+ *      to all groups (even if they've not yet been enabled).
  *
  *      The max value for this KLV is 100 seconds, anything exceeding that
  *      will be clamped to the max.
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
index 59c5c6b4d994..eb547fedb6da 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
@@ -298,10 +298,10 @@ static u32 encode_config(u32 *cfg, const struct xe_gt_sriov_config *config, bool
 	}
 
 	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_EXEC_QUANTUM);
-	cfg[n++] = config->exec_quantum;
+	cfg[n++] = config->exec_quantum[0];
 
 	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_PREEMPT_TIMEOUT);
-	cfg[n++] = config->preempt_timeout;
+	cfg[n++] = config->preempt_timeout[0];
 
 #define encode_threshold_config(TAG, ...) ({					\
 	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_THRESHOLD_##TAG);			\
@@ -1857,12 +1857,15 @@ static int pf_provision_exec_quantum(struct xe_gt *gt, unsigned int vfid,
 {
 	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
 	int err;
+	int i;
 
 	err = pf_push_vf_cfg_exec_quantum(gt, vfid, &exec_quantum);
 	if (unlikely(err))
 		return err;
 
-	config->exec_quantum = exec_quantum;
+	for (i = 0; i < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT; i++)
+		config->exec_quantum[i] = exec_quantum;
+
 	return 0;
 }
 
@@ -1870,7 +1873,7 @@ static u32 pf_get_exec_quantum(struct xe_gt *gt, unsigned int vfid)
 {
 	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
 
-	return config->exec_quantum;
+	return config->exec_quantum[0];
 }
 
 /**
@@ -1987,12 +1990,14 @@ static int pf_provision_preempt_timeout(struct xe_gt *gt, unsigned int vfid,
 {
 	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
 	int err;
+	int i;
 
 	err = pf_push_vf_cfg_preempt_timeout(gt, vfid, &preempt_timeout);
 	if (unlikely(err))
 		return err;
 
-	config->preempt_timeout = preempt_timeout;
+	for (i = 0; i < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT; i++)
+		config->preempt_timeout[i] = preempt_timeout;
 
 	return 0;
 }
@@ -2001,7 +2006,7 @@ static u32 pf_get_preempt_timeout(struct xe_gt *gt, unsigned int vfid)
 {
 	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
 
-	return config->preempt_timeout;
+	return config->preempt_timeout[0];
 }
 
 /**
@@ -2180,10 +2185,14 @@ u32 xe_gt_sriov_pf_config_get_sched_priority(struct xe_gt *gt, unsigned int vfid
 
 static void pf_reset_config_sched(struct xe_gt *gt, struct xe_gt_sriov_config *config)
 {
+	int i;
+
 	lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
 
-	config->exec_quantum = 0;
-	config->preempt_timeout = 0;
+	for (i = 0; i < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT; i++) {
+		config->exec_quantum[i] = 0;
+		config->preempt_timeout[i] = 0;
+	}
 }
 
 static int pf_provision_threshold(struct xe_gt *gt, unsigned int vfid,
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config_types.h
index 686c7b3b6d7a..abf003946242 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config_types.h
@@ -6,6 +6,7 @@
 #ifndef _XE_GT_SRIOV_PF_CONFIG_TYPES_H_
 #define _XE_GT_SRIOV_PF_CONFIG_TYPES_H_
 
+#include "abi/guc_klvs_abi.h"
 #include "xe_ggtt_types.h"
 #include "xe_guc_klv_thresholds_set_types.h"
 
@@ -30,9 +31,9 @@ struct xe_gt_sriov_config {
 	/** @begin_db: start index of GuC doorbell ID range. */
 	u16 begin_db;
 	/** @exec_quantum: execution-quantum in milliseconds. */
-	u32 exec_quantum;
+	u32 exec_quantum[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT];
 	/** @preempt_timeout: preemption timeout in microseconds. */
-	u32 preempt_timeout;
+	u32 preempt_timeout[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT];
 	/** @sched_priority: scheduling priority. */
 	u32 sched_priority;
 	/** @thresholds: GuC thresholds for adverse events notifications. */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 07/10] drm/xe/sriov: Prep for multiple exec quantums and preemption timeouts
  2025-11-27  1:45 ` [PATCH 07/10] drm/xe/sriov: Prep for multiple exec quantums and preemption timeouts Daniele Ceraolo Spurio
@ 2025-12-02 16:42   ` Michal Wajdeczko
  2025-12-06  1:55     ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 44+ messages in thread
From: Michal Wajdeczko @ 2025-12-02 16:42 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-xe



On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
> Each scheduler group can be independently configured with its own exec
> quantum and preemption timeouts. The existing KLVs to configure those
> parameter will apply the value to all groups (even if they're not

typo: parameters

> enabled at the moment).
> 
> When scheduler groups are disable the GuC used the values from Group 0.

typo: ... disabled then GuC uses ... ?

> 
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> ---
>  drivers/gpu/drm/xe/abi/guc_klvs_abi.h         |  7 ++++--
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c    | 25 +++++++++++++------
>  .../gpu/drm/xe/xe_gt_sriov_pf_config_types.h  |  5 ++--
>  3 files changed, 25 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
> index a6dce9da339f..48f47e26132d 100644
> --- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
> +++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
> @@ -290,7 +290,9 @@ enum  {
>   *      infinitely long compute or shader kernel. In such a scenario, the
>   *      PF would need to trigger a VM PAUSE and then change the KLV to force
>   *      it to take effect. Such cases might typically happen on a 1PF+1VF
> - *      Virtualization config enabled for heavier workloads like AI/ML.
> + *      Virtualization config enabled for heavier workloads like AI/ML. If

start with new empty line so it will rendered as separate paragraph

> + *      scheduling groups are supported, the provided value is applied to all
> + *      groups (even if they've not yet been enabled).

should we document also here that "The scheduling groups are available starting from GuC 70.53" ?

>   *
>   *      The max value for this KLV is 100 seconds, anything exceeding that
>   *      will be clamped to the max.
> @@ -312,7 +314,8 @@ enum  {
>   *      In this case, the PF would need to trigger a VM PAUSE and then change
>   *      the KLV to force it to take effect. Such cases might typically happen
>   *      on a 1PF+1VF Virtualization config enabled for heavier workloads like
> - *      AI/ML.
> + *      AI/ML. If scheduling groups are supported, the provided value is applied
> + *      to all groups (even if they've not yet been enabled).

ditto

>   *
>   *      The max value for this KLV is 100 seconds, anything exceeding that
>   *      will be clamped to the max.
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> index 59c5c6b4d994..eb547fedb6da 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> @@ -298,10 +298,10 @@ static u32 encode_config(u32 *cfg, const struct xe_gt_sriov_config *config, bool
>  	}
>  
>  	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_EXEC_QUANTUM);
> -	cfg[n++] = config->exec_quantum;
> +	cfg[n++] = config->exec_quantum[0];
>  
>  	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_PREEMPT_TIMEOUT);
> -	cfg[n++] = config->preempt_timeout;
> +	cfg[n++] = config->preempt_timeout[0];
>  
>  #define encode_threshold_config(TAG, ...) ({					\
>  	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_THRESHOLD_##TAG);			\
> @@ -1857,12 +1857,15 @@ static int pf_provision_exec_quantum(struct xe_gt *gt, unsigned int vfid,
>  {
>  	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
>  	int err;
> +	int i;
>  
>  	err = pf_push_vf_cfg_exec_quantum(gt, vfid, &exec_quantum);
>  	if (unlikely(err))
>  		return err;
>  
> -	config->exec_quantum = exec_quantum;
> +	for (i = 0; i < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT; i++)

maybe:

	for (i = 0; i < ARRAY_SIZE(config->exec_quantum); i++)

> +		config->exec_quantum[i] = exec_quantum;
> +
>  	return 0;
>  }
>  
> @@ -1870,7 +1873,7 @@ static u32 pf_get_exec_quantum(struct xe_gt *gt, unsigned int vfid)
>  {
>  	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
>  
> -	return config->exec_quantum;
> +	return config->exec_quantum[0];
>  }
>  
>  /**
> @@ -1987,12 +1990,14 @@ static int pf_provision_preempt_timeout(struct xe_gt *gt, unsigned int vfid,
>  {
>  	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
>  	int err;
> +	int i;
>  
>  	err = pf_push_vf_cfg_preempt_timeout(gt, vfid, &preempt_timeout);
>  	if (unlikely(err))
>  		return err;
>  
> -	config->preempt_timeout = preempt_timeout;
> +	for (i = 0; i < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT; i++)

ditto

> +		config->preempt_timeout[i] = preempt_timeout;
>  
>  	return 0;
>  }
> @@ -2001,7 +2006,7 @@ static u32 pf_get_preempt_timeout(struct xe_gt *gt, unsigned int vfid)
>  {
>  	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
>  
> -	return config->preempt_timeout;
> +	return config->preempt_timeout[0];
>  }
>  
>  /**
> @@ -2180,10 +2185,14 @@ u32 xe_gt_sriov_pf_config_get_sched_priority(struct xe_gt *gt, unsigned int vfid
>  
>  static void pf_reset_config_sched(struct xe_gt *gt, struct xe_gt_sriov_config *config)
>  {
> +	int i;
> +
>  	lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
>  
> -	config->exec_quantum = 0;
> -	config->preempt_timeout = 0;
> +	for (i = 0; i < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT; i++) {
> +		config->exec_quantum[i] = 0;
> +		config->preempt_timeout[i] = 0;
> +	}
>  }
>  
>  static int pf_provision_threshold(struct xe_gt *gt, unsigned int vfid,
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config_types.h
> index 686c7b3b6d7a..abf003946242 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config_types.h
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config_types.h
> @@ -6,6 +6,7 @@
>  #ifndef _XE_GT_SRIOV_PF_CONFIG_TYPES_H_
>  #define _XE_GT_SRIOV_PF_CONFIG_TYPES_H_
>  
> +#include "abi/guc_klvs_abi.h"
>  #include "xe_ggtt_types.h"
>  #include "xe_guc_klv_thresholds_set_types.h"
>  
> @@ -30,9 +31,9 @@ struct xe_gt_sriov_config {
>  	/** @begin_db: start index of GuC doorbell ID range. */
>  	u16 begin_db;
>  	/** @exec_quantum: execution-quantum in milliseconds. */
> -	u32 exec_quantum;
> +	u32 exec_quantum[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT];
>  	/** @preempt_timeout: preemption timeout in microseconds. */
> -	u32 preempt_timeout;
> +	u32 preempt_timeout[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT];

maybe we should rename

	GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT
to
	GUC_MAX_ENGINE_GROUPS

and make it independent from KLVs and define like we have

	GUC_MAX_ENGINE_CLASSES


>  	/** @sched_priority: scheduling priority. */
>  	u32 sched_priority;
>  	/** @thresholds: GuC thresholds for adverse events notifications. */


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 07/10] drm/xe/sriov: Prep for multiple exec quantums and preemption timeouts
  2025-12-02 16:42   ` Michal Wajdeczko
@ 2025-12-06  1:55     ` Daniele Ceraolo Spurio
  0 siblings, 0 replies; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-12-06  1:55 UTC (permalink / raw)
  To: Michal Wajdeczko, intel-xe



On 12/2/2025 8:42 AM, Michal Wajdeczko wrote:
>
> On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
>> Each scheduler group can be independently configured with its own exec
>> quantum and preemption timeouts. The existing KLVs to configure those
>> parameter will apply the value to all groups (even if they're not
> typo: parameters
>
>> enabled at the moment).
>>
>> When scheduler groups are disable the GuC used the values from Group 0.
> typo: ... disabled then GuC uses ... ?
>
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> ---
>>   drivers/gpu/drm/xe/abi/guc_klvs_abi.h         |  7 ++++--
>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c    | 25 +++++++++++++------
>>   .../gpu/drm/xe/xe_gt_sriov_pf_config_types.h  |  5 ++--
>>   3 files changed, 25 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>> index a6dce9da339f..48f47e26132d 100644
>> --- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>> +++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>> @@ -290,7 +290,9 @@ enum  {
>>    *      infinitely long compute or shader kernel. In such a scenario, the
>>    *      PF would need to trigger a VM PAUSE and then change the KLV to force
>>    *      it to take effect. Such cases might typically happen on a 1PF+1VF
>> - *      Virtualization config enabled for heavier workloads like AI/ML.
>> + *      Virtualization config enabled for heavier workloads like AI/ML. If
> start with new empty line so it will rendered as separate paragraph
>
>> + *      scheduling groups are supported, the provided value is applied to all
>> + *      groups (even if they've not yet been enabled).
> should we document also here that "The scheduling groups are available starting from GuC 70.53" ?

ok

>
>>    *
>>    *      The max value for this KLV is 100 seconds, anything exceeding that
>>    *      will be clamped to the max.
>> @@ -312,7 +314,8 @@ enum  {
>>    *      In this case, the PF would need to trigger a VM PAUSE and then change
>>    *      the KLV to force it to take effect. Such cases might typically happen
>>    *      on a 1PF+1VF Virtualization config enabled for heavier workloads like
>> - *      AI/ML.
>> + *      AI/ML. If scheduling groups are supported, the provided value is applied
>> + *      to all groups (even if they've not yet been enabled).
> ditto
>
>>    *
>>    *      The max value for this KLV is 100 seconds, anything exceeding that
>>    *      will be clamped to the max.
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
>> index 59c5c6b4d994..eb547fedb6da 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
>> @@ -298,10 +298,10 @@ static u32 encode_config(u32 *cfg, const struct xe_gt_sriov_config *config, bool
>>   	}
>>   
>>   	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_EXEC_QUANTUM);
>> -	cfg[n++] = config->exec_quantum;
>> +	cfg[n++] = config->exec_quantum[0];
>>   
>>   	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_PREEMPT_TIMEOUT);
>> -	cfg[n++] = config->preempt_timeout;
>> +	cfg[n++] = config->preempt_timeout[0];
>>   
>>   #define encode_threshold_config(TAG, ...) ({					\
>>   	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_THRESHOLD_##TAG);			\
>> @@ -1857,12 +1857,15 @@ static int pf_provision_exec_quantum(struct xe_gt *gt, unsigned int vfid,
>>   {
>>   	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
>>   	int err;
>> +	int i;
>>   
>>   	err = pf_push_vf_cfg_exec_quantum(gt, vfid, &exec_quantum);
>>   	if (unlikely(err))
>>   		return err;
>>   
>> -	config->exec_quantum = exec_quantum;
>> +	for (i = 0; i < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT; i++)
> maybe:
>
> 	for (i = 0; i < ARRAY_SIZE(config->exec_quantum); i++)

ok

>
>> +		config->exec_quantum[i] = exec_quantum;
>> +
>>   	return 0;
>>   }
>>   
>> @@ -1870,7 +1873,7 @@ static u32 pf_get_exec_quantum(struct xe_gt *gt, unsigned int vfid)
>>   {
>>   	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
>>   
>> -	return config->exec_quantum;
>> +	return config->exec_quantum[0];
>>   }
>>   
>>   /**
>> @@ -1987,12 +1990,14 @@ static int pf_provision_preempt_timeout(struct xe_gt *gt, unsigned int vfid,
>>   {
>>   	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
>>   	int err;
>> +	int i;
>>   
>>   	err = pf_push_vf_cfg_preempt_timeout(gt, vfid, &preempt_timeout);
>>   	if (unlikely(err))
>>   		return err;
>>   
>> -	config->preempt_timeout = preempt_timeout;
>> +	for (i = 0; i < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT; i++)
> ditto
>
>> +		config->preempt_timeout[i] = preempt_timeout;
>>   
>>   	return 0;
>>   }
>> @@ -2001,7 +2006,7 @@ static u32 pf_get_preempt_timeout(struct xe_gt *gt, unsigned int vfid)
>>   {
>>   	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
>>   
>> -	return config->preempt_timeout;
>> +	return config->preempt_timeout[0];
>>   }
>>   
>>   /**
>> @@ -2180,10 +2185,14 @@ u32 xe_gt_sriov_pf_config_get_sched_priority(struct xe_gt *gt, unsigned int vfid
>>   
>>   static void pf_reset_config_sched(struct xe_gt *gt, struct xe_gt_sriov_config *config)
>>   {
>> +	int i;
>> +
>>   	lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
>>   
>> -	config->exec_quantum = 0;
>> -	config->preempt_timeout = 0;
>> +	for (i = 0; i < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT; i++) {
>> +		config->exec_quantum[i] = 0;
>> +		config->preempt_timeout[i] = 0;
>> +	}
>>   }
>>   
>>   static int pf_provision_threshold(struct xe_gt *gt, unsigned int vfid,
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config_types.h
>> index 686c7b3b6d7a..abf003946242 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config_types.h
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config_types.h
>> @@ -6,6 +6,7 @@
>>   #ifndef _XE_GT_SRIOV_PF_CONFIG_TYPES_H_
>>   #define _XE_GT_SRIOV_PF_CONFIG_TYPES_H_
>>   
>> +#include "abi/guc_klvs_abi.h"
>>   #include "xe_ggtt_types.h"
>>   #include "xe_guc_klv_thresholds_set_types.h"
>>   
>> @@ -30,9 +31,9 @@ struct xe_gt_sriov_config {
>>   	/** @begin_db: start index of GuC doorbell ID range. */
>>   	u16 begin_db;
>>   	/** @exec_quantum: execution-quantum in milliseconds. */
>> -	u32 exec_quantum;
>> +	u32 exec_quantum[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT];
>>   	/** @preempt_timeout: preemption timeout in microseconds. */
>> -	u32 preempt_timeout;
>> +	u32 preempt_timeout[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT];
> maybe we should rename
>
> 	GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT
> to
> 	GUC_MAX_ENGINE_GROUPS
>
> and make it independent from KLVs and define like we have
>
> 	GUC_MAX_ENGINE_CLASSES

ok, I'll do it in an earlier patch.

Daniele

>
>
>>   	/** @sched_priority: scheduling priority. */
>>   	u32 sched_priority;
>>   	/** @thresholds: GuC thresholds for adverse events notifications. */


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 08/10] drm/xe/sriov: Add functions to set exec quantums for each group
  2025-11-27  1:45 [PATCH 00/10] Introduce SRIOV scheduler groups Daniele Ceraolo Spurio
                   ` (6 preceding siblings ...)
  2025-11-27  1:45 ` [PATCH 07/10] drm/xe/sriov: Prep for multiple exec quantums and preemption timeouts Daniele Ceraolo Spurio
@ 2025-11-27  1:45 ` Daniele Ceraolo Spurio
  2025-12-02 19:54   ` Michal Wajdeczko
  2025-11-27  1:45 ` [PATCH 09/10] drm/xe/sriov: Add functions to set preempt timeouts " Daniele Ceraolo Spurio
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-11-27  1:45 UTC (permalink / raw)
  To: intel-xe; +Cc: Daniele Ceraolo Spurio, Michal Wajdeczko

The GuC has a new dedicated KLV to set the EQs for the groups. The GuC
always sets the EQs for all the groups (even the ones not enabled). If
we provide fewer values than the max number of grops (8), the GuC will
set the remaining ones to 0.

Based on this, we offer 2 ways of setting the EQs:

1) provide a list of EQs, which is passed straight to the GuC. This will
   cause the GuC to use zero for any missing value as mentioned above
2) provide a single EQ for a specific group. In this case we send all 8
   EQs to the GuC, using the current values for the groups which are not
   being updated.

Note that the new KLV can be used even when groups are disabled (as the
GuC always consider group0 to be active), so we can use it when encoding
the SRIOV config.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/xe/abi/guc_klvs_abi.h      |  12 +
 drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c | 244 +++++++++++++++++++--
 drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h |   8 +
 drivers/gpu/drm/xe/xe_sriov.c              |  18 ++
 drivers/gpu/drm/xe/xe_sriov.h              |   1 +
 5 files changed, 266 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
index 48f47e26132d..a0763cc15518 100644
--- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
+++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
@@ -383,6 +383,16 @@ enum  {
  * _`GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT` : 0x8A0D
  *      This config sets the threshold for LRCA context registration when SRIOV
  *      scheduler groups are enabled.
+ *
+ * _`GUC_KLV_VF_CFG_ENGINE_GROUP_EXEC_QUANTUM' : 0x8A0E
+ *      This config sets the VFs-execution-quantum for each scheduling group in
+ *      milliseconds. The driver must provide an array of values, with each of
+ *      them matching the respective group index (first value goes to group 0,
+ *      second to group 1, etc). The setting of group values follows the same
+ *      behavior and rules as setting via GUC_KLV_VF_CFG_EXEC_QUANTUM. Note that
+ *      the GuC always sets the EQ for all groups (even the non-enabled ones),
+ *      so if we provide fewer values than the max the GuC will use 0 for the
+ *      remaining groups.
  */
 
 #define GUC_KLV_VF_CFG_GGTT_START_KEY		0x0001
@@ -444,6 +454,8 @@ enum  {
 #define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_KEY	0x8a0d
 #define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_LEN	1u
 
+#define GUC_KLV_VF_CFG_ENGINE_GROUP_EXEC_QUANTUM_KEY	0x8a0e
+
 /*
  * Workaround keys:
  */
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
index eb547fedb6da..1bfb25bda432 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
@@ -195,6 +195,22 @@ static int pf_push_vf_cfg_dbs(struct xe_gt *gt, unsigned int vfid, u32 begin, u3
 	return pf_push_vf_cfg_klvs(gt, vfid, 2, klvs, ARRAY_SIZE(klvs));
 }
 
+static int pf_push_vf_grp_cfg_u32(struct xe_gt *gt, unsigned int vfid,
+				  u16 key, const u32 *values, u32 count)
+{
+	u32 klv[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT + 1];
+
+	if (!count)
+		return -ENODATA;
+	if (count > GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT)
+		return -E2BIG;
+
+	klv[0] = FIELD_PREP(GUC_KLV_0_KEY, key) | FIELD_PREP(GUC_KLV_0_LEN, count);
+	memcpy(&klv[1], values, count * sizeof(u32));
+
+	return pf_push_vf_cfg_klvs(gt, vfid, 1, klv, count + 1);
+}
+
 static int pf_push_vf_cfg_exec_quantum(struct xe_gt *gt, unsigned int vfid, u32 *exec_quantum)
 {
 	/* GuC will silently clamp values exceeding max */
@@ -269,9 +285,11 @@ static u32 encode_config_ggtt(u32 *cfg, const struct xe_gt_sriov_config *config,
 }
 
 /* Return: number of configuration dwords written */
-static u32 encode_config(u32 *cfg, const struct xe_gt_sriov_config *config, bool details)
+static u32 encode_config(struct xe_gt *gt, u32 *cfg,
+			 const struct xe_gt_sriov_config *config, bool details)
 {
 	u32 n = 0;
+	int i;
 
 	n += encode_config_ggtt(cfg, config, details);
 
@@ -297,8 +315,15 @@ static u32 encode_config(u32 *cfg, const struct xe_gt_sriov_config *config, bool
 		cfg[n++] = upper_32_bits(xe_bo_size(config->lmem_obj));
 	}
 
-	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_EXEC_QUANTUM);
-	cfg[n++] = config->exec_quantum[0];
+	if (xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt)) {
+		cfg[n++] = PREP_GUC_KLV_CONST(GUC_KLV_VF_CFG_ENGINE_GROUP_EXEC_QUANTUM_KEY,
+					      GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT);
+		for (i = 0; i < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT; i++)
+			cfg[n++] = config->exec_quantum[i];
+	} else {
+		cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_EXEC_QUANTUM);
+		cfg[n++] = config->exec_quantum[0];
+	}
 
 	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_PREEMPT_TIMEOUT);
 	cfg[n++] = config->preempt_timeout[0];
@@ -328,7 +353,7 @@ static int pf_push_full_vf_config(struct xe_gt *gt, unsigned int vfid)
 		return -ENOBUFS;
 
 	cfg = xe_guc_buf_cpu_ptr(buf);
-	num_dwords = encode_config(cfg, config, true);
+	num_dwords = encode_config(gt, cfg, config, true);
 	xe_gt_assert(gt, num_dwords <= max_cfg_dwords);
 
 	if (xe_gt_is_media_type(gt)) {
@@ -952,6 +977,21 @@ static const char *spare_unit(u32 unused)
 	return " spare";
 }
 
+static void __set_u32_done(struct xe_gt *gt, const char *name, u32 value, u32 actual,
+			   const char *what, const char *(*unit)(u32), int err)
+{
+	if (unlikely(err)) {
+		xe_gt_sriov_notice(gt, "Failed to provision %s with %u%s %s (%pe)\n",
+				   name, value, unit(value), what, ERR_PTR(err));
+		xe_gt_sriov_info(gt, "%s provisioning remains at %u%s %s\n",
+				 name, actual, unit(actual), what);
+	} else {
+		/* the actual value may have changed during provisioning */
+		xe_gt_sriov_info(gt, "%s provisioned with %u%s %s\n",
+				 name, actual, unit(actual), what);
+	}
+}
+
 static int pf_config_set_u32_done(struct xe_gt *gt, unsigned int vfid, u32 value, u32 actual,
 				  const char *what, const char *(*unit)(u32), int err)
 {
@@ -959,18 +999,47 @@ static int pf_config_set_u32_done(struct xe_gt *gt, unsigned int vfid, u32 value
 
 	xe_sriov_function_name(vfid, name, sizeof(name));
 
-	if (unlikely(err)) {
-		xe_gt_sriov_notice(gt, "Failed to provision %s with %u%s %s (%pe)\n",
-				   name, value, unit(value), what, ERR_PTR(err));
-		xe_gt_sriov_info(gt, "%s provisioning remains at %u%s %s\n",
-				 name, actual, unit(actual), what);
-		return err;
+	__set_u32_done(gt, name, value, actual, what, unit, err);
+
+	return err;
+}
+
+static int pf_group_config_set_u32_done(struct xe_gt *gt, unsigned int vfid, u8 group,
+					u32 value, u32 actual, const char *what,
+					const char *(*unit)(u32), int err)
+{
+	char name[24];
+
+	xe_sriov_function_and_group_name(vfid, group, name, sizeof(name));
+
+	__set_u32_done(gt, name, value, actual, what, unit, err);
+
+	return err;
+}
+
+static int
+pf_groups_cfg_set_u32_array_done(struct xe_gt *gt, unsigned int vfid,
+				 u32 *values, u32 count,
+				 void (*get_actual)(struct xe_gt *, unsigned int, u32 *, u32),
+				 const char *what, const char *(*unit)(u32), int err)
+{
+	u32 actual[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT];
+	char name[24];
+	u8 g;
+
+	get_actual(gt, vfid, actual, count);
+
+	for (g = 0; g < count; g++) {
+		xe_sriov_function_and_group_name(vfid, g, name, sizeof(name));
+
+		__set_u32_done(gt, name, values[g], actual[g], what, unit, err);
 	}
 
-	/* the actual value may have changed during provisioning */
-	xe_gt_sriov_info(gt, "%s provisioned with %u%s %s\n",
-			 name, actual, unit(actual), what);
-	return 0;
+	if (!err && count < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT)
+		xe_gt_sriov_info(gt, "All remaining groups provisioned with 0%s %s\n",
+				 unit(0), what);
+
+	return err;
 }
 
 /**
@@ -1869,11 +1938,16 @@ static int pf_provision_exec_quantum(struct xe_gt *gt, unsigned int vfid,
 	return 0;
 }
 
-static u32 pf_get_exec_quantum(struct xe_gt *gt, unsigned int vfid)
+static u32 pf_get_group_exec_quantum(struct xe_gt *gt, unsigned int vfid, u8 group)
 {
 	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
 
-	return config->exec_quantum[0];
+	return config->exec_quantum[group];
+}
+
+static u32 pf_get_exec_quantum(struct xe_gt *gt, unsigned int vfid)
+{
+	return pf_get_group_exec_quantum(gt, vfid, 0);
 }
 
 /**
@@ -1980,6 +2054,137 @@ int xe_gt_sriov_pf_config_bulk_set_exec_quantum_locked(struct xe_gt *gt, u32 exe
 					   exec_quantum_unit, n, err);
 }
 
+static int pf_provision_groups_exec_quantums(struct xe_gt *gt, unsigned int vfid,
+					     const u32 *exec_quantums, u32 count)
+{
+	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
+	int err;
+	int i;
+
+	err = pf_push_vf_grp_cfg_u32(gt, vfid, GUC_KLV_VF_CFG_ENGINE_GROUP_EXEC_QUANTUM_KEY,
+				     exec_quantums, count);
+	if (unlikely(err))
+		return err;
+
+	/*
+	 * GuC silently clamps values exceeding the max and zeroes out the
+	 * quantum for groups not in the array
+	 */
+	for (i = 0; i < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT; i++) {
+		if (i < count)
+			config->exec_quantum[i] = min_t(u32, exec_quantums[i],
+							GUC_KLV_VF_CFG_EXEC_QUANTUM_MAX_VALUE);
+		else
+			config->exec_quantum[i] = 0;
+	}
+
+	return 0;
+}
+
+static void pf_get_groups_exec_quantums(struct xe_gt *gt, unsigned int vfid,
+					u32 *exec_quantums, u32 max_count)
+{
+	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
+	u32 count = min_t(u32, max_count, GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT);
+
+	memcpy(exec_quantums, config->exec_quantum, sizeof(u32) * count);
+}
+
+/**
+ * xe_gt_sriov_pf_config_set_groups_exec_quantums() - Configure PF/VF EQs for sched groups.
+ * @gt: the &xe_gt
+ * @vfid: the PF or VF identifier
+ * @exec_quantums: array of requested EQs in milliseconds (0 is infinity)
+ * @count: number of entries in the array
+ *
+ * This function can only be called on PF.
+ * It will log the provisioned value or an error in case of the failure.
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int xe_gt_sriov_pf_config_set_groups_exec_quantums(struct xe_gt *gt, unsigned int vfid,
+						   u32 *exec_quantums, u32 count)
+{
+	int err;
+
+	guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));
+
+	err = pf_provision_groups_exec_quantums(gt, vfid, exec_quantums, count);
+
+	return pf_groups_cfg_set_u32_array_done(gt, vfid, exec_quantums, count,
+						pf_get_groups_exec_quantums,
+						"execution quantum",
+						exec_quantum_unit, err);
+}
+
+/**
+ * xe_gt_sriov_pf_config_get_groups_exec_quantums - Get PF/VF sched groups EQs
+ * @gt: the &xe_gt
+ * @vfid: the PF or VF identifier
+ * @exec_quantums: array in which to store the execution quantums values
+ * @max_count: maximum number of entries to store
+ *
+ * This function can only be called on PF.
+ */
+void xe_gt_sriov_pf_config_get_groups_exec_quantums(struct xe_gt *gt, unsigned int vfid,
+						    u32 *exec_quantums, u32 max_count)
+{
+	guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));
+
+	return pf_get_groups_exec_quantums(gt, vfid, exec_quantums, max_count);
+}
+
+/**
+ * xe_gt_sriov_pf_config_set_group_exec_quantum - Configure PF/VF EQs for a sched group.
+ * @gt: the &xe_gt
+ * @vfid: the PF or VF identifier
+ * @group: index of the group to configure
+ * @exec_quantum: requested EQs in milliseconds (0 is infinity)
+ *
+ * This function can only be called on PF.
+ * It will log the provisioned value or an error in case of the failure.
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int xe_gt_sriov_pf_config_set_group_exec_quantum(struct xe_gt *gt, unsigned int vfid,
+						 u8 group, u32 exec_quantum)
+{
+	u32 values[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT];
+	int err;
+
+	xe_gt_assert(gt, group < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT);
+
+	guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));
+
+	pf_get_groups_exec_quantums(gt, vfid, values, ARRAY_SIZE(values));
+	values[group] = exec_quantum;
+
+	err = pf_provision_groups_exec_quantums(gt, vfid, values, ARRAY_SIZE(values));
+
+	return pf_group_config_set_u32_done(gt, vfid, group, exec_quantum,
+					    pf_get_group_exec_quantum(gt, vfid, group),
+					    "execution quantum", exec_quantum_unit, err);
+}
+
+/**
+ * xe_gt_sriov_pf_config_get_group_exec_quantum - Get PF/VF EQ for a sched groups
+ * @gt: the &xe_gt
+ * @vfid: the PF or VF identifier
+ * @group: index of the group for which to get the EQ
+ *
+ * This function can only be called on PF.
+ *
+ * Return: execution quantum in milliseconds (or 0 if infinity).
+ */
+u32 xe_gt_sriov_pf_config_get_group_exec_quantum(struct xe_gt *gt, unsigned int vfid, u8 group)
+{
+	xe_gt_assert(gt, group < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT);
+
+	guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));
+
+	return pf_get_group_exec_quantum(gt, vfid, group);
+}
+
 static const char *preempt_timeout_unit(u32 preempt_timeout)
 {
 	return preempt_timeout ? "us" : "(infinity)";
@@ -2527,7 +2732,7 @@ ssize_t xe_gt_sriov_pf_config_save(struct xe_gt *gt, unsigned int vfid, void *bu
 			ret = -ENOBUFS;
 		} else {
 			config = pf_pick_vf_config(gt, vfid);
-			ret = encode_config(buf, config, false) * sizeof(u32);
+			ret = encode_config(gt, buf, config, false) * sizeof(u32);
 		}
 	}
 	mutex_unlock(xe_gt_sriov_pf_master_mutex(gt));
@@ -2554,6 +2759,11 @@ static int pf_restore_vf_config_klv(struct xe_gt *gt, unsigned int vfid,
 			return -EBADMSG;
 		return pf_provision_exec_quantum(gt, vfid, value[0]);
 
+	case GUC_KLV_VF_CFG_ENGINE_GROUP_EXEC_QUANTUM_KEY:
+		if (len > GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT)
+			return -EBADMSG;
+		return pf_provision_groups_exec_quantums(gt, vfid, value, len);
+
 	case GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_KEY:
 		if (len != GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_LEN)
 			return -EBADMSG;
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h
index 4975730423d7..aaf6bb824bc9 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h
@@ -46,6 +46,14 @@ int xe_gt_sriov_pf_config_set_exec_quantum_locked(struct xe_gt *gt, unsigned int
 						  u32 exec_quantum);
 int xe_gt_sriov_pf_config_bulk_set_exec_quantum_locked(struct xe_gt *gt, u32 exec_quantum);
 
+void xe_gt_sriov_pf_config_get_groups_exec_quantums(struct xe_gt *gt, unsigned int vfid,
+						    u32 *exec_quantum, u32 max_count);
+int xe_gt_sriov_pf_config_set_groups_exec_quantums(struct xe_gt *gt, unsigned int vfid,
+						   u32 *exec_quantum, u32 count);
+u32 xe_gt_sriov_pf_config_get_group_exec_quantum(struct xe_gt *gt, unsigned int vfid, u8 group);
+int xe_gt_sriov_pf_config_set_group_exec_quantum(struct xe_gt *gt, unsigned int vfid,
+						 u8 group, u32 exec_quantum);
+
 u32 xe_gt_sriov_pf_config_get_preempt_timeout(struct xe_gt *gt, unsigned int vfid);
 int xe_gt_sriov_pf_config_set_preempt_timeout(struct xe_gt *gt, unsigned int vfid,
 					      u32 preempt_timeout);
diff --git a/drivers/gpu/drm/xe/xe_sriov.c b/drivers/gpu/drm/xe/xe_sriov.c
index ea411944609b..eecdd4aaf972 100644
--- a/drivers/gpu/drm/xe/xe_sriov.c
+++ b/drivers/gpu/drm/xe/xe_sriov.c
@@ -159,6 +159,24 @@ const char *xe_sriov_function_name(unsigned int n, char *buf, size_t size)
 	return buf;
 }
 
+/**
+ * xe_sriov_function_and_group_name() - Get SR-IOV Function and group name.
+ * @n: the Function number (identifier) to get name of
+ * @n: the scheduling group to get name of
+ * @buf: the buffer to format to
+ * @size: size of the buffer (shall be at least 18 bytes)
+ *
+ * Return: formatted function name ("PF sched group%u" or "VF%u sched group%u").
+ */
+const char *xe_sriov_function_and_group_name(unsigned int n, u8 g, char *buf, size_t size)
+{
+	if (n)
+		snprintf(buf, size, "VF%u sched group%u", n, g);
+	else
+		snprintf(buf, size, "PF sched group%u", g);
+	return buf;
+}
+
 /**
  * xe_sriov_init_late() - SR-IOV late initialization functions.
  * @xe: the &xe_device to initialize
diff --git a/drivers/gpu/drm/xe/xe_sriov.h b/drivers/gpu/drm/xe/xe_sriov.h
index 6db45df55615..df2b02cb97d0 100644
--- a/drivers/gpu/drm/xe/xe_sriov.h
+++ b/drivers/gpu/drm/xe/xe_sriov.h
@@ -14,6 +14,7 @@ struct drm_printer;
 
 const char *xe_sriov_mode_to_string(enum xe_sriov_mode mode);
 const char *xe_sriov_function_name(unsigned int n, char *buf, size_t len);
+const char *xe_sriov_function_and_group_name(unsigned int n, u8 g, char *buf, size_t size);
 
 void xe_sriov_probe_early(struct xe_device *xe);
 void xe_sriov_print_info(struct xe_device *xe, struct drm_printer *p);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 08/10] drm/xe/sriov: Add functions to set exec quantums for each group
  2025-11-27  1:45 ` [PATCH 08/10] drm/xe/sriov: Add functions to set exec quantums for each group Daniele Ceraolo Spurio
@ 2025-12-02 19:54   ` Michal Wajdeczko
  2025-12-06  1:58     ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 44+ messages in thread
From: Michal Wajdeczko @ 2025-12-02 19:54 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-xe



On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
> The GuC has a new dedicated KLV to set the EQs for the groups. The GuC
> always sets the EQs for all the groups (even the ones not enabled). If
> we provide fewer values than the max number of grops (8), the GuC will
> set the remaining ones to 0.
> 
> Based on this, we offer 2 ways of setting the EQs:
> 
> 1) provide a list of EQs, which is passed straight to the GuC. This will
>    cause the GuC to use zero for any missing value as mentioned above
> 2) provide a single EQ for a specific group. In this case we send all 8
>    EQs to the GuC, using the current values for the groups which are not
>    being updated.
> 
> Note that the new KLV can be used even when groups are disabled (as the
> GuC always consider group0 to be active), so we can use it when encoding
> the SRIOV config.
> 
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> ---
>  drivers/gpu/drm/xe/abi/guc_klvs_abi.h      |  12 +
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c | 244 +++++++++++++++++++--
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h |   8 +
>  drivers/gpu/drm/xe/xe_sriov.c              |  18 ++
>  drivers/gpu/drm/xe/xe_sriov.h              |   1 +
>  5 files changed, 266 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
> index 48f47e26132d..a0763cc15518 100644
> --- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
> +++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
> @@ -383,6 +383,16 @@ enum  {
>   * _`GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT` : 0x8A0D
>   *      This config sets the threshold for LRCA context registration when SRIOV
>   *      scheduler groups are enabled.
> + *
> + * _`GUC_KLV_VF_CFG_ENGINE_GROUP_EXEC_QUANTUM' : 0x8A0E
> + *      This config sets the VFs-execution-quantum for each scheduling group in
> + *      milliseconds. The driver must provide an array of values, with each of
> + *      them matching the respective group index (first value goes to group 0,
> + *      second to group 1, etc). The setting of group values follows the same
> + *      behavior and rules as setting via GUC_KLV_VF_CFG_EXEC_QUANTUM. Note that
> + *      the GuC always sets the EQ for all groups (even the non-enabled ones),
> + *      so if we provide fewer values than the max the GuC will use 0 for the
> + *      remaining groups.

don't forget to update xe_guc_klv_key_to_string()

>   */
>  
>  #define GUC_KLV_VF_CFG_GGTT_START_KEY		0x0001
> @@ -444,6 +454,8 @@ enum  {
>  #define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_KEY	0x8a0d
>  #define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_LEN	1u
>  
> +#define GUC_KLV_VF_CFG_ENGINE_GROUP_EXEC_QUANTUM_KEY	0x8a0e

what about MIN_LEN and MAX_LEN definitions?

> +
>  /*
>   * Workaround keys:
>   */
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> index eb547fedb6da..1bfb25bda432 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> @@ -195,6 +195,22 @@ static int pf_push_vf_cfg_dbs(struct xe_gt *gt, unsigned int vfid, u32 begin, u3
>  	return pf_push_vf_cfg_klvs(gt, vfid, 2, klvs, ARRAY_SIZE(klvs));
>  }
>  
> +static int pf_push_vf_grp_cfg_u32(struct xe_gt *gt, unsigned int vfid,
> +				  u16 key, const u32 *values, u32 count)
> +{
> +	u32 klv[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT + 1];

this magic "1" is GUC_KLV_LEN_MIN, please use it

and maybe we don't need this temp storage and can use CLASS(xe_guc_buf) ?

> +
> +	if (!count)
> +		return -ENODATA;
> +	if (count > GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT)
> +		return -E2BIG;

this looks like our coding error, use assert instead

> +
> +	klv[0] = FIELD_PREP(GUC_KLV_0_KEY, key) | FIELD_PREP(GUC_KLV_0_LEN, count);
> +	memcpy(&klv[1], values, count * sizeof(u32));
> +
> +	return pf_push_vf_cfg_klvs(gt, vfid, 1, klv, count + 1);
> +}
> +
>  static int pf_push_vf_cfg_exec_quantum(struct xe_gt *gt, unsigned int vfid, u32 *exec_quantum)
>  {
>  	/* GuC will silently clamp values exceeding max */
> @@ -269,9 +285,11 @@ static u32 encode_config_ggtt(u32 *cfg, const struct xe_gt_sriov_config *config,
>  }
>  
>  /* Return: number of configuration dwords written */
> -static u32 encode_config(u32 *cfg, const struct xe_gt_sriov_config *config, bool details)
> +static u32 encode_config(struct xe_gt *gt, u32 *cfg,
> +			 const struct xe_gt_sriov_config *config, bool details)
>  {
>  	u32 n = 0;
> +	int i;
>  
>  	n += encode_config_ggtt(cfg, config, details);
>  
> @@ -297,8 +315,15 @@ static u32 encode_config(u32 *cfg, const struct xe_gt_sriov_config *config, bool
>  		cfg[n++] = upper_32_bits(xe_bo_size(config->lmem_obj));
>  	}
>  
> -	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_EXEC_QUANTUM);
> -	cfg[n++] = config->exec_quantum[0];
> +	if (xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt)) {
> +		cfg[n++] = PREP_GUC_KLV_CONST(GUC_KLV_VF_CFG_ENGINE_GROUP_EXEC_QUANTUM_KEY,
> +					      GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT);
> +		for (i = 0; i < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT; i++)
> +			cfg[n++] = config->exec_quantum[i];
> +	} else {
> +		cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_EXEC_QUANTUM);
> +		cfg[n++] = config->exec_quantum[0];
> +	}

I guess it's time to extract above chunk to new encode_sched() helper

there we could encode both EQ and PT and avoid double call to
	xe_sriov_gt_pf_policy_has_valid_sched_group_modes

>  
>  	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_PREEMPT_TIMEOUT);
>  	cfg[n++] = config->preempt_timeout[0];
> @@ -328,7 +353,7 @@ static int pf_push_full_vf_config(struct xe_gt *gt, unsigned int vfid)
>  		return -ENOBUFS;
>  
>  	cfg = xe_guc_buf_cpu_ptr(buf);
> -	num_dwords = encode_config(cfg, config, true);
> +	num_dwords = encode_config(gt, cfg, config, true);
>  	xe_gt_assert(gt, num_dwords <= max_cfg_dwords);
>  
>  	if (xe_gt_is_media_type(gt)) {
> @@ -952,6 +977,21 @@ static const char *spare_unit(u32 unused)
>  	return " spare";
>  }
>  
> +static void __set_u32_done(struct xe_gt *gt, const char *name, u32 value, u32 actual,
> +			   const char *what, const char *(*unit)(u32), int err)

please keep the pf prefix:

	__pf_config_set_u32_done(...

and maybe we shouldn't change the meaning of the "name" here (as it's still about PF or VF)
but rather augment the "what" was changed, like:

	"execution quantum" ->	"group0 execution quantum"

so the only helper we need is:

const char *to_group_name(const char *what, unsigned int group, char *buf, size_t size)
{
	snprintf(buf, size, "group%u%s%s", group, what ? " " : "", what ?: "");
	return buf;
}

then we could call existing helper as usual:

	pf_group_config_set_u32_done(gt, vfid, value, actual,
				to_group_name(what, group, name, sizeof(name)),
				unit, err);

which will result in:

 [drm] PF: Tile0: GT1: VF1 provisioned with 1ms group0 execution quantum

or

 [drm] *ERROR* PF: Tile0: GT1: Failed to provision VF1 with 1ms group0 execution quantum (-EIO)


> +{
> +	if (unlikely(err)) {
> +		xe_gt_sriov_notice(gt, "Failed to provision %s with %u%s %s (%pe)\n",
> +				   name, value, unit(value), what, ERR_PTR(err));
> +		xe_gt_sriov_info(gt, "%s provisioning remains at %u%s %s\n",
> +				 name, actual, unit(actual), what);
> +	} else {
> +		/* the actual value may have changed during provisioning */
> +		xe_gt_sriov_info(gt, "%s provisioned with %u%s %s\n",
> +				 name, actual, unit(actual), what);
> +	}
> +}
> +
>  static int pf_config_set_u32_done(struct xe_gt *gt, unsigned int vfid, u32 value, u32 actual,
>  				  const char *what, const char *(*unit)(u32), int err)
>  {
> @@ -959,18 +999,47 @@ static int pf_config_set_u32_done(struct xe_gt *gt, unsigned int vfid, u32 value
>  
>  	xe_sriov_function_name(vfid, name, sizeof(name));
>  
> -	if (unlikely(err)) {
> -		xe_gt_sriov_notice(gt, "Failed to provision %s with %u%s %s (%pe)\n",
> -				   name, value, unit(value), what, ERR_PTR(err));
> -		xe_gt_sriov_info(gt, "%s provisioning remains at %u%s %s\n",
> -				 name, actual, unit(actual), what);
> -		return err;
> +	__set_u32_done(gt, name, value, actual, what, unit, err);
> +
> +	return err;
> +}
> +
> +static int pf_group_config_set_u32_done(struct xe_gt *gt, unsigned int vfid, u8 group,
> +					u32 value, u32 actual, const char *what,
> +					const char *(*unit)(u32), int err)
> +{
> +	char name[24];
> +
> +	xe_sriov_function_and_group_name(vfid, group, name, sizeof(name));
> +
> +	__set_u32_done(gt, name, value, actual, what, unit, err);
> +
> +	return err;
> +}
> +
> +static int
> +pf_groups_cfg_set_u32_array_done(struct xe_gt *gt, unsigned int vfid,
> +				 u32 *values, u32 count,
> +				 void (*get_actual)(struct xe_gt *, unsigned int, u32 *, u32),
> +				 const char *what, const char *(*unit)(u32), int err)
> +{
> +	u32 actual[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT];
> +	char name[24];
> +	u8 g;
> +
> +	get_actual(gt, vfid, actual, count);
> +
> +	for (g = 0; g < count; g++) {
> +		xe_sriov_function_and_group_name(vfid, g, name, sizeof(name));
> +
> +		__set_u32_done(gt, name, values[g], actual[g], what, unit, err);

in case of error, does it make sense to report the same error up to 8 times?

>  	}
>  
> -	/* the actual value may have changed during provisioning */
> -	xe_gt_sriov_info(gt, "%s provisioned with %u%s %s\n",
> -			 name, actual, unit(actual), what);
> -	return 0;
> +	if (!err && count < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT)
> +		xe_gt_sriov_info(gt, "All remaining groups provisioned with 0%s %s\n",
> +				 unit(0), what);

this prints:

 [drm] PF: Tile0: GT1: All remaining groups provisioned with 0(infinity) execution quantum

but there is no info about the target: PF or VF1

but OTOH do we need to shout about implicit configurations, so maybe just drop it?

> +
> +	return err;
>  }
>  
>  /**
> @@ -1869,11 +1938,16 @@ static int pf_provision_exec_quantum(struct xe_gt *gt, unsigned int vfid,
>  	return 0;
>  }
>  
> -static u32 pf_get_exec_quantum(struct xe_gt *gt, unsigned int vfid)
> +static u32 pf_get_group_exec_quantum(struct xe_gt *gt, unsigned int vfid, u8 group)

do we need to use fixed size integer for group index ?

>  {
>  	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
>  
> -	return config->exec_quantum[0];
> +	return config->exec_quantum[group];
> +}
> +
> +static u32 pf_get_exec_quantum(struct xe_gt *gt, unsigned int vfid)
> +{
> +	return pf_get_group_exec_quantum(gt, vfid, 0);
>  }
>  
>  /**
> @@ -1980,6 +2054,137 @@ int xe_gt_sriov_pf_config_bulk_set_exec_quantum_locked(struct xe_gt *gt, u32 exe
>  					   exec_quantum_unit, n, err);
>  }
>  
> +static int pf_provision_groups_exec_quantums(struct xe_gt *gt, unsigned int vfid,
> +					     const u32 *exec_quantums, u32 count)
> +{
> +	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
> +	int err;
> +	int i;
> +
> +	err = pf_push_vf_grp_cfg_u32(gt, vfid, GUC_KLV_VF_CFG_ENGINE_GROUP_EXEC_QUANTUM_KEY,
> +				     exec_quantums, count);
> +	if (unlikely(err))
> +		return err;
> +
> +	/*
> +	 * GuC silently clamps values exceeding the max and zeroes out the
> +	 * quantum for groups not in the array
> +	 */
> +	for (i = 0; i < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT; i++) {
> +		if (i < count)
> +			config->exec_quantum[i] = min_t(u32, exec_quantums[i],
> +							GUC_KLV_VF_CFG_EXEC_QUANTUM_MAX_VALUE);
> +		else
> +			config->exec_quantum[i] = 0;
> +	}
> +
> +	return 0;
> +}
> +
> +static void pf_get_groups_exec_quantums(struct xe_gt *gt, unsigned int vfid,
> +					u32 *exec_quantums, u32 max_count)
> +{
> +	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
> +	u32 count = min_t(u32, max_count, GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT);
> +
> +	memcpy(exec_quantums, config->exec_quantum, sizeof(u32) * count);
> +}
> +
> +/**
> + * xe_gt_sriov_pf_config_set_groups_exec_quantums() - Configure PF/VF EQs for sched groups.
> + * @gt: the &xe_gt
> + * @vfid: the PF or VF identifier
> + * @exec_quantums: array of requested EQs in milliseconds (0 is infinity)
> + * @count: number of entries in the array
> + *
> + * This function can only be called on PF.
> + * It will log the provisioned value or an error in case of the failure.
> + *
> + * Return: 0 on success or a negative error code on failure.
> + */
> +int xe_gt_sriov_pf_config_set_groups_exec_quantums(struct xe_gt *gt, unsigned int vfid,
> +						   u32 *exec_quantums, u32 count)
> +{
> +	int err;
> +
> +	guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));
> +
> +	err = pf_provision_groups_exec_quantums(gt, vfid, exec_quantums, count);
> +
> +	return pf_groups_cfg_set_u32_array_done(gt, vfid, exec_quantums, count,
> +						pf_get_groups_exec_quantums,
> +						"execution quantum",
> +						exec_quantum_unit, err);
> +}
> +
> +/**
> + * xe_gt_sriov_pf_config_get_groups_exec_quantums - Get PF/VF sched groups EQs
> + * @gt: the &xe_gt
> + * @vfid: the PF or VF identifier
> + * @exec_quantums: array in which to store the execution quantums values
> + * @max_count: maximum number of entries to store

just @count ?

> + *
> + * This function can only be called on PF.
> + */
> +void xe_gt_sriov_pf_config_get_groups_exec_quantums(struct xe_gt *gt, unsigned int vfid,
> +						    u32 *exec_quantums, u32 max_count)
> +{
> +	guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));

maybe assert that count <= MAX_GROUPS ?

> +
> +	return pf_get_groups_exec_quantums(gt, vfid, exec_quantums, max_count);
> +}
> +
> +/**
> + * xe_gt_sriov_pf_config_set_group_exec_quantum - Configure PF/VF EQs for a sched group.
> + * @gt: the &xe_gt
> + * @vfid: the PF or VF identifier
> + * @group: index of the group to configure

GuC ABI does not allow directly to setup single group EQ, so why bother?

> + * @exec_quantum: requested EQs in milliseconds (0 is infinity)
> + *
> + * This function can only be called on PF.
> + * It will log the provisioned value or an error in case of the failure.
> + *
> + * Return: 0 on success or a negative error code on failure.
> + */
> +int xe_gt_sriov_pf_config_set_group_exec_quantum(struct xe_gt *gt, unsigned int vfid,
> +						 u8 group, u32 exec_quantum)
> +{
> +	u32 values[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT];
> +	int err;
> +
> +	xe_gt_assert(gt, group < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT);
> +
> +	guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));
> +
> +	pf_get_groups_exec_quantums(gt, vfid, values, ARRAY_SIZE(values));
> +	values[group] = exec_quantum;
> +
> +	err = pf_provision_groups_exec_quantums(gt, vfid, values, ARRAY_SIZE(values));
> +
> +	return pf_group_config_set_u32_done(gt, vfid, group, exec_quantum,
> +					    pf_get_group_exec_quantum(gt, vfid, group),
> +					    "execution quantum", exec_quantum_unit, err);
> +}
> +
> +/**
> + * xe_gt_sriov_pf_config_get_group_exec_quantum - Get PF/VF EQ for a sched groups
> + * @gt: the &xe_gt
> + * @vfid: the PF or VF identifier
> + * @group: index of the group for which to get the EQ
> + *
> + * This function can only be called on PF.
> + *
> + * Return: execution quantum in milliseconds (or 0 if infinity).
> + */
> +u32 xe_gt_sriov_pf_config_get_group_exec_quantum(struct xe_gt *gt, unsigned int vfid, u8 group)
> +{
> +	xe_gt_assert(gt, group < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT);
> +
> +	guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));
> +
> +	return pf_get_group_exec_quantum(gt, vfid, group);
> +}
> +
>  static const char *preempt_timeout_unit(u32 preempt_timeout)
>  {
>  	return preempt_timeout ? "us" : "(infinity)";
> @@ -2527,7 +2732,7 @@ ssize_t xe_gt_sriov_pf_config_save(struct xe_gt *gt, unsigned int vfid, void *bu
>  			ret = -ENOBUFS;
>  		} else {
>  			config = pf_pick_vf_config(gt, vfid);
> -			ret = encode_config(buf, config, false) * sizeof(u32);
> +			ret = encode_config(gt, buf, config, false) * sizeof(u32);
>  		}
>  	}
>  	mutex_unlock(xe_gt_sriov_pf_master_mutex(gt));
> @@ -2554,6 +2759,11 @@ static int pf_restore_vf_config_klv(struct xe_gt *gt, unsigned int vfid,
>  			return -EBADMSG;
>  		return pf_provision_exec_quantum(gt, vfid, value[0]);
>  
> +	case GUC_KLV_VF_CFG_ENGINE_GROUP_EXEC_QUANTUM_KEY:
> +		if (len > GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT)
> +			return -EBADMSG;
> +		return pf_provision_groups_exec_quantums(gt, vfid, value, len);
> +
>  	case GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_KEY:
>  		if (len != GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_LEN)
>  			return -EBADMSG;
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h
> index 4975730423d7..aaf6bb824bc9 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h
> @@ -46,6 +46,14 @@ int xe_gt_sriov_pf_config_set_exec_quantum_locked(struct xe_gt *gt, unsigned int
>  						  u32 exec_quantum);
>  int xe_gt_sriov_pf_config_bulk_set_exec_quantum_locked(struct xe_gt *gt, u32 exec_quantum);
>  
> +void xe_gt_sriov_pf_config_get_groups_exec_quantums(struct xe_gt *gt, unsigned int vfid,
> +						    u32 *exec_quantum, u32 max_count);
> +int xe_gt_sriov_pf_config_set_groups_exec_quantums(struct xe_gt *gt, unsigned int vfid,
> +						   u32 *exec_quantum, u32 count);
> +u32 xe_gt_sriov_pf_config_get_group_exec_quantum(struct xe_gt *gt, unsigned int vfid, u8 group);
> +int xe_gt_sriov_pf_config_set_group_exec_quantum(struct xe_gt *gt, unsigned int vfid,
> +						 u8 group, u32 exec_quantum);
> +
>  u32 xe_gt_sriov_pf_config_get_preempt_timeout(struct xe_gt *gt, unsigned int vfid);
>  int xe_gt_sriov_pf_config_set_preempt_timeout(struct xe_gt *gt, unsigned int vfid,
>  					      u32 preempt_timeout);
> diff --git a/drivers/gpu/drm/xe/xe_sriov.c b/drivers/gpu/drm/xe/xe_sriov.c
> index ea411944609b..eecdd4aaf972 100644
> --- a/drivers/gpu/drm/xe/xe_sriov.c
> +++ b/drivers/gpu/drm/xe/xe_sriov.c
> @@ -159,6 +159,24 @@ const char *xe_sriov_function_name(unsigned int n, char *buf, size_t size)
>  	return buf;
>  }
>  
> +/**
> + * xe_sriov_function_and_group_name() - Get SR-IOV Function and group name.
> + * @n: the Function number (identifier) to get name of
> + * @n: the scheduling group to get name of

@g or better @group

> + * @buf: the buffer to format to
> + * @size: size of the buffer (shall be at least 18 bytes)
> + *
> + * Return: formatted function name ("PF sched group%u" or "VF%u sched group%u").
> + */
> +const char *xe_sriov_function_and_group_name(unsigned int n, u8 g, char *buf, size_t size)
> +{
> +	if (n)
> +		snprintf(buf, size, "VF%u sched group%u", n, g);
> +	else
> +		snprintf(buf, size, "PF sched group%u", g);

	char name[10];
	snprintf(buf, size, "%s sched group%u",
		xe_sriov_function_name(name, n, sizeof(name), group);

but honestly I'm not convinced that we need this function at all

> +	return buf;
> +}
> +
>  /**
>   * xe_sriov_init_late() - SR-IOV late initialization functions.
>   * @xe: the &xe_device to initialize
> diff --git a/drivers/gpu/drm/xe/xe_sriov.h b/drivers/gpu/drm/xe/xe_sriov.h
> index 6db45df55615..df2b02cb97d0 100644
> --- a/drivers/gpu/drm/xe/xe_sriov.h
> +++ b/drivers/gpu/drm/xe/xe_sriov.h
> @@ -14,6 +14,7 @@ struct drm_printer;
>  
>  const char *xe_sriov_mode_to_string(enum xe_sriov_mode mode);
>  const char *xe_sriov_function_name(unsigned int n, char *buf, size_t len);
> +const char *xe_sriov_function_and_group_name(unsigned int n, u8 g, char *buf, size_t size);
>  
>  void xe_sriov_probe_early(struct xe_device *xe);
>  void xe_sriov_print_info(struct xe_device *xe, struct drm_printer *p);


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 08/10] drm/xe/sriov: Add functions to set exec quantums for each group
  2025-12-02 19:54   ` Michal Wajdeczko
@ 2025-12-06  1:58     ` Daniele Ceraolo Spurio
  0 siblings, 0 replies; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-12-06  1:58 UTC (permalink / raw)
  To: Michal Wajdeczko, intel-xe



On 12/2/2025 11:54 AM, Michal Wajdeczko wrote:
>
> On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
>> The GuC has a new dedicated KLV to set the EQs for the groups. The GuC
>> always sets the EQs for all the groups (even the ones not enabled). If
>> we provide fewer values than the max number of grops (8), the GuC will
>> set the remaining ones to 0.
>>
>> Based on this, we offer 2 ways of setting the EQs:
>>
>> 1) provide a list of EQs, which is passed straight to the GuC. This will
>>     cause the GuC to use zero for any missing value as mentioned above
>> 2) provide a single EQ for a specific group. In this case we send all 8
>>     EQs to the GuC, using the current values for the groups which are not
>>     being updated.
>>
>> Note that the new KLV can be used even when groups are disabled (as the
>> GuC always consider group0 to be active), so we can use it when encoding
>> the SRIOV config.
>>
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> ---
>>   drivers/gpu/drm/xe/abi/guc_klvs_abi.h      |  12 +
>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c | 244 +++++++++++++++++++--
>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h |   8 +
>>   drivers/gpu/drm/xe/xe_sriov.c              |  18 ++
>>   drivers/gpu/drm/xe/xe_sriov.h              |   1 +
>>   5 files changed, 266 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>> index 48f47e26132d..a0763cc15518 100644
>> --- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>> +++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>> @@ -383,6 +383,16 @@ enum  {
>>    * _`GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT` : 0x8A0D
>>    *      This config sets the threshold for LRCA context registration when SRIOV
>>    *      scheduler groups are enabled.
>> + *
>> + * _`GUC_KLV_VF_CFG_ENGINE_GROUP_EXEC_QUANTUM' : 0x8A0E
>> + *      This config sets the VFs-execution-quantum for each scheduling group in
>> + *      milliseconds. The driver must provide an array of values, with each of
>> + *      them matching the respective group index (first value goes to group 0,
>> + *      second to group 1, etc). The setting of group values follows the same
>> + *      behavior and rules as setting via GUC_KLV_VF_CFG_EXEC_QUANTUM. Note that
>> + *      the GuC always sets the EQ for all groups (even the non-enabled ones),
>> + *      so if we provide fewer values than the max the GuC will use 0 for the
>> + *      remaining groups.
> don't forget to update xe_guc_klv_key_to_string()
>
>>    */
>>   
>>   #define GUC_KLV_VF_CFG_GGTT_START_KEY		0x0001
>> @@ -444,6 +454,8 @@ enum  {
>>   #define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_KEY	0x8a0d
>>   #define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_LEN	1u
>>   
>> +#define GUC_KLV_VF_CFG_ENGINE_GROUP_EXEC_QUANTUM_KEY	0x8a0e
> what about MIN_LEN and MAX_LEN definitions?

will add.

>
>> +
>>   /*
>>    * Workaround keys:
>>    */
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
>> index eb547fedb6da..1bfb25bda432 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
>> @@ -195,6 +195,22 @@ static int pf_push_vf_cfg_dbs(struct xe_gt *gt, unsigned int vfid, u32 begin, u3
>>   	return pf_push_vf_cfg_klvs(gt, vfid, 2, klvs, ARRAY_SIZE(klvs));
>>   }
>>   
>> +static int pf_push_vf_grp_cfg_u32(struct xe_gt *gt, unsigned int vfid,
>> +				  u16 key, const u32 *values, u32 count)
>> +{
>> +	u32 klv[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT + 1];
> this magic "1" is GUC_KLV_LEN_MIN, please use it
>
> and maybe we don't need this temp storage and can use CLASS(xe_guc_buf) ?

ok

>
>> +
>> +	if (!count)
>> +		return -ENODATA;
>> +	if (count > GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT)
>> +		return -E2BIG;
> this looks like our coding error, use assert instead

ok

>> +
>> +	klv[0] = FIELD_PREP(GUC_KLV_0_KEY, key) | FIELD_PREP(GUC_KLV_0_LEN, count);
>> +	memcpy(&klv[1], values, count * sizeof(u32));
>> +
>> +	return pf_push_vf_cfg_klvs(gt, vfid, 1, klv, count + 1);
>> +}
>> +
>>   static int pf_push_vf_cfg_exec_quantum(struct xe_gt *gt, unsigned int vfid, u32 *exec_quantum)
>>   {
>>   	/* GuC will silently clamp values exceeding max */
>> @@ -269,9 +285,11 @@ static u32 encode_config_ggtt(u32 *cfg, const struct xe_gt_sriov_config *config,
>>   }
>>   
>>   /* Return: number of configuration dwords written */
>> -static u32 encode_config(u32 *cfg, const struct xe_gt_sriov_config *config, bool details)
>> +static u32 encode_config(struct xe_gt *gt, u32 *cfg,
>> +			 const struct xe_gt_sriov_config *config, bool details)
>>   {
>>   	u32 n = 0;
>> +	int i;
>>   
>>   	n += encode_config_ggtt(cfg, config, details);
>>   
>> @@ -297,8 +315,15 @@ static u32 encode_config(u32 *cfg, const struct xe_gt_sriov_config *config, bool
>>   		cfg[n++] = upper_32_bits(xe_bo_size(config->lmem_obj));
>>   	}
>>   
>> -	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_EXEC_QUANTUM);
>> -	cfg[n++] = config->exec_quantum[0];
>> +	if (xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt)) {
>> +		cfg[n++] = PREP_GUC_KLV_CONST(GUC_KLV_VF_CFG_ENGINE_GROUP_EXEC_QUANTUM_KEY,
>> +					      GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT);
>> +		for (i = 0; i < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT; i++)
>> +			cfg[n++] = config->exec_quantum[i];
>> +	} else {
>> +		cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_EXEC_QUANTUM);
>> +		cfg[n++] = config->exec_quantum[0];
>> +	}
> I guess it's time to extract above chunk to new encode_sched() helper
>
> there we could encode both EQ and PT and avoid double call to
> 	xe_sriov_gt_pf_policy_has_valid_sched_group_modes

will do

>
>>   
>>   	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_PREEMPT_TIMEOUT);
>>   	cfg[n++] = config->preempt_timeout[0];
>> @@ -328,7 +353,7 @@ static int pf_push_full_vf_config(struct xe_gt *gt, unsigned int vfid)
>>   		return -ENOBUFS;
>>   
>>   	cfg = xe_guc_buf_cpu_ptr(buf);
>> -	num_dwords = encode_config(cfg, config, true);
>> +	num_dwords = encode_config(gt, cfg, config, true);
>>   	xe_gt_assert(gt, num_dwords <= max_cfg_dwords);
>>   
>>   	if (xe_gt_is_media_type(gt)) {
>> @@ -952,6 +977,21 @@ static const char *spare_unit(u32 unused)
>>   	return " spare";
>>   }
>>   
>> +static void __set_u32_done(struct xe_gt *gt, const char *name, u32 value, u32 actual,
>> +			   const char *what, const char *(*unit)(u32), int err)
> please keep the pf prefix:
>
> 	__pf_config_set_u32_done(...
>
> and maybe we shouldn't change the meaning of the "name" here (as it's still about PF or VF)
> but rather augment the "what" was changed, like:
>
> 	"execution quantum" ->	"group0 execution quantum"
>
> so the only helper we need is:
>
> const char *to_group_name(const char *what, unsigned int group, char *buf, size_t size)
> {
> 	snprintf(buf, size, "group%u%s%s", group, what ? " " : "", what ?: "");
> 	return buf;
> }
>
> then we could call existing helper as usual:
>
> 	pf_group_config_set_u32_done(gt, vfid, value, actual,
> 				to_group_name(what, group, name, sizeof(name)),
> 				unit, err);
>
> which will result in:
>
>   [drm] PF: Tile0: GT1: VF1 provisioned with 1ms group0 execution quantum
>
> or
>
>   [drm] *ERROR* PF: Tile0: GT1: Failed to provision VF1 with 1ms group0 execution quantum (-EIO)

ok

>
>> +{
>> +	if (unlikely(err)) {
>> +		xe_gt_sriov_notice(gt, "Failed to provision %s with %u%s %s (%pe)\n",
>> +				   name, value, unit(value), what, ERR_PTR(err));
>> +		xe_gt_sriov_info(gt, "%s provisioning remains at %u%s %s\n",
>> +				 name, actual, unit(actual), what);
>> +	} else {
>> +		/* the actual value may have changed during provisioning */
>> +		xe_gt_sriov_info(gt, "%s provisioned with %u%s %s\n",
>> +				 name, actual, unit(actual), what);
>> +	}
>> +}
>> +
>>   static int pf_config_set_u32_done(struct xe_gt *gt, unsigned int vfid, u32 value, u32 actual,
>>   				  const char *what, const char *(*unit)(u32), int err)
>>   {
>> @@ -959,18 +999,47 @@ static int pf_config_set_u32_done(struct xe_gt *gt, unsigned int vfid, u32 value
>>   
>>   	xe_sriov_function_name(vfid, name, sizeof(name));
>>   
>> -	if (unlikely(err)) {
>> -		xe_gt_sriov_notice(gt, "Failed to provision %s with %u%s %s (%pe)\n",
>> -				   name, value, unit(value), what, ERR_PTR(err));
>> -		xe_gt_sriov_info(gt, "%s provisioning remains at %u%s %s\n",
>> -				 name, actual, unit(actual), what);
>> -		return err;
>> +	__set_u32_done(gt, name, value, actual, what, unit, err);
>> +
>> +	return err;
>> +}
>> +
>> +static int pf_group_config_set_u32_done(struct xe_gt *gt, unsigned int vfid, u8 group,
>> +					u32 value, u32 actual, const char *what,
>> +					const char *(*unit)(u32), int err)
>> +{
>> +	char name[24];
>> +
>> +	xe_sriov_function_and_group_name(vfid, group, name, sizeof(name));
>> +
>> +	__set_u32_done(gt, name, value, actual, what, unit, err);
>> +
>> +	return err;
>> +}
>> +
>> +static int
>> +pf_groups_cfg_set_u32_array_done(struct xe_gt *gt, unsigned int vfid,
>> +				 u32 *values, u32 count,
>> +				 void (*get_actual)(struct xe_gt *, unsigned int, u32 *, u32),
>> +				 const char *what, const char *(*unit)(u32), int err)
>> +{
>> +	u32 actual[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT];
>> +	char name[24];
>> +	u8 g;
>> +
>> +	get_actual(gt, vfid, actual, count);
>> +
>> +	for (g = 0; g < count; g++) {
>> +		xe_sriov_function_and_group_name(vfid, g, name, sizeof(name));
>> +
>> +		__set_u32_done(gt, name, values[g], actual[g], what, unit, err);
> in case of error, does it make sense to report the same error up to 8 times?

yes, because we print the value that is left active, which can be 
different per group.

>
>>   	}
>>   
>> -	/* the actual value may have changed during provisioning */
>> -	xe_gt_sriov_info(gt, "%s provisioned with %u%s %s\n",
>> -			 name, actual, unit(actual), what);
>> -	return 0;
>> +	if (!err && count < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT)
>> +		xe_gt_sriov_info(gt, "All remaining groups provisioned with 0%s %s\n",
>> +				 unit(0), what);
> this prints:
>
>   [drm] PF: Tile0: GT1: All remaining groups provisioned with 0(infinity) execution quantum
>
> but there is no info about the target: PF or VF1
>
> but OTOH do we need to shout about implicit configurations, so maybe just drop it?

ok, I'll drop it.

>
>> +
>> +	return err;
>>   }
>>   
>>   /**
>> @@ -1869,11 +1938,16 @@ static int pf_provision_exec_quantum(struct xe_gt *gt, unsigned int vfid,
>>   	return 0;
>>   }
>>   
>> -static u32 pf_get_exec_quantum(struct xe_gt *gt, unsigned int vfid)
>> +static u32 pf_get_group_exec_quantum(struct xe_gt *gt, unsigned int vfid, u8 group)
> do we need to use fixed size integer for group index ?

This function will just be removed due to other suggestions

>>   {
>>   	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
>>   
>> -	return config->exec_quantum[0];
>> +	return config->exec_quantum[group];
>> +}
>> +
>> +static u32 pf_get_exec_quantum(struct xe_gt *gt, unsigned int vfid)
>> +{
>> +	return pf_get_group_exec_quantum(gt, vfid, 0);
>>   }
>>   
>>   /**
>> @@ -1980,6 +2054,137 @@ int xe_gt_sriov_pf_config_bulk_set_exec_quantum_locked(struct xe_gt *gt, u32 exe
>>   					   exec_quantum_unit, n, err);
>>   }
>>   
>> +static int pf_provision_groups_exec_quantums(struct xe_gt *gt, unsigned int vfid,
>> +					     const u32 *exec_quantums, u32 count)
>> +{
>> +	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
>> +	int err;
>> +	int i;
>> +
>> +	err = pf_push_vf_grp_cfg_u32(gt, vfid, GUC_KLV_VF_CFG_ENGINE_GROUP_EXEC_QUANTUM_KEY,
>> +				     exec_quantums, count);
>> +	if (unlikely(err))
>> +		return err;
>> +
>> +	/*
>> +	 * GuC silently clamps values exceeding the max and zeroes out the
>> +	 * quantum for groups not in the array
>> +	 */
>> +	for (i = 0; i < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT; i++) {
>> +		if (i < count)
>> +			config->exec_quantum[i] = min_t(u32, exec_quantums[i],
>> +							GUC_KLV_VF_CFG_EXEC_QUANTUM_MAX_VALUE);
>> +		else
>> +			config->exec_quantum[i] = 0;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static void pf_get_groups_exec_quantums(struct xe_gt *gt, unsigned int vfid,
>> +					u32 *exec_quantums, u32 max_count)
>> +{
>> +	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
>> +	u32 count = min_t(u32, max_count, GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT);
>> +
>> +	memcpy(exec_quantums, config->exec_quantum, sizeof(u32) * count);
>> +}
>> +
>> +/**
>> + * xe_gt_sriov_pf_config_set_groups_exec_quantums() - Configure PF/VF EQs for sched groups.
>> + * @gt: the &xe_gt
>> + * @vfid: the PF or VF identifier
>> + * @exec_quantums: array of requested EQs in milliseconds (0 is infinity)
>> + * @count: number of entries in the array
>> + *
>> + * This function can only be called on PF.
>> + * It will log the provisioned value or an error in case of the failure.
>> + *
>> + * Return: 0 on success or a negative error code on failure.
>> + */
>> +int xe_gt_sriov_pf_config_set_groups_exec_quantums(struct xe_gt *gt, unsigned int vfid,
>> +						   u32 *exec_quantums, u32 count)
>> +{
>> +	int err;
>> +
>> +	guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));
>> +
>> +	err = pf_provision_groups_exec_quantums(gt, vfid, exec_quantums, count);
>> +
>> +	return pf_groups_cfg_set_u32_array_done(gt, vfid, exec_quantums, count,
>> +						pf_get_groups_exec_quantums,
>> +						"execution quantum",
>> +						exec_quantum_unit, err);
>> +}
>> +
>> +/**
>> + * xe_gt_sriov_pf_config_get_groups_exec_quantums - Get PF/VF sched groups EQs
>> + * @gt: the &xe_gt
>> + * @vfid: the PF or VF identifier
>> + * @exec_quantums: array in which to store the execution quantums values
>> + * @max_count: maximum number of entries to store
> just @count ?

ok

>
>> + *
>> + * This function can only be called on PF.
>> + */
>> +void xe_gt_sriov_pf_config_get_groups_exec_quantums(struct xe_gt *gt, unsigned int vfid,
>> +						    u32 *exec_quantums, u32 max_count)
>> +{
>> +	guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));
> maybe assert that count <= MAX_GROUPS ?

ok

>
>> +
>> +	return pf_get_groups_exec_quantums(gt, vfid, exec_quantums, max_count);
>> +}
>> +
>> +/**
>> + * xe_gt_sriov_pf_config_set_group_exec_quantum - Configure PF/VF EQs for a sched group.
>> + * @gt: the &xe_gt
>> + * @vfid: the PF or VF identifier
>> + * @group: index of the group to configure
> GuC ABI does not allow directly to setup single group EQ, so why bother?

will drop

>
>> + * @exec_quantum: requested EQs in milliseconds (0 is infinity)
>> + *
>> + * This function can only be called on PF.
>> + * It will log the provisioned value or an error in case of the failure.
>> + *
>> + * Return: 0 on success or a negative error code on failure.
>> + */
>> +int xe_gt_sriov_pf_config_set_group_exec_quantum(struct xe_gt *gt, unsigned int vfid,
>> +						 u8 group, u32 exec_quantum)
>> +{
>> +	u32 values[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT];
>> +	int err;
>> +
>> +	xe_gt_assert(gt, group < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT);
>> +
>> +	guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));
>> +
>> +	pf_get_groups_exec_quantums(gt, vfid, values, ARRAY_SIZE(values));
>> +	values[group] = exec_quantum;
>> +
>> +	err = pf_provision_groups_exec_quantums(gt, vfid, values, ARRAY_SIZE(values));
>> +
>> +	return pf_group_config_set_u32_done(gt, vfid, group, exec_quantum,
>> +					    pf_get_group_exec_quantum(gt, vfid, group),
>> +					    "execution quantum", exec_quantum_unit, err);
>> +}
>> +
>> +/**
>> + * xe_gt_sriov_pf_config_get_group_exec_quantum - Get PF/VF EQ for a sched groups
>> + * @gt: the &xe_gt
>> + * @vfid: the PF or VF identifier
>> + * @group: index of the group for which to get the EQ
>> + *
>> + * This function can only be called on PF.
>> + *
>> + * Return: execution quantum in milliseconds (or 0 if infinity).
>> + */
>> +u32 xe_gt_sriov_pf_config_get_group_exec_quantum(struct xe_gt *gt, unsigned int vfid, u8 group)
>> +{
>> +	xe_gt_assert(gt, group < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT);
>> +
>> +	guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));
>> +
>> +	return pf_get_group_exec_quantum(gt, vfid, group);
>> +}
>> +
>>   static const char *preempt_timeout_unit(u32 preempt_timeout)
>>   {
>>   	return preempt_timeout ? "us" : "(infinity)";
>> @@ -2527,7 +2732,7 @@ ssize_t xe_gt_sriov_pf_config_save(struct xe_gt *gt, unsigned int vfid, void *bu
>>   			ret = -ENOBUFS;
>>   		} else {
>>   			config = pf_pick_vf_config(gt, vfid);
>> -			ret = encode_config(buf, config, false) * sizeof(u32);
>> +			ret = encode_config(gt, buf, config, false) * sizeof(u32);
>>   		}
>>   	}
>>   	mutex_unlock(xe_gt_sriov_pf_master_mutex(gt));
>> @@ -2554,6 +2759,11 @@ static int pf_restore_vf_config_klv(struct xe_gt *gt, unsigned int vfid,
>>   			return -EBADMSG;
>>   		return pf_provision_exec_quantum(gt, vfid, value[0]);
>>   
>> +	case GUC_KLV_VF_CFG_ENGINE_GROUP_EXEC_QUANTUM_KEY:
>> +		if (len > GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT)
>> +			return -EBADMSG;
>> +		return pf_provision_groups_exec_quantums(gt, vfid, value, len);
>> +
>>   	case GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_KEY:
>>   		if (len != GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_LEN)
>>   			return -EBADMSG;
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h
>> index 4975730423d7..aaf6bb824bc9 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h
>> @@ -46,6 +46,14 @@ int xe_gt_sriov_pf_config_set_exec_quantum_locked(struct xe_gt *gt, unsigned int
>>   						  u32 exec_quantum);
>>   int xe_gt_sriov_pf_config_bulk_set_exec_quantum_locked(struct xe_gt *gt, u32 exec_quantum);
>>   
>> +void xe_gt_sriov_pf_config_get_groups_exec_quantums(struct xe_gt *gt, unsigned int vfid,
>> +						    u32 *exec_quantum, u32 max_count);
>> +int xe_gt_sriov_pf_config_set_groups_exec_quantums(struct xe_gt *gt, unsigned int vfid,
>> +						   u32 *exec_quantum, u32 count);
>> +u32 xe_gt_sriov_pf_config_get_group_exec_quantum(struct xe_gt *gt, unsigned int vfid, u8 group);
>> +int xe_gt_sriov_pf_config_set_group_exec_quantum(struct xe_gt *gt, unsigned int vfid,
>> +						 u8 group, u32 exec_quantum);
>> +
>>   u32 xe_gt_sriov_pf_config_get_preempt_timeout(struct xe_gt *gt, unsigned int vfid);
>>   int xe_gt_sriov_pf_config_set_preempt_timeout(struct xe_gt *gt, unsigned int vfid,
>>   					      u32 preempt_timeout);
>> diff --git a/drivers/gpu/drm/xe/xe_sriov.c b/drivers/gpu/drm/xe/xe_sriov.c
>> index ea411944609b..eecdd4aaf972 100644
>> --- a/drivers/gpu/drm/xe/xe_sriov.c
>> +++ b/drivers/gpu/drm/xe/xe_sriov.c
>> @@ -159,6 +159,24 @@ const char *xe_sriov_function_name(unsigned int n, char *buf, size_t size)
>>   	return buf;
>>   }
>>   
>> +/**
>> + * xe_sriov_function_and_group_name() - Get SR-IOV Function and group name.
>> + * @n: the Function number (identifier) to get name of
>> + * @n: the scheduling group to get name of
> @g or better @group
>
>> + * @buf: the buffer to format to
>> + * @size: size of the buffer (shall be at least 18 bytes)
>> + *
>> + * Return: formatted function name ("PF sched group%u" or "VF%u sched group%u").
>> + */
>> +const char *xe_sriov_function_and_group_name(unsigned int n, u8 g, char *buf, size_t size)
>> +{
>> +	if (n)
>> +		snprintf(buf, size, "VF%u sched group%u", n, g);
>> +	else
>> +		snprintf(buf, size, "PF sched group%u", g);
> 	char name[10];
> 	snprintf(buf, size, "%s sched group%u",
> 		xe_sriov_function_name(name, n, sizeof(name), group);
>
> but honestly I'm not convinced that we need this function at all

will drop.

Daniele

>
>> +	return buf;
>> +}
>> +
>>   /**
>>    * xe_sriov_init_late() - SR-IOV late initialization functions.
>>    * @xe: the &xe_device to initialize
>> diff --git a/drivers/gpu/drm/xe/xe_sriov.h b/drivers/gpu/drm/xe/xe_sriov.h
>> index 6db45df55615..df2b02cb97d0 100644
>> --- a/drivers/gpu/drm/xe/xe_sriov.h
>> +++ b/drivers/gpu/drm/xe/xe_sriov.h
>> @@ -14,6 +14,7 @@ struct drm_printer;
>>   
>>   const char *xe_sriov_mode_to_string(enum xe_sriov_mode mode);
>>   const char *xe_sriov_function_name(unsigned int n, char *buf, size_t len);
>> +const char *xe_sriov_function_and_group_name(unsigned int n, u8 g, char *buf, size_t size);
>>   
>>   void xe_sriov_probe_early(struct xe_device *xe);
>>   void xe_sriov_print_info(struct xe_device *xe, struct drm_printer *p);


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 09/10] drm/xe/sriov: Add functions to set preempt timeouts for each group
  2025-11-27  1:45 [PATCH 00/10] Introduce SRIOV scheduler groups Daniele Ceraolo Spurio
                   ` (7 preceding siblings ...)
  2025-11-27  1:45 ` [PATCH 08/10] drm/xe/sriov: Add functions to set exec quantums for each group Daniele Ceraolo Spurio
@ 2025-11-27  1:45 ` Daniele Ceraolo Spurio
  2025-12-02 20:01   ` Michal Wajdeczko
  2025-11-27  1:45 ` [PATCH 10/10] drm/xe/sriov: Add debugfs to set EQ and PT for scheduler groups Daniele Ceraolo Spurio
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-11-27  1:45 UTC (permalink / raw)
  To: intel-xe; +Cc: Daniele Ceraolo Spurio, Michal Wajdeczko

The KLV to set the preemption timeout for each groups works the exact
same way as the one for the exec quantums, so we add similar functions.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/xe/abi/guc_klvs_abi.h      |  12 ++
 drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c | 157 ++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h |   8 ++
 3 files changed, 173 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
index a0763cc15518..02547043a9e0 100644
--- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
+++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
@@ -393,6 +393,16 @@ enum  {
  *      the GuC always sets the EQ for all groups (even the non-enabled ones),
  *      so if we provide fewer values than the max the GuC will use 0 for the
  *      remaining groups.
+ *
+ * _`GUC_KLV_VF_CFG_ENGINE_GROUP_PREEMPT_TIMEOUT' : 0x8A0F
+ *      This config sets the VFs-preemption-timeout for each scheduling group in
+ *      microseconds. The driver must provide an array of values, with each of
+ *      them matching the respective group index (first value goes to group 0,
+ *      second to group 1, etc). The setting of group values follows the same
+ *      behavior and rules as setting via GUC_KLV_VF_CFG_PREEMPT_TIMEOUT. Note
+ *      that the GuC always sets the EQ for all groups (even the non-enabled
+ *      ones), so if we provide fewer values than the max the GuC will use 0 for
+ *      the remaining groups.
  */
 
 #define GUC_KLV_VF_CFG_GGTT_START_KEY		0x0001
@@ -456,6 +466,8 @@ enum  {
 
 #define GUC_KLV_VF_CFG_ENGINE_GROUP_EXEC_QUANTUM_KEY	0x8a0e
 
+#define GUC_KLV_VF_CFG_ENGINE_GROUP_PREEMPT_TIMEOUT_KEY	0x8a0f
+
 /*
  * Workaround keys:
  */
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
index 1bfb25bda432..deb79b2a7527 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
@@ -325,8 +325,15 @@ static u32 encode_config(struct xe_gt *gt, u32 *cfg,
 		cfg[n++] = config->exec_quantum[0];
 	}
 
-	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_PREEMPT_TIMEOUT);
-	cfg[n++] = config->preempt_timeout[0];
+	if (xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt)) {
+		cfg[n++] = PREP_GUC_KLV_CONST(GUC_KLV_VF_CFG_ENGINE_GROUP_PREEMPT_TIMEOUT_KEY,
+					      GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT);
+		for (i = 0; i < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT; i++)
+			cfg[n++] = config->exec_quantum[i];
+	} else {
+		cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_PREEMPT_TIMEOUT);
+		cfg[n++] = config->preempt_timeout[0];
+	}
 
 #define encode_threshold_config(TAG, ...) ({					\
 	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_THRESHOLD_##TAG);			\
@@ -2207,11 +2214,16 @@ static int pf_provision_preempt_timeout(struct xe_gt *gt, unsigned int vfid,
 	return 0;
 }
 
-static u32 pf_get_preempt_timeout(struct xe_gt *gt, unsigned int vfid)
+static u32 pf_get_group_preempt_timeout(struct xe_gt *gt, unsigned int vfid, u8 group)
 {
 	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
 
-	return config->preempt_timeout[0];
+	return config->preempt_timeout[group];
+}
+
+static u32 pf_get_preempt_timeout(struct xe_gt *gt, unsigned int vfid)
+{
+	return pf_get_group_preempt_timeout(gt, vfid, 0);
 }
 
 /**
@@ -2317,6 +2329,138 @@ int xe_gt_sriov_pf_config_bulk_set_preempt_timeout_locked(struct xe_gt *gt, u32
 					   preempt_timeout_unit, n, err);
 }
 
+static int pf_provision_groups_preempt_timeouts(struct xe_gt *gt, unsigned int vfid,
+						const u32 *preempt_timeouts, u32 count)
+{
+	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
+	int err;
+	int i;
+
+	err = pf_push_vf_grp_cfg_u32(gt, vfid, GUC_KLV_VF_CFG_ENGINE_GROUP_PREEMPT_TIMEOUT_KEY,
+				     preempt_timeouts, count);
+	if (unlikely(err))
+		return err;
+
+	/*
+	 * GuC silently clamps values exceeding the max and zeroes out the
+	 * quantum for groups not in the array
+	 */
+	for (i = 0; i < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT; i++) {
+		if (i < count)
+			config->preempt_timeout[i] =
+				min_t(u32, preempt_timeouts[i],
+				      GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_MAX_VALUE);
+		else
+			config->preempt_timeout[i] = 0;
+	}
+
+	return 0;
+}
+
+static void pf_get_groups_preempt_timeouts(struct xe_gt *gt, unsigned int vfid,
+					   u32 *preempt_timeouts, u32 max_count)
+{
+	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
+	u32 count = min_t(u32, max_count, GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT);
+
+	memcpy(preempt_timeouts, config->preempt_timeout, sizeof(u32) * count);
+}
+
+/**
+ * xe_gt_sriov_pf_config_set_groups_preempt_timeouts() - Configure PF/VF PTs for sched groups.
+ * @gt: the &xe_gt
+ * @vfid: the PF or VF identifier
+ * @preempt_timeouts: array of requested PTs in microseconds (0 is infinity)
+ * @count: number of entries in the array
+ *
+ * This function can only be called on PF.
+ * It will log the provisioned value or an error in case of the failure.
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int xe_gt_sriov_pf_config_set_groups_preempt_timeouts(struct xe_gt *gt, unsigned int vfid,
+						      u32 *preempt_timeouts, u32 count)
+{
+	int err;
+
+	guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));
+
+	err = pf_provision_groups_preempt_timeouts(gt, vfid, preempt_timeouts, count);
+
+	return pf_groups_cfg_set_u32_array_done(gt, vfid, preempt_timeouts, count,
+						pf_get_groups_preempt_timeouts,
+						"preempt_timeout",
+						preempt_timeout_unit, err);
+}
+
+/**
+ * xe_gt_sriov_pf_config_get_groups_preempt_timeouts - Get PF/VF sched groups PTs
+ * @gt: the &xe_gt
+ * @vfid: the PF or VF identifier
+ * @preempt_timeouts: array in which to store the preemption timeouts values
+ * @max_count: maximum number of entries to store
+ *
+ * This function can only be called on PF.
+ */
+void xe_gt_sriov_pf_config_get_groups_preempt_timeouts(struct xe_gt *gt, unsigned int vfid,
+						       u32 *preempt_timeouts, u32 max_count)
+{
+	guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));
+
+	return pf_get_groups_preempt_timeouts(gt, vfid, preempt_timeouts, max_count);
+}
+
+/**
+ * xe_gt_sriov_pf_config_set_group_preempt_timeout - Configure PF/VF PT for a sched group.
+ * @gt: the &xe_gt
+ * @vfid: the PF or VF identifier
+ * @group: index of the group to configure
+ * @preempt_timeout: requested PT in microseconds (0 is infinity)
+ *
+ * This function can only be called on PF.
+ * It will log the provisioned value or an error in case of the failure.
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int xe_gt_sriov_pf_config_set_group_preempt_timeout(struct xe_gt *gt, unsigned int vfid,
+						    u8 group, u32 preempt_timeout)
+{
+	u32 values[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT];
+	int err;
+
+	xe_gt_assert(gt, group < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT);
+
+	guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));
+
+	pf_get_groups_preempt_timeouts(gt, vfid, values, ARRAY_SIZE(values));
+	values[group] = preempt_timeout;
+
+	err = pf_provision_groups_preempt_timeouts(gt, vfid, values, ARRAY_SIZE(values));
+
+	return pf_group_config_set_u32_done(gt, vfid, group, preempt_timeout,
+					    pf_get_group_preempt_timeout(gt, vfid, group),
+					    "preempt_timeout", preempt_timeout_unit, err);
+}
+
+/**
+ * xe_gt_sriov_pf_config_get_group_preempt_timeout - Get PF/VF PT for a sched groups
+ * @gt: the &xe_gt
+ * @vfid: the PF or VF identifier
+ * @group: index of the group for which to get the PT
+ *
+ * This function can only be called on PF.
+ *
+ * Return: preemption timeout in microseconds (or 0 if infinity).
+ */
+u32 xe_gt_sriov_pf_config_get_group_preempt_timeout(struct xe_gt *gt, unsigned int vfid, u8 group)
+{
+	xe_gt_assert(gt, group < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT);
+
+	guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));
+
+	return pf_get_group_preempt_timeout(gt, vfid, group);
+}
+
 static const char *sched_priority_unit(u32 priority)
 {
 	return priority == GUC_SCHED_PRIORITY_LOW ? "(low)" :
@@ -2764,6 +2908,11 @@ static int pf_restore_vf_config_klv(struct xe_gt *gt, unsigned int vfid,
 			return -EBADMSG;
 		return pf_provision_groups_exec_quantums(gt, vfid, value, len);
 
+	case GUC_KLV_VF_CFG_ENGINE_GROUP_PREEMPT_TIMEOUT_KEY:
+		if (len > GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT)
+			return -EBADMSG;
+		return pf_provision_groups_preempt_timeouts(gt, vfid, value, len);
+
 	case GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_KEY:
 		if (len != GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_LEN)
 			return -EBADMSG;
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h
index aaf6bb824bc9..f4bfb26b2407 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h
@@ -63,6 +63,14 @@ int xe_gt_sriov_pf_config_set_preempt_timeout_locked(struct xe_gt *gt, unsigned
 						     u32 preempt_timeout);
 int xe_gt_sriov_pf_config_bulk_set_preempt_timeout_locked(struct xe_gt *gt, u32 preempt_timeout);
 
+void xe_gt_sriov_pf_config_get_groups_preempt_timeouts(struct xe_gt *gt, unsigned int vfid,
+						       u32 *preempt_timeout, u32 max_count);
+int xe_gt_sriov_pf_config_set_groups_preempt_timeouts(struct xe_gt *gt, unsigned int vfid,
+						      u32 *preempt_timeout, u32 count);
+u32 xe_gt_sriov_pf_config_get_group_preempt_timeout(struct xe_gt *gt, unsigned int vfid, u8 group);
+int xe_gt_sriov_pf_config_set_group_preempt_timeout(struct xe_gt *gt, unsigned int vfid,
+						    u8 group, u32 preempt_timeout);
+
 u32 xe_gt_sriov_pf_config_get_sched_priority(struct xe_gt *gt, unsigned int vfid);
 int xe_gt_sriov_pf_config_set_sched_priority(struct xe_gt *gt, unsigned int vfid, u32 priority);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 09/10] drm/xe/sriov: Add functions to set preempt timeouts for each group
  2025-11-27  1:45 ` [PATCH 09/10] drm/xe/sriov: Add functions to set preempt timeouts " Daniele Ceraolo Spurio
@ 2025-12-02 20:01   ` Michal Wajdeczko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Wajdeczko @ 2025-12-02 20:01 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-xe



On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
> The KLV to set the preemption timeout for each groups works the exact
> same way as the one for the exec quantums, so we add similar functions.
> 
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> ---
>  drivers/gpu/drm/xe/abi/guc_klvs_abi.h      |  12 ++
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c | 157 ++++++++++++++++++++-
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h |   8 ++
>  3 files changed, 173 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
> index a0763cc15518..02547043a9e0 100644
> --- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
> +++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
> @@ -393,6 +393,16 @@ enum  {
>   *      the GuC always sets the EQ for all groups (even the non-enabled ones),
>   *      so if we provide fewer values than the max the GuC will use 0 for the
>   *      remaining groups.
> + *
> + * _`GUC_KLV_VF_CFG_ENGINE_GROUP_PREEMPT_TIMEOUT' : 0x8A0F
> + *      This config sets the VFs-preemption-timeout for each scheduling group in
> + *      microseconds. The driver must provide an array of values, with each of
> + *      them matching the respective group index (first value goes to group 0,
> + *      second to group 1, etc). The setting of group values follows the same
> + *      behavior and rules as setting via GUC_KLV_VF_CFG_PREEMPT_TIMEOUT. Note
> + *      that the GuC always sets the EQ for all groups (even the non-enabled
> + *      ones), so if we provide fewer values than the max the GuC will use 0 for
> + *      the remaining groups.

update xe_guc_klv_key_to_string()


>   */
>  
>  #define GUC_KLV_VF_CFG_GGTT_START_KEY		0x0001
> @@ -456,6 +466,8 @@ enum  {
>  
>  #define GUC_KLV_VF_CFG_ENGINE_GROUP_EXEC_QUANTUM_KEY	0x8a0e
>  
> +#define GUC_KLV_VF_CFG_ENGINE_GROUP_PREEMPT_TIMEOUT_KEY	0x8a0f

add MIN_LEN and MAX_LEN

> +
>  /*
>   * Workaround keys:
>   */
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> index 1bfb25bda432..deb79b2a7527 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> @@ -325,8 +325,15 @@ static u32 encode_config(struct xe_gt *gt, u32 *cfg,
>  		cfg[n++] = config->exec_quantum[0];
>  	}
>  
> -	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_PREEMPT_TIMEOUT);
> -	cfg[n++] = config->preempt_timeout[0];
> +	if (xe_sriov_gt_pf_policy_has_valid_sched_group_modes(gt)) {
> +		cfg[n++] = PREP_GUC_KLV_CONST(GUC_KLV_VF_CFG_ENGINE_GROUP_PREEMPT_TIMEOUT_KEY,
> +					      GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT);
> +		for (i = 0; i < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT; i++)
> +			cfg[n++] = config->exec_quantum[i];
> +	} else {
> +		cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_PREEMPT_TIMEOUT);
> +		cfg[n++] = config->preempt_timeout[0];
> +	}

move to new encode_sched() 

>  
>  #define encode_threshold_config(TAG, ...) ({					\
>  	cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_THRESHOLD_##TAG);			\
> @@ -2207,11 +2214,16 @@ static int pf_provision_preempt_timeout(struct xe_gt *gt, unsigned int vfid,
>  	return 0;
>  }
>  
> -static u32 pf_get_preempt_timeout(struct xe_gt *gt, unsigned int vfid)
> +static u32 pf_get_group_preempt_timeout(struct xe_gt *gt, unsigned int vfid, u8 group)
>  {
>  	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
>  
> -	return config->preempt_timeout[0];
> +	return config->preempt_timeout[group];
> +}
> +
> +static u32 pf_get_preempt_timeout(struct xe_gt *gt, unsigned int vfid)
> +{
> +	return pf_get_group_preempt_timeout(gt, vfid, 0);
>  }
>  
>  /**
> @@ -2317,6 +2329,138 @@ int xe_gt_sriov_pf_config_bulk_set_preempt_timeout_locked(struct xe_gt *gt, u32
>  					   preempt_timeout_unit, n, err);
>  }
>  
> +static int pf_provision_groups_preempt_timeouts(struct xe_gt *gt, unsigned int vfid,
> +						const u32 *preempt_timeouts, u32 count)
> +{
> +	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
> +	int err;
> +	int i;
> +
> +	err = pf_push_vf_grp_cfg_u32(gt, vfid, GUC_KLV_VF_CFG_ENGINE_GROUP_PREEMPT_TIMEOUT_KEY,
> +				     preempt_timeouts, count);
> +	if (unlikely(err))
> +		return err;
> +
> +	/*
> +	 * GuC silently clamps values exceeding the max and zeroes out the
> +	 * quantum for groups not in the array

" .. not in the KLV payload" ?

> +	 */
> +	for (i = 0; i < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT; i++) {
> +		if (i < count)
> +			config->preempt_timeout[i] =
> +				min_t(u32, preempt_timeouts[i],
> +				      GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_MAX_VALUE);
> +		else
> +			config->preempt_timeout[i] = 0;
> +	}
> +
> +	return 0;
> +}
> +
> +static void pf_get_groups_preempt_timeouts(struct xe_gt *gt, unsigned int vfid,
> +					   u32 *preempt_timeouts, u32 max_count)
> +{
> +	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
> +	u32 count = min_t(u32, max_count, GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT);
> +
> +	memcpy(preempt_timeouts, config->preempt_timeout, sizeof(u32) * count);
> +}
> +
> +/**
> + * xe_gt_sriov_pf_config_set_groups_preempt_timeouts() - Configure PF/VF PTs for sched groups.
> + * @gt: the &xe_gt
> + * @vfid: the PF or VF identifier
> + * @preempt_timeouts: array of requested PTs in microseconds (0 is infinity)
> + * @count: number of entries in the array
> + *
> + * This function can only be called on PF.
> + * It will log the provisioned value or an error in case of the failure.
> + *
> + * Return: 0 on success or a negative error code on failure.
> + */
> +int xe_gt_sriov_pf_config_set_groups_preempt_timeouts(struct xe_gt *gt, unsigned int vfid,
> +						      u32 *preempt_timeouts, u32 count)
> +{
> +	int err;
> +
> +	guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));
> +
> +	err = pf_provision_groups_preempt_timeouts(gt, vfid, preempt_timeouts, count);
> +
> +	return pf_groups_cfg_set_u32_array_done(gt, vfid, preempt_timeouts, count,
> +						pf_get_groups_preempt_timeouts,
> +						"preempt_timeout",
> +						preempt_timeout_unit, err);
> +}
> +
> +/**
> + * xe_gt_sriov_pf_config_get_groups_preempt_timeouts - Get PF/VF sched groups PTs
> + * @gt: the &xe_gt
> + * @vfid: the PF or VF identifier
> + * @preempt_timeouts: array in which to store the preemption timeouts values
> + * @max_count: maximum number of entries to store
> + *
> + * This function can only be called on PF.
> + */
> +void xe_gt_sriov_pf_config_get_groups_preempt_timeouts(struct xe_gt *gt, unsigned int vfid,
> +						       u32 *preempt_timeouts, u32 max_count)
> +{
> +	guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));
> +
> +	return pf_get_groups_preempt_timeouts(gt, vfid, preempt_timeouts, max_count);
> +}
> +
> +/**
> + * xe_gt_sriov_pf_config_set_group_preempt_timeout - Configure PF/VF PT for a sched group.
> + * @gt: the &xe_gt
> + * @vfid: the PF or VF identifier
> + * @group: index of the group to configure

I don't think we need to expose per-group function to change PT as
GuC ABI does not allow to do that directly

for debugfs purposes it should be sufficient to read/configure all at once

> + * @preempt_timeout: requested PT in microseconds (0 is infinity)
> + *
> + * This function can only be called on PF.
> + * It will log the provisioned value or an error in case of the failure.
> + *
> + * Return: 0 on success or a negative error code on failure.
> + */
> +int xe_gt_sriov_pf_config_set_group_preempt_timeout(struct xe_gt *gt, unsigned int vfid,
> +						    u8 group, u32 preempt_timeout)
> +{
> +	u32 values[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT];
> +	int err;
> +
> +	xe_gt_assert(gt, group < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT);
> +
> +	guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));
> +
> +	pf_get_groups_preempt_timeouts(gt, vfid, values, ARRAY_SIZE(values));
> +	values[group] = preempt_timeout;
> +
> +	err = pf_provision_groups_preempt_timeouts(gt, vfid, values, ARRAY_SIZE(values));
> +
> +	return pf_group_config_set_u32_done(gt, vfid, group, preempt_timeout,
> +					    pf_get_group_preempt_timeout(gt, vfid, group),
> +					    "preempt_timeout", preempt_timeout_unit, err);
> +}
> +
> +/**
> + * xe_gt_sriov_pf_config_get_group_preempt_timeout - Get PF/VF PT for a sched groups
> + * @gt: the &xe_gt
> + * @vfid: the PF or VF identifier
> + * @group: index of the group for which to get the PT
> + *
> + * This function can only be called on PF.
> + *
> + * Return: preemption timeout in microseconds (or 0 if infinity).
> + */
> +u32 xe_gt_sriov_pf_config_get_group_preempt_timeout(struct xe_gt *gt, unsigned int vfid, u8 group)
> +{
> +	xe_gt_assert(gt, group < GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT);
> +
> +	guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));
> +
> +	return pf_get_group_preempt_timeout(gt, vfid, group);
> +}
> +
>  static const char *sched_priority_unit(u32 priority)
>  {
>  	return priority == GUC_SCHED_PRIORITY_LOW ? "(low)" :
> @@ -2764,6 +2908,11 @@ static int pf_restore_vf_config_klv(struct xe_gt *gt, unsigned int vfid,
>  			return -EBADMSG;
>  		return pf_provision_groups_exec_quantums(gt, vfid, value, len);
>  
> +	case GUC_KLV_VF_CFG_ENGINE_GROUP_PREEMPT_TIMEOUT_KEY:
> +		if (len > GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT)
> +			return -EBADMSG;
> +		return pf_provision_groups_preempt_timeouts(gt, vfid, value, len);
> +
>  	case GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_KEY:
>  		if (len != GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_LEN)
>  			return -EBADMSG;
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h
> index aaf6bb824bc9..f4bfb26b2407 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h
> @@ -63,6 +63,14 @@ int xe_gt_sriov_pf_config_set_preempt_timeout_locked(struct xe_gt *gt, unsigned
>  						     u32 preempt_timeout);
>  int xe_gt_sriov_pf_config_bulk_set_preempt_timeout_locked(struct xe_gt *gt, u32 preempt_timeout);
>  
> +void xe_gt_sriov_pf_config_get_groups_preempt_timeouts(struct xe_gt *gt, unsigned int vfid,
> +						       u32 *preempt_timeout, u32 max_count);
> +int xe_gt_sriov_pf_config_set_groups_preempt_timeouts(struct xe_gt *gt, unsigned int vfid,
> +						      u32 *preempt_timeout, u32 count);
> +u32 xe_gt_sriov_pf_config_get_group_preempt_timeout(struct xe_gt *gt, unsigned int vfid, u8 group);
> +int xe_gt_sriov_pf_config_set_group_preempt_timeout(struct xe_gt *gt, unsigned int vfid,
> +						    u8 group, u32 preempt_timeout);
> +
>  u32 xe_gt_sriov_pf_config_get_sched_priority(struct xe_gt *gt, unsigned int vfid);
>  int xe_gt_sriov_pf_config_set_sched_priority(struct xe_gt *gt, unsigned int vfid, u32 priority);
>  


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 10/10] drm/xe/sriov: Add debugfs to set EQ and PT for scheduler groups
  2025-11-27  1:45 [PATCH 00/10] Introduce SRIOV scheduler groups Daniele Ceraolo Spurio
                   ` (8 preceding siblings ...)
  2025-11-27  1:45 ` [PATCH 09/10] drm/xe/sriov: Add functions to set preempt timeouts " Daniele Ceraolo Spurio
@ 2025-11-27  1:45 ` Daniele Ceraolo Spurio
  2025-12-02 20:17   ` Michal Wajdeczko
  2025-11-27  1:51 ` ✗ CI.checkpatch: warning for Introduce SRIOV " Patchwork
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-11-27  1:45 UTC (permalink / raw)
  To: intel-xe; +Cc: Daniele Ceraolo Spurio, Michal Wajdeczko

A top-level debugfs file is added that allows a user to provide a
comma-separated list of values to assign to each group. Per-group files
are also added to allow individual tuning of a specific group.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c | 173 +++++++++++++++++++-
 1 file changed, 168 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
index 947e2b92d58a..052510736017 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
@@ -165,24 +165,169 @@ static void pf_add_policy_attrs(struct xe_gt *gt, struct dentry *parent)
  *              :   ├── gt0
  *                  :   ├── sched_groups_mode
  *                      ├── sched_groups
- *                      :   ├── group0
- *                          :   └── engines
+ *                      :   ├── exec_quantums_ms
+ *                          ├── preempt_timeouts_us
+ *                          ├── group0
+ *                          :   ├── engines
+ *                              ├── exec_quantum_ms
+ *                              └── preempt_timeout_us
  *                          :
  *                          └── groupN
- *          :                   └── engines
+ *                              ├── engines
+ *                              ├── exec_quantum_ms
+ *          :                   └── preempt_timeout_us
  *          ├── vf1
  *          :   ├── tile0
  *              :   ├── gt0
  *                  :   ├── sched_groups
- *                      :   ├── group0
- *                          :   └── engines
+ *                      :   ├── exec_quantums_ms
+ *                          ├── preempt_timeouts_us
+ *                          ├── group0
+ *                          :   ├── engines
+ *                              ├── exec_quantum_ms
+ *                              └── preempt_timeout_us
  */
 
 struct sched_group_info {
 	struct xe_gt *gt;
+	unsigned int vfid;
+	u8 group_id;
 	u32 *masks;
 };
 
+static int sched_groups_config_show(struct seq_file *m, void *data,
+				    void (*get)(struct xe_gt *, unsigned int, u32 *, u32))
+{
+	u32 values[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT];
+	struct drm_printer p = drm_seq_file_printer(m);
+	struct sched_group_info *groups = m->private;
+	struct xe_gt *gt = groups[0].gt;
+	unsigned int vfid = groups[0].vfid;
+	bool first = true;
+	u8 g;
+
+	get(gt, vfid, values, ARRAY_SIZE(values));
+
+	for (g = 0; g < ARRAY_SIZE(values); g++) {
+		drm_printf(&p, "%s%u", first ? "" : ",", values[g]);
+
+		first = false;
+	}
+
+	drm_printf(&p, "\n");
+
+	return 0;
+}
+
+#define MAX_EGS_ARRAY_SIZE (GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT * sizeof(u32))
+static ssize_t sched_groups_config_write(struct file *file, const char __user *ubuf,
+					 size_t size, loff_t *pos,
+					 int (*set)(struct xe_gt *, unsigned int, u32 *, u32))
+{
+	struct sched_group_info *groups = file_inode(file)->i_private;
+	u32 values[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT];
+	struct xe_gt *gt = groups[0].gt;
+	unsigned int vfid = groups[0].vfid;
+	int *input;
+	u32 count;
+	int ret;
+	int i;
+
+	if (*pos)
+		return -ESPIPE;
+
+	if (!size)
+		return -ENODATA;
+
+	ret = parse_int_array_user(ubuf, min(size, MAX_EGS_ARRAY_SIZE), &input);
+	if (ret)
+		return ret;
+
+	count = input[0];
+	for (i = 0; i < count; i++) {
+		if (input[i + 1] < 0 || input[i + 1] > S32_MAX) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		values[i] = input[i + 1];
+	}
+
+	xe_pm_runtime_get(gt_to_xe(gt));
+	ret = set(gt, vfid, values, count);
+	xe_pm_runtime_put(gt_to_xe(gt));
+
+out:
+	kfree(input);
+	return (ret < 0) ? ret : size;
+}
+
+#define DEFINE_SRIOV_GT_GRP_CFG_DEBUGFS_ATTRIBUTE(CONFIG, TYPE, FORMAT)		\
+static int sched_groups_##CONFIG##s_show(struct seq_file *m,			\
+						  void *data)			\
+{										\
+	return sched_groups_config_show(m, data,				\
+					xe_gt_sriov_pf_config_get_groups_##CONFIG##s); \
+}										\
+										\
+static int sched_groups_##CONFIG##s_open(struct inode *inode, struct file *file)\
+{										\
+	return single_open(file, sched_groups_##CONFIG##s_show,			\
+			   inode->i_private);					\
+}										\
+										\
+static ssize_t sched_groups_##CONFIG##s_write(struct file *file,		\
+					      const char __user *ubuf,		\
+					      size_t size, loff_t *pos)		\
+{										\
+	return sched_groups_config_write(file, ubuf, size, pos,			\
+					 xe_gt_sriov_pf_config_set_groups_##CONFIG##s); \
+}										\
+										\
+static const struct file_operations sched_groups_##CONFIG##s_fops = {		\
+	.owner = THIS_MODULE,							\
+	.open = sched_groups_##CONFIG##s_open,					\
+	.read = seq_read,							\
+	.llseek = seq_lseek,							\
+	.write = sched_groups_##CONFIG##s_write,				\
+	.release = single_release,						\
+};										\
+										\
+static int group_##CONFIG##_set(void *data, u64 val)				\
+{										\
+	struct sched_group_info *gi = data;					\
+	struct xe_device *xe = gt_to_xe(gi->gt);				\
+	int err;								\
+										\
+	if (val > (TYPE)~0ull)							\
+		return -EOVERFLOW;						\
+										\
+	xe_pm_runtime_get(xe);							\
+	err = xe_sriov_pf_wait_ready(xe) ?:					\
+	      xe_gt_sriov_pf_config_set_group_##CONFIG(gi->gt, gi->vfid,	\
+						       gi->group_id, val);	\
+	if (!err)								\
+		xe_sriov_pf_provision_set_custom_mode(xe);			\
+	xe_pm_runtime_put(xe);							\
+										\
+	return err;								\
+}										\
+										\
+static int group_##CONFIG##_get(void *data, u64 *val)				\
+{										\
+	struct sched_group_info *gi = data;					\
+										\
+	*val = xe_gt_sriov_pf_config_get_group_##CONFIG(gi->gt, gi->vfid,	\
+							gi->group_id);		\
+	return 0;								\
+}										\
+										\
+DEFINE_DEBUGFS_ATTRIBUTE(group_##CONFIG##_fops, group_##CONFIG##_get,		\
+			 group_##CONFIG##_set, FORMAT)
+
+DEFINE_SRIOV_GT_GRP_CFG_DEBUGFS_ATTRIBUTE(exec_quantum, u32, "%llu\n");
+DEFINE_SRIOV_GT_GRP_CFG_DEBUGFS_ATTRIBUTE(preempt_timeout, u32, "%llu\n");
+
 static int sched_group_engines_info(struct seq_file *m, void *data)
 {
 	struct drm_printer p = drm_seq_file_printer(m);
@@ -261,6 +406,18 @@ static void sched_group_info_register(struct xe_gt *gt, unsigned int vfid)
 		goto out_err;
 	parent->d_inode->i_private = infos;
 
+	/*
+	 * assign group 0 gt and VF id values early as they're used by the
+	 * exec_quantums debugfs to set quantums for all groups
+	 */
+	infos[0].gt = gt;
+	infos[0].vfid = vfid;
+
+	debugfs_create_file("exec_quantums_ms", 0644, parent, infos,
+			    &sched_groups_exec_quantums_fops);
+	debugfs_create_file("preempt_timeouts_us", 0644, parent, infos,
+			    &sched_groups_preempt_timeouts_fops);
+
 	for (g = 0; g < num_groups; g++) {
 		struct sched_group_info *info = &infos[g];
 		u32 base = g * GUC_MAX_ENGINE_CLASSES;
@@ -273,10 +430,16 @@ static void sched_group_info_register(struct xe_gt *gt, unsigned int vfid)
 			goto out_err;
 
 		info->gt = gt;
+		info->vfid = vfid;
+		info->group_id = g;
 		info->masks = &policy->guc.sched_groups.modes[mode].masks[base];
 
 		dent->d_inode->i_private = info;
 		debugfs_create_file("engines", 0644, dent, info, &sched_group_engines_fops);
+		debugfs_create_file("exec_quantum_ms", 0644, dent, info,
+				    &group_exec_quantum_fops);
+		debugfs_create_file("preempt_timeout_us", 0644, dent, info,
+				    &group_preempt_timeout_fops);
 	}
 
 	return;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 10/10] drm/xe/sriov: Add debugfs to set EQ and PT for scheduler groups
  2025-11-27  1:45 ` [PATCH 10/10] drm/xe/sriov: Add debugfs to set EQ and PT for scheduler groups Daniele Ceraolo Spurio
@ 2025-12-02 20:17   ` Michal Wajdeczko
  2025-12-06  1:53     ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 44+ messages in thread
From: Michal Wajdeczko @ 2025-12-02 20:17 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-xe



On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
> A top-level debugfs file is added that allows a user to provide a
> comma-separated list of values to assign to each group. Per-group files

it doesn't need to be comma-separated list, just array of integers

> are also added to allow individual tuning of a specific group.
> 
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c | 173 +++++++++++++++++++-
>  1 file changed, 168 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
> index 947e2b92d58a..052510736017 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
> @@ -165,24 +165,169 @@ static void pf_add_policy_attrs(struct xe_gt *gt, struct dentry *parent)
>   *              :   ├── gt0
>   *                  :   ├── sched_groups_mode
>   *                      ├── sched_groups
> - *                      :   ├── group0
> - *                          :   └── engines
> + *                      :   ├── exec_quantums_ms
> + *                          ├── preempt_timeouts_us
> + *                          ├── group0
> + *                          :   ├── engines
> + *                              ├── exec_quantum_ms
> + *                              └── preempt_timeout_us

as already commented, per-group EQ/PT are overkill

let just keep array-variants

    *                      ├── sched_groups_exec_quantums_ms
    *                      ├── sched_groups_preempt_timeouts_us

and regular files for groups to show configured engines (if any)

    *                      ├── sched_groups/
    *                          ├── group0
    *                          ├── group1

>   *                          :
>   *                          └── groupN
> - *          :                   └── engines
> + *                              ├── engines
> + *                              ├── exec_quantum_ms
> + *          :                   └── preempt_timeout_us
>   *          ├── vf1
>   *          :   ├── tile0
>   *              :   ├── gt0
>   *                  :   ├── sched_groups
> - *                      :   ├── group0
> - *                          :   └── engines
> + *                      :   ├── exec_quantums_ms
> + *                          ├── preempt_timeouts_us
> + *                          ├── group0
> + *                          :   ├── engines
> + *                              ├── exec_quantum_ms
> + *                              └── preempt_timeout_us
>   */
>  
>  struct sched_group_info {
>  	struct xe_gt *gt;
> +	unsigned int vfid;
> +	u8 group_id;
>  	u32 *masks;
>  };
>  
> +static int sched_groups_config_show(struct seq_file *m, void *data,
> +				    void (*get)(struct xe_gt *, unsigned int, u32 *, u32))
> +{
> +	u32 values[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT];
> +	struct drm_printer p = drm_seq_file_printer(m);
> +	struct sched_group_info *groups = m->private;
> +	struct xe_gt *gt = groups[0].gt;
> +	unsigned int vfid = groups[0].vfid;
> +	bool first = true;
> +	u8 g;
> +
> +	get(gt, vfid, values, ARRAY_SIZE(values));
> +
> +	for (g = 0; g < ARRAY_SIZE(values); g++) {
> +		drm_printf(&p, "%s%u", first ? "" : ",", values[g]);

maybe we should print values without commas ?

	20 20 30

it's more likely to be accepted when we promote that to sysfs

> +
> +		first = false;
> +	}
> +
> +	drm_printf(&p, "\n");
> +
> +	return 0;
> +}
> +
> +#define MAX_EGS_ARRAY_SIZE (GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT * sizeof(u32))
> +static ssize_t sched_groups_config_write(struct file *file, const char __user *ubuf,
> +					 size_t size, loff_t *pos,
> +					 int (*set)(struct xe_gt *, unsigned int, u32 *, u32))
> +{
> +	struct sched_group_info *groups = file_inode(file)->i_private;
> +	u32 values[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT];
> +	struct xe_gt *gt = groups[0].gt;
> +	unsigned int vfid = groups[0].vfid;
> +	int *input;
> +	u32 count;
> +	int ret;
> +	int i;
> +
> +	if (*pos)
> +		return -ESPIPE;
> +
> +	if (!size)
> +		return -ENODATA;
> +
> +	ret = parse_int_array_user(ubuf, min(size, MAX_EGS_ARRAY_SIZE), &input);
> +	if (ret)
> +		return ret;
> +
> +	count = input[0];

need to check against GUC_MAX_GROUPS

> +	for (i = 0; i < count; i++) {
> +		if (input[i + 1] < 0 || input[i + 1] > S32_MAX) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +
> +		values[i] = input[i + 1];
> +	}
> +
> +	xe_pm_runtime_get(gt_to_xe(gt));
> +	ret = set(gt, vfid, values, count);
> +	xe_pm_runtime_put(gt_to_xe(gt));
> +
> +out:
> +	kfree(input);
> +	return (ret < 0) ? ret : size;
> +}
> +
> +#define DEFINE_SRIOV_GT_GRP_CFG_DEBUGFS_ATTRIBUTE(CONFIG, TYPE, FORMAT)		\
> +static int sched_groups_##CONFIG##s_show(struct seq_file *m,			\
> +						  void *data)			\
> +{										\
> +	return sched_groups_config_show(m, data,				\
> +					xe_gt_sriov_pf_config_get_groups_##CONFIG##s); \
> +}										\
> +										\
> +static int sched_groups_##CONFIG##s_open(struct inode *inode, struct file *file)\
> +{										\
> +	return single_open(file, sched_groups_##CONFIG##s_show,			\
> +			   inode->i_private);					\
> +}										\
> +										\
> +static ssize_t sched_groups_##CONFIG##s_write(struct file *file,		\
> +					      const char __user *ubuf,		\
> +					      size_t size, loff_t *pos)		\
> +{										\
> +	return sched_groups_config_write(file, ubuf, size, pos,			\
> +					 xe_gt_sriov_pf_config_set_groups_##CONFIG##s); \
> +}										\
> +										\
> +static const struct file_operations sched_groups_##CONFIG##s_fops = {		\
> +	.owner = THIS_MODULE,							\
> +	.open = sched_groups_##CONFIG##s_open,					\
> +	.read = seq_read,							\
> +	.llseek = seq_lseek,							\
> +	.write = sched_groups_##CONFIG##s_write,				\
> +	.release = single_release,						\
> +};										\
> +										\
> +static int group_##CONFIG##_set(void *data, u64 val)				\
> +{										\
> +	struct sched_group_info *gi = data;					\
> +	struct xe_device *xe = gt_to_xe(gi->gt);				\
> +	int err;								\
> +										\
> +	if (val > (TYPE)~0ull)							\
> +		return -EOVERFLOW;						\
> +										\
> +	xe_pm_runtime_get(xe);							\
> +	err = xe_sriov_pf_wait_ready(xe) ?:					\
> +	      xe_gt_sriov_pf_config_set_group_##CONFIG(gi->gt, gi->vfid,	\
> +						       gi->group_id, val);	\
> +	if (!err)								\
> +		xe_sriov_pf_provision_set_custom_mode(xe);			\
> +	xe_pm_runtime_put(xe);							\
> +										\
> +	return err;								\
> +}										\
> +										\
> +static int group_##CONFIG##_get(void *data, u64 *val)				\
> +{										\
> +	struct sched_group_info *gi = data;					\
> +										\
> +	*val = xe_gt_sriov_pf_config_get_group_##CONFIG(gi->gt, gi->vfid,	\
> +							gi->group_id);		\
> +	return 0;								\
> +}										\
> +										\
> +DEFINE_DEBUGFS_ATTRIBUTE(group_##CONFIG##_fops, group_##CONFIG##_get,		\
> +			 group_##CONFIG##_set, FORMAT)
> +
> +DEFINE_SRIOV_GT_GRP_CFG_DEBUGFS_ATTRIBUTE(exec_quantum, u32, "%llu\n");
> +DEFINE_SRIOV_GT_GRP_CFG_DEBUGFS_ATTRIBUTE(preempt_timeout, u32, "%llu\n");
> +
>  static int sched_group_engines_info(struct seq_file *m, void *data)
>  {
>  	struct drm_printer p = drm_seq_file_printer(m);
> @@ -261,6 +406,18 @@ static void sched_group_info_register(struct xe_gt *gt, unsigned int vfid)
>  		goto out_err;
>  	parent->d_inode->i_private = infos;
>  
> +	/*
> +	 * assign group 0 gt and VF id values early as they're used by the
> +	 * exec_quantums debugfs to set quantums for all groups
> +	 */
> +	infos[0].gt = gt;
> +	infos[0].vfid = vfid;
> +
> +	debugfs_create_file("exec_quantums_ms", 0644, parent, infos,
> +			    &sched_groups_exec_quantums_fops);
> +	debugfs_create_file("preempt_timeouts_us", 0644, parent, infos,
> +			    &sched_groups_preempt_timeouts_fops);
> +
>  	for (g = 0; g < num_groups; g++) {
>  		struct sched_group_info *info = &infos[g];
>  		u32 base = g * GUC_MAX_ENGINE_CLASSES;
> @@ -273,10 +430,16 @@ static void sched_group_info_register(struct xe_gt *gt, unsigned int vfid)
>  			goto out_err;
>  
>  		info->gt = gt;
> +		info->vfid = vfid;
> +		info->group_id = g;
>  		info->masks = &policy->guc.sched_groups.modes[mode].masks[base];
>  
>  		dent->d_inode->i_private = info;
>  		debugfs_create_file("engines", 0644, dent, info, &sched_group_engines_fops);
> +		debugfs_create_file("exec_quantum_ms", 0644, dent, info,
> +				    &group_exec_quantum_fops);
> +		debugfs_create_file("preempt_timeout_us", 0644, dent, info,
> +				    &group_preempt_timeout_fops);
>  	}
>  
>  	return;


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 10/10] drm/xe/sriov: Add debugfs to set EQ and PT for scheduler groups
  2025-12-02 20:17   ` Michal Wajdeczko
@ 2025-12-06  1:53     ` Daniele Ceraolo Spurio
  0 siblings, 0 replies; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-12-06  1:53 UTC (permalink / raw)
  To: Michal Wajdeczko, intel-xe



On 12/2/2025 12:17 PM, Michal Wajdeczko wrote:
>
> On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
>> A top-level debugfs file is added that allows a user to provide a
>> comma-separated list of values to assign to each group. Per-group files
> it doesn't need to be comma-separated list, just array of integers

parse_int_array_user() requires a comma-separated list. IMO better to 
re-use that one that to re-implement our own logic.

>> are also added to allow individual tuning of a specific group.
>>
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c | 173 +++++++++++++++++++-
>>   1 file changed, 168 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
>> index 947e2b92d58a..052510736017 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
>> @@ -165,24 +165,169 @@ static void pf_add_policy_attrs(struct xe_gt *gt, struct dentry *parent)
>>    *              :   ├── gt0
>>    *                  :   ├── sched_groups_mode
>>    *                      ├── sched_groups
>> - *                      :   ├── group0
>> - *                          :   └── engines
>> + *                      :   ├── exec_quantums_ms
>> + *                          ├── preempt_timeouts_us
>> + *                          ├── group0
>> + *                          :   ├── engines
>> + *                              ├── exec_quantum_ms
>> + *                              └── preempt_timeout_us
> as already commented, per-group EQ/PT are overkill
>
> let just keep array-variants
>
>      *                      ├── sched_groups_exec_quantums_ms
>      *                      ├── sched_groups_preempt_timeouts_us
>
> and regular files for groups to show configured engines (if any)
>
>      *                      ├── sched_groups/
>      *                          ├── group0
>      *                          ├── group1

ok

>>    *                          :
>>    *                          └── groupN
>> - *          :                   └── engines
>> + *                              ├── engines
>> + *                              ├── exec_quantum_ms
>> + *          :                   └── preempt_timeout_us
>>    *          ├── vf1
>>    *          :   ├── tile0
>>    *              :   ├── gt0
>>    *                  :   ├── sched_groups
>> - *                      :   ├── group0
>> - *                          :   └── engines
>> + *                      :   ├── exec_quantums_ms
>> + *                          ├── preempt_timeouts_us
>> + *                          ├── group0
>> + *                          :   ├── engines
>> + *                              ├── exec_quantum_ms
>> + *                              └── preempt_timeout_us
>>    */
>>   
>>   struct sched_group_info {
>>   	struct xe_gt *gt;
>> +	unsigned int vfid;
>> +	u8 group_id;
>>   	u32 *masks;
>>   };
>>   
>> +static int sched_groups_config_show(struct seq_file *m, void *data,
>> +				    void (*get)(struct xe_gt *, unsigned int, u32 *, u32))
>> +{
>> +	u32 values[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT];
>> +	struct drm_printer p = drm_seq_file_printer(m);
>> +	struct sched_group_info *groups = m->private;
>> +	struct xe_gt *gt = groups[0].gt;
>> +	unsigned int vfid = groups[0].vfid;
>> +	bool first = true;
>> +	u8 g;
>> +
>> +	get(gt, vfid, values, ARRAY_SIZE(values));
>> +
>> +	for (g = 0; g < ARRAY_SIZE(values); g++) {
>> +		drm_printf(&p, "%s%u", first ? "" : ",", values[g]);
> maybe we should print values without commas ?
>
> 	20 20 30
>
> it's more likely to be accepted when we promote that to sysfs

I left the commas to match with what we expect for the input.

>
>> +
>> +		first = false;
>> +	}
>> +
>> +	drm_printf(&p, "\n");
>> +
>> +	return 0;
>> +}
>> +
>> +#define MAX_EGS_ARRAY_SIZE (GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT * sizeof(u32))
>> +static ssize_t sched_groups_config_write(struct file *file, const char __user *ubuf,
>> +					 size_t size, loff_t *pos,
>> +					 int (*set)(struct xe_gt *, unsigned int, u32 *, u32))
>> +{
>> +	struct sched_group_info *groups = file_inode(file)->i_private;
>> +	u32 values[GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT];
>> +	struct xe_gt *gt = groups[0].gt;
>> +	unsigned int vfid = groups[0].vfid;
>> +	int *input;
>> +	u32 count;
>> +	int ret;
>> +	int i;
>> +
>> +	if (*pos)
>> +		return -ESPIPE;
>> +
>> +	if (!size)
>> +		return -ENODATA;
>> +
>> +	ret = parse_int_array_user(ubuf, min(size, MAX_EGS_ARRAY_SIZE), &input);
>> +	if (ret)
>> +		return ret;
>> +
>> +	count = input[0];
> need to check against GUC_MAX_GROUPS

ok

Daniele

>
>> +	for (i = 0; i < count; i++) {
>> +		if (input[i + 1] < 0 || input[i + 1] > S32_MAX) {
>> +			ret = -EINVAL;
>> +			goto out;
>> +		}
>> +
>> +		values[i] = input[i + 1];
>> +	}
>> +
>> +	xe_pm_runtime_get(gt_to_xe(gt));
>> +	ret = set(gt, vfid, values, count);
>> +	xe_pm_runtime_put(gt_to_xe(gt));
>> +
>> +out:
>> +	kfree(input);
>> +	return (ret < 0) ? ret : size;
>> +}
>> +
>> +#define DEFINE_SRIOV_GT_GRP_CFG_DEBUGFS_ATTRIBUTE(CONFIG, TYPE, FORMAT)		\
>> +static int sched_groups_##CONFIG##s_show(struct seq_file *m,			\
>> +						  void *data)			\
>> +{										\
>> +	return sched_groups_config_show(m, data,				\
>> +					xe_gt_sriov_pf_config_get_groups_##CONFIG##s); \
>> +}										\
>> +										\
>> +static int sched_groups_##CONFIG##s_open(struct inode *inode, struct file *file)\
>> +{										\
>> +	return single_open(file, sched_groups_##CONFIG##s_show,			\
>> +			   inode->i_private);					\
>> +}										\
>> +										\
>> +static ssize_t sched_groups_##CONFIG##s_write(struct file *file,		\
>> +					      const char __user *ubuf,		\
>> +					      size_t size, loff_t *pos)		\
>> +{										\
>> +	return sched_groups_config_write(file, ubuf, size, pos,			\
>> +					 xe_gt_sriov_pf_config_set_groups_##CONFIG##s); \
>> +}										\
>> +										\
>> +static const struct file_operations sched_groups_##CONFIG##s_fops = {		\
>> +	.owner = THIS_MODULE,							\
>> +	.open = sched_groups_##CONFIG##s_open,					\
>> +	.read = seq_read,							\
>> +	.llseek = seq_lseek,							\
>> +	.write = sched_groups_##CONFIG##s_write,				\
>> +	.release = single_release,						\
>> +};										\
>> +										\
>> +static int group_##CONFIG##_set(void *data, u64 val)				\
>> +{										\
>> +	struct sched_group_info *gi = data;					\
>> +	struct xe_device *xe = gt_to_xe(gi->gt);				\
>> +	int err;								\
>> +										\
>> +	if (val > (TYPE)~0ull)							\
>> +		return -EOVERFLOW;						\
>> +										\
>> +	xe_pm_runtime_get(xe);							\
>> +	err = xe_sriov_pf_wait_ready(xe) ?:					\
>> +	      xe_gt_sriov_pf_config_set_group_##CONFIG(gi->gt, gi->vfid,	\
>> +						       gi->group_id, val);	\
>> +	if (!err)								\
>> +		xe_sriov_pf_provision_set_custom_mode(xe);			\
>> +	xe_pm_runtime_put(xe);							\
>> +										\
>> +	return err;								\
>> +}										\
>> +										\
>> +static int group_##CONFIG##_get(void *data, u64 *val)				\
>> +{										\
>> +	struct sched_group_info *gi = data;					\
>> +										\
>> +	*val = xe_gt_sriov_pf_config_get_group_##CONFIG(gi->gt, gi->vfid,	\
>> +							gi->group_id);		\
>> +	return 0;								\
>> +}										\
>> +										\
>> +DEFINE_DEBUGFS_ATTRIBUTE(group_##CONFIG##_fops, group_##CONFIG##_get,		\
>> +			 group_##CONFIG##_set, FORMAT)
>> +
>> +DEFINE_SRIOV_GT_GRP_CFG_DEBUGFS_ATTRIBUTE(exec_quantum, u32, "%llu\n");
>> +DEFINE_SRIOV_GT_GRP_CFG_DEBUGFS_ATTRIBUTE(preempt_timeout, u32, "%llu\n");
>> +
>>   static int sched_group_engines_info(struct seq_file *m, void *data)
>>   {
>>   	struct drm_printer p = drm_seq_file_printer(m);
>> @@ -261,6 +406,18 @@ static void sched_group_info_register(struct xe_gt *gt, unsigned int vfid)
>>   		goto out_err;
>>   	parent->d_inode->i_private = infos;
>>   
>> +	/*
>> +	 * assign group 0 gt and VF id values early as they're used by the
>> +	 * exec_quantums debugfs to set quantums for all groups
>> +	 */
>> +	infos[0].gt = gt;
>> +	infos[0].vfid = vfid;
>> +
>> +	debugfs_create_file("exec_quantums_ms", 0644, parent, infos,
>> +			    &sched_groups_exec_quantums_fops);
>> +	debugfs_create_file("preempt_timeouts_us", 0644, parent, infos,
>> +			    &sched_groups_preempt_timeouts_fops);
>> +
>>   	for (g = 0; g < num_groups; g++) {
>>   		struct sched_group_info *info = &infos[g];
>>   		u32 base = g * GUC_MAX_ENGINE_CLASSES;
>> @@ -273,10 +430,16 @@ static void sched_group_info_register(struct xe_gt *gt, unsigned int vfid)
>>   			goto out_err;
>>   
>>   		info->gt = gt;
>> +		info->vfid = vfid;
>> +		info->group_id = g;
>>   		info->masks = &policy->guc.sched_groups.modes[mode].masks[base];
>>   
>>   		dent->d_inode->i_private = info;
>>   		debugfs_create_file("engines", 0644, dent, info, &sched_group_engines_fops);
>> +		debugfs_create_file("exec_quantum_ms", 0644, dent, info,
>> +				    &group_exec_quantum_fops);
>> +		debugfs_create_file("preempt_timeout_us", 0644, dent, info,
>> +				    &group_preempt_timeout_fops);
>>   	}
>>   
>>   	return;


^ permalink raw reply	[flat|nested] 44+ messages in thread

* ✗ CI.checkpatch: warning for Introduce SRIOV scheduler groups
  2025-11-27  1:45 [PATCH 00/10] Introduce SRIOV scheduler groups Daniele Ceraolo Spurio
                   ` (9 preceding siblings ...)
  2025-11-27  1:45 ` [PATCH 10/10] drm/xe/sriov: Add debugfs to set EQ and PT for scheduler groups Daniele Ceraolo Spurio
@ 2025-11-27  1:51 ` Patchwork
  2025-11-27  1:52 ` ✓ CI.KUnit: success " Patchwork
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Patchwork @ 2025-11-27  1:51 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio; +Cc: intel-xe

== Series Details ==

Series: Introduce SRIOV scheduler groups
URL   : https://patchwork.freedesktop.org/series/158142/
State : warning

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
2de9a3901bc28757c7906b454717b64e2a214021
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 2bb13ada5c50ec8d38cac36e924be8ad094bfcde
Author: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Date:   Wed Nov 26 17:45:16 2025 -0800

    drm/xe/sriov: Add debugfs to set EQ and PT for scheduler groups
    
    A top-level debugfs file is added that allows a user to provide a
    comma-separated list of values to assign to each group. Per-group files
    are also added to allow individual tuning of a specific group.
    
    Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
    Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
+ /mt/dim checkpatch e7a767430515c3a6e8aee91c2a68cba8b06fe884 drm-intel
8fb1e02b22ed drm/xe/gt: Add engine masks for each class
8cc190c04502 drm/xe/sriov: Initialize scheduler groups
-:191: CHECK:UNNECESSARY_PARENTHESES: Unnecessary parentheses around 'm == XE_SRIOV_SCHED_GROUPS_NONE'
#191: FILE: drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c:466:
+		if ((m == XE_SRIOV_SCHED_GROUPS_NONE) || num_masks)

total: 0 errors, 0 warnings, 1 checks, 212 lines checked
8091054e06ab drm/xe/sriov: Add support for enabling scheduler groups
7fcee66f6e1b drm/xe/sriov: Scheduler groups are incompatible with multi-lrc
-:153: CHECK:UNNECESSARY_PARENTHESES: Unnecessary parentheses around 'mode != XE_SRIOV_SCHED_GROUPS_NONE'
#153: FILE: drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c:571:
+	if ((mode != XE_SRIOV_SCHED_GROUPS_NONE) && guc_has_mlrc_queue(&gt->uc.guc)) {

total: 0 errors, 0 warnings, 1 checks, 247 lines checked
0a109d3dc8af drm/xe/sriov: Add debugfs to enable scheduler groups
40f867c26023 drm/xe/sriov: Add debugfs with scheduler groups information
88c911962de3 drm/xe/sriov: Prep for multiple exec quantums and preemption timeouts
dc6140ec4d5d drm/xe/sriov: Add functions to set exec quantums for each group
e26bb478a62e drm/xe/sriov: Add functions to set preempt timeouts for each group
2bb13ada5c50 drm/xe/sriov: Add debugfs to set EQ and PT for scheduler groups



^ permalink raw reply	[flat|nested] 44+ messages in thread

* ✓ CI.KUnit: success for Introduce SRIOV scheduler groups
  2025-11-27  1:45 [PATCH 00/10] Introduce SRIOV scheduler groups Daniele Ceraolo Spurio
                   ` (10 preceding siblings ...)
  2025-11-27  1:51 ` ✗ CI.checkpatch: warning for Introduce SRIOV " Patchwork
@ 2025-11-27  1:52 ` Patchwork
  2025-11-27  2:36 ` ✗ Xe.CI.BAT: failure " Patchwork
  2025-11-27  3:18 ` ✗ Xe.CI.Full: " Patchwork
  13 siblings, 0 replies; 44+ messages in thread
From: Patchwork @ 2025-11-27  1:52 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio; +Cc: intel-xe

== Series Details ==

Series: Introduce SRIOV scheduler groups
URL   : https://patchwork.freedesktop.org/series/158142/
State : success

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
[01:51:24] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[01:51:28] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[01:51:59] Starting KUnit Kernel (1/1)...
[01:51:59] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[01:51:59] ================== guc_buf (11 subtests) ===================
[01:51:59] [PASSED] test_smallest
[01:51:59] [PASSED] test_largest
[01:51:59] [PASSED] test_granular
[01:51:59] [PASSED] test_unique
[01:51:59] [PASSED] test_overlap
[01:51:59] [PASSED] test_reusable
[01:51:59] [PASSED] test_too_big
[01:51:59] [PASSED] test_flush
[01:51:59] [PASSED] test_lookup
[01:51:59] [PASSED] test_data
[01:51:59] [PASSED] test_class
[01:51:59] ===================== [PASSED] guc_buf =====================
[01:51:59] =================== guc_dbm (7 subtests) ===================
[01:51:59] [PASSED] test_empty
[01:51:59] [PASSED] test_default
[01:51:59] ======================== test_size  ========================
[01:51:59] [PASSED] 4
[01:51:59] [PASSED] 8
[01:51:59] [PASSED] 32
[01:51:59] [PASSED] 256
[01:51:59] ==================== [PASSED] test_size ====================
[01:51:59] ======================= test_reuse  ========================
[01:51:59] [PASSED] 4
[01:51:59] [PASSED] 8
[01:51:59] [PASSED] 32
[01:51:59] [PASSED] 256
[01:51:59] =================== [PASSED] test_reuse ====================
[01:51:59] =================== test_range_overlap  ====================
[01:51:59] [PASSED] 4
[01:51:59] [PASSED] 8
[01:51:59] [PASSED] 32
[01:51:59] [PASSED] 256
[01:51:59] =============== [PASSED] test_range_overlap ================
[01:51:59] =================== test_range_compact  ====================
[01:51:59] [PASSED] 4
[01:51:59] [PASSED] 8
[01:51:59] [PASSED] 32
[01:51:59] [PASSED] 256
[01:51:59] =============== [PASSED] test_range_compact ================
[01:51:59] ==================== test_range_spare  =====================
[01:51:59] [PASSED] 4
[01:51:59] [PASSED] 8
[01:51:59] [PASSED] 32
[01:51:59] [PASSED] 256
[01:51:59] ================ [PASSED] test_range_spare =================
[01:51:59] ===================== [PASSED] guc_dbm =====================
[01:51:59] =================== guc_idm (6 subtests) ===================
[01:51:59] [PASSED] bad_init
[01:51:59] [PASSED] no_init
[01:51:59] [PASSED] init_fini
[01:51:59] [PASSED] check_used
[01:51:59] [PASSED] check_quota
[01:51:59] [PASSED] check_all
[01:51:59] ===================== [PASSED] guc_idm =====================
[01:51:59] ================== no_relay (3 subtests) ===================
[01:51:59] [PASSED] xe_drops_guc2pf_if_not_ready
[01:51:59] [PASSED] xe_drops_guc2vf_if_not_ready
[01:51:59] [PASSED] xe_rejects_send_if_not_ready
[01:51:59] ==================== [PASSED] no_relay =====================
[01:51:59] ================== pf_relay (14 subtests) ==================
[01:51:59] [PASSED] pf_rejects_guc2pf_too_short
[01:51:59] [PASSED] pf_rejects_guc2pf_too_long
[01:51:59] [PASSED] pf_rejects_guc2pf_no_payload
[01:51:59] [PASSED] pf_fails_no_payload
[01:51:59] [PASSED] pf_fails_bad_origin
[01:51:59] [PASSED] pf_fails_bad_type
[01:51:59] [PASSED] pf_txn_reports_error
[01:51:59] [PASSED] pf_txn_sends_pf2guc
[01:51:59] [PASSED] pf_sends_pf2guc
[01:51:59] [SKIPPED] pf_loopback_nop
[01:51:59] [SKIPPED] pf_loopback_echo
[01:51:59] [SKIPPED] pf_loopback_fail
[01:51:59] [SKIPPED] pf_loopback_busy
[01:51:59] [SKIPPED] pf_loopback_retry
[01:51:59] ==================== [PASSED] pf_relay =====================
[01:51:59] ================== vf_relay (3 subtests) ===================
[01:51:59] [PASSED] vf_rejects_guc2vf_too_short
[01:51:59] [PASSED] vf_rejects_guc2vf_too_long
[01:51:59] [PASSED] vf_rejects_guc2vf_no_payload
[01:51:59] ==================== [PASSED] vf_relay =====================
[01:51:59] ================ pf_gt_config (6 subtests) =================
[01:51:59] [PASSED] fair_contexts_1vf
[01:51:59] [PASSED] fair_doorbells_1vf
[01:51:59] [PASSED] fair_ggtt_1vf
[01:51:59] ====================== fair_contexts  ======================
[01:51:59] [PASSED] 1 VF
[01:51:59] [PASSED] 2 VFs
[01:51:59] [PASSED] 3 VFs
[01:51:59] [PASSED] 4 VFs
[01:51:59] [PASSED] 5 VFs
[01:51:59] [PASSED] 6 VFs
[01:51:59] [PASSED] 7 VFs
[01:51:59] [PASSED] 8 VFs
[01:51:59] [PASSED] 9 VFs
[01:51:59] [PASSED] 10 VFs
[01:51:59] [PASSED] 11 VFs
[01:51:59] [PASSED] 12 VFs
[01:51:59] [PASSED] 13 VFs
[01:51:59] [PASSED] 14 VFs
[01:51:59] [PASSED] 15 VFs
[01:51:59] [PASSED] 16 VFs
[01:51:59] [PASSED] 17 VFs
[01:51:59] [PASSED] 18 VFs
[01:51:59] [PASSED] 19 VFs
[01:52:00] [PASSED] 20 VFs
[01:52:00] [PASSED] 21 VFs
[01:52:00] [PASSED] 22 VFs
[01:52:00] [PASSED] 23 VFs
[01:52:00] [PASSED] 24 VFs
[01:52:00] [PASSED] 25 VFs
[01:52:00] [PASSED] 26 VFs
[01:52:00] [PASSED] 27 VFs
[01:52:00] [PASSED] 28 VFs
[01:52:00] [PASSED] 29 VFs
[01:52:00] [PASSED] 30 VFs
[01:52:00] [PASSED] 31 VFs
[01:52:00] [PASSED] 32 VFs
[01:52:00] [PASSED] 33 VFs
[01:52:00] [PASSED] 34 VFs
[01:52:00] [PASSED] 35 VFs
[01:52:00] [PASSED] 36 VFs
[01:52:00] [PASSED] 37 VFs
[01:52:00] [PASSED] 38 VFs
[01:52:00] [PASSED] 39 VFs
[01:52:00] [PASSED] 40 VFs
[01:52:00] [PASSED] 41 VFs
[01:52:00] [PASSED] 42 VFs
[01:52:00] [PASSED] 43 VFs
[01:52:00] [PASSED] 44 VFs
[01:52:00] [PASSED] 45 VFs
[01:52:00] [PASSED] 46 VFs
[01:52:00] [PASSED] 47 VFs
[01:52:00] [PASSED] 48 VFs
[01:52:00] [PASSED] 49 VFs
[01:52:00] [PASSED] 50 VFs
[01:52:00] [PASSED] 51 VFs
[01:52:00] [PASSED] 52 VFs
[01:52:00] [PASSED] 53 VFs
[01:52:00] [PASSED] 54 VFs
[01:52:00] [PASSED] 55 VFs
[01:52:00] [PASSED] 56 VFs
[01:52:00] [PASSED] 57 VFs
[01:52:00] [PASSED] 58 VFs
[01:52:00] [PASSED] 59 VFs
[01:52:00] [PASSED] 60 VFs
[01:52:00] [PASSED] 61 VFs
[01:52:00] [PASSED] 62 VFs
[01:52:00] [PASSED] 63 VFs
[01:52:00] ================== [PASSED] fair_contexts ==================
[01:52:00] ===================== fair_doorbells  ======================
[01:52:00] [PASSED] 1 VF
[01:52:00] [PASSED] 2 VFs
[01:52:00] [PASSED] 3 VFs
[01:52:00] [PASSED] 4 VFs
[01:52:00] [PASSED] 5 VFs
[01:52:00] [PASSED] 6 VFs
[01:52:00] [PASSED] 7 VFs
[01:52:00] [PASSED] 8 VFs
[01:52:00] [PASSED] 9 VFs
[01:52:00] [PASSED] 10 VFs
[01:52:00] [PASSED] 11 VFs
[01:52:00] [PASSED] 12 VFs
[01:52:00] [PASSED] 13 VFs
[01:52:00] [PASSED] 14 VFs
[01:52:00] [PASSED] 15 VFs
[01:52:00] [PASSED] 16 VFs
[01:52:00] [PASSED] 17 VFs
[01:52:00] [PASSED] 18 VFs
[01:52:00] [PASSED] 19 VFs
[01:52:00] [PASSED] 20 VFs
[01:52:00] [PASSED] 21 VFs
[01:52:00] [PASSED] 22 VFs
[01:52:00] [PASSED] 23 VFs
[01:52:00] [PASSED] 24 VFs
[01:52:00] [PASSED] 25 VFs
[01:52:00] [PASSED] 26 VFs
[01:52:00] [PASSED] 27 VFs
[01:52:00] [PASSED] 28 VFs
[01:52:00] [PASSED] 29 VFs
[01:52:00] [PASSED] 30 VFs
[01:52:00] [PASSED] 31 VFs
[01:52:00] [PASSED] 32 VFs
[01:52:00] [PASSED] 33 VFs
[01:52:00] [PASSED] 34 VFs
[01:52:00] [PASSED] 35 VFs
[01:52:00] [PASSED] 36 VFs
[01:52:00] [PASSED] 37 VFs
[01:52:00] [PASSED] 38 VFs
[01:52:00] [PASSED] 39 VFs
[01:52:00] [PASSED] 40 VFs
[01:52:00] [PASSED] 41 VFs
[01:52:00] [PASSED] 42 VFs
[01:52:00] [PASSED] 43 VFs
[01:52:00] [PASSED] 44 VFs
[01:52:00] [PASSED] 45 VFs
[01:52:00] [PASSED] 46 VFs
[01:52:00] [PASSED] 47 VFs
[01:52:00] [PASSED] 48 VFs
[01:52:00] [PASSED] 49 VFs
[01:52:00] [PASSED] 50 VFs
[01:52:00] [PASSED] 51 VFs
[01:52:00] [PASSED] 52 VFs
[01:52:00] [PASSED] 53 VFs
[01:52:00] [PASSED] 54 VFs
[01:52:00] [PASSED] 55 VFs
[01:52:00] [PASSED] 56 VFs
[01:52:00] [PASSED] 57 VFs
[01:52:00] [PASSED] 58 VFs
[01:52:00] [PASSED] 59 VFs
[01:52:00] [PASSED] 60 VFs
[01:52:00] [PASSED] 61 VFs
[01:52:00] [PASSED] 62 VFs
[01:52:00] [PASSED] 63 VFs
[01:52:00] ================= [PASSED] fair_doorbells ==================
[01:52:00] ======================== fair_ggtt  ========================
[01:52:00] [PASSED] 1 VF
[01:52:00] [PASSED] 2 VFs
[01:52:00] [PASSED] 3 VFs
[01:52:00] [PASSED] 4 VFs
[01:52:00] [PASSED] 5 VFs
[01:52:00] [PASSED] 6 VFs
[01:52:00] [PASSED] 7 VFs
[01:52:00] [PASSED] 8 VFs
[01:52:00] [PASSED] 9 VFs
[01:52:00] [PASSED] 10 VFs
[01:52:00] [PASSED] 11 VFs
[01:52:00] [PASSED] 12 VFs
[01:52:00] [PASSED] 13 VFs
[01:52:00] [PASSED] 14 VFs
[01:52:00] [PASSED] 15 VFs
[01:52:00] [PASSED] 16 VFs
[01:52:00] [PASSED] 17 VFs
[01:52:00] [PASSED] 18 VFs
[01:52:00] [PASSED] 19 VFs
[01:52:00] [PASSED] 20 VFs
[01:52:00] [PASSED] 21 VFs
[01:52:00] [PASSED] 22 VFs
[01:52:00] [PASSED] 23 VFs
[01:52:00] [PASSED] 24 VFs
[01:52:00] [PASSED] 25 VFs
[01:52:00] [PASSED] 26 VFs
[01:52:00] [PASSED] 27 VFs
[01:52:00] [PASSED] 28 VFs
[01:52:00] [PASSED] 29 VFs
[01:52:00] [PASSED] 30 VFs
[01:52:00] [PASSED] 31 VFs
[01:52:00] [PASSED] 32 VFs
[01:52:00] [PASSED] 33 VFs
[01:52:00] [PASSED] 34 VFs
[01:52:00] [PASSED] 35 VFs
[01:52:00] [PASSED] 36 VFs
[01:52:00] [PASSED] 37 VFs
[01:52:00] [PASSED] 38 VFs
[01:52:00] [PASSED] 39 VFs
[01:52:00] [PASSED] 40 VFs
[01:52:00] [PASSED] 41 VFs
[01:52:00] [PASSED] 42 VFs
[01:52:00] [PASSED] 43 VFs
[01:52:00] [PASSED] 44 VFs
[01:52:00] [PASSED] 45 VFs
[01:52:00] [PASSED] 46 VFs
[01:52:00] [PASSED] 47 VFs
[01:52:00] [PASSED] 48 VFs
[01:52:00] [PASSED] 49 VFs
[01:52:00] [PASSED] 50 VFs
[01:52:00] [PASSED] 51 VFs
[01:52:00] [PASSED] 52 VFs
[01:52:00] [PASSED] 53 VFs
[01:52:00] [PASSED] 54 VFs
[01:52:00] [PASSED] 55 VFs
[01:52:00] [PASSED] 56 VFs
[01:52:00] [PASSED] 57 VFs
[01:52:00] [PASSED] 58 VFs
[01:52:00] [PASSED] 59 VFs
[01:52:00] [PASSED] 60 VFs
[01:52:00] [PASSED] 61 VFs
[01:52:00] [PASSED] 62 VFs
[01:52:00] [PASSED] 63 VFs
[01:52:00] ==================== [PASSED] fair_ggtt ====================
[01:52:00] ================== [PASSED] pf_gt_config ===================
[01:52:00] ===================== lmtt (1 subtest) =====================
[01:52:00] ======================== test_ops  =========================
[01:52:00] [PASSED] 2-level
[01:52:00] [PASSED] multi-level
[01:52:00] ==================== [PASSED] test_ops =====================
[01:52:00] ====================== [PASSED] lmtt =======================
[01:52:00] ================= pf_service (11 subtests) =================
[01:52:00] [PASSED] pf_negotiate_any
[01:52:00] [PASSED] pf_negotiate_base_match
[01:52:00] [PASSED] pf_negotiate_base_newer
[01:52:00] [PASSED] pf_negotiate_base_next
[01:52:00] [SKIPPED] pf_negotiate_base_older
[01:52:00] [PASSED] pf_negotiate_base_prev
[01:52:00] [PASSED] pf_negotiate_latest_match
[01:52:00] [PASSED] pf_negotiate_latest_newer
[01:52:00] [PASSED] pf_negotiate_latest_next
[01:52:00] [SKIPPED] pf_negotiate_latest_older
[01:52:00] [SKIPPED] pf_negotiate_latest_prev
[01:52:00] =================== [PASSED] pf_service ====================
[01:52:00] ================= xe_guc_g2g (2 subtests) ==================
[01:52:00] ============== xe_live_guc_g2g_kunit_default  ==============
[01:52:00] ========= [SKIPPED] xe_live_guc_g2g_kunit_default ==========
[01:52:00] ============== xe_live_guc_g2g_kunit_allmem  ===============
[01:52:00] ========== [SKIPPED] xe_live_guc_g2g_kunit_allmem ==========
[01:52:00] =================== [SKIPPED] xe_guc_g2g ===================
[01:52:00] =================== xe_mocs (2 subtests) ===================
[01:52:00] ================ xe_live_mocs_kernel_kunit  ================
[01:52:00] =========== [SKIPPED] xe_live_mocs_kernel_kunit ============
[01:52:00] ================ xe_live_mocs_reset_kunit  =================
[01:52:00] ============ [SKIPPED] xe_live_mocs_reset_kunit ============
[01:52:00] ==================== [SKIPPED] xe_mocs =====================
[01:52:00] ================= xe_migrate (2 subtests) ==================
[01:52:00] ================= xe_migrate_sanity_kunit  =================
[01:52:00] ============ [SKIPPED] xe_migrate_sanity_kunit =============
[01:52:00] ================== xe_validate_ccs_kunit  ==================
[01:52:00] ============= [SKIPPED] xe_validate_ccs_kunit ==============
[01:52:00] =================== [SKIPPED] xe_migrate ===================
[01:52:00] ================== xe_dma_buf (1 subtest) ==================
[01:52:00] ==================== xe_dma_buf_kunit  =====================
[01:52:00] ================ [SKIPPED] xe_dma_buf_kunit ================
[01:52:00] =================== [SKIPPED] xe_dma_buf ===================
[01:52:00] ================= xe_bo_shrink (1 subtest) =================
[01:52:00] =================== xe_bo_shrink_kunit  ====================
[01:52:00] =============== [SKIPPED] xe_bo_shrink_kunit ===============
[01:52:00] ================== [SKIPPED] xe_bo_shrink ==================
[01:52:00] ==================== xe_bo (2 subtests) ====================
[01:52:00] ================== xe_ccs_migrate_kunit  ===================
[01:52:00] ============== [SKIPPED] xe_ccs_migrate_kunit ==============
[01:52:00] ==================== xe_bo_evict_kunit  ====================
[01:52:00] =============== [SKIPPED] xe_bo_evict_kunit ================
[01:52:00] ===================== [SKIPPED] xe_bo ======================
[01:52:00] ==================== args (11 subtests) ====================
[01:52:00] [PASSED] count_args_test
[01:52:00] [PASSED] call_args_example
[01:52:00] [PASSED] call_args_test
[01:52:00] [PASSED] drop_first_arg_example
[01:52:00] [PASSED] drop_first_arg_test
[01:52:00] [PASSED] first_arg_example
[01:52:00] [PASSED] first_arg_test
[01:52:00] [PASSED] last_arg_example
[01:52:00] [PASSED] last_arg_test
[01:52:00] [PASSED] pick_arg_example
[01:52:00] [PASSED] sep_comma_example
[01:52:00] ====================== [PASSED] args =======================
[01:52:00] =================== xe_pci (3 subtests) ====================
[01:52:00] ==================== check_graphics_ip  ====================
[01:52:00] [PASSED] 12.00 Xe_LP
[01:52:00] [PASSED] 12.10 Xe_LP+
[01:52:00] [PASSED] 12.55 Xe_HPG
[01:52:00] [PASSED] 12.60 Xe_HPC
[01:52:00] [PASSED] 12.70 Xe_LPG
[01:52:00] [PASSED] 12.71 Xe_LPG
[01:52:00] [PASSED] 12.74 Xe_LPG+
[01:52:00] [PASSED] 20.01 Xe2_HPG
[01:52:00] [PASSED] 20.02 Xe2_HPG
[01:52:00] [PASSED] 20.04 Xe2_LPG
[01:52:00] [PASSED] 30.00 Xe3_LPG
[01:52:00] [PASSED] 30.01 Xe3_LPG
[01:52:00] [PASSED] 30.03 Xe3_LPG
[01:52:00] [PASSED] 30.04 Xe3_LPG
[01:52:00] [PASSED] 30.05 Xe3_LPG
[01:52:00] [PASSED] 35.11 Xe3p_XPC
[01:52:00] ================ [PASSED] check_graphics_ip ================
[01:52:00] ===================== check_media_ip  ======================
[01:52:00] [PASSED] 12.00 Xe_M
[01:52:00] [PASSED] 12.55 Xe_HPM
[01:52:00] [PASSED] 13.00 Xe_LPM+
[01:52:00] [PASSED] 13.01 Xe2_HPM
[01:52:00] [PASSED] 20.00 Xe2_LPM
[01:52:00] [PASSED] 30.00 Xe3_LPM
[01:52:00] [PASSED] 30.02 Xe3_LPM
[01:52:00] [PASSED] 35.00 Xe3p_LPM
[01:52:00] [PASSED] 35.03 Xe3p_HPM
[01:52:00] ================= [PASSED] check_media_ip ==================
[01:52:00] =================== check_platform_desc  ===================
[01:52:00] [PASSED] 0x9A60 (TIGERLAKE)
[01:52:00] [PASSED] 0x9A68 (TIGERLAKE)
[01:52:00] [PASSED] 0x9A70 (TIGERLAKE)
[01:52:00] [PASSED] 0x9A40 (TIGERLAKE)
[01:52:00] [PASSED] 0x9A49 (TIGERLAKE)
[01:52:00] [PASSED] 0x9A59 (TIGERLAKE)
[01:52:00] [PASSED] 0x9A78 (TIGERLAKE)
[01:52:00] [PASSED] 0x9AC0 (TIGERLAKE)
[01:52:00] [PASSED] 0x9AC9 (TIGERLAKE)
[01:52:00] [PASSED] 0x9AD9 (TIGERLAKE)
[01:52:00] [PASSED] 0x9AF8 (TIGERLAKE)
[01:52:00] [PASSED] 0x4C80 (ROCKETLAKE)
[01:52:00] [PASSED] 0x4C8A (ROCKETLAKE)
[01:52:00] [PASSED] 0x4C8B (ROCKETLAKE)
[01:52:00] [PASSED] 0x4C8C (ROCKETLAKE)
[01:52:00] [PASSED] 0x4C90 (ROCKETLAKE)
[01:52:00] [PASSED] 0x4C9A (ROCKETLAKE)
[01:52:00] [PASSED] 0x4680 (ALDERLAKE_S)
[01:52:00] [PASSED] 0x4682 (ALDERLAKE_S)
[01:52:00] [PASSED] 0x4688 (ALDERLAKE_S)
[01:52:00] [PASSED] 0x468A (ALDERLAKE_S)
[01:52:00] [PASSED] 0x468B (ALDERLAKE_S)
[01:52:00] [PASSED] 0x4690 (ALDERLAKE_S)
[01:52:00] [PASSED] 0x4692 (ALDERLAKE_S)
[01:52:00] [PASSED] 0x4693 (ALDERLAKE_S)
[01:52:00] [PASSED] 0x46A0 (ALDERLAKE_P)
[01:52:00] [PASSED] 0x46A1 (ALDERLAKE_P)
[01:52:00] [PASSED] 0x46A2 (ALDERLAKE_P)
[01:52:00] [PASSED] 0x46A3 (ALDERLAKE_P)
[01:52:00] [PASSED] 0x46A6 (ALDERLAKE_P)
[01:52:00] [PASSED] 0x46A8 (ALDERLAKE_P)
[01:52:00] [PASSED] 0x46AA (ALDERLAKE_P)
[01:52:00] [PASSED] 0x462A (ALDERLAKE_P)
[01:52:00] [PASSED] 0x4626 (ALDERLAKE_P)
[01:52:00] [PASSED] 0x4628 (ALDERLAKE_P)
[01:52:00] [PASSED] 0x46B0 (ALDERLAKE_P)
stty: 'standard input': Inappropriate ioctl for device
[01:52:00] [PASSED] 0x46B1 (ALDERLAKE_P)
[01:52:00] [PASSED] 0x46B2 (ALDERLAKE_P)
[01:52:00] [PASSED] 0x46B3 (ALDERLAKE_P)
[01:52:00] [PASSED] 0x46C0 (ALDERLAKE_P)
[01:52:00] [PASSED] 0x46C1 (ALDERLAKE_P)
[01:52:00] [PASSED] 0x46C2 (ALDERLAKE_P)
[01:52:00] [PASSED] 0x46C3 (ALDERLAKE_P)
[01:52:00] [PASSED] 0x46D0 (ALDERLAKE_N)
[01:52:00] [PASSED] 0x46D1 (ALDERLAKE_N)
[01:52:00] [PASSED] 0x46D2 (ALDERLAKE_N)
[01:52:00] [PASSED] 0x46D3 (ALDERLAKE_N)
[01:52:00] [PASSED] 0x46D4 (ALDERLAKE_N)
[01:52:00] [PASSED] 0xA721 (ALDERLAKE_P)
[01:52:00] [PASSED] 0xA7A1 (ALDERLAKE_P)
[01:52:00] [PASSED] 0xA7A9 (ALDERLAKE_P)
[01:52:00] [PASSED] 0xA7AC (ALDERLAKE_P)
[01:52:00] [PASSED] 0xA7AD (ALDERLAKE_P)
[01:52:00] [PASSED] 0xA720 (ALDERLAKE_P)
[01:52:00] [PASSED] 0xA7A0 (ALDERLAKE_P)
[01:52:00] [PASSED] 0xA7A8 (ALDERLAKE_P)
[01:52:00] [PASSED] 0xA7AA (ALDERLAKE_P)
[01:52:00] [PASSED] 0xA7AB (ALDERLAKE_P)
[01:52:00] [PASSED] 0xA780 (ALDERLAKE_S)
[01:52:00] [PASSED] 0xA781 (ALDERLAKE_S)
[01:52:00] [PASSED] 0xA782 (ALDERLAKE_S)
[01:52:00] [PASSED] 0xA783 (ALDERLAKE_S)
[01:52:00] [PASSED] 0xA788 (ALDERLAKE_S)
[01:52:00] [PASSED] 0xA789 (ALDERLAKE_S)
[01:52:00] [PASSED] 0xA78A (ALDERLAKE_S)
[01:52:00] [PASSED] 0xA78B (ALDERLAKE_S)
[01:52:00] [PASSED] 0x4905 (DG1)
[01:52:00] [PASSED] 0x4906 (DG1)
[01:52:00] [PASSED] 0x4907 (DG1)
[01:52:00] [PASSED] 0x4908 (DG1)
[01:52:00] [PASSED] 0x4909 (DG1)
[01:52:00] [PASSED] 0x56C0 (DG2)
[01:52:00] [PASSED] 0x56C2 (DG2)
[01:52:00] [PASSED] 0x56C1 (DG2)
[01:52:00] [PASSED] 0x7D51 (METEORLAKE)
[01:52:00] [PASSED] 0x7DD1 (METEORLAKE)
[01:52:00] [PASSED] 0x7D41 (METEORLAKE)
[01:52:00] [PASSED] 0x7D67 (METEORLAKE)
[01:52:00] [PASSED] 0xB640 (METEORLAKE)
[01:52:00] [PASSED] 0x56A0 (DG2)
[01:52:00] [PASSED] 0x56A1 (DG2)
[01:52:00] [PASSED] 0x56A2 (DG2)
[01:52:00] [PASSED] 0x56BE (DG2)
[01:52:00] [PASSED] 0x56BF (DG2)
[01:52:00] [PASSED] 0x5690 (DG2)
[01:52:00] [PASSED] 0x5691 (DG2)
[01:52:00] [PASSED] 0x5692 (DG2)
[01:52:00] [PASSED] 0x56A5 (DG2)
[01:52:00] [PASSED] 0x56A6 (DG2)
[01:52:00] [PASSED] 0x56B0 (DG2)
[01:52:00] [PASSED] 0x56B1 (DG2)
[01:52:00] [PASSED] 0x56BA (DG2)
[01:52:00] [PASSED] 0x56BB (DG2)
[01:52:00] [PASSED] 0x56BC (DG2)
[01:52:00] [PASSED] 0x56BD (DG2)
[01:52:00] [PASSED] 0x5693 (DG2)
[01:52:00] [PASSED] 0x5694 (DG2)
[01:52:00] [PASSED] 0x5695 (DG2)
[01:52:00] [PASSED] 0x56A3 (DG2)
[01:52:00] [PASSED] 0x56A4 (DG2)
[01:52:00] [PASSED] 0x56B2 (DG2)
[01:52:00] [PASSED] 0x56B3 (DG2)
[01:52:00] [PASSED] 0x5696 (DG2)
[01:52:00] [PASSED] 0x5697 (DG2)
[01:52:00] [PASSED] 0xB69 (PVC)
[01:52:00] [PASSED] 0xB6E (PVC)
[01:52:00] [PASSED] 0xBD4 (PVC)
[01:52:00] [PASSED] 0xBD5 (PVC)
[01:52:00] [PASSED] 0xBD6 (PVC)
[01:52:00] [PASSED] 0xBD7 (PVC)
[01:52:00] [PASSED] 0xBD8 (PVC)
[01:52:00] [PASSED] 0xBD9 (PVC)
[01:52:00] [PASSED] 0xBDA (PVC)
[01:52:00] [PASSED] 0xBDB (PVC)
[01:52:00] [PASSED] 0xBE0 (PVC)
[01:52:00] [PASSED] 0xBE1 (PVC)
[01:52:00] [PASSED] 0xBE5 (PVC)
[01:52:00] [PASSED] 0x7D40 (METEORLAKE)
[01:52:00] [PASSED] 0x7D45 (METEORLAKE)
[01:52:00] [PASSED] 0x7D55 (METEORLAKE)
[01:52:00] [PASSED] 0x7D60 (METEORLAKE)
[01:52:00] [PASSED] 0x7DD5 (METEORLAKE)
[01:52:00] [PASSED] 0x6420 (LUNARLAKE)
[01:52:00] [PASSED] 0x64A0 (LUNARLAKE)
[01:52:00] [PASSED] 0x64B0 (LUNARLAKE)
[01:52:00] [PASSED] 0xE202 (BATTLEMAGE)
[01:52:00] [PASSED] 0xE209 (BATTLEMAGE)
[01:52:00] [PASSED] 0xE20B (BATTLEMAGE)
[01:52:00] [PASSED] 0xE20C (BATTLEMAGE)
[01:52:00] [PASSED] 0xE20D (BATTLEMAGE)
[01:52:00] [PASSED] 0xE210 (BATTLEMAGE)
[01:52:00] [PASSED] 0xE211 (BATTLEMAGE)
[01:52:00] [PASSED] 0xE212 (BATTLEMAGE)
[01:52:00] [PASSED] 0xE216 (BATTLEMAGE)
[01:52:00] [PASSED] 0xE220 (BATTLEMAGE)
[01:52:00] [PASSED] 0xE221 (BATTLEMAGE)
[01:52:00] [PASSED] 0xE222 (BATTLEMAGE)
[01:52:00] [PASSED] 0xE223 (BATTLEMAGE)
[01:52:00] [PASSED] 0xB080 (PANTHERLAKE)
[01:52:00] [PASSED] 0xB081 (PANTHERLAKE)
[01:52:00] [PASSED] 0xB082 (PANTHERLAKE)
[01:52:00] [PASSED] 0xB083 (PANTHERLAKE)
[01:52:00] [PASSED] 0xB084 (PANTHERLAKE)
[01:52:00] [PASSED] 0xB085 (PANTHERLAKE)
[01:52:00] [PASSED] 0xB086 (PANTHERLAKE)
[01:52:00] [PASSED] 0xB087 (PANTHERLAKE)
[01:52:00] [PASSED] 0xB08F (PANTHERLAKE)
[01:52:00] [PASSED] 0xB090 (PANTHERLAKE)
[01:52:00] [PASSED] 0xB0A0 (PANTHERLAKE)
[01:52:00] [PASSED] 0xB0B0 (PANTHERLAKE)
[01:52:00] [PASSED] 0xD740 (NOVALAKE_S)
[01:52:00] [PASSED] 0xD741 (NOVALAKE_S)
[01:52:00] [PASSED] 0xD742 (NOVALAKE_S)
[01:52:00] [PASSED] 0xD743 (NOVALAKE_S)
[01:52:00] [PASSED] 0xD744 (NOVALAKE_S)
[01:52:00] [PASSED] 0xD745 (NOVALAKE_S)
[01:52:00] [PASSED] 0x674C (CRESCENTISLAND)
[01:52:00] [PASSED] 0xFD80 (PANTHERLAKE)
[01:52:00] [PASSED] 0xFD81 (PANTHERLAKE)
[01:52:00] =============== [PASSED] check_platform_desc ===============
[01:52:00] ===================== [PASSED] xe_pci ======================
[01:52:00] =================== xe_rtp (2 subtests) ====================
[01:52:00] =============== xe_rtp_process_to_sr_tests  ================
[01:52:00] [PASSED] coalesce-same-reg
[01:52:00] [PASSED] no-match-no-add
[01:52:00] [PASSED] match-or
[01:52:00] [PASSED] match-or-xfail
[01:52:00] [PASSED] no-match-no-add-multiple-rules
[01:52:00] [PASSED] two-regs-two-entries
[01:52:00] [PASSED] clr-one-set-other
[01:52:00] [PASSED] set-field
[01:52:00] [PASSED] conflict-duplicate
[01:52:00] [PASSED] conflict-not-disjoint
[01:52:00] [PASSED] conflict-reg-type
[01:52:00] =========== [PASSED] xe_rtp_process_to_sr_tests ============
[01:52:00] ================== xe_rtp_process_tests  ===================
[01:52:00] [PASSED] active1
[01:52:00] [PASSED] active2
[01:52:00] [PASSED] active-inactive
[01:52:00] [PASSED] inactive-active
[01:52:00] [PASSED] inactive-1st_or_active-inactive
[01:52:00] [PASSED] inactive-2nd_or_active-inactive
[01:52:00] [PASSED] inactive-last_or_active-inactive
[01:52:00] [PASSED] inactive-no_or_active-inactive
[01:52:00] ============== [PASSED] xe_rtp_process_tests ===============
[01:52:00] ===================== [PASSED] xe_rtp ======================
[01:52:00] ==================== xe_wa (1 subtest) =====================
[01:52:00] ======================== xe_wa_gt  =========================
[01:52:00] [PASSED] TIGERLAKE B0
[01:52:00] [PASSED] DG1 A0
[01:52:00] [PASSED] DG1 B0
[01:52:00] [PASSED] ALDERLAKE_S A0
[01:52:00] [PASSED] ALDERLAKE_S B0
[01:52:00] [PASSED] ALDERLAKE_S C0
[01:52:00] [PASSED] ALDERLAKE_S D0
[01:52:00] [PASSED] ALDERLAKE_P A0
[01:52:00] [PASSED] ALDERLAKE_P B0
[01:52:00] [PASSED] ALDERLAKE_P C0
[01:52:00] [PASSED] ALDERLAKE_S RPLS D0
[01:52:00] [PASSED] ALDERLAKE_P RPLU E0
[01:52:00] [PASSED] DG2 G10 C0
[01:52:00] [PASSED] DG2 G11 B1
[01:52:00] [PASSED] DG2 G12 A1
[01:52:00] [PASSED] METEORLAKE 12.70(Xe_LPG) A0 13.00(Xe_LPM+) A0
[01:52:00] [PASSED] METEORLAKE 12.71(Xe_LPG) A0 13.00(Xe_LPM+) A0
[01:52:00] [PASSED] METEORLAKE 12.74(Xe_LPG+) A0 13.00(Xe_LPM+) A0
[01:52:00] [PASSED] LUNARLAKE 20.04(Xe2_LPG) A0 20.00(Xe2_LPM) A0
[01:52:00] [PASSED] LUNARLAKE 20.04(Xe2_LPG) B0 20.00(Xe2_LPM) A0
[01:52:00] [PASSED] BATTLEMAGE 20.01(Xe2_HPG) A0 13.01(Xe2_HPM) A1
[01:52:00] [PASSED] PANTHERLAKE 30.00(Xe3_LPG) A0 30.00(Xe3_LPM) A0
[01:52:00] ==================== [PASSED] xe_wa_gt =====================
[01:52:00] ====================== [PASSED] xe_wa ======================
[01:52:00] ============================================================
[01:52:00] Testing complete. Ran 510 tests: passed: 492, skipped: 18
[01:52:00] Elapsed time: 35.796s total, 4.247s configuring, 31.083s building, 0.453s running

+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/tests/.kunitconfig
[01:52:00] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[01:52:01] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[01:52:27] Starting KUnit Kernel (1/1)...
[01:52:27] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[01:52:27] ============ drm_test_pick_cmdline (2 subtests) ============
[01:52:27] [PASSED] drm_test_pick_cmdline_res_1920_1080_60
[01:52:27] =============== drm_test_pick_cmdline_named  ===============
[01:52:27] [PASSED] NTSC
[01:52:27] [PASSED] NTSC-J
[01:52:27] [PASSED] PAL
[01:52:27] [PASSED] PAL-M
[01:52:27] =========== [PASSED] drm_test_pick_cmdline_named ===========
[01:52:27] ============== [PASSED] drm_test_pick_cmdline ==============
[01:52:27] == drm_test_atomic_get_connector_for_encoder (1 subtest) ===
[01:52:27] [PASSED] drm_test_drm_atomic_get_connector_for_encoder
[01:52:27] ==== [PASSED] drm_test_atomic_get_connector_for_encoder ====
[01:52:27] =========== drm_validate_clone_mode (2 subtests) ===========
[01:52:27] ============== drm_test_check_in_clone_mode  ===============
[01:52:27] [PASSED] in_clone_mode
[01:52:27] [PASSED] not_in_clone_mode
[01:52:27] ========== [PASSED] drm_test_check_in_clone_mode ===========
[01:52:27] =============== drm_test_check_valid_clones  ===============
[01:52:27] [PASSED] not_in_clone_mode
[01:52:27] [PASSED] valid_clone
[01:52:27] [PASSED] invalid_clone
[01:52:27] =========== [PASSED] drm_test_check_valid_clones ===========
[01:52:27] ============= [PASSED] drm_validate_clone_mode =============
[01:52:27] ============= drm_validate_modeset (1 subtest) =============
[01:52:27] [PASSED] drm_test_check_connector_changed_modeset
[01:52:27] ============== [PASSED] drm_validate_modeset ===============
[01:52:27] ====== drm_test_bridge_get_current_state (2 subtests) ======
[01:52:27] [PASSED] drm_test_drm_bridge_get_current_state_atomic
[01:52:27] [PASSED] drm_test_drm_bridge_get_current_state_legacy
[01:52:27] ======== [PASSED] drm_test_bridge_get_current_state ========
[01:52:27] ====== drm_test_bridge_helper_reset_crtc (3 subtests) ======
[01:52:27] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic
[01:52:27] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic_disabled
[01:52:27] [PASSED] drm_test_drm_bridge_helper_reset_crtc_legacy
[01:52:27] ======== [PASSED] drm_test_bridge_helper_reset_crtc ========
[01:52:27] ============== drm_bridge_alloc (2 subtests) ===============
[01:52:27] [PASSED] drm_test_drm_bridge_alloc_basic
[01:52:27] [PASSED] drm_test_drm_bridge_alloc_get_put
[01:52:27] ================ [PASSED] drm_bridge_alloc =================
[01:52:27] ================== drm_buddy (8 subtests) ==================
[01:52:27] [PASSED] drm_test_buddy_alloc_limit
[01:52:27] [PASSED] drm_test_buddy_alloc_optimistic
[01:52:27] [PASSED] drm_test_buddy_alloc_pessimistic
[01:52:27] [PASSED] drm_test_buddy_alloc_pathological
[01:52:27] [PASSED] drm_test_buddy_alloc_contiguous
[01:52:27] [PASSED] drm_test_buddy_alloc_clear
[01:52:27] [PASSED] drm_test_buddy_alloc_range_bias
[01:52:27] [PASSED] drm_test_buddy_fragmentation_performance
[01:52:27] ==================== [PASSED] drm_buddy ====================
[01:52:27] ============= drm_cmdline_parser (40 subtests) =============
[01:52:27] [PASSED] drm_test_cmdline_force_d_only
[01:52:27] [PASSED] drm_test_cmdline_force_D_only_dvi
[01:52:27] [PASSED] drm_test_cmdline_force_D_only_hdmi
[01:52:27] [PASSED] drm_test_cmdline_force_D_only_not_digital
[01:52:27] [PASSED] drm_test_cmdline_force_e_only
[01:52:27] [PASSED] drm_test_cmdline_res
[01:52:27] [PASSED] drm_test_cmdline_res_vesa
[01:52:27] [PASSED] drm_test_cmdline_res_vesa_rblank
[01:52:27] [PASSED] drm_test_cmdline_res_rblank
[01:52:27] [PASSED] drm_test_cmdline_res_bpp
[01:52:27] [PASSED] drm_test_cmdline_res_refresh
[01:52:27] [PASSED] drm_test_cmdline_res_bpp_refresh
[01:52:27] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced
[01:52:27] [PASSED] drm_test_cmdline_res_bpp_refresh_margins
[01:52:27] [PASSED] drm_test_cmdline_res_bpp_refresh_force_off
[01:52:27] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on
[01:52:27] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_analog
[01:52:27] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_digital
[01:52:27] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced_margins_force_on
[01:52:27] [PASSED] drm_test_cmdline_res_margins_force_on
[01:52:27] [PASSED] drm_test_cmdline_res_vesa_margins
[01:52:27] [PASSED] drm_test_cmdline_name
[01:52:27] [PASSED] drm_test_cmdline_name_bpp
[01:52:27] [PASSED] drm_test_cmdline_name_option
[01:52:27] [PASSED] drm_test_cmdline_name_bpp_option
[01:52:27] [PASSED] drm_test_cmdline_rotate_0
[01:52:27] [PASSED] drm_test_cmdline_rotate_90
[01:52:27] [PASSED] drm_test_cmdline_rotate_180
[01:52:27] [PASSED] drm_test_cmdline_rotate_270
[01:52:27] [PASSED] drm_test_cmdline_hmirror
[01:52:27] [PASSED] drm_test_cmdline_vmirror
[01:52:27] [PASSED] drm_test_cmdline_margin_options
[01:52:27] [PASSED] drm_test_cmdline_multiple_options
[01:52:27] [PASSED] drm_test_cmdline_bpp_extra_and_option
[01:52:27] [PASSED] drm_test_cmdline_extra_and_option
[01:52:27] [PASSED] drm_test_cmdline_freestanding_options
[01:52:27] [PASSED] drm_test_cmdline_freestanding_force_e_and_options
[01:52:27] [PASSED] drm_test_cmdline_panel_orientation
[01:52:27] ================ drm_test_cmdline_invalid  =================
[01:52:27] [PASSED] margin_only
[01:52:27] [PASSED] interlace_only
[01:52:27] [PASSED] res_missing_x
[01:52:27] [PASSED] res_missing_y
[01:52:27] [PASSED] res_bad_y
[01:52:27] [PASSED] res_missing_y_bpp
[01:52:27] [PASSED] res_bad_bpp
[01:52:27] [PASSED] res_bad_refresh
[01:52:27] [PASSED] res_bpp_refresh_force_on_off
[01:52:27] [PASSED] res_invalid_mode
[01:52:27] [PASSED] res_bpp_wrong_place_mode
[01:52:27] [PASSED] name_bpp_refresh
[01:52:27] [PASSED] name_refresh
[01:52:27] [PASSED] name_refresh_wrong_mode
[01:52:27] [PASSED] name_refresh_invalid_mode
[01:52:27] [PASSED] rotate_multiple
[01:52:27] [PASSED] rotate_invalid_val
[01:52:27] [PASSED] rotate_truncated
[01:52:27] [PASSED] invalid_option
[01:52:27] [PASSED] invalid_tv_option
[01:52:27] [PASSED] truncated_tv_option
[01:52:27] ============ [PASSED] drm_test_cmdline_invalid =============
[01:52:27] =============== drm_test_cmdline_tv_options  ===============
[01:52:27] [PASSED] NTSC
[01:52:27] [PASSED] NTSC_443
[01:52:27] [PASSED] NTSC_J
[01:52:27] [PASSED] PAL
[01:52:27] [PASSED] PAL_M
[01:52:27] [PASSED] PAL_N
[01:52:27] [PASSED] SECAM
[01:52:27] [PASSED] MONO_525
[01:52:27] [PASSED] MONO_625
[01:52:27] =========== [PASSED] drm_test_cmdline_tv_options ===========
[01:52:27] =============== [PASSED] drm_cmdline_parser ================
[01:52:27] ========== drmm_connector_hdmi_init (20 subtests) ==========
[01:52:27] [PASSED] drm_test_connector_hdmi_init_valid
[01:52:27] [PASSED] drm_test_connector_hdmi_init_bpc_8
[01:52:27] [PASSED] drm_test_connector_hdmi_init_bpc_10
[01:52:27] [PASSED] drm_test_connector_hdmi_init_bpc_12
[01:52:27] [PASSED] drm_test_connector_hdmi_init_bpc_invalid
[01:52:27] [PASSED] drm_test_connector_hdmi_init_bpc_null
[01:52:27] [PASSED] drm_test_connector_hdmi_init_formats_empty
[01:52:27] [PASSED] drm_test_connector_hdmi_init_formats_no_rgb
[01:52:27] === drm_test_connector_hdmi_init_formats_yuv420_allowed  ===
[01:52:27] [PASSED] supported_formats=0x9 yuv420_allowed=1
[01:52:27] [PASSED] supported_formats=0x9 yuv420_allowed=0
[01:52:27] [PASSED] supported_formats=0x3 yuv420_allowed=1
[01:52:27] [PASSED] supported_formats=0x3 yuv420_allowed=0
[01:52:27] === [PASSED] drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[01:52:27] [PASSED] drm_test_connector_hdmi_init_null_ddc
[01:52:27] [PASSED] drm_test_connector_hdmi_init_null_product
[01:52:27] [PASSED] drm_test_connector_hdmi_init_null_vendor
[01:52:27] [PASSED] drm_test_connector_hdmi_init_product_length_exact
[01:52:27] [PASSED] drm_test_connector_hdmi_init_product_length_too_long
[01:52:27] [PASSED] drm_test_connector_hdmi_init_product_valid
[01:52:27] [PASSED] drm_test_connector_hdmi_init_vendor_length_exact
[01:52:27] [PASSED] drm_test_connector_hdmi_init_vendor_length_too_long
[01:52:27] [PASSED] drm_test_connector_hdmi_init_vendor_valid
[01:52:27] ========= drm_test_connector_hdmi_init_type_valid  =========
[01:52:27] [PASSED] HDMI-A
[01:52:27] [PASSED] HDMI-B
[01:52:27] ===== [PASSED] drm_test_connector_hdmi_init_type_valid =====
[01:52:27] ======== drm_test_connector_hdmi_init_type_invalid  ========
[01:52:27] [PASSED] Unknown
[01:52:27] [PASSED] VGA
[01:52:27] [PASSED] DVI-I
[01:52:27] [PASSED] DVI-D
[01:52:27] [PASSED] DVI-A
[01:52:27] [PASSED] Composite
[01:52:27] [PASSED] SVIDEO
[01:52:27] [PASSED] LVDS
[01:52:27] [PASSED] Component
[01:52:27] [PASSED] DIN
[01:52:27] [PASSED] DP
[01:52:27] [PASSED] TV
[01:52:27] [PASSED] eDP
[01:52:27] [PASSED] Virtual
[01:52:27] [PASSED] DSI
[01:52:27] [PASSED] DPI
[01:52:27] [PASSED] Writeback
[01:52:27] [PASSED] SPI
[01:52:27] [PASSED] USB
[01:52:27] ==== [PASSED] drm_test_connector_hdmi_init_type_invalid ====
[01:52:27] ============ [PASSED] drmm_connector_hdmi_init =============
[01:52:27] ============= drmm_connector_init (3 subtests) =============
[01:52:27] [PASSED] drm_test_drmm_connector_init
[01:52:27] [PASSED] drm_test_drmm_connector_init_null_ddc
[01:52:27] ========= drm_test_drmm_connector_init_type_valid  =========
[01:52:27] [PASSED] Unknown
[01:52:27] [PASSED] VGA
[01:52:27] [PASSED] DVI-I
[01:52:27] [PASSED] DVI-D
[01:52:27] [PASSED] DVI-A
[01:52:27] [PASSED] Composite
[01:52:27] [PASSED] SVIDEO
[01:52:27] [PASSED] LVDS
[01:52:27] [PASSED] Component
[01:52:27] [PASSED] DIN
[01:52:27] [PASSED] DP
[01:52:27] [PASSED] HDMI-A
[01:52:27] [PASSED] HDMI-B
[01:52:27] [PASSED] TV
[01:52:27] [PASSED] eDP
[01:52:27] [PASSED] Virtual
[01:52:27] [PASSED] DSI
[01:52:27] [PASSED] DPI
[01:52:27] [PASSED] Writeback
[01:52:27] [PASSED] SPI
[01:52:27] [PASSED] USB
[01:52:27] ===== [PASSED] drm_test_drmm_connector_init_type_valid =====
[01:52:27] =============== [PASSED] drmm_connector_init ===============
[01:52:27] ========= drm_connector_dynamic_init (6 subtests) ==========
[01:52:27] [PASSED] drm_test_drm_connector_dynamic_init
[01:52:27] [PASSED] drm_test_drm_connector_dynamic_init_null_ddc
[01:52:27] [PASSED] drm_test_drm_connector_dynamic_init_not_added
[01:52:27] [PASSED] drm_test_drm_connector_dynamic_init_properties
[01:52:27] ===== drm_test_drm_connector_dynamic_init_type_valid  ======
[01:52:27] [PASSED] Unknown
[01:52:27] [PASSED] VGA
[01:52:27] [PASSED] DVI-I
[01:52:27] [PASSED] DVI-D
[01:52:27] [PASSED] DVI-A
[01:52:27] [PASSED] Composite
[01:52:27] [PASSED] SVIDEO
[01:52:27] [PASSED] LVDS
[01:52:27] [PASSED] Component
[01:52:27] [PASSED] DIN
[01:52:27] [PASSED] DP
[01:52:27] [PASSED] HDMI-A
[01:52:27] [PASSED] HDMI-B
[01:52:27] [PASSED] TV
[01:52:27] [PASSED] eDP
[01:52:27] [PASSED] Virtual
[01:52:27] [PASSED] DSI
[01:52:27] [PASSED] DPI
[01:52:27] [PASSED] Writeback
[01:52:27] [PASSED] SPI
[01:52:27] [PASSED] USB
[01:52:27] = [PASSED] drm_test_drm_connector_dynamic_init_type_valid ==
[01:52:27] ======== drm_test_drm_connector_dynamic_init_name  =========
[01:52:27] [PASSED] Unknown
[01:52:27] [PASSED] VGA
[01:52:27] [PASSED] DVI-I
[01:52:27] [PASSED] DVI-D
[01:52:27] [PASSED] DVI-A
[01:52:27] [PASSED] Composite
[01:52:27] [PASSED] SVIDEO
[01:52:27] [PASSED] LVDS
[01:52:27] [PASSED] Component
[01:52:27] [PASSED] DIN
[01:52:27] [PASSED] DP
[01:52:27] [PASSED] HDMI-A
[01:52:27] [PASSED] HDMI-B
[01:52:27] [PASSED] TV
[01:52:27] [PASSED] eDP
[01:52:27] [PASSED] Virtual
[01:52:27] [PASSED] DSI
[01:52:27] [PASSED] DPI
[01:52:27] [PASSED] Writeback
[01:52:27] [PASSED] SPI
[01:52:27] [PASSED] USB
[01:52:27] ==== [PASSED] drm_test_drm_connector_dynamic_init_name =====
[01:52:27] =========== [PASSED] drm_connector_dynamic_init ============
[01:52:27] ==== drm_connector_dynamic_register_early (4 subtests) =====
[01:52:27] [PASSED] drm_test_drm_connector_dynamic_register_early_on_list
[01:52:27] [PASSED] drm_test_drm_connector_dynamic_register_early_defer
[01:52:27] [PASSED] drm_test_drm_connector_dynamic_register_early_no_init
[01:52:27] [PASSED] drm_test_drm_connector_dynamic_register_early_no_mode_object
[01:52:27] ====== [PASSED] drm_connector_dynamic_register_early =======
[01:52:27] ======= drm_connector_dynamic_register (7 subtests) ========
[01:52:27] [PASSED] drm_test_drm_connector_dynamic_register_on_list
[01:52:27] [PASSED] drm_test_drm_connector_dynamic_register_no_defer
[01:52:27] [PASSED] drm_test_drm_connector_dynamic_register_no_init
[01:52:27] [PASSED] drm_test_drm_connector_dynamic_register_mode_object
[01:52:27] [PASSED] drm_test_drm_connector_dynamic_register_sysfs
[01:52:27] [PASSED] drm_test_drm_connector_dynamic_register_sysfs_name
[01:52:27] [PASSED] drm_test_drm_connector_dynamic_register_debugfs
[01:52:27] ========= [PASSED] drm_connector_dynamic_register ==========
[01:52:27] = drm_connector_attach_broadcast_rgb_property (2 subtests) =
[01:52:27] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property
[01:52:27] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property_hdmi_connector
[01:52:27] === [PASSED] drm_connector_attach_broadcast_rgb_property ===
[01:52:27] ========== drm_get_tv_mode_from_name (2 subtests) ==========
[01:52:27] ========== drm_test_get_tv_mode_from_name_valid  ===========
[01:52:27] [PASSED] NTSC
[01:52:27] [PASSED] NTSC-443
[01:52:27] [PASSED] NTSC-J
[01:52:27] [PASSED] PAL
[01:52:27] [PASSED] PAL-M
[01:52:27] [PASSED] PAL-N
[01:52:27] [PASSED] SECAM
[01:52:27] [PASSED] Mono
[01:52:27] ====== [PASSED] drm_test_get_tv_mode_from_name_valid =======
[01:52:27] [PASSED] drm_test_get_tv_mode_from_name_truncated
[01:52:27] ============ [PASSED] drm_get_tv_mode_from_name ============
[01:52:27] = drm_test_connector_hdmi_compute_mode_clock (12 subtests) =
[01:52:27] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb
[01:52:27] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc
[01:52:27] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc_vic_1
[01:52:27] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc
[01:52:27] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc_vic_1
[01:52:27] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_double
[01:52:27] = drm_test_connector_hdmi_compute_mode_clock_yuv420_valid  =
[01:52:27] [PASSED] VIC 96
[01:52:27] [PASSED] VIC 97
[01:52:27] [PASSED] VIC 101
[01:52:27] [PASSED] VIC 102
[01:52:27] [PASSED] VIC 106
[01:52:27] [PASSED] VIC 107
[01:52:27] === [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_valid ===
[01:52:27] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_10_bpc
[01:52:27] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_12_bpc
[01:52:27] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_8_bpc
[01:52:27] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_10_bpc
[01:52:27] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_12_bpc
[01:52:27] === [PASSED] drm_test_connector_hdmi_compute_mode_clock ====
[01:52:27] == drm_hdmi_connector_get_broadcast_rgb_name (2 subtests) ==
[01:52:27] === drm_test_drm_hdmi_connector_get_broadcast_rgb_name  ====
[01:52:27] [PASSED] Automatic
[01:52:27] [PASSED] Full
[01:52:27] [PASSED] Limited 16:235
[01:52:27] === [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name ===
[01:52:27] [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name_invalid
[01:52:27] ==== [PASSED] drm_hdmi_connector_get_broadcast_rgb_name ====
[01:52:27] == drm_hdmi_connector_get_output_format_name (2 subtests) ==
[01:52:27] === drm_test_drm_hdmi_connector_get_output_format_name  ====
[01:52:27] [PASSED] RGB
[01:52:27] [PASSED] YUV 4:2:0
[01:52:27] [PASSED] YUV 4:2:2
[01:52:27] [PASSED] YUV 4:4:4
[01:52:27] === [PASSED] drm_test_drm_hdmi_connector_get_output_format_name ===
[01:52:27] [PASSED] drm_test_drm_hdmi_connector_get_output_format_name_invalid
[01:52:27] ==== [PASSED] drm_hdmi_connector_get_output_format_name ====
[01:52:27] ============= drm_damage_helper (21 subtests) ==============
[01:52:27] [PASSED] drm_test_damage_iter_no_damage
[01:52:27] [PASSED] drm_test_damage_iter_no_damage_fractional_src
[01:52:27] [PASSED] drm_test_damage_iter_no_damage_src_moved
[01:52:27] [PASSED] drm_test_damage_iter_no_damage_fractional_src_moved
[01:52:27] [PASSED] drm_test_damage_iter_no_damage_not_visible
[01:52:27] [PASSED] drm_test_damage_iter_no_damage_no_crtc
[01:52:27] [PASSED] drm_test_damage_iter_no_damage_no_fb
[01:52:27] [PASSED] drm_test_damage_iter_simple_damage
[01:52:27] [PASSED] drm_test_damage_iter_single_damage
[01:52:27] [PASSED] drm_test_damage_iter_single_damage_intersect_src
[01:52:27] [PASSED] drm_test_damage_iter_single_damage_outside_src
[01:52:27] [PASSED] drm_test_damage_iter_single_damage_fractional_src
[01:52:27] [PASSED] drm_test_damage_iter_single_damage_intersect_fractional_src
[01:52:27] [PASSED] drm_test_damage_iter_single_damage_outside_fractional_src
[01:52:27] [PASSED] drm_test_damage_iter_single_damage_src_moved
[01:52:27] [PASSED] drm_test_damage_iter_single_damage_fractional_src_moved
[01:52:27] [PASSED] drm_test_damage_iter_damage
[01:52:27] [PASSED] drm_test_damage_iter_damage_one_intersect
[01:52:27] [PASSED] drm_test_damage_iter_damage_one_outside
[01:52:27] [PASSED] drm_test_damage_iter_damage_src_moved
[01:52:27] [PASSED] drm_test_damage_iter_damage_not_visible
[01:52:27] ================ [PASSED] drm_damage_helper ================
[01:52:27] ============== drm_dp_mst_helper (3 subtests) ==============
[01:52:27] ============== drm_test_dp_mst_calc_pbn_mode  ==============
[01:52:27] [PASSED] Clock 154000 BPP 30 DSC disabled
[01:52:27] [PASSED] Clock 234000 BPP 30 DSC disabled
[01:52:27] [PASSED] Clock 297000 BPP 24 DSC disabled
[01:52:27] [PASSED] Clock 332880 BPP 24 DSC enabled
[01:52:27] [PASSED] Clock 324540 BPP 24 DSC enabled
[01:52:27] ========== [PASSED] drm_test_dp_mst_calc_pbn_mode ==========
[01:52:27] ============== drm_test_dp_mst_calc_pbn_div  ===============
[01:52:27] [PASSED] Link rate 2000000 lane count 4
[01:52:27] [PASSED] Link rate 2000000 lane count 2
[01:52:27] [PASSED] Link rate 2000000 lane count 1
[01:52:27] [PASSED] Link rate 1350000 lane count 4
[01:52:27] [PASSED] Link rate 1350000 lane count 2
[01:52:27] [PASSED] Link rate 1350000 lane count 1
[01:52:27] [PASSED] Link rate 1000000 lane count 4
[01:52:27] [PASSED] Link rate 1000000 lane count 2
[01:52:27] [PASSED] Link rate 1000000 lane count 1
[01:52:27] [PASSED] Link rate 810000 lane count 4
[01:52:27] [PASSED] Link rate 810000 lane count 2
[01:52:27] [PASSED] Link rate 810000 lane count 1
[01:52:27] [PASSED] Link rate 540000 lane count 4
[01:52:27] [PASSED] Link rate 540000 lane count 2
[01:52:27] [PASSED] Link rate 540000 lane count 1
[01:52:27] [PASSED] Link rate 270000 lane count 4
[01:52:27] [PASSED] Link rate 270000 lane count 2
[01:52:27] [PASSED] Link rate 270000 lane count 1
[01:52:27] [PASSED] Link rate 162000 lane count 4
[01:52:27] [PASSED] Link rate 162000 lane count 2
[01:52:27] [PASSED] Link rate 162000 lane count 1
[01:52:27] ========== [PASSED] drm_test_dp_mst_calc_pbn_div ===========
[01:52:27] ========= drm_test_dp_mst_sideband_msg_req_decode  =========
[01:52:27] [PASSED] DP_ENUM_PATH_RESOURCES with port number
[01:52:27] [PASSED] DP_POWER_UP_PHY with port number
[01:52:27] [PASSED] DP_POWER_DOWN_PHY with port number
[01:52:27] [PASSED] DP_ALLOCATE_PAYLOAD with SDP stream sinks
[01:52:27] [PASSED] DP_ALLOCATE_PAYLOAD with port number
[01:52:27] [PASSED] DP_ALLOCATE_PAYLOAD with VCPI
[01:52:27] [PASSED] DP_ALLOCATE_PAYLOAD with PBN
[01:52:27] [PASSED] DP_QUERY_PAYLOAD with port number
[01:52:27] [PASSED] DP_QUERY_PAYLOAD with VCPI
[01:52:27] [PASSED] DP_REMOTE_DPCD_READ with port number
[01:52:27] [PASSED] DP_REMOTE_DPCD_READ with DPCD address
[01:52:27] [PASSED] DP_REMOTE_DPCD_READ with max number of bytes
[01:52:27] [PASSED] DP_REMOTE_DPCD_WRITE with port number
[01:52:27] [PASSED] DP_REMOTE_DPCD_WRITE with DPCD address
[01:52:27] [PASSED] DP_REMOTE_DPCD_WRITE with data array
[01:52:27] [PASSED] DP_REMOTE_I2C_READ with port number
[01:52:27] [PASSED] DP_REMOTE_I2C_READ with I2C device ID
[01:52:27] [PASSED] DP_REMOTE_I2C_READ with transactions array
[01:52:27] [PASSED] DP_REMOTE_I2C_WRITE with port number
[01:52:27] [PASSED] DP_REMOTE_I2C_WRITE with I2C device ID
[01:52:27] [PASSED] DP_REMOTE_I2C_WRITE with data array
[01:52:27] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream ID
[01:52:27] [PASSED] DP_QUERY_STREAM_ENC_STATUS with client ID
[01:52:27] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream event
[01:52:27] [PASSED] DP_QUERY_STREAM_ENC_STATUS with valid stream event
[01:52:27] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream behavior
[01:52:27] [PASSED] DP_QUERY_STREAM_ENC_STATUS with a valid stream behavior
[01:52:27] ===== [PASSED] drm_test_dp_mst_sideband_msg_req_decode =====
[01:52:27] ================ [PASSED] drm_dp_mst_helper ================
[01:52:27] ================== drm_exec (7 subtests) ===================
[01:52:27] [PASSED] sanitycheck
[01:52:27] [PASSED] test_lock
[01:52:27] [PASSED] test_lock_unlock
[01:52:27] [PASSED] test_duplicates
[01:52:27] [PASSED] test_prepare
[01:52:27] [PASSED] test_prepare_array
[01:52:27] [PASSED] test_multiple_loops
[01:52:27] ==================== [PASSED] drm_exec =====================
[01:52:27] =========== drm_format_helper_test (17 subtests) ===========
[01:52:27] ============== drm_test_fb_xrgb8888_to_gray8  ==============
[01:52:27] [PASSED] single_pixel_source_buffer
[01:52:27] [PASSED] single_pixel_clip_rectangle
[01:52:27] [PASSED] well_known_colors
[01:52:27] [PASSED] destination_pitch
[01:52:27] ========== [PASSED] drm_test_fb_xrgb8888_to_gray8 ==========
[01:52:27] ============= drm_test_fb_xrgb8888_to_rgb332  ==============
[01:52:27] [PASSED] single_pixel_source_buffer
[01:52:27] [PASSED] single_pixel_clip_rectangle
[01:52:27] [PASSED] well_known_colors
[01:52:27] [PASSED] destination_pitch
[01:52:27] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb332 ==========
[01:52:27] ============= drm_test_fb_xrgb8888_to_rgb565  ==============
[01:52:27] [PASSED] single_pixel_source_buffer
[01:52:27] [PASSED] single_pixel_clip_rectangle
[01:52:27] [PASSED] well_known_colors
[01:52:27] [PASSED] destination_pitch
[01:52:27] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb565 ==========
[01:52:27] ============ drm_test_fb_xrgb8888_to_xrgb1555  =============
[01:52:27] [PASSED] single_pixel_source_buffer
[01:52:27] [PASSED] single_pixel_clip_rectangle
[01:52:27] [PASSED] well_known_colors
[01:52:27] [PASSED] destination_pitch
[01:52:27] ======== [PASSED] drm_test_fb_xrgb8888_to_xrgb1555 =========
[01:52:27] ============ drm_test_fb_xrgb8888_to_argb1555  =============
[01:52:27] [PASSED] single_pixel_source_buffer
[01:52:27] [PASSED] single_pixel_clip_rectangle
[01:52:27] [PASSED] well_known_colors
[01:52:27] [PASSED] destination_pitch
[01:52:27] ======== [PASSED] drm_test_fb_xrgb8888_to_argb1555 =========
[01:52:27] ============ drm_test_fb_xrgb8888_to_rgba5551  =============
[01:52:27] [PASSED] single_pixel_source_buffer
[01:52:27] [PASSED] single_pixel_clip_rectangle
[01:52:27] [PASSED] well_known_colors
[01:52:27] [PASSED] destination_pitch
[01:52:27] ======== [PASSED] drm_test_fb_xrgb8888_to_rgba5551 =========
[01:52:27] ============= drm_test_fb_xrgb8888_to_rgb888  ==============
[01:52:27] [PASSED] single_pixel_source_buffer
[01:52:27] [PASSED] single_pixel_clip_rectangle
[01:52:27] [PASSED] well_known_colors
[01:52:27] [PASSED] destination_pitch
[01:52:27] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb888 ==========
[01:52:27] ============= drm_test_fb_xrgb8888_to_bgr888  ==============
[01:52:27] [PASSED] single_pixel_source_buffer
[01:52:27] [PASSED] single_pixel_clip_rectangle
[01:52:27] [PASSED] well_known_colors
[01:52:27] [PASSED] destination_pitch
[01:52:27] ========= [PASSED] drm_test_fb_xrgb8888_to_bgr888 ==========
[01:52:27] ============ drm_test_fb_xrgb8888_to_argb8888  =============
[01:52:27] [PASSED] single_pixel_source_buffer
[01:52:27] [PASSED] single_pixel_clip_rectangle
[01:52:27] [PASSED] well_known_colors
[01:52:27] [PASSED] destination_pitch
[01:52:27] ======== [PASSED] drm_test_fb_xrgb8888_to_argb8888 =========
[01:52:27] =========== drm_test_fb_xrgb8888_to_xrgb2101010  ===========
[01:52:27] [PASSED] single_pixel_source_buffer
[01:52:27] [PASSED] single_pixel_clip_rectangle
[01:52:27] [PASSED] well_known_colors
[01:52:27] [PASSED] destination_pitch
[01:52:27] ======= [PASSED] drm_test_fb_xrgb8888_to_xrgb2101010 =======
[01:52:27] =========== drm_test_fb_xrgb8888_to_argb2101010  ===========
[01:52:27] [PASSED] single_pixel_source_buffer
[01:52:27] [PASSED] single_pixel_clip_rectangle
[01:52:27] [PASSED] well_known_colors
[01:52:27] [PASSED] destination_pitch
[01:52:27] ======= [PASSED] drm_test_fb_xrgb8888_to_argb2101010 =======
[01:52:27] ============== drm_test_fb_xrgb8888_to_mono  ===============
[01:52:27] [PASSED] single_pixel_source_buffer
[01:52:27] [PASSED] single_pixel_clip_rectangle
[01:52:27] [PASSED] well_known_colors
[01:52:27] [PASSED] destination_pitch
[01:52:27] ========== [PASSED] drm_test_fb_xrgb8888_to_mono ===========
[01:52:27] ==================== drm_test_fb_swab  =====================
[01:52:27] [PASSED] single_pixel_source_buffer
[01:52:27] [PASSED] single_pixel_clip_rectangle
[01:52:27] [PASSED] well_known_colors
[01:52:27] [PASSED] destination_pitch
[01:52:27] ================ [PASSED] drm_test_fb_swab =================
[01:52:27] ============ drm_test_fb_xrgb8888_to_xbgr8888  =============
[01:52:27] [PASSED] single_pixel_source_buffer
[01:52:27] [PASSED] single_pixel_clip_rectangle
[01:52:27] [PASSED] well_known_colors
[01:52:27] [PASSED] destination_pitch
[01:52:27] ======== [PASSED] drm_test_fb_xrgb8888_to_xbgr8888 =========
[01:52:27] ============ drm_test_fb_xrgb8888_to_abgr8888  =============
[01:52:27] [PASSED] single_pixel_source_buffer
[01:52:27] [PASSED] single_pixel_clip_rectangle
[01:52:27] [PASSED] well_known_colors
[01:52:27] [PASSED] destination_pitch
[01:52:27] ======== [PASSED] drm_test_fb_xrgb8888_to_abgr8888 =========
[01:52:27] ================= drm_test_fb_clip_offset  =================
[01:52:27] [PASSED] pass through
[01:52:27] [PASSED] horizontal offset
[01:52:27] [PASSED] vertical offset
[01:52:27] [PASSED] horizontal and vertical offset
[01:52:27] [PASSED] horizontal offset (custom pitch)
[01:52:27] [PASSED] vertical offset (custom pitch)
[01:52:27] [PASSED] horizontal and vertical offset (custom pitch)
[01:52:27] ============= [PASSED] drm_test_fb_clip_offset =============
[01:52:27] =================== drm_test_fb_memcpy  ====================
[01:52:27] [PASSED] single_pixel_source_buffer: XR24 little-endian (0x34325258)
[01:52:27] [PASSED] single_pixel_source_buffer: XRA8 little-endian (0x38415258)
[01:52:27] [PASSED] single_pixel_source_buffer: YU24 little-endian (0x34325559)
[01:52:27] [PASSED] single_pixel_clip_rectangle: XB24 little-endian (0x34324258)
[01:52:27] [PASSED] single_pixel_clip_rectangle: XRA8 little-endian (0x38415258)
[01:52:27] [PASSED] single_pixel_clip_rectangle: YU24 little-endian (0x34325559)
[01:52:27] [PASSED] well_known_colors: XB24 little-endian (0x34324258)
[01:52:27] [PASSED] well_known_colors: XRA8 little-endian (0x38415258)
[01:52:27] [PASSED] well_known_colors: YU24 little-endian (0x34325559)
[01:52:27] [PASSED] destination_pitch: XB24 little-endian (0x34324258)
[01:52:27] [PASSED] destination_pitch: XRA8 little-endian (0x38415258)
[01:52:27] [PASSED] destination_pitch: YU24 little-endian (0x34325559)
[01:52:27] =============== [PASSED] drm_test_fb_memcpy ================
[01:52:27] ============= [PASSED] drm_format_helper_test ==============
[01:52:27] ================= drm_format (18 subtests) =================
[01:52:27] [PASSED] drm_test_format_block_width_invalid
[01:52:27] [PASSED] drm_test_format_block_width_one_plane
[01:52:27] [PASSED] drm_test_format_block_width_two_plane
[01:52:27] [PASSED] drm_test_format_block_width_three_plane
[01:52:27] [PASSED] drm_test_format_block_width_tiled
[01:52:27] [PASSED] drm_test_format_block_height_invalid
[01:52:27] [PASSED] drm_test_format_block_height_one_plane
[01:52:27] [PASSED] drm_test_format_block_height_two_plane
[01:52:27] [PASSED] drm_test_format_block_height_three_plane
[01:52:27] [PASSED] drm_test_format_block_height_tiled
[01:52:27] [PASSED] drm_test_format_min_pitch_invalid
[01:52:27] [PASSED] drm_test_format_min_pitch_one_plane_8bpp
[01:52:27] [PASSED] drm_test_format_min_pitch_one_plane_16bpp
[01:52:27] [PASSED] drm_test_format_min_pitch_one_plane_24bpp
[01:52:27] [PASSED] drm_test_format_min_pitch_one_plane_32bpp
[01:52:27] [PASSED] drm_test_format_min_pitch_two_plane
[01:52:27] [PASSED] drm_test_format_min_pitch_three_plane_8bpp
[01:52:27] [PASSED] drm_test_format_min_pitch_tiled
[01:52:27] =================== [PASSED] drm_format ====================
[01:52:27] ============== drm_framebuffer (10 subtests) ===============
[01:52:27] ========== drm_test_framebuffer_check_src_coords  ==========
[01:52:27] [PASSED] Success: source fits into fb
[01:52:27] [PASSED] Fail: overflowing fb with x-axis coordinate
[01:52:27] [PASSED] Fail: overflowing fb with y-axis coordinate
[01:52:27] [PASSED] Fail: overflowing fb with source width
[01:52:27] [PASSED] Fail: overflowing fb with source height
[01:52:27] ====== [PASSED] drm_test_framebuffer_check_src_coords ======
[01:52:27] [PASSED] drm_test_framebuffer_cleanup
[01:52:27] =============== drm_test_framebuffer_create  ===============
[01:52:27] [PASSED] ABGR8888 normal sizes
[01:52:27] [PASSED] ABGR8888 max sizes
[01:52:27] [PASSED] ABGR8888 pitch greater than min required
[01:52:27] [PASSED] ABGR8888 pitch less than min required
[01:52:27] [PASSED] ABGR8888 Invalid width
[01:52:27] [PASSED] ABGR8888 Invalid buffer handle
[01:52:27] [PASSED] No pixel format
[01:52:27] [PASSED] ABGR8888 Width 0
[01:52:27] [PASSED] ABGR8888 Height 0
[01:52:27] [PASSED] ABGR8888 Out of bound height * pitch combination
[01:52:27] [PASSED] ABGR8888 Large buffer offset
[01:52:27] [PASSED] ABGR8888 Buffer offset for inexistent plane
[01:52:27] [PASSED] ABGR8888 Invalid flag
[01:52:27] [PASSED] ABGR8888 Set DRM_MODE_FB_MODIFIERS without modifiers
[01:52:27] [PASSED] ABGR8888 Valid buffer modifier
[01:52:27] [PASSED] ABGR8888 Invalid buffer modifier(DRM_FORMAT_MOD_SAMSUNG_64_32_TILE)
[01:52:27] [PASSED] ABGR8888 Extra pitches without DRM_MODE_FB_MODIFIERS
[01:52:27] [PASSED] ABGR8888 Extra pitches with DRM_MODE_FB_MODIFIERS
[01:52:27] [PASSED] NV12 Normal sizes
[01:52:27] [PASSED] NV12 Max sizes
[01:52:27] [PASSED] NV12 Invalid pitch
[01:52:27] [PASSED] NV12 Invalid modifier/missing DRM_MODE_FB_MODIFIERS flag
[01:52:27] [PASSED] NV12 different  modifier per-plane
[01:52:27] [PASSED] NV12 with DRM_FORMAT_MOD_SAMSUNG_64_32_TILE
[01:52:27] [PASSED] NV12 Valid modifiers without DRM_MODE_FB_MODIFIERS
[01:52:27] [PASSED] NV12 Modifier for inexistent plane
[01:52:27] [PASSED] NV12 Handle for inexistent plane
[01:52:27] [PASSED] NV12 Handle for inexistent plane without DRM_MODE_FB_MODIFIERS
[01:52:27] [PASSED] YVU420 DRM_MODE_FB_MODIFIERS set without modifier
[01:52:27] [PASSED] YVU420 Normal sizes
[01:52:27] [PASSED] YVU420 Max sizes
[01:52:27] [PASSED] YVU420 Invalid pitch
[01:52:27] [PASSED] YVU420 Different pitches
[01:52:27] [PASSED] YVU420 Different buffer offsets/pitches
[01:52:27] [PASSED] YVU420 Modifier set just for plane 0, without DRM_MODE_FB_MODIFIERS
[01:52:27] [PASSED] YVU420 Modifier set just for planes 0, 1, without DRM_MODE_FB_MODIFIERS
[01:52:27] [PASSED] YVU420 Modifier set just for plane 0, 1, with DRM_MODE_FB_MODIFIERS
[01:52:27] [PASSED] YVU420 Valid modifier
[01:52:27] [PASSED] YVU420 Different modifiers per plane
[01:52:27] [PASSED] YVU420 Modifier for inexistent plane
[01:52:27] [PASSED] YUV420_10BIT Invalid modifier(DRM_FORMAT_MOD_LINEAR)
[01:52:27] [PASSED] X0L2 Normal sizes
[01:52:27] [PASSED] X0L2 Max sizes
[01:52:27] [PASSED] X0L2 Invalid pitch
[01:52:27] [PASSED] X0L2 Pitch greater than minimum required
[01:52:27] [PASSED] X0L2 Handle for inexistent plane
[01:52:27] [PASSED] X0L2 Offset for inexistent plane, without DRM_MODE_FB_MODIFIERS set
[01:52:27] [PASSED] X0L2 Modifier without DRM_MODE_FB_MODIFIERS set
[01:52:27] [PASSED] X0L2 Valid modifier
[01:52:27] [PASSED] X0L2 Modifier for inexistent plane
[01:52:27] =========== [PASSED] drm_test_framebuffer_create ===========
[01:52:27] [PASSED] drm_test_framebuffer_free
[01:52:27] [PASSED] drm_test_framebuffer_init
[01:52:27] [PASSED] drm_test_framebuffer_init_bad_format
[01:52:27] [PASSED] drm_test_framebuffer_init_dev_mismatch
[01:52:27] [PASSED] drm_test_framebuffer_lookup
[01:52:27] [PASSED] drm_test_framebuffer_lookup_inexistent
[01:52:27] [PASSED] drm_test_framebuffer_modifiers_not_supported
[01:52:27] ================= [PASSED] drm_framebuffer =================
[01:52:27] ================ drm_gem_shmem (8 subtests) ================
[01:52:27] [PASSED] drm_gem_shmem_test_obj_create
[01:52:27] [PASSED] drm_gem_shmem_test_obj_create_private
[01:52:27] [PASSED] drm_gem_shmem_test_pin_pages
[01:52:27] [PASSED] drm_gem_shmem_test_vmap
[01:52:27] [PASSED] drm_gem_shmem_test_get_pages_sgt
[01:52:27] [PASSED] drm_gem_shmem_test_get_sg_table
[01:52:27] [PASSED] drm_gem_shmem_test_madvise
[01:52:27] [PASSED] drm_gem_shmem_test_purge
[01:52:27] ================== [PASSED] drm_gem_shmem ==================
[01:52:27] === drm_atomic_helper_connector_hdmi_check (27 subtests) ===
[01:52:27] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode
[01:52:27] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode_vic_1
[01:52:27] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode
[01:52:27] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode_vic_1
[01:52:27] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode
[01:52:27] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode_vic_1
[01:52:27] ====== drm_test_check_broadcast_rgb_cea_mode_yuv420  =======
[01:52:27] [PASSED] Automatic
[01:52:27] [PASSED] Full
[01:52:27] [PASSED] Limited 16:235
[01:52:27] == [PASSED] drm_test_check_broadcast_rgb_cea_mode_yuv420 ===
[01:52:27] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_changed
[01:52:27] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_not_changed
[01:52:27] [PASSED] drm_test_check_disable_connector
[01:52:27] [PASSED] drm_test_check_hdmi_funcs_reject_rate
[01:52:27] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_rgb
[01:52:27] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_yuv420
[01:52:27] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv422
[01:52:27] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv420
[01:52:27] [PASSED] drm_test_check_driver_unsupported_fallback_yuv420
[01:52:27] [PASSED] drm_test_check_output_bpc_crtc_mode_changed
[01:52:27] [PASSED] drm_test_check_output_bpc_crtc_mode_not_changed
[01:52:27] [PASSED] drm_test_check_output_bpc_dvi
[01:52:27] [PASSED] drm_test_check_output_bpc_format_vic_1
[01:52:27] [PASSED] drm_test_check_output_bpc_format_display_8bpc_only
[01:52:27] [PASSED] drm_test_check_output_bpc_format_display_rgb_only
[01:52:27] [PASSED] drm_test_check_output_bpc_format_driver_8bpc_only
[01:52:27] [PASSED] drm_test_check_output_bpc_format_driver_rgb_only
[01:52:27] [PASSED] drm_test_check_tmds_char_rate_rgb_8bpc
[01:52:27] [PASSED] drm_test_check_tmds_char_rate_rgb_10bpc
[01:52:27] [PASSED] drm_test_check_tmds_char_rate_rgb_12bpc
[01:52:27] ===== [PASSED] drm_atomic_helper_connector_hdmi_check ======
[01:52:27] === drm_atomic_helper_connector_hdmi_reset (6 subtests) ====
[01:52:27] [PASSED] drm_test_check_broadcast_rgb_value
[01:52:27] [PASSED] drm_test_check_bpc_8_value
[01:52:27] [PASSED] drm_test_check_bpc_10_value
[01:52:27] [PASSED] drm_test_check_bpc_12_value
[01:52:27] [PASSED] drm_test_check_format_value
[01:52:27] [PASSED] drm_test_check_tmds_char_value
[01:52:27] ===== [PASSED] drm_atomic_helper_connector_hdmi_reset ======
[01:52:27] = drm_atomic_helper_connector_hdmi_mode_valid (4 subtests) =
[01:52:27] [PASSED] drm_test_check_mode_valid
[01:52:27] [PASSED] drm_test_check_mode_valid_reject
[01:52:27] [PASSED] drm_test_check_mode_valid_reject_rate
[01:52:27] [PASSED] drm_test_check_mode_valid_reject_max_clock
[01:52:27] === [PASSED] drm_atomic_helper_connector_hdmi_mode_valid ===
[01:52:27] ================= drm_managed (2 subtests) =================
[01:52:27] [PASSED] drm_test_managed_release_action
[01:52:27] [PASSED] drm_test_managed_run_action
[01:52:27] =================== [PASSED] drm_managed ===================
[01:52:27] =================== drm_mm (6 subtests) ====================
[01:52:27] [PASSED] drm_test_mm_init
[01:52:27] [PASSED] drm_test_mm_debug
[01:52:27] [PASSED] drm_test_mm_align32
[01:52:27] [PASSED] drm_test_mm_align64
[01:52:27] [PASSED] drm_test_mm_lowest
[01:52:27] [PASSED] drm_test_mm_highest
[01:52:27] ===================== [PASSED] drm_mm ======================
[01:52:27] ============= drm_modes_analog_tv (5 subtests) =============
[01:52:27] [PASSED] drm_test_modes_analog_tv_mono_576i
[01:52:27] [PASSED] drm_test_modes_analog_tv_ntsc_480i
[01:52:27] [PASSED] drm_test_modes_analog_tv_ntsc_480i_inlined
[01:52:27] [PASSED] drm_test_modes_analog_tv_pal_576i
[01:52:27] [PASSED] drm_test_modes_analog_tv_pal_576i_inlined
[01:52:27] =============== [PASSED] drm_modes_analog_tv ===============
[01:52:27] ============== drm_plane_helper (2 subtests) ===============
[01:52:27] =============== drm_test_check_plane_state  ================
[01:52:27] [PASSED] clipping_simple
[01:52:27] [PASSED] clipping_rotate_reflect
[01:52:27] [PASSED] positioning_simple
[01:52:27] [PASSED] upscaling
[01:52:27] [PASSED] downscaling
[01:52:27] [PASSED] rounding1
[01:52:27] [PASSED] rounding2
[01:52:27] [PASSED] rounding3
[01:52:27] [PASSED] rounding4
[01:52:27] =========== [PASSED] drm_test_check_plane_state ============
[01:52:27] =========== drm_test_check_invalid_plane_state  ============
[01:52:27] [PASSED] positioning_invalid
[01:52:27] [PASSED] upscaling_invalid
[01:52:27] [PASSED] downscaling_invalid
[01:52:27] ======= [PASSED] drm_test_check_invalid_plane_state ========
[01:52:27] ================ [PASSED] drm_plane_helper =================
[01:52:27] ====== drm_connector_helper_tv_get_modes (1 subtest) =======
[01:52:27] ====== drm_test_connector_helper_tv_get_modes_check  =======
[01:52:27] [PASSED] None
[01:52:27] [PASSED] PAL
[01:52:27] [PASSED] NTSC
[01:52:27] [PASSED] Both, NTSC Default
[01:52:27] [PASSED] Both, PAL Default
[01:52:27] [PASSED] Both, NTSC Default, with PAL on command-line
[01:52:27] [PASSED] Both, PAL Default, with NTSC on command-line
[01:52:27] == [PASSED] drm_test_connector_helper_tv_get_modes_check ===
[01:52:27] ======== [PASSED] drm_connector_helper_tv_get_modes ========
[01:52:27] ================== drm_rect (9 subtests) ===================
[01:52:27] [PASSED] drm_test_rect_clip_scaled_div_by_zero
[01:52:27] [PASSED] drm_test_rect_clip_scaled_not_clipped
[01:52:27] [PASSED] drm_test_rect_clip_scaled_clipped
[01:52:27] [PASSED] drm_test_rect_clip_scaled_signed_vs_unsigned
[01:52:27] ================= drm_test_rect_intersect  =================
[01:52:27] [PASSED] top-left x bottom-right: 2x2+1+1 x 2x2+0+0
[01:52:27] [PASSED] top-right x bottom-left: 2x2+0+0 x 2x2+1-1
[01:52:27] [PASSED] bottom-left x top-right: 2x2+1-1 x 2x2+0+0
[01:52:27] [PASSED] bottom-right x top-left: 2x2+0+0 x 2x2+1+1
[01:52:27] [PASSED] right x left: 2x1+0+0 x 3x1+1+0
[01:52:27] [PASSED] left x right: 3x1+1+0 x 2x1+0+0
[01:52:27] [PASSED] up x bottom: 1x2+0+0 x 1x3+0-1
[01:52:27] [PASSED] bottom x up: 1x3+0-1 x 1x2+0+0
[01:52:27] [PASSED] touching corner: 1x1+0+0 x 2x2+1+1
[01:52:27] [PASSED] touching side: 1x1+0+0 x 1x1+1+0
[01:52:27] [PASSED] equal rects: 2x2+0+0 x 2x2+0+0
[01:52:27] [PASSED] inside another: 2x2+0+0 x 1x1+1+1
[01:52:27] [PASSED] far away: 1x1+0+0 x 1x1+3+6
[01:52:27] [PASSED] points intersecting: 0x0+5+10 x 0x0+5+10
[01:52:27] [PASSED] points not intersecting: 0x0+0+0 x 0x0+5+10
[01:52:27] ============= [PASSED] drm_test_rect_intersect =============
[01:52:27] ================ drm_test_rect_calc_hscale  ================
[01:52:27] [PASSED] normal use
[01:52:27] [PASSED] out of max range
[01:52:27] [PASSED] out of min range
[01:52:27] [PASSED] zero dst
[01:52:27] [PASSED] negative src
[01:52:27] [PASSED] negative dst
[01:52:27] ============ [PASSED] drm_test_rect_calc_hscale ============
[01:52:27] ================ drm_test_rect_calc_vscale  ================
[01:52:27] [PASSED] normal use
stty: 'standard input': Inappropriate ioctl for device
[01:52:27] [PASSED] out of max range
[01:52:27] [PASSED] out of min range
[01:52:27] [PASSED] zero dst
[01:52:27] [PASSED] negative src
[01:52:27] [PASSED] negative dst
[01:52:27] ============ [PASSED] drm_test_rect_calc_vscale ============
[01:52:27] ================== drm_test_rect_rotate  ===================
[01:52:27] [PASSED] reflect-x
[01:52:27] [PASSED] reflect-y
[01:52:27] [PASSED] rotate-0
[01:52:27] [PASSED] rotate-90
[01:52:27] [PASSED] rotate-180
[01:52:27] [PASSED] rotate-270
[01:52:27] ============== [PASSED] drm_test_rect_rotate ===============
[01:52:27] ================ drm_test_rect_rotate_inv  =================
[01:52:27] [PASSED] reflect-x
[01:52:27] [PASSED] reflect-y
[01:52:27] [PASSED] rotate-0
[01:52:27] [PASSED] rotate-90
[01:52:27] [PASSED] rotate-180
[01:52:27] [PASSED] rotate-270
[01:52:27] ============ [PASSED] drm_test_rect_rotate_inv =============
[01:52:27] ==================== [PASSED] drm_rect =====================
[01:52:27] ============ drm_sysfb_modeset_test (1 subtest) ============
[01:52:27] ============ drm_test_sysfb_build_fourcc_list  =============
[01:52:27] [PASSED] no native formats
[01:52:27] [PASSED] XRGB8888 as native format
[01:52:27] [PASSED] remove duplicates
[01:52:27] [PASSED] convert alpha formats
[01:52:27] [PASSED] random formats
[01:52:27] ======== [PASSED] drm_test_sysfb_build_fourcc_list =========
[01:52:27] ============= [PASSED] drm_sysfb_modeset_test ==============
[01:52:27] ================== drm_fixp (2 subtests) ===================
[01:52:27] [PASSED] drm_test_int2fixp
[01:52:27] [PASSED] drm_test_sm2fixp
[01:52:27] ==================== [PASSED] drm_fixp =====================
[01:52:27] ============================================================
[01:52:27] Testing complete. Ran 624 tests: passed: 624
[01:52:27] Elapsed time: 27.159s total, 1.694s configuring, 25.048s building, 0.408s running

+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/ttm/tests/.kunitconfig
[01:52:27] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[01:52:29] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[01:52:38] Starting KUnit Kernel (1/1)...
[01:52:38] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[01:52:38] ================= ttm_device (5 subtests) ==================
[01:52:38] [PASSED] ttm_device_init_basic
[01:52:38] [PASSED] ttm_device_init_multiple
[01:52:38] [PASSED] ttm_device_fini_basic
[01:52:38] [PASSED] ttm_device_init_no_vma_man
[01:52:38] ================== ttm_device_init_pools  ==================
[01:52:38] [PASSED] No DMA allocations, no DMA32 required
[01:52:38] [PASSED] DMA allocations, DMA32 required
[01:52:38] [PASSED] No DMA allocations, DMA32 required
[01:52:38] [PASSED] DMA allocations, no DMA32 required
[01:52:38] ============== [PASSED] ttm_device_init_pools ==============
[01:52:38] =================== [PASSED] ttm_device ====================
[01:52:38] ================== ttm_pool (8 subtests) ===================
[01:52:38] ================== ttm_pool_alloc_basic  ===================
[01:52:38] [PASSED] One page
[01:52:38] [PASSED] More than one page
[01:52:38] [PASSED] Above the allocation limit
[01:52:38] [PASSED] One page, with coherent DMA mappings enabled
[01:52:38] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[01:52:38] ============== [PASSED] ttm_pool_alloc_basic ===============
[01:52:38] ============== ttm_pool_alloc_basic_dma_addr  ==============
[01:52:38] [PASSED] One page
[01:52:38] [PASSED] More than one page
[01:52:38] [PASSED] Above the allocation limit
[01:52:38] [PASSED] One page, with coherent DMA mappings enabled
[01:52:38] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[01:52:38] ========== [PASSED] ttm_pool_alloc_basic_dma_addr ==========
[01:52:38] [PASSED] ttm_pool_alloc_order_caching_match
[01:52:38] [PASSED] ttm_pool_alloc_caching_mismatch
[01:52:38] [PASSED] ttm_pool_alloc_order_mismatch
[01:52:38] [PASSED] ttm_pool_free_dma_alloc
[01:52:38] [PASSED] ttm_pool_free_no_dma_alloc
[01:52:38] [PASSED] ttm_pool_fini_basic
[01:52:38] ==================== [PASSED] ttm_pool =====================
[01:52:38] ================ ttm_resource (8 subtests) =================
[01:52:38] ================= ttm_resource_init_basic  =================
[01:52:38] [PASSED] Init resource in TTM_PL_SYSTEM
[01:52:38] [PASSED] Init resource in TTM_PL_VRAM
[01:52:38] [PASSED] Init resource in a private placement
[01:52:38] [PASSED] Init resource in TTM_PL_SYSTEM, set placement flags
[01:52:38] ============= [PASSED] ttm_resource_init_basic =============
[01:52:38] [PASSED] ttm_resource_init_pinned
[01:52:38] [PASSED] ttm_resource_fini_basic
[01:52:38] [PASSED] ttm_resource_manager_init_basic
[01:52:38] [PASSED] ttm_resource_manager_usage_basic
[01:52:38] [PASSED] ttm_resource_manager_set_used_basic
[01:52:38] [PASSED] ttm_sys_man_alloc_basic
[01:52:38] [PASSED] ttm_sys_man_free_basic
[01:52:38] ================== [PASSED] ttm_resource ===================
[01:52:38] =================== ttm_tt (15 subtests) ===================
[01:52:38] ==================== ttm_tt_init_basic  ====================
[01:52:38] [PASSED] Page-aligned size
[01:52:38] [PASSED] Extra pages requested
[01:52:38] ================ [PASSED] ttm_tt_init_basic ================
[01:52:38] [PASSED] ttm_tt_init_misaligned
[01:52:38] [PASSED] ttm_tt_fini_basic
[01:52:38] [PASSED] ttm_tt_fini_sg
[01:52:38] [PASSED] ttm_tt_fini_shmem
[01:52:38] [PASSED] ttm_tt_create_basic
[01:52:38] [PASSED] ttm_tt_create_invalid_bo_type
[01:52:38] [PASSED] ttm_tt_create_ttm_exists
[01:52:38] [PASSED] ttm_tt_create_failed
[01:52:38] [PASSED] ttm_tt_destroy_basic
[01:52:38] [PASSED] ttm_tt_populate_null_ttm
[01:52:38] [PASSED] ttm_tt_populate_populated_ttm
[01:52:38] [PASSED] ttm_tt_unpopulate_basic
[01:52:38] [PASSED] ttm_tt_unpopulate_empty_ttm
[01:52:38] [PASSED] ttm_tt_swapin_basic
[01:52:38] ===================== [PASSED] ttm_tt ======================
[01:52:38] =================== ttm_bo (14 subtests) ===================
[01:52:38] =========== ttm_bo_reserve_optimistic_no_ticket  ===========
[01:52:38] [PASSED] Cannot be interrupted and sleeps
[01:52:38] [PASSED] Cannot be interrupted, locks straight away
[01:52:38] [PASSED] Can be interrupted, sleeps
[01:52:38] ======= [PASSED] ttm_bo_reserve_optimistic_no_ticket =======
[01:52:38] [PASSED] ttm_bo_reserve_locked_no_sleep
[01:52:38] [PASSED] ttm_bo_reserve_no_wait_ticket
[01:52:38] [PASSED] ttm_bo_reserve_double_resv
[01:52:38] [PASSED] ttm_bo_reserve_interrupted
[01:52:38] [PASSED] ttm_bo_reserve_deadlock
[01:52:38] [PASSED] ttm_bo_unreserve_basic
[01:52:38] [PASSED] ttm_bo_unreserve_pinned
[01:52:38] [PASSED] ttm_bo_unreserve_bulk
[01:52:38] [PASSED] ttm_bo_fini_basic
[01:52:38] [PASSED] ttm_bo_fini_shared_resv
[01:52:38] [PASSED] ttm_bo_pin_basic
[01:52:38] [PASSED] ttm_bo_pin_unpin_resource
[01:52:38] [PASSED] ttm_bo_multiple_pin_one_unpin
[01:52:38] ===================== [PASSED] ttm_bo ======================
[01:52:38] ============== ttm_bo_validate (21 subtests) ===============
[01:52:38] ============== ttm_bo_init_reserved_sys_man  ===============
[01:52:38] [PASSED] Buffer object for userspace
[01:52:38] [PASSED] Kernel buffer object
[01:52:38] [PASSED] Shared buffer object
[01:52:38] ========== [PASSED] ttm_bo_init_reserved_sys_man ===========
[01:52:38] ============== ttm_bo_init_reserved_mock_man  ==============
[01:52:38] [PASSED] Buffer object for userspace
[01:52:38] [PASSED] Kernel buffer object
[01:52:38] [PASSED] Shared buffer object
[01:52:38] ========== [PASSED] ttm_bo_init_reserved_mock_man ==========
[01:52:38] [PASSED] ttm_bo_init_reserved_resv
[01:52:38] ================== ttm_bo_validate_basic  ==================
[01:52:38] [PASSED] Buffer object for userspace
[01:52:38] [PASSED] Kernel buffer object
[01:52:38] [PASSED] Shared buffer object
[01:52:38] ============== [PASSED] ttm_bo_validate_basic ==============
[01:52:38] [PASSED] ttm_bo_validate_invalid_placement
[01:52:38] ============= ttm_bo_validate_same_placement  ==============
[01:52:38] [PASSED] System manager
[01:52:38] [PASSED] VRAM manager
[01:52:38] ========= [PASSED] ttm_bo_validate_same_placement ==========
[01:52:38] [PASSED] ttm_bo_validate_failed_alloc
[01:52:38] [PASSED] ttm_bo_validate_pinned
[01:52:38] [PASSED] ttm_bo_validate_busy_placement
[01:52:38] ================ ttm_bo_validate_multihop  =================
[01:52:38] [PASSED] Buffer object for userspace
[01:52:38] [PASSED] Kernel buffer object
[01:52:38] [PASSED] Shared buffer object
[01:52:38] ============ [PASSED] ttm_bo_validate_multihop =============
[01:52:38] ========== ttm_bo_validate_no_placement_signaled  ==========
[01:52:38] [PASSED] Buffer object in system domain, no page vector
[01:52:38] [PASSED] Buffer object in system domain with an existing page vector
[01:52:38] ====== [PASSED] ttm_bo_validate_no_placement_signaled ======
[01:52:38] ======== ttm_bo_validate_no_placement_not_signaled  ========
[01:52:38] [PASSED] Buffer object for userspace
[01:52:38] [PASSED] Kernel buffer object
[01:52:38] [PASSED] Shared buffer object
[01:52:38] ==== [PASSED] ttm_bo_validate_no_placement_not_signaled ====
[01:52:38] [PASSED] ttm_bo_validate_move_fence_signaled
[01:52:38] ========= ttm_bo_validate_move_fence_not_signaled  =========
[01:52:38] [PASSED] Waits for GPU
[01:52:38] [PASSED] Tries to lock straight away
[01:52:38] ===== [PASSED] ttm_bo_validate_move_fence_not_signaled =====
[01:52:38] [PASSED] ttm_bo_validate_happy_evict
[01:52:38] [PASSED] ttm_bo_validate_all_pinned_evict
[01:52:38] [PASSED] ttm_bo_validate_allowed_only_evict
[01:52:38] [PASSED] ttm_bo_validate_deleted_evict
[01:52:38] [PASSED] ttm_bo_validate_busy_domain_evict
[01:52:38] [PASSED] ttm_bo_validate_evict_gutting
[01:52:38] [PASSED] ttm_bo_validate_recrusive_evict
stty: 'standard input': Inappropriate ioctl for device
[01:52:38] ================= [PASSED] ttm_bo_validate =================
[01:52:38] ============================================================
[01:52:38] Testing complete. Ran 101 tests: passed: 101
[01:52:38] Elapsed time: 11.153s total, 1.655s configuring, 9.282s building, 0.180s running

+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel



^ permalink raw reply	[flat|nested] 44+ messages in thread

* ✗ Xe.CI.BAT: failure for Introduce SRIOV scheduler groups
  2025-11-27  1:45 [PATCH 00/10] Introduce SRIOV scheduler groups Daniele Ceraolo Spurio
                   ` (11 preceding siblings ...)
  2025-11-27  1:52 ` ✓ CI.KUnit: success " Patchwork
@ 2025-11-27  2:36 ` Patchwork
  2025-11-27  3:18 ` ✗ Xe.CI.Full: " Patchwork
  13 siblings, 0 replies; 44+ messages in thread
From: Patchwork @ 2025-11-27  2:36 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio; +Cc: intel-xe

[-- Attachment #1: Type: text/plain, Size: 2786 bytes --]

== Series Details ==

Series: Introduce SRIOV scheduler groups
URL   : https://patchwork.freedesktop.org/series/158142/
State : failure

== Summary ==

CI Bug Log - changes from xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884_BAT -> xe-pw-158142v1_BAT
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with xe-pw-158142v1_BAT absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in xe-pw-158142v1_BAT, please notify your bug team (I915-ci-infra@lists.freedesktop.org) to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (12 -> 11)
------------------------------

  Missing    (1): bat-adlp-vm 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in xe-pw-158142v1_BAT:

### IGT changes ###

#### Possible regressions ####

  * igt@xe_module_load@load:
    - bat-adlp-7:         [PASS][1] -> [ABORT][2]
   [1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/bat-adlp-7/igt@xe_module_load@load.html
   [2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/bat-adlp-7/igt@xe_module_load@load.html

  
Known issues
------------

  Here are the changes found in xe-pw-158142v1_BAT that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@xe_waitfence@abstime:
    - bat-dg2-oem2:       [PASS][3] -> [TIMEOUT][4] ([Intel XE#6506])
   [3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/bat-dg2-oem2/igt@xe_waitfence@abstime.html
   [4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/bat-dg2-oem2/igt@xe_waitfence@abstime.html

  
#### Possible fixes ####

  * igt@xe_waitfence@engine:
    - bat-dg2-oem2:       [FAIL][5] ([Intel XE#6519]) -> [PASS][6]
   [5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/bat-dg2-oem2/igt@xe_waitfence@engine.html
   [6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/bat-dg2-oem2/igt@xe_waitfence@engine.html

  
  [Intel XE#6506]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6506
  [Intel XE#6519]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6519


Build changes
-------------

  * Linux: xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884 -> xe-pw-158142v1

  IGT_8639: 2ce563031e6b2ec91479f6af8c326d25c15bdb26 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884: e7a767430515c3a6e8aee91c2a68cba8b06fe884
  xe-pw-158142v1: 158142v1

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/index.html

[-- Attachment #2: Type: text/html, Size: 3414 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* ✗ Xe.CI.Full: failure for Introduce SRIOV scheduler groups
  2025-11-27  1:45 [PATCH 00/10] Introduce SRIOV scheduler groups Daniele Ceraolo Spurio
                   ` (12 preceding siblings ...)
  2025-11-27  2:36 ` ✗ Xe.CI.BAT: failure " Patchwork
@ 2025-11-27  3:18 ` Patchwork
  2025-12-01 17:46   ` Daniele Ceraolo Spurio
  13 siblings, 1 reply; 44+ messages in thread
From: Patchwork @ 2025-11-27  3:18 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio; +Cc: intel-xe

[-- Attachment #1: Type: text/plain, Size: 29067 bytes --]

== Series Details ==

Series: Introduce SRIOV scheduler groups
URL   : https://patchwork.freedesktop.org/series/158142/
State : failure

== Summary ==

CI Bug Log - changes from xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884_FULL -> xe-pw-158142v1_FULL
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with xe-pw-158142v1_FULL absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in xe-pw-158142v1_FULL, please notify your bug team (I915-ci-infra@lists.freedesktop.org) to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (4 -> 4)
------------------------------

  No changes in participating hosts

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in xe-pw-158142v1_FULL:

### IGT changes ###

#### Possible regressions ####

  * igt@xe_module_load@load:
    - shard-adlp:         ([PASS][1], [PASS][2], [PASS][3], [PASS][4], [PASS][5], [PASS][6], [PASS][7], [PASS][8], [PASS][9], [PASS][10], [PASS][11], [PASS][12], [PASS][13], [PASS][14], [PASS][15], [PASS][16], [PASS][17], [PASS][18], [PASS][19], [PASS][20], [PASS][21], [PASS][22], [PASS][23], [PASS][24], [PASS][25]) -> ([ABORT][26], [ABORT][27], [ABORT][28], [ABORT][29], [ABORT][30], [ABORT][31], [ABORT][32], [ABORT][33], [ABORT][34], [ABORT][35], [ABORT][36], [ABORT][37], [ABORT][38], [ABORT][39], [ABORT][40], [ABORT][41], [ABORT][42], [ABORT][43], [ABORT][44], [ABORT][45], [ABORT][46], [ABORT][47])
   [1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-4/igt@xe_module_load@load.html
   [2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-4/igt@xe_module_load@load.html
   [3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-4/igt@xe_module_load@load.html
   [4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-3/igt@xe_module_load@load.html
   [5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-4/igt@xe_module_load@load.html
   [6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-1/igt@xe_module_load@load.html
   [7]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-1/igt@xe_module_load@load.html
   [8]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-1/igt@xe_module_load@load.html
   [9]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-1/igt@xe_module_load@load.html
   [10]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-9/igt@xe_module_load@load.html
   [11]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-6/igt@xe_module_load@load.html
   [12]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-9/igt@xe_module_load@load.html
   [13]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-8/igt@xe_module_load@load.html
   [14]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-8/igt@xe_module_load@load.html
   [15]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-6/igt@xe_module_load@load.html
   [16]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-8/igt@xe_module_load@load.html
   [17]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-9/igt@xe_module_load@load.html
   [18]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-9/igt@xe_module_load@load.html
   [19]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-6/igt@xe_module_load@load.html
   [20]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-2/igt@xe_module_load@load.html
   [21]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-2/igt@xe_module_load@load.html
   [22]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-2/igt@xe_module_load@load.html
   [23]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-3/igt@xe_module_load@load.html
   [24]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-3/igt@xe_module_load@load.html
   [25]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-2/igt@xe_module_load@load.html
   [26]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-6/igt@xe_module_load@load.html
   [27]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-6/igt@xe_module_load@load.html
   [28]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-6/igt@xe_module_load@load.html
   [29]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-6/igt@xe_module_load@load.html
   [30]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-2/igt@xe_module_load@load.html
   [31]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-2/igt@xe_module_load@load.html
   [32]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-2/igt@xe_module_load@load.html
   [33]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-2/igt@xe_module_load@load.html
   [34]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-3/igt@xe_module_load@load.html
   [35]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-3/igt@xe_module_load@load.html
   [36]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-3/igt@xe_module_load@load.html
   [37]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-1/igt@xe_module_load@load.html
   [38]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-1/igt@xe_module_load@load.html
   [39]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-1/igt@xe_module_load@load.html
   [40]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-4/igt@xe_module_load@load.html
   [41]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-4/igt@xe_module_load@load.html
   [42]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-4/igt@xe_module_load@load.html
   [43]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-8/igt@xe_module_load@load.html
   [44]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-8/igt@xe_module_load@load.html
   [45]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-8/igt@xe_module_load@load.html
   [46]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-9/igt@xe_module_load@load.html
   [47]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-9/igt@xe_module_load@load.html
    - shard-dg2-set2:     ([PASS][48], [PASS][49], [PASS][50], [PASS][51], [PASS][52], [PASS][53], [PASS][54], [PASS][55], [PASS][56], [PASS][57], [PASS][58], [PASS][59], [PASS][60], [PASS][61], [PASS][62], [PASS][63], [PASS][64], [PASS][65], [PASS][66], [PASS][67], [PASS][68], [PASS][69], [PASS][70], [PASS][71]) -> ([ABORT][72], [ABORT][73], [ABORT][74], [ABORT][75], [ABORT][76], [ABORT][77], [ABORT][78], [ABORT][79], [ABORT][80], [ABORT][81], [ABORT][82], [ABORT][83], [ABORT][84], [ABORT][85], [ABORT][86], [ABORT][87], [ABORT][88], [ABORT][89], [ABORT][90], [ABORT][91], [ABORT][92], [ABORT][93], [ABORT][94])
   [48]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-435/igt@xe_module_load@load.html
   [49]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-466/igt@xe_module_load@load.html
   [50]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-463/igt@xe_module_load@load.html
   [51]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-435/igt@xe_module_load@load.html
   [52]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-463/igt@xe_module_load@load.html
   [53]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-463/igt@xe_module_load@load.html
   [54]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-432/igt@xe_module_load@load.html
   [55]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-435/igt@xe_module_load@load.html
   [56]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-466/igt@xe_module_load@load.html
   [57]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-432/igt@xe_module_load@load.html
   [58]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-432/igt@xe_module_load@load.html
   [59]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-466/igt@xe_module_load@load.html
   [60]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-436/igt@xe_module_load@load.html
   [61]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-436/igt@xe_module_load@load.html
   [62]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-436/igt@xe_module_load@load.html
   [63]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-436/igt@xe_module_load@load.html
   [64]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-464/igt@xe_module_load@load.html
   [65]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-464/igt@xe_module_load@load.html
   [66]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-433/igt@xe_module_load@load.html
   [67]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-433/igt@xe_module_load@load.html
   [68]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-464/igt@xe_module_load@load.html
   [69]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-433/igt@xe_module_load@load.html
   [70]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-434/igt@xe_module_load@load.html
   [71]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-434/igt@xe_module_load@load.html
   [72]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-432/igt@xe_module_load@load.html
   [73]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-432/igt@xe_module_load@load.html
   [74]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-432/igt@xe_module_load@load.html
   [75]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-432/igt@xe_module_load@load.html
   [76]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-436/igt@xe_module_load@load.html
   [77]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-436/igt@xe_module_load@load.html
   [78]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-433/igt@xe_module_load@load.html
   [79]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-433/igt@xe_module_load@load.html
   [80]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-433/igt@xe_module_load@load.html
   [81]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-464/igt@xe_module_load@load.html
   [82]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-464/igt@xe_module_load@load.html
   [83]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-464/igt@xe_module_load@load.html
   [84]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-434/igt@xe_module_load@load.html
   [85]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-434/igt@xe_module_load@load.html
   [86]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-434/igt@xe_module_load@load.html
   [87]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-435/igt@xe_module_load@load.html
   [88]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-435/igt@xe_module_load@load.html
   [89]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-435/igt@xe_module_load@load.html
   [90]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-466/igt@xe_module_load@load.html
   [91]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-466/igt@xe_module_load@load.html
   [92]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-466/igt@xe_module_load@load.html
   [93]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-463/igt@xe_module_load@load.html
   [94]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-463/igt@xe_module_load@load.html

  
New tests
---------

  New tests have been introduced between xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884_FULL and xe-pw-158142v1_FULL:

### New IGT tests (1) ###

  * igt@xe_gpgpu_fill@offset-16x16:
    - Statuses : 2 pass(s)
    - Exec time: [0.00] s

  

Known issues
------------

  Here are the changes found in xe-pw-158142v1_FULL that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-0:
    - shard-bmg:          NOTRUN -> [SKIP][95] ([Intel XE#1124])
   [95]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-1/igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-0.html

  * igt@kms_cdclk@mode-transition:
    - shard-bmg:          NOTRUN -> [SKIP][96] ([Intel XE#2724])
   [96]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-1/igt@kms_cdclk@mode-transition.html

  * igt@kms_chamelium_hpd@hdmi-hpd-enable-disable-mode:
    - shard-bmg:          NOTRUN -> [SKIP][97] ([Intel XE#2252])
   [97]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-1/igt@kms_chamelium_hpd@hdmi-hpd-enable-disable-mode.html

  * igt@kms_cursor_legacy@2x-long-nonblocking-modeset-vs-cursor-atomic:
    - shard-bmg:          [PASS][98] -> [SKIP][99] ([Intel XE#2291])
   [98]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-7/igt@kms_cursor_legacy@2x-long-nonblocking-modeset-vs-cursor-atomic.html
   [99]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-6/igt@kms_cursor_legacy@2x-long-nonblocking-modeset-vs-cursor-atomic.html

  * igt@kms_flip@2x-flip-vs-rmfb:
    - shard-bmg:          [PASS][100] -> [SKIP][101] ([Intel XE#2316])
   [100]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-7/igt@kms_flip@2x-flip-vs-rmfb.html
   [101]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-6/igt@kms_flip@2x-flip-vs-rmfb.html

  * igt@kms_frontbuffer_tracking@fbcdrrs-2p-scndscrn-indfb-pgflip-blt:
    - shard-bmg:          NOTRUN -> [SKIP][102] ([Intel XE#2311])
   [102]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-1/igt@kms_frontbuffer_tracking@fbcdrrs-2p-scndscrn-indfb-pgflip-blt.html

  * igt@kms_frontbuffer_tracking@psr-2p-scndscrn-shrfb-plflip-blt:
    - shard-bmg:          NOTRUN -> [SKIP][103] ([Intel XE#2313]) +1 other test skip
   [103]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-1/igt@kms_frontbuffer_tracking@psr-2p-scndscrn-shrfb-plflip-blt.html

  * igt@kms_hdr@bpc-switch-dpms:
    - shard-bmg:          [PASS][104] -> [ABORT][105] ([Intel XE#6662]) +1 other test abort
   [104]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-4/igt@kms_hdr@bpc-switch-dpms.html
   [105]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-7/igt@kms_hdr@bpc-switch-dpms.html

  * igt@kms_plane@plane-panning-bottom-right-suspend@pipe-b:
    - shard-bmg:          [PASS][106] -> [ABORT][107] ([Intel XE#6675]) +1 other test abort
   [106]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-6/igt@kms_plane@plane-panning-bottom-right-suspend@pipe-b.html
   [107]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-2/igt@kms_plane@plane-panning-bottom-right-suspend@pipe-b.html

  * igt@kms_psr2_sf@fbc-pr-overlay-primary-update-sf-dmg-area:
    - shard-bmg:          NOTRUN -> [SKIP][108] ([Intel XE#1406] / [Intel XE#1489]) +1 other test skip
   [108]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-1/igt@kms_psr2_sf@fbc-pr-overlay-primary-update-sf-dmg-area.html

  * igt@kms_vblank@ts-continuation-suspend:
    - shard-bmg:          NOTRUN -> [ABORT][109] ([Intel XE#6675]) +3 other tests abort
   [109]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-1/igt@kms_vblank@ts-continuation-suspend.html

  * igt@xe_exec_system_allocator@many-stride-malloc-prefetch:
    - shard-bmg:          [PASS][110] -> [WARN][111] ([Intel XE#5786])
   [110]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-8/igt@xe_exec_system_allocator@many-stride-malloc-prefetch.html
   [111]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-8/igt@xe_exec_system_allocator@many-stride-malloc-prefetch.html

  * igt@xe_exec_system_allocator@process-many-free-madvise:
    - shard-bmg:          [PASS][112] -> [ABORT][113] ([Intel XE#3970])
   [112]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-8/igt@xe_exec_system_allocator@process-many-free-madvise.html
   [113]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-3/igt@xe_exec_system_allocator@process-many-free-madvise.html

  * igt@xe_pmu@engine-activity-accuracy-50:
    - shard-lnl:          [PASS][114] -> [FAIL][115] ([Intel XE#6251]) +2 other tests fail
   [114]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-lnl-1/igt@xe_pmu@engine-activity-accuracy-50.html
   [115]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-lnl-2/igt@xe_pmu@engine-activity-accuracy-50.html

  
#### Possible fixes ####

  * igt@kms_cursor_edge_walk@256x256-left-edge:
    - shard-bmg:          [FAIL][116] -> [PASS][117]
   [116]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-2/igt@kms_cursor_edge_walk@256x256-left-edge.html
   [117]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-4/igt@kms_cursor_edge_walk@256x256-left-edge.html

  * igt@kms_cursor_legacy@cursorb-vs-flipb-toggle:
    - shard-bmg:          [SKIP][118] ([Intel XE#2291]) -> [PASS][119] +3 other tests pass
   [118]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-6/igt@kms_cursor_legacy@cursorb-vs-flipb-toggle.html
   [119]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-5/igt@kms_cursor_legacy@cursorb-vs-flipb-toggle.html

  * igt@kms_cursor_legacy@flip-vs-cursor-legacy:
    - shard-bmg:          [FAIL][120] ([Intel XE#5299]) -> [PASS][121]
   [120]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-8/igt@kms_cursor_legacy@flip-vs-cursor-legacy.html
   [121]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-4/igt@kms_cursor_legacy@flip-vs-cursor-legacy.html

  * igt@kms_flip@2x-flip-vs-dpms-on-nop-interruptible:
    - shard-bmg:          [SKIP][122] ([Intel XE#2316]) -> [PASS][123] +1 other test pass
   [122]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-6/igt@kms_flip@2x-flip-vs-dpms-on-nop-interruptible.html
   [123]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-5/igt@kms_flip@2x-flip-vs-dpms-on-nop-interruptible.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@c-edp1:
    - shard-lnl:          [FAIL][124] ([Intel XE#301] / [Intel XE#3149]) -> [PASS][125]
   [124]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-lnl-5/igt@kms_flip@flip-vs-expired-vblank-interruptible@c-edp1.html
   [125]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-lnl-5/igt@kms_flip@flip-vs-expired-vblank-interruptible@c-edp1.html

  * igt@kms_plane@plane-panning-bottom-right-suspend@pipe-a:
    - shard-bmg:          [ABORT][126] ([Intel XE#6675]) -> [PASS][127]
   [126]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-6/igt@kms_plane@plane-panning-bottom-right-suspend@pipe-a.html
   [127]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-2/igt@kms_plane@plane-panning-bottom-right-suspend@pipe-a.html

  * igt@kms_vblank@ts-continuation-dpms-suspend@pipe-c-edp-1:
    - shard-lnl:          [INCOMPLETE][128] ([Intel XE#4488]) -> [PASS][129]
   [128]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-lnl-4/igt@kms_vblank@ts-continuation-dpms-suspend@pipe-c-edp-1.html
   [129]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-lnl-4/igt@kms_vblank@ts-continuation-dpms-suspend@pipe-c-edp-1.html

  * igt@xe_exec_system_allocator@processes-evict-malloc:
    - shard-bmg:          [ABORT][130] ([Intel XE#3970]) -> [PASS][131]
   [130]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-5/igt@xe_exec_system_allocator@processes-evict-malloc.html
   [131]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-1/igt@xe_exec_system_allocator@processes-evict-malloc.html

  
#### Warnings ####

  * igt@kms_flip@flip-vs-expired-vblank-interruptible:
    - shard-lnl:          [FAIL][132] ([Intel XE#301] / [Intel XE#3149]) -> [FAIL][133] ([Intel XE#301])
   [132]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-lnl-5/igt@kms_flip@flip-vs-expired-vblank-interruptible.html
   [133]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-lnl-5/igt@kms_flip@flip-vs-expired-vblank-interruptible.html

  * igt@kms_frontbuffer_tracking@drrs-2p-primscrn-pri-indfb-draw-mmap-wc:
    - shard-bmg:          [SKIP][134] ([Intel XE#2312]) -> [SKIP][135] ([Intel XE#2311]) +7 other tests skip
   [134]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-2/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-pri-indfb-draw-mmap-wc.html
   [135]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-4/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-pri-indfb-draw-mmap-wc.html

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-shrfb-draw-blt:
    - shard-bmg:          [SKIP][136] ([Intel XE#2312]) -> [SKIP][137] ([Intel XE#4141])
   [136]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-2/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-shrfb-draw-blt.html
   [137]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-4/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-shrfb-draw-blt.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-pri-indfb-draw-blt:
    - shard-bmg:          [SKIP][138] ([Intel XE#4141]) -> [SKIP][139] ([Intel XE#2312])
   [138]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-7/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-pri-indfb-draw-blt.html
   [139]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-6/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-pri-indfb-draw-blt.html

  * igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-spr-indfb-fullscreen:
    - shard-bmg:          [SKIP][140] ([Intel XE#2311]) -> [SKIP][141] ([Intel XE#2312]) +3 other tests skip
   [140]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-7/igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-spr-indfb-fullscreen.html
   [141]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-6/igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-spr-indfb-fullscreen.html

  * igt@kms_frontbuffer_tracking@psr-2p-primscrn-cur-indfb-draw-render:
    - shard-bmg:          [SKIP][142] ([Intel XE#2312]) -> [SKIP][143] ([Intel XE#2313]) +6 other tests skip
   [142]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-6/igt@kms_frontbuffer_tracking@psr-2p-primscrn-cur-indfb-draw-render.html
   [143]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-5/igt@kms_frontbuffer_tracking@psr-2p-primscrn-cur-indfb-draw-render.html

  * igt@kms_frontbuffer_tracking@psr-2p-scndscrn-pri-indfb-draw-mmap-wc:
    - shard-bmg:          [SKIP][144] ([Intel XE#2313]) -> [SKIP][145] ([Intel XE#2312]) +3 other tests skip
   [144]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-7/igt@kms_frontbuffer_tracking@psr-2p-scndscrn-pri-indfb-draw-mmap-wc.html
   [145]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-6/igt@kms_frontbuffer_tracking@psr-2p-scndscrn-pri-indfb-draw-mmap-wc.html

  * igt@kms_tiled_display@basic-test-pattern-with-chamelium:
    - shard-bmg:          [SKIP][146] ([Intel XE#2509]) -> [SKIP][147] ([Intel XE#2426])
   [146]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-4/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html
   [147]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-8/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html

  
  [Intel XE#1124]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1124
  [Intel XE#1406]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1406
  [Intel XE#1489]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1489
  [Intel XE#2252]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2252
  [Intel XE#2291]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2291
  [Intel XE#2311]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2311
  [Intel XE#2312]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2312
  [Intel XE#2313]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2313
  [Intel XE#2316]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2316
  [Intel XE#2426]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2426
  [Intel XE#2509]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2509
  [Intel XE#2724]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2724
  [Intel XE#301]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/301
  [Intel XE#3149]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3149
  [Intel XE#3970]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3970
  [Intel XE#4141]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4141
  [Intel XE#4488]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4488
  [Intel XE#5299]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5299
  [Intel XE#5786]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5786
  [Intel XE#6251]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6251
  [Intel XE#6662]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6662
  [Intel XE#6675]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6675


Build changes
-------------

  * Linux: xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884 -> xe-pw-158142v1

  IGT_8639: 2ce563031e6b2ec91479f6af8c326d25c15bdb26 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884: e7a767430515c3a6e8aee91c2a68cba8b06fe884
  xe-pw-158142v1: 158142v1

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/index.html

[-- Attachment #2: Type: text/html, Size: 31722 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: ✗ Xe.CI.Full: failure for Introduce SRIOV scheduler groups
  2025-11-27  3:18 ` ✗ Xe.CI.Full: " Patchwork
@ 2025-12-01 17:46   ` Daniele Ceraolo Spurio
  0 siblings, 0 replies; 44+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-12-01 17:46 UTC (permalink / raw)
  To: intel-xe

[-- Attachment #1: Type: text/plain, Size: 31631 bytes --]



On 11/26/2025 7:18 PM, Patchwork wrote:
> Project List - Patchwork *Patch Details*
> *Series:* 	Introduce SRIOV scheduler groups
> *URL:* 	https://patchwork.freedesktop.org/series/158142/
> *State:* 	failure
> *Details:* 
> https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/index.html
>
>
>   CI Bug Log - changes from
>   xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884_FULL ->
>   xe-pw-158142v1_FULL
>
>
>     Summary
>
> *FAILURE*
>
> Serious unknown changes coming with xe-pw-158142v1_FULL absolutely 
> need to be
> verified manually.
>
> If you think the reported changes have nothing to do with the changes
> introduced in xe-pw-158142v1_FULL, please notify your bug team 
> (I915-ci-infra@lists.freedesktop.org) to allow them
> to document this new failure mode, which will reduce false positives 
> in CI.
>
>
>     Participating hosts (4 -> 4)
>
> No changes in participating hosts
>
>
>     Possible new issues
>
> Here are the unknown changes that may have been introduced in 
> xe-pw-158142v1_FULL:
>
>
>       IGT changes
>
>
>         Possible regressions
>
>   * igt@xe_module_load@load:
>       o shard-adlp: (PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-4/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-4/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-4/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-3/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-4/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-1/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-1/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-1/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-1/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-9/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-6/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-9/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-8/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-8/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-6/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-8/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-9/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-9/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-6/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-2/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-2/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-2/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-3/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-3/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-adlp-2/igt@xe_module_load@load.html>)
>         -> (ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-6/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-6/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-6/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-6/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-2/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-2/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-2/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-2/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-3/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-3/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-3/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-1/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-1/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-1/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-4/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-4/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-4/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-8/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-8/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-8/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-9/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-adlp-9/igt@xe_module_load@load.html>)
>       o shard-dg2-set2: (PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-435/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-466/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-463/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-435/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-463/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-463/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-432/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-435/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-466/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-432/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-432/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-466/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-436/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-436/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-436/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-436/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-464/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-464/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-433/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-433/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-464/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-433/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-434/igt@xe_module_load@load.html>,
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-dg2-434/igt@xe_module_load@load.html>)
>         -> (ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-432/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-432/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-432/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-432/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-436/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-436/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-433/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-433/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-433/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-464/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-464/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-464/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-434/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-434/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-434/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-435/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-435/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-435/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-466/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-466/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-466/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-463/igt@xe_module_load@load.html>,
>         ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-dg2-463/igt@xe_module_load@load.html>)
>

I missed the fact that the GuC FW only supports this feature for BMG and 
newer HW (I did all my local testing on BMG), so it fails when we try to 
enable it on older platforms. I'll add a platform check in the next rev.

Daniele

>  *
>
>
>     New tests
>
> New tests have been introduced between 
> xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884_FULL and 
> xe-pw-158142v1_FULL:
>
>
>       New IGT tests (1)
>
>   * igt@xe_gpgpu_fill@offset-16x16:
>       o Statuses : 2 pass(s)
>       o Exec time: [0.00] s
>
>
>     Known issues
>
> Here are the changes found in xe-pw-158142v1_FULL that come from known 
> issues:
>
>
>       IGT changes
>
>
>         Issues hit
>
>  *
>
>     igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-0:
>
>       o shard-bmg: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-1/igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-0.html>
>         (Intel XE#1124
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/1124>)
>  *
>
>     igt@kms_cdclk@mode-transition:
>
>       o shard-bmg: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-1/igt@kms_cdclk@mode-transition.html>
>         (Intel XE#2724
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2724>)
>  *
>
>     igt@kms_chamelium_hpd@hdmi-hpd-enable-disable-mode:
>
>       o shard-bmg: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-1/igt@kms_chamelium_hpd@hdmi-hpd-enable-disable-mode.html>
>         (Intel XE#2252
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2252>)
>  *
>
>     igt@kms_cursor_legacy@2x-long-nonblocking-modeset-vs-cursor-atomic:
>
>       o shard-bmg: PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-7/igt@kms_cursor_legacy@2x-long-nonblocking-modeset-vs-cursor-atomic.html>
>         -> SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-6/igt@kms_cursor_legacy@2x-long-nonblocking-modeset-vs-cursor-atomic.html>
>         (Intel XE#2291
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2291>)
>  *
>
>     igt@kms_flip@2x-flip-vs-rmfb:
>
>       o shard-bmg: PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-7/igt@kms_flip@2x-flip-vs-rmfb.html>
>         -> SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-6/igt@kms_flip@2x-flip-vs-rmfb.html>
>         (Intel XE#2316
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2316>)
>  *
>
>     igt@kms_frontbuffer_tracking@fbcdrrs-2p-scndscrn-indfb-pgflip-blt:
>
>       o shard-bmg: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-1/igt@kms_frontbuffer_tracking@fbcdrrs-2p-scndscrn-indfb-pgflip-blt.html>
>         (Intel XE#2311
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2311>)
>  *
>
>     igt@kms_frontbuffer_tracking@psr-2p-scndscrn-shrfb-plflip-blt:
>
>       o shard-bmg: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-1/igt@kms_frontbuffer_tracking@psr-2p-scndscrn-shrfb-plflip-blt.html>
>         (Intel XE#2313
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2313>) +1
>         other test skip
>  *
>
>     igt@kms_hdr@bpc-switch-dpms:
>
>       o shard-bmg: PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-4/igt@kms_hdr@bpc-switch-dpms.html>
>         -> ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-7/igt@kms_hdr@bpc-switch-dpms.html>
>         (Intel XE#6662
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/6662>) +1
>         other test abort
>  *
>
>     igt@kms_plane@plane-panning-bottom-right-suspend@pipe-b:
>
>       o shard-bmg: PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-6/igt@kms_plane@plane-panning-bottom-right-suspend@pipe-b.html>
>         -> ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-2/igt@kms_plane@plane-panning-bottom-right-suspend@pipe-b.html>
>         (Intel XE#6675
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/6675>) +1
>         other test abort
>  *
>
>     igt@kms_psr2_sf@fbc-pr-overlay-primary-update-sf-dmg-area:
>
>       o shard-bmg: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-1/igt@kms_psr2_sf@fbc-pr-overlay-primary-update-sf-dmg-area.html>
>         (Intel XE#1406
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/1406> /
>         Intel XE#1489
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/1489>) +1
>         other test skip
>  *
>
>     igt@kms_vblank@ts-continuation-suspend:
>
>       o shard-bmg: NOTRUN -> ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-1/igt@kms_vblank@ts-continuation-suspend.html>
>         (Intel XE#6675
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/6675>) +3
>         other tests abort
>  *
>
>     igt@xe_exec_system_allocator@many-stride-malloc-prefetch:
>
>       o shard-bmg: PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-8/igt@xe_exec_system_allocator@many-stride-malloc-prefetch.html>
>         -> WARN
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-8/igt@xe_exec_system_allocator@many-stride-malloc-prefetch.html>
>         (Intel XE#5786
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/5786>)
>  *
>
>     igt@xe_exec_system_allocator@process-many-free-madvise:
>
>       o shard-bmg: PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-8/igt@xe_exec_system_allocator@process-many-free-madvise.html>
>         -> ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-3/igt@xe_exec_system_allocator@process-many-free-madvise.html>
>         (Intel XE#3970
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/3970>)
>  *
>
>     igt@xe_pmu@engine-activity-accuracy-50:
>
>       o shard-lnl: PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-lnl-1/igt@xe_pmu@engine-activity-accuracy-50.html>
>         -> FAIL
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-lnl-2/igt@xe_pmu@engine-activity-accuracy-50.html>
>         (Intel XE#6251
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/6251>) +2
>         other tests fail
>
>
>         Possible fixes
>
>  *
>
>     igt@kms_cursor_edge_walk@256x256-left-edge:
>
>       o shard-bmg: FAIL
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-2/igt@kms_cursor_edge_walk@256x256-left-edge.html>
>         -> PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-4/igt@kms_cursor_edge_walk@256x256-left-edge.html>
>  *
>
>     igt@kms_cursor_legacy@cursorb-vs-flipb-toggle:
>
>       o shard-bmg: SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-6/igt@kms_cursor_legacy@cursorb-vs-flipb-toggle.html>
>         (Intel XE#2291
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2291>) ->
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-5/igt@kms_cursor_legacy@cursorb-vs-flipb-toggle.html>
>         +3 other tests pass
>  *
>
>     igt@kms_cursor_legacy@flip-vs-cursor-legacy:
>
>       o shard-bmg: FAIL
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-8/igt@kms_cursor_legacy@flip-vs-cursor-legacy.html>
>         (Intel XE#5299
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/5299>) ->
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-4/igt@kms_cursor_legacy@flip-vs-cursor-legacy.html>
>  *
>
>     igt@kms_flip@2x-flip-vs-dpms-on-nop-interruptible:
>
>       o shard-bmg: SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-6/igt@kms_flip@2x-flip-vs-dpms-on-nop-interruptible.html>
>         (Intel XE#2316
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2316>) ->
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-5/igt@kms_flip@2x-flip-vs-dpms-on-nop-interruptible.html>
>         +1 other test pass
>  *
>
>     igt@kms_flip@flip-vs-expired-vblank-interruptible@c-edp1:
>
>       o shard-lnl: FAIL
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-lnl-5/igt@kms_flip@flip-vs-expired-vblank-interruptible@c-edp1.html>
>         (Intel XE#301
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/301> /
>         Intel XE#3149
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/3149>) ->
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-lnl-5/igt@kms_flip@flip-vs-expired-vblank-interruptible@c-edp1.html>
>  *
>
>     igt@kms_plane@plane-panning-bottom-right-suspend@pipe-a:
>
>       o shard-bmg: ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-6/igt@kms_plane@plane-panning-bottom-right-suspend@pipe-a.html>
>         (Intel XE#6675
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/6675>) ->
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-2/igt@kms_plane@plane-panning-bottom-right-suspend@pipe-a.html>
>  *
>
>     igt@kms_vblank@ts-continuation-dpms-suspend@pipe-c-edp-1:
>
>       o shard-lnl: INCOMPLETE
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-lnl-4/igt@kms_vblank@ts-continuation-dpms-suspend@pipe-c-edp-1.html>
>         (Intel XE#4488
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/4488>) ->
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-lnl-4/igt@kms_vblank@ts-continuation-dpms-suspend@pipe-c-edp-1.html>
>  *
>
>     igt@xe_exec_system_allocator@processes-evict-malloc:
>
>       o shard-bmg: ABORT
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-5/igt@xe_exec_system_allocator@processes-evict-malloc.html>
>         (Intel XE#3970
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/3970>) ->
>         PASS
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-1/igt@xe_exec_system_allocator@processes-evict-malloc.html>
>
>
>         Warnings
>
>  *
>
>     igt@kms_flip@flip-vs-expired-vblank-interruptible:
>
>       o shard-lnl: FAIL
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-lnl-5/igt@kms_flip@flip-vs-expired-vblank-interruptible.html>
>         (Intel XE#301
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/301> /
>         Intel XE#3149
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/3149>) ->
>         FAIL
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-lnl-5/igt@kms_flip@flip-vs-expired-vblank-interruptible.html>
>         (Intel XE#301
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/301>)
>  *
>
>     igt@kms_frontbuffer_tracking@drrs-2p-primscrn-pri-indfb-draw-mmap-wc:
>
>       o shard-bmg: SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-2/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-pri-indfb-draw-mmap-wc.html>
>         (Intel XE#2312
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2312>) ->
>         SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-4/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-pri-indfb-draw-mmap-wc.html>
>         (Intel XE#2311
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2311>) +7
>         other tests skip
>  *
>
>     igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-shrfb-draw-blt:
>
>       o shard-bmg: SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-2/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-shrfb-draw-blt.html>
>         (Intel XE#2312
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2312>) ->
>         SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-4/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-shrfb-draw-blt.html>
>         (Intel XE#4141
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/4141>)
>  *
>
>     igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-pri-indfb-draw-blt:
>
>       o shard-bmg: SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-7/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-pri-indfb-draw-blt.html>
>         (Intel XE#4141
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/4141>) ->
>         SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-6/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-pri-indfb-draw-blt.html>
>         (Intel XE#2312
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2312>)
>  *
>
>     igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-spr-indfb-fullscreen:
>
>       o shard-bmg: SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-7/igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-spr-indfb-fullscreen.html>
>         (Intel XE#2311
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2311>) ->
>         SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-6/igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-spr-indfb-fullscreen.html>
>         (Intel XE#2312
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2312>) +3
>         other tests skip
>  *
>
>     igt@kms_frontbuffer_tracking@psr-2p-primscrn-cur-indfb-draw-render:
>
>       o shard-bmg: SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-6/igt@kms_frontbuffer_tracking@psr-2p-primscrn-cur-indfb-draw-render.html>
>         (Intel XE#2312
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2312>) ->
>         SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-5/igt@kms_frontbuffer_tracking@psr-2p-primscrn-cur-indfb-draw-render.html>
>         (Intel XE#2313
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2313>) +6
>         other tests skip
>  *
>
>     igt@kms_frontbuffer_tracking@psr-2p-scndscrn-pri-indfb-draw-mmap-wc:
>
>       o shard-bmg: SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-7/igt@kms_frontbuffer_tracking@psr-2p-scndscrn-pri-indfb-draw-mmap-wc.html>
>         (Intel XE#2313
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2313>) ->
>         SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-6/igt@kms_frontbuffer_tracking@psr-2p-scndscrn-pri-indfb-draw-mmap-wc.html>
>         (Intel XE#2312
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2312>) +3
>         other tests skip
>  *
>
>     igt@kms_tiled_display@basic-test-pattern-with-chamelium:
>
>       o shard-bmg: SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884/shard-bmg-4/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html>
>         (Intel XE#2509
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2509>) ->
>         SKIP
>         <https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158142v1/shard-bmg-8/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html>
>         (Intel XE#2426
>         <https://gitlab.freedesktop.org/drm/xe/kernel/issues/2426>)
>
>
>     Build changes
>
>   * Linux: xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884 ->
>     xe-pw-158142v1
>
> IGT_8639: 2ce563031e6b2ec91479f6af8c326d25c15bdb26 @ 
> https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
> xe-4156-e7a767430515c3a6e8aee91c2a68cba8b06fe884: 
> e7a767430515c3a6e8aee91c2a68cba8b06fe884
> xe-pw-158142v1: 158142v1
>

[-- Attachment #2: Type: text/html, Size: 39335 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2025-12-06  1:58 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-27  1:45 [PATCH 00/10] Introduce SRIOV scheduler groups Daniele Ceraolo Spurio
2025-11-27  1:45 ` [PATCH 01/10] drm/xe/gt: Add engine masks for each class Daniele Ceraolo Spurio
2025-12-01 16:52   ` Michal Wajdeczko
2025-11-27  1:45 ` [PATCH 02/10] drm/xe/sriov: Initialize scheduler groups Daniele Ceraolo Spurio
2025-12-01 22:37   ` Michal Wajdeczko
2025-12-01 23:33     ` Daniele Ceraolo Spurio
2025-12-02 21:08       ` Michal Wajdeczko
2025-12-02 23:02         ` Daniele Ceraolo Spurio
2025-12-03  1:15         ` Daniele Ceraolo Spurio
2025-11-27  1:45 ` [PATCH 03/10] drm/xe/sriov: Add support for enabling " Daniele Ceraolo Spurio
2025-12-02 11:49   ` Michal Wajdeczko
2025-12-02 17:39     ` Daniele Ceraolo Spurio
2025-12-04 22:06       ` Daniele Ceraolo Spurio
2025-11-27  1:45 ` [PATCH 04/10] drm/xe/sriov: Scheduler groups are incompatible with multi-lrc Daniele Ceraolo Spurio
2025-12-02 13:32   ` Michal Wajdeczko
2025-12-02 17:57     ` Daniele Ceraolo Spurio
2025-12-02 21:17       ` Michal Wajdeczko
2025-12-02 21:25         ` Daniele Ceraolo Spurio
2025-12-02 21:37           ` Michal Wajdeczko
2025-12-02 21:42             ` Daniele Ceraolo Spurio
2025-11-27  1:45 ` [PATCH 05/10] drm/xe/sriov: Add debugfs to enable scheduler groups Daniele Ceraolo Spurio
2025-12-02 15:52   ` Michal Wajdeczko
2025-12-02 18:03     ` Daniele Ceraolo Spurio
2025-12-02 21:24       ` Michal Wajdeczko
2025-11-27  1:45 ` [PATCH 06/10] drm/xe/sriov: Add debugfs with scheduler groups information Daniele Ceraolo Spurio
2025-12-02 16:24   ` Michal Wajdeczko
2025-12-02 18:20     ` Daniele Ceraolo Spurio
2025-12-02 21:31       ` Michal Wajdeczko
2025-11-27  1:45 ` [PATCH 07/10] drm/xe/sriov: Prep for multiple exec quantums and preemption timeouts Daniele Ceraolo Spurio
2025-12-02 16:42   ` Michal Wajdeczko
2025-12-06  1:55     ` Daniele Ceraolo Spurio
2025-11-27  1:45 ` [PATCH 08/10] drm/xe/sriov: Add functions to set exec quantums for each group Daniele Ceraolo Spurio
2025-12-02 19:54   ` Michal Wajdeczko
2025-12-06  1:58     ` Daniele Ceraolo Spurio
2025-11-27  1:45 ` [PATCH 09/10] drm/xe/sriov: Add functions to set preempt timeouts " Daniele Ceraolo Spurio
2025-12-02 20:01   ` Michal Wajdeczko
2025-11-27  1:45 ` [PATCH 10/10] drm/xe/sriov: Add debugfs to set EQ and PT for scheduler groups Daniele Ceraolo Spurio
2025-12-02 20:17   ` Michal Wajdeczko
2025-12-06  1:53     ` Daniele Ceraolo Spurio
2025-11-27  1:51 ` ✗ CI.checkpatch: warning for Introduce SRIOV " Patchwork
2025-11-27  1:52 ` ✓ CI.KUnit: success " Patchwork
2025-11-27  2:36 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-11-27  3:18 ` ✗ Xe.CI.Full: " Patchwork
2025-12-01 17:46   ` Daniele Ceraolo Spurio

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox