Intel-XE Archive on lore.kernel.org
From: Michal Wajdeczko <michal.wajdeczko@intel.com>
To: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>,
	<intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH 02/10] drm/xe/sriov: Initialize scheduler groups
Date: Mon, 1 Dec 2025 23:37:12 +0100	[thread overview]
Message-ID: <0254302e-35cf-4721-84d8-d0c95c2e6d62@intel.com> (raw)
In-Reply-To: <20251127014507.2323746-14-daniele.ceraolospurio@intel.com>

Hi Daniele,

some initial comments to keep you busy ;)

On 11/27/2025 2:45 AM, Daniele Ceraolo Spurio wrote:
> Scheduler groups (a.k.a. Engine Groups Scheduling, or EGS) is a GuC
> feature that allows the driver to define groups of engines that are
> independently scheduled across VFs, which allows different VFs to be
> active on the HW at the same time on different groups.
> 
> This is intended for specific scenarios where the admin knows that the
> VFs are not going to fully utilize the HW and therefore assigning all of
> it to a single VF would lead to part of it being permanently idle.
> We do not allow the admin to decide how to divide the engines across
> groups, but we instead support specific configurations that are designed
> for specific use-cases. During PF initialization we detect which
> configurations are possible on a given GT and create the relevant
> groups. Since the GuC expect a mask for each class for each group, that
> is what we save when we init the configs.
> 
> Right now we only have one use-case on the media GT. If the VFs are
> running a frame render + encoding at a not-too-high resolution (e.g.
> 1080@30fps) the render can produce frames faster than the video engine
> can encode them, which means that the maximum number of parallel VFs is
> limited by the VCS bandwidth. Since our products can have multiple VCS
> engines, allowing multiple VFs to be active on the different VCS engines
> at the same time allows us to run more parallel VFs on the same HW.
> Given that engines in the same media slice share some resources (e.g.
> SFC), we assign each media slice to a different scheduling group. We
> refer to this configuration as "media_slices", given that each slice
> gets its own group.
> 
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt_sriov_pf.c           |   5 +
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c    | 135 ++++++++++++++++++
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h    |   1 +
>  .../gpu/drm/xe/xe_gt_sriov_pf_policy_types.h  |  14 ++
>  4 files changed, 155 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
> index 0714c758b9c1..62dda1c24e77 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
> @@ -14,6 +14,7 @@
>  #include "xe_gt_sriov_pf_control.h"
>  #include "xe_gt_sriov_pf_helpers.h"
>  #include "xe_gt_sriov_pf_migration.h"
> +#include "xe_gt_sriov_pf_policy.h"
>  #include "xe_gt_sriov_pf_service.h"
>  #include "xe_gt_sriov_printk.h"
>  #include "xe_guc_submit.h"
> @@ -123,6 +124,10 @@ int xe_gt_sriov_pf_init(struct xe_gt *gt)
>  	if (err)
>  		return err;
>  
> +	err = xe_gt_sriov_pf_policy_init(gt);
> +	if (err)
> +		return err;
> +
>  	err = xe_gt_sriov_pf_migration_init(gt);
>  	if (err)
>  		return err;
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
> index 4445f660e6d1..9b878578ea90 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
> @@ -5,11 +5,14 @@
>  
>  #include "abi/guc_actions_sriov_abi.h"
>  
> +#include <drm/drm_managed.h>

system headers shall be included before our local headers

> +
>  #include "xe_bo.h"
>  #include "xe_gt.h"
>  #include "xe_gt_sriov_pf_helpers.h"
>  #include "xe_gt_sriov_pf_policy.h"
>  #include "xe_gt_sriov_printk.h"
> +#include "xe_guc.h"
>  #include "xe_guc_buf.h"
>  #include "xe_guc_ct.h"
>  #include "xe_guc_klv_helpers.h"
> @@ -351,6 +354,125 @@ u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt)
>  	return value;
>  }
>  
> +#define MAX_MEDIA_SLICES (hweight32(XE_HW_ENGINE_VECS_MASK))

redundant ( )

and maybe such a macro should be placed in xe_hw_engine_types.h,
with a proper comment explaining why it works:

	/*
	 * Each media slice has 1x VECS
	 * Max number of VECS instances gives us a max number of slices
	 */

> +static int pf_sched_group_media_slices(struct xe_gt *gt, u32 **masks, u32 *num_masks)
> +{
> +	u8 slice_to_group[MAX_MEDIA_SLICES];
> +	struct xe_hw_engine *hwe = NULL;

do we need to initialize this here?
it will be initialized by for_each_hw_engine, no?

> +	enum xe_hw_engine_id id;
> +	u32 vcs_mask = VCS_MASK(gt);
> +	u32 vecs_mask = VECS_MASK(gt);
> +	u32 gsc_mask = GSCCS_MASK(gt);

we try to define vars in rev-xmas-tree order

> +	u32 *values;
> +	u8 slice;
> +	u8 groups;

I guess for those two generic counters we don't need to use explicitly sized ints

> +
> +	xe_gt_assert(gt, xe_gt_is_media_type(gt));
> +
> +	/* A media slice has 2 VCS and a VECS. We bundle the GSC with the first slice */
> +	for (slice = 0, groups = 0;
> +	     gsc_mask || vcs_mask || vecs_mask;
> +	     slice++, gsc_mask = 0, vcs_mask >>= 2, vecs_mask >>= 1) {
> +		if (unlikely(slice >= MAX_MEDIA_SLICES)) {

maybe this 'for' loop should be just for 'slice = [0, MAX)'  

> +			xe_gt_sriov_err(gt, "Too many media slices (%u) during EGS setup\n",
> +					slice);

then after this loop we can have asserts that no engine instances are left behind

> +			return -EINVAL;
> +		}
> +
> +		if ((vcs_mask & 0x3) || (vecs_mask & 0x1) || (gsc_mask & 0x1))
> +			slice_to_group[slice] = groups++;
> +	}

like this:

	for (slice = 0; slice < MAX_MEDIA_SLICES; slice++) {

		if ((vcs_mask & 0x3) || (vecs_mask & 0x1) || (gsc_mask & 0x1))
			slice_to_group[slice] = groups++;

		vcs_mask >>= 2;
		vecs_mask >>= 1;
		gsc_mask >>= 1;
	}

	xe_gt_assert(gt, !vcs_mask);
	xe_gt_assert(gt, !vecs_mask);
	xe_gt_assert(gt, !gsc_mask);

> +
> +	/* We need at least 2 slices to split them up */

shouldn't we also check that each group has both VCS and VECS?

> +	if (groups < 2) {
> +		*masks = NULL;
> +		*num_masks = 0;
> +		return 0;
> +	}
> +
> +	/*
> +	 * The GuC expects and array with GUC_MAX_ENGINE_CLASSES entries for

typo: an array ?

> +	 * each group.
> +	 */
> +	values = drmm_kzalloc(&gt_to_xe(gt)->drm,
> +			      GUC_MAX_ENGINE_CLASSES * groups * sizeof(u32),
> +			      GFP_KERNEL);
> +	if (!values)
> +		return -ENOMEM;
> +
> +	for_each_hw_engine(hwe, gt, id) {
> +		u8 guc_class = xe_engine_class_to_guc_class(hwe->class);
> +		u8 entry;
> +
> +		switch (hwe->class) {
> +		case XE_ENGINE_CLASS_VIDEO_DECODE:
> +			slice = hwe->instance / 2;

hmm, this seems to duplicate previous loop where we had:

			vcs_mask >>= 2;

maybe these two loops can be combined?

> +			break;
> +		case XE_ENGINE_CLASS_VIDEO_ENHANCE:
> +			slice = hwe->instance;
> +			break;
> +		case XE_ENGINE_CLASS_OTHER:
> +			slice = 0;
> +			break;
> +		default:
> +			xe_gt_sriov_err(gt, "unknown media gt class %u (%s) during EGS setup\n",
> +					hwe->class, hwe->name);

this also seems like an attempt to catch a coding error,
so a plain assert should be sufficient

> +			drmm_kfree(&gt_to_xe(gt)->drm, values);

do we need to bother with this kfree?
it's a managed allocation, and any error from here will abort the probe anyway

> +			return -EINVAL;
> +		}
> +
> +		entry = (slice_to_group[slice] * GUC_MAX_ENGINE_CLASSES) + guc_class;

redundant ( )

> +		values[entry] |= BIT(hwe->logical_instance);
> +	}
> +
> +	*masks = values;
> +	*num_masks = GUC_MAX_ENGINE_CLASSES * groups;
> +
> +	return 0;
> +}
> +
> +static int pf_init_sched_groups(struct xe_gt *gt)
> +{
> +	int err;
> +	int m;
> +
> +	xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
> +
> +	if (GUC_SUBMIT_VER(&gt->uc.guc) < MAKE_GUC_VER(1, 26, 0))
> +		return 0;
> +
> +	for (m = 0; m < XE_SRIOV_SCHED_GROUPS_MODES_COUNT; m++) {
> +		u32 *masks = NULL;
> +		u32 num_masks = 0;
> +
> +		switch (m) {
> +		case XE_SRIOV_SCHED_GROUPS_NONE:
> +			break;
> +		case XE_SRIOV_SCHED_GROUPS_MEDIA_SLICES:
> +			/* this mode only has groups on the media GT */

then having an array of all modes on every GT seems non-optimal

> +			if (xe_gt_is_media_type(gt)) {
> +				err = pf_sched_group_media_slices(gt, &masks, &num_masks);
> +				if (err)
> +					return err;
> +			}
> +			break;
> +		default:
> +			xe_gt_sriov_err(gt, "unknown sched group mode %u\n", m);

this looks more like a coding error, so maybe use just assert?

> +			return -EINVAL;
> +		}
> +
> +		xe_gt_assert(gt, (num_masks % GUC_MAX_ENGINE_CLASSES) == 0);
> +
> +		if ((m == XE_SRIOV_SCHED_GROUPS_NONE) || num_masks)
> +			gt->sriov.pf.policy.guc.sched_groups.supported_modes |= BIT(m);
> +
> +		gt->sriov.pf.policy.guc.sched_groups.modes[m].masks = masks;
> +		gt->sriov.pf.policy.guc.sched_groups.modes[m].num_masks = num_masks;
> +	}
> +
> +	return 0;
> +}
> +
>  static void pf_sanitize_guc_policies(struct xe_gt *gt)
>  {
>  	pf_sanitize_sched_if_idle(gt);
> @@ -401,6 +523,19 @@ int xe_gt_sriov_pf_policy_reprovision(struct xe_gt *gt, bool reset)
>  	return err ? -ENXIO : 0;
>  }
>  
> +/**
> + * xe_gt_sriov_pf_policy_init - Initializes the SW state of the PF policies.
> + * @gt: the &xe_gt
> + *
> + * This function can only be called on PF. This function does not touch the HW.

but must be called after checking HW engine fuses, right?

> + *
> + * Return: 0 on success or a negative error code on failure.
> + */
> +int xe_gt_sriov_pf_policy_init(struct xe_gt *gt)
> +{
> +	return pf_init_sched_groups(gt);
> +}
> +
>  static void print_guc_policies(struct drm_printer *p, struct xe_gt_sriov_guc_policies *policy)
>  {
>  	drm_printf(p, "%s:\t%s\n",
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
> index 2a5dc33dc6d7..c9c04d1b7f50 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
> @@ -18,6 +18,7 @@ bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
>  int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
>  u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
>  
> +int xe_gt_sriov_pf_policy_init(struct xe_gt *gt);
>  void xe_gt_sriov_pf_policy_sanitize(struct xe_gt *gt);
>  int xe_gt_sriov_pf_policy_reprovision(struct xe_gt *gt, bool reset);
>  int xe_gt_sriov_pf_policy_print(struct xe_gt *gt, struct drm_printer *p);
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
> index 4de532af135e..3b915801c01b 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
> @@ -8,16 +8,30 @@
>  
>  #include <linux/types.h>
>  

we expect all enums (and enumerators) to have proper kernel-doc

> +enum xe_sriov_sched_group_modes {
> +	XE_SRIOV_SCHED_GROUPS_NONE = 0, /* disabled */

do we really need this?

> +	XE_SRIOV_SCHED_GROUPS_MEDIA_SLICES, /* separate groups for each media slice */
> +	XE_SRIOV_SCHED_GROUPS_MODES_COUNT

while it looks like a handy way to count the enumerators,
COUNT now pretends to be a valid mode enumerator while it is not

maybe just provide separate #define for it?

> +};
> +
>  /**
>   * struct xe_gt_sriov_guc_policies - GuC SR-IOV policies.
>   * @sched_if_idle: controls strict scheduling policy.
>   * @reset_engine: controls engines reset on VF switch policy.
>   * @sample_period: adverse events sampling period (in milliseconds).
> + * @sched_groups: available scheduling group configurations and current mode.

don't forget to describe inner members

and there is no 'current' now

>   */
>  struct xe_gt_sriov_guc_policies {
>  	bool sched_if_idle;
>  	bool reset_engine;
>  	u32 sample_period;
> +	struct {
> +		u32 supported_modes;
> +		struct {
> +			u32 *masks;
> +			u32 num_masks;
> +		} modes[XE_SRIOV_SCHED_GROUPS_MODES_COUNT];

hmm, for the NONE mode, all groups must be 0, so why bother with .masks/.num fields for it?

maybe all we need is:

	struct {
		const char *name;
		u32 *config;
	} modes[];

then:

	supported_modes := iterate over modes array and look for non-null config/name

	none := supported_modes is empty

	current := pointer to modes[] or NULL

> +	} sched_groups;
>  };
>  
>  /**


Thread overview: 44+ messages
2025-11-27  1:45 [PATCH 00/10] Introduce SRIOV scheduler groups Daniele Ceraolo Spurio
2025-11-27  1:45 ` [PATCH 01/10] drm/xe/gt: Add engine masks for each class Daniele Ceraolo Spurio
2025-12-01 16:52   ` Michal Wajdeczko
2025-11-27  1:45 ` [PATCH 02/10] drm/xe/sriov: Initialize scheduler groups Daniele Ceraolo Spurio
2025-12-01 22:37   ` Michal Wajdeczko [this message]
2025-12-01 23:33     ` Daniele Ceraolo Spurio
2025-12-02 21:08       ` Michal Wajdeczko
2025-12-02 23:02         ` Daniele Ceraolo Spurio
2025-12-03  1:15         ` Daniele Ceraolo Spurio
2025-11-27  1:45 ` [PATCH 03/10] drm/xe/sriov: Add support for enabling " Daniele Ceraolo Spurio
2025-12-02 11:49   ` Michal Wajdeczko
2025-12-02 17:39     ` Daniele Ceraolo Spurio
2025-12-04 22:06       ` Daniele Ceraolo Spurio
2025-11-27  1:45 ` [PATCH 04/10] drm/xe/sriov: Scheduler groups are incompatible with multi-lrc Daniele Ceraolo Spurio
2025-12-02 13:32   ` Michal Wajdeczko
2025-12-02 17:57     ` Daniele Ceraolo Spurio
2025-12-02 21:17       ` Michal Wajdeczko
2025-12-02 21:25         ` Daniele Ceraolo Spurio
2025-12-02 21:37           ` Michal Wajdeczko
2025-12-02 21:42             ` Daniele Ceraolo Spurio
2025-11-27  1:45 ` [PATCH 05/10] drm/xe/sriov: Add debugfs to enable scheduler groups Daniele Ceraolo Spurio
2025-12-02 15:52   ` Michal Wajdeczko
2025-12-02 18:03     ` Daniele Ceraolo Spurio
2025-12-02 21:24       ` Michal Wajdeczko
2025-11-27  1:45 ` [PATCH 06/10] drm/xe/sriov: Add debugfs with scheduler groups information Daniele Ceraolo Spurio
2025-12-02 16:24   ` Michal Wajdeczko
2025-12-02 18:20     ` Daniele Ceraolo Spurio
2025-12-02 21:31       ` Michal Wajdeczko
2025-11-27  1:45 ` [PATCH 07/10] drm/xe/sriov: Prep for multiple exec quantums and preemption timeouts Daniele Ceraolo Spurio
2025-12-02 16:42   ` Michal Wajdeczko
2025-12-06  1:55     ` Daniele Ceraolo Spurio
2025-11-27  1:45 ` [PATCH 08/10] drm/xe/sriov: Add functions to set exec quantums for each group Daniele Ceraolo Spurio
2025-12-02 19:54   ` Michal Wajdeczko
2025-12-06  1:58     ` Daniele Ceraolo Spurio
2025-11-27  1:45 ` [PATCH 09/10] drm/xe/sriov: Add functions to set preempt timeouts " Daniele Ceraolo Spurio
2025-12-02 20:01   ` Michal Wajdeczko
2025-11-27  1:45 ` [PATCH 10/10] drm/xe/sriov: Add debugfs to set EQ and PT for scheduler groups Daniele Ceraolo Spurio
2025-12-02 20:17   ` Michal Wajdeczko
2025-12-06  1:53     ` Daniele Ceraolo Spurio
2025-11-27  1:51 ` ✗ CI.checkpatch: warning for Introduce SRIOV " Patchwork
2025-11-27  1:52 ` ✓ CI.KUnit: success " Patchwork
2025-11-27  2:36 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-11-27  3:18 ` ✗ Xe.CI.Full: " Patchwork
2025-12-01 17:46   ` Daniele Ceraolo Spurio
