From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
To: Michal Wajdeczko <michal.wajdeczko@intel.com>,
<intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH v2 03/11] drm/xe/sriov: Add support for enabling scheduler groups
Date: Mon, 8 Dec 2025 09:41:06 -0800
Message-ID: <166f18ae-3696-4374-99eb-250c77abe59f@intel.com>
In-Reply-To: <5e09411a-0f84-434d-9d3a-8fa813353c26@intel.com>
On 12/7/2025 1:57 PM, Michal Wajdeczko wrote:
>
> On 12/7/2025 12:03 AM, Daniele Ceraolo Spurio wrote:
>> Scheduler groups are enabled by sending a specific policy configuration
>> KLV to the GuC. We don't allow changing this policy if there are VFs
>> active, since the expectation is that each VF will only check whether
>> the feature is enabled during driver initialization.
>>
>> The functions added by this patch will be used by sysfs/debugfs, coming
>> in follow-up patches.
>>
>> v2: code improvements, add GUC_MAX_SCHED_GROUPS define, don't add
>> XE_SRIOV_SCHED_GROUPS_NONE to supported_modes (Michal)
>>
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> ---
>> drivers/gpu/drm/xe/abi/guc_klvs_abi.h | 17 +++
>> drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c | 136 ++++++++++++++++++
>> drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h | 3 +
>> .../gpu/drm/xe/xe_gt_sriov_pf_policy_types.h | 4 +
>> drivers/gpu/drm/xe/xe_guc_fwif.h | 2 +
>> drivers/gpu/drm/xe/xe_guc_klv_helpers.c | 2 +
>> 6 files changed, 164 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>> index 265a135e7061..45733a87183a 100644
>> --- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>> +++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h
>> @@ -200,6 +200,20 @@ enum {
>> * :0: adverse events are not counted (default)
>> * :n: sample period in milliseconds
>> *
>> + * _`GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG` : 0x8004
>> + * This config allows the PF to split the engines across scheduling groups.
>> + * Each group is independently timesliced across VFs, allowing different
>> + * VFs to be active on the HW at the same time. When enabling this feature,
>> + * all engines must be assigned to a group (and only one group), or they
>> + * will be excluded from scheduling after this KLV is sent. To enable
>> + * the groups, the driver must provide a masks array with
>> + * GUC_MAX_ENGINE_CLASSES entries for each group, with each mask indicating
>> + * which logical instances of that class belong to the group. Therefore,
>> + * the length of this KLV when enabling groups is
>> + * num_groups * GUC_MAX_ENGINE_CLASSES. To disable the groups, the driver
>> + * must send the KLV without any payload (i.e. len = 0). The maximum
>> + * number of groups is 8.
>> + *
>> * _`GUC_KLV_VGT_POLICY_RESET_AFTER_VF_SWITCH` : 0x8D00
>> * This enum is to reset utilized HW engine after VF Switch (i.e to clean
>> * up Stale HW register left behind by previous VF)
>> @@ -214,6 +228,9 @@ enum {
>> #define GUC_KLV_VGT_POLICY_ADVERSE_SAMPLE_PERIOD_KEY 0x8002
>> #define GUC_KLV_VGT_POLICY_ADVERSE_SAMPLE_PERIOD_LEN 1u
>>
>> +#define GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG_KEY 0x8004
>> +#define GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT 8u
>> +
>> #define GUC_KLV_VGT_POLICY_RESET_AFTER_VF_SWITCH_KEY 0x8D00
>> #define GUC_KLV_VGT_POLICY_RESET_AFTER_VF_SWITCH_LEN 1u
>>
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>> index 158d68aff4b7..1109fec99fc3 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c
>> @@ -97,6 +97,23 @@ static int pf_push_policy_u32(struct xe_gt *gt, u16 key, u32 value)
>> return pf_push_policy_klvs(gt, 1, klv, ARRAY_SIZE(klv));
>> }
>>
>> +static int pf_push_policy_payload(struct xe_gt *gt, u16 key, u32 *payload, u32 num_dwords)
>> +{
>> + CLASS(xe_guc_buf, buf)(&gt->uc.guc.buf, GUC_KLV_LEN_MIN + num_dwords);
>> + u32 *klv;
>> +
>> + if (!xe_guc_buf_is_valid(buf))
>> + return -ENOBUFS;
>> +
>> + klv = xe_guc_buf_cpu_ptr(buf);
>> +
>> + klv[0] = PREP_GUC_KLV(key, num_dwords);
>> + if (num_dwords)
>> + memcpy(&klv[1], payload, num_dwords * sizeof(u32));
>> +
>> + return pf_push_policy_buf_klvs(gt, 1, buf, GUC_KLV_LEN_MIN + num_dwords);
>> +}
>> +
>> static int pf_update_policy_bool(struct xe_gt *gt, u16 key, bool *policy, bool value)
>> {
>> int err;
>> @@ -476,16 +493,134 @@ static void pf_init_sched_groups(struct xe_gt *gt)
>>
>> xe_gt_assert(gt, (num_masks % GUC_MAX_ENGINE_CLASSES) == 0);
>>
> please keep asserts together
ok
>
>> + xe_gt_assert(gt, num_masks / GUC_MAX_ENGINE_CLASSES < GUC_MAX_SCHED_GROUPS);
>> +
>> + if (num_masks)
>> + gt->sriov.pf.policy.guc.sched_groups.supported_modes |= BIT(m);
>> +
>> gt->sriov.pf.policy.guc.sched_groups.modes[m].masks = masks;
>> gt->sriov.pf.policy.guc.sched_groups.modes[m].num_masks = num_masks;
>> }
>> }
>>
>> +/**
>> + * xe_sriov_gt_pf_policy_has_multi_group_modes() - check whether the GT supports
>> + * any scheduler modes that have multiple groups
>> + * @gt: the &xe_gt to check
>> + *
>> + * This function can only be called on PF.
>> + *
>> + * Return: true if the GT supports modes with multiple groups, false otherwise.
>> + */
>> +bool xe_sriov_gt_pf_policy_has_multi_group_modes(struct xe_gt *gt)
>> +{
>> + return gt->sriov.pf.policy.guc.sched_groups.supported_modes;
>> +}
>> +
>> +/**
>> + * xe_sriov_gt_pf_policy_has_sched_group_mode() - check whether the GT supports
>> + * a specific scheduler group mode
>> + * @gt: the &xe_gt to check
>> + * @mode: the mode to check
>> + *
>> + * This function can only be called on PF.
>> + *
>> + * Return: true if the GT supports the specified mode, false otherwise.
>> + */
>> +bool xe_sriov_gt_pf_policy_has_sched_group_mode(struct xe_gt *gt, u32 mode)
>> +{
>> + if (mode == XE_SRIOV_SCHED_GROUPS_NONE)
>> + return true;
>> +
>> + return gt->sriov.pf.policy.guc.sched_groups.supported_modes & BIT(mode);
>> +}
>> +
>> +static int __pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>> +{
>> + u32 *masks = gt->sriov.pf.policy.guc.sched_groups.modes[mode].masks;
>> + u32 num_masks = gt->sriov.pf.policy.guc.sched_groups.modes[mode].num_masks;
>> +
>> + xe_gt_assert(gt, (num_masks % GUC_MAX_ENGINE_CLASSES) == 0);
>> +
>> + return pf_push_policy_payload(gt, GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG_KEY,
>> + masks, num_masks);
>> +}
>> +
>> +static int pf_provision_sched_groups(struct xe_gt *gt, u32 mode)
>> +{
>> + int err;
>> +
>> + xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
>> + lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
>> +
>> + if (!xe_sriov_gt_pf_policy_has_sched_group_mode(gt, mode))
>> + return -EINVAL;
>> +
>> + /* already in the desired mode */
>> + if (gt->sriov.pf.policy.guc.sched_groups.current_mode == mode)
>> + return 0;
>> +
>> + /*
>> + * We don't allow changing this with VFs active since it is hard for
>> + * VFs to check.
>> + */
>> + if (xe_sriov_pf_num_vfs(gt_to_xe(gt)))
>> + return -EBUSY;
>> +
>> + err = __pf_provision_sched_groups(gt, mode);
>> + if (err)
>> + return err;
>> +
>> + gt->sriov.pf.policy.guc.sched_groups.current_mode = mode;
>> +
>> + return 0;
>> +}
>> +
>> +static int pf_reprovision_sched_groups(struct xe_gt *gt)
>> +{
>> + xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
>> + lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
>> +
>> + /* We only have something to provision if we have possible groups */
>> + if (!xe_sriov_gt_pf_policy_has_multi_group_modes(gt))
>> + return 0;
>> +
>> + return __pf_provision_sched_groups(gt, gt->sriov.pf.policy.guc.sched_groups.current_mode);
>> +}
>> +
>> +static void pf_sanitize_sched_groups(struct xe_gt *gt)
>> +{
>> + xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
>> + lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt));
>> +
>> + gt->sriov.pf.policy.guc.sched_groups.current_mode = XE_SRIOV_SCHED_GROUPS_NONE;
>> +}
>> +
>> +/**
>> + * xe_gt_sriov_pf_policy_set_sched_groups_mode() - Control the 'sched_groups' policy.
>> + * @gt: the &xe_gt where to apply the policy
>> + * @value: the sched_group mode to be activated
>> + *
>> + * This function can only be called on PF.
>> + *
>> + * Return: 0 on success or a negative error code on failure.
>> + */
>> +int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt,
>> + enum xe_sriov_sched_group_modes value)
>> +{
>> + if (!xe_sriov_gt_pf_policy_has_multi_group_modes(gt))
>> + return -ENODEV;
>> +
>> + guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));
>> + return pf_provision_sched_groups(gt, value);
>> +}
>> +
>> static void pf_sanitize_guc_policies(struct xe_gt *gt)
>> {
>> pf_sanitize_sched_if_idle(gt);
>> pf_sanitize_reset_engine(gt);
>> pf_sanitize_sample_period(gt);
>> + pf_sanitize_sched_groups(gt);
>> }
>>
>> /**
>> @@ -524,6 +659,7 @@ int xe_gt_sriov_pf_policy_reprovision(struct xe_gt *gt, bool reset)
>> err |= pf_reprovision_sched_if_idle(gt);
>> err |= pf_reprovision_reset_engine(gt);
>> err |= pf_reprovision_sample_period(gt);
>> + err |= pf_reprovision_sched_groups(gt);
>> mutex_unlock(xe_gt_sriov_pf_master_mutex(gt));
>>
>> xe_pm_runtime_put(gt_to_xe(gt));
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>> index 52312d24d527..6b3e294bc934 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.h
>> @@ -17,6 +17,9 @@ int xe_gt_sriov_pf_policy_set_reset_engine(struct xe_gt *gt, bool enable);
>> bool xe_gt_sriov_pf_policy_get_reset_engine(struct xe_gt *gt);
>> int xe_gt_sriov_pf_policy_set_sample_period(struct xe_gt *gt, u32 value);
>> u32 xe_gt_sriov_pf_policy_get_sample_period(struct xe_gt *gt);
>> +bool xe_sriov_gt_pf_policy_has_multi_group_modes(struct xe_gt *gt);
>> +bool xe_sriov_gt_pf_policy_has_sched_group_mode(struct xe_gt *gt, u32 mode);
>> +int xe_gt_sriov_pf_policy_set_sched_groups_mode(struct xe_gt *gt, u32 value);
>>
>> void xe_gt_sriov_pf_policy_init(struct xe_gt *gt);
>> void xe_gt_sriov_pf_policy_sanitize(struct xe_gt *gt);
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
>> index 1d4cdc87e069..d9928c200d72 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy_types.h
>> @@ -23,12 +23,16 @@ enum xe_sriov_sched_group_modes {
>> /**
>> * struct xe_gt_sriov_scheduler_groups - Scheduler groups policy info
>> * @max_num_of_groups: number of groups supported by the GuC for the platform
>> + * @supported_modes: mask of supported modes
>> + * @current_mode: active scheduler groups mode
>> * @modes: array of masks and their number for each mode
>> * @modes.masks: array of masks for a given mode
>> * @modes.num_masks: number of masks in the array
>> */
>> struct xe_gt_sriov_scheduler_groups {
>> u8 max_num_of_groups;
>> + u32 supported_modes;
>> + enum xe_sriov_sched_group_modes current_mode;
>> struct {
>> u32 *masks;
>> u32 num_masks;
>> diff --git a/drivers/gpu/drm/xe/xe_guc_fwif.h b/drivers/gpu/drm/xe/xe_guc_fwif.h
>> index 7d93c2749485..c2e0a2dae586 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_fwif.h
>> +++ b/drivers/gpu/drm/xe/xe_guc_fwif.h
>> @@ -46,6 +46,8 @@
>> #define GUC_MAX_ENGINE_CLASSES 16
>> #define GUC_MAX_INSTANCES_PER_CLASS 32
>>
>> +#define GUC_MAX_SCHED_GROUPS GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT
> actually my idea was to have here:
>
> #define GUC_MAX_SCHED_GROUPS 8
>
> and then in the klv abi header:
>
> #define GUC_KLV_VGT_POLICY_ENGINE_GROUP_MAX_COUNT GUC_MAX_SCHED_GROUPS
>
> as IMO the KLV definition follows FW capability, not the other way around
That would require a rework of the header hierarchy, because guc_fwif.h
includes guc_klvs_abi.h, not the other way around, and IMO that rework
does not belong in this series. I can just define it as 8 in both places
for now if you think that's better, and later on, if we rework the
include hierarchy, we can base the KLV define on the one in fwif.h.
Daniele
>
>> +
>> #define GUC_CONTEXT_NORMAL 0
>> #define GUC_CONTEXT_COMPRESSION_SAVE 1
>> #define GUC_CONTEXT_COMPRESSION_RESTORE 2
>> diff --git a/drivers/gpu/drm/xe/xe_guc_klv_helpers.c b/drivers/gpu/drm/xe/xe_guc_klv_helpers.c
>> index 146a6eda9e06..1b08b443606e 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_klv_helpers.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_klv_helpers.c
>> @@ -26,6 +26,8 @@ const char *xe_guc_klv_key_to_string(u16 key)
>> return "sched_if_idle";
>> case GUC_KLV_VGT_POLICY_ADVERSE_SAMPLE_PERIOD_KEY:
>> return "sample_period";
>> + case GUC_KLV_VGT_POLICY_ENGINE_GROUP_CONFIG_KEY:
>> + return "engine_group_config";
>> case GUC_KLV_VGT_POLICY_RESET_AFTER_VF_SWITCH_KEY:
>> return "reset_engine";
>> /* VF CFG keys */