Intel-XE Archive on lore.kernel.org
From: Michal Wajdeczko <michal.wajdeczko@intel.com>
To: Matt Roper <matthew.d.roper@intel.com>,
	<intel-xe@lists.freedesktop.org>,
	 Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Subject: Re: [PATCH v4 22/23] drm/xe/configfs: Add attribute to disable GT types
Date: Wed, 8 Oct 2025 12:12:37 +0200
Message-ID: <c45b55d0-697d-4e3b-82d3-5cecda6362ff@intel.com>
In-Reply-To: <20251007204829.1468209-47-matthew.d.roper@intel.com>



On 10/7/2025 10:48 PM, Matt Roper wrote:
> Preventing the driver from initializing GTs of specific type(s) can be
> useful for debugging and early hardware bringup.  Add a configfs
> attribute to allow this kind of control for debugging.
> 
> With today's platforms and software design, this configuration setting
> is only effective for disabling the media GT since the driver currently
> requires that there always be a primary GT to probe the device.  However
> this might change in the future ---  in theory it should be possible
> (with some additional driver work) to allow an igpu device to come up
> with only the media GT and no primary GT.  Or to allow an igpu device to
> come up with no GTs at all (for display-only usage).  A primary GT will
> likely always be required on dgpu platforms because we rely on the BCS
> engines inside the primary GT for various vram operations.
> 
> v2:
>  - Expand/clarify kerneldoc for configfs attribute.  (Gustavo)
>  - Tighten type usage in gt_types[] structure.  (Gustavo)
>  - Adjust string parsing/name matching to match exact GT names and not
>    accept partial names.  (Gustavo)
> 
> v3:
>  - Switch to scope-based cleanup in gt_types_allowed_store() to fix a
>    leak if the device is already bound.  (Gustavo)
>  - Switch configfs lookup interface to two boolean functions that
>    specify whether primary/media are supported rather than one function
>    that returns a mask.  This is simpler to use and understand.
> 
> Cc: Gustavo Sousa <gustavo.sousa@intel.com>
> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_configfs.c | 145 +++++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_configfs.h |   4 +
>  drivers/gpu/drm/xe/xe_pci.c      |  22 +++++
>  3 files changed, 171 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_configfs.c b/drivers/gpu/drm/xe/xe_configfs.c
> index 139663423185..e36cc5e1bc8f 100644
> --- a/drivers/gpu/drm/xe/xe_configfs.c
> +++ b/drivers/gpu/drm/xe/xe_configfs.c
> @@ -15,6 +15,7 @@
>  
>  #include "instructions/xe_mi_commands.h"
>  #include "xe_configfs.h"
> +#include "xe_gt_types.h"
>  #include "xe_hw_engine_types.h"
>  #include "xe_module.h"
>  #include "xe_pci_types.h"
> @@ -56,6 +57,7 @@
>   *	:
>   *	└── 0000:03:00.0
>   *	    ├── survivability_mode
> + *	    ├── gt_types_allowed

I'm wondering if we want to keep such advanced knobs at the same level as the others.
Maybe create a sub-group for them, like I did for sriov?

	    └── tweaks
	        ├── gt_types_allowed
	        ├── engines_allowed


>   *	    ├── engines_allowed
>   *	    └── enable_psmi

Oops, it looks like I forgot to update this part of the doc when adding max_vfs:

	    └── sriov
	        ├── max_vfs

>   *
> @@ -79,6 +81,44 @@
>   *
>   * This attribute can only be set before binding to the device.
>   *
> + * Allowed GT types:
> + * -----------------
> + *
> + * Allow only specific types of GTs to be detected and initialized by the
> + * driver.  Any combination of GT types can be enabled/disabled, although
> + * some settings will cause the device to fail to probe.
> + *
> + * Writes support both comma- and newline-separated input format. Reads
> + * will always return one GT type per line. "primary" and "media" are the
> + * GT type names supported by this interface.
> + *
> + * This attribute can only be set before binding to the device.
> + *
> + * Examples:
> + *
> + * Allow both primary and media GTs to be initialized and used.  This matches
> + * the driver's default behavior::
> + *
> + *	# echo 'primary,media' > /sys/kernel/config/xe/0000:03:00.0/gt_types_allowed

maybe "all" as an alias?

> + *
> + * Allow only the primary GT of each tile to be initialized and used,
> + * effectively disabling the media GT if it exists on the platform::
> + *
> + *	# echo 'primary' > /sys/kernel/config/xe/0000:03:00.0/gt_types_allowed
> + *
> + * Allow only the media GT of each tile to be initialized and used,
> + * effectively disabling the primary GT.  **This configuration will cause
> + * device probe failure on all current platforms, but may be allowed on
> + * igpu platforms in the future**::
> + *
> + *	# echo 'media' > /sys/kernel/config/xe/0000:03:00.0/gt_types_allowed
> + *
> + * Disable all GTs.  Only other GPU IP (such as display) is potentially usable.
> + * **This configuration will cause device probe failure on all current
> + * platforms, but may be allowed on igpu platforms in the future**::
> + *
> + *	# echo '' > /sys/kernel/config/xe/0000:03:00.0/gt_types_allowed

maybe "none" as an alias?
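To illustrate the "all"/"none" alias idea, here is a rough user-space sketch of the same strsep()-based loop with two extra table entries. The enum values, the BIT_ULL() definition and the gt_type_names[] table are hypothetical stand-ins for the kernel's enum xe_gt_type and gt_types[], not code from the patch.

```c
#define _DEFAULT_SOURCE /* for strsep() with glibc */
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's enum xe_gt_type and BIT_ULL(). */
enum gt_type { GT_TYPE_MAIN = 1, GT_TYPE_MEDIA = 2 };
#define BIT_ULL(n) (1ULL << (n))

static const struct {
	const char *name;
	uint64_t mask;
} gt_type_names[] = {
	{ "primary", BIT_ULL(GT_TYPE_MAIN) },
	{ "media",   BIT_ULL(GT_TYPE_MEDIA) },
	/* Suggested aliases, not part of the posted patch. */
	{ "all",     BIT_ULL(GT_TYPE_MAIN) | BIT_ULL(GT_TYPE_MEDIA) },
	{ "none",    0 },
};

/*
 * Parse comma/newline separated GT type names into a mask.
 * Returns 0 on success (mask stored in *out), -ENOMEM or -EINVAL on error.
 */
static int parse_gt_types(const char *page, uint64_t *out)
{
	char *buf = strdup(page);
	char *p = buf;
	uint64_t mask = 0;
	int ret = 0;

	if (!buf)
		return -ENOMEM;

	while (p) {
		char *name = strsep(&p, ",\n");
		bool matched = false;

		if (name[0] == '\0')
			continue; /* skip empty tokens, e.g. a trailing newline */

		for (size_t i = 0; i < sizeof(gt_type_names) / sizeof(gt_type_names[0]); i++) {
			if (strcmp(name, gt_type_names[i].name) == 0) {
				mask |= gt_type_names[i].mask;
				matched = true;
				break;
			}
		}

		if (!matched) {
			ret = -EINVAL; /* only exact names are accepted */
			break;
		}
	}

	free(buf);
	if (ret == 0)
		*out = mask;
	return ret;
}
```

With this table, writing "all" gives the same mask as "primary,media", and "none" gives an empty mask, just like an empty write. Mixing "none" with other names simply contributes nothing, which may or may not be the semantics we want.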

> + *
>   * Allowed engines:
>   * ----------------
>   *
> @@ -187,6 +227,7 @@ struct xe_config_group_device {
>  	struct config_group group;
>  
>  	struct xe_config_device {
> +		u64 gt_types_allowed;
>  		u64 engines_allowed;
>  		struct wa_bb ctx_restore_post_bb[XE_ENGINE_CLASS_MAX];
>  		struct wa_bb ctx_restore_mid_bb[XE_ENGINE_CLASS_MAX];
> @@ -201,6 +242,7 @@ struct xe_config_group_device {
>  };
>  
>  static const struct xe_config_device device_defaults = {
> +	.gt_types_allowed = U64_MAX,
>  	.engines_allowed = U64_MAX,
>  	.survivability_mode = false,
>  	.enable_psmi = false,
> @@ -220,6 +262,7 @@ struct engine_info {
>  /* Some helpful macros to aid on the sizing of buffer allocation when parsing */
>  #define MAX_ENGINE_CLASS_CHARS 5
>  #define MAX_ENGINE_INSTANCE_CHARS 2
> +#define MAX_GT_TYPE_CHARS 7
>  
>  static const struct engine_info engine_info[] = {
>  	{ .cls = "rcs", .mask = XE_HW_ENGINE_RCS_MASK, .engine_class = XE_ENGINE_CLASS_RENDER },
> @@ -230,6 +273,14 @@ static const struct engine_info engine_info[] = {
>  	{ .cls = "gsccs", .mask = XE_HW_ENGINE_GSCCS_MASK, .engine_class = XE_ENGINE_CLASS_OTHER },
>  };
>  
> +static const struct {
> +	const char name[MAX_GT_TYPE_CHARS + 1];
> +	enum xe_gt_type type;
> +} gt_types[] = {
> +	{ .name = "primary", .type = XE_GT_TYPE_MAIN },
> +	{ .name = "media", .type = XE_GT_TYPE_MEDIA },
> +};
> +
>  static struct xe_config_group_device *to_xe_config_group_device(struct config_item *item)
>  {
>  	return container_of(to_config_group(item), struct xe_config_group_device, group);
> @@ -292,6 +343,58 @@ static ssize_t survivability_mode_store(struct config_item *item, const char *pa
>  	return len;
>  }
>  
> +static ssize_t gt_types_allowed_show(struct config_item *item, char *page)
> +{
> +	struct xe_config_device *dev = to_xe_config_device(item);
> +	char *p = page;
> +
> +	for (size_t i = 0; i < ARRAY_SIZE(gt_types); i++)
> +		if (dev->gt_types_allowed & BIT_ULL(gt_types[i].type))
> +			p += sprintf(p, "%s\n", gt_types[i].name);
> +
> +	return p - page;
> +}
> +
> +static ssize_t gt_types_allowed_store(struct config_item *item, const char *page,
> +				      size_t len)
> +{
> +	struct xe_config_group_device *dev = to_xe_config_group_device(item);
> +	char *buf __free(kfree) = kstrdup(page, GFP_KERNEL);
> +	char *p = buf;
> +	u64 typemask = 0;
> +
> +	if (!buf)
> +		return -ENOMEM;
> +
> +	while (p) {
> +		char *typename = strsep(&p, ",\n");
> +		bool matched = false;
> +
> +		if (typename[0] == '\0')
> +			continue;
> +
> +		for (size_t i = 0; i < ARRAY_SIZE(gt_types); i++) {
> +			if (strcmp(typename, gt_types[i].name) == 0) {
> +				typemask |= BIT(gt_types[i].type);
> +				matched = true;
> +				break;
> +			}
> +		}
> +
> +		if (!matched)
> +			return -EINVAL;
> +	}
> +
> +	scoped_guard(mutex, &dev->lock) {

Probably a plain guard(mutex) would work here too.

> +		if (is_bound(dev))
> +			return -EBUSY;

Then we can take the lock and return early, before parsing the input.
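Roughly what I have in mind for the store path: take the lock once for the whole function, bail out with -EBUSY if already bound, and only then parse. This is a user-space model, so GUARD_MUTEX() (built on the GCC/Clang cleanup attribute, which is roughly how the kernel's guard() works), struct cfg_dev and the simplified parse_mask() are hypothetical stand-ins for guard(mutex), the configfs group and the real parser.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical user-space model of the per-device configfs state. */
struct cfg_dev {
	pthread_mutex_t lock;
	bool bound;                /* stands in for is_bound(dev) */
	uint64_t gt_types_allowed;
};

/* Rough analogue of guard(mutex): unlock automatically when scope ends. */
static void unlock_cleanup(pthread_mutex_t **m)
{
	pthread_mutex_unlock(*m);
}
#define GUARD_MUTEX(m) \
	pthread_mutex_t *guard_ __attribute__((cleanup(unlock_cleanup))) = \
		(pthread_mutex_lock(m), (m))

/* Hypothetical, simplified parser: a single "primary" or "media" token. */
static int parse_mask(const char *page, uint64_t *out)
{
	size_t n = strcspn(page, "\n"); /* ignore a trailing newline */

	if (n == 0)
		*out = 0;
	else if (n == 7 && strncmp(page, "primary", 7) == 0)
		*out = 1ULL << 1;
	else if (n == 5 && strncmp(page, "media", 5) == 0)
		*out = 1ULL << 2;
	else
		return -EINVAL;
	return 0;
}

static ssize_t gt_types_store(struct cfg_dev *dev, const char *page, size_t len)
{
	uint64_t mask;
	int ret;

	GUARD_MUTEX(&dev->lock);

	/* Reject early, before doing any parsing work. */
	if (dev->bound)
		return -EBUSY;

	ret = parse_mask(page, &mask);
	if (ret)
		return ret;

	dev->gt_types_allowed = mask;
	return (ssize_t)len;
}
```

Parsing now happens with the lock held, but the input is tiny, so holding the mutex slightly longer should not matter in practice.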

> +
> +		dev->config.gt_types_allowed = typemask;
> +	}
> +
> +	return len;
> +}
> +
>  static ssize_t engines_allowed_show(struct config_item *item, char *page)
>  {
>  	struct xe_config_device *dev = to_xe_config_device(item);
> @@ -672,6 +775,7 @@ CONFIGFS_ATTR(, ctx_restore_mid_bb);
>  CONFIGFS_ATTR(, ctx_restore_post_bb);
>  CONFIGFS_ATTR(, enable_psmi);
>  CONFIGFS_ATTR(, engines_allowed);
> +CONFIGFS_ATTR(, gt_types_allowed);
>  CONFIGFS_ATTR(, survivability_mode);
>  
>  static struct configfs_attribute *xe_config_device_attrs[] = {
> @@ -679,6 +783,7 @@ static struct configfs_attribute *xe_config_device_attrs[] = {
>  	&attr_ctx_restore_post_bb,
>  	&attr_enable_psmi,
>  	&attr_engines_allowed,
> +	&attr_gt_types_allowed,
>  	&attr_survivability_mode,
>  	NULL,
>  };
> @@ -846,6 +951,7 @@ static void dump_custom_dev_config(struct pci_dev *pdev,
>  				 dev->config.attr_); \
>  	} while (0)
>  
> +	PRI_CUSTOM_ATTR("%llx", gt_types_allowed);
>  	PRI_CUSTOM_ATTR("%llx", engines_allowed);
>  	PRI_CUSTOM_ATTR("%d", enable_psmi);
>  	PRI_CUSTOM_ATTR("%d", survivability_mode);
> @@ -896,6 +1002,45 @@ bool xe_configfs_get_survivability_mode(struct pci_dev *pdev)
>  	return mode;
>  }
>  
> +static u64 get_gt_types_allowed(struct xe_device *xe)
> +{
> +	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> +	struct xe_config_group_device *dev = find_xe_config_group_device(pdev);
> +	u64 mask;
> +
> +	if (!dev)
> +		return device_defaults.gt_types_allowed;
> +
> +	mask = dev->config.gt_types_allowed;

BTW, since we take the lock during write, shouldn't we also take it during read?
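A locked read could look roughly like the sketch below. One concrete reason to take the lock is that on 32-bit architectures a plain 64-bit load is not guaranteed to be atomic, so a concurrent write could in principle be seen half-updated. This is again a user-space model: struct cfg_dev and GT_TYPES_DEFAULT are hypothetical stand-ins for the configfs group and device_defaults, and the group lookup/put is left out.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model of the per-device configfs state. */
struct cfg_dev {
	pthread_mutex_t lock;
	uint64_t gt_types_allowed;
};

/* Hypothetical default, mirroring device_defaults.gt_types_allowed. */
#define GT_TYPES_DEFAULT UINT64_MAX

static uint64_t get_gt_types_allowed(struct cfg_dev *dev)
{
	uint64_t mask;

	if (!dev)
		return GT_TYPES_DEFAULT; /* no configfs group for this device */

	pthread_mutex_lock(&dev->lock);
	mask = dev->gt_types_allowed; /* snapshot under the writer's lock */
	pthread_mutex_unlock(&dev->lock);

	return mask;
}
```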

> +	config_group_put(&dev->group);
> +
> +	return mask;
> +}
> +
> +/**
> + * xe_configfs_primary_gt_supported - determine whether primary GTs are supported
> + * @xe: xe device
> + *
> + * Return: True if primary GTs are enabled, false if they have been disabled via
> + *     configfs.
> + */
> +bool xe_configfs_primary_gt_supported(struct xe_device *xe)
> +{
> +	return (get_gt_types_allowed(xe) & BIT_ULL(XE_GT_TYPE_MAIN)) != 0;

Can't we just rely on the implicit conversion to bool?

	return get_gt_types_allowed(xe) & BIT_ULL(XE_GT_TYPE_MAIN);
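Conversion to bool is well defined in C (C99 and later): any nonzero scalar converts to true, so the != 0 is redundant when the function returns bool, even for bits above 31. The explicit comparison would only matter if the return type were a narrower integer, as the second helper (hypothetical, for contrast only) shows.

```c
#include <stdbool.h>
#include <stdint.h>

/* Conversion to bool: any nonzero value becomes true, including bit 63. */
static bool has_bit_bool(uint64_t mask, unsigned int bit)
{
	return mask & (1ULL << bit);
}

/* Conversion to a 32-bit unsigned keeps only the low 32 bits (modulo 2^32),
 * so a flag above bit 31 would silently read as 0. */
static uint32_t has_bit_u32(uint64_t mask, unsigned int bit)
{
	return mask & (1ULL << bit);
}
```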

> +}
> +
> +/**
> + * xe_configfs_media_gt_supported - determine whether media GTs are supported
> + * @xe: xe device
> + *
> + * Return: True if the media GTs are enabled, false if they have been disabled
> + *     via configfs.
> + */
> +bool xe_configfs_media_gt_supported(struct xe_device *xe)
> +{
> +	return (get_gt_types_allowed(xe) & BIT_ULL(XE_GT_TYPE_MEDIA)) != 0;
> +}
> +
>  /**
>   * xe_configfs_get_engines_allowed - get engine allowed mask from configfs
>   * @pdev: pci device
> diff --git a/drivers/gpu/drm/xe/xe_configfs.h b/drivers/gpu/drm/xe/xe_configfs.h
> index c61e0e47ed94..5624e965b911 100644
> --- a/drivers/gpu/drm/xe/xe_configfs.h
> +++ b/drivers/gpu/drm/xe/xe_configfs.h
> @@ -17,6 +17,8 @@ int xe_configfs_init(void);
>  void xe_configfs_exit(void);
>  void xe_configfs_check_device(struct pci_dev *pdev);
>  bool xe_configfs_get_survivability_mode(struct pci_dev *pdev);
> +bool xe_configfs_primary_gt_supported(struct xe_device *xe);
> +bool xe_configfs_media_gt_supported(struct xe_device *xe);

I guess we need to decide now whether we want to keep passing pdev or switch to xe as the argument for all xe_configfs functions.

>  u64 xe_configfs_get_engines_allowed(struct pci_dev *pdev);
>  bool xe_configfs_get_psmi_enabled(struct pci_dev *pdev);
>  u32 xe_configfs_get_ctx_restore_mid_bb(struct pci_dev *pdev, enum xe_engine_class,
> @@ -28,6 +30,8 @@ static inline int xe_configfs_init(void) { return 0; }
>  static inline void xe_configfs_exit(void) { }
>  static inline void xe_configfs_check_device(struct pci_dev *pdev) { }
>  static inline bool xe_configfs_get_survivability_mode(struct pci_dev *pdev) { return false; }
> +static inline bool xe_configfs_primary_gt_supported(struct xe_device *xe) { return true; }
> +static inline bool xe_configfs_media_gt_supported(struct xe_device *xe) { return true; }
>  static inline u64 xe_configfs_get_engines_allowed(struct pci_dev *pdev) { return U64_MAX; }
>  static inline bool xe_configfs_get_psmi_enabled(struct pci_dev *pdev) { return false; }
>  static inline u32 xe_configfs_get_ctx_restore_mid_bb(struct pci_dev *pdev, enum xe_engine_class,
> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> index a5932e4f4a23..9c8ab2b41737 100644
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
> @@ -695,6 +695,11 @@ static struct xe_gt *alloc_primary_gt(struct xe_tile *tile,
>  	struct xe_device *xe = tile_to_xe(tile);
>  	struct xe_gt *gt;
>  
> +	if (!xe_configfs_primary_gt_supported(xe)) {
> +		drm_info(&xe->drm, "Primary GT disabled via configfs\n");

nit: you can use xe_info(xe, "...") now

> +		return NULL;
> +	}
> +
>  	gt = xe_gt_alloc(tile);
>  	if (IS_ERR(gt))
>  		return gt;
> @@ -720,6 +725,11 @@ static struct xe_gt *alloc_media_gt(struct xe_tile *tile,
>  	struct xe_device *xe = tile_to_xe(tile);
>  	struct xe_gt *gt;
>  
> +	if (!xe_configfs_media_gt_supported(xe)) {
> +		drm_info(&xe->drm, "Media GT disabled via configfs\n");
> +		return NULL;
> +	}
> +
>  	if (MEDIA_VER(xe) < 13 || !media_desc)
>  		return NULL;
>  
> @@ -829,6 +839,18 @@ static int xe_info_init(struct xe_device *xe,
>  		if (IS_ERR(tile->primary_gt))
>  			return PTR_ERR(tile->primary_gt);
>  
> +		/*
> +		 * It's not currently possible to probe a device with the
> +		 * primary GT disabled.  With some work, this may be possible in
> +		 * the future for igpu platforms (although probably not for
> +		 * dgpu's since access to the primary GT's BCS engines is
> +		 * required for VRAM management).
> +		 */
> +		if (!tile->primary_gt) {
> +			drm_err(&xe->drm, "Cannot probe device without a primary GT\n");
> +			return -ENODEV;
> +		}
> +
>  		tile->media_gt = alloc_media_gt(tile, media_desc);
>  		if (IS_ERR(tile->media_gt))
>  			return PTR_ERR(tile->media_gt);

