Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Jani Nikula <jani.nikula@linux.intel.com>
To: Matt Roper <matthew.d.roper@intel.com>, intel-xe@lists.freedesktop.org
Cc: matthew.d.roper@intel.com
Subject: Re: [PATCH 1/2] drm/xe: Track pre-production workaround support
Date: Thu, 20 Nov 2025 19:00:58 +0200	[thread overview]
Message-ID: <6727682007acb8a501b42b7de8b696c87443d874@intel.com> (raw)
In-Reply-To: <20251119220459.3997343-3-matthew.d.roper@intel.com>

On Wed, 19 Nov 2025, Matt Roper <matthew.d.roper@intel.com> wrote:
> When we're initially enabling driver support for a new platform/IP, we
> usually implement all workarounds documented in the WA database in the
> driver.  Many of those workarounds are restricted to early steppings
> that only showed up in pre-production hardware (i.e., internal test
> chips that are not available to the general public).  Since the
> workarounds for early, pre-production steppings tend to be some of the
> ugliest and most complicated workarounds, we generally want to eliminate
> them and simplify the code once the platform has launched and our
> internal usage of those pre-production parts have been phased out.
>
> Let's add a flag to the device info that tracks which platforms we've
> removed pre-production workarounds for so that we can print a warning
> and taint if someone tries to load the driver on a pre-production part.
> This will help our internal users understand the likely problems they'll
> encounter if they try to load the driver on an old pre-production
> device.
>
> The Xe behavior here is similar to what we've done for many years on
> i915 (see intel_detect_preproduction_hw()), except that instead of
> manually coding up ranges of device steppings that we believe to be
> pre-production hardware, Xe will use the hardware's own production vs
> pre-production fusing status, which we can read from the FUSE2 register.
> This fuse didn't exist on older Intel hardware, but should be present on
> all platforms supported by the Xe driver.
>
> Going forward, let's set the expectation that we'll start looking into
> removing pre-production workarounds for a platform around the time that
> platforms of the next major IP stepping are having their force_probe
> requirement lifted.  This timing is just a rough guideline; there may be
> cases where some instances of pre-production parts are still being
> actively used in CI farms, internal device pools, etc. and we'll need to
> wait a bit longer for those to be swapped out.

Nice!

I fear the i915 checks have pretty much gone stale, and not all old
workarounds have been removed.

> Bspec: 78271, 52544
> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> ---
>  drivers/gpu/drm/xe/regs/xe_gt_regs.h |  3 ++
>  drivers/gpu/drm/xe/xe_device.c       | 48 ++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_device_types.h |  2 ++
>  drivers/gpu/drm/xe/xe_pci.c          |  1 +
>  drivers/gpu/drm/xe/xe_pci_types.h    |  1 +
>  5 files changed, 55 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> index 917a088c28f2..93643da57428 100644
> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> @@ -227,6 +227,9 @@
>  
>  #define MIRROR_FUSE1				XE_REG(0x911c)
>  
> +#define FUSE2					XE_REG(0x9120)
> +#define   PRODUCTION_HW				REG_BIT(2)
> +
>  #define MIRROR_L3BANK_ENABLE			XE_REG(0x9130)
>  #define   XE3_L3BANK_ENABLE			REG_GENMASK(31, 0)
>  
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 1197f914ef77..ed758888cfa1 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -797,6 +797,52 @@ static int probe_has_flat_ccs(struct xe_device *xe)
>  	return 0;
>  }
>  
> +/*
> + * Detect if the driver is being run on pre-production hardware.  We don't
> + * keep workarounds for pre-production hardware long term, so print an
> + * error and add taint if we're being loaded on a pre-production platform
> + * for which the pre-prod workarounds have already been removed.
> + *
> + * The general policy is that we'll remove any workarounds that only apply to
> + * pre-production hardware around the time force_probe restrictions are lifted
> + * for a platform of the next major IP generation (for example, Xe2 pre-prod
> + * workarounds should be removed around the time the first Xe3 platforms have
> + * force_probe lifted).
> + */
> +static void detect_preproduction_hw(struct xe_device *xe)
> +{
> +	struct xe_gt *gt;
> +	int id;
> +
> +	/*
> +	 * The "SW_CAP" fuse contains a bit indicating whether the device is a
> +	 * production or pre-production device.  This fuse is reflected through
> +	 * the GT "FUSE2" register, even though the contents of the fuse are
> +	 * not GT-specific.  Every GT's reflection of this fuse should show the
> +	 * same value, so we'll just use the first available GT for lookup.
> +	 */
> +	for_each_gt(gt, xe, id)
> +		break;
> +
> +	if (!gt)
> +		return;
> +
> +	CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT);
> +	if (xe_force_wake_ref_has_domain(fw_ref.domains, XE_FW_GT)) {
> +		xe_gt_err(gt, "Forcewake failure; cannot determine production/pre-production hw status.\n");
> +		return;
> +	}
> +
> +	if (xe_mmio_read32(&gt->mmio, FUSE2) & PRODUCTION_HW)
> +		return;
> +
> +	xe_info(xe, "Pre-production hardware detected.\n");
> +	if (xe->info.has_prod_wa_only) {

Would it make sense to flip this flag around? Otherwise you'll need to
keep the flag indefinitely for production hardware.

Say, if you had .has_pre_production_wa, you'd initially set that and
.require_force_probe, then at some point drop .require_force_probe, and
later still drop .has_pre_production_wa.

BR,
Jani.


> +		xe_err(xe, "Pre-production workarounds for this platform have already been removed.\n");
> +		add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> +	}
> +}
> +
>  int xe_device_probe(struct xe_device *xe)
>  {
>  	struct xe_tile *tile;
> @@ -967,6 +1013,8 @@ int xe_device_probe(struct xe_device *xe)
>  	if (err)
>  		goto err_unregister_display;
>  
> +	detect_preproduction_hw(xe);
> +
>  	return devm_add_action_or_reset(xe->drm.dev, xe_device_sanitize, xe);
>  
>  err_unregister_display:
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 0b2fa7c56d38..fe76640db859 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -308,6 +308,8 @@ struct xe_device {
>  		u8 has_mbx_power_limits:1;
>  		/** @info.has_mem_copy_instr: Device supports MEM_COPY instruction */
>  		u8 has_mem_copy_instr:1;
> +		/** @info.has_prod_wa_only: Only production hardware workarounds supported */
> +		u8 has_prod_wa_only:1;
>  		/** @info.has_pxp: Device has PXP support */
>  		u8 has_pxp:1;
>  		/** @info.has_range_tlb_inval: Has range based TLB invalidations */
> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> index cd03b4b3ebdb..25365a891f9d 100644
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
> @@ -673,6 +673,7 @@ static int xe_info_init_early(struct xe_device *xe,
>  	xe->info.has_heci_cscfi = desc->has_heci_cscfi;
>  	xe->info.has_late_bind = desc->has_late_bind;
>  	xe->info.has_llc = desc->has_llc;
> +	xe->info.has_prod_wa_only = desc->has_prod_wa_only;
>  	xe->info.has_pxp = desc->has_pxp;
>  	xe->info.has_sriov = xe_configfs_primary_gt_allowed(to_pci_dev(xe->drm.dev)) &&
>  		desc->has_sriov;
> diff --git a/drivers/gpu/drm/xe/xe_pci_types.h b/drivers/gpu/drm/xe/xe_pci_types.h
> index 9892c063a9c5..3045e860221c 100644
> --- a/drivers/gpu/drm/xe/xe_pci_types.h
> +++ b/drivers/gpu/drm/xe/xe_pci_types.h
> @@ -47,6 +47,7 @@ struct xe_device_desc {
>  	u8 has_llc:1;
>  	u8 has_mbx_power_limits:1;
>  	u8 has_mem_copy_instr:1;
> +	u8 has_prod_wa_only:1;
>  	u8 has_pxp:1;
>  	u8 has_sriov:1;
>  	u8 needs_scratch:1;

-- 
Jani Nikula, Intel

  parent reply	other threads:[~2025-11-20 17:01 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-19 22:05 [PATCH 1/2] drm/xe: Track pre-production workaround support Matt Roper
2025-11-19 22:05 ` [PATCH 2/2] drm/xe/lnl: Drop " Matt Roper
2025-11-20  1:46 ` ✓ CI.KUnit: success for series starting with [1/2] drm/xe: Track " Patchwork
2025-11-20  2:36 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-11-20  7:45 ` ✗ Xe.CI.Full: " Patchwork
2025-11-20 16:21 ` [PATCH v2 1/2] " Matt Roper
2025-11-20 17:00 ` Jani Nikula [this message]
2025-11-20 18:07 ` [PATCH " Lucas De Marchi
2025-11-20 18:35   ` Matt Roper
2025-11-20 19:13     ` Lucas De Marchi
2025-11-20 20:31 ` ✓ CI.KUnit: success for series starting with [v2,1/2] drm/xe: Track pre-production workaround support (rev2) Patchwork
2025-11-20 21:14 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-11-20 21:18   ` Matt Roper
2025-11-20 21:59     ` Michal Wajdeczko
2025-11-20 22:05       ` Matt Roper
2025-11-21  2:13 ` ✗ Xe.CI.Full: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6727682007acb8a501b42b7de8b696c87443d874@intel.com \
    --to=jani.nikula@linux.intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=matthew.d.roper@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox