From: Jani Nikula <jani.nikula@linux.intel.com>
To: Matt Roper <matthew.d.roper@intel.com>, intel-xe@lists.freedesktop.org
Cc: matthew.d.roper@intel.com
Subject: Re: [PATCH 1/2] drm/xe: Track pre-production workaround support
Date: Thu, 20 Nov 2025 19:00:58 +0200 [thread overview]
Message-ID: <6727682007acb8a501b42b7de8b696c87443d874@intel.com> (raw)
In-Reply-To: <20251119220459.3997343-3-matthew.d.roper@intel.com>
On Wed, 19 Nov 2025, Matt Roper <matthew.d.roper@intel.com> wrote:
> When we're initially enabling driver support for a new platform/IP, we
> usually implement all workarounds documented in the WA database in the
> driver. Many of those workarounds are restricted to early steppings
> that only showed up in pre-production hardware (i.e., internal test
> chips that are not available to the general public). Since the
> workarounds for early, pre-production steppings tend to be some of the
> ugliest and most complicated workarounds, we generally want to eliminate
> them and simplify the code once the platform has launched and our
> internal usage of those pre-production parts have been phased out.
>
> Let's add a flag to the device info that tracks which platforms we've
> removed pre-production workarounds for so that we can print a warning
> and taint if someone tries to load the driver on a pre-production part.
> This will help our internal users understand the likely problems they'll
> encounter if they try to load the driver on an old pre-production
> device.
>
> The Xe behavior here is similar to what we've done for many years on
> i915 (see intel_detect_preproduction_hw()), except that instead of
> manually coding up ranges of device steppings that we believe to be
> pre-production hardware, Xe will use the hardware's own production vs
> pre-production fusing status, which we can read from the FUSE2 register.
> This fuse didn't exist on older Intel hardware, but should be present on
> all platforms supported by the Xe driver.
>
> Going forward, let's set the expectation that we'll start looking into
> removing pre-production workarounds for a platform around the time that
> platforms of the next major IP stepping are having their force_probe
> requirement lifted. This timing is just a rough guideline; there may be
> cases where some instances of pre-production parts are still being
> actively used in CI farms, internal device pools, etc. and we'll need to
> wait a bit longer for those to be swapped out.
Nice!
I fear the i915 checks have pretty much gone stale, and not all old
workarounds have been removed.
> Bspec: 78271, 52544
> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> ---
> drivers/gpu/drm/xe/regs/xe_gt_regs.h | 3 ++
> drivers/gpu/drm/xe/xe_device.c | 48 ++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_device_types.h | 2 ++
> drivers/gpu/drm/xe/xe_pci.c | 1 +
> drivers/gpu/drm/xe/xe_pci_types.h | 1 +
> 5 files changed, 55 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> index 917a088c28f2..93643da57428 100644
> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> @@ -227,6 +227,9 @@
>
> #define MIRROR_FUSE1 XE_REG(0x911c)
>
> +#define FUSE2 XE_REG(0x9120)
> +#define PRODUCTION_HW REG_BIT(2)
> +
> #define MIRROR_L3BANK_ENABLE XE_REG(0x9130)
> #define XE3_L3BANK_ENABLE REG_GENMASK(31, 0)
>
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 1197f914ef77..ed758888cfa1 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -797,6 +797,52 @@ static int probe_has_flat_ccs(struct xe_device *xe)
> return 0;
> }
>
> +/*
> + * Detect if the driver is being run on pre-production hardware. We don't
> + * keep workarounds for pre-production hardware long term, so print an
> + * error and add taint if we're being loaded on a pre-production platform
> + * for which the pre-prod workarounds have already been removed.
> + *
> + * The general policy is that we'll remove any workarounds that only apply to
> + * pre-production hardware around the time force_probe restrictions are lifted
> + * for a platform of the next major IP generation (for example, Xe2 pre-prod
> + * workarounds should be removed around the time the first Xe3 platforms have
> + * force_probe lifted).
> + */
> +static void detect_preproduction_hw(struct xe_device *xe)
> +{
> + struct xe_gt *gt;
> + int id;
> +
> + /*
> + * The "SW_CAP" fuse contains a bit indicating whether the device is a
> + * production or pre-production device. This fuse is reflected through
> + * the GT "FUSE2" register, even though the contents of the fuse are
> + * not GT-specific. Every GT's reflection of this fuse should show the
> + * same value, so we'll just use the first available GT for lookup.
> + */
> + for_each_gt(gt, xe, id)
> + break;
> +
> + if (!gt)
> + return;
> +
> + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT);
> + if (xe_force_wake_ref_has_domain(fw_ref.domains, XE_FW_GT)) {
> + xe_gt_err(gt, "Forcewake failure; cannot determine production/pre-production hw status.\n");
> + return;
> + }
> +
> + if (xe_mmio_read32(>->mmio, FUSE2) & PRODUCTION_HW)
> + return;
> +
> + xe_info(xe, "Pre-production hardware detected.\n");
> + if (xe->info.has_prod_wa_only) {
Would it make sense to flip this flag around? Otherwise you'll need to
keep the flag indefinitely for production hardware.
Say, if you had .has_pre_production_wa, you'd initially set that and
.require_force_probe, then at some point drop .require_force_probe, and
later still drop .has_pre_production_wa.
BR,
Jani.
> + xe_err(xe, "Pre-production workarounds for this platform have already been removed.\n");
> + add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> + }
> +}
> +
> int xe_device_probe(struct xe_device *xe)
> {
> struct xe_tile *tile;
> @@ -967,6 +1013,8 @@ int xe_device_probe(struct xe_device *xe)
> if (err)
> goto err_unregister_display;
>
> + detect_preproduction_hw(xe);
> +
> return devm_add_action_or_reset(xe->drm.dev, xe_device_sanitize, xe);
>
> err_unregister_display:
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 0b2fa7c56d38..fe76640db859 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -308,6 +308,8 @@ struct xe_device {
> u8 has_mbx_power_limits:1;
> /** @info.has_mem_copy_instr: Device supports MEM_COPY instruction */
> u8 has_mem_copy_instr:1;
> + /** @info.has_prod_wa_only: Only production hardware workarounds supported */
> + u8 has_prod_wa_only:1;
> /** @info.has_pxp: Device has PXP support */
> u8 has_pxp:1;
> /** @info.has_range_tlb_inval: Has range based TLB invalidations */
> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> index cd03b4b3ebdb..25365a891f9d 100644
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
> @@ -673,6 +673,7 @@ static int xe_info_init_early(struct xe_device *xe,
> xe->info.has_heci_cscfi = desc->has_heci_cscfi;
> xe->info.has_late_bind = desc->has_late_bind;
> xe->info.has_llc = desc->has_llc;
> + xe->info.has_prod_wa_only = desc->has_prod_wa_only;
> xe->info.has_pxp = desc->has_pxp;
> xe->info.has_sriov = xe_configfs_primary_gt_allowed(to_pci_dev(xe->drm.dev)) &&
> desc->has_sriov;
> diff --git a/drivers/gpu/drm/xe/xe_pci_types.h b/drivers/gpu/drm/xe/xe_pci_types.h
> index 9892c063a9c5..3045e860221c 100644
> --- a/drivers/gpu/drm/xe/xe_pci_types.h
> +++ b/drivers/gpu/drm/xe/xe_pci_types.h
> @@ -47,6 +47,7 @@ struct xe_device_desc {
> u8 has_llc:1;
> u8 has_mbx_power_limits:1;
> u8 has_mem_copy_instr:1;
> + u8 has_prod_wa_only:1;
> u8 has_pxp:1;
> u8 has_sriov:1;
> u8 needs_scratch:1;
--
Jani Nikula, Intel
next prev parent reply other threads:[~2025-11-20 17:01 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-19 22:05 [PATCH 1/2] drm/xe: Track pre-production workaround support Matt Roper
2025-11-19 22:05 ` [PATCH 2/2] drm/xe/lnl: Drop " Matt Roper
2025-11-20 1:46 ` ✓ CI.KUnit: success for series starting with [1/2] drm/xe: Track " Patchwork
2025-11-20 2:36 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-11-20 7:45 ` ✗ Xe.CI.Full: " Patchwork
2025-11-20 16:21 ` [PATCH v2 1/2] " Matt Roper
2025-11-20 17:00 ` Jani Nikula [this message]
2025-11-20 18:07 ` [PATCH " Lucas De Marchi
2025-11-20 18:35 ` Matt Roper
2025-11-20 19:13 ` Lucas De Marchi
2025-11-20 20:31 ` ✓ CI.KUnit: success for series starting with [v2,1/2] drm/xe: Track pre-production workaround support (rev2) Patchwork
2025-11-20 21:14 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-11-20 21:18 ` Matt Roper
2025-11-20 21:59 ` Michal Wajdeczko
2025-11-20 22:05 ` Matt Roper
2025-11-21 2:13 ` ✗ Xe.CI.Full: " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6727682007acb8a501b42b7de8b696c87443d874@intel.com \
--to=jani.nikula@linux.intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=matthew.d.roper@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox