Intel-XE Archive on lore.kernel.org
From: Matt Roper <matthew.d.roper@intel.com>
To: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH 1/2] drm/xe: Track pre-production workaround support
Date: Thu, 20 Nov 2025 10:35:59 -0800
Message-ID: <20251120183559.GC3905809@mdroper-desk1.amr.corp.intel.com>
In-Reply-To: <3zguwnf7qbdsiy6n7vvndbbvtidvmp3q2nzxdz6xi5bkifgtgv@rzmfr7eo3a4n>

On Thu, Nov 20, 2025 at 12:07:45PM -0600, Lucas De Marchi wrote:
> On Wed, Nov 19, 2025 at 02:05:00PM -0800, Matt Roper wrote:
> > When we're initially enabling driver support for a new platform/IP, we
> > usually implement all workarounds documented in the WA database in the
> > driver.  Many of those workarounds are restricted to early steppings
> > that only showed up in pre-production hardware (i.e., internal test
> > chips that are not available to the general public).  Since the
> > workarounds for early, pre-production steppings tend to be some of the
> > ugliest and most complicated workarounds, we generally want to eliminate
> > them and simplify the code once the platform has launched and our
> > internal usage of those pre-production parts has been phased out.
> > 
> > Let's add a flag to the device info that tracks which platforms we've
> > removed pre-production workarounds for so that we can print a warning
> > and taint if someone tries to load the driver on a pre-production part.
> > This will help our internal users understand the likely problems they'll
> > encounter if they try to load the driver on an old pre-production
> > device.
> > 
> > The Xe behavior here is similar to what we've done for many years on
> > i915 (see intel_detect_preproduction_hw()), except that instead of
> > manually coding up ranges of device steppings that we believe to be
> > pre-production hardware, Xe will use the hardware's own production vs
> > pre-production fusing status, which we can read from the FUSE2 register.
> > This fuse didn't exist on older Intel hardware, but should be present on
> > all platforms supported by the Xe driver.
> 
> another difference it seems is that intel_detect_preproduction_hw() is
> executed for any platform, there isn't a flag... but maybe that's
> because the platform/revid check is only added on the flag day.
> 
> > Going forward, let's set the expectation that we'll start looking into
> > removing pre-production workarounds for a platform around the time that
> > platforms of the next major IP stepping are having their force_probe
> > requirement lifted.  This timing is just a rough guideline; there may be
> > cases where some instances of pre-production parts are still being
> > actively used in CI farms, internal device pools, etc. and we'll need to
> > wait a bit longer for those to be swapped out.
> > 
> > Bspec: 78271, 52544
> > Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> > ---
> > drivers/gpu/drm/xe/regs/xe_gt_regs.h |  3 ++
> > drivers/gpu/drm/xe/xe_device.c       | 48 ++++++++++++++++++++++++++++
> > drivers/gpu/drm/xe/xe_device_types.h |  2 ++
> > drivers/gpu/drm/xe/xe_pci.c          |  1 +
> > drivers/gpu/drm/xe/xe_pci_types.h    |  1 +
> > 5 files changed, 55 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> > index 917a088c28f2..93643da57428 100644
> > --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> > +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> > @@ -227,6 +227,9 @@
> > 
> > #define MIRROR_FUSE1				XE_REG(0x911c)
> > 
> > +#define FUSE2					XE_REG(0x9120)
> > +#define   PRODUCTION_HW				REG_BIT(2)
> > +
> > #define MIRROR_L3BANK_ENABLE			XE_REG(0x9130)
> > #define   XE3_L3BANK_ENABLE			REG_GENMASK(31, 0)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > index 1197f914ef77..ed758888cfa1 100644
> > --- a/drivers/gpu/drm/xe/xe_device.c
> > +++ b/drivers/gpu/drm/xe/xe_device.c
> > @@ -797,6 +797,52 @@ static int probe_has_flat_ccs(struct xe_device *xe)
> > 	return 0;
> > }
> > 
> > +/*
> > + * Detect if the driver is being run on pre-production hardware.  We don't
> > + * keep workarounds for pre-production hardware long term, so print an
> > + * error and add taint if we're being loaded on a pre-production platform
> > + * for which the pre-prod workarounds have already been removed.
> > + *
> > + * The general policy is that we'll remove any workarounds that only apply to
> > + * pre-production hardware around the time force_probe restrictions are lifted
> > + * for a platform of the next major IP generation (for example, Xe2 pre-prod
> > + * workarounds should be removed around the time the first Xe3 platforms have
> > + * force_probe lifted).
> > + */
> > +static void detect_preproduction_hw(struct xe_device *xe)
> > +{
> > +	struct xe_gt *gt;
> > +	int id;
> > +
> > +	/*
> > +	 * The "SW_CAP" fuse contains a bit indicating whether the device is a
> > +	 * production or pre-production device.  This fuse is reflected through
> > +	 * the GT "FUSE2" register, even though the contents of the fuse are
> > +	 * not GT-specific.  Every GT's reflection of this fuse should show the
> > +	 * same value, so we'll just use the first available GT for lookup.
> > +	 */
> > +	for_each_gt(gt, xe, id)
> > +		break;
> > +
> > +	if (!gt)
> > +		return;
> > +
> > +	CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT);
> > +	if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FW_GT)) {
> > +		xe_gt_err(gt, "Forcewake failure; cannot determine production/pre-production hw status.\n");
> > +		return;
> > +	}
> > +
> > +	if (xe_mmio_read32(&gt->mmio, FUSE2) & PRODUCTION_HW)
> > +		return;
> > +
> > +	xe_info(xe, "Pre-production hardware detected.\n");
> > +	if (xe->info.has_prod_wa_only) {
> 
> should we annotate the WA themselves? I was thinking sometime ago that
> we could have a configfs to force WAs in/out, so we have a quick path to
> test  a) dropping the WAs when they are not needed; and b) extending the
> WA, with the same action, to a new platform.

I think the main problem here is that during early development when
we're first implementing the workarounds and sending them upstream we
don't know yet which steppings are going to be production vs
pre-production.  Some (rare) platforms wind up having production parts
using A0, others have several steppings before being used in production.
The decision and documentation that "all production parts will have XX
stepping or later" is something that comes a bit too late for us.

We could go back and annotate them later once the information about
production steppings does get documented, but by then we're usually
just about ready to drop them anyway, so I'm not sure it's really
worth it.

> 
> This patch itself lgtm and if we decided to extend, it can be done on
> top.
> 
> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>

Jani suggested that I invert the device flag to indicate that a platform
*does* still have pre-prod workarounds, and then we drop the flag later.
What are your thoughts on that suggestion?
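
For illustration only, here is a rough userspace sketch of how the two
flag polarities would behave.  It is not taken from the patch: the
struct, the has_pre_prod_wa field name, and the helper functions are all
hypothetical; only the FUSE2 bit position and has_prod_wa_only come from
the quoted diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 2 of FUSE2 reads as 1 on production hardware (per the quoted diff). */
#define PRODUCTION_HW (UINT32_C(1) << 2)

/* Hypothetical stand-in for the per-platform device info. */
struct sketch_info {
	bool has_prod_wa_only; /* polarity used in the quoted patch */
	bool has_pre_prod_wa;  /* assumed name for the inverted polarity */
};

/* A device is pre-production when the production fuse bit is clear. */
static bool is_preproduction(uint32_t fuse2)
{
	return !(fuse2 & PRODUCTION_HW);
}

/* Patch polarity: complain once a platform is marked "production WAs only". */
static bool should_taint_patch(const struct sketch_info *info, uint32_t fuse2)
{
	return is_preproduction(fuse2) && info->has_prod_wa_only;
}

/* Inverted polarity: complain unless the platform still carries pre-prod WAs. */
static bool should_taint_inverted(const struct sketch_info *info, uint32_t fuse2)
{
	return is_preproduction(fuse2) && !info->has_pre_prod_wa;
}
```

In this sketch the visible difference is the default: a platform
descriptor that sets neither flag never triggers the warning with the
patch polarity, but always triggers it on pre-production parts with the
inverted polarity, so a new platform would have to set the flag
explicitly until its pre-prod workarounds are removed.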


Matt

> 
> thanks
> Lucas De Marchi

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
