Re: [PATCH 1/1] drm/xe/eustall: Add workaround 14027094826 to graphics IP 35.11

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Harish Chegondi <harish.chegondi@intel.com>
To: Matt Roper <matthew.d.roper@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <harish.chegondi@intel.com>
Subject: Re: [PATCH 1/1] drm/xe/eustall: Add workaround 14027094826 to graphics IP 35.11
Date: Tue, 16 Jun 2026 15:57:40 -0700	[thread overview]
Message-ID: <ajHU5EFJw-Lpy7Zd@intel.com> (raw)
In-Reply-To: <20260612212543.GJ6214@mdroper-desk1.amr.corp.intel.com>

On Fri, Jun 12, 2026 at 02:25:43PM -0700, Matt Roper wrote:
> On Thu, Jun 11, 2026 at 04:34:27PM -0700, Harish Chegondi wrote:
> > Before enabling EU stall sampling, write 0x20 to the SWF scratch pad
> > register. When the firmware sees this register set to 0x20,
> > it applies its part of the workaround and sets the scratch pad register
> > to 0x60.
> > 
> > Before disabling EU stall sampling, write 0x40 to the SWF scratch pad
> > register. When the firmware sees this register set to 0x40, it reverts
> > its part of the workaround and sets the scratch pad
> > register to 0.
> > 
> > Bspec update for the SWF scratch pad register is still pending, but has
> > been confirmed offline with the firmware team.
> 
> I think the key point we should try to get across in the commit message
> is that a firmware component, not the driver, is responsible for the
> actual implementation of this workaround, but that firmware needs the
> driver to notify it whenever EU stall sampling is being enabled/disabled
> so that it can do the necessary actions on its side.  A specific
> register is being used to communicate requests/acks about the
> enable/disable status between Xe <-> firmware.

I will improve the commit message to convey this in the next version of
the patch.

> 
> Hmm, what's actually documented in the workaround database right now
> says that the Xe <-> firmware communication is supposed to happen on
> register EUSTALL5 (i.e., 0x53c[9:3]), not the scratchpad register we're
> using here.  Is the workaround database's description outdated?

Yes, the workaround description in the database is outdated. I have
requested the relevant people to update the description.
> 
> > 
> > Bspec: 53188
> > Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
> > ---
> >  drivers/gpu/drm/xe/regs/xe_gt_regs.h |  3 ++
> >  drivers/gpu/drm/xe/xe_eu_stall.c     | 46 ++++++++++++++++++++++++++--
> >  drivers/gpu/drm/xe/xe_wa_oob.rules   |  1 +
> >  3 files changed, 47 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> > index 08251c7a1a4b..b86050503c4e 100644
> > --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> > +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> > @@ -616,6 +616,9 @@
> >  #define   CCS_MODE_CSLICE(cslice, ccs) \
> >  	((ccs) << ((cslice) * CCS_MODE_CSLICE_WIDTH))
> >  
> > +#define SWF_SCRATCHPAD				XE_REG(0x4f000)
> > +#define   POISON_SUPP_MASK			REG_GENMASK(6, 5)
> > +
> >  #define FORCEWAKE_ACK_GT			XE_REG(0x130044)
> >  
> >  /* Applicable for all FORCEWAKE_DOMAIN and FORCEWAKE_ACK_DOMAIN regs */
> > diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c b/drivers/gpu/drm/xe/xe_eu_stall.c
> > index d37770c58c5d..4d0f97ca4cdb 100644
> > --- a/drivers/gpu/drm/xe/xe_eu_stall.c
> > +++ b/drivers/gpu/drm/xe/xe_eu_stall.c
> > @@ -7,6 +7,7 @@
> >  #include <linux/fs.h>
> >  #include <linux/poll.h>
> >  #include <linux/types.h>
> > +#include <linux/iopoll.h>
> >  
> >  #include <drm/drm_drv.h>
> >  #include <generated/xe_wa_oob.h>
> > @@ -20,6 +21,7 @@
> >  #include "xe_gt_printk.h"
> >  #include "xe_gt_topology.h"
> >  #include "xe_macros.h"
> > +#include "xe_mmio.h"
> >  #include "xe_observation.h"
> >  #include "xe_pm.h"
> >  #include "xe_trace.h"
> > @@ -682,7 +684,7 @@ static int xe_eu_stall_stream_enable(struct xe_eu_stall_data_stream *stream)
> >  	struct per_xecore_buf *xecore_buf;
> >  	struct xe_gt *gt = stream->gt;
> >  	u16 group, instance;
> > -	int xecore;
> > +	int xecore, ret = 0;
> >  
> >  	/* Take runtime pm ref and forcewake to disable RC6 */
> >  	xe_pm_runtime_get(gt_to_xe(gt));
> > @@ -693,6 +695,26 @@ static int xe_eu_stall_stream_enable(struct xe_eu_stall_data_stream *stream)
> >  		return -ETIMEDOUT;
> >  	}
> >  
> > +	if (XE_GT_WA(gt, 14027094826)) {
> 
> It looks like the actual workaround number (i.e., the lineage ID that
> stays constant across all releases a workaround may apply to) is
> 14027054324.  So we should be referring to this as Wa_14027054324, not
> Wa_14027094826.
Will change the HSD number in the next version of the patch.
> 
> 
> > +		/**
> 
> A "/**" signifies the start of a kerneldoc block and this isn't
> kerneldoc, just a regular C comment.  Same for the next one below.
Will fix both of these commits in the next version.
> 
> > +		 * Writing 0x20 to SWF register will signal the firmware to suppress
> > +		 * poison propagation.
> > +		 */
> > +		xe_mmio_write32(&gt->mmio, SWF_SCRATCHPAD, 0x20);
> > +		/**
> > +		 * Firmware will write 0x60 to SWF after suppressing poison propagation.
> > +		 * Wait until the SWF value turns 0x60 before enabling EU stall sampling.
> > +		 */
> > +		ret = read_poll_timeout(xe_mmio_read32, reg_value,
> > +					(reg_value & POISON_SUPP_MASK) == 0x60,
> 
> We don't really need any mention of "poison" in either the comments or
> #define's since that isn't relevant to what we're implementing here.
> We're just conveying EUstall's enable/disable status to the firmware
> (and waiting to get an ack back); whatever steps the firmware takes to
> actually implement the workaround is outside the scope of the Xe driver
> and trying to reference them here just causes ambiguity and confusion.
Will remove "poison" from the comments and #defines in the next rev.
> 
> > +					1000, 1000000, true, &gt->mmio, SWF_SCRATCHPAD);
> 
> As noted above, the workaround database is mentioning a different
> register (and bitfield) than we're working with.  But even if we assume
> that the information there is out of date, we still need to make sure
> we're handling our reads and writes in a consistent manner.  The write
> above seems to be assuming that this communication is the only thing
> this register is being used for (so it's fine to just write a 0x20 which
> will wipe all other register bits back to 0), but the read seems to be
> assuming that other bits _are_ being used for other purposes and it's
> masking the value to just extract two specific bits.  We should clarify
> which is the case.  If this workaround's communication channel isn't the
> exclusive owner of this register, then we probably need to be doing a
> RMW of the register (and maybe also ensuring that we aren't racing with
> other register updates somehow?).
I have requested the workaround be updated in the database to use the
scratchpad register instead of the EU stall register. I will also change
writes to the scratchpad register to rmw. Looks like this workaround is
the only place this scratchpad register is being used. So, no race
conditions as of now?
> 
> We should also probably #define the 0x20 / 0x40 / 0x60 / 0x0 magic
> numbers with names like {REQ,ACK}_EUSTALL_{ENABLE,DISABLE}.

Will add these defines in the next version.
> 
> 
> 
> Matt

Thank you
Harish.
> 
> > +		if (ret) {
> > +			xe_gt_err(gt, "Time out while firmware is disabling poison propagation\n");
> > +			xe_force_wake_put(gt_to_fw(gt), stream->fw_ref);
> > +			xe_pm_runtime_put(gt_to_xe(gt));
> > +			return ret;
> > +		}
> > +	}
> >  	if (XE_GT_WA(gt, 22016596838))
> >  		xe_gt_mcr_multicast_write(gt, ROW_CHICKEN2,
> >  					  REG_MASKED_FIELD_ENABLE(DISABLE_DOP_GATING));
> > @@ -730,7 +752,7 @@ static int xe_eu_stall_stream_enable(struct xe_eu_stall_data_stream *stream)
> >  	reg_value |= XEHPC_EUSTALL_BASE_ENABLE_SAMPLING;
> >  	xe_gt_mcr_multicast_write(gt, XEHPC_EUSTALL_BASE, reg_value);
> >  
> > -	return 0;
> > +	return ret;
> >  }
> >  
> >  static void eu_stall_data_buf_poll_work_fn(struct work_struct *work)
> > @@ -840,6 +862,8 @@ static int xe_eu_stall_enable_locked(struct xe_eu_stall_data_stream *stream)
> >  static int xe_eu_stall_disable_locked(struct xe_eu_stall_data_stream *stream)
> >  {
> >  	struct xe_gt *gt = stream->gt;
> > +	u32 reg_value;
> > +	int ret = 0;
> >  
> >  	if (!stream->enabled)
> >  		return 0;
> > @@ -853,11 +877,27 @@ static int xe_eu_stall_disable_locked(struct xe_eu_stall_data_stream *stream)
> >  	if (XE_GT_WA(gt, 22016596838))
> >  		xe_gt_mcr_multicast_write(gt, ROW_CHICKEN2,
> >  					  REG_MASKED_FIELD_DISABLE(DISABLE_DOP_GATING));
> > +	if (XE_GT_WA(gt, 14027094826)) {
> > +		/**
> > +		 * Writing 0x40 to SWF register will signal the firmware to enable
> > +		 * poison propagation.
> > +		 */
> > +		xe_mmio_write32(&gt->mmio, SWF_SCRATCHPAD, 0x40);
> > +		/**
> > +		 * Firmware will write 0 to SWF after enabling poison propagation.
> > +		 * Wait until the SWF register value turns 0.
> > +		 */
> > +		ret = read_poll_timeout(xe_mmio_read32, reg_value,
> > +					(reg_value & POISON_SUPP_MASK) == 0,
> > +					1000, 1000000, true, &gt->mmio, SWF_SCRATCHPAD);
> > +		if (ret)
> > +			xe_gt_err(gt, "Time out while firmware is enabling poison propagation\n");
> > +	}
> >  
> >  	xe_force_wake_put(gt_to_fw(gt), stream->fw_ref);
> >  	xe_pm_runtime_put(gt_to_xe(gt));
> >  
> > -	return 0;
> > +	return ret;
> >  }
> >  
> >  static long xe_eu_stall_stream_ioctl_locked(struct xe_eu_stall_data_stream *stream,
> > diff --git a/drivers/gpu/drm/xe/xe_wa_oob.rules b/drivers/gpu/drm/xe/xe_wa_oob.rules
> > index f8a185103b80..293aa83feca7 100644
> > --- a/drivers/gpu/drm/xe/xe_wa_oob.rules
> > +++ b/drivers/gpu/drm/xe/xe_wa_oob.rules
> > @@ -65,3 +65,4 @@
> >  
> >  14025883347	MEDIA_VERSION_RANGE(1301, 3503)
> >  		GRAPHICS_VERSION_RANGE(2004, 3005)
> > +14027094826	GRAPHICS_VERSION(3511)
> > -- 
> > 2.43.0
> > 
> 
> -- 
> Matt Roper
> Graphics Software Engineer
> Linux GPU Platform Enablement
> Intel Corporation

     prev parent reply	other threads:[~2026-06-16 22:57 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-11 23:34 [PATCH 1/1] drm/xe/eustall: Add workaround 14027094826 to graphics IP 35.11 Harish Chegondi
2026-06-11 23:41 ` ✓ CI.KUnit: success for series starting with [1/1] " Patchwork
2026-06-12  0:18 ` ✓ Xe.CI.BAT: " Patchwork
2026-06-12 17:34 ` ✓ Xe.CI.FULL: " Patchwork
2026-06-12 21:25 ` [PATCH 1/1] " Matt Roper
2026-06-16 22:57   ` Harish Chegondi [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ajHU5EFJw-Lpy7Zd@intel.com \
    --to=harish.chegondi@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=matthew.d.roper@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.