From: Matthew Brost <matthew.brost@intel.com>
To: "Summers, Stuart" <stuart.summers@intel.com>
Cc: "Dugast, Francois" <francois.dugast@intel.com>,
"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
"Ceraolo Spurio, Daniele" <daniele.ceraolospurio@intel.com>,
"Wajdeczko, Michal" <Michal.Wajdeczko@intel.com>
Subject: Re: [PATCH v2 2/2] drm/xe: Remove H2G reads in CT send path in non-debug builds
Date: Tue, 17 Feb 2026 18:35:15 -0800 [thread overview]
Message-ID: <aZUlY866PShQPdLg@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <4920f526ed76e3e80cbb4bea3ae5523252221fa8.camel@intel.com>
On Tue, Feb 17, 2026 at 05:24:07PM -0700, Summers, Stuart wrote:
> On Tue, 2026-02-17 at 12:19 +0100, Francois Dugast wrote:
> > On Mon, Feb 16, 2026 at 09:40:19AM -0800, Matthew Brost wrote:
> > > On Mon, Feb 16, 2026 at 01:51:57PM +0100, Francois Dugast wrote:
> > > > mOn Fri, Feb 13, 2026 at 01:16:25PM -0800, Matthew Brost wrote:
> > > > > A single VRAM read on BMG can take over 1µs. While small, this
> > > > > is a
> > > > > non-trivial amount of time in a hot path. Remove the descriptor
> > > > > H2G read
> > > > > (potentially a VRAM access) from non-debug builds, as this
> > > > > error-checking code is not needed outside of debug
> > > > > configurations.
> > > >
> > > > About the change itself: I understand the performance benefit and
> > > > this would
> > > > make the code consistent with the 2 other reads.
> > > >
> > >
> > > Yes.
> > >
> > > > But the existing block under "if
> > > > (IS_ENABLED(CONFIG_DRM_XE_DEBUG))" seems wrong
> > > > because it is not just about optionally printing some debug
> > > > statements, in case
> > > > of error the function actually returns a different value if
> > > > CONFIG_DRM_XE_DEBUG
> > > > is enabled or not (goto), and this return value is used in
> > > > __guc_ct_send_locked.
> > > >
> > >
> > > I can't realy argue with this logic but I also see the benefits of
> > > verbose checking of GuC state in CI configs. fwiw, this logic was
> > > added
> > > here:
> > >
> > > git format-patch -1 d2c5a5a926f43
> > >
> > > The cost of reads are actually huge here - this patch by itself is
> > > incomplete as a tracepoint below does a desc_read in the argument
> > > list
> > > and this is uncoditionally executed even if ftrace is disabled. On
> > > BMG H2G
> > > before this patch / fixing ftrace are 3-4us, after ~300ns. So I'd
> > > suggest we move forward with with a couple of fixes patches first,
> > > then
> > > in a follow up either:
> > >
> > > The cost of reads is actually huge here — this patch by itself is
> > > incomplete, as a tracepoint below performs a desc_read in the
> > > argument
> > > list, and that is unconditionally executed even if ftrace is
> > > disabled.
> > > On BMG, H2G reads before this patch / fixing ftrace are 3–4-µs;
> > > after,
> > > they’re ~300ns. So I’d suggest we move forward with a couple of
> > > fixes
> > > patches first, and then in a follow-up:
> > >
> > > - Move all really expensive things in GuC CT layer under
> > > CONFIG_DRM_XE_DEBUG_GUC, enable this Kconfig some CI run
> > > - Perhaps just assert if GuC state is corrupted in the path
> > > mentioned
> > > above
>
> So is the suggestion here to enable this DEBUG_GUC parameter for all
> CI? We do rely on these error messages for baseline debug and sometimes
> things coming out of CI (or even a customer) are hard to reproduce. I'm
Customers would never turn this (CONFIG_DRM_XE_DEBUG) on unless we
explicitly ask them to rebuild their kernel and reproduce an issue.
> a little worried removing this will drag out debug for these types of
> problems.
I think we should basically leave this as is for now and focus on
getting the non-debug paths performing optimally in the smallest patch
we can. I’m planning for my next rev (which also fixes the tracepoint)
to be a fixes patch, since that alone gives a 10× speedup in H2G on BMG.
This starts to show up throughout the driver as performance improvements
(e.g., I’m seeing TLB invalidations get faster) because it reduces
contention on the CT locks.
We can debate DEBUG vs. DEBUG_GUC in a follow up, along with any
intended behavior changes tied to the debug knobs. FWIW, I really doubt
this has ever caught anything — this type of failure would be a
catastrophic GuC bug and would surface quickly in other ways.
Matt
>
> -Stuart
>
> >
> > Sounds good.
> >
> > >
> > > Matt
> > >
> > > > Francois
> > > >
> > > > >
> > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > ---
> > > > > drivers/gpu/drm/xe/xe_guc_ct.c | 14 +++++++-------
> > > > > 1 file changed, 7 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c
> > > > > b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > > > index 6a96bea40720..f200d3ee9d22 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > > > @@ -939,22 +939,22 @@ static int h2g_write(struct xe_guc_ct
> > > > > *ct, const u32 *action, u32 len,
> > > > > u32 full_len;
> > > > > struct iosys_map map = IOSYS_MAP_INIT_OFFSET(&h2g-
> > > > > >cmds,
> > > > > tail *
> > > > > sizeof(u32));
> > > > > - u32 desc_status;
> > > > >
> > > > > full_len = len + GUC_CTB_HDR_LEN;
> > > > >
> > > > > lockdep_assert_held(&ct->lock);
> > > > > xe_gt_assert(gt, full_len <= GUC_CTB_MSG_MAX_LEN);
> > > > >
> > > > > - desc_status = desc_read(xe, h2g, status);
> > > > > - if (desc_status) {
> > > > > - xe_gt_err(gt, "CT write: non-zero status:
> > > > > %u\n", desc_status);
> > > > > - goto corrupted;
> > > > > - }
> > > > > -
> > > > > if (IS_ENABLED(CONFIG_DRM_XE_DEBUG)) {
> > > > > u32 desc_tail = desc_read(xe, h2g, tail);
> > > > > u32 desc_head = desc_read(xe, h2g, head);
> > > > > + u32 desc_status;
> > > > > +
> > > > > + desc_status = desc_read(xe, h2g, status);
> > > > > + if (desc_status) {
> > > > > + xe_gt_err(gt, "CT write: non-zero
> > > > > status: %u\n", desc_status);
> > > > > + goto corrupted;
> > > > > + }
> > > > >
> > > > > if (tail != desc_tail) {
> > > > > desc_write(xe, h2g, status, desc_status
> > > > > | GUC_CTB_STATUS_MISMATCH);
> > > > > --
> > > > > 2.34.1
> > > > >
>
next prev parent reply other threads:[~2026-02-18 2:35 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-13 21:16 [PATCH v2 0/2] GuC CT memory optimizations Matthew Brost
2026-02-13 21:16 ` [PATCH v2 1/2] drm/xe: Split H2G and G2H into separate buffer objects Matthew Brost
2026-02-13 22:55 ` Michal Wajdeczko
2026-02-14 0:54 ` Matthew Brost
2026-02-17 16:44 ` Matthew Brost
2026-02-13 21:16 ` [PATCH v2 2/2] drm/xe: Remove H2G reads in CT send path in non-debug builds Matthew Brost
2026-02-16 12:51 ` Francois Dugast
2026-02-16 17:40 ` Matthew Brost
2026-02-17 11:19 ` Francois Dugast
2026-02-18 0:24 ` Summers, Stuart
2026-02-18 2:35 ` Matthew Brost [this message]
2026-02-13 21:59 ` ✓ CI.KUnit: success for GuC CT memory optimizations Patchwork
2026-02-13 22:39 ` ✓ Xe.CI.BAT: " Patchwork
2026-02-14 21:35 ` ✗ Xe.CI.FULL: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aZUlY866PShQPdLg@lstrano-desk.jf.intel.com \
--to=matthew.brost@intel.com \
--cc=Michal.Wajdeczko@intel.com \
--cc=daniele.ceraolospurio@intel.com \
--cc=francois.dugast@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=stuart.summers@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox