From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: "Cavitt, Jonathan" <jonathan.cavitt@intel.com>
Cc: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
"Gupta, Saurabhg" <saurabhg.gupta@intel.com>,
"Zuo, Alex" <alex.zuo@intel.com>,
"Wajdeczko, Michal" <Michal.Wajdeczko@intel.com>,
"Brost, Matthew" <matthew.brost@intel.com>,
"Ceraolo Spurio, Daniele" <daniele.ceraolospurio@intel.com>
Subject: Re: [PATCH v2] drm/xe/xe_guc_ct: Prevent compiler read/write optimization breaks
Date: Tue, 16 Dec 2025 10:43:34 -0500
Message-ID: <aUF-Jt1F9qHJSwaW@intel.com>
In-Reply-To: <CH0PR11MB54444EE0367FE3902378C680E5ADA@CH0PR11MB5444.namprd11.prod.outlook.com>
On Mon, Dec 15, 2025 at 01:00:21PM -0500, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Vivi, Rodrigo <rodrigo.vivi@intel.com>
> Sent: Monday, December 15, 2025 9:30 AM
> To: Cavitt, Jonathan <jonathan.cavitt@intel.com>
> Cc: intel-xe@lists.freedesktop.org; Gupta, Saurabhg <saurabhg.gupta@intel.com>; Zuo, Alex <alex.zuo@intel.com>; Wajdeczko, Michal <Michal.Wajdeczko@intel.com>; Brost, Matthew <matthew.brost@intel.com>; Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>
> Subject: Re: [PATCH v2] drm/xe/xe_guc_ct: Prevent compiler read/write optimization breaks
> >
> > On Wed, Dec 10, 2025 at 05:44:13PM +0000, Jonathan Cavitt wrote:
> > > Use READ_ONCE and WRITE_ONCE when operating on ct->state and the
> > > g2h_fence->done values to prevent the compiler from ignoring these
> > > necessary operations.
> > >
> > > v2: (Matt Brost)
> > > - Add Fixes tags
> > > - Add comments
> > >
> > > Fixes: 94de94d24ea8 ("drm/xe/guc: Cancel ongoing H2G requests when stopping CT")
> > > Fixes: dc75d03716fe ("drm/xe/guc: Add more GuC CT states")
> > > Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> > > Fixes: 0b93b7dcd9eb ("drm/xe: Fix early wedge on GuC load failure")
> >
> > I really doubt that a 4 line patch is fixing 4 different patches...
> > I believe the important thing here is to identify the original patch
> > that introduced the wrong concept, not every individual patch that last
> > touched that line...
> >
> > > Suggested-by: Matthew Brost <matthew.brost@intel.com>
> > > Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> > > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > > ---
> > > drivers/gpu/drm/xe/xe_guc_ct.c | 14 +++++++++++---
> > > drivers/gpu/drm/xe/xe_guc_ct.h | 6 ++++--
> > > 2 files changed, 15 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > index 648f0f523abb..4ee628fe34b9 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > @@ -206,7 +206,9 @@ static void g2h_fence_cancel(struct g2h_fence *g2h_fence)
> > > {
> > > g2h_fence->cancel = true;
> > > g2h_fence->fail = true;
> > > - g2h_fence->done = true;
> > > +
> > > + /* WRITE_ONCE pairs with wait_event_timeout in guc_ct_send_recv */
> > > + WRITE_ONCE(g2h_fence->done, true);
>
>
> The function g2h_fence_cancel was introduced in its entirety by
> 94de94d24ea8 ("drm/xe/guc: Cancel ongoing H2G requests when stopping CT")
> and did not exist in any other form before then.
>
> You could argue that setting g2h_fence->done this way had precedent
> from the initial Xe commit, but as this function did not exist before commit
> 94de94d24ea8 ("drm/xe/guc: Cancel ongoing H2G requests when stopping CT"),
> it would be difficult to argue the change to this function fixes anything prior.
>
>
> > > }
> > >
> > > static bool g2h_fence_needs_alloc(struct g2h_fence *g2h_fence)
> > > @@ -527,7 +529,12 @@ static void guc_ct_change_state(struct xe_guc_ct *ct,
> > > if (ct->g2h_outstanding)
> > > xe_pm_runtime_put(ct_to_xe(ct));
> > > ct->g2h_outstanding = 0;
> > > - ct->state = state;
> > > +
> > > + /*
> > > + * WRITE_ONCE pairs with READ_ONCEs in xe_guc_ct_initialized and
> > > + * xe_guc_ct_enabled.
> > > + */
> > > + WRITE_ONCE(ct->state, state);
>
>
> The function guc_ct_change_state was introduced in commit
> dc75d03716fe ("drm/xe/guc: Add more GuC CT states").
>
> This patch also introduced the ct->state flag, replacing the previous
> ct->enabled Boolean. So, this concept did not exist before then.
>
>
> > >
> > > xe_gt_dbg(gt, "GuC CT communication channel %s\n",
> > > state == XE_GUC_CT_STATE_STOPPED ? "stopped" :
> > > @@ -1496,7 +1503,8 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len)
> > >
> > > g2h_release_space(ct, GUC_CTB_HXG_MSG_MAX_LEN);
> > >
> > > - g2h_fence->done = true;
> > > + /* WRITE_ONCE pairs with wait_event_timeout in guc_ct_send_recv */
> > > + WRITE_ONCE(g2h_fence->done, true);
>
>
> This section of parse_g2h_response has been untouched since the initial
> Xe commit
> dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs").
>
>
> > > smp_mb();
> > >
> > > wake_up_all(&ct->g2h_fence_wq);
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.h b/drivers/gpu/drm/xe/xe_guc_ct.h
> > > index 5599939f8fe1..8d318b094f33 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_ct.h
> > > +++ b/drivers/gpu/drm/xe/xe_guc_ct.h
> > > @@ -30,12 +30,14 @@ void xe_guc_ct_print(struct xe_guc_ct *ct, struct drm_printer *p, bool want_ctb)
> > >
> > > static inline bool xe_guc_ct_initialized(struct xe_guc_ct *ct)
> > > {
> > > - return ct->state != XE_GUC_CT_STATE_NOT_INITIALIZED;
> > > + /* READ_ONCE pairs with WRITE_ONCE in guc_ct_change_state */
> > > + return READ_ONCE(ct->state) != XE_GUC_CT_STATE_NOT_INITIALIZED;
>
>
> xe_guc_ct_initialized was added as a part of the patch
> 0b93b7dcd9eb ("drm/xe: Fix early wedge on GuC load failure").
>
> One could argue that this function was added with precedent
> set by the patch
> dc75d03716fe ("drm/xe/guc: Add more GuC CT states"),
> which also introduced the XE_GUC_CT_STATE_NOT_INITIALIZED
> flag. However, as this function did not exist prior to
> 0b93b7dcd9eb ("drm/xe: Fix early wedge on GuC load failure"),
> it would be difficult to argue that the change made to this function
> fixed any commits prior to its existence.
>
>
> > > }
> > >
> > > static inline bool xe_guc_ct_enabled(struct xe_guc_ct *ct)
> > > {
> > > - return ct->state == XE_GUC_CT_STATE_ENABLED;
> > > + /* READ_ONCE pairs with WRITE_ONCE in guc_ct_change_state */
> > > + return READ_ONCE(ct->state) == XE_GUC_CT_STATE_ENABLED;
>
>
> The function xe_guc_ct_enabled was introduced in the patch
> dc75d03716fe ("drm/xe/guc: Add more GuC CT states").
All of these explanations make total sense. Thanks for that.
But now it is clear to me that this deserves 4 different commits,
each one fixing a single earlier commit in its own patch.
>
> -Jonathan Cavitt
>
>
> > > }
> > >
> > > static inline void xe_guc_ct_irq_handler(struct xe_guc_ct *ct)
> > > --
> > > 2.43.0
> > >
> >
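
For readers less familiar with the macros under discussion, here is a minimal
userspace sketch of what READ_ONCE and WRITE_ONCE do for a flag such as
g2h_fence->done. It is not the driver code. The macro definitions are
simplified stand-ins for the kernel's own (they assume the GCC/Clang
__typeof__ extension), and struct demo_fence and the demo_fence_* helpers are
hypothetical names made up for illustration. The volatile access forces the
compiler to emit exactly one load or store per use, so a polling loop cannot
have its read hoisted out of the loop.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified userspace stand-ins for the kernel macros (assumption: the
 * real definitions also perform type-size checks).
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical structure mirroring the role of g2h_fence->done. */
struct demo_fence {
	bool done;
};

/* Writer side: the store cannot be elided or merged by the compiler. */
static void demo_fence_signal(struct demo_fence *f)
{
	WRITE_ONCE(f->done, true);
}

/* Reader side: every call performs a fresh load of the flag. */
static bool demo_fence_is_done(struct demo_fence *f)
{
	return READ_ONCE(f->done);
}

/*
 * Poll the flag up to max_spins times. Returns the iteration on which the
 * flag was seen set, or -1 if it never was. Without READ_ONCE the compiler
 * would be allowed to load f->done once and reuse the value for every
 * iteration.
 */
static int demo_fence_poll(struct demo_fence *f, int max_spins)
{
	assert(max_spins >= 0);

	for (int i = 0; i < max_spins; i++) {
		if (READ_ONCE(f->done))
			return i;
	}
	return -1;
}
```

Note that this only constrains the compiler. The macros are not memory
barriers by themselves, which is consistent with the patch leaving the
existing smp_mb() in parse_g2h_response untouched.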