From: Matthew Brost <matthew.brost@intel.com>
To: "Summers, Stuart" <stuart.summers@intel.com>
Cc: "Cavitt, Jonathan" <jonathan.cavitt@intel.com>,
"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
"Gupta, Saurabhg" <saurabhg.gupta@intel.com>,
"Zuo, Alex" <alex.zuo@intel.com>
Subject: Re: [PATCH v3] drm/xe/xe_guc_ct: Exit CT submission fence wait on GT reset
Date: Tue, 6 Jan 2026 16:35:53 -0800
Message-ID: <aV2qaVmFEaW8IVIs@lstrano-desk.jf.intel.com>
In-Reply-To: <9e8fd19f0297426bde379e01a8a9ad3f109de12d.camel@intel.com>
On Tue, Jan 06, 2026 at 04:11:27PM -0700, Summers, Stuart wrote:
> On Tue, 2026-01-06 at 15:09 -0800, Matthew Brost wrote:
> > On Tue, Jan 06, 2026 at 08:55:34PM +0000, Jonathan Cavitt wrote:
> > > It's possible if unlikely that the GuC could be reset in the time
> > > between performing a guc_ct_send and the G2H fence completing in
> > > guc_ct_send_recv. Exit early if this occurs.
> > >
> > > v2: Rebase
> > >
> > > v3: goto retry_same_fence if ct is not alive (Stuart)
> > >
> > > Suggested-by: Stuart Summers <stuart.summers@intel.com>
> > > Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> > > ---
> > > drivers/gpu/drm/xe/xe_guc_ct.c | 14 +++++++++-----
> > > 1 file changed, 9 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c
> > > b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > index dfbf76037b04..0bdcbe6503a7 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > @@ -1238,6 +1238,10 @@ int xe_guc_ct_send_g2h_handler(struct
> > > xe_guc_ct *ct, const u32 *action, u32 len)
> > > return ret;
> > > }
> > >
> > > +#define ct_alive(ct) \
> > > + (xe_guc_ct_enabled(ct) && !ct->ctbs.h2g.info.broken && \
> > > + !ct->ctbs.g2h.info.broken)
> > > +
> > > /*
> > > * Check if a GT reset is in progress or will occur and if GT
> > > reset brought the
> > > * CT back up. Randomly picking 5 seconds for an upper limit to do
> > > a GT a reset.
> > > @@ -1247,12 +1251,8 @@ static bool retry_failure(struct xe_guc_ct
> > > *ct, int ret)
> > > if (!(ret == -EDEADLK || ret == -EPIPE || ret == -ENODEV))
> > > return false;
> > >
> > > -#define ct_alive(ct) \
> > > - (xe_guc_ct_enabled(ct) && !ct->ctbs.h2g.info.broken && \
> > > - !ct->ctbs.g2h.info.broken)
> > > if (!wait_event_interruptible_timeout(ct->wq, ct_alive(ct),
> > > HZ * 5))
> > > return false;
> > > -#undef ct_alive
> > >
> > > return true;
> > > }
> > > @@ -1304,7 +1304,11 @@ static int guc_ct_send_recv(struct xe_guc_ct
> > > *ct, const u32 *action, u32 len,
> > > /* READ_ONCEs pairs with WRITE_ONCEs in parse_g2h_response
> > > * and g2h_fence_cancel.
> > > */
> > > - ret = wait_event_timeout(ct->g2h_fence_wq,
> > > READ_ONCE(g2h_fence.done), HZ);
> > > + ret = wait_event_timeout(ct->g2h_fence_wq, !ct_alive(ct) ||
> > > + READ_ONCE(g2h_fence.done), HZ);
> > > + if (!ct_alive(ct))
> >
> > I think you only want to do this if no_fail is set, but again that is
> > dead code and likely doesn't even work as is. I don't handle the
> > device
> > wedging nor does this patch, so either case could live lock.
>
> So is the suggestion to just bail out if we're in reset here? We could

We already do. guc_ct_change_state will set g2h_fence.done.

> also tear everything down like we do in the !ret call later. Or we
> could add a retry count here so we at least aren't stuck looping here
> forever.
>
I'd say leave it as is, or fix the no-fail version of the function and
use it in the places where we care about this (e.g. changing GuC PC
freq - those really shouldn't get lost across a GT reset or VF
migration).

Matt

> Thanks,
> Stuart
>
> >
> > Matt
> >
> > > + goto retry_same_fence;
> > > +
> > > if (!ret) {
> > > LNL_FLUSH_WORK(&ct->g2h_worker);
> > > if (READ_ONCE(g2h_fence.done)) {
> > > --
> > > 2.43.0
> > >
>
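
A minimal user-space sketch of the retry-count idea raised in the thread,
under stated assumptions: it is not the xe driver code. struct fake_ct,
fake_ct_alive(), fake_reset_step() and wait_with_retry_cap() are all
hypothetical names. The model only mirrors the shape of the ct_alive()
check from the patch and shows how a cap on retries turns a potential
endless loop into an error return when the CT never comes back.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CT channel state; NOT struct xe_guc_ct. */
struct fake_ct {
	bool enabled;
	bool h2g_broken;
	bool g2h_broken;
	int resets_pending;	/* reset steps left before the CT recovers */
};

/* Same shape as the ct_alive() macro in the patch, on the fake state. */
static bool fake_ct_alive(const struct fake_ct *ct)
{
	return ct->enabled && !ct->h2g_broken && !ct->g2h_broken;
}

/* Model one step of reset progress; the CT is re-enabled on the last step. */
static void fake_reset_step(struct fake_ct *ct)
{
	if (ct->resets_pending > 0 && --ct->resets_pending == 0)
		ct->enabled = true;
}

/*
 * Retry while the CT is down, but give up after max_retries attempts
 * instead of looping forever. Returns 0 once the CT is alive, or
 * -ECANCELED if the cap is hit (the error code is an assumption made
 * for this sketch).
 */
static int wait_with_retry_cap(struct fake_ct *ct, int max_retries)
{
	int retries = 0;

	assert(max_retries >= 0);

	for (;;) {
		if (fake_ct_alive(ct))
			return 0;
		if (retries++ >= max_retries)
			return -ECANCELED;
		fake_reset_step(ct);
	}
}
```

With resets_pending = 2 and a cap of 5, the loop returns 0 after two
reset steps. A CT that needs more steps than the cap allows, or that
never recovers (the wedged case mentioned in the review), yields
-ECANCELED rather than spinning.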