From: "Summers, Stuart" <stuart.summers@intel.com>
To: "Brost, Matthew" <matthew.brost@intel.com>
Cc: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
"Ceraolo Spurio, Daniele" <daniele.ceraolospurio@intel.com>
Subject: Re: [PATCH] drm/xe: Reduce LRC timestamp stuck message on VFs to notice
Date: Wed, 14 Jan 2026 21:10:29 +0000
Message-ID: <4e915c370dfda68d31e80813963811803b1ccf96.camel@intel.com>
In-Reply-To: <aWgFp0N3XQ0Vk75I@lstrano-desk.jf.intel.com>

On Wed, 2026-01-14 at 13:07 -0800, Matthew Brost wrote:
> On Wed, Jan 14, 2026 at 01:43:14PM -0700, Summers, Stuart wrote:
> > On Wed, 2026-01-14 at 12:38 -0800, Matthew Brost wrote:
> > > On Wed, Jan 14, 2026 at 01:25:55PM -0700, Summers, Stuart wrote:
> > > > On Wed, 2026-01-14 at 10:49 -0800, Matthew Brost wrote:
> > > > > An LRC timestamp getting stuck is a somewhat normal occurrence. If a
> > > > > single VF submits a job that does not get timesliced, the LRC timestamp
> > > > > will not increment. Reduce the LRC timestamp stuck message on VFs to
> > > > > notice (same log level as job timeout) to avoid false CI bugs in tests
> > > > > where a VF submits a job that does not get timesliced.
> > > >
> > > > Ok and if this happens and there is an actual problem, the queue will
> > > > be banned it looks like from the earlier patch, so should still be ok.
> > > >
> > > > My only question is why not move both cases (VF and non-VF) to notice?
> > >
> > > The non-VF case the LRC timestamp should always be moving, if it isn't
> > > something has badly gone wrong and CI should notify us of the error.
> >
> > Ok but shouldn't the job timeout generally be the same? (should we
> > change that timeout to warn?)
> >
>
> A timeout message is at notice level. CI understands that, but the warn
> level here is triggering CI failures on VFs where this message is
> expected in addition to the timeout message. For non-VFs, we do not
> expect this message for job timeouts.
>
> > And sorry if this is a basic question... but why isn't that the case
> > for VF too? Because on the VF side we are issuing resets as part of the
>
> The algorithm to accurately sample the LRC timestamp relies on access to
> the engine MMIO timestamp, which a VF does not have. I’m told we can

Oh, this is the piece I was missing, and it makes sense now. Thanks for the
explanation. And yeah, no problem, I agree with the change here.

Reviewed-by: Stuart Summers <stuart.summers@intel.com>

> make some configuration change to allow VF access to this register —
> I’ll follow up on that, but until then, let’s not break CI.
>
> Matt
>
> > submission flow normally? Or because we want to make sure the resets
> > from one VF are hidden from another VF?
> >
> > Thanks,
> > Stuart
> >
> > >
> > > Matt
> > >
> > > > Is the idea that in the non-VF case, the reset is already an error and
> > > > so the extra warn is ok? But in that case I'd already expect some other
> > > > error message to trigger CI (like the engine reset notification from
> > > > GuC). So the extra information here really isn't doing a whole lot more
> > > > from a warning level (CI triggering) perspective.
> > > >
> > > > Thanks,
> > > > Stuart
> > > >
> > > > >
> > > > > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7032
> > > > > Fixes: bb63e7257e63 ("drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR")
> > > > > Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > ---
> > > > > drivers/gpu/drm/xe/xe_guc_submit.c | 11 ++++++++---
> > > > > 1 file changed, 8 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > > index be8fa76baf1d..0311c89107f9 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > > @@ -1319,9 +1319,14 @@ static bool check_timeout(struct xe_exec_queue *q, struct xe_sched_job *job)
> > > > >
> > > > >  	ctx_timestamp = lower_32_bits(xe_lrc_timestamp(q->lrc[0]));
> > > > >  	if (ctx_timestamp == job->sample_timestamp) {
> > > > > -		xe_gt_warn(gt, "Check job timeout: seqno=%u, lrc_seqno=%u, guc_id=%d, timestamp stuck",
> > > > > -			   xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
> > > > > -			   q->guc->id);
> > > > > +		if (IS_SRIOV_VF(gt_to_xe(gt)))
> > > > > +			xe_gt_notice(gt, "Check job timeout: seqno=%u, lrc_seqno=%u, guc_id=%d, timestamp stuck",
> > > > > +				     xe_sched_job_seqno(job),
> > > > > +				     xe_sched_job_lrc_seqno(job), q->guc->id);
> > > > > +		else
> > > > > +			xe_gt_warn(gt, "Check job timeout: seqno=%u, lrc_seqno=%u, guc_id=%d, timestamp stuck",
> > > > > +				   xe_sched_job_seqno(job),
> > > > > +				   xe_sched_job_lrc_seqno(job), q->guc->id);
> > > > >
> > > > >  		return xe_sched_invalidate_job(job, 0);
> > > > >  	}
> > > >
> >
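
For readers following the thread, the behaviour discussed above can be modelled outside the kernel. The following is only a minimal user-space sketch, not the driver code: `struct job_sample`, `enum log_level`, `low32()`, `timestamp_stuck()` and `stuck_log_level()` are hypothetical names invented for illustration. It assumes, as the quoted diff suggests, that "stuck" means the low 32 bits of the LRC timestamp equal the previously sampled value, and that the only difference between VF and non-VF is the log level of the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two log levels discussed in the thread. */
enum log_level { LOG_NOTICE, LOG_WARN };

/* Hypothetical model of the values the timeout check compares. */
struct job_sample {
	uint64_t lrc_timestamp;    /* current LRC timestamp */
	uint32_t sample_timestamp; /* value recorded at the previous check */
	bool is_vf;                /* running as an SR-IOV VF? */
};

/* Keep only the low 32 bits, as lower_32_bits() does in the diff. */
static uint32_t low32(uint64_t v)
{
	return (uint32_t)(v & 0xffffffffu);
}

/* The timestamp counts as stuck if it has not moved since the last sample. */
static bool timestamp_stuck(const struct job_sample *s)
{
	return low32(s->lrc_timestamp) == s->sample_timestamp;
}

/*
 * On a VF a stuck timestamp is expected when the job is not timesliced,
 * so it is reported at notice level; elsewhere it stays a warning.
 */
static enum log_level stuck_log_level(const struct job_sample *s)
{
	return s->is_vf ? LOG_NOTICE : LOG_WARN;
}
```

In this model only the severity depends on `is_vf`; the stuck check itself is identical for both cases, which matches the shape of the patch.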
Thread overview: 9+ messages
2026-01-14 18:49 [PATCH] drm/xe: Reduce LRC timestamp stuck message on VFs to notice Matthew Brost
2026-01-14 19:36 ` ✓ CI.KUnit: success for " Patchwork
2026-01-14 20:10 ` ✓ Xe.CI.BAT: " Patchwork
2026-01-14 20:25 ` [PATCH] " Summers, Stuart
2026-01-14 20:38 ` Matthew Brost
2026-01-14 20:43 ` Summers, Stuart
2026-01-14 21:07 ` Matthew Brost
2026-01-14 21:10 ` Summers, Stuart [this message]
2026-01-15 2:48 ` ✗ Xe.CI.Full: failure for " Patchwork