From: Matthew Brost <matthew.brost@intel.com>
To: "Summers, Stuart" <stuart.summers@intel.com>
Cc: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
"Ceraolo Spurio, Daniele" <daniele.ceraolospurio@intel.com>
Subject: Re: [PATCH] drm/xe: Reduce LRC timestamp stuck message on VFs to notice
Date: Wed, 14 Jan 2026 13:07:51 -0800
Message-ID: <aWgFp0N3XQ0Vk75I@lstrano-desk.jf.intel.com>
In-Reply-To: <563ccee86f66c06aa0eebc9430b06641eff36857.camel@intel.com>
On Wed, Jan 14, 2026 at 01:43:14PM -0700, Summers, Stuart wrote:
> On Wed, 2026-01-14 at 12:38 -0800, Matthew Brost wrote:
> > On Wed, Jan 14, 2026 at 01:25:55PM -0700, Summers, Stuart wrote:
> > > On Wed, 2026-01-14 at 10:49 -0800, Matthew Brost wrote:
> > > > An LRC timestamp getting stuck is a somewhat normal occurrence. If a
> > > > single VF submits a job that does not get timesliced, the LRC timestamp
> > > > will not increment. Reduce the LRC timestamp stuck message on VFs to
> > > > notice (same log level as job timeout) to avoid false CI bugs in tests
> > > > where a VF submits a job that does not get timesliced.
> > >
> > > Ok, and if this happens and there is an actual problem, it looks like
> > > the queue will be banned (per the earlier patch), so we should still be
> > > ok.
> > >
> > > My only question is: why not move both cases (VF and non-VF) to
> > > notice?
> >
> > In the non-VF case, the LRC timestamp should always be moving; if it
> > isn't, something has gone badly wrong and CI should notify us of the
> > error.
>
> Ok, but shouldn't the job timeout generally be the same? (Should we
> change that timeout message to warn?)
>
The job timeout message is at notice level, and CI understands that. The
warn level here, however, triggers CI failures on VFs, where this message
is expected in addition to the timeout message. On non-VFs, we do not
expect this message for job timeouts.
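To make the log-level argument concrete, here is a minimal C sketch of the severity choice the patch makes, paired with a simple model of a CI dmesg filter. The numeric values match the kernel's KERN_WARNING (4) and KERN_NOTICE (5). The filter rule (flag anything at warning severity or above) is an assumption for illustration. The helpers stuck_msg_level() and ci_flags_message() are hypothetical and are not part of the xe driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors <linux/kern_levels.h>: a lower number is more severe. */
enum log_level {
	LEVEL_WARNING = 4,
	LEVEL_NOTICE = 5,
};

static_assert(LEVEL_WARNING < LEVEL_NOTICE, "lower number means more severe");

/*
 * Hypothetical helper: severity for the "timestamp stuck" message,
 * following the patch. Use notice on a VF, where a stuck LRC timestamp
 * is expected. Use warning otherwise, where it signals a real problem.
 */
static enum log_level stuck_msg_level(bool is_sriov_vf)
{
	return is_sriov_vf ? LEVEL_NOTICE : LEVEL_WARNING;
}

/*
 * Hypothetical CI dmesg filter (assumption): any message at warning
 * severity or above (numerically <= 4) counts as a failure.
 */
static bool ci_flags_message(enum log_level level)
{
	return level <= LEVEL_WARNING;
}
```

Under this model, a VF's stuck-timestamp message passes the filter. The same condition on a non-VF device is still flagged, which is the split described above.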
> And sorry if this is a basic question... but why isn't that the case
> for VF too? Because on the VF side we are issuing resets as part of the
> submission flow normally? Or because we want to make sure the resets
> from one VF are hidden from another VF?

The algorithm to accurately sample the LRC timestamp relies on access to
the engine MMIO timestamp, which a VF does not have. I'm told we can
make a configuration change to allow VF access to this register. I'll
follow up on that, but until then, let's not break CI.

Matt
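The following minimal C sketch shows why a VF can see a "stuck" timestamp even when nothing is wrong. It assumes, as the commit message implies, that the in-memory LRC copy of the context timestamp is refreshed only when the context is switched out. While the context is running, the live value has to come from the engine's MMIO timestamp register, which a VF cannot read. struct ctx_model, struct engine_model, sample_ctx_timestamp(), and timestamp_stuck() are hypothetical names and do not reflect xe driver internals. Only the lower-32-bit equality check mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context model: the saved value is written only on switch-out. */
struct ctx_model {
	uint64_t saved_timestamp; /* LRC copy, stale while the context keeps running */
	bool active;              /* context currently executing on the engine */
};

/* Hypothetical engine model: the live counter keeps ticking while active. */
struct engine_model {
	uint64_t live_ctx_timestamp; /* stands in for the engine MMIO register */
	bool mmio_accessible;        /* false on a VF, per the discussion */
};

/*
 * Use the live engine value when the context is active and the register
 * is readable. Otherwise, fall back to the saved LRC value, which does not
 * advance for a job that is never timesliced.
 */
static uint64_t sample_ctx_timestamp(const struct ctx_model *ctx,
				     const struct engine_model *engine)
{
	assert(ctx != NULL && engine != NULL);
	if (ctx->active && engine->mmio_accessible)
		return engine->live_ctx_timestamp;
	return ctx->saved_timestamp;
}

/* Like the patch: compare the lower 32 bits against the job's earlier sample. */
static bool timestamp_stuck(uint64_t now, uint32_t job_sample)
{
	return (uint32_t)now == job_sample;
}
```

Under these assumptions, a job that runs uninterrupted on a VF produces the same sample twice and trips the stuck check, even though the engine is making progress.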
>
> Thanks,
> Stuart
>
> >
> > Matt
> >
> > > Is the idea that in the non-VF case, the reset is already an error and
> > > so the extra warn is ok? But in that case I'd already expect some other
> > > error message to trigger CI (like the engine reset notification from
> > > GuC). So the extra information here really isn't doing a whole lot more
> > > from a warning level (CI triggering) perspective.
> > >
> > > Thanks,
> > > Stuart
> > >
> > > >
> > > > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7032
> > > > Fixes: bb63e7257e63 ("drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR")
> > > > Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > > drivers/gpu/drm/xe/xe_guc_submit.c | 11 ++++++++---
> > > > 1 file changed, 8 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > index be8fa76baf1d..0311c89107f9 100644
> > > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > > @@ -1319,9 +1319,14 @@ static bool check_timeout(struct xe_exec_queue *q, struct xe_sched_job *job)
> > > >
> > > >  	ctx_timestamp = lower_32_bits(xe_lrc_timestamp(q->lrc[0]));
> > > >  	if (ctx_timestamp == job->sample_timestamp) {
> > > > -		xe_gt_warn(gt, "Check job timeout: seqno=%u, lrc_seqno=%u, guc_id=%d, timestamp stuck",
> > > > -			   xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
> > > > -			   q->guc->id);
> > > > +		if (IS_SRIOV_VF(gt_to_xe(gt)))
> > > > +			xe_gt_notice(gt, "Check job timeout: seqno=%u, lrc_seqno=%u, guc_id=%d, timestamp stuck",
> > > > +				     xe_sched_job_seqno(job),
> > > > +				     xe_sched_job_lrc_seqno(job), q->guc->id);
> > > > +		else
> > > > +			xe_gt_warn(gt, "Check job timeout: seqno=%u, lrc_seqno=%u, guc_id=%d, timestamp stuck",
> > > > +				   xe_sched_job_seqno(job),
> > > > +				   xe_sched_job_lrc_seqno(job), q->guc->id);
> > > >
> > > >  		return xe_sched_invalidate_job(job, 0);
> > > >  	}
> > >
>
Thread overview: 9+ messages
2026-01-14 18:49 [PATCH] drm/xe: Reduce LRC timestamp stuck message on VFs to notice Matthew Brost
2026-01-14 19:36 ` ✓ CI.KUnit: success for " Patchwork
2026-01-14 20:10 ` ✓ Xe.CI.BAT: " Patchwork
2026-01-14 20:25 ` [PATCH] " Summers, Stuart
2026-01-14 20:38 ` Matthew Brost
2026-01-14 20:43 ` Summers, Stuart
2026-01-14 21:07 ` Matthew Brost [this message]
2026-01-14 21:10 ` Summers, Stuart
2026-01-15 2:48 ` ✗ Xe.CI.Full: failure for " Patchwork