From: "Lis, Tomasz" <tomasz.lis@intel.com>
To: Matthew Brost <matthew.brost@intel.com>,
<intel-xe@lists.freedesktop.org>
Cc: <michal.wajdeczko@intel.com>
Subject: Re: [PATCH v2] drm/xe/vf: Start re-emission from first unsignaled job during VF migration
Date: Sat, 1 Nov 2025 02:33:07 +0100 [thread overview]
Message-ID: <9c179328-bc36-49c6-9147-869b9ce2f77b@intel.com> (raw)
In-Reply-To: <20251031201345.3015516-1-matthew.brost@intel.com>
[-- Attachment #1: Type: text/plain, Size: 4078 bytes --]
On 10/31/2025 9:13 PM, Matthew Brost wrote:
> The LRC software ring tail is reset to the first unsignaled pending
> job's head.
>
> Fix the re-emission logic to begin submitting from the first unsignaled
> job detected, rather than scanning all pending jobs, which can cause
> imbalance.
>
> v2:
> - Include missing local changes
>
> Fixes: c25c1010df88 ("drm/xe/vf: Replay GuC submission state on pause / unpause")
> Signed-off-by: Matthew Brost<matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_gpu_scheduler.h | 5 +++--
> drivers/gpu/drm/xe/xe_guc_submit.c | 19 +++++++++++--------
> 2 files changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.h b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> index 9955397aaaa9..357afaec68d7 100644
> --- a/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> +++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> @@ -54,13 +54,14 @@ static inline void xe_sched_tdr_queue_imm(struct xe_gpu_scheduler *sched)
> static inline void xe_sched_resubmit_jobs(struct xe_gpu_scheduler *sched)
> {
> struct drm_sched_job *s_job;
> + bool skip_emit = false;
>
> list_for_each_entry(s_job, &sched->base.pending_list, list) {
> struct drm_sched_fence *s_fence = s_job->s_fence;
> struct dma_fence *hw_fence = s_fence->parent;
>
> - if (to_xe_sched_job(s_job)->skip_emit ||
> - (hw_fence && !dma_fence_is_signaled(hw_fence)))
> + skip_emit |= to_xe_sched_job(s_job)->skip_emit;
> + if (skip_emit || (hw_fence && !dma_fence_is_signaled(hw_fence)))
This looks ok, but what is the mechanism which could lead to a job after
the first `skip_emit=1` job to have the `skip_emit` flag lifted?
Wouldn't the only possibility be that jobs were executed out of order?
> sched->base.ops->run_job(s_job);
> }
> }
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index d4ffdb71ef3d..f25b71aca498 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -2152,6 +2152,8 @@ static void guc_exec_queue_pause(struct xe_guc *guc, struct xe_exec_queue *q)
>
> job = xe_sched_first_pending_job(sched);
> if (job) {
> + job->skip_emit = true;
> +
> /*
> * Adjust software tail so jobs submitted overwrite previous
> * position in ring buffer with new GGTT addresses.
> @@ -2241,17 +2243,18 @@ static void guc_exec_queue_unpause_prepare(struct xe_guc *guc,
> struct xe_exec_queue *q)
> {
> struct xe_gpu_scheduler *sched = &q->guc->sched;
> - struct drm_sched_job *s_job;
> struct xe_sched_job *job = NULL;
> + bool skip_emit = false;
>
> - list_for_each_entry(s_job, &sched->base.pending_list, list) {
> - job = to_xe_sched_job(s_job);
> -
> - xe_gt_dbg(guc_to_gt(guc), "Replay JOB - guc_id=%d, seqno=%d",
> - q->guc->id, xe_sched_job_seqno(job));
> + list_for_each_entry(job, &sched->base.pending_list, drm.list) {
> + skip_emit |= job->skip_emit;
All emitted jobs have the skip_emit set, unless their EQ got submitted
to GuC which clears it, but if it got unsubmitted without finishing then
the flag is raised again.
So this does seem to select unfinished jobs.
Though this introduces an assertion that within Command Streamer ring
area of a job, there are no GGTT references between seqno increment and
end of the job commands. Maybe worth commenting in code that we're
working on that assumption? Example issue would be if someone introduces
saving some kind of metrics there.
(this assumption was in power before this patch too, but now as we're
skipping fixups for finished jobs still in pending list, it becomes more
important)
Also we're emitting jobs which have a flag names "skip_emit" set. This
disconnect needs a comment too.
-Tomasz
> + if (skip_emit) {
> + xe_gt_dbg(guc_to_gt(guc), "Replay JOB - guc_id=%d, seqno=%d",
> + q->guc->id, xe_sched_job_seqno(job));
>
> - q->ring_ops->emit_job(job);
> - job->skip_emit = true;
> + q->ring_ops->emit_job(job);
> + job->skip_emit = true;
> + }
> }
>
> if (job)
[-- Attachment #2: Type: text/html, Size: 5486 bytes --]
next prev parent reply other threads:[~2025-11-01 1:33 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-31 20:13 [PATCH v2] drm/xe/vf: Start re-emission from first unsignaled job during VF migration Matthew Brost
2025-10-31 22:25 ` ✓ CI.KUnit: success for " Patchwork
2025-10-31 23:43 ` ✓ Xe.CI.BAT: " Patchwork
2025-11-01 1:33 ` Lis, Tomasz [this message]
2025-11-07 7:48 ` [PATCH v2] " Matthew Brost
2025-11-19 2:48 ` Lis, Tomasz
2025-11-19 19:55 ` Matthew Brost
2025-11-20 0:31 ` Lis, Tomasz
2025-11-01 13:52 ` ✗ Xe.CI.Full: failure for " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9c179328-bc36-49c6-9147-869b9ce2f77b@intel.com \
--to=tomasz.lis@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=matthew.brost@intel.com \
--cc=michal.wajdeczko@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox