Re: [PATCH v2] drm/xe/vf: Start re-emission from first unsignaled job during VF migration

Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: Matthew Brost <matthew.brost@intel.com>
To: "Lis, Tomasz" <tomasz.lis@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <michal.wajdeczko@intel.com>
Subject: Re: [PATCH v2] drm/xe/vf: Start re-emission from first unsignaled job during VF migration
Date: Thu, 6 Nov 2025 23:48:36 -0800	[thread overview]
Message-ID: <aQ2kVCs94QjLECVQ@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <9c179328-bc36-49c6-9147-869b9ce2f77b@intel.com>

On Sat, Nov 01, 2025 at 02:33:07AM +0100, Lis, Tomasz wrote:
> 
> On 10/31/2025 9:13 PM, Matthew Brost wrote:
> > The LRC software ring tail is reset to the first unsignaled pending
> > job's head.
> > 
> > Fix the re-emission logic to begin submitting from the first unsignaled
> > job detected, rather than scanning all pending jobs, which can cause
> > imbalance.
> > 
> > v2:
> >   - Include missing local changes
> > 
> > Fixes: c25c1010df88 ("drm/xe/vf: Replay GuC submission state on pause / unpause")
> > Signed-off-by: Matthew Brost<matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_gpu_scheduler.h |  5 +++--
> >   drivers/gpu/drm/xe/xe_guc_submit.c    | 19 +++++++++++--------
> >   2 files changed, 14 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.h b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> > index 9955397aaaa9..357afaec68d7 100644
> > --- a/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> > +++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> > @@ -54,13 +54,14 @@ static inline void xe_sched_tdr_queue_imm(struct xe_gpu_scheduler *sched)
> >   static inline void xe_sched_resubmit_jobs(struct xe_gpu_scheduler *sched)
> >   {
> >   	struct drm_sched_job *s_job;
> > +	bool skip_emit = false;
> >   	list_for_each_entry(s_job, &sched->base.pending_list, list) {
> >   		struct drm_sched_fence *s_fence = s_job->s_fence;
> >   		struct dma_fence *hw_fence = s_fence->parent;
> > -		if (to_xe_sched_job(s_job)->skip_emit ||
> > -		    (hw_fence && !dma_fence_is_signaled(hw_fence)))
> > +		skip_emit |= to_xe_sched_job(s_job)->skip_emit;
> > +		if (skip_emit || (hw_fence && !dma_fence_is_signaled(hw_fence)))
> 
> This looks ok, but what is the mechanism which could lead to a job after the
> first  `skip_emit=1` job to have the `skip_emit` flag lifted?
> 

This shouldn't be possible with the current code, since we're checking
hw_fence. If we were relying on the software fence (i.e., the job's
finished fence), the state wouldn't be stable. I think our eventually
the is goal is to use the software fence [1] to avoid DRM scheduler's
violations, so defensively / future proofed coded here.

[1] https://patchwork.freedesktop.org/series/155314/

> Wouldn't the only possibility be that jobs were executed out of order?
>
> >   			sched->base.ops->run_job(s_job);
> >   	}
> >   }
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index d4ffdb71ef3d..f25b71aca498 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -2152,6 +2152,8 @@ static void guc_exec_queue_pause(struct xe_guc *guc, struct xe_exec_queue *q)
> >   	job = xe_sched_first_pending_job(sched);
> >   	if (job) {
> > +		job->skip_emit = true;
> > +
> >   		/*
> >   		 * Adjust software tail so jobs submitted overwrite previous
> >   		 * position in ring buffer with new GGTT addresses.
> > @@ -2241,17 +2243,18 @@ static void guc_exec_queue_unpause_prepare(struct xe_guc *guc,
> >   					   struct xe_exec_queue *q)
> >   {
> >   	struct xe_gpu_scheduler *sched = &q->guc->sched;
> > -	struct drm_sched_job *s_job;
> >   	struct xe_sched_job *job = NULL;
> > +	bool skip_emit = false;
> > -	list_for_each_entry(s_job, &sched->base.pending_list, list) {
> > -		job = to_xe_sched_job(s_job);
> > -
> > -		xe_gt_dbg(guc_to_gt(guc), "Replay JOB - guc_id=%d, seqno=%d",
> > -			  q->guc->id, xe_sched_job_seqno(job));
> > +	list_for_each_entry(job, &sched->base.pending_list, drm.list) {
> > +		skip_emit |= job->skip_emit;
> 
> All emitted jobs have the skip_emit set, unless their EQ got submitted to
> GuC which clears it, but if it got unsubmitted without finishing then the
> flag is raised again.
> 
> So this does seem to select unfinished jobs.
> 
> Though this introduces an assertion that within Command Streamer ring area
> of a job, there are no GGTT references between seqno increment and end of
> the job commands. Maybe worth commenting in code that we're working on that
> assumption? Example issue would be if someone introduces saving some kind of
> metrics there.
> 
> (this assumption was in power before this patch too, but now as we're
> skipping fixups for finished jobs still in pending list, it becomes more
> important)
> 
> Also we're emitting jobs which have a flag names "skip_emit" set. This
> disconnect needs a comment too.
> 

I am not following this comment.

The idea is that in guc_exec_queue_pause, we set skip_emit—this acts as
a marker to iterate over and resubmit jobs regardless of other
conditions. As I mentioned, if this marker were based on a software
fence, the state could change between guc_exec_queue_pause and
guc_exec_queue_unpause_prepare. The current code uses the hardware
fence, but that could change between pause / unpause too.

We need to keep the iteration consistent because it also affects the KMD
software state. Even if a job signals during this flow, there's no issue
with the hardware; the key is maintaining a consistent software state
throughout the iteration phases: pause, unpause_prepare, and unpause.

Matt

> -Tomasz
> 
> > +		if (skip_emit) {
> > +			xe_gt_dbg(guc_to_gt(guc), "Replay JOB - guc_id=%d, seqno=%d",
> > +				  q->guc->id, xe_sched_job_seqno(job));
> > -		q->ring_ops->emit_job(job);
> > -		job->skip_emit = true;
> > +			q->ring_ops->emit_job(job);
> > +			job->skip_emit = true;
> > +		}
> >   	}
> >   	if (job)

next prev parent reply	other threads:[~2025-11-07  7:48 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-31 20:13 [PATCH v2] drm/xe/vf: Start re-emission from first unsignaled job during VF migration Matthew Brost
2025-10-31 22:25 ` ✓ CI.KUnit: success for " Patchwork
2025-10-31 23:43 ` ✓ Xe.CI.BAT: " Patchwork
2025-11-01  1:33 ` [PATCH v2] " Lis, Tomasz
2025-11-07  7:48   ` Matthew Brost [this message]
2025-11-19  2:48     ` Lis, Tomasz
2025-11-19 19:55       ` Matthew Brost
2025-11-20  0:31         ` Lis, Tomasz
2025-11-01 13:52 ` ✗ Xe.CI.Full: failure for " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aQ2kVCs94QjLECVQ@lstrano-desk.jf.intel.com \
    --to=matthew.brost@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=michal.wajdeczko@intel.com \
    --cc=tomasz.lis@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox