Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Lis, Tomasz" <tomasz.lis@intel.com>
To: Matthew Brost <matthew.brost@intel.com>,
	<intel-xe@lists.freedesktop.org>
Cc: <michal.wajdeczko@intel.com>
Subject: Re: [PATCH v2] drm/xe/vf: Start re-emission from first unsignaled job during VF migration
Date: Sat, 1 Nov 2025 02:33:07 +0100	[thread overview]
Message-ID: <9c179328-bc36-49c6-9147-869b9ce2f77b@intel.com> (raw)
In-Reply-To: <20251031201345.3015516-1-matthew.brost@intel.com>

[-- Attachment #1: Type: text/plain, Size: 4078 bytes --]


On 10/31/2025 9:13 PM, Matthew Brost wrote:
> The LRC software ring tail is reset to the first unsignaled pending
> job's head.
>
> Fix the re-emission logic to begin submitting from the first unsignaled
> job detected, rather than scanning all pending jobs, which can cause
> imbalance.
>
> v2:
>   - Include missing local changes
>
> Fixes: c25c1010df88 ("drm/xe/vf: Replay GuC submission state on pause / unpause")
> Signed-off-by: Matthew Brost<matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_gpu_scheduler.h |  5 +++--
>   drivers/gpu/drm/xe/xe_guc_submit.c    | 19 +++++++++++--------
>   2 files changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.h b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> index 9955397aaaa9..357afaec68d7 100644
> --- a/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> +++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> @@ -54,13 +54,14 @@ static inline void xe_sched_tdr_queue_imm(struct xe_gpu_scheduler *sched)
>   static inline void xe_sched_resubmit_jobs(struct xe_gpu_scheduler *sched)
>   {
>   	struct drm_sched_job *s_job;
> +	bool skip_emit = false;
>   
>   	list_for_each_entry(s_job, &sched->base.pending_list, list) {
>   		struct drm_sched_fence *s_fence = s_job->s_fence;
>   		struct dma_fence *hw_fence = s_fence->parent;
>   
> -		if (to_xe_sched_job(s_job)->skip_emit ||
> -		    (hw_fence && !dma_fence_is_signaled(hw_fence)))
> +		skip_emit |= to_xe_sched_job(s_job)->skip_emit;
> +		if (skip_emit || (hw_fence && !dma_fence_is_signaled(hw_fence)))

This looks ok, but what is the mechanism which could lead to a job after 
the first  `skip_emit=1` job to have the `skip_emit` flag lifted?

Wouldn't the only possibility be that jobs were executed out of order?

>   			sched->base.ops->run_job(s_job);
>   	}
>   }
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index d4ffdb71ef3d..f25b71aca498 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -2152,6 +2152,8 @@ static void guc_exec_queue_pause(struct xe_guc *guc, struct xe_exec_queue *q)
>   
>   	job = xe_sched_first_pending_job(sched);
>   	if (job) {
> +		job->skip_emit = true;
> +
>   		/*
>   		 * Adjust software tail so jobs submitted overwrite previous
>   		 * position in ring buffer with new GGTT addresses.
> @@ -2241,17 +2243,18 @@ static void guc_exec_queue_unpause_prepare(struct xe_guc *guc,
>   					   struct xe_exec_queue *q)
>   {
>   	struct xe_gpu_scheduler *sched = &q->guc->sched;
> -	struct drm_sched_job *s_job;
>   	struct xe_sched_job *job = NULL;
> +	bool skip_emit = false;
>   
> -	list_for_each_entry(s_job, &sched->base.pending_list, list) {
> -		job = to_xe_sched_job(s_job);
> -
> -		xe_gt_dbg(guc_to_gt(guc), "Replay JOB - guc_id=%d, seqno=%d",
> -			  q->guc->id, xe_sched_job_seqno(job));
> +	list_for_each_entry(job, &sched->base.pending_list, drm.list) {
> +		skip_emit |= job->skip_emit;

All emitted jobs have the skip_emit set, unless their EQ got submitted 
to GuC which clears it, but if it got unsubmitted without finishing then 
the flag is raised again.

So this does seem to select unfinished jobs.

Though this introduces an assertion that within Command Streamer ring 
area of a job, there are no GGTT references between seqno increment and 
end of the job commands. Maybe worth commenting in code that we're 
working on that assumption? Example issue would be if someone introduces 
saving some kind of metrics there.

(this assumption was in power before this patch too, but now as we're 
skipping fixups for finished jobs still in pending list, it becomes more 
important)

Also we're emitting jobs which have a flag names "skip_emit" set. This 
disconnect needs a comment too.

-Tomasz

> +		if (skip_emit) {
> +			xe_gt_dbg(guc_to_gt(guc), "Replay JOB - guc_id=%d, seqno=%d",
> +				  q->guc->id, xe_sched_job_seqno(job));
>   
> -		q->ring_ops->emit_job(job);
> -		job->skip_emit = true;
> +			q->ring_ops->emit_job(job);
> +			job->skip_emit = true;
> +		}
>   	}
>   
>   	if (job)

[-- Attachment #2: Type: text/html, Size: 5486 bytes --]

  parent reply	other threads:[~2025-11-01  1:33 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-31 20:13 [PATCH v2] drm/xe/vf: Start re-emission from first unsignaled job during VF migration Matthew Brost
2025-10-31 22:25 ` ✓ CI.KUnit: success for " Patchwork
2025-10-31 23:43 ` ✓ Xe.CI.BAT: " Patchwork
2025-11-01  1:33 ` Lis, Tomasz [this message]
2025-11-07  7:48   ` [PATCH v2] " Matthew Brost
2025-11-19  2:48     ` Lis, Tomasz
2025-11-19 19:55       ` Matthew Brost
2025-11-20  0:31         ` Lis, Tomasz
2025-11-01 13:52 ` ✗ Xe.CI.Full: failure for " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9c179328-bc36-49c6-9147-869b9ce2f77b@intel.com \
    --to=tomasz.lis@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=matthew.brost@intel.com \
    --cc=michal.wajdeczko@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox