From: "Lis, Tomasz" <tomasz.lis@intel.com>
To: Matthew Brost <matthew.brost@intel.com>,
<intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH v6 14/30] drm/xe/vf: Wakeup in GuC backend on VF post migration recovery
Date: Tue, 7 Oct 2025 00:27:06 +0200
Message-ID: <22e28a11-7798-4f90-a09c-cb20850c5988@intel.com>
In-Reply-To: <20251006111038.2234860-15-matthew.brost@intel.com>
On 10/6/2025 1:10 PM, Matthew Brost wrote:
> If VF post-migration recovery is in progress, the recovery flow will
> rebuild all GuC submission state. In this case, exit all waiters to
> ensure that submission queue scheduling can also be paused. Avoid taking
> any adverse actions after aborting the wait.
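A rough userspace model of the wake-up scheme described above, using POSIX threads in place of kernel wait queues. struct ct_model, ct_wait_disable_done(), ct_start_recovery() and waiter_main() are hypothetical names, not driver code. Every waiter includes the recovery flag in its wait condition, and the recovery side publishes the flag before broadcasting, so a blocked waiter returns promptly instead of running into its 5-second timeout. In this model the mutex supplies the ordering that the patch obtains with WRITE_ONCE(), smp_wmb() and wake_up_all(); the -EAGAIN/-ETIME return convention loosely follows the suspend_wait kernel-doc in the patch and is an assumption for the other waiters.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of guc->ct.wq and the state its waiters check. */
struct ct_model {
	pthread_mutex_t lock;
	pthread_cond_t wq;
	bool pending_disable;	/* stands in for exec_queue_pending_disable(q) */
	bool recovery;		/* stands in for vf_recovery(guc) */
};

/*
 * Wait for a pending disable to complete, but stop waiting as soon as
 * recovery starts. Returns 0 on completion, -EAGAIN if aborted by
 * recovery, -ETIME on timeout.
 */
static int ct_wait_disable_done(struct ct_model *ct, int timeout_sec)
{
	struct timespec deadline;
	int err = 0, ret;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_sec;

	pthread_mutex_lock(&ct->lock);
	while (ct->pending_disable && !ct->recovery && err != ETIMEDOUT)
		err = pthread_cond_timedwait(&ct->wq, &ct->lock, &deadline);
	/* Decide from the state observed under the lock, not from a re-read. */
	if (ct->recovery)
		ret = -EAGAIN;
	else if (ct->pending_disable)
		ret = -ETIME;
	else
		ret = 0;
	pthread_mutex_unlock(&ct->lock);
	return ret;
}

/* Recovery side: publish the flag first, then wake every waiter. */
static void ct_start_recovery(struct ct_model *ct)
{
	pthread_mutex_lock(&ct->lock);
	ct->recovery = true;
	pthread_mutex_unlock(&ct->lock);
	pthread_cond_broadcast(&ct->wq);
}

struct waiter_ctx {
	struct ct_model *ct;
	int result;
};

/* Thread entry point for a waiter using a 5 s timeout (like HZ * 5). */
static void *waiter_main(void *arg)
{
	struct waiter_ctx *w = arg;

	w->result = ct_wait_disable_done(w->ct, 5);
	return NULL;
}
```

A waiter that is already asleep, or one that only starts waiting after the flag is set, returns -EAGAIN, because the flag is checked both before sleeping and after every wake-up.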
>
> As part of waking up the GuC backend, suspend_wait can now return
> -EAGAIN, indicating the waiter should be retried. If the caller is
> running in a work item, that work item needs to be requeued to avoid a
> deadlock in which the work item blocks the VF migration recovery work item.
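A deliberately simplified, single-threaded C model of the retry-by-requeue idea. In the patch the preempt fence work runs on xe->preempt_fence_wq while recovery runs on gt->ordered_wq; collapsing both onto one ordered queue here is an assumption made only to show the shape of the deadlock and its avoidance. queue_work_model(), run_queue_model(), suspend_wait_model() and the other helpers are hypothetical, and dma-fence signalling annotations are not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded ordered work queue: items run one at a time. */
#define MAX_WORK 16

struct work {
	void (*fn)(struct work *w);
};

static struct work *work_ring[MAX_WORK];
static size_t ring_head, ring_tail;

static void queue_work_model(struct work *w)
{
	assert(ring_tail - ring_head < MAX_WORK);
	work_ring[ring_tail++ % MAX_WORK] = w;
}

static void run_queue_model(void)
{
	while (ring_head != ring_tail) {
		struct work *w = work_ring[ring_head++ % MAX_WORK];

		w->fn(w);
	}
}

static bool recovery_inprogress;	/* stands in for vf_recovery(guc) */
static bool suspend_done;		/* set once the queue is really suspended */
static int preempt_runs;		/* times the preempt work item executed */
static int preempt_err = 1;		/* 1 means "not finished yet" */

/* Model of suspend_wait(): never blocks while recovery is pending. */
static int suspend_wait_model(void)
{
	if (recovery_inprogress)
		return -EAGAIN;
	return suspend_done ? 0 : -ETIME;
}

/* Model of the preempt fence work item: retry by requeueing itself. */
static void preempt_work_fn(struct work *w)
{
	int err;

	preempt_runs++;
	err = suspend_wait_model();
	if (err == -EAGAIN) {
		queue_work_model(w);	/* let the queued recovery item run first */
		return;
	}
	preempt_err = err;
}

/* Model of the recovery work item: it must run for the suspend to complete. */
static void recovery_work_fn(struct work *w)
{
	(void)w;
	suspend_done = true;
	recovery_inprogress = false;
}
```

If preempt_work_fn() instead looped until suspend_wait_model() stopped returning -EAGAIN, run_queue_model() would never reach recovery_work_fn() and would spin forever; requeueing lets the recovery item run, after which the retried preempt item completes.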
>
> v3:
> - Don't block in preempt fence work queue as this can interfere with VF
> post-migration work queue scheduling leading to deadlock (Testing)
> - Use xe_gt_recovery_inprogress (Michal)
> v5:
> - Use static function for vf_recovery (Michal)
> - Add helper to wake CT waiters (Michal)
> - Move some code to following patch (Michal)
> - Adjust commit message to explain suspend_wait returning -EAGAIN (Michal)
> - Add kernel doc to suspend_wait around returning -EAGAIN
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 3 +
> drivers/gpu/drm/xe/xe_gt_sriov_vf.c | 4 ++
> drivers/gpu/drm/xe/xe_guc_ct.h | 9 +++
> drivers/gpu/drm/xe/xe_guc_submit.c | 82 ++++++++++++++++++------
> drivers/gpu/drm/xe/xe_preempt_fence.c | 11 ++++
> 5 files changed, 88 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index 27b76cf9da89..282505fa1377 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -207,6 +207,9 @@ struct xe_exec_queue_ops {
> * call after suspend. In dma-fencing path thus must return within a
> * reasonable amount of time. -ETIME return shall indicate an error
> * waiting for suspend resulting in associated VM getting killed.
> + * -EAGAIN return indicates the wait should be tried again; if the wait
> + * is within a work item, the work item should be requeued as a deadlock
> + * avoidance mechanism.
> */
> int (*suspend_wait)(struct xe_exec_queue *q);
> /**
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> index 7057260175f3..7f703336d692 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> @@ -23,6 +23,7 @@
> #include "xe_gt_sriov_vf.h"
> #include "xe_gt_sriov_vf_types.h"
> #include "xe_guc.h"
> +#include "xe_guc_ct.h"
> #include "xe_guc_hxg_helpers.h"
> #include "xe_guc_relay.h"
> #include "xe_guc_submit.h"
> @@ -743,6 +744,9 @@ static void vf_start_migration_recovery(struct xe_gt *gt)
> !gt->sriov.vf.migration.recovery_teardown) {
> gt->sriov.vf.migration.recovery_queued = true;
> WRITE_ONCE(gt->sriov.vf.migration.recovery_inprogress, true);
> + smp_wmb(); /* Ensure above write visible before wake */
> +
> + xe_guc_ct_wake_waiters(&gt->uc.guc.ct);
>
> started = queue_work(gt->ordered_wq, &gt->sriov.vf.migration.worker);
> xe_gt_sriov_info(gt, "VF migration recovery %s\n", started ?
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.h b/drivers/gpu/drm/xe/xe_guc_ct.h
> index d6c81325a76c..ca0ec938edac 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.h
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.h
> @@ -72,4 +72,13 @@ xe_guc_ct_send_block_no_fail(struct xe_guc_ct *ct, const u32 *action, u32 len)
>
> long xe_guc_ct_queue_proc_time_jiffies(struct xe_guc_ct *ct);
>
> +/**
> + * xe_guc_ct_wake_waiters() - GuC CT wake up waiters
> + * @ct: GuC CT object
> + */
> +static inline void xe_guc_ct_wake_waiters(struct xe_guc_ct *ct)
> +{
> + wake_up_all(&ct->wq);
> +}
> +
> #endif
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 59371b7cc8a4..b2ca4911efe9 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -27,7 +27,6 @@
> #include "xe_gt.h"
> #include "xe_gt_clock.h"
> #include "xe_gt_printk.h"
> -#include "xe_gt_sriov_vf.h"
> #include "xe_guc.h"
> #include "xe_guc_capture.h"
> #include "xe_guc_ct.h"
> @@ -702,6 +701,11 @@ static u32 wq_space_until_wrap(struct xe_exec_queue *q)
> return (WQ_SIZE - q->guc->wqi_tail);
> }
>
> +static bool vf_recovery(struct xe_guc *guc)
> +{
> + return xe_gt_recovery_pending(guc_to_gt(guc));
> +}
> +
> static int wq_wait_for_space(struct xe_exec_queue *q, u32 wqi_size)
> {
> struct xe_guc *guc = exec_queue_to_guc(q);
> @@ -711,7 +715,7 @@ static int wq_wait_for_space(struct xe_exec_queue *q, u32 wqi_size)
>
> #define AVAILABLE_SPACE \
> CIRC_SPACE(q->guc->wqi_tail, q->guc->wqi_head, WQ_SIZE)
> - if (wqi_size > AVAILABLE_SPACE) {
> + if (wqi_size > AVAILABLE_SPACE && !vf_recovery(guc)) {
> try_again:
> q->guc->wqi_head = parallel_read(xe, map, wq_desc.head);
> if (wqi_size > AVAILABLE_SPACE) {
> @@ -910,9 +914,10 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
> ret = wait_event_timeout(guc->ct.wq,
> (!exec_queue_pending_enable(q) &&
> !exec_queue_pending_disable(q)) ||
> - xe_guc_read_stopped(guc),
> + xe_guc_read_stopped(guc) ||
> + vf_recovery(guc),
> HZ * 5);
> - if (!ret) {
> + if (!ret && !vf_recovery(guc)) {
Is it possible for vf_recovery() to change its return value between the
two lines above, i.e. the wait ends because of recovery, and then the
second check forgets that happened?
Maybe we should assign the result to a local variable?
(This concerns all places where we do the check this way.)
-Tomasz
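A minimal, self-contained C model of the concern above (not driver code; vf_recovery_model(), double_read_continues() and snapshot_continues() are hypothetical names). If the flag can flip between the read inside the wait condition and the read after the wait, a wake-up caused by recovery can be mistaken for normal completion; latching the value once keeps both decisions consistent. In the driver, one possible way to latch it, offered here only as an assumption, would be to assign inside the wait condition, e.g. (rec = vf_recovery(guc)) || ... with a hypothetical local rec placed first so it is evaluated on every check, and then test rec after the wait.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for vf_recovery(guc) that returns true on the first
 * read only, i.e. recovery completes between two consecutive reads.
 */
static int recovery_reads;

static bool vf_recovery_model(void)
{
	return recovery_reads++ == 0;
}

/*
 * Double-read pattern: the wait condition reads the flag, then the post-wait
 * check reads it again. Returns true if the caller would treat the wait as a
 * normal completion of the condition.
 */
static bool double_read_continues(bool cond_done)
{
	bool woke = cond_done || vf_recovery_model();	/* read #1: ends the wait */

	if (vf_recovery_model())			/* read #2: may now be false */
		return false;
	return woke;
}

/* Snapshot pattern: read the flag once and base every decision on that value. */
static bool snapshot_continues(bool cond_done)
{
	bool in_recovery = vf_recovery_model();		/* single read */
	bool woke = cond_done || in_recovery;

	if (in_recovery)
		return false;
	return woke;
}
```

With the flag flipping between reads, double_read_continues(false) reports a normal completion even though the waited-for condition never held, while snapshot_continues(false) correctly takes the recovery path.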
> struct xe_gpu_scheduler *sched = &q->guc->sched;
>
> xe_gt_warn(q->gt, "Pending enable/disable failed to respond\n");
> @@ -1015,6 +1020,10 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
> bool wedged = false;
>
> xe_gt_assert(guc_to_gt(guc), xe_exec_queue_is_lr(q));
> +
> + if (vf_recovery(guc))
> + return;
> +
> trace_xe_exec_queue_lr_cleanup(q);
>
> if (!exec_queue_killed(q))
> @@ -1047,7 +1056,11 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
> */
> ret = wait_event_timeout(guc->ct.wq,
> !exec_queue_pending_disable(q) ||
> - xe_guc_read_stopped(guc), HZ * 5);
> + xe_guc_read_stopped(guc) ||
> + vf_recovery(guc), HZ * 5);
> + if (vf_recovery(guc))
> + return;
> +
> if (!ret) {
> xe_gt_warn(q->gt, "Schedule disable failed to respond, guc_id=%d\n",
> q->guc->id);
> @@ -1137,8 +1150,9 @@ static void enable_scheduling(struct xe_exec_queue *q)
>
> ret = wait_event_timeout(guc->ct.wq,
> !exec_queue_pending_enable(q) ||
> - xe_guc_read_stopped(guc), HZ * 5);
> - if (!ret || xe_guc_read_stopped(guc)) {
> + xe_guc_read_stopped(guc) ||
> + vf_recovery(guc), HZ * 5);
> + if ((!ret && !vf_recovery(guc)) || xe_guc_read_stopped(guc)) {
> xe_gt_warn(guc_to_gt(guc), "Schedule enable failed to respond");
> set_exec_queue_banned(q);
> xe_gt_reset_async(q->gt);
> @@ -1209,7 +1223,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> * list so job can be freed and kick scheduler ensuring free job is not
> * lost.
> */
> - if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence->flags))
> + if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence->flags) ||
> + vf_recovery(guc))
> return DRM_GPU_SCHED_STAT_NO_HANG;
>
> /* Kill the run_job entry point */
> @@ -1261,7 +1276,10 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> ret = wait_event_timeout(guc->ct.wq,
> (!exec_queue_pending_enable(q) &&
> !exec_queue_pending_disable(q)) ||
> - xe_guc_read_stopped(guc), HZ * 5);
> + xe_guc_read_stopped(guc) ||
> + vf_recovery(guc), HZ * 5);
> + if (vf_recovery(guc))
> + goto handle_vf_resume;
> if (!ret || xe_guc_read_stopped(guc))
> goto trigger_reset;
>
> @@ -1286,7 +1304,10 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> smp_rmb();
> ret = wait_event_timeout(guc->ct.wq,
> !exec_queue_pending_disable(q) ||
> - xe_guc_read_stopped(guc), HZ * 5);
> + xe_guc_read_stopped(guc) ||
> + vf_recovery(guc), HZ * 5);
> + if (vf_recovery(guc))
> + goto handle_vf_resume;
> if (!ret || xe_guc_read_stopped(guc)) {
> trigger_reset:
> if (!ret)
> @@ -1391,6 +1412,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> * some thought, do this in a follow up.
> */
> xe_sched_submission_start(sched);
> +handle_vf_resume:
> return DRM_GPU_SCHED_STAT_NO_HANG;
> }
>
> @@ -1487,11 +1509,17 @@ static void __guc_exec_queue_process_msg_set_sched_props(struct xe_sched_msg *ms
>
> static void __suspend_fence_signal(struct xe_exec_queue *q)
> {
> + struct xe_guc *guc = exec_queue_to_guc(q);
> + struct xe_device *xe = guc_to_xe(guc);
> +
> if (!q->guc->suspend_pending)
> return;
>
> WRITE_ONCE(q->guc->suspend_pending, false);
> - wake_up(&q->guc->suspend_wait);
> + if (IS_SRIOV_VF(xe))
> + wake_up_all(&guc->ct.wq);
> + else
> + wake_up(&q->guc->suspend_wait);
> }
>
> static void suspend_fence_signal(struct xe_exec_queue *q)
> @@ -1512,8 +1540,9 @@ static void __guc_exec_queue_process_msg_suspend(struct xe_sched_msg *msg)
>
> if (guc_exec_queue_allowed_to_change_state(q) && !exec_queue_suspended(q) &&
> exec_queue_enabled(q)) {
> - wait_event(guc->ct.wq, (q->guc->resume_time != RESUME_PENDING ||
> - xe_guc_read_stopped(guc)) && !exec_queue_pending_disable(q));
> + wait_event(guc->ct.wq, vf_recovery(guc) ||
> + ((q->guc->resume_time != RESUME_PENDING ||
> + xe_guc_read_stopped(guc)) && !exec_queue_pending_disable(q)));
>
> if (!xe_guc_read_stopped(guc)) {
> s64 since_resume_ms =
> @@ -1640,7 +1669,7 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
>
> q->entity = &ge->entity;
>
> - if (xe_guc_read_stopped(guc))
> + if (xe_guc_read_stopped(guc) || vf_recovery(guc))
> xe_sched_stop(sched);
>
> mutex_unlock(&guc->submission_state.lock);
> @@ -1786,6 +1815,7 @@ static int guc_exec_queue_suspend(struct xe_exec_queue *q)
> static int guc_exec_queue_suspend_wait(struct xe_exec_queue *q)
> {
> struct xe_guc *guc = exec_queue_to_guc(q);
> + struct xe_device *xe = guc_to_xe(guc);
> int ret;
>
> /*
> @@ -1793,11 +1823,22 @@ static int guc_exec_queue_suspend_wait(struct xe_exec_queue *q)
> * suspend_pending upon kill but to be paranoid but races in which
> * suspend_pending is set after kill also check kill here.
> */
> - ret = wait_event_interruptible_timeout(q->guc->suspend_wait,
> - !READ_ONCE(q->guc->suspend_pending) ||
> - exec_queue_killed(q) ||
> - xe_guc_read_stopped(guc),
> - HZ * 5);
> + if (IS_SRIOV_VF(xe))
> + ret = wait_event_interruptible_timeout(guc->ct.wq,
> + !READ_ONCE(q->guc->suspend_pending) ||
> + exec_queue_killed(q) ||
> + xe_guc_read_stopped(guc) ||
> + vf_recovery(guc),
> + HZ * 5);
> + else
> + ret = wait_event_interruptible_timeout(q->guc->suspend_wait,
> + !READ_ONCE(q->guc->suspend_pending) ||
> + exec_queue_killed(q) ||
> + xe_guc_read_stopped(guc),
> + HZ * 5);
> +
> + if (vf_recovery(guc) && !xe_device_wedged(xe))
> + return -EAGAIN;
>
> if (!ret) {
> xe_gt_warn(guc_to_gt(guc),
> @@ -1905,8 +1946,7 @@ int xe_guc_submit_reset_prepare(struct xe_guc *guc)
> {
> int ret;
>
> - if (xe_gt_WARN_ON(guc_to_gt(guc),
> - xe_gt_sriov_vf_recovery_pending(guc_to_gt(guc))))
> + if (xe_gt_WARN_ON(guc_to_gt(guc), vf_recovery(guc)))
> return 0;
>
> if (!guc->submission_state.initialized)
> diff --git a/drivers/gpu/drm/xe/xe_preempt_fence.c b/drivers/gpu/drm/xe/xe_preempt_fence.c
> index 83fbeea5aa20..7f587ca3947d 100644
> --- a/drivers/gpu/drm/xe/xe_preempt_fence.c
> +++ b/drivers/gpu/drm/xe/xe_preempt_fence.c
> @@ -8,6 +8,8 @@
> #include <linux/slab.h>
>
> #include "xe_exec_queue.h"
> +#include "xe_gt_printk.h"
> +#include "xe_guc_exec_queue_types.h"
> #include "xe_vm.h"
>
> static void preempt_fence_work_func(struct work_struct *w)
> @@ -22,6 +24,15 @@ static void preempt_fence_work_func(struct work_struct *w)
> } else if (!q->ops->reset_status(q)) {
> int err = q->ops->suspend_wait(q);
>
> + if (err == -EAGAIN) {
> + xe_gt_dbg(q->gt, "PREEMPT FENCE RETRY guc_id=%d",
> + q->guc->id);
> + queue_work(q->vm->xe->preempt_fence_wq,
> + &pfence->preempt_work);
> + dma_fence_end_signalling(cookie);
> + return;
> + }
> +
> if (err)
> dma_fence_set_error(&pfence->base, err);
> } else {