From: Matthew Brost <matthew.brost@intel.com>
To: intel-xe@lists.freedesktop.org
Subject: [PATCH v8 17/33] drm/xe/vf: Avoid indefinite blocking in preempt rebind worker for VFs supporting migration
Date: Tue, 7 Oct 2025 06:04:49 -0700
Message-ID: <20251007130505.2694829-18-matthew.brost@intel.com>
In-Reply-To: <20251007130505.2694829-1-matthew.brost@intel.com>
Blocking in a work queue on a hardware action that may never occur,
especially when that action depends on a software fixup that is also
scheduled on a work queue, is a recipe for deadlock. This situation
arises with the preempt rebind worker and VF post-migration recovery.
To prevent potential deadlocks, avoid indefinite blocking in the
preempt rebind worker for VFs that support migration.
v4:
- Use dma_fence_wait_timeout (CI)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
---
drivers/gpu/drm/xe/xe_vm.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 4e914928e0a9..faca626702b8 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -35,6 +35,7 @@
 #include "xe_pt.h"
 #include "xe_pxp.h"
 #include "xe_res_cursor.h"
+#include "xe_sriov_vf.h"
 #include "xe_svm.h"
 #include "xe_sync.h"
 #include "xe_tile.h"
@@ -111,12 +112,22 @@ static int alloc_preempt_fences(struct xe_vm *vm, struct list_head *list,
 static int wait_for_existing_preempt_fences(struct xe_vm *vm)
 {
 	struct xe_exec_queue *q;
+	bool vf_migration = IS_SRIOV_VF(vm->xe) &&
+		xe_sriov_vf_migration_supported(vm->xe);
+	signed long wait_time = vf_migration ? HZ / 5 : MAX_SCHEDULE_TIMEOUT;
 
 	xe_vm_assert_held(vm);
 
 	list_for_each_entry(q, &vm->preempt.exec_queues, lr.link) {
 		if (q->lr.pfence) {
-			long timeout = dma_fence_wait(q->lr.pfence, false);
+			long timeout;
+
+			timeout = dma_fence_wait_timeout(q->lr.pfence, false,
+							 wait_time);
+			if (!timeout) {
+				xe_assert(vm->xe, vf_migration);
+				return -EAGAIN;
+			}
 
 			/* Only -ETIME on fence indicates VM needs to be killed */
 			if (timeout < 0 || q->lr.pfence->error == -ETIME)
@@ -541,6 +552,19 @@ static void preempt_rebind_work_func(struct work_struct *w)
 out_unlock_outer:
 	if (err == -EAGAIN) {
 		trace_xe_vm_rebind_worker_retry(vm);
+
+		/*
+		 * We can't block in workers on a VF which supports migration
+		 * given this can block the VF post-migration workers from
+		 * getting scheduled.
+		 */
+		if (IS_SRIOV_VF(vm->xe) &&
+		    xe_sriov_vf_migration_supported(vm->xe)) {
+			up_write(&vm->lock);
+			xe_vm_queue_rebind_worker(vm);
+			return;
+		}
+
 		goto retry;
 	}
 
--
2.34.1
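
For readers following the thread, the "requeue instead of retrying in
place" idea from the second hunk can be sketched in user space. This is
only an illustration under stated assumptions, not driver code: every
name below (struct sim, queue_work, wait_fence_timeout, rebind_work,
recovery_work, run_scenario) is hypothetical, the queue is assumed to be
single-threaded, and the recovery item is assumed to sit behind the
rebind item. The bounded wait returns 0 on timeout, which mirrors the
documented dma_fence_wait_timeout() convention.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_DEPTH 8

struct sim;
typedef void (*work_fn)(struct sim *s);

/* Hypothetical single-threaded work queue plus the state both workers share. */
struct sim {
	work_fn pending[QUEUE_DEPTH];
	size_t head, tail;
	bool fence_needs_fixup;	/* fence cannot signal until recovery has run */
	int rebind_attempts;
	bool rebind_done;
};

static void queue_work(struct sim *s, work_fn fn)
{
	assert(s->tail - s->head < QUEUE_DEPTH);
	s->pending[s->tail++ % QUEUE_DEPTH] = fn;
}

/*
 * Stand-in for a bounded fence wait: 0 means the budget expired, a
 * positive value means the fence signaled with budget left over.
 */
static long wait_fence_timeout(const struct sim *s, long budget)
{
	return s->fence_needs_fixup ? 0 : budget;
}

/* Stand-in for the post-migration recovery worker (the software fixup). */
static void recovery_work(struct sim *s)
{
	s->fence_needs_fixup = false;
}

/* Stand-in for the rebind worker. */
static void rebind_work(struct sim *s)
{
	s->rebind_attempts++;
	if (!wait_fence_timeout(s, 5)) {
		/*
		 * Timed out (the -EAGAIN case in the patch): requeue and
		 * return so the recovery item behind us gets to run.
		 * Retrying in place here would never make progress.
		 */
		queue_work(s, rebind_work);
		return;
	}
	s->rebind_done = true;
}

/* Drain the queue one item at a time; returns the number of items run. */
static int run_queue(struct sim *s, int max_items)
{
	int ran = 0;

	while (s->head != s->tail && ran < max_items) {
		work_fn fn = s->pending[s->head++ % QUEUE_DEPTH];

		fn(s);
		ran++;
	}
	return ran;
}

/* Scenario: rebind is queued ahead of the recovery it depends on. */
static struct sim run_scenario(void)
{
	struct sim s = { .fence_needs_fixup = true };

	queue_work(&s, rebind_work);
	queue_work(&s, recovery_work);
	run_queue(&s, 16);
	return s;
}
```

Calling run_scenario() runs the rebind item once, lets it time out and
requeue itself, runs the recovery item, and then completes the rebind on
its second attempt. Replacing the requeue with an in-place retry loop in
this model would spin forever, because recovery_work() would never be
dequeued.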