From: "Lis, Tomasz" <tomasz.lis@intel.com>
To: Matthew Brost <matthew.brost@intel.com>,
<intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH v3 20/36] drm/xe/vf: Avoid indefinite blocking in preempt rebind worker for VFs supporting migration
Date: Wed, 1 Oct 2025 15:45:40 +0200
Message-ID: <b535bacb-4f28-42ba-84e7-812ec28a07d9@intel.com>
In-Reply-To: <20250929025542.1486303-21-matthew.brost@intel.com>
On 9/29/2025 4:55 AM, Matthew Brost wrote:
> Blocking in work queues on a hardware action that may never occur,
> especially when it depends on a software fixup also scheduled on a
> work queue, is a recipe for deadlock. This situation arises with
> the preempt rebind worker and VF post-migration recovery. To prevent
> potential deadlocks, avoid indefinite blocking in the preempt rebind
> worker for VFs that support migration.
Some would say the timeout value is a magic number here, but I don't
have anything better to propose.
And we have no obligation to match each tracepoint _enter() with an
_exit() (the new early return skips the _exit()), so that's OK as well.
So, all good:
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_vm.c | 29 ++++++++++++++++++++++++++++-
> 1 file changed, 28 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 80b7f13ecd80..b527ee2a5da5 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -35,6 +35,7 @@
> #include "xe_pt.h"
> #include "xe_pxp.h"
> #include "xe_res_cursor.h"
> +#include "xe_sriov_vf.h"
> #include "xe_svm.h"
> #include "xe_sync.h"
> #include "xe_tile.h"
> @@ -111,12 +112,25 @@ static int alloc_preempt_fences(struct xe_vm *vm, struct list_head *list,
> static int wait_for_existing_preempt_fences(struct xe_vm *vm)
> {
> struct xe_exec_queue *q;
> + bool vf_migration = IS_SRIOV_VF(vm->xe) &&
> + xe_sriov_vf_migration_supported(vm->xe);
>
> xe_vm_assert_held(vm);
>
> list_for_each_entry(q, &vm->preempt.exec_queues, lr.link) {
> if (q->lr.pfence) {
> - long timeout = dma_fence_wait(q->lr.pfence, false);
> + long timeout;
> +
> + if (vf_migration)
> + timeout = dma_fence_wait_timeout(q->lr.pfence,
> + false, HZ / 5);
> + else
> + timeout = dma_fence_wait(q->lr.pfence, false);
> +
> + if (!timeout) {
> + xe_assert(vm->xe, vf_migration);
> + return -EAGAIN;
> + }
>
> /* Only -ETIME on fence indicates VM needs to be killed */
> if (timeout < 0 || q->lr.pfence->error == -ETIME)
> @@ -541,6 +555,19 @@ static void preempt_rebind_work_func(struct work_struct *w)
> out_unlock_outer:
> if (err == -EAGAIN) {
> trace_xe_vm_rebind_worker_retry(vm);
> +
> + /*
> + * We can't block in workers on a VF which supports migration
> + * given this can block the VF post-migration workers from
> + * getting scheduled.
> + */
> + if (IS_SRIOV_VF(vm->xe) &&
> + xe_sriov_vf_migration_supported(vm->xe)) {
> + up_write(&vm->lock);
> + xe_vm_queue_rebind_worker(vm);
> + return;
> + }
> +
> goto retry;
> }
>