From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: intel-xe@lists.freedesktop.org
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Matthew Brost" <matthew.brost@intel.com>,
"Francois Dugast" <francois.dugast@intel.com>,
"Matthew Auld" <matthew.auld@intel.com>,
"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>
Subject: [PATCH 0/4] drm/xe: Fix LR exec queue suspend/resume for S3/S4
Date: Thu, 21 May 2026 16:48:33 +0200
Message-ID: <20260521144837.7363-1-thomas.hellstrom@linux.intel.com>

Long Running (LR) exec queues, used by compute workloads in SVM
(fault) mode and in preempt-fence mode, were not surviving S3/S4
suspend/resume correctly. This series addresses four distinct problems:

1. Exec queue ops (guc_exec_queue_suspend/resume) lacked coordination
   when multiple paths (PM, mode switching, preempt fences) needed to
   hold the queue suspended simultaneously. A suspend refcount now
   ensures the GuC SUSPEND message is sent only when the first caller
   suspends, and the RESUME message only when the last caller resumes.

2. During PM suspend, guc_exec_queue_stop() banned any user exec queue
   that had a started-but-incomplete job. For LR queues this is always
   true, since their jobs are designed to run indefinitely, so every PM
   suspend permanently banned the queue. The ban is now suppressed for
   LR VM exec queues during PM suspend or hibernation, while being
   preserved for GT reset (legitimate hang detection).

3. Userspace LRC buffer objects carried XE_BO_FLAG_PINNED_LATE_RESTORE,
   deferring their VRAM restore until after xe_gt_resume(). However,
   xe_gt_resume() drives context registration, which requires valid LRC
   VRAM. Dropping the flag moves the restore to xe_bo_restore_early(),
   a CPU/BAR copy that runs before xe_gt_resume(), fixing the ordering.

4. Fault-mode (SVM) VMs use GPU page faults to access memory. A
   running fault-mode job can re-fault pages torn down by VRAM eviction,
   racing with the eviction. A new xe_suspend_all_faulting_lr_jobs()
   call in the PM notifier stops all fault-mode queues and waits for GuC
   acknowledgement before eviction begins. On resume,
   xe_resume_all_faulting_lr_jobs() mirrors the same iteration to
   re-register and resume exactly those queues. A per-group pm_suspended
   flag (protected by mode_sem) prevents new fault-mode exec queues from
   slipping through unsuspended while PM suspend is in progress.
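
For readers unfamiliar with the pattern, the suspend refcount described in
item 1 can be sketched in plain userspace C as below. This is only an
illustration of the first-suspends/last-resumes idea, not the code from
the patch: struct demo_queue, demo_queue_suspend() and demo_queue_resume()
are hypothetical names, the "messages" are just counters, and the locking
that a real driver would need around the count is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an exec queue; not the xe driver's struct. */
struct demo_queue {
	unsigned int suspend_count;	/* callers currently holding the queue suspended */
	unsigned int suspend_msgs;	/* SUSPEND messages "sent" (counter only) */
	unsigned int resume_msgs;	/* RESUME messages "sent" (counter only) */
};

/*
 * Take a suspend reference. Only the first holder triggers the SUSPEND
 * message. Returns true if this call sent it.
 */
static bool demo_queue_suspend(struct demo_queue *q)
{
	if (q->suspend_count++ > 0)
		return false;	/* already held suspended by another path */

	q->suspend_msgs++;
	return true;
}

/*
 * Drop a suspend reference. Only the last holder triggers the RESUME
 * message. Returns true if this call sent it.
 */
static bool demo_queue_resume(struct demo_queue *q)
{
	assert(q->suspend_count > 0);	/* unbalanced resume is a caller bug */

	if (--q->suspend_count > 0)
		return false;	/* some other path still needs it suspended */

	q->resume_msgs++;
	return true;
}
```

With two overlapping holders (say PM and a mode switch), the first
demo_queue_suspend() call sends SUSPEND and the second only bumps the
count; the first demo_queue_resume() call then sends nothing, and the
second one sends RESUME.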
Note: A prerequisite revert ("Revert drm/xe: Skip exec queue schedule
toggle if queue is idle during suspend") was already sent as a separate
patch and is not included here.

Thomas Hellström (4):
  drm/xe/guc: Add suspend refcount to exec queue ops
  drm/xe/guc: Don't ban LR VM exec queues on PM suspend
  drm/xe: Restore userspace LRC BOs early on resume
  drm/xe: Suspend fault-mode LR jobs before VRAM eviction on S3/S4

 drivers/gpu/drm/xe/xe_exec_queue_types.h      |   7 +
 drivers/gpu/drm/xe/xe_guc_exec_queue_types.h  |   7 +
 drivers/gpu/drm/xe/xe_guc_submit.c            |  60 +++++--
 drivers/gpu/drm/xe/xe_guc_submit.h            |   1 +
 drivers/gpu/drm/xe/xe_hw_engine_group.c       | 158 +++++++++++++++++-
 drivers/gpu/drm/xe/xe_hw_engine_group.h       |   3 +
 drivers/gpu/drm/xe/xe_hw_engine_group_types.h |   7 +
 drivers/gpu/drm/xe/xe_lrc.c                   |   2 +-
 drivers/gpu/drm/xe/xe_pm.c                    |  15 +-
 9 files changed, 239 insertions(+), 21 deletions(-)

--
2.54.0