Intel-XE Archive on lore.kernel.org
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: intel-xe@lists.freedesktop.org
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Subject: [PATCH 0/5] drm/xe: Fix LR exec queue suspend/resume for S3/S4
Date: Fri, 22 May 2026 18:43:50 +0200
Message-ID: <20260522164355.2773-1-thomas.hellstrom@linux.intel.com>

Long Running (LR) exec queues — used by compute workloads with SVM
(fault-mode) and by preempt-fence-mode — were not surviving S3/S4
suspend/resume correctly.  Five distinct problems are addressed:

1. Exec queue scheduler start during resume was not deferred: user exec
   queue schedulers were started before page table BOs and LRC BOs were
   restored.  A job in this window would cause GuC to load a context
   from stale or invalid VRAM.  User exec queue schedulers are now
   deferred until after page tables and LRC BOs are restored.  Migrate
   and kernel VM queues are still started immediately as they are
   required by the restore process itself.

2. Exec queue suspend/resume lacked coordination when multiple paths
   (PM, mode switching, preempt fences) needed to hold the queue
   suspended simultaneously.  A resume from one path could prematurely
   re-enable a queue still held suspended by another.  Each caller can
   now independently hold a suspend; the queue resumes only when all
   callers have released it.

3. During PM suspend, any user exec queue with a started-but-incomplete
   job was banned.  For LR queues this is always true — their jobs are
   designed to run indefinitely — so every PM suspend permanently
   banned the queue.  The ban is now suppressed for LR VM exec queues
   during PM suspend or hibernation while being preserved for GT reset
   (legitimate hang detection).

4. The execution mode constant EXEC_MODE_LR in xe_hw_engine_group was
   misleading since not all long-running queues use fault mode.  It is
   renamed to EXEC_MODE_FAULT.  No functional change.

5. Fault-mode (SVM) VMs use GPU page faults to access memory.  A
   running fault-mode job can re-fault pages torn down by VRAM
   eviction, racing with the eviction.  Fault-mode exec queues are now
   suspended and drained before any VRAM eviction begins.  On resume,
   they are re-registered and restarted once hardware is restored.
   Exec queues created concurrently with PM suspend are immediately
   suspended so the resume path picks them up.
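
As an illustration of the coordination described in item 2, the
following is a minimal user-space sketch of a suspend hold counter.
It is not the driver code: struct demo_queue and the demo_queue_*()
helpers are hypothetical names invented for this example, and locking,
which a real implementation would need, is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an exec queue; not a struct from the xe driver. */
struct demo_queue {
	unsigned int suspend_count;	/* number of callers holding a suspend */
	bool running;			/* true while scheduling is enabled */
};

static void demo_queue_init(struct demo_queue *q)
{
	q->suspend_count = 0;
	q->running = true;
}

/* Take a suspend hold; only the first holder actually stops the queue. */
static void demo_queue_suspend(struct demo_queue *q)
{
	if (q->suspend_count++ == 0)
		q->running = false;
}

/* Drop a suspend hold; only the last release restarts the queue. */
static void demo_queue_resume(struct demo_queue *q)
{
	assert(q->suspend_count > 0);	/* unbalanced resume is a caller bug */
	if (--q->suspend_count == 0)
		q->running = true;
}
```

With this shape, a PM suspend and a mode switch can each call
demo_queue_suspend() independently, and the queue stays stopped until
both have called demo_queue_resume().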

Note: A prerequisite revert ("Revert drm/xe: Skip exec queue schedule
toggle if queue is idle during suspend") was already sent as a separate
patch and is not included here.

v2:
 - Dropped "Restore userspace LRC BOs early on resume": replaced by
   patch 1/5 which defers user exec queue scheduler start until after
   page tables are restored, achieving the same ordering guarantee.
 - Added patch 1/5: Defer user exec queue scheduler start until after
   page table restore.
 - Added patch 4/5: Rename EXEC_MODE_LR to EXEC_MODE_FAULT.
 - Patch 5/5: see per-patch v2 changelog.
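
The ordering guarantee from patch 1/5 can be pictured with the small
sketch below. It only models the decision "may this queue's scheduler
start yet?" and assumes a single flag that is set once page table BOs
and LRC BOs are restored. The enum, struct and function names are
hypothetical and do not appear in the series.

```c
#include <stdbool.h>

/* Hypothetical queue classes for illustration only. */
enum demo_queue_kind {
	DEMO_QUEUE_KERNEL,	/* migrate and kernel VM queues */
	DEMO_QUEUE_USER,	/* user exec queues */
};

/* Hypothetical resume progress; the real driver tracks this differently. */
struct demo_resume_state {
	bool bos_restored;	/* page table BOs and LRC BOs are restored */
};

/*
 * Kernel queues may start immediately because the restore itself needs
 * them. User queues wait until the restore has completed.
 */
static bool demo_may_start_scheduler(enum demo_queue_kind kind,
				     const struct demo_resume_state *st)
{
	if (kind == DEMO_QUEUE_KERNEL)
		return true;

	return st->bos_restored;
}
```

A resume path built this way would start kernel queues first, run the
restore, set bos_restored, and only then walk the deferred user queues.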

Thomas Hellström (5):
  drm/xe/guc: Defer user exec queue scheduler start until after page
    table restore
  drm/xe/guc: Don't ban LR VM exec queues on PM suspend
  drm/xe/guc: Add suspend refcount to exec queue ops
  drm/xe: Rename EXEC_MODE_LR to EXEC_MODE_FAULT in hw engine group
  drm/xe: Suspend fault-mode LR jobs before VRAM eviction on S3/S4

 drivers/gpu/drm/xe/xe_device_types.h          |   8 +
 drivers/gpu/drm/xe/xe_exec.c                  |   2 +-
 drivers/gpu/drm/xe/xe_exec_queue_types.h      |   7 +
 drivers/gpu/drm/xe/xe_gt.c                    |  16 ++
 drivers/gpu/drm/xe/xe_gt.h                    |   2 +
 drivers/gpu/drm/xe/xe_guc.c                   |  13 ++
 drivers/gpu/drm/xe/xe_guc.h                   |   1 +
 drivers/gpu/drm/xe/xe_guc_exec_queue_types.h  |   7 +
 drivers/gpu/drm/xe/xe_guc_submit.c            | 103 ++++++++++-
 drivers/gpu/drm/xe/xe_guc_submit.h            |   2 +
 drivers/gpu/drm/xe/xe_hw_engine_group.c       | 171 ++++++++++++++++--
 drivers/gpu/drm/xe/xe_hw_engine_group.h       |   3 +
 drivers/gpu/drm/xe/xe_hw_engine_group_types.h |  11 +-
 drivers/gpu/drm/xe/xe_pm.c                    |  26 ++-
 drivers/gpu/drm/xe/xe_uc.c                    |  16 ++
 drivers/gpu/drm/xe/xe_uc.h                    |   1 +
 16 files changed, 357 insertions(+), 32 deletions(-)

-- 
2.54.0


Thread overview: 9+ messages
2026-05-22 16:43 Thomas Hellström [this message]
2026-05-22 16:43 ` [PATCH 1/5] drm/xe/guc: Defer user exec queue scheduler start until after page table restore Thomas Hellström
2026-05-22 16:43 ` [PATCH 2/5] drm/xe/guc: Don't ban LR VM exec queues on PM suspend Thomas Hellström
2026-05-22 16:43 ` [PATCH 3/5] drm/xe/guc: Add suspend refcount to exec queue ops Thomas Hellström
2026-05-22 16:43 ` [PATCH 4/5] drm/xe: Rename EXEC_MODE_LR to EXEC_MODE_FAULT in hw engine group Thomas Hellström
2026-05-22 16:43 ` [PATCH 5/5] drm/xe: Suspend fault-mode LR jobs before VRAM eviction on S3/S4 Thomas Hellström
2026-05-22 18:46 ` ✓ CI.KUnit: success for drm/xe: Fix LR exec queue suspend/resume for S3/S4 (rev2) Patchwork
2026-05-22 19:23 ` ✗ Xe.CI.BAT: failure " Patchwork
2026-05-23  3:23 ` ✗ Xe.CI.FULL: " Patchwork
