Intel-XE Archive on lore.kernel.org
* [PATCH 0/7] Fix a couple of wedge corner-case memory leaks
@ 2025-10-13 16:24 Stuart Summers
  2025-10-13 16:24 ` [PATCH 1/7] drm/xe: Add additional trace points for LRCs Stuart Summers
                   ` (11 more replies)
  0 siblings, 12 replies; 20+ messages in thread
From: Stuart Summers @ 2025-10-13 16:24 UTC
  Cc: intel-xe, matthew.brost, Stuart Summers

Most of the patches in this series just add debug
trace points that helped track these leaks down. I split
them up so we can pick and choose which ones to include
in the tree. I found them useful.

The two interesting patches are the last two in the
series. They fix corner cases where the driver becomes
wedged either in the middle of communication with the
DRM scheduler or when the GuC becomes unresponsive.
In both of these cases there is a chance we could leak
memory around the exec queue members, such as the LRC
and the LRC BO.
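
As a rough illustration of the kind of leak described above,
here is a toy userspace model in Python. It is not driver
code: Lrc, ToyQueue, teardown_leaks and the drop_when_wedged
flag are hypothetical names made up for this sketch, and the
"cleanup message is dropped once wedged" mechanism is an
assumption used only to show how a skipped teardown step can
leave a reference held.

```python
class Lrc:
    """Stand-in for a refcounted context object (hypothetical, not xe_lrc)."""

    def __init__(self):
        self.refcount = 1  # reference held by the owning queue

    def put(self):
        self.refcount -= 1

    @property
    def leaked(self):
        # A reference that is never dropped means the memory is never freed.
        return self.refcount > 0


class ToyQueue:
    """Stand-in for an exec queue that releases its LRC via a cleanup message."""

    def __init__(self):
        self.lrc = Lrc()
        self.messages = []

    def request_teardown(self):
        self.messages.append("cleanup")

    def process_messages(self, wedged, drop_when_wedged):
        # Assumed failure mode: once wedged, pending messages are discarded
        # without running, so the cleanup step never drops the reference.
        if wedged and drop_when_wedged:
            self.messages.clear()
            return
        while self.messages:
            if self.messages.pop(0) == "cleanup":
                self.lrc.put()


def teardown_leaks(wedged, drop_when_wedged):
    """Return True if the LRC reference is still held after teardown."""
    queue = ToyQueue()
    queue.request_teardown()
    queue.process_messages(wedged, drop_when_wedged)
    return queue.lrc.leaked
```

In this model, teardown_leaks(True, True) reports a leak,
while letting the cleanup message run even when wedged
(drop_when_wedged=False) releases the reference.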

v2: Address feedback from Matt:
    - Let the DRM scheduler handle pausing/unpausing
    - Still do the wait after scheduling disable/deregister
      as with the previous patch, but skip the intermediate
      software-based schedule disable using the "banned"
      flag and instead jump straight to the deregister
      handling, which will fully reset the queue state.
      Note that for this case I am seeing a hardware failure
      after submitting to GuC but before receiving the
      response from GuC. So even if we wedge in this case
      (monitoring the hardware state change), the queue
      itself is not wedged because of the active GuC
      submission (CT is not stalled at that point).

Stuart Summers (7):
  drm/xe: Add additional trace points for LRCs
  drm/xe: Add a trace point for VM close
  drm/xe: Add the BO pointer info to the BO trace
  drm/xe: Add new exec queue trace points
  drm/xe: Correct migration VM teardown order
  drm/xe: Don't block messages to the GPU scheduler
  drm/xe: Check for GuC responses on disabling scheduling

 drivers/gpu/drm/xe/xe_exec_queue.c    |  4 +++
 drivers/gpu/drm/xe/xe_gpu_scheduler.c |  6 +---
 drivers/gpu/drm/xe/xe_guc_submit.c    | 24 ++++++++++++---
 drivers/gpu/drm/xe/xe_lrc.c           |  4 +++
 drivers/gpu/drm/xe/xe_lrc.h           |  3 ++
 drivers/gpu/drm/xe/xe_migrate.c       |  2 +-
 drivers/gpu/drm/xe/xe_trace.h         | 22 ++++++++++++--
 drivers/gpu/drm/xe/xe_trace_bo.h      | 12 ++++++--
 drivers/gpu/drm/xe/xe_trace_lrc.h     | 42 ++++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_vm.c            |  2 ++
 10 files changed, 106 insertions(+), 15 deletions(-)

-- 
2.34.1


Thread overview: 20+ messages
2025-10-13 16:24 [PATCH 0/7] Fix a couple of wedge corner-case memory leaks Stuart Summers
2025-10-13 16:24 ` [PATCH 1/7] drm/xe: Add additional trace points for LRCs Stuart Summers
2025-10-13 16:24 ` [PATCH 2/7] drm/xe: Add a trace point for VM close Stuart Summers
2025-10-13 16:25 ` [PATCH 3/7] drm/xe: Add the BO pointer info to the BO trace Stuart Summers
2025-10-13 16:25 ` [PATCH 4/7] drm/xe: Add new exec queue trace points Stuart Summers
2025-10-13 16:25 ` [PATCH 5/7] drm/xe: Correct migration VM teardown order Stuart Summers
2025-10-13 16:25 ` [PATCH 6/7] drm/xe: Don't block messages to the GPU scheduler Stuart Summers
2025-10-13 16:56   ` Matthew Brost
2025-10-13 17:17     ` Summers, Stuart
2025-10-13 17:31       ` Matthew Brost
2025-10-13 17:38         ` Summers, Stuart
2025-10-13 21:49           ` Summers, Stuart
2025-10-13 16:25 ` [PATCH 7/7] drm/xe: Check for GuC responses on disabling scheduling Stuart Summers
2025-10-13 17:04 ` [PATCH 0/7] Fix a couple of wedge corner-case memory leaks Matthew Brost
2025-10-13 17:13   ` Summers, Stuart
2025-10-13 21:48     ` Summers, Stuart
2025-10-13 18:45 ` ✗ CI.checkpatch: warning for Fix a couple of wedge corner-case memory leaks (rev2) Patchwork
2025-10-13 18:46 ` ✓ CI.KUnit: success " Patchwork
2025-10-13 19:31 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-10-13 23:13 ` ✗ Xe.CI.Full: " Patchwork
