public inbox for intel-xe@lists.freedesktop.org
From: Matthew Brost <matthew.brost@intel.com>
To: intel-xe@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Subject: [RFC PATCH 00/12] Introduce DRM dep queue
Date: Sun, 15 Mar 2026 21:32:43 -0700
Message-ID: <20260316043255.226352-1-matthew.brost@intel.com>

Diverging requirements between GPU drivers using firmware scheduling
and those using hardware scheduling have shown that drm_gpu_scheduler is
no longer sufficient for firmware-scheduled GPU drivers. The technical
debt, lack of memory-safety guarantees, absence of clear object-lifetime
rules, and numerous driver-specific hacks have rendered
drm_gpu_scheduler unmaintainable. It is time for a fresh design for
firmware-scheduled GPU drivers, one that addresses all of the
aforementioned shortcomings.

Key changes from DRM scheduler:
	- Unification of drm_gpu_scheduler and drm_sched_entity into
	  drm_dep_queue
	- Reference counting
	- Clear object-lifetime ownership
	- Privatisation of drm dep fence
	- Drop scheduled fence from drm dep fence
	- Submit workqueue bypass path (optional)
	- Job put from IRQ context (i.e., bypass the worker to drop the
	  job reference) (optional)
	- Asynchronous teardown
	- Extensive lockdep annotations and asserts
	- Extensive kernel doc
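
The reference-counting and object-lifetime items above can be
illustrated with a small user-space model. This is a minimal sketch
under stated assumptions, not the drm_dep API: the types and functions
(dep_queue, dep_job, dep_queue_create(), dep_queue_put(),
dep_job_create(), dep_job_put()) are hypothetical, and C11 atomics stand
in for the kernel's kref. It shows one plausible ownership rule: each
job holds a reference to its queue, so dropping the creator's queue
reference while a job is still in flight defers the actual free until
the last job is put.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical user-space model; names are illustrative, not drm_dep. */

static int queues_freed; /* counts queue frees so lifetimes can be checked */

struct dep_queue {
	atomic_int refcount;
};

struct dep_job {
	atomic_int refcount;
	struct dep_queue *q; /* each job pins its queue */
};

static struct dep_queue *dep_queue_create(void)
{
	struct dep_queue *q = malloc(sizeof(*q));

	if (!q)
		return NULL;
	atomic_init(&q->refcount, 1); /* initial reference owned by creator */
	return q;
}

static struct dep_queue *dep_queue_get(struct dep_queue *q)
{
	atomic_fetch_add(&q->refcount, 1);
	return q;
}

static void dep_queue_put(struct dep_queue *q)
{
	/* fetch_sub returns the old value: 1 means this was the last ref */
	if (atomic_fetch_sub(&q->refcount, 1) == 1) {
		free(q);
		queues_freed++;
	}
}

static struct dep_job *dep_job_create(struct dep_queue *q)
{
	struct dep_job *job = malloc(sizeof(*job));

	if (!job)
		return NULL;
	atomic_init(&job->refcount, 1);
	job->q = dep_queue_get(q); /* job owns a queue reference */
	return job;
}

static void dep_job_put(struct dep_job *job)
{
	if (atomic_fetch_sub(&job->refcount, 1) == 1) {
		struct dep_queue *q = job->q;

		free(job);
		dep_queue_put(q); /* dropping the last job may free the queue */
	}
}
```

For example, a driver in this model could create a queue, create a job
on it, and drop its own queue reference immediately; the queue object
then survives until dep_job_put() releases the job. In kernel code the
same pattern would normally be expressed with kref_get()/kref_put() and
a release callback.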

Xe has been fully converted to drm dep and extensively tested with and
without verbose kernel debug Kconfig options. Everything appears to be
working locally, and all permutations of drm_dep_queue_flags have been
tested.

The Panthor and AMDXDNA conversions are compile-tested only.

Other candidates to convert are: Nouveau, PVR, and AMDGPU VM queue.

Looking forward, this design seems suitable for Rust bindings and for
use by Nova and Tyr, given the clean object-lifetime rules and
reference counting.

The Xe team is aligned on moving forward with this; hopefully the rest
of the community embraces it.

In addition to DRM dep, workqueues have been extended with interfaces
to enforce reclaim safety with DRM dep, and Xe uses these interfaces.
This part could be split out into a separate series.

I'm also fairly certain that a sane HW queue component could be built on
top of DRM dep as well, replacing DRM sched for any drivers that choose
to adopt it.

Matt

Matthew Brost (12):
  workqueue: Add interface to teach lockdep to warn on reclaim
    violations
  drm/dep: Add DRM dependency queue layer
  drm/xe: Use WQ_MEM_WARN_ON_RECLAIM on all workqueues in the reclaim
    path
  drm/xe: Issue GGTT invalidation under lock in ggtt_node_remove
  drm/xe: Return fence from xe_sched_job_arm and adjust job references
  drm/xe: Convert to DRM dep queue scheduler layer
  drm/xe: Make scheduler message lock IRQ-safe
  drm/xe: Rework exec queue object on top of DRM dep
  drm/xe: Enable IRQ job put in DRM dep
  drm/xe: Use DRM dep queue kill semantics
  accel/amdxdna: Convert to drm_dep scheduler layer
  drm/panthor: Convert to drm_dep scheduler layer

 drivers/accel/amdxdna/Kconfig                |    2 +-
 drivers/accel/amdxdna/aie2_ctx.c             |  144 +-
 drivers/accel/amdxdna/aie2_pci.h             |    4 +-
 drivers/accel/amdxdna/amdxdna_ctx.c          |    5 +-
 drivers/accel/amdxdna/amdxdna_ctx.h          |    4 +-
 drivers/gpu/drm/Kconfig                      |    4 +
 drivers/gpu/drm/Makefile                     |    1 +
 drivers/gpu/drm/dep/Makefile                 |    5 +
 drivers/gpu/drm/dep/drm_dep_fence.c          |  406 +++++
 drivers/gpu/drm/dep/drm_dep_fence.h          |   25 +
 drivers/gpu/drm/dep/drm_dep_job.c            |  675 +++++++
 drivers/gpu/drm/dep/drm_dep_job.h            |   13 +
 drivers/gpu/drm/dep/drm_dep_queue.c          | 1647 ++++++++++++++++++
 drivers/gpu/drm/dep/drm_dep_queue.h          |   31 +
 drivers/gpu/drm/panthor/Kconfig              |    2 +-
 drivers/gpu/drm/panthor/panthor_device.c     |    5 +-
 drivers/gpu/drm/panthor/panthor_device.h     |    2 +-
 drivers/gpu/drm/panthor/panthor_drv.c        |   35 +-
 drivers/gpu/drm/panthor/panthor_mmu.c        |  160 +-
 drivers/gpu/drm/panthor/panthor_mmu.h        |   14 +-
 drivers/gpu/drm/panthor/panthor_sched.c      |  242 ++-
 drivers/gpu/drm/panthor/panthor_sched.h      |   12 +-
 drivers/gpu/drm/xe/Kconfig                   |    2 +-
 drivers/gpu/drm/xe/tests/xe_migrate.c        |   10 +-
 drivers/gpu/drm/xe/xe_dep_job_types.h        |    8 +-
 drivers/gpu/drm/xe/xe_dep_scheduler.c        |   81 +-
 drivers/gpu/drm/xe/xe_dep_scheduler.h        |    7 +-
 drivers/gpu/drm/xe/xe_device.c               |    3 +-
 drivers/gpu/drm/xe/xe_exec.c                 |   12 +-
 drivers/gpu/drm/xe/xe_exec_queue.c           |   96 +-
 drivers/gpu/drm/xe/xe_exec_queue.h           |    9 +-
 drivers/gpu/drm/xe/xe_exec_queue_types.h     |   10 +-
 drivers/gpu/drm/xe/xe_execlist.c             |   39 +-
 drivers/gpu/drm/xe/xe_execlist_types.h       |    4 -
 drivers/gpu/drm/xe/xe_ggtt.c                 |   12 +-
 drivers/gpu/drm/xe/xe_gpu_scheduler.c        |   66 +-
 drivers/gpu/drm/xe/xe_gpu_scheduler.h        |   67 +-
 drivers/gpu/drm/xe/xe_gpu_scheduler_types.h  |    9 +-
 drivers/gpu/drm/xe/xe_gsc.c                  |    5 +-
 drivers/gpu/drm/xe/xe_gsc_submit.c           |    5 +-
 drivers/gpu/drm/xe/xe_gt.c                   |    8 +-
 drivers/gpu/drm/xe/xe_guc_ct.c               |    3 +-
 drivers/gpu/drm/xe/xe_guc_exec_queue_types.h |   14 +-
 drivers/gpu/drm/xe/xe_guc_submit.c           |  360 ++--
 drivers/gpu/drm/xe/xe_migrate.c              |   27 +-
 drivers/gpu/drm/xe/xe_oa.c                   |    5 +-
 drivers/gpu/drm/xe/xe_pt.c                   |    2 +-
 drivers/gpu/drm/xe/xe_pxp_submit.c           |   10 +-
 drivers/gpu/drm/xe/xe_sched_job.c            |   59 +-
 drivers/gpu/drm/xe/xe_sched_job.h            |    9 +-
 drivers/gpu/drm/xe/xe_sched_job_types.h      |    8 +-
 drivers/gpu/drm/xe/xe_sync.c                 |    2 +-
 drivers/gpu/drm/xe/xe_tlb_inval.c            |    3 +-
 drivers/gpu/drm/xe/xe_tlb_inval_job.c        |   86 +-
 include/drm/drm_dep.h                        |  597 +++++++
 include/linux/workqueue.h                    |    3 +
 include/trace/events/amdxdna.h               |   12 +-
 kernel/workqueue.c                           |   41 +
 58 files changed, 4268 insertions(+), 864 deletions(-)
 create mode 100644 drivers/gpu/drm/dep/Makefile
 create mode 100644 drivers/gpu/drm/dep/drm_dep_fence.c
 create mode 100644 drivers/gpu/drm/dep/drm_dep_fence.h
 create mode 100644 drivers/gpu/drm/dep/drm_dep_job.c
 create mode 100644 drivers/gpu/drm/dep/drm_dep_job.h
 create mode 100644 drivers/gpu/drm/dep/drm_dep_queue.c
 create mode 100644 drivers/gpu/drm/dep/drm_dep_queue.h
 create mode 100644 include/drm/drm_dep.h

-- 
2.34.1



Thread overview: 65+ messages
2026-03-16  4:32 Matthew Brost [this message]
2026-03-16  4:32 ` [RFC PATCH 01/12] workqueue: Add interface to teach lockdep to warn on reclaim violations Matthew Brost
2026-03-25 15:59   ` Tejun Heo
2026-03-26  1:49     ` Matthew Brost
2026-03-26  2:19       ` Tejun Heo
2026-03-27  4:33         ` Matthew Brost
2026-03-27 17:25           ` Tejun Heo
2026-03-16  4:32 ` [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer Matthew Brost
2026-03-16  9:16   ` Boris Brezillon
2026-03-17  5:22     ` Matthew Brost
2026-03-17  8:48       ` Boris Brezillon
2026-03-16 10:25   ` Danilo Krummrich
2026-03-17  5:10     ` Matthew Brost
2026-03-17 12:19       ` Danilo Krummrich
2026-03-18 23:02         ` Matthew Brost
2026-03-17  2:47   ` Daniel Almeida
2026-03-17  5:45     ` Matthew Brost
2026-03-17  7:17       ` Miguel Ojeda
2026-03-17  8:26         ` Matthew Brost
2026-03-17 12:04           ` Daniel Almeida
2026-03-17 19:41           ` Miguel Ojeda
2026-03-23 17:31             ` Matthew Brost
2026-03-23 17:42               ` Miguel Ojeda
2026-03-17 18:14       ` Matthew Brost
2026-03-17 19:48         ` Daniel Almeida
2026-03-17 20:43         ` Boris Brezillon
2026-03-18 22:40           ` Matthew Brost
2026-03-19  9:57             ` Boris Brezillon
2026-03-22  6:43               ` Matthew Brost
2026-03-23  7:58                 ` Matthew Brost
2026-03-23 10:06                   ` Boris Brezillon
2026-03-23 17:11                     ` Matthew Brost
2026-03-17 12:31     ` Danilo Krummrich
2026-03-17 14:25       ` Daniel Almeida
2026-03-17 14:33         ` Danilo Krummrich
2026-03-18 22:50           ` Matthew Brost
2026-03-17  8:47   ` Christian König
2026-03-17 14:55   ` Boris Brezillon
2026-03-18 23:28     ` Matthew Brost
2026-03-19  9:11       ` Boris Brezillon
2026-03-23  4:50         ` Matthew Brost
2026-03-23  9:55           ` Boris Brezillon
2026-03-23 17:08             ` Matthew Brost
2026-03-23 18:38               ` Matthew Brost
2026-03-24  9:23                 ` Boris Brezillon
2026-03-24 16:06                   ` Matthew Brost
2026-03-25  2:33                     ` Matthew Brost
2026-03-24  8:49               ` Boris Brezillon
2026-03-24 16:51                 ` Matthew Brost
2026-03-17 16:30   ` Shashank Sharma
2026-03-16  4:32 ` [RFC PATCH 03/12] drm/xe: Use WQ_MEM_WARN_ON_RECLAIM on all workqueues in the reclaim path Matthew Brost
2026-03-16  4:32 ` [RFC PATCH 04/12] drm/xe: Issue GGTT invalidation under lock in ggtt_node_remove Matthew Brost
2026-03-26  5:45   ` Bhadane, Dnyaneshwar
2026-03-16  4:32 ` [RFC PATCH 05/12] drm/xe: Return fence from xe_sched_job_arm and adjust job references Matthew Brost
2026-03-16  4:32 ` [RFC PATCH 06/12] drm/xe: Convert to DRM dep queue scheduler layer Matthew Brost
2026-03-16  4:32 ` [RFC PATCH 07/12] drm/xe: Make scheduler message lock IRQ-safe Matthew Brost
2026-03-16  4:32 ` [RFC PATCH 08/12] drm/xe: Rework exec queue object on top of DRM dep Matthew Brost
2026-03-16  4:32 ` [RFC PATCH 09/12] drm/xe: Enable IRQ job put in " Matthew Brost
2026-03-16  4:32 ` [RFC PATCH 10/12] drm/xe: Use DRM dep queue kill semantics Matthew Brost
2026-03-16  4:32 ` [RFC PATCH 11/12] accel/amdxdna: Convert to drm_dep scheduler layer Matthew Brost
2026-03-16  4:32 ` [RFC PATCH 12/12] drm/panthor: " Matthew Brost
2026-03-16  4:52 ` ✗ CI.checkpatch: warning for Introduce DRM dep queue Patchwork
2026-03-16  4:53 ` ✓ CI.KUnit: success " Patchwork
2026-03-16  5:28 ` ✓ Xe.CI.BAT: " Patchwork
2026-03-16  8:09 ` ✗ Xe.CI.FULL: failure " Patchwork
