From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Brost <matthew.brost@intel.com>, intel-xe@lists.freedesktop.org
Cc: stuart.summers@intel.com, arvind.yadav@intel.com,
himal.prasad.ghimiray@intel.com, francois.dugast@intel.com
Subject: Re: [PATCH v3 00/25] CPU binds and ULLS on migration queue
Date: Fri, 20 Mar 2026 16:31:25 +0100
Message-ID: <d2d91fd42b591054d7873ebd00ed2ba945470b3a.camel@linux.intel.com>
In-Reply-To: <20260228013501.106680-1-matthew.brost@intel.com>
On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> We now have data demonstrating the need for CPU binds and ULLS on the
> migration queue, based on results generated from [1].
>
> On BMG, measurements show that when the GPU is continuously
> processing
> faults, copy jobs run approximately 30–40µs faster (depending on the
> test case) with ULLS compared to traditional GuC submission with SLPC
> enabled on the migration queue. Startup from a cold GPU shows an even
> larger speedup. Given the critical nature of fault performance, ULLS
> appears to be a worthwhile feature.
>
> In addition to driver telemetry, UMD compute benchmarks consistently
> show over 1GB/s improvement in pagefault benchmarks with ULLS
> enabled.
>
> ULLS will consume more power (not yet measured) due to a continuously
> running batch on the paging engine. However, compute UMDs already do
> this on engines exposed to users, so this seems like a worthwhile
> tradeoff. To mitigate power concerns, ULLS will exit after a period
> of
> time in which no faults have been processed.
So the primary use case would be when one or more SVM VMAs are
present, right? Is a check for this included?
>
> CPU binds are required for ULLS to function, as the migration queue
> needs exclusive access to the paging hardware engine.
Could you remind me why exclusive access to the paging hardware engine
is necessary?
> Thus, CPU binds
> are included here.
>
> Beyond being a requirement for ULLS, CPU binds should also reduce
> VM-bind latency, provide clearer multi-tile and TLB-invalidation
> layering, reduce pressure on GuC during fault storms as it is
> bypassed,
> and decouple kernel binds from unrelated copy/clear jobs—especially
> beneficial when faults are serviced in parallel.
Looking at single-fault cases, I still find this surprising. At least
in theory, it should be possible to proceed from the copy / clear to a
bind operation without performing a TLB flush or context switch on BMG,
and in the CPU bind case we also have the latency of waking the bind
CPU thread once the copy has finished.
> In a parallel-faulting
> test case, average bind time was reduced by approximately 15µs. In
> the
> worst case, 2MB copy time (~60–140µs) × (number of pagefault threads
> −
> 1) of latency would otherwise be added to a single fault. Reducing
> this
> latency increases overall throughput of the fault handler.
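To put the quoted worst-case formula in concrete terms, here is a rough
back-of-the-envelope sketch. It only evaluates copy time × (number of
pagefault threads − 1) using the ~60–140µs range from the cover letter.
The thread counts and the function name are hypothetical illustration
values, not measurements or code from the series.

```python
# Worst-case extra latency a single fault can see when its bind is
# queued behind other threads' 2MB copies, per the quoted formula:
#   copy_time * (num_pagefault_threads - 1)
# The thread counts below are hypothetical illustration values.

COPY_TIME_US = (60, 140)  # ~60-140us per 2MB copy, from the cover letter


def worst_case_added_latency_us(copy_time_us: int, num_threads: int) -> int:
    """Extra latency (in us) added to one fault in the worst case."""
    if num_threads < 1:
        raise ValueError("need at least one pagefault thread")
    return copy_time_us * (num_threads - 1)


for threads in (1, 2, 4, 8):
    low, high = (worst_case_added_latency_us(t, threads) for t in COPY_TIME_US)
    print(f"{threads} thread(s): {low}-{high} us added in the worst case")
```

With a single thread nothing is added. The penalty grows linearly with
the number of threads faulting in parallel, which is the case where
decoupling binds from copies would matter most.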
>
> This series can be merged in phases:
>
> Phase 1: CPU binds (patches 1–13)
> Phase 2: CPU-bind components and multi-tile relayers (patches 14–17)
> Phase 3: ULLS on the migration execution queue (patches 18–25)
In any case, I'll proceed with trying to review this. A possible future
direction, if we completely ditch the CPU binds, is to rework the
page-table locking (or alternatively craft a different xe_pt_pagefault
page-table implementation with CPU-page-table-like locking), since we
don't need to sync the PT access with the GPU.
/Thomas
>
> v2:
> - Use delayed worker to exit ULLS mode in an effort to save on power
> - Various other cleanups
> v3:
> - CPU bind component, multi-tile relayer
> - Split CPU bind patches in many small patches
>
> Matt
>
> [1] https://patchwork.freedesktop.org/series/149811/
>
> Matthew Brost (25):
> drm/xe: Drop struct xe_migrate_pt_update argument from
> populate/clear vfuns
> drm/xe: Add xe_migrate_update_pgtables_cpu_execute helper
> drm/xe: Decouple exec queue idle check from LRC
> drm/xe: Add job count to GuC exec queue snapshot
> drm/xe: Update xe_bo_put_deferred arguments to include writeback
> flag
> drm/xe: Add XE_BO_FLAG_PUT_VM_ASYNC
> drm/xe: Update scheduler job layer to support PT jobs
> drm/xe: Add helpers to access PT ops
> drm/xe: Add struct xe_pt_job_ops
> drm/xe: Update GuC submission backend to run PT jobs
> drm/xe: Store level in struct xe_vm_pgtable_update
> drm/xe: Don't use migrate exec queue for page fault binds
> drm/xe: Enable CPU binds for jobs
> drm/xe: Remove unused arguments from xe_migrate_pt_update_ops
> drm/xe: Make bind queues operate cross-tile
> drm/xe: Add CPU bind layer
> drm/xe: Add device flag to enable PT mirroring across tiles
> drm/xe: Add xe_hw_engine_write_ring_tail
> drm/xe: Add ULLS support to LRC
> drm/xe: Add ULLS migration job support to migration layer
> drm/xe: Add MI_SEMAPHORE_WAIT instruction defs
> drm/xe: Add ULLS migration job support to ring ops
> drm/xe: Add ULLS migration job support to GuC submission
> drm/xe: Enter ULLS for migration jobs upon page fault or SVM
> prefetch
> drm/xe: Add modparam to enable / disable ULLS on migrate queue
>
> drivers/gpu/drm/xe/Makefile | 1 +
> .../gpu/drm/xe/instructions/xe_mi_commands.h | 6 +
> drivers/gpu/drm/xe/xe_bo.c | 8 +-
> drivers/gpu/drm/xe/xe_bo.h | 11 +-
> drivers/gpu/drm/xe/xe_bo_types.h | 2 -
> drivers/gpu/drm/xe/xe_cpu_bind.c | 296 +++++++
> drivers/gpu/drm/xe/xe_cpu_bind.h | 118 +++
> drivers/gpu/drm/xe/xe_debugfs.c | 1 +
> drivers/gpu/drm/xe/xe_defaults.h | 1 +
> drivers/gpu/drm/xe/xe_device.c | 17 +-
> drivers/gpu/drm/xe/xe_device_types.h | 11 +
> drivers/gpu/drm/xe/xe_drm_client.c | 2 +-
> drivers/gpu/drm/xe/xe_exec_queue.c | 163 ++--
> drivers/gpu/drm/xe/xe_exec_queue.h | 18 +-
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 21 +-
> drivers/gpu/drm/xe/xe_guc_submit.c | 82 +-
> drivers/gpu/drm/xe/xe_guc_submit_types.h | 2 +
> drivers/gpu/drm/xe/xe_hw_engine.c | 10 +
> drivers/gpu/drm/xe/xe_hw_engine.h | 1 +
> drivers/gpu/drm/xe/xe_lrc.c | 51 ++
> drivers/gpu/drm/xe/xe_lrc.h | 3 +
> drivers/gpu/drm/xe/xe_lrc_types.h | 4 +
> drivers/gpu/drm/xe/xe_migrate.c | 585 +++++--------
> drivers/gpu/drm/xe/xe_migrate.h | 93 +--
> drivers/gpu/drm/xe/xe_module.c | 4 +
> drivers/gpu/drm/xe/xe_module.h | 1 +
> drivers/gpu/drm/xe/xe_pagefault.c | 3 +
> drivers/gpu/drm/xe/xe_pci.c | 2 +
> drivers/gpu/drm/xe/xe_pci_types.h | 1 +
> drivers/gpu/drm/xe/xe_pt.c | 773 +++++++++++-------
> drivers/gpu/drm/xe/xe_pt.h | 12 +-
> drivers/gpu/drm/xe/xe_pt_types.h | 49 +-
> drivers/gpu/drm/xe/xe_ring_ops.c | 31 +
> drivers/gpu/drm/xe/xe_sched_job.c | 100 ++-
> drivers/gpu/drm/xe/xe_sched_job_types.h | 36 +-
> drivers/gpu/drm/xe/xe_sync.c | 20 +-
> drivers/gpu/drm/xe/xe_tlb_inval_job.c | 28 +-
> drivers/gpu/drm/xe/xe_tlb_inval_job.h | 4 +-
> drivers/gpu/drm/xe/xe_vm.c | 241 +++---
> drivers/gpu/drm/xe/xe_vm.h | 3 +
> drivers/gpu/drm/xe/xe_vm_types.h | 22 +-
> 41 files changed, 1658 insertions(+), 1179 deletions(-)
> create mode 100644 drivers/gpu/drm/xe/xe_cpu_bind.c
> create mode 100644 drivers/gpu/drm/xe/xe_cpu_bind.h