public inbox for intel-xe@lists.freedesktop.org
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Brost <matthew.brost@intel.com>, intel-xe@lists.freedesktop.org
Cc: stuart.summers@intel.com, arvind.yadav@intel.com,
	 himal.prasad.ghimiray@intel.com, francois.dugast@intel.com
Subject: Re: [PATCH v3 00/25] CPU binds and ULLS on migration queue
Date: Fri, 20 Mar 2026 16:31:25 +0100
Message-ID: <d2d91fd42b591054d7873ebd00ed2ba945470b3a.camel@linux.intel.com>
In-Reply-To: <20260228013501.106680-1-matthew.brost@intel.com>

On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> We now have data demonstrating the need for CPU binds and ULLS on the
> migration queue, based on results generated from [1].
> 
> On BMG, measurements show that when the GPU is continuously processing
> faults, copy jobs run approximately 30–40µs faster (depending on the
> test case) with ULLS compared to traditional GuC submission with SLPC
> enabled on the migration queue. Startup from a cold GPU shows an even
> larger speedup. Given the critical nature of fault performance, ULLS
> appears to be a worthwhile feature.
> 
> In addition to driver telemetry, UMD compute benchmarks consistently
> show over 1GB/s improvement in pagefault benchmarks with ULLS
> enabled.
> 
> ULLS will consume more power (not yet measured) due to a continuously
> running batch on the paging engine. However, compute UMDs already do
> this on engines exposed to users, so this seems like a worthwhile
> tradeoff. To mitigate power concerns, ULLS will exit after a period of
> time in which no faults have been processed.

So the primary use case would be when one or more SVM VMAs are
present, right? Is a check for this included?

> 
> CPU binds are required for ULLS to function, as the migration queue
> needs exclusive access to the paging hardware engine. 

Could you remind me why exclusive access to the paging hardware engine
is necessary?

> Thus, CPU binds
> are included here.
> 
> Beyond being a requirement for ULLS, CPU binds should also reduce
> VM-bind latency, provide clearer multi-tile and TLB-invalidation
> layering, reduce pressure on GuC during fault storms as it is bypassed,
> and decouple kernel binds from unrelated copy/clear jobs, which is
> especially beneficial when faults are serviced in parallel.

Looking at single-fault cases, I still find this surprising, considering
it should, at least in theory, be possible to proceed from the copy /
clear to a bind operation without performing a TLB flush or context
switch on BMG, and in the CPU bind case we also have the latency of
waking the bind CPU thread once the copy has finished.

>  In a parallel-faulting
> test case, average bind time was reduced by approximately 15µs. In the
> worst case, 2MB copy time (~60–140µs) × (number of pagefault threads − 1)
> of latency would otherwise be added to a single fault. Reducing this
> latency increases overall throughput of the fault handler.
> 
> This series can be merged in phases:
> 
> Phase 1: CPU binds (patches 1–13)
> Phase 2: CPU-bind components and multi-tile relayers (patches 14–17)
> Phase 3: ULLS on the migration execution queue (patches 18–25)

In any case, I'll proceed with reviewing this. A possible future
direction, if we completely ditch the CPU binds, is to rework the
page-table locking, or alternatively to craft a different
xe_pt_pagefault page-table implementation with CPU-page-table-like
locking, since we don't need to sync the PT access with the GPU.

/Thomas


> 
> v2:
>  - Use delayed worker to exit ULLS mode in an effort to save on power
>  - Various other cleanups
> v3:
>  - CPU bind component, multi-tile relayer
>  - Split CPU bind patches in many small patches
> 
> Matt
> 
> [1] https://patchwork.freedesktop.org/series/149811/
> 
> Matthew Brost (25):
>   drm/xe: Drop struct xe_migrate_pt_update argument from populate/clear
>     vfuns
>   drm/xe: Add xe_migrate_update_pgtables_cpu_execute helper
>   drm/xe: Decouple exec queue idle check from LRC
>   drm/xe: Add job count to GuC exec queue snapshot
>   drm/xe: Update xe_bo_put_deferred arguments to include writeback flag
>   drm/xe: Add XE_BO_FLAG_PUT_VM_ASYNC
>   drm/xe: Update scheduler job layer to support PT jobs
>   drm/xe: Add helpers to access PT ops
>   drm/xe: Add struct xe_pt_job_ops
>   drm/xe: Update GuC submission backend to run PT jobs
>   drm/xe: Store level in struct xe_vm_pgtable_update
>   drm/xe: Don't use migrate exec queue for page fault binds
>   drm/xe: Enable CPU binds for jobs
>   drm/xe: Remove unused arguments from xe_migrate_pt_update_ops
>   drm/xe: Make bind queues operate cross-tile
>   drm/xe: Add CPU bind layer
>   drm/xe: Add device flag to enable PT mirroring across tiles
>   drm/xe: Add xe_hw_engine_write_ring_tail
>   drm/xe: Add ULLS support to LRC
>   drm/xe: Add ULLS migration job support to migration layer
>   drm/xe: Add MI_SEMAPHORE_WAIT instruction defs
>   drm/xe: Add ULLS migration job support to ring ops
>   drm/xe: Add ULLS migration job support to GuC submission
>   drm/xe: Enter ULLS for migration jobs upon page fault or SVM prefetch
>   drm/xe: Add modparam to enable / disable ULLS on migrate queue
> 
>  drivers/gpu/drm/xe/Makefile                   |   1 +
>  .../gpu/drm/xe/instructions/xe_mi_commands.h  |   6 +
>  drivers/gpu/drm/xe/xe_bo.c                    |   8 +-
>  drivers/gpu/drm/xe/xe_bo.h                    |  11 +-
>  drivers/gpu/drm/xe/xe_bo_types.h              |   2 -
>  drivers/gpu/drm/xe/xe_cpu_bind.c              | 296 +++++++
>  drivers/gpu/drm/xe/xe_cpu_bind.h              | 118 +++
>  drivers/gpu/drm/xe/xe_debugfs.c               |   1 +
>  drivers/gpu/drm/xe/xe_defaults.h              |   1 +
>  drivers/gpu/drm/xe/xe_device.c                |  17 +-
>  drivers/gpu/drm/xe/xe_device_types.h          |  11 +
>  drivers/gpu/drm/xe/xe_drm_client.c            |   2 +-
>  drivers/gpu/drm/xe/xe_exec_queue.c            | 163 ++--
>  drivers/gpu/drm/xe/xe_exec_queue.h            |  18 +-
>  drivers/gpu/drm/xe/xe_exec_queue_types.h      |  21 +-
>  drivers/gpu/drm/xe/xe_guc_submit.c            |  82 +-
>  drivers/gpu/drm/xe/xe_guc_submit_types.h      |   2 +
>  drivers/gpu/drm/xe/xe_hw_engine.c             |  10 +
>  drivers/gpu/drm/xe/xe_hw_engine.h             |   1 +
>  drivers/gpu/drm/xe/xe_lrc.c                   |  51 ++
>  drivers/gpu/drm/xe/xe_lrc.h                   |   3 +
>  drivers/gpu/drm/xe/xe_lrc_types.h             |   4 +
>  drivers/gpu/drm/xe/xe_migrate.c               | 585 +++++--------
>  drivers/gpu/drm/xe/xe_migrate.h               |  93 +--
>  drivers/gpu/drm/xe/xe_module.c                |   4 +
>  drivers/gpu/drm/xe/xe_module.h                |   1 +
>  drivers/gpu/drm/xe/xe_pagefault.c             |   3 +
>  drivers/gpu/drm/xe/xe_pci.c                   |   2 +
>  drivers/gpu/drm/xe/xe_pci_types.h             |   1 +
>  drivers/gpu/drm/xe/xe_pt.c                    | 773 +++++++++++-------
>  drivers/gpu/drm/xe/xe_pt.h                    |  12 +-
>  drivers/gpu/drm/xe/xe_pt_types.h              |  49 +-
>  drivers/gpu/drm/xe/xe_ring_ops.c              |  31 +
>  drivers/gpu/drm/xe/xe_sched_job.c             | 100 ++-
>  drivers/gpu/drm/xe/xe_sched_job_types.h       |  36 +-
>  drivers/gpu/drm/xe/xe_sync.c                  |  20 +-
>  drivers/gpu/drm/xe/xe_tlb_inval_job.c         |  28 +-
>  drivers/gpu/drm/xe/xe_tlb_inval_job.h         |   4 +-
>  drivers/gpu/drm/xe/xe_vm.c                    | 241 +++---
>  drivers/gpu/drm/xe/xe_vm.h                    |   3 +
>  drivers/gpu/drm/xe/xe_vm_types.h              |  22 +-
>  41 files changed, 1658 insertions(+), 1179 deletions(-)
>  create mode 100644 drivers/gpu/drm/xe/xe_cpu_bind.c
>  create mode 100644 drivers/gpu/drm/xe/xe_cpu_bind.h
