From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Brost <matthew.brost@intel.com>, intel-xe@lists.freedesktop.org
Cc: stuart.summers@intel.com, arvind.yadav@intel.com,
	 himal.prasad.ghimiray@intel.com, francois.dugast@intel.com
Subject: Re: [PATCH v3 00/25] CPU binds and ULLS on migration queue
Date: Fri, 20 Mar 2026 16:31:25 +0100
Message-ID: <d2d91fd42b591054d7873ebd00ed2ba945470b3a.camel@linux.intel.com>
In-Reply-To: <20260228013501.106680-1-matthew.brost@intel.com>

On Fri, 2026-02-27 at 17:34 -0800, Matthew Brost wrote:
> We now have data demonstrating the need for CPU binds and ULLS on the
> migration queue, based on results generated from [1].
> 
> On BMG, measurements show that when the GPU is continuously processing
> faults, copy jobs run approximately 30–40µs faster (depending on the
> test case) with ULLS compared to traditional GuC submission with SLPC
> enabled on the migration queue. Startup from a cold GPU shows an even
> larger speedup. Given the critical nature of fault performance, ULLS
> appears to be a worthwhile feature.
> 
> In addition to driver telemetry, UMD compute benchmarks consistently
> show over 1GB/s improvement in pagefault benchmarks with ULLS
> enabled.
> 
> ULLS will consume more power (not yet measured) due to a continuously
> running batch on the paging engine. However, compute UMDs already do
> this on engines exposed to users, so this seems like a worthwhile
> tradeoff. To mitigate power concerns, ULLS will exit after a period of
> time in which no faults have been processed.

So the primary use case would be when one or more SVM VMAs are
present, right? Is a check for this included?

> 
> CPU binds are required for ULLS to function, as the migration queue
> needs exclusive access to the paging hardware engine. 

Could you remind me why exclusive access to the paging hardware engine
is necessary?

> Thus, CPU binds
> are included here.
> 
> Beyond being a requirement for ULLS, CPU binds should also reduce
> VM-bind latency, provide clearer multi-tile and TLB-invalidation
> layering, reduce pressure on GuC during fault storms as it is bypassed,
> and decouple kernel binds from unrelated copy/clear jobs—especially
> beneficial when faults are serviced in parallel.

Looking at single-fault cases, I still find this surprising, considering
it should, at least in theory, be possible to proceed from the copy /
clear to a bind operation without performing a TLB flush or context
switch on BMG, and in the CPU bind case we also have the latency of
waking the bind CPU thread once the copy has finished.

>  In a parallel-faulting
> test case, average bind time was reduced by approximately 15µs. In the
> worst case, 2MB copy time (~60–140µs) × (number of pagefault threads − 1)
> of latency would otherwise be added to a single fault. Reducing this
> latency increases overall throughput of the fault handler.
> 
> This series can be merged in phases:
> 
> Phase 1: CPU binds (patches 1–13)
> Phase 2: CPU-bind components and multi-tile relayers (patches 14–17)
> Phase 3: ULLS on the migration execution queue (patches 18–25)

In any case, I'll proceed with reviewing this. A possible future
direction, if we completely ditch the CPU binds, is to rework the
page-table locking, or alternatively to craft a different
xe_pt_pagefault page-table implementation with CPU-page-table-like
locking, since we don't need to sync the PT access with the GPU.

/Thomas


> 
> v2:
>  - Use delayed worker to exit ULLS mode in an effort to save on power
>  - Various other cleanups
> v3:
>  - CPU bind component, multi-tile relayer
>  - Split CPU bind patches in many small patches
> 
> Matt
> 
> [1] https://patchwork.freedesktop.org/series/149811/
> 
> Matthew Brost (25):
>   drm/xe: Drop struct xe_migrate_pt_update argument from populate/clear
>     vfuns
>   drm/xe: Add xe_migrate_update_pgtables_cpu_execute helper
>   drm/xe: Decouple exec queue idle check from LRC
>   drm/xe: Add job count to GuC exec queue snapshot
>   drm/xe: Update xe_bo_put_deferred arguments to include writeback flag
>   drm/xe: Add XE_BO_FLAG_PUT_VM_ASYNC
>   drm/xe: Update scheduler job layer to support PT jobs
>   drm/xe: Add helpers to access PT ops
>   drm/xe: Add struct xe_pt_job_ops
>   drm/xe: Update GuC submission backend to run PT jobs
>   drm/xe: Store level in struct xe_vm_pgtable_update
>   drm/xe: Don't use migrate exec queue for page fault binds
>   drm/xe: Enable CPU binds for jobs
>   drm/xe: Remove unused arguments from xe_migrate_pt_update_ops
>   drm/xe: Make bind queues operate cross-tile
>   drm/xe: Add CPU bind layer
>   drm/xe: Add device flag to enable PT mirroring across tiles
>   drm/xe: Add xe_hw_engine_write_ring_tail
>   drm/xe: Add ULLS support to LRC
>   drm/xe: Add ULLS migration job support to migration layer
>   drm/xe: Add MI_SEMAPHORE_WAIT instruction defs
>   drm/xe: Add ULLS migration job support to ring ops
>   drm/xe: Add ULLS migration job support to GuC submission
>   drm/xe: Enter ULLS for migration jobs upon page fault or SVM prefetch
>   drm/xe: Add modparam to enable / disable ULLS on migrate queue
> 
>  drivers/gpu/drm/xe/Makefile                   |   1 +
>  .../gpu/drm/xe/instructions/xe_mi_commands.h  |   6 +
>  drivers/gpu/drm/xe/xe_bo.c                    |   8 +-
>  drivers/gpu/drm/xe/xe_bo.h                    |  11 +-
>  drivers/gpu/drm/xe/xe_bo_types.h              |   2 -
>  drivers/gpu/drm/xe/xe_cpu_bind.c              | 296 +++++++
>  drivers/gpu/drm/xe/xe_cpu_bind.h              | 118 +++
>  drivers/gpu/drm/xe/xe_debugfs.c               |   1 +
>  drivers/gpu/drm/xe/xe_defaults.h              |   1 +
>  drivers/gpu/drm/xe/xe_device.c                |  17 +-
>  drivers/gpu/drm/xe/xe_device_types.h          |  11 +
>  drivers/gpu/drm/xe/xe_drm_client.c            |   2 +-
>  drivers/gpu/drm/xe/xe_exec_queue.c            | 163 ++--
>  drivers/gpu/drm/xe/xe_exec_queue.h            |  18 +-
>  drivers/gpu/drm/xe/xe_exec_queue_types.h      |  21 +-
>  drivers/gpu/drm/xe/xe_guc_submit.c            |  82 +-
>  drivers/gpu/drm/xe/xe_guc_submit_types.h      |   2 +
>  drivers/gpu/drm/xe/xe_hw_engine.c             |  10 +
>  drivers/gpu/drm/xe/xe_hw_engine.h             |   1 +
>  drivers/gpu/drm/xe/xe_lrc.c                   |  51 ++
>  drivers/gpu/drm/xe/xe_lrc.h                   |   3 +
>  drivers/gpu/drm/xe/xe_lrc_types.h             |   4 +
>  drivers/gpu/drm/xe/xe_migrate.c               | 585 +++++--------
>  drivers/gpu/drm/xe/xe_migrate.h               |  93 +--
>  drivers/gpu/drm/xe/xe_module.c                |   4 +
>  drivers/gpu/drm/xe/xe_module.h                |   1 +
>  drivers/gpu/drm/xe/xe_pagefault.c             |   3 +
>  drivers/gpu/drm/xe/xe_pci.c                   |   2 +
>  drivers/gpu/drm/xe/xe_pci_types.h             |   1 +
>  drivers/gpu/drm/xe/xe_pt.c                    | 773 +++++++++++-------
>  drivers/gpu/drm/xe/xe_pt.h                    |  12 +-
>  drivers/gpu/drm/xe/xe_pt_types.h              |  49 +-
>  drivers/gpu/drm/xe/xe_ring_ops.c              |  31 +
>  drivers/gpu/drm/xe/xe_sched_job.c             | 100 ++-
>  drivers/gpu/drm/xe/xe_sched_job_types.h       |  36 +-
>  drivers/gpu/drm/xe/xe_sync.c                  |  20 +-
>  drivers/gpu/drm/xe/xe_tlb_inval_job.c         |  28 +-
>  drivers/gpu/drm/xe/xe_tlb_inval_job.h         |   4 +-
>  drivers/gpu/drm/xe/xe_vm.c                    | 241 +++---
>  drivers/gpu/drm/xe/xe_vm.h                    |   3 +
>  drivers/gpu/drm/xe/xe_vm_types.h              |  22 +-
>  41 files changed, 1658 insertions(+), 1179 deletions(-)
>  create mode 100644 drivers/gpu/drm/xe/xe_cpu_bind.c
>  create mode 100644 drivers/gpu/drm/xe/xe_cpu_bind.h
