From: Matthew Brost <matthew.brost@intel.com>
To: intel-xe@lists.freedesktop.org
Cc: stuart.summers@intel.com, arvind.yadav@intel.com,
himal.prasad.ghimiray@intel.com,
thomas.hellstrom@linux.intel.com, francois.dugast@intel.com
Subject: [PATCH v3 13/25] drm/xe: Enable CPU binds for jobs
Date: Fri, 27 Feb 2026 17:34:49 -0800
Message-ID: <20260228013501.106680-14-matthew.brost@intel.com>
In-Reply-To: <20260228013501.106680-1-matthew.brost@intel.com>
There is no reason to use the GPU for binds.
Benefits of CPU-based binds:
- Lower latency once dependencies are resolved, as there is no
  interaction with the GuC or a hardware context switch, both of which
  are relatively slow.
- Large arrays of binds do not risk running out of migration PTEs,
avoiding -ENOBUFS being returned to userspace.
- Kernel binds are decoupled from the migration exec queue (which issues
  copies and clears), so they cannot get stuck behind unrelated jobs;
  this can be a problem with parallel GPU faults.
- Paves the way for decoupling binds from tiles and individual engines.
- Enables ULLS on the migration exec queue, as this queue has exclusive
access to the paging copy engine.
Update the migration layer to formulate a PT job which issues CPU binds
in the submission backend; a rough sketch of that CPU execution path
follows the diffstat below.
All code related to GPU-based binding has been removed.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_bo_types.h | 2 -
drivers/gpu/drm/xe/xe_migrate.c | 239 ++-----------------------------
drivers/gpu/drm/xe/xe_pt.c | 1 -
3 files changed, 14 insertions(+), 228 deletions(-)
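
For reviewers, a minimal sketch of what the CPU execution path in the
submission backend is expected to look like, assuming the populate/clear
vfunc signatures used by the removed GPU path below and the
job->pt_update[0] fields added in this patch. The helper name, the
containing struct name, and the num_ops/vmap members are approximations
for illustration, not the actual series code:

        /* Hypothetical helper: CPU-side execution of a PT job */
        static void pt_job_run_cpu_sketch(struct xe_sched_job *job)
        {
                /* struct and field names here are assumed */
                struct xe_sched_job_pt_update *pt = &job->pt_update[0];
                const struct xe_migrate_pt_update_ops *ops = pt->ops;
                u32 i, j;

                for (i = 0; i < pt->pt_job_ops->num_ops; ++i) {
                        struct xe_vm_pgtable_update_op *pt_op =
                                &pt->pt_job_ops->ops[i];

                        for (j = 0; j < pt_op->num_entries; ++j) {
                                struct xe_vm_pgtable_update *update =
                                        &pt_op->entries[j];

                                /*
                                 * Write PTEs through a CPU mapping of the
                                 * PT BO instead of emitting
                                 * MI_STORE_DATA_IMM; these are the same
                                 * populate/clear vfuncs the removed GPU
                                 * path handed batch-buffer pointers to.
                                 */
                                if (pt_op->bind)
                                        ops->populate(pt->tile,
                                                      &update->pt_bo->vmap,
                                                      NULL, update->ofs,
                                                      update->qwords, update);
                                else
                                        ops->clear(pt->vm, pt->tile,
                                                   &update->pt_bo->vmap,
                                                   NULL, update->ofs,
                                                   update->qwords, update);
                        }
                }
        }

The upshot is that once a PT job's dependencies signal, the PTE writes
happen immediately on the CPU, with no GuC round trip or hardware
context switch in the path.
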
diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
index d4fe3c8dca5b..bcbd23c7d2ed 100644
--- a/drivers/gpu/drm/xe/xe_bo_types.h
+++ b/drivers/gpu/drm/xe/xe_bo_types.h
@@ -79,8 +79,6 @@ struct xe_bo {
/** @freed: List node for delayed put. */
struct llist_node freed;
- /** @update_index: Update index if PT BO */
- int update_index;
/** @created: Whether the bo has passed initial creation */
bool created;
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 547affe55361..00288a2ead00 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -75,18 +75,12 @@ struct xe_migrate {
* Protected by @job_mutex.
*/
struct dma_fence *fence;
- /**
- * @vm_update_sa: For integrated, used to suballocate page-tables
- * out of the pt_bo.
- */
- struct drm_suballoc_manager vm_update_sa;
/** @min_chunk_size: For dgfx, Minimum chunk size */
u64 min_chunk_size;
};
#define MAX_PREEMPTDISABLE_TRANSFER SZ_8M /* Around 1ms. */
#define MAX_CCS_LIMITED_TRANSFER SZ_4M /* XE_PAGE_SIZE * (FIELD_MAX(XE2_CCS_SIZE_MASK) + 1) */
-#define NUM_KERNEL_PDE 15
#define NUM_PT_SLOTS 32
#define LEVEL0_PAGE_TABLE_ENCODE_SIZE SZ_2M
#define MAX_NUM_PTE 512
@@ -111,7 +105,6 @@ static void xe_migrate_fini(void *arg)
dma_fence_put(m->fence);
xe_bo_put(m->pt_bo);
- drm_suballoc_manager_fini(&m->vm_update_sa);
mutex_destroy(&m->job_mutex);
xe_vm_close_and_put(m->q->vm);
xe_exec_queue_put(m->q);
@@ -205,8 +198,6 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
BUILD_BUG_ON(NUM_PT_SLOTS > SZ_2M/XE_PAGE_SIZE);
/* Must be a multiple of 64K to support all platforms */
BUILD_BUG_ON(NUM_PT_SLOTS * XE_PAGE_SIZE % SZ_64K);
- /* And one slot reserved for the 4KiB page table updates */
- BUILD_BUG_ON(!(NUM_KERNEL_PDE & 1));
/* Need to be sure everything fits in the first PT, or create more */
xe_tile_assert(tile, m->batch_base_ofs + xe_bo_size(batch) < SZ_2M);
@@ -344,8 +335,6 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
/*
* Example layout created above, with root level = 3:
* [PT0...PT7]: kernel PT's for copy/clear; 64 or 4KiB PTE's
- * [PT8]: Kernel PT for VM_BIND, 4 KiB PTE's
- * [PT9...PT26]: Userspace PT's for VM_BIND, 4 KiB PTE's
* [PT27 = PDE 0] [PT28 = PDE 1] [PT29 = PDE 2] [PT30 & PT31 = 2M vram identity map]
*
* This makes the lowest part of the VM point to the pagetables.
@@ -353,19 +342,10 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
* and flushes, other parts of the VM can be used either for copying and
* clearing.
*
- * For performance, the kernel reserves PDE's, so about 20 are left
- * for async VM updates.
- *
* To make it easier to work, each scratch PT is put in slot (1 + PT #)
* everywhere, this allows lockless updates to scratch pages by using
* the different addresses in VM.
*/
-#define NUM_VMUSA_UNIT_PER_PAGE 32
-#define VM_SA_UPDATE_UNIT_SIZE (XE_PAGE_SIZE / NUM_VMUSA_UNIT_PER_PAGE)
-#define NUM_VMUSA_WRITES_PER_UNIT (VM_SA_UPDATE_UNIT_SIZE / sizeof(u64))
- drm_suballoc_manager_init(&m->vm_update_sa,
- (size_t)(map_ofs / XE_PAGE_SIZE - NUM_KERNEL_PDE) *
- NUM_VMUSA_UNIT_PER_PAGE, 0);
m->pt_bo = bo;
return 0;
@@ -1078,6 +1058,9 @@ struct xe_lrc *xe_migrate_lrc(struct xe_migrate *migrate)
return migrate->q->lrc[0];
}
+/* XXX: With CPU binds this can be removed in a follow up */
+#define NUM_KERNEL_PDE 15
+
static u64 migrate_vm_ppgtt_addr_tlb_inval(void)
{
/*
@@ -1686,56 +1669,6 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
return fence;
}
-static void write_pgtable(struct xe_tile *tile, struct xe_bb *bb, u64 ppgtt_ofs,
- const struct xe_vm_pgtable_update_op *pt_op,
- const struct xe_vm_pgtable_update *update,
- struct xe_migrate_pt_update *pt_update)
-{
- const struct xe_migrate_pt_update_ops *ops = pt_update->ops;
- struct xe_vm *vm = pt_update->vops->vm;
- u32 chunk;
- u32 ofs = update->ofs, size = update->qwords;
-
- /*
- * If we have 512 entries (max), we would populate it ourselves,
- * and update the PDE above it to the new pointer.
- * The only time this can only happen if we have to update the top
- * PDE. This requires a BO that is almost vm->size big.
- *
- * This shouldn't be possible in practice.. might change when 16K
- * pages are used. Hence the assert.
- */
- xe_tile_assert(tile, update->qwords < MAX_NUM_PTE);
- if (!ppgtt_ofs)
- ppgtt_ofs = xe_migrate_vram_ofs(tile_to_xe(tile),
- xe_bo_addr(update->pt_bo, 0,
- XE_PAGE_SIZE), false);
-
- do {
- u64 addr = ppgtt_ofs + ofs * 8;
-
- chunk = min(size, MAX_PTE_PER_SDI);
-
- /* Ensure populatefn can do memset64 by aligning bb->cs */
- if (!(bb->len & 1))
- bb->cs[bb->len++] = MI_NOOP;
-
- bb->cs[bb->len++] = MI_STORE_DATA_IMM | MI_SDI_NUM_QW(chunk);
- bb->cs[bb->len++] = lower_32_bits(addr);
- bb->cs[bb->len++] = upper_32_bits(addr);
- if (pt_op->bind)
- ops->populate(tile, NULL, bb->cs + bb->len,
- ofs, chunk, update);
- else
- ops->clear(vm, tile, NULL, bb->cs + bb->len,
- ofs, chunk, update);
-
- bb->len += chunk * 2;
- ofs += chunk;
- size -= chunk;
- } while (size);
-}
-
struct xe_vm *xe_migrate_get_vm(struct xe_migrate *m)
{
return xe_vm_get(m->q->vm);
@@ -1836,162 +1769,18 @@ __xe_migrate_update_pgtables(struct xe_migrate *m,
{
const struct xe_migrate_pt_update_ops *ops = pt_update->ops;
struct xe_tile *tile = m->tile;
- struct xe_gt *gt = tile->primary_gt;
- struct xe_device *xe = tile_to_xe(tile);
struct xe_sched_job *job;
struct dma_fence *fence;
- struct drm_suballoc *sa_bo = NULL;
- struct xe_bb *bb;
- u32 i, j, batch_size = 0, ppgtt_ofs, update_idx, page_ofs = 0;
- u32 num_updates = 0, current_update = 0;
- u64 addr;
- int err = 0;
bool is_migrate = is_migrate_queue(m, pt_update_ops->q);
- bool usm = is_migrate && xe->info.has_usm;
-
- for (i = 0; i < pt_update_ops->num_ops; ++i) {
- struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->pt_job_ops->ops[i];
- struct xe_vm_pgtable_update *updates = pt_op->entries;
-
- num_updates += pt_op->num_entries;
- for (j = 0; j < pt_op->num_entries; ++j) {
- u32 num_cmds = DIV_ROUND_UP(updates[j].qwords,
- MAX_PTE_PER_SDI);
-
- /* align noop + MI_STORE_DATA_IMM cmd prefix */
- batch_size += 4 * num_cmds + updates[j].qwords * 2;
- }
- }
-
- /* fixed + PTE entries */
- if (IS_DGFX(xe))
- batch_size += 2;
- else
- batch_size += 6 * (num_updates / MAX_PTE_PER_SDI + 1) +
- num_updates * 2;
-
- bb = xe_bb_new(gt, batch_size, usm);
- if (IS_ERR(bb))
- return ERR_CAST(bb);
-
- /* For sysmem PTE's, need to map them in our hole.. */
- if (!IS_DGFX(xe)) {
- u16 pat_index = xe->pat.idx[XE_CACHE_WB];
- u32 ptes, ofs;
-
- ppgtt_ofs = NUM_KERNEL_PDE - 1;
- if (!is_migrate) {
- u32 num_units = DIV_ROUND_UP(num_updates,
- NUM_VMUSA_WRITES_PER_UNIT);
-
- if (num_units > m->vm_update_sa.size) {
- err = -ENOBUFS;
- goto err_bb;
- }
- sa_bo = drm_suballoc_new(&m->vm_update_sa, num_units,
- GFP_KERNEL, true, 0);
- if (IS_ERR(sa_bo)) {
- err = PTR_ERR(sa_bo);
- goto err_bb;
- }
-
- ppgtt_ofs = NUM_KERNEL_PDE +
- (drm_suballoc_soffset(sa_bo) /
- NUM_VMUSA_UNIT_PER_PAGE);
- page_ofs = (drm_suballoc_soffset(sa_bo) %
- NUM_VMUSA_UNIT_PER_PAGE) *
- VM_SA_UPDATE_UNIT_SIZE;
- }
-
- /* Map our PT's to gtt */
- i = 0;
- j = 0;
- ptes = num_updates;
- ofs = ppgtt_ofs * XE_PAGE_SIZE + page_ofs;
- while (ptes) {
- u32 chunk = min(MAX_PTE_PER_SDI, ptes);
- u32 idx = 0;
-
- bb->cs[bb->len++] = MI_STORE_DATA_IMM |
- MI_SDI_NUM_QW(chunk);
- bb->cs[bb->len++] = ofs;
- bb->cs[bb->len++] = 0; /* upper_32_bits */
-
- for (; i < pt_update_ops->num_ops; ++i) {
- struct xe_vm_pgtable_update_op *pt_op =
- &pt_update_ops->pt_job_ops->ops[i];
- struct xe_vm_pgtable_update *updates = pt_op->entries;
-
- for (; j < pt_op->num_entries; ++j, ++current_update, ++idx) {
- struct xe_vm *vm = pt_update->vops->vm;
- struct xe_bo *pt_bo = updates[j].pt_bo;
-
- if (idx == chunk)
- goto next_cmd;
-
- xe_tile_assert(tile, xe_bo_size(pt_bo) == SZ_4K);
-
- /* Map a PT at most once */
- if (pt_bo->update_index < 0)
- pt_bo->update_index = current_update;
-
- addr = vm->pt_ops->pte_encode_bo(pt_bo, 0,
- pat_index, 0);
- bb->cs[bb->len++] = lower_32_bits(addr);
- bb->cs[bb->len++] = upper_32_bits(addr);
- }
-
- j = 0;
- }
-
-next_cmd:
- ptes -= chunk;
- ofs += chunk * sizeof(u64);
- }
-
- bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
- update_idx = bb->len;
-
- addr = xe_migrate_vm_addr(ppgtt_ofs, 0) +
- (page_ofs / sizeof(u64)) * XE_PAGE_SIZE;
- for (i = 0; i < pt_update_ops->num_ops; ++i) {
- struct xe_vm_pgtable_update_op *pt_op =
- &pt_update_ops->pt_job_ops->ops[i];
- struct xe_vm_pgtable_update *updates = pt_op->entries;
-
- for (j = 0; j < pt_op->num_entries; ++j) {
- struct xe_bo *pt_bo = updates[j].pt_bo;
-
- write_pgtable(tile, bb, addr +
- pt_bo->update_index * XE_PAGE_SIZE,
- pt_op, &updates[j], pt_update);
- }
- }
- } else {
- /* phys pages, no preamble required */
- bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
- update_idx = bb->len;
-
- for (i = 0; i < pt_update_ops->num_ops; ++i) {
- struct xe_vm_pgtable_update_op *pt_op =
- &pt_update_ops->pt_job_ops->ops[i];
- struct xe_vm_pgtable_update *updates = pt_op->entries;
-
- for (j = 0; j < pt_op->num_entries; ++j)
- write_pgtable(tile, bb, 0, pt_op, &updates[j],
- pt_update);
- }
- }
+ int err;
- job = xe_bb_create_migration_job(pt_update_ops->q, bb,
- xe_migrate_batch_base(m, usm),
- update_idx);
+ job = xe_sched_job_create(pt_update_ops->q, NULL);
if (IS_ERR(job)) {
err = PTR_ERR(job);
- goto err_sa;
+ goto err_out;
}
- xe_sched_job_add_migrate_flush(job, MI_INVALIDATE_TLB);
+ xe_tile_assert(tile, job->is_pt_job);
if (ops->pre_commit) {
pt_update->job = job;
@@ -2002,6 +1791,12 @@ __xe_migrate_update_pgtables(struct xe_migrate *m,
if (is_migrate)
mutex_lock(&m->job_mutex);
+ job->pt_update[0].vm = pt_update->vops->vm;
+ job->pt_update[0].tile = tile;
+ job->pt_update[0].ops = ops;
+ job->pt_update[0].pt_job_ops =
+ xe_pt_job_ops_get(pt_update_ops->pt_job_ops);
+
xe_sched_job_arm(job);
fence = dma_fence_get(&job->drm.s_fence->finished);
xe_sched_job_push(job);
@@ -2009,17 +1804,11 @@ __xe_migrate_update_pgtables(struct xe_migrate *m,
if (is_migrate)
mutex_unlock(&m->job_mutex);
- xe_bb_free(bb, fence);
- drm_suballoc_free(sa_bo, fence);
-
return fence;
err_job:
xe_sched_job_put(job);
-err_sa:
- drm_suballoc_free(sa_bo, NULL);
-err_bb:
- xe_bb_free(bb, NULL);
+err_out:
return ERR_PTR(err);
}
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 0a90d1460a8b..dc567e442db2 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -380,7 +380,6 @@ xe_pt_new_shared(struct xe_walk_update *wupd, struct xe_pt *parent,
entry->pt = parent;
entry->flags = 0;
entry->qwords = 0;
- entry->pt_bo->update_index = -1;
entry->level = parent->level;
if (alloc_entries) {
--
2.34.1