* [PATCH v4 1/2] drm/ttm: Drop tt->restore after successful restore
[not found] <20260505033013.3266938-1-matthew.brost@intel.com>
@ 2026-05-05 3:30 ` Matthew Brost
2026-05-05 7:04 ` Thomas Hellström
2026-05-05 3:30 ` [PATCH v4 2/2] drm/ttm/pool: back up at native page order Matthew Brost
1 sibling, 1 reply; 6+ messages in thread
From: Matthew Brost @ 2026-05-05 3:30 UTC (permalink / raw)
To: intel-xe, dri-devel
Cc: Thomas Hellström, Christian Koenig, Huang Rui, Matthew Auld,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, linux-kernel, stable
ttm_pool_restore_and_alloc() can successfully complete the restore
process via ttm_pool_restore_commit(), but tt->restore is not dropped
afterward. As a result, subsequent backup/restore flows observe what
appears to be a completed restore, while in reality shmem handles are
still installed in tt->pages, leading to the stack trace below.
Fix this by freeing and dropping tt->restore in
ttm_pool_restore_and_alloc() upon successful completion of the restore.
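The failure mode is easier to see with ttm_backup's handle encoding in
mind: a tt->pages[] slot holds either a real struct page pointer or a
shmem index encoded with bit 0 set as a tag. The model below (plain
user-space C; the encoding is an assumption modeled on ttm_backup, not
a copy of it) shows how a stale, never-converted slot looks to a
consumer that expects only page pointers, which is what the sg-table
path in the trace below chokes on:

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	/*
	 * Assumed model of ttm_backup's tagged-pointer handle encoding:
	 * real page pointers are even, so bit 0 can mark a shmem handle.
	 */
	static void *handle_to_page_ptr(unsigned long idx)
	{
		return (void *)((idx << 1) | 1);
	}

	static bool page_ptr_is_handle(const void *p)
	{
		return (uintptr_t)p & 1;
	}

	int main(void)
	{
		void *slot = handle_to_page_ptr(42);

		/*
		 * With tt->restore stale, nothing swaps this handle back
		 * for a real page, so a tag-unaware consumer dereferences
		 * an odd, bogus pointer.
		 */
		printf("slot=%p is_handle=%d\n", slot, page_ptr_is_handle(slot));
		return 0;
	}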
[ 309.784531] RIP: 0010:sg_alloc_append_table_from_pages+0x38c/0x490
[ 309.809570] RSP: 0018:ffffc9000623b838 EFLAGS: 00010206
[ 309.814827] RAX: 0000000000001000 RBX: ffff88816e42a160 RCX: 0000000000000000
[ 309.821986] RDX: 0000000000002000 RSI: 0000000000000003 RDI: 0000000000001000
[ 309.829147] RBP: ffff88816e42a168 R08: 0000000000000002 R09: 000000007ffff000
[ 309.836310] R10: ffffc9000623b928 R11: 0000000000000000 R12: 000000007ffff000
[ 309.843471] R13: ffff88815ba5a100 R14: 0000000000000000 R15: 0000000000000001
[ 309.850634] FS: 00007f9ff305e700(0000) GS:ffff888276c94000(0000) knlGS:0000000000000000
[ 309.858749] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 309.864519] CR2: 00007f9fca701000 CR3: 00000001565e2005 CR4: 0000000008f70ef0
[ 309.871678] PKRU: 55555558
[ 309.874403] Call Trace:
[ 309.876866] <TASK>
[ 309.878988] sg_alloc_table_from_pages_segment+0x60/0x100
[ 309.884415] ? ttm_resource_manager_usage+0x36/0x60 [ttm]
[ 309.889845] ? xe_tt_map_sg+0x7d/0xd0 [xe]
[ 309.894045] xe_tt_map_sg+0x7d/0xd0 [xe]
[ 309.898037] xe_bo_move+0x927/0xaa0 [xe]
[ 309.902029] ttm_bo_handle_move_mem+0xba/0x170 [ttm]
[ 309.907022] ttm_bo_validate+0xbe/0x190 [ttm]
[ 309.911405] xe_bo_validate+0x9a/0x120 [xe]
[ 309.915663] xe_gpuvm_validate+0xd9/0x140 [xe]
[ 309.920206] drm_gpuvm_validate+0x2f0/0x5b0 [drm_gpuvm]
[ 309.925459] ? drm_exec_lock_obj+0x63/0x210 [drm_exec]
[ 309.930627] xe_vm_validate_rebind+0x46/0xb0 [xe]
[ 309.935428] xe_exec_fn+0x20/0x40 [xe]
[ 309.939249] drm_gpuvm_exec_lock+0x78/0xc0 [drm_gpuvm]
[ 309.944410] xe_validation_exec_lock+0x5a/0xa0 [xe]
[ 309.949385] xe_exec_ioctl+0x806/0xc30 [xe]
[ 309.953639] ? ttwu_queue_wakelist+0xd9/0xf0
[ 309.957935] ? __pfx_xe_exec_fn+0x10/0x10 [xe]
[ 309.962449] ? __wake_up_common+0x73/0xa0
[ 309.966482] ? __pfx_xe_exec_ioctl+0x10/0x10 [xe]
[ 309.971263] drm_ioctl_kernel+0xa3/0x100
[ 309.975209] drm_ioctl+0x213/0x440
[ 309.978637] ? __pfx_xe_exec_ioctl+0x10/0x10 [xe]
[ 309.983415] xe_drm_ioctl+0x67/0xd0 [xe]
[ 309.987408] __x64_sys_ioctl+0x7f/0xd0
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: b63d715b8090 ("drm/ttm/pool, drm/ttm/tt: Provide a helper to shrink pages")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
v3:
- Call ttm_pool_apply_caching after freeing local restore (sashiko)
- Save alloc in snapshot on restore failure (sashiko)
v4:
- Actually save alloc in snapshot on restore failure (sashiko)
---
drivers/gpu/drm/ttm/ttm_pool.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 278bbe7a11ad..c7aab60b7f01 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -902,6 +902,7 @@ int ttm_pool_restore_and_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 {
 	struct ttm_pool_tt_restore *restore = tt->restore;
 	struct ttm_pool_alloc_state alloc;
+	int ret;
 
 	if (WARN_ON(!ttm_tt_is_backed_up(tt)))
 		return -EINVAL;
@@ -925,14 +926,24 @@ int ttm_pool_restore_and_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 	} else {
 		alloc = restore->snapshot_alloc;
 		if (ttm_pool_restore_valid(restore)) {
-			int ret = ttm_pool_restore_commit(restore, tt->backup,
-							  ctx, &alloc);
+			ret = ttm_pool_restore_commit(restore, tt->backup,
+						      ctx, &alloc);
 
-			if (ret)
+			if (ret) {
+				restore->snapshot_alloc = alloc;
 				return ret;
+			}
 		}
 
-		if (!alloc.remaining_pages)
+		if (!alloc.remaining_pages) {
+			kfree(tt->restore);
+			tt->restore = NULL;
+
+			ret = ttm_pool_apply_caching(&alloc);
+			if (ret)
+				return ret;
+
 			return 0;
+		}
 	}
 
 	return __ttm_pool_alloc(pool, tt, ctx, &alloc, restore);
--
2.34.1
* [PATCH v4 2/2] drm/ttm/pool: back up at native page order
[not found] <20260505033013.3266938-1-matthew.brost@intel.com>
2026-05-05 3:30 ` [PATCH v4 1/2] drm/ttm: Drop tt->restore after successful restore Matthew Brost
@ 2026-05-05 3:30 ` Matthew Brost
2026-05-05 9:02 ` Thomas Hellström
1 sibling, 1 reply; 6+ messages in thread
From: Matthew Brost @ 2026-05-05 3:30 UTC (permalink / raw)
To: intel-xe, dri-devel
Cc: Christian Koenig, Huang Rui, Matthew Auld, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
linux-kernel, stable, Thomas Hellström
ttm_pool_split_for_swap() splits high-order pool pages into order-0
pages during backup so each 4K page can be released to the system as
soon as it has been written to shmem. While this minimizes the
allocator's working set during reclaim, it actively fragments memory:
every TTM-backed compound page that the shrinker touches is shattered
into order-0 pages, even when the rest of the system would prefer that
the high-order block stay intact. Under sustained kswapd pressure this
is enough to drive other parts of MM into recovery loops from which
they cannot easily escape, because the memory TTM just freed is no
longer contiguous.
Stop unconditionally splitting on the backup path and back up each
compound at its native order in ttm_pool_backup():
- For each non-handle slot, read the order from the head page and
back up all 1<<order subpages to consecutive shmem indices,
writing the resulting handles into tt->pages[] as we go.
- On success, the compound is freed once at its native order. No
split_page(), no per-4K refcount juggling, no fragmentation
introduced from this path.
- Slots that already hold a backup handle from a previous partial
attempt are skipped. A compound that would extend past a
fault-injection-truncated num_pages is skipped rather than split.
A per-subpage backup failure cannot be made fully atomic: backing up a
subpage allocates a shmem folio before the source page can be released,
so under true OOM any subpage in a compound (not just the first) may
fail to be backed up with the rest of the source compound still live
and contiguous. To make forward progress in that case, fall back to
splitting the source compound and backing up its remaining subpages
individually:
- On the first per-subpage failure for a compound (and only if
order > 0), call ttm_pool_split_for_swap() to split the source
compound, release the subpages whose contents already live in
shmem (their handles in tt->pages stay valid), and retry the
failing subpage at order 0.
- Subsequent successful subpage backups in the now-split compound
free their source page individually as soon as the handle is
written.
- A second failure after splitting terminates the loop with partial
progress; the remaining order-0 subpages stay in tt->pages as
plain page pointers and are cleaned up by the normal
ttm_pool_drop_backed_up() / ttm_pool_free_range() paths.
This restores the original split-on-OOM fallback behavior while
keeping the common, non-OOM case fragmentation-free. It also
preserves the "partial backup is allowed" contract: shrunken is
incremented per backed-up subpage so the caller still sees forward
progress when a compound only partially succeeds.
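In outline, the new loop looks like the condensed sketch below (error
bookkeeping, fault injection, and handle-slot skipping elided; the diff
further down is authoritative):

	for (i = 0; i < num_pages; i += npages) {
		struct page *page = tt->pages[i];
		unsigned int order = ttm_pool_page_order(pool, page);
		bool split = false;
		pgoff_t j;

		npages = 1UL << order;
		for (j = 0; j < npages; ++j) {
			s64 shandle = ttm_backup_backup_page(backup, page + j,
							     flags->writeback,
							     i + j, gfp, alloc_gfp);
			if (shandle < 0) {
				if (split || !order)
					break;	/* partial progress is fine */
				/* OOM fallback: split once, free the j subpages
				 * already in shmem, retry this subpage at 4K. */
				ttm_pool_split_for_swap(pool, page);
				split = true;
				--j;
				continue;
			}
			tt->pages[i + j] = ttm_backup_handle_to_page_ptr(shandle);
		}
		if (!split)
			/* Common case: one free at native order, no split. */
			__free_pages_gpu_account(page, order, false);
	}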
The restore-side leftover-page branch in ttm_pool_restore_commit() is
left as-is for now: that path can still split a previously-retained
compound, but it appears unreachable under realistic workloads (we have
not been able to trigger it in profiling), so it is not worth
complicating the restore state machine to avoid the split there. If it
ever becomes a problem in practice, it can be addressed independently.
ttm_pool_split_for_swap() itself is retained both for the OOM
fallback above and for the restore path's remaining caller. The
DMA-mapped pre-backup unmap loop, the purge path, ttm_pool_free_*,
and ttm_pool_unmap_and_free() already operate at native order and
are unchanged.
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: b63d715b8090 ("drm/ttm/pool, drm/ttm/tt: Provide a helper to shrink pages")
Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Assisted-by: Claude:claude-opus-4.6
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
A follow-up should attempt writeback to shmem at folio order as well,
but the API for doing so is unclear and may be incomplete.
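For the record, a rough sketch of the direction (hypothetical:
shmem_read_folio_gfp() today gives no way to demand a folio of a
matching order, which is exactly the unclear part, and the function
name and signature below are invented for illustration):

	/* Hypothetical: back up one compound with a single shmem folio. */
	static s64 ttm_backup_backup_folio(struct file *backup, struct page *page,
					   unsigned int order, pgoff_t idx,
					   gfp_t alloc_gfp)
	{
		struct folio *to;
		unsigned long i;

		/* Would need an order-aware variant of this call. */
		to = shmem_read_folio_gfp(backup->f_mapping, idx, alloc_gfp);
		if (IS_ERR(to))
			return PTR_ERR(to);

		for (i = 0; i < (1UL << order); ++i) {
			void *dst = kmap_local_folio(to, i * PAGE_SIZE);

			memcpy_from_page(dst, page + i, 0, PAGE_SIZE);
			kunmap_local(dst);
		}

		folio_mark_dirty(to);
		folio_put(to);
		return idx;	/* caller derives per-subpage handles */
	}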
This patch is related to the pending series [1] and significantly
reduces the likelihood of Xe entering a kswapd loop under fragmentation.
The kswapd → shrinker → Xe shrinker → TTM backup path is still
exercised; however, with this change the backup path no longer worsens
fragmentation, which previously amplified reclaim pressure and
reinforced the kswapd loop.
Nonetheless, the pathological case that [1] aims to address still exists
and requires a proper solution. Even with this patch, a kswapd loop due
to severe fragmentation can still be triggered, although it is now
substantially harder to reproduce.
v2:
- Split pages and free them immediately if a higher-order backup fails
  (Thomas)
v3:
- Skip handles in purge path (sashiko)
[1] https://patchwork.freedesktop.org/series/165330/
---
drivers/gpu/drm/ttm/ttm_pool.c | 87 ++++++++++++++++++++++++++++------
1 file changed, 72 insertions(+), 15 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index c7aab60b7f01..f9e631a20979 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -1047,12 +1047,11 @@ long ttm_pool_backup(struct ttm_pool *pool, struct ttm_tt *tt,
 {
 	struct file *backup = tt->backup;
 	struct page *page;
-	unsigned long handle;
 	gfp_t alloc_gfp;
 	gfp_t gfp;
 	int ret = 0;
 	pgoff_t shrunken = 0;
-	pgoff_t i, num_pages;
+	pgoff_t i, num_pages, npages;
 
 	if (WARN_ON(ttm_tt_is_backed_up(tt)))
 		return -EINVAL;
@@ -1072,7 +1071,8 @@ long ttm_pool_backup(struct ttm_pool *pool, struct ttm_tt *tt,
 		unsigned int order;
 
 		page = tt->pages[i];
-		if (unlikely(!page)) {
+		if (unlikely(!page ||
+			     ttm_backup_page_ptr_is_handle(page))) {
 			num_pages = 1;
 			continue;
 		}
@@ -1108,28 +1108,85 @@ long ttm_pool_backup(struct ttm_pool *pool, struct ttm_tt *tt,
 	if (IS_ENABLED(CONFIG_FAULT_INJECTION) && should_fail(&backup_fault_inject, 1))
 		num_pages = DIV_ROUND_UP(num_pages, 2);
 
-	for (i = 0; i < num_pages; ++i) {
-		s64 shandle;
+	for (i = 0; i < num_pages; i += npages) {
+		unsigned int order;
+		pgoff_t j;
+		bool folio_has_been_split = false;
 
+		npages = 1;
 		page = tt->pages[i];
 		if (unlikely(!page))
 			continue;
 
-		ttm_pool_split_for_swap(pool, page);
+		/* Already-handled entry from a previous attempt. */
+		if (unlikely(ttm_backup_page_ptr_is_handle(page)))
+			continue;
+
+		order = ttm_pool_page_order(pool, page);
+		npages = 1UL << order;
 
-		shandle = ttm_backup_backup_page(backup, page, flags->writeback, i,
-						 gfp, alloc_gfp);
-		if (shandle < 0) {
-			/* We allow partially shrunken tts */
-			ret = shandle;
+		/*
+		 * Back up the compound atomically at its native order. If
+		 * fault injection truncated num_pages mid-compound, skip
+		 * the partial tail rather than splitting.
+		 */
+		if (unlikely(i + npages > num_pages))
 			break;
+
+		for (j = 0; j < npages; ++j) {
+			s64 shandle;
+
+try_again_after_split:
+			if (IS_ENABLED(CONFIG_FAULT_INJECTION) &&
+			    should_fail(&backup_fault_inject, 1))
+				shandle = -ENOMEM;
+			else
+				shandle = ttm_backup_backup_page(backup, page + j,
+								 flags->writeback,
+								 i + j, gfp,
+								 alloc_gfp);
+
+			if (shandle < 0 && !folio_has_been_split && order) {
+				pgoff_t k;
+
+				/*
+				 * True OOM: could not allocate a shmem folio
+				 * for the next subpage. Fall back to splitting
+				 * the source compound and backing up subpages
+				 * individually. Release the already-backed-up
+				 * subpages whose contents now live in shmem;
+				 * any further failure terminates the loop with
+				 * partial progress (handled by the caller).
+				 */
+				folio_has_been_split = true;
+				ttm_pool_split_for_swap(pool, page);
+
+				for (k = 0; k < j; ++k) {
+					__free_pages_gpu_account(page + k, 0, false);
+					shrunken++;
+				}
+
+				goto try_again_after_split;
+			} else if (shandle < 0) {
+				ret = shandle;
+				goto out;
+			} else if (folio_has_been_split) {
+				__free_pages_gpu_account(page + j, 0, false);
+				shrunken++;
+			}
+
+			tt->pages[i + j] = ttm_backup_handle_to_page_ptr(shandle);
+		}
+
+		if (!folio_has_been_split) {
+			/* Compound fully backed up; free at native order. */
+			page->private = 0;
+			__free_pages_gpu_account(page, order, false);
+			shrunken += npages;
 		}
-		handle = shandle;
-		tt->pages[i] = ttm_backup_handle_to_page_ptr(handle);
-		__free_pages_gpu_account(page, 0, false);
-		shrunken++;
 	}
 
+out:
 	return shrunken ? shrunken : ret;
 }
--
2.34.1
* Re: [PATCH v4 1/2] drm/ttm: Drop tt->restore after successful restore
2026-05-05 3:30 ` [PATCH v4 1/2] drm/ttm: Drop tt->restore after successful restore Matthew Brost
@ 2026-05-05 7:04 ` Thomas Hellström
2026-05-05 17:35 ` Matthew Brost
0 siblings, 1 reply; 6+ messages in thread
From: Thomas Hellström @ 2026-05-05 7:04 UTC (permalink / raw)
To: Matthew Brost, intel-xe, dri-devel
Cc: Christian Koenig, Huang Rui, Matthew Auld, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
linux-kernel, stable
On Mon, 2026-05-04 at 20:30 -0700, Matthew Brost wrote:
> ttm_pool_restore_and_alloc() can successfully complete the restore
> process via ttm_pool_restore_commit(), but tt->restore is not dropped
> afterward. As a result, subsequent backup/restore flows observe what
> appears to be a completed restore, while in reality shmem handles are
> still installed in tt->pages, leading to the stack trace below.
>
> Fix this by freeing and dropping tt->restore in
> ttm_pool_restore_and_alloc() upon successful completion of the
> restore.
>
> 20545 [ 309.784531] RIP:
> 0010:sg_alloc_append_table_from_pages+0x38c/0x490
> 20547 [ 309.809570] RSP: 0018:ffffc9000623b838 EFLAGS: 00010206
> 20548 [ 309.814827] RAX: 0000000000001000 RBX: ffff88816e42a160 RCX:
> 0000000000000000
> 20549 [ 309.821986] RDX: 0000000000002000 RSI: 0000000000000003 RDI:
> 0000000000001000
> 20550 [ 309.829147] RBP: ffff88816e42a168 R08: 0000000000000002 R09:
> 000000007ffff000
> 20551 [ 309.836310] R10: ffffc9000623b928 R11: 0000000000000000 R12:
> 000000007ffff000
> 20552 [ 309.843471] R13: ffff88815ba5a100 R14: 0000000000000000 R15:
> 0000000000000001
> 20553 [ 309.850634] FS: 00007f9ff305e700(0000)
> GS:ffff888276c94000(0000) knlGS:0000000000000000
> 20554 [ 309.858749] CS: 0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> 20555 [ 309.864519] CR2: 00007f9fca701000 CR3: 00000001565e2005 CR4:
> 0000000008f70ef0
> 20556 [ 309.871678] PKRU: 55555558
> 20557 [ 309.874403] Call Trace:
> 20558 [ 309.876866] <TASK>
> 20559 [ 309.878988] sg_alloc_table_from_pages_segment+0x60/0x100
> 20560 [ 309.884415] ? ttm_resource_manager_usage+0x36/0x60 [ttm]
> 20561 [ 309.889845] ? xe_tt_map_sg+0x7d/0xd0 [xe]
> 20562 [ 309.894045] xe_tt_map_sg+0x7d/0xd0 [xe]
> 20563 [ 309.898037] xe_bo_move+0x927/0xaa0 [xe]
> 20564 [ 309.902029] ttm_bo_handle_move_mem+0xba/0x170 [ttm]
> 20565 [ 309.907022] ttm_bo_validate+0xbe/0x190 [ttm]
> 20566 [ 309.911405] xe_bo_validate+0x9a/0x120 [xe]
> 20567 [ 309.915663] xe_gpuvm_validate+0xd9/0x140 [xe]
> 20568 [ 309.920206] drm_gpuvm_validate+0x2f0/0x5b0 [drm_gpuvm]
> 20569 [ 309.925459] ? drm_exec_lock_obj+0x63/0x210 [drm_exec]
> 20570 [ 309.930627] xe_vm_validate_rebind+0x46/0xb0 [xe]
> 20571 [ 309.935428] xe_exec_fn+0x20/0x40 [xe]
> 20572 [ 309.939249] drm_gpuvm_exec_lock+0x78/0xc0 [drm_gpuvm]
> 20573 [ 309.944410] xe_validation_exec_lock+0x5a/0xa0 [xe]
> 20574 [ 309.949385] xe_exec_ioctl+0x806/0xc30 [xe]
> 20575 [ 309.953639] ? ttwu_queue_wakelist+0xd9/0xf0
> 20576 [ 309.957935] ? __pfx_xe_exec_fn+0x10/0x10 [xe]
> 20577 [ 309.962449] ? __wake_up_common+0x73/0xa0
> 20578 [ 309.966482] ? __pfx_xe_exec_ioctl+0x10/0x10 [xe]
> 20579 [ 309.971263] drm_ioctl_kernel+0xa3/0x100
> 20580 [ 309.975209] drm_ioctl+0x213/0x440
> 20581 [ 309.978637] ? __pfx_xe_exec_ioctl+0x10/0x10 [xe]
> 20582 [ 309.983415] xe_drm_ioctl+0x67/0xd0 [xe]
> 20583 [ 309.987408] __x64_sys_ioctl+0x7f/0xd0
>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Maxime Ripard <mripard@kernel.org>
> Cc: Thomas Zimmermann <tzimmermann@suse.de>
> Cc: David Airlie <airlied@gmail.com>
> Cc: Simona Vetter <simona@ffwll.ch>
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org
> Fixes: b63d715b8090 ("drm/ttm/pool, drm/ttm/tt: Provide a helper to
> shrink pages")
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>
> ---
>
> v3:
> - Call ttm_pool_apply_caching after freeing local restore (sashiko)
> - Save alloc in snapshot on restore failure (sashiko)
> v4:
> - Actual 'Save alloc in snapshot on restore failure (sashiko)'
> ---
> drivers/gpu/drm/ttm/ttm_pool.c | 19 +++++++++++++++----
> 1 file changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
> b/drivers/gpu/drm/ttm/ttm_pool.c
> index 278bbe7a11ad..c7aab60b7f01 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -902,6 +902,7 @@ int ttm_pool_restore_and_alloc(struct ttm_pool
> *pool, struct ttm_tt *tt,
> {
> struct ttm_pool_tt_restore *restore = tt->restore;
> struct ttm_pool_alloc_state alloc;
> + int ret;
>
> if (WARN_ON(!ttm_tt_is_backed_up(tt)))
> return -EINVAL;
> @@ -925,14 +926,24 @@ int ttm_pool_restore_and_alloc(struct ttm_pool
> *pool, struct ttm_tt *tt,
> } else {
> alloc = restore->snapshot_alloc;
> if (ttm_pool_restore_valid(restore)) {
> - int ret = ttm_pool_restore_commit(restore,
> tt->backup,
> - ctx,
> &alloc);
> + ret = ttm_pool_restore_commit(restore, tt-
> >backup,
> + ctx, &alloc);
>
> - if (ret)
> + if (ret) {
> + restore->snapshot_alloc = alloc;
> return ret;
> + }
> }
> - if (!alloc.remaining_pages)
> + if (!alloc.remaining_pages) {
> + kfree(tt->restore);
> + tt->restore = NULL;
> +
> + ret = ttm_pool_apply_caching(&alloc);
return ttm_pool_apply_caching(&alloc) ?
Otherwise LGTM.
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> + if (ret)
> + return ret;
> +
> return 0;
> + }
> }
>
> return __ttm_pool_alloc(pool, tt, ctx, &alloc, restore);
* Re: [PATCH v4 2/2] drm/ttm/pool: back up at native page order
2026-05-05 3:30 ` [PATCH v4 2/2] drm/ttm/pool: back up at native page order Matthew Brost
@ 2026-05-05 9:02 ` Thomas Hellström
2026-05-05 17:36 ` Matthew Brost
0 siblings, 1 reply; 6+ messages in thread
From: Thomas Hellström @ 2026-05-05 9:02 UTC (permalink / raw)
To: Matthew Brost, intel-xe, dri-devel
Cc: Christian Koenig, Huang Rui, Matthew Auld, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
linux-kernel, stable
On Mon, 2026-05-04 at 20:30 -0700, Matthew Brost wrote:
> ttm_pool_split_for_swap() splits high-order pool pages into order-0
> pages during backup so each 4K page can be released to the system as
> soon as it has been written to shmem. While this minimizes the
> allocator's working set during reclaim, it actively fragments memory:
> every TTM-backed compound page that the shrinker touches is shattered
> into order-0 pages, even when the rest of the system would prefer
> that
> the high-order block stay intact. Under sustained kswapd pressure
> this
> is enough to drive other parts of MM into recovery loops from which
> they cannot easily escape, because the memory TTM just freed is no
> longer contiguous.
>
> Stop unconditionally splitting on the backup path and back up each
> compound at its native order in ttm_pool_backup():
>
> - For each non-handle slot, read the order from the head page and
> back up all 1<<order subpages to consecutive shmem indices,
> writing the resulting handles into tt->pages[] as we go.
> - On success, the compound is freed once at its native order. No
> split_page(), no per-4K refcount juggling, no fragmentation
> introduced from this path.
> - Slots that already hold a backup handle from a previous partial
> attempt are skipped. A compound that would extend past a
> fault-injection-truncated num_pages is skipped rather than split.
>
> A per-subpage backup failure cannot be made fully atomic: backing up
> a
> subpage allocates a shmem folio before the source page can be
> released,
> so under true OOM any subpage in a compound (not just the first) may
> fail to be backed up with the rest of the source compound still live
> and contiguous. To make forward progress in that case, fall back to
> splitting the source compound and backing up its remaining subpages
> individually:
>
> - On the first per-subpage failure for a compound (and only if
> order > 0), call ttm_pool_split_for_swap() to split the source
> compound, release the subpages whose contents already live in
> shmem (their handles in tt->pages stay valid), and retry the
> failing subpage at order 0.
> - Subsequent successful subpage backups in the now-split compound
> free their source page individually as soon as the handle is
> written.
> - A second failure after splitting terminates the loop with partial
> progress; the remaining order-0 subpages stay in tt->pages as
> plain page pointers and are cleaned up by the normal
> ttm_pool_drop_backed_up() / ttm_pool_free_range() paths.
>
> This restores the original split-on-OOM fallback behavior while
> keeping the common, non-OOM case fragmentation-free. It also
> preserves the "partial backup is allowed" contract: shrunken is
> incremented per backed-up subpage so the caller still sees forward
> progress when a compound only partially succeeds.
>
> The restore-side leftover-page branch in ttm_pool_restore_commit() is
> left as-is for now: that path can still split a previously-retained
> compound, but in practice it is unreachable under realistic workloads
> (per profiling we have not been able to trigger it), so it is not
> worth complicating the restore state machine to avoid the split
> there.
> If it ever becomes a problem in practice it can be addressed
> independently.
>
> ttm_pool_split_for_swap() itself is retained both for the OOM
> fallback above and for the restore path's remaining caller. The
> DMA-mapped pre-backup unmap loop, the purge path, ttm_pool_free_*,
> and ttm_pool_unmap_and_free() already operate at native order and
> are unchanged.
>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Maxime Ripard <mripard@kernel.org>
> Cc: Thomas Zimmermann <tzimmermann@suse.de>
> Cc: David Airlie <airlied@gmail.com>
> Cc: Simona Vetter <simona@ffwll.ch>
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org
> Fixes: b63d715b8090 ("drm/ttm/pool, drm/ttm/tt: Provide a helper to
> shrink pages")
> Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Assisted-by: Claude:claude-opus-4.6
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>
> ---
>
> A follow-up should attempt writeback to shmem at folio order as well,
> but the API for doing so is unclear and may be incomplete.
>
> This patch is related to the pending series [1] and significantly
> reduces the likelihood of Xe entering a kswapd loop under
> fragmentation.
> The kswapd → shrinker → Xe shrinker → TTM backup path is still
> exercised; however, with this change the backup path no longer
> worsens
> fragmentation, which previously amplified reclaim pressure and
> reinforced the kswapd loop.
>
> Nonetheless, the pathological case that [1] aims to address still
> exists
> and requires a proper solution. Even with this patch, a kswapd loop
> due
> to severe fragmentation can still be triggered, although it is now
> substantially harder to reproduce.
>
> v2:
> - Split pages and free immediately if backup fails are higher order
> (Thomas)
> v3:
> - Skip handles in purge path (sashiko)
>
> [1] https://patchwork.freedesktop.org/series/165330/
> ---
> drivers/gpu/drm/ttm/ttm_pool.c | 87 ++++++++++++++++++++++++++++----
> --
> 1 file changed, 72 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
> b/drivers/gpu/drm/ttm/ttm_pool.c
> index c7aab60b7f01..f9e631a20979 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -1047,12 +1047,11 @@ long ttm_pool_backup(struct ttm_pool *pool,
> struct ttm_tt *tt,
> {
> struct file *backup = tt->backup;
> struct page *page;
> - unsigned long handle;
> gfp_t alloc_gfp;
> gfp_t gfp;
> int ret = 0;
> pgoff_t shrunken = 0;
> - pgoff_t i, num_pages;
> + pgoff_t i, num_pages, npages;
>
> if (WARN_ON(ttm_tt_is_backed_up(tt)))
> return -EINVAL;
> @@ -1072,7 +1071,8 @@ long ttm_pool_backup(struct ttm_pool *pool,
> struct ttm_tt *tt,
> unsigned int order;
>
> page = tt->pages[i];
> - if (unlikely(!page)) {
> + if (unlikely(!page ||
> +
> ttm_backup_page_ptr_is_handle(page))) {
> num_pages = 1;
> continue;
> }
> @@ -1108,28 +1108,85 @@ long ttm_pool_backup(struct ttm_pool *pool,
> struct ttm_tt *tt,
> if (IS_ENABLED(CONFIG_FAULT_INJECTION) &&
> should_fail(&backup_fault_inject, 1))
> num_pages = DIV_ROUND_UP(num_pages, 2);
>
> - for (i = 0; i < num_pages; ++i) {
> - s64 shandle;
> + for (i = 0; i < num_pages; i += npages) {
> + unsigned int order;
> + pgoff_t j;
> + bool folio_has_been_split = false;
>
> + npages = 1;
> page = tt->pages[i];
> if (unlikely(!page))
> continue;
>
> - ttm_pool_split_for_swap(pool, page);
> + /* Already-handled entry from a previous attempt. */
> + if (unlikely(ttm_backup_page_ptr_is_handle(page)))
> + continue;
> +
> + order = ttm_pool_page_order(pool, page);
> + npages = 1UL << order;
>
> - shandle = ttm_backup_backup_page(backup, page,
> flags->writeback, i,
> - gfp, alloc_gfp);
> - if (shandle < 0) {
> - /* We allow partially shrunken tts */
> - ret = shandle;
> + /*
> + * Back up the compound atomically at its native
> order. If
> + * fault injection truncated num_pages mid-compound,
> skip
> + * the partial tail rather than splitting.
> + */
> + if (unlikely(i + npages > num_pages))
> break;
> +
> + for (j = 0; j < npages; ++j) {
> + s64 shandle;
I still think we should move part of this loop to
ttm_backup_backup_folio() at this point, rather than open-coding it
here. It's the design we want to move forward with, and it would
probably make the pool code cleaner as well. If we think failures would
be common, we could have ttm_backup_backup_folio() return the number of
pages that were actually backed up, or an error. Otherwise, just return
success or an error, and on error truncate the shmem pages that were
already copied.
Thanks,
Thomas
> +
> +try_again_after_split:
> + if (IS_ENABLED(CONFIG_FAULT_INJECTION) &&
> + should_fail(&backup_fault_inject, 1))
> + shandle = -ENOMEM;
> + else
> + shandle =
> ttm_backup_backup_page(backup, page + j,
> +
> flags->writeback,
> + i +
> j, gfp,
> +
> alloc_gfp);
> +
> + if (shandle < 0 && !folio_has_been_split &&
> order) {
> + pgoff_t k;
> +
> + /*
> + * True OOM: could not allocate a
> shmem folio
> + * for the next subpage. Fall back
> to splitting
> + * the source compound and backing
> up subpages
> + * individually. Release the
> already-backed-up
> + * subpages whose contents now live
> in shmem;
> + * any further failure terminates
> the loop with
> + * partial progress (handled by the
> caller).
> + */
> + folio_has_been_split = true;
> + ttm_pool_split_for_swap(pool, page);
> +
> + for (k = 0; k < j; ++k) {
> + __free_pages_gpu_account(pag
> e + k, 0, false);
> + shrunken++;
> + }
> +
> + goto try_again_after_split;
> + } else if (shandle < 0) {
> + ret = shandle;
> + goto out;
> + } else if (folio_has_been_split) {
> + __free_pages_gpu_account(page + j,
> 0, false);
> + shrunken++;
> + }
> +
> + tt->pages[i + j] =
> ttm_backup_handle_to_page_ptr(shandle);
> + }
> +
> + if (!folio_has_been_split) {
> + /* Compound fully backed up; free at native
> order. */
> + page->private = 0;
> + __free_pages_gpu_account(page, order,
> false);
> + shrunken += npages;
> }
> - handle = shandle;
> - tt->pages[i] =
> ttm_backup_handle_to_page_ptr(handle);
> - __free_pages_gpu_account(page, 0, false);
> - shrunken++;
> }
>
> +out:
> return shrunken ? shrunken : ret;
> }
>
* Re: [PATCH v4 1/2] drm/ttm: Drop tt->restore after successful restore
2026-05-05 7:04 ` Thomas Hellström
@ 2026-05-05 17:35 ` Matthew Brost
0 siblings, 0 replies; 6+ messages in thread
From: Matthew Brost @ 2026-05-05 17:35 UTC (permalink / raw)
To: Thomas Hellström
Cc: intel-xe, dri-devel, Christian Koenig, Huang Rui, Matthew Auld,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, linux-kernel, stable
On Tue, May 05, 2026 at 09:04:45AM +0200, Thomas Hellström wrote:
> On Mon, 2026-05-04 at 20:30 -0700, Matthew Brost wrote:
> > ttm_pool_restore_and_alloc() can successfully complete the restore
> > process via ttm_pool_restore_commit(), but tt->restore is not dropped
> > afterward. As a result, subsequent backup/restore flows observe what
> > appears to be a completed restore, while in reality shmem handles are
> > still installed in tt->pages, leading to the stack trace below.
> >
> > Fix this by freeing and dropping tt->restore in
> > ttm_pool_restore_and_alloc() upon successful completion of the
> > restore.
> >
> > 20545 [ 309.784531] RIP:
> > 0010:sg_alloc_append_table_from_pages+0x38c/0x490
> > 20547 [ 309.809570] RSP: 0018:ffffc9000623b838 EFLAGS: 00010206
> > 20548 [ 309.814827] RAX: 0000000000001000 RBX: ffff88816e42a160 RCX:
> > 0000000000000000
> > 20549 [ 309.821986] RDX: 0000000000002000 RSI: 0000000000000003 RDI:
> > 0000000000001000
> > 20550 [ 309.829147] RBP: ffff88816e42a168 R08: 0000000000000002 R09:
> > 000000007ffff000
> > 20551 [ 309.836310] R10: ffffc9000623b928 R11: 0000000000000000 R12:
> > 000000007ffff000
> > 20552 [ 309.843471] R13: ffff88815ba5a100 R14: 0000000000000000 R15:
> > 0000000000000001
> > 20553 [ 309.850634] FS: 00007f9ff305e700(0000)
> > GS:ffff888276c94000(0000) knlGS:0000000000000000
> > 20554 [ 309.858749] CS: 0010 DS: 0000 ES: 0000 CR0:
> > 0000000080050033
> > 20555 [ 309.864519] CR2: 00007f9fca701000 CR3: 00000001565e2005 CR4:
> > 0000000008f70ef0
> > 20556 [ 309.871678] PKRU: 55555558
> > 20557 [ 309.874403] Call Trace:
> > 20558 [ 309.876866] <TASK>
> > 20559 [ 309.878988] sg_alloc_table_from_pages_segment+0x60/0x100
> > 20560 [ 309.884415] ? ttm_resource_manager_usage+0x36/0x60 [ttm]
> > 20561 [ 309.889845] ? xe_tt_map_sg+0x7d/0xd0 [xe]
> > 20562 [ 309.894045] xe_tt_map_sg+0x7d/0xd0 [xe]
> > 20563 [ 309.898037] xe_bo_move+0x927/0xaa0 [xe]
> > 20564 [ 309.902029] ttm_bo_handle_move_mem+0xba/0x170 [ttm]
> > 20565 [ 309.907022] ttm_bo_validate+0xbe/0x190 [ttm]
> > 20566 [ 309.911405] xe_bo_validate+0x9a/0x120 [xe]
> > 20567 [ 309.915663] xe_gpuvm_validate+0xd9/0x140 [xe]
> > 20568 [ 309.920206] drm_gpuvm_validate+0x2f0/0x5b0 [drm_gpuvm]
> > 20569 [ 309.925459] ? drm_exec_lock_obj+0x63/0x210 [drm_exec]
> > 20570 [ 309.930627] xe_vm_validate_rebind+0x46/0xb0 [xe]
> > 20571 [ 309.935428] xe_exec_fn+0x20/0x40 [xe]
> > 20572 [ 309.939249] drm_gpuvm_exec_lock+0x78/0xc0 [drm_gpuvm]
> > 20573 [ 309.944410] xe_validation_exec_lock+0x5a/0xa0 [xe]
> > 20574 [ 309.949385] xe_exec_ioctl+0x806/0xc30 [xe]
> > 20575 [ 309.953639] ? ttwu_queue_wakelist+0xd9/0xf0
> > 20576 [ 309.957935] ? __pfx_xe_exec_fn+0x10/0x10 [xe]
> > 20577 [ 309.962449] ? __wake_up_common+0x73/0xa0
> > 20578 [ 309.966482] ? __pfx_xe_exec_ioctl+0x10/0x10 [xe]
> > 20579 [ 309.971263] drm_ioctl_kernel+0xa3/0x100
> > 20580 [ 309.975209] drm_ioctl+0x213/0x440
> > 20581 [ 309.978637] ? __pfx_xe_exec_ioctl+0x10/0x10 [xe]
> > 20582 [ 309.983415] xe_drm_ioctl+0x67/0xd0 [xe]
> > 20583 [ 309.987408] __x64_sys_ioctl+0x7f/0xd0
> >
> > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Cc: Christian Koenig <christian.koenig@amd.com>
> > Cc: Huang Rui <ray.huang@amd.com>
> > Cc: Matthew Auld <matthew.auld@intel.com>
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Cc: Maxime Ripard <mripard@kernel.org>
> > Cc: Thomas Zimmermann <tzimmermann@suse.de>
> > Cc: David Airlie <airlied@gmail.com>
> > Cc: Simona Vetter <simona@ffwll.ch>
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: stable@vger.kernel.org
> > Fixes: b63d715b8090 ("drm/ttm/pool, drm/ttm/tt: Provide a helper to
> > shrink pages")
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >
> > ---
> >
> > v3:
> > - Call ttm_pool_apply_caching after freeing local restore (sashiko)
> > - Save alloc in snapshot on restore failure (sashiko)
> > v4:
> > - Actual 'Save alloc in snapshot on restore failure (sashiko)'
> > ---
> > drivers/gpu/drm/ttm/ttm_pool.c | 19 +++++++++++++++----
> > 1 file changed, 15 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
> > b/drivers/gpu/drm/ttm/ttm_pool.c
> > index 278bbe7a11ad..c7aab60b7f01 100644
> > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > @@ -902,6 +902,7 @@ int ttm_pool_restore_and_alloc(struct ttm_pool
> > *pool, struct ttm_tt *tt,
> > {
> > struct ttm_pool_tt_restore *restore = tt->restore;
> > struct ttm_pool_alloc_state alloc;
> > + int ret;
> >
> > if (WARN_ON(!ttm_tt_is_backed_up(tt)))
> > return -EINVAL;
> > @@ -925,14 +926,24 @@ int ttm_pool_restore_and_alloc(struct ttm_pool
> > *pool, struct ttm_tt *tt,
> > } else {
> > alloc = restore->snapshot_alloc;
> > if (ttm_pool_restore_valid(restore)) {
> > - int ret = ttm_pool_restore_commit(restore,
> > tt->backup,
> > - ctx,
> > &alloc);
> > + ret = ttm_pool_restore_commit(restore, tt-
> > >backup,
> > + ctx, &alloc);
> >
> > - if (ret)
> > + if (ret) {
> > + restore->snapshot_alloc = alloc;
> > return ret;
> > + }
> > }
> > - if (!alloc.remaining_pages)
> > + if (!alloc.remaining_pages) {
> > + kfree(tt->restore);
> > + tt->restore = NULL;
> > +
> > + ret = ttm_pool_apply_caching(&alloc);
>
> return ttm_pool_apply_caching(&alloc) ?
>
Yes, will do.
> Otherwise LGTM.
> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>
Thanks.
Matt
>
> > + if (ret)
> > + return ret;
> > +
> > return 0;
> > + }
> > }
> >
> > return __ttm_pool_alloc(pool, tt, ctx, &alloc, restore);
* Re: [PATCH v4 2/2] drm/ttm/pool: back up at native page order
2026-05-05 9:02 ` Thomas Hellström
@ 2026-05-05 17:36 ` Matthew Brost
0 siblings, 0 replies; 6+ messages in thread
From: Matthew Brost @ 2026-05-05 17:36 UTC (permalink / raw)
To: Thomas Hellström
Cc: intel-xe, dri-devel, Christian Koenig, Huang Rui, Matthew Auld,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, linux-kernel, stable
On Tue, May 05, 2026 at 11:02:35AM +0200, Thomas Hellström wrote:
> On Mon, 2026-05-04 at 20:30 -0700, Matthew Brost wrote:
> > ttm_pool_split_for_swap() splits high-order pool pages into order-0
> > pages during backup so each 4K page can be released to the system as
> > soon as it has been written to shmem. While this minimizes the
> > allocator's working set during reclaim, it actively fragments memory:
> > every TTM-backed compound page that the shrinker touches is shattered
> > into order-0 pages, even when the rest of the system would prefer
> > that
> > the high-order block stay intact. Under sustained kswapd pressure
> > this
> > is enough to drive other parts of MM into recovery loops from which
> > they cannot easily escape, because the memory TTM just freed is no
> > longer contiguous.
> >
> > Stop unconditionally splitting on the backup path and back up each
> > compound at its native order in ttm_pool_backup():
> >
> > - For each non-handle slot, read the order from the head page and
> > back up all 1<<order subpages to consecutive shmem indices,
> > writing the resulting handles into tt->pages[] as we go.
> > - On success, the compound is freed once at its native order. No
> > split_page(), no per-4K refcount juggling, no fragmentation
> > introduced from this path.
> > - Slots that already hold a backup handle from a previous partial
> > attempt are skipped. A compound that would extend past a
> > fault-injection-truncated num_pages is skipped rather than split.
> >
> > A per-subpage backup failure cannot be made fully atomic: backing up
> > a
> > subpage allocates a shmem folio before the source page can be
> > released,
> > so under true OOM any subpage in a compound (not just the first) may
> > fail to be backed up with the rest of the source compound still live
> > and contiguous. To make forward progress in that case, fall back to
> > splitting the source compound and backing up its remaining subpages
> > individually:
> >
> > - On the first per-subpage failure for a compound (and only if
> > order > 0), call ttm_pool_split_for_swap() to split the source
> > compound, release the subpages whose contents already live in
> > shmem (their handles in tt->pages stay valid), and retry the
> > failing subpage at order 0.
> > - Subsequent successful subpage backups in the now-split compound
> > free their source page individually as soon as the handle is
> > written.
> > - A second failure after splitting terminates the loop with partial
> > progress; the remaining order-0 subpages stay in tt->pages as
> > plain page pointers and are cleaned up by the normal
> > ttm_pool_drop_backed_up() / ttm_pool_free_range() paths.
> >
> > This restores the original split-on-OOM fallback behavior while
> > keeping the common, non-OOM case fragmentation-free. It also
> > preserves the "partial backup is allowed" contract: shrunken is
> > incremented per backed-up subpage so the caller still sees forward
> > progress when a compound only partially succeeds.
> >
> > The restore-side leftover-page branch in ttm_pool_restore_commit() is
> > left as-is for now: that path can still split a previously-retained
> > compound, but in practice it is unreachable under realistic workloads
> > (per profiling we have not been able to trigger it), so it is not
> > worth complicating the restore state machine to avoid the split
> > there.
> > If it ever becomes a problem in practice it can be addressed
> > independently.
> >
> > ttm_pool_split_for_swap() itself is retained both for the OOM
> > fallback above and for the restore path's remaining caller. The
> > DMA-mapped pre-backup unmap loop, the purge path, ttm_pool_free_*,
> > and ttm_pool_unmap_and_free() already operate at native order and
> > are unchanged.
> >
> > Cc: Christian Koenig <christian.koenig@amd.com>
> > Cc: Huang Rui <ray.huang@amd.com>
> > Cc: Matthew Auld <matthew.auld@intel.com>
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Cc: Maxime Ripard <mripard@kernel.org>
> > Cc: Thomas Zimmermann <tzimmermann@suse.de>
> > Cc: David Airlie <airlied@gmail.com>
> > Cc: Simona Vetter <simona@ffwll.ch>
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: stable@vger.kernel.org
> > Fixes: b63d715b8090 ("drm/ttm/pool, drm/ttm/tt: Provide a helper to
> > shrink pages")
> > Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Assisted-by: Claude:claude-opus-4.6
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >
> > ---
> >
> > A follow-up should attempt writeback to shmem at folio order as well,
> > but the API for doing so is unclear and may be incomplete.
> >
> > This patch is related to the pending series [1] and significantly
> > reduces the likelihood of Xe entering a kswapd loop under
> > fragmentation.
> > The kswapd → shrinker → Xe shrinker → TTM backup path is still
> > exercised; however, with this change the backup path no longer
> > worsens
> > fragmentation, which previously amplified reclaim pressure and
> > reinforced the kswapd loop.
> >
> > Nonetheless, the pathological case that [1] aims to address still
> > exists
> > and requires a proper solution. Even with this patch, a kswapd loop
> > due
> > to severe fragmentation can still be triggered, although it is now
> > substantially harder to reproduce.
> >
> > v2:
> > - Split pages and free immediately if backup fails are higher order
> > (Thomas)
> > v3:
> > - Skip handles in purge path (sashiko)
> >
> > [1] https://patchwork.freedesktop.org/series/165330/
> > ---
> > drivers/gpu/drm/ttm/ttm_pool.c | 87 ++++++++++++++++++++++++++++----
> > --
> > 1 file changed, 72 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
> > b/drivers/gpu/drm/ttm/ttm_pool.c
> > index c7aab60b7f01..f9e631a20979 100644
> > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > @@ -1047,12 +1047,11 @@ long ttm_pool_backup(struct ttm_pool *pool,
> > struct ttm_tt *tt,
> > {
> > struct file *backup = tt->backup;
> > struct page *page;
> > - unsigned long handle;
> > gfp_t alloc_gfp;
> > gfp_t gfp;
> > int ret = 0;
> > pgoff_t shrunken = 0;
> > - pgoff_t i, num_pages;
> > + pgoff_t i, num_pages, npages;
> >
> > if (WARN_ON(ttm_tt_is_backed_up(tt)))
> > return -EINVAL;
> > @@ -1072,7 +1071,8 @@ long ttm_pool_backup(struct ttm_pool *pool,
> > struct ttm_tt *tt,
> > unsigned int order;
> >
> > page = tt->pages[i];
> > - if (unlikely(!page)) {
> > + if (unlikely(!page ||
> > +
> > ttm_backup_page_ptr_is_handle(page))) {
> > num_pages = 1;
> > continue;
> > }
> > @@ -1108,28 +1108,85 @@ long ttm_pool_backup(struct ttm_pool *pool,
> > struct ttm_tt *tt,
> > if (IS_ENABLED(CONFIG_FAULT_INJECTION) &&
> > should_fail(&backup_fault_inject, 1))
> > num_pages = DIV_ROUND_UP(num_pages, 2);
> >
> > - for (i = 0; i < num_pages; ++i) {
> > - s64 shandle;
> > + for (i = 0; i < num_pages; i += npages) {
> > + unsigned int order;
> > + pgoff_t j;
> > + bool folio_has_been_split = false;
> >
> > + npages = 1;
> > page = tt->pages[i];
> > if (unlikely(!page))
> > continue;
> >
> > - ttm_pool_split_for_swap(pool, page);
> > + /* Already-handled entry from a previous attempt. */
> > + if (unlikely(ttm_backup_page_ptr_is_handle(page)))
> > + continue;
> > +
> > + order = ttm_pool_page_order(pool, page);
> > + npages = 1UL << order;
> >
> > - shandle = ttm_backup_backup_page(backup, page,
> > flags->writeback, i,
> > - gfp, alloc_gfp);
> > - if (shandle < 0) {
> > - /* We allow partially shrunken tts */
> > - ret = shandle;
> > + /*
> > + * Back up the compound atomically at its native
> > order. If
> > + * fault injection truncated num_pages mid-compound,
> > skip
> > + * the partial tail rather than splitting.
> > + */
> > + if (unlikely(i + npages > num_pages))
> > break;
> > +
> > + for (j = 0; j < npages; ++j) {
> > + s64 shandle;
>
> I still think we should move part of this loop to
> ttm_backup_backup_folio() at this point, rather than open-coding it
> here. It's the design we want to move forward with and would probably
> make the pool code cleaner as well. If we think failures would be
> common we could have ttm_backup_backup_folio() return the number of
> pages that were actually backed up or error Otherwise just return
> success or error and on error truncate the shmem pages that were
> already copied.
>
Yes, though for now I think the helper should live at the ttm_pool
layer, since it relies on several other things in ttm_pool.c that I
don't want to shuffle around in a fixes patch. So ttm_pool_backup_folio()
I think.
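Shape-wise, something like the following is what I have in mind
(illustrative only; nothing beyond the name is settled):

	/*
	 * Back up one compound starting at tt->pages[start]; returns the
	 * number of subpages shrunken, or a negative error code.
	 */
	static long ttm_pool_backup_folio(struct ttm_pool *pool, struct ttm_tt *tt,
					  struct page *page, pgoff_t start,
					  const struct ttm_backup_flags *flags,
					  gfp_t gfp, gfp_t alloc_gfp);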
Matt
> Thanks,
> Thomas
>
>
> > +
> > +try_again_after_split:
> > + if (IS_ENABLED(CONFIG_FAULT_INJECTION) &&
> > + should_fail(&backup_fault_inject, 1))
> > + shandle = -ENOMEM;
> > + else
> > + shandle =
> > ttm_backup_backup_page(backup, page + j,
> > +
> > flags->writeback,
> > + i +
> > j, gfp,
> > +
> > alloc_gfp);
> > +
> > + if (shandle < 0 && !folio_has_been_split &&
> > order) {
> > + pgoff_t k;
> > +
> > + /*
> > + * True OOM: could not allocate a
> > shmem folio
> > + * for the next subpage. Fall back
> > to splitting
> > + * the source compound and backing
> > up subpages
> > + * individually. Release the
> > already-backed-up
> > + * subpages whose contents now live
> > in shmem;
> > + * any further failure terminates
> > the loop with
> > + * partial progress (handled by the
> > caller).
> > + */
> > + folio_has_been_split = true;
> > + ttm_pool_split_for_swap(pool, page);
> > +
> > + for (k = 0; k < j; ++k) {
> > + __free_pages_gpu_account(pag
> > e + k, 0, false);
> > + shrunken++;
> > + }
> > +
> > + goto try_again_after_split;
> > + } else if (shandle < 0) {
> > + ret = shandle;
> > + goto out;
> > + } else if (folio_has_been_split) {
> > + __free_pages_gpu_account(page + j,
> > 0, false);
> > + shrunken++;
> > + }
> > +
> > + tt->pages[i + j] =
> > ttm_backup_handle_to_page_ptr(shandle);
> > + }
> > +
> > + if (!folio_has_been_split) {
> > + /* Compound fully backed up; free at native
> > order. */
> > + page->private = 0;
> > + __free_pages_gpu_account(page, order,
> > false);
> > + shrunken += npages;
> > }
> > - handle = shandle;
> > - tt->pages[i] =
> > ttm_backup_handle_to_page_ptr(handle);
> > - __free_pages_gpu_account(page, 0, false);
> > - shrunken++;
> > }
> >
> > +out:
> > return shrunken ? shrunken : ret;
> > }
> >