* [PATCH 0/7] Accelerate page migration with batch copying and hardware offload
@ 2026-04-28 15:50 Shivank Garg
2026-04-28 15:50 ` [PATCH 1/7] mm/migrate: rename PAGE_ migration flags to FOLIO_ Shivank Garg
` (8 more replies)
0 siblings, 9 replies; 13+ messages in thread
From: Shivank Garg @ 2026-04-28 15:50 UTC (permalink / raw)
To: akpm, david
Cc: kinseyho, weixugc, ljs, Liam.Howlett, vbabka, willy, rppt, surenb,
mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
vkoul, bharata, sj, rientjes, xuezhengchu, yiannis, dave.hansen,
hannes, jhubbard, peterx, riel, shakeel.butt, stalexan, tj,
nifan.cxl, jic23, aneesh.kumar, nathan.lynch, Frank.li, djbw,
linux-kernel, linux-mm, Shivank Garg
This is the fifth RFC of the patchset to enhance page migration by
batching folio-copy operations and enabling acceleration via DMA offload.
Single-threaded, folio-by-folio copying bottlenecks page migration in
modern systems with deep memory hierarchies, especially for large folios
where copy overhead dominates, leaving significant hardware potential
untapped.
By batching the copy phase, we create an opportunity for hardware
acceleration. This series builds the framework and provides a DMA
offload driver (dcbm) as a reference implementation, targeting bulk
migration workloads where offloading the copy improves throughput
and latency while freeing up CPU cycles.
See the RFC V3 cover letter [2] for motivation.
Changelog since V4:
-------------------
1. Renamed PAGE_* migration state flags to FOLIO_*. (David)
2. Use the new folio->migrate_info field instead of folio->private
for migration state. (David)
3. Folded the folios_mc_copy patch into the batch-copy implementation
   patch. (David)
4. Renamed migrate_offload_start()/stop() to register()/unregister().
   (Huang, Ying)
5. Dropped the should_batch() callback from struct migrator. Reason-based
   policy now lives in migrate_pages_batch(). Migrators can still skip
   a batch they don't want (size-based policy). (Huang, Ying)
6. CONFIG_MIGRATION_COPY_OFFLOAD is now hidden and selected by the
   migrator driver. CONFIG_DCBM_DMA is tristate. (Huang Ying, Gregory Price)
7. Wrapped the SRCU + static_call dispatch in a small helper. (Huang, Ying)
8. Require m->owner in migrate_offload_register(); the SRCU sync at
   unregister relies on it. Counters are atomic_long_t to avoid
   lock-ordering issues.
9. Moved DCBM sysfs from /sys/kernel/dcbm to /sys/module/dcbm. (Huang, Ying)
10. Rebased on v7.1-rc1.
DESIGN:
-------
New Migration Flow:
[ migrate_pages_batch() ]
|
|--> do_batch = migrate_offload_do_batch(reason) // core filters by migration reason
|
|--> for each folio:
| migrate_folio_unmap() // unmap the folio
| |
| +--> (success):
| if do_batch && folio_supports_batch_copy():
| -> unmap_batch / dst_batch // batch list for copy offloading
| else:
| -> unmap_single / dst_single // single lists for per-folio CPU copy
|
|--> try_to_unmap_flush() // single batched TLB flush
|
|--> Batch copy (if unmap_batch not empty):
| - Migrator is configurable at runtime via sysfs.
|
| static_call(migrate_offload_copy) // Pluggable Migrators
| / | \
| v v v
| [ Default ] [ DMA Offload ] [ ... ]
|
| On -EOPNOTSUPP or other error, batch falls back to per-folio CPU copy.
|
+--> migrate_folios_move() // metadata, update PTEs, finalize
(batch list with already_copied=true, single list with false)
Offload Registration:
Driver fills struct migrator { .name, .offload_copy, .owner } and calls
migrate_offload_register(). This:
- Pins the module via try_module_get()
- Patches the migrate_offload_copy() static_call target
- Enables the migrate_offload_enabled static branch
migrate_offload_unregister() disables the static branch and reverts
the static_call, then synchronize_srcu() waits for in-flight migrations
before module_put().
PERFORMANCE RESULTS:
--------------------
Re-ran the V4 workload on v7.1-rc1 with this series; relative
speedups match V4 (~6x for 2MB folios at 16 DMA channels). No design
change in V5 alters this picture; please refer to the V4 cover letter
for the throughput tables [1].
PLAN:
-----
Patches 1-4 (the batching infrastructure) don't depend on the migrator
interface, so I can split them off and post them ahead of the migrator
and DCBM bits, which still have a few open questions to work through.
Guidance on whether such a split matches the maintainers' preference
would be appreciated.
OPEN QUESTIONS:
---------------
1. Should the batch path run without a registered migrator? Patches 1-4
are self-contained and use folios_mc_copy() (CPU). I have several
   options: make the batch path always-on for eligible folios, give the
   admin a knob to flip the static branch, or keep the gate as-is.
   I'm leaning toward always-on.
2. Carrying already_copied via folio->migrate_info vs changing the
migrate_folio() callback signature (Huang, Ying). I went with the
field for now to avoid touching every fs callback before the design
settles. Happy to revisit.
3. Per-caller offload selection: Today eligibility is by migrate_reason
   only. Some callers are latency-tolerant, others may not be. Is reason the
right granularity, or do we want a per-caller hint?
4. Cgroup integration: how should per-cgroup accounting work for
   different migrators (e.g., accounting for DMA-busy time)?
5. Tuning migrate_pages() callers for offloading. For instance,
   COMPACT_CLUSTER_MAX = 32 caps DMA's payoff for compaction (V4
   experiment).
6. Where do batch-size thresholds live, and how are they tuned? Per
Huang Ying's split, that policy lives in the migrator. DCBM has no
threshold today. Open whether it should later be a per-migrator
sysfs knob or hard-coded; probably clearer once a second migrator
(SDXI, mtcopy) shows the trade-off.
FOLLOW-UPS:
--------------
1. dmaengine_prep_dma_memcpy_sg() in DCBM (Vinod Koul). The SG-prep
   variant cuts per-batch prep/submit cost (CPU savings), but ptdma does
not implement the SG hook yet [10]. The end-to-end migration throughput
delta is small because per-descriptor execute time dominates.
I'll post the ptdma SG hook + DCBM switch as a follow-up.
2. SDXI as a second migrator. The SDXI series [11] is in review. SDXI is
a generic memcpy engine without DMA_PRIVATE, so channel acquisition
goes through dma_find_channel() or async_tx rather than
dma_request_chan_by_mask(). I have a local DCBM variant working on top
of the SDXI driver. I'm planning to send it as a follow-up once the
SDXI series settles.
3. IOMMU SG merging in DCBM (Gregory). dma_map_sgtable() may merge
   contiguous PFNs unevenly, so src.nents != dst.nents. DCBM falls back
   to CPU copy for safety, though I haven't seen this on Zen3 + PTDMA.
   I'll investigate and address it in a follow-up.
4. Revisit a multi-threaded CPU copy migrator once the infrastructure
   settles.
EARLIER POSTINGS:
-----------------
[1] RFC V4: https://lore.kernel.org/all/20260309120725.308854-3-shivankg@amd.com
[2] RFC V3: https://lore.kernel.org/all/20250923174752.35701-1-shivankg@amd.com
[3] RFC V2: https://lore.kernel.org/all/20250319192211.10092-1-shivankg@amd.com
[4] RFC V1: https://lore.kernel.org/all/20240614221525.19170-1-shivankg@amd.com
[5] RFC from Zi Yan: https://lore.kernel.org/all/20250103172419.4148674-1-ziy@nvidia.com
RELATED DISCUSSIONS:
--------------------
[6] MM-alignment Session [Nov 12, 2025]:
https://lore.kernel.org/linux-mm/bd6a3c75-b9f0-cbcf-f7c4-1ef5dff06d24@google.com
[7] Linux Memory Hotness and Promotion call [Nov 6, 2025]:
https://lore.kernel.org/linux-mm/8ff2fd10-c9ac-4912-cf56-7ecd4afd2770@google.com
[8] LSFMM 2025:
https://lore.kernel.org/all/cf6fc05d-c0b0-4de3-985e-5403977aa3aa@amd.com
[9] OSS India:
https://ossindia2025.sched.com/event/23Jk1
[10] DMA_MEMCPY_SG comparison:
https://lore.kernel.org/linux-mm/3e73addb-ac01-4a05-bc75-c6c1c56072df@amd.com
[11] SDXI V1:
https://lore.kernel.org/all/20260410-sdxi-base-v1-0-1d184cb5c60a@amd.com
Thanks to everyone who reviewed, tested or participated in discussions
around this series. Your feedback helped me throughout the development
process.
Best Regards,
Shivank
Shivank Garg (6):
mm/migrate: rename PAGE_ migration flags to FOLIO_
mm/migrate: use migrate_info field instead of private
mm/migrate: skip data copy for already-copied folios
mm/migrate: add batch-copy path in migrate_pages_batch
mm/migrate: add copy offload registration infrastructure
drivers/migrate_offload: add DMA batch copy driver (dcbm)
Zi Yan (1):
mm/migrate: adjust NR_MAX_BATCHED_MIGRATION for testing
drivers/Kconfig | 2 +
drivers/Makefile | 2 +
drivers/migrate_offload/Kconfig | 9 +
drivers/migrate_offload/Makefile | 1 +
drivers/migrate_offload/dcbm/Makefile | 1 +
drivers/migrate_offload/dcbm/dcbm.c | 440 ++++++++++++++++++++++++++
include/linux/migrate_copy_offload.h | 44 +++
include/linux/mm.h | 2 +
include/linux/mm_types.h | 1 +
mm/Kconfig | 6 +
mm/Makefile | 1 +
mm/migrate.c | 211 ++++++++----
mm/migrate_copy_offload.c | 94 ++++++
mm/util.c | 30 ++
14 files changed, 784 insertions(+), 60 deletions(-)
create mode 100644 drivers/migrate_offload/Kconfig
create mode 100644 drivers/migrate_offload/Makefile
create mode 100644 drivers/migrate_offload/dcbm/Makefile
create mode 100644 drivers/migrate_offload/dcbm/dcbm.c
create mode 100644 include/linux/migrate_copy_offload.h
create mode 100644 mm/migrate_copy_offload.c
base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
--
2.43.0
* [PATCH 1/7] mm/migrate: rename PAGE_ migration flags to FOLIO_
2026-04-28 15:50 [PATCH 0/7] Accelerate page migration with batch copying and hardware offload Shivank Garg
@ 2026-04-28 15:50 ` Shivank Garg
2026-04-30 9:07 ` Huang, Ying
2026-04-28 15:50 ` [PATCH 2/7] mm/migrate: use migrate_info field instead of private Shivank Garg
` (7 subsequent siblings)
8 siblings, 1 reply; 13+ messages in thread
From: Shivank Garg @ 2026-04-28 15:50 UTC (permalink / raw)
To: akpm, david
Cc: kinseyho, weixugc, ljs, Liam.Howlett, vbabka, willy, rppt, surenb,
mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
vkoul, bharata, sj, rientjes, xuezhengchu, yiannis, dave.hansen,
hannes, jhubbard, peterx, riel, shakeel.butt, stalexan, tj,
nifan.cxl, jic23, aneesh.kumar, nathan.lynch, Frank.li, djbw,
linux-kernel, linux-mm, Shivank Garg, Baolin Wang, Lance Yang
These flags only track folio-specific state during migration and are
not used for movable_ops pages. Rename the enum values and the
old_page_state variable to match.
No functional change.
Suggested-by: David Hildenbrand <david@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
mm/migrate.c | 48 +++++++++++++++++++++++-------------------------
1 file changed, 23 insertions(+), 25 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 8a64291ab5b4..0c6a0ab6ecce 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1135,26 +1135,24 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
* This is safe because nobody is using it except us.
*/
enum {
- PAGE_WAS_MAPPED = BIT(0),
- PAGE_WAS_MLOCKED = BIT(1),
- PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED,
+ FOLIO_WAS_MAPPED = BIT(0),
+ FOLIO_WAS_MLOCKED = BIT(1),
+ FOLIO_OLD_STATES = FOLIO_WAS_MAPPED | FOLIO_WAS_MLOCKED,
};
static void __migrate_folio_record(struct folio *dst,
- int old_page_state,
- struct anon_vma *anon_vma)
+ int old_folio_state, struct anon_vma *anon_vma)
{
- dst->private = (void *)anon_vma + old_page_state;
+ dst->private = (void *)anon_vma + old_folio_state;
}
static void __migrate_folio_extract(struct folio *dst,
- int *old_page_state,
- struct anon_vma **anon_vmap)
+ int *old_folio_state, struct anon_vma **anon_vmap)
{
unsigned long private = (unsigned long)dst->private;
- *anon_vmap = (struct anon_vma *)(private & ~PAGE_OLD_STATES);
- *old_page_state = private & PAGE_OLD_STATES;
+ *anon_vmap = (struct anon_vma *)(private & ~FOLIO_OLD_STATES);
+ *old_folio_state = private & FOLIO_OLD_STATES;
dst->private = NULL;
}
@@ -1209,7 +1207,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
{
struct folio *dst;
int rc = -EAGAIN;
- int old_page_state = 0;
+ int old_folio_state = 0;
struct anon_vma *anon_vma = NULL;
bool locked = false;
bool dst_locked = false;
@@ -1253,7 +1251,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
}
locked = true;
if (folio_test_mlocked(src))
- old_page_state |= PAGE_WAS_MLOCKED;
+ old_folio_state |= FOLIO_WAS_MLOCKED;
if (folio_test_writeback(src)) {
/*
@@ -1302,7 +1300,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
dst_locked = true;
if (unlikely(page_has_movable_ops(&src->page))) {
- __migrate_folio_record(dst, old_page_state, anon_vma);
+ __migrate_folio_record(dst, old_folio_state, anon_vma);
return 0;
}
@@ -1328,11 +1326,11 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
VM_BUG_ON_FOLIO(folio_test_anon(src) &&
!folio_test_ksm(src) && !anon_vma, src);
try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0);
- old_page_state |= PAGE_WAS_MAPPED;
+ old_folio_state |= FOLIO_WAS_MAPPED;
}
if (!folio_mapped(src)) {
- __migrate_folio_record(dst, old_page_state, anon_vma);
+ __migrate_folio_record(dst, old_folio_state, anon_vma);
return 0;
}
@@ -1344,7 +1342,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
if (rc == -EAGAIN)
ret = NULL;
- migrate_folio_undo_src(src, old_page_state & PAGE_WAS_MAPPED,
+ migrate_folio_undo_src(src, old_folio_state & FOLIO_WAS_MAPPED,
anon_vma, locked, ret);
migrate_folio_undo_dst(dst, dst_locked, put_new_folio, private);
@@ -1358,13 +1356,13 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
struct list_head *ret)
{
int rc;
- int old_page_state = 0;
+ int old_folio_state = 0;
struct anon_vma *anon_vma = NULL;
bool src_deferred_split = false;
bool src_partially_mapped = false;
struct list_head *prev;
- __migrate_folio_extract(dst, &old_page_state, &anon_vma);
+ __migrate_folio_extract(dst, &old_folio_state, &anon_vma);
prev = dst->lru.prev;
list_del(&dst->lru);
@@ -1404,10 +1402,10 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
* isolated from the unevictable LRU: but this case is the easiest.
*/
folio_add_lru(dst);
- if (old_page_state & PAGE_WAS_MLOCKED)
+ if (old_folio_state & FOLIO_WAS_MLOCKED)
lru_add_drain();
- if (old_page_state & PAGE_WAS_MAPPED)
+ if (old_folio_state & FOLIO_WAS_MAPPED)
remove_migration_ptes(src, dst, 0);
out_unlock_both:
@@ -1439,11 +1437,11 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
*/
if (rc == -EAGAIN) {
list_add(&dst->lru, prev);
- __migrate_folio_record(dst, old_page_state, anon_vma);
+ __migrate_folio_record(dst, old_folio_state, anon_vma);
return rc;
}
- migrate_folio_undo_src(src, old_page_state & PAGE_WAS_MAPPED,
+ migrate_folio_undo_src(src, old_folio_state & FOLIO_WAS_MAPPED,
anon_vma, true, ret);
migrate_folio_undo_dst(dst, true, put_new_folio, private);
@@ -1777,11 +1775,11 @@ static void migrate_folios_undo(struct list_head *src_folios,
dst = list_first_entry(dst_folios, struct folio, lru);
dst2 = list_next_entry(dst, lru);
list_for_each_entry_safe(folio, folio2, src_folios, lru) {
- int old_page_state = 0;
+ int old_folio_state = 0;
struct anon_vma *anon_vma = NULL;
- __migrate_folio_extract(dst, &old_page_state, &anon_vma);
- migrate_folio_undo_src(folio, old_page_state & PAGE_WAS_MAPPED,
+ __migrate_folio_extract(dst, &old_folio_state, &anon_vma);
+ migrate_folio_undo_src(folio, old_folio_state & FOLIO_WAS_MAPPED,
anon_vma, true, ret_folios);
list_del(&dst->lru);
migrate_folio_undo_dst(dst, true, put_new_folio, private);
--
2.43.0
* [PATCH 2/7] mm/migrate: use migrate_info field instead of private
2026-04-28 15:50 [PATCH 0/7] Accelerate page migration with batch copying and hardware offload Shivank Garg
2026-04-28 15:50 ` [PATCH 1/7] mm/migrate: rename PAGE_ migration flags to FOLIO_ Shivank Garg
@ 2026-04-28 15:50 ` Shivank Garg
2026-04-28 15:50 ` [PATCH 3/7] mm/migrate: skip data copy for already-copied folios Shivank Garg
` (6 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Shivank Garg @ 2026-04-28 15:50 UTC (permalink / raw)
To: akpm, david
Cc: kinseyho, weixugc, ljs, Liam.Howlett, vbabka, willy, rppt, surenb,
mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
vkoul, bharata, sj, rientjes, xuezhengchu, yiannis, dave.hansen,
hannes, jhubbard, peterx, riel, shakeel.butt, stalexan, tj,
nifan.cxl, jic23, aneesh.kumar, nathan.lynch, Frank.li, djbw,
linux-kernel, linux-mm, Shivank Garg
Add an unsigned long migrate_info member to the struct folio union and
use it to store migration state (anon_vma pointer and FOLIO_WAS_*
flags) instead of using folio->private.
No functional change.
Suggested-by: David Hildenbrand <david@kernel.org>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
include/linux/mm_types.h | 1 +
mm/migrate.c | 14 +++++++-------
2 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index a308e2c23b82..f52818dcf4d2 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -426,6 +426,7 @@ struct folio {
union {
void *private;
swp_entry_t swap;
+ unsigned long migrate_info;
};
atomic_t _mapcount;
atomic_t _refcount;
diff --git a/mm/migrate.c b/mm/migrate.c
index 0c6a0ab6ecce..03c2a6f7e5e4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1130,7 +1130,7 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
}
/*
- * To record some information during migration, we use unused private
+ * To record some information during migration, we use the migrate_info
* field of struct folio of the newly allocated destination folio.
* This is safe because nobody is using it except us.
*/
@@ -1143,17 +1143,17 @@ enum {
static void __migrate_folio_record(struct folio *dst,
int old_folio_state, struct anon_vma *anon_vma)
{
- dst->private = (void *)anon_vma + old_folio_state;
+ dst->migrate_info = (unsigned long)anon_vma | old_folio_state;
}
static void __migrate_folio_extract(struct folio *dst,
int *old_folio_state, struct anon_vma **anon_vmap)
{
- unsigned long private = (unsigned long)dst->private;
+ unsigned long info = dst->migrate_info;
- *anon_vmap = (struct anon_vma *)(private & ~FOLIO_OLD_STATES);
- *old_folio_state = private & FOLIO_OLD_STATES;
- dst->private = NULL;
+ *anon_vmap = (struct anon_vma *)(info & ~FOLIO_OLD_STATES);
+ *old_folio_state = info & FOLIO_OLD_STATES;
+ dst->migrate_info = 0;
}
/* Restore the source folio to the original state upon failure */
@@ -1217,7 +1217,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
return -ENOMEM;
*dstp = dst;
- dst->private = NULL;
+ dst->migrate_info = 0;
if (!folio_trylock(src)) {
if (mode == MIGRATE_ASYNC)
--
2.43.0
* [PATCH 3/7] mm/migrate: skip data copy for already-copied folios
2026-04-28 15:50 [PATCH 0/7] Accelerate page migration with batch copying and hardware offload Shivank Garg
2026-04-28 15:50 ` [PATCH 1/7] mm/migrate: rename PAGE_ migration flags to FOLIO_ Shivank Garg
2026-04-28 15:50 ` [PATCH 2/7] mm/migrate: use migrate_info field instead of private Shivank Garg
@ 2026-04-28 15:50 ` Shivank Garg
2026-04-28 15:50 ` [PATCH 4/7] mm/migrate: add batch-copy path in migrate_pages_batch Shivank Garg
` (5 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Shivank Garg @ 2026-04-28 15:50 UTC (permalink / raw)
To: akpm, david
Cc: kinseyho, weixugc, ljs, Liam.Howlett, vbabka, willy, rppt, surenb,
mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
vkoul, bharata, sj, rientjes, xuezhengchu, yiannis, dave.hansen,
hannes, jhubbard, peterx, riel, shakeel.butt, stalexan, tj,
nifan.cxl, jic23, aneesh.kumar, nathan.lynch, Frank.li, djbw,
linux-kernel, linux-mm, Shivank Garg
Add a FOLIO_ALREADY_COPIED flag to the dst->migrate_info migration
state. When set, __migrate_folio() skips folio_mc_copy() and performs
a metadata-only migration. All callers currently pass
already_copied=false; the batch-copy path enables it in a subsequent
patch.
Move the dst->migrate_info state enum earlier in the file so
__migrate_folio() and move_to_new_folio() can see FOLIO_ALREADY_COPIED.
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
mm/migrate.c | 53 +++++++++++++++++++++++++++++++---------------------
1 file changed, 32 insertions(+), 21 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 03c2a6f7e5e4..c493e67e359d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -850,6 +850,19 @@ void folio_migrate_flags(struct folio *newfolio, struct folio *folio)
}
EXPORT_SYMBOL(folio_migrate_flags);
+/*
+ * To record some information during migration, we use the migrate_info
+ * field of struct folio of the newly allocated destination folio.
+ * This is safe because nobody is using it except us.
+ */
+enum {
+ FOLIO_WAS_MAPPED = BIT(0),
+ FOLIO_WAS_MLOCKED = BIT(1),
+ FOLIO_ALREADY_COPIED = BIT(2),
+ FOLIO_OLD_STATES = FOLIO_WAS_MAPPED | FOLIO_WAS_MLOCKED |
+ FOLIO_ALREADY_COPIED,
+};
+
/************************************************************
* Migration functions
***********************************************************/
@@ -859,14 +872,20 @@ static int __migrate_folio(struct address_space *mapping, struct folio *dst,
enum migrate_mode mode)
{
int rc, expected_count = folio_expected_ref_count(src) + 1;
+ bool already_copied = (dst->migrate_info & FOLIO_ALREADY_COPIED);
+
+ if (already_copied)
+ dst->migrate_info = 0;
/* Check whether src does not have extra refs before we do more work */
if (folio_ref_count(src) != expected_count)
return -EAGAIN;
- rc = folio_mc_copy(dst, src);
- if (unlikely(rc))
- return rc;
+ if (!already_copied) {
+ rc = folio_mc_copy(dst, src);
+ if (unlikely(rc))
+ return rc;
+ }
rc = __folio_migrate_mapping(mapping, dst, src, expected_count);
if (rc)
@@ -1090,7 +1109,7 @@ static int fallback_migrate_folio(struct address_space *mapping,
* 0 - success
*/
static int move_to_new_folio(struct folio *dst, struct folio *src,
- enum migrate_mode mode)
+ enum migrate_mode mode, bool already_copied)
{
struct address_space *mapping = folio_mapping(src);
int rc = -EAGAIN;
@@ -1098,6 +1117,9 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
VM_BUG_ON_FOLIO(!folio_test_locked(src), src);
VM_BUG_ON_FOLIO(!folio_test_locked(dst), dst);
+ if (already_copied)
+ dst->migrate_info = FOLIO_ALREADY_COPIED;
+
if (!mapping)
rc = migrate_folio(mapping, dst, src, mode);
else if (mapping_inaccessible(mapping))
@@ -1129,17 +1151,6 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
return rc;
}
-/*
- * To record some information during migration, we use the migrate_info
- * field of struct folio of the newly allocated destination folio.
- * This is safe because nobody is using it except us.
- */
-enum {
- FOLIO_WAS_MAPPED = BIT(0),
- FOLIO_WAS_MLOCKED = BIT(1),
- FOLIO_OLD_STATES = FOLIO_WAS_MAPPED | FOLIO_WAS_MLOCKED,
-};
-
static void __migrate_folio_record(struct folio *dst,
int old_folio_state, struct anon_vma *anon_vma)
{
@@ -1353,7 +1364,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
struct folio *src, struct folio *dst,
enum migrate_mode mode, enum migrate_reason reason,
- struct list_head *ret)
+ struct list_head *ret, bool already_copied)
{
int rc;
int old_folio_state = 0;
@@ -1379,7 +1390,7 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
src_partially_mapped = folio_test_partially_mapped(src);
}
- rc = move_to_new_folio(dst, src, mode);
+ rc = move_to_new_folio(dst, src, mode, already_copied);
if (rc)
goto out;
@@ -1536,7 +1547,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
}
if (!folio_mapped(src))
- rc = move_to_new_folio(dst, src, mode);
+ rc = move_to_new_folio(dst, src, mode, false);
if (page_was_mapped)
remove_migration_ptes(src, !rc ? dst : src, ttu);
@@ -1720,7 +1731,7 @@ static void migrate_folios_move(struct list_head *src_folios,
struct list_head *ret_folios,
struct migrate_pages_stats *stats,
int *retry, int *thp_retry, int *nr_failed,
- int *nr_retry_pages)
+ int *nr_retry_pages, bool already_copied)
{
struct folio *folio, *folio2, *dst, *dst2;
bool is_thp;
@@ -1737,7 +1748,7 @@ static void migrate_folios_move(struct list_head *src_folios,
rc = migrate_folio_move(put_new_folio, private,
folio, dst, mode,
- reason, ret_folios);
+ reason, ret_folios, already_copied);
/*
* The rules are:
* 0: folio will be freed
@@ -1994,7 +2005,7 @@ static int migrate_pages_batch(struct list_head *from,
migrate_folios_move(&unmap_folios, &dst_folios,
put_new_folio, private, mode, reason,
ret_folios, stats, &retry, &thp_retry,
- &nr_failed, &nr_retry_pages);
+ &nr_failed, &nr_retry_pages, false);
}
nr_failed += retry;
stats->nr_thp_failed += thp_retry;
--
2.43.0
* [PATCH 4/7] mm/migrate: add batch-copy path in migrate_pages_batch
2026-04-28 15:50 [PATCH 0/7] Accelerate page migration with batch copying and hardware offload Shivank Garg
` (2 preceding siblings ...)
2026-04-28 15:50 ` [PATCH 3/7] mm/migrate: skip data copy for already-copied folios Shivank Garg
@ 2026-04-28 15:50 ` Shivank Garg
2026-04-28 15:50 ` [PATCH 5/7] mm/migrate: add copy offload registration infrastructure Shivank Garg
` (4 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Shivank Garg @ 2026-04-28 15:50 UTC (permalink / raw)
To: akpm, david
Cc: kinseyho, weixugc, ljs, Liam.Howlett, vbabka, willy, rppt, surenb,
mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
vkoul, bharata, sj, rientjes, xuezhengchu, yiannis, dave.hansen,
hannes, jhubbard, peterx, riel, shakeel.butt, stalexan, tj,
nifan.cxl, jic23, aneesh.kumar, nathan.lynch, Frank.li, djbw,
linux-kernel, linux-mm, Shivank Garg
Add folios_mc_copy(), which walks the src and dst folio lists in
lockstep and copies folio contents via folio_mc_copy(). The folios_cnt
parameter is unused here, but is part of the offload_copy callback
signature used by later patches in the series.
Split unmapped folios into batch-eligible (unmap_batch/dst_batch) and
standard (unmap_single/dst_single) lists, gated by the
migrate_offload_enabled static branch, which is off by default: when
no offload driver is active, the branch is never taken and everything
goes through the standard path.
After TLB flush, batch copy the eligible folios via folios_mc_copy()
and pass already_copied=true into migrate_folios_move() so
__migrate_folio() skips the per-folio copy.
On batch-copy failure, the already_copied flag stays false and each
folio falls back to an individual copy.
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
include/linux/mm.h | 2 ++
mm/migrate.c | 61 +++++++++++++++++++++++++++++++++++-----------
mm/util.c | 30 +++++++++++++++++++++++
3 files changed, 79 insertions(+), 14 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0b776907152e..e6ab9bc3de8f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1917,6 +1917,8 @@ void __folio_put(struct folio *folio);
void split_page(struct page *page, unsigned int order);
void folio_copy(struct folio *dst, struct folio *src);
int folio_mc_copy(struct folio *dst, struct folio *src);
+int folios_mc_copy(struct list_head *dst_list, struct list_head *src_list,
+ unsigned int __always_unused folios_cnt);
unsigned long nr_free_buffer_pages(void);
diff --git a/mm/migrate.c b/mm/migrate.c
index c493e67e359d..6c2f1cb66f96 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -43,6 +43,7 @@
#include <linux/sched/sysctl.h>
#include <linux/memory-tiers.h>
#include <linux/pagewalk.h>
+#include <linux/jump_label.h>
#include <asm/tlbflush.h>
@@ -51,6 +52,8 @@
#include "internal.h"
#include "swap.h"
+DEFINE_STATIC_KEY_FALSE(migrate_offload_enabled);
+
static const struct movable_operations *offline_movable_ops;
static const struct movable_operations *zsmalloc_movable_ops;
@@ -1724,6 +1727,12 @@ static int migrate_hugetlbs(struct list_head *from, new_folio_t get_new_folio,
return nr_failed;
}
+/* movable_ops folios have their own migrate path */
+static bool folio_supports_batch_copy(struct folio *folio)
+{
+ return likely(!page_has_movable_ops(&folio->page));
+}
+
static void migrate_folios_move(struct list_head *src_folios,
struct list_head *dst_folios,
free_folio_t put_new_folio, unsigned long private,
@@ -1752,7 +1761,7 @@ static void migrate_folios_move(struct list_head *src_folios,
/*
* The rules are:
* 0: folio will be freed
- * -EAGAIN: stay on the unmap_folios list
+ * -EAGAIN: stay on the src_folios list
* Other errno: put on ret_folios list
*/
switch (rc) {
@@ -1823,8 +1832,12 @@ static int migrate_pages_batch(struct list_head *from,
bool is_large = false;
struct folio *folio, *folio2, *dst = NULL;
int rc, rc_saved = 0, nr_pages;
- LIST_HEAD(unmap_folios);
- LIST_HEAD(dst_folios);
+ unsigned int nr_batch = 0;
+ bool batch_copied = false;
+ LIST_HEAD(unmap_batch);
+ LIST_HEAD(dst_batch);
+ LIST_HEAD(unmap_single);
+ LIST_HEAD(dst_single);
bool nosplit = (reason == MR_NUMA_MISPLACED);
VM_WARN_ON_ONCE(mode != MIGRATE_ASYNC &&
@@ -1919,8 +1932,8 @@ static int migrate_pages_batch(struct list_head *from,
private, folio, &dst, mode, ret_folios);
/*
* The rules are:
- * 0: folio will be put on unmap_folios list,
- * dst folio put on dst_folios list
+ * 0: folio put on unmap_batch or unmap_single,
+ * dst folio put on dst_batch or dst_single
* -EAGAIN: stay on the from list
* -ENOMEM: stay on the from list
* Other errno: put on ret_folios list
@@ -1961,7 +1974,7 @@ static int migrate_pages_batch(struct list_head *from,
/* nr_failed isn't updated for not used */
stats->nr_thp_failed += thp_retry;
rc_saved = rc;
- if (list_empty(&unmap_folios))
+ if (list_empty(&unmap_batch) && list_empty(&unmap_single))
goto out;
else
goto move;
@@ -1971,8 +1984,15 @@ static int migrate_pages_batch(struct list_head *from,
nr_retry_pages += nr_pages;
break;
case 0:
- list_move_tail(&folio->lru, &unmap_folios);
- list_add_tail(&dst->lru, &dst_folios);
+ if (static_branch_unlikely(&migrate_offload_enabled) &&
+ folio_supports_batch_copy(folio)) {
+ list_move_tail(&folio->lru, &unmap_batch);
+ list_add_tail(&dst->lru, &dst_batch);
+ nr_batch++;
+ } else {
+ list_move_tail(&folio->lru, &unmap_single);
+ list_add_tail(&dst->lru, &dst_single);
+ }
break;
default:
/*
@@ -1995,17 +2015,28 @@ static int migrate_pages_batch(struct list_head *from,
/* Flush TLBs for all unmapped folios */
try_to_unmap_flush();
+ /* Batch-copy eligible folios before the move phase */
+ if (!list_empty(&unmap_batch)) {
+ rc = folios_mc_copy(&dst_batch, &unmap_batch, nr_batch);
+ batch_copied = (rc == 0);
+ }
+
retry = 1;
for (pass = 0; pass < nr_pass && retry; pass++) {
retry = 0;
thp_retry = 0;
nr_retry_pages = 0;
- /* Move the unmapped folios */
- migrate_folios_move(&unmap_folios, &dst_folios,
- put_new_folio, private, mode, reason,
- ret_folios, stats, &retry, &thp_retry,
- &nr_failed, &nr_retry_pages, false);
+ if (!list_empty(&unmap_batch))
+ migrate_folios_move(&unmap_batch, &dst_batch, put_new_folio,
+ private, mode, reason, ret_folios, stats,
+ &retry, &thp_retry, &nr_failed,
+ &nr_retry_pages, batch_copied);
+ if (!list_empty(&unmap_single))
+ migrate_folios_move(&unmap_single, &dst_single, put_new_folio,
+ private, mode, reason, ret_folios, stats,
+ &retry, &thp_retry, &nr_failed,
+ &nr_retry_pages, false);
}
nr_failed += retry;
stats->nr_thp_failed += thp_retry;
@@ -2014,7 +2045,9 @@ static int migrate_pages_batch(struct list_head *from,
rc = rc_saved ? : nr_failed;
out:
/* Cleanup remaining folios */
- migrate_folios_undo(&unmap_folios, &dst_folios,
+ migrate_folios_undo(&unmap_batch, &dst_batch,
+ put_new_folio, private, ret_folios);
+ migrate_folios_undo(&unmap_single, &dst_single,
put_new_folio, private, ret_folios);
return rc;
diff --git a/mm/util.c b/mm/util.c
index 232c3930a662..77eeb285def1 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -778,6 +778,36 @@ int folio_mc_copy(struct folio *dst, struct folio *src)
}
EXPORT_SYMBOL(folio_mc_copy);
+/**
+ * folios_mc_copy - Copy the contents of list of folios.
+ * @dst_list: destination folio list.
+ * @src_list: source folio list.
+ * @folios_cnt: unused here, present for callback signature compatibility.
+ *
+ * Walks the src and dst folio lists in lockstep and copies folio
+ * contents via folio_mc_copy(). The caller must ensure both lists have
+ * the same number of entries. This may sleep.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int folios_mc_copy(struct list_head *dst_list, struct list_head *src_list,
+ unsigned int __always_unused folios_cnt)
+{
+ struct folio *src, *dst;
+ int ret;
+
+ dst = list_first_entry(dst_list, struct folio, lru);
+ list_for_each_entry(src, src_list, lru) {
+ ret = folio_mc_copy(dst, src);
+ if (ret)
+ return ret;
+ dst = list_next_entry(dst, lru);
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(folios_mc_copy);
+
int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS;
static int sysctl_overcommit_ratio __read_mostly = 50;
static unsigned long sysctl_overcommit_kbytes __read_mostly;
--
2.43.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 5/7] mm/migrate: add copy offload registration infrastructure
2026-04-28 15:50 [PATCH 0/7] Accelerate page migration with batch copying and hardware offload Shivank Garg
` (3 preceding siblings ...)
2026-04-28 15:50 ` [PATCH 4/7] mm/migrate: add batch-copy path in migrate_pages_batch Shivank Garg
@ 2026-04-28 15:50 ` Shivank Garg
2026-04-28 15:50 ` [PATCH 6/7] drivers/migrate_offload: add DMA batch copy driver (dcbm) Shivank Garg
` (3 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Shivank Garg @ 2026-04-28 15:50 UTC (permalink / raw)
To: akpm, david
Cc: kinseyho, weixugc, ljs, Liam.Howlett, vbabka, willy, rppt, surenb,
mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
vkoul, bharata, sj, rientjes, xuezhengchu, yiannis, dave.hansen,
hannes, jhubbard, peterx, riel, shakeel.butt, stalexan, tj,
nifan.cxl, jic23, aneesh.kumar, nathan.lynch, Frank.li, djbw,
linux-kernel, linux-mm, Shivank Garg, Mike Day
Add a registration interface that lets a single offload provider
(DMA, multi-threaded CPU copy, etc.) take over the batch folio copy
performed by migrate_pages_batch().
The provider fills in a struct migrator with an offload_copy()
callback and calls migrate_offload_register(). Registration patches
the migrate_offload_copy() static_call and flips the
migrate_offload_enabled static branch. migrate_offload_unregister()
reverts both.
Whether a migration reason is batch-copy eligible is decided by the
core in migrate_offload_do_batch(). A migrator may decline a particular
batch (e.g. when nr_batch is too small to amortize setup) by returning
-EOPNOTSUPP, and the move phase falls back to per-folio CPU copy.
Only one migrator can be active at a time. A second registration
returns -EBUSY, and only the active migrator can unregister itself.
The static_call dispatch is protected by SRCU so that the
synchronize_srcu() in unregister waits for all in-flight copies to
finish before the module reference is dropped.
Co-developed-by: Mike Day <michael.day@amd.com>
Signed-off-by: Mike Day <michael.day@amd.com>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
include/linux/migrate_copy_offload.h | 44 +++++++++++++
mm/Kconfig | 6 ++
mm/Makefile | 1 +
mm/migrate.c | 57 +++++++++++++++--
mm/migrate_copy_offload.c | 94 ++++++++++++++++++++++++++++
5 files changed, 198 insertions(+), 4 deletions(-)
create mode 100644 include/linux/migrate_copy_offload.h
create mode 100644 mm/migrate_copy_offload.c
diff --git a/include/linux/migrate_copy_offload.h b/include/linux/migrate_copy_offload.h
new file mode 100644
index 000000000000..d68b10a84743
--- /dev/null
+++ b/include/linux/migrate_copy_offload.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_MIGRATE_COPY_OFFLOAD_H
+#define _LINUX_MIGRATE_COPY_OFFLOAD_H
+
+#include <linux/errno.h>
+#include <linux/jump_label.h>
+#include <linux/srcu.h>
+#include <linux/types.h>
+
+struct list_head;
+struct module;
+
+#define MIGRATOR_NAME_LEN 32
+
+/**
+ * struct migrator - batch-copy provider for page migration.
+ * @name: name of the provider.
+ * @offload_copy: copy @folio_cnt folios from @src_list to @dst_list.
+ *
+ * The migrator may inspect @folio_cnt to decide whether the batch
+ * is worth offloading, e.g. skip when the batch is too small to
+ * amortize setup cost. If it returns an error, the core falls back to CPU copy.
+ *
+ * @owner: module providing the migrator.
+ */
+struct migrator {
+ char name[MIGRATOR_NAME_LEN];
+ int (*offload_copy)(struct list_head *dst_list,
+ struct list_head *src_list,
+ unsigned int folio_cnt);
+ struct module *owner;
+};
+
+#ifdef CONFIG_MIGRATION_COPY_OFFLOAD
+extern struct static_key_false migrate_offload_enabled;
+extern struct srcu_struct migrate_offload_srcu;
+int migrate_offload_register(struct migrator *m);
+int migrate_offload_unregister(struct migrator *m);
+#else
+static inline int migrate_offload_register(struct migrator *m) { return -EOPNOTSUPP; }
+static inline int migrate_offload_unregister(struct migrator *m) { return -EOPNOTSUPP; }
+#endif
+
+#endif /* _LINUX_MIGRATE_COPY_OFFLOAD_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index e8bf1e9e6ad9..325d79619680 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -647,6 +647,12 @@ config MIGRATION
config DEVICE_MIGRATION
def_bool MIGRATION && ZONE_DEVICE
+# Page-migration batch-copy offload infrastructure.
+# Selected by migrator drivers (e.g. CONFIG_DCBM_DMA).
+config MIGRATION_COPY_OFFLOAD
+ bool
+ depends on MIGRATION
+
config ARCH_ENABLE_HUGEPAGE_MIGRATION
bool
diff --git a/mm/Makefile b/mm/Makefile
index 8ad2ab08244e..db1ac8097089 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -96,6 +96,7 @@ obj-$(CONFIG_FAILSLAB) += failslab.o
obj-$(CONFIG_FAIL_PAGE_ALLOC) += fail_page_alloc.o
obj-$(CONFIG_MEMTEST) += memtest.o
obj-$(CONFIG_MIGRATION) += migrate.o
+obj-$(CONFIG_MIGRATION_COPY_OFFLOAD) += migrate_copy_offload.o
obj-$(CONFIG_NUMA) += memory-tiers.o
obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o
obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
diff --git a/mm/migrate.c b/mm/migrate.c
index 6c2f1cb66f96..9af070f9a1f2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -44,6 +44,8 @@
#include <linux/memory-tiers.h>
#include <linux/pagewalk.h>
#include <linux/jump_label.h>
+#include <linux/static_call.h>
+#include <linux/migrate_copy_offload.h>
#include <asm/tlbflush.h>
@@ -54,6 +56,51 @@
DEFINE_STATIC_KEY_FALSE(migrate_offload_enabled);
+#ifdef CONFIG_MIGRATION_COPY_OFFLOAD
+DEFINE_SRCU(migrate_offload_srcu);
+DEFINE_STATIC_CALL(migrate_offload_copy, folios_mc_copy);
+
+static bool migrate_offload_do_batch(int reason)
+{
+ if (!static_branch_unlikely(&migrate_offload_enabled))
+ return false;
+
+ switch (reason) {
+ case MR_COMPACTION:
+ case MR_SYSCALL:
+ case MR_DEMOTION:
+ case MR_NUMA_MISPLACED:
+ return true;
+ default:
+ return false;
+ }
+}
+
+static int migrate_offload_batch_copy(struct list_head *dst_batch,
+ struct list_head *src_batch,
+ unsigned int nr_batch)
+{
+ int idx, rc;
+
+ idx = srcu_read_lock(&migrate_offload_srcu);
+ rc = static_call(migrate_offload_copy)(dst_batch, src_batch, nr_batch);
+ srcu_read_unlock(&migrate_offload_srcu, idx);
+ return rc;
+}
+#else
+static bool migrate_offload_do_batch(int reason)
+{
+ return false;
+}
+
+static int migrate_offload_batch_copy(struct list_head *dst_batch,
+ struct list_head *src_batch,
+ unsigned int nr_batch)
+{
+ return -EOPNOTSUPP;
+}
+#endif
+
static const struct movable_operations *offline_movable_ops;
static const struct movable_operations *zsmalloc_movable_ops;
@@ -1833,7 +1880,7 @@ static int migrate_pages_batch(struct list_head *from,
struct folio *folio, *folio2, *dst = NULL;
int rc, rc_saved = 0, nr_pages;
unsigned int nr_batch = 0;
- bool batch_copied = false;
+ bool do_batch = false, batch_copied = false;
LIST_HEAD(unmap_batch);
LIST_HEAD(dst_batch);
LIST_HEAD(unmap_single);
@@ -1843,6 +1890,8 @@ static int migrate_pages_batch(struct list_head *from,
VM_WARN_ON_ONCE(mode != MIGRATE_ASYNC &&
!list_empty(from) && !list_is_singular(from));
+ do_batch = migrate_offload_do_batch(reason);
+
for (pass = 0; pass < nr_pass && retry; pass++) {
retry = 0;
thp_retry = 0;
@@ -1984,8 +2033,7 @@ static int migrate_pages_batch(struct list_head *from,
nr_retry_pages += nr_pages;
break;
case 0:
- if (static_branch_unlikely(&migrate_offload_enabled) &&
- folio_supports_batch_copy(folio)) {
+ if (do_batch && folio_supports_batch_copy(folio)) {
list_move_tail(&folio->lru, &unmap_batch);
list_add_tail(&dst->lru, &dst_batch);
nr_batch++;
@@ -2017,7 +2065,8 @@ static int migrate_pages_batch(struct list_head *from,
/* Batch-copy eligible folios before the move phase */
if (!list_empty(&unmap_batch)) {
- rc = folios_mc_copy(&dst_batch, &unmap_batch, nr_batch);
+ rc = migrate_offload_batch_copy(&dst_batch, &unmap_batch,
+ nr_batch);
batch_copied = (rc == 0);
}
diff --git a/mm/migrate_copy_offload.c b/mm/migrate_copy_offload.c
new file mode 100644
index 000000000000..6f837c725239
--- /dev/null
+++ b/mm/migrate_copy_offload.c
@@ -0,0 +1,94 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/jump_label.h>
+#include <linux/module.h>
+#include <linux/srcu.h>
+#include <linux/migrate.h>
+#include <linux/migrate_copy_offload.h>
+#include <linux/static_call.h>
+
+static DEFINE_MUTEX(migrator_mutex);
+static struct migrator *active_migrator;
+
+DECLARE_STATIC_CALL(migrate_offload_copy, folios_mc_copy);
+
+/**
+ * migrate_offload_register - register a batch-copy provider for page migration.
+ * @m: migrator to install.
+ *
+ * Only one provider can be active at a time; returns -EBUSY if another migrator
+ * is already registered.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int migrate_offload_register(struct migrator *m)
+{
+ int ret = 0;
+
+ if (!m || !m->offload_copy || !m->owner)
+ return -EINVAL;
+
+ mutex_lock(&migrator_mutex);
+ if (active_migrator) {
+ ret = -EBUSY;
+ goto unlock;
+ }
+
+ if (!try_module_get(m->owner)) {
+ ret = -ENODEV;
+ goto unlock;
+ }
+
+ static_call_update(migrate_offload_copy, m->offload_copy);
+ active_migrator = m;
+ static_branch_enable(&migrate_offload_enabled);
+
+unlock:
+ mutex_unlock(&migrator_mutex);
+
+ if (ret)
+ pr_err("migrate_offload: %s: failed to register (%d)\n",
+ m->name, ret);
+ else
+ pr_info("migrate_offload: enabled by %s\n", m->name);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(migrate_offload_register);
+
+/**
+ * migrate_offload_unregister - unregister the active batch-copy provider.
+ * @m: migrator to remove (must be the currently active one).
+ *
+ * Reverts the static_call target and waits for an SRCU grace period so
+ * that no in-flight migration is still calling the driver functions
+ * before the module reference is released.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int migrate_offload_unregister(struct migrator *m)
+{
+ struct module *owner;
+
+ mutex_lock(&migrator_mutex);
+ if (active_migrator != m) {
+ mutex_unlock(&migrator_mutex);
+ return -EINVAL;
+ }
+
+ /*
+ * Disable the static branch first so new migrate_pages_batch calls
+ * won't enter the batch copy path.
+ */
+ static_branch_disable(&migrate_offload_enabled);
+ static_call_update(migrate_offload_copy, folios_mc_copy);
+ owner = active_migrator->owner;
+ active_migrator = NULL;
+ mutex_unlock(&migrator_mutex);
+
+ /* Wait for all in-flight callers to finish before module_put(). */
+ synchronize_srcu(&migrate_offload_srcu);
+ module_put(owner);
+
+ pr_info("migrate_offload: disabled by %s\n", m->name);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(migrate_offload_unregister);
--
2.43.0
* [PATCH 6/7] drivers/migrate_offload: add DMA batch copy driver (dcbm)
2026-04-28 15:50 [PATCH 0/7] Accelerate page migration with batch copying and hardware offload Shivank Garg
` (4 preceding siblings ...)
2026-04-28 15:50 ` [PATCH 5/7] mm/migrate: add copy offload registration infrastructure Shivank Garg
@ 2026-04-28 15:50 ` Shivank Garg
2026-04-28 15:50 ` [PATCH 7/7] mm/migrate: adjust NR_MAX_BATCHED_MIGRATION for testing Shivank Garg
` (2 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Shivank Garg @ 2026-04-28 15:50 UTC (permalink / raw)
To: akpm, david
Cc: kinseyho, weixugc, ljs, Liam.Howlett, vbabka, willy, rppt, surenb,
mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
vkoul, bharata, sj, rientjes, xuezhengchu, yiannis, dave.hansen,
hannes, jhubbard, peterx, riel, shakeel.butt, stalexan, tj,
nifan.cxl, jic23, aneesh.kumar, nathan.lynch, Frank.li, djbw,
linux-kernel, linux-mm, Shivank Garg
Add a simple DMAEngine-based migrator that plugs into the page
migration copy offload infrastructure to batch-copy folios via DMA
memcpy channels. It is intended for testing the offload plumbing and
as a template for future migrators (SDXI, multi-threaded CPU copy, etc.).
When DMA fails, the callback returns an error and the migration path
falls back to per-folio CPU copy.
Loading the module exposes attributes under /sys/module/dcbm/:
offloading - enable/disable DMA offload
nr_dma_chan - max DMA channels to use
folios_migrated - folios copied via DMA
folios_failures - fallback count
CONFIG_DCBM_DMA selects MIGRATION_COPY_OFFLOAD so enabling the
driver pulls in the infrastructure automatically.
Channel acquisition uses dma_request_chan_by_mask(DMA_MEMCPY), which
works for providers that set DMA_PRIVATE (e.g. AMD PTDMA). Generic
mem-to-mem engines that do not set DMA_PRIVATE (e.g. SDXI) should
acquire channels via dma_find_channel(DMA_MEMCPY) or the async_tx
APIs, which can be added in a follow-up.
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
drivers/Kconfig | 2 +
drivers/Makefile | 2 +
drivers/migrate_offload/Kconfig | 9 +
drivers/migrate_offload/Makefile | 1 +
drivers/migrate_offload/dcbm/Makefile | 1 +
drivers/migrate_offload/dcbm/dcbm.c | 440 ++++++++++++++++++++++++++
6 files changed, 455 insertions(+)
create mode 100644 drivers/migrate_offload/Kconfig
create mode 100644 drivers/migrate_offload/Makefile
create mode 100644 drivers/migrate_offload/dcbm/Makefile
create mode 100644 drivers/migrate_offload/dcbm/dcbm.c
diff --git a/drivers/Kconfig b/drivers/Kconfig
index f2bed2ddeb66..3e83a1475cbc 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -253,4 +253,6 @@ source "drivers/cdx/Kconfig"
source "drivers/resctrl/Kconfig"
+source "drivers/migrate_offload/Kconfig"
+
endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 0841ea851847..88cb8e3e88df 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -42,6 +42,8 @@ obj-y += clk/
# really early.
obj-$(CONFIG_DMADEVICES) += dma/
+obj-$(CONFIG_MIGRATION_COPY_OFFLOAD) += migrate_offload/
+
# SOC specific infrastructure drivers.
obj-y += soc/
obj-$(CONFIG_PM_GENERIC_DOMAINS) += pmdomain/
diff --git a/drivers/migrate_offload/Kconfig b/drivers/migrate_offload/Kconfig
new file mode 100644
index 000000000000..930d8605c15d
--- /dev/null
+++ b/drivers/migrate_offload/Kconfig
@@ -0,0 +1,9 @@
+config DCBM_DMA
+ tristate "DMA Core Batch Migrator"
+ depends on MIGRATION && DMA_ENGINE
+ select MIGRATION_COPY_OFFLOAD
+ help
+ DMA-based batch copy engine for page migration. Uses
+ DMAEngine memcpy channels to offload folio data copies
+ during migration. Primarily intended for testing the copy
+ offload infrastructure.
diff --git a/drivers/migrate_offload/Makefile b/drivers/migrate_offload/Makefile
new file mode 100644
index 000000000000..9e16018beb15
--- /dev/null
+++ b/drivers/migrate_offload/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_DCBM_DMA) += dcbm/
diff --git a/drivers/migrate_offload/dcbm/Makefile b/drivers/migrate_offload/dcbm/Makefile
new file mode 100644
index 000000000000..56ba47cce0f1
--- /dev/null
+++ b/drivers/migrate_offload/dcbm/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_DCBM_DMA) += dcbm.o
diff --git a/drivers/migrate_offload/dcbm/dcbm.c b/drivers/migrate_offload/dcbm/dcbm.c
new file mode 100644
index 000000000000..893580cb9fac
--- /dev/null
+++ b/drivers/migrate_offload/dcbm/dcbm.c
@@ -0,0 +1,440 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * DMA Core Batch Migrator (DCBM)
+ *
+ * Uses DMAEngine memcpy channels to offload batch folio copies during
+ * page migration. Reference driver meant for testing the offload
+ * infrastructure.
+ *
+ * Copyright (C) 2024-26 Advanced Micro Devices, Inc.
+ */
+
+#include <linux/module.h>
+#include <linux/dma-mapping.h>
+#include <linux/dmaengine.h>
+#include <linux/migrate.h>
+#include <linux/migrate_copy_offload.h>
+
+#define MAX_DMA_CHANNELS 16
+
+static atomic_long_t folios_migrated;
+static atomic_long_t folios_failures;
+
+static bool offloading_enabled;
+static unsigned int nr_dma_channels = 1;
+static DEFINE_MUTEX(dcbm_mutex);
+
+struct dma_work {
+ struct dma_chan *chan;
+ struct completion done;
+ atomic_t pending;
+ struct sg_table *src_sgt;
+ struct sg_table *dst_sgt;
+ bool mapped;
+};
+
+static void dma_completion_callback(void *data)
+{
+ struct dma_work *work = data;
+
+ if (atomic_dec_and_test(&work->pending))
+ complete(&work->done);
+}
+
+static int setup_sg_tables(struct dma_work *work, struct list_head **src_pos,
+ struct list_head **dst_pos, int nr)
+{
+ struct scatterlist *sg_src, *sg_dst;
+ struct device *dev;
+ int i, ret;
+
+ work->src_sgt = kmalloc_obj(*work->src_sgt, GFP_KERNEL);
+ if (!work->src_sgt)
+ return -ENOMEM;
+ work->dst_sgt = kmalloc_obj(*work->dst_sgt, GFP_KERNEL);
+ if (!work->dst_sgt) {
+ ret = -ENOMEM;
+ goto err_free_src;
+ }
+
+ ret = sg_alloc_table(work->src_sgt, nr, GFP_KERNEL);
+ if (ret)
+ goto err_free_dst;
+ ret = sg_alloc_table(work->dst_sgt, nr, GFP_KERNEL);
+ if (ret)
+ goto err_free_src_table;
+
+ sg_src = work->src_sgt->sgl;
+ sg_dst = work->dst_sgt->sgl;
+ for (i = 0; i < nr; i++) {
+ struct folio *src = list_entry(*src_pos, struct folio, lru);
+ struct folio *dst = list_entry(*dst_pos, struct folio, lru);
+
+ sg_set_folio(sg_src, src, folio_size(src), 0);
+ sg_set_folio(sg_dst, dst, folio_size(dst), 0);
+
+ *src_pos = (*src_pos)->next;
+ *dst_pos = (*dst_pos)->next;
+
+ if (i < nr - 1) {
+ sg_src = sg_next(sg_src);
+ sg_dst = sg_next(sg_dst);
+ }
+ }
+
+ dev = dmaengine_get_dma_device(work->chan);
+ if (!dev) {
+ ret = -ENODEV;
+ goto err_free_dst_table;
+ }
+ ret = dma_map_sgtable(dev, work->src_sgt, DMA_TO_DEVICE,
+ DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING);
+ if (ret)
+ goto err_free_dst_table;
+ ret = dma_map_sgtable(dev, work->dst_sgt, DMA_FROM_DEVICE,
+ DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING);
+ if (ret)
+ goto err_unmap_src;
+
+ /*
+	 * TODO: the IOMMU may merge segments unevenly on the two sides; bail
+	 * out to CPU copy in that case. Merging has not been observed in
+	 * testing; handling unequal nents is left for a follow-up.
+ */
+ if (work->src_sgt->nents != work->dst_sgt->nents) {
+ ret = -EINVAL;
+ goto err_unmap_dst;
+ }
+ work->mapped = true;
+ return 0;
+
+err_unmap_dst:
+ dma_unmap_sgtable(dev, work->dst_sgt, DMA_FROM_DEVICE,
+ DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING);
+err_unmap_src:
+ dma_unmap_sgtable(dev, work->src_sgt, DMA_TO_DEVICE,
+ DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING);
+err_free_dst_table:
+ sg_free_table(work->dst_sgt);
+err_free_src_table:
+ sg_free_table(work->src_sgt);
+err_free_dst:
+ kfree(work->dst_sgt);
+ work->dst_sgt = NULL;
+err_free_src:
+ kfree(work->src_sgt);
+ work->src_sgt = NULL;
+ return ret;
+}
+
+static void cleanup_dma_work(struct dma_work *works, int actual_channels)
+{
+ struct device *dev;
+ int i;
+
+ if (!works)
+ return;
+
+ for (i = 0; i < actual_channels; i++) {
+ if (!works[i].chan)
+ continue;
+
+ dev = dmaengine_get_dma_device(works[i].chan);
+
+ if (works[i].mapped)
+ dmaengine_terminate_sync(works[i].chan);
+
+ if (dev && works[i].mapped) {
+ if (works[i].src_sgt) {
+ dma_unmap_sgtable(dev, works[i].src_sgt,
+ DMA_TO_DEVICE,
+ DMA_ATTR_SKIP_CPU_SYNC |
+ DMA_ATTR_NO_KERNEL_MAPPING);
+ sg_free_table(works[i].src_sgt);
+ kfree(works[i].src_sgt);
+ }
+ if (works[i].dst_sgt) {
+ dma_unmap_sgtable(dev, works[i].dst_sgt,
+ DMA_FROM_DEVICE,
+ DMA_ATTR_SKIP_CPU_SYNC |
+ DMA_ATTR_NO_KERNEL_MAPPING);
+ sg_free_table(works[i].dst_sgt);
+ kfree(works[i].dst_sgt);
+ }
+ }
+ dma_release_channel(works[i].chan);
+ }
+ kfree(works);
+}
+
+static int submit_dma_transfers(struct dma_work *work)
+{
+ struct scatterlist *sg_src, *sg_dst;
+ struct dma_async_tx_descriptor *tx;
+ unsigned long flags = DMA_CTRL_ACK;
+ dma_cookie_t cookie;
+ int i;
+
+ atomic_set(&work->pending, 1);
+
+ sg_src = work->src_sgt->sgl;
+ sg_dst = work->dst_sgt->sgl;
+ for_each_sgtable_dma_sg(work->src_sgt, sg_src, i) {
+ if (i == work->src_sgt->nents - 1)
+ flags |= DMA_PREP_INTERRUPT;
+
+ tx = dmaengine_prep_dma_memcpy(work->chan,
+ sg_dma_address(sg_dst),
+ sg_dma_address(sg_src),
+ sg_dma_len(sg_src), flags);
+ if (!tx) {
+ atomic_set(&work->pending, 0);
+ return -EIO;
+ }
+
+ if (i == work->src_sgt->nents - 1) {
+ tx->callback = dma_completion_callback;
+ tx->callback_param = work;
+ }
+
+ cookie = dmaengine_submit(tx);
+ if (dma_submit_error(cookie)) {
+ atomic_set(&work->pending, 0);
+ return -EIO;
+ }
+ sg_dst = sg_next(sg_dst);
+ }
+ return 0;
+}
+
+/**
+ * folios_copy_dma - copy a batch of folios via DMA memcpy
+ * @dst_list: destination folio list
+ * @src_list: source folio list
+ * @nr_folios: number of folios in each list
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+static int folios_copy_dma(struct list_head *dst_list,
+ struct list_head *src_list, unsigned int nr_folios)
+{
+ struct dma_work *works;
+ struct list_head *src_pos = src_list->next;
+ struct list_head *dst_pos = dst_list->next;
+ int i, folios_per_chan, ret;
+ dma_cap_mask_t mask;
+ int actual_channels = 0;
+ unsigned int max_channels;
+
+ max_channels = min3(nr_dma_channels, nr_folios,
+ (unsigned int)MAX_DMA_CHANNELS);
+
+ works = kcalloc(max_channels, sizeof(*works), GFP_KERNEL);
+ if (!works)
+ return -ENOMEM;
+
+ dma_cap_zero(mask);
+ dma_cap_set(DMA_MEMCPY, mask);
+
+ for (i = 0; i < max_channels; i++) {
+ works[actual_channels].chan = dma_request_chan_by_mask(&mask);
+ if (IS_ERR(works[actual_channels].chan))
+ break;
+ init_completion(&works[actual_channels].done);
+ actual_channels++;
+ }
+
+ if (actual_channels == 0) {
+ kfree(works);
+ return -ENODEV;
+ }
+
+ for (i = 0; i < actual_channels; i++) {
+ folios_per_chan = nr_folios * (i + 1) / actual_channels -
+ (nr_folios * i) / actual_channels;
+ if (folios_per_chan == 0)
+ continue;
+
+ ret = setup_sg_tables(&works[i], &src_pos, &dst_pos,
+ folios_per_chan);
+ if (ret)
+ goto err_cleanup;
+ }
+
+ for (i = 0; i < actual_channels; i++) {
+ ret = submit_dma_transfers(&works[i]);
+ if (ret)
+ goto err_cleanup;
+ }
+
+ for (i = 0; i < actual_channels; i++) {
+ if (atomic_read(&works[i].pending) > 0)
+ dma_async_issue_pending(works[i].chan);
+ }
+
+ for (i = 0; i < actual_channels; i++) {
+ if (atomic_read(&works[i].pending) == 0)
+ continue;
+ if (!wait_for_completion_timeout(&works[i].done,
+ msecs_to_jiffies(10000))) {
+ ret = -ETIMEDOUT;
+ goto err_cleanup;
+ }
+ }
+
+ cleanup_dma_work(works, actual_channels);
+
+ atomic_long_add(nr_folios, &folios_migrated);
+ return 0;
+
+err_cleanup:
+ pr_warn_ratelimited("dcbm: DMA copy failed (%d), falling back to CPU\n",
+ ret);
+ cleanup_dma_work(works, actual_channels);
+
+ atomic_long_add(nr_folios, &folios_failures);
+ return ret;
+}
+
+static struct migrator dma_migrator = {
+ .name = "DCBM",
+ .offload_copy = folios_copy_dma,
+ .owner = THIS_MODULE,
+};
+
+static ssize_t offloading_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ return sysfs_emit(buf, "%d\n", offloading_enabled);
+}
+
+static ssize_t offloading_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ bool enable;
+ int ret;
+
+ ret = kstrtobool(buf, &enable);
+ if (ret)
+ return ret;
+
+ mutex_lock(&dcbm_mutex);
+
+ if (enable == offloading_enabled)
+ goto out;
+
+ if (enable) {
+ ret = migrate_offload_register(&dma_migrator);
+ if (ret) {
+ mutex_unlock(&dcbm_mutex);
+ return ret;
+ }
+ offloading_enabled = true;
+ } else {
+ migrate_offload_unregister(&dma_migrator);
+ offloading_enabled = false;
+ }
+out:
+ mutex_unlock(&dcbm_mutex);
+ return count;
+}
+
+static ssize_t folios_migrated_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ return sysfs_emit(buf, "%lu\n", atomic_long_read(&folios_migrated));
+}
+
+static ssize_t folios_migrated_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ atomic_long_set(&folios_migrated, 0);
+ return count;
+}
+
+static ssize_t folios_failures_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ return sysfs_emit(buf, "%lu\n", atomic_long_read(&folios_failures));
+}
+
+static ssize_t folios_failures_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ atomic_long_set(&folios_failures, 0);
+ return count;
+}
+
+static ssize_t nr_dma_chan_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ return sysfs_emit(buf, "%u\n", nr_dma_channels);
+}
+
+static ssize_t nr_dma_chan_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ unsigned int val;
+ int ret;
+
+ ret = kstrtouint(buf, 0, &val);
+ if (ret)
+ return ret;
+
+ if (val < 1 || val > MAX_DMA_CHANNELS)
+ return -EINVAL;
+
+ mutex_lock(&dcbm_mutex);
+ nr_dma_channels = val;
+ mutex_unlock(&dcbm_mutex);
+ return count;
+}
+
+static struct kobj_attribute offloading_attr = __ATTR_RW(offloading);
+static struct kobj_attribute nr_dma_chan_attr = __ATTR_RW(nr_dma_chan);
+static struct kobj_attribute folios_migrated_attr = __ATTR_RW(folios_migrated);
+static struct kobj_attribute folios_failures_attr = __ATTR_RW(folios_failures);
+
+static struct attribute *dcbm_attrs[] = {
+ &offloading_attr.attr,
+ &nr_dma_chan_attr.attr,
+ &folios_migrated_attr.attr,
+ &folios_failures_attr.attr,
+ NULL
+};
+
+static const struct attribute_group dcbm_attr_group = {
+ .attrs = dcbm_attrs,
+};
+
+static int __init dcbm_init(void)
+{
+ int ret;
+
+ ret = sysfs_create_group(&THIS_MODULE->mkobj.kobj, &dcbm_attr_group);
+ if (ret)
+ return ret;
+
+ pr_info("dcbm: DMA Core Batch Migrator initialized\n");
+ return 0;
+}
+
+static void __exit dcbm_exit(void)
+{
+ mutex_lock(&dcbm_mutex);
+ if (offloading_enabled) {
+ migrate_offload_unregister(&dma_migrator);
+ offloading_enabled = false;
+ }
+ mutex_unlock(&dcbm_mutex);
+
+ sysfs_remove_group(&THIS_MODULE->mkobj.kobj, &dcbm_attr_group);
+ pr_info("dcbm: DMA Core Batch Migrator unloaded\n");
+}
+
+module_init(dcbm_init);
+module_exit(dcbm_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Shivank Garg");
+MODULE_DESCRIPTION("DMA Core Batch Migrator");
--
2.43.0
* [PATCH 7/7] mm/migrate: adjust NR_MAX_BATCHED_MIGRATION for testing
2026-04-28 15:50 [PATCH 0/7] Accelerate page migration with batch copying and hardware offload Shivank Garg
` (5 preceding siblings ...)
2026-04-28 15:50 ` [PATCH 6/7] drivers/migrate_offload: add DMA batch copy driver (dcbm) Shivank Garg
@ 2026-04-28 15:50 ` Shivank Garg
2026-04-28 17:11 ` [PATCH 0/7] Accelerate page migration with batch copying and hardware offload Garg, Shivank
2026-04-30 8:47 ` Huang, Ying
8 siblings, 0 replies; 13+ messages in thread
From: Shivank Garg @ 2026-04-28 15:50 UTC (permalink / raw)
To: akpm, david
Cc: kinseyho, weixugc, ljs, Liam.Howlett, vbabka, willy, rppt, surenb,
mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
vkoul, bharata, sj, rientjes, xuezhengchu, yiannis, dave.hansen,
hannes, jhubbard, peterx, riel, shakeel.butt, stalexan, tj,
nifan.cxl, jic23, aneesh.kumar, nathan.lynch, Frank.li, djbw,
linux-kernel, linux-mm, Shivank Garg
From: Zi Yan <ziy@nvidia.com>
Change NR_MAX_BATCHED_MIGRATION to HPAGE_PUD_NR to allow batching THP
copies.
This is for testing purposes only.
Signed-off-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
mm/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 9af070f9a1f2..a16c009d31d0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1658,7 +1658,7 @@ static inline int try_split_folio(struct folio *folio, struct list_head *split_f
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define NR_MAX_BATCHED_MIGRATION HPAGE_PMD_NR
+#define NR_MAX_BATCHED_MIGRATION HPAGE_PUD_NR
#else
#define NR_MAX_BATCHED_MIGRATION 512
#endif
--
2.43.0
* Re: [PATCH 0/7] Accelerate page migration with batch copying and hardware offload
2026-04-28 15:50 [PATCH 0/7] Accelerate page migration with batch copying and hardware offload Shivank Garg
` (6 preceding siblings ...)
2026-04-28 15:50 ` [PATCH 7/7] mm/migrate: adjust NR_MAX_BATCHED_MIGRATION for testing Shivank Garg
@ 2026-04-28 17:11 ` Garg, Shivank
2026-04-28 19:33 ` David Hildenbrand (Arm)
2026-04-30 8:47 ` Huang, Ying
8 siblings, 1 reply; 13+ messages in thread
From: Garg, Shivank @ 2026-04-28 17:11 UTC (permalink / raw)
To: akpm, david
Cc: kinseyho, weixugc, ljs, Liam.Howlett, vbabka, willy, rppt, surenb,
mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
vkoul, bharata, sj, rientjes, xuezhengchu, yiannis, dave.hansen,
hannes, jhubbard, peterx, riel, shakeel.butt, stalexan, tj,
nifan.cxl, jic23, aneesh.kumar, nathan.lynch, Frank.li, djbw,
linux-kernel, linux-mm
Hi all,
Apologies. The subject prefix should have been [RFC PATCH v5 0/7].
This is the fifth RFC, as mentioned in the cover letter, but I
missed the prefix while formatting the patches. Please treat this
round as RFC v5.
Thanks,
Shivank
* Re: [PATCH 0/7] Accelerate page migration with batch copying and hardware offload
2026-04-28 17:11 ` [PATCH 0/7] Accelerate page migration with batch copying and hardware offload Garg, Shivank
@ 2026-04-28 19:33 ` David Hildenbrand (Arm)
2026-04-29 5:51 ` Garg, Shivank
0 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-28 19:33 UTC (permalink / raw)
To: Garg, Shivank, akpm
Cc: kinseyho, weixugc, ljs, Liam.Howlett, vbabka, willy, rppt, surenb,
mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
vkoul, bharata, sj, rientjes, xuezhengchu, yiannis, dave.hansen,
hannes, jhubbard, peterx, riel, shakeel.butt, stalexan, tj,
nifan.cxl, jic23, aneesh.kumar, nathan.lynch, Frank.li, djbw,
linux-kernel, linux-mm
On 4/28/26 19:11, Garg, Shivank wrote:
> Hi all,
>
> Apologies. The subject prefix should have been [RFC PATCH v5 0/7].
>
> This is the fifth RFC, as mentioned in the cover letter, but I
> missed the prefix while formatting the patches. Please treat this
> round as RFC v5.
Ever since I switched to b4 for patch management, the quality of my life improved :)
$ b4 prep -n SERIES -f mm/mm-unstable
$ b4 prep --set-prefixes RFC
... add patches
$ b4 prep --auto-to-cc
$ b4 prep --edit-cover
$ b4 send --no-sign
--
Cheers,
David
* Re: [PATCH 0/7] Accelerate page migration with batch copying and hardware offload
2026-04-28 19:33 ` David Hildenbrand (Arm)
@ 2026-04-29 5:51 ` Garg, Shivank
0 siblings, 0 replies; 13+ messages in thread
From: Garg, Shivank @ 2026-04-29 5:51 UTC (permalink / raw)
To: David Hildenbrand (Arm), akpm
Cc: kinseyho, weixugc, ljs, Liam.Howlett, vbabka, willy, rppt, surenb,
mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
vkoul, bharata, sj, rientjes, xuezhengchu, yiannis, dave.hansen,
hannes, jhubbard, peterx, riel, shakeel.butt, stalexan, tj,
nifan.cxl, jic23, aneesh.kumar, nathan.lynch, Frank.li, djbw,
linux-kernel, linux-mm
On 4/29/2026 1:03 AM, David Hildenbrand (Arm) wrote:
> On 4/28/26 19:11, Garg, Shivank wrote:
>> Hi all,
>>
>> Apologies. The subject prefix should have been [RFC PATCH v5 0/7].
>>
>> This is the fifth RFC, as mentioned in the cover letter, but I
>> missed the prefix while formatting the patches. Please treat this
>> round as RFC v5.
>
> Ever since I switched to b4 for patch management, the quality of my life improved :)
>
> $ b4 prep -n SERIES -f mm/mm-unstable
> $ b4 prep --set-prefixes RFC
> ... add patches
> $ b4 prep --auto-to-cc
> $ b4 prep --edit-cover
> $ b4 send --no-sign
>
Thanks, appreciate the pointers. :)
I'll switch to b4.
Best regards,
Shivank
* Re: [PATCH 0/7] Accelerate page migration with batch copying and hardware offload
2026-04-28 15:50 [PATCH 0/7] Accelerate page migration with batch copying and hardware offload Shivank Garg
` (7 preceding siblings ...)
2026-04-28 17:11 ` [PATCH 0/7] Accelerate page migration with batch copying and hardware offload Garg, Shivank
@ 2026-04-30 8:47 ` Huang, Ying
8 siblings, 0 replies; 13+ messages in thread
From: Huang, Ying @ 2026-04-30 8:47 UTC (permalink / raw)
To: Shivank Garg
Cc: akpm, david, kinseyho, weixugc, ljs, Liam.Howlett, vbabka, willy,
rppt, surenb, mhocko, ziy, matthew.brost, joshua.hahnjy,
rakie.kim, byungchul, gourry, apopple, dave, Jonathan.Cameron,
rkodsara, vkoul, bharata, sj, rientjes, xuezhengchu, yiannis,
dave.hansen, hannes, jhubbard, peterx, riel, shakeel.butt,
stalexan, tj, nifan.cxl, jic23, aneesh.kumar, nathan.lynch,
Frank.li, djbw, linux-kernel, linux-mm
Shivank Garg <shivankg@amd.com> writes:
> This is the fifth RFC of the patchset to enhance page migration by
> batching folio-copy operations and enabling acceleration via DMA offload.
>
> Single-threaded, folio-by-folio copying bottlenecks page migration in
> modern systems with deep memory hierarchies, especially for large folios
> where copy overhead dominates, leaving significant hardware potential
> untapped.
>
> By batching the copy phase, we create an opportunity for hardware
> acceleration. This series builds the framework and provides a DMA
> offload driver (dcbm) as a reference implementation, targeting bulk
> migration workloads where offloading the copy improves throughput
> and latency while freeing the CPU cycles.
>
> See the RFC V3 cover letter [2] for motivation.
>
> Changelog since V4:
> -------------------
>
> 1. Renamed PAGE_* migration state flags to FOLIO_*. (David)
> 2. Use the new folio->migrate_info field instead of folio->private
> for migration state. (David)
> 3. Folded the folios_mc_copy patch into the batch-copy implementation
> patch. (David)
> 4. Renamed migrate_offload_start()/stop() to register()/unregister().
> (Huang, Ying)
> 5. Dropped the should_batch() callback from struct migrator. Reason-based
> policy now lives in migrate_pages_batch(). Migrators can still skip
> a batch they don't want (size-based policy). (Huang, Ying)
> 6. CONFIG_MIGRATION_COPY_OFFLOAD is now hidden and selected by the
> migrator driver. CONFIG_DCBM_DMA is tristate. (Huang Ying, Gregory Price)
> 7. Wrapped the SRCU + static_call dispatch in a small helper. (Huang, Ying)
> 8. Require m->owner in migrate_offload_register(); the SRCU sync at
> unregister relies on it. Counters are atomic_long_t to avoid lock-order
> issues.
> 9. Moved the DCBM sysfs from /sys/kernel/dcbm to /sys/module/dcbm. (Huang, Ying)
> 10. Rebased on v7.1-rc1.
>
>
> DESIGN:
> -------
>
> New Migration Flow:
>
> [ migrate_pages_batch() ]
> |
> |--> do_batch = migrate_offload_do_batch(reason) // core filters by migration reason
> |
> |--> for each folio:
> | migrate_folio_unmap() // unmap the folio
> | |
> | +--> (success):
> | if do_batch && folio_supports_batch_copy():
> | -> unmap_batch / dst_batch // batch list for copy offloading
> | else:
> | -> unmap_single / dst_single // single lists for per-folio CPU copy
> |
> |--> try_to_unmap_flush() // single batched TLB flush
> |
> |--> Batch copy (if unmap_batch not empty):
> | - Migrator is configurable at runtime via sysfs.
> |
> | static_call(migrate_offload_copy) // Pluggable Migrators
> | / | \
> | v v v
> | [ Default ] [ DMA Offload ] [ ... ]
> |
> | On -EOPNOTSUPP or other error, batch falls back to per-folio CPU copy.
> |
> +--> migrate_folios_move() // metadata, update PTEs, finalize
> (batch list with already_copied=true, single list with false)
>
> Offload Registration:
>
> Driver fills struct migrator { .name, .offload_copy, .owner } and calls
> migrate_offload_register(). This:
> - Pins the module via try_module_get()
> - Patches the migrate_offload_copy() static_call target
> - Enables the migrate_offload_enabled static branch
>
> migrate_offload_unregister() disables the static branch and reverts
> the static_call, then synchronize_srcu() waits for in-flight migrations
> before module_put().
>
> PERFORMANCE RESULTS:
> --------------------
>
> Re-ran the V4 workload on v7.1-rc1 with this series; relative
> speedups match V4 (~6x for 2MB folios at 16 DMA channels). No design
> change in V5 alters this picture; please refer to the V4 cover letter
> for the throughput tables [1].
IMHO, it's better to copy the performance data here.
In addition to the performance benefit, I want to know the downside as
well. For example, the migration latency of the first folio may be
longer. If so, by how much? Can you measure the batch number vs. total
migration time (benefit) and first folio migration time (downside)?
That can be used to determine the optimal batch number.
> PLAN:
> -----
>
> Patches 1-4 (the batching infrastructure) don't depend on the migrator
> interface, so if it helps I can split them off and post them ahead of
> the migrator and DCBM bits, which still have a few open questions to
> work through.
>
> I would appreciate guidance on whether splitting out the infrastructure
> portion that way matches the maintainers' preference.
>
> OPEN QUESTIONS:
> ---------------
>
> 1. Should the batch path run without a registered migrator? Patches 1-4
> are self-contained and use folios_mc_copy() (CPU copy). I see several
> options: make the batch path always-on for eligible folios, give the
> admin a knob to flip the static branch, or keep the gate as-is.
> I'm leaning toward always-on.
>
> 2. Carrying already_copied via folio->migrate_info vs changing the
> migrate_folio() callback signature (Huang, Ying). I went with the
> field for now to avoid touching every fs callback before the design
> settles. Happy to revisit.
>
> 3. Per-caller offload selection: today, eligibility is determined by
> migrate_reason only. Some callers are latency-tolerant, others may not
> be. Is reason the right granularity, or do we want a per-caller hint?
>
> 4. Cgroup integration: how should per-cgroup accounting work for
> different migrators (e.g., any accounting for DMA-busy time)?
>
> 5. Tuning migrate_pages() callers for offloading. For instance,
> COMPACT_CLUSTER_MAX = 32 caps the DMA payoff for compaction
> (V4 experiment).
>
> 6. Where do batch-size thresholds live, and how are they tuned? Per
> Huang Ying's split, that policy lives in the migrator. DCBM has no
> threshold today. It is open whether a threshold should later be a
> per-migrator sysfs knob or hard-coded; this will probably be clearer
> once a second migrator (SDXI, mtcopy) shows the trade-off.
>
>
> FOLLOW-UPS:
> --------------
>
> 1. dmaengine_prep_dma_memcpy_sg() in DCBM (Vinod Koul). The SG-prep
> variant cuts per-batch prep/submit cost (i.e., CPU savings), but ptdma
> does not implement the SG hook yet [10]. The end-to-end migration
> throughput delta is small because per-descriptor execution time
> dominates. I'll post the ptdma SG hook + DCBM switch as a follow-up.
>
> 2. SDXI as a second migrator. The SDXI series [11] is in review. SDXI is
> a generic memcpy engine without DMA_PRIVATE, so channel acquisition
> goes through dma_find_channel() or async_tx rather than
> dma_request_chan_by_mask(). I have a local DCBM variant working on top
> of the SDXI driver. I'm planning to send it as a follow-up once the
> SDXI series settles.
>
> 3. IOMMU SG merging in DCBM (Gregory). dma_map_sgtable() may merge
> contiguous PFNs unevenly, so src.nents != dst.nents. DCBM falls back
> to CPU copy for safety, though I haven't seen this on Zen3 + PTDMA.
> I'll investigate and address it in a follow-up.
>
> 4. Revisit Multi-threaded CPU copy migrator once the infra is settled.
>
> EARLIER POSTINGS:
> -----------------
> [1] RFC V4: https://lore.kernel.org/all/20260309120725.308854-3-shivankg@amd.com
> [2] RFC V3: https://lore.kernel.org/all/20250923174752.35701-1-shivankg@amd.com
> [3] RFC V2: https://lore.kernel.org/all/20250319192211.10092-1-shivankg@amd.com
> [4] RFC V1: https://lore.kernel.org/all/20240614221525.19170-1-shivankg@amd.com
> [5] RFC from Zi Yan: https://lore.kernel.org/all/20250103172419.4148674-1-ziy@nvidia.com
>
> RELATED DISCUSSIONS:
> --------------------
> [6] MM-alignment Session [Nov 12, 2025]:
> https://lore.kernel.org/linux-mm/bd6a3c75-b9f0-cbcf-f7c4-1ef5dff06d24@google.com
> [7] Linux Memory Hotness and Promotion call [Nov 6, 2025]:
> https://lore.kernel.org/linux-mm/8ff2fd10-c9ac-4912-cf56-7ecd4afd2770@google.com
> [8] LSFMM 2025:
> https://lore.kernel.org/all/cf6fc05d-c0b0-4de3-985e-5403977aa3aa@amd.com
> [9] OSS India:
> https://ossindia2025.sched.com/event/23Jk1
> [10] DMA_MEMCPY_SG comparison:
> https://lore.kernel.org/linux-mm/3e73addb-ac01-4a05-bc75-c6c1c56072df@amd.com
> [11] SDXI V1:
> https://lore.kernel.org/all/20260410-sdxi-base-v1-0-1d184cb5c60a@amd.com
>
> Thanks to everyone who reviewed, tested or participated in discussions
> around this series. Your feedback helped me throughout the development
> process.
>
> Best Regards,
> Shivank
>
>
> Shivank Garg (6):
> mm/migrate: rename PAGE_ migration flags to FOLIO_
> mm/migrate: use migrate_info field instead of private
> mm/migrate: skip data copy for already-copied folios
> mm/migrate: add batch-copy path in migrate_pages_batch
> mm/migrate: add copy offload registration infrastructure
> drivers/migrate_offload: add DMA batch copy driver (dcbm)
>
> Zi Yan (1):
> mm/migrate: adjust NR_MAX_BATCHED_MIGRATION for testing
>
> drivers/Kconfig | 2 +
> drivers/Makefile | 2 +
> drivers/migrate_offload/Kconfig | 9 +
> drivers/migrate_offload/Makefile | 1 +
> drivers/migrate_offload/dcbm/Makefile | 1 +
> drivers/migrate_offload/dcbm/dcbm.c | 440 ++++++++++++++++++++++++++
> include/linux/migrate_copy_offload.h | 44 +++
> include/linux/mm.h | 2 +
> include/linux/mm_types.h | 1 +
> mm/Kconfig | 6 +
> mm/Makefile | 1 +
> mm/migrate.c | 211 ++++++++----
> mm/migrate_copy_offload.c | 94 ++++++
> mm/util.c | 30 ++
> 14 files changed, 784 insertions(+), 60 deletions(-)
> create mode 100644 drivers/migrate_offload/Kconfig
> create mode 100644 drivers/migrate_offload/Makefile
> create mode 100644 drivers/migrate_offload/dcbm/Makefile
> create mode 100644 drivers/migrate_offload/dcbm/dcbm.c
> create mode 100644 include/linux/migrate_copy_offload.h
> create mode 100644 mm/migrate_copy_offload.c
>
>
> base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
---
Best Regards,
Huang, Ying
* Re: [PATCH 1/7] mm/migrate: rename PAGE_ migration flags to FOLIO_
2026-04-28 15:50 ` [PATCH 1/7] mm/migrate: rename PAGE_ migration flags to FOLIO_ Shivank Garg
@ 2026-04-30 9:07 ` Huang, Ying
0 siblings, 0 replies; 13+ messages in thread
From: Huang, Ying @ 2026-04-30 9:07 UTC (permalink / raw)
To: Shivank Garg
Cc: akpm, david, kinseyho, weixugc, ljs, Liam.Howlett, vbabka, willy,
rppt, surenb, mhocko, ziy, matthew.brost, joshua.hahnjy,
rakie.kim, byungchul, gourry, apopple, dave, Jonathan.Cameron,
rkodsara, vkoul, bharata, sj, rientjes, xuezhengchu, yiannis,
dave.hansen, hannes, jhubbard, peterx, riel, shakeel.butt,
stalexan, tj, nifan.cxl, jic23, aneesh.kumar, nathan.lynch,
Frank.li, djbw, linux-kernel, linux-mm, Baolin Wang, Lance Yang
Shivank Garg <shivankg@amd.com> writes:
> These flags only track folio-specific state during migration and are
> not used for movable_ops pages. Rename the enum values and the
> old_page_state variable to match.
>
> No functional change.
>
> Suggested-by: David Hildenbrand <david@kernel.org>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> Signed-off-by: Shivank Garg <shivankg@amd.com>
LGTM, Thanks! Feel free to add my
Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com>
in the future versions.
[snip]
---
Best Regards,
Huang, Ying
End of thread (newest message: 2026-04-30 9:07 UTC).
Thread overview: 13+ messages
2026-04-28 15:50 [PATCH 0/7] Accelerate page migration with batch copying and hardware offload Shivank Garg
2026-04-28 15:50 ` [PATCH 1/7] mm/migrate: rename PAGE_ migration flags to FOLIO_ Shivank Garg
2026-04-30 9:07 ` Huang, Ying
2026-04-28 15:50 ` [PATCH 2/7] mm/migrate: use migrate_info field instead of private Shivank Garg
2026-04-28 15:50 ` [PATCH 3/7] mm/migrate: skip data copy for already-copied folios Shivank Garg
2026-04-28 15:50 ` [PATCH 4/7] mm/migrate: add batch-copy path in migrate_pages_batch Shivank Garg
2026-04-28 15:50 ` [PATCH 5/7] mm/migrate: add copy offload registration infrastructure Shivank Garg
2026-04-28 15:50 ` [PATCH 6/7] drivers/migrate_offload: add DMA batch copy driver (dcbm) Shivank Garg
2026-04-28 15:50 ` [PATCH 7/7] mm/migrate: adjust NR_MAX_BATCHED_MIGRATION for testing Shivank Garg
2026-04-28 17:11 ` [PATCH 0/7] Accelerate page migration with batch copying and hardware offload Garg, Shivank
2026-04-28 19:33 ` David Hildenbrand (Arm)
2026-04-29 5:51 ` Garg, Shivank
2026-04-30 8:47 ` Huang, Ying