public inbox for linux-mm@kvack.org
* [RFC PATCH v4 0/6] Accelerate page migration with batch copying and hardware offload
@ 2026-03-09 12:07 Shivank Garg
From: Shivank Garg @ 2026-03-09 12:07 UTC (permalink / raw)
  To: akpm, david
  Cc: lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt, surenb,
	mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
	gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm,
	Shivank Garg

This is the fourth RFC of a patchset that enhances page migration by
batching folio-copy operations and enabling acceleration via DMA offload.

Single-threaded, folio-by-folio copying bottlenecks page migration in
modern systems with deep memory hierarchies, especially for large folios
where copy overhead dominates, leaving significant hardware potential
untapped.

By batching the copy phase, we create an opportunity for hardware
acceleration. This series builds the framework and provides a DMA
offload driver (dcbm) as a reference implementation, targeting bulk
migration workloads where offloading the copy improves throughput
and latency while freeing up CPU cycles.

See the RFC V3 cover letter [1] for motivation.


Changelog since V3:
-------------------

1. Redesigned batch migration flow: pre-copy the batch before the move
   phase instead of interleaving copy with metadata updates.
   Simpler design, avoids redundancy with existing migrate_folios_move()
   path.

2. Reworked the offload registration infrastructure: simplified the
   design, fixed the srcu_read_lock() placement, and fixed other minor
   bugs.

3. Added should_batch() callback to struct migrator so offload drivers can
   filter which migration reasons are eligible for offload.

4. Renamed for clarity:
   - CONFIG_OFFC_MIGRATION     -> CONFIG_MIGRATION_COPY_OFFLOAD
   - migrate_offc.[ch]         -> migrate_copy_offload.[ch]
   - drivers/migoffcopy/       -> drivers/migrate_offload/
   - start_offloading/stop_offloading -> migrate_offload_start/stop

5. Dropped the mtcopy driver to keep the focus on core infrastructure
   and DMA offload (for testing and reference). Multi-threaded CPU copy
   can follow separately.

6. Rebased on v7.0-rc2.


DESIGN:
-------

New Migration Flow:

[ migrate_pages_batch() ]
    |
    |--> do_batch = should_batch(reason) // driver filters by migration reason (e.g. allow
    |                                    // NUMA balancing, skip others); called once per batch
    |
    |--> for each folio:
    |      migrate_folio_unmap()        // unmap the folio
    |      |
    |      +--> (success):
    |           if migrate_offload_enabled && do_batch && folio_supports_batch_copy():
    |               -> src_batch / dst_batch    // batch list for copy offloading
    |           else:
    |               -> src_std / dst_std        // standard lists for per-folio CPU copy
    |
    |--> try_to_unmap_flush()                   // single batched TLB flush 
    |
    |--> Batch copy (if src_batch not empty):
    |    - Migrator is configurable at runtime via sysfs.
    |
    |      static_call(migrate_offload_copy)    // Pluggable Migrators
    |              /          |            \
    |             v           v             v
    |     [ Default ]  [ DMA Offload ]  [ ... ]
    |
    |      On failure, folios fall back to per-folio CPU copy.
    |
    +--> migrate_folios_move()      // metadata, update PTEs, finalize
         (batch list with already_copied=true, std list with false)

Offload Registration:

    Driver fills struct migrator { .name, .offload_copy, .should_batch, .owner }
    and calls migrate_offload_start().  This:
      - Pins the module via try_module_get()
      - Patches static_call targets for offload_copy and should_batch
      - Enables the migrate_offload_enabled static branch

    migrate_offload_stop() disables the static branch and reverts both
    static_calls, then synchronize_srcu() waits for in-flight
    migrations before module_put().


PERFORMANCE RESULTS:
--------------------

System Info: AMD Zen 3 EPYC server (2 sockets, 32 cores, SMT enabled),
1 NUMA node per socket, v7.0-rc2, DVFS set to Performance, PTDMA hardware.

Benchmark: move_pages() syscall to move pages between two NUMA nodes.

1. Moving folios of different sizes such that the total transfer size is
constant (1GB), with varying numbers of DMA channels. Throughput in GB/s.

a. Baseline (vanilla kernel: v7.0-rc2, single-threaded, serial folio_copy):

============================================================================================
       | 4K          | 16K         | 64K         | 256K        | 1M          | 2M          |
============================================================================================
       | 3.55±0.19   | 5.66±0.30   | 6.16±0.09   | 7.12±0.83   | 6.93±0.09   | 10.88±0.19  |

b. DMA offload (Patched Kernel, dcbm driver, N DMA channels):

============================================================================================
Chans  | 4K          | 16K         | 64K         | 256K        | 1M          | 2M          |
============================================================================================
1      | 2.63±0.26   | 2.92±0.09   |  3.16±0.13  |  4.75±0.70  |  7.38±0.18  | 12.64±0.07  |
2      | 3.20±0.12   | 4.68±0.17   |  5.16±0.36  |  7.42±1.00  |  8.05±0.05  | 14.40±0.10  |
4      | 3.78±0.16   | 6.45±0.06   |  7.36±0.18  |  9.70±0.11  | 11.68±2.37  | 27.16±0.20  |
8      | 4.32±0.24   | 8.20±0.45   |  9.45±0.26  | 12.99±2.87  | 13.18±0.08  | 46.17±0.67  |
12     | 4.35±0.16   | 8.80±0.09   | 11.65±2.71  | 15.46±4.95  | 14.69±4.10  | 60.89±0.68  |
16     | 4.40±0.19   | 9.25±0.13   | 11.02±0.26  | 13.56±0.15  | 18.04±7.11  | 66.86±0.81  |

- DMA offload with 16 channels achieves ~6x speedup for 2MB folios.
- Larger folios benefit more; small folios are DMA-setup bound.

2. Varying total move size (folio count) for fixed 2MB folio size,
   single DMA channel. Throughput (GB/s):

2MB Folios | Baseline    | DMA
=================================
1          |  7.34       |  6.17
8          |  8.27       |  8.85
16         |  7.56       |  9.12
32         |  8.39       | 11.73
64         |  9.37       | 12.18
256        | 10.58       | 12.50
512        | 10.78       | 12.68
1024       | 10.77       | 12.76
2048       | 10.87       | 12.81
8192       | 10.84       | 12.82

- Throughput increases with batch size but plateaus after ~64 folios.
- Even a single DMA channel outperforms the baseline for batch sizes >= 8 folios.

EARLIER POSTINGS:
-----------------
[1] RFC V3: https://lore.kernel.org/all/20250923174752.35701-1-shivankg@amd.com
[2] RFC V2: https://lore.kernel.org/all/20250319192211.10092-1-shivankg@amd.com
[3] RFC V1: https://lore.kernel.org/all/20240614221525.19170-1-shivankg@amd.com
[4] RFC from Zi Yan: https://lore.kernel.org/all/20250103172419.4148674-1-ziy@nvidia.com

RELATED DISCUSSIONS:
-------------------
[5] MM-alignment Session [Nov 12, 2025]:
    https://lore.kernel.org/linux-mm/bd6a3c75-b9f0-cbcf-f7c4-1ef5dff06d24@google.com/
[6] Linux Memory Hotness and Promotion call [Nov 6, 2025]:
    https://lore.kernel.org/linux-mm/8ff2fd10-c9ac-4912-cf56-7ecd4afd2770@google.com/
[7] LSFMM 2025:
    https://lore.kernel.org/all/cf6fc05d-c0b0-4de3-985e-5403977aa3aa@amd.com
[8] OSS India:
    https://ossindia2025.sched.com/event/23Jk1

Git Tree: https://github.com/shivankgarg98/linux/commits/shivank/V4_migrate_pages_optimization_precopy

Thanks to everyone who reviewed, tested or participated in discussions
around this series. Your feedback helped me throughout the development
process.

Best Regards,
Shivank


Shivank Garg (5):
  mm: introduce folios_mc_copy() for batch folio copying
  mm/migrate: skip data copy for already-copied folios
  mm/migrate: add batch-copy path in migrate_pages_batch
  mm/migrate: add copy offload registration infrastructure
  drivers/migrate_offload: add DMA batch copy driver (dcbm)

Zi Yan (1):
  mm/migrate: adjust NR_MAX_BATCHED_MIGRATION for testing

 drivers/Kconfig                       |   2 +
 drivers/Makefile                      |   2 +
 drivers/migrate_offload/Kconfig       |   8 +
 drivers/migrate_offload/Makefile      |   1 +
 drivers/migrate_offload/dcbm/Makefile |   1 +
 drivers/migrate_offload/dcbm/dcbm.c   | 457 ++++++++++++++++++++++++++
 include/linux/migrate_copy_offload.h  |  34 ++
 include/linux/mm.h                    |   2 +
 mm/Kconfig                            |   9 +
 mm/Makefile                           |   1 +
 mm/migrate.c                          | 133 ++++++--
 mm/migrate_copy_offload.c             |  99 ++++++
 mm/util.c                             |  31 ++
 13 files changed, 748 insertions(+), 32 deletions(-)
 create mode 100644 drivers/migrate_offload/Kconfig
 create mode 100644 drivers/migrate_offload/Makefile
 create mode 100644 drivers/migrate_offload/dcbm/Makefile
 create mode 100644 drivers/migrate_offload/dcbm/dcbm.c
 create mode 100644 include/linux/migrate_copy_offload.h
 create mode 100644 mm/migrate_copy_offload.c

-- 
2.43.0




* [RFC PATCH v4 1/6] mm: introduce folios_mc_copy() for batch folio copying
  2026-03-09 12:07 [RFC PATCH v4 0/6] Accelerate page migration with batch copying and hardware offload Shivank Garg
@ 2026-03-09 12:07 ` Shivank Garg
  2026-03-12  9:41   ` David Hildenbrand (Arm)
  2026-03-09 12:07 ` [RFC PATCH v4 2/6] mm/migrate: skip data copy for already-copied folios Shivank Garg
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Shivank Garg @ 2026-03-09 12:07 UTC (permalink / raw)
  To: akpm, david
  Cc: lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt, surenb,
	mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
	gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm,
	Shivank Garg

Add folios_mc_copy(), which walks the src and dst folio lists in
lockstep and copies folio contents via folio_mc_copy(). The folios_cnt
parameter is unused here but is part of the offload_copy callback
signature used by later patches in the series.

Signed-off-by: Shivank Garg <shivankg@amd.com>
---
 include/linux/mm.h |  2 ++
 mm/util.c          | 31 +++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5be3d8a8f806..e1ca4d6b7361 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1644,6 +1644,8 @@ void __folio_put(struct folio *folio);
 void split_page(struct page *page, unsigned int order);
 void folio_copy(struct folio *dst, struct folio *src);
 int folio_mc_copy(struct folio *dst, struct folio *src);
+int folios_mc_copy(struct list_head *dst_list, struct list_head *src_list,
+		unsigned int __always_unused folios_cnt);
 
 unsigned long nr_free_buffer_pages(void);
 
diff --git a/mm/util.c b/mm/util.c
index b05ab6f97e11..5bda599168f8 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -749,6 +749,37 @@ int folio_mc_copy(struct folio *dst, struct folio *src)
 }
 EXPORT_SYMBOL(folio_mc_copy);
 
+/**
+ * folios_mc_copy - Copy the contents of a list of folios.
+ * @dst_list: destination folio list.
+ * @src_list: source folio list.
+ * @folios_cnt: unused here, present for callback signature compatibility.
+ *
+ * Walks list of src and dst folios in lockstep and copies folio
+ * content via folio_mc_copy(). The caller must ensure both lists have
+ * the same number of entries. This may sleep.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int folios_mc_copy(struct list_head *dst_list, struct list_head *src_list,
+		unsigned int __always_unused folios_cnt)
+{
+	struct folio *src, *dst;
+	int ret;
+
+	dst = list_first_entry(dst_list, struct folio, lru);
+	list_for_each_entry(src, src_list, lru) {
+		cond_resched();
+		ret = folio_mc_copy(dst, src);
+		if (ret)
+			return ret;
+		dst = list_next_entry(dst, lru);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(folios_mc_copy);
+
 int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS;
 static int sysctl_overcommit_ratio __read_mostly = 50;
 static unsigned long sysctl_overcommit_kbytes __read_mostly;
-- 
2.43.0




* [RFC PATCH v4 2/6] mm/migrate: skip data copy for already-copied folios
  2026-03-09 12:07 [RFC PATCH v4 0/6] Accelerate page migration with batch copying and hardware offload Shivank Garg
  2026-03-09 12:07 ` [RFC PATCH v4 1/6] mm: introduce folios_mc_copy() for batch folio copying Shivank Garg
@ 2026-03-09 12:07 ` Shivank Garg
  2026-03-12  9:44   ` David Hildenbrand (Arm)
  2026-03-24  8:22   ` Huang, Ying
  2026-03-09 12:07 ` [RFC PATCH v4 3/6] mm/migrate: add batch-copy path in migrate_pages_batch Shivank Garg
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 21+ messages in thread
From: Shivank Garg @ 2026-03-09 12:07 UTC (permalink / raw)
  To: akpm, david
  Cc: lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt, surenb,
	mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
	gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm,
	Shivank Garg

Add a PAGE_ALREADY_COPIED flag to the dst->private migration state.
When set, __migrate_folio() skips folio_mc_copy() and performs
metadata-only migration. All callers currently pass
already_copied=false. The batch-copy path enables it in a later patch.

Move the dst->private state enum earlier in the file so
__migrate_folio() and move_to_new_folio() can see PAGE_ALREADY_COPIED.

Signed-off-by: Shivank Garg <shivankg@amd.com>
---
 mm/migrate.c | 52 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 1bf2cf8c44dd..1d8c1fb627c9 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -848,6 +848,18 @@ void folio_migrate_flags(struct folio *newfolio, struct folio *folio)
 }
 EXPORT_SYMBOL(folio_migrate_flags);
 
+/*
+ * To record some information during migration, we use unused private
+ * field of struct folio of the newly allocated destination folio.
+ * This is safe because nobody is using it except us.
+ */
+enum {
+	PAGE_WAS_MAPPED = BIT(0),
+	PAGE_WAS_MLOCKED = BIT(1),
+	PAGE_ALREADY_COPIED = BIT(2),
+	PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED | PAGE_ALREADY_COPIED,
+};
+
 /************************************************************
  *                    Migration functions
  ***********************************************************/
@@ -857,14 +869,20 @@ static int __migrate_folio(struct address_space *mapping, struct folio *dst,
 			   enum migrate_mode mode)
 {
 	int rc, expected_count = folio_expected_ref_count(src) + 1;
+	bool already_copied = ((unsigned long)dst->private & PAGE_ALREADY_COPIED);
+
+	if (already_copied)
+		dst->private = NULL;
 
 	/* Check whether src does not have extra refs before we do more work */
 	if (folio_ref_count(src) != expected_count)
 		return -EAGAIN;
 
-	rc = folio_mc_copy(dst, src);
-	if (unlikely(rc))
-		return rc;
+	if (!already_copied) {
+		rc = folio_mc_copy(dst, src);
+		if (unlikely(rc))
+			return rc;
+	}
 
 	rc = __folio_migrate_mapping(mapping, dst, src, expected_count);
 	if (rc)
@@ -1088,7 +1106,7 @@ static int fallback_migrate_folio(struct address_space *mapping,
  *     0 - success
  */
 static int move_to_new_folio(struct folio *dst, struct folio *src,
-				enum migrate_mode mode)
+		enum migrate_mode mode, bool already_copied)
 {
 	struct address_space *mapping = folio_mapping(src);
 	int rc = -EAGAIN;
@@ -1096,6 +1114,9 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
 	VM_BUG_ON_FOLIO(!folio_test_locked(src), src);
 	VM_BUG_ON_FOLIO(!folio_test_locked(dst), dst);
 
+	if (already_copied)
+		dst->private = (void *)(unsigned long)PAGE_ALREADY_COPIED;
+
 	if (!mapping)
 		rc = migrate_folio(mapping, dst, src, mode);
 	else if (mapping_inaccessible(mapping))
@@ -1127,17 +1148,6 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
 	return rc;
 }
 
-/*
- * To record some information during migration, we use unused private
- * field of struct folio of the newly allocated destination folio.
- * This is safe because nobody is using it except us.
- */
-enum {
-	PAGE_WAS_MAPPED = BIT(0),
-	PAGE_WAS_MLOCKED = BIT(1),
-	PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED,
-};
-
 static void __migrate_folio_record(struct folio *dst,
 				   int old_page_state,
 				   struct anon_vma *anon_vma)
@@ -1353,7 +1363,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
 static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 			      struct folio *src, struct folio *dst,
 			      enum migrate_mode mode, enum migrate_reason reason,
-			      struct list_head *ret)
+			      struct list_head *ret, bool already_copied)
 {
 	int rc;
 	int old_page_state = 0;
@@ -1371,7 +1381,7 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 		goto out_unlock_both;
 	}
 
-	rc = move_to_new_folio(dst, src, mode);
+	rc = move_to_new_folio(dst, src, mode, already_copied);
 	if (rc)
 		goto out;
 
@@ -1519,7 +1529,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 	}
 
 	if (!folio_mapped(src))
-		rc = move_to_new_folio(dst, src, mode);
+		rc = move_to_new_folio(dst, src, mode, false);
 
 	if (page_was_mapped)
 		remove_migration_ptes(src, !rc ? dst : src, ttu);
@@ -1703,7 +1713,7 @@ static void migrate_folios_move(struct list_head *src_folios,
 		struct list_head *ret_folios,
 		struct migrate_pages_stats *stats,
 		int *retry, int *thp_retry, int *nr_failed,
-		int *nr_retry_pages)
+		int *nr_retry_pages, bool already_copied)
 {
 	struct folio *folio, *folio2, *dst, *dst2;
 	bool is_thp;
@@ -1720,7 +1730,7 @@ static void migrate_folios_move(struct list_head *src_folios,
 
 		rc = migrate_folio_move(put_new_folio, private,
 				folio, dst, mode,
-				reason, ret_folios);
+				reason, ret_folios, already_copied);
 		/*
 		 * The rules are:
 		 *	0: folio will be freed
@@ -1977,7 +1987,7 @@ static int migrate_pages_batch(struct list_head *from,
 		migrate_folios_move(&unmap_folios, &dst_folios,
 				put_new_folio, private, mode, reason,
 				ret_folios, stats, &retry, &thp_retry,
-				&nr_failed, &nr_retry_pages);
+				&nr_failed, &nr_retry_pages, false);
 	}
 	nr_failed += retry;
 	stats->nr_thp_failed += thp_retry;
-- 
2.43.0




* [RFC PATCH v4 3/6] mm/migrate: add batch-copy path in migrate_pages_batch
  2026-03-09 12:07 [RFC PATCH v4 0/6] Accelerate page migration with batch copying and hardware offload Shivank Garg
  2026-03-09 12:07 ` [RFC PATCH v4 1/6] mm: introduce folios_mc_copy() for batch folio copying Shivank Garg
  2026-03-09 12:07 ` [RFC PATCH v4 2/6] mm/migrate: skip data copy for already-copied folios Shivank Garg
@ 2026-03-09 12:07 ` Shivank Garg
  2026-03-24  8:42   ` Huang, Ying
  2026-03-09 12:07 ` [RFC PATCH v4 4/6] mm/migrate: add copy offload registration infrastructure Shivank Garg
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Shivank Garg @ 2026-03-09 12:07 UTC (permalink / raw)
  To: akpm, david
  Cc: lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt, surenb,
	mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
	gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm,
	Shivank Garg

Split unmapped folios into batch-eligible (src_batch/dst_batch) and
standard (src_std/dst_std) lists, gated by the migrate_offload_enabled
static branch, which is off by default. When no offload driver is
active, the branch is never taken and everything goes through the
standard path.

After TLB flush, batch copy the eligible folios via folios_mc_copy()
and pass already_copied=true into migrate_folios_move() so
__migrate_folio() skips the per-folio copy.

On batch-copy failure, the already_copied flag stays false and each
folio falls back to an individual copy.

Signed-off-by: Shivank Garg <shivankg@amd.com>
---
 mm/migrate.c | 55 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 44 insertions(+), 11 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 1d8c1fb627c9..69daa16f9cf3 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -43,6 +43,7 @@
 #include <linux/sched/sysctl.h>
 #include <linux/memory-tiers.h>
 #include <linux/pagewalk.h>
+#include <linux/jump_label.h>
 
 #include <asm/tlbflush.h>
 
@@ -51,6 +52,8 @@
 #include "internal.h"
 #include "swap.h"
 
+DEFINE_STATIC_KEY_FALSE(migrate_offload_enabled);
+
 static const struct movable_operations *offline_movable_ops;
 static const struct movable_operations *zsmalloc_movable_ops;
 
@@ -1706,6 +1709,12 @@ static int migrate_hugetlbs(struct list_head *from, new_folio_t get_new_folio,
 	return nr_failed;
 }
 
+/* movable_ops folios have their own migrate path */
+static bool folio_supports_batch_copy(struct folio *folio)
+{
+	return likely(!page_has_movable_ops(&folio->page));
+}
+
 static void migrate_folios_move(struct list_head *src_folios,
 		struct list_head *dst_folios,
 		free_folio_t put_new_folio, unsigned long private,
@@ -1805,8 +1814,12 @@ static int migrate_pages_batch(struct list_head *from,
 	bool is_large = false;
 	struct folio *folio, *folio2, *dst = NULL;
 	int rc, rc_saved = 0, nr_pages;
-	LIST_HEAD(unmap_folios);
-	LIST_HEAD(dst_folios);
+	unsigned int nr_batch = 0;
+	bool batch_copied = false;
+	LIST_HEAD(src_batch);
+	LIST_HEAD(dst_batch);
+	LIST_HEAD(src_std);
+	LIST_HEAD(dst_std);
 	bool nosplit = (reason == MR_NUMA_MISPLACED);
 
 	VM_WARN_ON_ONCE(mode != MIGRATE_ASYNC &&
@@ -1943,7 +1956,7 @@ static int migrate_pages_batch(struct list_head *from,
 				/* nr_failed isn't updated for not used */
 				stats->nr_thp_failed += thp_retry;
 				rc_saved = rc;
-				if (list_empty(&unmap_folios))
+				if (list_empty(&src_batch) && list_empty(&src_std))
 					goto out;
 				else
 					goto move;
@@ -1953,8 +1966,15 @@ static int migrate_pages_batch(struct list_head *from,
 				nr_retry_pages += nr_pages;
 				break;
 			case 0:
-				list_move_tail(&folio->lru, &unmap_folios);
-				list_add_tail(&dst->lru, &dst_folios);
+				if (static_branch_unlikely(&migrate_offload_enabled) &&
+				    folio_supports_batch_copy(folio)) {
+					list_move_tail(&folio->lru, &src_batch);
+					list_add_tail(&dst->lru, &dst_batch);
+					nr_batch++;
+				} else {
+					list_move_tail(&folio->lru, &src_std);
+					list_add_tail(&dst->lru, &dst_std);
+				}
 				break;
 			default:
 				/*
@@ -1977,17 +1997,28 @@ static int migrate_pages_batch(struct list_head *from,
 	/* Flush TLBs for all unmapped folios */
 	try_to_unmap_flush();
 
+	/* Batch-copy eligible folios before the move phase */
+	if (!list_empty(&src_batch)) {
+		rc = folios_mc_copy(&dst_batch, &src_batch, nr_batch);
+		batch_copied = (rc == 0);
+	}
+
 	retry = 1;
 	for (pass = 0; pass < nr_pass && retry; pass++) {
 		retry = 0;
 		thp_retry = 0;
 		nr_retry_pages = 0;
 
-		/* Move the unmapped folios */
-		migrate_folios_move(&unmap_folios, &dst_folios,
-				put_new_folio, private, mode, reason,
-				ret_folios, stats, &retry, &thp_retry,
-				&nr_failed, &nr_retry_pages, false);
+		if (!list_empty(&src_batch))
+			migrate_folios_move(&src_batch, &dst_batch, put_new_folio,
+					private, mode, reason, ret_folios, stats,
+					&retry, &thp_retry, &nr_failed,
+					&nr_retry_pages, batch_copied);
+		if (!list_empty(&src_std))
+			migrate_folios_move(&src_std, &dst_std,	put_new_folio,
+					private, mode, reason, ret_folios, stats,
+					&retry, &thp_retry, &nr_failed,
+					&nr_retry_pages, false);
 	}
 	nr_failed += retry;
 	stats->nr_thp_failed += thp_retry;
@@ -1996,7 +2027,9 @@ static int migrate_pages_batch(struct list_head *from,
 	rc = rc_saved ? : nr_failed;
 out:
 	/* Cleanup remaining folios */
-	migrate_folios_undo(&unmap_folios, &dst_folios,
+	migrate_folios_undo(&src_batch, &dst_batch,
+			put_new_folio, private, ret_folios);
+	migrate_folios_undo(&src_std, &dst_std,
 			put_new_folio, private, ret_folios);
 
 	return rc;
-- 
2.43.0




* [RFC PATCH v4 4/6] mm/migrate: add copy offload registration infrastructure
  2026-03-09 12:07 [RFC PATCH v4 0/6] Accelerate page migration with batch copying and hardware offload Shivank Garg
                   ` (2 preceding siblings ...)
  2026-03-09 12:07 ` [RFC PATCH v4 3/6] mm/migrate: add batch-copy path in migrate_pages_batch Shivank Garg
@ 2026-03-09 12:07 ` Shivank Garg
  2026-03-09 17:54   ` Gregory Price
  2026-03-24 10:54   ` Huang, Ying
  2026-03-09 12:07 ` [RFC PATCH v4 5/6] drivers/migrate_offload: add DMA batch copy driver (dcbm) Shivank Garg
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 21+ messages in thread
From: Shivank Garg @ 2026-03-09 12:07 UTC (permalink / raw)
  To: akpm, david
  Cc: lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt, surenb,
	mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
	gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm,
	Shivank Garg, Mike Day

Introduce CONFIG_MIGRATION_COPY_OFFLOAD, which lets an offload driver
(DMA, multi-threaded CPU copy, etc.) take over the batch folio copy in
migrate_pages_batch().

Offload drivers fill in a struct migrator with their offload_copy()
and should_batch() implementations and call migrate_offload_start(),
which patches the migrate_offload_copy() static_call and flips the
migrate_offload_enabled static branch. migrate_offload_stop()
reverts both.

Only one migrator can be active at a time. A second registration
returns -EBUSY, and only the active migrator can stop itself. The
static_call dispatch runs under SRCU, so synchronize_srcu() in the
stop path guarantees no copies are in flight before the module
reference is dropped.
Co-developed-by: Mike Day <michael.day@amd.com>
Signed-off-by: Mike Day <michael.day@amd.com>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
 include/linux/migrate_copy_offload.h | 34 ++++++++++
 mm/Kconfig                           |  9 +++
 mm/Makefile                          |  1 +
 mm/migrate.c                         | 30 ++++++++-
 mm/migrate_copy_offload.c            | 99 ++++++++++++++++++++++++++++
 5 files changed, 171 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/migrate_copy_offload.h
 create mode 100644 mm/migrate_copy_offload.c

diff --git a/include/linux/migrate_copy_offload.h b/include/linux/migrate_copy_offload.h
new file mode 100644
index 000000000000..ee112826ebdf
--- /dev/null
+++ b/include/linux/migrate_copy_offload.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_MIGRATE_COPY_OFFLOAD_H
+#define _LINUX_MIGRATE_COPY_OFFLOAD_H
+
+#include <linux/jump_label.h>
+#include <linux/srcu.h>
+#include <linux/types.h>
+
+struct list_head;
+struct module;
+
+#define MIGRATOR_NAME_LEN 32
+
+struct migrator {
+	char name[MIGRATOR_NAME_LEN];
+	int (*offload_copy)(struct list_head *dst_list,
+			    struct list_head *src_list,
+			    unsigned int folio_cnt);
+	bool (*should_batch)(int reason);
+	struct module *owner;
+};
+
+#ifdef CONFIG_MIGRATION_COPY_OFFLOAD
+extern struct static_key_false migrate_offload_enabled;
+extern struct srcu_struct migrate_offload_srcu;
+bool migrate_should_batch_default(int reason);
+int migrate_offload_start(struct migrator *m);
+int migrate_offload_stop(struct migrator *m);
+#else
+static inline int migrate_offload_start(struct migrator *m) { return 0; }
+static inline int migrate_offload_stop(struct migrator *m) { return 0; }
+#endif /* CONFIG_MIGRATION_COPY_OFFLOAD */
+
+#endif /* _LINUX_MIGRATE_COPY_OFFLOAD_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index ebd8ea353687..faf0cae9991b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -648,6 +648,15 @@ config MIGRATION
 config DEVICE_MIGRATION
 	def_bool MIGRATION && ZONE_DEVICE
 
+config MIGRATION_COPY_OFFLOAD
+	bool "Page migration copy offload"
+	depends on MIGRATION
+	help
+	  Adds migration copy offload infrastructure that allows
+	  offload engines (DMA, multi-threaded CPU copy, etc.) to
+	  register as the batch-copy provider for page migration
+	  via migrate_offload_start()/migrate_offload_stop().
+
 config ARCH_ENABLE_HUGEPAGE_MIGRATION
 	bool
 
diff --git a/mm/Makefile b/mm/Makefile
index 8ad2ab08244e..db1ac8097089 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -96,6 +96,7 @@ obj-$(CONFIG_FAILSLAB) += failslab.o
 obj-$(CONFIG_FAIL_PAGE_ALLOC) += fail_page_alloc.o
 obj-$(CONFIG_MEMTEST)		+= memtest.o
 obj-$(CONFIG_MIGRATION) += migrate.o
+obj-$(CONFIG_MIGRATION_COPY_OFFLOAD) += migrate_copy_offload.o
 obj-$(CONFIG_NUMA) += memory-tiers.o
 obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o
 obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
diff --git a/mm/migrate.c b/mm/migrate.c
index 69daa16f9cf3..acaaa9cc0d4f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -44,6 +44,8 @@
 #include <linux/memory-tiers.h>
 #include <linux/pagewalk.h>
 #include <linux/jump_label.h>
+#include <linux/static_call.h>
+#include <linux/migrate_copy_offload.h>
 
 #include <asm/tlbflush.h>
 
@@ -54,6 +56,17 @@
 
 DEFINE_STATIC_KEY_FALSE(migrate_offload_enabled);
 
+#ifdef CONFIG_MIGRATION_COPY_OFFLOAD
+DEFINE_SRCU(migrate_offload_srcu);
+DEFINE_STATIC_CALL(migrate_offload_copy, folios_mc_copy);
+
+bool migrate_should_batch_default(int reason)
+{
+	return false;
+}
+DEFINE_STATIC_CALL(migrate_should_batch, migrate_should_batch_default);
+#endif
+
 static const struct movable_operations *offline_movable_ops;
 static const struct movable_operations *zsmalloc_movable_ops;
 
@@ -1820,11 +1833,18 @@ static int migrate_pages_batch(struct list_head *from,
 	LIST_HEAD(dst_batch);
 	LIST_HEAD(src_std);
 	LIST_HEAD(dst_std);
+	bool do_batch = false;
 	bool nosplit = (reason == MR_NUMA_MISPLACED);
 
 	VM_WARN_ON_ONCE(mode != MIGRATE_ASYNC &&
 			!list_empty(from) && !list_is_singular(from));
 
+#ifdef CONFIG_MIGRATION_COPY_OFFLOAD
+	/* Check if the offload driver wants to batch for this reason */
+	if (static_branch_unlikely(&migrate_offload_enabled))
+		do_batch = static_call(migrate_should_batch)(reason);
+#endif
+
 	for (pass = 0; pass < nr_pass && retry; pass++) {
 		retry = 0;
 		thp_retry = 0;
@@ -1967,7 +1987,7 @@ static int migrate_pages_batch(struct list_head *from,
 				break;
 			case 0:
 				if (static_branch_unlikely(&migrate_offload_enabled) &&
-				    folio_supports_batch_copy(folio)) {
+				    do_batch && folio_supports_batch_copy(folio)) {
 					list_move_tail(&folio->lru, &src_batch);
 					list_add_tail(&dst->lru, &dst_batch);
 					nr_batch++;
@@ -1997,11 +2017,17 @@ static int migrate_pages_batch(struct list_head *from,
 	/* Flush TLBs for all unmapped folios */
 	try_to_unmap_flush();
 
+#ifdef CONFIG_MIGRATION_COPY_OFFLOAD
 	/* Batch-copy eligible folios before the move phase */
 	if (!list_empty(&src_batch)) {
-		rc = folios_mc_copy(&dst_batch, &src_batch, nr_batch);
+		int idx = srcu_read_lock(&migrate_offload_srcu);
+
+		rc = static_call(migrate_offload_copy)(&dst_batch,
+				&src_batch, nr_batch);
+		srcu_read_unlock(&migrate_offload_srcu, idx);
 		batch_copied = (rc == 0);
 	}
+#endif
 
 	retry = 1;
 	for (pass = 0; pass < nr_pass && retry; pass++) {
diff --git a/mm/migrate_copy_offload.c b/mm/migrate_copy_offload.c
new file mode 100644
index 000000000000..c22068fe09a0
--- /dev/null
+++ b/mm/migrate_copy_offload.c
@@ -0,0 +1,99 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/jump_label.h>
+#include <linux/module.h>
+#include <linux/srcu.h>
+#include <linux/migrate.h>
+#include <linux/migrate_copy_offload.h>
+#include <linux/static_call.h>
+
+static DEFINE_MUTEX(migrator_mutex);
+static struct migrator *active_migrator;
+
+DECLARE_STATIC_CALL(migrate_offload_copy, folios_mc_copy);
+DECLARE_STATIC_CALL(migrate_should_batch, migrate_should_batch_default);
+
+/**
+ * migrate_offload_start - register a batch-copy provider for page migration.
+ * @m: migrator to install.
+ *
+ * Only one provider can be active at a time; a second registration
+ * returns -EBUSY while another migrator is registered.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int migrate_offload_start(struct migrator *m)
+{
+	int ret = 0;
+
+	if (!m || !m->offload_copy)
+		return -EINVAL;
+
+	mutex_lock(&migrator_mutex);
+	if (active_migrator) {
+		ret = -EBUSY;
+		goto unlock;
+	}
+
+	if (m->owner && !try_module_get(m->owner)) {
+		ret = -ENODEV;
+		goto unlock;
+	}
+
+	static_call_update(migrate_offload_copy, m->offload_copy);
+	static_call_update(migrate_should_batch,
+		m->should_batch ? m->should_batch : migrate_should_batch_default);
+	active_migrator = m;
+	static_branch_enable(&migrate_offload_enabled);
+
+unlock:
+	mutex_unlock(&migrator_mutex);
+
+	if (ret)
+		pr_err("migrate_offload: %s: failed to register (%d)\n",
+		       m->name, ret);
+	else
+		pr_info("migrate_offload: enabled by %s\n", m->name);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(migrate_offload_start);
+
+/**
+ * migrate_offload_stop - unregister the active batch-copy provider.
+ * @m: migrator to remove (must be the currently active one).
+ *
+ * Reverts the static_call targets and waits for an SRCU grace period
+ * so that no in-flight migration is still calling into the driver
+ * before the module reference is dropped.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int migrate_offload_stop(struct migrator *m)
+{
+	struct module *owner;
+
+	mutex_lock(&migrator_mutex);
+	if (active_migrator != m) {
+		mutex_unlock(&migrator_mutex);
+		return -EINVAL;
+	}
+
+	/*
+	 * Disable the static branch first so new migrate_pages_batch calls
+	 * won't enter the batch copy path.
+	 */
+	static_branch_disable(&migrate_offload_enabled);
+	static_call_update(migrate_offload_copy, folios_mc_copy);
+	static_call_update(migrate_should_batch, migrate_should_batch_default);
+	owner = active_migrator->owner;
+	active_migrator = NULL;
+	mutex_unlock(&migrator_mutex);
+
+	/* Wait for all in-flight callers to finish before module_put(). */
+	synchronize_srcu(&migrate_offload_srcu);
+	if (owner)
+		module_put(owner);
+
+	pr_info("migrate_offload: disabled by %s\n", m->name);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(migrate_offload_stop);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH v4 5/6] drivers/migrate_offload: add DMA batch copy driver (dcbm)
  2026-03-09 12:07 [RFC PATCH v4 0/6] Accelerate page migration with batch copying and hardware offload Shivank Garg
                   ` (3 preceding siblings ...)
  2026-03-09 12:07 ` [RFC PATCH v4 4/6] mm/migrate: add copy offload registration infrastructure Shivank Garg
@ 2026-03-09 12:07 ` Shivank Garg
  2026-03-09 18:04   ` Gregory Price
  2026-03-24  8:10   ` Huang, Ying
  2026-03-09 12:07 ` [RFC PATCH v4 6/6] mm/migrate: adjust NR_MAX_BATCHED_MIGRATION for testing Shivank Garg
  2026-03-18 14:29 ` [RFC PATCH v4 0/6] Accelerate page migration with batch copying and hardware offload Garg, Shivank
  6 siblings, 2 replies; 21+ messages in thread
From: Shivank Garg @ 2026-03-09 12:07 UTC (permalink / raw)
  To: akpm, david
  Cc: lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt, surenb,
	mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
	gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm,
	Shivank Garg

Add a simple DMAEngine-based driver that uses memcpy channels to
batch-copy folios during page migration. It is primarily intended for
testing the copy offload infrastructure.

When a DMA transfer fails, the copy callback returns an error and the
migration path falls back to the per-folio CPU copy.

Sysfs interface under /sys/kernel/dcbm/:
  offloading      - enable/disable DMA offload
  nr_dma_chan     - max number of DMA channels to use
  folios_migrated - folios copied via DMA
  folios_failures - fallback count
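
A typical test sequence, assuming CONFIG_DCBM_DMA is enabled and a
DMAEngine memcpy channel is available (paths as listed above):

```shell
# Use up to four DMA channels for batch copies.
echo 4 > /sys/kernel/dcbm/nr_dma_chan

# Enable DMA offload for eligible migrations.
echo 1 > /sys/kernel/dcbm/offloading

# ... run a migration workload, e.g. migratepages(8) ...

# Read the counters; writing any value resets them.
cat /sys/kernel/dcbm/folios_migrated
cat /sys/kernel/dcbm/folios_failures

# Disable offload again.
echo 0 > /sys/kernel/dcbm/offloading
```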

Signed-off-by: Shivank Garg <shivankg@amd.com>
---
 drivers/Kconfig                       |   2 +
 drivers/Makefile                      |   2 +
 drivers/migrate_offload/Kconfig       |   8 +
 drivers/migrate_offload/Makefile      |   1 +
 drivers/migrate_offload/dcbm/Makefile |   1 +
 drivers/migrate_offload/dcbm/dcbm.c   | 457 ++++++++++++++++++++++++++
 6 files changed, 471 insertions(+)
 create mode 100644 drivers/migrate_offload/Kconfig
 create mode 100644 drivers/migrate_offload/Makefile
 create mode 100644 drivers/migrate_offload/dcbm/Makefile
 create mode 100644 drivers/migrate_offload/dcbm/dcbm.c

diff --git a/drivers/Kconfig b/drivers/Kconfig
index c0f1fb893ec0..3dbea1380603 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -255,4 +255,6 @@ source "drivers/cdx/Kconfig"
 
 source "drivers/resctrl/Kconfig"
 
+source "drivers/migrate_offload/Kconfig"
+
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 53fbd2e0acdd..f55bddf490cc 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -42,6 +42,8 @@ obj-y				+= clk/
 # really early.
 obj-$(CONFIG_DMADEVICES)	+= dma/
 
+obj-$(CONFIG_MIGRATION_COPY_OFFLOAD)	+= migrate_offload/
+
 # SOC specific infrastructure drivers.
 obj-y				+= soc/
 obj-$(CONFIG_PM_GENERIC_DOMAINS)	+= pmdomain/
diff --git a/drivers/migrate_offload/Kconfig b/drivers/migrate_offload/Kconfig
new file mode 100644
index 000000000000..0bbaedbae4ad
--- /dev/null
+++ b/drivers/migrate_offload/Kconfig
@@ -0,0 +1,8 @@
+config DCBM_DMA
+	bool "DMA Core Batch Migrator"
+	depends on MIGRATION_COPY_OFFLOAD && DMA_ENGINE
+	help
+	  DMA-based batch copy engine for page migration. Uses
+	  DMAEngine memcpy channels to offload folio data copies
+	  during migration. Primarily intended for testing the copy
+	  offload infrastructure.
diff --git a/drivers/migrate_offload/Makefile b/drivers/migrate_offload/Makefile
new file mode 100644
index 000000000000..9e16018beb15
--- /dev/null
+++ b/drivers/migrate_offload/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_DCBM_DMA)		+= dcbm/
diff --git a/drivers/migrate_offload/dcbm/Makefile b/drivers/migrate_offload/dcbm/Makefile
new file mode 100644
index 000000000000..56ba47cce0f1
--- /dev/null
+++ b/drivers/migrate_offload/dcbm/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_DCBM_DMA) += dcbm.o
diff --git a/drivers/migrate_offload/dcbm/dcbm.c b/drivers/migrate_offload/dcbm/dcbm.c
new file mode 100644
index 000000000000..89751d03101e
--- /dev/null
+++ b/drivers/migrate_offload/dcbm/dcbm.c
@@ -0,0 +1,457 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * DMA Core Batch Migrator (DCBM)
+ *
+ * Uses DMAEngine memcpy channels to offload batch folio copies during
+ * page migration. Reference driver meant for testing the offload
+ * infrastructure.
+ *
+ * Copyright (C) 2024-26 Advanced Micro Devices, Inc.
+ */
+
+#include <linux/module.h>
+#include <linux/dma-mapping.h>
+#include <linux/dmaengine.h>
+#include <linux/migrate.h>
+#include <linux/migrate_copy_offload.h>
+
+#define MAX_DMA_CHANNELS	16
+
+static unsigned long long folios_migrated;
+static unsigned long long folios_failures;
+
+static bool offloading_enabled;
+static unsigned int nr_dma_channels = 1;
+static DEFINE_MUTEX(dcbm_mutex);
+
+struct dma_work {
+	struct dma_chan *chan;
+	struct completion done;
+	atomic_t pending;
+	struct sg_table *src_sgt;
+	struct sg_table *dst_sgt;
+	bool mapped;
+};
+
+static void dma_completion_callback(void *data)
+{
+	struct dma_work *work = data;
+
+	if (atomic_dec_and_test(&work->pending))
+		complete(&work->done);
+}
+
+static int setup_sg_tables(struct dma_work *work, struct list_head **src_pos,
+		struct list_head **dst_pos, int nr)
+{
+	struct scatterlist *sg_src, *sg_dst;
+	struct device *dev;
+	int i, ret = -ENOMEM;
+
+	work->src_sgt = kmalloc_obj(*work->src_sgt, GFP_KERNEL);
+	if (!work->src_sgt)
+		return -ENOMEM;
+	work->dst_sgt = kmalloc_obj(*work->dst_sgt, GFP_KERNEL);
+	if (!work->dst_sgt)
+		goto err_free_src;
+
+	ret = sg_alloc_table(work->src_sgt, nr, GFP_KERNEL);
+	if (ret)
+		goto err_free_dst;
+	ret = sg_alloc_table(work->dst_sgt, nr, GFP_KERNEL);
+	if (ret)
+		goto err_free_src_table;
+
+	sg_src = work->src_sgt->sgl;
+	sg_dst = work->dst_sgt->sgl;
+	for (i = 0; i < nr; i++) {
+		struct folio *src = list_entry(*src_pos, struct folio, lru);
+		struct folio *dst = list_entry(*dst_pos, struct folio, lru);
+
+		sg_set_folio(sg_src, src, folio_size(src), 0);
+		sg_set_folio(sg_dst, dst, folio_size(dst), 0);
+
+		*src_pos = (*src_pos)->next;
+		*dst_pos = (*dst_pos)->next;
+
+		if (i < nr - 1) {
+			sg_src = sg_next(sg_src);
+			sg_dst = sg_next(sg_dst);
+		}
+	}
+
+	dev = dmaengine_get_dma_device(work->chan);
+	if (!dev) {
+		ret = -ENODEV;
+		goto err_free_dst_table;
+	}
+	ret = dma_map_sgtable(dev, work->src_sgt, DMA_TO_DEVICE,
+			DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING);
+	if (ret)
+		goto err_free_dst_table;
+	ret = dma_map_sgtable(dev, work->dst_sgt, DMA_FROM_DEVICE,
+			DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING);
+	if (ret)
+		goto err_unmap_src;
+
+	if (work->src_sgt->nents != work->dst_sgt->nents) {
+		ret = -EINVAL;
+		goto err_unmap_dst;
+	}
+	work->mapped = true;
+	return 0;
+
+err_unmap_dst:
+	dma_unmap_sgtable(dev, work->dst_sgt, DMA_FROM_DEVICE,
+			DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING);
+err_unmap_src:
+	dma_unmap_sgtable(dev, work->src_sgt, DMA_TO_DEVICE,
+			DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING);
+err_free_dst_table:
+	sg_free_table(work->dst_sgt);
+err_free_src_table:
+	sg_free_table(work->src_sgt);
+err_free_dst:
+	kfree(work->dst_sgt);
+	work->dst_sgt = NULL;
+err_free_src:
+	kfree(work->src_sgt);
+	work->src_sgt = NULL;
+	return ret;
+}
+
+static void cleanup_dma_work(struct dma_work *works, int actual_channels)
+{
+	struct device *dev;
+	int i;
+
+	if (!works)
+		return;
+
+	for (i = 0; i < actual_channels; i++) {
+		if (!works[i].chan)
+			continue;
+
+		dev = dmaengine_get_dma_device(works[i].chan);
+
+		if (works[i].mapped)
+			dmaengine_terminate_sync(works[i].chan);
+
+		if (dev && works[i].mapped) {
+			if (works[i].src_sgt) {
+				dma_unmap_sgtable(dev, works[i].src_sgt,
+						DMA_TO_DEVICE,
+						DMA_ATTR_SKIP_CPU_SYNC |
+						DMA_ATTR_NO_KERNEL_MAPPING);
+				sg_free_table(works[i].src_sgt);
+				kfree(works[i].src_sgt);
+			}
+			if (works[i].dst_sgt) {
+				dma_unmap_sgtable(dev, works[i].dst_sgt,
+						DMA_FROM_DEVICE,
+						DMA_ATTR_SKIP_CPU_SYNC |
+						DMA_ATTR_NO_KERNEL_MAPPING);
+				sg_free_table(works[i].dst_sgt);
+				kfree(works[i].dst_sgt);
+			}
+		}
+		dma_release_channel(works[i].chan);
+	}
+	kfree(works);
+}
+
+static int submit_dma_transfers(struct dma_work *work)
+{
+	struct scatterlist *sg_src, *sg_dst;
+	struct dma_async_tx_descriptor *tx;
+	unsigned long flags = DMA_CTRL_ACK;
+	dma_cookie_t cookie;
+	int i;
+
+	atomic_set(&work->pending, 1);
+
+	sg_src = work->src_sgt->sgl;
+	sg_dst = work->dst_sgt->sgl;
+	for_each_sgtable_dma_sg(work->src_sgt, sg_src, i) {
+		if (i == work->src_sgt->nents - 1)
+			flags |= DMA_PREP_INTERRUPT;
+
+		tx = dmaengine_prep_dma_memcpy(work->chan,
+				sg_dma_address(sg_dst),
+				sg_dma_address(sg_src),
+				sg_dma_len(sg_src), flags);
+		if (!tx) {
+			atomic_set(&work->pending, 0);
+			return -EIO;
+		}
+
+		if (i == work->src_sgt->nents - 1) {
+			tx->callback = dma_completion_callback;
+			tx->callback_param = work;
+		}
+
+		cookie = dmaengine_submit(tx);
+		if (dma_submit_error(cookie)) {
+			atomic_set(&work->pending, 0);
+			return -EIO;
+		}
+		sg_dst = sg_next(sg_dst);
+	}
+	return 0;
+}
+
+/**
+ * folios_copy_dma - copy a batch of folios via DMA memcpy
+ * @dst_list: destination folio list
+ * @src_list: source folio list
+ * @nr_folios: number of folios in each list
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+static int folios_copy_dma(struct list_head *dst_list,
+		struct list_head *src_list, unsigned int nr_folios)
+{
+	struct dma_work *works;
+	struct list_head *src_pos = src_list->next;
+	struct list_head *dst_pos = dst_list->next;
+	int i, folios_per_chan, ret;
+	dma_cap_mask_t mask;
+	int actual_channels = 0;
+	unsigned int max_channels;
+
+	max_channels = min3(nr_dma_channels, nr_folios,
+			(unsigned int)MAX_DMA_CHANNELS);
+
+	works = kcalloc(max_channels, sizeof(*works), GFP_KERNEL);
+	if (!works)
+		return -ENOMEM;
+
+	dma_cap_zero(mask);
+	dma_cap_set(DMA_MEMCPY, mask);
+
+	for (i = 0; i < max_channels; i++) {
+		works[actual_channels].chan = dma_request_chan_by_mask(&mask);
+		if (IS_ERR(works[actual_channels].chan))
+			break;
+		init_completion(&works[actual_channels].done);
+		actual_channels++;
+	}
+
+	if (actual_channels == 0) {
+		kfree(works);
+		return -ENODEV;
+	}
+
+	for (i = 0; i < actual_channels; i++) {
+		folios_per_chan = nr_folios * (i + 1) / actual_channels -
+				(nr_folios * i) / actual_channels;
+		if (folios_per_chan == 0)
+			continue;
+
+		ret = setup_sg_tables(&works[i], &src_pos, &dst_pos,
+				folios_per_chan);
+		if (ret)
+			goto err_cleanup;
+	}
+
+	for (i = 0; i < actual_channels; i++) {
+		ret = submit_dma_transfers(&works[i]);
+		if (ret)
+			goto err_cleanup;
+	}
+
+	for (i = 0; i < actual_channels; i++) {
+		if (atomic_read(&works[i].pending) > 0)
+			dma_async_issue_pending(works[i].chan);
+	}
+
+	for (i = 0; i < actual_channels; i++) {
+		if (atomic_read(&works[i].pending) == 0)
+			continue;
+		if (!wait_for_completion_timeout(&works[i].done,
+				msecs_to_jiffies(10000))) {
+			ret = -ETIMEDOUT;
+			goto err_cleanup;
+		}
+	}
+
+	cleanup_dma_work(works, actual_channels);
+
+	mutex_lock(&dcbm_mutex);
+	folios_migrated += nr_folios;
+	mutex_unlock(&dcbm_mutex);
+	return 0;
+
+err_cleanup:
+	pr_warn_ratelimited("dcbm: DMA copy failed (%d), falling back to CPU\n",
+			ret);
+	cleanup_dma_work(works, actual_channels);
+
+	mutex_lock(&dcbm_mutex);
+	folios_failures += nr_folios;
+	mutex_unlock(&dcbm_mutex);
+	return ret;
+}
+
+/* TODO: tune based on use case */
+static bool dma_should_batch(int reason)
+{
+	if (reason == MR_SYSCALL || reason == MR_COMPACTION || reason == MR_DEMOTION ||
+	    reason == MR_NUMA_MISPLACED)
+		return true;
+	return false;
+}
+
+static struct migrator dma_migrator = {
+	.name = "DCBM",
+	.offload_copy = folios_copy_dma,
+	.should_batch = dma_should_batch,
+	.owner = THIS_MODULE,
+};
+
+static ssize_t offloading_show(struct kobject *kobj,
+		struct kobj_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%d\n", offloading_enabled);
+}
+
+static ssize_t offloading_store(struct kobject *kobj,
+		struct kobj_attribute *attr, const char *buf, size_t count)
+{
+	bool enable;
+	int ret;
+
+	ret = kstrtobool(buf, &enable);
+	if (ret)
+		return ret;
+
+	mutex_lock(&dcbm_mutex);
+
+	if (enable == offloading_enabled)
+		goto out;
+
+	if (enable) {
+		ret = migrate_offload_start(&dma_migrator);
+		if (ret) {
+			mutex_unlock(&dcbm_mutex);
+			return ret;
+		}
+		offloading_enabled = true;
+	} else {
+		migrate_offload_stop(&dma_migrator);
+		offloading_enabled = false;
+	}
+out:
+	mutex_unlock(&dcbm_mutex);
+	return count;
+}
+
+static ssize_t folios_migrated_show(struct kobject *kobj,
+		struct kobj_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%llu\n", folios_migrated);
+}
+
+static ssize_t folios_migrated_store(struct kobject *kobj,
+		struct kobj_attribute *attr, const char *buf, size_t count)
+{
+	mutex_lock(&dcbm_mutex);
+	folios_migrated = 0;
+	mutex_unlock(&dcbm_mutex);
+	return count;
+}
+
+static ssize_t folios_failures_show(struct kobject *kobj,
+		struct kobj_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%llu\n", folios_failures);
+}
+
+static ssize_t folios_failures_store(struct kobject *kobj,
+		struct kobj_attribute *attr, const char *buf, size_t count)
+{
+	mutex_lock(&dcbm_mutex);
+	folios_failures = 0;
+	mutex_unlock(&dcbm_mutex);
+	return count;
+}
+
+static ssize_t nr_dma_chan_show(struct kobject *kobj,
+		struct kobj_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%u\n", nr_dma_channels);
+}
+
+static ssize_t nr_dma_chan_store(struct kobject *kobj,
+		struct kobj_attribute *attr, const char *buf, size_t count)
+{
+	unsigned int val;
+	int ret;
+
+	ret = kstrtouint(buf, 0, &val);
+	if (ret)
+		return ret;
+
+	if (val < 1 || val > MAX_DMA_CHANNELS)
+		return -EINVAL;
+
+	mutex_lock(&dcbm_mutex);
+	nr_dma_channels = val;
+	mutex_unlock(&dcbm_mutex);
+	return count;
+}
+
+static struct kobj_attribute offloading_attr = __ATTR_RW(offloading);
+static struct kobj_attribute nr_dma_chan_attr = __ATTR_RW(nr_dma_chan);
+static struct kobj_attribute folios_migrated_attr = __ATTR_RW(folios_migrated);
+static struct kobj_attribute folios_failures_attr = __ATTR_RW(folios_failures);
+
+static struct attribute *dcbm_attrs[] = {
+	&offloading_attr.attr,
+	&nr_dma_chan_attr.attr,
+	&folios_migrated_attr.attr,
+	&folios_failures_attr.attr,
+	NULL
+};
+ATTRIBUTE_GROUPS(dcbm);
+
+static struct kobject *dcbm_kobj;
+
+static int __init dcbm_init(void)
+{
+	int ret;
+
+	dcbm_kobj = kobject_create_and_add("dcbm", kernel_kobj);
+	if (!dcbm_kobj)
+		return -ENOMEM;
+
+	ret = sysfs_create_groups(dcbm_kobj, dcbm_groups);
+	if (ret) {
+		kobject_put(dcbm_kobj);
+		return ret;
+	}
+
+	pr_info("dcbm: DMA Core Batch Migrator initialized\n");
+	return 0;
+}
+
+static void __exit dcbm_exit(void)
+{
+	mutex_lock(&dcbm_mutex);
+	if (offloading_enabled) {
+		migrate_offload_stop(&dma_migrator);
+		offloading_enabled = false;
+	}
+	mutex_unlock(&dcbm_mutex);
+
+	sysfs_remove_groups(dcbm_kobj, dcbm_groups);
+	kobject_put(dcbm_kobj);
+	pr_info("dcbm: DMA Core Batch Migrator unloaded\n");
+}
+
+module_init(dcbm_init);
+module_exit(dcbm_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Shivank Garg");
+MODULE_DESCRIPTION("DMA Core Batch Migrator");
-- 
2.43.0




* [RFC PATCH v4 6/6] mm/migrate: adjust NR_MAX_BATCHED_MIGRATION for testing
  2026-03-09 12:07 [RFC PATCH v4 0/6] Accelerate page migration with batch copying and hardware offload Shivank Garg
                   ` (4 preceding siblings ...)
  2026-03-09 12:07 ` [RFC PATCH v4 5/6] drivers/migrate_offload: add DMA batch copy driver (dcbm) Shivank Garg
@ 2026-03-09 12:07 ` Shivank Garg
  2026-03-18 14:29 ` [RFC PATCH v4 0/6] Accelerate page migration with batch copying and hardware offload Garg, Shivank
  6 siblings, 0 replies; 21+ messages in thread
From: Shivank Garg @ 2026-03-09 12:07 UTC (permalink / raw)
  To: akpm, david
  Cc: lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt, surenb,
	mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
	gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm,
	Shivank Garg

From: Zi Yan <ziy@nvidia.com>

Change NR_MAX_BATCHED_MIGRATION to HPAGE_PUD_NR to allow batching THP
copies.

This change is for testing purposes only.

Signed-off-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
 mm/migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index acaaa9cc0d4f..8540e303190b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1606,7 +1606,7 @@ static inline int try_split_folio(struct folio *folio, struct list_head *split_f
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define NR_MAX_BATCHED_MIGRATION	HPAGE_PMD_NR
+#define NR_MAX_BATCHED_MIGRATION	HPAGE_PUD_NR
 #else
 #define NR_MAX_BATCHED_MIGRATION	512
 #endif
-- 
2.43.0




* Re: [RFC PATCH v4 4/6] mm/migrate: add copy offload registration infrastructure
  2026-03-09 12:07 ` [RFC PATCH v4 4/6] mm/migrate: add copy offload registration infrastructure Shivank Garg
@ 2026-03-09 17:54   ` Gregory Price
  2026-03-10 10:07     ` Garg, Shivank
  2026-03-24 10:54   ` Huang, Ying
  1 sibling, 1 reply; 21+ messages in thread
From: Gregory Price @ 2026-03-09 17:54 UTC (permalink / raw)
  To: Shivank Garg
  Cc: akpm, david, lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt,
	surenb, mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
	byungchul, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm,
	Mike Day

On Mon, Mar 09, 2026 at 12:07:29PM +0000, Shivank Garg wrote:
... snip ... 
> 
> Only one migrator can be active at a time. A second registration returns
> -EBUSY, and only the active migrator can stop itself. The static_call
> dispatch is under SRCU so synchronize_srcu() in stop path guarantees
> no in-flight copy before the module reference is dropped.
> 
... snip ...

> @@ -1820,11 +1833,18 @@ static int migrate_pages_batch(struct list_head *from,
>  
> +#ifdef CONFIG_MIGRATION_COPY_OFFLOAD
> +	/* Check if the offload driver wants to batch for this reason */
> +	if (static_branch_unlikely(&migrate_offload_enabled))
> +		do_batch = static_call(migrate_should_batch)(reason);
> +#endif
> +

should the migrate_should_batch call also be done under srcu?

In theory it's an incredibly small window, but there is a race window.

~Gregory



* Re: [RFC PATCH v4 5/6] drivers/migrate_offload: add DMA batch copy driver (dcbm)
  2026-03-09 12:07 ` [RFC PATCH v4 5/6] drivers/migrate_offload: add DMA batch copy driver (dcbm) Shivank Garg
@ 2026-03-09 18:04   ` Gregory Price
  2026-03-12  9:33     ` Garg, Shivank
  2026-03-24  8:10   ` Huang, Ying
  1 sibling, 1 reply; 21+ messages in thread
From: Gregory Price @ 2026-03-09 18:04 UTC (permalink / raw)
  To: Shivank Garg
  Cc: akpm, david, lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt,
	surenb, mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
	byungchul, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm

On Mon, Mar 09, 2026 at 12:07:31PM +0000, Shivank Garg wrote:
> diff --git a/drivers/migrate_offload/Kconfig b/drivers/migrate_offload/Kconfig
> new file mode 100644
> index 000000000000..0bbaedbae4ad
> --- /dev/null
> +++ b/drivers/migrate_offload/Kconfig
> @@ -0,0 +1,8 @@
> +config DCBM_DMA
> +	bool "DMA Core Batch Migrator"

Should this be tri-state or is built-in the only valid state?

> +static int setup_sg_tables(struct dma_work *work, struct list_head **src_pos,
> +		struct list_head **dst_pos, int nr)
> +{
... snip ..
> +	dev = dmaengine_get_dma_device(work->chan);
> +	if (!dev) {
> +		ret = -ENODEV;
> +		goto err_free_dst_table;
> +	}
> +	ret = dma_map_sgtable(dev, work->src_sgt, DMA_TO_DEVICE,
> +			DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING);
> +	if (ret)
> +		goto err_free_dst_table;
> +	ret = dma_map_sgtable(dev, work->dst_sgt, DMA_FROM_DEVICE,
> +			DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING);
> +	if (ret)
> +		goto err_unmap_src;
> +
> +	if (work->src_sgt->nents != work->dst_sgt->nents) {
> +		ret = -EINVAL;
> +		goto err_unmap_dst;
> +	}

Fairly new to dma space, but I thought the dma stuff could merge pages
on iommu systems. Wouldn't this check hit fairly often?

~Gregory



* Re: [RFC PATCH v4 4/6] mm/migrate: add copy offload registration infrastructure
  2026-03-09 17:54   ` Gregory Price
@ 2026-03-10 10:07     ` Garg, Shivank
  0 siblings, 0 replies; 21+ messages in thread
From: Garg, Shivank @ 2026-03-10 10:07 UTC (permalink / raw)
  To: Gregory Price
  Cc: akpm, david, lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt,
	surenb, mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
	byungchul, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm,
	Mike Day



On 3/9/2026 11:24 PM, Gregory Price wrote:
> On Mon, Mar 09, 2026 at 12:07:29PM +0000, Shivank Garg wrote:
> ... snip ... 
>>
>> Only one migrator can be active at a time. A second registration returns
>> -EBUSY, and only the active migrator can stop itself. The static_call
>> dispatch is under SRCU so synchronize_srcu() in stop path guarantees
>> no in-flight copy before the module reference is dropped.
>>
> ... snip ...
> 
>> @@ -1820,11 +1833,18 @@ static int migrate_pages_batch(struct list_head *from,
>>  
>> +#ifdef CONFIG_MIGRATION_COPY_OFFLOAD
>> +	/* Check if the offload driver wants to batch for this reason */
>> +	if (static_branch_unlikely(&migrate_offload_enabled))
>> +		do_batch = static_call(migrate_should_batch)(reason);
>> +#endif
>> +
> 
> should the migrate_should_batch call also be done under srcu?
> 
> In theory it's an incredibly small window, but there is a race window.

Yes, right.
Thanks for pointing this out.

Best regards,
Shivank



* Re: [RFC PATCH v4 5/6] drivers/migrate_offload: add DMA batch copy driver (dcbm)
  2026-03-09 18:04   ` Gregory Price
@ 2026-03-12  9:33     ` Garg, Shivank
  0 siblings, 0 replies; 21+ messages in thread
From: Garg, Shivank @ 2026-03-12  9:33 UTC (permalink / raw)
  To: Gregory Price
  Cc: akpm, david, lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt,
	surenb, mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
	byungchul, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm



On 3/9/2026 11:34 PM, Gregory Price wrote:
> On Mon, Mar 09, 2026 at 12:07:31PM +0000, Shivank Garg wrote:
>> diff --git a/drivers/migrate_offload/Kconfig b/drivers/migrate_offload/Kconfig
>> new file mode 100644
>> index 000000000000..0bbaedbae4ad
>> --- /dev/null
>> +++ b/drivers/migrate_offload/Kconfig
>> @@ -0,0 +1,8 @@
>> +config DCBM_DMA
>> +	bool "DMA Core Batch Migrator"
> 
> Should this be tri-state or is built-in the only valid state?

Right, will fix this.

> 
>> +static int setup_sg_tables(struct dma_work *work, struct list_head **src_pos,
>> +		struct list_head **dst_pos, int nr)
>> +{
> ... snip ..
>> +	dev = dmaengine_get_dma_device(work->chan);
>> +	if (!dev) {
>> +		ret = -ENODEV;
>> +		goto err_free_dst_table;
>> +	}
>> +	ret = dma_map_sgtable(dev, work->src_sgt, DMA_TO_DEVICE,
>> +			DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING);
>> +	if (ret)
>> +		goto err_free_dst_table;
>> +	ret = dma_map_sgtable(dev, work->dst_sgt, DMA_FROM_DEVICE,
>> +			DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING);
>> +	if (ret)
>> +		goto err_unmap_src;
>> +
>> +	if (work->src_sgt->nents != work->dst_sgt->nents) {
>> +		ret = -EINVAL;
>> +		goto err_unmap_dst;
>> +	}
> 
> Fairly new to dma space, but I thought the dma stuff could merge pages
> on iommu systems. Wouldn't this check hit fairly often?
>

I tested on a Zen3 system (with PTDMA) across different folio sizes and
didn't see this check hit in thousands of runs. I'll think more about
this problem and discuss it with the IOMMU team.

For now the focus of this series is the batch migration and core offload
infrastructure. This is a reference driver to test the offload plumbing
and its potential performance benefit. I'm happy to refine this once the
design settles.
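
To make the concern concrete for the thread, here is a rough userspace
model (hypothetical addresses; not the kernel API) of how IOMMU
coalescing in dma_map_sgtable() can leave src and dst with different
nents:

```python
def coalesce(entries):
    """Model of an IOMMU merging physically contiguous (addr, len) entries."""
    merged = []
    for addr, length in entries:
        if merged and merged[-1][0] + merged[-1][1] == addr:
            # Contiguous with the previous entry: extend it.
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((addr, length))
    return merged

# Source folios happen to be physically contiguous; the freshly
# allocated destination folios are not.
src = [(0x1000, 0x1000), (0x2000, 0x1000)]
dst = [(0x8000, 0x1000), (0xa000, 0x1000)]

assert len(coalesce(src)) == 1  # merged into a single mapped entry
assert len(coalesce(dst)) == 2  # gap keeps two entries
```

With mismatched nents, the one-to-one pairing of src and dst entries in
submit_dma_transfers() would copy the wrong ranges, which is why the
driver bails out with -EINVAL in that case.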

Best regards,
Shivank



* Re: [RFC PATCH v4 1/6] mm: introduce folios_mc_copy() for batch folio copying
  2026-03-09 12:07 ` [RFC PATCH v4 1/6] mm: introduce folios_mc_copy() for batch folio copying Shivank Garg
@ 2026-03-12  9:41   ` David Hildenbrand (Arm)
  2026-03-15 18:09     ` Garg, Shivank
  0 siblings, 1 reply; 21+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-12  9:41 UTC (permalink / raw)
  To: Shivank Garg, akpm
  Cc: lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt, surenb,
	mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
	gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm

On 3/9/26 13:07, Shivank Garg wrote:
> Add folios_mc_copy() which walks list of src and dst folios in lockstep,
> and copies folio content via folio_mc_copy(). folios_cnt parameter is
> unused here, but is part of the offload_copy callback signature used by
> later patches in the series.
> 
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> ---

I'd just squash that into patch #3.

-- 
Cheers,

David



* Re: [RFC PATCH v4 2/6] mm/migrate: skip data copy for already-copied folios
  2026-03-09 12:07 ` [RFC PATCH v4 2/6] mm/migrate: skip data copy for already-copied folios Shivank Garg
@ 2026-03-12  9:44   ` David Hildenbrand (Arm)
  2026-03-15 18:25     ` Garg, Shivank
  2026-03-24  8:22   ` Huang, Ying
  1 sibling, 1 reply; 21+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-12  9:44 UTC (permalink / raw)
  To: Shivank Garg, akpm
  Cc: lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt, surenb,
	mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
	gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm

On 3/9/26 13:07, Shivank Garg wrote:
> Add a PAGE_ALREADY_COPIED flag to the dst->private migration state.
> When set, __migrate_folio() skips folio_mc_copy() and performs
> metadata-only migration. All callers currently pass
> already_copied=false. The batch-copy path enables it in a later patch.
> 
> Move the dst->private state enum earlier in the file so
> __migrate_folio() and move_to_new_folio() can see PAGE_ALREADY_COPIED.
> 
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> ---
>  mm/migrate.c | 52 +++++++++++++++++++++++++++++++---------------------
>  1 file changed, 31 insertions(+), 21 deletions(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 1bf2cf8c44dd..1d8c1fb627c9 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -848,6 +848,18 @@ void folio_migrate_flags(struct folio *newfolio, struct folio *folio)
>  }
>  EXPORT_SYMBOL(folio_migrate_flags);
>  
> +/*
> + * To record some information during migration, we use unused private
> + * field of struct folio of the newly allocated destination folio.
> + * This is safe because nobody is using it except us.
> + */
> +enum {
> +	PAGE_WAS_MAPPED = BIT(0),
> +	PAGE_WAS_MLOCKED = BIT(1),
> +	PAGE_ALREADY_COPIED = BIT(2),
> +	PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED | PAGE_ALREADY_COPIED,

All these states really only apply to proper folios (not movable_ops).
So once we complete decoupling movable_ops migration from folio
migration, these flags would only appear in the folio migration part.

Can we convert them first to state it clearly already that these are
folio migration flags?

FOLIO_MF_WAS_MAPPED

...

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v4 1/6] mm: introduce folios_mc_copy() for batch folio copying
  2026-03-12  9:41   ` David Hildenbrand (Arm)
@ 2026-03-15 18:09     ` Garg, Shivank
  0 siblings, 0 replies; 21+ messages in thread
From: Garg, Shivank @ 2026-03-15 18:09 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt, surenb,
	mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
	gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm,
	akpm



On 3/12/2026 3:11 PM, David Hildenbrand (Arm) wrote:
> On 3/9/26 13:07, Shivank Garg wrote:
>> Add folios_mc_copy() which walks list of src and dst folios in lockstep,
>> and copies folio content via folio_mc_copy(). folios_cnt parameter is
>> unused here, but is part of the offload_copy callback signature used by
>> later patches in the series.
>>
>> Signed-off-by: Shivank Garg <shivankg@amd.com>
>> ---
> 
> I'd just squash that into patch #3.
> 

Done.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v4 2/6] mm/migrate: skip data copy for already-copied folios
  2026-03-12  9:44   ` David Hildenbrand (Arm)
@ 2026-03-15 18:25     ` Garg, Shivank
  2026-03-23 12:20       ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 21+ messages in thread
From: Garg, Shivank @ 2026-03-15 18:25 UTC (permalink / raw)
  To: David Hildenbrand (Arm), akpm
  Cc: lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt, surenb,
	mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
	gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm



On 3/12/2026 3:14 PM, David Hildenbrand (Arm) wrote:
> On 3/9/26 13:07, Shivank Garg wrote:
>> Add a PAGE_ALREADY_COPIED flag to the dst->private migration state.
>> When set, __migrate_folio() skips folio_mc_copy() and performs
>> metadata-only migration. All callers currently pass
>> already_copied=false. The batch-copy path enables it in a later patch.
>>
>> Move the dst->private state enum earlier in the file so
>> __migrate_folio() and move_to_new_folio() can see PAGE_ALREADY_COPIED.
>>
>> Signed-off-by: Shivank Garg <shivankg@amd.com>
>> ---
>>  mm/migrate.c | 52 +++++++++++++++++++++++++++++++---------------------
>>  1 file changed, 31 insertions(+), 21 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 1bf2cf8c44dd..1d8c1fb627c9 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -848,6 +848,18 @@ void folio_migrate_flags(struct folio *newfolio, struct folio *folio)
>>  }
>>  EXPORT_SYMBOL(folio_migrate_flags);
>>  
>> +/*
>> + * To record some information during migration, we use unused private
>> + * field of struct folio of the newly allocated destination folio.
>> + * This is safe because nobody is using it except us.
>> + */
>> +enum {
>> +	PAGE_WAS_MAPPED = BIT(0),
>> +	PAGE_WAS_MLOCKED = BIT(1),
>> +	PAGE_ALREADY_COPIED = BIT(2),
>> +	PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED | PAGE_ALREADY_COPIED,
> 
> All these states really only apply to proper folios (not movable_ops).
> So once we complete decoupling movable_ops migration from folio
> migration, these flags would only appear in the folio migration part.
> 
> Can we convert them first to state it clearly already that these are
> folio migration flags?
> 
> FOLIO_MF_WAS_MAPPED
> 
> ...
> 
Sure, done.

Should I fold it into the series? Or send it as independent patch as this series would
likely take few more rounds of reviews and discussion.



Subject: [PATCH] mm/migrate: rename PAGE_ migration flags to FOLIO_MF_

These flags only track folio-specific state during migration and are
not used for movable_ops pages. Rename the enum values and the
old_page_state variable to match.

No functional change.

Suggested-by: David Hildenbrand <david@kernel.org>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
 mm/migrate.c | 46 +++++++++++++++++++++++-----------------------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 1bf2cf8c44dd..8c9115cc4586 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1133,26 +1133,26 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
  * This is safe because nobody is using it except us.
  */
 enum {
-	PAGE_WAS_MAPPED = BIT(0),
-	PAGE_WAS_MLOCKED = BIT(1),
-	PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED,
+	FOLIO_MF_WAS_MAPPED = BIT(0),
+	FOLIO_MF_WAS_MLOCKED = BIT(1),
+	FOLIO_MF_OLD_STATES = FOLIO_MF_WAS_MAPPED | FOLIO_MF_WAS_MLOCKED,
 };
 
 static void __migrate_folio_record(struct folio *dst,
-				   int old_page_state,
+				   int old_folio_state,
 				   struct anon_vma *anon_vma)
 {
-	dst->private = (void *)anon_vma + old_page_state;
+	dst->private = (void *)anon_vma + old_folio_state;
 }
 
 static void __migrate_folio_extract(struct folio *dst,
-				   int *old_page_state,
+				   int *old_folio_state,
 				   struct anon_vma **anon_vmap)
 {
 	unsigned long private = (unsigned long)dst->private;
 
-	*anon_vmap = (struct anon_vma *)(private & ~PAGE_OLD_STATES);
-	*old_page_state = private & PAGE_OLD_STATES;
+	*anon_vmap = (struct anon_vma *)(private & ~FOLIO_MF_OLD_STATES);
+	*old_folio_state = private & FOLIO_MF_OLD_STATES;
 	dst->private = NULL;
 }
 
@@ -1207,7 +1207,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
 {
 	struct folio *dst;
 	int rc = -EAGAIN;
-	int old_page_state = 0;
+	int old_folio_state = 0;
 	struct anon_vma *anon_vma = NULL;
 	bool locked = false;
 	bool dst_locked = false;
@@ -1251,7 +1251,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
 	}
 	locked = true;
 	if (folio_test_mlocked(src))
-		old_page_state |= PAGE_WAS_MLOCKED;
+		old_folio_state |= FOLIO_MF_WAS_MLOCKED;
 
 	if (folio_test_writeback(src)) {
 		/*
@@ -1300,7 +1300,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
 	dst_locked = true;
 
 	if (unlikely(page_has_movable_ops(&src->page))) {
-		__migrate_folio_record(dst, old_page_state, anon_vma);
+		__migrate_folio_record(dst, old_folio_state, anon_vma);
 		return 0;
 	}
 
@@ -1326,11 +1326,11 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
 		VM_BUG_ON_FOLIO(folio_test_anon(src) &&
 			       !folio_test_ksm(src) && !anon_vma, src);
 		try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0);
-		old_page_state |= PAGE_WAS_MAPPED;
+		old_folio_state |= FOLIO_MF_WAS_MAPPED;
 	}
 
 	if (!folio_mapped(src)) {
-		__migrate_folio_record(dst, old_page_state, anon_vma);
+		__migrate_folio_record(dst, old_folio_state, anon_vma);
 		return 0;
 	}
 
@@ -1342,7 +1342,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
 	if (rc == -EAGAIN)
 		ret = NULL;
 
-	migrate_folio_undo_src(src, old_page_state & PAGE_WAS_MAPPED,
+	migrate_folio_undo_src(src, old_folio_state & FOLIO_MF_WAS_MAPPED,
 			       anon_vma, locked, ret);
 	migrate_folio_undo_dst(dst, dst_locked, put_new_folio, private);
 
@@ -1356,11 +1356,11 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 			      struct list_head *ret)
 {
 	int rc;
-	int old_page_state = 0;
+	int old_folio_state = 0;
 	struct anon_vma *anon_vma = NULL;
 	struct list_head *prev;
 
-	__migrate_folio_extract(dst, &old_page_state, &anon_vma);
+	__migrate_folio_extract(dst, &old_folio_state, &anon_vma);
 	prev = dst->lru.prev;
 	list_del(&dst->lru);
 
@@ -1385,10 +1385,10 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 	 * isolated from the unevictable LRU: but this case is the easiest.
 	 */
 	folio_add_lru(dst);
-	if (old_page_state & PAGE_WAS_MLOCKED)
+	if (old_folio_state & FOLIO_MF_WAS_MLOCKED)
 		lru_add_drain();
 
-	if (old_page_state & PAGE_WAS_MAPPED)
+	if (old_folio_state & FOLIO_MF_WAS_MAPPED)
 		remove_migration_ptes(src, dst, 0);
 
 out_unlock_both:
@@ -1420,11 +1420,11 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 	 */
 	if (rc == -EAGAIN) {
 		list_add(&dst->lru, prev);
-		__migrate_folio_record(dst, old_page_state, anon_vma);
+		__migrate_folio_record(dst, old_folio_state, anon_vma);
 		return rc;
 	}
 
-	migrate_folio_undo_src(src, old_page_state & PAGE_WAS_MAPPED,
+	migrate_folio_undo_src(src, old_folio_state & FOLIO_MF_WAS_MAPPED,
 			       anon_vma, true, ret);
 	migrate_folio_undo_dst(dst, true, put_new_folio, private);
 
@@ -1758,11 +1758,11 @@ static void migrate_folios_undo(struct list_head *src_folios,
 	dst = list_first_entry(dst_folios, struct folio, lru);
 	dst2 = list_next_entry(dst, lru);
 	list_for_each_entry_safe(folio, folio2, src_folios, lru) {
-		int old_page_state = 0;
+		int old_folio_state = 0;
 		struct anon_vma *anon_vma = NULL;
 
-		__migrate_folio_extract(dst, &old_page_state, &anon_vma);
-		migrate_folio_undo_src(folio, old_page_state & PAGE_WAS_MAPPED,
+		__migrate_folio_extract(dst, &old_folio_state, &anon_vma);
+		migrate_folio_undo_src(folio, old_folio_state & FOLIO_MF_WAS_MAPPED,
 				anon_vma, true, ret_folios);
 		list_del(&dst->lru);
 		migrate_folio_undo_dst(dst, true, put_new_folio, private);
-- 

Thanks,
Shivank

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v4 0/6] Accelerate page migration with batch copying and hardware offload
  2026-03-09 12:07 [RFC PATCH v4 0/6] Accelerate page migration with batch copying and hardware offload Shivank Garg
                   ` (5 preceding siblings ...)
  2026-03-09 12:07 ` [RFC PATCH v4 6/6] mm/migrate: adjust NR_MAX_BATCHED_MIGRATION for testing Shivank Garg
@ 2026-03-18 14:29 ` Garg, Shivank
  6 siblings, 0 replies; 21+ messages in thread
From: Garg, Shivank @ 2026-03-18 14:29 UTC (permalink / raw)
  To: akpm, david
  Cc: lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt, surenb,
	mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
	gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm



On 3/9/2026 5:37 PM, Shivank Garg wrote:
> This is the fourth RFC of the patchset to enhance page migration by
> batching folio-copy operations and enabling acceleration via DMA offload.
> 
> Single-threaded, folio-by-folio copying bottlenecks page migration in
> modern systems with deep memory hierarchies, especially for large folios
> where copy overhead dominates, leaving significant hardware potential
> untapped.
> 
> By batching the copy phase, we create an opportunity for hardware
> acceleration. This series builds the framework and provides a DMA
> offload driver (dcbm) as a reference implementation, targeting bulk
> migration workloads where offloading the copy improves throughput
> and latency while freeing the CPU cycles.
> 

[snip]

> System Info: AMD Zen 3 EPYC server (2-sockets, 32 cores, SMT Enabled),
> 1 NUMA node per socket, v7.0-rc2, DVFS set to Performance, PTDMA hardware.

> 
> a. Baseline (vanilla kernel: v7.0-rc2, single-threaded, serial folio_copy):
> 
> ============================================================================================
>        | 4K        | 16K       | 64K        | 256K       | 1M         | 2M         |
> ============================================================================================
>        | 3.55±0.19 | 5.66±0.30 |  6.16±0.09 |  7.12±0.83 |  6.93±0.09 | 10.88±0.19 |
> 
> b. DMA offload (Patched Kernel, dcbm driver, N DMA channels):
> 
> ============================================================================================
> Channel Cnt | 4K        | 16K       | 64K        | 256K       | 1M         | 2M         |
> ============================================================================================
> 1           | 2.63±0.26 | 2.92±0.09 |  3.16±0.13 |  4.75±0.70 |  7.38±0.18 | 12.64±0.07 |
> 2           | 3.20±0.12 | 4.68±0.17 |  5.16±0.36 |  7.42±1.00 |  8.05±0.05 | 14.40±0.10 |
> 4           | 3.78±0.16 | 6.45±0.06 |  7.36±0.18 |  9.70±0.11 | 11.68±2.37 | 27.16±0.20 |
> 8           | 4.32±0.24 | 8.20±0.45 |  9.45±0.26 | 12.99±2.87 | 13.18±0.08 | 46.17±0.67 |
> 12          | 4.35±0.16 | 8.80±0.09 | 11.65±2.71 | 15.46±4.95 | 14.69±4.10 | 60.89±0.68 |
> 16          | 4.40±0.19 | 9.25±0.13 | 11.02±0.26 | 13.56±0.15 | 18.04±7.11 | 66.86±0.81 |

I ran experiments to evaluate DMA offload for memory compaction page migration (on the above system).

Each NUMA node has ~250GB of memory. I bind everything to Node 1 (CPU 32) and keep background MM daemons disabled.

The experiment has two phases: fragmentation and compaction/migration.

1. Memory Fragmentation

I allocate ~248GB of anonymous memory on Node 1 and touch every page to
ensure physical backing. Then, for each 2MB-aligned region (512
contiguous 4KB pages), I free 50% of pages at evenly-spaced offsets using
MADV_DONTNEED. The freed pages return to the buddy allocator, but the
remaining 256 occupied pages in each region prevent merging into higher
order blocks.

After this, Node 1 is 100% fragmented with 50% free memory, meaning every
hugepage allocation requires compaction.

[ ] [X] [ ] [X] [ ] [X] [ ] [X] [ ] [X] [ ] [X] ...

The fragmenter process stays alive throughout the measurement, with
oom_score_adj=-1000 to prevent the OOM killer from targeting it.


2. Compaction Trigger

To benchmark compaction in a reproducible way, I use a kernel module that
calls alloc_pages_node() in a tight loop for the target node. Each
allocation enters the slow path:
__alloc_pages_slowpath() -> try_to_compact_pages() -> compact_zone() -> migrate_pages(),
performing page migration under MR_COMPACTION. The allocation is pinned
to CPU 32 on Node 1.

Target: Allocate **16384** order-9 pages (32GB), producing ~4.5 million
4KB page migrations per run.


3. CPU Contention (Busy System)

To emulate a busy system, I run a CPU-hogging process on the same CPU
as compaction:

while (run) { counter++; __asm__ volatile("" : "+r"(counter)); }

Both compaction and the hog are pinned to CPU 32, so they compete for the
same core, emulating a real-world scenario where compaction shares CPU
time with application workloads.


I measure the following metrics:
1. Wall time: elapsed time for all hugepage allocations
2. Pages migrated: delta of /proc/vmstat counters (pgmigrate_success)
3. DMA copies: DCBM sysfs counter (folios_migrated)
4. /proc/stat for the pinned CPU — user%, sys%, idle% during the run
5. Hog iterations (busy modes): total loop count of the CPU-hog process

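Metric (2) can be collected as a counter delta over /proc/vmstat; a sketch with a hypothetical helper (the fixture files below are only so the snippet runs standalone — for the real measurement the file argument is dropped and the live /proc/vmstat is read before and after the run):

```shell
# Extract one counter from a vmstat-format file (one "key value" per line).
# $1 = counter name, $2 = file (defaults to the live /proc/vmstat).
vmstat_counter() {
	awk -v k="$1" '$1 == k { print $2 }' "${2:-/proc/vmstat}"
}

# Illustrative fixtures standing in for snapshots taken before/after
# the hugepage-allocation workload.
printf 'pgmigrate_success 4563506\nnr_free_pages 100\n' > /tmp/vmstat.before
printf 'pgmigrate_success 4580000\nnr_free_pages 90\n'  > /tmp/vmstat.after

before=$(vmstat_counter pgmigrate_success /tmp/vmstat.before)
after=$(vmstat_counter pgmigrate_success /tmp/vmstat.after)
echo "pages migrated: $((after - before))"
```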

Experiment Results:

I run four configurations, each from a fresh reboot to avoid buddy
allocator state degradation between runs:

Baseline (vanilla kernel) and DMA (migration offload enabled),
each on an idle and on a busy system.

  Mode            Wall time(ms)  Migrated   DMA_Copy  Hog_Iters    User%     Sys%    Idle%
  --------------------------------------------------------------------------------------------
1 baseline         16708         4563506          -      -          0.00%   99.40%    0.29%
2 dma              18887         4622952    4623181      -          0.00%   76.65%   22.55%
3 busy-baseline    33256         4599846          -   62300165085  49.90%   49.75%    0.06%
4 busy-dma         32475         4602750    4604672   66022189744  56.32%   42.97%    0.06%


Inference:

1. On an idle system, wall time increases with DMA (~13%) because the
current compaction batch size (COMPACT_CLUSTER_MAX = 32 pages) is
too small for DMA to amortize its setup cost. However, kernel sys%
drops from 99.4% to 76.7%, freeing 22.5% of CPU time.

2. On a busy system, wall time decreases slightly (~2.3%) and the hog
process accumulates 6% more iterations with DMA offload. The CPU
time freed during DMA transfers goes directly to the competing
userspace workload.
This shows that DMA offload for compaction benefits busy systems with
high fragmentation.


Note:
Tuning the compaction algorithm for larger DMA batches and using DMA
hardware optimized for small-size transfers should improve the results
further.


Thanks,
Shivank


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v4 2/6] mm/migrate: skip data copy for already-copied folios
  2026-03-15 18:25     ` Garg, Shivank
@ 2026-03-23 12:20       ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-23 12:20 UTC (permalink / raw)
  To: Garg, Shivank, akpm
  Cc: lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt, surenb,
	mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
	gourry, ying.huang, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm

On 3/15/26 19:25, Garg, Shivank wrote:
> 
> 
> On 3/12/2026 3:14 PM, David Hildenbrand (Arm) wrote:
>> On 3/9/26 13:07, Shivank Garg wrote:
>>> Add a PAGE_ALREADY_COPIED flag to the dst->private migration state.
>>> When set, __migrate_folio() skips folio_mc_copy() and performs
>>> metadata-only migration. All callers currently pass
>>> already_copied=false. The batch-copy path enables it in a later patch.
>>>
>>> Move the dst->private state enum earlier in the file so
>>> __migrate_folio() and move_to_new_folio() can see PAGE_ALREADY_COPIED.
>>>
>>> Signed-off-by: Shivank Garg <shivankg@amd.com>
>>> ---
>>>  mm/migrate.c | 52 +++++++++++++++++++++++++++++++---------------------
>>>  1 file changed, 31 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index 1bf2cf8c44dd..1d8c1fb627c9 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -848,6 +848,18 @@ void folio_migrate_flags(struct folio *newfolio, struct folio *folio)
>>>  }
>>>  EXPORT_SYMBOL(folio_migrate_flags);
>>>  
>>> +/*
>>> + * To record some information during migration, we use unused private
>>> + * field of struct folio of the newly allocated destination folio.
>>> + * This is safe because nobody is using it except us.
>>> + */
>>> +enum {
>>> +	PAGE_WAS_MAPPED = BIT(0),
>>> +	PAGE_WAS_MLOCKED = BIT(1),
>>> +	PAGE_ALREADY_COPIED = BIT(2),
>>> +	PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED | PAGE_ALREADY_COPIED,
>>
>> All these states really only apply to proper folios (not movable_ops).
>> So once we complete decoupling movable_ops migration from folio
>> migration, these flags would only appear in the folio migration part.
>>
>> Can we convert them first to state it clearly already that these are
>> folio migration flags?
>>
>> FOLIO_MF_WAS_MAPPED
>>
>> ...
>>
> Sure, done.
> 
> Should I fold it into the series? Or send it as independent patch as this series would
> likely take few more rounds of reviews and discussion.

Best to send it out as a standalone cleanup :)

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v4 5/6] drivers/migrate_offload: add DMA batch copy driver (dcbm)
  2026-03-09 12:07 ` [RFC PATCH v4 5/6] drivers/migrate_offload: add DMA batch copy driver (dcbm) Shivank Garg
  2026-03-09 18:04   ` Gregory Price
@ 2026-03-24  8:10   ` Huang, Ying
  1 sibling, 0 replies; 21+ messages in thread
From: Huang, Ying @ 2026-03-24  8:10 UTC (permalink / raw)
  To: Shivank Garg
  Cc: akpm, david, lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt,
	surenb, mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
	byungchul, gourry, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm

Hi, Shivank,

Shivank Garg <shivankg@amd.com> writes:

> Simple DMAEngine based driver that uses memcpy channels to batch-copy
> folios during page migration. Primarily for testing the copy offload
> infrastructure.
>
> When DMA fails the callback returns an error and the migration path
> falls back to per-folio CPU copy.
>
> Sysfs interface under /sys/kernel/dcbm/:
>   offloading      - enable/disable DMA offload
>   nr_dma_chan     - max number of DMA channels to use
>   folios_migrated - folios copied via DMA
>   folios_failures - fallback count

How about placing the sysfs interface under /sys/module/dcbm/?  We will
have multiple migrator implementations in the future, so dcbm behaves
more like a driver, right?

---
Best Regards,
Huang, Ying

[snip]


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v4 2/6] mm/migrate: skip data copy for already-copied folios
  2026-03-09 12:07 ` [RFC PATCH v4 2/6] mm/migrate: skip data copy for already-copied folios Shivank Garg
  2026-03-12  9:44   ` David Hildenbrand (Arm)
@ 2026-03-24  8:22   ` Huang, Ying
  1 sibling, 0 replies; 21+ messages in thread
From: Huang, Ying @ 2026-03-24  8:22 UTC (permalink / raw)
  To: Shivank Garg
  Cc: akpm, david, lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt,
	surenb, mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
	byungchul, gourry, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm

Shivank Garg <shivankg@amd.com> writes:

> Add a PAGE_ALREADY_COPIED flag to the dst->private migration state.
> When set, __migrate_folio() skips folio_mc_copy() and performs
> metadata-only migration. All callers currently pass
> already_copied=false. The batch-copy path enables it in a later patch.
>
> Move the dst->private state enum earlier in the file so
> __migrate_folio() and move_to_new_folio() can see PAGE_ALREADY_COPIED.
>
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> ---
>  mm/migrate.c | 52 +++++++++++++++++++++++++++++++---------------------
>  1 file changed, 31 insertions(+), 21 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 1bf2cf8c44dd..1d8c1fb627c9 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -848,6 +848,18 @@ void folio_migrate_flags(struct folio *newfolio, struct folio *folio)
>  }
>  EXPORT_SYMBOL(folio_migrate_flags);
>  
> +/*
> + * To record some information during migration, we use unused private
> + * field of struct folio of the newly allocated destination folio.
> + * This is safe because nobody is using it except us.
> + */
> +enum {
> +	PAGE_WAS_MAPPED = BIT(0),
> +	PAGE_WAS_MLOCKED = BIT(1),
> +	PAGE_ALREADY_COPIED = BIT(2),
> +	PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED | PAGE_ALREADY_COPIED,
> +};
> +
>  /************************************************************
>   *                    Migration functions
>   ***********************************************************/
> @@ -857,14 +869,20 @@ static int __migrate_folio(struct address_space *mapping, struct folio *dst,
>  			   enum migrate_mode mode)
>  {
>  	int rc, expected_count = folio_expected_ref_count(src) + 1;
> +	bool already_copied = ((unsigned long)dst->private & PAGE_ALREADY_COPIED);
> +
> +	if (already_copied)
> +		dst->private = NULL;
>  
>  	/* Check whether src does not have extra refs before we do more work */
>  	if (folio_ref_count(src) != expected_count)
>  		return -EAGAIN;
>  
> -	rc = folio_mc_copy(dst, src);
> -	if (unlikely(rc))
> -		return rc;
> +	if (!already_copied) {
> +		rc = folio_mc_copy(dst, src);
> +		if (unlikely(rc))
> +			return rc;
> +	}
>  
>  	rc = __folio_migrate_mapping(mapping, dst, src, expected_count);
>  	if (rc)
> @@ -1088,7 +1106,7 @@ static int fallback_migrate_folio(struct address_space *mapping,
>   *     0 - success
>   */
>  static int move_to_new_folio(struct folio *dst, struct folio *src,
> -				enum migrate_mode mode)
> +		enum migrate_mode mode, bool already_copied)
>  {
>  	struct address_space *mapping = folio_mapping(src);
>  	int rc = -EAGAIN;
> @@ -1096,6 +1114,9 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
>  	VM_BUG_ON_FOLIO(!folio_test_locked(src), src);
>  	VM_BUG_ON_FOLIO(!folio_test_locked(dst), dst);
>  
> +	if (already_copied)
> +		dst->private = (void *)(unsigned long)PAGE_ALREADY_COPIED;
> +

IMHO, this appears to be an unusual way to pass arguments to a function.
Why not adjust the parameters of migrate_folio()?  How about turning enum
migrate_mode into a bitmask (migrate_flags)?

>  	if (!mapping)
>  		rc = migrate_folio(mapping, dst, src, mode);
>  	else if (mapping_inaccessible(mapping))
> @@ -1127,17 +1148,6 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
>  	return rc;
>  }
>  
> -/*
> - * To record some information during migration, we use unused private
> - * field of struct folio of the newly allocated destination folio.
> - * This is safe because nobody is using it except us.
> - */
> -enum {
> -	PAGE_WAS_MAPPED = BIT(0),
> -	PAGE_WAS_MLOCKED = BIT(1),
> -	PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED,
> -};
> -
>  static void __migrate_folio_record(struct folio *dst,
>  				   int old_page_state,
>  				   struct anon_vma *anon_vma)
> @@ -1353,7 +1363,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>  static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>  			      struct folio *src, struct folio *dst,
>  			      enum migrate_mode mode, enum migrate_reason reason,
> -			      struct list_head *ret)
> +			      struct list_head *ret, bool already_copied)
>  {
>  	int rc;
>  	int old_page_state = 0;
> @@ -1371,7 +1381,7 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>  		goto out_unlock_both;
>  	}
>  
> -	rc = move_to_new_folio(dst, src, mode);
> +	rc = move_to_new_folio(dst, src, mode, already_copied);
>  	if (rc)
>  		goto out;
>  
> @@ -1519,7 +1529,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
>  	}
>  
>  	if (!folio_mapped(src))
> -		rc = move_to_new_folio(dst, src, mode);
> +		rc = move_to_new_folio(dst, src, mode, false);
>  
>  	if (page_was_mapped)
>  		remove_migration_ptes(src, !rc ? dst : src, ttu);
> @@ -1703,7 +1713,7 @@ static void migrate_folios_move(struct list_head *src_folios,
>  		struct list_head *ret_folios,
>  		struct migrate_pages_stats *stats,
>  		int *retry, int *thp_retry, int *nr_failed,
> -		int *nr_retry_pages)
> +		int *nr_retry_pages, bool already_copied)
>  {
>  	struct folio *folio, *folio2, *dst, *dst2;
>  	bool is_thp;
> @@ -1720,7 +1730,7 @@ static void migrate_folios_move(struct list_head *src_folios,
>  
>  		rc = migrate_folio_move(put_new_folio, private,
>  				folio, dst, mode,
> -				reason, ret_folios);
> +				reason, ret_folios, already_copied);
>  		/*
>  		 * The rules are:
>  		 *	0: folio will be freed
> @@ -1977,7 +1987,7 @@ static int migrate_pages_batch(struct list_head *from,
>  		migrate_folios_move(&unmap_folios, &dst_folios,
>  				put_new_folio, private, mode, reason,
>  				ret_folios, stats, &retry, &thp_retry,
> -				&nr_failed, &nr_retry_pages);
> +				&nr_failed, &nr_retry_pages, false);
>  	}
>  	nr_failed += retry;
>  	stats->nr_thp_failed += thp_retry;

---
Best Regards,
Huang, Ying


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v4 3/6] mm/migrate: add batch-copy path in migrate_pages_batch
  2026-03-09 12:07 ` [RFC PATCH v4 3/6] mm/migrate: add batch-copy path in migrate_pages_batch Shivank Garg
@ 2026-03-24  8:42   ` Huang, Ying
  0 siblings, 0 replies; 21+ messages in thread
From: Huang, Ying @ 2026-03-24  8:42 UTC (permalink / raw)
  To: Shivank Garg
  Cc: akpm, david, lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt,
	surenb, mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
	byungchul, gourry, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm

Shivank Garg <shivankg@amd.com> writes:

> Split unmapped folios into batch-eligible (src_batch/dst_batch) and
> standard (src_std/dst_std) lists, gated by migrate_offload_enabled,
> which is off by default. So, when no offload driver is active, the
> branch is never taken and everything goes through the standard path.
>
> After TLB flush, batch copy the eligible folios via folios_mc_copy()
> and pass already_copied=true into migrate_folios_move() so
> __migrate_folio() skips the per-folio copy.
>
> On batch-copy failure, the already_copied flag stays false and each
> folio falls back to individual copy.
>
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> ---
>  mm/migrate.c | 55 +++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 44 insertions(+), 11 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 1d8c1fb627c9..69daa16f9cf3 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -43,6 +43,7 @@
>  #include <linux/sched/sysctl.h>
>  #include <linux/memory-tiers.h>
>  #include <linux/pagewalk.h>
> +#include <linux/jump_label.h>
>  
>  #include <asm/tlbflush.h>
>  
> @@ -51,6 +52,8 @@
>  #include "internal.h"
>  #include "swap.h"
>  
> +DEFINE_STATIC_KEY_FALSE(migrate_offload_enabled);
> +
>  static const struct movable_operations *offline_movable_ops;
>  static const struct movable_operations *zsmalloc_movable_ops;
>  
> @@ -1706,6 +1709,12 @@ static int migrate_hugetlbs(struct list_head *from, new_folio_t get_new_folio,
>  	return nr_failed;
>  }
>  
> +/* movable_ops folios have their own migrate path */
> +static bool folio_supports_batch_copy(struct folio *folio)
> +{
> +	return likely(!page_has_movable_ops(&folio->page));
> +}
> +
>  static void migrate_folios_move(struct list_head *src_folios,
>  		struct list_head *dst_folios,
>  		free_folio_t put_new_folio, unsigned long private,
> @@ -1805,8 +1814,12 @@ static int migrate_pages_batch(struct list_head *from,
>  	bool is_large = false;
>  	struct folio *folio, *folio2, *dst = NULL;
>  	int rc, rc_saved = 0, nr_pages;
> -	LIST_HEAD(unmap_folios);
> -	LIST_HEAD(dst_folios);
> +	unsigned int nr_batch = 0;
> +	bool batch_copied = false;
> +	LIST_HEAD(src_batch);
> +	LIST_HEAD(dst_batch);
> +	LIST_HEAD(src_std);
> +	LIST_HEAD(dst_std);

IMHO, the naming appears too copy-centric; how about unmap_batch and
unmap_single?  "unmap" is one step of migration.

>  	bool nosplit = (reason == MR_NUMA_MISPLACED);
>  
>  	VM_WARN_ON_ONCE(mode != MIGRATE_ASYNC &&
> @@ -1943,7 +1956,7 @@ static int migrate_pages_batch(struct list_head *from,

unmap_folios/dst_folios in the comments need to be changed too.

			rc = migrate_folio_unmap(get_new_folio, put_new_folio,
					private, folio, &dst, mode, ret_folios);
			/*
			 * The rules are:
			 *	0: folio will be put on unmap_folios list,
			 *	   dst folio put on dst_folios list
			 *	-EAGAIN: stay on the from list
			 *	-ENOMEM: stay on the from list
			 *	Other errno: put on ret_folios list
			 */


>  				/* nr_failed isn't updated for not used */
>  				stats->nr_thp_failed += thp_retry;
>  				rc_saved = rc;
> -				if (list_empty(&unmap_folios))
> +				if (list_empty(&src_batch) && list_empty(&src_std))
>  					goto out;
>  				else
>  					goto move;
> @@ -1953,8 +1966,15 @@ static int migrate_pages_batch(struct list_head *from,
>  				nr_retry_pages += nr_pages;
>  				break;
>  			case 0:
> -				list_move_tail(&folio->lru, &unmap_folios);
> -				list_add_tail(&dst->lru, &dst_folios);
> +				if (static_branch_unlikely(&migrate_offload_enabled) &&
> +				    folio_supports_batch_copy(folio)) {
> +					list_move_tail(&folio->lru, &src_batch);
> +					list_add_tail(&dst->lru, &dst_batch);
> +					nr_batch++;
> +				} else {
> +					list_move_tail(&folio->lru, &src_std);
> +					list_add_tail(&dst->lru, &dst_std);
> +				}
>  				break;
>  			default:
>  				/*
> @@ -1977,17 +1997,28 @@ static int migrate_pages_batch(struct list_head *from,
>  	/* Flush TLBs for all unmapped folios */
>  	try_to_unmap_flush();
>  
> +	/* Batch-copy eligible folios before the move phase */
> +	if (!list_empty(&src_batch)) {
> +		rc = folios_mc_copy(&dst_batch, &src_batch, nr_batch);
> +		batch_copied = (rc == 0);
> +	}
> +
>  	retry = 1;
>  	for (pass = 0; pass < nr_pass && retry; pass++) {
>  		retry = 0;
>  		thp_retry = 0;
>  		nr_retry_pages = 0;
>  
> -		/* Move the unmapped folios */
> -		migrate_folios_move(&unmap_folios, &dst_folios,
> -				put_new_folio, private, mode, reason,
> -				ret_folios, stats, &retry, &thp_retry,
> -				&nr_failed, &nr_retry_pages, false);
> +		if (!list_empty(&src_batch))
> +			migrate_folios_move(&src_batch, &dst_batch, put_new_folio,
> +					private, mode, reason, ret_folios, stats,
> +					&retry, &thp_retry, &nr_failed,
> +					&nr_retry_pages, batch_copied);
> +		if (!list_empty(&src_std))
> +			migrate_folios_move(&src_std, &dst_std,	put_new_folio,
> +					private, mode, reason, ret_folios, stats,
> +					&retry, &thp_retry, &nr_failed,
> +					&nr_retry_pages, false);
>  	}
>  	nr_failed += retry;
>  	stats->nr_thp_failed += thp_retry;
> @@ -1996,7 +2027,9 @@ static int migrate_pages_batch(struct list_head *from,
>  	rc = rc_saved ? : nr_failed;
>  out:
>  	/* Cleanup remaining folios */
> -	migrate_folios_undo(&unmap_folios, &dst_folios,
> +	migrate_folios_undo(&src_batch, &dst_batch,
> +			put_new_folio, private, ret_folios);
> +	migrate_folios_undo(&src_std, &dst_std,
>  			put_new_folio, private, ret_folios);
>  
>  	return rc;

---
Best Regards,
Huang, Ying



* Re: [RFC PATCH v4 4/6] mm/migrate: add copy offload registration infrastructure
  2026-03-09 12:07 ` [RFC PATCH v4 4/6] mm/migrate: add copy offload registration infrastructure Shivank Garg
  2026-03-09 17:54   ` Gregory Price
@ 2026-03-24 10:54   ` Huang, Ying
  1 sibling, 0 replies; 21+ messages in thread
From: Huang, Ying @ 2026-03-24 10:54 UTC (permalink / raw)
  To: Shivank Garg
  Cc: akpm, david, lorenzo.stoakes, Liam.Howlett, vbabka, willy, rppt,
	surenb, mhocko, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
	byungchul, gourry, apopple, dave, Jonathan.Cameron, rkodsara,
	vkoul, bharata, sj, weixugc, dan.j.williams, rientjes,
	xuezhengchu, yiannis, dave.hansen, hannes, jhubbard, peterx, riel,
	shakeel.butt, stalexan, tj, nifan.cxl, linux-kernel, linux-mm,
	Mike Day

Shivank Garg <shivankg@amd.com> writes:

> Introduce CONFIG_MIGRATION_COPY_OFFLOAD, which lets offload drivers

Do we really need a new Kconfig option?  IMHO, we have too many already.
Because we have a jump label guarding the path, the performance difference
should be trivial.  Can you measure the size difference?

> (DMA, multi-threaded CPU copy, etc.) take over the batch folio copy in
> migrate_pages_batch().
>
> An offload driver fills in a struct migrator with its offload_copy() and
> should_batch() implementations and calls migrate_offload_start(), which
> patches the migrate_offload_copy() static_call and flips the
> migrate_offload_enabled static branch. The migrate_offload_stop() call
> reverts both.
>
> Only one migrator can be active at a time: a second registration returns
> -EBUSY, and only the active migrator can stop itself. The static_call
> dispatch runs under SRCU, so synchronize_srcu() in the stop path
> guarantees no copy is still in flight before the module reference is
> dropped.
>
> Co-developed-by: Mike Day <michael.day@amd.com>
> Signed-off-by: Mike Day <michael.day@amd.com>
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> ---
>  include/linux/migrate_copy_offload.h | 34 ++++++++++
>  mm/Kconfig                           |  9 +++
>  mm/Makefile                          |  1 +
>  mm/migrate.c                         | 30 ++++++++-
>  mm/migrate_copy_offload.c            | 99 ++++++++++++++++++++++++++++
>  5 files changed, 171 insertions(+), 2 deletions(-)
>  create mode 100644 include/linux/migrate_copy_offload.h
>  create mode 100644 mm/migrate_copy_offload.c
>
> diff --git a/include/linux/migrate_copy_offload.h b/include/linux/migrate_copy_offload.h
> new file mode 100644
> index 000000000000..ee112826ebdf
> --- /dev/null
> +++ b/include/linux/migrate_copy_offload.h
> @@ -0,0 +1,34 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_MIGRATE_COPY_OFFLOAD_H
> +#define _LINUX_MIGRATE_COPY_OFFLOAD_H
> +
> +#include <linux/jump_label.h>
> +#include <linux/srcu.h>
> +#include <linux/types.h>
> +
> +struct list_head;
> +struct module;
> +
> +#define MIGRATOR_NAME_LEN 32
> +
> +struct migrator {
> +	char name[MIGRATOR_NAME_LEN];
> +	int (*offload_copy)(struct list_head *dst_list,
> +			    struct list_head *src_list,
> +			    unsigned int folio_cnt);
> +	bool (*should_batch)(int reason);
> +	struct module *owner;
> +};
> +
> +#ifdef CONFIG_MIGRATION_COPY_OFFLOAD
> +extern struct static_key_false migrate_offload_enabled;
> +extern struct srcu_struct migrate_offload_srcu;
> +bool migrate_should_batch_default(int reason);
> +int migrate_offload_start(struct migrator *m);
> +int migrate_offload_stop(struct migrator *m);

Why not name the functions migrate_offload_register()/unregister()?
IMHO, that sounds more natural.

> +#else
> +static inline int migrate_offload_start(struct migrator *m) { return 0; }
> +static inline int migrate_offload_stop(struct migrator *m) { return 0; }
> +#endif /* CONFIG_MIGRATION_COPY_OFFLOAD */
> +
> +#endif /* _LINUX_MIGRATE_COPY_OFFLOAD_H */
> diff --git a/mm/Kconfig b/mm/Kconfig
> index ebd8ea353687..faf0cae9991b 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -648,6 +648,15 @@ config MIGRATION
>  config DEVICE_MIGRATION
>  	def_bool MIGRATION && ZONE_DEVICE
>  
> +config MIGRATION_COPY_OFFLOAD
> +	bool "Page migration copy offload"
> +	depends on MIGRATION
> +	help
> +	  Adds migration copy offload infrastructure which allows
> +	  offload engines (DMA, multi-threaded CPU copy, etc.) to
> +	  register as the batch-copy provider for page migration
> +	  via migrate_offload_start()/migrate_offload_stop().
> +
>  config ARCH_ENABLE_HUGEPAGE_MIGRATION
>  	bool
>  
> diff --git a/mm/Makefile b/mm/Makefile
> index 8ad2ab08244e..db1ac8097089 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -96,6 +96,7 @@ obj-$(CONFIG_FAILSLAB) += failslab.o
>  obj-$(CONFIG_FAIL_PAGE_ALLOC) += fail_page_alloc.o
>  obj-$(CONFIG_MEMTEST)		+= memtest.o
>  obj-$(CONFIG_MIGRATION) += migrate.o
> +obj-$(CONFIG_MIGRATION_COPY_OFFLOAD) += migrate_copy_offload.o
>  obj-$(CONFIG_NUMA) += memory-tiers.o
>  obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o
>  obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 69daa16f9cf3..acaaa9cc0d4f 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -44,6 +44,8 @@
>  #include <linux/memory-tiers.h>
>  #include <linux/pagewalk.h>
>  #include <linux/jump_label.h>
> +#include <linux/static_call.h>
> +#include <linux/migrate_copy_offload.h>
>  
>  #include <asm/tlbflush.h>
>  
> @@ -54,6 +56,17 @@
>  
>  DEFINE_STATIC_KEY_FALSE(migrate_offload_enabled);
>  
> +#ifdef CONFIG_MIGRATION_COPY_OFFLOAD
> +DEFINE_SRCU(migrate_offload_srcu);
> +DEFINE_STATIC_CALL(migrate_offload_copy, folios_mc_copy);
> +
> +bool migrate_should_batch_default(int reason)
> +{
> +	return false;
> +}
> +DEFINE_STATIC_CALL(migrate_should_batch, migrate_should_batch_default);
> +#endif
> +
>  static const struct movable_operations *offline_movable_ops;
>  static const struct movable_operations *zsmalloc_movable_ops;
>  
> @@ -1820,11 +1833,18 @@ static int migrate_pages_batch(struct list_head *from,
>  	LIST_HEAD(dst_batch);
>  	LIST_HEAD(src_std);
>  	LIST_HEAD(dst_std);
> +	bool do_batch = false;
>  	bool nosplit = (reason == MR_NUMA_MISPLACED);
>  
>  	VM_WARN_ON_ONCE(mode != MIGRATE_ASYNC &&
>  			!list_empty(from) && !list_is_singular(from));
>  
> +#ifdef CONFIG_MIGRATION_COPY_OFFLOAD
> +	/* Check if the offload driver wants to batch for this reason */
> +	if (static_branch_unlikely(&migrate_offload_enabled))
> +		do_batch = static_call(migrate_should_batch)(reason);

Should batching based on "reason" be determined by the general migrate
code instead of the migrator implementation?  For example, if we only
batch copying for ASYNC migration, we should determine that in
migrate_pages_batch() instead of the migration implementation.  Or am I
missing something?  If so, can you provide an example?

> +#endif
> +
>  	for (pass = 0; pass < nr_pass && retry; pass++) {
>  		retry = 0;
>  		thp_retry = 0;
> @@ -1967,7 +1987,7 @@ static int migrate_pages_batch(struct list_head *from,
>  				break;
>  			case 0:
>  				if (static_branch_unlikely(&migrate_offload_enabled) &&
> -				    folio_supports_batch_copy(folio)) {
> +				    do_batch && folio_supports_batch_copy(folio)) {
>  					list_move_tail(&folio->lru, &src_batch);
>  					list_add_tail(&dst->lru, &dst_batch);
>  					nr_batch++;
> @@ -1997,11 +2017,17 @@ static int migrate_pages_batch(struct list_head *from,
>  	/* Flush TLBs for all unmapped folios */
>  	try_to_unmap_flush();
>  
> +#ifdef CONFIG_MIGRATION_COPY_OFFLOAD
>  	/* Batch-copy eligible folios before the move phase */
>  	if (!list_empty(&src_batch)) {

Guard with "static_branch_unlikely(&migrate_offload_enabled)" first?
Better to define an inline function to shorten the expression.

> -		rc = folios_mc_copy(&dst_batch, &src_batch, nr_batch);
> +		int idx = srcu_read_lock(&migrate_offload_srcu);
> +
> +		rc = static_call(migrate_offload_copy)(&dst_batch,
> +				&src_batch, nr_batch);
> +		srcu_read_unlock(&migrate_offload_srcu, idx);
>  		batch_copied = (rc == 0);
>  	}
> +#endif
>  
>  	retry = 1;
>  	for (pass = 0; pass < nr_pass && retry; pass++) {
> diff --git a/mm/migrate_copy_offload.c b/mm/migrate_copy_offload.c
> new file mode 100644
> index 000000000000..c22068fe09a0
> --- /dev/null
> +++ b/mm/migrate_copy_offload.c
> @@ -0,0 +1,99 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/jump_label.h>
> +#include <linux/module.h>
> +#include <linux/srcu.h>
> +#include <linux/migrate.h>
> +#include <linux/migrate_copy_offload.h>
> +#include <linux/static_call.h>
> +
> +static DEFINE_MUTEX(migrator_mutex);
> +static struct migrator *active_migrator;
> +
> +DECLARE_STATIC_CALL(migrate_offload_copy, folios_mc_copy);
> +DECLARE_STATIC_CALL(migrate_should_batch, migrate_should_batch_default);
> +
> +/**
> + * migrate_offload_start - register a batch-copy provider for page migration.
> + * @m: migrator to install.
> + *
> + * Only one provider can be active at a time; returns -EBUSY if another migrator
> + * is already registered.
> + *
> + * Return: 0 on success, negative errno on failure.
> + */
> +int migrate_offload_start(struct migrator *m)
> +{
> +	int ret = 0;
> +
> +	if (!m || !m->offload_copy)
> +		return -EINVAL;
> +
> +	mutex_lock(&migrator_mutex);
> +	if (active_migrator) {
> +		ret = -EBUSY;
> +		goto unlock;
> +	}
> +
> +	if (m->owner && !try_module_get(m->owner)) {
> +		ret = -ENODEV;
> +		goto unlock;
> +	}
> +
> +	static_call_update(migrate_offload_copy, m->offload_copy);
> +	static_call_update(migrate_should_batch,
> +		m->should_batch ? m->should_batch : migrate_should_batch_default);
> +	active_migrator = m;
> +	static_branch_enable(&migrate_offload_enabled);
> +
> +unlock:
> +	mutex_unlock(&migrator_mutex);
> +
> +	if (ret)
> +		pr_err("migrate_offload: %s: failed to register (%d)\n",
> +		       m->name, ret);
> +	else
> +		pr_info("migrate_offload: enabled by %s\n", m->name);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(migrate_offload_start);
> +
> +/**
> + * migrate_offload_stop - unregister the active batch-copy provider.
> + * @m: migrator to remove (must be the currently active one).
> + *
> + * Reverts static_call targets and waits for SRCU grace period so that
> + * no in-flight migration is still calling the driver functions before
> + * releasing the module.
> + *
> + * Return: 0 on success, negative errno on failure.
> + */
> +int migrate_offload_stop(struct migrator *m)
> +{
> +	struct module *owner;
> +
> +	mutex_lock(&migrator_mutex);
> +	if (active_migrator != m) {
> +		mutex_unlock(&migrator_mutex);
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Disable the static branch first so new migrate_pages_batch calls
> +	 * won't enter the batch copy path.
> +	 */
> +	static_branch_disable(&migrate_offload_enabled);
> +	static_call_update(migrate_offload_copy, folios_mc_copy);
> +	static_call_update(migrate_should_batch, migrate_should_batch_default);
> +	owner = active_migrator->owner;
> +	active_migrator = NULL;
> +	mutex_unlock(&migrator_mutex);
> +
> +	/* Wait for all in-flight callers to finish before module_put(). */
> +	synchronize_srcu(&migrate_offload_srcu);
> +	if (owner)
> +		module_put(owner);
> +
> +	pr_info("migrate_offload: disabled by %s\n", m->name);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(migrate_offload_stop);

---
Best Regards,
Huang, Ying



end of thread, other threads:[~2026-03-24 10:54 UTC | newest]

Thread overview: 21+ messages
2026-03-09 12:07 [RFC PATCH v4 0/6] Accelerate page migration with batch copying and hardware offload Shivank Garg
2026-03-09 12:07 ` [RFC PATCH v4 1/6] mm: introduce folios_mc_copy() for batch folio copying Shivank Garg
2026-03-12  9:41   ` David Hildenbrand (Arm)
2026-03-15 18:09     ` Garg, Shivank
2026-03-09 12:07 ` [RFC PATCH v4 2/6] mm/migrate: skip data copy for already-copied folios Shivank Garg
2026-03-12  9:44   ` David Hildenbrand (Arm)
2026-03-15 18:25     ` Garg, Shivank
2026-03-23 12:20       ` David Hildenbrand (Arm)
2026-03-24  8:22   ` Huang, Ying
2026-03-09 12:07 ` [RFC PATCH v4 3/6] mm/migrate: add batch-copy path in migrate_pages_batch Shivank Garg
2026-03-24  8:42   ` Huang, Ying
2026-03-09 12:07 ` [RFC PATCH v4 4/6] mm/migrate: add copy offload registration infrastructure Shivank Garg
2026-03-09 17:54   ` Gregory Price
2026-03-10 10:07     ` Garg, Shivank
2026-03-24 10:54   ` Huang, Ying
2026-03-09 12:07 ` [RFC PATCH v4 5/6] drivers/migrate_offload: add DMA batch copy driver (dcbm) Shivank Garg
2026-03-09 18:04   ` Gregory Price
2026-03-12  9:33     ` Garg, Shivank
2026-03-24  8:10   ` Huang, Ying
2026-03-09 12:07 ` [RFC PATCH v4 6/6] mm/migrate: adjust NR_MAX_BATCHED_MIGRATION for testing Shivank Garg
2026-03-18 14:29 ` [RFC PATCH v4 0/6] Accelerate page migration with batch copying and hardware offload Garg, Shivank
