From: "Huang, Ying" <ying.huang@linux.alibaba.com>
To: Shivank Garg <shivankg@amd.com>
Cc: <akpm@linux-foundation.org>, <david@redhat.com>,
<ziy@nvidia.com>, <willy@infradead.org>,
<matthew.brost@intel.com>, <joshua.hahnjy@gmail.com>,
<rakie.kim@sk.com>, <byungchul@sk.com>, <gourry@gourry.net>,
<apopple@nvidia.com>, <lorenzo.stoakes@oracle.com>,
<Liam.Howlett@oracle.com>, <vbabka@suse.cz>, <rppt@kernel.org>,
<surenb@google.com>, <mhocko@suse.com>, <vkoul@kernel.org>,
<lucas.demarchi@intel.com>, <rdunlap@infradead.org>,
<jgg@ziepe.ca>, <kuba@kernel.org>, <justonli@chromium.org>,
<ivecera@redhat.com>, <dave.jiang@intel.com>,
<Jonathan.Cameron@huawei.com>, <dan.j.williams@intel.com>,
<rientjes@google.com>, <Raghavendra.KodsaraThimmappa@amd.com>,
<bharata@amd.com>, <alirad.malek@zptcorp.com>,
<yiannis@zptcorp.com>, <weixugc@google.com>,
<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>
Subject: Re: [RFC V3 0/9] Accelerate page migration with batch copying and hardware offload
Date: Wed, 24 Sep 2025 09:49:37 +0800
Message-ID: <87plbghb66.fsf@DESKTOP-5N7EMDA>
In-Reply-To: <20250923174752.35701-1-shivankg@amd.com> (Shivank Garg's message of "Tue, 23 Sep 2025 17:47:35 +0000")

Hi, Shivank,

Thanks for working on this!

Shivank Garg <shivankg@amd.com> writes:

> This is the third RFC of the patchset to enhance page migration by batching
> folio-copy operations and enabling acceleration via multi-threaded CPU or
> DMA offload.
>
> Single-threaded, folio-by-folio copying bottlenecks page migration
> in modern systems with deep memory hierarchies, especially for large
> folios where copy overhead dominates, leaving significant hardware
> potential untapped.
>
> By batching the copy phase, we create an opportunity for significant
> hardware acceleration. This series builds a framework for this acceleration
> and provides two initial offload driver implementations: one using multiple
> CPU threads (mtcopy) and another leveraging the DMAEngine subsystem (dcbm).
>
> This version incorporates significant feedback to improve correctness,
> robustness, and the efficiency of the DMA offload path.
>
> Changelog since V2:
>
> 1. DMA Engine Rewrite:
>    - Switched from per-folio dma_map_page() to batch dma_map_sgtable()
>    - Single completion interrupt per batch (reduced overhead)
>    - Order of magnitude improvement in setup time for large batches
> 2. Code cleanups and refactoring
> 3. Rebased on latest mainline (6.17-rc6+)
>
> MOTIVATION:
> -----------
>
> Current Migration Flow:
>   [ move_pages(), Compaction, Tiering, etc. ]
>               |
>               v
>   [ migrate_pages() ]         // Common entry point
>               |
>               v
>   [ migrate_pages_batch() ]   // NR_MAX_BATCHED_MIGRATION (512) folios at a time
>         |
>         |--> [ migrate_folio_unmap() ]
>         |
>         |--> [ try_to_unmap_flush() ]   // Perform a single, batched TLB flush
>         |
>         |--> [ migrate_folios_move() ]  // Bottleneck: Interleaved copy
>                - For each folio:
>                  - Metadata prep: Copy flags, mappings, etc.
>                  - folio_copy() <-- Single-threaded, serial data copy.
>                  - Update PTEs & finalize for that single folio.
>
> Understanding overheads in page migration (move_pages() syscall):
>
> Total move_pages() overheads = folio_copy() + Other overheads
> 1. folio_copy() is the core copy operation that interests us.
> 2. The remaining operations are user/kernel transitions, page table walks,
> locking, folio unmap, dst folio alloc, TLB flush, copying flags, updating
> mappings and PTEs etc. that contribute to the remaining overheads.
>
> Percentage of folio_copy() overheads in move_pages(N pages) syscall time,
> by number of pages being migrated and folio size:
>
>   Pages migrated    4KB folios    2MB folios
>   1 page            <1%           ~66%
>   512 pages         ~35%          ~97%
>
> Based on Amdahl's Law, optimizing folio_copy() for large pages offers a
> substantial performance opportunity.
>
> move_pages() syscall speedup = 1 / ((1 - F) + (F / S))
> Where F is the fraction of time spent in folio_copy() and S is the speedup of
> folio_copy().
>
> For 4KB folios, folio copy overheads are too small in single-page
> migrations to affect overall speedup. Even for 512 pages, the maximum
> theoretical speedup is limited to ~1.54x with infinite folio_copy() speedup.
>
> For 2MB THPs, folio copy overheads are significant even in single-page
> migrations, with a theoretical speedup of ~3x with infinite folio_copy()
> speedup, and up to ~33x for 512 pages.
>
> A realistic value of S (speedup of folio_copy()) is 7.5x for DMA offload
> based on my measurements for copying 512 2MB pages.
> This gives move_pages() a practical speedup of 6.3x for 512 2MB pages (also
> observed in the experiments below).
>
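The Amdahl's Law figures quoted above can be checked with a few lines of C. This is only a sketch for the arithmetic: the function name amdahl_speedup() is made up for illustration and is not part of the series, and the F values are the rounded percentages from the table, so the results are approximate.

```c
#include <math.h>

/*
 * Amdahl's Law: overall speedup when a fraction f of the total time
 * (here, the folio_copy() share of move_pages()) is sped up by a
 * factor s. Passing s = INFINITY gives the theoretical upper bound.
 *
 * Hypothetical helper, for checking the quoted numbers only.
 */
double amdahl_speedup(double f, double s)
{
	return 1.0 / ((1.0 - f) + f / s);
}
```

With the rounded fractions from the table, amdahl_speedup(0.35, INFINITY) is about 1.54, amdahl_speedup(0.66, INFINITY) about 2.9, amdahl_speedup(0.97, INFINITY) about 33, and amdahl_speedup(0.97, 7.5) about 6.3, which matches the values in the cover letter.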
> DESIGN: A Pluggable Migrator Framework
> ---------------------------------------
>
> Introduce migrate_folios_batch_move():
>
> [ migrate_pages_batch() ]
>     |
>     |--> migrate_folio_unmap()
>     |
>     |--> try_to_unmap_flush()
>     |
>     +--> [ migrate_folios_batch_move() ]   // new batched design
>             |
>             |--> Metadata migration
>             |    - Metadata prep: Copy flags, mappings, etc.
>             |    - Use MIGRATE_NO_COPY to skip the actual data copy.
>             |
>             |--> Batch copy folio data
>             |    - Migrator is configurable at runtime via sysfs.
>             |
>             |          static_call(_folios_copy)   // Pluggable migrators
>             |          /          |            \
>             |         v           v             v
>             |  [ Default ]  [ MT CPU copy ]  [ DMA Offload ]
>             |
>             +--> Update PTEs to point to dst folios and complete migration.
>

I'm just jumping into the discussion, so this may have been discussed
already.  Sorry if so.  Why not

  migrate_folios_unmap()
  try_to_unmap_flush()
  copy folios in parallel if possible
  migrate_folios_move(): with MIGRATE_NO_COPY?

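To make the ordering concrete, here is a rough user-space sketch of that four-phase flow. It is not kernel code: struct mock_folio and all the mock_*() helpers are hypothetical stand-ins, memcpy() stands in for whatever parallel or offloaded copy is available, and error handling is left out entirely.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a src/dst folio pair; not struct folio. */
struct mock_folio {
	unsigned char *src;
	unsigned char *dst;
	size_t size;
	int unmapped;	/* set by the unmap phase */
	int moved;	/* set by the final move phase */
};

static int tlb_flushes;	/* counts batched flushes, for illustration */

/* Phase 1: unmap every folio in the batch. */
static void mock_folios_unmap(struct mock_folio *f, size_t n)
{
	for (size_t i = 0; i < n; i++)
		f[i].unmapped = 1;
}

/* Phase 2: one TLB flush for the whole batch. */
static void mock_unmap_flush(void)
{
	tlb_flushes++;
}

/* Phase 3: copy all data; a real version could use threads or DMA. */
static void mock_folios_copy(struct mock_folio *f, size_t n)
{
	for (size_t i = 0; i < n; i++)
		memcpy(f[i].dst, f[i].src, f[i].size);
}

/* Phase 4: metadata and PTE update only; the data is already copied. */
static void mock_folios_move_no_copy(struct mock_folio *f, size_t n)
{
	for (size_t i = 0; i < n; i++)
		f[i].moved = 1;
}

void mock_migrate_batch(struct mock_folio *f, size_t n)
{
	mock_folios_unmap(f, n);
	mock_unmap_flush();
	mock_folios_copy(f, n);
	mock_folios_move_no_copy(f, n);
}
```

The point of the sketch is only that the copy becomes a separate phase between the flush and the existing move step, so the move step itself would not need a new batched variant.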
> User Control of Migrator:
>
> # echo 1 > /sys/kernel/dcbm/offloading
>    |
>    +--> Driver's sysfs handler
>         |
>         +--> calls start_offloading(&cpu_migrator)
>              |
>              +--> calls offc_update_migrator()
>                   |
>                   +--> static_call_update(_folios_copy, mig->migrate_offc)
>
> Later, During Migration ...
> migrate_folios_batch_move()
>    |
>    +--> static_call(_folios_copy) // Now dispatches to the selected migrator
>         |
>         +-> [ mtcopy | dcbm | kernel_default ]
>
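For readers not familiar with static_call(), the runtime switch described above behaves roughly like the user-space analogue below. This is a sketch under loud assumptions: it uses a plain function pointer where the kernel would patch the call site, and the names copy_fn, default_copy(), mock_offload_copy(), update_migrator() and batch_copy() are invented for illustration and do not appear in the series.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for a copy backend. */
typedef void (*copy_fn)(void *dst, const void *src, size_t len);

/* Default backend: plain CPU copy. */
static void default_copy(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

static int offload_calls;	/* how often the mock offload path ran */

/* Mock "offload" backend; a real one would submit to threads or DMA. */
static void mock_offload_copy(void *dst, const void *src, size_t len)
{
	offload_calls++;
	memcpy(dst, src, len);
}

/* Currently selected backend; starts as the default. */
static copy_fn folios_copy = default_copy;

/* Analogue of static_call_update(): NULL falls back to the default. */
void update_migrator(copy_fn fn)
{
	folios_copy = fn ? fn : default_copy;
}

/* Analogue of static_call(): dispatch to whatever is selected now. */
void batch_copy(void *dst, const void *src, size_t len)
{
	folios_copy(dst, src, len);
}
```

Callers of batch_copy() never change; only the selected backend does, which is the property the sysfs knob relies on.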
[snip]
---
Best Regards,
Huang, Ying