Re: [RFC PATCH 1/1] mm: batch page copies in folio_copy() and folio_mc_copy()

From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Shivank Garg <shivankg@amd.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org
Cc: Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Thomas Gleixner <tglx@kernel.org>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H . Peter Anvin" <hpa@zytor.com>,
	Ankur Arora <ankur.a.arora@oracle.com>,
	Bharata B Rao <bharata@amd.com>,
	Hrushikesh Salunke <hsalunke@amd.com>,
	David Rientjes <rientjes@google.com>
Subject: Re: [RFC PATCH 1/1] mm: batch page copies in folio_copy() and folio_mc_copy()
Date: Tue, 12 May 2026 11:31:22 +0200
Message-ID: <073e5e2c-7102-4141-b0d7-fa5635f811f5@kernel.org>
In-Reply-To: <20260427142036.111940-4-shivankg@amd.com>

On 4/27/26 16:20, Shivank Garg wrote:
> Rewrite folio_copy() and folio_mc_copy() as thin wrappers around new
> batched helpers copy_highpages() and copy_mc_highpages().
> 
> The current implementations iterate copy_highpage() (or its #MC-aware
> variant) per 4 KB page. For a single 2 MB folio that loop runs 512
> times and pays, per page:
> 
>   - kmap_local_page() / kunmap_local()
>   - cond_resched()
>   - one invocation of the architecture copy_page()/memcpy() primitive
> 
> The new helpers issue a single copy_mc_to_kernel()/memcpy() over
> the whole contiguous range when CONFIG_HIGHMEM is off and no
> architecture overrides (__HAVE_ARCH_COPY_HIGHPAGE) copy_highpage().
> HIGHMEM and arch overrides keep the existing per-page path.
> 
> Tested on dual-socket AMD EPYC 9655 (Zen 5) with a CXL.mem node.
> In-kernel folio_mc_copy() microbenchmark on 2 MB folios, source
> evicted from cache before each iteration and measured throughput:
> 
>   direction         baseline GB/s   optimized GB/s   speedup
>   DRAM0 -> DRAM1     18.65 ± 1.37    38.03 ± 3.21     2.04x
>   DRAM0 -> CXL       25.46 ± 2.89    39.29 ± 1.17     1.54x
>   CXL   -> DRAM0     20.61 ± 3.95    35.07 ± 0.62     1.70x
> 
> End-to-end move_pages(2) throughput on anonymous 2 MB mTHP folios,
> 1 GB migrated per run:
> 
>   direction         baseline GB/s   optimized GB/s   speedup
>   DRAM0 -> DRAM1      7.20 ± 0.03     8.01 ± 0.02     1.11x
>   DRAM0 -> CXL       11.12 ± 0.15    13.07 ± 0.03     1.18x
>   DRAM1 -> DRAM0      7.21 ± 0.02     7.95 ± 0.02     1.10x
>   CXL   -> DRAM0      9.10 ± 0.05     9.49 ± 0.01     1.04x
> 
> On AMD EPYC 7713 (Zen 3 / Milan, REP_GOOD without FSRM/ERMS) the
> folio_copy() bulk path regresses because memcpy() falls through to
> memcpy_orig (an unrolled movq loop), which is slower than the
> per-page copy_page() (microcoded rep movsq) it replaces. 

Do you know what the reason for that fallback is? Could it be fixed (e.g., by
taking a different path when we detect page alignment, or something like that)?
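
To make the question concrete, here is a rough userspace sketch of such an
alignment check. It is not kernel code and not taken from the patch:
copy_range(), SKETCH_PAGE_SIZE and the prefer_per_page flag are hypothetical
names, and plain memcpy() stands in for both copy_page() and the bulk
primitive. How the kernel would decide that the bulk path is the slow one on
a given CPU is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* hypothetical; stands in for PAGE_SIZE */

/* Which path the dispatcher took; returned so callers can check it. */
enum copy_path { COPY_PATH_BULK, COPY_PATH_PER_PAGE };

static bool is_page_aligned(const void *p)
{
	return ((uintptr_t)p & (SKETCH_PAGE_SIZE - 1)) == 0;
}

/*
 * prefer_per_page models "on this CPU the bulk memcpy() is slower than the
 * per-page copy primitive". It is an assumption of this sketch, not a real
 * kernel feature flag.
 */
static enum copy_path copy_range(void *dst, const void *src, size_t len,
				 bool prefer_per_page)
{
	bool aligned;

	/* The mask test in is_page_aligned() needs a power-of-two page size. */
	assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0);

	aligned = is_page_aligned(dst) && is_page_aligned(src) &&
		  (len % SKETCH_PAGE_SIZE) == 0;

	if (prefer_per_page && aligned) {
		/* Page-aligned range: copy one page at a time. */
		for (size_t off = 0; off < len; off += SKETCH_PAGE_SIZE)
			memcpy((char *)dst + off, (const char *)src + off,
			       SKETCH_PAGE_SIZE);
		return COPY_PATH_PER_PAGE;
	}

	/* Otherwise issue one copy over the whole contiguous range. */
	memcpy(dst, src, len);
	return COPY_PATH_BULK;
}
```

A caller would pass prefer_per_page = true only where the bulk primitive is
known to lose; unaligned or partial-page ranges still take the single-copy
path.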

-- 
Cheers,

David
