public inbox for linux-kernel@vger.kernel.org
From: Lorenzo Stoakes <ljs@kernel.org>
To: Usama Arif <usama.arif@linux.dev>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	chrisl@kernel.org, kasong@tencent.com, ziy@nvidia.com,
	 bhe@redhat.com, willy@infradead.org, youngjun.park@lge.com,
	hannes@cmpxchg.org,  riel@surriel.com, shakeel.butt@linux.dev,
	alex@ghiti.fr, kas@kernel.org,  baohua@kernel.org,
	dev.jain@arm.com, baolin.wang@linux.alibaba.com,
	 npache@redhat.com, Liam.Howlett@oracle.com,
	ryan.roberts@arm.com,  Vlastimil Babka <vbabka@kernel.org>,
	lance.yang@linux.dev, linux-kernel@vger.kernel.org,
	 nphamcs@gmail.com, shikemeng@huaweicloud.com,
	kernel-team@meta.com
Subject: Re: [PATCH 00/13] mm: PMD-level swap entries for anonymous THPs
Date: Wed, 29 Apr 2026 13:52:38 +0100
Message-ID: <afH-2TT48qUC7L__@lucifer>
In-Reply-To: <ef54bbc4-ad15-4782-b4db-b21812708102@linux.dev>

On Wed, Apr 29, 2026 at 10:39:23AM +0100, Usama Arif wrote:
>
>
> On 28/04/2026 20:54, David Hildenbrand (Arm) wrote:
> > On 4/27/26 12:01, Usama Arif wrote:
> >> When reclaim swaps out a PMD-mapped anonymous THP today, the PMD is
> >> split into 512 PTE-level swap entries via TTU_SPLIT_HUGE_PMD before
> >> unmap.
> >>
> >> This series introduces a PMD-level swap entry. The huge mapping is
> >> preserved across the swap round-trip, and do_huge_pmd_swap_page()
> >> resolves the entire 2 MB region in a single fault on swap-in,
> >> no khugepaged involvement is needed. swap_map metadata is identical
> >> either way (512 single-slot counts), so the PTE split buys nothing
> >> on the swap side, it is purely a page-table representation change.
> >>
> >> This work was brought about after Hugh reported that one of the
> >> major blockers for having lazy page table deposit is the lack of
> >> PMD swap entries [1]. However, this series has benefits of its
> >> own:
> >> - The huge mapping is restored on swap-in.  Today even when the
> >>   folio is still in swap cache as a single 2 MB folio, the swap-in
> >>   path installs 512 PTE mappings -- the PMD mapping is gone, the
> >>   freshly-materialised PTE table sticks around, and only
> >>   khugepaged can later collapse the range back into a THP.
> >>   do_huge_pmd_swap_page() reinstalls the PMD mapping directly in
> >>   one fault, no khugepaged involvement.
> >
> > Ack, that's nice.
> >
> >> - Memory saved per swapped-out THP *once lazy page table deposit is
> >>   merged* [2]. With lazy page table deposit [2], splitting a PMD into
> >>   512 PTE swap entries forces allocation of a 4 KB PTE table page.
> >>   The new path leaves the pgtable hierarchy at PMD level and avoids
> >>   that allocation entirely.
> >>   This will save memory when swapping, which is likely when there is
> >>   memory pressure and exactly when allocations are most likely to
> >>   fail.
> >
> > Also ack.
> >
> >> - Walkers (zap, mprotect, smaps, pagemap, soft-dirty, uffd-wp)
> >>   visit one PMD entry instead of 512 PTEs, reducing traversal
> >>   time and lock-hold windows.
> >
> > Right.
> >
> >>
> >> The swap entry value is identical to 512 PTE swap entries (same
> >> type, same starting offset), so swap_map refcounting is unchanged.
> >> Only the page-table representation differs; the swap slot allocator,
> >> swap I/O, and swap cache are untouched.  The new path falls back to
> >> the existing PTE-split path whenever a PMD-order resource is
> >> unavailable: zswap enabled, non-contiguous swap allocation
> >> (THP_SWPOUT_FALLBACK), PMD-order folio allocation failure on swap-in
> >> or fork, racing folio split, or rmap-driven split on a swapcache
> >> folio.  Walkers that previously assumed every non-present PMD encodes
> >> a PFN (migration / device_private) are taught to recognise PMD swap
> >> entries.
> >
> > All sounds nice. I'll get to review this soon. LSF/MM and travel will slow me a
> > bit down in May :(
> >
>
> Thanks! Appreciate it!
>

My email is a disaster right now, various other stuff + lately working hard on
the thing-I'm-going-to-talk-about-at-LSF and the-slides-for-that has left me
with only backlog but... :) will want to have a look post-LSF also. But May is
likely to be slow for me also.

Cheers, Lorenzo

Thread overview: 22+ messages
2026-04-27 10:01 [PATCH 00/13] mm: PMD-level swap entries for anonymous THPs Usama Arif
2026-04-27 10:01 ` [PATCH 01/13] mm: add softleaf_to_pmd() and convert existing callers Usama Arif
2026-04-27 10:01 ` [PATCH 02/13] mm: extract ensure_on_mmlist() helper Usama Arif
2026-04-27 10:01 ` [PATCH 03/13] fs/proc: use softleaf_has_pfn() in pagemap PMD walker Usama Arif
2026-04-27 10:01 ` [PATCH 04/13] mm/huge_memory: move softleaf_to_folio() inside migration branch Usama Arif
2026-04-27 10:01 ` [PATCH 05/13] mm: add PMD swap entry detection support Usama Arif
2026-04-27 10:01 ` [PATCH 06/13] mm: add PMD swap entry splitting support Usama Arif
2026-04-27 10:01 ` [PATCH 07/13] mm: handle PMD swap entries in fork path Usama Arif
2026-04-27 10:01 ` [PATCH 08/13] mm: swap in PMD swap entries as whole THPs during swapoff Usama Arif
2026-04-27 10:01 ` [PATCH 09/13] mm: handle PMD swap entries in non-present PMD walkers Usama Arif
2026-04-27 10:01 ` [PATCH 10/13] mm: handle PMD swap entries in UFFDIO_MOVE Usama Arif
2026-04-27 10:02 ` [PATCH 11/13] mm: handle PMD swap entry faults on swap-in Usama Arif
2026-04-27 10:02 ` [PATCH 12/13] mm: install PMD swap entries on swap-out Usama Arif
2026-04-27 10:02 ` [PATCH 13/13] selftests/mm: add PMD swap entry tests Usama Arif
2026-04-27 13:38 ` [PATCH 00/13] mm: PMD-level swap entries for anonymous THPs Usama Arif
2026-04-27 18:26 ` Zi Yan
2026-04-27 20:12   ` Usama Arif
2026-04-29 12:57     ` Zi Yan
2026-04-28 19:54 ` David Hildenbrand (Arm)
2026-04-29  9:39   ` Usama Arif
2026-04-29 12:52     ` Lorenzo Stoakes [this message]
2026-04-29 10:44 ` Kairui Song
