From: Lance Yang <lance.yang@linux.dev>
To: usama.arif@linux.dev
Cc: akpm@linux-foundation.org, david@kernel.org, chrisl@kernel.org,
kasong@tencent.com, ljs@kernel.org, ziy@nvidia.com,
ying.huang@linux.alibaba.com, baoquan.he@linux.dev,
willy@infradead.org, youngjun.park@lge.com, hannes@cmpxchg.org,
riel@surriel.com, shakeel.butt@linux.dev, alex@ghiti.fr,
kas@kernel.org, baohua@kernel.org, dev.jain@arm.com,
baolin.wang@linux.alibaba.com, npache@redhat.com,
liam@infradead.org, ryan.roberts@arm.com, vbabka@kernel.org,
lance.yang@linux.dev, linux-kernel@vger.kernel.org,
nphamcs@gmail.com, shikemeng@huaweicloud.com,
kernel-team@meta.com, linux-mm@kvack.org
Subject: Re: [v2 13/16] mm: handle PMD swap entries in UFFDIO_MOVE
Date: Fri, 12 Jun 2026 16:50:27 +0800
Message-ID: <20260612085027.5401-1-lance.yang@linux.dev>
In-Reply-To: <20260602142537.198755-14-usama.arif@linux.dev>
+Cc linux-mm
On Tue, Jun 02, 2026 at 07:24:21AM -0700, Usama Arif wrote:
[...]
>@@ -2846,11 +2902,66 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm
> }
>
> if (!pmd_trans_huge(src_pmdval)) {
>- spin_unlock(src_ptl);
> if (pmd_is_migration_entry(src_pmdval)) {
>+ spin_unlock(src_ptl);
> pmd_migration_entry_wait(mm, &src_pmdval);
> return -EAGAIN;
> }
>+ if (pmd_is_swap_entry(src_pmdval)) {
Looks buggy ... unless I missed something ...
>+ swp_entry_t entry;
>+ struct swap_info_struct *si;
>+
>+ /*
>+ * UFFDIO_MOVE on anon mappings requires single-owner
>+ * semantics; refuse to move a shared swap entry.
>+ */
>+ if (!pmd_swp_exclusive(src_pmdval)) {
>+ spin_unlock(src_ptl);
>+ return -EBUSY;
>+ }
>+
>+ entry = softleaf_from_pmd(src_pmdval);
>+ spin_unlock(src_ptl);
>+
>+ /* Pin the swap device against a racing swapoff. */
>+ si = get_swap_device(entry);
>+ if (unlikely(!si))
>+ return -EAGAIN;
>+
>+ src_folio = swap_cache_get_folio(entry);
We only check the first swap slot. Imagine we have something like this
after the PMD-sized swapcache folio was split while the PMD swap entry
was installed:

  page table:
    src PMD -> swap entry S

  swap cache:
    S + 0 -> no folio
    S + 1 -> order-0 folio in the swap cache
    S + 2 -> no folio
    S + 3 -> order-0 folio in the swap cache
    ...

>+
>+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0,
>+ mm, src_addr,
>+ src_addr + HPAGE_PMD_SIZE);
>+ mmu_notifier_invalidate_range_start(&range);
>+
>+ if (src_folio) {
>+ folio_lock(src_folio);
>+ if (folio_nr_pages(src_folio) != HPAGE_PMD_NR) {
If S has a non-PMD-sized folio, this returns -EBUSY.
>+ err = -EBUSY;
>+ folio_unlock(src_folio);
>+ folio_put(src_folio);
>+ mmu_notifier_invalidate_range_end(&range);
>+ put_swap_device(si);
>+ return err;
>+ }
>+ }
>+
>+ dst_ptl = pmd_lockptr(mm, dst_pmd);
But if S has no folio, the initial lookup passes src_folio == NULL to
move_swap_pmd(), which only rechecks S:

	if (src_folio) {
		[...]
	} else if (swap_cache_has_folio(entry)) {
		double_pt_unlock(dst_ptl, src_ptl);
		return -EAGAIN;
	}

So if S is empty, the move can still go ahead even if S + 1 ... S + 511
contain folios in the swap cache.
>+ err = move_swap_pmd(mm, dst_vma, dst_addr, src_addr,
>+ dst_pmd, src_pmd, dst_pmdval,
>+ src_pmdval, dst_ptl, src_ptl,
>+ src_folio, entry);
>+
In that case, checking only S misses the order-0 folios in later slots.
move_swap_pmd() can then move the PMD swap entry whole without calling
folio_move_anon_rmap() or updating folio->index for those later folios.
Note that move_swap_pte() already does both for PTE-mapped swap entries,
because a folio in the swap cache needs its index and mapping updated to
align with dst_vma.
If those folios are later faulted in at dst, their rmap metadata still
points at the old anon_vma/index. Later rmap users derive the virtual
address from folio->mapping and folio->index, so they can look at the
wrong VMA/address ...
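
To make the stale-index problem concrete, here is a toy userspace model of
how an rmap walk turns a folio index into a virtual address. It is only a
sketch: struct model_vma, MODEL_PAGE_SHIFT and model_rmap_address() are
made-up names, and the real kernel helper also range-checks the result
against the VMA, which this model skips.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4K base pages */

/* Hypothetical stand-in for the two VMA fields the calculation needs. */
struct model_vma {
	uint64_t vm_start;	/* first virtual address of the mapping */
	uint64_t vm_pgoff;	/* page offset that corresponds to vm_start */
};

/*
 * Linear mapping from a folio index to a virtual address inside a VMA.
 * Unsigned arithmetic wraps, so a stale index still yields an address,
 * just not the one the folio is actually mapped at.
 */
uint64_t model_rmap_address(const struct model_vma *vma, uint64_t index)
{
	return vma->vm_start + ((index - vma->vm_pgoff) << MODEL_PAGE_SHIFT);
}
```

With a dst VMA at 0x400000 (vm_pgoff 0x400), a folio that was moved to
0x401000 but still carries its old index 0x201 resolves to 0x201000, i.e.
the old src address. Only the updated index 0x401 resolves to 0x401000.
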
Should we check the whole PMD swap range before deciding there is no folio
in the swap cache to update?
Am I reading that code right?
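
Roughly what I have in mind, as a toy userspace model rather than kernel
code. Everything here is hypothetical (struct model_swap_cache,
MODEL_PMD_NR, both helpers); it only contrasts a first-slot check with a
scan over every slot the PMD swap entry covers, assuming 512 slots per PMD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PMD_NR 512	/* assumption: slots covered by one PMD swap entry */

/* Hypothetical model: has_folio[i] is true if slot S + i has a swapcache folio. */
struct model_swap_cache {
	bool has_folio[MODEL_PMD_NR];
};

/* Models the check discussed above: only slot S + 0 is consulted. */
bool model_first_slot_has_folio(const struct model_swap_cache *sc)
{
	return sc->has_folio[0];
}

/* Models the suggestion: scan S + 0 .. S + nr - 1 before assuming "no folio". */
bool model_range_has_folio(const struct model_swap_cache *sc, size_t nr)
{
	assert(nr <= MODEL_PMD_NR);

	for (size_t i = 0; i < nr; i++) {
		if (sc->has_folio[i])
			return true;
	}
	return false;
}
```

For the layout above (S + 0 empty, S + 1 and S + 3 populated), the
first-slot check reports no folio while the range scan finds one, which is
the case where the move would need to bail out or update those folios.
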
>+ mmu_notifier_invalidate_range_end(&range);
>+ if (src_folio) {
>+ folio_unlock(src_folio);
>+ folio_put(src_folio);
>+ }
>+ put_swap_device(si);
>+ return err;
>+ }
>+ spin_unlock(src_ptl);
> return -ENOENT;
> }
>
>--
>2.52.0
>
Cheers, Lance