* Re: [v2 13/16] mm: handle PMD swap entries in UFFDIO_MOVE
[not found] <20260602142537.198755-14-usama.arif@linux.dev>
@ 2026-06-12 8:50 ` Lance Yang
0 siblings, 0 replies; only message in thread
From: Lance Yang @ 2026-06-12 8:50 UTC (permalink / raw)
To: usama.arif
Cc: akpm, david, chrisl, kasong, ljs, ziy, ying.huang, baoquan.he,
willy, youngjun.park, hannes, riel, shakeel.butt, alex, kas,
baohua, dev.jain, baolin.wang, npache, liam, ryan.roberts, vbabka,
lance.yang, linux-kernel, nphamcs, shikemeng, kernel-team,
linux-mm
+Cc linux-mm
On Tue, Jun 02, 2026 at 07:24:21AM -0700, Usama Arif wrote:
[...]
>@@ -2846,11 +2902,66 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm
> }
>
> if (!pmd_trans_huge(src_pmdval)) {
>- spin_unlock(src_ptl);
> if (pmd_is_migration_entry(src_pmdval)) {
>+ spin_unlock(src_ptl);
> pmd_migration_entry_wait(mm, &src_pmdval);
> return -EAGAIN;
> }
>+ if (pmd_is_swap_entry(src_pmdval)) {
Looks buggy ... unless I missed something ...
>+ swp_entry_t entry;
>+ struct swap_info_struct *si;
>+
>+ /*
>+ * UFFDIO_MOVE on anon mappings requires single-owner
>+ * semantics; refuse to move a shared swap entry.
>+ */
>+ if (!pmd_swp_exclusive(src_pmdval)) {
>+ spin_unlock(src_ptl);
>+ return -EBUSY;
>+ }
>+
>+ entry = softleaf_from_pmd(src_pmdval);
>+ spin_unlock(src_ptl);
>+
>+ /* Pin the swap device against a racing swapoff. */
>+ si = get_swap_device(entry);
>+ if (unlikely(!si))
>+ return -EAGAIN;
>+
>+ src_folio = swap_cache_get_folio(entry);
We only check the first swap slot. Imagine we have something like this
after the PMD-sized swapcache folio was split while the PMD swap entry
was installed:
page table:
src PMD -> swap entry S
swap cache:
S + 0 -> no folio
S + 1 -> order-0 folio in the swap cache
S + 2 -> no folio
S + 3 -> order-0 folio in the swap cache
...
>+
>+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0,
>+ mm, src_addr,
>+ src_addr + HPAGE_PMD_SIZE);
>+ mmu_notifier_invalidate_range_start(&range);
>+
>+ if (src_folio) {
>+ folio_lock(src_folio);
>+ if (folio_nr_pages(src_folio) != HPAGE_PMD_NR) {
If S has a non-PMD-sized folio, this returns -EBUSY.
>+ err = -EBUSY;
>+ folio_unlock(src_folio);
>+ folio_put(src_folio);
>+ mmu_notifier_invalidate_range_end(&range);
>+ put_swap_device(si);
>+ return err;
>+ }
>+ }
>+
>+ dst_ptl = pmd_lockptr(mm, dst_pmd);
But if S has no folio, the initial lookup passes src_folio == NULL to
move_swap_pmd(), which only rechecks S:
if (src_folio) {
[...]
} else if (swap_cache_has_folio(entry)) {
double_pt_unlock(dst_ptl, src_ptl);
return -EAGAIN;
}
So if S is empty, the move can still go ahead even if S + 1 ... S + 511
contain folios in the swap cache.
>+ err = move_swap_pmd(mm, dst_vma, dst_addr, src_addr,
>+ dst_pmd, src_pmd, dst_pmdval,
>+ src_pmdval, dst_ptl, src_ptl,
>+ src_folio, entry);
>+
In that case, checking only S misses the order-0 folios in later slots.
move_swap_pmd() can then move the PMD swap entry whole without calling
folio_move_anon_rmap() or updating folio->index for those later folios.
Note that move_swap_pte() already does this for PTE-mapped swap entries,
because a folio in the swap cache needs its index and mapping updated to
align with dst_vma.
If those folios are later faulted in at dst, their rmap metadata still
points at the old anon_vma/index. Later rmap users derive the virtual
address from folio->mapping and folio->index, so they can look at the
wrong VMA/address ...
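To illustrate the stale-index problem, here is a userspace toy model (not
kernel code). The toy_* names and struct are made up for illustration; the
arithmetic is meant to mirror how an anon folio's index is derived from an
address and how rmap turns an index back into an address, minus the range
checks the real helpers do.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* assumes 4K pages */

/* Hypothetical stand-in for the two VMA fields the arithmetic needs. */
struct toy_vma {
	uint64_t vm_start;
	uint64_t vm_pgoff;
};

/* Index an anon folio would get when mapped at addr inside vma. */
static uint64_t toy_linear_page_index(const struct toy_vma *vma, uint64_t addr)
{
	assert(addr >= vma->vm_start);
	return ((addr - vma->vm_start) >> TOY_PAGE_SHIFT) + vma->vm_pgoff;
}

/*
 * Address an rmap walk would compute for a folio index inside vma.
 * No bounds check here, so a stale index simply yields a wrong address
 * (unsigned wraparound is intentional in this toy).
 */
static uint64_t toy_vma_address(const struct toy_vma *vma, uint64_t index)
{
	return vma->vm_start + ((index - vma->vm_pgoff) << TOY_PAGE_SHIFT);
}
```

With a folio whose index was set for the src VMA and never updated, the
address computed against the dst VMA does not match where the entry now
lives, which is the mismatch described above.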
Should we check the whole PMD swap range (S ... S + 511) before deciding
there is no folio in the swap cache to update?
Am I reading that code right?
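To make the question concrete, a rough userspace sketch of the difference
between the two checks (not kernel code). The toy_* names and the bool array
standing in for the swap cache are assumptions for illustration only; a real
version would presumably loop over the swap entries with something like
swap_cache_has_folio() under the appropriate locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PMD_NR 512	/* stand-in for HPAGE_PMD_NR (4K pages, 2M PMD) */

/* Toy swap cache: present[i] is true if slot S + i has a folio. */
struct toy_swap_cache {
	bool present[TOY_PMD_NR];
};

/* Models the recheck as posted: only S + 0 is looked at. */
static bool toy_first_slot_has_folio(const struct toy_swap_cache *sc)
{
	return sc->present[0];
}

/* Models the suggested check: any folio in S ... S + nr - 1 counts. */
static bool toy_range_has_folio(const struct toy_swap_cache *sc, size_t nr)
{
	assert(nr <= TOY_PMD_NR);
	for (size_t i = 0; i < nr; i++) {
		if (sc->present[i])
			return true;
	}
	return false;
}
```

For the layout above (S + 0 empty, S + 1 and S + 3 populated), the first
helper reports no folio while the second one does, so only the second would
make the move back off.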
>+ mmu_notifier_invalidate_range_end(&range);
>+ if (src_folio) {
>+ folio_unlock(src_folio);
>+ folio_put(src_folio);
>+ }
>+ put_swap_device(si);
>+ return err;
>+ }
>+ spin_unlock(src_ptl);
> return -ENOENT;
> }
>
>--
>2.52.0
>
Cheers, Lance