Linux-mm Archive on lore.kernel.org
* Re: [v2 13/16] mm: handle PMD swap entries in UFFDIO_MOVE
       [not found] <20260602142537.198755-14-usama.arif@linux.dev>
From: Lance Yang @ 2026-06-12  8:50 UTC (permalink / raw)
  To: usama.arif
  Cc: akpm, david, chrisl, kasong, ljs, ziy, ying.huang, baoquan.he,
	willy, youngjun.park, hannes, riel, shakeel.butt, alex, kas,
	baohua, dev.jain, baolin.wang, npache, liam, ryan.roberts, vbabka,
	lance.yang, linux-kernel, nphamcs, shikemeng, kernel-team,
	linux-mm

+Cc linux-mm

On Tue, Jun 02, 2026 at 07:24:21AM -0700, Usama Arif wrote:
[...]
>@@ -2846,11 +2902,66 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm
> 	}
> 
> 	if (!pmd_trans_huge(src_pmdval)) {
>-		spin_unlock(src_ptl);
> 		if (pmd_is_migration_entry(src_pmdval)) {
>+			spin_unlock(src_ptl);
> 			pmd_migration_entry_wait(mm, &src_pmdval);
> 			return -EAGAIN;
> 		}
>+		if (pmd_is_swap_entry(src_pmdval)) {

Looks buggy ... unless I missed something ...

>+			swp_entry_t entry;
>+			struct swap_info_struct *si;
>+
>+			/*
>+			 * UFFDIO_MOVE on anon mappings requires single-owner
>+			 * semantics; refuse to move a shared swap entry.
>+			 */
>+			if (!pmd_swp_exclusive(src_pmdval)) {
>+				spin_unlock(src_ptl);
>+				return -EBUSY;
>+			}
>+
>+			entry = softleaf_from_pmd(src_pmdval);
>+			spin_unlock(src_ptl);
>+
>+			/* Pin the swap device against a racing swapoff. */
>+			si = get_swap_device(entry);
>+			if (unlikely(!si))
>+				return -EAGAIN;
>+
>+			src_folio = swap_cache_get_folio(entry);

We only check the first swap slot. Imagine we have something like this
after the PMD-sized swapcache folio was split while the PMD swap entry
was installed:

page table:
  src PMD -> swap entry S

swap cache:
  S + 0   -> no folio
  S + 1   -> order-0 folio in the swap cache
  S + 2   -> no folio
  S + 3   -> order-0 folio in the swap cache
  ...

>+
>+			mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0,
>+						mm, src_addr,
>+						src_addr + HPAGE_PMD_SIZE);
>+			mmu_notifier_invalidate_range_start(&range);
>+
>+			if (src_folio) {
>+				folio_lock(src_folio);
>+				if (folio_nr_pages(src_folio) != HPAGE_PMD_NR) {

If S has a non-PMD-sized folio, this returns -EBUSY.

>+					err = -EBUSY;
>+					folio_unlock(src_folio);
>+					folio_put(src_folio);
>+					mmu_notifier_invalidate_range_end(&range);
>+					put_swap_device(si);
>+					return err;
>+				}
>+			}
>+
>+			dst_ptl = pmd_lockptr(mm, dst_pmd);

But if S has no folio, the initial lookup returns NULL, so src_folio == NULL
is passed to move_swap_pmd(), which only rechecks S:

	if (src_folio) {
[...]
	} else if (swap_cache_has_folio(entry)) {
		double_pt_unlock(dst_ptl, src_ptl);
		return -EAGAIN;
	}

So if S is empty, the move can still go ahead even if S + 1 ... S + 511
contain folios in the swap cache.

>+			err = move_swap_pmd(mm, dst_vma, dst_addr, src_addr,
>+					    dst_pmd, src_pmd, dst_pmdval,
>+					    src_pmdval, dst_ptl, src_ptl,
>+					    src_folio, entry);
>+

In that case, checking only S misses the order-0 folios in later slots.
move_swap_pmd() can then move the PMD swap entry whole without calling
folio_move_anon_rmap() or updating folio->index for those later folios.

Note that move_swap_pte() already performs that update for PTE-mapped swap
entries, because a folio in the swap cache needs its index and mapping
updated to match dst_vma.

If those folios are later faulted in at dst, their rmap metadata still
points at the old anon_vma/index. Later rmap users derive the virtual
address from folio->mapping and folio->index, so they can look at the
wrong VMA/address ...
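
For illustration, here is a minimal userspace model of that address
derivation (not kernel code). The model_* names, the 4K page shift and the
example addresses are made up for this sketch. The arithmetic follows the
usual linear page index convention, where the address is recovered as
vm_start + ((index - vm_pgoff) << PAGE_SHIFT):

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4K pages */

/* Hypothetical stand-in for the two VMA fields the derivation uses. */
struct model_vma {
	uint64_t vm_start;	/* first virtual address covered */
	uint64_t vm_pgoff;	/* page index that corresponds to vm_start */
};

/* Page index a folio mapped at addr in vma is expected to carry. */
static uint64_t model_linear_page_index(const struct model_vma *vma,
					uint64_t addr)
{
	return ((addr - vma->vm_start) >> MODEL_PAGE_SHIFT) + vma->vm_pgoff;
}

/* Virtual address an rmap-style walker would derive from a page index. */
static uint64_t model_address_from_index(const struct model_vma *vma,
					 uint64_t index)
{
	return vma->vm_start + ((index - vma->vm_pgoff) << MODEL_PAGE_SHIFT);
}

/* Stale vs. refreshed index for a folio at slot S + 1 after a move. */
static void model_stale_index_demo(void)
{
	struct model_vma src = { 0x10000000ULL, 0x10000ULL };
	struct model_vma dst = { 0x20000000ULL, 0x20000ULL };
	uint64_t src_addr = src.vm_start + (1ULL << MODEL_PAGE_SHIFT);
	uint64_t dst_addr = dst.vm_start + (1ULL << MODEL_PAGE_SHIFT);

	/* Index left over from the source mapping, never updated. */
	uint64_t stale = model_linear_page_index(&src, src_addr);
	/* Index the folio would carry if it had been updated for dst. */
	uint64_t fresh = model_linear_page_index(&dst, dst_addr);

	/* The stale index does not resolve to the page's new address. */
	assert(model_address_from_index(&dst, stale) != dst_addr);
	/* The refreshed index does. */
	assert(model_address_from_index(&dst, fresh) == dst_addr);
}
```

In this model the stale index, interpreted against dst, wraps back to the old
source address instead of dst_addr, which is the kind of mismatch described
above.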

Should we check the whole PMD swap range (S ... S + 511) before deciding
there is no folio in the swap cache to update?

Am I reading that code right?
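
To make the scenario concrete, here is a small userspace model of the two
checks (again not kernel code, and not the swap cache API). The model_* names
and the 512 slots per PMD (4K pages, 2M PMD) are assumptions for this sketch.
The swap cache is reduced to one boolean per slot:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PMD_NR 512	/* assumption: 512 swap slots per PMD entry */

/* Hypothetical stand-in for the swap cache: true means "slot has a folio". */
struct model_swap_cache {
	bool has_folio[MODEL_PMD_NR];
};

/* Models the check as quoted above: only the first slot (S) is examined. */
static bool model_first_slot_has_folio(const struct model_swap_cache *sc)
{
	return sc->has_folio[0];
}

/* Models the suggested check: scan S ... S + MODEL_PMD_NR - 1. */
static bool model_any_slot_has_folio(const struct model_swap_cache *sc)
{
	for (size_t i = 0; i < MODEL_PMD_NR; i++) {
		if (sc->has_folio[i])
			return true;
	}
	return false;
}

/* Layout from the example: S empty, S + 1 and S + 3 hold order-0 folios. */
static struct model_swap_cache model_split_layout(void)
{
	struct model_swap_cache sc = { { false } };

	sc.has_folio[1] = true;
	sc.has_folio[3] = true;
	return sc;
}

/* The first-slot check reports "no folio"; the range scan does not. */
static void model_range_check_demo(void)
{
	struct model_swap_cache sc = model_split_layout();

	assert(!model_first_slot_has_folio(&sc));
	assert(model_any_slot_has_folio(&sc));
}
```

With the split layout, the first-slot check sees nothing to update while the
range scan finds the folios at S + 1 and S + 3. How a real range check would
look up and lock those folios is left open here.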

>+			mmu_notifier_invalidate_range_end(&range);
>+			if (src_folio) {
>+				folio_unlock(src_folio);
>+				folio_put(src_folio);
>+			}
>+			put_swap_device(si);
>+			return err;
>+		}
>+		spin_unlock(src_ptl);
> 		return -ENOENT;
> 	}
> 
>-- 
>2.52.0
>

Cheers, Lance

