The Linux Kernel Mailing List
 help / color / mirror / Atom feed
From: Lance Yang <lance.yang@linux.dev>
To: usama.arif@linux.dev
Cc: akpm@linux-foundation.org, david@kernel.org, chrisl@kernel.org,
	kasong@tencent.com, ljs@kernel.org, ziy@nvidia.com,
	ying.huang@linux.alibaba.com, baoquan.he@linux.dev,
	willy@infradead.org, youngjun.park@lge.com, hannes@cmpxchg.org,
	riel@surriel.com, shakeel.butt@linux.dev, alex@ghiti.fr,
	kas@kernel.org, baohua@kernel.org, dev.jain@arm.com,
	baolin.wang@linux.alibaba.com, npache@redhat.com,
	liam@infradead.org, ryan.roberts@arm.com, vbabka@kernel.org,
	lance.yang@linux.dev, linux-kernel@vger.kernel.org,
	nphamcs@gmail.com, shikemeng@huaweicloud.com,
	kernel-team@meta.com, linux-mm@kvack.org
Subject: Re: [v2 13/16] mm: handle PMD swap entries in UFFDIO_MOVE
Date: Fri, 12 Jun 2026 16:50:27 +0800	[thread overview]
Message-ID: <20260612085027.5401-1-lance.yang@linux.dev> (raw)
In-Reply-To: <20260602142537.198755-14-usama.arif@linux.dev>

+Cc linux-mm

On Tue, Jun 02, 2026 at 07:24:21AM -0700, Usama Arif wrote:
[...]
>@@ -2846,11 +2902,66 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm
> 	}
> 
> 	if (!pmd_trans_huge(src_pmdval)) {
>-		spin_unlock(src_ptl);
> 		if (pmd_is_migration_entry(src_pmdval)) {
>+			spin_unlock(src_ptl);
> 			pmd_migration_entry_wait(mm, &src_pmdval);
> 			return -EAGAIN;
> 		}
>+		if (pmd_is_swap_entry(src_pmdval)) {

Looks buggy ... unless I missed something ...

>+			swp_entry_t entry;
>+			struct swap_info_struct *si;
>+
>+			/*
>+			 * UFFDIO_MOVE on anon mappings requires single-owner
>+			 * semantics; refuse to move a shared swap entry.
>+			 */
>+			if (!pmd_swp_exclusive(src_pmdval)) {
>+				spin_unlock(src_ptl);
>+				return -EBUSY;
>+			}
>+
>+			entry = softleaf_from_pmd(src_pmdval);
>+			spin_unlock(src_ptl);
>+
>+			/* Pin the swap device against a racing swapoff. */
>+			si = get_swap_device(entry);
>+			if (unlikely(!si))
>+				return -EAGAIN;
>+
>+			src_folio = swap_cache_get_folio(entry);

We only check the first swap slot. Imagine we have something like this
after the PMD-sized swapcache folio was split while the PMD swap entry
was installed:

page table:
  src PMD -> swap entry S

swap cache:
  S + 0   -> no folio
  S + 1   -> order-0 folio in the swap cache
  S + 2   -> no folio
  S + 3   -> order-0 folio in the swap cache
  ...

>+
>+			mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0,
>+						mm, src_addr,
>+						src_addr + HPAGE_PMD_SIZE);
>+			mmu_notifier_invalidate_range_start(&range);
>+
>+			if (src_folio) {
>+				folio_lock(src_folio);
>+				if (folio_nr_pages(src_folio) != HPAGE_PMD_NR) {

If S has a non-PMD-sized folio, this returns -EBUSY.

>+					err = -EBUSY;
>+					folio_unlock(src_folio);
>+					folio_put(src_folio);
>+					mmu_notifier_invalidate_range_end(&range);
>+					put_swap_device(si);
>+					return err;
>+				}
>+			}
>+
>+			dst_ptl = pmd_lockptr(mm, dst_pmd);

But if S has no folio, the initial lookup passes src_folio == NULL to
move_swap_pmd(), , which only rechecks S:

	if (src_folio) {
[...]
	} else if (swap_cache_has_folio(entry)) {
		double_pt_unlock(dst_ptl, src_ptl);
		return -EAGAIN;
	}

So if S is empty, the move can still go ahead even if S + 1 ... S + 511
contain folios in the swap cache.

>+			err = move_swap_pmd(mm, dst_vma, dst_addr, src_addr,
>+					    dst_pmd, src_pmd, dst_pmdval,
>+					    src_pmdval, dst_ptl, src_ptl,
>+					    src_folio, entry);
>+

In that case, checking only S misses the order-0 folios in later slots.
move_swap_pmd() can then move the PMD swap entry whole without calling
folio_move_anon_rmap() or updating folio->index for those later folios.

Note that move_swap_pte() already does this for PTE-mapped swap entries,
because a folio in the swap cache needs its index and mapping updated to
align with dst_vma.

If those folios are later faulted in at dst, their rmap metadata still
points at the old anon_vma/index. Later rmap users derive the virtual
address from folio->mapping and folio->index, so they can look at the
wrong VMA/address ...

Should check the whole PMD swap range before deciding there is no folio
in the swap cache to update?

Am I reading that code right?

>+			mmu_notifier_invalidate_range_end(&range);
>+			if (src_folio) {
>+				folio_unlock(src_folio);
>+				folio_put(src_folio);
>+			}
>+			put_swap_device(si);
>+			return err;
>+		}
>+		spin_unlock(src_ptl);
> 		return -ENOENT;
> 	}
> 
>-- 
>2.52.0
>

Cheers, Lance

  reply	other threads:[~2026-06-12  8:51 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-02 14:24 [v2 00/16] mm: PMD-level swap entries for anonymous THPs Usama Arif
2026-06-02 14:24 ` [v2 01/16] mm: add softleaf_to_pmd() and convert existing callers Usama Arif
2026-06-02 14:24 ` [v2 02/16] mm: extract mm_prepare_for_swap_entries() helper Usama Arif
2026-06-02 14:24 ` [v2 03/16] fs/proc: use softleaf_has_pfn() in pagemap PMD walker Usama Arif
2026-06-02 14:24 ` [v2 04/16] mm/huge_memory: move softleaf_to_folio() inside migration branch Usama Arif
2026-06-02 14:24 ` [v2 05/16] mm/migrate_device: move softleaf_to_folio() inside device-private branch Usama Arif
2026-06-02 14:24 ` [v2 06/16] mm: rename ARCH_ENABLE_THP_MIGRATION to ARCH_SUPPORTS_PMD_SOFTLEAF Usama Arif
2026-06-02 14:24 ` [v2 07/16] mm: add PMD swap entry detection support Usama Arif
2026-06-02 14:24 ` [v2 08/16] mm: add PMD swap entry splitting support Usama Arif
2026-06-02 14:24 ` [v2 09/16] mm: handle PMD swap entries in fork path Usama Arif
2026-06-02 14:24 ` [v2 10/16] mm: swap in PMD swap entries as whole THPs during swapoff Usama Arif
2026-06-02 14:24 ` [v2 11/16] mm: handle PMD swap entries in non-present PMD walkers Usama Arif
2026-06-12  6:45   ` Lance Yang
2026-06-12 15:05     ` Usama Arif
2026-06-12 15:21       ` Lance Yang
2026-06-02 14:24 ` [v2 12/16] mm: handle PMD swap entries in MADV_WILLNEED Usama Arif
2026-06-02 14:24 ` [v2 13/16] mm: handle PMD swap entries in UFFDIO_MOVE Usama Arif
2026-06-12  8:50   ` Lance Yang [this message]
2026-06-02 14:24 ` [v2 14/16] mm: handle PMD swap entry faults on swap-in Usama Arif
2026-06-02 14:24 ` [v2 15/16] mm: install PMD swap entries on swap-out Usama Arif
2026-06-12 14:21   ` Lance Yang
2026-06-02 14:24 ` [v2 16/16] selftests/mm: add PMD swap entry tests Usama Arif
2026-06-09 14:29 ` [v2 00/16] mm: PMD-level swap entries for anonymous THPs Usama Arif
2026-06-10 12:24   ` David Hildenbrand (Arm)
2026-06-10 13:01     ` Lance Yang
2026-06-10 13:48       ` David Hildenbrand (Arm)
2026-06-10 14:44         ` Usama Arif

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260612085027.5401-1-lance.yang@linux.dev \
    --to=lance.yang@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=alex@ghiti.fr \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=baoquan.he@linux.dev \
    --cc=chrisl@kernel.org \
    --cc=david@kernel.org \
    --cc=dev.jain@arm.com \
    --cc=hannes@cmpxchg.org \
    --cc=kas@kernel.org \
    --cc=kasong@tencent.com \
    --cc=kernel-team@meta.com \
    --cc=liam@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=npache@redhat.com \
    --cc=nphamcs@gmail.com \
    --cc=riel@surriel.com \
    --cc=ryan.roberts@arm.com \
    --cc=shakeel.butt@linux.dev \
    --cc=shikemeng@huaweicloud.com \
    --cc=usama.arif@linux.dev \
    --cc=vbabka@kernel.org \
    --cc=willy@infradead.org \
    --cc=ying.huang@linux.alibaba.com \
    --cc=youngjun.park@lge.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox