Linux-mm Archive on lore.kernel.org
* [PATCH 0/1] mm/userfaultfd: fix UFFDIO_COPY retry private/shared VMA panic
@ 2026-05-14  0:54 Michael Bommarito
  2026-05-14  0:54 ` [PATCH 1/1] mm/userfaultfd: validate effective UFFDIO_COPY ops after retry Michael Bommarito
  2026-05-14 18:00 ` [PATCH 0/1] mm/userfaultfd: fix UFFDIO_COPY retry private/shared VMA panic Mike Rapoport
  0 siblings, 2 replies; 3+ messages in thread
From: Michael Bommarito @ 2026-05-14  0:54 UTC (permalink / raw)
  To: Andrew Morton, Mike Rapoport, Peter Xu
  Cc: David Carlier, linux-mm, linux-kernel

Hi,

mfill_copy_folio_retry() drops the destination VMA lock before
copy_from_user() and reacquires it afterwards.  Commit 292411fda25b
("mm/userfaultfd: detect VMA type change after copy retry in
mfill_copy_folio_retry()") added a comparison of vma_uffd_ops() across
that window, but the comparison is not tight enough for private/shared
shmem swaps: both private and shared shmem VMAs expose shmem_uffd_ops
through vm_ops, while UFFDIO_COPY into a MAP_PRIVATE file-backed VMA
overrides the effective copy ops to anon_uffd_ops at
mfill_atomic_pte_copy() time.

If the destination is replaced from MAP_PRIVATE shmem to MAP_SHARED
shmem during the retry, vma_uffd_ops() still compares equal across
the window, the override flip is not detected, and the stale
anonymous folio is installed into the new shared shmem VMA.
mfill_atomic_install_pte() sees a folio without page-cache mapping,
calls folio_add_new_anon_rmap(), and __folio_set_anon() reaches
BUG_ON(!anon_vma) because the new shared shmem VMA has no anon_vma.

Reproducer (UML+KASAN, 7.1-rc2-00002-g8d90b09e6741, unprivileged uid
65534 with vm.unprivileged_userfaultfd=1):

  pre-fix:
    PROCESS_UID=65534 EUID=65534
    DST_INITIAL=MAP_PRIVATE_SHMEM addr=0x40041000
    SRC_INITIAL=UFFD_MISSING_ANON addr=0x40042000
    SOURCE_USERFAULT addr=0x40042000 flags=0x0
    DST_REPLACED_WITH_SHARED=ok
    DST_REREGISTERED_AFTER_REMAP=ok
    SOURCE_RESOLVED=ok
    BUG: failure at mm/rmap.c:1468/__folio_set_anon()!
    Kernel panic - not syncing: BUG!

  post-fix:
    DST_UFFDIO_COPY_IOCTL_RET=-1 errno=11 copy=-11
    RETRY_RESULT=-11
    (no BUG / WARN / KASAN signal in dmesg)

The patch introduces a vma_uffd_copy_ops() helper that applies the
MAP_PRIVATE override inline.  mfill_copy_folio_retry() now compares both
the raw vma_uffd_ops() and the effective copy ops across the dropped-lock
window: the raw comparison preserves 292411fda25b's VMA-type replacement
guard, while the effective comparison catches the private/shared shmem
override flip.  The mfill_atomic_pte_copy() call site goes through the
same helper, preserving today's semantics.  Because the override is
applied on both sides, a stable MAP_PRIVATE shmem VMA returns
&anon_uffd_ops on both effective-copy checks and the comparison still
succeeds, so the change does not reintroduce the spurious -EAGAIN that
v5/v6 of 292411fda25b's series triggered on MAP_PRIVATE shmem (see that
series's v6 changelog).

A separate concern from Peter Xu's review of v1 of 292411fda25b's
series -- replacement with a different shmem VMA carrying the same
flags but a different inode -- is out of scope here and is also
unaddressed by 292411fda25b.

Testing:

  - x86_64 UML build with KASAN clean.
  - Reproducer above: pre-fix panics deterministically on the first
    iteration; post-fix returns -EAGAIN with empty dmesg.
  - tools/testing/selftests/mm/uffd-stress {anon,shmem,shmem-private}
    16M / 4 cpus, 4 bounces each, KASAN-silent on stock and patched.
  - tools/testing/selftests/mm/uffd-unit-tests on stock and patched:
    identical pass / skip profile through the events block; both
    hit the same pre-existing UML arch limitation in the
    "poison on anon" case at arch/um/kernel/trap.c:198, unrelated
    to this patch.
  - scripts/checkpatch.pl --strict clean.

Fixes: 292411fda25b ("mm/userfaultfd: detect VMA type change after copy retry in mfill_copy_folio_retry()")

Michael Bommarito (1):
  mm/userfaultfd: validate effective UFFDIO_COPY ops after retry

 mm/userfaultfd.c | 40 ++++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

--
2.46.0


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 1/1] mm/userfaultfd: validate effective UFFDIO_COPY ops after retry
  2026-05-14  0:54 [PATCH 0/1] mm/userfaultfd: fix UFFDIO_COPY retry private/shared VMA panic Michael Bommarito
@ 2026-05-14  0:54 ` Michael Bommarito
  2026-05-14 18:00 ` [PATCH 0/1] mm/userfaultfd: fix UFFDIO_COPY retry private/shared VMA panic Mike Rapoport
  1 sibling, 0 replies; 3+ messages in thread
From: Michael Bommarito @ 2026-05-14  0:54 UTC (permalink / raw)
  To: Andrew Morton, Mike Rapoport, Peter Xu
  Cc: David Carlier, linux-mm, linux-kernel

UFFDIO_COPY fills MAP_PRIVATE file-backed VMAs with anonymous memory.
mfill_atomic_pte_copy() implements that by overriding the VMA's uffd ops
with anon_uffd_ops when VM_SHARED is not set.

mfill_copy_folio_retry() can drop the destination VMA lock after an
initial copy_from_user() failure and reacquire the destination VMA. It
currently checks whether vma_uffd_ops() changed while the lock was
dropped, but that is not the same as checking whether the effective
UFFDIO_COPY ops changed.

Private and shared shmem VMAs both expose shmem_uffd_ops through vm_ops.
If a private shmem destination is replaced with a shared shmem destination
while the retry has dropped the lock, vma_uffd_ops() still compares equal
even though the effective copy ops changed from anon_uffd_ops to
shmem_uffd_ops.

The stale anon folio can then be installed into the new shared shmem VMA.
mfill_atomic_install_pte() sees a folio without page-cache mapping and
calls folio_add_new_anon_rmap(), which reaches BUG_ON(!anon_vma) because
the new shared shmem VMA has no anon_vma.

Compare both the raw VMA uffd ops and the effective UFFDIO_COPY ops
across the retry. The raw comparison preserves the existing VMA-type
replacement guard, while the effective comparison also catches replacements
where the raw ops stay equal but the MAP_PRIVATE override result changes.
If either comparison changes, return -EAGAIN and let the ioctl retry
instead of installing the stale folio through the wrong path.

Fixes: 292411fda25b ("mm/userfaultfd: detect VMA type change after copy retry in mfill_copy_folio_retry()")
Assisted-by: Codex:gpt-5-5-xhigh
Assisted-by: Claude:opus-4-7

Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
 mm/userfaultfd.c | 40 ++++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 180bad42fc79..5af13953c29a 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -69,6 +69,24 @@ static const struct vm_uffd_ops *vma_uffd_ops(struct vm_area_struct *vma)
 	return vma->vm_ops ? vma->vm_ops->uffd_ops : NULL;
 }
 
+static const struct vm_uffd_ops *vma_uffd_copy_ops(struct vm_area_struct *vma)
+{
+	const struct vm_uffd_ops *ops = vma_uffd_ops(vma);
+
+	if (!ops)
+		return NULL;
+
+	/*
+	 * UFFDIO_COPY fills MAP_PRIVATE file-backed mappings as anonymous
+	 * memory. This is an effective ops override, so retry validation must
+	 * compare the override result, not just vma->vm_ops->uffd_ops.
+	 */
+	if (!(vma->vm_flags & VM_SHARED))
+		return &anon_uffd_ops;
+
+	return ops;
+}
+
 static __always_inline
 bool validate_dst_vma(struct vm_area_struct *dst_vma, unsigned long dst_end)
 {
@@ -447,6 +465,7 @@ static int mfill_copy_folio_retry(struct mfill_state *state,
 				  struct folio *folio)
 {
 	const struct vm_uffd_ops *orig_ops = vma_uffd_ops(state->vma);
+	const struct vm_uffd_ops *orig_copy_ops = vma_uffd_copy_ops(state->vma);
 	unsigned long src_addr = state->src_addr;
 	void *kaddr;
 	int err;
@@ -469,10 +488,11 @@ static int mfill_copy_folio_retry(struct mfill_state *state,
 
 	/*
 	 * The VMA type may have changed while the lock was dropped
-	 * (e.g. replaced with a hugetlb mapping), making the caller's
-	 * ops pointer stale.
+	 * (e.g. replaced with a hugetlb mapping). Also catch changes where
+	 * the raw ops stay equal but the effective UFFDIO_COPY ops differ.
 	 */
-	if (vma_uffd_ops(state->vma) != orig_ops)
+	if (vma_uffd_ops(state->vma) != orig_ops ||
+	    vma_uffd_copy_ops(state->vma) != orig_copy_ops)
 		return -EAGAIN;
 
 	err = mfill_establish_pmd(state);
@@ -545,19 +565,7 @@ static int __mfill_atomic_pte(struct mfill_state *state,
 
 static int mfill_atomic_pte_copy(struct mfill_state *state)
 {
-	const struct vm_uffd_ops *ops = vma_uffd_ops(state->vma);
-
-	/*
-	 * The normal page fault path for a MAP_PRIVATE mapping in a
-	 * file-backed VMA will invoke the fault, fill the hole in the file and
-	 * COW it right away. The result generates plain anonymous memory.
-	 * So when we are asked to fill a hole in a MAP_PRIVATE mapping, we'll
-	 * generate anonymous memory directly without actually filling the
-	 * hole. For the MAP_PRIVATE case the robustness check only happens in
-	 * the pagetable (to verify it's still none) and not in the page cache.
-	 */
-	if (!(state->vma->vm_flags & VM_SHARED))
-		ops = &anon_uffd_ops;
+	const struct vm_uffd_ops *ops = vma_uffd_copy_ops(state->vma);
 
 	return __mfill_atomic_pte(state, ops);
 }
-- 
2.46.0



* Re: [PATCH 0/1] mm/userfaultfd: fix UFFDIO_COPY retry private/shared VMA panic
  2026-05-14  0:54 [PATCH 0/1] mm/userfaultfd: fix UFFDIO_COPY retry private/shared VMA panic Michael Bommarito
  2026-05-14  0:54 ` [PATCH 1/1] mm/userfaultfd: validate effective UFFDIO_COPY ops after retry Michael Bommarito
@ 2026-05-14 18:00 ` Mike Rapoport
  1 sibling, 0 replies; 3+ messages in thread
From: Mike Rapoport @ 2026-05-14 18:00 UTC (permalink / raw)
  To: Michael Bommarito, Andrew Morton, David Hildenbrand
  Cc: Peter Xu, David Carlier, linux-mm, linux-kernel, Lorenzo Stoakes,
	Liam R. Howlett

Hi,

On Wed, May 13, 2026 at 08:54:39PM -0400, Michael Bommarito wrote:
> Hi,
> 
> mfill_copy_folio_retry() drops the destination VMA lock before
> copy_from_user() and reacquires it afterwards.  Commit 292411fda25b
> ("mm/userfaultfd: detect VMA type change after copy retry in
> mfill_copy_folio_retry()") added a comparison of vma_uffd_ops() across
> that window, but the comparison is not tight enough for private/shared
> shmem swaps: both private and shared shmem VMAs expose shmem_uffd_ops
> through vm_ops, while UFFDIO_COPY into a MAP_PRIVATE file-backed VMA
> overrides the effective copy ops to anon_uffd_ops at
> mfill_atomic_pte_copy() time.
> 
> A separate concern from Peter Xu's review of v1 of 292411fda25b's
> series -- replacement with a different shmem VMA carrying the same
> flags but a different inode -- is out of scope here and is also
> unaddressed by 292411fda25b.

Thanks for the patch!

I'd prefer to deal with all the issues at once, so I added the vma_snapshot
suggested by Peter on top of your changes.


From a6921db3b2c382a0b57a49847bf75934237ac93b Mon Sep 17 00:00:00 2001
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
Date: Thu, 14 May 2026 18:51:58 +0300
Subject: [PATCH] userfaultfd: snapshot VMA state across UFFDIO_COPY retry

mfill_copy_folio_retry() drops the VMA lock for copy_from_user() and
reacquires it afterwards. The destination VMA can be replaced during that
window.

The existing check compares vma_uffd_ops() before and after the retry, but
if a shmem VMA with MAP_SHARED is replaced with a shmem VMA with
MAP_PRIVATE (or vice versa) the replacement goes undetected.

The change from MAP_PRIVATE to MAP_SHARED causes the folio allocated with
shmem_alloc_folio() to be treated as anonymous, which triggers a BUG() when
mfill_atomic_install_pte() calls folio_add_new_anon_rmap().

The change from MAP_SHARED to MAP_PRIVATE allows injection of folios into
the page cache of the original VMA.

Introduce helpers for more comprehensive comparison of VMA state:
- vma_snapshot_get() to save the relevant VMA state into a struct
  vma_snapshot (original uffd_ops, actual uffd_ops, relevant VMA flags,
  vm_file and pgoff) before dropping the lock
- vma_snapshot_changed() to compare the saved state with the state of the
  VMA acquired after retaking the locks
- vma_snapshot_put() to release vm_file pinning.

Use DEFINE_FREE() cleanup to wrap vma_snapshot_put() to avoid complicating
error handling paths in mfill_copy_folio_retry().

Add vma_uffd_copy_ops() to avoid code duplication when the original ops of
a MAP_PRIVATE shmem VMA are replaced with anon_uffd_ops.

Fixes: 292411fda25b ("mm/userfaultfd: detect VMA type change after copy retry in mfill_copy_folio_retry()")
Fixes: 6ab703034f14 ("userfaultfd: mfill_atomic(): remove retry logic")
Suggested-by: Peter Xu <peterx@redhat.com>
Co-developed-by: David Carlier <devnexen@gmail.com>
Signed-off-by: David Carlier <devnexen@gmail.com>
Co-developed-by: Michael Bommarito <michael.bommarito@gmail.com>
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
 mm/userfaultfd.c | 99 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 79 insertions(+), 20 deletions(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 180bad42fc79..b70b84776a79 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -14,6 +14,8 @@
 #include <linux/userfaultfd_k.h>
 #include <linux/mmu_notifier.h>
 #include <linux/hugetlb.h>
+#include <linux/file.h>
+#include <linux/cleanup.h>
 #include <asm/tlbflush.h>
 #include <asm/tlb.h>
 #include "internal.h"
@@ -69,6 +71,24 @@ static const struct vm_uffd_ops *vma_uffd_ops(struct vm_area_struct *vma)
 	return vma->vm_ops ? vma->vm_ops->uffd_ops : NULL;
 }
 
+static const struct vm_uffd_ops *vma_uffd_copy_ops(struct vm_area_struct *vma)
+{
+	const struct vm_uffd_ops *ops = vma_uffd_ops(vma);
+
+	if (!ops)
+		return NULL;
+
+	/*
+	 * UFFDIO_COPY fills MAP_PRIVATE file-backed mappings as anonymous
+	 * memory. This is an effective ops override, so retry validation must
+	 * compare the override result, not just vma->vm_ops->uffd_ops.
+	 */
+	if (!(vma->vm_flags & VM_SHARED))
+		return &anon_uffd_ops;
+
+	return ops;
+}
+
 static __always_inline
 bool validate_dst_vma(struct vm_area_struct *dst_vma, unsigned long dst_end)
 {
@@ -443,14 +463,70 @@ static int mfill_copy_folio_locked(struct folio *folio, unsigned long src_addr)
 	return ret;
 }
 
+#define VMA_SNAPSHOT_FLAGS append_vma_flags(__VMA_UFFD_FLAGS, VMA_SHARED_BIT)
+
+struct vma_snapshot {
+	const struct vm_uffd_ops *copy_ops;
+	const struct vm_uffd_ops *ops;
+	struct file *file;
+	vma_flags_t flags;
+	pgoff_t pgoff;
+};
+
+static void vma_snapshot_get(struct vma_snapshot *s, struct vm_area_struct *vma)
+{
+	s->flags = vma_flags_and_mask(&vma->flags, VMA_SNAPSHOT_FLAGS);
+	s->copy_ops = vma_uffd_copy_ops(vma);
+	s->ops = vma_uffd_ops(vma);
+	s->pgoff = vma->vm_pgoff;
+
+	if (vma->vm_file)
+		s->file = get_file(vma->vm_file);
+}
+
+static bool vma_snapshot_changed(struct vma_snapshot *s,
+				 struct vm_area_struct *vma)
+{
+	vma_flags_t flags = vma_flags_and_mask(&vma->flags, VMA_SNAPSHOT_FLAGS);
+
+	if (!vma_flags_same_pair(&s->flags, &flags))
+		return true;
+
+	/* VMA type or effective uffd_ops changed while the lock was dropped */
+	if (s->ops != vma_uffd_ops(vma) || s->copy_ops != vma_uffd_copy_ops(vma))
+		return true;
+
+	/* VMA was anonymous before; changed only if it no longer is */
+	if (!s->file)
+		return !vma_is_anonymous(vma);
+
+	/* VMA was file backed, but inode or offset has changed */
+	if (!vma->vm_file || vma->vm_file->f_inode != s->file->f_inode ||
+	    vma->vm_pgoff != s->pgoff)
+		return true;
+
+	return false;
+}
+
+static void vma_snapshot_put(struct vma_snapshot *s)
+{
+	if (s->file)
+		fput(s->file);
+}
+
+DEFINE_FREE(snapshot_put, struct vma_snapshot *, if (_T) vma_snapshot_put(_T));
+
 static int mfill_copy_folio_retry(struct mfill_state *state,
 				  struct folio *folio)
 {
-	const struct vm_uffd_ops *orig_ops = vma_uffd_ops(state->vma);
+	struct vma_snapshot s = { 0 };
+	struct vma_snapshot *p __free(snapshot_put) = &s;
 	unsigned long src_addr = state->src_addr;
 	void *kaddr;
 	int err;
 
+	vma_snapshot_get(&s, state->vma);
+
 	/* retry copying with mm_lock dropped */
 	mfill_put_vma(state);
 
@@ -467,12 +543,7 @@ static int mfill_copy_folio_retry(struct mfill_state *state,
 	if (err)
 		return err;
 
-	/*
-	 * The VMA type may have changed while the lock was dropped
-	 * (e.g. replaced with a hugetlb mapping), making the caller's
-	 * ops pointer stale.
-	 */
-	if (vma_uffd_ops(state->vma) != orig_ops)
+	if (vma_snapshot_changed(&s, state->vma))
 		return -EAGAIN;
 
 	err = mfill_establish_pmd(state);
@@ -545,19 +616,7 @@ static int __mfill_atomic_pte(struct mfill_state *state,
 
 static int mfill_atomic_pte_copy(struct mfill_state *state)
 {
-	const struct vm_uffd_ops *ops = vma_uffd_ops(state->vma);
-
-	/*
-	 * The normal page fault path for a MAP_PRIVATE mapping in a
-	 * file-backed VMA will invoke the fault, fill the hole in the file and
-	 * COW it right away. The result generates plain anonymous memory.
-	 * So when we are asked to fill a hole in a MAP_PRIVATE mapping, we'll
-	 * generate anonymous memory directly without actually filling the
-	 * hole. For the MAP_PRIVATE case the robustness check only happens in
-	 * the pagetable (to verify it's still none) and not in the page cache.
-	 */
-	if (!(state->vma->vm_flags & VM_SHARED))
-		ops = &anon_uffd_ops;
+	const struct vm_uffd_ops *ops = vma_uffd_copy_ops(state->vma);
 
 	return __mfill_atomic_pte(state, ops);
 }

base-commit: 444fc9435e57157fcf30fc99aee44997f3458641
-- 
2.53.0


-- 
Sincerely yours,
Mike.


