All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/1] mm/userfaultfd: fix UFFDIO_COPY retry private/shared VMA panic
@ 2026-05-14  0:54 Michael Bommarito
  2026-05-14  0:54 ` [PATCH 1/1] mm/userfaultfd: validate effective UFFDIO_COPY ops after retry Michael Bommarito
  2026-05-14 18:00 ` [PATCH 0/1] mm/userfaultfd: fix UFFDIO_COPY retry private/shared VMA panic Mike Rapoport
  0 siblings, 2 replies; 4+ messages in thread
From: Michael Bommarito @ 2026-05-14  0:54 UTC (permalink / raw)
  To: Andrew Morton, Mike Rapoport, Peter Xu
  Cc: David Carlier, linux-mm, linux-kernel

Hi,

mfill_copy_folio_retry() drops the destination VMA lock before
copy_from_user() and reacquires it afterwards.  Commit 292411fda25b
("mm/userfaultfd: detect VMA type change after copy retry in
mfill_copy_folio_retry()") added a comparison of vma_uffd_ops() across
that window, but the comparison is not tight enough for private/shared
shmem swaps: both private and shared shmem VMAs expose shmem_uffd_ops
through vm_ops, while UFFDIO_COPY into a MAP_PRIVATE file-backed VMA
overrides the effective copy ops to anon_uffd_ops at
mfill_atomic_pte_copy() time.

If the destination is replaced from MAP_PRIVATE shmem to MAP_SHARED
shmem during the retry, vma_uffd_ops() still compares equal across
the window, the override flip is not detected, and the stale
anonymous folio is installed into the new shared shmem VMA.
mfill_atomic_install_pte() sees a folio without page-cache mapping,
calls folio_add_new_anon_rmap(), and __folio_set_anon() reaches
BUG_ON(!anon_vma) because the new shared shmem VMA has no anon_vma.

Reproducer (UML+KASAN, 7.1-rc2-00002-g8d90b09e6741, unprivileged uid
65534 with vm.unprivileged_userfaultfd=1):

  pre-fix:
    PROCESS_UID=65534 EUID=65534
    DST_INITIAL=MAP_PRIVATE_SHMEM addr=0x40041000
    SRC_INITIAL=UFFD_MISSING_ANON addr=0x40042000
    SOURCE_USERFAULT addr=0x40042000 flags=0x0
    DST_REPLACED_WITH_SHARED=ok
    DST_REREGISTERED_AFTER_REMAP=ok
    SOURCE_RESOLVED=ok
    BUG: failure at mm/rmap.c:1468/__folio_set_anon()!
    Kernel panic - not syncing: BUG!

  post-fix:
    DST_UFFDIO_COPY_IOCTL_RET=-1 errno=11 copy=-11
    RETRY_RESULT=-11
    (no BUG / WARN / KASAN signal in dmesg)

The patch introduces a vma_uffd_copy_ops() helper that applies the
MAP_PRIVATE override inline.  mfill_copy_folio_retry() now compares both
the raw vma_uffd_ops() and the effective copy ops across the dropped-lock
window: the raw comparison preserves 292411fda25b's VMA-type replacement
guard, while the effective comparison catches the private/shared shmem
override flip.  The mfill_atomic_pte_copy() call site goes through the
same helper, preserving today's semantics.  Because the override is
applied on both sides, a stable MAP_PRIVATE shmem VMA returns
&anon_uffd_ops on both effective-copy checks and the comparison still
succeeds, so the change does not reintroduce the spurious -EAGAIN that
v5/v6 of 292411fda25b's series triggered on MAP_PRIVATE shmem (see that
series's v6 changelog).

A separate concern from Peter Xu's review of v1 of 292411fda25b's
series -- replacement with a different shmem VMA carrying the same
flags but a different inode -- is out of scope here and is also
unaddressed by 292411fda25b.

Testing.

  - x86_64 UML build with KASAN clean.
  - Reproducer above: pre-fix panics deterministically on the first
    iteration; post-fix returns -EAGAIN with empty dmesg.
  - tools/testing/selftests/mm/uffd-stress {anon,shmem,shmem-private}
    16M / 4 cpus, 4 bounces each, KASAN-silent on stock and patched.
  - tools/testing/selftests/mm/uffd-unit-tests on stock and patched:
    identical pass / skip profile through the events block; both
    hit the same pre-existing UML arch limitation in the
    "poison on anon" case at arch/um/kernel/trap.c:198, unrelated
    to this patch.
  - scripts/checkpatch.pl --strict clean.

Fixes: 292411fda25b ("mm/userfaultfd: detect VMA type change after copy retry in mfill_copy_folio_retry()")

Michael Bommarito (1):
  mm/userfaultfd: validate effective UFFDIO_COPY ops after retry

 mm/userfaultfd.c | 40 ++++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

--
2.46.0

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/1] mm/userfaultfd: validate effective UFFDIO_COPY ops after retry
  2026-05-14  0:54 [PATCH 0/1] mm/userfaultfd: fix UFFDIO_COPY retry private/shared VMA panic Michael Bommarito
@ 2026-05-14  0:54 ` Michael Bommarito
  2026-05-14 18:00 ` [PATCH 0/1] mm/userfaultfd: fix UFFDIO_COPY retry private/shared VMA panic Mike Rapoport
  1 sibling, 0 replies; 4+ messages in thread
From: Michael Bommarito @ 2026-05-14  0:54 UTC (permalink / raw)
  To: Andrew Morton, Mike Rapoport, Peter Xu
  Cc: David Carlier, linux-mm, linux-kernel

UFFDIO_COPY fills MAP_PRIVATE file-backed VMAs with anonymous memory.
mfill_atomic_pte_copy() implements that by overriding the VMA's uffd ops
with anon_uffd_ops when VM_SHARED is not set.

mfill_copy_folio_retry() can drop the destination VMA lock after an
initial copy_from_user() failure and reacquire the destination VMA. It
currently checks whether vma_uffd_ops() changed while the lock was
dropped, but that is not the same as checking whether the effective
UFFDIO_COPY ops changed.

Private and shared shmem VMAs both expose shmem_uffd_ops through vm_ops.
If a private shmem destination is replaced with a shared shmem destination
while the retry has dropped the lock, vma_uffd_ops() still compares equal
even though the effective copy ops changed from anon_uffd_ops to
shmem_uffd_ops.

The stale anon folio can then be installed into the new shared shmem VMA.
mfill_atomic_install_pte() sees a folio without page-cache mapping and
calls folio_add_new_anon_rmap(), which reaches BUG_ON(!anon_vma) because
the new shared shmem VMA has no anon_vma.

Compare both the raw VMA uffd ops and the effective UFFDIO_COPY ops
across the retry. The raw comparison preserves the existing VMA-type
replacement guard, while the effective comparison also catches replacements
where the raw ops stay equal but the MAP_PRIVATE override result changes.
If either comparison changes, return -EAGAIN and let the ioctl retry
instead of installing the stale folio through the wrong path.

Fixes: 292411fda25b ("mm/userfaultfd: detect VMA type change after copy retry in mfill_copy_folio_retry()")
Assisted-by: Codex:gpt-5-5-xhigh
Assisted-by: Claude:opus-4-7

Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
 mm/userfaultfd.c | 40 ++++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 180bad42fc79..5af13953c29a 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -69,6 +69,24 @@ static const struct vm_uffd_ops *vma_uffd_ops(struct vm_area_struct *vma)
 	return vma->vm_ops ? vma->vm_ops->uffd_ops : NULL;
 }
 
+static const struct vm_uffd_ops *vma_uffd_copy_ops(struct vm_area_struct *vma)
+{
+	const struct vm_uffd_ops *ops = vma_uffd_ops(vma);
+
+	if (!ops)
+		return NULL;
+
+	/*
+	 * UFFDIO_COPY fills MAP_PRIVATE file-backed mappings as anonymous
+	 * memory. This is an effective ops override, so retry validation must
+	 * compare the override result, not just vma->vm_ops->uffd_ops.
+	 */
+	if (!(vma->vm_flags & VM_SHARED))
+		return &anon_uffd_ops;
+
+	return ops;
+}
+
 static __always_inline
 bool validate_dst_vma(struct vm_area_struct *dst_vma, unsigned long dst_end)
 {
@@ -447,6 +465,7 @@ static int mfill_copy_folio_retry(struct mfill_state *state,
 				  struct folio *folio)
 {
 	const struct vm_uffd_ops *orig_ops = vma_uffd_ops(state->vma);
+	const struct vm_uffd_ops *orig_copy_ops = vma_uffd_copy_ops(state->vma);
 	unsigned long src_addr = state->src_addr;
 	void *kaddr;
 	int err;
@@ -469,10 +488,11 @@ static int mfill_copy_folio_retry(struct mfill_state *state,
 
 	/*
 	 * The VMA type may have changed while the lock was dropped
-	 * (e.g. replaced with a hugetlb mapping), making the caller's
-	 * ops pointer stale.
+	 * (e.g. replaced with a hugetlb mapping). Also catch changes where
+	 * the raw ops stay equal but the effective UFFDIO_COPY ops differ.
 	 */
-	if (vma_uffd_ops(state->vma) != orig_ops)
+	if (vma_uffd_ops(state->vma) != orig_ops ||
+	    vma_uffd_copy_ops(state->vma) != orig_copy_ops)
 		return -EAGAIN;
 
 	err = mfill_establish_pmd(state);
@@ -545,19 +565,7 @@ static int __mfill_atomic_pte(struct mfill_state *state,
 
 static int mfill_atomic_pte_copy(struct mfill_state *state)
 {
-	const struct vm_uffd_ops *ops = vma_uffd_ops(state->vma);
-
-	/*
-	 * The normal page fault path for a MAP_PRIVATE mapping in a
-	 * file-backed VMA will invoke the fault, fill the hole in the file and
-	 * COW it right away. The result generates plain anonymous memory.
-	 * So when we are asked to fill a hole in a MAP_PRIVATE mapping, we'll
-	 * generate anonymous memory directly without actually filling the
-	 * hole. For the MAP_PRIVATE case the robustness check only happens in
-	 * the pagetable (to verify it's still none) and not in the page cache.
-	 */
-	if (!(state->vma->vm_flags & VM_SHARED))
-		ops = &anon_uffd_ops;
+	const struct vm_uffd_ops *ops = vma_uffd_copy_ops(state->vma);
 
 	return __mfill_atomic_pte(state, ops);
 }
-- 
2.46.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 0/1] mm/userfaultfd: fix UFFDIO_COPY retry private/shared VMA panic
  2026-05-14  0:54 [PATCH 0/1] mm/userfaultfd: fix UFFDIO_COPY retry private/shared VMA panic Michael Bommarito
  2026-05-14  0:54 ` [PATCH 1/1] mm/userfaultfd: validate effective UFFDIO_COPY ops after retry Michael Bommarito
@ 2026-05-14 18:00 ` Mike Rapoport
  2026-05-17  6:28   ` [PATCH] userfaultfd: snapshot VMA state across UFFDIO_COPY retry Heechan Kang
  1 sibling, 1 reply; 4+ messages in thread
From: Mike Rapoport @ 2026-05-14 18:00 UTC (permalink / raw)
  To: Michael Bommarito, Andrew Morton, David Hildenbrand
  Cc: Peter Xu, David Carlier, linux-mm, linux-kernel, Lorenzo Stoakes,
	Liam R. Howlett

Hi,

On Wed, May 13, 2026 at 08:54:39PM -0400, Michael Bommarito wrote:
> Hi,
> 
> mfill_copy_folio_retry() drops the destination VMA lock before
> copy_from_user() and reacquires it afterwards.  Commit 292411fda25b
> ("mm/userfaultfd: detect VMA type change after copy retry in
> mfill_copy_folio_retry()") added a comparison of vma_uffd_ops() across
> that window, but the comparison is not tight enough for private/shared
> shmem swaps: both private and shared shmem VMAs expose shmem_uffd_ops
> through vm_ops, while UFFDIO_COPY into a MAP_PRIVATE file-backed VMA
> overrides the effective copy ops to anon_uffd_ops at
> mfill_atomic_pte_copy() time.
> 
> A separate concern from Peter Xu's review of v1 of 292411fda25b's
> series -- replacement with a different shmem VMA carrying the same
> flags but a different inode -- is out of scope here and is also
> unaddressed by 292411fda25b.

Thanks for the patch!

I'd prefer to deal with all the issues at once and I added vma_snapshot
suggested by Peter on top of you changes.


From a6921db3b2c382a0b57a49847bf75934237ac93b Mon Sep 17 00:00:00 2001
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
Date: Thu, 14 May 2026 18:51:58 +0300
Subject: [PATCH] userfaultfd: snapshot VMA state across UFFDIO_COPY retry

mfill_copy_folio_retry() drops the VMA lock for copy_from_user() and
reacquires it afterwards. The destination VMA can be replaced during that
window.

The existing check compares vma_uffd_ops() before and after the retry, but
if a shmem VMA with MAP_SHARED is replaced with a shmem VMA with
MAP_PRIVATE (or vice versa) the replacement goes undetected.

The change from MAP_PRIVATE to MAP_SHARED will treat the folio allocated
with shmem_alloc_folio() as anonymous and this will cause BUG() when
mfill_atomic_install_pte() will try to folio_add_new_anon_rmap().

The change from MAP_SHARED to MAP_PRIVATE allows injection of folios into
the page cache of the original VMA.

Introduce helpers for more comprehensive comparison of VMA state:
- vma_snapshot_get() to save the relevant VMA state into a struct
  vma_snapshot (original uffd_ops, actual uffd_ops, relevant VMA flags,
  vm_file and pgoff) before dropping the lock
- vma_snapshot_changed() to compare the saved state with the state of the
  VMA acquired after retaking the locks
- vma_snapshot_put() to release vm_file pinning.

Use DEFINE_FREE() cleanup to wrap vma_snapshot_put() to avoid complicating
error handling paths in mfill_copy_folio_retry().

Add vma_uffd_copy_ops() to avoid code duplication when original ops of
shmem VMA with MAP_PRIVATE are replaced with anon_uffd_ops.

Fixes: 292411fda25b ("mm/userfaultfd: detect VMA type change after copy retry in mfill_copy_folio_retry()")
Fixes: 6ab703034f14 ("userfaultfd: mfill_atomic(): remove retry logic")
Suggested-by: Peter Xu <peterx@redhat.com>
Co-developed-by: David Carlier <devnexen@gmail.com>
Signed-off-by: David Carlier <devnexen@gmail.com>
Co-developed-by: Michael Bommarito <michael.bommarito@gmail.com>
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
 mm/userfaultfd.c | 99 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 79 insertions(+), 20 deletions(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 180bad42fc79..b70b84776a79 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -14,6 +14,8 @@
 #include <linux/userfaultfd_k.h>
 #include <linux/mmu_notifier.h>
 #include <linux/hugetlb.h>
+#include <linux/file.h>
+#include <linux/cleanup.h>
 #include <asm/tlbflush.h>
 #include <asm/tlb.h>
 #include "internal.h"
@@ -69,6 +71,24 @@ static const struct vm_uffd_ops *vma_uffd_ops(struct vm_area_struct *vma)
 	return vma->vm_ops ? vma->vm_ops->uffd_ops : NULL;
 }
 
+static const struct vm_uffd_ops *vma_uffd_copy_ops(struct vm_area_struct *vma)
+{
+	const struct vm_uffd_ops *ops = vma_uffd_ops(vma);
+
+	if (!ops)
+		return NULL;
+
+	/*
+	 * UFFDIO_COPY fills MAP_PRIVATE file-backed mappings as anonymous
+	 * memory. This is an effective ops override, so retry validation must
+	 * compare the override result, not just vma->vm_ops->uffd_ops.
+	 */
+	if (!(vma->vm_flags & VM_SHARED))
+		return &anon_uffd_ops;
+
+	return ops;
+}
+
 static __always_inline
 bool validate_dst_vma(struct vm_area_struct *dst_vma, unsigned long dst_end)
 {
@@ -443,14 +463,70 @@ static int mfill_copy_folio_locked(struct folio *folio, unsigned long src_addr)
 	return ret;
 }
 
+#define VMA_SNAPSHOT_FLAGS append_vma_flags(__VMA_UFFD_FLAGS, VMA_SHARED_BIT)
+
+struct vma_snapshot {
+	const struct vm_uffd_ops *copy_ops;
+	const struct vm_uffd_ops *ops;
+	struct file *file;
+	vma_flags_t flags;
+	pgoff_t pgoff;
+};
+
+static void vma_snapshot_get(struct vma_snapshot *s, struct vm_area_struct *vma)
+{
+	s->flags = vma_flags_and_mask(&vma->flags, VMA_SNAPSHOT_FLAGS);
+	s->copy_ops = vma_uffd_copy_ops(vma);
+	s->ops = vma_uffd_ops(vma);
+	s->pgoff = vma->vm_pgoff;
+
+	if (vma->vm_file)
+		s->file = get_file(vma->vm_file);
+}
+
+static bool vma_snapshot_changed(struct vma_snapshot *s,
+				 struct vm_area_struct *vma)
+{
+	vma_flags_t flags = vma_flags_and_mask(&vma->flags, VMA_SNAPSHOT_FLAGS);
+
+	if (!vma_flags_same_pair(&s->flags, &flags))
+		return true;
+
+	/* VMA type or effective uffd_ops changed while the lock was dropped */
+	if (s->ops != vma_uffd_ops(vma) || s->copy_ops != vma_uffd_copy_ops(vma))
+		return true;
+
+	/* VMA was anonymous before; changed only if it no longer is */
+	if (!s->file)
+		return !vma_is_anonymous(vma);
+
+	/* VMA was file backed, but inode or offset has changed */
+	if (!vma->vm_file || vma->vm_file->f_inode != s->file->f_inode ||
+	    vma->vm_pgoff != s->pgoff)
+		return true;
+
+	return false;
+}
+
+static void vma_snapshot_put(struct vma_snapshot *s)
+{
+	if (s->file)
+		fput(s->file);
+}
+
+DEFINE_FREE(snapshot_put, struct vma_snapshot *, if (_T) vma_snapshot_put(_T));
+
 static int mfill_copy_folio_retry(struct mfill_state *state,
 				  struct folio *folio)
 {
-	const struct vm_uffd_ops *orig_ops = vma_uffd_ops(state->vma);
+	struct vma_snapshot s = { 0 };
+	struct vma_snapshot *p __free(snapshot_put) = &s;
 	unsigned long src_addr = state->src_addr;
 	void *kaddr;
 	int err;
 
+	vma_snapshot_get(&s, state->vma);
+
 	/* retry copying with mm_lock dropped */
 	mfill_put_vma(state);
 
@@ -467,12 +543,7 @@ static int mfill_copy_folio_retry(struct mfill_state *state,
 	if (err)
 		return err;
 
-	/*
-	 * The VMA type may have changed while the lock was dropped
-	 * (e.g. replaced with a hugetlb mapping), making the caller's
-	 * ops pointer stale.
-	 */
-	if (vma_uffd_ops(state->vma) != orig_ops)
+	if (vma_snapshot_changed(&s, state->vma))
 		return -EAGAIN;
 
 	err = mfill_establish_pmd(state);
@@ -545,19 +616,7 @@ static int __mfill_atomic_pte(struct mfill_state *state,
 
 static int mfill_atomic_pte_copy(struct mfill_state *state)
 {
-	const struct vm_uffd_ops *ops = vma_uffd_ops(state->vma);
-
-	/*
-	 * The normal page fault path for a MAP_PRIVATE mapping in a
-	 * file-backed VMA will invoke the fault, fill the hole in the file and
-	 * COW it right away. The result generates plain anonymous memory.
-	 * So when we are asked to fill a hole in a MAP_PRIVATE mapping, we'll
-	 * generate anonymous memory directly without actually filling the
-	 * hole. For the MAP_PRIVATE case the robustness check only happens in
-	 * the pagetable (to verify it's still none) and not in the page cache.
-	 */
-	if (!(state->vma->vm_flags & VM_SHARED))
-		ops = &anon_uffd_ops;
+	const struct vm_uffd_ops *ops = vma_uffd_copy_ops(state->vma);
 
 	return __mfill_atomic_pte(state, ops);
 }

base-commit: 444fc9435e57157fcf30fc99aee44997f3458641
-- 
2.53.0


-- 
Sincerely yours,
Mike.


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] userfaultfd: snapshot VMA state across UFFDIO_COPY retry
  2026-05-14 18:00 ` [PATCH 0/1] mm/userfaultfd: fix UFFDIO_COPY retry private/shared VMA panic Mike Rapoport
@ 2026-05-17  6:28   ` Heechan Kang
  0 siblings, 0 replies; 4+ messages in thread
From: Heechan Kang @ 2026-05-17  6:28 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Michael Bommarito, Andrew Morton, David Hildenbrand, Peter Xu,
	David Carlier, linux-mm, linux-kernel, Lorenzo Stoakes,
	Liam R . Howlett, Hugh Dickins, Greg Kroah-Hartman, Heechan Kang

Hello Mike,

I tested the vma_snapshot patch from this message on top of:

  6916d5703ddf ("Merge tag 'drm-fixes-2026-05-16' of https://gitlab.freedesktop.org/drm/kernel")
  v7.1-rc3-362-g6916d5703ddf

I used my UFFDIO_COPY MAP_SHARED tmpfs -> MAP_PRIVATE tmpfs replacement
reproducer against a root-owned 0444 sparse tmpfs file.

Before the test:

  sha256: ad7facb2586fc6e966c004d7d1d16b024f5805ff7cb47c7a85dabd8b48892ca7

After running 30000 attempts as uid 1000, the reproducer reported:

  RESULT: not reproduced after 30000 attempts

The target file remained unchanged:

  sha256: ad7facb2586fc6e966c004d7d1d16b024f5805ff7cb47c7a85dabd8b48892ca7
  first 32 bytes remained zero

So this patch fixes the MAP_SHARED tmpfs -> MAP_PRIVATE tmpfs replacement
case I tested. I did not separately test the MAP_PRIVATE -> MAP_SHARED
panic case.

Tested-by: Heechan Kang <gganji11@naver.com>

Thanks,
Heechan


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-05-17  6:28 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-14  0:54 [PATCH 0/1] mm/userfaultfd: fix UFFDIO_COPY retry private/shared VMA panic Michael Bommarito
2026-05-14  0:54 ` [PATCH 1/1] mm/userfaultfd: validate effective UFFDIO_COPY ops after retry Michael Bommarito
2026-05-14 18:00 ` [PATCH 0/1] mm/userfaultfd: fix UFFDIO_COPY retry private/shared VMA panic Mike Rapoport
2026-05-17  6:28   ` [PATCH] userfaultfd: snapshot VMA state across UFFDIO_COPY retry Heechan Kang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.