public inbox for linux-mm@kvack.org
* [PATCH v2] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry()
@ 2026-03-30 20:29 David Carlier
  2026-03-30 20:40 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: David Carlier @ 2026-03-30 20:29 UTC (permalink / raw)
  To: Peter Xu, Andrew Morton
  Cc: Mike Rapoport, linux-mm, linux-kernel, David Carlier

In mfill_copy_folio_retry(), all locks are dropped to retry
copy_from_user() with page faults enabled. During this window, the VMA
can be replaced entirely (e.g. munmap + mmap + UFFDIO_REGISTER by
another thread), but the caller proceeds with a folio allocated from the
original VMA's backing store.

Checking ops alone is insufficient: the replacement VMA could be the
same type (e.g. shmem -> shmem) with identical flags but a different
backing inode. Take a snapshot of the VMA's inode and flags before
dropping locks, and compare after re-acquiring them. If anything
changed, bail out with -EAGAIN.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Carlier <devnexen@gmail.com>
---
 mm/userfaultfd.c | 64 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 57 insertions(+), 7 deletions(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 481ec7eb4442..22e82f953eb5 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -443,33 +443,83 @@ static int mfill_copy_folio_locked(struct folio *folio, unsigned long src_addr)
 	return ret;
 }
 
+struct vma_snapshot {
+	struct inode *inode;
+	vm_flags_t flags;
+};
+
+static void vma_snapshot_take(struct vm_area_struct *vma,
+			      struct vma_snapshot *s)
+{
+	s->flags = vma->vm_flags;
+	if (vma->vm_file) {
+		s->inode = vma->vm_file->f_inode;
+		ihold(s->inode);
+	} else {
+		s->inode = NULL;
+	}
+}
+
+static bool vma_snapshot_changed(struct vm_area_struct *vma,
+				 struct vma_snapshot *s)
+{
+	if (s->flags != vma->vm_flags)
+		return true;
+
+	if (s->inode && vma->vm_file->f_inode != s->inode)
+		return true;
+
+	if (!s->inode && !vma_is_anonymous(vma))
+		return true;
+
+	return false;
+}
+
+static void vma_snapshot_release(struct vma_snapshot *s)
+{
+	if (s->inode) {
+		iput(s->inode);
+		s->inode = NULL;
+	}
+}
+
 static int mfill_copy_folio_retry(struct mfill_state *state, struct folio *folio)
 {
 	unsigned long src_addr = state->src_addr;
+	struct vma_snapshot s;
 	void *kaddr;
 	int err;
 
+	/* Take a quick snapshot of the current vma */
+	vma_snapshot_take(state->vma, &s);
+
 	/* retry copying with mm_lock dropped */
 	mfill_put_vma(state);
 
 	kaddr = kmap_local_folio(folio, 0);
 	err = copy_from_user(kaddr, (const void __user *) src_addr, PAGE_SIZE);
 	kunmap_local(kaddr);
-	if (unlikely(err))
-		return -EFAULT;
+	if (unlikely(err)) {
+		err = -EFAULT;
+		goto out;
+	}
 
 	flush_dcache_folio(folio);
 
 	/* reget VMA and PMD, they could change underneath us */
 	err = mfill_get_vma(state);
 	if (err)
-		return err;
+		goto out;
 
-	err = mfill_establish_pmd(state);
-	if (err)
-		return err;
+	if (vma_snapshot_changed(state->vma, &s)) {
+		err = -EAGAIN;
+		goto out;
+	}
 
-	return 0;
+	err = mfill_establish_pmd(state);
+out:
+	vma_snapshot_release(&s);
+	return err;
 }
 
 static int __mfill_atomic_pte(struct mfill_state *state,
-- 
2.53.0




* Re: [PATCH v2] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry()
  2026-03-30 20:29 [PATCH v2] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry() David Carlier
@ 2026-03-30 20:40 ` Andrew Morton
  2026-03-30 21:27   ` David CARLIER
  2026-03-30 21:32   ` David CARLIER
  2026-03-30 20:51 ` Peter Xu
  2026-03-31 11:56 ` Mike Rapoport
  2 siblings, 2 replies; 8+ messages in thread
From: Andrew Morton @ 2026-03-30 20:40 UTC (permalink / raw)
  To: David Carlier; +Cc: Peter Xu, Mike Rapoport, linux-mm, linux-kernel

On Mon, 30 Mar 2026 21:29:09 +0100 David Carlier <devnexen@gmail.com> wrote:

> In mfill_copy_folio_retry(), all locks are dropped to retry
> copy_from_user() with page faults enabled. During this window, the VMA
> can be replaced entirely (e.g. munmap + mmap + UFFDIO_REGISTER by
> another thread), but the caller proceeds with a folio allocated from the
> original VMA's backing store.
> 
> Checking ops alone is insufficient: the replacement VMA could be the
> same type (e.g. shmem -> shmem) with identical flags but a different
> backing inode. Take a snapshot of the VMA's inode and flags before
> dropping locks, and compare after re-acquiring them. If anything
> changed, bail out with -EAGAIN.

Thanks.  What are the userspace-visible runtime effects of the bug?

If they're serious we might be looking at a cc:stable and a
Fixes: tag?




* Re: [PATCH v2] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry()
  2026-03-30 20:29 [PATCH v2] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry() David Carlier
  2026-03-30 20:40 ` Andrew Morton
@ 2026-03-30 20:51 ` Peter Xu
  2026-03-31 11:56 ` Mike Rapoport
  2 siblings, 0 replies; 8+ messages in thread
From: Peter Xu @ 2026-03-30 20:51 UTC (permalink / raw)
  To: David Carlier; +Cc: Andrew Morton, Mike Rapoport, linux-mm, linux-kernel

On Mon, Mar 30, 2026 at 09:29:09PM +0100, David Carlier wrote:
> +struct vma_snapshot {
> +	struct inode *inode;
> +	vm_flags_t flags;

Note that I used vma_flags_t / memcpy() / memcmp(), explicitly because IIUC
we're moving towards deprecating vm_flags_t.

Please see vma_flags_to_legacy().

> +};
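
For illustration, a minimal userspace sketch of the memcpy() / memcmp()
approach mentioned above. The types here (mock_vma_flags_t, mock_vma,
mock_snapshot) are hypothetical stand-ins, not the kernel definitions; the
only point is that the flags are copied and compared as an opaque object
rather than as a single integer.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for an opaque, bitmap-style flags type. */
typedef struct {
	unsigned long bits[2];
} mock_vma_flags_t;

/* Hypothetical stand-in for the part of a VMA that the snapshot reads. */
struct mock_vma {
	mock_vma_flags_t flags;
};

struct mock_snapshot {
	mock_vma_flags_t flags;
};

/* Copy the flags wholesale instead of assuming they fit in one word. */
static void mock_snapshot_take(const struct mock_vma *vma,
			       struct mock_snapshot *s)
{
	memcpy(&s->flags, &vma->flags, sizeof(s->flags));
}

/* Byte-wise comparison; fine here because the struct has no padding. */
static bool mock_snapshot_flags_changed(const struct mock_vma *vma,
					const struct mock_snapshot *s)
{
	return memcmp(&s->flags, &vma->flags, sizeof(s->flags)) != 0;
}
```

With this shape the snapshot code does not need to change if the flags type
grows beyond one word.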

-- 
Peter Xu




* Re: [PATCH v2] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry()
  2026-03-30 20:40 ` Andrew Morton
@ 2026-03-30 21:27   ` David CARLIER
  2026-03-30 21:32   ` David CARLIER
  1 sibling, 0 replies; 8+ messages in thread
From: David CARLIER @ 2026-03-30 21:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Peter Xu, Mike Rapoport, linux-mm, linux-kernel

Hi

On Mon, 30 Mar 2026 at 21:40, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon, 30 Mar 2026 21:29:09 +0100 David Carlier <devnexen@gmail.com> wrote:
>
> > In mfill_copy_folio_retry(), all locks are dropped to retry
> > copy_from_user() with page faults enabled. During this window, the VMA
> > can be replaced entirely (e.g. munmap + mmap + UFFDIO_REGISTER by
> > another thread), but the caller proceeds with a folio allocated from the
> > original VMA's backing store.
> >
> > Checking ops alone is insufficient: the replacement VMA could be the
> > same type (e.g. shmem -> shmem) with identical flags but a different
> > backing inode. Take a snapshot of the VMA's inode and flags before
> > dropping locks, and compare after re-acquiring them. If anything
> > changed, bail out with -EAGAIN.
>
> Thanks.  What are the userspace-visible runtime effects of the bug?
>
> If they're serious we might be looking at a cc:stable and a
> Fixes: tag?
>

The bug manifests as a NULL pointer dereference in
shmem_mfill_filemap_add() via file_inode(vma->vm_file) when vm_file is
NULL (anonymous VMA). This is a kernel oops/panic, so it's definitely
serious enough for Cc: stable and Fixes:.

Cheers



* Re: [PATCH v2] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry()
  2026-03-30 20:40 ` Andrew Morton
  2026-03-30 21:27   ` David CARLIER
@ 2026-03-30 21:32   ` David CARLIER
  2026-03-30 23:42     ` Andrew Morton
  1 sibling, 1 reply; 8+ messages in thread
From: David CARLIER @ 2026-03-30 21:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Peter Xu, Mike Rapoport, linux-mm, linux-kernel

To qualify my previous answer after further digging:

The userspace-visible effect is a kernel NULL pointer dereference. When
a shared shmem VMA gets replaced by an anonymous VMA during the retry
window, the stale ops->filemap_add() ends up calling
shmem_mfill_filemap_add(), which dereferences vma->vm_file via
file_inode(). Since vm_file is NULL for anonymous mappings, this is a
straight kernel oops.

The window is particularly wide when copy_from_user() blocks on slow
backing stores (FUSE, NFS), as it runs with page faults enabled.

The Fixes: target would be 56a3706fd7f9 ("shmem, userfaultfd: implement
shmem uffd operations using vm_uffd_ops"), but that's in mm-unstable only,
so no Cc: stable for now.

On Mon, 30 Mar 2026 at 21:40, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon, 30 Mar 2026 21:29:09 +0100 David Carlier <devnexen@gmail.com> wrote:
>
> > In mfill_copy_folio_retry(), all locks are dropped to retry
> > copy_from_user() with page faults enabled. During this window, the VMA
> > can be replaced entirely (e.g. munmap + mmap + UFFDIO_REGISTER by
> > another thread), but the caller proceeds with a folio allocated from the
> > original VMA's backing store.
> >
> > Checking ops alone is insufficient: the replacement VMA could be the
> > same type (e.g. shmem -> shmem) with identical flags but a different
> > backing inode. Take a snapshot of the VMA's inode and flags before
> > dropping locks, and compare after re-acquiring them. If anything
> > changed, bail out with -EAGAIN.
>
> Thanks.  What are the userspace-visible runtime effects of the bug?
>
> If they're serious we might be looking at a cc:stable and a
> Fixes: tag?
>



* Re: [PATCH v2] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry()
  2026-03-30 21:32   ` David CARLIER
@ 2026-03-30 23:42     ` Andrew Morton
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2026-03-30 23:42 UTC (permalink / raw)
  To: David CARLIER; +Cc: Peter Xu, Mike Rapoport, linux-mm, linux-kernel

On Mon, 30 Mar 2026 22:32:58 +0100 David CARLIER <devnexen@gmail.com> wrote:

> The userspace-visible effect is a kernel NULL pointer dereference. When
> a shared shmem VMA gets replaced by an anonymous VMA during the retry
> window, the stale ops->filemap_add() ends up calling
> shmem_mfill_filemap_add() which dereferences vma->vm_file via
> file_inode(). Since vm_file is NULL for anonymous mappings, this is a
> straight kernel oops.
>
> The window is particularly wide when copy_from_user() blocks on slow
> backing stores (FUSE, NFS) as it runs with page faults enabled.
>
> The Fixes target would be 56a3706fd7f9 ("shmem, userfaultfd: implement
> shmem uffd operations using vm_uffd_ops") but that's mm-unstable only,
> so no Cc: stable for now.

Ah, OK, thanks.  I'll add a note to "shmem, userfaultfd: implement
shmem uffd operations using vm_uffd_ops" for now, let's see what Mike
thinks.




* Re: [PATCH v2] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry()
  2026-03-30 20:29 [PATCH v2] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry() David Carlier
  2026-03-30 20:40 ` Andrew Morton
  2026-03-30 20:51 ` Peter Xu
@ 2026-03-31 11:56 ` Mike Rapoport
  2026-03-31 12:07   ` David CARLIER
  2 siblings, 1 reply; 8+ messages in thread
From: Mike Rapoport @ 2026-03-31 11:56 UTC (permalink / raw)
  To: David Carlier
  Cc: Peter Xu, Andrew Morton, linux-mm, linux-kernel, Lorenzo Stoakes

(added VMA folks)

Hi,

On Mon, Mar 30, 2026 at 09:29:09PM +0100, David Carlier wrote:
> In mfill_copy_folio_retry(), all locks are dropped to retry
> copy_from_user() with page faults enabled. During this window, the VMA
> can be replaced entirely (e.g. munmap + mmap + UFFDIO_REGISTER by
> another thread), but the caller proceeds with a folio allocated from the
> original VMA's backing store.

Is it possible at all that after all that dance vma pointer will remain the
same?
 
> Checking ops alone is insufficient: the replacement VMA could be the
> same type (e.g. shmem -> shmem) with identical flags but a different
> backing inode. Take a snapshot of the VMA's inode and flags before
> dropping locks, and compare after re-acquiring them. If anything
> changed, bail out with -EAGAIN.
> 
> Suggested-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: David Carlier <devnexen@gmail.com>

Sashiko has comments and they seem quite relevant to me:
https://sashiko.dev/#/patchset/20260330214948.148349-1-devnexen%40gmail.com

> ---
>  mm/userfaultfd.c | 64 ++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 57 insertions(+), 7 deletions(-)

...

> +	if (vma_snapshot_changed(state->vma, &s)) {
> +		err = -EAGAIN;

However we verify the VMA, this should not be EAGAIN. EINVAL or ENOENT,
as mfill_get_vma() returns, seem more appropriate.

> +		goto out;
> +	}

-- 
Sincerely yours,
Mike.



* Re: [PATCH v2] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry()
  2026-03-31 11:56 ` Mike Rapoport
@ 2026-03-31 12:07   ` David CARLIER
  0 siblings, 0 replies; 8+ messages in thread
From: David CARLIER @ 2026-03-31 12:07 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Peter Xu, Andrew Morton, linux-mm, linux-kernel, Lorenzo Stoakes

Hi Mike,

> Is it possible at all that after all that dance vma pointer will remain
> the same?

Yes. VMA structs are slab-allocated, so after munmap frees the old VMA
and mmap allocates a new one, SLUB can hand back the same address. The
pointer matches but it's a different VMA, which is exactly why the
snapshot is needed.
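
A minimal userspace sketch of that address reuse, under loud assumptions:
a hypothetical one-slot cache (mock_vma_alloc() / mock_vma_free()) stands in
for the slab allocator, and backing_id stands in for whatever identifies the
mapping's backing object. It only shows that pointer equality says nothing
about identity once an object has been freed and reallocated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA; backing_id models the backing object. */
struct mock_vma {
	int backing_id;
};

/* One-slot cache: every allocation returns the same address. */
static struct mock_vma slot;
static bool slot_in_use;

static struct mock_vma *mock_vma_alloc(int backing_id)
{
	if (slot_in_use)
		return NULL;
	slot_in_use = true;
	slot.backing_id = backing_id;
	return &slot;
}

static void mock_vma_free(struct mock_vma *vma)
{
	if (vma == &slot)
		slot_in_use = false;
}

/*
 * Returns true when the replacement object sits at the old address
 * but describes a different backing object.
 */
static bool mock_pointer_reused_for_new_vma(void)
{
	struct mock_vma *old = mock_vma_alloc(1);
	const struct mock_vma *saved_ptr = old;
	int saved_id = old->backing_id;	/* the "snapshot" */
	struct mock_vma *repl;

	mock_vma_free(old);		/* models munmap() */
	repl = mock_vma_alloc(2);	/* models mmap() */

	return repl == saved_ptr && repl->backing_id != saved_id;
}
```

Comparing the saved pointer alone would report "unchanged" here; comparing
the saved identity catches the replacement.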

> This isn't a bug, but struct vm_area_struct uses vm_flags, not flags.
> Will this cause a compilation error?

This is a false positive from Sashiko. vm_area_struct has a union at
include/linux/mm_types.h:956-960:

	union {
		const vm_flags_t vm_flags;
		vma_flags_t flags;
	};

Peter explicitly asked to use vma_flags_t / vma->flags since vm_flags_t
is being deprecated (see vma_flags_to_legacy()).

> If the original VMA was file-backed (s->inode is non-NULL), but is
> concurrently replaced by an anonymous VMA during the lock-dropped
> window, vma->vm_file will be NULL. Does accessing
> vma->vm_file->f_inode here cause a NULL pointer dereference?

Good catch, this is a real bug. Will fix with a vm_file NULL guard.
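
One possible shape for such a guard, as a userspace sketch only. All mock_*
types are hypothetical stand-ins for the kernel structures, and this is not
the final patch: it resolves the current inode to NULL when there is no
file, so a single pointer comparison covers file -> anon, anon -> file and
inode changes. Note that it treats "no vm_file" as anonymous, which differs
from the vma_is_anonymous() test used in the posted patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct inode, struct file and the VMA. */
struct mock_inode {
	int ino;
};

struct mock_file {
	struct mock_inode *f_inode;
};

struct mock_vma {
	struct mock_file *vm_file;	/* NULL models an anonymous mapping */
	unsigned long vm_flags;
};

struct mock_snapshot {
	struct mock_inode *inode;	/* NULL if the VMA had no file */
	unsigned long flags;
};

static bool mock_snapshot_changed(const struct mock_vma *vma,
				  const struct mock_snapshot *s)
{
	/* Guard: never dereference vm_file without checking it first. */
	struct mock_inode *cur = vma->vm_file ? vma->vm_file->f_inode : NULL;

	if (s->flags != vma->vm_flags)
		return true;

	return cur != s->inode;
}
```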

> Filesystem eviction paths often acquire locks (like i_rwsem) that
> invert with the mmap lock. Can this cause an AB-BA deadlock? Should
> this take a reference to the struct file via get_file() and release it
> with fput() instead, which defers destruction safely?

Valid concern. I'll switch to get_file()/fput(), which defers the
destruction safely.

> However we verify the VMA, this should not be EAGAIN. EINVAL or ENOENT,
> as mfill_get_vma() returns, seem more appropriate.

Agreed, will change to -EINVAL to match mfill_get_vma()'s validation
failures.

Will send a v2 with all three fixes later on.

Cheers!


