public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed
* [PATCH 1/2] mm/userfaultfd: fix stale ops and VMA type mismatch after copy retry
@ 2026-03-28 17:01 David Carlier
  2026-03-28 17:01 ` [PATCH 2/2] mm/userfaultfd: fix wrong likely() hint on mmap_changing check in move_pages() David Carlier
  2026-03-30 19:38 ` [PATCH 1/2] mm/userfaultfd: fix stale ops and VMA type mismatch after copy retry Peter Xu
  0 siblings, 2 replies; 5+ messages in thread
From: David Carlier @ 2026-03-28 17:01 UTC (permalink / raw)
  To: Andrew Morton, Peter Xu
  Cc: Mike Rapoport, linux-mm, linux-kernel, David Carlier

In mfill_atomic_pte_copy(), ops is derived from the VMA once and passed
to __mfill_atomic_pte(). When the initial copy_from_user() fails under
pagefault_disable(), mfill_copy_folio_retry() drops all locks, performs
the copy with page faults enabled, then re-acquires locks via
mfill_get_vma(). During this window, the VMA can be replaced entirely
(e.g. munmap + mmap + UFFDIO_REGISTER by another thread), but ops is
never re-validated.

If a shared shmem VMA is replaced by an anonymous VMA, the stale
shmem_uffd_ops->filemap_add calls shmem_mfill_filemap_add() with an
anonymous VMA, causing a NULL pointer dereference at file_inode(vma->
vm_file) since vm_file is NULL for anonymous mappings.

The mmap_changing guard does not fully prevent this because
userfaultfd_unmap_prep() only increments mmap_changing when
UFFD_FEATURE_EVENT_UNMAP is enabled, which is optional. Without it,
munmap proceeds without any signal to the retry path.

The copy_from_user() in the retry runs with page faults enabled and can
block on slow backing stores (FUSE, NFS), significantly widening the
race window.

Fix this by:
 - Validating that the VMA's userfaultfd context matches state->ctx in
   mfill_get_vma() to detect cross-context VMA replacement.
 - Re-checking that vma_uffd_ops() still matches the frozen ops after
   the retry, and that the VMA is still VM_SHARED when ops expects it
   to be, returning -EAGAIN otherwise.

Signed-off-by: David Carlier <devnexen@gmail.com>
---
 mm/userfaultfd.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 481ec7eb4442..2a6e034b15aa 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -225,8 +225,9 @@ static int mfill_get_vma(struct mfill_state *state)
 	 */
 	down_read(&ctx->map_changing_lock);
 	state->vma = dst_vma;
+
 	err = -EAGAIN;
-	if (atomic_read(&ctx->mmap_changing))
+	if (dst_vma->vm_userfaultfd_ctx.ctx != ctx || atomic_read(&ctx->mmap_changing))
 		goto out_unlock;
 
 	err = -EINVAL;
@@ -498,6 +499,12 @@ static int __mfill_atomic_pte(struct mfill_state *state,
 			ret = mfill_copy_folio_retry(state, folio);
 			if (ret)
 				goto err_folio_put;
+			if (vma_uffd_ops(state->vma) != ops ||
+			    (ops != &anon_uffd_ops &&
+			    !(state->vma->vm_flags & VM_SHARED))) {
+				ret = -EAGAIN;
+				goto err_folio_put;
+			}
 		}
 	} else if (uffd_flags_mode_is(flags, MFILL_ATOMIC_ZEROPAGE)) {
 		clear_user_highpage(&folio->page, state->dst_addr);
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] mm/userfaultfd: fix wrong likely() hint on mmap_changing check in move_pages()
  2026-03-28 17:01 [PATCH 1/2] mm/userfaultfd: fix stale ops and VMA type mismatch after copy retry David Carlier
@ 2026-03-28 17:01 ` David Carlier
  2026-03-30 19:39   ` Peter Xu
  2026-03-31 11:42   ` Mike Rapoport
  2026-03-30 19:38 ` [PATCH 1/2] mm/userfaultfd: fix stale ops and VMA type mismatch after copy retry Peter Xu
  1 sibling, 2 replies; 5+ messages in thread
From: David Carlier @ 2026-03-28 17:01 UTC (permalink / raw)
  To: Andrew Morton, Peter Xu
  Cc: Mike Rapoport, linux-mm, linux-kernel, David Carlier

The mmap_changing check in move_pages() uses likely() but the condition
being true (concurrent mapping changes during UFFDIO_MOVE) is the
exceptional case, not the common one. All other mmap_changing checks in
the same file correctly use no branch hint or use unlikely().

Replace likely() with unlikely() to match the expected branch behavior.

Signed-off-by: David Carlier <devnexen@gmail.com>
---
 mm/userfaultfd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 2a6e034b15aa..dc1b3dd1aece 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1872,7 +1872,7 @@ ssize_t move_pages(struct userfaultfd_ctx *ctx, unsigned long dst_start,
 	/* Re-check after taking map_changing_lock */
 	err = -EAGAIN;
 	down_read(&ctx->map_changing_lock);
-	if (likely(atomic_read(&ctx->mmap_changing)))
+	if (unlikely(atomic_read(&ctx->mmap_changing)))
 		goto out_unlock;
 	/*
 	 * Make sure the vma is not shared, that the src and dst remap
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] mm/userfaultfd: fix stale ops and VMA type mismatch after copy retry
  2026-03-28 17:01 [PATCH 1/2] mm/userfaultfd: fix stale ops and VMA type mismatch after copy retry David Carlier
  2026-03-28 17:01 ` [PATCH 2/2] mm/userfaultfd: fix wrong likely() hint on mmap_changing check in move_pages() David Carlier
@ 2026-03-30 19:38 ` Peter Xu
  1 sibling, 0 replies; 5+ messages in thread
From: Peter Xu @ 2026-03-30 19:38 UTC (permalink / raw)
  To: David Carlier
  Cc: Andrew Morton, Mike Rapoport, linux-mm, linux-kernel,
	James Houghton

Hi, David,

On Sat, Mar 28, 2026 at 05:01:00PM +0000, David Carlier wrote:
> In mfill_atomic_pte_copy(), ops is derived from the VMA once and passed
> to __mfill_atomic_pte(). When the initial copy_from_user() fails under
> pagefault_disable(), mfill_copy_folio_retry() drops all locks, performs
> the copy with page faults enabled, then re-acquires locks via
> mfill_get_vma(). During this window, the VMA can be replaced entirely
> (e.g. munmap + mmap + UFFDIO_REGISTER by another thread), but ops is
> never re-validated.

Thanks for the report, this seems a bug indeed.

> 
> If a shared shmem VMA is replaced by an anonymous VMA, the stale
> shmem_uffd_ops->filemap_add calls shmem_mfill_filemap_add() with an
> anonymous VMA, causing a NULL pointer dereference at file_inode(vma->
> vm_file) since vm_file is NULL for anonymous mappings.
> 
> The mmap_changing guard does not fully prevent this because
> userfaultfd_unmap_prep() only increments mmap_changing when
> UFFD_FEATURE_EVENT_UNMAP is enabled, which is optional. Without it,
> munmap proceeds without any signal to the retry path.
> 
> The copy_from_user() in the retry runs with page faults enabled and can
> block on slow backing stores (FUSE, NFS), significantly widening the
> race window.
> 
> Fix this by:
>  - Validating that the VMA's userfaultfd context matches state->ctx in
>    mfill_get_vma() to detect cross-context VMA replacement.
>  - Re-checking that vma_uffd_ops() still matches the frozen ops after
>    the retry, and that the VMA is still VM_SHARED when ops expects it
>    to be, returning -EAGAIN otherwise.
> 
> Signed-off-by: David Carlier <devnexen@gmail.com>
> ---
>  mm/userfaultfd.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 481ec7eb4442..2a6e034b15aa 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -225,8 +225,9 @@ static int mfill_get_vma(struct mfill_state *state)
>  	 */
>  	down_read(&ctx->map_changing_lock);
>  	state->vma = dst_vma;
> +
>  	err = -EAGAIN;
> -	if (atomic_read(&ctx->mmap_changing))
> +	if (dst_vma->vm_userfaultfd_ctx.ctx != ctx || atomic_read(&ctx->mmap_changing))

This is a valid and good check.  Though IMHO it doesn't need to be in the
same patch as a fix to the bug reported, likely it suites for a separate
patch.

For example, a new VMA (as you described in the case of when the bug can
happen) can also be registered to the same userfaultfd ctx, and this check
won't help there since the ctx will still match.

>  		goto out_unlock;
>  
>  	err = -EINVAL;
> @@ -498,6 +499,12 @@ static int __mfill_atomic_pte(struct mfill_state *state,
>  			ret = mfill_copy_folio_retry(state, folio);
>  			if (ret)
>  				goto err_folio_put;
> +			if (vma_uffd_ops(state->vma) != ops ||
> +			    (ops != &anon_uffd_ops &&
> +			    !(state->vma->vm_flags & VM_SHARED))) {

Hard-code this (out of mfill_atomic_pte_copy) is very likely not a good
idea..  Meanwhile I also feel like it won't completely fix the problem.

Consider the changed VMA is not about shmem VMA becoming anon VMA, but
shmem VMA1 becoming shmem VMA2 and both of them can even have the same
flags.  IMHO in that case we should still fallback with EAGAIN because we
still hold a folio that we got allocated from VMA1 (rather than VMA2) here.

Ideally, we should check "if the VMA has changed at all", which will be a
very safe check, however I don't think we have any good way to refcount
VMA... at least not something I'm aware of.

A simple workaround here is we take a snapshot memory of the VMA attributes
and making sure that didn't change after the retried copy_from_user(),
assuming that guarantees the VMA didn't change.  AFAIU, the important bit
is at least two things: (1) the inode representing the backing store of the
VMA mapping (when it's a file), and (2) vma flags.

This patch below will do something like what I discussed above; that's the
best I can come up with so far.  Not sure if there's even better way to do.

Thanks,

===8<===
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 481ec7eb44420..caac1afabd520 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -443,33 +443,90 @@ static int mfill_copy_folio_locked(struct folio *folio, unsigned long src_addr)
 	return ret;
 }
 
+struct vma_snapshot {
+	struct inode *inode;
+	vma_flags_t flags;
+};
+
+static void vma_snapshot_take(struct vm_area_struct *vma,
+			      struct vma_snapshot *s)
+{
+	struct inode *inode;
+
+	memcpy(&s->flags, &vma->flags, sizeof(s->flags));
+	if (vma->vm_file) {
+		/* We're holding vma lock, so file and inode are available */
+		inode = vma->vm_file->f_inode;
+		ihold(inode);
+		s->inode = inode;
+	} else {
+		s->inode = NULL;
+	}
+}
+
+static bool vma_snapshot_changed(struct vm_area_struct *vma,
+				 struct vma_snapshot *s)
+{
+	/* If vma flags changed? */
+	if (memcmp(&s->flags, &vma->flags, sizeof(s->flags)))
+		return true;
+
+	/* If there's a backing store of mapping, make sure it didn't change */
+	if (s->inode && vma->vm_file->f_inode != s->inode)
+		return true;
+
+	/* If not, making sure it's still anonymous */
+	if (!s->inode && !vma_is_anonymous(vma))
+		return true;
+
+	return false;
+}
+
+static void vma_snapshot_release(struct vma_snapshot *s)
+{
+	if (s->inode) {
+		iput(s->inode);
+		s->inode = NULL;
+	}
+}
+
 static int mfill_copy_folio_retry(struct mfill_state *state, struct folio *folio)
 {
 	unsigned long src_addr = state->src_addr;
+	struct vma_snapshot s;
 	void *kaddr;
 	int err;
 
+	/* Take a quick snapshot of the current vma */
+	vma_snapshot_take(state->vma, &s);
+
 	/* retry copying with mm_lock dropped */
 	mfill_put_vma(state);
 
 	kaddr = kmap_local_folio(folio, 0);
 	err = copy_from_user(kaddr, (const void __user *) src_addr, PAGE_SIZE);
 	kunmap_local(kaddr);
-	if (unlikely(err))
-		return -EFAULT;
+	if (unlikely(err)) {
+		err = -EFAULT;
+		goto out;
+	}
 
 	flush_dcache_folio(folio);
 
 	/* reget VMA and PMD, they could change underneath us */
 	err = mfill_get_vma(state);
 	if (err)
-		return err;
+		goto out;
 
-	err = mfill_establish_pmd(state);
-	if (err)
-		return err;
+	if (vma_snapshot_changed(state->vma, &s)) {
+		err = -EAGAIN;
+		goto out;
+	}
 
-	return 0;
+	err = mfill_establish_pmd(state);
+out:
+	vma_snapshot_release(&s);
+	return err;
 }
 
 static int __mfill_atomic_pte(struct mfill_state *state,
-- 
2.50.1


-- 
Peter Xu



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] mm/userfaultfd: fix wrong likely() hint on mmap_changing check in move_pages()
  2026-03-28 17:01 ` [PATCH 2/2] mm/userfaultfd: fix wrong likely() hint on mmap_changing check in move_pages() David Carlier
@ 2026-03-30 19:39   ` Peter Xu
  2026-03-31 11:42   ` Mike Rapoport
  1 sibling, 0 replies; 5+ messages in thread
From: Peter Xu @ 2026-03-30 19:39 UTC (permalink / raw)
  To: David Carlier; +Cc: Andrew Morton, Mike Rapoport, linux-mm, linux-kernel

On Sat, Mar 28, 2026 at 05:01:01PM +0000, David Carlier wrote:
> The mmap_changing check in move_pages() uses likely() but the condition
> being true (concurrent mapping changes during UFFDIO_MOVE) is the
> exceptional case, not the common one. All other mmap_changing checks in
> the same file correctly use no branch hint or use unlikely().
> 
> Replace likely() with unlikely() to match the expected branch behavior.
> 
> Signed-off-by: David Carlier <devnexen@gmail.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] mm/userfaultfd: fix wrong likely() hint on mmap_changing check in move_pages()
  2026-03-28 17:01 ` [PATCH 2/2] mm/userfaultfd: fix wrong likely() hint on mmap_changing check in move_pages() David Carlier
  2026-03-30 19:39   ` Peter Xu
@ 2026-03-31 11:42   ` Mike Rapoport
  1 sibling, 0 replies; 5+ messages in thread
From: Mike Rapoport @ 2026-03-31 11:42 UTC (permalink / raw)
  To: David Carlier; +Cc: Andrew Morton, Peter Xu, linux-mm, linux-kernel

On Sat, Mar 28, 2026 at 05:01:01PM +0000, David Carlier wrote:
> The mmap_changing check in move_pages() uses likely() but the condition
> being true (concurrent mapping changes during UFFDIO_MOVE) is the
> exceptional case, not the common one. All other mmap_changing checks in
> the same file correctly use no branch hint or use unlikely().
> 
> Replace likely() with unlikely() to match the expected branch behavior.
> 
> Signed-off-by: David Carlier <devnexen@gmail.com>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/userfaultfd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-03-31 11:43 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-28 17:01 [PATCH 1/2] mm/userfaultfd: fix stale ops and VMA type mismatch after copy retry David Carlier
2026-03-28 17:01 ` [PATCH 2/2] mm/userfaultfd: fix wrong likely() hint on mmap_changing check in move_pages() David Carlier
2026-03-30 19:39   ` Peter Xu
2026-03-31 11:42   ` Mike Rapoport
2026-03-30 19:38 ` [PATCH 1/2] mm/userfaultfd: fix stale ops and VMA type mismatch after copy retry Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox