* [PATCH 1/2] mm/userfaultfd: fix stale ops and VMA type mismatch after copy retry
@ 2026-03-28 17:01 David Carlier
  2026-03-28 17:01 ` [PATCH 2/2] mm/userfaultfd: fix wrong likely() hint on mmap_changing check in move_pages() David Carlier
  2026-03-30 19:38 ` [PATCH 1/2] mm/userfaultfd: fix stale ops and VMA type mismatch after copy retry Peter Xu
  0 siblings, 2 replies; 5+ messages in thread
From: David Carlier @ 2026-03-28 17:01 UTC (permalink / raw)
  To: Andrew Morton, Peter Xu
  Cc: Mike Rapoport, linux-mm, linux-kernel, David Carlier

In mfill_atomic_pte_copy(), ops is derived from the VMA once and passed
to __mfill_atomic_pte(). When the initial copy_from_user() fails under
pagefault_disable(), mfill_copy_folio_retry() drops all locks, performs
the copy with page faults enabled, then re-acquires locks via
mfill_get_vma(). During this window, the VMA can be replaced entirely
(e.g. munmap + mmap + UFFDIO_REGISTER by another thread), but ops is
never re-validated.

If a shared shmem VMA is replaced by an anonymous VMA, the stale
shmem_uffd_ops->filemap_add calls shmem_mfill_filemap_add() with an
anonymous VMA, causing a NULL pointer dereference at
file_inode(vma->vm_file) since vm_file is NULL for anonymous mappings.

The mmap_changing guard does not fully prevent this because
userfaultfd_unmap_prep() only increments mmap_changing when
UFFD_FEATURE_EVENT_UNMAP is enabled, which is optional. Without it,
munmap proceeds without any signal to the retry path.

The copy_from_user() in the retry runs with page faults enabled and can
block on slow backing stores (FUSE, NFS), significantly widening the
race window.

Fix this by:
- Validating that the VMA's userfaultfd context matches state->ctx in
  mfill_get_vma() to detect cross-context VMA replacement.
- Re-checking that vma_uffd_ops() still matches the frozen ops after
  the retry, and that the VMA is still VM_SHARED when ops expects it
  to be, returning -EAGAIN otherwise.
Signed-off-by: David Carlier <devnexen@gmail.com>
---
 mm/userfaultfd.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 481ec7eb4442..2a6e034b15aa 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -225,8 +225,9 @@ static int mfill_get_vma(struct mfill_state *state)
 	 */
 	down_read(&ctx->map_changing_lock);
 	state->vma = dst_vma;
+
 	err = -EAGAIN;
-	if (atomic_read(&ctx->mmap_changing))
+	if (dst_vma->vm_userfaultfd_ctx.ctx != ctx || atomic_read(&ctx->mmap_changing))
 		goto out_unlock;
 
 	err = -EINVAL;
@@ -498,6 +499,12 @@ static int __mfill_atomic_pte(struct mfill_state *state,
 			ret = mfill_copy_folio_retry(state, folio);
 			if (ret)
 				goto err_folio_put;
+			if (vma_uffd_ops(state->vma) != ops ||
+			    (ops != &anon_uffd_ops &&
+			     !(state->vma->vm_flags & VM_SHARED))) {
+				ret = -EAGAIN;
+				goto err_folio_put;
+			}
 		}
 	} else if (uffd_flags_mode_is(flags, MFILL_ATOMIC_ZEROPAGE)) {
 		clear_user_highpage(&folio->page, state->dst_addr);
-- 
2.53.0
* [PATCH 2/2] mm/userfaultfd: fix wrong likely() hint on mmap_changing check in move_pages()
  2026-03-28 17:01 [PATCH 1/2] mm/userfaultfd: fix stale ops and VMA type mismatch after copy retry David Carlier
@ 2026-03-28 17:01 ` David Carlier
  2026-03-30 19:39   ` Peter Xu
  2026-03-31 11:42   ` Mike Rapoport
  2026-03-30 19:38 ` [PATCH 1/2] mm/userfaultfd: fix stale ops and VMA type mismatch after copy retry Peter Xu
  1 sibling, 2 replies; 5+ messages in thread
From: David Carlier @ 2026-03-28 17:01 UTC (permalink / raw)
  To: Andrew Morton, Peter Xu
  Cc: Mike Rapoport, linux-mm, linux-kernel, David Carlier

The mmap_changing check in move_pages() uses likely() but the condition
being true (concurrent mapping changes during UFFDIO_MOVE) is the
exceptional case, not the common one. All other mmap_changing checks in
the same file correctly use no branch hint or use unlikely().

Replace likely() with unlikely() to match the expected branch behavior.

Signed-off-by: David Carlier <devnexen@gmail.com>
---
 mm/userfaultfd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 2a6e034b15aa..dc1b3dd1aece 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1872,7 +1872,7 @@ ssize_t move_pages(struct userfaultfd_ctx *ctx, unsigned long dst_start,
 	/* Re-check after taking map_changing_lock */
 	err = -EAGAIN;
 	down_read(&ctx->map_changing_lock);
-	if (likely(atomic_read(&ctx->mmap_changing)))
+	if (unlikely(atomic_read(&ctx->mmap_changing)))
 		goto out_unlock;
 	/*
 	 * Make sure the vma is not shared, that the src and dst remap
-- 
2.53.0
* Re: [PATCH 2/2] mm/userfaultfd: fix wrong likely() hint on mmap_changing check in move_pages()
  2026-03-28 17:01 ` [PATCH 2/2] mm/userfaultfd: fix wrong likely() hint on mmap_changing check in move_pages() David Carlier
@ 2026-03-30 19:39   ` Peter Xu
  2026-03-31 11:42   ` Mike Rapoport
  1 sibling, 0 replies; 5+ messages in thread
From: Peter Xu @ 2026-03-30 19:39 UTC (permalink / raw)
  To: David Carlier; +Cc: Andrew Morton, Mike Rapoport, linux-mm, linux-kernel

On Sat, Mar 28, 2026 at 05:01:01PM +0000, David Carlier wrote:
> The mmap_changing check in move_pages() uses likely() but the condition
> being true (concurrent mapping changes during UFFDIO_MOVE) is the
> exceptional case, not the common one. All other mmap_changing checks in
> the same file correctly use no branch hint or use unlikely().
> 
> Replace likely() with unlikely() to match the expected branch behavior.
> 
> Signed-off-by: David Carlier <devnexen@gmail.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu
* Re: [PATCH 2/2] mm/userfaultfd: fix wrong likely() hint on mmap_changing check in move_pages()
  2026-03-28 17:01 ` [PATCH 2/2] mm/userfaultfd: fix wrong likely() hint on mmap_changing check in move_pages() David Carlier
  2026-03-30 19:39   ` Peter Xu
@ 2026-03-31 11:42   ` Mike Rapoport
  1 sibling, 0 replies; 5+ messages in thread
From: Mike Rapoport @ 2026-03-31 11:42 UTC (permalink / raw)
  To: David Carlier; +Cc: Andrew Morton, Peter Xu, linux-mm, linux-kernel

On Sat, Mar 28, 2026 at 05:01:01PM +0000, David Carlier wrote:
> The mmap_changing check in move_pages() uses likely() but the condition
> being true (concurrent mapping changes during UFFDIO_MOVE) is the
> exceptional case, not the common one. All other mmap_changing checks in
> the same file correctly use no branch hint or use unlikely().
> 
> Replace likely() with unlikely() to match the expected branch behavior.
> 
> Signed-off-by: David Carlier <devnexen@gmail.com>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/userfaultfd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

-- 
Sincerely yours,
Mike.
* Re: [PATCH 1/2] mm/userfaultfd: fix stale ops and VMA type mismatch after copy retry
  2026-03-28 17:01 [PATCH 1/2] mm/userfaultfd: fix stale ops and VMA type mismatch after copy retry David Carlier
  2026-03-28 17:01 ` [PATCH 2/2] mm/userfaultfd: fix wrong likely() hint on mmap_changing check in move_pages() David Carlier
@ 2026-03-30 19:38 ` Peter Xu
  1 sibling, 0 replies; 5+ messages in thread
From: Peter Xu @ 2026-03-30 19:38 UTC (permalink / raw)
  To: David Carlier
  Cc: Andrew Morton, Mike Rapoport, linux-mm, linux-kernel, James Houghton

Hi, David,

On Sat, Mar 28, 2026 at 05:01:00PM +0000, David Carlier wrote:
> In mfill_atomic_pte_copy(), ops is derived from the VMA once and passed
> to __mfill_atomic_pte(). When the initial copy_from_user() fails under
> pagefault_disable(), mfill_copy_folio_retry() drops all locks, performs
> the copy with page faults enabled, then re-acquires locks via
> mfill_get_vma(). During this window, the VMA can be replaced entirely
> (e.g. munmap + mmap + UFFDIO_REGISTER by another thread), but ops is
> never re-validated.

Thanks for the report, this seems a bug indeed.

> 
> If a shared shmem VMA is replaced by an anonymous VMA, the stale
> shmem_uffd_ops->filemap_add calls shmem_mfill_filemap_add() with an
> anonymous VMA, causing a NULL pointer dereference at
> file_inode(vma->vm_file) since vm_file is NULL for anonymous mappings.
> 
> The mmap_changing guard does not fully prevent this because
> userfaultfd_unmap_prep() only increments mmap_changing when
> UFFD_FEATURE_EVENT_UNMAP is enabled, which is optional. Without it,
> munmap proceeds without any signal to the retry path.
> 
> The copy_from_user() in the retry runs with page faults enabled and can
> block on slow backing stores (FUSE, NFS), significantly widening the
> race window.
> 
> Fix this by:
> - Validating that the VMA's userfaultfd context matches state->ctx in
>   mfill_get_vma() to detect cross-context VMA replacement.
> - Re-checking that vma_uffd_ops() still matches the frozen ops after
>   the retry, and that the VMA is still VM_SHARED when ops expects it
>   to be, returning -EAGAIN otherwise.
> 
> Signed-off-by: David Carlier <devnexen@gmail.com>
> ---
>  mm/userfaultfd.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 481ec7eb4442..2a6e034b15aa 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -225,8 +225,9 @@ static int mfill_get_vma(struct mfill_state *state)
>  	 */
>  	down_read(&ctx->map_changing_lock);
>  	state->vma = dst_vma;
> +
>  	err = -EAGAIN;
> -	if (atomic_read(&ctx->mmap_changing))
> +	if (dst_vma->vm_userfaultfd_ctx.ctx != ctx || atomic_read(&ctx->mmap_changing))

This is a valid and good check.  Though IMHO it doesn't need to be in the
same patch as a fix to the bug reported; it likely suits a separate
patch.

For example, a new VMA (as you described in the case of when the bug can
happen) can also be registered to the same userfaultfd ctx, and this
check won't help there since the ctx will still match.

>  		goto out_unlock;
> 
>  	err = -EINVAL;
> @@ -498,6 +499,12 @@ static int __mfill_atomic_pte(struct mfill_state *state,
>  			ret = mfill_copy_folio_retry(state, folio);
>  			if (ret)
>  				goto err_folio_put;
> +			if (vma_uffd_ops(state->vma) != ops ||
> +			    (ops != &anon_uffd_ops &&
> +			     !(state->vma->vm_flags & VM_SHARED))) {

Hard-coding this (out of mfill_atomic_pte_copy) is very likely not a good
idea..  Meanwhile I also feel like it won't completely fix the problem.

Consider the case where the changed VMA is not a shmem VMA becoming an
anon VMA, but shmem VMA1 becoming shmem VMA2, where both can even have
the same flags.  IMHO in that case we should still fall back with EAGAIN
because we still hold a folio that we got allocated from VMA1 (rather
than VMA2) here.
Ideally, we should check "if the VMA has changed at all", which will be a
very safe check; however I don't think we have any good way to refcount a
VMA... at least not something I'm aware of.

A simple workaround here is to take a snapshot of the VMA attributes and
make sure they didn't change after the retried copy_from_user(), assuming
that guarantees the VMA didn't change.  AFAIU, the important bit is at
least two things: (1) the inode representing the backing store of the VMA
mapping (when it's a file), and (2) vma flags.

The patch below will do something like what I discussed above; that's the
best I can come up with so far.  Not sure if there's an even better way
to do it.

Thanks,

===8<===
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 481ec7eb44420..caac1afabd520 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -443,33 +443,90 @@ static int mfill_copy_folio_locked(struct folio *folio, unsigned long src_addr)
 	return ret;
 }
 
+struct vma_snapshot {
+	struct inode *inode;
+	vma_flags_t flags;
+};
+
+static void vma_snapshot_take(struct vm_area_struct *vma,
+			      struct vma_snapshot *s)
+{
+	struct inode *inode;
+
+	memcpy(&s->flags, &vma->flags, sizeof(s->flags));
+	if (vma->vm_file) {
+		/* We're holding vma lock, so file and inode are available */
+		inode = vma->vm_file->f_inode;
+		ihold(inode);
+		s->inode = inode;
+	} else {
+		s->inode = NULL;
+	}
+}
+
+static bool vma_snapshot_changed(struct vm_area_struct *vma,
+				 struct vma_snapshot *s)
+{
+	/* If vma flags changed? */
+	if (memcmp(&s->flags, &vma->flags, sizeof(s->flags)))
+		return true;
+
+	/* If there's a backing store of mapping, make sure it didn't change */
+	if (s->inode && vma->vm_file->f_inode != s->inode)
+		return true;
+
+	/* If not, making sure it's still anonymous */
+	if (!s->inode && !vma_is_anonymous(vma))
+		return true;
+
+	return false;
+}
+
+static void vma_snapshot_release(struct vma_snapshot *s)
+{
+	if (s->inode) {
+		iput(s->inode);
+		s->inode = NULL;
+	}
+}
+
 static int mfill_copy_folio_retry(struct mfill_state *state,
 				  struct folio *folio)
 {
 	unsigned long src_addr = state->src_addr;
+	struct vma_snapshot s;
 	void *kaddr;
 	int err;
 
+	/* Take a quick snapshot of the current vma */
+	vma_snapshot_take(state->vma, &s);
+
 	/* retry copying with mm_lock dropped */
 	mfill_put_vma(state);
 
 	kaddr = kmap_local_folio(folio, 0);
 	err = copy_from_user(kaddr, (const void __user *) src_addr,
 			     PAGE_SIZE);
 	kunmap_local(kaddr);
-	if (unlikely(err))
-		return -EFAULT;
+	if (unlikely(err)) {
+		err = -EFAULT;
+		goto out;
+	}
 	flush_dcache_folio(folio);
 
 	/* reget VMA and PMD, they could change underneath us */
 	err = mfill_get_vma(state);
 	if (err)
-		return err;
+		goto out;
 
-	err = mfill_establish_pmd(state);
-	if (err)
-		return err;
+	if (vma_snapshot_changed(state->vma, &s)) {
+		err = -EAGAIN;
+		goto out;
+	}
 
-	return 0;
+	err = mfill_establish_pmd(state);
+out:
+	vma_snapshot_release(&s);
+	return err;
 }
 
 static int __mfill_atomic_pte(struct mfill_state *state,
-- 
2.50.1

-- 
Peter Xu