* [PATCH v4] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry()
@ 2026-03-31 13:41 David Carlier
2026-04-01 3:01 ` Andrew Morton
0 siblings, 1 reply; 13+ messages in thread
From: David Carlier @ 2026-03-31 13:41 UTC (permalink / raw)
To: Peter Xu, Andrew Morton
Cc: Mike Rapoport, linux-mm, linux-kernel, David Carlier
In mfill_copy_folio_retry(), all locks are dropped to retry
copy_from_user() with page faults enabled. During this window, the VMA
can be replaced entirely (e.g. munmap + mmap + UFFDIO_REGISTER by
another thread), but the caller proceeds with a folio allocated from the
original VMA's backing store.
Checking ops alone is insufficient: the replacement VMA could be the
same type (e.g. shmem -> shmem) with identical flags but a different
backing inode. Take a snapshot of the VMA's file and flags before
dropping locks, and compare after re-acquiring them. If anything
changed, bail out with -EINVAL.
Use get_file()/fput() rather than ihold()/iput() to hold the file
reference across the lock-dropped window, avoiding potential deadlocks
from filesystem eviction under mmap_lock.
Fixes: 56a3706fd7f9 ("shmem, userfaultfd: implement shmem uffd operations using vm_uffd_ops")
Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Carlier <devnexen@gmail.com>
---
mm/userfaultfd.c | 63 ++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 56 insertions(+), 7 deletions(-)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 481ec7eb4442..93d6a954e659 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -443,33 +443,82 @@ static int mfill_copy_folio_locked(struct folio *folio, unsigned long src_addr)
return ret;
}
+struct vma_snapshot {
+ struct file *file;
+ vma_flags_t flags;
+};
+
+static void vma_snapshot_take(struct vm_area_struct *vma,
+ struct vma_snapshot *s)
+{
+ memcpy(&s->flags, &vma->flags, sizeof(s->flags));
+ if (vma->vm_file)
+ s->file = get_file(vma->vm_file);
+ else
+ s->file = NULL;
+}
+
+static bool vma_snapshot_changed(struct vm_area_struct *vma,
+ struct vma_snapshot *s)
+{
+ if (memcmp(&s->flags, &vma->flags, sizeof(s->flags)))
+ return true;
+
+ if (s->file && (!vma->vm_file ||
+ vma->vm_file->f_inode != s->file->f_inode))
+ return true;
+
+ if (!s->file && !vma_is_anonymous(vma))
+ return true;
+
+ return false;
+}
+
+static void vma_snapshot_release(struct vma_snapshot *s)
+{
+ if (s->file) {
+ fput(s->file);
+ s->file = NULL;
+ }
+}
+
static int mfill_copy_folio_retry(struct mfill_state *state, struct folio *folio)
{
unsigned long src_addr = state->src_addr;
+ struct vma_snapshot s;
void *kaddr;
int err;
+ /* Take a quick snapshot of the current vma */
+ vma_snapshot_take(state->vma, &s);
+
/* retry copying with mm_lock dropped */
mfill_put_vma(state);
kaddr = kmap_local_folio(folio, 0);
err = copy_from_user(kaddr, (const void __user *) src_addr, PAGE_SIZE);
kunmap_local(kaddr);
- if (unlikely(err))
- return -EFAULT;
+ if (unlikely(err)) {
+ err = -EFAULT;
+ goto out;
+ }
flush_dcache_folio(folio);
/* reget VMA and PMD, they could change underneath us */
err = mfill_get_vma(state);
if (err)
- return err;
+ goto out;
- err = mfill_establish_pmd(state);
- if (err)
- return err;
+ if (vma_snapshot_changed(state->vma, &s)) {
+ err = -EINVAL;
+ goto out;
+ }
- return 0;
+ err = mfill_establish_pmd(state);
+out:
+ vma_snapshot_release(&s);
+ return err;
}
static int __mfill_atomic_pte(struct mfill_state *state,
--
2.53.0
^ permalink raw reply related [flat|nested] 13+ messages in thread* Re: [PATCH v4] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry() 2026-03-31 13:41 [PATCH v4] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry() David Carlier @ 2026-04-01 3:01 ` Andrew Morton 2026-04-01 7:49 ` Mike Rapoport 0 siblings, 1 reply; 13+ messages in thread From: Andrew Morton @ 2026-04-01 3:01 UTC (permalink / raw) To: David Carlier; +Cc: Peter Xu, Mike Rapoport, linux-mm, linux-kernel On Tue, 31 Mar 2026 14:41:58 +0100 David Carlier <devnexen@gmail.com> wrote: > In mfill_copy_folio_retry(), all locks are dropped to retry > copy_from_user() with page faults enabled. During this window, the VMA > can be replaced entirely (e.g. munmap + mmap + UFFDIO_REGISTER by > another thread), but the caller proceeds with a folio allocated from the > original VMA's backing store. > > Checking ops alone is insufficient: the replacement VMA could be the > same type (e.g. shmem -> shmem) with identical flags but a different > backing inode. Take a snapshot of the VMA's file and flags before > dropping locks, and compare after re-acquiring them. If anything > changed, bail out with -EINVAL. > > Use get_file()/fput() rather than ihold()/iput() to hold the file > reference across the lock-dropped window, avoiding potential deadlocks > from filesystem eviction under mmap_lock. Thanks, I've queued this as a squashable fix against mm-unstable's "shmem, userfaultfd: implement shmem uffd operations using vm_uffd_ops ongoing". I've fumbled the ball on your [2/2] unlikely() fix ;). Please resend that after -rc1. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v4] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry() 2026-04-01 3:01 ` Andrew Morton @ 2026-04-01 7:49 ` Mike Rapoport 2026-04-01 8:06 ` David CARLIER 0 siblings, 1 reply; 13+ messages in thread From: Mike Rapoport @ 2026-04-01 7:49 UTC (permalink / raw) To: Andrew Morton Cc: David Carlier, Peter Xu, linux-mm, linux-kernel, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka Hi Andrew, On Tue, Mar 31, 2026 at 08:01:48PM -0700, Andrew Morton wrote: > On Tue, 31 Mar 2026 14:41:58 +0100 David Carlier <devnexen@gmail.com> wrote: > > > In mfill_copy_folio_retry(), all locks are dropped to retry > > copy_from_user() with page faults enabled. During this window, the VMA > > can be replaced entirely (e.g. munmap + mmap + UFFDIO_REGISTER by > > another thread), but the caller proceeds with a folio allocated from the > > original VMA's backing store. What does "folio allocated from the original VMA's backing store" exactly mean? Why is this a problem? > > Checking ops alone is insufficient: the replacement VMA could be the > > same type (e.g. shmem -> shmem) with identical flags but a different > > backing inode. Take a snapshot of the VMA's file and flags before > > dropping locks, and compare after re-acquiring them. If anything > > changed, bail out with -EINVAL. > > > > Use get_file()/fput() rather than ihold()/iput() to hold the file > > reference across the lock-dropped window, avoiding potential deadlocks > > from filesystem eviction under mmap_lock. > > Thanks, I've queued this as a squashable fix against mm-unstable's > "shmem, userfaultfd: implement shmem uffd operations using vm_uffd_ops > ongoing". First, this a pre-existing and TBH quite theoretical bug and it was there since the very beginning, so it should not be added as a fixup for the uffd+guestmemfd series. Second, I have reservations about vma_snapshot implementation. What invariant does it exactly enforce? > I've fumbled the ball on your [2/2] unlikely() fix ;). Please resend that > after -rc1. This one should go the same route IMO. -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v4] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry() 2026-04-01 7:49 ` Mike Rapoport @ 2026-04-01 8:06 ` David CARLIER 2026-04-01 15:23 ` Peter Xu 2026-04-02 3:58 ` Mike Rapoport 0 siblings, 2 replies; 13+ messages in thread From: David CARLIER @ 2026-04-01 8:06 UTC (permalink / raw) To: Mike Rapoport Cc: Andrew Morton, Peter Xu, linux-mm, linux-kernel, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka Hi Mike, On Tue, Apr 01, 2026 at 08:49:00AM +0300, Mike Rapoport wrote: > What does "folio allocated from the original VMA's backing store" exactly > mean? Why is this a problem? Fair point, the commit message was vague here. What I meant is: mfill_atomic_pte_copy() captures ops = vma_uffd_ops(state->vma) and passes it to __mfill_atomic_pte(). There, ops->alloc_folio() allocates a folio for the original VMA's inode (e.g. a shmem folio for that specific shmem inode). Then mfill_copy_folio_retry() drops all locks for the copy_from_user retry. After mfill_get_vma() re-acquires them, state->vma may now point to a replacement VMA, but ops is still the stale pointer from before the drop. The code then calls ops->filemap_add(folio, state->vma, ...) which would insert a folio allocated for the old inode into the new VMA's backing store. If the VMA changed type entirely (e.g. shmem -> anon), ops->filemap_add could be operating on a VMA that has no business receiving this folio. > First, this a pre-existing and TBH quite theoretical bug and it was there > since the very beginning, so it should not be added as a fixup for the > uffd+guestmemfd series. You're right. The race window (VMA replacement during the lock-dropped copy retry) existed in the original mcopy_atomic_pte() code long before the vm_uffd_ops refactoring. The Fixes tag pointing at 56a3706fd7f9 was wrong. I'll drop it and resend as a standalone fix against the original retry logic. > Second, I have reservations about vma_snapshot implementation. What > invariant does it exactly enforce? The invariant I was going for: "the folio we allocated is still compatible with the VMA we're about to install it into." Since alloc_folio() allocates from the VMA's backing file (inode), checking that vm_file is still the same after re-acquiring locks ensures the folio matches the inode. The vm_flags comparison was a secondary guard against permission/type changes during the window. That said, I can see the vma_snapshot abstraction is doing too much for what's really needed. Would a simpler approach work better — just saving vm_file (with get_file/fput) before the drop and comparing it directly after re-acquiring? That makes the invariant explicit: "same backing file means the folio is valid for this VMA." Happy to rework along those lines, or if you have a different approach in mind I'm open to suggestions. > > I've fumbled the ball on your [2/2] unlikely() fix ;). Please resend that > > after -rc1. > > This one should go the same route IMO. Agreed, I'll resend both after -rc1. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v4] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry() 2026-04-01 8:06 ` David CARLIER @ 2026-04-01 15:23 ` Peter Xu 2026-04-01 18:34 ` David CARLIER 2026-04-02 3:58 ` Mike Rapoport 1 sibling, 1 reply; 13+ messages in thread From: Peter Xu @ 2026-04-01 15:23 UTC (permalink / raw) To: David CARLIER Cc: Mike Rapoport, Andrew Morton, linux-mm, linux-kernel, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka On Wed, Apr 01, 2026 at 09:06:36AM +0100, David CARLIER wrote: > The invariant I was going for: "the folio we allocated is still > compatible with the VMA we're about to install it into." Since > alloc_folio() allocates from the VMA's backing file (inode), checking > that vm_file is still the same after re-acquiring locks ensures the > folio matches the inode. The vm_flags comparison was a secondary guard > against permission/type changes during the window. > > That said, I can see the vma_snapshot abstraction is doing too much for > what's really needed. Would a simpler approach work better — just > saving vm_file (with get_file/fput) before the drop and comparing it > directly after re-acquiring? That makes the invariant explicit: "same > backing file means the folio is valid for this VMA." IMHO the flags is needed, consider a shared shmem vma remapped to a private shmem vma. That needs to be covered in the fix. Actually instead of reducing checks, maybe we also need to check the offset of the mapping too, that is: vma->vm_pgoff can't change otherwise it may also affect how the back store would behave on this UFFDIO_COPY request. For that, see the example of shmem_get_pgoff_policy() where it seems we can apply different policies to different ranges of the back store. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v4] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry() 2026-04-01 15:23 ` Peter Xu @ 2026-04-01 18:34 ` David CARLIER 2026-04-01 19:22 ` Peter Xu 0 siblings, 1 reply; 13+ messages in thread From: David CARLIER @ 2026-04-01 18:34 UTC (permalink / raw) To: Peter Xu Cc: Mike Rapoport, Andrew Morton, linux-mm, linux-kernel, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka Hi Peter, On Tue, Apr 01, 2026 at 04:23:00PM +0300, Peter Xu wrote: > IMHO the flags is needed, consider a shared shmem vma remapped to a private > shmem vma. That needs to be covered in the fix. Right, I hadn't considered that case. Shared->private changes how the folio gets handled even with the same backing file. I'll keep the flags check. > Actually instead of reducing checks, maybe we also need to check the offset > of the mapping too, that is: vma->vm_pgoff can't change otherwise it may > also affect how the back store would behave on this UFFDIO_COPY request. > > For that, see the example of shmem_get_pgoff_policy() where it seems we can > apply different policies to different ranges of the back store. Good point. If vm_pgoff changes, linear_page_index() derives a different page cache offset for the same virtual address, and shmem_get_pgoff_policy() could apply a different NUMA policy to that range. So the folio could end up at the wrong offset or with the wrong placement. I'll add vm_pgoff to the snapshot. So the full set of checks after re-acquiring locks would be: vm_file, vm_flags, and vm_pgoff — ensuring the folio was allocated for the right backing file, at the right offset, with the right VMA type. Thanks ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v4] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry() 2026-04-01 18:34 ` David CARLIER @ 2026-04-01 19:22 ` Peter Xu 2026-04-01 20:05 ` David CARLIER 2026-04-02 4:02 ` Mike Rapoport 0 siblings, 2 replies; 13+ messages in thread From: Peter Xu @ 2026-04-01 19:22 UTC (permalink / raw) To: David CARLIER Cc: Mike Rapoport, Andrew Morton, linux-mm, linux-kernel, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka On Wed, Apr 01, 2026 at 07:34:47PM +0100, David CARLIER wrote: > Hi Peter, > > On Tue, Apr 01, 2026 at 04:23:00PM +0300, Peter Xu wrote: > > IMHO the flags is needed, consider a shared shmem vma remapped to a private > > shmem vma. That needs to be covered in the fix. > > Right, I hadn't considered that case. Shared->private changes how > the > folio gets handled even with the same backing file. I'll keep the > flags > check. > > > Actually instead of reducing checks, maybe we also need to check > the offset > > of the mapping too, that is: vma->vm_pgoff can't change otherwise it may > > also affect how the back store would behave on this UFFDIO_COPY > request. > > > > For that, see the example of shmem_get_pgoff_policy() where it > seems we can > > apply different policies to different ranges of the back store. > > Good point. If vm_pgoff changes, linear_page_index() derives a > different page cache offset for the same virtual address, and > shmem_get_pgoff_policy() could apply a different NUMA policy to that > range. So the folio could end up at the wrong offset or with the wrong > placement. > > I'll add vm_pgoff to the snapshot. So the full set of checks after > re-acquiring locks would be: vm_file, vm_flags, and vm_pgoff — ensuring > the folio was allocated for the right backing file, at the right > offset, > with the right VMA type. When caching the offset, we should likely use linear_page_index() with the address provided rather than caching vma->vm_pgoff directly, then it'll avoid same vm_pgoff while VMA mapping shifted like this: VMA1: vm_pgoff=0x10000, vm_start=0x10000 VMA2: vm_pgoff=0x10000, vm_start=0xf000 So VMA1 unmapped, then the app mappped VMA2 at different VA but still cover the address we're requesting for UFFDIO_COPY, even if the VMA will still have the same vm_pgoff, the VA to access the same offset might change. Using linear_page_index() will be accurate, IIUC. The other thing is I just noticed the err code was changed to -EINVAL for snapshot changed cases, sorry I didn't follow previously as closely on the discussion. I think it should be -EAGAIN. It's because the userapp can't resolve -EINVAL failures and app will crash. In a VMA change use case, we should return -EAGAIN to imply the app to retry, rather than crashing. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v4] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry() 2026-04-01 19:22 ` Peter Xu @ 2026-04-01 20:05 ` David CARLIER 2026-04-02 4:02 ` Mike Rapoport 1 sibling, 0 replies; 13+ messages in thread From: David CARLIER @ 2026-04-01 20:05 UTC (permalink / raw) To: Peter Xu Cc: Mike Rapoport, Andrew Morton, linux-mm, linux-kernel, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka On Tue, Apr 01, 2026 at 08:22:00PM +0300, Peter Xu wrote: > When caching the offset, we should likely use linear_page_index() with the > address provided rather than caching vma->vm_pgoff directly, then it'll > avoid same vm_pgoff while VMA mapping shifted like this: > > VMA1: vm_pgoff=0x10000, vm_start=0x10000 > VMA2: vm_pgoff=0x10000, vm_start=0xf000 Makes sense. linear_page_index() folds in vm_start so it catches the shifted mapping case where raw vm_pgoff alone wouldn't. I'll snapshot the computed pgoff instead. > The other thing is I just noticed the err code was changed to -EINVAL for > snapshot changed cases, sorry I didn't follow previously as closely on the > discussion. I think it should be -EAGAIN. It's because the userapp can't > resolve -EINVAL failures and app will crash. In a VMA change use case, we > should return -EAGAIN to imply the app to retry, rather than crashing. Right, -EAGAIN is the correct choice here. The VMA changed underneath us, but that's a transient condition the app can recover from by retrying. Will fix. So v5 will snapshot: vm_file (with get_file/fput), vm_flags, and linear_page_index(vma, dst_addr). Return -EAGAIN if any changed. Thanks for the detailed review ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v4] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry() 2026-04-01 19:22 ` Peter Xu 2026-04-01 20:05 ` David CARLIER @ 2026-04-02 4:02 ` Mike Rapoport 2026-04-02 5:59 ` David CARLIER 2026-04-02 13:29 ` Peter Xu 1 sibling, 2 replies; 13+ messages in thread From: Mike Rapoport @ 2026-04-02 4:02 UTC (permalink / raw) To: Peter Xu Cc: David CARLIER, Andrew Morton, linux-mm, linux-kernel, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka On Wed, Apr 01, 2026 at 03:22:03PM -0400, Peter Xu wrote: > > The other thing is I just noticed the err code was changed to -EINVAL for > snapshot changed cases, sorry I didn't follow previously as closely on the > discussion. I think it should be -EAGAIN. It's because the userapp can't > resolve -EINVAL failures and app will crash. In a VMA change use case, we > should return -EAGAIN to imply the app to retry, rather than crashing. No. The return value should express that the VMA is invalid. -EINVAL could work, but looking now at the manual -ENOENT would be even better: ENOENT (since Linux 4.11) The faulting process has changed its virtual memory layout simultaneously with an outstanding UFFDIO_COPY operation. > Thanks, > -- > Peter Xu -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v4] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry() 2026-04-02 4:02 ` Mike Rapoport @ 2026-04-02 5:59 ` David CARLIER 2026-04-02 13:29 ` Peter Xu 1 sibling, 0 replies; 13+ messages in thread From: David CARLIER @ 2026-04-02 5:59 UTC (permalink / raw) To: Mike Rapoport Cc: Peter Xu, Andrew Morton, linux-mm, linux-kernel, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka Understood on all points. Will rework patch 1 to a simple ops comparison with -ENOENT, drop vma_snapshot entirely. Will cc MEMORY MAPPING folks. Holding off until after -rc1, will resend both then. Cheers. On Thu, 2 Apr 2026 at 05:02, Mike Rapoport <rppt@kernel.org> wrote: > > On Wed, Apr 01, 2026 at 03:22:03PM -0400, Peter Xu wrote: > > > > The other thing is I just noticed the err code was changed to -EINVAL for > > snapshot changed cases, sorry I didn't follow previously as closely on the > > discussion. I think it should be -EAGAIN. It's because the userapp can't > > resolve -EINVAL failures and app will crash. In a VMA change use case, we > > should return -EAGAIN to imply the app to retry, rather than crashing. > > No. The return value should express that the VMA is invalid. -EINVAL could > work, but looking now at the manual -ENOENT would be even better: > > ENOENT (since Linux 4.11) > The faulting process has changed its virtual memory layout > simultaneously with an outstanding UFFDIO_COPY operation. > > > Thanks, > > -- > > Peter Xu > > -- > Sincerely yours, > Mike. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v4] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry() 2026-04-02 4:02 ` Mike Rapoport 2026-04-02 5:59 ` David CARLIER @ 2026-04-02 13:29 ` Peter Xu 1 sibling, 0 replies; 13+ messages in thread From: Peter Xu @ 2026-04-02 13:29 UTC (permalink / raw) To: Mike Rapoport Cc: David CARLIER, Andrew Morton, linux-mm, linux-kernel, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka Hi, Mike, On Thu, Apr 02, 2026 at 07:02:40AM +0300, Mike Rapoport wrote: > On Wed, Apr 01, 2026 at 03:22:03PM -0400, Peter Xu wrote: > > > > The other thing is I just noticed the err code was changed to -EINVAL for > > snapshot changed cases, sorry I didn't follow previously as closely on the > > discussion. I think it should be -EAGAIN. It's because the userapp can't > > resolve -EINVAL failures and app will crash. In a VMA change use case, we > > should return -EAGAIN to imply the app to retry, rather than crashing. > > No. The return value should express that the VMA is invalid. -EINVAL could > work, but looking now at the manual -ENOENT would be even better: > > ENOENT (since Linux 4.11) > The faulting process has changed its virtual memory layout > simultaneously with an outstanding UFFDIO_COPY operation. The VMA changed, but it doesn't mean the UFFDIO_COPY becomes illegal, am I right? For example, I wonder if it's possible someone runs soft-dirty concurrently with userfaultfd, we shouldn't fail the userapp if there's a concurrent thread collecting dirty information, which IIUC can cause VMA flag changes, and should be benign, and I think there can be other things causing the interruption too. -EAGAIN essentially requested the user to retry. If the VMA is not legal to be operated anymore, it will fail later. However we should allow legal users to pass. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v4] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry() 2026-04-01 8:06 ` David CARLIER 2026-04-01 15:23 ` Peter Xu @ 2026-04-02 3:58 ` Mike Rapoport 2026-04-02 13:42 ` Peter Xu 1 sibling, 1 reply; 13+ messages in thread From: Mike Rapoport @ 2026-04-02 3:58 UTC (permalink / raw) To: David CARLIER Cc: Andrew Morton, Peter Xu, linux-mm, linux-kernel, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka Hi David, It feels that you use an LLM for correspondence. Please tune it down to produce more laconic and to the point responses. On Wed, Apr 01, 2026 at 09:06:36AM +0100, David CARLIER wrote: > On Tue, Apr 01, 2026 at 08:49:00AM +0300, Mike Rapoport wrote: > > What does "folio allocated from the original VMA's backing store" exactly > > mean? Why is this a problem? > > Fair point, the commit message was vague here. What I meant is: > > mfill_atomic_pte_copy() captures ops = vma_uffd_ops(state->vma) and > passes it to __mfill_atomic_pte(). There, ops->alloc_folio() allocates > a folio for the original VMA's inode (e.g. a shmem folio for that > specific shmem inode). I wouldn't say ->alloc_folio() allocates a folio _for_ the inode, it allocates it with inode's memory policy. Worst can happen without any changes is that the allocated folio will end up in a wrong node. This is still a footgun, but I don't see it as a big deal. Let's revisit it after -rc1 and please make sure to cc "MEMORY MAPPING" folks for insights about how to better track VMA changes or their absence. > Then mfill_copy_folio_retry() drops all locks for > the copy_from_user retry. After mfill_get_vma() re-acquires them, > state->vma may now point to a replacement VMA, but ops is still the > stale pointer from before the drop. And this is a bug in my uffd refactoring, and it needs to be fixed ASAP with a simple comparison of old ops and new ops. > > Second, I have reservations about vma_snapshot implementation. What > > invariant does it exactly enforce? > > The invariant I was going for: "the folio we allocated is still > compatible with the VMA we're about to install it into." Since > alloc_folio() allocates from the VMA's backing file (inode), checking > that vm_file is still the same after re-acquiring locks ensures the > folio matches the inode. Again, it's not that folio matches the inode, but folio is allocated using the correct mempolicy. > The vm_flags comparison was a secondary guard against permission/type > changes during the window. Permissions should be fine, they are checked in userfaultfd_register. Some other flags that don't matter to uffd operation may change during the window, though and then a comparison of vm_flags will give a false positive. -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v4] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry() 2026-04-02 3:58 ` Mike Rapoport @ 2026-04-02 13:42 ` Peter Xu 0 siblings, 0 replies; 13+ messages in thread From: Peter Xu @ 2026-04-02 13:42 UTC (permalink / raw) To: Mike Rapoport Cc: David CARLIER, Andrew Morton, linux-mm, linux-kernel, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka Hi, Mike, Let me also leave my comments inline just for you to consider. On Thu, Apr 02, 2026 at 06:58:33AM +0300, Mike Rapoport wrote: > Hi David, > > It feels that you use an LLM for correspondence. Please tune it down to > produce more laconic and to the point responses. > > On Wed, Apr 01, 2026 at 09:06:36AM +0100, David CARLIER wrote: > > On Tue, Apr 01, 2026 at 08:49:00AM +0300, Mike Rapoport wrote: > > > What does "folio allocated from the original VMA's backing store" exactly > > > mean? Why is this a problem? > > > > Fair point, the commit message was vague here. What I meant is: > > > > mfill_atomic_pte_copy() captures ops = vma_uffd_ops(state->vma) and > > passes it to __mfill_atomic_pte(). There, ops->alloc_folio() allocates > > a folio for the original VMA's inode (e.g. a shmem folio for that > > specific shmem inode). > > I wouldn't say ->alloc_folio() allocates a folio _for_ the inode, it > allocates it with inode's memory policy. Worst can happen without any > changes is that the allocated folio will end up in a wrong node. For shmem it's only about mempolicy indeed, but since we're trying to export it as an API in the series, IMHO it would be nice to be generic. So we shouldn't assume it's only about mempolicy, we should rely on detecting any context change and bail out with -EAGAIN, relying all rest checks to the next UFFDIO_COPY ioctl done on top of the new mapping topology. > > This is still a footgun, but I don't see it as a big deal. IIUC this is a real bug reported. Actually, if my understanding is correct, we should be able to easily write a reproducer by registering the src addr of UFFDIO_COPY to userfaultfd too, then the ioctl(UFFDIO_COPY) thread will get blocked faulting in the src_addr. During that, we can change the VMA layout in another thread to test whatever setup we want. > Let's revisit it after -rc1 and please make sure to cc "MEMORY MAPPING" > folks for insights about how to better track VMA changes or their absence. No strong feeling here if we want to slightly postpone this fix. It looks like not easy to happen as it looks to be a bug present for a while, indeed. It's just that if my understanding is correct, with above reproducer we can crash the kernel easily without a proper fix. > > > Then mfill_copy_folio_retry() drops all locks for > > the copy_from_user retry. After mfill_get_vma() re-acquires them, > > state->vma may now point to a replacement VMA, but ops is still the > > stale pointer from before the drop. > > And this is a bug in my uffd refactoring, and it needs to be fixed ASAP > with a simple comparison of old ops and new ops. > > > > Second, I have reservations about vma_snapshot implementation. What > > > invariant does it exactly enforce? > > > > The invariant I was going for: "the folio we allocated is still > > compatible with the VMA we're about to install it into." Since > > alloc_folio() allocates from the VMA's backing file (inode), checking > > that vm_file is still the same after re-acquiring locks ensures the > > folio matches the inode. > > Again, it's not that folio matches the inode, but folio is allocated using > the correct mempolicy. > > > The vm_flags comparison was a secondary guard against permission/type > > changes during the window. > > Permissions should be fine, they are checked in userfaultfd_register. > Some other flags that don't matter to uffd operation may change during the > window, though and then a comparison of vm_flags will give a false > positive. IMHO false positive is fine in this case when -EAGAIN will be used (which I still think we should), if it only causes a retry. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2026-04-02 13:42 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-03-31 13:41 [PATCH v4] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry() David Carlier 2026-04-01 3:01 ` Andrew Morton 2026-04-01 7:49 ` Mike Rapoport 2026-04-01 8:06 ` David CARLIER 2026-04-01 15:23 ` Peter Xu 2026-04-01 18:34 ` David CARLIER 2026-04-01 19:22 ` Peter Xu 2026-04-01 20:05 ` David CARLIER 2026-04-02 4:02 ` Mike Rapoport 2026-04-02 5:59 ` David CARLIER 2026-04-02 13:29 ` Peter Xu 2026-04-02 3:58 ` Mike Rapoport 2026-04-02 13:42 ` Peter Xu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox