Linux-mm Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Rapoport <rppt@kernel.org>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Carlier <devnexen@gmail.com>,
	Heechan Kang <gganji11@naver.com>,
	"Liam R. Howlett" <liam@infradead.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	Michael Bommarito <michael.bommarito@gmail.com>,
	Peter Xu <peterx@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RESEND] userfaultfd: snapshot VMA state across UFFDIO_COPY retry
Date: Wed, 20 May 2026 15:53:33 +0300	[thread overview]
Message-ID: <ag2uzVgx2Xl7zi58@kernel.org> (raw)
In-Reply-To: <a4e9fac5-e674-4528-9815-d588a35fedec@kernel.org>

On Wed, May 20, 2026 at 01:09:04PM +0200, David Hildenbrand (Arm) wrote:
> On 5/19/26 07:25, Mike Rapoport wrote:
> > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> > 
> > mfill_copy_folio_retry() drops the VMA lock for copy_from_user() and
> > reacquires it afterwards. The destination VMA can be replaced during that
> > window.
> > 
> > The existing check compares vma_uffd_ops() before and after the retry, but
> > if a shmem VMA with MAP_SHARED is replaced with a shmem VMA with
> > MAP_PRIVATE (or vice versa) the replacement goes undetected.
> > 
> > The change from MAP_PRIVATE to MAP_SHARED will treat the folio allocated
> > with shmem_alloc_folio() as anonymous and this will cause BUG() when
> > mfill_atomic_install_pte() will try to folio_add_new_anon_rmap().
> > 
> > The change from MAP_SHARED to MAP_PRIVATE allows injection of folios into
> > the page cache of the original VMA.
> > 
> > Introduce helpers for more comprehensive comparison of VMA state:
> > - vma_snapshot_get() to save the relevant VMA state into a struct
> >   vma_snapshot (original uffd_ops, actual uffd_ops, relevant VMA flags,
> >   vm_file and pgoff) before dropping the lock
> > - vma_snapshot_changed() to compare the saved state with the state of the
> >   VMA acquired after retaking the locks
> > - vma_snapshot_put() to release vm_file pinning.
> > 
> > Use DEFINE_FREE() cleanup to wrap vma_snapshot_put() to avoid complicating
> > error handling paths in mfill_copy_folio_retry().
> > 
> > Add vma_uffd_copy_ops() to avoid code duplication when original ops of
> > shmem VMA with MAP_PRIVATE are replaced with anon_uffd_ops.
> > 
> > Fixes: 292411fda25b ("mm/userfaultfd: detect VMA type change after copy retry in mfill_copy_folio_retry()")
> > Fixes: 6ab703034f14 ("userfaultfd: mfill_atomic(): remove retry logic")
> > Tested-by: Heechan Kang <gganji11@naver.com>
> > Suggested-by: Peter Xu <peterx@redhat.com>
> > Co-developed-by: David Carlier <devnexen@gmail.com>
> > Signed-off-by: David Carlier <devnexen@gmail.com>
> > Co-developed-by: Michael Bommarito <michael.bommarito@gmail.com>
> > Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> > Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> > ---
> >  mm/userfaultfd.c | 99 ++++++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 79 insertions(+), 20 deletions(-)
> > 
> > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > index 180bad42fc79..b70b84776a79 100644
> > --- a/mm/userfaultfd.c
> > +++ b/mm/userfaultfd.c
> > @@ -14,6 +14,8 @@
> >  #include <linux/userfaultfd_k.h>
> >  #include <linux/mmu_notifier.h>
> >  #include <linux/hugetlb.h>
> > +#include <linux/file.h>
> > +#include <linux/cleanup.h>
> >  #include <asm/tlbflush.h>
> >  #include <asm/tlb.h>
> >  #include "internal.h"
> > @@ -69,6 +71,24 @@ static const struct vm_uffd_ops *vma_uffd_ops(struct vm_area_struct *vma)
> >  	return vma->vm_ops ? vma->vm_ops->uffd_ops : NULL;
> >  }
> >  
> > +static const struct vm_uffd_ops *vma_uffd_copy_ops(struct vm_area_struct *vma)
> > +{
> > +	const struct vm_uffd_ops *ops = vma_uffd_ops(vma);
> > +
> > +	if (!ops)
> > +		return NULL;
> > +
> > +	/*
> > +	 * UFFDIO_COPY fills MAP_PRIVATE file-backed mappings as anonymous
> > +	 * memory. This is an effective ops override, so retry validation must
> > +	 * compare the override result, not just vma->vm_ops->uffd_ops.
> > +	 */
> > +	if (!(vma->vm_flags & VM_SHARED))
> > +		return &anon_uffd_ops;
> > +
> > +	return ops;
> > +}
> > +
> >  static __always_inline
> >  bool validate_dst_vma(struct vm_area_struct *dst_vma, unsigned long dst_end)
> >  {
> > @@ -443,14 +463,70 @@ static int mfill_copy_folio_locked(struct folio *folio, unsigned long src_addr)
> >  	return ret;
> >  }
> >  
> > +#define VMA_SNAPSHOT_FLAGS append_vma_flags(__VMA_UFFD_FLAGS, VMA_SHARED_BIT)
> > +
> > +struct vma_snapshot {
> > +	const struct vm_uffd_ops *copy_ops;
> > +	const struct vm_uffd_ops *ops;
> > +	struct file *file;
> > +	vma_flags_t flags;
> > +	pgoff_t pgoff;
> > +};
> 
> As this is all uffd specific, I wonder whether that should be "struct
> uffd_vma_snapshot"/"vma_uffd_snapshot" etc.

It's local to this file, I wouldn't worry about namespacing until there's
an actual need if it will ever arise.
 
> From a high level, this LGTM.
> 
> I wish we could identify relevant VMA changes more easily. Like, using a per-MM
> sequence counter that we simply increment on any VMA changes.

Do you mean per-VMA?
Per-MM counter would capture unrelated changes, e.g. an masvise() for
unrelated range.

We kinda have an infrastructure to detect VMA changes that affect uffd
operation for non-cooperative, we can also hook on that even if
UFFD_EVENT_* are not requested, but that's way more involved that this VMA
snapshot.
 
> -- 
> Cheers,
> 
> David

-- 
Sincerely yours,
Mike.


  reply	other threads:[~2026-05-20 12:53 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-19  5:25 [PATCH RESEND] userfaultfd: snapshot VMA state across UFFDIO_COPY retry Mike Rapoport
2026-05-19  5:36 ` David CARLIER
2026-05-20 12:40   ` Mike Rapoport
2026-05-20 11:09 ` David Hildenbrand (Arm)
2026-05-20 12:53   ` Mike Rapoport [this message]
2026-05-20 13:48     ` David Hildenbrand (Arm)
2026-05-20 14:03       ` David Hildenbrand (Arm)
2026-05-20 14:12       ` Mike Rapoport
2026-05-20 14:38         ` David Hildenbrand (Arm)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ag2uzVgx2Xl7zi58@kernel.org \
    --to=rppt@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=david@kernel.org \
    --cc=devnexen@gmail.com \
    --cc=gganji11@naver.com \
    --cc=liam@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=michael.bommarito@gmail.com \
    --cc=peterx@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox