Linux-mm Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Rapoport <rppt@kernel.org>
To: Lorenzo Stoakes <ljs@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Carlier <devnexen@gmail.com>,
	David Hildenbrand <david@kernel.org>,
	Heechan Kang <gganji11@naver.com>,
	"Liam R. Howlett" <liam@infradead.org>,
	Michael Bommarito <michael.bommarito@gmail.com>,
	Peter Xu <peterx@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RESEND] userfaultfd: snapshot VMA state across UFFDIO_COPY retry
Date: Wed, 27 May 2026 22:01:30 +0300	[thread overview]
Message-ID: <ahc_iuhFIiGph0D5@kernel.org> (raw)
In-Reply-To: <ahcUDJ-eZP2GRwOJ@lucifer>

On Wed, May 27, 2026 at 05:08:23PM +0100, Lorenzo Stoakes wrote:
> On Tue, May 26, 2026 at 08:30:32PM +0300, Mike Rapoport wrote:
> > On Tue, May 26, 2026 at 04:12:56PM +0100, Lorenzo Stoakes wrote:
> > > On Tue, May 19, 2026 at 08:25:16AM +0300, Mike Rapoport wrote:
> > > > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> > > >
> > > > mfill_copy_folio_retry() drops the VMA lock for copy_from_user() and
> > > > reacquires it afterwards. The destination VMA can be replaced during that
> > > > window.
> > > >
> > > > The existing check compares vma_uffd_ops() before and after the retry, but
> > > > if a shmem VMA with MAP_SHARED is replaced with a shmem VMA with
> > > > MAP_PRIVATE (or vice versa) the replacement goes undetected.
> > > >
> > > > The change from MAP_PRIVATE to MAP_SHARED will treat the folio allocated
> > > > with shmem_alloc_folio() as anonymous and this will cause BUG() when
> > > > mfill_atomic_install_pte() will try to folio_add_new_anon_rmap().
> > > >
> > > > The change from MAP_SHARED to MAP_PRIVATE allows injection of folios into
> > > > the page cache of the original VMA.
> > >
> > > OK this seems like a sensible subset of things to check - we are looking to
> > > check what might materially impact _this particular operation_, and working
> > > to the absolute minimum we need to check.
> > >
> > > >
> > > > Introduce helpers for more comprehensive comparison of VMA state:
> > > > - vma_snapshot_get() to save the relevant VMA state into a struct
> > > >   vma_snapshot (original uffd_ops, actual uffd_ops, relevant VMA flags,
> > > >   vm_file and pgoff) before dropping the lock
> > > > - vma_snapshot_changed() to compare the saved state with the state of the
> > > >   VMA acquired after retaking the locks
> > > > - vma_snapshot_put() to release vm_file pinning.
> > >
> > > I feel like this is possibly a little misleading, it sort of implies that
> > > this is a broader kind of snapshot rather than some uffd-specific thing.
> > >
> > > This is more of a naming thing, yes things being defined in
> > > mm/userfaultfd.c tells you it's uffd-specific, but people are easily
> > > confused by stuff like this.
> > >
> > > Will comment inline.
> >
> > tl;dr:
> > * will update comments
> > * will rename vma_uffd_copy_ops() and replace NULL checks with VM_WARN
> > * will change some of single letter variable names.
> 
> Ack, thanks!
> 
> >
> > > >
> > > > Use DEFINE_FREE() cleanup to wrap vma_snapshot_put() to avoid complicating
> > > > error handling paths in mfill_copy_folio_retry().
> > >
> > > That's nice!
> > >
> > > >
> > > > Add vma_uffd_copy_ops() to avoid code duplication when original ops of
> > > > shmem VMA with MAP_PRIVATE are replaced with anon_uffd_ops.
> > >
> > > I think this is a bit overly brief. And is it only shmem where this might
> > > happen? Is MAP_PRIVATE hugetlbfs not also possible?
> >
> > No, hugetlb is not applicabe here, it was not changed to use
> > mfill_copy_folio_retry().
> 
> OK, maybe worth a mention?

Ok.
 
> > > > +static const struct vm_uffd_ops *vma_uffd_copy_ops(struct vm_area_struct *vma)
> > >
> > > I find this a bit confusing, these are the uffd ops for... copy? Or you're
> > > copying uffd ops?
> > >
> > > I'd maybe rename it to vma_uffd_ops_on_copy() to make it clear which it is,
> > > and add a comment like:
> >
> > vma_uffd_ops_for_copy()?
> 
> Works for me!

It's not needed actually, so I dropped it completely.

> > > /*
> > >  * Determine the ops to use when performing a UFFDIO_COPY operation - this
> > >  * specifically handles the case of a MAP_PRIVATE file-backed mapping,
> > >  * which, upon copy, is CoW'd into anonymous memory.
> > >  */
> >
> > I'll pick the large comment that got dropped and stick it here.
> 
> Ack thanks.

And that remains in place.

> > > I'm not sure if you were protecting against a possible future usage? But in
> > > that case surely you'd just want to VM_WARN_ON_ONCE(!ops)?
> > >
> > > We should also have a comment describing the situation like:
> > >
> > > 	/*
> > > 	 * Only async WP allows arbitrary file-backed mappings for UFFD,
> > > 	 * which is the only case in which ops can be NULL. However,
> > > 	 * we are copying here which is not permitted for these
> > >          * mappings, so this will never be the case here.
> > > 	 */
> >
> > I can live with one sentence tops here ;-)
> 
> It took me a _long_ time to understand what you were doing here, there's a
> balance to be had in commenting, but right now it's tilted towards 'nobody but
> Mike understands this'.

That's job security isn't it? ;-P
 
> I can tell, because nobody else bothered to review this in detail ;)

Were you on strike? ;-)
 
> > > Also in:
> > >
> > > static const struct vm_uffd_ops *vma_uffd_ops(struct vm_area_struct *vma)
> > > {
> > > 	if (vma_is_anonymous(vma))
> > > 		return &anon_uffd_ops;
> > > 	return vma->vm_ops ? vma->vm_ops->uffd_ops : NULL;
> > > }
> > >
> > > This is doing a redundant check _and_ making life confusing, as if
> > > !vma->vm_ops is a condition that can be reached there, it can't, as
> > > vma_is_anonymous() is literally a !vma->vm_ops check :)
> > >
> > > So surely that should be:
> > >
> > > static const struct vm_uffd_ops *vma_uffd_ops(struct vm_area_struct *vma)
> > > {
> > > 	if (vma_is_anonymous(vma))
> > > 		return &anon_uffd_ops;
> > >
> > > 	return vma->vm_ops->uffd_ops;
> > > }
> > >
> > > I mean obviously that can be a separate patch, but that's not helped
> > > make things clear here! :)
> >
> > You are welcome to send it ;-)
> 
> Well you asked for it ;)

Did it myself.
 
> > > > +struct vma_snapshot {
> > > > +	const struct vm_uffd_ops *copy_ops;
> > > > +	const struct vm_uffd_ops *ops;
> > > > +	struct file *file;
> > > > +	vma_flags_t flags;
> > > > +	pgoff_t pgoff;
> > > > +};
> > >
> > > Yeah I don't love this naming, you're not snapshotting the VMA at all in
> > > any real sense, you're comparing a subset of things specifically for the
> > > UFFDIO_COPY case.
> > >
> > > Which is fine - but I think we should just make that clear here, because
> > > perhaps in the future we'll want a generic version of something like this
> > > and this is going to be super confusing if we do...
> >
> > Let's deal with it in the future then. This is a local structure, used by a
> > few local functions.
> 
> I'd really rather we rename it. This is asking for confusion...

Renamed everything to mfill_retry_state. 

> > > vma_uffdio_copy_snapshot maybe ?
> > >
> > > Maybe ok to leave the functions the same name to avoid a mouthful, the type
> > > name should make it clear what's going on.
> > >
> > > > +
> > > > +static void vma_snapshot_get(struct vma_snapshot *s, struct vm_area_struct *vma)
> > >
> > > No single letter var please! This isn't plan9 :)
> >
> > "C is a Spartan language, and your naming conventions should follow suit"
> > https://docs.kernel.org/process/coding-style.html#naming
> >
> > Here it's perfectly understandable what 's' means.
> 
> That very document goes on to give suggestions that aren't single letter :)

There's 'i' there a paragraph below ;)
 
> > > > +static bool vma_snapshot_changed(struct vma_snapshot *s,
> > >
> > > > +				 struct vm_area_struct *vma)
> > >
> > > Can the VMA potentially be any arbitrary VMA that might now be remapped in
> > > place? Like anon vs. file and same file and offset?
> > >
> > >
> > > > +{
> > > > +	vma_flags_t flags = vma_flags_and_mask(&vma->flags, VMA_SNAPSHOT_FLAGS);
> > > > +
> > >
> > > You have a comment for each of the other cases, so one here is useful I
> > > think, e.g.:
> > >
> > > 	/* Have any UFFD flags (missing, WP, minor) changed? */

Added it, although the flags are just one line above.

> > >
> > > > +	if (!vma_flags_same_pair(&s->flags, &flags))
> > > > +		return true;
> > >
> > > > +
> > > > +	/* VMA type or effective uffd_ops changed while the lock was dropped */
> > >
> > > Can the 'effective' (probably better to say copy or use effective
> > > throughout) uffd_ops change though? You already check that VMA_SHARED_BIT
> > > is either set or unset across the lock drop above so is the copy_ops check
> > > even meaningful here?
> >
> > Yes, it can change if a shmem VMA changes from SHARED to PRIVATE or vice
> > versa.
> 
> How?
> 
> 	if (!vma_flags_same_pair(&s->flags, &flags))
> 		return true;
> 
> Checks against VMA_SNAPSHOT_FLAGS which includes VMA_SHARED_BIT. So that would
> catch it already no?
 
Yes, you are right. It's enough to check for flags and the original ops.
And with the copy_ops check dropped there is no need in
vma_uffd_copy_ops().

> > > > +	if (s->ops != vma_uffd_ops(vma) || s->copy_ops != vma_uffd_copy_ops(vma))
> > > > +		return true;
> > > Surely we should also check that the UFFD context matches (accounting for
> > > NULL cases)?
> >
> > It's checked in mfill_get_vma().
> 
> Yeah it's a bit strange not to do that on a validation step though, would prefer
> we live with the duplication or at least add a comment saying it's checked.
 
There's a bunch of validation checks on mfill_get_vma() path, let's leave
them there. 

> > > > +	/* VMA was file backed, but inode or offset has changed */
> > > > +	if (!vma->vm_file || vma->vm_file->f_inode != s->file->f_inode ||
> > > > +	    vma->vm_pgoff != s->pgoff)
> > > > +		return true;
> > >
> > > Hm so are we OK with a VMA where the file has been closed, then the same
> > > inode re-opened at the exact same offset? (e.g. vma->vm_file differs)
> >
> > The copy will go to exactly the same place in that case.
> 
> I mean OK, but maybe it's better to be conserative? Is just a bit weird to
> tolerate that when we pin the file we could just check file equality?

Yes, we could :) 

> > > >  static int mfill_copy_folio_retry(struct mfill_state *state,
> > > >  				  struct folio *folio)
> > > >  {
> > > > -	const struct vm_uffd_ops *orig_ops = vma_uffd_ops(state->vma);
> > > > +	struct vma_snapshot s = { 0 };
> > > > +	struct vma_snapshot *p __free(snapshot_put) = &s;
> > >
> > > Could we avoid these single letter var names?
> >
> > You suggest to call it a_dummy_pointer_to_allow_use_of_free_cleanup? ;-P
> 
> Or, and I know it's _crazy_, but like 'snapshot' and 'prev'(or whatever p is
> supposed to be - see the problem??), or something vaguely greppable? :)

p is pointer, prev has nothing to do with it.
But renamed anyway.
 
> Cheers, Lorenzo

-- 
Sincerely yours,
Mike.


      reply	other threads:[~2026-05-27 19:01 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-19  5:25 [PATCH RESEND] userfaultfd: snapshot VMA state across UFFDIO_COPY retry Mike Rapoport
2026-05-19  5:36 ` David CARLIER
2026-05-20 12:40   ` Mike Rapoport
2026-05-20 11:09 ` David Hildenbrand (Arm)
2026-05-20 12:53   ` Mike Rapoport
2026-05-20 13:48     ` David Hildenbrand (Arm)
2026-05-20 14:03       ` David Hildenbrand (Arm)
2026-05-26 15:23         ` Lorenzo Stoakes
2026-05-20 14:12       ` Mike Rapoport
2026-05-20 14:38         ` David Hildenbrand (Arm)
2026-05-25 17:12           ` Liam R. Howlett
2026-05-26 12:47             ` David Hildenbrand (Arm)
2026-05-26 15:25               ` Lorenzo Stoakes
2026-05-26 19:06                 ` Liam R. Howlett
2026-05-26 15:12 ` Lorenzo Stoakes
2026-05-26 17:30   ` Mike Rapoport
2026-05-27 16:08     ` Lorenzo Stoakes
2026-05-27 19:01       ` Mike Rapoport [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ahc_iuhFIiGph0D5@kernel.org \
    --to=rppt@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=david@kernel.org \
    --cc=devnexen@gmail.com \
    --cc=gganji11@naver.com \
    --cc=liam@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=michael.bommarito@gmail.com \
    --cc=peterx@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox