All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Rapoport <rppt@kernel.org>
To: Lorenzo Stoakes <ljs@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Carlier <devnexen@gmail.com>,
	David Hildenbrand <david@kernel.org>,
	Heechan Kang <gganji11@naver.com>,
	"Liam R. Howlett" <liam@infradead.org>,
	Michael Bommarito <michael.bommarito@gmail.com>,
	Peter Xu <peterx@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RESEND] userfaultfd: snapshot VMA state across UFFDIO_COPY retry
Date: Wed, 27 May 2026 22:01:30 +0300	[thread overview]
Message-ID: <ahc_iuhFIiGph0D5@kernel.org> (raw)
In-Reply-To: <ahcUDJ-eZP2GRwOJ@lucifer>

On Wed, May 27, 2026 at 05:08:23PM +0100, Lorenzo Stoakes wrote:
> On Tue, May 26, 2026 at 08:30:32PM +0300, Mike Rapoport wrote:
> > On Tue, May 26, 2026 at 04:12:56PM +0100, Lorenzo Stoakes wrote:
> > > On Tue, May 19, 2026 at 08:25:16AM +0300, Mike Rapoport wrote:
> > > > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> > > >
> > > > mfill_copy_folio_retry() drops the VMA lock for copy_from_user() and
> > > > reacquires it afterwards. The destination VMA can be replaced during that
> > > > window.
> > > >
> > > > The existing check compares vma_uffd_ops() before and after the retry, but
> > > > if a shmem VMA with MAP_SHARED is replaced with a shmem VMA with
> > > > MAP_PRIVATE (or vice versa) the replacement goes undetected.
> > > >
> > > > The change from MAP_PRIVATE to MAP_SHARED will treat the folio allocated
> > > > with shmem_alloc_folio() as anonymous and this will cause BUG() when
> > > > mfill_atomic_install_pte() will try to folio_add_new_anon_rmap().
> > > >
> > > > The change from MAP_SHARED to MAP_PRIVATE allows injection of folios into
> > > > the page cache of the original VMA.
> > >
> > > OK this seems like a sensible subset of things to check - we are looking to
> > > check what might materially impact _this particular operation_, and working
> > > to the absolute minimum we need to check.
> > >
> > > >
> > > > Introduce helpers for more comprehensive comparison of VMA state:
> > > > - vma_snapshot_get() to save the relevant VMA state into a struct
> > > >   vma_snapshot (original uffd_ops, actual uffd_ops, relevant VMA flags,
> > > >   vm_file and pgoff) before dropping the lock
> > > > - vma_snapshot_changed() to compare the saved state with the state of the
> > > >   VMA acquired after retaking the locks
> > > > - vma_snapshot_put() to release vm_file pinning.
> > >
> > > I feel like this is possibly a little misleading, it sort of implies that
> > > this is a broader kind of snapshot rather than some uffd-specific thing.
> > >
> > > This is more of a naming thing, yes things being defined in
> > > mm/userfaultfd.c tells you it's uffd-specific, but people are easily
> > > confused by stuff like this.
> > >
> > > Will comment inline.
> >
> > tl;dr:
> > * will update comments
> > * will rename vma_uffd_copy_ops() and replace NULL checks with VM_WARN
> > * will change some of single letter variable names.
> 
> Ack, thanks!
> 
> >
> > > >
> > > > Use DEFINE_FREE() cleanup to wrap vma_snapshot_put() to avoid complicating
> > > > error handling paths in mfill_copy_folio_retry().
> > >
> > > That's nice!
> > >
> > > >
> > > > Add vma_uffd_copy_ops() to avoid code duplication when original ops of
> > > > shmem VMA with MAP_PRIVATE are replaced with anon_uffd_ops.
> > >
> > > I think this is a bit overly brief. And is it only shmem where this might
> > > happen? Is MAP_PRIVATE hugetlbfs not also possible?
> >
> > No, hugetlb is not applicabe here, it was not changed to use
> > mfill_copy_folio_retry().
> 
> OK, maybe worth a mention?

Ok.
 
> > > > +static const struct vm_uffd_ops *vma_uffd_copy_ops(struct vm_area_struct *vma)
> > >
> > > I find this a bit confusing, these are the uffd ops for... copy? Or you're
> > > copying uffd ops?
> > >
> > > I'd maybe rename it to vma_uffd_ops_on_copy() to make it clear which it is,
> > > and add a comment like:
> >
> > vma_uffd_ops_for_copy()?
> 
> Works for me!

It's not needed actually, so I dropped it completely.

> > > /*
> > >  * Determine the ops to use when performing a UFFDIO_COPY operation - this
> > >  * specifically handles the case of a MAP_PRIVATE file-backed mapping,
> > >  * which, upon copy, is CoW'd into anonymous memory.
> > >  */
> >
> > I'll pick the large comment that got dropped and stick it here.
> 
> Ack thanks.

And that remains in place.

> > > I'm not sure if you were protecting against a possible future usage? But in
> > > that case surely you'd just want to VM_WARN_ON_ONCE(!ops)?
> > >
> > > We should also have a comment describing the situation like:
> > >
> > > 	/*
> > > 	 * Only async WP allows arbitrary file-backed mappings for UFFD,
> > > 	 * which is the only case in which ops can be NULL. However,
> > > 	 * we are copying here which is not permitted for these
> > >          * mappings, so this will never be the case here.
> > > 	 */
> >
> > I can live with one sentence tops here ;-)
> 
> It took me a _long_ time to understand what you were doing here, there's a
> balance to be had in commenting, but right now it's tilted towards 'nobody but
> Mike understands this'.

That's job security isn't it? ;-P
 
> I can tell, because nobody else bothered to review this in detail ;)

Were you on strike? ;-)
 
> > > Also in:
> > >
> > > static const struct vm_uffd_ops *vma_uffd_ops(struct vm_area_struct *vma)
> > > {
> > > 	if (vma_is_anonymous(vma))
> > > 		return &anon_uffd_ops;
> > > 	return vma->vm_ops ? vma->vm_ops->uffd_ops : NULL;
> > > }
> > >
> > > This is doing a redundant check _and_ making life confusing, as if
> > > !vma->vm_ops is a condition that can be reached there, it can't, as
> > > vma_is_anonymous() is literally a !vma->vm_ops check :)
> > >
> > > So surely that should be:
> > >
> > > static const struct vm_uffd_ops *vma_uffd_ops(struct vm_area_struct *vma)
> > > {
> > > 	if (vma_is_anonymous(vma))
> > > 		return &anon_uffd_ops;
> > >
> > > 	return vma->vm_ops->uffd_ops;
> > > }
> > >
> > > I mean obviously that can be a separate patch, but that's not helped
> > > make things clear here! :)
> >
> > You are welcome to send it ;-)
> 
> Well you asked for it ;)

Did it myself.
 
> > > > +struct vma_snapshot {
> > > > +	const struct vm_uffd_ops *copy_ops;
> > > > +	const struct vm_uffd_ops *ops;
> > > > +	struct file *file;
> > > > +	vma_flags_t flags;
> > > > +	pgoff_t pgoff;
> > > > +};
> > >
> > > Yeah I don't love this naming, you're not snapshotting the VMA at all in
> > > any real sense, you're comparing a subset of things specifically for the
> > > UFFDIO_COPY case.
> > >
> > > Which is fine - but I think we should just make that clear here, because
> > > perhaps in the future we'll want a generic version of something like this
> > > and this is going to be super confusing if we do...
> >
> > Let's deal with it in the future then. This is a local structure, used by a
> > few local functions.
> 
> I'd really rather we rename it. This is asking for confusion...

Renamed everything to mfill_retry_state. 

> > > vma_uffdio_copy_snapshot maybe ?
> > >
> > > Maybe ok to leave the functions the same name to avoid a mouthful, the type
> > > name should make it clear what's going on.
> > >
> > > > +
> > > > +static void vma_snapshot_get(struct vma_snapshot *s, struct vm_area_struct *vma)
> > >
> > > No single letter var please! This isn't plan9 :)
> >
> > "C is a Spartan language, and your naming conventions should follow suit"
> > https://docs.kernel.org/process/coding-style.html#naming
> >
> > Here it's perfectly understandable what 's' means.
> 
> That very document goes on to give suggestions that aren't single letter :)

There's 'i' there a paragraph below ;)
 
> > > > +static bool vma_snapshot_changed(struct vma_snapshot *s,
> > >
> > > > +				 struct vm_area_struct *vma)
> > >
> > > Can the VMA potentially be any arbitrary VMA that might now be remapped in
> > > place? Like anon vs. file and same file and offset?
> > >
> > >
> > > > +{
> > > > +	vma_flags_t flags = vma_flags_and_mask(&vma->flags, VMA_SNAPSHOT_FLAGS);
> > > > +
> > >
> > > You have a comment for each of the other cases, so one here is useful I
> > > think, e.g.:
> > >
> > > 	/* Have any UFFD flags (missing, WP, minor) changed? */

Added it, although the flags are just one line above.

> > >
> > > > +	if (!vma_flags_same_pair(&s->flags, &flags))
> > > > +		return true;
> > >
> > > > +
> > > > +	/* VMA type or effective uffd_ops changed while the lock was dropped */
> > >
> > > Can the 'effective' (probably better to say copy or use effective
> > > throughout) uffd_ops change though? You already check that VMA_SHARED_BIT
> > > is either set or unset across the lock drop above so is the copy_ops check
> > > even meaningful here?
> >
> > Yes, it can change if a shmem VMA changes from SHARED to PRIVATE or vice
> > versa.
> 
> How?
> 
> 	if (!vma_flags_same_pair(&s->flags, &flags))
> 		return true;
> 
> Checks against VMA_SNAPSHOT_FLAGS which includes VMA_SHARED_BIT. So that would
> catch it already no?
 
Yes, you are right. It's enough to check for flags and the original ops.
And with the copy_ops check dropped there is no need in
vma_uffd_copy_ops().

> > > > +	if (s->ops != vma_uffd_ops(vma) || s->copy_ops != vma_uffd_copy_ops(vma))
> > > > +		return true;
> > > Surely we should also check that the UFFD context matches (accounting for
> > > NULL cases)?
> >
> > It's checked in mfill_get_vma().
> 
> Yeah it's a bit strange not to do that on a validation step though, would prefer
> we live with the duplication or at least add a comment saying it's checked.
 
There's a bunch of validation checks on mfill_get_vma() path, let's leave
them there. 

> > > > +	/* VMA was file backed, but inode or offset has changed */
> > > > +	if (!vma->vm_file || vma->vm_file->f_inode != s->file->f_inode ||
> > > > +	    vma->vm_pgoff != s->pgoff)
> > > > +		return true;
> > >
> > > Hm so are we OK with a VMA where the file has been closed, then the same
> > > inode re-opened at the exact same offset? (e.g. vma->vm_file differs)
> >
> > The copy will go to exactly the same place in that case.
> 
> I mean OK, but maybe it's better to be conserative? Is just a bit weird to
> tolerate that when we pin the file we could just check file equality?

Yes, we could :) 

> > > >  static int mfill_copy_folio_retry(struct mfill_state *state,
> > > >  				  struct folio *folio)
> > > >  {
> > > > -	const struct vm_uffd_ops *orig_ops = vma_uffd_ops(state->vma);
> > > > +	struct vma_snapshot s = { 0 };
> > > > +	struct vma_snapshot *p __free(snapshot_put) = &s;
> > >
> > > Could we avoid these single letter var names?
> >
> > You suggest to call it a_dummy_pointer_to_allow_use_of_free_cleanup? ;-P
> 
> Or, and I know it's _crazy_, but like 'snapshot' and 'prev'(or whatever p is
> supposed to be - see the problem??), or something vaguely greppable? :)

p is pointer, prev has nothing to do with it.
But renamed anyway.
 
> Cheers, Lorenzo

-- 
Sincerely yours,
Mike.


      reply	other threads:[~2026-05-27 19:01 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-19  5:25 [PATCH RESEND] userfaultfd: snapshot VMA state across UFFDIO_COPY retry Mike Rapoport
2026-05-19  5:36 ` David CARLIER
2026-05-20 12:40   ` Mike Rapoport
2026-05-20 11:09 ` David Hildenbrand (Arm)
2026-05-20 12:53   ` Mike Rapoport
2026-05-20 13:48     ` David Hildenbrand (Arm)
2026-05-20 14:03       ` David Hildenbrand (Arm)
2026-05-26 15:23         ` Lorenzo Stoakes
2026-05-20 14:12       ` Mike Rapoport
2026-05-20 14:38         ` David Hildenbrand (Arm)
2026-05-25 17:12           ` Liam R. Howlett
2026-05-26 12:47             ` David Hildenbrand (Arm)
2026-05-26 15:25               ` Lorenzo Stoakes
2026-05-26 19:06                 ` Liam R. Howlett
2026-05-26 15:12 ` Lorenzo Stoakes
2026-05-26 17:30   ` Mike Rapoport
2026-05-27 16:08     ` Lorenzo Stoakes
2026-05-27 19:01       ` Mike Rapoport [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ahc_iuhFIiGph0D5@kernel.org \
    --to=rppt@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=david@kernel.org \
    --cc=devnexen@gmail.com \
    --cc=gganji11@naver.com \
    --cc=liam@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=michael.bommarito@gmail.com \
    --cc=peterx@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.