All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: "Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Lorenzo Stoakes <lstoakes@gmail.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Rapoport <rppt@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>
Subject: Re: [PATCH v2] mm: userfaultfd: avoid passing an invalid range to vma_merge()
Date: Wed, 17 May 2023 09:50:31 -0400	[thread overview]
Message-ID: <ZGTbp9LLNYG4ILXk@x1n> (raw)
In-Reply-To: <20230516225251.xwmz2oyebo7k56ys@revolver>

On Tue, May 16, 2023 at 06:52:51PM -0400, Liam R. Howlett wrote:
> * Peter Xu <peterx@redhat.com> [230516 16:12]:
> ...
> 
> > > > 
> > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > ---
> > > >  fs/userfaultfd.c | 27 +++++++++++++++++++++------
> > > >  1 file changed, 21 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > > > index 0fd96d6e39ce..7eb88bc74d00 100644
> > > > --- a/fs/userfaultfd.c
> > > > +++ b/fs/userfaultfd.c
> > > > @@ -1458,10 +1458,17 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> > > >  	BUG_ON(!found);
> > > >  
> > > >  	vma_iter_set(&vmi, start);
> > > > -	prev = vma_prev(&vmi);
> > > > +	vma = vma_find(&vmi, end);
> > > > +	if (!vma)
> > > > +		goto out_unlock;
> > > > +
> > > > +	if (vma->vm_start < start)
> > > > +		prev = vma;
> > > > +	else
> > > > +		prev = vma_prev(&vmi);
> > > >  
> > > >  	ret = 0;
> > > > -	for_each_vma_range(vmi, vma, end) {
> > > > +	do {
> > > 
> > > The iterator may be off by one, depending on if vma_prev() is called or
> > > not.
> > > 
> > > Perhaps:
> > > 	prev = vma_prev(&vmi); /* could be wrong, or null */
> > > 	vma = vma_find(&vmi, end);
> > > 	if (!vma)
> > > 		goto out_unlock;
> > > 
> > > 	if (vma->vm_start < start)
> > > 		prev = vma;
> > > 
> > > now we know we are at vma with the iterator..
> > > 	ret = 0
> > > 	do{
> > > 	...
> > 
> > Will do, thanks.
> > 
> > I think I got trapped similarly when I was looking at xarray months ago
> > where xarray also had similar side effects to have off-by-one the iterator
> > behavior.
> > 
> > Do you think it'll make sense to have something describing such side
> > effects for maple tree (or the current vma api), or.. maybe there's already
> > some but I just didn't really know?
> 
> Well, it's an iterator so I though a position was implied.  But I think
> the documentation is lacking on the vma_iterator interface and I should
> fix that.

Thanks.

> 
> ...
> 
> > > > From: Peter Xu <peterx@redhat.com>
> > > > Date: Tue, 16 May 2023 09:39:38 -0400
> > > > Subject: [PATCH 2/2] mm/uffd: Allow vma to merge as much as possible
> > > > 
> > > > We used to not pass in the pgoff correctly when register/unregister uffd
> > > > regions, it caused incorrect behavior on vma merging.
> > > > 
> > > > For example, when we have:
> > > > 
> > > >   vma1(range 0-9, with uffd), vma2(range 10-19, no uffd)
> > > > 
> > > > Then someone unregisters uffd on range (5-9), it should become:
> > > > 
> > > >   vma1(range 0-4, with uffd), vma2(range 5-19, no uffd)
> > > > 
> > > > But with current code it's:
> > > > 
> > > >   vma1(range 0-4, with uffd), vma3(range 5-9, no uffd), vma2(range 10-19, no uffd)
> > > > 
> > > > This patch allows such merge to happen correctly.
> > > > 
> > > > This behavior seems to have existed since the 1st day of uffd, keep it just
> > > > as a performance optmization and not copy stable.
> > > > 
> > > > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > > > Cc: Mike Rapoport (IBM) <rppt@kernel.org>
> > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > ---
> > > >  fs/userfaultfd.c | 8 ++++++--
> > > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > > > index 7eb88bc74d00..891048b4799f 100644
> > > > --- a/fs/userfaultfd.c
> > > > +++ b/fs/userfaultfd.c
> > > > @@ -1332,6 +1332,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> > > >  	bool basic_ioctls;
> > > >  	unsigned long start, end, vma_end;
> > > >  	struct vma_iterator vmi;
> > > > +	pgoff_t pgoff;
> > > >  
> > > >  	user_uffdio_register = (struct uffdio_register __user *) arg;
> > > >  
> > > > @@ -1489,8 +1490,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> > > >  		vma_end = min(end, vma->vm_end);
> > > >  
> > > >  		new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags;
> > > > +		pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
> > > 
> > > I don't think this is safe.  You are telling vma_merge() something that
> > > is not true and will result in can_vma_merge_before() passing.  I mean,
> > > sure it will become true after you split (unless you can't?), but I
> > > don't know if you can just merge a VMA that doesn't pass
> > > can_vma_merge_before(), even for a short period?
> > 
> > I must admit I'm not really that handy yet to vma codes, so I could miss
> > something obvious.
> > 
> > My reasoning comes from two parts that this pgoff looks all fine:
> > 
> > 1) It's documented in vma_merge() in that:
> > 
> >  * Given a mapping request (addr,end,vm_flags,file,pgoff,anon_name),
> >  * figure out ...
> > 
> >   So fundamentally this pgoff is part of the mapping request paired with
> >   all the rest of the information.  AFAICT it means it must match with what
> >   "addr" is describing in VA address space.  That's why I think offseting
> >   it makes sense here.
> > 
> >   It also matches my understanding in vma_merge() code on how the pgoff is
> >   used.
> > 
> > 2) Uffd is nothing special in this regard, namely:
> > 
> >    mbind_range():
> > 
> > 	pgoff = vma->vm_pgoff + ((vmstart - vma->vm_start) >> PAGE_SHIFT);
> > 	merged = vma_merge(vmi, vma->vm_mm, *prev, vmstart, vmend, vma->vm_flags,
> > 			 vma->anon_vma, vma->vm_file, pgoff, new_pol,
> > 			 vma->vm_userfaultfd_ctx, anon_vma_name(vma));
> > 
> >    mlock_fixup():
> >    
> > 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
> > 	*prev = vma_merge(vmi, mm, *prev, start, end, newflags,
> > 			vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
> > 			vma->vm_userfaultfd_ctx, anon_vma_name(vma));
> > 
> >    mprotect_fixup():
> > 
> > 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
> > 	*pprev = vma_merge(vmi, mm, *pprev, start, end, newflags,
> > 			   vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
> > 			   vma->vm_userfaultfd_ctx, anon_vma_name(vma));
> > 
> > I had a feeling that it's just something overlooked in the initial proposal
> > of uffd, but maybe I missed something important?
> 
> I think you are correct.  It's worth noting that all of these skip
> splitting if merging succeeds.

Yes, IIUC that's what we want because vma_merge() just handles everything
there (including split, or say, vma range adjustments) if any !NULL
returned.

> 
> We know it won't match case 1-4 (we have a current vma).  We then pass
> in vma_end = min(end, vma->vm_end);

Case 4 seems still possible and should be the case that mentioned in the
patch 2, iiuc.  But yes I think vma_end calculation is needed, afaik it is
to cover the last iteration, where that's the only place possible that we
may operate on "end" (where < vma->vm_end) rather than "vma->vm_end".  It
actually pairs with the initial "start" adjustment to me.

> 
> vma_lookup() will only be called if end == vma->vm_end, so next will not
> be set (and found) unless it is adjacent to the current vma and the vma
> in question does not need to be split anyways.
> 
> I also see that we use pgoff+pglen in the check, which avoids my concern
> above.

Right.

It seems so far all concerns are more or less ruled out.  I'll prepare a
formal patchset, we can continue the discussion there.

Thanks,

-- 
Peter Xu



  reply	other threads:[~2023-05-17 13:50 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-15 19:32 [PATCH v2] mm: userfaultfd: avoid passing an invalid range to vma_merge() Lorenzo Stoakes
2023-05-15 20:23 ` Liam R. Howlett
2023-05-15 21:27 ` Peter Xu
2023-05-15 22:04   ` Lorenzo Stoakes
2023-05-15 23:07     ` Lorenzo Stoakes
2023-05-16 15:06       ` Peter Xu
2023-05-16 16:49         ` Liam R. Howlett
2023-05-16 20:12           ` Peter Xu
2023-05-16 22:52             ` Liam R. Howlett
2023-05-17 13:50               ` Peter Xu [this message]
2023-05-17 22:51                 ` Liam R. Howlett
2023-05-18  0:38                   ` Peter Xu
2023-05-16 19:25         ` Lorenzo Stoakes
2023-05-16 20:30           ` Peter Xu
2023-05-16 21:01             ` Lorenzo Stoakes
2023-05-16 21:39               ` Peter Xu
2023-05-16 22:15                 ` Lorenzo Stoakes
2023-05-16 22:32                   ` Peter Xu
2023-05-17  6:21                     ` Lorenzo Stoakes
2023-05-16 22:38                 ` Liam R. Howlett
2023-05-16 22:51                   ` Peter Xu
2023-05-16 22:53                     ` Liam R. Howlett
2023-05-15 21:39 ` Peter Xu
2023-05-15 22:10   ` Lorenzo Stoakes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZGTbp9LLNYG4ILXk@x1n \
    --to=peterx@redhat.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lstoakes@gmail.com \
    --cc=mark.rutland@arm.com \
    --cc=rppt@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.