linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Peter Xu <peterx@redhat.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Jann Horn <jannh@google.com>, Pedro Falcato <pfalcato@suse.de>,
	Rik van Riel <riel@surriel.com>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCH v3 09/10] mm/mremap: permit mremap() move of multiple VMAs
Date: Fri, 11 Jul 2025 14:49:28 +0100	[thread overview]
Message-ID: <c9a86e07-1671-42ed-85b6-0b5fa4ce24c0@lucifer.local> (raw)
In-Reply-To: <dcdb3478-68ea-4d9f-af4f-2f5438de45d2@suse.cz>

On Fri, Jul 11, 2025 at 03:34:23PM +0200, Vlastimil Babka wrote:
> +cc linux-api - see the description of the new behavior below

Ah yeah :) I sent on 0/10 also. Friday...

>
> On 7/11/25 13:38, Lorenzo Stoakes wrote:
> > Historically we've made it a uAPI requirement that mremap() may only
> > operate on a single VMA at a time.
> >
> > For instances where VMAs need to be resized, this makes sense, as it
> > becomes very difficult to determine what a user actually wants should they
> > indicate a desire to expand or shrink the size of multiple VMAs (truncate?
> > Adjust sizes individually? Some other strategy?).
> >
> > However, in instances where a user is moving VMAs, it is restrictive to
> > disallow this.
> >
> > This is especially the case when anonymous mapping remap may or may not be
> > mergeable depending on whether VMAs have or have not been faulted due to
> > anon_vma assignment and folio index alignment with vma->vm_pgoff.
> >
> > Often this can result in surprising impact where a moved region is faulted,
> > then moved back and a user fails to observe a merge from otherwise
> > compatible, adjacent VMAs.
> >
> > This change allows such cases to work without the user having to be
> > cognizant of whether a prior mremap() move or other VMA operations has
> > resulted in VMA fragmentation.
> >
> > We only permit this for mremap() operations that do NOT change the size of
> > the VMA and DO specify MREMAP_MAYMOVE | MREMAP_FIXED.
> >
> > Should no VMA exist in the range, -EFAULT is returned as usual.
> >
> > If a VMA move spans a single VMA - then there is no functional change.
> >
> > Otherwise, we place additional requirements upon VMAs:
> >
> > * They must not have a userfaultfd context associated with them - this
> >   requires dropping the lock to notify users, and we want to perform the
> >   operation with the mmap write lock held throughout.
> >
> > * If file-backed, they cannot have a custom get_unmapped_area handler -
> >   this might result in MREMAP_FIXED not being honoured, which could result
> >   in unexpected positioning of VMAs in the moved region.
> >
> > There may be gaps in the range of VMAs that are moved:
> >
> >                    X        Y                       X        Y
> >                  <--->     <->                    <--->     <->
> >          |-------|   |-----| |-----|      |-------|   |-----| |-----|
> >          |   A   |   |  B  | |  C  | ---> |   A'  |   |  B' | |  C' |
> >          |-------|   |-----| |-----|      |-------|   |-----| |-----|
> >         addr                           new_addr
> >
> > The move will preserve the gaps between each VMA.
>
> AFAIU "moving a gap" doesn't mean we unmap anything pre-existing where the
> moved gap's range falls to, right? Worth pointing out explicitly.
>
> > Note that any failures encountered will result in a partial move. Since an
> > mremap() can fail at any time, this might result in only some of the VMAs
> > being moved.
> >
> > Note that failures are very rare and typically require an out of a memory
> > condition or a mapping limit condition to be hit, assuming the VMAs being
> > moved are valid.
> >
> > We don't try to assess ahead of time whether VMAs are valid according to
> > the multi VMA rules, as it would be rather unusual for a user to mix
> > uffd-enabled VMAs and/or VMAs which map unusual driver mappings that
> > specify custom get_unmapped_area() handlers in an aggregate operation.
> >
> > So we optimise for the far, far more likely case of the operation being
> > entirely permissible.
>
> Guess it's the sanest thing to do given all the cirumstances.
>
> > In the case of the move of a single VMA, the above conditions are
> > permitted. This makes the behaviour identical for a single VMA as before.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>
> Some nits:
>
> > ---
> >  mm/mremap.c | 157 +++++++++++++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 150 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index 8cb08ccea6ad..59f49de0f84e 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -69,6 +69,8 @@ struct vma_remap_struct {
> >  	enum mremap_type remap_type;	/* expand, shrink, etc. */
> >  	bool mmap_locked;		/* Is mm currently write-locked? */
> >  	unsigned long charged;		/* If VM_ACCOUNT, # pages to account. */
> > +	bool seen_vma;			/* Is >1 VMA being moved? */
>
> Seems this could be local variable of remap_move().

Yes, this is because before there _was_ some external use, but after rework
not any more. Will fix up in a fix-patch.

>
> > +	bool vmi_needs_reset;		/* Was the VMA iterator invalidated? */
> >  };
> >
> >  static pud_t *get_old_pud(struct mm_struct *mm, unsigned long addr)
> > @@ -1188,6 +1190,9 @@ static int copy_vma_and_data(struct vma_remap_struct *vrm,
> >  		*new_vma_ptr = NULL;
> >  		return -ENOMEM;
> >  	}
>
> A newline here?

I kinda thought it made sense to 'group' it with logic above, so this was
on purpose.

>
> > +	if (vma != vrm->vma)
> > +		vrm->vmi_needs_reset = true;
>
> A comment on what this condition means wouldn't hurt? Is it when "Source vma
> may have been merged into new_vma" in copy_vma(), or when not?
>

Sure will add in a fix-patch.

  reply	other threads:[~2025-07-11 13:50 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cover.1752232673.git.lorenzo.stoakes@oracle.com>
     [not found] ` <8f41e72b0543953d277e96d5e67a52f287cdbac3.1752232673.git.lorenzo.stoakes@oracle.com>
2025-07-11 13:34   ` [PATCH v3 09/10] mm/mremap: permit mremap() move of multiple VMAs Vlastimil Babka
2025-07-11 13:49     ` Lorenzo Stoakes [this message]
2025-07-25 17:11   ` Jann Horn
2025-07-25 17:27     ` Lorenzo Stoakes
2025-07-25 19:10       ` Jann Horn
2025-07-25 19:59         ` Lorenzo Stoakes
2025-07-11 13:45 ` [PATCH v3 00/10] " Lorenzo Stoakes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c9a86e07-1671-42ed-85b6-0b5fa4ce24c0@lucifer.local \
    --to=lorenzo.stoakes@oracle.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=peterx@redhat.com \
    --cc=pfalcato@suse.de \
    --cc=riel@surriel.com \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).