* [PATCH] mm/mremap: unmap full fixed target for multi-VMA moves
@ 2026-05-07 18:24 fujunjie
2026-05-11 15:05 ` Lorenzo Stoakes
0 siblings, 1 reply; 10+ messages in thread
From: fujunjie @ 2026-05-07 18:24 UTC (permalink / raw)
To: Andrew Morton, Liam R. Howlett, Lorenzo Stoakes
Cc: Vlastimil Babka, Jann Horn, Shuah Khan, linux-mm, linux-kernel,
linux-kselftest
When MREMAP_FIXED moves a range spanning multiple VMAs, remap_move()
iterates over the source VMAs and skips source holes. For each source
VMA segment it narrows vrm->new_len to that segment length before calling
mremap_to(), so the MREMAP_FIXED destination unmap is also limited to the
current segment.
If the source range contains holes and the destination range is already
mapped, the destination subranges corresponding to those source holes can
therefore survive the move. That violates MREMAP_FIXED semantics, where
the destination range is supposed to be unmapped before the move, and it
also means that preserved source gaps do not necessarily become gaps in
the destination. Userspace can then observe mappings at destination
addresses that should have become holes.
Unmap the full fixed target range once before moving the first source
segment in the multi-VMA move path. Split mremap_to() into a wrapper that
performs the MREMAP_FIXED target unmap and a lower-level __mremap_to()
helper that assumes any required target unmap has already been handled.
Then the multi-VMA path can call __mremap_to() after clearing the full
fixed target, while the single-VMA path continues to use mremap_to().
Use a common unmap_fixed_target() helper for both the ordinary mremap_to()
target unmap and the full-range target unmap. This keeps the existing
iterator invalidation and source VMA reload behaviour in one place. The
full target unmap still uses the existing uf_unmap_early path for
userfaultfd unmap notifications.
Extend the multi-VMA mremap selftest to check that the destination pages
corresponding to source holes are unmapped after mremap(). This covers
both MREMAP_FIXED and MREMAP_FIXED |
MREMAP_DONTUNMAP.
Fixes: d23cb648e365 ("mm/mremap: permit mremap() move of multiple VMAs")
Cc: stable@vger.kernel.org # 6.17+
Signed-off-by: fujunjie <fujunjie1@qq.com>
---
mm/mremap.c | 92 +++++++++++++++++-------
tools/testing/selftests/mm/mremap_test.c | 52 ++++++++++++++
2 files changed, 118 insertions(+), 26 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c
index e9c8b1d05832..e91fff2ba3ce 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1430,37 +1430,39 @@ static unsigned long shrink_vma(struct vma_remap_struct *vrm,
return 0;
}
+static unsigned long unmap_fixed_target(struct vma_remap_struct *vrm,
+ unsigned long addr, unsigned long len)
+{
+ struct mm_struct *mm = current->mm;
+ unsigned long err;
+
+ err = do_munmap(mm, addr, len, vrm->uf_unmap_early);
+ vrm->vma = NULL; /* Invalidated. */
+ vrm->vmi_needs_invalidate = true;
+ if (err)
+ return err;
+
+ /*
+ * If we remap a portion of a VMA elsewhere in the same VMA, this can
+ * invalidate the old VMA. Reset.
+ */
+ vrm->vma = vma_lookup(mm, vrm->addr);
+ if (!vrm->vma)
+ return -EFAULT;
+
+ return 0;
+}
+
/*
- * mremap_to() - remap a vma to a new location.
+ * __mremap_to() - remap a vma to a new location, assuming any required
+ * MREMAP_FIXED target unmap has already been handled.
* Returns: The new address of the vma or an error.
*/
-static unsigned long mremap_to(struct vma_remap_struct *vrm)
+static unsigned long __mremap_to(struct vma_remap_struct *vrm)
{
struct mm_struct *mm = current->mm;
unsigned long err;
- if (vrm->flags & MREMAP_FIXED) {
- /*
- * In mremap_to().
- * VMA is moved to dst address, and munmap dst first.
- * do_munmap will check if dst is sealed.
- */
- err = do_munmap(mm, vrm->new_addr, vrm->new_len,
- vrm->uf_unmap_early);
- vrm->vma = NULL; /* Invalidated. */
- vrm->vmi_needs_invalidate = true;
- if (err)
- return err;
-
- /*
- * If we remap a portion of a VMA elsewhere in the same VMA,
- * this can invalidate the old VMA. Reset.
- */
- vrm->vma = vma_lookup(mm, vrm->addr);
- if (!vrm->vma)
- return -EFAULT;
- }
-
if (vrm->remap_type == MREMAP_SHRINK) {
err = shrink_vma(vrm, /* drop_lock= */false);
if (err)
@@ -1486,6 +1488,23 @@ static unsigned long mremap_to(struct vma_remap_struct *vrm)
return move_vma(vrm);
}
+/*
+ * mremap_to() - remap a vma to a new location.
+ * Returns: The new address of the vma or an error.
+ */
+static unsigned long mremap_to(struct vma_remap_struct *vrm)
+{
+ unsigned long err;
+
+ if (vrm->flags & MREMAP_FIXED) {
+ err = unmap_fixed_target(vrm, vrm->new_addr, vrm->new_len);
+ if (err)
+ return err;
+ }
+
+ return __mremap_to(vrm);
+}
+
static int vma_expandable(struct vm_area_struct *vma, unsigned long delta)
{
unsigned long end = vma->vm_end + delta;
@@ -1882,9 +1901,11 @@ static unsigned long remap_move(struct vma_remap_struct *vrm)
unsigned long start = vrm->addr;
unsigned long end = vrm->addr + vrm->old_len;
unsigned long new_addr = vrm->new_addr;
+ unsigned long new_len = vrm->new_len;
unsigned long target_addr = new_addr;
unsigned long res = -EFAULT;
unsigned long last_end;
+ bool fixed_target_unmapped = false;
bool seen_vma = false;
VMA_ITERATOR(vmi, current->mm, start);
@@ -1939,8 +1960,27 @@ static unsigned long remap_move(struct vma_remap_struct *vrm)
}
res_vma = check_prep_vma(vrm);
- if (!res_vma)
- res_vma = mremap_to(vrm);
+ /*
+ * remap_move() narrows vrm->new_len to the current source VMA
+ * segment before calling mremap_to(). For sparse source ranges
+ * this would leave target ranges corresponding to source holes
+ * mapped.
+ *
+ * If the requested source range extends beyond the first VMA,
+ * unmap the full fixed target range once, using the original
+ * new_addr/new_len, before moving any segment.
+ */
+ if (!res_vma && !seen_vma && vma->vm_end < end) {
+ res_vma = unmap_fixed_target(vrm, new_addr, new_len);
+ if (!res_vma)
+ fixed_target_unmapped = true;
+ }
+ if (!res_vma) {
+ if (fixed_target_unmapped)
+ res_vma = __mremap_to(vrm);
+ else
+ res_vma = mremap_to(vrm);
+ }
if (IS_ERR_VALUE(res_vma))
return res_vma;
diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
index 308576437228..ebbe1aa42ef0 100644
--- a/tools/testing/selftests/mm/mremap_test.c
+++ b/tools/testing/selftests/mm/mremap_test.c
@@ -419,6 +419,54 @@ static bool is_multiple_vma_range_ok(unsigned int pattern_seed,
return true;
}
+static bool is_range_unmapped(void *addr, size_t size)
+{
+ void *ptr;
+
+ ptr = mmap(addr, size, PROT_NONE,
+ MAP_PRIVATE | MAP_ANON | MAP_FIXED_NOREPLACE, -1, 0);
+ if (ptr == MAP_FAILED) {
+ if (errno == EEXIST)
+ ksft_print_msg("range %p-%p is still mapped\n",
+ addr, (char *)addr + size);
+ else
+ perror("mmap MAP_FIXED_NOREPLACE");
+ return false;
+ }
+
+ if (ptr != addr) {
+ ksft_print_msg("mmap MAP_FIXED_NOREPLACE returned %p, expected %p\n",
+ ptr, addr);
+ munmap(ptr, size);
+ return false;
+ }
+
+ if (munmap(ptr, size)) {
+ perror("munmap MAP_FIXED_NOREPLACE probe");
+ return false;
+ }
+
+ return true;
+}
+
+static bool multiple_vma_holes_unmapped(char *ptr, unsigned long page_size)
+{
+ static const int holes[] = { 1, 3, 7, 9 };
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(holes); i++) {
+ char *addr = &ptr[holes[i] * page_size];
+
+ if (!is_range_unmapped(addr, page_size)) {
+ ksft_print_msg("target hole page %d is mapped\n",
+ holes[i]);
+ return false;
+ }
+ }
+
+ return true;
+}
+
static void mremap_move_multiple_vmas(unsigned int pattern_seed,
unsigned long page_size,
bool dont_unmap)
@@ -529,6 +577,10 @@ static void mremap_move_multiple_vmas(unsigned int pattern_seed,
success = false;
goto out_unmap;
}
+ if (!multiple_vma_holes_unmapped(tgt_ptr, page_size)) {
+ success = false;
+ goto out_unmap;
+ }
out_unmap:
if (munmap(tgt_ptr, 2 * size))
base-commit: 1b55f8358e35a67bf3969339ea7b86988af92f66
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH] mm/mremap: unmap full fixed target for multi-VMA moves 2026-05-07 18:24 [PATCH] mm/mremap: unmap full fixed target for multi-VMA moves fujunjie @ 2026-05-11 15:05 ` Lorenzo Stoakes 2026-05-11 15:19 ` Jann Horn 0 siblings, 1 reply; 10+ messages in thread From: Lorenzo Stoakes @ 2026-05-11 15:05 UTC (permalink / raw) To: fujunjie Cc: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Jann Horn, Shuah Khan, linux-mm, linux-kernel, linux-kselftest On Thu, May 07, 2026 at 06:24:45PM +0000, fujunjie wrote: > When MREMAP_FIXED moves a range spanning multiple VMAs, remap_move() > iterates over the source VMAs and skips source holes. For each source > VMA segment it narrows vrm->new_len to that segment length before calling > mremap_to(), so the MREMAP_FIXED destination unmap is also limited to the > current segment. > > If the source range contains holes and the destination range is already > mapped, the destination subranges corresponding to those source holes can > therefore survive the move. That violates MREMAP_FIXED semantics, where > the destination range is supposed to be unmapped before the move, and it > also means that preserved source gaps do not necessarily become gaps in > the destination. Userspace can then observe mappings at destination > addresses that should have become holes. Hmmm I think it's a bit debateable honestly. The ability to handle there being gaps is a _new thing_, so there are no semantics to speak of. Prevoiusly mremap() simply required that you only span across a single VMA. And the idea is that we're, conceptually, batching up an operation like moving: |----| |-------| |-----| | | | | | | |----| |-------| |-----| To: |----| |-------| |-----| | | | | | | |----| |-------| |-----| Now if there was an existing VMA: |...................................| | | |...................................| The result of iterating through each and moving them individually would be: |...|----|.....|-------|..|-----|...| | | | | | | | | |...|----|.....|-------|..|-----|...| So it's not obvious that the source range spanning holes is actually indicative of the caller wanting an unmap in each of these places. The only semantics there, really, are that you are going to unmap the ranges the destination VMA is going to span, which of course this still does. I would argue that the semantics of 'aggregate does the same as if you were to perform the operation individually on VMAs' trumps somebody assuming you're 'moving the gaps'. So I think it's really not obvious that we should be treating unmapped areas as if they were VMAs and to move 'space' too... I'm also concerned that now we're changing a fundamental behaviour that users might rely upon (as weird as it sounds). I mean changing this would arguably be breaking userland. Really the intent was to allow somebody to not have to worry about unmerged VMAs, for cases where the VMAs _have no gaps_, and the ability to tolerate gaps was added as a nice-to-have. You also have to account for the fact that a partial failure may occur, at which point you've done a rather strange, partial, unmap. Is there an actual use case that wants this behaviour? > > Unmap the full fixed target range once before moving the first source > segment in the multi-VMA move path. Split mremap_to() into a wrapper that > performs the MREMAP_FIXED target unmap and a lower-level __mremap_to() > helper that assumes any required target unmap has already been handled. > Then the multi-VMA path can call __mremap_to() after clearing the full > fixed target, while the single-VMA path continues to use mremap_to(). > > Use a common unmap_fixed_target() helper for both the ordinary mremap_to() > target unmap and the full-range target unmap. This keeps the existing > iterator invalidation and source VMA reload behaviour in one place. The > full target unmap still uses the existing uf_unmap_early path for > userfaultfd unmap notifications. > > Extend the multi-VMA mremap selftest to check that the destination pages > corresponding to source holes are unmapped after mremap(). This covers > both MREMAP_FIXED and MREMAP_FIXED | > MREMAP_DONTUNMAP. > > Fixes: d23cb648e365 ("mm/mremap: permit mremap() move of multiple VMAs") > Cc: stable@vger.kernel.org # 6.17+ > Signed-off-by: fujunjie <fujunjie1@qq.com> > --- > mm/mremap.c | 92 +++++++++++++++++------- > tools/testing/selftests/mm/mremap_test.c | 52 ++++++++++++++ > 2 files changed, 118 insertions(+), 26 deletions(-) > > diff --git a/mm/mremap.c b/mm/mremap.c > index e9c8b1d05832..e91fff2ba3ce 100644 > --- a/mm/mremap.c > +++ b/mm/mremap.c > @@ -1430,37 +1430,39 @@ static unsigned long shrink_vma(struct vma_remap_struct *vrm, > return 0; > } > > +static unsigned long unmap_fixed_target(struct vma_remap_struct *vrm, > + unsigned long addr, unsigned long len) > +{ > + struct mm_struct *mm = current->mm; > + unsigned long err; > + > + err = do_munmap(mm, addr, len, vrm->uf_unmap_early); > + vrm->vma = NULL; /* Invalidated. */ > + vrm->vmi_needs_invalidate = true; > + if (err) > + return err; > + > + /* > + * If we remap a portion of a VMA elsewhere in the same VMA, this can > + * invalidate the old VMA. Reset. > + */ > + vrm->vma = vma_lookup(mm, vrm->addr); > + if (!vrm->vma) > + return -EFAULT; > + > + return 0; > +} It might be nice to factor this out even if we don't take this patch... Be good to also update the unmap_source_vma() case to to use it if we can. We'd want in that case though to figure out if we want to lookup the VMA though, and then optionally have a VMI... anyway :) > + > /* > - * mremap_to() - remap a vma to a new location. > + * __mremap_to() - remap a vma to a new location, assuming any required > + * MREMAP_FIXED target unmap has already been handled. > * Returns: The new address of the vma or an error. > */ > -static unsigned long mremap_to(struct vma_remap_struct *vrm) > +static unsigned long __mremap_to(struct vma_remap_struct *vrm) > { > struct mm_struct *mm = current->mm; > unsigned long err; > > - if (vrm->flags & MREMAP_FIXED) { > - /* > - * In mremap_to(). > - * VMA is moved to dst address, and munmap dst first. > - * do_munmap will check if dst is sealed. > - */ > - err = do_munmap(mm, vrm->new_addr, vrm->new_len, > - vrm->uf_unmap_early); > - vrm->vma = NULL; /* Invalidated. */ > - vrm->vmi_needs_invalidate = true; > - if (err) > - return err; > - > - /* > - * If we remap a portion of a VMA elsewhere in the same VMA, > - * this can invalidate the old VMA. Reset. > - */ > - vrm->vma = vma_lookup(mm, vrm->addr); > - if (!vrm->vma) > - return -EFAULT; > - } > - > if (vrm->remap_type == MREMAP_SHRINK) { > err = shrink_vma(vrm, /* drop_lock= */false); > if (err) > @@ -1486,6 +1488,23 @@ static unsigned long mremap_to(struct vma_remap_struct *vrm) > return move_vma(vrm); > } > > +/* > + * mremap_to() - remap a vma to a new location. > + * Returns: The new address of the vma or an error. > + */ > +static unsigned long mremap_to(struct vma_remap_struct *vrm) > +{ > + unsigned long err; > + > + if (vrm->flags & MREMAP_FIXED) { > + err = unmap_fixed_target(vrm, vrm->new_addr, vrm->new_len); > + if (err) > + return err; > + } > + > + return __mremap_to(vrm); > +} > + > static int vma_expandable(struct vm_area_struct *vma, unsigned long delta) > { > unsigned long end = vma->vm_end + delta; > @@ -1882,9 +1901,11 @@ static unsigned long remap_move(struct vma_remap_struct *vrm) > unsigned long start = vrm->addr; > unsigned long end = vrm->addr + vrm->old_len; > unsigned long new_addr = vrm->new_addr; > + unsigned long new_len = vrm->new_len; > unsigned long target_addr = new_addr; > unsigned long res = -EFAULT; > unsigned long last_end; > + bool fixed_target_unmapped = false; > bool seen_vma = false; > > VMA_ITERATOR(vmi, current->mm, start); > @@ -1939,8 +1960,27 @@ static unsigned long remap_move(struct vma_remap_struct *vrm) > } > > res_vma = check_prep_vma(vrm); > - if (!res_vma) > - res_vma = mremap_to(vrm); > + /* > + * remap_move() narrows vrm->new_len to the current source VMA > + * segment before calling mremap_to(). For sparse source ranges > + * this would leave target ranges corresponding to source holes > + * mapped. > + * > + * If the requested source range extends beyond the first VMA, > + * unmap the full fixed target range once, using the original > + * new_addr/new_len, before moving any segment. > + */ Hmm yeah one batched up operation to unmap the whole range? That's not correct from the point of view of partial failure cases, now you're unmapping _the whole range_. I think it's not useful to reference the 'narrowing', I think it's confusing to use 'new_len' to mean 'unnarrowed new_len', which of course has to be equal to old_len to even be here. You'd need to rename that to something sensible, like range_len. Also that original length might include a trailing space _after_ the end of the range. So you might end up unmapping stuff _past the end of the range_ of the range you are moving, which might unmap real stuff, which is completely incorrect. Honestly I'm really thinking that we don't want this behaviour. It's simpler to treat this as aggregated mremap()'s, and if a user needs this kind of behaviour they can unmap the target range themselves, it's not difficult. Maybe the real TODO here is to update the man page to make this clear. > + if (!res_vma && !seen_vma && vma->vm_end < end) { > + res_vma = unmap_fixed_target(vrm, new_addr, new_len); > + if (!res_vma) > + fixed_target_unmapped = true; > + } > + if (!res_vma) { > + if (fixed_target_unmapped) > + res_vma = __mremap_to(vrm); > + else > + res_vma = mremap_to(vrm); > + } I don't love this, we're taking an already badly named function (mea culpa, though in <kinda> my defence I was continuing a tradition at least :) And now you're adding more complexity and more state just to figure out if you need to unmap here, yeah again, I think the existing behaviour is better overall. > if (IS_ERR_VALUE(res_vma)) > return res_vma; > > diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c > index 308576437228..ebbe1aa42ef0 100644 > --- a/tools/testing/selftests/mm/mremap_test.c > +++ b/tools/testing/selftests/mm/mremap_test.c > @@ -419,6 +419,54 @@ static bool is_multiple_vma_range_ok(unsigned int pattern_seed, > return true; > } > > +static bool is_range_unmapped(void *addr, size_t size) > +{ > + void *ptr; > + > + ptr = mmap(addr, size, PROT_NONE, > + MAP_PRIVATE | MAP_ANON | MAP_FIXED_NOREPLACE, -1, 0); > + if (ptr == MAP_FAILED) { > + if (errno == EEXIST) > + ksft_print_msg("range %p-%p is still mapped\n", > + addr, (char *)addr + size); > + else > + perror("mmap MAP_FIXED_NOREPLACE"); > + return false; > + } > + > + if (ptr != addr) { > + ksft_print_msg("mmap MAP_FIXED_NOREPLACE returned %p, expected %p\n", > + ptr, addr); > + munmap(ptr, size); > + return false; > + } > + > + if (munmap(ptr, size)) { > + perror("munmap MAP_FIXED_NOREPLACE probe"); > + return false; > + } > + > + return true; > +} This is not a great way of figuring that out, you can use the procmap_query stuff to figure out it out from vm_util.h, see open_procmap(), query_procmap(), find_vma_procmap(), etc. > + > +static bool multiple_vma_holes_unmapped(char *ptr, unsigned long page_size) > +{ > + static const int holes[] = { 1, 3, 7, 9 }; > + int i; > + > + for (i = 0; i < ARRAY_SIZE(holes); i++) { > + char *addr = &ptr[holes[i] * page_size]; > + > + if (!is_range_unmapped(addr, page_size)) { > + ksft_print_msg("target hole page %d is mapped\n", > + holes[i]); > + return false; > + } > + } > + > + return true; > +} > + > static void mremap_move_multiple_vmas(unsigned int pattern_seed, > unsigned long page_size, > bool dont_unmap) > @@ -529,6 +577,10 @@ static void mremap_move_multiple_vmas(unsigned int pattern_seed, > success = false; > goto out_unmap; > } > + if (!multiple_vma_holes_unmapped(tgt_ptr, page_size)) { > + success = false; > + goto out_unmap; > + } > > out_unmap: > if (munmap(tgt_ptr, 2 * size)) > > base-commit: 1b55f8358e35a67bf3969339ea7b86988af92f66 > -- > 2.34.1 > Cheers, Lorenzo ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/mremap: unmap full fixed target for multi-VMA moves 2026-05-11 15:05 ` Lorenzo Stoakes @ 2026-05-11 15:19 ` Jann Horn 2026-05-11 15:32 ` Lorenzo Stoakes 0 siblings, 1 reply; 10+ messages in thread From: Jann Horn @ 2026-05-11 15:19 UTC (permalink / raw) To: Lorenzo Stoakes Cc: fujunjie, Andrew Morton, Liam R. Howlett, Vlastimil Babka, Shuah Khan, linux-mm, linux-kernel, linux-kselftest On Mon, May 11, 2026 at 5:05 PM Lorenzo Stoakes <ljs@kernel.org> wrote: > On Thu, May 07, 2026 at 06:24:45PM +0000, fujunjie wrote: > > When MREMAP_FIXED moves a range spanning multiple VMAs, remap_move() > > iterates over the source VMAs and skips source holes. For each source > > VMA segment it narrows vrm->new_len to that segment length before calling > > mremap_to(), so the MREMAP_FIXED destination unmap is also limited to the > > current segment. > > > > If the source range contains holes and the destination range is already > > mapped, the destination subranges corresponding to those source holes can > > therefore survive the move. That violates MREMAP_FIXED semantics, where > > the destination range is supposed to be unmapped before the move, and it > > also means that preserved source gaps do not necessarily become gaps in > > the destination. Userspace can then observe mappings at destination > > addresses that should have become holes. > > Hmmm I think it's a bit debateable honestly. The ability to handle there being > gaps is a _new thing_, so there are no semantics to speak of. Prevoiusly > mremap() simply required that you only span across a single VMA. FWIW, I think mremap() on a source region with gaps is such a hazardous operation that nearly no userspace code should be doing it - gaps are areas in which any mmap() call without a fixed address could place unrelated mappings (unless stack VMAs are involved, which would also be a weird scenario), so to use it safely, you have to, among other things, make sure not to use libc malloc() at a time when that could place an allocation in the gap (which means you also can't use printf(), and so on, unless you have swapped out the memory allocator), and make sure that you have no other threads that could be doing that, and so on. There are rare circumstances under which it could be safe, but I think it is almost always better to have a PROT_NONE anonymous VMA or such as a placeholder. I think the right documentation for this is "do not use this on a source region with gaps, it is technically possible but extremely hazardous". ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/mremap: unmap full fixed target for multi-VMA moves 2026-05-11 15:19 ` Jann Horn @ 2026-05-11 15:32 ` Lorenzo Stoakes 2026-05-11 15:40 ` Jann Horn 0 siblings, 1 reply; 10+ messages in thread From: Lorenzo Stoakes @ 2026-05-11 15:32 UTC (permalink / raw) To: Jann Horn Cc: fujunjie, Andrew Morton, Liam R. Howlett, Vlastimil Babka, Shuah Khan, linux-mm, linux-kernel, linux-kselftest On Mon, May 11, 2026 at 05:19:50PM +0200, Jann Horn wrote: > On Mon, May 11, 2026 at 5:05 PM Lorenzo Stoakes <ljs@kernel.org> wrote: > > On Thu, May 07, 2026 at 06:24:45PM +0000, fujunjie wrote: > > > When MREMAP_FIXED moves a range spanning multiple VMAs, remap_move() > > > iterates over the source VMAs and skips source holes. For each source > > > VMA segment it narrows vrm->new_len to that segment length before calling > > > mremap_to(), so the MREMAP_FIXED destination unmap is also limited to the > > > current segment. > > > > > > If the source range contains holes and the destination range is already > > > mapped, the destination subranges corresponding to those source holes can > > > therefore survive the move. That violates MREMAP_FIXED semantics, where > > > the destination range is supposed to be unmapped before the move, and it > > > also means that preserved source gaps do not necessarily become gaps in > > > the destination. Userspace can then observe mappings at destination > > > addresses that should have become holes. > > > > Hmmm I think it's a bit debateable honestly. The ability to handle there being > > gaps is a _new thing_, so there are no semantics to speak of. Prevoiusly > > mremap() simply required that you only span across a single VMA. > > FWIW, I think mremap() on a source region with gaps is such a > hazardous operation that nearly no userspace code should be doing it - > gaps are areas in which any mmap() call without a fixed address could > place unrelated mappings (unless stack VMAs are involved, which would > also be a weird scenario), so to use it safely, you have to, among > other things, make sure not to use libc malloc() at a time when that > could place an allocation in the gap (which means you also can't use > printf(), and so on, unless you have swapped out the memory > allocator), and make sure that you have no other threads that could be > doing that, and so on. There are rare circumstances under which it > could be safe, but I think it is almost always better to have a > PROT_NONE anonymous VMA or such as a placeholder. Well, we're holding the mmap write lock so none of that could happen _during_ the operation right? You might debate also the fact we hold that for an extended period. Honestly I probably shouldn't have allowed for this, I've had to do at least one fixup relating to it I seem to recall and the semantics are _clearly_ confusing. > > I think the right documentation for this is "do not use this on a > source region with gaps, it is technically possible but extremely > hazardous". Yeah, I mean on reflection, allowing it was probably a mistake. The real use case was 'my VMAs are fragmented and I don't want to have to know about VMA merge rules in order to move them', i.e. no gaps. I will do a manpage update and indicate that it probably shouldn't be used but if it is, the sematics are such that gaps are not propagated (i.e. it is as if you mremap()'d each individually). Cheers, Lorenzo ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/mremap: unmap full fixed target for multi-VMA moves 2026-05-11 15:32 ` Lorenzo Stoakes @ 2026-05-11 15:40 ` Jann Horn 2026-05-11 16:00 ` Lorenzo Stoakes 0 siblings, 1 reply; 10+ messages in thread From: Jann Horn @ 2026-05-11 15:40 UTC (permalink / raw) To: Lorenzo Stoakes Cc: fujunjie, Andrew Morton, Liam R. Howlett, Vlastimil Babka, Shuah Khan, linux-mm, linux-kernel, linux-kselftest On Mon, May 11, 2026 at 5:32 PM Lorenzo Stoakes <ljs@kernel.org> wrote: > On Mon, May 11, 2026 at 05:19:50PM +0200, Jann Horn wrote: > > On Mon, May 11, 2026 at 5:05 PM Lorenzo Stoakes <ljs@kernel.org> wrote: > > > Hmmm I think it's a bit debateable honestly. The ability to handle there being > > > gaps is a _new thing_, so there are no semantics to speak of. Prevoiusly > > > mremap() simply required that you only span across a single VMA. > > > > FWIW, I think mremap() on a source region with gaps is such a > > hazardous operation that nearly no userspace code should be doing it - > > gaps are areas in which any mmap() call without a fixed address could > > place unrelated mappings (unless stack VMAs are involved, which would > > also be a weird scenario), so to use it safely, you have to, among > > other things, make sure not to use libc malloc() at a time when that > > could place an allocation in the gap (which means you also can't use > > printf(), and so on, unless you have swapped out the memory > > allocator), and make sure that you have no other threads that could be > > doing that, and so on. There are rare circumstances under which it > > could be safe, but I think it is almost always better to have a > > PROT_NONE anonymous VMA or such as a placeholder. > > Well, we're holding the mmap write lock so none of that could happen > _during_ the operation right? Not during the operation, but right before the operation. So from the userspace perspective, you have to know that there are no concurrent threads that could be creating memory mappings at non-fixed addresses, and you have to know that no mappings can have been created in the memory range between when you checked that it's empty and when you make the syscall. > You might debate also the fact we hold that for an extended period. Eh, I mean, that's also true if you call mmap() with MAP_FIXED on a gigantic virtual address region with lots of populated PTEs in it or such, or if you call mprotect() on a big region. I don't think it is problematic that there are some very chonky MM syscalls you can make that will hold the mmap_lock for a long time, as long as we don't expect heavily multithreaded code to be doing those operations frequently. (Being able to keep the mmap lock held in write mode for a long time can be useful as an exploitation trick in some cases, but I don't think that's easy enough to fix to be worth the trouble of addressing it.) > Honestly I probably shouldn't have allowed for this, I've had to do at > least one fixup relating to it I seem to recall and the semantics are > _clearly_ confusing. > > > > > I think the right documentation for this is "do not use this on a > > source region with gaps, it is technically possible but extremely > > hazardous". > > Yeah, I mean on reflection, allowing it was probably a mistake. > > The real use case was 'my VMAs are fragmented and I don't want to have to > know about VMA merge rules in order to move them', i.e. no gaps. > > I will do a manpage update and indicate that it probably shouldn't be used > but if it is, the sematics are such that gaps are not propagated (i.e. it > is as if you mremap()'d each individually). Thanks! ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/mremap: unmap full fixed target for multi-VMA moves 2026-05-11 15:40 ` Jann Horn @ 2026-05-11 16:00 ` Lorenzo Stoakes 2026-05-11 16:02 ` Jann Horn 0 siblings, 1 reply; 10+ messages in thread From: Lorenzo Stoakes @ 2026-05-11 16:00 UTC (permalink / raw) To: Jann Horn Cc: fujunjie, Andrew Morton, Liam R. Howlett, Vlastimil Babka, Shuah Khan, linux-mm, linux-kernel, linux-kselftest On Mon, May 11, 2026 at 05:40:24PM +0200, Jann Horn wrote: > On Mon, May 11, 2026 at 5:32 PM Lorenzo Stoakes <ljs@kernel.org> wrote: > > On Mon, May 11, 2026 at 05:19:50PM +0200, Jann Horn wrote: > > > On Mon, May 11, 2026 at 5:05 PM Lorenzo Stoakes <ljs@kernel.org> wrote: > > > > Hmmm I think it's a bit debateable honestly. The ability to handle there being > > > > gaps is a _new thing_, so there are no semantics to speak of. Prevoiusly > > > > mremap() simply required that you only span across a single VMA. > > > > > > FWIW, I think mremap() on a source region with gaps is such a > > > hazardous operation that nearly no userspace code should be doing it - > > > gaps are areas in which any mmap() call without a fixed address could > > > place unrelated mappings (unless stack VMAs are involved, which would > > > also be a weird scenario), so to use it safely, you have to, among > > > other things, make sure not to use libc malloc() at a time when that > > > could place an allocation in the gap (which means you also can't use > > > printf(), and so on, unless you have swapped out the memory > > > allocator), and make sure that you have no other threads that could be > > > doing that, and so on. There are rare circumstances under which it > > > could be safe, but I think it is almost always better to have a > > > PROT_NONE anonymous VMA or such as a placeholder. > > > > Well, we're holding the mmap write lock so none of that could happen > > _during_ the operation right? > > Not during the operation, but right before the operation. So from the > userspace perspective, you have to know that there are no concurrent > threads that could be creating memory mappings at non-fixed addresses, > and you have to know that no mappings can have been created in the > memory range between when you checked that it's empty and when you > make the syscall. That's a very good point :) But I guess applies to any operations that operate over a range of mappings anyway (madvise() lets you also do this, though it'll give an error code _at the end_ _after having done the operations_ if there are gaps). So madvise() can have the exact same thing happen right? which is... fun :) I actually wonder if we shouldn't just change this to disallow gaps. It'd simplify the code and we could even do the check upfront in one pass. It's doubtful anybody is relying on the gaps behaviour for anything real. > > > You might debate also the fact we hold that for an extended period. > > Eh, I mean, that's also true if you call mmap() with MAP_FIXED on a > gigantic virtual address region with lots of populated PTEs in it or > such, or if you call mprotect() on a big region. I don't think it is > problematic that there are some very chonky MM syscalls you can make > that will hold the mmap_lock for a long time, as long as we don't > expect heavily multithreaded code to be doing those operations > frequently. (Being able to keep the mmap lock held in write mode for a > long time can be useful as an exploitation trick in some cases, but I > don't think that's easy enough to fix to be worth the trouble of > addressing it.) Yeah true. > > > Honestly I probably shouldn't have allowed for this, I've had to do at > > least one fixup relating to it I seem to recall and the semantics are > > _clearly_ confusing. > > > > > > > > I think the right documentation for this is "do not use this on a > > > source region with gaps, it is technically possible but extremely > > > hazardous". > > > > Yeah, I mean on reflection, allowing it was probably a mistake. > > > > The real use case was 'my VMAs are fragmented and I don't want to have to > > know about VMA merge rules in order to move them', i.e. no gaps. > > > > I will do a manpage update and indicate that it probably shouldn't be used > > but if it is, the sematics are such that gaps are not propagated (i.e. it > > is as if you mremap()'d each individually). > > Thanks! Cheers, Lorenzo ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/mremap: unmap full fixed target for multi-VMA moves 2026-05-11 16:00 ` Lorenzo Stoakes @ 2026-05-11 16:02 ` Jann Horn 2026-05-11 16:30 ` Lorenzo Stoakes 0 siblings, 1 reply; 10+ messages in thread From: Jann Horn @ 2026-05-11 16:02 UTC (permalink / raw) To: Lorenzo Stoakes Cc: fujunjie, Andrew Morton, Liam R. Howlett, Vlastimil Babka, Shuah Khan, linux-mm, linux-kernel, linux-kselftest On Mon, May 11, 2026 at 6:00 PM Lorenzo Stoakes <ljs@kernel.org> wrote: > On Mon, May 11, 2026 at 05:40:24PM +0200, Jann Horn wrote: > > On Mon, May 11, 2026 at 5:32 PM Lorenzo Stoakes <ljs@kernel.org> wrote: > > > On Mon, May 11, 2026 at 05:19:50PM +0200, Jann Horn wrote: > > > > On Mon, May 11, 2026 at 5:05 PM Lorenzo Stoakes <ljs@kernel.org> wrote: > > > > > Hmmm I think it's a bit debateable honestly. The ability to handle there being > > > > > gaps is a _new thing_, so there are no semantics to speak of. Prevoiusly > > > > > mremap() simply required that you only span across a single VMA. > > > > > > > > FWIW, I think mremap() on a source region with gaps is such a > > > > hazardous operation that nearly no userspace code should be doing it - > > > > gaps are areas in which any mmap() call without a fixed address could > > > > place unrelated mappings (unless stack VMAs are involved, which would > > > > also be a weird scenario), so to use it safely, you have to, among > > > > other things, make sure not to use libc malloc() at a time when that > > > > could place an allocation in the gap (which means you also can't use > > > > printf(), and so on, unless you have swapped out the memory > > > > allocator), and make sure that you have no other threads that could be > > > > doing that, and so on. There are rare circumstances under which it > > > > could be safe, but I think it is almost always better to have a > > > > PROT_NONE anonymous VMA or such as a placeholder. > > > > > > Well, we're holding the mmap write lock so none of that could happen > > > _during_ the operation right? > > > > Not during the operation, but right before the operation. So from the > > userspace perspective, you have to know that there are no concurrent > > threads that could be creating memory mappings at non-fixed addresses, > > and you have to know that no mappings can have been created in the > > memory range between when you checked that it's empty and when you > > make the syscall. > > That's a very good point :) > > But I guess applies to any operations that operate over a range of mappings > anyway (madvise() lets you also do this, though it'll give an error code > _at the end_ _after having done the operations_ if there are gaps). > > So madvise() can have the exact same thing happen right? which is... fun :) > > I actually wonder if we shouldn't just change this to disallow gaps. It'd That would be nice... > simplify the code and we could even do the check upfront in one pass. It's > doubtful anybody is relying on the gaps behaviour for anything real. (I remember that in the past, there were concerns that MM operations with multiple passes are slow, but I guess mremap() is probably not such a hot operation that that's a concern here.) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/mremap: unmap full fixed target for multi-VMA moves 2026-05-11 16:02 ` Jann Horn @ 2026-05-11 16:30 ` Lorenzo Stoakes 2026-05-12 7:24 ` Fujunjie 0 siblings, 1 reply; 10+ messages in thread From: Lorenzo Stoakes @ 2026-05-11 16:30 UTC (permalink / raw) To: Jann Horn Cc: fujunjie, Andrew Morton, Liam R. Howlett, Vlastimil Babka, Shuah Khan, linux-mm, linux-kernel, linux-kselftest On Mon, May 11, 2026 at 06:02:37PM +0200, Jann Horn wrote: > On Mon, May 11, 2026 at 6:00 PM Lorenzo Stoakes <ljs@kernel.org> wrote: > > On Mon, May 11, 2026 at 05:40:24PM +0200, Jann Horn wrote: > > > On Mon, May 11, 2026 at 5:32 PM Lorenzo Stoakes <ljs@kernel.org> wrote: > > > > On Mon, May 11, 2026 at 05:19:50PM +0200, Jann Horn wrote: > > > > > On Mon, May 11, 2026 at 5:05 PM Lorenzo Stoakes <ljs@kernel.org> wrote: > > > > > > Hmmm I think it's a bit debateable honestly. The ability to handle there being > > > > > > gaps is a _new thing_, so there are no semantics to speak of. Prevoiusly > > > > > > mremap() simply required that you only span across a single VMA. > > > > > > > > > > FWIW, I think mremap() on a source region with gaps is such a > > > > > hazardous operation that nearly no userspace code should be doing it - > > > > > gaps are areas in which any mmap() call without a fixed address could > > > > > place unrelated mappings (unless stack VMAs are involved, which would > > > > > also be a weird scenario), so to use it safely, you have to, among > > > > > other things, make sure not to use libc malloc() at a time when that > > > > > could place an allocation in the gap (which means you also can't use > > > > > printf(), and so on, unless you have swapped out the memory > > > > > allocator), and make sure that you have no other threads that could be > > > > > doing that, and so on. There are rare circumstances under which it > > > > > could be safe, but I think it is almost always better to have a > > > > > PROT_NONE anonymous VMA or such as a placeholder. > > > > > > > > Well, we're holding the mmap write lock so none of that could happen > > > > _during_ the operation right? > > > > > > Not during the operation, but right before the operation. So from the > > > userspace perspective, you have to know that there are no concurrent > > > threads that could be creating memory mappings at non-fixed addresses, > > > and you have to know that no mappings can have been created in the > > > memory range between when you checked that it's empty and when you > > > make the syscall. > > > > That's a very good point :) > > > > But I guess applies to any operations that operate over a range of mappings > > anyway (madvise() lets you also do this, though it'll give an error code > > _at the end_ _after having done the operations_ if there are gaps). > > > > So madvise() can have the exact same thing happen right? which is... fun :) > > > > I actually wonder if we shouldn't just change this to disallow gaps. It'd > > That would be nice... Yeah let's go with that then I think :) > > > simplify the code and we could even do the check upfront in one pass. It's > > doubtful anybody is relying on the gaps behaviour for anything real. > > (I remember that in the past, there were concerns that MM operations > with multiple passes are slow, but I guess mremap() is probably not > such a hot operation that that's a concern here.) Yeah not at all, it's rather an expensive operation and if it's on your hot path then you did something wrong :) Cheers, Lorenzo ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/mremap: unmap full fixed target for multi-VMA moves 2026-05-11 16:30 ` Lorenzo Stoakes @ 2026-05-12 7:24 ` Fujunjie 2026-05-12 13:59 ` Lorenzo Stoakes 0 siblings, 1 reply; 10+ messages in thread From: Fujunjie @ 2026-05-12 7:24 UTC (permalink / raw) To: Lorenzo Stoakes, Jann Horn Cc: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Shuah Khan, linux-mm, linux-kernel, linux-kselftest On 5/12/2026 12:30 AM, Lorenzo Stoakes wrote: > Yeah let's go with that then I think :) Thanks Lorenzo and Jann. I agree that the patch as posted goes in the wrong direction by treating source gaps as ranges that should be propagated to the destination. Please drop this patch. If disallowing source gaps in this multi-VMA move path still sounds right, I can look into a v2/new patch that rejects sparse source ranges up front instead. Thanks, fujunjie ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/mremap: unmap full fixed target for multi-VMA moves 2026-05-12 7:24 ` Fujunjie @ 2026-05-12 13:59 ` Lorenzo Stoakes 0 siblings, 0 replies; 10+ messages in thread From: Lorenzo Stoakes @ 2026-05-12 13:59 UTC (permalink / raw) To: Fujunjie Cc: Jann Horn, Andrew Morton, Liam R. Howlett, Vlastimil Babka, Shuah Khan, linux-mm, linux-kernel, linux-kselftest On Tue, May 12, 2026 at 03:24:24PM +0800, Fujunjie wrote: > > > On 5/12/2026 12:30 AM, Lorenzo Stoakes wrote: > > Yeah let's go with that then I think :) > > > Thanks Lorenzo and Jann. > > I agree that the patch as posted goes in the wrong direction by treating > source gaps as ranges that should be propagated to the destination. Please > drop this patch. > > If disallowing source gaps in this multi-VMA move path still sounds right, > I can look into a v2/new patch that rejects sparse source ranges up front > instead. Sorry I really want to think about this a little more + make sure the semantics are sane, so I'd rather take a look at this myself. I do really hate to take something away from newer contributors but in this case things are quite fiddly so I want to make sure everything's correct :) > > Thanks, > fujunjie > Thanks, Lorenzo ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-05-12 13:59 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-05-07 18:24 [PATCH] mm/mremap: unmap full fixed target for multi-VMA moves fujunjie 2026-05-11 15:05 ` Lorenzo Stoakes 2026-05-11 15:19 ` Jann Horn 2026-05-11 15:32 ` Lorenzo Stoakes 2026-05-11 15:40 ` Jann Horn 2026-05-11 16:00 ` Lorenzo Stoakes 2026-05-11 16:02 ` Jann Horn 2026-05-11 16:30 ` Lorenzo Stoakes 2026-05-12 7:24 ` Fujunjie 2026-05-12 13:59 ` Lorenzo Stoakes
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox