* [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs
@ 2025-07-07 5:27 Lorenzo Stoakes
2025-07-07 5:27 ` [PATCH 01/10] mm/mremap: perform some simple cleanups Lorenzo Stoakes
` (11 more replies)
0 siblings, 12 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-07-07 5:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Jann Horn, Pedro Falcato,
Rik van Riel, linux-mm, linux-fsdevel, linux-kernel,
linux-kselftest
Historically we've made it a uAPI requirement that mremap() may only
operate on a single VMA at a time.
For instances where VMAs need to be resized, this makes sense, as it
becomes very difficult to determine what a user actually wants should they
indicate a desire to expand or shrink the size of multiple VMAs (truncate?
Adjust sizes individually? Some other strategy?).
However, in instances where a user is moving VMAs, it is restrictive to
disallow this.
This is especially the case for anonymous mappings, where a remapped region
may or may not be mergeable depending on whether its VMAs have or have not
been faulted, due to anon_vma assignment and folio index alignment with
vma->vm_pgoff.
Often this can result in surprising behaviour, where a moved region is
faulted, then moved back, and the user fails to observe a merge of otherwise
compatible, adjacent VMAs.
This change allows such cases to work without the user having to be
cognizant of whether a prior mremap() move or other VMA operations have
resulted in VMA fragmentation.
In order to do this, this series performs a large amount of refactoring,
most pertinently grouping sanity checks together and separating those that
check input parameters from those relating to VMAs.
We also simplify the post-mmap lock drop processing for uffd and mlock()'d
VMAs.
With this done, we can then fairly straightforwardly implement this
functionality.
This works exclusively for mremap() invocations which specify
MREMAP_FIXED. It is not compatible with VMAs which use userfaultfd, as the
notification of the userland fault handler would require us to drop the
mmap lock.
The input and output address ranges must not overlap. We carefully
account for moves which would result in VMA merges or would otherwise
result in VMA iterator invalidation.
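To make the new semantics concrete, here is a minimal userspace sketch (not
part of the series - error handling is elided and the three-page layout is
chosen purely for illustration) in the spirit of the selftest added in
patch 10:

  #define _GNU_SOURCE
  #include <stdio.h>
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void)
  {
          const long psz = sysconf(_SC_PAGESIZE);
          const size_t len = 3 * psz;
          char *src, *dst;
          void *moved;

          /* Map three pages, then punch a hole to leave two VMAs. */
          src = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          munmap(src + psz, psz);

          /* Reserve a non-overlapping destination range. */
          dst = mmap(NULL, len, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

          /*
           * Previously this failed with EFAULT, as [src, src + len)
           * spans two VMAs; with this series the entire range is
           * moved, with the hole preserved at the destination.
           */
          moved = mremap(src, len, len,
                         MREMAP_MAYMOVE | MREMAP_FIXED, dst);
          printf("moved to %p\n", moved);
          return 0;
  }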
Lorenzo Stoakes (10):
mm/mremap: perform some simple cleanups
mm/mremap: refactor initial parameter sanity checks
mm/mremap: put VMA check and prep logic into helper function
mm/mremap: cleanup post-processing stage of mremap
mm/mremap: use an explicit uffd failure path for mremap
mm/mremap: check remap conditions earlier
mm/mremap: move remap_is_valid() into check_prep_vma()
mm/mremap: clean up mlock populate behaviour
mm/mremap: permit mremap() move of multiple VMAs
tools/testing/selftests: extend mremap_test to test multi-VMA mremap
fs/userfaultfd.c | 15 +-
include/linux/userfaultfd_k.h | 1 +
mm/mremap.c | 502 ++++++++++++++---------
tools/testing/selftests/mm/mremap_test.c | 145 ++++++-
4 files changed, 462 insertions(+), 201 deletions(-)
--
2.50.0
* [PATCH 01/10] mm/mremap: perform some simple cleanups
2025-07-07 5:27 [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs Lorenzo Stoakes
@ 2025-07-07 5:27 ` Lorenzo Stoakes
2025-07-10 11:09 ` Vlastimil Babka
2025-07-07 5:27 ` [PATCH 02/10] mm/mremap: refactor initial parameter sanity checks Lorenzo Stoakes
` (10 subsequent siblings)
11 siblings, 1 reply; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-07-07 5:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Jann Horn, Pedro Falcato,
Rik van Riel, linux-mm, linux-fsdevel, linux-kernel,
linux-kselftest
We const-ify the vrm flags parameter to indicate this will never change.
We rename resize_is_valid() to remap_is_valid(), as this function does not
only apply to cases where we resize, so it's simply confusing to refer to
that here.
We remove the BUG() from mremap_at(), as we should not BUG() unless we are
certain it'll result in system instability.
We rename vrm_charge() to vrm_calc_charge() to make it clear this simply
calculates the charged number of pages rather than actually adjusting any
state.
We update the comment for vrm_implies_new_addr() to explain that
MREMAP_DONTUNMAP does not require a set address, but that the mapping will
always be moved.
Additionally, consistently use 'res' rather than 'ret' for result values.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/mremap.c | 55 +++++++++++++++++++++++++++++++----------------------
1 file changed, 32 insertions(+), 23 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c
index 1f5bebbb9c0c..65c7f29b6116 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -52,7 +52,7 @@ struct vma_remap_struct {
unsigned long addr; /* User-specified address from which we remap. */
unsigned long old_len; /* Length of range being remapped. */
unsigned long new_len; /* Desired new length of mapping. */
- unsigned long flags; /* user-specified MREMAP_* flags. */
+ const unsigned long flags; /* user-specified MREMAP_* flags. */
unsigned long new_addr; /* Optionally, desired new address. */
/* uffd state. */
@@ -909,7 +909,11 @@ static bool vrm_overlaps(struct vma_remap_struct *vrm)
return false;
}
-/* Do the mremap() flags require that the new_addr parameter be specified? */
+/*
+ * Will a new address definitely be assigned? This is either the case if the
+ * user specifies it via MREMAP_FIXED, or if MREMAP_DONTUNMAP is used,
+ * indicating we will always determine a target address.
+ */
static bool vrm_implies_new_addr(struct vma_remap_struct *vrm)
{
return vrm->flags & (MREMAP_FIXED | MREMAP_DONTUNMAP);
@@ -955,7 +959,7 @@ static unsigned long vrm_set_new_addr(struct vma_remap_struct *vrm)
*
* Returns true on success, false if insufficient memory to charge.
*/
-static bool vrm_charge(struct vma_remap_struct *vrm)
+static bool vrm_calc_charge(struct vma_remap_struct *vrm)
{
unsigned long charged;
@@ -1260,8 +1264,11 @@ static unsigned long move_vma(struct vma_remap_struct *vrm)
if (err)
return err;
- /* If accounted, charge the number of bytes the operation will use. */
- if (!vrm_charge(vrm))
+ /*
+ * If accounted, determine the number of bytes the operation will
+ * charge.
+ */
+ if (!vrm_calc_charge(vrm))
return -ENOMEM;
/* We don't want racing faults. */
@@ -1300,12 +1307,12 @@ static unsigned long move_vma(struct vma_remap_struct *vrm)
}
/*
- * resize_is_valid() - Ensure the vma can be resized to the new length at the give
- * address.
+ * remap_is_valid() - Ensure the VMA can be moved or resized to the new length,
+ * at the given address.
*
* Return 0 on success, error otherwise.
*/
-static int resize_is_valid(struct vma_remap_struct *vrm)
+static int remap_is_valid(struct vma_remap_struct *vrm)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma = vrm->vma;
@@ -1444,7 +1451,7 @@ static unsigned long mremap_to(struct vma_remap_struct *vrm)
vrm->old_len = vrm->new_len;
}
- err = resize_is_valid(vrm);
+ err = remap_is_valid(vrm);
if (err)
return err;
@@ -1569,7 +1576,7 @@ static unsigned long expand_vma_in_place(struct vma_remap_struct *vrm)
struct vm_area_struct *vma = vrm->vma;
VMA_ITERATOR(vmi, mm, vma->vm_end);
- if (!vrm_charge(vrm))
+ if (!vrm_calc_charge(vrm))
return -ENOMEM;
/*
@@ -1630,7 +1637,7 @@ static unsigned long expand_vma(struct vma_remap_struct *vrm)
unsigned long err;
unsigned long addr = vrm->addr;
- err = resize_is_valid(vrm);
+ err = remap_is_valid(vrm);
if (err)
return err;
@@ -1703,18 +1710,20 @@ static unsigned long mremap_at(struct vma_remap_struct *vrm)
return expand_vma(vrm);
}
- BUG();
+ /* Should not be possible. */
+ WARN_ON_ONCE(1);
+ return -EINVAL;
}
static unsigned long do_mremap(struct vma_remap_struct *vrm)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
- unsigned long ret;
+ unsigned long res;
- ret = check_mremap_params(vrm);
- if (ret)
- return ret;
+ res = check_mremap_params(vrm);
+ if (res)
+ return res;
vrm->old_len = PAGE_ALIGN(vrm->old_len);
vrm->new_len = PAGE_ALIGN(vrm->new_len);
@@ -1726,41 +1735,41 @@ static unsigned long do_mremap(struct vma_remap_struct *vrm)
vma = vrm->vma = vma_lookup(mm, vrm->addr);
if (!vma) {
- ret = -EFAULT;
+ res = -EFAULT;
goto out;
}
/* If mseal()'d, mremap() is prohibited. */
if (!can_modify_vma(vma)) {
- ret = -EPERM;
+ res = -EPERM;
goto out;
}
/* Align to hugetlb page size, if required. */
if (is_vm_hugetlb_page(vma) && !align_hugetlb(vrm)) {
- ret = -EINVAL;
+ res = -EINVAL;
goto out;
}
vrm->remap_type = vrm_remap_type(vrm);
/* Actually execute mremap. */
- ret = vrm_implies_new_addr(vrm) ? mremap_to(vrm) : mremap_at(vrm);
+ res = vrm_implies_new_addr(vrm) ? mremap_to(vrm) : mremap_at(vrm);
out:
if (vrm->mmap_locked) {
mmap_write_unlock(mm);
vrm->mmap_locked = false;
- if (!offset_in_page(ret) && vrm->mlocked && vrm->new_len > vrm->old_len)
+ if (!offset_in_page(res) && vrm->mlocked && vrm->new_len > vrm->old_len)
mm_populate(vrm->new_addr + vrm->old_len, vrm->delta);
}
userfaultfd_unmap_complete(mm, vrm->uf_unmap_early);
- mremap_userfaultfd_complete(vrm->uf, vrm->addr, ret, vrm->old_len);
+ mremap_userfaultfd_complete(vrm->uf, vrm->addr, res, vrm->old_len);
userfaultfd_unmap_complete(mm, vrm->uf_unmap);
- return ret;
+ return res;
}
/*
--
2.50.0
* [PATCH 02/10] mm/mremap: refactor initial parameter sanity checks
2025-07-07 5:27 [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs Lorenzo Stoakes
2025-07-07 5:27 ` [PATCH 01/10] mm/mremap: perform some simple cleanups Lorenzo Stoakes
@ 2025-07-07 5:27 ` Lorenzo Stoakes
2025-07-10 11:38 ` Vlastimil Babka
2025-07-07 5:27 ` [PATCH 03/10] mm/mremap: put VMA check and prep logic into helper function Lorenzo Stoakes
` (9 subsequent siblings)
11 siblings, 1 reply; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-07-07 5:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Jann Horn, Pedro Falcato,
Rik van Riel, linux-mm, linux-fsdevel, linux-kernel,
linux-kselftest
We currently perform some checks immediately and defer others until
later. Aggregate the checks and avoid ones that need not be made.
Simplify things by aligning lengths immediately. Defer setting the delta
parameter until later, which removes some duplicate code in the hugetlb
case.
We can safely perform the checks moved from mremap_to() to
check_mremap_params() because:
* If we set a new address via vrm_set_new_addr(), then this is guaranteed
to not overlap nor to position the new VMA past TASK_SIZE, so there's no
need to check these later.
* We can simply page align lengths immediately. We do not need to check for
overlap nor TASK_SIZE sanity after hugetlb alignment as this asserts
addresses are huge-aligned, then huge-aligns lengths, rounding down. This
means any existing overlap would have already been caught.
Moving things around like this lays the groundwork for subsequent changes
to permit operations on batches of VMAs.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/mremap.c | 29 ++++++++++++++---------------
1 file changed, 14 insertions(+), 15 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c
index 65c7f29b6116..9ce20c238ffd 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1413,14 +1413,6 @@ static unsigned long mremap_to(struct vma_remap_struct *vrm)
struct mm_struct *mm = current->mm;
unsigned long err;
- /* Is the new length or address silly? */
- if (vrm->new_len > TASK_SIZE ||
- vrm->new_addr > TASK_SIZE - vrm->new_len)
- return -EINVAL;
-
- if (vrm_overlaps(vrm))
- return -EINVAL;
-
if (vrm->flags & MREMAP_FIXED) {
/*
* In mremap_to().
@@ -1525,7 +1517,12 @@ static unsigned long check_mremap_params(struct vma_remap_struct *vrm)
* for DOS-emu "duplicate shm area" thing. But
* a zero new-len is nonsensical.
*/
- if (!PAGE_ALIGN(vrm->new_len))
+ if (!vrm->new_len)
+ return -EINVAL;
+
+ /* Is the new length or address silly? */
+ if (vrm->new_len > TASK_SIZE ||
+ vrm->new_addr > TASK_SIZE - vrm->new_len)
return -EINVAL;
/* Remainder of checks are for cases with specific new_addr. */
@@ -1544,6 +1541,10 @@ static unsigned long check_mremap_params(struct vma_remap_struct *vrm)
if (flags & MREMAP_DONTUNMAP && vrm->old_len != vrm->new_len)
return -EINVAL;
+ /* Target VMA must not overlap source VMA. */
+ if (vrm_overlaps(vrm))
+ return -EINVAL;
+
/*
* move_vma() need us to stay 4 maps below the threshold, otherwise
* it will bail out at the very beginning.
@@ -1620,8 +1621,6 @@ static bool align_hugetlb(struct vma_remap_struct *vrm)
if (vrm->new_len > vrm->old_len)
return false;
- vrm_set_delta(vrm);
-
return true;
}
@@ -1721,14 +1720,13 @@ static unsigned long do_mremap(struct vma_remap_struct *vrm)
struct vm_area_struct *vma;
unsigned long res;
+ vrm->old_len = PAGE_ALIGN(vrm->old_len);
+ vrm->new_len = PAGE_ALIGN(vrm->new_len);
+
res = check_mremap_params(vrm);
if (res)
return res;
- vrm->old_len = PAGE_ALIGN(vrm->old_len);
- vrm->new_len = PAGE_ALIGN(vrm->new_len);
- vrm_set_delta(vrm);
-
if (mmap_write_lock_killable(mm))
return -EINTR;
vrm->mmap_locked = true;
@@ -1751,6 +1749,7 @@ static unsigned long do_mremap(struct vma_remap_struct *vrm)
goto out;
}
+ vrm_set_delta(vrm);
vrm->remap_type = vrm_remap_type(vrm);
/* Actually execute mremap. */
--
2.50.0
* [PATCH 03/10] mm/mremap: put VMA check and prep logic into helper function
2025-07-07 5:27 [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs Lorenzo Stoakes
2025-07-07 5:27 ` [PATCH 01/10] mm/mremap: perform some simple cleanups Lorenzo Stoakes
2025-07-07 5:27 ` [PATCH 02/10] mm/mremap: refactor initial parameter sanity checks Lorenzo Stoakes
@ 2025-07-07 5:27 ` Lorenzo Stoakes
2025-07-10 13:10 ` Vlastimil Babka
2025-07-07 5:27 ` [PATCH 04/10] mm/mremap: cleanup post-processing stage of mremap Lorenzo Stoakes
` (8 subsequent siblings)
11 siblings, 1 reply; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-07-07 5:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Jann Horn, Pedro Falcato,
Rik van Riel, linux-mm, linux-fsdevel, linux-kernel,
linux-kselftest
Rather than lumping everything together in do_mremap(), add a new helper
function, check_prep_vma(), to do the work relating to each VMA.
This further lays groundwork for subsequent patches which will allow for
batched VMA mremap().
Additionally, if we set vrm->new_addr == vrm->addr when prepping the VMA,
this avoids us needing to do so in the expand VMA mlocked case.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/mremap.c | 58 ++++++++++++++++++++++++++---------------------------
1 file changed, 28 insertions(+), 30 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c
index 9ce20c238ffd..60eb0ac8634b 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1634,7 +1634,6 @@ static bool align_hugetlb(struct vma_remap_struct *vrm)
static unsigned long expand_vma(struct vma_remap_struct *vrm)
{
unsigned long err;
- unsigned long addr = vrm->addr;
err = remap_is_valid(vrm);
if (err)
@@ -1649,16 +1648,8 @@ static unsigned long expand_vma(struct vma_remap_struct *vrm)
if (err)
return err;
- /*
- * We want to populate the newly expanded portion of the VMA to
- * satisfy the expectation that mlock()'ing a VMA maintains all
- * of its pages in memory.
- */
- if (vrm->mlocked)
- vrm->new_addr = addr;
-
/* OK we're done! */
- return addr;
+ return vrm->addr;
}
/*
@@ -1714,10 +1705,33 @@ static unsigned long mremap_at(struct vma_remap_struct *vrm)
return -EINVAL;
}
+static int check_prep_vma(struct vma_remap_struct *vrm)
+{
+ struct vm_area_struct *vma = vrm->vma;
+
+ if (!vma)
+ return -EFAULT;
+
+ /* If mseal()'d, mremap() is prohibited. */
+ if (!can_modify_vma(vma))
+ return -EPERM;
+
+ /* Align to hugetlb page size, if required. */
+ if (is_vm_hugetlb_page(vma) && !align_hugetlb(vrm))
+ return -EINVAL;
+
+ vrm_set_delta(vrm);
+ vrm->remap_type = vrm_remap_type(vrm);
+ /* For convenience, we set new_addr even if VMA won't move. */
+ if (!vrm_implies_new_addr(vrm))
+ vrm->new_addr = vrm->addr;
+
+ return 0;
+}
+
static unsigned long do_mremap(struct vma_remap_struct *vrm)
{
struct mm_struct *mm = current->mm;
- struct vm_area_struct *vma;
unsigned long res;
vrm->old_len = PAGE_ALIGN(vrm->old_len);
@@ -1731,26 +1745,10 @@ static unsigned long do_mremap(struct vma_remap_struct *vrm)
return -EINTR;
vrm->mmap_locked = true;
- vma = vrm->vma = vma_lookup(mm, vrm->addr);
- if (!vma) {
- res = -EFAULT;
- goto out;
- }
-
- /* If mseal()'d, mremap() is prohibited. */
- if (!can_modify_vma(vma)) {
- res = -EPERM;
- goto out;
- }
-
- /* Align to hugetlb page size, if required. */
- if (is_vm_hugetlb_page(vma) && !align_hugetlb(vrm)) {
- res = -EINVAL;
+ vrm->vma = vma_lookup(current->mm, vrm->addr);
+ res = check_prep_vma(vrm);
+ if (res)
goto out;
- }
-
- vrm_set_delta(vrm);
- vrm->remap_type = vrm_remap_type(vrm);
/* Actually execute mremap. */
res = vrm_implies_new_addr(vrm) ? mremap_to(vrm) : mremap_at(vrm);
--
2.50.0
* [PATCH 04/10] mm/mremap: cleanup post-processing stage of mremap
2025-07-07 5:27 [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs Lorenzo Stoakes
` (2 preceding siblings ...)
2025-07-07 5:27 ` [PATCH 03/10] mm/mremap: put VMA check and prep logic into helper function Lorenzo Stoakes
@ 2025-07-07 5:27 ` Lorenzo Stoakes
2025-07-10 13:49 ` Vlastimil Babka
2025-07-07 5:27 ` [PATCH 05/10] mm/mremap: use an explicit uffd failure path for mremap Lorenzo Stoakes
` (7 subsequent siblings)
11 siblings, 1 reply; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-07-07 5:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Jann Horn, Pedro Falcato,
Rik van Riel, linux-mm, linux-fsdevel, linux-kernel,
linux-kselftest
Separate out the uffd bits so it's clear what's happening.
Don't bother setting vrm->mmap_locked after unlocking, because after this
we are done anyway.
The only time we drop the mmap lock is on VMA shrink, at which point
vrm->new_len will be < vrm->old_len and the populate operation will not be
performed anyway, so move this code out of the if (vrm->mmap_locked) block.
All addresses returned by mremap() are page-aligned, so the
offset_in_page() check on ret seems only to be an incorrect means of trying
to detect whether an error occurred - check for this explicitly instead.
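For context (not part of this patch): error returns here are -errno values
cast to unsigned long, which the kernel detects via IS_ERR_VALUE() by
reserving the top MAX_ERRNO values of the address space (per
include/linux/err.h):

  #define MAX_ERRNO	4095

  #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)

Such values also happen never to be page-aligned, which is why the
offset_in_page() test appeared to work as an error check.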
No functional change intended.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/mremap.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c
index 60eb0ac8634b..660bdb75e2f9 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1729,6 +1729,15 @@ static int check_prep_vma(struct vma_remap_struct *vrm)
return 0;
}
+static void notify_uffd(struct vma_remap_struct *vrm, unsigned long ret)
+{
+ struct mm_struct *mm = current->mm;
+
+ userfaultfd_unmap_complete(mm, vrm->uf_unmap_early);
+ mremap_userfaultfd_complete(vrm->uf, vrm->addr, ret, vrm->old_len);
+ userfaultfd_unmap_complete(mm, vrm->uf_unmap);
+}
+
static unsigned long do_mremap(struct vma_remap_struct *vrm)
{
struct mm_struct *mm = current->mm;
@@ -1754,18 +1763,13 @@ static unsigned long do_mremap(struct vma_remap_struct *vrm)
res = vrm_implies_new_addr(vrm) ? mremap_to(vrm) : mremap_at(vrm);
out:
- if (vrm->mmap_locked) {
+ if (vrm->mmap_locked)
mmap_write_unlock(mm);
- vrm->mmap_locked = false;
-
- if (!offset_in_page(res) && vrm->mlocked && vrm->new_len > vrm->old_len)
- mm_populate(vrm->new_addr + vrm->old_len, vrm->delta);
- }
- userfaultfd_unmap_complete(mm, vrm->uf_unmap_early);
- mremap_userfaultfd_complete(vrm->uf, vrm->addr, res, vrm->old_len);
- userfaultfd_unmap_complete(mm, vrm->uf_unmap);
+ if (!IS_ERR_VALUE(res) && vrm->mlocked && vrm->new_len > vrm->old_len)
+ mm_populate(vrm->new_addr + vrm->old_len, vrm->delta);
+ notify_uffd(vrm, res);
return res;
}
--
2.50.0
* [PATCH 05/10] mm/mremap: use an explicit uffd failure path for mremap
2025-07-07 5:27 [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs Lorenzo Stoakes
` (3 preceding siblings ...)
2025-07-07 5:27 ` [PATCH 04/10] mm/mremap: cleanup post-processing stage of mremap Lorenzo Stoakes
@ 2025-07-07 5:27 ` Lorenzo Stoakes
2025-07-07 7:56 ` kernel test robot
` (2 more replies)
2025-07-07 5:27 ` [PATCH 06/10] mm/mremap: check remap conditions earlier Lorenzo Stoakes
` (6 subsequent siblings)
11 siblings, 3 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-07-07 5:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Jann Horn, Pedro Falcato,
Rik van Riel, linux-mm, linux-fsdevel, linux-kernel,
linux-kselftest
Right now it appears that the code is relying upon the returned destination
address having bits outside PAGE_MASK to indicate whether an error value is
specified, and decrementing the increased refcount on the uffd ctx if so.
This is not a safe means of determining an error value, so instead, be
specific. It makes far more sense to do so in a dedicated error path, so
add mremap_userfaultfd_fail() for this purpose and use this when an error
arises.
A vm_userfaultfd_ctx is not established until we are at the point where
mremap_userfaultfd_prep() is invoked in copy_vma_and_data(), so this is a
no-op until this happens.
That is - uffd remap notification only occurs if the VMA is actually moved
- at which point a UFFD_EVENT_REMAP event is raised.
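For reference, in the success case the userland handler receives the
existing uAPI message with event UFFD_EVENT_REMAP (from
include/uapi/linux/userfaultfd.h, elided here to the relevant fields):

  struct uffd_msg {
          __u8    event;          /* UFFD_EVENT_REMAP */
          ...
          union {
                  ...
                  struct {
                          __u64   from;   /* old address */
                          __u64   to;     /* new address */
                          __u64   len;
                  } remap;
                  ...
          } arg;
  };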
No errors can occur after this point currently, though it's certainly not
guaranteed this will always remain the case, and we mustn't rely on this.
However, the reason for needing to handle this case is that, when an error
arises on a VMA move at the point of adjusting page tables, we revert this
operation, and propagate the error.
At this point, it is not correct to raise a uffd remap event, and we must
handle it.
This refactoring makes it abundantly clear what we are doing.
We assume vrm->new_addr is always valid, which a prior change made the case
even for mremap() invocations which don't move the VMA; however, given no
uffd context would be set up in this case, it's immaterial to this change
anyway.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
fs/userfaultfd.c | 15 ++++++++++-----
include/linux/userfaultfd_k.h | 1 +
mm/mremap.c | 16 ++++++++++++----
3 files changed, 23 insertions(+), 9 deletions(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 2a644aa1a510..54c6cc7fe9c6 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -750,11 +750,6 @@ void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *vm_ctx,
if (!ctx)
return;
- if (to & ~PAGE_MASK) {
- userfaultfd_ctx_put(ctx);
- return;
- }
-
msg_init(&ewq.msg);
ewq.msg.event = UFFD_EVENT_REMAP;
@@ -765,6 +760,16 @@ void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *vm_ctx,
userfaultfd_event_wait_completion(ctx, &ewq);
}
+void mremap_userfaultfd_fail(struct vm_userfaultfd_ctx *vm_ctx)
+{
+ struct userfaultfd_ctx *ctx = vm_ctx->ctx;
+
+ if (!ctx)
+ return;
+
+ userfaultfd_ctx_put(ctx);
+}
+
bool userfaultfd_remove(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index df85330bcfa6..6680a4de40b3 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -259,6 +259,7 @@ extern void mremap_userfaultfd_prep(struct vm_area_struct *,
extern void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *,
unsigned long from, unsigned long to,
unsigned long len);
+void mremap_userfaultfd_fail(struct vm_userfaultfd_ctx *);
extern bool userfaultfd_remove(struct vm_area_struct *vma,
unsigned long start,
diff --git a/mm/mremap.c b/mm/mremap.c
index 660bdb75e2f9..db7e773d0884 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1729,12 +1729,17 @@ static int check_prep_vma(struct vma_remap_struct *vrm)
return 0;
}
-static void notify_uffd(struct vma_remap_struct *vrm, unsigned long ret)
+static void notify_uffd(struct vma_remap_struct *vrm, bool failed)
{
struct mm_struct *mm = current->mm;
+ /* Regardless of success/failure, we always notify of any unmaps. */
userfaultfd_unmap_complete(mm, vrm->uf_unmap_early);
- mremap_userfaultfd_complete(vrm->uf, vrm->addr, ret, vrm->old_len);
+ if (failed)
+ mremap_userfaultfd_fail(vrm->uf);
+ else
+ mremap_userfaultfd_complete(vrm->uf, vrm->addr,
+ vrm->new_addr, vrm->old_len);
userfaultfd_unmap_complete(mm, vrm->uf_unmap);
}
@@ -1742,6 +1747,7 @@ static unsigned long do_mremap(struct vma_remap_struct *vrm)
{
struct mm_struct *mm = current->mm;
unsigned long res;
+ bool failed;
vrm->old_len = PAGE_ALIGN(vrm->old_len);
vrm->new_len = PAGE_ALIGN(vrm->new_len);
@@ -1763,13 +1769,15 @@ static unsigned long do_mremap(struct vma_remap_struct *vrm)
res = vrm_implies_new_addr(vrm) ? mremap_to(vrm) : mremap_at(vrm);
out:
+ failed = IS_ERR_VALUE(res);
+
if (vrm->mmap_locked)
mmap_write_unlock(mm);
- if (!IS_ERR_VALUE(res) && vrm->mlocked && vrm->new_len > vrm->old_len)
+ if (!failed && vrm->mlocked && vrm->new_len > vrm->old_len)
mm_populate(vrm->new_addr + vrm->old_len, vrm->delta);
- notify_uffd(vrm, res);
+ notify_uffd(vrm, failed);
return res;
}
--
2.50.0
* [PATCH 06/10] mm/mremap: check remap conditions earlier
2025-07-07 5:27 [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs Lorenzo Stoakes
` (4 preceding siblings ...)
2025-07-07 5:27 ` [PATCH 05/10] mm/mremap: use an explicit uffd failure path for mremap Lorenzo Stoakes
@ 2025-07-07 5:27 ` Lorenzo Stoakes
2025-07-10 14:36 ` Vlastimil Babka
2025-07-07 5:27 ` [PATCH 07/10] mm/mremap: move remap_is_valid() into check_prep_vma() Lorenzo Stoakes
` (5 subsequent siblings)
11 siblings, 1 reply; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-07-07 5:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Jann Horn, Pedro Falcato,
Rik van Riel, linux-mm, linux-fsdevel, linux-kernel,
linux-kselftest
When we expand or move a VMA, this requires a number of additional checks
to be performed.
Make it really obvious under what circumstances these checks must be
performed, and aggregate all the checks in one place by invoking them from
check_prep_vma().
We have to adjust the checks to account for shrink + move operations by
checking new_len <= old_len rather than new_len == old_len.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/mremap.c | 28 +++++++++++++++++++---------
1 file changed, 19 insertions(+), 9 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c
index db7e773d0884..20844fb91755 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1343,7 +1343,7 @@ static int remap_is_valid(struct vma_remap_struct *vrm)
if (old_len > vma->vm_end - addr)
return -EFAULT;
- if (new_len == old_len)
+ if (new_len <= old_len)
return 0;
/* Need to be careful about a growing mapping */
@@ -1443,10 +1443,6 @@ static unsigned long mremap_to(struct vma_remap_struct *vrm)
vrm->old_len = vrm->new_len;
}
- err = remap_is_valid(vrm);
- if (err)
- return err;
-
/* MREMAP_DONTUNMAP expands by old_len since old_len == new_len */
if (vrm->flags & MREMAP_DONTUNMAP) {
vm_flags_t vm_flags = vrm->vma->vm_flags;
@@ -1635,10 +1631,6 @@ static unsigned long expand_vma(struct vma_remap_struct *vrm)
{
unsigned long err;
- err = remap_is_valid(vrm);
- if (err)
- return err;
-
/*
* [addr, old_len) spans precisely to the end of the VMA, so try to
* expand it in-place.
@@ -1705,6 +1697,21 @@ static unsigned long mremap_at(struct vma_remap_struct *vrm)
return -EINVAL;
}
+/*
+ * Will this operation result in the VMA being expanded or moved and thus need
+ * to map a new portion of virtual address space?
+ */
+static bool vrm_will_map_new(struct vma_remap_struct *vrm)
+{
+ if (vrm->remap_type == MREMAP_EXPAND)
+ return true;
+
+ if (vrm_implies_new_addr(vrm))
+ return true;
+
+ return false;
+}
+
static int check_prep_vma(struct vma_remap_struct *vrm)
{
struct vm_area_struct *vma = vrm->vma;
@@ -1726,6 +1733,9 @@ static int check_prep_vma(struct vma_remap_struct *vrm)
if (!vrm_implies_new_addr(vrm))
vrm->new_addr = vrm->addr;
+ if (vrm_will_map_new(vrm))
+ return remap_is_valid(vrm);
+
return 0;
}
--
2.50.0
* [PATCH 07/10] mm/mremap: move remap_is_valid() into check_prep_vma()
2025-07-07 5:27 [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs Lorenzo Stoakes
` (5 preceding siblings ...)
2025-07-07 5:27 ` [PATCH 06/10] mm/mremap: check remap conditions earlier Lorenzo Stoakes
@ 2025-07-07 5:27 ` Lorenzo Stoakes
2025-07-10 14:44 ` Vlastimil Babka
2025-07-07 5:27 ` [PATCH 08/10] mm/mremap: clean up mlock populate behaviour Lorenzo Stoakes
` (4 subsequent siblings)
11 siblings, 1 reply; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-07-07 5:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Jann Horn, Pedro Falcato,
Rik van Riel, linux-mm, linux-fsdevel, linux-kernel,
linux-kselftest
Group parameter check logic together, moving check_mremap_params() next to
it.
This puts all such checks into a single place, and invokes them early so we
can simply bail out as soon as we are aware that a condition is not met.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/mremap.c | 273 +++++++++++++++++++++++++---------------------------
1 file changed, 131 insertions(+), 142 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c
index 20844fb91755..3678f21c2c36 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1306,64 +1306,6 @@ static unsigned long move_vma(struct vma_remap_struct *vrm)
return err ? (unsigned long)err : vrm->new_addr;
}
-/*
- * remap_is_valid() - Ensure the VMA can be moved or resized to the new length,
- * at the given address.
- *
- * Return 0 on success, error otherwise.
- */
-static int remap_is_valid(struct vma_remap_struct *vrm)
-{
- struct mm_struct *mm = current->mm;
- struct vm_area_struct *vma = vrm->vma;
- unsigned long addr = vrm->addr;
- unsigned long old_len = vrm->old_len;
- unsigned long new_len = vrm->new_len;
- unsigned long pgoff;
-
- /*
- * !old_len is a special case where an attempt is made to 'duplicate'
- * a mapping. This makes no sense for private mappings as it will
- * instead create a fresh/new mapping unrelated to the original. This
- * is contrary to the basic idea of mremap which creates new mappings
- * based on the original. There are no known use cases for this
- * behavior. As a result, fail such attempts.
- */
- if (!old_len && !(vma->vm_flags & (VM_SHARED | VM_MAYSHARE))) {
- pr_warn_once("%s (%d): attempted to duplicate a private mapping with mremap. This is not supported.\n",
- current->comm, current->pid);
- return -EINVAL;
- }
-
- if ((vrm->flags & MREMAP_DONTUNMAP) &&
- (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)))
- return -EINVAL;
-
- /* We can't remap across vm area boundaries */
- if (old_len > vma->vm_end - addr)
- return -EFAULT;
-
- if (new_len <= old_len)
- return 0;
-
- /* Need to be careful about a growing mapping */
- pgoff = (addr - vma->vm_start) >> PAGE_SHIFT;
- pgoff += vma->vm_pgoff;
- if (pgoff + (new_len >> PAGE_SHIFT) < pgoff)
- return -EINVAL;
-
- if (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP))
- return -EFAULT;
-
- if (!mlock_future_ok(mm, vma->vm_flags, vrm->delta))
- return -EAGAIN;
-
- if (!may_expand_vm(mm, vma->vm_flags, vrm->delta >> PAGE_SHIFT))
- return -ENOMEM;
-
- return 0;
-}
-
/*
* The user has requested that the VMA be shrunk (i.e., old_len > new_len), so
* execute this, optionally dropping the mmap lock when we do so.
@@ -1490,77 +1432,6 @@ static bool vrm_can_expand_in_place(struct vma_remap_struct *vrm)
return true;
}
-/*
- * Are the parameters passed to mremap() valid? If so return 0, otherwise return
- * error.
- */
-static unsigned long check_mremap_params(struct vma_remap_struct *vrm)
-
-{
- unsigned long addr = vrm->addr;
- unsigned long flags = vrm->flags;
-
- /* Ensure no unexpected flag values. */
- if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP))
- return -EINVAL;
-
- /* Start address must be page-aligned. */
- if (offset_in_page(addr))
- return -EINVAL;
-
- /*
- * We allow a zero old-len as a special case
- * for DOS-emu "duplicate shm area" thing. But
- * a zero new-len is nonsensical.
- */
- if (!vrm->new_len)
- return -EINVAL;
-
- /* Is the new length or address silly? */
- if (vrm->new_len > TASK_SIZE ||
- vrm->new_addr > TASK_SIZE - vrm->new_len)
- return -EINVAL;
-
- /* Remainder of checks are for cases with specific new_addr. */
- if (!vrm_implies_new_addr(vrm))
- return 0;
-
- /* The new address must be page-aligned. */
- if (offset_in_page(vrm->new_addr))
- return -EINVAL;
-
- /* A fixed address implies a move. */
- if (!(flags & MREMAP_MAYMOVE))
- return -EINVAL;
-
- /* MREMAP_DONTUNMAP does not allow resizing in the process. */
- if (flags & MREMAP_DONTUNMAP && vrm->old_len != vrm->new_len)
- return -EINVAL;
-
- /* Target VMA must not overlap source VMA. */
- if (vrm_overlaps(vrm))
- return -EINVAL;
-
- /*
- * move_vma() need us to stay 4 maps below the threshold, otherwise
- * it will bail out at the very beginning.
- * That is a problem if we have already unmaped the regions here
- * (new_addr, and old_addr), because userspace will not know the
- * state of the vma's after it gets -ENOMEM.
- * So, to avoid such scenario we can pre-compute if the whole
- * operation has high chances to success map-wise.
- * Worst-scenario case is when both vma's (new_addr and old_addr) get
- * split in 3 before unmapping it.
- * That means 2 more maps (1 for each) to the ones we already hold.
- * Check whether current map count plus 2 still leads us to 4 maps below
- * the threshold, otherwise return -ENOMEM here to be more safe.
- */
- if ((current->mm->map_count + 2) >= sysctl_max_map_count - 3)
- return -ENOMEM;
-
- return 0;
-}
-
/*
* We know we can expand the VMA in-place by delta pages, so do so.
*
@@ -1712,9 +1583,26 @@ static bool vrm_will_map_new(struct vma_remap_struct *vrm)
return false;
}
+static void notify_uffd(struct vma_remap_struct *vrm, bool failed)
+{
+ struct mm_struct *mm = current->mm;
+
+ /* Regardless of success/failure, we always notify of any unmaps. */
+ userfaultfd_unmap_complete(mm, vrm->uf_unmap_early);
+ if (failed)
+ mremap_userfaultfd_fail(vrm->uf);
+ else
+ mremap_userfaultfd_complete(vrm->uf, vrm->addr,
+ vrm->new_addr, vrm->old_len);
+ userfaultfd_unmap_complete(mm, vrm->uf_unmap);
+}
+
static int check_prep_vma(struct vma_remap_struct *vrm)
{
struct vm_area_struct *vma = vrm->vma;
+ struct mm_struct *mm = current->mm;
+ unsigned long addr = vrm->addr;
+ unsigned long old_len, new_len, pgoff;
if (!vma)
return -EFAULT;
@@ -1731,26 +1619,127 @@ static int check_prep_vma(struct vma_remap_struct *vrm)
vrm->remap_type = vrm_remap_type(vrm);
/* For convenience, we set new_addr even if VMA won't move. */
if (!vrm_implies_new_addr(vrm))
- vrm->new_addr = vrm->addr;
+ vrm->new_addr = addr;
+
+ /* Below only meaningful if we expand or move a VMA. */
+ if (!vrm_will_map_new(vrm))
+ return 0;
- if (vrm_will_map_new(vrm))
- return remap_is_valid(vrm);
+ old_len = vrm->old_len;
+ new_len = vrm->new_len;
+
+ /*
+ * !old_len is a special case where an attempt is made to 'duplicate'
+ * a mapping. This makes no sense for private mappings as it will
+ * instead create a fresh/new mapping unrelated to the original. This
+ * is contrary to the basic idea of mremap which creates new mappings
+ * based on the original. There are no known use cases for this
+ * behavior. As a result, fail such attempts.
+ */
+ if (!old_len && !(vma->vm_flags & (VM_SHARED | VM_MAYSHARE))) {
+ pr_warn_once("%s (%d): attempted to duplicate a private mapping with mremap. This is not supported.\n",
+ current->comm, current->pid);
+ return -EINVAL;
+ }
+
+ if ((vrm->flags & MREMAP_DONTUNMAP) &&
+ (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)))
+ return -EINVAL;
+
+ /* We can't remap across vm area boundaries */
+ if (old_len > vma->vm_end - addr)
+ return -EFAULT;
+
+ if (new_len <= old_len)
+ return 0;
+
+ /* Need to be careful about a growing mapping */
+ pgoff = (addr - vma->vm_start) >> PAGE_SHIFT;
+ pgoff += vma->vm_pgoff;
+ if (pgoff + (new_len >> PAGE_SHIFT) < pgoff)
+ return -EINVAL;
+
+ if (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP))
+ return -EFAULT;
+
+ if (!mlock_future_ok(mm, vma->vm_flags, vrm->delta))
+ return -EAGAIN;
+
+ if (!may_expand_vm(mm, vma->vm_flags, vrm->delta >> PAGE_SHIFT))
+ return -ENOMEM;
return 0;
}
-static void notify_uffd(struct vma_remap_struct *vrm, bool failed)
+/*
+ * Are the parameters passed to mremap() valid? If so return 0, otherwise return
+ * error.
+ */
+static unsigned long check_mremap_params(struct vma_remap_struct *vrm)
+
{
- struct mm_struct *mm = current->mm;
+ unsigned long addr = vrm->addr;
+ unsigned long flags = vrm->flags;
- /* Regardless of success/failure, we always notify of any unmaps. */
- userfaultfd_unmap_complete(mm, vrm->uf_unmap_early);
- if (failed)
- mremap_userfaultfd_fail(vrm->uf);
- else
- mremap_userfaultfd_complete(vrm->uf, vrm->addr,
- vrm->new_addr, vrm->old_len);
- userfaultfd_unmap_complete(mm, vrm->uf_unmap);
+ /* Ensure no unexpected flag values. */
+ if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP))
+ return -EINVAL;
+
+ /* Start address must be page-aligned. */
+ if (offset_in_page(addr))
+ return -EINVAL;
+
+ /*
+ * We allow a zero old-len as a special case
+ * for DOS-emu "duplicate shm area" thing. But
+ * a zero new-len is nonsensical.
+ */
+ if (!vrm->new_len)
+ return -EINVAL;
+
+ /* Is the new length or address silly? */
+ if (vrm->new_len > TASK_SIZE ||
+ vrm->new_addr > TASK_SIZE - vrm->new_len)
+ return -EINVAL;
+
+ /* Remainder of checks are for cases with specific new_addr. */
+ if (!vrm_implies_new_addr(vrm))
+ return 0;
+
+ /* The new address must be page-aligned. */
+ if (offset_in_page(vrm->new_addr))
+ return -EINVAL;
+
+ /* A fixed address implies a move. */
+ if (!(flags & MREMAP_MAYMOVE))
+ return -EINVAL;
+
+ /* MREMAP_DONTUNMAP does not allow resizing in the process. */
+ if (flags & MREMAP_DONTUNMAP && vrm->old_len != vrm->new_len)
+ return -EINVAL;
+
+ /* Target VMA must not overlap source VMA. */
+ if (vrm_overlaps(vrm))
+ return -EINVAL;
+
+ /*
+ * move_vma() need us to stay 4 maps below the threshold, otherwise
+ * it will bail out at the very beginning.
+ * That is a problem if we have already unmaped the regions here
+ * (new_addr, and old_addr), because userspace will not know the
+ * state of the vma's after it gets -ENOMEM.
+ * So, to avoid such scenario we can pre-compute if the whole
+ * operation has high chances to success map-wise.
+ * Worst-scenario case is when both vma's (new_addr and old_addr) get
+ * split in 3 before unmapping it.
+ * That means 2 more maps (1 for each) to the ones we already hold.
+ * Check whether current map count plus 2 still leads us to 4 maps below
+ * the threshold, otherwise return -ENOMEM here to be more safe.
+ */
+ if ((current->mm->map_count + 2) >= sysctl_max_map_count - 3)
+ return -ENOMEM;
+
+ return 0;
}
static unsigned long do_mremap(struct vma_remap_struct *vrm)
--
2.50.0
* [PATCH 08/10] mm/mremap: clean up mlock populate behaviour
2025-07-07 5:27 [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs Lorenzo Stoakes
` (6 preceding siblings ...)
2025-07-07 5:27 ` [PATCH 07/10] mm/mremap: move remap_is_valid() into check_prep_vma() Lorenzo Stoakes
@ 2025-07-07 5:27 ` Lorenzo Stoakes
2025-07-10 14:47 ` Vlastimil Babka
2025-07-07 5:27 ` [PATCH 09/10] mm/mremap: permit mremap() move of multiple VMAs Lorenzo Stoakes
` (3 subsequent siblings)
11 siblings, 1 reply; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-07-07 5:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Jann Horn, Pedro Falcato,
Rik van Riel, linux-mm, linux-fsdevel, linux-kernel,
linux-kselftest
When an mlock()'d VMA is expanded, we need to populate the expanded region
to maintain the contract that all mlock()'d memory is present (albeit -
with some period after mmap unlock where the expanded part of the mapping
remains unfaulted).
The current implementation is very unclear, so make it absolutely explicit
under what circumstances we do this.
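As a hypothetical worked example: an mlock()'d VMA expanded in place from 2
pages to 4 has vrm->populate_expand set; once the mmap lock has been
dropped, mm_populate(vrm->new_addr + vrm->old_len, vrm->delta) then faults
in just the 2 newly-added pages, preserving the mlock() contract.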
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/mremap.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c
index 3678f21c2c36..28e776cddc08 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -65,7 +65,7 @@ struct vma_remap_struct {
/* Internal state, determined in do_mremap(). */
unsigned long delta; /* Absolute delta of old_len,new_len. */
- bool mlocked; /* Was the VMA mlock()'d? */
+ bool populate_expand; /* mlock()'d expanded, must populate. */
enum mremap_type remap_type; /* expand, shrink, etc. */
bool mmap_locked; /* Is mm currently write-locked? */
unsigned long charged; /* If VM_ACCOUNT, # pages to account. */
@@ -1010,10 +1010,8 @@ static void vrm_stat_account(struct vma_remap_struct *vrm,
struct vm_area_struct *vma = vrm->vma;
vm_stat_account(mm, vma->vm_flags, pages);
- if (vma->vm_flags & VM_LOCKED) {
+ if (vma->vm_flags & VM_LOCKED)
mm->locked_vm += pages;
- vrm->mlocked = true;
- }
}
/*
@@ -1653,6 +1651,10 @@ static int check_prep_vma(struct vma_remap_struct *vrm)
if (new_len <= old_len)
return 0;
+ /* We are expanding and the VMA is mlock()'d so we need to populate. */
+ if (vma->vm_flags & VM_LOCKED)
+ vrm->populate_expand = true;
+
/* Need to be careful about a growing mapping */
pgoff = (addr - vma->vm_start) >> PAGE_SHIFT;
pgoff += vma->vm_pgoff;
@@ -1773,7 +1775,8 @@ static unsigned long do_mremap(struct vma_remap_struct *vrm)
if (vrm->mmap_locked)
mmap_write_unlock(mm);
- if (!failed && vrm->mlocked && vrm->new_len > vrm->old_len)
+ /* VMA mlock'd + was expanded, so populate expanded region. */
+ if (!failed && vrm->populate_expand)
mm_populate(vrm->new_addr + vrm->old_len, vrm->delta);
notify_uffd(vrm, failed);
--
2.50.0
* [PATCH 09/10] mm/mremap: permit mremap() move of multiple VMAs
2025-07-07 5:27 [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs Lorenzo Stoakes
` (7 preceding siblings ...)
2025-07-07 5:27 ` [PATCH 08/10] mm/mremap: clean up mlock populate behaviour Lorenzo Stoakes
@ 2025-07-07 5:27 ` Lorenzo Stoakes
2025-07-09 18:13 ` Liam R. Howlett
2025-07-11 8:17 ` Mark Brown
2025-07-07 5:27 ` [PATCH 10/10] tools/testing/selftests: extend mremap_test to test multi-VMA mremap Lorenzo Stoakes
` (2 subsequent siblings)
11 siblings, 2 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-07-07 5:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Jann Horn, Pedro Falcato,
Rik van Riel, linux-mm, linux-fsdevel, linux-kernel,
linux-kselftest
Historically we've made it a uAPI requirement that mremap() may only
operate on a single VMA at a time.
For instances where VMAs need to be resized, this makes sense, as it
becomes very difficult to determine what a user actually wants should they
indicate a desire to expand or shrink the size of multiple VMAs (truncate?
Adjust sizes individually? Some other strategy?).
However, in instances where a user is moving VMAs, it is restrictive to
disallow this.
This is especially the case for anonymous mappings, where a remapped region
may or may not be mergeable depending on whether its VMAs have or have not
been faulted, due to anon_vma assignment and folio index alignment with
vma->vm_pgoff.
Often this can result in surprising behaviour, where a moved region is
faulted, then moved back, and the user fails to observe a merge of otherwise
compatible, adjacent VMAs.
This change allows such cases to work without the user having to be
cognizant of whether a prior mremap() move or other VMA operations have
resulted in VMA fragmentation.
Having refactored mremap code to aggregate per-VMA and parameter checks, we
are now in a position to permit this kind of move.
We do so by detecting if this is a move-only operation up-front, and then
utilising a separate code path via remap_move() rather than the ordinary
single-VMA path.
There are two tasks that occur outside of the mmap write lock - userfaultfd
notification and population of unmapped regions of expanded VMAs should the
VMA be mlock()'d.
The latter doesn't apply, as this is logic for a move only and thus no
expansion can take place. In the former case, we explicitly disallow
multi-VMA operations on uffd-armed VMAs.
The mmap lock is never dropped in the move-only case; this only occurs on a
VMA shrink.
We take care to handle cases where a VMA merge has occurred, by resetting
the VMA iterator in such instances.
We needn't worry about self-merges, as in those cases we would, by
definition, not be spanning multiple VMAs. The overlapping range test is
performed on the whole range, and so specifically disallows this.
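To illustrate the per-VMA arithmetic in remap_move() below with hypothetical
addresses: suppose [0x10000, 0x16000) is moved to 0x40000, with VMAs
spanning [0xf000, 0x12000) and [0x14000, 0x16000). The first iteration
clamps addr to max(vma->vm_start, start) = 0x10000 (splitting that VMA),
computes len = 0x2000 and offset = 0, and moves to 0x40000; the second
computes addr = 0x14000, len = 0x2000, offset = 0x4000, and moves to
0x44000. The gap [0x12000, 0x14000) is simply skipped, reappearing as a gap
at [0x42000, 0x44000).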
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/mremap.c | 106 ++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 99 insertions(+), 7 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c
index 28e776cddc08..2e6005e1d22c 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -69,6 +69,8 @@ struct vma_remap_struct {
enum mremap_type remap_type; /* expand, shrink, etc. */
bool mmap_locked; /* Is mm currently write-locked? */
unsigned long charged; /* If VM_ACCOUNT, # pages to account. */
+ bool multi_vma; /* Is >1 VMA being moved? */
+ bool vma_reset; /* Was the VMA merged or did an unmap occur? */
};
static pud_t *get_old_pud(struct mm_struct *mm, unsigned long addr)
@@ -1111,6 +1113,7 @@ static void unmap_source_vma(struct vma_remap_struct *vrm)
err = do_vmi_munmap(&vmi, mm, addr, len, vrm->uf_unmap, /* unlock= */false);
vrm->vma = NULL; /* Invalidated. */
+ vrm->vma_reset = true;
if (err) {
/* OOM: unable to split vma, just get accounts right */
vm_acct_memory(len >> PAGE_SHIFT);
@@ -1181,6 +1184,7 @@ static int copy_vma_and_data(struct vma_remap_struct *vrm,
new_vma = copy_vma(&vma, vrm->new_addr, vrm->new_len, new_pgoff,
&pmc.need_rmap_locks);
+ vrm->vma_reset = vma != vrm->vma;
if (!new_vma) {
vrm_uncharge(vrm);
*new_vma_ptr = NULL;
@@ -1325,6 +1329,7 @@ static unsigned long shrink_vma(struct vma_remap_struct *vrm,
res = do_vmi_munmap(&vmi, mm, unmap_start, unmap_bytes,
vrm->uf_unmap, drop_lock);
vrm->vma = NULL; /* Invalidated. */
+ vrm->vma_reset = true;
if (res)
return res;
@@ -1362,6 +1367,7 @@ static unsigned long mremap_to(struct vma_remap_struct *vrm)
err = do_munmap(mm, vrm->new_addr, vrm->new_len,
vrm->uf_unmap_early);
vrm->vma = NULL; /* Invalidated. */
+ vrm->vma_reset = true;
if (err)
return err;
@@ -1581,6 +1587,18 @@ static bool vrm_will_map_new(struct vma_remap_struct *vrm)
return false;
}
+/* Does this remap ONLY move mappings? */
+static bool vrm_move_only(struct vma_remap_struct *vrm)
+{
+ if (!vrm_implies_new_addr(vrm))
+ return false;
+
+ if (vrm->old_len != vrm->new_len)
+ return false;
+
+ return true;
+}
+
static void notify_uffd(struct vma_remap_struct *vrm, bool failed)
{
struct mm_struct *mm = current->mm;
@@ -1644,10 +1662,29 @@ static int check_prep_vma(struct vma_remap_struct *vrm)
(vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)))
return -EINVAL;
- /* We can't remap across vm area boundaries */
+ /*
+ * We can't remap across the end of VMAs, as another VMA may be
+ * adjacent:
+ *
+ * addr vma->vm_end
+ * |-----.----------|
+ * | . |
+ * |-----.----------|
+ * .<--------->xxx>
+ * old_len
+ *
+ * We also require that vma->vm_start <= addr < vma->vm_end.
+ */
if (old_len > vma->vm_end - addr)
return -EFAULT;
+ /*
+ * We can't support moving multiple uffd VMAs as notify requires mmap
+ * lock to be dropped.
+ */
+ if (vrm->multi_vma && userfaultfd_armed(vma))
+ return -EINVAL;
+
if (new_len <= old_len)
return 0;
@@ -1744,6 +1781,57 @@ static unsigned long check_mremap_params(struct vma_remap_struct *vrm)
return 0;
}
+static unsigned long remap_move(struct vma_remap_struct *vrm)
+{
+ struct vm_area_struct *vma;
+ unsigned long start = vrm->addr;
+ unsigned long end = vrm->addr + vrm->old_len;
+ unsigned long new_addr = vrm->new_addr;
+ unsigned long prev_addr = start;
+ VMA_ITERATOR(vmi, current->mm, start);
+
+ /*
+ * When moving VMAs we allow for batched moves across multiple VMAs,
+ * with all VMAs in the input range [addr, addr + old_len) being moved
+ * (and split as necessary).
+ */
+ for_each_vma_range(vmi, vma, end) {
+ unsigned long addr = max(vma->vm_start, start);
+ unsigned long len = min(end, vma->vm_end) - addr;
+ unsigned long offset = addr - start;
+ unsigned long res;
+
+ /* Merged with self, move on. */
+ if (vrm->multi_vma && prev_addr == addr)
+ continue;
+
+ vrm->vma = vma;
+ vrm->addr = addr;
+ vrm->new_addr = new_addr + offset;
+ vrm->old_len = vrm->new_len = len;
+
+ res = check_prep_vma(vrm);
+ if (!res)
+ res = mremap_to(vrm);
+ if (IS_ERR_VALUE(res))
+ return res;
+
+ /* mmap lock is only dropped on shrink. */
+ VM_WARN_ON_ONCE(!vrm->mmap_locked);
+ /* This is a move, no expand should occur. */
+ VM_WARN_ON_ONCE(vrm->populate_expand);
+
+ if (vrm->vma_reset) {
+ vma_iter_reset(&vmi);
+ vrm->vma_reset = false;
+ }
+ vrm->multi_vma = true;
+ prev_addr = addr;
+ }
+
+ return new_addr;
+}
+
static unsigned long do_mremap(struct vma_remap_struct *vrm)
{
struct mm_struct *mm = current->mm;
@@ -1761,13 +1849,17 @@ static unsigned long do_mremap(struct vma_remap_struct *vrm)
return -EINTR;
vrm->mmap_locked = true;
- vrm->vma = vma_lookup(current->mm, vrm->addr);
- res = check_prep_vma(vrm);
- if (res)
- goto out;
+ if (vrm_move_only(vrm)) {
+ res = remap_move(vrm);
+ } else {
+ vrm->vma = vma_lookup(current->mm, vrm->addr);
+ res = check_prep_vma(vrm);
+ if (res)
+ goto out;
- /* Actually execute mremap. */
- res = vrm_implies_new_addr(vrm) ? mremap_to(vrm) : mremap_at(vrm);
+ /* Actually execute mremap. */
+ res = vrm_implies_new_addr(vrm) ? mremap_to(vrm) : mremap_at(vrm);
+ }
out:
failed = IS_ERR_VALUE(res);
--
2.50.0
* [PATCH 10/10] tools/testing/selftests: extend mremap_test to test multi-VMA mremap
2025-07-07 5:27 [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs Lorenzo Stoakes
` (8 preceding siblings ...)
2025-07-07 5:27 ` [PATCH 09/10] mm/mremap: permit mremap() move of multiple VMAs Lorenzo Stoakes
@ 2025-07-07 5:27 ` Lorenzo Stoakes
2025-07-07 6:12 ` [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs Hugh Dickins
2025-07-07 10:34 ` Lorenzo Stoakes
11 siblings, 0 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-07-07 5:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Jann Horn, Pedro Falcato,
Rik van Riel, linux-mm, linux-fsdevel, linux-kernel,
linux-kselftest
Now that we have added the ability to move multiple VMAs at once, assert
that this functions correctly, both when overwriting existing VMAs and when
moving backwards and forwards, with VMA merges and VMA invalidation.
Additionally assert that page tables are correctly propagated by setting
random data and reading it back.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
tools/testing/selftests/mm/mremap_test.c | 145 ++++++++++++++++++++++-
1 file changed, 144 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
index bb84476a177f..36b93a421161 100644
--- a/tools/testing/selftests/mm/mremap_test.c
+++ b/tools/testing/selftests/mm/mremap_test.c
@@ -380,6 +380,148 @@ static void mremap_move_within_range(unsigned int pattern_seed, char *rand_addr)
ksft_test_result_fail("%s\n", test_name);
}
+static bool is_multiple_vma_range_ok(unsigned int pattern_seed,
+ char *ptr, unsigned long page_size)
+{
+ int i;
+
+ srand(pattern_seed);
+ for (i = 0; i <= 10; i += 2) {
+ int j;
+ char *buf = &ptr[i * page_size];
+ size_t size = i == 4 ? 2 * page_size : page_size;
+
+ for (j = 0; j < size; j++) {
+ char chr = rand();
+
+ if (chr != buf[j]) {
+ ksft_print_msg("page %d offset %d corrupted, expected %d got %d\n",
+ i, j, chr, buf[j]);
+ return false;
+ }
+ }
+ }
+
+ return true;
+}
+
+static void mremap_move_multiple_vmas(unsigned int pattern_seed,
+ unsigned long page_size)
+{
+ char *test_name = "mremap move multiple vmas";
+ const size_t size = 11 * page_size;
+ bool success = true;
+ char *ptr, *tgt_ptr;
+ int i;
+
+ ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANON, -1, 0);
+ if (ptr == MAP_FAILED) {
+ perror("mmap");
+ success = false;
+ goto out;
+ }
+
+ tgt_ptr = mmap(NULL, 2 * size, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANON, -1, 0);
+ if (tgt_ptr == MAP_FAILED) {
+ perror("mmap");
+ success = false;
+ goto out;
+ }
+
+ /*
+ * Unmap so we end up with:
+ *
+ * 0 2 4 5 6 8 10 offset in buffer
+ * |*| |*| |*****| |*| |*|
+ * |*| |*| |*****| |*| |*|
+ * 0 1 2 3 4 5 6 pattern offset
+ */
+ for (i = 1; i < 10; i += 2) {
+ if (i == 5)
+ continue;
+
+ if (munmap(&ptr[i * page_size], page_size)) {
+ perror("munmap");
+ success = false;
+ goto out_unmap;
+ }
+ }
+
+ srand(pattern_seed);
+
+ /* Set up random patterns. */
+ for (i = 0; i <= 10; i += 2) {
+ int j;
+ size_t size = i == 4 ? 2 * page_size : page_size;
+ char *buf = &ptr[i * page_size];
+
+ for (j = 0; j < size; j++)
+ buf[j] = rand();
+ }
+
+ /* First, just move the whole thing. */
+ if (mremap(ptr, size, size,
+ MREMAP_MAYMOVE | MREMAP_FIXED, tgt_ptr) == MAP_FAILED) {
+ perror("mremap");
+ success = false;
+ goto out_unmap;
+ }
+
+ /* Check move was ok. */
+ if (!is_multiple_vma_range_ok(pattern_seed, tgt_ptr, page_size)) {
+ success = false;
+ goto out_unmap;
+ }
+
+ /* Move next to itself. */
+ if (mremap(tgt_ptr, size, size,
+ MREMAP_MAYMOVE | MREMAP_FIXED, &tgt_ptr[size]) == MAP_FAILED) {
+ perror("mremap");
+ success = false;
+ goto out_unmap;
+ }
+ /* Check that the move is ok. */
+ if (!is_multiple_vma_range_ok(pattern_seed, &tgt_ptr[size], page_size)) {
+ success = false;
+ goto out_unmap;
+ }
+
+ /* Map a range to overwrite. */
+ if (mmap(tgt_ptr, size, PROT_NONE,
+ MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0) == MAP_FAILED) {
+ perror("mmap tgt");
+ success = false;
+ goto out_unmap;
+ }
+
+ /* Move and overwrite. */
+ if (mremap(&tgt_ptr[size], size, size,
+ MREMAP_MAYMOVE | MREMAP_FIXED, tgt_ptr) == MAP_FAILED) {
+ perror("mremap");
+ success = false;
+ goto out_unmap;
+ }
+ /* Check that the move is ok. */
+ if (!is_multiple_vma_range_ok(pattern_seed, tgt_ptr, page_size)) {
+ success = false;
+ goto out_unmap;
+ }
+
+out_unmap:
+ if (munmap(tgt_ptr, 2 * size))
+ perror("munmap tgt");
+ if (munmap(ptr, size))
+ perror("munmap src");
+
+out:
+ if (success)
+ ksft_test_result_pass("%s\n", test_name);
+ else
+ ksft_test_result_fail("%s\n", test_name);
+}
+
/* Returns the time taken for the remap on success else returns -1. */
static long long remap_region(struct config c, unsigned int threshold_mb,
char *rand_addr)
@@ -721,7 +863,7 @@ int main(int argc, char **argv)
char *rand_addr;
size_t rand_size;
int num_expand_tests = 2;
- int num_misc_tests = 2;
+ int num_misc_tests = 3;
struct test test_cases[MAX_TEST] = {};
struct test perf_test_cases[MAX_PERF_TEST];
int page_size;
@@ -848,6 +990,7 @@ int main(int argc, char **argv)
mremap_move_within_range(pattern_seed, rand_addr);
mremap_move_1mb_from_start(pattern_seed, rand_addr);
+ mremap_move_multiple_vmas(pattern_seed, page_size);
if (run_perf_tests) {
ksft_print_msg("\n%s\n",
--
2.50.0
* Re: [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs
2025-07-07 5:27 [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs Lorenzo Stoakes
` (9 preceding siblings ...)
2025-07-07 5:27 ` [PATCH 10/10] tools/testing/selftests: extend mremap_test to test multi-VMA mremap Lorenzo Stoakes
@ 2025-07-07 6:12 ` Hugh Dickins
2025-07-07 10:31 ` Lorenzo Stoakes
2025-07-07 10:34 ` Lorenzo Stoakes
11 siblings, 1 reply; 31+ messages in thread
From: Hugh Dickins @ 2025-07-07 6:12 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Peter Xu, Alexander Viro, Christian Brauner,
Jan Kara, Liam R . Howlett, Vlastimil Babka, Jann Horn,
Pedro Falcato, Rik van Riel, linux-mm, linux-fsdevel,
linux-kernel, linux-kselftest
On Mon, 7 Jul 2025, Lorenzo Stoakes wrote:
> Historically we've made it a uAPI requirement that mremap() may only
> operate on a single VMA at a time.
>
> For instances where VMAs need to be resized, this makes sense, as it
> becomes very difficult to determine what a user actually wants should they
> indicate a desire to expand or shrink the size of multiple VMAs (truncate?
> Adjust sizes individually? Some other strategy?).
>
> However, in instances where a user is moving VMAs, it is restrictive to
> disallow this.
>
> This is especially the case when anonymous mapping remap may or may not be
> mergeable depending on whether VMAs have or have not been faulted due to
> anon_vma assignment and folio index alignment with vma->vm_pgoff.
>
> Often this can result in surprising impact where a moved region is faulted,
> then moved back and a user fails to observe a merge from otherwise
> compatible, adjacent VMAs.
>
> This change allows such cases to work without the user having to be
> cognizant of whether a prior mremap() move or other VMA operations have
> resulted in VMA fragmentation.
>
> In order to do this, this series performs a large amount of refactoring,
> most pertinently - grouping sanity checks together, separating those that
> check input parameters from those relating to VMAs.
>
> We also simplify the post-mmap lock drop processing for uffd and mlock()'d
> VMAs.
>
> With this done, we can then fairly straightforwardly implement this
> functionality.
>
> This works exclusively for mremap() invocations which specify
> MREMAP_FIXED. It is not compatible with VMAs which use userfaultfd, as the
> notification of the userland fault handler would require us to drop the
> mmap lock.
>
> The input and output address ranges must not overlap. We carefully
> account for moves which would result in VMA merges or would otherwise
> result in VMA iterator invalidation.
Applause!
No way shall I review this, but each time I've seen an mremap series
from Lorenzo go by, I've wanted to say "but wouldn't it be better to...";
but it felt too impertinent to prod you in a direction I'd never dare
take myself (and quite likely that you had already tried, but found it
fundamentally impossible).
Thank you, yes, this is a very welcome step forward.
Hugh
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 05/10] mm/mremap: use an explicit uffd failure path for mremap
2025-07-07 5:27 ` [PATCH 05/10] mm/mremap: use an explicit uffd failure path for mremap Lorenzo Stoakes
@ 2025-07-07 7:56 ` kernel test robot
2025-07-07 10:13 ` Lorenzo Stoakes
2025-07-07 10:20 ` Lorenzo Stoakes
2025-07-10 14:24 ` Vlastimil Babka
2 siblings, 1 reply; 31+ messages in thread
From: kernel test robot @ 2025-07-07 7:56 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: llvm, oe-kbuild-all, Linux Memory Management List, Peter Xu,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Jann Horn, Pedro Falcato, Rik van Riel,
linux-fsdevel, linux-kernel, linux-kselftest
Hi Lorenzo,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Lorenzo-Stoakes/mm-mremap-perform-some-simple-cleanups/20250707-133132
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/be3e068c77107d385d89eae634317cb59e04e5ba.1751865330.git.lorenzo.stoakes%40oracle.com
patch subject: [PATCH 05/10] mm/mremap: use an explicit uffd failure path for mremap
config: i386-buildonly-randconfig-002-20250707 (https://download.01.org/0day-ci/archive/20250707/202507071505.e2HFMCH2-lkp@intel.com/config)
compiler: clang version 20.1.7 (https://github.com/llvm/llvm-project 6146a88f60492b520a36f8f8f3231e15f3cc6082)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250707/202507071505.e2HFMCH2-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507071505.e2HFMCH2-lkp@intel.com/
All errors (new ones prefixed by >>):
>> mm/mremap.c:1739:3: error: call to undeclared function 'mremap_userfaultfd_fail'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
1739 | mremap_userfaultfd_fail(vrm->uf);
| ^
mm/mremap.c:1739:3: note: did you mean 'mremap_userfaultfd_prep'?
include/linux/userfaultfd_k.h:363:20: note: 'mremap_userfaultfd_prep' declared here
363 | static inline void mremap_userfaultfd_prep(struct vm_area_struct *vma,
| ^
1 error generated.
vim +/mremap_userfaultfd_fail +1739 mm/mremap.c
1731
1732 static void notify_uffd(struct vma_remap_struct *vrm, bool failed)
1733 {
1734 struct mm_struct *mm = current->mm;
1735
1736 /* Regardless of success/failure, we always notify of any unmaps. */
1737 userfaultfd_unmap_complete(mm, vrm->uf_unmap_early);
1738 if (failed)
> 1739 mremap_userfaultfd_fail(vrm->uf);
1740 else
1741 mremap_userfaultfd_complete(vrm->uf, vrm->addr,
1742 vrm->new_addr, vrm->old_len);
1743 userfaultfd_unmap_complete(mm, vrm->uf_unmap);
1744 }
1745
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 05/10] mm/mremap: use an explicit uffd failure path for mremap
2025-07-07 7:56 ` kernel test robot
@ 2025-07-07 10:13 ` Lorenzo Stoakes
0 siblings, 0 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-07-07 10:13 UTC (permalink / raw)
To: kernel test robot
Cc: Andrew Morton, llvm, oe-kbuild-all, Linux Memory Management List,
Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Jann Horn, Pedro Falcato,
Rik van Riel, linux-fsdevel, linux-kernel, linux-kselftest
On Mon, Jul 07, 2025 at 03:56:53PM +0800, kernel test robot wrote:
> Hi Lorenzo,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on akpm-mm/mm-everything]
Thanks for the report, I just need to add a stub for this, will send a fix-patch!
Cheers, Lorenzo
>
> url: https://github.com/intel-lab-lkp/linux/commits/Lorenzo-Stoakes/mm-mremap-perform-some-simple-cleanups/20250707-133132
> base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link: https://lore.kernel.org/r/be3e068c77107d385d89eae634317cb59e04e5ba.1751865330.git.lorenzo.stoakes%40oracle.com
> patch subject: [PATCH 05/10] mm/mremap: use an explicit uffd failure path for mremap
> config: i386-buildonly-randconfig-002-20250707 (https://download.01.org/0day-ci/archive/20250707/202507071505.e2HFMCH2-lkp@intel.com/config)
> compiler: clang version 20.1.7 (https://github.com/llvm/llvm-project 6146a88f60492b520a36f8f8f3231e15f3cc6082)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250707/202507071505.e2HFMCH2-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202507071505.e2HFMCH2-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> >> mm/mremap.c:1739:3: error: call to undeclared function 'mremap_userfaultfd_fail'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
> 1739 | mremap_userfaultfd_fail(vrm->uf);
> | ^
> mm/mremap.c:1739:3: note: did you mean 'mremap_userfaultfd_prep'?
> include/linux/userfaultfd_k.h:363:20: note: 'mremap_userfaultfd_prep' declared here
> 363 | static inline void mremap_userfaultfd_prep(struct vm_area_struct *vma,
> | ^
> 1 error generated.
>
>
> vim +/mremap_userfaultfd_fail +1739 mm/mremap.c
>
> 1731
> 1732 static void notify_uffd(struct vma_remap_struct *vrm, bool failed)
> 1733 {
> 1734 struct mm_struct *mm = current->mm;
> 1735
> 1736 /* Regardless of success/failure, we always notify of any unmaps. */
> 1737 userfaultfd_unmap_complete(mm, vrm->uf_unmap_early);
> 1738 if (failed)
> > 1739 mremap_userfaultfd_fail(vrm->uf);
> 1740 else
> 1741 mremap_userfaultfd_complete(vrm->uf, vrm->addr,
> 1742 vrm->new_addr, vrm->old_len);
> 1743 userfaultfd_unmap_complete(mm, vrm->uf_unmap);
> 1744 }
> 1745
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 05/10] mm/mremap: use an explicit uffd failure path for mremap
2025-07-07 5:27 ` [PATCH 05/10] mm/mremap: use an explicit uffd failure path for mremap Lorenzo Stoakes
2025-07-07 7:56 ` kernel test robot
@ 2025-07-07 10:20 ` Lorenzo Stoakes
2025-07-10 14:24 ` Vlastimil Babka
2 siblings, 0 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-07-07 10:20 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Jann Horn, Pedro Falcato,
Rik van Riel, linux-mm, linux-fsdevel, linux-kernel,
linux-kselftest
Hi Andrew,
I missed the !CONFIG_USERFAULTFD stub, could you apply the attached fix-patch?
Thanks!
----8<----
From 048bfb6ee415843bd584c64a2c6e6be9b1114962 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Mon, 7 Jul 2025 11:15:18 +0100
Subject: [PATCH] add missing mremap_userfaultfd_fail() stub
This covers the !CONFIG_USERFAULTFD case.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
include/linux/userfaultfd_k.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 6680a4de40b3..c0e716aec26a 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -372,6 +372,10 @@ static inline void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *ctx,
{
}
+static inline void mremap_userfaultfd_fail(struct vm_userfaultfd_ctx *ctx)
+{
+}
+
static inline bool userfaultfd_remove(struct vm_area_struct *vma,
unsigned long start,
unsigned long end)
--
2.50.0
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs
2025-07-07 6:12 ` [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs Hugh Dickins
@ 2025-07-07 10:31 ` Lorenzo Stoakes
0 siblings, 0 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-07-07 10:31 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Peter Xu, Alexander Viro, Christian Brauner,
Jan Kara, Liam R . Howlett, Vlastimil Babka, Jann Horn,
Pedro Falcato, Rik van Riel, linux-mm, linux-fsdevel,
linux-kernel, linux-kselftest
On Sun, Jul 06, 2025 at 11:12:35PM -0700, Hugh Dickins wrote:
> Applause!
>
> No way shall I review this, but each time I've seen an mremap series
> from Lorenzo go by, I've wanted to say "but wouldn't it be better to...";
> but it felt too impertinent to prod you in a direction I'd never dare
> take myself (and quite likely that you had already tried, but found it
> fundamentally impossible).
>
> Thank you, yes, this is a very welcome step forward.
Thank you, that's very kind of you! :) and please, by all means do feel free
to prod or to give your thoughts and opinions on things, they're very
welcome and appreciated!
With respect to this series, I think it really underlines what a difference
refactoring can make to being able to have code do something new - prior to
my last refactoring series and the refactoring bits here I just don't think
it would have been possible.
WRT the relocate anon series - I thought it'd be worth talking a bit
about why it didn't work out, in case you/others might find it
interesting:
Indeed, while I'd like us to more efficiently process VMAs in the anon_vma
case, it turns out there are simply too many moving parts for it to be
feasible at this time - I reached the point of dealing with many many edge
cases addressing the points David raised about folios in the swap cache and
migration entries (which might also fail to migrate), having gone to great
lengths to avoid having a not-reliable undo path.
I'd even invented a new means of 'hiding' anon_vma's from the rmap walker,
and did split folio work up front and and and :)
But then there came a point where unavoidably I'd have to do a split folio
mid-way through the operation and GUP fast could race and increment a
refcount that'd break that and... it was just obvious this approach wasn't
workable, and was far too fragile.
Important to accept when one reaches such a point, but it wasn't a waste,
as a. there's a lot that can be reused and applied later, b. I learned a
great deal, c. it helped further my research in this area.
I think overall efforts in this direction will require a more ambitious
rework of the anon_vma stuff, something I intend to do :) but it'll all be
done incrementally, with a great deal of care, and obviously working with
the community throughout.
>
> Hugh
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs
2025-07-07 5:27 [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs Lorenzo Stoakes
` (10 preceding siblings ...)
2025-07-07 6:12 ` [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs Hugh Dickins
@ 2025-07-07 10:34 ` Lorenzo Stoakes
11 siblings, 0 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-07-07 10:34 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Jann Horn, Pedro Falcato,
Rik van Riel, linux-mm, linux-fsdevel, linux-kernel,
linux-kselftest, linux-api
+cc linux-api, FYI - apologies I intended to cc from the start, was simply
an oversight. All future respins will cc.
This series changes mremap() semantics (I will update the manpage
accordingly of course).
Cheers, Lorenzo
On Mon, Jul 07, 2025 at 06:27:43AM +0100, Lorenzo Stoakes wrote:
> Historically we've made it a uAPI requirement that mremap() may only
> operate on a single VMA at a time.
>
> For instances where VMAs need to be resized, this makes sense, as it
> becomes very difficult to determine what a user actually wants should they
> indicate a desire to expand or shrink the size of multiple VMAs (truncate?
> Adjust sizes individually? Some other strategy?).
>
> However, in instances where a user is moving VMAs, it is restrictive to
> disallow this.
>
> This is especially the case when anonymous mapping remap may or may not be
> mergeable depending on whether VMAs have or have not been faulted due to
> anon_vma assignment and folio index alignment with vma->vm_pgoff.
>
> Often this can result in surprising impact where a moved region is faulted,
> then moved back and a user fails to observe a merge from otherwise
> compatible, adjacent VMAs.
>
> This change allows such cases to work without the user having to be
> cognizant of whether a prior mremap() move or other VMA operations have
> resulted in VMA fragmentation.
>
> In order to do this, this series performs a large amount of refactoring,
> most pertinently - grouping sanity checks together, separating those that
> check input parameters from those relating to VMAs.
>
> We also simplify the post-mmap lock drop processing for uffd and mlock()'d
> VMAs.
>
> With this done, we can then fairly straightforwardly implement this
> functionality.
>
> This works exclusively for mremap() invocations which specify
> MREMAP_FIXED. It is not compatible with VMAs which use userfaultfd, as the
> notification of the userland fault handler would require us to drop the
> mmap lock.
>
> The input and output address ranges must not overlap. We carefully
> account for moves which would result in VMA merges or would otherwise
> result in VMA iterator invalidation.
>
> Lorenzo Stoakes (10):
> mm/mremap: perform some simple cleanups
> mm/mremap: refactor initial parameter sanity checks
> mm/mremap: put VMA check and prep logic into helper function
> mm/mremap: cleanup post-processing stage of mremap
> mm/mremap: use an explicit uffd failure path for mremap
> mm/mremap: check remap conditions earlier
> mm/mremap: move remap_is_valid() into check_prep_vma()
> mm/mremap: clean up mlock populate behaviour
> mm/mremap: permit mremap() move of multiple VMAs
> tools/testing/selftests: extend mremap_test to test multi-VMA mremap
>
> fs/userfaultfd.c | 15 +-
> include/linux/userfaultfd_k.h | 1 +
> mm/mremap.c | 502 ++++++++++++++---------
> tools/testing/selftests/mm/mremap_test.c | 145 ++++++-
> 4 files changed, 462 insertions(+), 201 deletions(-)
>
> --
> 2.50.0
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 09/10] mm/mremap: permit mremap() move of multiple VMAs
2025-07-07 5:27 ` [PATCH 09/10] mm/mremap: permit mremap() move of multiple VMAs Lorenzo Stoakes
@ 2025-07-09 18:13 ` Liam R. Howlett
2025-07-10 10:41 ` Lorenzo Stoakes
2025-07-11 8:17 ` Mark Brown
1 sibling, 1 reply; 31+ messages in thread
From: Liam R. Howlett @ 2025-07-09 18:13 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Peter Xu, Alexander Viro, Christian Brauner,
Jan Kara, Vlastimil Babka, Jann Horn, Pedro Falcato, Rik van Riel,
linux-mm, linux-fsdevel, linux-kernel, linux-kselftest
* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [250707 01:28]:
> Historically we've made it a uAPI requirement that mremap() may only
> operate on a single VMA at a time.
>
> For instances where VMAs need to be resized, this makes sense, as it
> becomes very difficult to determine what a user actually wants should they
> indicate a desire to expand or shrink the size of multiple VMAs (truncate?
> Adjust sizes individually? Some other strategy?).
>
> However, in instances where a user is moving VMAs, it is restrictive to
> disallow this.
>
> This is especially the case when anonymous mapping remap may or may not be
> mergeable depending on whether VMAs have or have not been faulted due to
> anon_vma assignment and folio index alignment with vma->vm_pgoff.
>
> Often this can result in surprising impact where a moved region is faulted,
> then moved back and a user fails to observe a merge from otherwise
> compatible, adjacent VMAs.
>
> This change allows such cases to work without the user having to be
> > cognizant of whether a prior mremap() move or other VMA operations have
> resulted in VMA fragmentation.
>
> Having refactored mremap code to aggregate per-VMA and parameter checks, we
> are now in a position to permit this kind of move.
>
> We do so by detecting if this is a move-only operation up-front, and then
> utilising a separate code path via remap_move() rather than the ordinary
> single-VMA path.
>
> There are two tasks that occur outside of the mmap write lock - userfaultfd
> notification and population of unmapped regions of expanded VMAs should the
> VMA be mlock()'d.
>
> The latter doesn't apply, as this is logic for a move only and thus no
> expansion can take place. In the former case, we explicitly disallow
> multi-VMA operations on uffd-armed VMAs.
>
> The mmap lock is never dropped in the move-only case, this only occurs on a
> VMA shrink.
>
> We take care to handle cases where a VMA merge has occurred, by resetting
> the VMA iterator in such instances.
>
> We needn't worry about self-merges, as in those cases we would, by
> definition, not be spanning multiple VMAs. The overlapping range test is
> performed on the whole range so specifically disallows this.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> mm/mremap.c | 106 ++++++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 99 insertions(+), 7 deletions(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 28e776cddc08..2e6005e1d22c 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -69,6 +69,8 @@ struct vma_remap_struct {
> enum mremap_type remap_type; /* expand, shrink, etc. */
> bool mmap_locked; /* Is mm currently write-locked? */
> unsigned long charged; /* If VM_ACCOUNT, # pages to account. */
> + bool multi_vma; /* Is >1 VMA being moved? */
> > + bool vma_reset; /* Was the VMA merged/was an unmap performed? */
The name doesn't read well in code. vmi_reset or reset_iter might be
better, but I don't really mind it like this.
> };
>
> static pud_t *get_old_pud(struct mm_struct *mm, unsigned long addr)
> @@ -1111,6 +1113,7 @@ static void unmap_source_vma(struct vma_remap_struct *vrm)
>
> err = do_vmi_munmap(&vmi, mm, addr, len, vrm->uf_unmap, /* unlock= */false);
> vrm->vma = NULL; /* Invalidated. */
> + vrm->vma_reset = true;
I believe the munmap() operation leaves the vmi in the correct position
to reuse, so this is caution that costs an extra walk of the tree. I
don't think it's critical to performance, but if it is we can look here.
It would have to be passed through which might be a pain.
> if (err) {
> /* OOM: unable to split vma, just get accounts right */
> vm_acct_memory(len >> PAGE_SHIFT);
> @@ -1181,6 +1184,7 @@ static int copy_vma_and_data(struct vma_remap_struct *vrm,
>
> new_vma = copy_vma(&vma, vrm->new_addr, vrm->new_len, new_pgoff,
> &pmc.need_rmap_locks);
> + vrm->vma_reset = vma != vrm->vma;
> if (!new_vma) {
> vrm_uncharge(vrm);
> *new_vma_ptr = NULL;
> @@ -1325,6 +1329,7 @@ static unsigned long shrink_vma(struct vma_remap_struct *vrm,
> res = do_vmi_munmap(&vmi, mm, unmap_start, unmap_bytes,
> vrm->uf_unmap, drop_lock);
> vrm->vma = NULL; /* Invalidated. */
> + vrm->vma_reset = true;
Ditto here, lock depending..
> if (res)
> return res;
>
> @@ -1362,6 +1367,7 @@ static unsigned long mremap_to(struct vma_remap_struct *vrm)
> err = do_munmap(mm, vrm->new_addr, vrm->new_len,
> vrm->uf_unmap_early);
> vrm->vma = NULL; /* Invalidated. */
> + vrm->vma_reset = true;
Pretty sure this one is needed, regardless of passing through (and
updating this call).
> if (err)
> return err;
>
> @@ -1581,6 +1587,18 @@ static bool vrm_will_map_new(struct vma_remap_struct *vrm)
> return false;
> }
>
> +/* Does this remap ONLY move mappings? */
> +static bool vrm_move_only(struct vma_remap_struct *vrm)
> +{
> + if (!vrm_implies_new_addr(vrm))
> + return false;
> +
> + if (vrm->old_len != vrm->new_len)
> + return false;
> +
> + return true;
> +}
> +
> static void notify_uffd(struct vma_remap_struct *vrm, bool failed)
> {
> struct mm_struct *mm = current->mm;
> @@ -1644,10 +1662,29 @@ static int check_prep_vma(struct vma_remap_struct *vrm)
> (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)))
> return -EINVAL;
>
> - /* We can't remap across vm area boundaries */
> + /*
> + * We can't remap across the end of VMAs, as another VMA may be
> + * adjacent:
> + *
> + * addr vma->vm_end
> + * |-----.----------|
> + * | . |
> + * |-----.----------|
> + * .<--------->xxx>
> + * old_len
> + *
> + * We also require that vma->vm_start <= addr < vma->vm_end.
> + */
> if (old_len > vma->vm_end - addr)
> return -EFAULT;
>
> + /*
> + * We can't support moving multiple uffd VMAs as notify requires mmap
> + * lock to be dropped.
> + */
> + if (vrm->multi_vma && userfaultfd_armed(vma))
> + return -EINVAL;
> +
> if (new_len <= old_len)
> return 0;
>
> @@ -1744,6 +1781,57 @@ static unsigned long check_mremap_params(struct vma_remap_struct *vrm)
> return 0;
> }
>
> +static unsigned long remap_move(struct vma_remap_struct *vrm)
> +{
> + struct vm_area_struct *vma;
> + unsigned long start = vrm->addr;
> + unsigned long end = vrm->addr + vrm->old_len;
> + unsigned long new_addr = vrm->new_addr;
> + unsigned long prev_addr = start;
> + VMA_ITERATOR(vmi, current->mm, start);
> +
> + /*
> + * When moving VMAs we allow for batched moves across multiple VMAs,
> + * with all VMAs in the input range [addr, addr + old_len) being moved
> + * (and split as necessary).
> + */
> + for_each_vma_range(vmi, vma, end) {
> + unsigned long addr = max(vma->vm_start, start);
> + unsigned long len = min(end, vma->vm_end) - addr;
> + unsigned long offset = addr - start;
> + unsigned long res;
> +
> + /* Merged with self, move on. */
> + if (vrm->multi_vma && prev_addr == addr)
> + continue;
> +
> + vrm->vma = vma;
> + vrm->addr = addr;
> + vrm->new_addr = new_addr + offset;
> + vrm->old_len = vrm->new_len = len;
> +
> + res = check_prep_vma(vrm);
> + if (!res)
> + res = mremap_to(vrm);
> + if (IS_ERR_VALUE(res))
> + return res;
> +
> + /* mmap lock is only dropped on shrink. */
> + VM_WARN_ON_ONCE(!vrm->mmap_locked);
> + /* This is a move, no expand should occur. */
> + VM_WARN_ON_ONCE(vrm->populate_expand);
> +
> + if (vrm->vma_reset) {
> + vma_iter_reset(&vmi);
> + vrm->vma_reset = false;
> + }
What code path results in vma_reset == false here?
> + vrm->multi_vma = true;
> + prev_addr = addr;
> + }
> +
> + return new_addr;
> +}
The iterator use looks good.
> +
> static unsigned long do_mremap(struct vma_remap_struct *vrm)
> {
> struct mm_struct *mm = current->mm;
> @@ -1761,13 +1849,17 @@ static unsigned long do_mremap(struct vma_remap_struct *vrm)
> return -EINTR;
> vrm->mmap_locked = true;
>
> - vrm->vma = vma_lookup(current->mm, vrm->addr);
> - res = check_prep_vma(vrm);
> - if (res)
> - goto out;
> + if (vrm_move_only(vrm)) {
> + res = remap_move(vrm);
> + } else {
> + vrm->vma = vma_lookup(current->mm, vrm->addr);
> + res = check_prep_vma(vrm);
> + if (res)
> + goto out;
>
> - /* Actually execute mremap. */
> - res = vrm_implies_new_addr(vrm) ? mremap_to(vrm) : mremap_at(vrm);
> + /* Actually execute mremap. */
> + res = vrm_implies_new_addr(vrm) ? mremap_to(vrm) : mremap_at(vrm);
> + }
>
> out:
> failed = IS_ERR_VALUE(res);
> --
> 2.50.0
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 09/10] mm/mremap: permit mremap() move of multiple VMAs
2025-07-09 18:13 ` Liam R. Howlett
@ 2025-07-10 10:41 ` Lorenzo Stoakes
0 siblings, 0 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-07-10 10:41 UTC (permalink / raw)
To: Liam R. Howlett, Andrew Morton, Peter Xu, Alexander Viro,
Christian Brauner, Jan Kara, Vlastimil Babka, Jann Horn,
Pedro Falcato, Rik van Riel, linux-mm, linux-fsdevel,
linux-kernel, linux-kselftest
On Wed, Jul 09, 2025 at 02:13:41PM -0400, Liam R. Howlett wrote:
> * Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [250707 01:28]:
> > Historically we've made it a uAPI requirement that mremap() may only
> > operate on a single VMA at a time.
> >
> > For instances where VMAs need to be resized, this makes sense, as it
> > becomes very difficult to determine what a user actually wants should they
> > indicate a desire to expand or shrink the size of multiple VMAs (truncate?
> > Adjust sizes individually? Some other strategy?).
> >
> > However, in instances where a user is moving VMAs, it is restrictive to
> > disallow this.
> >
> > This is especially the case when anonymous mapping remap may or may not be
> > mergeable depending on whether VMAs have or have not been faulted due to
> > anon_vma assignment and folio index alignment with vma->vm_pgoff.
> >
> > Often this can result in surprising impact where a moved region is faulted,
> > then moved back and a user fails to observe a merge from otherwise
> > compatible, adjacent VMAs.
> >
> > This change allows such cases to work without the user having to be
> > > cognizant of whether a prior mremap() move or other VMA operations have
> > resulted in VMA fragmentation.
> >
> > Having refactored mremap code to aggregate per-VMA and parameter checks, we
> > are now in a position to permit this kind of move.
> >
> > We do so by detecting if this is a move-only operation up-front, and then
> > utilising a separate code path via remap_move() rather than the ordinary
> > single-VMA path.
> >
> > There are two tasks that occur outside of the mmap write lock - userfaultfd
> > notification and population of unmapped regions of expanded VMAs should the
> > VMA be mlock()'d.
> >
> > The latter doesn't apply, as this is logic for a move only and thus no
> > expansion can take place. In the former case, we explicitly disallow
> > multi-VMA operations on uffd-armed VMAs.
> >
> > The mmap lock is never dropped in the move-only case, this only occurs on a
> > VMA shrink.
> >
> > We take care to handle cases where a VMA merge has occurred, by resetting
> > the VMA iterator in such instances.
> >
> > We needn't worry about self-merges, as in those cases we would, by
> > definition, not be spanning multiple VMAs. The overlapping range test is
> > performed on the whole range so specifically disallows this.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> > mm/mremap.c | 106 ++++++++++++++++++++++++++++++++++++++++++++++++----
> > 1 file changed, 99 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index 28e776cddc08..2e6005e1d22c 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -69,6 +69,8 @@ struct vma_remap_struct {
> > enum mremap_type remap_type; /* expand, shrink, etc. */
> > bool mmap_locked; /* Is mm currently write-locked? */
> > unsigned long charged; /* If VM_ACCOUNT, # pages to account. */
> > + bool multi_vma; /* Is >1 VMA being moved? */
> > + bool vma_reset; /* Was the VMA merged/was an unmap performed? */
>
> The name doesn't read well in code. vmi_reset or reset_iter might be
> better, but I don't really mind it like this.
Yeah it is a bit odd I agree.
>
> > };
> >
> > static pud_t *get_old_pud(struct mm_struct *mm, unsigned long addr)
> > @@ -1111,6 +1113,7 @@ static void unmap_source_vma(struct vma_remap_struct *vrm)
> >
> > err = do_vmi_munmap(&vmi, mm, addr, len, vrm->uf_unmap, /* unlock= */false);
> > vrm->vma = NULL; /* Invalidated. */
> > + vrm->vma_reset = true;
>
> I believe the munmap() operation leaves the vmi in the correct position
> to reuse, so this is cautious that costs an extra walk of the tree. I
> don't think it's critical to performance, but if it is we can look here.
> It would have to be passed through which might be a pain.
Yeah I think this means we _always_ reset the VMI as you mention below, unless
MREMAP_DONTUNMAP | MREMAP_FIXED is used.
It's right to invalidate the vrm->vma here, as this is the source VMA so is now
a dangling pointer.
I think the problem I was worried about here was a partial unmap causing a
split, and keep in mind we might be moving things backwards also.
But I don't think the _iterator_ should be invalidated by this actually right?
We'd still be in the correct position.
So yeah, I'll drop this.
>
> > if (err) {
> > /* OOM: unable to split vma, just get accounts right */
> > vm_acct_memory(len >> PAGE_SHIFT);
> > @@ -1181,6 +1184,7 @@ static int copy_vma_and_data(struct vma_remap_struct *vrm,
> >
> > new_vma = copy_vma(&vma, vrm->new_addr, vrm->new_len, new_pgoff,
> > &pmc.need_rmap_locks);
> > + vrm->vma_reset = vma != vrm->vma;
> > if (!new_vma) {
> > vrm_uncharge(vrm);
> > *new_vma_ptr = NULL;
> > @@ -1325,6 +1329,7 @@ static unsigned long shrink_vma(struct vma_remap_struct *vrm,
> > res = do_vmi_munmap(&vmi, mm, unmap_start, unmap_bytes,
> > vrm->uf_unmap, drop_lock);
> > vrm->vma = NULL; /* Invalidated. */
> > + vrm->vma_reset = true;
>
> Ditto here, lock depending..
We won't ever drop the lock in a move path to be clear. Only on shrink, which is
disallowed for multi VMA move (as is expand).
So probably this is overcautious and I'll drop it.
>
> > if (res)
> > return res;
> >
> > @@ -1362,6 +1367,7 @@ static unsigned long mremap_to(struct vma_remap_struct *vrm)
> > err = do_munmap(mm, vrm->new_addr, vrm->new_len,
> > vrm->uf_unmap_early);
> > vrm->vma = NULL; /* Invalidated. */
> > + vrm->vma_reset = true;
>
> Pretty sure this one is needed, regardless of passing through (and
> updating this call).
Yes this one for sure.
>
> > if (err)
> > return err;
> >
> > @@ -1581,6 +1587,18 @@ static bool vrm_will_map_new(struct vma_remap_struct *vrm)
> > return false;
> > }
> >
> > +/* Does this remap ONLY move mappings? */
> > +static bool vrm_move_only(struct vma_remap_struct *vrm)
> > +{
> > + if (!vrm_implies_new_addr(vrm))
> > + return false;
> > +
> > + if (vrm->old_len != vrm->new_len)
> > + return false;
> > +
> > + return true;
> > +}
> > +
> > static void notify_uffd(struct vma_remap_struct *vrm, bool failed)
> > {
> > struct mm_struct *mm = current->mm;
> > @@ -1644,10 +1662,29 @@ static int check_prep_vma(struct vma_remap_struct *vrm)
> > (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)))
> > return -EINVAL;
> >
> > - /* We can't remap across vm area boundaries */
> > + /*
> > + * We can't remap across the end of VMAs, as another VMA may be
> > + * adjacent:
> > + *
> > + * addr vma->vm_end
> > + * |-----.----------|
> > + * | . |
> > + * |-----.----------|
> > + * .<--------->xxx>
> > + * old_len
> > + *
> > + * We also require that vma->vm_start <= addr < vma->vm_end.
> > + */
> > if (old_len > vma->vm_end - addr)
> > return -EFAULT;
> >
> > + /*
> > + * We can't support moving multiple uffd VMAs as notify requires mmap
> > + * lock to be dropped.
> > + */
> > + if (vrm->multi_vma && userfaultfd_armed(vma))
> > + return -EINVAL;
> > +
> > if (new_len <= old_len)
> > return 0;
> >
> > @@ -1744,6 +1781,57 @@ static unsigned long check_mremap_params(struct vma_remap_struct *vrm)
> > return 0;
> > }
> >
> > +static unsigned long remap_move(struct vma_remap_struct *vrm)
> > +{
> > + struct vm_area_struct *vma;
> > + unsigned long start = vrm->addr;
> > + unsigned long end = vrm->addr + vrm->old_len;
> > + unsigned long new_addr = vrm->new_addr;
> > + unsigned long prev_addr = start;
> > + VMA_ITERATOR(vmi, current->mm, start);
> > +
> > + /*
> > + * When moving VMAs we allow for batched moves across multiple VMAs,
> > + * with all VMAs in the input range [addr, addr + old_len) being moved
> > + * (and split as necessary).
> > + */
> > + for_each_vma_range(vmi, vma, end) {
> > + unsigned long addr = max(vma->vm_start, start);
> > + unsigned long len = min(end, vma->vm_end) - addr;
> > + unsigned long offset = addr - start;
> > + unsigned long res;
> > +
> > + /* Merged with self, move on. */
> > + if (vrm->multi_vma && prev_addr == addr)
> > + continue;
> > +
> > + vrm->vma = vma;
> > + vrm->addr = addr;
> > + vrm->new_addr = new_addr + offset;
> > + vrm->old_len = vrm->new_len = len;
> > +
> > + res = check_prep_vma(vrm);
> > + if (!res)
> > + res = mremap_to(vrm);
> > + if (IS_ERR_VALUE(res))
> > + return res;
> > +
> > + /* mmap lock is only dropped on shrink. */
> > + VM_WARN_ON_ONCE(!vrm->mmap_locked);
> > + /* This is a move, no expand should occur. */
> > + VM_WARN_ON_ONCE(vrm->populate_expand);
> > +
> > + if (vrm->vma_reset) {
> > + vma_iter_reset(&vmi);
> > + vrm->vma_reset = false;
> > + }
>
> What code path results in vma_reset == false here?
Yeah that's a good point, only MREMAP_DONTUNMAP | MREMAP_FIXED will fail to hit
it, so let's drop for unmaps.
I will test this is all good too.
>
> > + vrm->multi_vma = true;
> > + prev_addr = addr;
> > + }
> > +
> > + return new_addr;
> > +}
>
> The iterator use looks good.
Thanks!
>
> > +
> > static unsigned long do_mremap(struct vma_remap_struct *vrm)
> > {
> > struct mm_struct *mm = current->mm;
> > @@ -1761,13 +1849,17 @@ static unsigned long do_mremap(struct vma_remap_struct *vrm)
> > return -EINTR;
> > vrm->mmap_locked = true;
> >
> > - vrm->vma = vma_lookup(current->mm, vrm->addr);
> > - res = check_prep_vma(vrm);
> > - if (res)
> > - goto out;
> > + if (vrm_move_only(vrm)) {
> > + res = remap_move(vrm);
> > + } else {
> > + vrm->vma = vma_lookup(current->mm, vrm->addr);
> > + res = check_prep_vma(vrm);
> > + if (res)
> > + goto out;
> >
> > - /* Actually execute mremap. */
> > - res = vrm_implies_new_addr(vrm) ? mremap_to(vrm) : mremap_at(vrm);
> > + /* Actually execute mremap. */
> > + res = vrm_implies_new_addr(vrm) ? mremap_to(vrm) : mremap_at(vrm);
> > + }
> >
> > out:
> > failed = IS_ERR_VALUE(res);
> > --
> > 2.50.0
> >
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 01/10] mm/mremap: perform some simple cleanups
2025-07-07 5:27 ` [PATCH 01/10] mm/mremap: perform some simple cleanups Lorenzo Stoakes
@ 2025-07-10 11:09 ` Vlastimil Babka
0 siblings, 0 replies; 31+ messages in thread
From: Vlastimil Babka @ 2025-07-10 11:09 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Jann Horn, Pedro Falcato, Rik van Riel,
linux-mm, linux-fsdevel, linux-kernel, linux-kselftest
On 7/7/25 07:27, Lorenzo Stoakes wrote:
> We const-ify the vrm flags parameter to indicate this will never change.
>
> We rename resize_is_valid() to remap_is_valid(), as this function does not
> only apply to cases where we resize, so it's simply confusing to refer to
> that here.
>
> We remove the BUG() from mremap_at(), as we should not BUG() unless we are
> certain it'll result in system instability.
>
> We rename vrm_charge() to vrm_calc_charge() to make it clear this simply
> calculates the charged number of pages rather than actually adjusting any
> state.
>
> We update the comment for vrm_implies_new_addr() to explain that
> MREMAP_DONTUNMAP does not require a set address, but will always be moved.
>
> Additionally consistently use 'res' rather than 'ret' for result values.
>
> No functional change intended.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 02/10] mm/mremap: refactor initial parameter sanity checks
2025-07-07 5:27 ` [PATCH 02/10] mm/mremap: refactor initial parameter sanity checks Lorenzo Stoakes
@ 2025-07-10 11:38 ` Vlastimil Babka
0 siblings, 0 replies; 31+ messages in thread
From: Vlastimil Babka @ 2025-07-10 11:38 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Jann Horn, Pedro Falcato, Rik van Riel,
linux-mm, linux-fsdevel, linux-kernel, linux-kselftest
On 7/7/25 07:27, Lorenzo Stoakes wrote:
> We are currently checking some things later, and some things
> immediately. Aggregate the checks and avoid ones that need not be made.
>
> Simplify things by aligning lengths immediately. Defer setting the delta
> parameter until later, which removes some duplicate code in the hugetlb
> case.
>
> We can safely perform the checks moved from mremap_to() to
> check_mremap_params() because:
>
> * If we set a new address via vrm_set_new_addr(), then this is guaranteed
> to not overlap nor to position the new VMA past TASK_SIZE, so there's no
> need to check these later.
>
> * We can simply page align lengths immediately. We do not need to check for
> overlap nor TASK_SIZE sanity after hugetlb alignment as this asserts
> addresses are huge-aligned, then huge-aligns lengths, rounding down. This
> means any existing overlap would have already been caught.
>
> Moving things around like this lays the groundwork for subsequent changes
> to permit operations on batches of VMAs.
>
> No functional change intended.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 03/10] mm/mremap: put VMA check and prep logic into helper function
2025-07-07 5:27 ` [PATCH 03/10] mm/mremap: put VMA check and prep logic into helper function Lorenzo Stoakes
@ 2025-07-10 13:10 ` Vlastimil Babka
0 siblings, 0 replies; 31+ messages in thread
From: Vlastimil Babka @ 2025-07-10 13:10 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Jann Horn, Pedro Falcato, Rik van Riel,
linux-mm, linux-fsdevel, linux-kernel, linux-kselftest
On 7/7/25 07:27, Lorenzo Stoakes wrote:
> Rather than lumping everything together in do_mremap(), add a new helper
> function, check_prep_vma(), to do the work relating to each VMA.
>
> This further lays groundwork for subsequent patches which will allow for
> batched VMA mremap().
>
> Additionally, if we set vrm->new_addr == vrm->addr when prepping the VMA,
> this avoids us needing to do so in the expand VMA mlocked case.
>
> No functional change intended.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 04/10] mm/mremap: cleanup post-processing stage of mremap
2025-07-07 5:27 ` [PATCH 04/10] mm/mremap: cleanup post-processing stage of mremap Lorenzo Stoakes
@ 2025-07-10 13:49 ` Vlastimil Babka
2025-07-10 15:28 ` Lorenzo Stoakes
0 siblings, 1 reply; 31+ messages in thread
From: Vlastimil Babka @ 2025-07-10 13:49 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Jann Horn, Pedro Falcato, Rik van Riel,
linux-mm, linux-fsdevel, linux-kernel, linux-kselftest
On 7/7/25 07:27, Lorenzo Stoakes wrote:
> Separate out the uffd bits so it's clear what's happening.
>
> Don't bother setting vrm->mmap_locked after unlocking, because after this
> we are done anyway.
>
> The only time we drop the mmap lock is on VMA shrink, at which point
> vrm->new_len will be < vrm->old_len and the operation will not be performed
> anyway, so move this code out of the if (vrm->mmap_locked) block.
>
> All addresses returned by mremap() are page-aligned, so the
> offset_in_page() check on ret seems only to be incorrectly trying to detect
"incorrectly" to me implies there's a bug. But AFAIU there's not, so maybe
e.g. "inappropriately"?
> whether an error occurred - explicitly check for this.
>
> No functional change intended.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Just a nit:
> ---
> mm/mremap.c | 22 +++++++++++++---------
> 1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 60eb0ac8634b..660bdb75e2f9 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -1729,6 +1729,15 @@ static int check_prep_vma(struct vma_remap_struct *vrm)
> return 0;
> }
>
> +static void notify_uffd(struct vma_remap_struct *vrm, unsigned long ret)
"ret" not "res"? :) Or actually why not name it for what it is,
mremap_userfaultfd_complete() names the parameter "to". Maybe to_addr or
new_addr?
> +{
> + struct mm_struct *mm = current->mm;
> +
> + userfaultfd_unmap_complete(mm, vrm->uf_unmap_early);
> + mremap_userfaultfd_complete(vrm->uf, vrm->addr, ret, vrm->old_len);
> + userfaultfd_unmap_complete(mm, vrm->uf_unmap);
> +}
> +
> static unsigned long do_mremap(struct vma_remap_struct *vrm)
> {
> struct mm_struct *mm = current->mm;
> @@ -1754,18 +1763,13 @@ static unsigned long do_mremap(struct vma_remap_struct *vrm)
> res = vrm_implies_new_addr(vrm) ? mremap_to(vrm) : mremap_at(vrm);
>
> out:
> - if (vrm->mmap_locked) {
> + if (vrm->mmap_locked)
> mmap_write_unlock(mm);
> - vrm->mmap_locked = false;
> -
> - if (!offset_in_page(res) && vrm->mlocked && vrm->new_len > vrm->old_len)
> - mm_populate(vrm->new_addr + vrm->old_len, vrm->delta);
> - }
>
> - userfaultfd_unmap_complete(mm, vrm->uf_unmap_early);
> - mremap_userfaultfd_complete(vrm->uf, vrm->addr, res, vrm->old_len);
> - userfaultfd_unmap_complete(mm, vrm->uf_unmap);
> + if (!IS_ERR_VALUE(res) && vrm->mlocked && vrm->new_len > vrm->old_len)
> + mm_populate(vrm->new_addr + vrm->old_len, vrm->delta);
>
> + notify_uffd(vrm, res);
> return res;
> }
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 05/10] mm/mremap: use an explicit uffd failure path for mremap
2025-07-07 5:27 ` [PATCH 05/10] mm/mremap: use an explicit uffd failure path for mremap Lorenzo Stoakes
2025-07-07 7:56 ` kernel test robot
2025-07-07 10:20 ` Lorenzo Stoakes
@ 2025-07-10 14:24 ` Vlastimil Babka
2 siblings, 0 replies; 31+ messages in thread
From: Vlastimil Babka @ 2025-07-10 14:24 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Jann Horn, Pedro Falcato, Rik van Riel,
linux-mm, linux-fsdevel, linux-kernel, linux-kselftest
On 7/7/25 07:27, Lorenzo Stoakes wrote:
> Right now it appears that the code is relying upon the returned destination
> address having bits outside PAGE_MASK to indicate whether an error value is
> specified, and decrementing the increased refcount on the uffd ctx if so.
>
> This is not a safe means of determining an error value, so instead, be
> specific. It makes far more sense to do so in a dedicated error path, so
> add mremap_userfaultfd_fail() for this purpose and use this when an error
> arises.
>
> A vm_userfaultfd_ctx is not established until we are at the point where
> mremap_userfaultfd_prep() is invoked in copy_vma_and_data(), so this is a
> no-op until this happens.
>
> That is - uffd remap notification only occurs if the VMA is actually moved
> - at which point a UFFD_EVENT_REMAP event is raised.
>
> No errors can occur after this point currently, though it's certainly not
> guaranteed this will always remain the case, and we mustn't rely on this.
>
> However, the reason for needing to handle this case is that, when an error
> arises on a VMA move at the point of adjusting page tables, we revert this
> operation, and propagate the error.
>
> At this point, it is not correct to raise a uffd remap event, and we must
> handle it.
>
> This refactoring makes it abundantly clear what we are doing.
>
> We assume vrm->new_addr is always valid, which a prior change made the case
> even for mremap() invocations which don't move the VMA, however given no
> uffd context would be set up in this case it's immaterial to this change
> anyway.
>
> No functional change intended.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Guess that renders my previous nit unimportant.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 06/10] mm/mremap: check remap conditions earlier
2025-07-07 5:27 ` [PATCH 06/10] mm/mremap: check remap conditions earlier Lorenzo Stoakes
@ 2025-07-10 14:36 ` Vlastimil Babka
0 siblings, 0 replies; 31+ messages in thread
From: Vlastimil Babka @ 2025-07-10 14:36 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Jann Horn, Pedro Falcato, Rik van Riel,
linux-mm, linux-fsdevel, linux-kernel, linux-kselftest
On 7/7/25 07:27, Lorenzo Stoakes wrote:
> When we expand or move a VMA, this requires a number of additional checks
> to be performed.
>
> Make it really obvious under what circumstances these checks must be
> performed and aggregate all the checks in one place by invoking this in
> check_prep_vma().
>
> We have to adjust the checks to account for shrink + move operations by
> checking new_len <= old_len rather than new_len == old_len.
>
> No functional change intended.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 07/10] mm/mremap: move remap_is_valid() into check_prep_vma()
2025-07-07 5:27 ` [PATCH 07/10] mm/mremap: move remap_is_valid() into check_prep_vma() Lorenzo Stoakes
@ 2025-07-10 14:44 ` Vlastimil Babka
0 siblings, 0 replies; 31+ messages in thread
From: Vlastimil Babka @ 2025-07-10 14:44 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Jann Horn, Pedro Falcato, Rik van Riel,
linux-mm, linux-fsdevel, linux-kernel, linux-kselftest
On 7/7/25 07:27, Lorenzo Stoakes wrote:
> Group parameter check logic together, moving check_mremap_params() next to
> it.
>
> This puts all such checks into a single place, and invokes them early so we
> can simply bail out as soon as we are aware that a condition is not met.
>
> No functional change intended.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 08/10] mm/mremap: clean up mlock populate behaviour
2025-07-07 5:27 ` [PATCH 08/10] mm/mremap: clean up mlock populate behaviour Lorenzo Stoakes
@ 2025-07-10 14:47 ` Vlastimil Babka
0 siblings, 0 replies; 31+ messages in thread
From: Vlastimil Babka @ 2025-07-10 14:47 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: Peter Xu, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Jann Horn, Pedro Falcato, Rik van Riel,
linux-mm, linux-fsdevel, linux-kernel, linux-kselftest
On 7/7/25 07:27, Lorenzo Stoakes wrote:
> When an mlock()'d VMA is expanded, we need to populate the expanded region
> to maintain the contract that all mlock()'d memory is present (albeit -
> with some period after mmap unlock where the expanded part of the mapping
> remains unfaulted).
>
> The current implementation is very unclear, so make it absolutely explicit
> under what circumstances we do this.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 04/10] mm/mremap: cleanup post-processing stage of mremap
2025-07-10 13:49 ` Vlastimil Babka
@ 2025-07-10 15:28 ` Lorenzo Stoakes
0 siblings, 0 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-07-10 15:28 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andrew Morton, Peter Xu, Alexander Viro, Christian Brauner,
Jan Kara, Liam R . Howlett, Jann Horn, Pedro Falcato,
Rik van Riel, linux-mm, linux-fsdevel, linux-kernel,
linux-kselftest
On Thu, Jul 10, 2025 at 03:49:09PM +0200, Vlastimil Babka wrote:
> On 7/7/25 07:27, Lorenzo Stoakes wrote:
> > Separate out the uffd bits so it's clear what's happening.
> >
> > Don't bother setting vrm->mmap_locked after unlocking, because after this
> > we are done anyway.
> >
> > The only time we drop the mmap lock is on VMA shrink, at which point
> > vrm->new_len will be < vrm->old_len and the operation will not be performed
> > anyway, so move this code out of the if (vrm->mmap_locked) block.
> >
> > All addresses returned by mremap() are page-aligned, so the
> > offset_in_page() check on ret seems only to be incorrectly trying to detect
>
> "incorrectly" to me implies there's a bug. But AFAIU there's not, so maybe
> e.g. "inappropriately"?
>
> > whether an error occurred - explicitly check for this.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Thanks! :)
>
> Just a nit:
>
> > ---
> > mm/mremap.c | 22 +++++++++++++---------
> > 1 file changed, 13 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index 60eb0ac8634b..660bdb75e2f9 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -1729,6 +1729,15 @@ static int check_prep_vma(struct vma_remap_struct *vrm)
> > return 0;
> > }
> >
> > +static void notify_uffd(struct vma_remap_struct *vrm, unsigned long ret)
>
> "ret" not "res"? :) Or actually why not name it for what it is,
> mremap_userfaultfd_complete() names the parameter "to". Maybe to_addr or
> new_addr?
Later in the series we eliminate this as you've seen, but still worth fixing up
I think, will do on respin!
>
> > +{
> > + struct mm_struct *mm = current->mm;
> > +
> > + userfaultfd_unmap_complete(mm, vrm->uf_unmap_early);
> > + mremap_userfaultfd_complete(vrm->uf, vrm->addr, ret, vrm->old_len);
> > + userfaultfd_unmap_complete(mm, vrm->uf_unmap);
> > +}
> > +
> > static unsigned long do_mremap(struct vma_remap_struct *vrm)
> > {
> > struct mm_struct *mm = current->mm;
> > @@ -1754,18 +1763,13 @@ static unsigned long do_mremap(struct vma_remap_struct *vrm)
> > res = vrm_implies_new_addr(vrm) ? mremap_to(vrm) : mremap_at(vrm);
> >
> > out:
> > - if (vrm->mmap_locked) {
> > + if (vrm->mmap_locked)
> > mmap_write_unlock(mm);
> > - vrm->mmap_locked = false;
> > -
> > - if (!offset_in_page(res) && vrm->mlocked && vrm->new_len > vrm->old_len)
> > - mm_populate(vrm->new_addr + vrm->old_len, vrm->delta);
> > - }
> >
> > - userfaultfd_unmap_complete(mm, vrm->uf_unmap_early);
> > - mremap_userfaultfd_complete(vrm->uf, vrm->addr, res, vrm->old_len);
> > - userfaultfd_unmap_complete(mm, vrm->uf_unmap);
> > + if (!IS_ERR_VALUE(res) && vrm->mlocked && vrm->new_len > vrm->old_len)
> > + mm_populate(vrm->new_addr + vrm->old_len, vrm->delta);
> >
> > + notify_uffd(vrm, res);
> > return res;
> > }
> >
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 09/10] mm/mremap: permit mremap() move of multiple VMAs
2025-07-07 5:27 ` [PATCH 09/10] mm/mremap: permit mremap() move of multiple VMAs Lorenzo Stoakes
2025-07-09 18:13 ` Liam R. Howlett
@ 2025-07-11 8:17 ` Mark Brown
2025-07-11 8:22 ` Mark Brown
1 sibling, 1 reply; 31+ messages in thread
From: Mark Brown @ 2025-07-11 8:17 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Peter Xu, Alexander Viro, Christian Brauner,
Jan Kara, Liam R . Howlett, Vlastimil Babka, Jann Horn,
Pedro Falcato, Rik van Riel, linux-mm, linux-fsdevel,
linux-kernel, linux-kselftest
On Mon, Jul 07, 2025 at 06:27:52AM +0100, Lorenzo Stoakes wrote:
> Historically we've made it a uAPI requirement that mremap() may only
> operate on a single VMA at a time.
>
> For instances where VMAs need to be resized, this makes sense, as it
> becomes very difficult to determine what a user actually wants should they
> indicate a desire to expand or shrink the size of multiple VMAs (truncate?
> Adjust sizes individually? Some other strategy?).
I'm seeing failures in the mremap_dontunmap test in -next on Raspberry Pi
4 which bisect down to this patch. The test logging isn't super helpful
here sadly:
# # --------------------------
# # running ./mremap_dontunmap
# # --------------------------
# # TAP version 13
# # 1..5
# # [FAIL]
# not ok 33 mremap_dontunmap # exit=139
Full log at:
https://lava.sirena.org.uk/scheduler/job/1556942#L3693
Bisect log:
# bad: [b551c4e2a98a177a06148cf16505643cd2108386] Add linux-next specific files for 20250710
# good: [23c7369d4050e533fe661e5c750181dffe67b4b9] Merge branch 'for-linux-next-fixes' of https://gitlab.freedesktop.org/drm/misc/kernel.git
# good: [c61da55412a08268ea0cdef99dea11f7ade934ee] ASoC: sdw_utils: Add missed component_name strings for speaker amps
# good: [68e4dadacb7faa393b532b41bbf99a2dbfec3b1b] ASoC: img: Imagination Technologies sound should depend on MIPS
# good: [defe01abfb7f5c5bd53c723b8577d4fcd64faa5a] spi: stm32-ospi: Use of_reserved_mem_region_to_resource() for "memory-region"
# good: [86ccd4d3e8bc9eeb5dde4080fcc67e0505d1d2c6] ASoC: Intel: soc-acpi-intel-lnl-match: add rt1320_l12_rt714_l0 support
# good: [5054740e0092aac528c0589251f612b3b41c9e7b] regulator: sy8827n: make enable gpio NONEXCLUSIVE
# good: [08dc0f5cc26a203e8008c38d9b436c079e7dbb45] ASoC: soc-dapm: add prefix on soc_dapm_dev_attrs
# good: [c61e94e5e4e6bc50064119e6a779564d1d2ac0e7] regulator: stm32-vrefbuf: Remove redundant pm_runtime_mark_last_busy() calls
# good: [571defe0dff3f1e4180bd0db79283d3d5bf74a71] ASoC: codec: rockchip_sai: Remove including of_gpio.h
# good: [2fca750160f29015ab1109bb478537a4e415f7cd] spi: Remove redundant pm_runtime_mark_last_busy() calls
# good: [9f711c9321cffe3e03709176873c277fa911c366] regmap: get rid of redundant debugfs_file_{get,put}()
# good: [bc163baef57002c08b3afe64cdd2f55f55a765eb] ASoC: Use of_reserved_mem_region_to_resource() for "memory-region"
# good: [2bd9648d5a8d329ca734ca2c273a80934867471e] ASoC: SOF: Remove redundant pm_runtime_mark_last_busy() calls
# good: [baee26a9d6cd3d3c6c3c03c56270aa647a67e4bd] ASoC: fsl_mqs: rename system manager indices for i.MX95
# good: [7105fdd54a14bee49371b39374a61b3c967d74cb] spi: dt-bindings: Convert marvell,orion-spi to DT schema
# good: [913bf8d50cbd144c87e9660b591781179182ff59] spi: spi-qpic-snand: add support for 8 bits ECC strength
# good: [34d340d48e595f8dfd4e72fe4100d2579dbe4a1a] ASoC: qcom: sc8280xp: Add support for QCS8275
# good: [0c0ef1d90967717b91cded41b00dbae05d8e521c] ASoC: amd: acp: Enable acp7.2 platform based DMIC support in machine driver
# good: [3fcd3d2fe44dc9dfca20b6aed117f314a50ba0ff] spi: offload trigger: add ADI Util Sigma-Delta SPI driver
# good: [244bc18e5f1875401a4af87d2eae3f9376d9d720] spi: stm32: delete stray tabs in stm32h7_spi_data_idleness()
# good: [7e1c28fbf235791cb5046fafdac5bc16fe8e788d] spi: spi-pci1xxxx: enable concurrent DMA read/write across SPI transfers
# good: [b9ab3b61824190b1c6b2c59e7ba4de591f24eb92] ASoC: SDCA: Add some initial IRQ handlers
# good: [c4f2c05ab02952c9a56067aeb700ded95b183570] spi: stm32: fix pointer-to-pointer variables usage
# good: [427ceac823e58813b510e585011488f603f0d891] regulator: tps6286x-regulator: Enable REGCACHE_MAPLE
# good: [29ddce17e909779633f856ad1c2f111fbf71c0df] ASoC: codecs: Add calibration function to aw88399 chip
# good: [ac4c064f67d3cdf9118b9b09c1e3b28b6c10a7ea] spi: dt-bindings: add nxp,lpc3220-spi.yaml
# good: [08bf1663c21a3e815eda28fa242d84c945ca3b94] dmaengine: Add devm_dma_request_chan()
# good: [2555691165a0285a4617230fed859f20dcc51608] spi: atmel-quadspi: Use `devm_dma_request_chan()`
# good: [ce57bc9771411d6d27f2ca7b40396cbd7d684ba9] regulator: core: Don't use "proxy" headers
# good: [0f60ecffbfe35e12eb56c99640ba2360244b5bb3] ASoC: sdw_utils: generate combined spk components string
# good: [9a944494c299fabf3cc781798eb7c02a0bece364] spi: dt-bindings: stm32: deprecate `st,spi-midi-ns` property
# good: [3e36c822506d924894ff7de549b9377d3114c2d7] spi: spi-pci1xxxx: Add support for per-instance DMA interrupt vectors
# good: [68fbc70ece40139380380dce74059afa592846b3] ASoC: hisilicon: Standardize ASoC menu
# good: [8f9cf02c8852837923f1cdacfcc92e138513325c] spi: microchip-core-qspi: Add regular transfers
# good: [17cc308b183308bf5ada36e164284fff7eb064ba] ASoC: wm8524: enable constraints when sysclk is configured.
# good: [59566923d955b69bfb1e1163f07dff437dde8c9c] ASoC: SOF: amd: add alternate machines for ACP7.0 and ACP7.1 platforms
# good: [024f39fff6d222cedde361f7fe34d9ba4e6afb92] regulator: mtk-dvfsrc: Add support for MediaTek MT8196 DVFSRC
# good: [19cbc930c209d59a2c9828de4c7b767e9f14667e] regulator: pca9450: Support PWM mode also for pca9451a
# good: [c4ca928a6db1593802cd945f075a7e21dd0430c1] ASoC: hdac_hdmi: Rate limit logging on connection and disconnection
# good: [a48352921f0b15b1f7eff83f5b5613d6ae2350d3] ASoC: codecs: wcd939x: Add defines for major/minor version decoding
# good: [3421d46440ebe0865bec71dbd2330b4e17a425ab] HID: core: Add bus define for SoundWire bus
# good: [a1d203d390e04798ccc1c3c06019cd4411885d6d] ASoC: SOF: ipc4-pcm: Enable delay reporting for ChainDMA streams
# good: [bb48117b79ebc39485f7306d09dc602981fe540f] ASoC: Intel: sof_sdw: Implement add_dai_link to filter HDMI PCMs
# good: [2756b7f08ff6ca7c68c8c7dd61c8dc6895c9de34] ASoC: SOF: ipc4-pcm: Harmonize sof_ipc4_set_pipeline_state() dbg print
# good: [ace9b3daf2b4778358573d3698e34cb1c0fa7e14] ASoC: SOF: ipc4/Intel: Add support for library restore firmware functionality
# good: [cd4da713f99651e99fbce8ed6b6ec8f686c029a8] Documentation: PM: *_autosuspend() functions update last busy time
# good: [5fc2c383125c2b4b6037e02ad8796b776b25e6d0] spi: falcon: mark falcon_sflash_xfer() as static
# good: [7f8924e8785b68c998bc1906e049bf5595865e60] ASoC: dt-bindings: cirrus,cs42xx8: add 'port' property
# good: [3e1c01d06e1f52f78fe00ef26a9cf80dbb0a3115] regulator: rpi-panel-v2: Add shutdown hook
# good: [d9f38d9824bfb1b046d2e720349d2f45959ab184] ASoC: tegra: AHUB: Remove unneeded semicolon
# good: [dce4bc30f42d313b4dc5832316196411b7f07ad0] spi: spi-fsl-dspi: Revert unintended dependency change in config SPI_FSL_DSPI
# good: [47972c1c3315672352f25c68f91dd88543541947] ASoC: Intel: Replace deprecated strcpy() with strscpy()
# good: [5eb8a0d7733d4cd32a776acf1d1aa1c7c01c8a14] ASoC: hdmi-codec: use SND_JACK_AVOUT as jack status
# good: [bb8d8ba4715cb8f997d63d90ba935f6073595df5] ASoC: mediatek: mt8183-afe-pcm: use local `dev` pointer in driver callbacks
# good: [8a5a5cecb79058b608e5562d8998123a3adb313c] ASoC: tas2781: Move the "include linux/debugfs.h" into tas2781.h
# good: [a4eb71ff98c4792f441f108910bd829da7a04092] regulator: rpi-panel-v2: Fix missing OF dependency
# good: [6cafcc53eb5fffd9b9bdfde700bb9bad21e98ed3] spi: spi-mt65xx: Add support for MT6991 Dimensity 9400 SPI IPM
# good: [7e10d7242ea8a5947878880b912ffa5806520705] ASoC: ops: dynamically allocate struct snd_ctl_elem_value
# good: [d6fa0ca959db8efd4462d7beef4bdc5568640fd0] regulator: rpi-panel-v2: Add missing GPIOLIB dependency
# good: [1f5cdb6ab45e1c06ae0953609acbb52f8946b3e8] ASoC: codecs: Add support for Richtek RTQ9124
# good: [d49305862fdc4d9ff1b1093b4ed7d8e0cb9971b4] regulator: rpi-panel-v2: Add regulator for 7" Raspberry Pi 720x1280
# good: [6ba68e5aa9d5d15c8877a655db279fcfc0b38b04] ASoC: renesas: msiof: Convert to <linux/spi/sh_msiof.h>
# good: [03b778d1994827ea5cc971dbdfbb457bbb7bfa5d] ASOC: rockchip: Use helper function devm_clk_get_enabled()
# good: [c459262159f39e6e6336797feb975799344b749b] spi: spi-pci1xxxx: Add support for 25MHz Clock frequency in C0
# good: [548d770c330cd1027549947a6ea899c56b5bc4e4] regulator: pca9450: Add support for mode operations
# good: [267be32b0a7b70cc777f8a46f0904c92c0521d89] ASoC: remove component->id
# good: [f6f914893d478b7ba08e5c375de1ced16deb5e92] ASoC: dt-bindings: tas57xx: add tas5753 compatibility
# good: [111a2c8ab462d77d1519b71b46f13ae1b46920b4] ASoC: imx-card: Use helper function for_each_child_of_node_scoped()
# good: [9a30e332c36c52e92e5316b4a012d45284dedeb5] spi: spi-fsl-dspi: Enable support for S32G platforms
# good: [c95e925daa434ee1a40a86aec6476ce588e4bd77] ASoC: Intel: avs: Add rt5640 machine board
# good: [c8c4694ede7ed42d8d4db0e8927dea9839a3e248] regmap: kunit: Constify regmap_range_cfg array
# good: [e6e8897995a9e6028563ce36c27877e5478c8571] ASoC: qcom: sm8250: Add Fairphone 5 soundcard compatible
# good: [ece5d881004f041c2e1493436409dbcbea3ad5f8] ASoC: codecs: wcd939x: Drop unused 'struct wcd939x_priv' fields
# good: [b9ecde0bcf6a99a3ff08496d4ba90a385ebbfd68] ASoC: codecs: wcd939x: Add VDD_PX supply
# good: [ac209bde018fd320b79976657a44c23113181af6] ASoC: tas2781: Drop the unnecessary symbol imply
# good: [59d1fc7b3e1ae9d46799da0e71dafc7d71a154a0] fbdev: pm3fb: fix potential divide by zero
git bisect start 'b551c4e2a98a177a06148cf16505643cd2108386' '23c7369d4050e533fe661e5c750181dffe67b4b9' 'c61da55412a08268ea0cdef99dea11f7ade934ee' '68e4dadacb7faa393b532b41bbf99a2dbfec3b1b' 'defe01abfb7f5c5bd53c723b8577d4fcd64faa5a' '86ccd4d3e8bc9eeb5dde4080fcc67e0505d1d2c6' '5054740e0092aac528c0589251f612b3b41c9e7b' '08dc0f5cc26a203e8008c38d9b436c079e7dbb45' 'c61e94e5e4e6bc50064119e6a779564d1d2ac0e7' '571defe0dff3f1e4180bd0db79283d3d5bf74a71' '2fca750160f29015ab1109bb478537a4e415f7cd' '9f711c9321cffe3e03709176873c277fa911c366' 'bc163baef57002c08b3afe64cdd2f55f55a765eb' '2bd9648d5a8d329ca734ca2c273a80934867471e' 'baee26a9d6cd3d3c6c3c03c56270aa647a67e4bd' '7105fdd54a14bee49371b39374a61b3c967d74cb' '913bf8d50cbd144c87e9660b591781179182ff59' '34d340d48e595f8dfd4e72fe4100d2579dbe4a1a' '0c0ef1d90967717b91cded41b00dbae05d8e521c' '3fcd3d2fe44dc9dfca20b6aed117f314a50ba0ff' '244bc18e5f1875401a4af87d2eae3f9376d9d720' '7e1c28fbf235791cb5046fafdac5bc16fe8e788d' 'b9ab3b61824190b1c6b2c59e7ba4de591f24eb92' 'c4f2c05ab02952c9a56067aeb700ded95b183570' '427ceac823e58813b510e585011488f603f0d891' '29ddce17e909779633f856ad1c2f111fbf71c0df' 'ac4c064f67d3cdf9118b9b09c1e3b28b6c10a7ea' '08bf1663c21a3e815eda28fa242d84c945ca3b94' '2555691165a0285a4617230fed859f20dcc51608' 'ce57bc9771411d6d27f2ca7b40396cbd7d684ba9' '0f60ecffbfe35e12eb56c99640ba2360244b5bb3' '9a944494c299fabf3cc781798eb7c02a0bece364' '3e36c822506d924894ff7de549b9377d3114c2d7' '68fbc70ece40139380380dce74059afa592846b3' '8f9cf02c8852837923f1cdacfcc92e138513325c' '17cc308b183308bf5ada36e164284fff7eb064ba' '59566923d955b69bfb1e1163f07dff437dde8c9c' '024f39fff6d222cedde361f7fe34d9ba4e6afb92' '19cbc930c209d59a2c9828de4c7b767e9f14667e' 'c4ca928a6db1593802cd945f075a7e21dd0430c1' 'a48352921f0b15b1f7eff83f5b5613d6ae2350d3' '3421d46440ebe0865bec71dbd2330b4e17a425ab' 'a1d203d390e04798ccc1c3c06019cd4411885d6d' 'bb48117b79ebc39485f7306d09dc602981fe540f' '2756b7f08ff6ca7c68c8c7dd61c8dc6895c9de34' 'ace9b3daf2b4778358573d3698e34cb1c0fa7e14' 'cd4da713f99651e99fbce8ed6b6ec8f686c029a8' '5fc2c383125c2b4b6037e02ad8796b776b25e6d0' '7f8924e8785b68c998bc1906e049bf5595865e60' '3e1c01d06e1f52f78fe00ef26a9cf80dbb0a3115' 'd9f38d9824bfb1b046d2e720349d2f45959ab184' 'dce4bc30f42d313b4dc5832316196411b7f07ad0' '47972c1c3315672352f25c68f91dd88543541947' '5eb8a0d7733d4cd32a776acf1d1aa1c7c01c8a14' 'bb8d8ba4715cb8f997d63d90ba935f6073595df5' '8a5a5cecb79058b608e5562d8998123a3adb313c' 'a4eb71ff98c4792f441f108910bd829da7a04092' '6cafcc53eb5fffd9b9bdfde700bb9bad21e98ed3' '7e10d7242ea8a5947878880b912ffa5806520705' 'd6fa0ca959db8efd4462d7beef4bdc5568640fd0' '1f5cdb6ab45e1c06ae0953609acbb52f8946b3e8' 'd49305862fdc4d9ff1b1093b4ed7d8e0cb9971b4' '6ba68e5aa9d5d15c8877a655db279fcfc0b38b04' '03b778d1994827ea5cc971dbdfbb457bbb7bfa5d' 'c459262159f39e6e6336797feb975799344b749b' '548d770c330cd1027549947a6ea899c56b5bc4e4' '267be32b0a7b70cc777f8a46f0904c92c0521d89' 'f6f914893d478b7ba08e5c375de1ced16deb5e92' '111a2c8ab462d77d1519b71b46f13ae1b46920b4' '9a30e332c36c52e92e5316b4a012d45284dedeb5' 'c95e925daa434ee1a40a86aec6476ce588e4bd77' 'c8c4694ede7ed42d8d4db0e8927dea9839a3e248' 'e6e8897995a9e6028563ce36c27877e5478c8571' 'ece5d881004f041c2e1493436409dbcbea3ad5f8' 'b9ecde0bcf6a99a3ff08496d4ba90a385ebbfd68' 'ac209bde018fd320b79976657a44c23113181af6' '59d1fc7b3e1ae9d46799da0e71dafc7d71a154a0'
# test job: [c61da55412a08268ea0cdef99dea11f7ade934ee] https://lava.sirena.org.uk/scheduler/job/1554477
# test job: [68e4dadacb7faa393b532b41bbf99a2dbfec3b1b] https://lava.sirena.org.uk/scheduler/job/1553557
# test job: [defe01abfb7f5c5bd53c723b8577d4fcd64faa5a] https://lava.sirena.org.uk/scheduler/job/1553629
# test job: [86ccd4d3e8bc9eeb5dde4080fcc67e0505d1d2c6] https://lava.sirena.org.uk/scheduler/job/1547910
# test job: [5054740e0092aac528c0589251f612b3b41c9e7b] https://lava.sirena.org.uk/scheduler/job/1546901
# test job: [08dc0f5cc26a203e8008c38d9b436c079e7dbb45] https://lava.sirena.org.uk/scheduler/job/1546283
# test job: [c61e94e5e4e6bc50064119e6a779564d1d2ac0e7] https://lava.sirena.org.uk/scheduler/job/1538599
# test job: [571defe0dff3f1e4180bd0db79283d3d5bf74a71] https://lava.sirena.org.uk/scheduler/job/1539773
# test job: [2fca750160f29015ab1109bb478537a4e415f7cd] https://lava.sirena.org.uk/scheduler/job/1540311
# test job: [9f711c9321cffe3e03709176873c277fa911c366] https://lava.sirena.org.uk/scheduler/job/1538686
# test job: [bc163baef57002c08b3afe64cdd2f55f55a765eb] https://lava.sirena.org.uk/scheduler/job/1538770
# test job: [2bd9648d5a8d329ca734ca2c273a80934867471e] https://lava.sirena.org.uk/scheduler/job/1539579
# test job: [baee26a9d6cd3d3c6c3c03c56270aa647a67e4bd] https://lava.sirena.org.uk/scheduler/job/1533839
# test job: [7105fdd54a14bee49371b39374a61b3c967d74cb] https://lava.sirena.org.uk/scheduler/job/1533549
# test job: [913bf8d50cbd144c87e9660b591781179182ff59] https://lava.sirena.org.uk/scheduler/job/1531275
# test job: [34d340d48e595f8dfd4e72fe4100d2579dbe4a1a] https://lava.sirena.org.uk/scheduler/job/1530304
# test job: [0c0ef1d90967717b91cded41b00dbae05d8e521c] https://lava.sirena.org.uk/scheduler/job/1530359
# test job: [3fcd3d2fe44dc9dfca20b6aed117f314a50ba0ff] https://lava.sirena.org.uk/scheduler/job/1528973
# test job: [244bc18e5f1875401a4af87d2eae3f9376d9d720] https://lava.sirena.org.uk/scheduler/job/1528302
# test job: [7e1c28fbf235791cb5046fafdac5bc16fe8e788d] https://lava.sirena.org.uk/scheduler/job/1525649
# test job: [b9ab3b61824190b1c6b2c59e7ba4de591f24eb92] https://lava.sirena.org.uk/scheduler/job/1526361
# test job: [c4f2c05ab02952c9a56067aeb700ded95b183570] https://lava.sirena.org.uk/scheduler/job/1526591
# test job: [427ceac823e58813b510e585011488f603f0d891] https://lava.sirena.org.uk/scheduler/job/1525654
# test job: [29ddce17e909779633f856ad1c2f111fbf71c0df] https://lava.sirena.org.uk/scheduler/job/1523992
# test job: [ac4c064f67d3cdf9118b9b09c1e3b28b6c10a7ea] https://lava.sirena.org.uk/scheduler/job/1517626
# test job: [08bf1663c21a3e815eda28fa242d84c945ca3b94] https://lava.sirena.org.uk/scheduler/job/1517665
# test job: [2555691165a0285a4617230fed859f20dcc51608] https://lava.sirena.org.uk/scheduler/job/1515754
# test job: [ce57bc9771411d6d27f2ca7b40396cbd7d684ba9] https://lava.sirena.org.uk/scheduler/job/1515782
# test job: [0f60ecffbfe35e12eb56c99640ba2360244b5bb3] https://lava.sirena.org.uk/scheduler/job/1511594
# test job: [9a944494c299fabf3cc781798eb7c02a0bece364] https://lava.sirena.org.uk/scheduler/job/1507934
# test job: [3e36c822506d924894ff7de549b9377d3114c2d7] https://lava.sirena.org.uk/scheduler/job/1506338
# test job: [68fbc70ece40139380380dce74059afa592846b3] https://lava.sirena.org.uk/scheduler/job/1504159
# test job: [8f9cf02c8852837923f1cdacfcc92e138513325c] https://lava.sirena.org.uk/scheduler/job/1502880
# test job: [17cc308b183308bf5ada36e164284fff7eb064ba] https://lava.sirena.org.uk/scheduler/job/1501547
# test job: [59566923d955b69bfb1e1163f07dff437dde8c9c] https://lava.sirena.org.uk/scheduler/job/1499646
# test job: [024f39fff6d222cedde361f7fe34d9ba4e6afb92] https://lava.sirena.org.uk/scheduler/job/1499695
# test job: [19cbc930c209d59a2c9828de4c7b767e9f14667e] https://lava.sirena.org.uk/scheduler/job/1497287
# test job: [c4ca928a6db1593802cd945f075a7e21dd0430c1] https://lava.sirena.org.uk/scheduler/job/1496260
# test job: [a48352921f0b15b1f7eff83f5b5613d6ae2350d3] https://lava.sirena.org.uk/scheduler/job/1497370
# test job: [3421d46440ebe0865bec71dbd2330b4e17a425ab] https://lava.sirena.org.uk/scheduler/job/1493080
# test job: [a1d203d390e04798ccc1c3c06019cd4411885d6d] https://lava.sirena.org.uk/scheduler/job/1491512
# test job: [bb48117b79ebc39485f7306d09dc602981fe540f] https://lava.sirena.org.uk/scheduler/job/1489355
# test job: [2756b7f08ff6ca7c68c8c7dd61c8dc6895c9de34] https://lava.sirena.org.uk/scheduler/job/1489208
# test job: [ace9b3daf2b4778358573d3698e34cb1c0fa7e14] https://lava.sirena.org.uk/scheduler/job/1489287
# test job: [cd4da713f99651e99fbce8ed6b6ec8f686c029a8] https://lava.sirena.org.uk/scheduler/job/1538824
# test job: [5fc2c383125c2b4b6037e02ad8796b776b25e6d0] https://lava.sirena.org.uk/scheduler/job/1486895
# test job: [7f8924e8785b68c998bc1906e049bf5595865e60] https://lava.sirena.org.uk/scheduler/job/1486909
# test job: [3e1c01d06e1f52f78fe00ef26a9cf80dbb0a3115] https://lava.sirena.org.uk/scheduler/job/1481719
# test job: [d9f38d9824bfb1b046d2e720349d2f45959ab184] https://lava.sirena.org.uk/scheduler/job/1481618
# test job: [dce4bc30f42d313b4dc5832316196411b7f07ad0] https://lava.sirena.org.uk/scheduler/job/1479448
# test job: [47972c1c3315672352f25c68f91dd88543541947] https://lava.sirena.org.uk/scheduler/job/1479573
# test job: [5eb8a0d7733d4cd32a776acf1d1aa1c7c01c8a14] https://lava.sirena.org.uk/scheduler/job/1474690
# test job: [bb8d8ba4715cb8f997d63d90ba935f6073595df5] https://lava.sirena.org.uk/scheduler/job/1472408
# test job: [8a5a5cecb79058b608e5562d8998123a3adb313c] https://lava.sirena.org.uk/scheduler/job/1472426
# test job: [a4eb71ff98c4792f441f108910bd829da7a04092] https://lava.sirena.org.uk/scheduler/job/1468996
# test job: [6cafcc53eb5fffd9b9bdfde700bb9bad21e98ed3] https://lava.sirena.org.uk/scheduler/job/1468939
# test job: [7e10d7242ea8a5947878880b912ffa5806520705] https://lava.sirena.org.uk/scheduler/job/1466044
# test job: [d6fa0ca959db8efd4462d7beef4bdc5568640fd0] https://lava.sirena.org.uk/scheduler/job/1464680
# test job: [1f5cdb6ab45e1c06ae0953609acbb52f8946b3e8] https://lava.sirena.org.uk/scheduler/job/1462968
# test job: [d49305862fdc4d9ff1b1093b4ed7d8e0cb9971b4] https://lava.sirena.org.uk/scheduler/job/1463040
# test job: [6ba68e5aa9d5d15c8877a655db279fcfc0b38b04] https://lava.sirena.org.uk/scheduler/job/1463329
# test job: [03b778d1994827ea5cc971dbdfbb457bbb7bfa5d] https://lava.sirena.org.uk/scheduler/job/1461889
# test job: [c459262159f39e6e6336797feb975799344b749b] https://lava.sirena.org.uk/scheduler/job/1460981
# test job: [548d770c330cd1027549947a6ea899c56b5bc4e4] https://lava.sirena.org.uk/scheduler/job/1460090
# test job: [267be32b0a7b70cc777f8a46f0904c92c0521d89] https://lava.sirena.org.uk/scheduler/job/1460420
# test job: [f6f914893d478b7ba08e5c375de1ced16deb5e92] https://lava.sirena.org.uk/scheduler/job/1461475
# test job: [111a2c8ab462d77d1519b71b46f13ae1b46920b4] https://lava.sirena.org.uk/scheduler/job/1460866
# test job: [9a30e332c36c52e92e5316b4a012d45284dedeb5] https://lava.sirena.org.uk/scheduler/job/1460552
# test job: [c95e925daa434ee1a40a86aec6476ce588e4bd77] https://lava.sirena.org.uk/scheduler/job/1460133
# test job: [c8c4694ede7ed42d8d4db0e8927dea9839a3e248] https://lava.sirena.org.uk/scheduler/job/1461293
# test job: [e6e8897995a9e6028563ce36c27877e5478c8571] https://lava.sirena.org.uk/scheduler/job/1461776
# test job: [ece5d881004f041c2e1493436409dbcbea3ad5f8] https://lava.sirena.org.uk/scheduler/job/1461679
# test job: [b9ecde0bcf6a99a3ff08496d4ba90a385ebbfd68] https://lava.sirena.org.uk/scheduler/job/1461079
# test job: [ac209bde018fd320b79976657a44c23113181af6] https://lava.sirena.org.uk/scheduler/job/1461915
# test job: [59d1fc7b3e1ae9d46799da0e71dafc7d71a154a0] https://lava.sirena.org.uk/scheduler/job/1486270
# test job: [b551c4e2a98a177a06148cf16505643cd2108386] https://lava.sirena.org.uk/scheduler/job/1556942
# bad: [b551c4e2a98a177a06148cf16505643cd2108386] Add linux-next specific files for 20250710
git bisect bad b551c4e2a98a177a06148cf16505643cd2108386
# test job: [f3de7f26f8f9605ff28de30c0aede05f8d4e200e] https://lava.sirena.org.uk/scheduler/job/1556998
# bad: [f3de7f26f8f9605ff28de30c0aede05f8d4e200e] Merge branch 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
git bisect bad f3de7f26f8f9605ff28de30c0aede05f8d4e200e
# test job: [9dad5d75b0f2a73d55ed8086e9358043d2e0c716] https://lava.sirena.org.uk/scheduler/job/1557216
# bad: [9dad5d75b0f2a73d55ed8086e9358043d2e0c716] Merge branch 'fs-next' of linux-next
git bisect bad 9dad5d75b0f2a73d55ed8086e9358043d2e0c716
# test job: [454cf3d6f23d6190f3ed7041df67b1b9f341401c] https://lava.sirena.org.uk/scheduler/job/1557308
# bad: [454cf3d6f23d6190f3ed7041df67b1b9f341401c] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip.git
git bisect bad 454cf3d6f23d6190f3ed7041df67b1b9f341401c
# test job: [cf9c77f6aac29c615ae95863ae8b166ef545a706] https://lava.sirena.org.uk/scheduler/job/1557353
# bad: [cf9c77f6aac29c615ae95863ae8b166ef545a706] Merge branch 'for-next/perf' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git
git bisect bad cf9c77f6aac29c615ae95863ae8b166ef545a706
# test job: [96af8062e56b3b6e498e5e428c95c2851e0928da] https://lava.sirena.org.uk/scheduler/job/1557498
# good: [96af8062e56b3b6e498e5e428c95c2851e0928da] mm/mremap: refactor initial parameter sanity checks
git bisect good 96af8062e56b3b6e498e5e428c95c2851e0928da
# test job: [585189332afe02c99e66c6a0d328fe05e456ff6a] https://lava.sirena.org.uk/scheduler/job/1557542
# good: [585189332afe02c99e66c6a0d328fe05e456ff6a] perf vendor events: Update TigerLake events
git bisect good 585189332afe02c99e66c6a0d328fe05e456ff6a
# test job: [1fee8fec454108627040ef10e82ae069d4fd6011] https://lava.sirena.org.uk/scheduler/job/1557714
# bad: [1fee8fec454108627040ef10e82ae069d4fd6011] Merge branch 'perf-tools-next' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git
git bisect bad 1fee8fec454108627040ef10e82ae069d4fd6011
# test job: [0f6de4b4991f0928010f1adb3510fe4f28a39151] https://lava.sirena.org.uk/scheduler/job/1557844
# good: [0f6de4b4991f0928010f1adb3510fe4f28a39151] ocfs2: embed actual values into ocfs2_sysfile_lock_key names
git bisect good 0f6de4b4991f0928010f1adb3510fe4f28a39151
# test job: [c72e3e275d53bb3346882830aeff0548f2e49939] https://lava.sirena.org.uk/scheduler/job/1557926
# good: [c72e3e275d53bb3346882830aeff0548f2e49939] squashfs: fix incorrect argument to sizeof in kmalloc_array call
git bisect good c72e3e275d53bb3346882830aeff0548f2e49939
# test job: [eecc2c71dba4c9578d8cc3a1d45b1b2d8f38e72d] https://lava.sirena.org.uk/scheduler/job/1557958
# bad: [eecc2c71dba4c9578d8cc3a1d45b1b2d8f38e72d] Merge branch 'mm-unstable' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
git bisect bad eecc2c71dba4c9578d8cc3a1d45b1b2d8f38e72d
# test job: [1ddb6c7254bd854323be891703b1fa5971f9dc23] https://lava.sirena.org.uk/scheduler/job/1558086
# good: [1ddb6c7254bd854323be891703b1fa5971f9dc23] mm/mremap: check remap conditions earlier
git bisect good 1ddb6c7254bd854323be891703b1fa5971f9dc23
# test job: [4faaafd5b36a76b214af76679edb664ae12ed1e9] https://lava.sirena.org.uk/scheduler/job/1558195
# good: [4faaafd5b36a76b214af76679edb664ae12ed1e9] mm/mremap: clean up mlock populate behaviour
git bisect good 4faaafd5b36a76b214af76679edb664ae12ed1e9
# test job: [db63de7a28bf61149a11217e527944396cfaf30a] https://lava.sirena.org.uk/scheduler/job/1558295
# bad: [db63de7a28bf61149a11217e527944396cfaf30a] tools/testing/selftests: extend mremap_test to test multi-VMA mremap
git bisect bad db63de7a28bf61149a11217e527944396cfaf30a
# test job: [b4bfab2332244ef12c6a9fb1165d67f0e4747e1f] https://lava.sirena.org.uk/scheduler/job/1558332
# bad: [b4bfab2332244ef12c6a9fb1165d67f0e4747e1f] mm/mremap: permit mremap() move of multiple VMAs
git bisect bad b4bfab2332244ef12c6a9fb1165d67f0e4747e1f
# first bad commit: [b4bfab2332244ef12c6a9fb1165d67f0e4747e1f] mm/mremap: permit mremap() move of multiple VMAs
* Re: [PATCH 09/10] mm/mremap: permit mremap() move of multiple VMAs
2025-07-11 8:17 ` Mark Brown
@ 2025-07-11 8:22 ` Mark Brown
2025-07-11 8:31 ` Lorenzo Stoakes
0 siblings, 1 reply; 31+ messages in thread
From: Mark Brown @ 2025-07-11 8:22 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Peter Xu, Alexander Viro, Christian Brauner,
Jan Kara, Liam R . Howlett, Vlastimil Babka, Jann Horn,
Pedro Falcato, Rik van Riel, linux-mm, linux-fsdevel,
linux-kernel, linux-kselftest
On Fri, Jul 11, 2025 at 09:17:27AM +0100, Mark Brown wrote:
> On Mon, Jul 07, 2025 at 06:27:52AM +0100, Lorenzo Stoakes wrote:
> > Historically we've made it a uAPI requirement that mremap() may only
> > operate on a single VMA at a time.
> >
> > For instances where VMAs need to be resized, this makes sense, as it
> > becomes very difficult to determine what a user actually wants should they
> > indicate a desire to expand or shrink the size of multiple VMAs (truncate?
> > Adjust sizes individually? Some other strategy?).
>
> I'm seeing failures in the mremap_dontunmap test in -next on Raspberry Pi
> 4 which bisect down to this patch. The test logging isn't super helpful
> here sadly:
Same thing on Orion O6 (a more modern ARM v9 system with more RAM than
my Pi):
https://lava.sirena.org.uk/scheduler/job/1556807
and Avenger 96 (which is 32-bit Arm):
https://lava.sirena.org.uk/scheduler/job/1556479
* Re: [PATCH 09/10] mm/mremap: permit mremap() move of multiple VMAs
2025-07-11 8:22 ` Mark Brown
@ 2025-07-11 8:31 ` Lorenzo Stoakes
0 siblings, 0 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-07-11 8:31 UTC (permalink / raw)
To: Mark Brown
Cc: Andrew Morton, Peter Xu, Alexander Viro, Christian Brauner,
Jan Kara, Liam R . Howlett, Vlastimil Babka, Jann Horn,
Pedro Falcato, Rik van Riel, linux-mm, linux-fsdevel,
linux-kernel, linux-kselftest
Thanks, yeah, I'm going to send a respin today. I accidentally enabled
multi-VMA moves for the MREMAP_DONTUNMAP case, so the respin should
resolve it among other things.
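For illustration, a hypothetical sketch (not the actual patch) of the
kind of gating this implies: multi-VMA moves are permitted only for a
plain MREMAP_FIXED move and refused whenever MREMAP_DONTUNMAP is set.
The helper name is invented:

#include <stdbool.h>
#include <linux/mman.h>	/* MREMAP_FIXED, MREMAP_DONTUNMAP */

/* Invented helper, not kernel code: multi-VMA mremap() moves require
 * MREMAP_FIXED and must be refused when MREMAP_DONTUNMAP is set. */
static bool multi_vma_move_allowed(unsigned long flags)
{
	return (flags & MREMAP_FIXED) && !(flags & MREMAP_DONTUNMAP);
}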
Note there's a v2 at
https://lore.kernel.org/linux-mm/cover.1752162066.git.lorenzo.stoakes@oracle.com/
On Fri, Jul 11, 2025 at 09:22:13AM +0100, Mark Brown wrote:
> On Fri, Jul 11, 2025 at 09:17:27AM +0100, Mark Brown wrote:
> > On Mon, Jul 07, 2025 at 06:27:52AM +0100, Lorenzo Stoakes wrote:
> > > Historically we've made it a uAPI requirement that mremap() may only
> > > operate on a single VMA at a time.
> > >
> > > For instances where VMAs need to be resized, this makes sense, as it
> > > becomes very difficult to determine what a user actually wants should they
> > > indicate a desire to expand or shrink the size of multiple VMAs (truncate?
> > > Adjust sizes individually? Some other strategy?).
> >
> > I'm seeing failures in the mremap_dontunmap test in -next on Raspberry Pi
> > 4 which bisect down to this patch. The test logging isn't super helpful
> > here sadly:
>
> Same thing on Orion O6 (a more modern ARM v9 system with more RAM than
> my Pi):
>
> https://lava.sirena.org.uk/scheduler/job/1556807
>
> and Avenger 96 (which is 32 bit arm):
>
> https://lava.sirena.org.uk/scheduler/job/1556479
end of thread, other threads: [~2025-07-11 8:31 UTC | newest]
Thread overview: 31+ messages
2025-07-07 5:27 [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs Lorenzo Stoakes
2025-07-07 5:27 ` [PATCH 01/10] mm/mremap: perform some simple cleanups Lorenzo Stoakes
2025-07-10 11:09 ` Vlastimil Babka
2025-07-07 5:27 ` [PATCH 02/10] mm/mremap: refactor initial parameter sanity checks Lorenzo Stoakes
2025-07-10 11:38 ` Vlastimil Babka
2025-07-07 5:27 ` [PATCH 03/10] mm/mremap: put VMA check and prep logic into helper function Lorenzo Stoakes
2025-07-10 13:10 ` Vlastimil Babka
2025-07-07 5:27 ` [PATCH 04/10] mm/mremap: cleanup post-processing stage of mremap Lorenzo Stoakes
2025-07-10 13:49 ` Vlastimil Babka
2025-07-10 15:28 ` Lorenzo Stoakes
2025-07-07 5:27 ` [PATCH 05/10] mm/mremap: use an explicit uffd failure path for mremap Lorenzo Stoakes
2025-07-07 7:56 ` kernel test robot
2025-07-07 10:13 ` Lorenzo Stoakes
2025-07-07 10:20 ` Lorenzo Stoakes
2025-07-10 14:24 ` Vlastimil Babka
2025-07-07 5:27 ` [PATCH 06/10] mm/mremap: check remap conditions earlier Lorenzo Stoakes
2025-07-10 14:36 ` Vlastimil Babka
2025-07-07 5:27 ` [PATCH 07/10] mm/mremap: move remap_is_valid() into check_prep_vma() Lorenzo Stoakes
2025-07-10 14:44 ` Vlastimil Babka
2025-07-07 5:27 ` [PATCH 08/10] mm/mremap: clean up mlock populate behaviour Lorenzo Stoakes
2025-07-10 14:47 ` Vlastimil Babka
2025-07-07 5:27 ` [PATCH 09/10] mm/mremap: permit mremap() move of multiple VMAs Lorenzo Stoakes
2025-07-09 18:13 ` Liam R. Howlett
2025-07-10 10:41 ` Lorenzo Stoakes
2025-07-11 8:17 ` Mark Brown
2025-07-11 8:22 ` Mark Brown
2025-07-11 8:31 ` Lorenzo Stoakes
2025-07-07 5:27 ` [PATCH 10/10] tools/testing/selftests: extend mremap_test to test multi-VMA mremap Lorenzo Stoakes
2025-07-07 6:12 ` [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs Hugh Dickins
2025-07-07 10:31 ` Lorenzo Stoakes
2025-07-07 10:34 ` Lorenzo Stoakes