* [PATCH v2 00/10] mm: remove vma_merge()
@ 2024-08-23 20:06 Lorenzo Stoakes
From: Lorenzo Stoakes @ 2024-08-23 20:06 UTC
To: linux-mm, linux-kernel, Andrew Morton; +Cc: Liam R . Howlett, Vlastimil Babka
REVIEWERS NOTE:
This series is based on mm-unstable and rebased on Liam's series [0],
including the fix patch [1] sent for this. In order to review these
patches locally, if they are not already in mm-unstable, you will need to
apply those series before applying this one.
The infamous vma_merge() function has been the cause of a great deal of
pain, bugs and confusion for a very long time.
It is subtle, contains many corner cases, tries to do far too much and is
as a result very fragile.
The fact that the function requires a numbering system covering every
possible eventuality, with references throughout the many branches of its
implementation to indicate which case you are looking at, speaks to all of
this.
Some of this complexity is inherent - unfortunately there is no getting
away from the need to figure out precisely how to execute the merge,
whether we need to remove VMAs, whether it is safe to do so, what
constitutes a mergeable VMA and so on.
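For intuition, the core mergeability test reduces to something like the
following sketch (illustrative only; the real is_mergeable_vma() and
can_vma_merge_*() checks in mm/vma.c additionally compare userfaultfd
context, anon_vma_name and anon_vma compatibility):

	/* Sketch of the core mergeability check, not the exact kernel logic. */
	static bool roughly_mergeable(struct vm_area_struct *vma, struct file *file,
				      unsigned long vm_flags, pgoff_t pgoff)
	{
		/* Flags must match, the soft-dirty bit aside. */
		if ((vma->vm_flags ^ vm_flags) & ~VM_SOFTDIRTY)
			return false;
		/* Both must map the same file (or both be anonymous)... */
		if (vma->vm_file != file)
			return false;
		/* ...at a contiguous file offset. */
		return vma->vm_pgoff == pgoff;
	}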
However, a lot of the complexity is not inherent but instead a product of
the function's 'organic' development.
Liam has gone to great lengths to improve the situation as a part of his
maple tree implementation, greatly improving the readability of the code,
and Vlastimil and I have additionally gone to lengths to try to
improve things further.
However, with the availability of userland VMA testing, it now becomes
possible to perform a rather more significant refactoring while maintaining
confidence in its correct operation.
An attempt was previously made by Vlastimil [2] to eliminate vma_merge();
however, it was rather brutal, and an astute reader might refer to the
date of that patch for insight as to its intent.
This series instead divides merge operations into two natural kinds -
merges which occur when a NEW vma is being added to the address space, and
merges which occur when a vma is being MODIFIED.
Happily, the vma_expand() function introduced by Liam, which has the
capacity for also deleting a subsequent VMA, covers each of the NEW vma
cases.
By abstracting the actual final commit of changes to a VMA into its own
function, commit_merge(), and writing a wrapper around vma_expand() for
the new VMA cases, vma_merge_new_range(), we can avoid having to use
vma_merge() for these instances altogether.
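Schematically, the new-VMA merge path then looks something like the
following (a loose sketch with approximated signatures; the real function
additionally handles VM_SPECIAL, prev/next compatibility when merging
both sides, iterator positioning and error reporting):

	/* Sketch: merging a NEW range is always an expansion of a neighbour. */
	static struct vm_area_struct *merge_new_sketch(struct vma_merge_struct *vmg)
	{
		bool merge_prev = vmg->prev && can_vma_merge_after(vmg);
		bool merge_next = vmg->next && can_vma_merge_before(vmg);

		if (!merge_prev && !merge_next)
			return NULL;	/* No merge, caller allocates a new VMA. */

		if (merge_prev) {
			/* Expand prev rightwards over the proposed range. */
			vmg->vma = vmg->prev;
			vmg->start = vmg->prev->vm_start;
			vmg->pgoff = vmg->prev->vm_pgoff;
		} else {
			/* Grow next leftwards to the proposed start. */
			vmg->vma = vmg->next;
			vmg->pgoff = vmg->next->vm_pgoff -
				PHYS_PFN(vmg->next->vm_start - vmg->start);
		}
		if (merge_next)	/* If merging both, next is deleted by the expand. */
			vmg->end = vmg->next->vm_end;

		/* Assuming the vmg-based vma_expand() introduced later in the series. */
		return vma_expand(vmg) ? NULL : vmg->vma;
	}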
By doing so we are also able to then de-duplicate all existing merge logic
in mmap_region() and do_brk_flags() and have everything invoke this new
function, so we universally take the same approach to merging new VMAs.
Having done so, we can then completely rework vma_merge() into
vma_merge_existing_range() and use this for the instances where a merge is
proposed for a region of an existing VMA.
This eliminates vma_merge() and its numbered cases and instead divides
things into logical cases - merge both, merge left, merge right (the latter
2 being either partial or full merges).
The code is heavily annotated with ASCII diagrams and greatly simplified in
comparison to the existing vma_merge() function.
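Concretely, using the same diagram conventions as the tests (P = prev,
V = the VMA being modified, N = next, <-> = the proposed range):

	merge both:           merge left (partial):   merge right (partial):
	   <-->                  <->                     <->
	PPPVVVVNNN            PPPVVVV                 VVVVNNN
	    ->                   ->                      ->
	PPPPPPPPPP            PPPPPPV                 VNNNNNN

In the 'full' left/right variants the proposed range spans the whole of
V, so V is deleted rather than shrunk.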
Having made this change, we take the opportunity to address an issue with
merging VMAs possessing a vm_ops->close() hook - commit 714965ca8252
("mm/mmap: start distinguishing if vma can be removed in mergeability
test") and commit fc0c8f9089c2 ("mm, mmap: fix vma_merge() case 7 with
vma_ops->close") make efforts to relax how we handle these, making
assumptions about which VMAs might end up deleted (and thus, if possessing
a vm_ops->close() hook, cannot be).
This refactor means we do not need to guess, so instead explicitly only
disallow merge in instances where a VMA with a vm_ops->close() hook would
be deleted (and try a smaller merge in cases where this is possible).
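Expressed as a predicate, the rule is simply (a sketch; in the series the
equivalent checks live in the mergeability tests and the existing-range
merge logic):

	/* Sketch: a merge may only delete a VMA lacking a ->close() hook. */
	static bool can_merge_remove_vma(struct vm_area_struct *vma)
	{
		return !vma->vm_ops || !vma->vm_ops->close;
	}

So, for instance, a 'merge both' that would delete a close()-bearing next
VMA is reduced to a 'merge left' that leaves next in place, rather than
being rejected outright.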
In addition to these changes, we introduce a new vma_merge_struct
abstraction to allow VMA merge state to be threaded through the operation
neatly.
There is heavy unit testing provided for all merge functionality, added
prior to the refactoring, allowing for before/after testing.
The vm_ops->close() change also introduces exhaustive testing to
demonstrate that this functions as expected, and in addition to this the
reproduction code from commit fc0c8f9089c2 ("mm, mmap: fix vma_merge() case
7 with vma_ops->close") was tested and confirmed passing.
[0]: https://lore.kernel.org/all/20240822192543.3359552-1-Liam.Howlett@oracle.com
[1]: https://lore.kernel.org/all/20240823133034.3527917-1-Liam.Howlett@oracle.com
[2]: https://lore.kernel.org/linux-mm/20240401192623.18575-2-vbabka@suse.cz/
v2:
* Updated tests to function without the vmg change, and moved earlier in
series so we can test against the code _exactly_ as it was previously.
* Added vmg->mm to store mm_struct and avoid hacky container_of() in
vma_merge() prior to refactor. It's logical to thread this through.
* Stopped specifying vmg->vma for vma_merge_new_vma() from the start,
which was previously removed later in the series.
* Improve vma_modify_flags() to be better formatted for a large number of
flags.
* Removed if (vma) { ... } logic in mmap_region() and integrated the
approach from a later commit of putting logic into the if (next &&... )
block. Improved comment about why we are doing this.
* Introduced VMG_STATE() and VMG_VMA_STATE() macros and use these to avoid
duplication of initialisation of vmg state.
* Expanded the commit message for abstracting the policy comparison to
explain the logic.
* Reverted the use of vmg in vma_shrink() and split_vma().
* Reverted the cleanup of __split_vma() int -> bool as it is at this point
fully irrelevant to the series.
* Reinstated incorrectly removed vmg.uffd_ctx assignment in mmap_region().
* Removed a confusing comment about assignment of vmg.end in early version
of mmap_region().
* Renamed vma_merge_new_vma() to vma_merge_new_range() and
vma_merge_modified() to vma_merge_existing_range(). This makes it clearer
what we're attempting to do.
* Stopped setting vmg parameters in do_brk_flags() that we did not set in
the original implementation, i.e. vma parameters for things like
anon_vma, uffd context, etc. which in the original implementation are not
checked in can_vma_merge_after().
* Moved VM_SPECIAL maple tree rewalk out of if (!prev && !next) { ... }
block in vma_merge_new_range() (which was changed to !next anyway). This
should always be done in the VM_SPECIAL case if vmg->prev is specified.
* Updated vma_merge_new_range() to correct the case where prev and next
could each be merged individually with the proposed range, but not
together.
* Update vma_merge_new_range() to require that the caller sets prev and
next. This simplifies the logic and avoids unnecessary maple tree walks.
* Updated mmap_region() to update vmg->flags from vma->vm_flags on merge
reattempt.
* Updated callers of vma_merge_new_range() to ensure we always point the
iterator at prev if it exists.
* Added new state field to vmg to allow for errors to be returned.
* Adjusted do_brk_flags() to read vmg->state and handle memory allocation
failures accordingly.
* Do not double-assign VM_SOFTDIRTY in do_brk_flags().
* Separated out move of vma_prepare(), init_vma_prep(), vma_complete(),
can_vma_merge_before(), can_vma_merge_after() functions to separate
commit.
* Adjusted commit_merge() change to initially _only_ have parameters
relevant to vma_expand() to make review easier.
* Reinstated 'vma iterator must be pointing to start' comment in
commit_merge().
* Adjusted commit_merge() again when introducing vma_merge_existing_range()
to accept parameters specific to existing range merges.
* Removed unnecessary abstraction of vmg->end in vma_merge_existing_range()
as only used once.
* Abstract expanded parameter to local variable for clarity in
vma_merge_existing_range().
* Unlink anon_vma objects if VMA pre-allocation fails on commit_merge() in
vma_merge_existing_range() if any were duplicated. This was incorrectly
excluded from the refactor.
* Moved comment from close commit regarding merge_will_delete_both to
previous commit as unchanged behaviour.
* Corrected failure to assign vmg->flags after applying VM_ACCOUNT in
mmap_region() (this had incidentally caused a ~5% regression in
do_brk_flags(), now resolved).
* Added vmi assumptions and asserts in merge functions.
* Added lock asserts in merge functions.
* Added an assert to vma_merge_new_range() to ensure no VMA within
[vmg->start, vmg->end).
* Added additional comments describing why we are moving the iterator to
avoid maple tree re-walks.
* Added new test for the case of prev, next both with vm_ops->close()
adding a new VMA, which should result in prev being expanded but NOT
merged with next.
* Adjusted test code to do a mock version of anon_vma duplication, and
cleanup after itself.
* Adjusted test code to allow vma preallocation to fail so we can test
how we handle this.
* Added a test to assert correct anon_vma duplication behaviour.
* Added a test to assert that preallocation failure results in anon_vma's
being unlinked.
* Corrected vma_expand() assumption - we need vma, next not prev.
* Reinstated removed VM_WARN_ON() around vp.anon_vma state in
commit_merge().
* Rebased over Pedro + Liam's changes.
* Updated test logic to handle current->{mm,pid,comm} fields after rebase
on Liam's changes which use these. Also added stub for pr_warn_once() for
the same reason.
* Adjusted logic fundamentals based on rebase - vma_merge_new_range() now
assumes vmi is pointing at the gap...
v1:
https://lore.kernel.org/linux-mm/cover.1722849859.git.lorenzo.stoakes@oracle.com/
Lorenzo Stoakes (10):
tools: improve vma test Makefile
tools: add VMA merge tests
mm: introduce vma_merge_struct and abstract vma_merge(),vma_modify()
mm: remove duplicated open-coded VMA policy check
mm: abstract vma_expand() to use vma_merge_struct
mm: avoid using vma_merge() for new VMAs
mm: make vma_prepare() and friends static and internal to vma.c
mm: introduce commit_merge(), abstracting final commit of merge
mm: refactor vma_merge() into modify-only vma_merge_existing_range()
mm: rework vm_ops->close() handling on VMA merge
mm/mmap.c | 106 +--
mm/vma.c | 1297 ++++++++++++++++-------------
mm/vma.h | 152 ++--
tools/testing/vma/Makefile | 6 +-
tools/testing/vma/vma.c | 1302 +++++++++++++++++++++++++++++-
tools/testing/vma/vma_internal.h | 51 +-
6 files changed, 2217 insertions(+), 697 deletions(-)
--
2.46.0
* [PATCH v2 01/10] tools: improve vma test Makefile
From: Lorenzo Stoakes @ 2024-08-23 20:06 UTC
To: linux-mm, linux-kernel, Andrew Morton; +Cc: Liam R . Howlett, Vlastimil Babka
Have vma.o depend on its source dependencies explicitly; previously these
were simply ignored because existing object files were considered up to
date. This now correctly re-triggers the build if mm/ source is changed as
well as local source code.
Also set clean as a phony rule.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
tools/testing/vma/Makefile | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/testing/vma/Makefile b/tools/testing/vma/Makefile
index bfc905d222cf..860fd2311dcc 100644
--- a/tools/testing/vma/Makefile
+++ b/tools/testing/vma/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-or-later
-.PHONY: default
+.PHONY: default clean
default: vma
@@ -9,7 +9,9 @@ include ../shared/shared.mk
OFILES = $(SHARED_OFILES) vma.o maple-shim.o
TARGETS = vma
-vma: $(OFILES) vma_internal.h ../../../mm/vma.c ../../../mm/vma.h
+vma.o: vma.c vma_internal.h ../../../mm/vma.c ../../../mm/vma.h
+
+vma: $(OFILES)
$(CC) $(CFLAGS) -o $@ $(OFILES) $(LDLIBS)
clean:
--
2.46.0
* [PATCH v2 02/10] tools: add VMA merge tests
From: Lorenzo Stoakes @ 2024-08-23 20:06 UTC
To: linux-mm, linux-kernel, Andrew Morton; +Cc: Liam R . Howlett, Vlastimil Babka
Add a variety of VMA merge unit tests to assert that the behaviour of VMA
merge is correct at an abstract level and VMAs are merged or not merged as
expected.
These are intentionally added _before_ we start refactoring vma_merge() in
order that we can continually assert correctness throughout the rest of the
series.
In order to reduce churn going forward, we backport the vma_merge_struct
data type (which we introduce and use in a future commit) to the test
code, and add wrappers around the merge new and merge existing VMA cases.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
tools/testing/vma/vma.c | 1218 +++++++++++++++++++++++++++++-
tools/testing/vma/vma_internal.h | 45 +-
2 files changed, 1253 insertions(+), 10 deletions(-)
diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
index 48e033c60d87..9b272633ca9e 100644
--- a/tools/testing/vma/vma.c
+++ b/tools/testing/vma/vma.c
@@ -7,13 +7,43 @@
#include "maple-shared.h"
#include "vma_internal.h"
+/* Include so header guard set. */
+#include "../../../mm/vma.h"
+
+static bool fail_prealloc;
+
+/* Then override vma_iter_prealloc() so we can choose to fail it. */
+#define vma_iter_prealloc(vmi, vma) \
+ (fail_prealloc ? -ENOMEM : mas_preallocate(&(vmi)->mas, (vma), GFP_KERNEL))
+
/*
* Directly import the VMA implementation here. Our vma_internal.h wrapper
* provides userland-equivalent functionality for everything vma.c uses.
*/
#include "../../../mm/vma.c"
+/*
+ * Temporarily forward-ported from a future in which vmg's are used for merging.
+ */
+struct vma_merge_struct {
+ struct mm_struct *mm;
+ struct vma_iterator *vmi;
+ pgoff_t pgoff;
+ struct vm_area_struct *prev;
+ struct vm_area_struct *next; /* Modified by vma_merge(). */
+ struct vm_area_struct *vma; /* Either a new VMA or the one being modified. */
+ unsigned long start;
+ unsigned long end;
+ unsigned long flags;
+ struct file *file;
+ struct anon_vma *anon_vma;
+ struct mempolicy *policy;
+ struct vm_userfaultfd_ctx uffd_ctx;
+ struct anon_vma_name *anon_name;
+};
+
const struct vm_operations_struct vma_dummy_vm_ops;
+static struct anon_vma dummy_anon_vma;
#define ASSERT_TRUE(_expr) \
do { \
@@ -28,6 +58,14 @@ const struct vm_operations_struct vma_dummy_vm_ops;
#define ASSERT_EQ(_val1, _val2) ASSERT_TRUE((_val1) == (_val2))
#define ASSERT_NE(_val1, _val2) ASSERT_TRUE((_val1) != (_val2))
+static struct task_struct __current;
+
+struct task_struct *get_current(void)
+{
+ return &__current;
+}
+
+/* Helper function to simply allocate a VMA. */
static struct vm_area_struct *alloc_vma(struct mm_struct *mm,
unsigned long start,
unsigned long end,
@@ -47,22 +85,201 @@ static struct vm_area_struct *alloc_vma(struct mm_struct *mm,
return ret;
}
+/* Helper function to allocate a VMA and link it to the tree. */
+static struct vm_area_struct *alloc_and_link_vma(struct mm_struct *mm,
+ unsigned long start,
+ unsigned long end,
+ pgoff_t pgoff,
+ vm_flags_t flags)
+{
+ struct vm_area_struct *vma = alloc_vma(mm, start, end, pgoff, flags);
+
+ if (vma == NULL)
+ return NULL;
+
+ if (vma_link(mm, vma)) {
+ vm_area_free(vma);
+ return NULL;
+ }
+
+ /*
+ * Reset this counter which we use to track whether writes have
+ * begun. Linking to the tree will have caused this to be incremented,
+ * which means we will get a false positive otherwise.
+ */
+ vma->vm_lock_seq = -1;
+
+ return vma;
+}
+
+/* Helper function which provides a wrapper around a merge new VMA operation. */
+static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
+{
+ /* vma_merge() needs a VMA to determine mm, anon_vma, and file. */
+ struct vm_area_struct dummy = {
+ .vm_mm = vmg->mm,
+ .vm_flags = vmg->flags,
+ .anon_vma = vmg->anon_vma,
+ .vm_file = vmg->file,
+ };
+
+ /*
+ * For convenience, get prev and next VMAs, which the new VMA
+ * operation requires.
+ */
+ vmg->next = vma_next(vmg->vmi);
+ vmg->prev = vma_prev(vmg->vmi);
+
+ vma_iter_set(vmg->vmi, vmg->start);
+ return vma_merge_new_vma(vmg->vmi, vmg->prev, &dummy, vmg->start,
+ vmg->end, vmg->pgoff);
+}
+
+/*
+ * Helper function which provides a wrapper around a merge existing VMA
+ * operation.
+ */
+static struct vm_area_struct *merge_existing(struct vma_merge_struct *vmg)
+{
+ /* vma_merge() needs a VMA to determine mm, anon_vma, and file. */
+ struct vm_area_struct dummy = {
+ .vm_mm = vmg->mm,
+ .vm_flags = vmg->flags,
+ .anon_vma = vmg->anon_vma,
+ .vm_file = vmg->file,
+ };
+
+ return vma_merge(vmg->vmi, vmg->prev, &dummy, vmg->start, vmg->end,
+ vmg->flags, vmg->pgoff, vmg->policy, vmg->uffd_ctx,
+ vmg->anon_name);
+}
+
+/*
+ * Helper function which provides a wrapper around the expansion of an existing
+ * VMA.
+ */
+static int expand_existing(struct vma_merge_struct *vmg)
+{
+ return vma_expand(vmg->vmi, vmg->vma, vmg->start, vmg->end, vmg->pgoff,
+ vmg->next);
+}
+
+/*
+ * Helper function to reset merge state and the associated VMA iterator to
+ * a specified new range.
+ */
+static void vmg_set_range(struct vma_merge_struct *vmg, unsigned long start,
+ unsigned long end, pgoff_t pgoff, vm_flags_t flags)
+{
+ vma_iter_set(vmg->vmi, start);
+
+ vmg->prev = NULL;
+ vmg->next = NULL;
+ vmg->vma = NULL;
+
+ vmg->start = start;
+ vmg->end = end;
+ vmg->pgoff = pgoff;
+ vmg->flags = flags;
+}
+
+/*
+ * Helper function to try to merge a new VMA.
+ *
+ * Update vmg and the iterator for it and try to merge, otherwise allocate a new
+ * VMA, link it to the maple tree and return it.
+ */
+static struct vm_area_struct *try_merge_new_vma(struct mm_struct *mm,
+ struct vma_merge_struct *vmg,
+ unsigned long start, unsigned long end,
+ pgoff_t pgoff, vm_flags_t flags,
+ bool *was_merged)
+{
+ struct vm_area_struct *merged;
+
+ vmg_set_range(vmg, start, end, pgoff, flags);
+
+ merged = merge_new(vmg);
+ if (merged) {
+ *was_merged = true;
+ return merged;
+ }
+
+ *was_merged = false;
+ return alloc_and_link_vma(mm, start, end, pgoff, flags);
+}
+
+/*
+ * Helper function to reset the dummy anon_vma to indicate it has not been
+ * duplicated.
+ */
+static void reset_dummy_anon_vma(void)
+{
+ dummy_anon_vma.was_cloned = false;
+ dummy_anon_vma.was_unlinked = false;
+}
+
+/*
+ * Helper function to remove all VMAs and destroy the maple tree associated with
+ * a virtual address space. Returns a count of VMAs in the tree.
+ */
+static int cleanup_mm(struct mm_struct *mm, struct vma_iterator *vmi)
+{
+ struct vm_area_struct *vma;
+ int count = 0;
+
+ fail_prealloc = false;
+ reset_dummy_anon_vma();
+
+ vma_iter_set(vmi, 0);
+ for_each_vma(*vmi, vma) {
+ vm_area_free(vma);
+ count++;
+ }
+
+ mtree_destroy(&mm->mm_mt);
+ mm->map_count = 0;
+ return count;
+}
+
+/* Helper function to determine if VMA has had vma_start_write() performed. */
+static bool vma_write_started(struct vm_area_struct *vma)
+{
+ int seq = vma->vm_lock_seq;
+
+ /* We reset after each check. */
+ vma->vm_lock_seq = -1;
+
+ /* The vma_start_write() stub simply increments this value. */
+ return seq > -1;
+}
+
+/* Helper function providing a dummy vm_ops->close() method. */
+static void dummy_close(struct vm_area_struct *)
+{
+}
+
static bool test_simple_merge(void)
{
struct vm_area_struct *vma;
unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
struct mm_struct mm = {};
struct vm_area_struct *vma_left = alloc_vma(&mm, 0, 0x1000, 0, flags);
- struct vm_area_struct *vma_middle = alloc_vma(&mm, 0x1000, 0x2000, 1, flags);
struct vm_area_struct *vma_right = alloc_vma(&mm, 0x2000, 0x3000, 2, flags);
VMA_ITERATOR(vmi, &mm, 0x1000);
+ struct vma_merge_struct vmg = {
+ .mm = &mm,
+ .vmi = &vmi,
+ .start = 0x1000,
+ .end = 0x2000,
+ .flags = flags,
+ .pgoff = 1,
+ };
ASSERT_FALSE(vma_link(&mm, vma_left));
- ASSERT_FALSE(vma_link(&mm, vma_middle));
ASSERT_FALSE(vma_link(&mm, vma_right));
- vma = vma_merge_new_vma(&vmi, vma_left, vma_middle, 0x1000,
- 0x2000, 1);
+ vma = merge_new(&vmg);
ASSERT_NE(vma, NULL);
ASSERT_EQ(vma->vm_start, 0);
@@ -142,10 +359,17 @@ static bool test_simple_expand(void)
struct mm_struct mm = {};
struct vm_area_struct *vma = alloc_vma(&mm, 0, 0x1000, 0, flags);
VMA_ITERATOR(vmi, &mm, 0);
+ struct vma_merge_struct vmg = {
+ .vmi = &vmi,
+ .vma = vma,
+ .start = 0,
+ .end = 0x3000,
+ .pgoff = 0,
+ };
ASSERT_FALSE(vma_link(&mm, vma));
- ASSERT_FALSE(vma_expand(&vmi, vma, 0, 0x3000, 0, NULL));
+ ASSERT_FALSE(expand_existing(&vmg));
ASSERT_EQ(vma->vm_start, 0);
ASSERT_EQ(vma->vm_end, 0x3000);
@@ -178,6 +402,980 @@ static bool test_simple_shrink(void)
return true;
}
+static bool test_merge_new(void)
+{
+ unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+ struct mm_struct mm = {};
+ VMA_ITERATOR(vmi, &mm, 0);
+ struct vma_merge_struct vmg = {
+ .mm = &mm,
+ .vmi = &vmi,
+ };
+ struct anon_vma_chain dummy_anon_vma_chain_a = {
+ .anon_vma = &dummy_anon_vma,
+ };
+ struct anon_vma_chain dummy_anon_vma_chain_b = {
+ .anon_vma = &dummy_anon_vma,
+ };
+ struct anon_vma_chain dummy_anon_vma_chain_c = {
+ .anon_vma = &dummy_anon_vma,
+ };
+ struct anon_vma_chain dummy_anon_vma_chain_d = {
+ .anon_vma = &dummy_anon_vma,
+ };
+ int count;
+ struct vm_area_struct *vma, *vma_a, *vma_b, *vma_c, *vma_d;
+ bool merged;
+
+ /*
+ * 0123456789abc
+ * AA B CC
+ */
+ vma_a = alloc_and_link_vma(&mm, 0, 0x2000, 0, flags);
+ ASSERT_NE(vma_a, NULL);
+ /* We give each VMA a single avc so we can test anon_vma duplication. */
+ INIT_LIST_HEAD(&vma_a->anon_vma_chain);
+ list_add(&dummy_anon_vma_chain_a.same_vma, &vma_a->anon_vma_chain);
+
+ vma_b = alloc_and_link_vma(&mm, 0x3000, 0x4000, 3, flags);
+ ASSERT_NE(vma_b, NULL);
+ INIT_LIST_HEAD(&vma_b->anon_vma_chain);
+ list_add(&dummy_anon_vma_chain_b.same_vma, &vma_b->anon_vma_chain);
+
+ vma_c = alloc_and_link_vma(&mm, 0xb000, 0xc000, 0xb, flags);
+ ASSERT_NE(vma_c, NULL);
+ INIT_LIST_HEAD(&vma_c->anon_vma_chain);
+ list_add(&dummy_anon_vma_chain_c.same_vma, &vma_c->anon_vma_chain);
+
+ /*
+ * NO merge.
+ *
+ * 0123456789abc
+ * AA B ** CC
+ */
+ vma_d = try_merge_new_vma(&mm, &vmg, 0x7000, 0x9000, 7, flags, &merged);
+ ASSERT_NE(vma_d, NULL);
+ INIT_LIST_HEAD(&vma_d->anon_vma_chain);
+ list_add(&dummy_anon_vma_chain_d.same_vma, &vma_d->anon_vma_chain);
+ ASSERT_FALSE(merged);
+ ASSERT_EQ(mm.map_count, 4);
+
+ /*
+ * Merge BOTH sides.
+ *
+ * 0123456789abc
+ * AA*B DD CC
+ */
+ vma_b->anon_vma = &dummy_anon_vma;
+ vma = try_merge_new_vma(&mm, &vmg, 0x2000, 0x3000, 2, flags, &merged);
+ ASSERT_EQ(vma, vma_a);
+ /* Merge with A, delete B. */
+ ASSERT_TRUE(merged);
+ ASSERT_EQ(vma->vm_start, 0);
+ ASSERT_EQ(vma->vm_end, 0x4000);
+ ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_write_started(vma));
+ ASSERT_EQ(mm.map_count, 3);
+
+ /*
+ * Merge to PREVIOUS VMA.
+ *
+ * 0123456789abc
+ * AAAA* DD CC
+ */
+ vma = try_merge_new_vma(&mm, &vmg, 0x4000, 0x5000, 4, flags, &merged);
+ ASSERT_EQ(vma, vma_a);
+ /* Extend A. */
+ ASSERT_TRUE(merged);
+ ASSERT_EQ(vma->vm_start, 0);
+ ASSERT_EQ(vma->vm_end, 0x5000);
+ ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_write_started(vma));
+ ASSERT_EQ(mm.map_count, 3);
+
+ /*
+ * Merge to NEXT VMA.
+ *
+ * 0123456789abc
+ * AAAAA *DD CC
+ */
+ vma_d->anon_vma = &dummy_anon_vma;
+ vma = try_merge_new_vma(&mm, &vmg, 0x6000, 0x7000, 6, flags, &merged);
+ ASSERT_EQ(vma, vma_d);
+ /* Prepend. */
+ ASSERT_TRUE(merged);
+ ASSERT_EQ(vma->vm_start, 0x6000);
+ ASSERT_EQ(vma->vm_end, 0x9000);
+ ASSERT_EQ(vma->vm_pgoff, 6);
+ ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_write_started(vma));
+ ASSERT_EQ(mm.map_count, 3);
+
+ /*
+ * Merge BOTH sides.
+ *
+ * 0123456789abc
+ * AAAAA*DDD CC
+ */
+ vma = try_merge_new_vma(&mm, &vmg, 0x5000, 0x6000, 5, flags, &merged);
+ ASSERT_EQ(vma, vma_a);
+ /* Merge with A, delete D. */
+ ASSERT_TRUE(merged);
+ ASSERT_EQ(vma->vm_start, 0);
+ ASSERT_EQ(vma->vm_end, 0x9000);
+ ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_write_started(vma));
+ ASSERT_EQ(mm.map_count, 2);
+
+ /*
+ * Merge to NEXT VMA.
+ *
+ * 0123456789abc
+ * AAAAAAAAA *CC
+ */
+ vma_c->anon_vma = &dummy_anon_vma;
+ vma = try_merge_new_vma(&mm, &vmg, 0xa000, 0xb000, 0xa, flags, &merged);
+ ASSERT_EQ(vma, vma_c);
+ /* Prepend C. */
+ ASSERT_TRUE(merged);
+ ASSERT_EQ(vma->vm_start, 0xa000);
+ ASSERT_EQ(vma->vm_end, 0xc000);
+ ASSERT_EQ(vma->vm_pgoff, 0xa);
+ ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_write_started(vma));
+ ASSERT_EQ(mm.map_count, 2);
+
+ /*
+ * Merge BOTH sides.
+ *
+ * 0123456789abc
+ * AAAAAAAAA*CCC
+ */
+ vma = try_merge_new_vma(&mm, &vmg, 0x9000, 0xa000, 0x9, flags, &merged);
+ ASSERT_EQ(vma, vma_a);
+ /* Extend A and delete C. */
+ ASSERT_TRUE(merged);
+ ASSERT_EQ(vma->vm_start, 0);
+ ASSERT_EQ(vma->vm_end, 0xc000);
+ ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_write_started(vma));
+ ASSERT_EQ(mm.map_count, 1);
+
+ /*
+ * Final state.
+ *
+ * 0123456789abc
+ * AAAAAAAAAAAAA
+ */
+
+ count = 0;
+ vma_iter_set(&vmi, 0);
+ for_each_vma(vmi, vma) {
+ ASSERT_NE(vma, NULL);
+ ASSERT_EQ(vma->vm_start, 0);
+ ASSERT_EQ(vma->vm_end, 0xc000);
+ ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
+
+ vm_area_free(vma);
+ count++;
+ }
+
+ /* Should only have one VMA left (though freed) after all is done. */
+ ASSERT_EQ(count, 1);
+
+ mtree_destroy(&mm.mm_mt);
+ return true;
+}
+
+static bool test_vma_merge_special_flags(void)
+{
+ unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+ struct mm_struct mm = {};
+ VMA_ITERATOR(vmi, &mm, 0);
+ struct vma_merge_struct vmg = {
+ .mm = &mm,
+ .vmi = &vmi,
+ };
+ vm_flags_t special_flags[] = { VM_IO, VM_DONTEXPAND, VM_PFNMAP, VM_MIXEDMAP };
+ vm_flags_t all_special_flags = 0;
+ int i;
+ struct vm_area_struct *vma_left, *vma;
+
+ /* Make sure there aren't new VM_SPECIAL flags. */
+ for (i = 0; i < ARRAY_SIZE(special_flags); i++) {
+ all_special_flags |= special_flags[i];
+ }
+ ASSERT_EQ(all_special_flags, VM_SPECIAL);
+
+ /*
+ * 01234
+ * AAA
+ */
+ vma_left = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ ASSERT_NE(vma_left, NULL);
+
+ /* 1. Set up new VMA with special flag that would otherwise merge. */
+
+ /*
+ * 01234
+ * AAA*
+ *
+ * This should merge if not for the VM_SPECIAL flag.
+ */
+ vmg_set_range(&vmg, 0x3000, 0x4000, 3, flags);
+ for (i = 0; i < ARRAY_SIZE(special_flags); i++) {
+ vm_flags_t special_flag = special_flags[i];
+
+ vma_left->__vm_flags = flags | special_flag;
+ vmg.flags = flags | special_flag;
+ vma = merge_new(&vmg);
+ ASSERT_EQ(vma, NULL);
+ }
+
+ /* 2. Modify VMA with special flag that would otherwise merge. */
+
+ /*
+ * 01234
+ * AAAB
+ *
+ * Create a VMA to modify.
+ */
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x4000, 3, flags);
+ ASSERT_NE(vma, NULL);
+ vmg.vma = vma;
+
+ for (i = 0; i < ARRAY_SIZE(special_flags); i++) {
+ vm_flags_t special_flag = special_flags[i];
+
+ vma_left->__vm_flags = flags | special_flag;
+ vmg.flags = flags | special_flag;
+ vma = merge_existing(&vmg);
+ ASSERT_EQ(vma, NULL);
+ }
+
+ cleanup_mm(&mm, &vmi);
+ return true;
+}
+
+static bool test_vma_merge_with_close(void)
+{
+ unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+ struct mm_struct mm = {};
+ VMA_ITERATOR(vmi, &mm, 0);
+ struct vma_merge_struct vmg = {
+ .mm = &mm,
+ .vmi = &vmi,
+ };
+ const struct vm_operations_struct vm_ops = {
+ .close = dummy_close,
+ };
+ struct vm_area_struct *vma_next =
+ alloc_and_link_vma(&mm, 0x2000, 0x3000, 2, flags);
+ struct vm_area_struct *vma;
+
+ /*
+ * When we merge VMAs we sometimes have to delete others as part of the
+ * operation.
+ *
+ * Considering the two possible adjacent VMAs to which a VMA can be
+ * merged:
+ *
+ * [ prev ][ vma ][ next ]
+ *
+ * In no case will we need to delete prev. If the operation is
+ * mergeable, then prev will be extended with one or both of vma and
+ * next deleted.
+ *
+ * As a result, during initial mergeability checks, only
+ * can_vma_merge_before() (which implies the VMA being merged with is
+ * 'next' as shown above) bothers to check to see whether the next VMA
+ * has a vm_ops->close() callback that will need to be called when
+ * removed.
+ *
+ * If it does, then we cannot merge as the resources that the close()
+ * operation potentially clears down are tied only to the existing VMA
+ * range and we have no way of extending those to the newly merged one.
+ *
+ * We must consider two scenarios:
+ *
+ * A.
+ *
+ * vm_ops->close: - - !NULL
+ * [ prev ][ vma ][ next ]
+ *
+ * Where prev may or may not be present/mergeable.
+ *
+ * This is picked up by a specific check in can_vma_merge_before().
+ *
+ * B.
+ *
+ * vm_ops->close: - !NULL
+ * [ prev ][ vma ]
+ *
+ * Where prev and vma are present and mergeable.
+ *
+ * This is picked up by a specific check in the modified VMA merge.
+ *
+ * IMPORTANT NOTE: We make the assumption that the following case:
+ *
+ * - !NULL NULL
+ * [ prev ][ vma ][ next ]
+ *
+ * Cannot occur, because vma->vm_ops being the same implies the same
+ * vma->vm_file, and therefore this would mean that next->vm_ops->close
+ * would be set too, and thus scenario A would pick this up.
+ */
+
+ ASSERT_NE(vma_next, NULL);
+
+ /*
+ * SCENARIO A
+ *
+ * 0123
+ * *N
+ */
+
+ /* Make the next VMA have a close() callback. */
+ vma_next->vm_ops = &vm_ops;
+
+ /* Our proposed VMA has characteristics that would otherwise be merged. */
+ vmg_set_range(&vmg, 0x1000, 0x2000, 1, flags);
+
+ /* The next VMA having a close() operation should cause the merge to fail. */
+ ASSERT_EQ(merge_new(&vmg), NULL);
+
+ /* Now create the VMA so we can merge via modified flags */
+ vmg_set_range(&vmg, 0x1000, 0x2000, 1, flags);
+ vma = alloc_and_link_vma(&mm, 0x1000, 0x2000, 1, flags);
+ vmg.vma = vma;
+
+ /*
+ * The VMA being modified in a way that would otherwise merge should
+ * also fail.
+ */
+ ASSERT_EQ(merge_existing(&vmg), NULL);
+
+ /* SCENARIO B
+ *
+ * 0123
+ * P*
+ *
+ * In order for this scenario to trigger, the VMA currently being
+ * modified must also have a .close().
+ */
+
+ /* Reset VMG state. */
+ vmg_set_range(&vmg, 0x1000, 0x2000, 1, flags);
+ /*
+ * Make next unmergeable, and don't let the scenario A check pick this
+ * up; we want to reproduce scenario B only.
+ */
+ vma_next->vm_ops = NULL;
+ vma_next->__vm_flags &= ~VM_MAYWRITE;
+ /* Allocate prev. */
+ vmg.prev = alloc_and_link_vma(&mm, 0, 0x1000, 0, flags);
+ /* Assign a vm_ops->close() function to VMA explicitly. */
+ vma->vm_ops = &vm_ops;
+ vmg.vma = vma;
+ /* Make sure merge does not occur. */
+ ASSERT_EQ(merge_existing(&vmg), NULL);
+
+ cleanup_mm(&mm, &vmi);
+ return true;
+}
+
+static bool test_vma_merge_new_with_close(void)
+{
+ unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+ struct mm_struct mm = {};
+ VMA_ITERATOR(vmi, &mm, 0);
+ struct vma_merge_struct vmg = {
+ .mm = &mm,
+ .vmi = &vmi,
+ };
+ struct vm_area_struct *vma_prev = alloc_and_link_vma(&mm, 0, 0x2000, 0, flags);
+ struct vm_area_struct *vma_next = alloc_and_link_vma(&mm, 0x5000, 0x7000, 5, flags);
+ const struct vm_operations_struct vm_ops = {
+ .close = dummy_close,
+ };
+ struct vm_area_struct *vma;
+
+ /*
+ * We should allow the partial merge of a proposed new VMA if the
+ * surrounding VMAs have vm_ops->close() hooks (but are otherwise
+ * compatible), e.g.:
+ *
+ * New VMA
+ * A v-------v B
+ * |-----| |-----|
+ * close close
+ *
+ * Since the rule is to not DELETE a VMA with a close operation, this
+ * should be permitted, only rather than expanding A and deleting B, we
+ * should simply expand A and leave B intact, e.g.:
+ *
+ * New VMA
+ * A B
+ * |------------||-----|
+ * close close
+ */
+
+ /* Have prev and next have a vm_ops->close() hook. */
+ vma_prev->vm_ops = &vm_ops;
+ vma_next->vm_ops = &vm_ops;
+
+ vmg_set_range(&vmg, 0x2000, 0x5000, 2, flags);
+ vma = merge_new(&vmg);
+ ASSERT_NE(vma, NULL);
+ ASSERT_EQ(vma->vm_start, 0);
+ ASSERT_EQ(vma->vm_end, 0x5000);
+ ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma->vm_ops, &vm_ops);
+ ASSERT_TRUE(vma_write_started(vma));
+ ASSERT_EQ(mm.map_count, 2);
+
+ cleanup_mm(&mm, &vmi);
+ return true;
+}
+
+static bool test_merge_existing(void)
+{
+ unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+ struct mm_struct mm = {};
+ VMA_ITERATOR(vmi, &mm, 0);
+ struct vm_area_struct *vma, *vma_prev, *vma_next;
+ struct vma_merge_struct vmg = {
+ .mm = &mm,
+ .vmi = &vmi,
+ };
+
+ /*
+ * Merge right case - partial span.
+ *
+ * <->
+ * 0123456789
+ * VVVVNNN
+ * ->
+ * 0123456789
+ * VNNNNNN
+ */
+ vma = alloc_and_link_vma(&mm, 0x2000, 0x6000, 2, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, flags);
+ vmg_set_range(&vmg, 0x3000, 0x6000, 3, flags);
+ vmg.vma = vma;
+ vmg.prev = vma;
+ vma->anon_vma = &dummy_anon_vma;
+ ASSERT_EQ(merge_existing(&vmg), vma_next);
+ ASSERT_EQ(vma_next->vm_start, 0x3000);
+ ASSERT_EQ(vma_next->vm_end, 0x9000);
+ ASSERT_EQ(vma_next->vm_pgoff, 3);
+ ASSERT_EQ(vma_next->anon_vma, &dummy_anon_vma);
+ ASSERT_EQ(vma->vm_start, 0x2000);
+ ASSERT_EQ(vma->vm_end, 0x3000);
+ ASSERT_EQ(vma->vm_pgoff, 2);
+ ASSERT_TRUE(vma_write_started(vma));
+ ASSERT_TRUE(vma_write_started(vma_next));
+ ASSERT_EQ(mm.map_count, 2);
+
+ /* Clear down and reset. */
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
+
+ /*
+ * Merge right case - full span.
+ *
+ * <-->
+ * 0123456789
+ * VVVVNNN
+ * ->
+ * 0123456789
+ * NNNNNNN
+ */
+ vma = alloc_and_link_vma(&mm, 0x2000, 0x6000, 2, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, flags);
+ vmg_set_range(&vmg, 0x2000, 0x6000, 2, flags);
+ vmg.vma = vma;
+ vma->anon_vma = &dummy_anon_vma;
+ ASSERT_EQ(merge_existing(&vmg), vma_next);
+ ASSERT_EQ(vma_next->vm_start, 0x2000);
+ ASSERT_EQ(vma_next->vm_end, 0x9000);
+ ASSERT_EQ(vma_next->vm_pgoff, 2);
+ ASSERT_EQ(vma_next->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_write_started(vma_next));
+ ASSERT_EQ(mm.map_count, 1);
+
+ /* Clear down and reset. We should have deleted vma. */
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
+
+ /*
+ * Merge left case - partial span.
+ *
+ * <->
+ * 0123456789
+ * PPPVVVV
+ * ->
+ * 0123456789
+ * PPPPPPV
+ */
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
+ vmg_set_range(&vmg, 0x3000, 0x6000, 3, flags);
+ vmg.prev = vma_prev;
+ vmg.vma = vma;
+ vma->anon_vma = &dummy_anon_vma;
+
+ ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vma_prev->vm_start, 0);
+ ASSERT_EQ(vma_prev->vm_end, 0x6000);
+ ASSERT_EQ(vma_prev->vm_pgoff, 0);
+ ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
+ ASSERT_EQ(vma->vm_start, 0x6000);
+ ASSERT_EQ(vma->vm_end, 0x7000);
+ ASSERT_EQ(vma->vm_pgoff, 6);
+ ASSERT_TRUE(vma_write_started(vma_prev));
+ ASSERT_TRUE(vma_write_started(vma));
+ ASSERT_EQ(mm.map_count, 2);
+
+ /* Clear down and reset. */
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
+
+ /*
+ * Merge left case - full span.
+ *
+ * <-->
+ * 0123456789
+ * PPPVVVV
+ * ->
+ * 0123456789
+ * PPPPPPP
+ */
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
+ vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
+ vmg.prev = vma_prev;
+ vmg.vma = vma;
+ vma->anon_vma = &dummy_anon_vma;
+ ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vma_prev->vm_start, 0);
+ ASSERT_EQ(vma_prev->vm_end, 0x7000);
+ ASSERT_EQ(vma_prev->vm_pgoff, 0);
+ ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_write_started(vma_prev));
+ ASSERT_EQ(mm.map_count, 1);
+
+ /* Clear down and reset. We should have deleted vma. */
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
+
+ /*
+ * Merge both case.
+ *
+ * <-->
+ * 0123456789
+ * PPPVVVVNNN
+ * ->
+ * 0123456789
+ * PPPPPPPPPP
+ */
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, flags);
+ vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
+ vmg.prev = vma_prev;
+ vmg.vma = vma;
+ vma->anon_vma = &dummy_anon_vma;
+ ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vma_prev->vm_start, 0);
+ ASSERT_EQ(vma_prev->vm_end, 0x9000);
+ ASSERT_EQ(vma_prev->vm_pgoff, 0);
+ ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_write_started(vma_prev));
+ ASSERT_EQ(mm.map_count, 1);
+
+ /* Clear down and reset. We should have deleted prev and next. */
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
+
+ /*
+ * Non-merge ranges. The modified VMA merge operation assumes that the
+ * caller always specifies ranges within the input VMA so we need only
+ * examine these cases.
+ *
+ * -
+ * -
+ * -
+ * <->
+ * <>
+ * <>
+ * 0123456789a
+ * PPPVVVVVNNN
+ */
+
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x8000, 3, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x8000, 0xa000, 8, flags);
+
+ vmg_set_range(&vmg, 0x4000, 0x5000, 4, flags);
+ vmg.prev = vma;
+ vmg.vma = vma;
+ ASSERT_EQ(merge_existing(&vmg), NULL);
+
+ vmg_set_range(&vmg, 0x5000, 0x6000, 5, flags);
+ vmg.prev = vma;
+ vmg.vma = vma;
+ ASSERT_EQ(merge_existing(&vmg), NULL);
+
+ vmg_set_range(&vmg, 0x6000, 0x7000, 6, flags);
+ vmg.prev = vma;
+ vmg.vma = vma;
+ ASSERT_EQ(merge_existing(&vmg), NULL);
+
+ vmg_set_range(&vmg, 0x4000, 0x7000, 4, flags);
+ vmg.prev = vma;
+ vmg.vma = vma;
+ ASSERT_EQ(merge_existing(&vmg), NULL);
+
+ vmg_set_range(&vmg, 0x4000, 0x6000, 4, flags);
+ vmg.prev = vma;
+ vmg.vma = vma;
+ ASSERT_EQ(merge_existing(&vmg), NULL);
+
+ vmg_set_range(&vmg, 0x5000, 0x6000, 5, flags);
+ vmg.prev = vma;
+ vmg.vma = vma;
+ ASSERT_EQ(merge_existing(&vmg), NULL);
+
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 3);
+
+ return true;
+}
+
+static bool test_anon_vma_non_mergeable(void)
+{
+ unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+ struct mm_struct mm = {};
+ VMA_ITERATOR(vmi, &mm, 0);
+ struct vm_area_struct *vma, *vma_prev, *vma_next;
+ struct vma_merge_struct vmg = {
+ .mm = &mm,
+ .vmi = &vmi,
+ };
+ struct anon_vma_chain dummy_anon_vma_chain1 = {
+ .anon_vma = &dummy_anon_vma,
+ };
+ struct anon_vma_chain dummy_anon_vma_chain2 = {
+ .anon_vma = &dummy_anon_vma,
+ };
+
+ /*
+ * In the case of modified VMA merge, merging both left and right VMAs
+ * but where prev and next have incompatible anon_vma objects, we revert
+ * to a merge of prev and VMA:
+ *
+ * <-->
+ * 0123456789
+ * PPPVVVVNNN
+ * ->
+ * 0123456789
+ * PPPPPPPNNN
+ */
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, flags);
+
+ /*
+ * Give both prev and next single anon_vma_chain fields, so they will
+ * merge with the NULL vmg->anon_vma.
+ *
+ * However, when prev is compared to next, the merge should fail.
+ */
+
+ INIT_LIST_HEAD(&vma_prev->anon_vma_chain);
+ list_add(&dummy_anon_vma_chain1.same_vma, &vma_prev->anon_vma_chain);
+ ASSERT_TRUE(list_is_singular(&vma_prev->anon_vma_chain));
+ vma_prev->anon_vma = &dummy_anon_vma;
+ ASSERT_TRUE(is_mergeable_anon_vma(NULL, vma_prev->anon_vma, vma_prev));
+
+ INIT_LIST_HEAD(&vma_next->anon_vma_chain);
+ list_add(&dummy_anon_vma_chain2.same_vma, &vma_next->anon_vma_chain);
+ ASSERT_TRUE(list_is_singular(&vma_next->anon_vma_chain));
+ vma_next->anon_vma = (struct anon_vma *)2;
+ ASSERT_TRUE(is_mergeable_anon_vma(NULL, vma_next->anon_vma, vma_next));
+
+ ASSERT_FALSE(is_mergeable_anon_vma(vma_prev->anon_vma, vma_next->anon_vma, NULL));
+
+ vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
+ vmg.prev = vma_prev;
+ vmg.vma = vma;
+
+ ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vma_prev->vm_start, 0);
+ ASSERT_EQ(vma_prev->vm_end, 0x7000);
+ ASSERT_EQ(vma_prev->vm_pgoff, 0);
+ ASSERT_TRUE(vma_write_started(vma_prev));
+ ASSERT_FALSE(vma_write_started(vma_next));
+
+ /* Clear down and reset. */
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
+
+ /*
+ * Now consider the new VMA case. This is equivalent, only adding a new
+ * VMA in a gap between prev and next.
+ *
+ * <-->
+ * 0123456789
+ * PPP****NNN
+ * ->
+ * 0123456789
+ * PPPPPPPNNN
+ */
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, flags);
+
+ INIT_LIST_HEAD(&vma_prev->anon_vma_chain);
+ list_add(&dummy_anon_vma_chain1.same_vma, &vma_prev->anon_vma_chain);
+ vma_prev->anon_vma = (struct anon_vma *)1;
+
+ INIT_LIST_HEAD(&vma_next->anon_vma_chain);
+ list_add(&dummy_anon_vma_chain2.same_vma, &vma_next->anon_vma_chain);
+ vma_next->anon_vma = (struct anon_vma *)2;
+
+ vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
+ vmg.prev = vma_prev;
+
+ ASSERT_EQ(merge_new(&vmg), vma_prev);
+ ASSERT_EQ(vma_prev->vm_start, 0);
+ ASSERT_EQ(vma_prev->vm_end, 0x7000);
+ ASSERT_EQ(vma_prev->vm_pgoff, 0);
+ ASSERT_TRUE(vma_write_started(vma_prev));
+ ASSERT_FALSE(vma_write_started(vma_next));
+
+ /* Final cleanup. */
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
+
+ return true;
+}
+
+static bool test_dup_anon_vma(void)
+{
+ unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+ struct mm_struct mm = {};
+ VMA_ITERATOR(vmi, &mm, 0);
+ struct vma_merge_struct vmg = {
+ .mm = &mm,
+ .vmi = &vmi,
+ };
+ struct anon_vma_chain dummy_anon_vma_chain = {
+ .anon_vma = &dummy_anon_vma,
+ };
+ struct vm_area_struct *vma_prev, *vma_next, *vma;
+
+ reset_dummy_anon_vma();
+
+ /*
+ * Expanding a VMA to delete the next one duplicates next's anon_vma and
+ * assigns it to the expanded VMA.
+ *
+ * This covers new VMA merging, as these operations amount to a VMA
+ * expand.
+ */
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
+ vma_next->anon_vma = &dummy_anon_vma;
+
+ vmg_set_range(&vmg, 0, 0x5000, 0, flags);
+ vmg.vma = vma_prev;
+ vmg.next = vma_next;
+
+ ASSERT_EQ(expand_existing(&vmg), 0);
+
+ /* Will have been cloned. */
+ ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_prev->anon_vma->was_cloned);
+
+ /* Cleanup ready for next run. */
+ cleanup_mm(&mm, &vmi);
+
+ /*
+ * next has anon_vma, we assign to prev.
+ *
+ * |<----->|
+ * |-------*********-------|
+ * prev vma next
+ * extend delete delete
+ */
+
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x5000, 0x8000, 5, flags);
+
+ /* Initialise avc so mergeability check passes. */
+ INIT_LIST_HEAD(&vma_next->anon_vma_chain);
+ list_add(&dummy_anon_vma_chain.same_vma, &vma_next->anon_vma_chain);
+
+ vma_next->anon_vma = &dummy_anon_vma;
+ vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
+ vmg.prev = vma_prev;
+ vmg.vma = vma;
+
+ ASSERT_EQ(merge_existing(&vmg), vma_prev);
+
+ ASSERT_EQ(vma_prev->vm_start, 0);
+ ASSERT_EQ(vma_prev->vm_end, 0x8000);
+
+ ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_prev->anon_vma->was_cloned);
+
+ cleanup_mm(&mm, &vmi);
+
+ /*
+ * vma has anon_vma, we assign to prev.
+ *
+ * |<----->|
+ * |-------*********-------|
+ * prev vma next
+ * extend delete delete
+ */
+
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x5000, 0x8000, 5, flags);
+
+ vma->anon_vma = &dummy_anon_vma;
+ vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
+ vmg.prev = vma_prev;
+ vmg.vma = vma;
+
+ ASSERT_EQ(merge_existing(&vmg), vma_prev);
+
+ ASSERT_EQ(vma_prev->vm_start, 0);
+ ASSERT_EQ(vma_prev->vm_end, 0x8000);
+
+ ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_prev->anon_vma->was_cloned);
+
+ cleanup_mm(&mm, &vmi);
+
+ /*
+ * vma has anon_vma, we assign to prev.
+ *
+ * |<----->|
+ * |-------*************
+ * prev vma
+ * extend shrink/delete
+ */
+
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x8000, 3, flags);
+
+ vma->anon_vma = &dummy_anon_vma;
+ vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
+ vmg.prev = vma_prev;
+ vmg.vma = vma;
+
+ ASSERT_EQ(merge_existing(&vmg), vma_prev);
+
+ ASSERT_EQ(vma_prev->vm_start, 0);
+ ASSERT_EQ(vma_prev->vm_end, 0x5000);
+
+ ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_prev->anon_vma->was_cloned);
+
+ cleanup_mm(&mm, &vmi);
+
+ /*
+ * vma has anon_vma, we assign to next.
+ *
+ * |<----->|
+ * *************-------|
+ * vma next
+ * shrink/delete extend
+ */
+
+ vma = alloc_and_link_vma(&mm, 0, 0x5000, 0, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x5000, 0x8000, 5, flags);
+
+ vma->anon_vma = &dummy_anon_vma;
+ vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
+ vmg.prev = vma;
+ vmg.vma = vma;
+
+ ASSERT_EQ(merge_existing(&vmg), vma_next);
+
+ ASSERT_EQ(vma_next->vm_start, 0x3000);
+ ASSERT_EQ(vma_next->vm_end, 0x8000);
+
+ ASSERT_EQ(vma_next->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_next->anon_vma->was_cloned);
+
+ cleanup_mm(&mm, &vmi);
+ return true;
+}
+
+static bool test_vmi_prealloc_fail(void)
+{
+ unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+ struct mm_struct mm = {};
+ VMA_ITERATOR(vmi, &mm, 0);
+ struct vma_merge_struct vmg = {
+ .mm = &mm,
+ .vmi = &vmi,
+ };
+ struct vm_area_struct *vma_prev, *vma;
+
+ /*
+ * We are merging vma into prev, with vma possessing an anon_vma, which
+ * will be duplicated. We cause the vmi preallocation to fail and assert
+ * the duplicated anon_vma is unlinked.
+ */
+
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
+ vma->anon_vma = &dummy_anon_vma;
+
+ vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
+ vmg.prev = vma_prev;
+ vmg.vma = vma;
+
+ fail_prealloc = true;
+
+ /* This will cause the merge to fail. */
+ ASSERT_EQ(merge_existing(&vmg), NULL);
+ /* We will already have assigned the anon_vma. */
+ ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
+ /* And it was both cloned and unlinked. */
+ ASSERT_TRUE(dummy_anon_vma.was_cloned);
+ ASSERT_TRUE(dummy_anon_vma.was_unlinked);
+
+ cleanup_mm(&mm, &vmi); /* Resets fail_prealloc too. */
+
+ /*
+ * We repeat the same operation for expanding a VMA, which is what new
+ * VMA merging ultimately uses too. This asserts that unlinking is
+ * performed in this case too.
+ */
+
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
+ vma->anon_vma = &dummy_anon_vma;
+
+ vmg_set_range(&vmg, 0, 0x5000, 3, flags);
+ vmg.vma = vma_prev;
+ vmg.next = vma;
+
+ fail_prealloc = true;
+ ASSERT_EQ(expand_existing(&vmg), -ENOMEM);
+
+ ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(dummy_anon_vma.was_cloned);
+ ASSERT_TRUE(dummy_anon_vma.was_unlinked);
+
+ cleanup_mm(&mm, &vmi);
+ return true;
+}
+
int main(void)
{
int num_tests = 0, num_fail = 0;
@@ -193,11 +1391,21 @@ int main(void)
} \
} while (0)
+ /* Very simple tests to kick the tyres. */
TEST(simple_merge);
TEST(simple_modify);
TEST(simple_expand);
TEST(simple_shrink);
+ TEST(merge_new);
+ TEST(vma_merge_special_flags);
+ TEST(vma_merge_with_close);
+ TEST(vma_merge_new_with_close);
+ TEST(merge_existing);
+ TEST(anon_vma_non_mergeable);
+ TEST(dup_anon_vma);
+ TEST(vmi_prealloc_fail);
+
#undef TEST
printf("%d tests run, %d passed, %d failed.\n",
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index 093560e5b2ac..a3c262c6eb73 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -81,8 +81,6 @@
#define AS_MM_ALL_LOCKS 2
-#define current NULL
-
/* We hardcode this for now. */
#define sysctl_max_map_count 0x1000000UL
@@ -92,6 +90,12 @@ typedef struct pgprot { pgprotval_t pgprot; } pgprot_t;
typedef unsigned long vm_flags_t;
typedef __bitwise unsigned int vm_fault_t;
+/*
+ * The shared stubs do not implement this; it amounts to an fprintf(STDERR,...)
+ * either way :)
+ */
+#define pr_warn_once pr_err
+
typedef struct refcount_struct {
atomic_t refs;
} refcount_t;
@@ -100,9 +104,30 @@ struct kref {
refcount_t refcount;
};
+/*
+ * Define the task command name length as enum, then it can be visible to
+ * BPF programs.
+ */
+enum {
+ TASK_COMM_LEN = 16,
+};
+
+struct task_struct {
+ char comm[TASK_COMM_LEN];
+ pid_t pid;
+ struct mm_struct *mm;
+};
+
+struct task_struct *get_current(void);
+#define current get_current()
+
struct anon_vma {
struct anon_vma *root;
struct rb_root_cached rb_root;
+
+ /* Test fields. */
+ bool was_cloned;
+ bool was_unlinked;
};
struct anon_vma_chain {
@@ -682,13 +707,21 @@ static inline int vma_dup_policy(struct vm_area_struct *, struct vm_area_struct
return 0;
}
-static inline int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *)
+static inline int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
{
+ /* For testing purposes. We indicate that an anon_vma has been cloned. */
+ if (src->anon_vma != NULL) {
+ dst->anon_vma = src->anon_vma;
+ dst->anon_vma->was_cloned = true;
+ }
+
return 0;
}
-static inline void vma_start_write(struct vm_area_struct *)
+static inline void vma_start_write(struct vm_area_struct *vma)
{
+ /* Used to indicate to tests that a write operation has begun. */
+ vma->vm_lock_seq++;
}
static inline void vma_adjust_trans_huge(struct vm_area_struct *vma,
@@ -759,8 +792,10 @@ static inline void vma_assert_write_locked(struct vm_area_struct *)
{
}
-static inline void unlink_anon_vmas(struct vm_area_struct *)
+static inline void unlink_anon_vmas(struct vm_area_struct *vma)
{
+ /* For testing purposes, indicate that the anon_vma was unlinked. */
+ vma->anon_vma->was_unlinked = true;
}
static inline void anon_vma_unlock_write(struct anon_vma *)
--
2.46.0
* [PATCH v2 03/10] mm: introduce vma_merge_struct and abstract vma_merge(),vma_modify()
From: Lorenzo Stoakes @ 2024-08-23 20:06 UTC
To: linux-mm, linux-kernel, Andrew Morton; +Cc: Liam R . Howlett, Vlastimil Babka
Rather than passing around huge numbers of parameters to numerous helper
functions, abstract them into a single struct that we thread through the
operation, the vma_merge_struct ('vmg').
Adjust vma_merge() and vma_modify() to accept this parameter, as well as
predicate functions can_vma_merge_before(), can_vma_merge_after(), and the
vma_modify_...() helper functions.
Also introduce VMG_STATE() and VMG_VMA_STATE() helper macros to allow for
easy vmg declaration.
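For illustration, VMG_STATE() presumably expands to a designated
initialiser along these lines (a sketch with the field list abridged; see
the mm/vma.h changes for the real definition):

	#define VMG_STATE(name, mm_, vmi_, start_, end_, flags_, pgoff_)	\
		struct vma_merge_struct name = {				\
			.mm = (mm_),						\
			.vmi = (vmi_),						\
			.start = (start_),					\
			.end = (end_),						\
			.flags = (flags_),					\
			.pgoff = (pgoff_),					\
		}

This matches its use in the diff below, e.g. VMG_STATE(vmg, mm, &vmi,
addr, end, vm_flags, pgoff).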
We additionally remove the requirement that vma_merge() is passed a VMA
object representing the candidate new VMA. Previously it used this to
obtain the mm_struct, file and anon_vma properties of the proposed range (a
rather confusing state of affairs), which are now provided by the vmg
directly.
We also remove the pgoff calculation previously performed in
vma_modify(), and instead calculate this in VMG_VMA_STATE() via the
vma_pgoff_offset() helper.
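That helper presumably just computes the page offset of an address within
a VMA (a sketch, assuming it mirrors the usual calculation):

	/* Sketch: the pgoff corresponding to 'addr' within 'vma'. */
	static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
					       unsigned long addr)
	{
		return vma->vm_pgoff + PHYS_PFN(addr - vma->vm_start);
	}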
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/mmap.c | 77 ++++++++-------
mm/vma.c | 209 ++++++++++++++++++++++++----------------
mm/vma.h | 128 ++++++++++++++----------
tools/testing/vma/vma.c | 43 +--------
4 files changed, 249 insertions(+), 208 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index e495b0381265..4066c0444495 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1373,9 +1373,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
unsigned long end = addr + len;
unsigned long merge_start = addr, merge_end = end;
bool writable_file_mapping = false;
- pgoff_t vm_pgoff;
int error = -ENOMEM;
VMA_ITERATOR(vmi, mm, addr);
+ VMG_STATE(vmg, mm, &vmi, addr, end, vm_flags, pgoff);
+
+ vmg.file = file;
/* Find the first overlapping VMA */
vma = vma_find(&vmi, end);
@@ -1388,12 +1390,12 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
if (vms_gather_munmap_vmas(&vms, &mas_detach))
return -ENOMEM;
- next = vms.next;
- prev = vms.prev;
+ next = vmg.next = vms.next;
+ prev = vmg.prev = vms.prev;
vma = NULL;
} else {
- next = vma_next(&vmi);
- prev = vma_prev(&vmi);
+ next = vmg.next = vma_next(&vmi);
+ prev = vmg.prev = vma_prev(&vmi);
if (prev)
vma_iter_next_range(&vmi);
}
@@ -1413,6 +1415,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vms.nr_accounted = 0;
vm_flags |= VM_ACCOUNT;
+ vmg.flags = vm_flags;
}
if (vm_flags & VM_SPECIAL)
@@ -1421,28 +1424,31 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
/* Attempt to expand an old mapping */
/* Check next */
if (next && next->vm_start == end && !vma_policy(next) &&
- can_vma_merge_before(next, vm_flags, NULL, file, pgoff+pglen,
- NULL_VM_UFFD_CTX, NULL)) {
+ can_vma_merge_before(&vmg)) {
merge_end = next->vm_end;
vma = next;
- vm_pgoff = next->vm_pgoff - pglen;
+ vmg.pgoff = next->vm_pgoff - pglen;
+ /*
+ * We set this here so that, if we merge with the previous VMA in
+ * the code below, can_vma_merge_after() ensures anon_vma
+ * compatibility between prev and next.
+ */
+ vmg.anon_vma = vma->anon_vma;
+ vmg.uffd_ctx = vma->vm_userfaultfd_ctx;
}
/* Check prev */
if (prev && prev->vm_end == addr && !vma_policy(prev) &&
- (vma ? can_vma_merge_after(prev, vm_flags, vma->anon_vma, file,
- pgoff, vma->vm_userfaultfd_ctx, NULL) :
- can_vma_merge_after(prev, vm_flags, NULL, file, pgoff,
- NULL_VM_UFFD_CTX, NULL))) {
+ can_vma_merge_after(&vmg)) {
merge_start = prev->vm_start;
vma = prev;
- vm_pgoff = prev->vm_pgoff;
+ vmg.pgoff = prev->vm_pgoff;
vma_prev(&vmi); /* Equivalent to going to the previous range */
}
if (vma) {
/* Actually expand, if possible */
- if (!vma_expand(&vmi, vma, merge_start, merge_end, vm_pgoff, next)) {
+ if (!vma_expand(&vmi, vma, merge_start, merge_end, vmg.pgoff, next)) {
khugepaged_enter_vma(vma, vm_flags);
goto expanded;
}
@@ -1772,26 +1778,29 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
* Expand the existing vma if possible; Note that singular lists do not
* occur after forking, so the expand will only happen on new VMAs.
*/
- if (vma && vma->vm_end == addr && !vma_policy(vma) &&
- can_vma_merge_after(vma, flags, NULL, NULL,
- addr >> PAGE_SHIFT, NULL_VM_UFFD_CTX, NULL)) {
- vma_iter_config(vmi, vma->vm_start, addr + len);
- if (vma_iter_prealloc(vmi, vma))
- goto unacct_fail;
-
- vma_start_write(vma);
-
- init_vma_prep(&vp, vma);
- vma_prepare(&vp);
- vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
- vma->vm_end = addr + len;
- vm_flags_set(vma, VM_SOFTDIRTY);
- vma_iter_store(vmi, vma);
-
- vma_complete(&vp, vmi, mm);
- validate_mm(mm);
- khugepaged_enter_vma(vma, flags);
- goto out;
+ if (vma && vma->vm_end == addr && !vma_policy(vma)) {
+ VMG_STATE(vmg, mm, vmi, addr, addr + len, flags, PHYS_PFN(addr));
+
+ vmg.prev = vma;
+ if (can_vma_merge_after(&vmg)) {
+ vma_iter_config(vmi, vma->vm_start, addr + len);
+ if (vma_iter_prealloc(vmi, vma))
+ goto unacct_fail;
+
+ vma_start_write(vma);
+
+ init_vma_prep(&vp, vma);
+ vma_prepare(&vp);
+ vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
+ vma->vm_end = addr + len;
+ vm_flags_set(vma, VM_SOFTDIRTY);
+ vma_iter_store(vmi, vma);
+
+ vma_complete(&vp, vmi, mm);
+ validate_mm(mm);
+ khugepaged_enter_vma(vma, flags);
+ goto out;
+ }
}
if (vma)
diff --git a/mm/vma.c b/mm/vma.c
index e5a5e418c7cb..74c627ff0313 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -7,16 +7,18 @@
#include "vma_internal.h"
#include "vma.h"
-/*
- * If the vma has a ->close operation then the driver probably needs to release
- * per-vma resources, so we don't attempt to merge those if the caller indicates
- * the current vma may be removed as part of the merge.
- */
-static inline bool is_mergeable_vma(struct vm_area_struct *vma,
- struct file *file, unsigned long vm_flags,
- struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- struct anon_vma_name *anon_name, bool may_remove_vma)
+static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_next)
{
+ struct vm_area_struct *vma = merge_next ? vmg->next : vmg->prev;
+ /*
+ * If the vma has a ->close operation then the driver probably needs to
+ * release per-vma resources, so we don't attempt to merge those if the
+ * caller indicates the current vma may be removed as part of the merge,
+ * which is the case if we are attempting to merge the next VMA into
+ * this one.
+ */
+ bool may_remove_vma = merge_next;
+
/*
* VM_SOFTDIRTY should not prevent from VMA merging, if we
* match the flags but dirty bit -- the caller should mark
@@ -25,15 +27,15 @@ static inline bool is_mergeable_vma(struct vm_area_struct *vma,
* the kernel to generate new VMAs when old one could be
* extended instead.
*/
- if ((vma->vm_flags ^ vm_flags) & ~VM_SOFTDIRTY)
+ if ((vma->vm_flags ^ vmg->flags) & ~VM_SOFTDIRTY)
return false;
- if (vma->vm_file != file)
+ if (vma->vm_file != vmg->file)
return false;
if (may_remove_vma && vma->vm_ops && vma->vm_ops->close)
return false;
- if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
+ if (!is_mergeable_vm_userfaultfd_ctx(vma, vmg->uffd_ctx))
return false;
- if (!anon_vma_name_eq(anon_vma_name(vma), anon_name))
+ if (!anon_vma_name_eq(anon_vma_name(vma), vmg->anon_name))
return false;
return true;
}
@@ -94,16 +96,16 @@ static void init_multi_vma_prep(struct vma_prepare *vp,
* We assume the vma may be removed as part of the merge.
*/
bool
-can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
- struct anon_vma *anon_vma, struct file *file,
- pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- struct anon_vma_name *anon_name)
+can_vma_merge_before(struct vma_merge_struct *vmg)
{
- if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name, true) &&
- is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
- if (vma->vm_pgoff == vm_pgoff)
+ pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start);
+
+ if (is_mergeable_vma(vmg, true) &&
+ is_mergeable_anon_vma(vmg->anon_vma, vmg->next->anon_vma, vmg->next)) {
+ if (vmg->next->vm_pgoff == vmg->pgoff + pglen)
return true;
}
+
return false;
}
@@ -116,18 +118,11 @@ can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
*
* We assume that vma is not removed as part of the merge.
*/
-bool
-can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
- struct anon_vma *anon_vma, struct file *file,
- pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- struct anon_vma_name *anon_name)
+bool can_vma_merge_after(struct vma_merge_struct *vmg)
{
- if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name, false) &&
- is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
- pgoff_t vm_pglen;
-
- vm_pglen = vma_pages(vma);
- if (vma->vm_pgoff + vm_pglen == vm_pgoff)
+ if (is_mergeable_vma(vmg, false) &&
+ is_mergeable_anon_vma(vmg->anon_vma, vmg->prev->anon_vma, vmg->prev)) {
+ if (vmg->prev->vm_pgoff + vma_pages(vmg->prev) == vmg->pgoff)
return true;
}
return false;
@@ -1022,16 +1017,10 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
* **** is not represented - it will be merged and the vma containing the
* area is returned, or the function will return NULL
*/
-static struct vm_area_struct
-*vma_merge(struct vma_iterator *vmi, struct vm_area_struct *prev,
- struct vm_area_struct *src, unsigned long addr, unsigned long end,
- unsigned long vm_flags, pgoff_t pgoff, struct mempolicy *policy,
- struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- struct anon_vma_name *anon_name)
+static struct vm_area_struct *vma_merge(struct vma_merge_struct *vmg)
{
- struct mm_struct *mm = src->vm_mm;
- struct anon_vma *anon_vma = src->anon_vma;
- struct file *file = src->vm_file;
+ struct mm_struct *mm = vmg->mm;
+ struct vm_area_struct *prev = vmg->prev;
struct vm_area_struct *curr, *next, *res;
struct vm_area_struct *vma, *adjust, *remove, *remove2;
struct vm_area_struct *anon_dup = NULL;
@@ -1041,16 +1030,18 @@ static struct vm_area_struct
bool merge_prev = false;
bool merge_next = false;
bool vma_expanded = false;
+ unsigned long addr = vmg->start;
+ unsigned long end = vmg->end;
unsigned long vma_start = addr;
unsigned long vma_end = end;
- pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
+ pgoff_t pglen = PHYS_PFN(end - addr);
long adj_start = 0;
/*
* We later require that vma->vm_flags == vm_flags,
* so this tests vma->vm_flags & VM_SPECIAL, too.
*/
- if (vm_flags & VM_SPECIAL)
+ if (vmg->flags & VM_SPECIAL)
return NULL;
/* Does the input range span an existing VMA? (cases 5 - 8) */
@@ -1058,27 +1049,26 @@ static struct vm_area_struct
if (!curr || /* cases 1 - 4 */
end == curr->vm_end) /* cases 6 - 8, adjacent VMA */
- next = vma_lookup(mm, end);
+ next = vmg->next = vma_lookup(mm, end);
else
- next = NULL; /* case 5 */
+ next = vmg->next = NULL; /* case 5 */
if (prev) {
vma_start = prev->vm_start;
vma_pgoff = prev->vm_pgoff;
/* Can we merge the predecessor? */
- if (addr == prev->vm_end && mpol_equal(vma_policy(prev), policy)
- && can_vma_merge_after(prev, vm_flags, anon_vma, file,
- pgoff, vm_userfaultfd_ctx, anon_name)) {
+ if (addr == prev->vm_end && mpol_equal(vma_policy(prev), vmg->policy)
+ && can_vma_merge_after(vmg)) {
+
merge_prev = true;
- vma_prev(vmi);
+ vma_prev(vmg->vmi);
}
}
/* Can we merge the successor? */
- if (next && mpol_equal(policy, vma_policy(next)) &&
- can_vma_merge_before(next, vm_flags, anon_vma, file, pgoff+pglen,
- vm_userfaultfd_ctx, anon_name)) {
+ if (next && mpol_equal(vmg->policy, vma_policy(next)) &&
+ can_vma_merge_before(vmg)) {
merge_next = true;
}
@@ -1129,7 +1119,7 @@ static struct vm_area_struct
remove = curr;
} else { /* case 5 */
adjust = curr;
- adj_start = (end - curr->vm_start);
+ adj_start = end - curr->vm_start;
}
if (!err)
err = dup_anon_vma(prev, curr, &anon_dup);
@@ -1169,13 +1159,13 @@ static struct vm_area_struct
vma_expanded = true;
if (vma_expanded) {
- vma_iter_config(vmi, vma_start, vma_end);
+ vma_iter_config(vmg->vmi, vma_start, vma_end);
} else {
- vma_iter_config(vmi, adjust->vm_start + adj_start,
+ vma_iter_config(vmg->vmi, adjust->vm_start + adj_start,
adjust->vm_end);
}
- if (vma_iter_prealloc(vmi, vma))
+ if (vma_iter_prealloc(vmg->vmi, vma))
goto prealloc_fail;
init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
@@ -1187,20 +1177,20 @@ static struct vm_area_struct
vma_set_range(vma, vma_start, vma_end, vma_pgoff);
if (vma_expanded)
- vma_iter_store(vmi, vma);
+ vma_iter_store(vmg->vmi, vma);
if (adj_start) {
adjust->vm_start += adj_start;
adjust->vm_pgoff += adj_start >> PAGE_SHIFT;
if (adj_start < 0) {
WARN_ON(vma_expanded);
- vma_iter_store(vmi, next);
+ vma_iter_store(vmg->vmi, next);
}
}
- vma_complete(&vp, vmi, mm);
+ vma_complete(&vp, vmg->vmi, mm);
validate_mm(mm);
- khugepaged_enter_vma(res, vm_flags);
+ khugepaged_enter_vma(res, vmg->flags);
return res;
prealloc_fail:
@@ -1208,8 +1198,8 @@ static struct vm_area_struct
unlink_anon_vmas(anon_dup);
anon_vma_fail:
- vma_iter_set(vmi, addr);
- vma_iter_load(vmi);
+ vma_iter_set(vmg->vmi, addr);
+ vma_iter_load(vmg->vmi);
return NULL;
}
@@ -1226,32 +1216,27 @@ static struct vm_area_struct
* The function returns either the merged VMA, the original VMA if a split was
* required instead, or an error if the split failed.
*/
-struct vm_area_struct *vma_modify(struct vma_iterator *vmi,
- struct vm_area_struct *prev,
- struct vm_area_struct *vma,
- unsigned long start, unsigned long end,
- unsigned long vm_flags,
- struct mempolicy *policy,
- struct vm_userfaultfd_ctx uffd_ctx,
- struct anon_vma_name *anon_name)
+static struct vm_area_struct *vma_modify(struct vma_merge_struct *vmg)
{
- pgoff_t pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
+ struct vm_area_struct *vma = vmg->vma;
struct vm_area_struct *merged;
- merged = vma_merge(vmi, prev, vma, start, end, vm_flags,
- pgoff, policy, uffd_ctx, anon_name);
+ /* First, try to merge. */
+ merged = vma_merge(vmg);
if (merged)
return merged;
- if (vma->vm_start < start) {
- int err = split_vma(vmi, vma, start, 1);
+ /* Split any preceding portion of the VMA. */
+ if (vma->vm_start < vmg->start) {
+ int err = split_vma(vmg->vmi, vma, vmg->start, 1);
if (err)
return ERR_PTR(err);
}
- if (vma->vm_end > end) {
- int err = split_vma(vmi, vma, end, 0);
+ /* Split any trailing portion of the VMA. */
+ if (vma->vm_end > vmg->end) {
+ int err = split_vma(vmg->vmi, vma, vmg->end, 0);
if (err)
return ERR_PTR(err);
@@ -1260,6 +1245,65 @@ struct vm_area_struct *vma_modify(struct vma_iterator *vmi,
return vma;
}
+struct vm_area_struct *vma_modify_flags(
+ struct vma_iterator *vmi, struct vm_area_struct *prev,
+ struct vm_area_struct *vma, unsigned long start, unsigned long end,
+ unsigned long new_flags)
+{
+ VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
+
+ vmg.flags = new_flags;
+
+ return vma_modify(&vmg);
+}
+
+struct vm_area_struct
+*vma_modify_flags_name(struct vma_iterator *vmi,
+ struct vm_area_struct *prev,
+ struct vm_area_struct *vma,
+ unsigned long start,
+ unsigned long end,
+ unsigned long new_flags,
+ struct anon_vma_name *new_name)
+{
+ VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
+
+ vmg.flags = new_flags;
+ vmg.anon_name = new_name;
+
+ return vma_modify(&vmg);
+}
+
+struct vm_area_struct
+*vma_modify_policy(struct vma_iterator *vmi,
+ struct vm_area_struct *prev,
+ struct vm_area_struct *vma,
+ unsigned long start, unsigned long end,
+ struct mempolicy *new_pol)
+{
+ VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
+
+ vmg.policy = new_pol;
+
+ return vma_modify(&vmg);
+}
+
+struct vm_area_struct
+*vma_modify_flags_uffd(struct vma_iterator *vmi,
+ struct vm_area_struct *prev,
+ struct vm_area_struct *vma,
+ unsigned long start, unsigned long end,
+ unsigned long new_flags,
+ struct vm_userfaultfd_ctx new_ctx)
+{
+ VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
+
+ vmg.flags = new_flags;
+ vmg.uffd_ctx = new_ctx;
+
+ return vma_modify(&vmg);
+}
+
/*
* Attempt to merge a newly mapped VMA with those adjacent to it. The caller
* must ensure that [start, end) does not overlap any existing VMA.
@@ -1269,8 +1313,11 @@ struct vm_area_struct
struct vm_area_struct *vma, unsigned long start,
unsigned long end, pgoff_t pgoff)
{
- return vma_merge(vmi, prev, vma, start, end, vma->vm_flags, pgoff,
- vma_policy(vma), vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+ VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
+
+ vmg.pgoff = pgoff;
+
+ return vma_merge(&vmg);
}
/*
@@ -1281,12 +1328,10 @@ struct vm_area_struct *vma_merge_extend(struct vma_iterator *vmi,
struct vm_area_struct *vma,
unsigned long delta)
{
- pgoff_t pgoff = vma->vm_pgoff + vma_pages(vma);
+ VMG_VMA_STATE(vmg, vmi, vma, vma, vma->vm_end, vma->vm_end + delta);
/* vma is specified as prev, so case 1 or 2 will apply. */
- return vma_merge(vmi, vma, vma, vma->vm_end, vma->vm_end + delta,
- vma->vm_flags, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+ return vma_merge(&vmg);
}
void unlink_file_vma_batch_init(struct unlink_vma_file_batch *vb)
diff --git a/mm/vma.h b/mm/vma.h
index c774642697a0..9060a1a1edfa 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -52,6 +52,59 @@ struct vma_munmap_struct {
unsigned long data_vm;
};
+/* Represents a VMA merge operation. */
+struct vma_merge_struct {
+ struct mm_struct *mm;
+ struct vma_iterator *vmi;
+ pgoff_t pgoff;
+ struct vm_area_struct *prev;
+ struct vm_area_struct *next; /* Modified by vma_merge(). */
+ struct vm_area_struct *vma; /* Either a new VMA or the one being modified. */
+ unsigned long start;
+ unsigned long end;
+ unsigned long flags;
+ struct file *file;
+ struct anon_vma *anon_vma;
+ struct mempolicy *policy;
+ struct vm_userfaultfd_ctx uffd_ctx;
+ struct anon_vma_name *anon_name;
+};
+
+/* Assumes addr >= vma->vm_start. */
+static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
+ unsigned long addr)
+{
+ return vma->vm_pgoff + PHYS_PFN(addr - vma->vm_start);
+}
+
+#define VMG_STATE(name, mm_, vmi_, start_, end_, flags_, pgoff_) \
+ struct vma_merge_struct name = { \
+ .mm = mm_, \
+ .vmi = vmi_, \
+ .start = start_, \
+ .end = end_, \
+ .flags = flags_, \
+ .pgoff = pgoff_, \
+ }
+
+#define VMG_VMA_STATE(name, vmi_, prev_, vma_, start_, end_) \
+ struct vma_merge_struct name = { \
+ .mm = vma_->vm_mm, \
+ .vmi = vmi_, \
+ .prev = prev_, \
+ .next = NULL, \
+ .vma = vma_, \
+ .start = start_, \
+ .end = end_, \
+ .flags = vma_->vm_flags, \
+ .pgoff = vma_pgoff_offset(vma_, start_), \
+ .file = vma_->vm_file, \
+ .anon_vma = vma_->anon_vma, \
+ .policy = vma_policy(vma_), \
+ .uffd_ctx = vma_->vm_userfaultfd_ctx, \
+ .anon_name = anon_vma_name(vma_), \
+ }
+
#ifdef CONFIG_DEBUG_VM_MAPLE_TREE
void validate_mm(struct mm_struct *mm);
#else
@@ -208,80 +261,53 @@ void remove_vma(struct vm_area_struct *vma, bool unreachable, bool closed);
void unmap_region(struct ma_state *mas, struct vm_area_struct *vma,
struct vm_area_struct *prev, struct vm_area_struct *next);
-/* Required by mmap_region(). */
-bool
-can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
- struct anon_vma *anon_vma, struct file *file,
- pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- struct anon_vma_name *anon_name);
-
-/* Required by mmap_region() and do_brk_flags(). */
-bool
-can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
- struct anon_vma *anon_vma, struct file *file,
- pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- struct anon_vma_name *anon_name);
-
-struct vm_area_struct *vma_modify(struct vma_iterator *vmi,
- struct vm_area_struct *prev,
- struct vm_area_struct *vma,
- unsigned long start, unsigned long end,
- unsigned long vm_flags,
- struct mempolicy *policy,
- struct vm_userfaultfd_ctx uffd_ctx,
- struct anon_vma_name *anon_name);
+/*
+ * Can we merge the VMA described by vmg into the following VMA vmg->next?
+ *
+ * Required by mmap_region().
+ */
+bool can_vma_merge_before(struct vma_merge_struct *vmg);
+
+/*
+ * Can we merge the VMA described by vmg into the preceding VMA vmg->prev?
+ *
+ * Required by mmap_region() and do_brk_flags().
+ */
+bool can_vma_merge_after(struct vma_merge_struct *vmg);
/* We are about to modify the VMA's flags. */
-static inline struct vm_area_struct
-*vma_modify_flags(struct vma_iterator *vmi,
- struct vm_area_struct *prev,
- struct vm_area_struct *vma,
- unsigned long start, unsigned long end,
- unsigned long new_flags)
-{
- return vma_modify(vmi, prev, vma, start, end, new_flags,
- vma_policy(vma), vma->vm_userfaultfd_ctx,
- anon_vma_name(vma));
-}
+struct vm_area_struct *vma_modify_flags(struct vma_iterator *vmi,
+ struct vm_area_struct *prev,
+ struct vm_area_struct *vma,
+ unsigned long start, unsigned long end,
+ unsigned long new_flags);
/* We are about to modify the VMA's flags and/or anon_name. */
-static inline struct vm_area_struct
+struct vm_area_struct
*vma_modify_flags_name(struct vma_iterator *vmi,
struct vm_area_struct *prev,
struct vm_area_struct *vma,
unsigned long start,
unsigned long end,
unsigned long new_flags,
- struct anon_vma_name *new_name)
-{
- return vma_modify(vmi, prev, vma, start, end, new_flags,
- vma_policy(vma), vma->vm_userfaultfd_ctx, new_name);
-}
+ struct anon_vma_name *new_name);
/* We are about to modify the VMA's memory policy. */
-static inline struct vm_area_struct
+struct vm_area_struct
*vma_modify_policy(struct vma_iterator *vmi,
struct vm_area_struct *prev,
struct vm_area_struct *vma,
unsigned long start, unsigned long end,
- struct mempolicy *new_pol)
-{
- return vma_modify(vmi, prev, vma, start, end, vma->vm_flags,
- new_pol, vma->vm_userfaultfd_ctx, anon_vma_name(vma));
-}
+ struct mempolicy *new_pol);
/* We are about to modify the VMA's flags and/or uffd context. */
-static inline struct vm_area_struct
+struct vm_area_struct
*vma_modify_flags_uffd(struct vma_iterator *vmi,
struct vm_area_struct *prev,
struct vm_area_struct *vma,
unsigned long start, unsigned long end,
unsigned long new_flags,
- struct vm_userfaultfd_ctx new_ctx)
-{
- return vma_modify(vmi, prev, vma, start, end, new_flags,
- vma_policy(vma), new_ctx, anon_vma_name(vma));
-}
+ struct vm_userfaultfd_ctx new_ctx);
struct vm_area_struct
*vma_merge_new_vma(struct vma_iterator *vmi, struct vm_area_struct *prev,
diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
index 9b272633ca9e..b12f72979478 100644
--- a/tools/testing/vma/vma.c
+++ b/tools/testing/vma/vma.c
@@ -22,26 +22,6 @@ static bool fail_prealloc;
*/
#include "../../../mm/vma.c"
-/*
- * Temporarily forward-ported from a future in which vmg's are used for merging.
- */
-struct vma_merge_struct {
- struct mm_struct *mm;
- struct vma_iterator *vmi;
- pgoff_t pgoff;
- struct vm_area_struct *prev;
- struct vm_area_struct *next; /* Modified by vma_merge(). */
- struct vm_area_struct *vma; /* Either a new VMA or the one being modified. */
- unsigned long start;
- unsigned long end;
- unsigned long flags;
- struct file *file;
- struct anon_vma *anon_vma;
- struct mempolicy *policy;
- struct vm_userfaultfd_ctx uffd_ctx;
- struct anon_vma_name *anon_name;
-};
-
const struct vm_operations_struct vma_dummy_vm_ops;
static struct anon_vma dummy_anon_vma;
@@ -115,14 +95,6 @@ static struct vm_area_struct *alloc_and_link_vma(struct mm_struct *mm,
/* Helper function which provides a wrapper around a merge new VMA operation. */
static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
{
- /* vma_merge() needs a VMA to determine mm, anon_vma, and file. */
- struct vm_area_struct dummy = {
- .vm_mm = vmg->mm,
- .vm_flags = vmg->flags,
- .anon_vma = vmg->anon_vma,
- .vm_file = vmg->file,
- };
-
/*
* For convenience, get prev and next VMAs. Which the new VMA operation
* requires.
@@ -131,8 +103,7 @@ static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
vmg->prev = vma_prev(vmg->vmi);
vma_iter_set(vmg->vmi, vmg->start);
- return vma_merge_new_vma(vmg->vmi, vmg->prev, &dummy, vmg->start,
- vmg->end, vmg->pgoff);
+ return vma_merge(vmg);
}
/*
@@ -141,17 +112,7 @@ static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
*/
static struct vm_area_struct *merge_existing(struct vma_merge_struct *vmg)
{
- /* vma_merge() needs a VMA to determine mm, anon_vma, and file. */
- struct vm_area_struct dummy = {
- .vm_mm = vmg->mm,
- .vm_flags = vmg->flags,
- .anon_vma = vmg->anon_vma,
- .vm_file = vmg->file,
- };
-
- return vma_merge(vmg->vmi, vmg->prev, &dummy, vmg->start, vmg->end,
- vmg->flags, vmg->pgoff, vmg->policy, vmg->uffd_ctx,
- vmg->anon_name);
+ return vma_merge(vmg);
}
/*
--
2.46.0
* [PATCH v2 04/10] mm: remove duplicated open-coded VMA policy check
2024-08-23 20:06 [PATCH v2 00/10] mm: remove vma_merge() Lorenzo Stoakes
` (2 preceding siblings ...)
2024-08-23 20:06 ` [PATCH v2 03/10] mm: introduce vma_merge_struct and abstract vma_merge(),vma_modify() Lorenzo Stoakes
@ 2024-08-23 20:06 ` Lorenzo Stoakes
2024-08-28 19:42 ` Liam R. Howlett
2024-08-23 20:07 ` [PATCH v2 05/10] mm: abstract vma_expand() to use vma_merge_struct Lorenzo Stoakes
` (5 subsequent siblings)
9 siblings, 1 reply; 26+ messages in thread
From: Lorenzo Stoakes @ 2024-08-23 20:06 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton; +Cc: Liam R . Howlett, Vlastimil Babka
Both can_vma_merge_before() and can_vma_merge_after() are invoked after
checking for compatible VMA NUMA policy, so we can simply move this check
into is_mergeable_vma() and abstract it away altogether.
In mmap_region() we set vmg->policy to NULL, so the policy comparisons
checked in can_vma_merge_before() and can_vma_merge_after() are exactly
equivalent to !vma_policy(vmg.next) and !vma_policy(vmg.prev).
Equally, in do_brk_flags(), vmg->policy is NULL, so the
can_vma_merge_after() check is equivalent to !vma_policy(vma), as we set
vmg.prev to vma.
In vma_merge(), we compare prev and next policies with vmg->policy before
checking can_vma_merge_after() and can_vma_merge_before() respectively; this
patch causes those comparisons to be made in precisely the same way.
This therefore maintains precisely the same logic as before, only now
abstracted into is_mergeable_vma().
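For reference, a condensed sketch of what is_mergeable_vma() looks like after
this change; the remaining compatibility checks are unchanged and elided here:

	static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_next)
	{
		struct vm_area_struct *vma = merge_next ? vmg->next : vmg->prev;

		/* The NUMA policy comparison is now made here, once, for all callers. */
		if (!mpol_equal(vmg->policy, vma_policy(vma)))
			return false;

		/* ... flags, file, ->close(), uffd ctx and anon_name checks as before ... */
		return true;
	}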
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/mmap.c | 8 +++-----
mm/vma.c | 9 ++++-----
2 files changed, 7 insertions(+), 10 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 4066c0444495..c72f50feb98a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1423,8 +1423,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
/* Attempt to expand an old mapping */
/* Check next */
- if (next && next->vm_start == end && !vma_policy(next) &&
- can_vma_merge_before(&vmg)) {
+ if (next && next->vm_start == end && can_vma_merge_before(&vmg)) {
merge_end = next->vm_end;
vma = next;
vmg.pgoff = next->vm_pgoff - pglen;
@@ -1438,8 +1437,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
}
/* Check prev */
- if (prev && prev->vm_end == addr && !vma_policy(prev) &&
- can_vma_merge_after(&vmg)) {
+ if (prev && prev->vm_end == addr && can_vma_merge_after(&vmg)) {
merge_start = prev->vm_start;
vma = prev;
vmg.pgoff = prev->vm_pgoff;
@@ -1778,7 +1776,7 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
* Expand the existing vma if possible; Note that singular lists do not
* occur after forking, so the expand will only happen on new VMAs.
*/
- if (vma && vma->vm_end == addr && !vma_policy(vma)) {
+ if (vma && vma->vm_end == addr) {
VMG_STATE(vmg, mm, vmi, addr, addr + len, flags, PHYS_PFN(addr));
vmg.prev = vma;
diff --git a/mm/vma.c b/mm/vma.c
index 74c627ff0313..b1ec412fac7f 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -19,6 +19,8 @@ static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_nex
*/
bool may_remove_vma = merge_next;
+ if (!mpol_equal(vmg->policy, vma_policy(vma)))
+ return false;
/*
* VM_SOFTDIRTY should not prevent from VMA merging, if we
* match the flags but dirty bit -- the caller should mark
@@ -1058,17 +1060,14 @@ static struct vm_area_struct *vma_merge(struct vma_merge_struct *vmg)
vma_pgoff = prev->vm_pgoff;
/* Can we merge the predecessor? */
- if (addr == prev->vm_end && mpol_equal(vma_policy(prev), vmg->policy)
- && can_vma_merge_after(vmg)) {
-
+ if (addr == prev->vm_end && can_vma_merge_after(vmg)) {
merge_prev = true;
vma_prev(vmg->vmi);
}
}
/* Can we merge the successor? */
- if (next && mpol_equal(vmg->policy, vma_policy(next)) &&
- can_vma_merge_before(vmg)) {
+ if (next && can_vma_merge_before(vmg)) {
merge_next = true;
}
--
2.46.0
* [PATCH v2 05/10] mm: abstract vma_expand() to use vma_merge_struct
2024-08-23 20:06 [PATCH v2 00/10] mm: remove vma_merge() Lorenzo Stoakes
` (3 preceding siblings ...)
2024-08-23 20:06 ` [PATCH v2 04/10] mm: remove duplicated open-coded VMA policy check Lorenzo Stoakes
@ 2024-08-23 20:07 ` Lorenzo Stoakes
2024-08-28 20:03 ` Liam R. Howlett
2024-08-23 20:07 ` [PATCH v2 06/10] mm: avoid using vma_merge() for new VMAs Lorenzo Stoakes
` (4 subsequent siblings)
9 siblings, 1 reply; 26+ messages in thread
From: Lorenzo Stoakes @ 2024-08-23 20:07 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton; +Cc: Liam R . Howlett, Vlastimil Babka
The purpose of the vmg is to thread merge state through functions and avoid
egregious parameter lists. We now extend this approach to vma_expand(), which
is used for a number of merge cases.
Accordingly, adjust its callers, mmap_region() and relocate_vma_down(), to
use a vmg.
An added benefit of this change is that it enables a future commit to perform
all new VMA range merging using vma_expand().
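For example, relocate_vma_down() now packages the expansion parameters into a
vmg rather than passing them individually (condensed from the hunk below):

	VMA_ITERATOR(vmi, mm, new_start);
	VMG_STATE(vmg, mm, &vmi, new_start, old_end, 0, vma->vm_pgoff);

	/* cover the whole range: [new_start, old_end) */
	vmg.vma = vma;
	if (vma_expand(&vmg))
		return -ENOMEM;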
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/mmap.c | 16 +++++++++-------
mm/vma.c | 39 +++++++++++++++++----------------------
mm/vma.h | 5 +----
tools/testing/vma/vma.c | 3 +--
4 files changed, 28 insertions(+), 35 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index c72f50feb98a..0d242c9b1f4c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1371,7 +1371,6 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
struct ma_state mas_detach;
struct maple_tree mt_detach;
unsigned long end = addr + len;
- unsigned long merge_start = addr, merge_end = end;
bool writable_file_mapping = false;
int error = -ENOMEM;
VMA_ITERATOR(vmi, mm, addr);
@@ -1424,8 +1423,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
/* Attempt to expand an old mapping */
/* Check next */
if (next && next->vm_start == end && can_vma_merge_before(&vmg)) {
- merge_end = next->vm_end;
- vma = next;
+ vmg.end = next->vm_end;
+ vma = vmg.vma = next;
vmg.pgoff = next->vm_pgoff - pglen;
/*
* We set this here so if we will merge with the previous VMA in
@@ -1438,15 +1437,15 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
/* Check prev */
if (prev && prev->vm_end == addr && can_vma_merge_after(&vmg)) {
- merge_start = prev->vm_start;
- vma = prev;
+ vmg.start = prev->vm_start;
+ vma = vmg.vma = prev;
vmg.pgoff = prev->vm_pgoff;
vma_prev(&vmi); /* Equivalent to going to the previous range */
}
if (vma) {
/* Actually expand, if possible */
- if (!vma_expand(&vmi, vma, merge_start, merge_end, vmg.pgoff, next)) {
+ if (!vma_expand(&vmg)) {
khugepaged_enter_vma(vma, vm_flags);
goto expanded;
}
@@ -2319,6 +2318,7 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
unsigned long new_start = old_start - shift;
unsigned long new_end = old_end - shift;
VMA_ITERATOR(vmi, mm, new_start);
+ VMG_STATE(vmg, mm, &vmi, new_start, old_end, 0, vma->vm_pgoff);
struct vm_area_struct *next;
struct mmu_gather tlb;
@@ -2335,7 +2335,8 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
/*
* cover the whole range: [new_start, old_end)
*/
- if (vma_expand(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
+ vmg.vma = vma;
+ if (vma_expand(&vmg))
return -ENOMEM;
/*
@@ -2368,6 +2369,7 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
tlb_finish_mmu(&tlb);
vma_prev(&vmi);
+
/* Shrink the vma to just the new range */
return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff);
}
diff --git a/mm/vma.c b/mm/vma.c
index b1ec412fac7f..4867ae722a9a 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -467,30 +467,25 @@ void validate_mm(struct mm_struct *mm)
/*
* vma_expand - Expand an existing VMA
*
- * @vmi: The vma iterator
- * @vma: The vma to expand
- * @start: The start of the vma
- * @end: The exclusive end of the vma
- * @pgoff: The page offset of vma
- * @next: The current of next vma.
+ * @vmg: Describes a VMA expansion operation.
*
- * Expand @vma to @start and @end. Can expand off the start and end. Will
- * expand over @next if it's different from @vma and @end == @next->vm_end.
- * Checking if the @vma can expand and merge with @next needs to be handled by
- * the caller.
+ * Expand @vma to vmg->start and vmg->end. Can expand off the start and end.
+ * Will expand over vmg->next if it's different from vmg->vma and vmg->end ==
+ * vmg->next->vm_end. Checking if the vmg->vma can expand and merge with
+ * vmg->next needs to be handled by the caller.
*
* Returns: 0 on success
*/
-int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
- unsigned long start, unsigned long end, pgoff_t pgoff,
- struct vm_area_struct *next)
+int vma_expand(struct vma_merge_struct *vmg)
{
struct vm_area_struct *anon_dup = NULL;
bool remove_next = false;
+ struct vm_area_struct *vma = vmg->vma;
+ struct vm_area_struct *next = vmg->next;
struct vma_prepare vp;
vma_start_write(vma);
- if (next && (vma != next) && (end == next->vm_end)) {
+ if (next && (vma != next) && (vmg->end == next->vm_end)) {
int ret;
remove_next = true;
@@ -503,21 +498,21 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
init_multi_vma_prep(&vp, vma, NULL, remove_next ? next : NULL, NULL);
/* Not merging but overwriting any part of next is not handled. */
VM_WARN_ON(next && !vp.remove &&
- next != vma && end > next->vm_start);
+ next != vma && vmg->end > next->vm_start);
/* Only handles expanding */
- VM_WARN_ON(vma->vm_start < start || vma->vm_end > end);
+ VM_WARN_ON(vma->vm_start < vmg->start || vma->vm_end > vmg->end);
/* Note: vma iterator must be pointing to 'start' */
- vma_iter_config(vmi, start, end);
- if (vma_iter_prealloc(vmi, vma))
+ vma_iter_config(vmg->vmi, vmg->start, vmg->end);
+ if (vma_iter_prealloc(vmg->vmi, vma))
goto nomem;
vma_prepare(&vp);
- vma_adjust_trans_huge(vma, start, end, 0);
- vma_set_range(vma, start, end, pgoff);
- vma_iter_store(vmi, vma);
+ vma_adjust_trans_huge(vma, vmg->start, vmg->end, 0);
+ vma_set_range(vma, vmg->start, vmg->end, vmg->pgoff);
+ vma_iter_store(vmg->vmi, vma);
- vma_complete(&vp, vmi, vma->vm_mm);
+ vma_complete(&vp, vmg->vmi, vma->vm_mm);
return 0;
nomem:
diff --git a/mm/vma.h b/mm/vma.h
index 9060a1a1edfa..8f01fbc20fe7 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -128,10 +128,7 @@ void init_vma_prep(struct vma_prepare *vp,
void vma_complete(struct vma_prepare *vp,
struct vma_iterator *vmi, struct mm_struct *mm);
-int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
- unsigned long start, unsigned long end, pgoff_t pgoff,
- struct vm_area_struct *next);
-
+int vma_expand(struct vma_merge_struct *vmg);
int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long start, unsigned long end, pgoff_t pgoff);
diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
index b12f72979478..cce1ba78c57f 100644
--- a/tools/testing/vma/vma.c
+++ b/tools/testing/vma/vma.c
@@ -121,8 +121,7 @@ static struct vm_area_struct *merge_existing(struct vma_merge_struct *vmg)
*/
static int expand_existing(struct vma_merge_struct *vmg)
{
- return vma_expand(vmg->vmi, vmg->vma, vmg->start, vmg->end, vmg->pgoff,
- vmg->next);
+ return vma_expand(vmg);
}
/*
--
2.46.0
* [PATCH v2 06/10] mm: avoid using vma_merge() for new VMAs
2024-08-23 20:06 [PATCH v2 00/10] mm: remove vma_merge() Lorenzo Stoakes
` (4 preceding siblings ...)
2024-08-23 20:07 ` [PATCH v2 05/10] mm: abstract vma_expand() to use vma_merge_struct Lorenzo Stoakes
@ 2024-08-23 20:07 ` Lorenzo Stoakes
2024-08-27 11:41 ` Lorenzo Stoakes
` (2 more replies)
2024-08-23 20:07 ` [PATCH v2 07/10] mm: make vma_prepare() and friends static and internal to vma.c Lorenzo Stoakes
` (3 subsequent siblings)
9 siblings, 3 replies; 26+ messages in thread
From: Lorenzo Stoakes @ 2024-08-23 20:07 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton; +Cc: Liam R . Howlett, Vlastimil Babka
Abstract vma_merge_new_vma() to use vma_merge_struct and rename the resulting
function vma_merge_new_range() to make clear what its purpose is - a new VMA
is desired in the specified range, and we wish to see if it is possible to
'merge' surrounding VMAs into this range rather than having to allocate a new
VMA.
Note that this function uses vma_expand() exclusively, so it adopts the
requirement that the iterator point at or before the gap. We add an assert to
this effect, and callers position the iterator accordingly, as sketched below.
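For instance, the userland test helper positions the iterator like so before
invoking the merge (condensed from the tools/testing/vma/vma.c hunk below):

	vmg->next = vma_next(vmg->vmi);
	vmg->prev = vma_prev(vmg->vmi);
	/* Step forward again so the iterator sits at or before the gap. */
	vma_iter_next_range(vmg->vmi);

	return vma_merge_new_range(vmg);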
This is as opposed to vma_merge_existing_range(), which will be introduced in
a subsequent commit and will provide the same functionality for cases in which
we are modifying an existing VMA.
In mmap_region() and do_brk_flags() we open code scenarios where we prefer to
use vma_expand() rather than invoke a full vma_merge() operation. Abstract
this logic, eliminate all of the open-coding, and apply the same logic to all
cases where we add new VMAs, so that these ultimately use vma_expand() rather
than vma_merge().
Doing so removes duplication and simplifies VMA merging in all such cases,
laying the groundwork for us to eliminate the merging of new VMAs in
vma_merge() altogether.
Also add the ability for the vmg to track state and to report errors, allowing
callers to differentiate a failed merge from an inability to allocate memory.
This makes it far easier to understand what is happening in these cases,
avoiding confusion and bugs and allowing for future optimisation.
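do_brk_flags(), for example, can now tell the two failure modes apart
(condensed from the hunk below):

	if (vma_merge_new_range(&vmg))
		goto out;		/* Merge succeeded. */
	else if (vmg_nomem(&vmg))
		goto unacct_fail;	/* Failed with -ENOMEM, not mere incompatibility. */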
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/mmap.c | 93 +++++-------------
mm/vma.c | 162 ++++++++++++++++++++++++++-----
mm/vma.h | 20 +++-
tools/testing/vma/vma.c | 33 ++++++-
tools/testing/vma/vma_internal.h | 6 ++
5 files changed, 216 insertions(+), 98 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 0d242c9b1f4c..80d70ed099cf 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1364,8 +1364,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma = NULL;
- struct vm_area_struct *next, *prev, *merge;
pgoff_t pglen = PHYS_PFN(len);
+ struct vm_area_struct *merge;
unsigned long charged = 0;
struct vma_munmap_struct vms;
struct ma_state mas_detach;
@@ -1389,13 +1389,13 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
if (vms_gather_munmap_vmas(&vms, &mas_detach))
return -ENOMEM;
- next = vmg.next = vms.next;
- prev = vmg.prev = vms.prev;
+ vmg.next = vms.next;
+ vmg.prev = vms.prev;
vma = NULL;
} else {
- next = vmg.next = vma_next(&vmi);
- prev = vmg.prev = vma_prev(&vmi);
- if (prev)
+ vmg.next = vma_next(&vmi);
+ vmg.prev = vma_prev(&vmi);
+ if (vmg.prev)
vma_iter_next_range(&vmi);
}
@@ -1417,45 +1417,9 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vmg.flags = vm_flags;
}
- if (vm_flags & VM_SPECIAL)
- goto cannot_expand;
-
- /* Attempt to expand an old mapping */
- /* Check next */
- if (next && next->vm_start == end && can_vma_merge_before(&vmg)) {
- vmg.end = next->vm_end;
- vma = vmg.vma = next;
- vmg.pgoff = next->vm_pgoff - pglen;
- /*
- * We set this here so if we will merge with the previous VMA in
- * the code below, can_vma_merge_after() ensures anon_vma
- * compatibility between prev and next.
- */
- vmg.anon_vma = vma->anon_vma;
- vmg.uffd_ctx = vma->vm_userfaultfd_ctx;
- }
-
- /* Check prev */
- if (prev && prev->vm_end == addr && can_vma_merge_after(&vmg)) {
- vmg.start = prev->vm_start;
- vma = vmg.vma = prev;
- vmg.pgoff = prev->vm_pgoff;
- vma_prev(&vmi); /* Equivalent to going to the previous range */
- }
-
- if (vma) {
- /* Actually expand, if possible */
- if (!vma_expand(&vmg)) {
- khugepaged_enter_vma(vma, vm_flags);
- goto expanded;
- }
-
- /* If the expand fails, then reposition the vma iterator */
- if (unlikely(vma == prev))
- vma_iter_set(&vmi, addr);
- }
-
-cannot_expand:
+ vma = vma_merge_new_range(&vmg);
+ if (vma)
+ goto expanded;
/*
* Determine the object being mapped and call the appropriate
@@ -1503,10 +1467,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
* If vm_flags changed after call_mmap(), we should try merge
* vma again as we may succeed this time.
*/
- if (unlikely(vm_flags != vma->vm_flags && prev)) {
- merge = vma_merge_new_vma(&vmi, prev, vma,
- vma->vm_start, vma->vm_end,
- vma->vm_pgoff);
+ if (unlikely(vm_flags != vma->vm_flags && vmg.prev)) {
+ vmg.flags = vma->vm_flags;
+ /* If this fails, state is reset ready for a reattempt. */
+ merge = vma_merge_new_range(&vmg);
+
if (merge) {
/*
* ->mmap() can change vma->vm_file and fput
@@ -1521,6 +1486,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
/* Update vm_flags to pick up the change. */
vm_flags = vma->vm_flags;
goto unmap_writable;
+ } else {
+ vma_iter_config(&vmi, addr, end);
}
}
@@ -1554,7 +1521,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vma_link_file(vma);
/*
- * vma_merge() calls khugepaged_enter_vma() either, the below
+ * vma_merge_new_range() calls khugepaged_enter_vma() too, the below
* call covers the non-merge case.
*/
khugepaged_enter_vma(vma, vma->vm_flags);
@@ -1609,7 +1576,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vma_iter_set(&vmi, vma->vm_end);
/* Undo any partial mapping done by a device driver. */
- unmap_region(&vmi.mas, vma, prev, next);
+ unmap_region(&vmi.mas, vma, vmg.prev, vmg.next);
}
if (writable_file_mapping)
mapping_unmap_writable(file->f_mapping);
@@ -1755,7 +1722,6 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long addr, unsigned long len, unsigned long flags)
{
struct mm_struct *mm = current->mm;
- struct vma_prepare vp;
/*
* Check against address space limits by the changed size
@@ -1779,25 +1745,12 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
VMG_STATE(vmg, mm, vmi, addr, addr + len, flags, PHYS_PFN(addr));
vmg.prev = vma;
- if (can_vma_merge_after(&vmg)) {
- vma_iter_config(vmi, vma->vm_start, addr + len);
- if (vma_iter_prealloc(vmi, vma))
- goto unacct_fail;
-
- vma_start_write(vma);
-
- init_vma_prep(&vp, vma);
- vma_prepare(&vp);
- vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
- vma->vm_end = addr + len;
- vm_flags_set(vma, VM_SOFTDIRTY);
- vma_iter_store(vmi, vma);
-
- vma_complete(&vp, vmi, mm);
- validate_mm(mm);
- khugepaged_enter_vma(vma, flags);
+ vma_iter_next_range(vmi);
+
+ if (vma_merge_new_range(&vmg))
goto out;
- }
+ else if (vmg_nomem(&vmg))
+ goto unacct_fail;
}
if (vma)
diff --git a/mm/vma.c b/mm/vma.c
index 4867ae722a9a..8a5fa15f46a2 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -464,6 +464,116 @@ void validate_mm(struct mm_struct *mm)
}
#endif /* CONFIG_DEBUG_VM_MAPLE_TREE */
+/*
+ * vma_merge_new_range - Attempt to merge a new VMA into address space
+ *
+ * @vmg: Describes the VMA we are adding, in the range @vmg->start to @vmg->end
+ * (exclusive), which we try to merge with any adjacent VMAs if possible.
+ *
+ * We are about to add a VMA to the address space starting at @vmg->start and
+ * ending at @vmg->end. There are three different possible scenarios:
+ *
+ * 1. There is a VMA with identical properties immediately adjacent to the
+ * proposed new VMA [@vmg->start, @vmg->end) either before or after it -
+ * EXPAND that VMA:
+ *
+ * Proposed: |-----| or |-----|
+ * Existing: |----| |----|
+ *
+ * 2. There are VMAs with identical properties immediately adjacent to the
+ * proposed new VMA [@vmg->start, @vmg->end) both before AND after it -
+ * EXPAND the former and REMOVE the latter:
+ *
+ * Proposed: |-----|
+ * Existing: |----| |----|
+ *
+ * 3. There are no VMAs immediately adjacent to the proposed new VMA or those
+ * VMAs do not have identical attributes - NO MERGE POSSIBLE.
+ *
+ * In instances where we can merge, this function returns the expanded VMA which
+ * will have its range adjusted accordingly and the underlying maple tree also
+ * adjusted.
+ *
+ * Returns: In instances where no merge was possible, NULL. Otherwise, a pointer
+ * to the VMA we expanded.
+ *
+ * This function adjusts @vmg to provide @vmg->next if not already specified,
+ * and adjusts [@vmg->start, @vmg->end) to span the expanded range.
+ *
+ * ASSUMPTIONS:
+ * - The caller must hold a WRITE lock on the mm_struct->mmap_lock.
+ * - The caller must have determined that [@vmg->start, @vmg->end) is empty,
+ *   other than VMAs that will be unmapped should the operation succeed.
+ * - The caller must have specified the previous vma in @vmg->prev.
+ * - The caller must have specified the next vma in @vmg->next.
+ * - The caller must have positioned the vmi at or before the gap.
+ */
+struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
+{
+ struct vm_area_struct *prev = vmg->prev;
+ struct vm_area_struct *next = vmg->next;
+ unsigned long start = vmg->start;
+ unsigned long end = vmg->end;
+ pgoff_t pgoff = vmg->pgoff;
+ pgoff_t pglen = PHYS_PFN(end - start);
+ bool can_merge_before, can_merge_after;
+
+ mmap_assert_write_locked(vmg->mm);
+ VM_WARN_ON(vmg->vma);
+ /* vmi must point at or before the gap. */
+ VM_WARN_ON(vma_iter_addr(vmg->vmi) > end);
+
+ vmg->state = VMA_MERGE_NOMERGE;
+
+ /* Special VMAs are unmergeable, also if no prev/next. */
+ if ((vmg->flags & VM_SPECIAL) || (!prev && !next))
+ return NULL;
+
+ can_merge_before = next && next->vm_start == end &&
+ can_vma_merge_before(vmg);
+ can_merge_after = prev && prev->vm_end == start &&
+ can_vma_merge_after(vmg);
+
+ /* If we can merge with the next VMA, adjust vmg accordingly. */
+ if (can_merge_before &&
+ (!can_merge_after || is_mergeable_anon_vma(prev->anon_vma,
+ next->anon_vma, NULL))) {
+ vmg->end = next->vm_end;
+ vmg->vma = next;
+ vmg->pgoff = next->vm_pgoff - pglen;
+ }
+
+ /* If we can merge with the previous VMA, adjust vmg accordingly. */
+ if (can_merge_after) {
+ vmg->start = prev->vm_start;
+ vmg->vma = prev;
+ vmg->pgoff = prev->vm_pgoff;
+
+ vma_prev(vmg->vmi); /* Equivalent to going to the previous range */
+ }
+
+ /*
+ * Now try to expand adjacent VMA(s). This takes care of removing the
+ * following VMA if we have VMAs on both sides.
+ */
+ if (vmg->vma && !vma_expand(vmg)) {
+ khugepaged_enter_vma(vmg->vma, vmg->flags);
+
+ vmg->state = VMA_MERGE_SUCCESS;
+ return vmg->vma;
+ }
+
+	/* If expansion failed, reset state. Allows us to retry merge later. */
+	if (vmg->vma == prev)
+		vma_iter_set(vmg->vmi, start);
+	vmg->vma = NULL;
+	vmg->start = start;
+	vmg->end = end;
+	vmg->pgoff = pgoff;
+
+ return NULL;
+}
+
/*
* vma_expand - Expand an existing VMA
*
@@ -474,7 +584,11 @@ void validate_mm(struct mm_struct *mm)
* vmg->next->vm_end. Checking if the vmg->vma can expand and merge with
* vmg->next needs to be handled by the caller.
*
- * Returns: 0 on success
+ * Returns: 0 on success.
+ *
+ * ASSUMPTIONS:
+ * - The caller must hold a WRITE lock on vmg->vma->mm->mmap_lock.
+ * - The caller must have set @vmg->vma and @vmg->next.
*/
int vma_expand(struct vma_merge_struct *vmg)
{
@@ -484,6 +598,8 @@ int vma_expand(struct vma_merge_struct *vmg)
struct vm_area_struct *next = vmg->next;
struct vma_prepare vp;
+ mmap_assert_write_locked(vmg->mm);
+
vma_start_write(vma);
if (next && (vma != next) && (vmg->end == next->vm_end)) {
int ret;
@@ -516,6 +632,7 @@ int vma_expand(struct vma_merge_struct *vmg)
return 0;
nomem:
+ vmg->state = VMA_MERGE_ERROR_NOMEM;
if (anon_dup)
unlink_anon_vmas(anon_dup);
return -ENOMEM;
@@ -1034,6 +1151,8 @@ static struct vm_area_struct *vma_merge(struct vma_merge_struct *vmg)
pgoff_t pglen = PHYS_PFN(end - addr);
long adj_start = 0;
+ vmg->state = VMA_MERGE_NOMERGE;
+
/*
* We later require that vma->vm_flags == vm_flags,
* so this tests vma->vm_flags & VM_SPECIAL, too.
@@ -1185,13 +1304,19 @@ static struct vm_area_struct *vma_merge(struct vma_merge_struct *vmg)
vma_complete(&vp, vmg->vmi, mm);
validate_mm(mm);
khugepaged_enter_vma(res, vmg->flags);
+
+ vmg->state = VMA_MERGE_SUCCESS;
return res;
prealloc_fail:
+ vmg->state = VMA_MERGE_ERROR_NOMEM;
if (anon_dup)
unlink_anon_vmas(anon_dup);
anon_vma_fail:
+ if (err == -ENOMEM)
+ vmg->state = VMA_MERGE_ERROR_NOMEM;
+
vma_iter_set(vmg->vmi, addr);
vma_iter_load(vmg->vmi);
return NULL;
@@ -1298,22 +1423,6 @@ struct vm_area_struct
return vma_modify(&vmg);
}
-/*
- * Attempt to merge a newly mapped VMA with those adjacent to it. The caller
- * must ensure that [start, end) does not overlap any existing VMA.
- */
-struct vm_area_struct
-*vma_merge_new_vma(struct vma_iterator *vmi, struct vm_area_struct *prev,
- struct vm_area_struct *vma, unsigned long start,
- unsigned long end, pgoff_t pgoff)
-{
- VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
-
- vmg.pgoff = pgoff;
-
- return vma_merge(&vmg);
-}
-
/*
* Expand vma by delta bytes, potentially merging with an immediately adjacent
* VMA with identical properties.
@@ -1324,8 +1433,9 @@ struct vm_area_struct *vma_merge_extend(struct vma_iterator *vmi,
{
VMG_VMA_STATE(vmg, vmi, vma, vma, vma->vm_end, vma->vm_end + delta);
- /* vma is specified as prev, so case 1 or 2 will apply. */
- return vma_merge(&vmg);
+ /* We use the VMA to populate VMG fields only. */
+ vmg.vma = NULL;
+ return vma_merge_new_range(&vmg);
}
void unlink_file_vma_batch_init(struct unlink_vma_file_batch *vb)
@@ -1426,9 +1536,10 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
struct vm_area_struct *vma = *vmap;
unsigned long vma_start = vma->vm_start;
struct mm_struct *mm = vma->vm_mm;
- struct vm_area_struct *new_vma, *prev;
+ struct vm_area_struct *new_vma;
bool faulted_in_anon_vma = true;
VMA_ITERATOR(vmi, mm, addr);
+ VMG_VMA_STATE(vmg, &vmi, NULL, vma, addr, addr + len);
/*
* If anonymous vma has not yet been faulted, update new pgoff
@@ -1439,11 +1550,18 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
faulted_in_anon_vma = false;
}
- new_vma = find_vma_prev(mm, addr, &prev);
+ new_vma = find_vma_prev(mm, addr, &vmg.prev);
if (new_vma && new_vma->vm_start < addr + len)
return NULL; /* should never get here */
- new_vma = vma_merge_new_vma(&vmi, prev, vma, addr, addr + len, pgoff);
+ vmg.vma = NULL; /* New VMA range. */
+ vmg.pgoff = pgoff;
+ vmg.next = vma_next(&vmi);
+ vma_prev(&vmi);
+ vma_iter_next_range(&vmi);
+
+ new_vma = vma_merge_new_range(&vmg);
+
if (new_vma) {
/*
* Source vma may have been merged into new_vma
diff --git a/mm/vma.h b/mm/vma.h
index 8f01fbc20fe7..dbcdf1431014 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -52,6 +52,13 @@ struct vma_munmap_struct {
unsigned long data_vm;
};
+enum vma_merge_state {
+ VMA_MERGE_START,
+ VMA_MERGE_ERROR_NOMEM,
+ VMA_MERGE_NOMERGE,
+ VMA_MERGE_SUCCESS,
+};
+
/* Represents a VMA merge operation. */
struct vma_merge_struct {
struct mm_struct *mm;
@@ -68,8 +75,14 @@ struct vma_merge_struct {
struct mempolicy *policy;
struct vm_userfaultfd_ctx uffd_ctx;
struct anon_vma_name *anon_name;
+ enum vma_merge_state state;
};
+static inline bool vmg_nomem(struct vma_merge_struct *vmg)
+{
+ return vmg->state == VMA_MERGE_ERROR_NOMEM;
+}
+
/* Assumes addr >= vma->vm_start. */
static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
unsigned long addr)
@@ -85,6 +98,7 @@ static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
.end = end_, \
.flags = flags_, \
.pgoff = pgoff_, \
+ .state = VMA_MERGE_START, \
}
#define VMG_VMA_STATE(name, vmi_, prev_, vma_, start_, end_) \
@@ -103,6 +117,7 @@ static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
.policy = vma_policy(vma_), \
.uffd_ctx = vma_->vm_userfaultfd_ctx, \
.anon_name = anon_vma_name(vma_), \
+ .state = VMA_MERGE_START, \
}
#ifdef CONFIG_DEBUG_VM_MAPLE_TREE
@@ -306,10 +321,7 @@ struct vm_area_struct
unsigned long new_flags,
struct vm_userfaultfd_ctx new_ctx);
-struct vm_area_struct
-*vma_merge_new_vma(struct vma_iterator *vmi, struct vm_area_struct *prev,
- struct vm_area_struct *vma, unsigned long start,
- unsigned long end, pgoff_t pgoff);
+struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg);
struct vm_area_struct *vma_merge_extend(struct vma_iterator *vmi,
struct vm_area_struct *vma,
diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
index cce1ba78c57f..3a3a850d951c 100644
--- a/tools/testing/vma/vma.c
+++ b/tools/testing/vma/vma.c
@@ -101,9 +101,9 @@ static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
*/
vmg->next = vma_next(vmg->vmi);
vmg->prev = vma_prev(vmg->vmi);
+ vma_iter_next_range(vmg->vmi);
- vma_iter_set(vmg->vmi, vmg->start);
- return vma_merge(vmg);
+ return vma_merge_new_range(vmg);
}
/*
@@ -162,10 +162,14 @@ static struct vm_area_struct *try_merge_new_vma(struct mm_struct *mm,
merged = merge_new(vmg);
if (merged) {
*was_merged = true;
+ ASSERT_EQ(vmg->state, VMA_MERGE_SUCCESS);
return merged;
}
*was_merged = false;
+
+ ASSERT_EQ(vmg->state, VMA_MERGE_NOMERGE);
+
return alloc_and_link_vma(mm, start, end, pgoff, flags);
}
@@ -595,6 +599,7 @@ static bool test_vma_merge_special_flags(void)
vmg.flags = flags | special_flag;
vma = merge_new(&vmg);
ASSERT_EQ(vma, NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
}
/* 2. Modify VMA with special flag that would otherwise merge. */
@@ -616,6 +621,7 @@ static bool test_vma_merge_special_flags(void)
vmg.flags = flags | special_flag;
vma = merge_existing(&vmg);
ASSERT_EQ(vma, NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
}
cleanup_mm(&mm, &vmi);
@@ -708,6 +714,7 @@ static bool test_vma_merge_with_close(void)
/* The next VMA having a close() operator should cause the merge to fail.*/
ASSERT_EQ(merge_new(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
/* Now create the VMA so we can merge via modified flags */
vmg_set_range(&vmg, 0x1000, 0x2000, 1, flags);
@@ -719,6 +726,7 @@ static bool test_vma_merge_with_close(void)
* also fail.
*/
ASSERT_EQ(merge_existing(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
/* SCENARIO B
*
@@ -744,6 +752,7 @@ static bool test_vma_merge_with_close(void)
vmg.vma = vma;
/* Make sure merge does not occur. */
ASSERT_EQ(merge_existing(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
cleanup_mm(&mm, &vmi);
return true;
@@ -792,6 +801,7 @@ static bool test_vma_merge_new_with_close(void)
vmg_set_range(&vmg, 0x2000, 0x5000, 2, flags);
vma = merge_new(&vmg);
ASSERT_NE(vma, NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma->vm_start, 0);
ASSERT_EQ(vma->vm_end, 0x5000);
ASSERT_EQ(vma->vm_pgoff, 0);
@@ -831,6 +841,7 @@ static bool test_merge_existing(void)
vmg.prev = vma;
vma->anon_vma = &dummy_anon_vma;
ASSERT_EQ(merge_existing(&vmg), vma_next);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_next->vm_start, 0x3000);
ASSERT_EQ(vma_next->vm_end, 0x9000);
ASSERT_EQ(vma_next->vm_pgoff, 3);
@@ -861,6 +872,7 @@ static bool test_merge_existing(void)
vmg.vma = vma;
vma->anon_vma = &dummy_anon_vma;
ASSERT_EQ(merge_existing(&vmg), vma_next);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_next->vm_start, 0x2000);
ASSERT_EQ(vma_next->vm_end, 0x9000);
ASSERT_EQ(vma_next->vm_pgoff, 2);
@@ -889,6 +901,7 @@ static bool test_merge_existing(void)
vma->anon_vma = &dummy_anon_vma;
ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_prev->vm_start, 0);
ASSERT_EQ(vma_prev->vm_end, 0x6000);
ASSERT_EQ(vma_prev->vm_pgoff, 0);
@@ -920,6 +933,7 @@ static bool test_merge_existing(void)
vmg.vma = vma;
vma->anon_vma = &dummy_anon_vma;
ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_prev->vm_start, 0);
ASSERT_EQ(vma_prev->vm_end, 0x7000);
ASSERT_EQ(vma_prev->vm_pgoff, 0);
@@ -948,6 +962,7 @@ static bool test_merge_existing(void)
vmg.vma = vma;
vma->anon_vma = &dummy_anon_vma;
ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_prev->vm_start, 0);
ASSERT_EQ(vma_prev->vm_end, 0x9000);
ASSERT_EQ(vma_prev->vm_pgoff, 0);
@@ -981,31 +996,37 @@ static bool test_merge_existing(void)
vmg.prev = vma;
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
vmg_set_range(&vmg, 0x5000, 0x6000, 5, flags);
vmg.prev = vma;
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
vmg_set_range(&vmg, 0x6000, 0x7000, 6, flags);
vmg.prev = vma;
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
vmg_set_range(&vmg, 0x4000, 0x7000, 4, flags);
vmg.prev = vma;
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
vmg_set_range(&vmg, 0x4000, 0x6000, 4, flags);
vmg.prev = vma;
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
vmg_set_range(&vmg, 0x5000, 0x6000, 5, flags);
vmg.prev = vma;
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
ASSERT_EQ(cleanup_mm(&mm, &vmi), 3);
@@ -1071,6 +1092,7 @@ static bool test_anon_vma_non_mergeable(void)
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_prev->vm_start, 0);
ASSERT_EQ(vma_prev->vm_end, 0x7000);
ASSERT_EQ(vma_prev->vm_pgoff, 0);
@@ -1106,6 +1128,7 @@ static bool test_anon_vma_non_mergeable(void)
vmg.prev = vma_prev;
ASSERT_EQ(merge_new(&vmg), vma_prev);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_prev->vm_start, 0);
ASSERT_EQ(vma_prev->vm_end, 0x7000);
ASSERT_EQ(vma_prev->vm_pgoff, 0);
@@ -1181,6 +1204,7 @@ static bool test_dup_anon_vma(void)
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_prev->vm_start, 0);
ASSERT_EQ(vma_prev->vm_end, 0x8000);
@@ -1209,6 +1233,7 @@ static bool test_dup_anon_vma(void)
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_prev->vm_start, 0);
ASSERT_EQ(vma_prev->vm_end, 0x8000);
@@ -1236,6 +1261,7 @@ static bool test_dup_anon_vma(void)
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_prev->vm_start, 0);
ASSERT_EQ(vma_prev->vm_end, 0x5000);
@@ -1263,6 +1289,7 @@ static bool test_dup_anon_vma(void)
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), vma_next);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_next->vm_start, 0x3000);
ASSERT_EQ(vma_next->vm_end, 0x8000);
@@ -1303,6 +1330,7 @@ static bool test_vmi_prealloc_fail(void)
/* This will cause the merge to fail. */
ASSERT_EQ(merge_existing(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_ERROR_NOMEM);
/* We will already have assigned the anon_vma. */
ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
/* And it was both cloned and unlinked. */
@@ -1327,6 +1355,7 @@ static bool test_vmi_prealloc_fail(void)
fail_prealloc = true;
ASSERT_EQ(expand_existing(&vmg), -ENOMEM);
+ ASSERT_EQ(vmg.state, VMA_MERGE_ERROR_NOMEM);
ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
ASSERT_TRUE(dummy_anon_vma.was_cloned);
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index a3c262c6eb73..c5b9da034511 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -740,6 +740,12 @@ static inline void vma_iter_free(struct vma_iterator *vmi)
mas_destroy(&vmi->mas);
}
+static inline
+struct vm_area_struct *vma_iter_next_range(struct vma_iterator *vmi)
+{
+ return mas_next_range(&vmi->mas, ULONG_MAX);
+}
+
static inline void vm_acct_memory(long pages)
{
}
--
2.46.0
* [PATCH v2 07/10] mm: make vma_prepare() and friends static and internal to vma.c
2024-08-23 20:06 [PATCH v2 00/10] mm: remove vma_merge() Lorenzo Stoakes
` (5 preceding siblings ...)
2024-08-23 20:07 ` [PATCH v2 06/10] mm: avoid using vma_merge() for new VMAs Lorenzo Stoakes
@ 2024-08-23 20:07 ` Lorenzo Stoakes
2024-08-23 20:07 ` [PATCH v2 08/10] mm: introduce commit_merge(), abstracting final commit of merge Lorenzo Stoakes
` (2 subsequent siblings)
9 siblings, 0 replies; 26+ messages in thread
From: Lorenzo Stoakes @ 2024-08-23 20:07 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton; +Cc: Liam R . Howlett, Vlastimil Babka
Now that we have abstracted merge behaviour for new VMA ranges, we are able
to render vma_prepare(), init_vma_prep(), vma_complete(),
can_vma_merge_before() and can_vma_merge_after() static and internal to
vma.c.
These are internal implementation details of kernel VMA manipulation and
merging mechanisms and thus should not be exposed. Since the userland VMA
tests compile mm/vma.c directly, making these functions static to vma.c still
leaves them userland testable.
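For reference, this works because the test harness includes the implementation
file itself, so statics in vma.c are compiled into the test:

	/* tools/testing/vma/vma.c */
	#include "../../../mm/vma.c"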
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/vma.c | 318 +++++++++++++++++++++++++++----------------------------
mm/vma.h | 25 -----
2 files changed, 158 insertions(+), 185 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index 8a5fa15f46a2..07df0d5fb3d2 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -97,8 +97,7 @@ static void init_multi_vma_prep(struct vma_prepare *vp,
*
* We assume the vma may be removed as part of the merge.
*/
-bool
-can_vma_merge_before(struct vma_merge_struct *vmg)
+static bool can_vma_merge_before(struct vma_merge_struct *vmg)
{
pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start);
@@ -120,7 +119,7 @@ can_vma_merge_before(struct vma_merge_struct *vmg)
*
* We assume that vma is not removed as part of the merge.
*/
-bool can_vma_merge_after(struct vma_merge_struct *vmg)
+static bool can_vma_merge_after(struct vma_merge_struct *vmg)
{
if (is_mergeable_vma(vmg, false) &&
is_mergeable_anon_vma(vmg->anon_vma, vmg->prev->anon_vma, vmg->prev)) {
@@ -130,6 +129,162 @@ bool can_vma_merge_after(struct vma_merge_struct *vmg)
return false;
}
+static void __vma_link_file(struct vm_area_struct *vma,
+ struct address_space *mapping)
+{
+ if (vma_is_shared_maywrite(vma))
+ mapping_allow_writable(mapping);
+
+ flush_dcache_mmap_lock(mapping);
+ vma_interval_tree_insert(vma, &mapping->i_mmap);
+ flush_dcache_mmap_unlock(mapping);
+}
+
+/*
+ * Requires inode->i_mapping->i_mmap_rwsem
+ */
+static void __remove_shared_vm_struct(struct vm_area_struct *vma,
+ struct address_space *mapping)
+{
+ if (vma_is_shared_maywrite(vma))
+ mapping_unmap_writable(mapping);
+
+ flush_dcache_mmap_lock(mapping);
+ vma_interval_tree_remove(vma, &mapping->i_mmap);
+ flush_dcache_mmap_unlock(mapping);
+}
+
+/*
+ * vma_prepare() - Helper function for handling locking VMAs prior to altering
+ * @vp: The initialized vma_prepare struct
+ */
+static void vma_prepare(struct vma_prepare *vp)
+{
+ if (vp->file) {
+ uprobe_munmap(vp->vma, vp->vma->vm_start, vp->vma->vm_end);
+
+ if (vp->adj_next)
+ uprobe_munmap(vp->adj_next, vp->adj_next->vm_start,
+ vp->adj_next->vm_end);
+
+ i_mmap_lock_write(vp->mapping);
+ if (vp->insert && vp->insert->vm_file) {
+ /*
+ * Put into interval tree now, so instantiated pages
+ * are visible to arm/parisc __flush_dcache_page
+ * throughout; but we cannot insert into address
+ * space until vma start or end is updated.
+ */
+ __vma_link_file(vp->insert,
+ vp->insert->vm_file->f_mapping);
+ }
+ }
+
+ if (vp->anon_vma) {
+ anon_vma_lock_write(vp->anon_vma);
+ anon_vma_interval_tree_pre_update_vma(vp->vma);
+ if (vp->adj_next)
+ anon_vma_interval_tree_pre_update_vma(vp->adj_next);
+ }
+
+ if (vp->file) {
+ flush_dcache_mmap_lock(vp->mapping);
+ vma_interval_tree_remove(vp->vma, &vp->mapping->i_mmap);
+ if (vp->adj_next)
+ vma_interval_tree_remove(vp->adj_next,
+ &vp->mapping->i_mmap);
+ }
+
+}
+
+/*
+ * vma_complete- Helper function for handling the unlocking after altering VMAs,
+ * or for inserting a VMA.
+ *
+ * @vp: The vma_prepare struct
+ * @vmi: The vma iterator
+ * @mm: The mm_struct
+ */
+static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
+ struct mm_struct *mm)
+{
+ if (vp->file) {
+ if (vp->adj_next)
+ vma_interval_tree_insert(vp->adj_next,
+ &vp->mapping->i_mmap);
+ vma_interval_tree_insert(vp->vma, &vp->mapping->i_mmap);
+ flush_dcache_mmap_unlock(vp->mapping);
+ }
+
+ if (vp->remove && vp->file) {
+ __remove_shared_vm_struct(vp->remove, vp->mapping);
+ if (vp->remove2)
+ __remove_shared_vm_struct(vp->remove2, vp->mapping);
+ } else if (vp->insert) {
+ /*
+ * split_vma has split insert from vma, and needs
+ * us to insert it before dropping the locks
+ * (it may either follow vma or precede it).
+ */
+ vma_iter_store(vmi, vp->insert);
+ mm->map_count++;
+ }
+
+ if (vp->anon_vma) {
+ anon_vma_interval_tree_post_update_vma(vp->vma);
+ if (vp->adj_next)
+ anon_vma_interval_tree_post_update_vma(vp->adj_next);
+ anon_vma_unlock_write(vp->anon_vma);
+ }
+
+ if (vp->file) {
+ i_mmap_unlock_write(vp->mapping);
+ uprobe_mmap(vp->vma);
+
+ if (vp->adj_next)
+ uprobe_mmap(vp->adj_next);
+ }
+
+ if (vp->remove) {
+again:
+ vma_mark_detached(vp->remove, true);
+ if (vp->file) {
+ uprobe_munmap(vp->remove, vp->remove->vm_start,
+ vp->remove->vm_end);
+ fput(vp->file);
+ }
+ if (vp->remove->anon_vma)
+ anon_vma_merge(vp->vma, vp->remove);
+ mm->map_count--;
+ mpol_put(vma_policy(vp->remove));
+ if (!vp->remove2)
+ WARN_ON_ONCE(vp->vma->vm_end < vp->remove->vm_end);
+ vm_area_free(vp->remove);
+
+ /*
+ * In mprotect's case 6 (see comments on vma_merge),
+ * we are removing both mid and next vmas
+ */
+ if (vp->remove2) {
+ vp->remove = vp->remove2;
+ vp->remove2 = NULL;
+ goto again;
+ }
+ }
+ if (vp->insert && vp->file)
+ uprobe_mmap(vp->insert);
+}
+
+/*
+ * init_vma_prep() - Initializer wrapper for vma_prepare struct
+ * @vp: The vma_prepare struct
+ * @vma: The vma that will be altered once locked
+ */
+static void init_vma_prep(struct vma_prepare *vp, struct vm_area_struct *vma)
+{
+ init_multi_vma_prep(vp, vma, NULL, NULL, NULL);
+}
+
/*
* Close a vm structure and free it.
*/
@@ -270,31 +425,6 @@ static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
return __split_vma(vmi, vma, addr, new_below);
}
-/*
- * init_vma_prep() - Initializer wrapper for vma_prepare struct
- * @vp: The vma_prepare struct
- * @vma: The vma that will be altered once locked
- */
-void init_vma_prep(struct vma_prepare *vp,
- struct vm_area_struct *vma)
-{
- init_multi_vma_prep(vp, vma, NULL, NULL, NULL);
-}
-
-/*
- * Requires inode->i_mapping->i_mmap_rwsem
- */
-static void __remove_shared_vm_struct(struct vm_area_struct *vma,
- struct address_space *mapping)
-{
- if (vma_is_shared_maywrite(vma))
- mapping_unmap_writable(mapping);
-
- flush_dcache_mmap_lock(mapping);
- vma_interval_tree_remove(vma, &mapping->i_mmap);
- flush_dcache_mmap_unlock(mapping);
-}
-
/*
* vma has some anon_vma assigned, and is already inserted on that
* anon_vma's interval trees.
@@ -327,60 +457,6 @@ anon_vma_interval_tree_post_update_vma(struct vm_area_struct *vma)
anon_vma_interval_tree_insert(avc, &avc->anon_vma->rb_root);
}
-static void __vma_link_file(struct vm_area_struct *vma,
- struct address_space *mapping)
-{
- if (vma_is_shared_maywrite(vma))
- mapping_allow_writable(mapping);
-
- flush_dcache_mmap_lock(mapping);
- vma_interval_tree_insert(vma, &mapping->i_mmap);
- flush_dcache_mmap_unlock(mapping);
-}
-
-/*
- * vma_prepare() - Helper function for handling locking VMAs prior to altering
- * @vp: The initialized vma_prepare struct
- */
-void vma_prepare(struct vma_prepare *vp)
-{
- if (vp->file) {
- uprobe_munmap(vp->vma, vp->vma->vm_start, vp->vma->vm_end);
-
- if (vp->adj_next)
- uprobe_munmap(vp->adj_next, vp->adj_next->vm_start,
- vp->adj_next->vm_end);
-
- i_mmap_lock_write(vp->mapping);
- if (vp->insert && vp->insert->vm_file) {
- /*
- * Put into interval tree now, so instantiated pages
- * are visible to arm/parisc __flush_dcache_page
- * throughout; but we cannot insert into address
- * space until vma start or end is updated.
- */
- __vma_link_file(vp->insert,
- vp->insert->vm_file->f_mapping);
- }
- }
-
- if (vp->anon_vma) {
- anon_vma_lock_write(vp->anon_vma);
- anon_vma_interval_tree_pre_update_vma(vp->vma);
- if (vp->adj_next)
- anon_vma_interval_tree_pre_update_vma(vp->adj_next);
- }
-
- if (vp->file) {
- flush_dcache_mmap_lock(vp->mapping);
- vma_interval_tree_remove(vp->vma, &vp->mapping->i_mmap);
- if (vp->adj_next)
- vma_interval_tree_remove(vp->adj_next,
- &vp->mapping->i_mmap);
- }
-
-}
-
/*
* dup_anon_vma() - Helper function to duplicate anon_vma
* @dst: The destination VMA
@@ -675,84 +751,6 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
return 0;
}
-/*
- * vma_complete- Helper function for handling the unlocking after altering VMAs,
- * or for inserting a VMA.
- *
- * @vp: The vma_prepare struct
- * @vmi: The vma iterator
- * @mm: The mm_struct
- */
-void vma_complete(struct vma_prepare *vp,
- struct vma_iterator *vmi, struct mm_struct *mm)
-{
- if (vp->file) {
- if (vp->adj_next)
- vma_interval_tree_insert(vp->adj_next,
- &vp->mapping->i_mmap);
- vma_interval_tree_insert(vp->vma, &vp->mapping->i_mmap);
- flush_dcache_mmap_unlock(vp->mapping);
- }
-
- if (vp->remove && vp->file) {
- __remove_shared_vm_struct(vp->remove, vp->mapping);
- if (vp->remove2)
- __remove_shared_vm_struct(vp->remove2, vp->mapping);
- } else if (vp->insert) {
- /*
- * split_vma has split insert from vma, and needs
- * us to insert it before dropping the locks
- * (it may either follow vma or precede it).
- */
- vma_iter_store(vmi, vp->insert);
- mm->map_count++;
- }
-
- if (vp->anon_vma) {
- anon_vma_interval_tree_post_update_vma(vp->vma);
- if (vp->adj_next)
- anon_vma_interval_tree_post_update_vma(vp->adj_next);
- anon_vma_unlock_write(vp->anon_vma);
- }
-
- if (vp->file) {
- i_mmap_unlock_write(vp->mapping);
- uprobe_mmap(vp->vma);
-
- if (vp->adj_next)
- uprobe_mmap(vp->adj_next);
- }
-
- if (vp->remove) {
-again:
- vma_mark_detached(vp->remove, true);
- if (vp->file) {
- uprobe_munmap(vp->remove, vp->remove->vm_start,
- vp->remove->vm_end);
- fput(vp->file);
- }
- if (vp->remove->anon_vma)
- anon_vma_merge(vp->vma, vp->remove);
- mm->map_count--;
- mpol_put(vma_policy(vp->remove));
- if (!vp->remove2)
- WARN_ON_ONCE(vp->vma->vm_end < vp->remove->vm_end);
- vm_area_free(vp->remove);
-
- /*
- * In mprotect's case 6 (see comments on vma_merge),
- * we are removing both mid and next vmas
- */
- if (vp->remove2) {
- vp->remove = vp->remove2;
- vp->remove2 = NULL;
- goto again;
- }
- }
- if (vp->insert && vp->file)
- uprobe_mmap(vp->insert);
-}
-
static inline void vms_clear_ptes(struct vma_munmap_struct *vms,
struct ma_state *mas_detach, bool mm_wr_locked)
{
diff --git a/mm/vma.h b/mm/vma.h
index dbcdf1431014..edc9034d675c 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -132,17 +132,6 @@ void anon_vma_interval_tree_pre_update_vma(struct vm_area_struct *vma);
/* Required for expand_downwards(). */
void anon_vma_interval_tree_post_update_vma(struct vm_area_struct *vma);
-/* Required for do_brk_flags(). */
-void vma_prepare(struct vma_prepare *vp);
-
-/* Required for do_brk_flags(). */
-void init_vma_prep(struct vma_prepare *vp,
- struct vm_area_struct *vma);
-
-/* Required for do_brk_flags(). */
-void vma_complete(struct vma_prepare *vp,
- struct vma_iterator *vmi, struct mm_struct *mm);
-
int vma_expand(struct vma_merge_struct *vmg);
int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long start, unsigned long end, pgoff_t pgoff);
@@ -273,20 +262,6 @@ void remove_vma(struct vm_area_struct *vma, bool unreachable, bool closed);
void unmap_region(struct ma_state *mas, struct vm_area_struct *vma,
struct vm_area_struct *prev, struct vm_area_struct *next);
-/*
- * Can we merge the VMA described by vmg into the following VMA vmg->next?
- *
- * Required by mmap_region().
- */
-bool can_vma_merge_before(struct vma_merge_struct *vmg);
-
-/*
- * Can we merge the VMA described by vmg into the preceding VMA vmg->prev?
- *
- * Required by mmap_region() and do_brk_flags().
- */
-bool can_vma_merge_after(struct vma_merge_struct *vmg);
-
/* We are about to modify the VMA's flags. */
struct vm_area_struct *vma_modify_flags(struct vma_iterator *vmi,
struct vm_area_struct *prev,
--
2.46.0
* [PATCH v2 08/10] mm: introduce commit_merge(), abstracting final commit of merge
2024-08-23 20:06 [PATCH v2 00/10] mm: remove vma_merge() Lorenzo Stoakes
` (6 preceding siblings ...)
2024-08-23 20:07 ` [PATCH v2 07/10] mm: make vma_prepare() and friends static and internal to vma.c Lorenzo Stoakes
@ 2024-08-23 20:07 ` Lorenzo Stoakes
2024-08-23 20:07 ` [PATCH v2 09/10] mm: refactor vma_merge() into modify-only vma_merge_existing_range() Lorenzo Stoakes
2024-08-23 20:07 ` [PATCH v2 10/10] mm: rework vm_ops->close() handling on VMA merge Lorenzo Stoakes
9 siblings, 0 replies; 26+ messages in thread
From: Lorenzo Stoakes @ 2024-08-23 20:07 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton; +Cc: Liam R . Howlett, Vlastimil Babka
Pull the part of vma_expand() which actually commits the merge operation -
that is, which inserts the VMA into the maple tree and sets the VMA's
vma->vm_start and vma->vm_end fields - into its own function,
commit_merge().
We implement only the parts needed for vma_expand(), which, as a result of
the preceding work, is now also the means by which new VMA ranges are
merged.
The next commit in the series will implement merging of existing ranges,
extending commit_merge() to accommodate that case and resulting in all
merges using this common code.
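For reference, the commit sequence which commit_merge() encapsulates is
annotated below - this restates the hunk that follows and adds nothing new:

static int commit_merge(struct vma_merge_struct *vmg,
			struct vm_area_struct *remove)
{
	struct vma_prepare vp;

	init_multi_vma_prep(&vp, vmg->vma, NULL, remove, NULL);

	/* The VMA iterator must already be positioned at vmg->start. */
	vma_iter_config(vmg->vmi, vmg->start, vmg->end);

	/* Preallocate maple tree nodes so the store below cannot fail. */
	if (vma_iter_prealloc(vmg->vmi, vmg->vma))
		return -ENOMEM;

	vma_prepare(&vp);	/* Lock rmap structures, detach trees. */
	vma_adjust_trans_huge(vmg->vma, vmg->start, vmg->end, 0);
	vma_set_range(vmg->vma, vmg->start, vmg->end, vmg->pgoff);

	vma_iter_store(vmg->vmi, vmg->vma); /* Write to the maple tree. */

	vma_complete(&vp, vmg->vmi, vmg->vma->vm_mm); /* Reinsert, unlock. */

	return 0;
}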
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/vma.c | 39 +++++++++++++++++++++++++++------------
1 file changed, 27 insertions(+), 12 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index 07df0d5fb3d2..8d670059e728 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -540,6 +540,31 @@ void validate_mm(struct mm_struct *mm)
}
#endif /* CONFIG_DEBUG_VM_MAPLE_TREE */
+/* Actually perform the VMA merge operation. */
+static int commit_merge(struct vma_merge_struct *vmg,
+ struct vm_area_struct *remove)
+{
+ struct vma_prepare vp;
+
+ init_multi_vma_prep(&vp, vmg->vma, NULL, remove, NULL);
+
+ /* Note: vma iterator must be pointing to 'start'. */
+ vma_iter_config(vmg->vmi, vmg->start, vmg->end);
+
+ if (vma_iter_prealloc(vmg->vmi, vmg->vma))
+ return -ENOMEM;
+
+ vma_prepare(&vp);
+ vma_adjust_trans_huge(vmg->vma, vmg->start, vmg->end, 0);
+ vma_set_range(vmg->vma, vmg->start, vmg->end, vmg->pgoff);
+
+ vma_iter_store(vmg->vmi, vmg->vma);
+
+ vma_complete(&vp, vmg->vmi, vmg->vma->vm_mm);
+
+ return 0;
+}
+
/*
* vma_merge_new_range - Attempt to merge a new VMA into address space
*
@@ -672,7 +697,6 @@ int vma_expand(struct vma_merge_struct *vmg)
bool remove_next = false;
struct vm_area_struct *vma = vmg->vma;
struct vm_area_struct *next = vmg->next;
- struct vma_prepare vp;
mmap_assert_write_locked(vmg->mm);
@@ -687,24 +711,15 @@ int vma_expand(struct vma_merge_struct *vmg)
return ret;
}
- init_multi_vma_prep(&vp, vma, NULL, remove_next ? next : NULL, NULL);
/* Not merging but overwriting any part of next is not handled. */
- VM_WARN_ON(next && !vp.remove &&
+ VM_WARN_ON(next && !remove_next &&
next != vma && vmg->end > next->vm_start);
/* Only handles expanding */
VM_WARN_ON(vma->vm_start < vmg->start || vma->vm_end > vmg->end);
- /* Note: vma iterator must be pointing to 'start' */
- vma_iter_config(vmg->vmi, vmg->start, vmg->end);
- if (vma_iter_prealloc(vmg->vmi, vma))
+ if (commit_merge(vmg, remove_next ? next : NULL))
goto nomem;
- vma_prepare(&vp);
- vma_adjust_trans_huge(vma, vmg->start, vmg->end, 0);
- vma_set_range(vma, vmg->start, vmg->end, vmg->pgoff);
- vma_iter_store(vmg->vmi, vma);
-
- vma_complete(&vp, vmg->vmi, vma->vm_mm);
return 0;
nomem:
--
2.46.0
* [PATCH v2 09/10] mm: refactor vma_merge() into modify-only vma_merge_existing_range()
2024-08-23 20:06 [PATCH v2 00/10] mm: remove vma_merge() Lorenzo Stoakes
` (7 preceding siblings ...)
2024-08-23 20:07 ` [PATCH v2 08/10] mm: introduce commit_merge(), abstracting final commit of merge Lorenzo Stoakes
@ 2024-08-23 20:07 ` Lorenzo Stoakes
2024-08-23 20:07 ` [PATCH v2 10/10] mm: rework vm_ops->close() handling on VMA merge Lorenzo Stoakes
9 siblings, 0 replies; 26+ messages in thread
From: Lorenzo Stoakes @ 2024-08-23 20:07 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton; +Cc: Liam R . Howlett, Vlastimil Babka
The existing vma_merge() function is no longer required to handle what were
previously referred to as cases 1-3 (i.e. the merging of a new VMA), as
this is now handled by vma_merge_new_range().
Additionally, simplify the convoluted control flow of the original,
maintaining identical logic but expressing it more clearly and doing away
with the complicated set of numbered cases, instead logically examining
each possible outcome - merging both the previous and subsequent VMAs,
merging the previous VMA alone, or merging the subsequent VMA alone.
We now utilise the previously implemented commit_merge() function to share
logic with vma_expand(), de-duplicating code and providing less surface
area for bugs and confusion. In order to do so, we adjust this function to
accept parameters specific to merging existing ranges.
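In sketch form, the decision logic of the new function reads as follows
(abridged from the hunk below, not a drop-in implementation - in
particular, next is assumed to have already been looked up):

	bool left_side = start == vma->vm_start;
	bool right_side = end == vma->vm_end;
	bool merge_left, merge_right, merge_both;

	merge_left = left_side && prev && prev->vm_end == start &&
		     can_vma_merge_after(vmg);
	merge_right = right_side && next && end == next->vm_start &&
		      can_vma_merge_before(vmg);
	/* Merging both sides additionally requires compatible anon_vmas. */
	merge_both = merge_left && merge_right &&
		     is_mergeable_anon_vma(prev->anon_vma, next->anon_vma, NULL);

	if (!merge_left && !merge_right)
		return NULL;	/* Nothing to do. */

	if (merge_both) {
		/* Extend prev over vma and next; both are deleted. */
	} else if (merge_left) {
		/* Extend prev; vma shrinks, or is deleted if fully spanned. */
	} else { /* merge_right */
		/* Extend next; vma shrinks, or is deleted if fully spanned. */
	}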
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/vma.c | 527 +++++++++++++++++++++-------------------
tools/testing/vma/vma.c | 9 +-
2 files changed, 283 insertions(+), 253 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index 8d670059e728..31c52598ee50 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -542,29 +542,297 @@ void validate_mm(struct mm_struct *mm)
/* Actually perform the VMA merge operation. */
static int commit_merge(struct vma_merge_struct *vmg,
- struct vm_area_struct *remove)
+ struct vm_area_struct *adjust,
+ struct vm_area_struct *remove,
+ struct vm_area_struct *remove2,
+ long adj_start,
+ bool expanded)
{
struct vma_prepare vp;
- init_multi_vma_prep(&vp, vmg->vma, NULL, remove, NULL);
+ init_multi_vma_prep(&vp, vmg->vma, adjust, remove, remove2);
- /* Note: vma iterator must be pointing to 'start'. */
- vma_iter_config(vmg->vmi, vmg->start, vmg->end);
+ VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma &&
+ vp.anon_vma != adjust->anon_vma);
+
+ if (expanded) {
+ /* Note: vma iterator must be pointing to 'start'. */
+ vma_iter_config(vmg->vmi, vmg->start, vmg->end);
+ } else {
+ vma_iter_config(vmg->vmi, adjust->vm_start + adj_start,
+ adjust->vm_end);
+ }
if (vma_iter_prealloc(vmg->vmi, vmg->vma))
return -ENOMEM;
vma_prepare(&vp);
- vma_adjust_trans_huge(vmg->vma, vmg->start, vmg->end, 0);
+ vma_adjust_trans_huge(vmg->vma, vmg->start, vmg->end, adj_start);
vma_set_range(vmg->vma, vmg->start, vmg->end, vmg->pgoff);
- vma_iter_store(vmg->vmi, vmg->vma);
+ if (expanded)
+ vma_iter_store(vmg->vmi, vmg->vma);
+
+ if (adj_start) {
+ adjust->vm_start += adj_start;
+ adjust->vm_pgoff += PHYS_PFN(adj_start);
+ if (adj_start < 0) {
+ WARN_ON(expanded);
+ vma_iter_store(vmg->vmi, adjust);
+ }
+ }
vma_complete(&vp, vmg->vmi, vmg->vma->vm_mm);
return 0;
}
+/*
+ * vma_merge_existing_range - Attempt to merge VMAs based on a VMA having its
+ * attributes modified.
+ *
+ * @vmg: Describes the modifications being made to a VMA and associated
+ * metadata.
+ *
+ * When the attributes of a range within a VMA change, then it might be possible
+ * for immediately adjacent VMAs to be merged into that VMA due to having
+ * identical properties.
+ *
+ * This function checks for the existence of any such mergeable VMAs and updates
+ * the maple tree describing the @vmg->vma->vm_mm address space to account for
+ * this, as well as any VMAs shrunk/expanded/deleted as a result of this merge.
+ *
+ * As part of this operation, if a merge occurs, the @vmg object will have its
+ * vma, start, end, and pgoff fields modified to execute the merge. Subsequent
+ * calls to this function should reset these fields.
+ *
+ * Returns: The merged VMA if merge succeeds, or NULL otherwise.
+ *
+ * ASSUMPTIONS:
+ * - The caller must assign the VMA to be modified to @vmg->vma.
+ * - The caller must have set @vmg->prev to the previous VMA, if there is one.
+ * - The caller must not set @vmg->next, as we determine this.
+ * - The caller must hold a WRITE lock on the mm_struct->mmap_lock.
+ * - vmi must be positioned within [@vmg->vma->vm_start, @vmg->vma->vm_end).
+ */
+static struct vm_area_struct *vma_merge_existing_range(struct vma_merge_struct *vmg)
+{
+ struct vm_area_struct *vma = vmg->vma;
+ struct vm_area_struct *prev = vmg->prev;
+ struct vm_area_struct *next, *res;
+ struct vm_area_struct *anon_dup = NULL;
+ struct vm_area_struct *adjust = NULL;
+ unsigned long start = vmg->start;
+ unsigned long end = vmg->end;
+ bool left_side = vma && start == vma->vm_start;
+ bool right_side = vma && end == vma->vm_end;
+ bool merge_both = false;
+ int err = 0;
+ long adj_start = 0;
+ bool merge_will_delete_vma, merge_will_delete_next;
+ bool merge_left, merge_right;
+ bool expanded;
+
+ mmap_assert_write_locked(vmg->mm);
+ VM_WARN_ON(!vma); /* We are modifying a VMA, so caller must specify. */
+ VM_WARN_ON(vmg->next); /* We set this. */
+ VM_WARN_ON(prev && start <= prev->vm_start);
+ VM_WARN_ON(start >= end);
+ /*
+ * If vma == prev, then we are offset into a VMA. Otherwise, we must span
+ * a portion of the VMA.
+ */
+ VM_WARN_ON(vma && ((vma != prev && vmg->start != vma->vm_start) ||
+ vmg->end > vma->vm_end));
+ /* The vmi must be positioned within vmg->vma. */
+ VM_WARN_ON(vma && !(vma_iter_addr(vmg->vmi) >= vma->vm_start &&
+ vma_iter_addr(vmg->vmi) < vma->vm_end));
+
+ vmg->state = VMA_MERGE_NOMERGE;
+
+ /*
+ * If this is a special mapping, or if the proposed region lies at neither
+ * the furthermost left nor right side of the VMA, then we have no chance
+ * of merging and should abort.
+ *
+ * We later require that vma->vm_flags == vm_flags, so this tests
+ * vma->vm_flags & VM_SPECIAL, too.
+ */
+ if (vmg->flags & VM_SPECIAL || (!left_side && !right_side))
+ return NULL;
+
+ if (left_side && prev && prev->vm_end == start && can_vma_merge_after(vmg)) {
+ merge_left = true;
+ vma_prev(vmg->vmi);
+ } else {
+ merge_left = false;
+ }
+
+ if (right_side) {
+ next = vmg->next = vma_lookup(vma->vm_mm, end);
+
+ /*
+ * We can merge right if there is a subsequent VMA, if it is
+ * immediately adjacent, and if it is compatible with vma.
+ */
+ merge_right = next && end == next->vm_start &&
+ can_vma_merge_before(vmg);
+
+ /*
+ * We can only merge both if the anonymous VMA of the previous
+ * VMA is compatible with the anonymous VMA of the subsequent
+ * VMA.
+ *
+ * Otherwise, we default to merging only the left.
+ */
+ if (merge_left && merge_right)
+ merge_right = merge_both =
+ is_mergeable_anon_vma(prev->anon_vma,
+ next->anon_vma, NULL);
+ } else {
+ merge_right = false;
+ next = NULL;
+ }
+
+ /* If we have nothing to merge, abort. */
+ if (!merge_left && !merge_right)
+ return NULL;
+
+ /* If we span the entire VMA, a merge implies it will be deleted. */
+ merge_will_delete_vma = left_side && right_side;
+ /*
+ * If we merge both VMAs, then next is also deleted. This implies
+ * merge_will_delete_vma also.
+ */
+ merge_will_delete_next = merge_both;
+
+ /* No matter what happens, we will be adjusting vma. */
+ vma_start_write(vma);
+
+ if (merge_left)
+ vma_start_write(prev);
+
+ if (merge_right)
+ vma_start_write(next);
+
+ if (merge_both) {
+ /*
+ * |<----->|
+ * |-------*********-------|
+ * prev vma next
+ * extend delete delete
+ */
+
+ vmg->vma = prev;
+ vmg->start = prev->vm_start;
+ vmg->end = next->vm_end;
+ vmg->pgoff = prev->vm_pgoff;
+
+ /*
+ * We already ensured anon_vma compatibility above, so now it's
+ * simply a case of, if prev has no anon_vma object, which of
+ * next or vma contains the anon_vma we must duplicate.
+ */
+ err = dup_anon_vma(prev, next->anon_vma ? next : vma, &anon_dup);
+ } else if (merge_left) {
+ /*
+ * |<----->| OR
+ * |<--------->|
+ * |-------*************
+ * prev vma
+ * extend shrink/delete
+ */
+
+ vmg->vma = prev;
+ vmg->start = prev->vm_start;
+ vmg->pgoff = prev->vm_pgoff;
+
+ if (merge_will_delete_vma) {
+ /*
+ * can_vma_merge_after() assumed we would not be
+ * removing vma, so it skipped the check for
+ * vm_ops->close, but we are removing vma.
+ */
+ if (vma->vm_ops && vma->vm_ops->close)
+ err = -EINVAL;
+ } else {
+ adjust = vma;
+ adj_start = vmg->end - vma->vm_start;
+ }
+
+ if (!err)
+ err = dup_anon_vma(prev, vma, &anon_dup);
+ } else { /* merge_right */
+ /*
+ * |<----->| OR
+ * |<--------->|
+ * *************-------|
+ * vma next
+ * shrink/delete extend
+ */
+
+ pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start);
+
+ VM_WARN_ON(!merge_right);
+ /* If we are offset into a VMA, then prev must be vma. */
+ VM_WARN_ON(vmg->start > vma->vm_start && prev && vma != prev);
+
+ if (merge_will_delete_vma) {
+ vmg->vma = next;
+ vmg->end = next->vm_end;
+ vmg->pgoff = next->vm_pgoff - pglen;
+ } else {
+ /*
+ * We shrink vma and expand next.
+ *
+ * IMPORTANT: This is the ONLY case where the final
+ * merged VMA is NOT vmg->vma, but rather vmg->next.
+ */
+
+ vmg->start = vma->vm_start;
+ vmg->end = start;
+ vmg->pgoff = vma->vm_pgoff;
+
+ adjust = next;
+ adj_start = -(vma->vm_end - start);
+ }
+
+ err = dup_anon_vma(next, vma, &anon_dup);
+ }
+
+ if (err)
+ goto abort;
+
+ /*
+ * In nearly all cases, we expand vmg->vma. There is one exception -
+ * merge_right where we partially span the VMA. In this case we shrink
+ * the end of vmg->vma and adjust the start of vmg->next accordingly.
+ */
+ expanded = !merge_right || merge_will_delete_vma;
+
+ if (commit_merge(vmg, adjust,
+ merge_will_delete_vma ? vma : NULL,
+ merge_will_delete_next ? next : NULL,
+ adj_start, expanded)) {
+ if (anon_dup)
+ unlink_anon_vmas(anon_dup);
+
+ vmg->state = VMA_MERGE_ERROR_NOMEM;
+ return NULL;
+ }
+
+ res = merge_left ? prev : next;
+ khugepaged_enter_vma(res, vmg->flags);
+
+ vmg->state = VMA_MERGE_SUCCESS;
+ return res;
+
+abort:
+ vma_iter_set(vmg->vmi, start);
+ vma_iter_load(vmg->vmi);
+ vmg->state = VMA_MERGE_ERROR_NOMEM;
+ return NULL;
+}
+
/*
* vma_merge_new_range - Attempt to merge a new VMA into address space
*
@@ -717,7 +985,7 @@ int vma_expand(struct vma_merge_struct *vmg)
/* Only handles expanding */
VM_WARN_ON(vma->vm_start < vmg->start || vma->vm_end > vmg->end);
- if (commit_merge(vmg, remove_next ? next : NULL))
+ if (commit_merge(vmg, NULL, remove_next ? next : NULL, NULL, 0, true))
goto nomem;
return 0;
@@ -1092,249 +1360,6 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
}
-/*
- * Given a mapping request (addr,end,vm_flags,file,pgoff,anon_name),
- * figure out whether that can be merged with its predecessor or its
- * successor. Or both (it neatly fills a hole).
- *
- * In most cases - when called for mmap, brk or mremap - [addr,end) is
- * certain not to be mapped by the time vma_merge is called; but when
- * called for mprotect, it is certain to be already mapped (either at
- * an offset within prev, or at the start of next), and the flags of
- * this area are about to be changed to vm_flags - and the no-change
- * case has already been eliminated.
- *
- * The following mprotect cases have to be considered, where **** is
- * the area passed down from mprotect_fixup, never extending beyond one
- * vma, PPPP is the previous vma, CCCC is a concurrent vma that starts
- * at the same address as **** and is of the same or larger span, and
- * NNNN the next vma after ****:
- *
- * **** **** ****
- * PPPPPPNNNNNN PPPPPPNNNNNN PPPPPPCCCCCC
- * cannot merge might become might become
- * PPNNNNNNNNNN PPPPPPPPPPCC
- * mmap, brk or case 4 below case 5 below
- * mremap move:
- * **** ****
- * PPPP NNNN PPPPCCCCNNNN
- * might become might become
- * PPPPPPPPPPPP 1 or PPPPPPPPPPPP 6 or
- * PPPPPPPPNNNN 2 or PPPPPPPPNNNN 7 or
- * PPPPNNNNNNNN 3 PPPPNNNNNNNN 8
- *
- * It is important for case 8 that the vma CCCC overlapping the
- * region **** is never going to extended over NNNN. Instead NNNN must
- * be extended in region **** and CCCC must be removed. This way in
- * all cases where vma_merge succeeds, the moment vma_merge drops the
- * rmap_locks, the properties of the merged vma will be already
- * correct for the whole merged range. Some of those properties like
- * vm_page_prot/vm_flags may be accessed by rmap_walks and they must
- * be correct for the whole merged range immediately after the
- * rmap_locks are released. Otherwise if NNNN would be removed and
- * CCCC would be extended over the NNNN range, remove_migration_ptes
- * or other rmap walkers (if working on addresses beyond the "end"
- * parameter) may establish ptes with the wrong permissions of CCCC
- * instead of the right permissions of NNNN.
- *
- * In the code below:
- * PPPP is represented by *prev
- * CCCC is represented by *curr or not represented at all (NULL)
- * NNNN is represented by *next or not represented at all (NULL)
- * **** is not represented - it will be merged and the vma containing the
- * area is returned, or the function will return NULL
- */
-static struct vm_area_struct *vma_merge(struct vma_merge_struct *vmg)
-{
- struct mm_struct *mm = vmg->mm;
- struct vm_area_struct *prev = vmg->prev;
- struct vm_area_struct *curr, *next, *res;
- struct vm_area_struct *vma, *adjust, *remove, *remove2;
- struct vm_area_struct *anon_dup = NULL;
- struct vma_prepare vp;
- pgoff_t vma_pgoff;
- int err = 0;
- bool merge_prev = false;
- bool merge_next = false;
- bool vma_expanded = false;
- unsigned long addr = vmg->start;
- unsigned long end = vmg->end;
- unsigned long vma_start = addr;
- unsigned long vma_end = end;
- pgoff_t pglen = PHYS_PFN(end - addr);
- long adj_start = 0;
-
- vmg->state = VMA_MERGE_NOMERGE;
-
- /*
- * We later require that vma->vm_flags == vm_flags,
- * so this tests vma->vm_flags & VM_SPECIAL, too.
- */
- if (vmg->flags & VM_SPECIAL)
- return NULL;
-
- /* Does the input range span an existing VMA? (cases 5 - 8) */
- curr = find_vma_intersection(mm, prev ? prev->vm_end : 0, end);
-
- if (!curr || /* cases 1 - 4 */
- end == curr->vm_end) /* cases 6 - 8, adjacent VMA */
- next = vmg->next = vma_lookup(mm, end);
- else
- next = vmg->next = NULL; /* case 5 */
-
- if (prev) {
- vma_start = prev->vm_start;
- vma_pgoff = prev->vm_pgoff;
-
- /* Can we merge the predecessor? */
- if (addr == prev->vm_end && can_vma_merge_after(vmg)) {
- merge_prev = true;
- vma_prev(vmg->vmi);
- }
- }
-
- /* Can we merge the successor? */
- if (next && can_vma_merge_before(vmg)) {
- merge_next = true;
- }
-
- /* Verify some invariant that must be enforced by the caller. */
- VM_WARN_ON(prev && addr <= prev->vm_start);
- VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end));
- VM_WARN_ON(addr >= end);
-
- if (!merge_prev && !merge_next)
- return NULL; /* Not mergeable. */
-
- if (merge_prev)
- vma_start_write(prev);
-
- res = vma = prev;
- remove = remove2 = adjust = NULL;
-
- /* Can we merge both the predecessor and the successor? */
- if (merge_prev && merge_next &&
- is_mergeable_anon_vma(prev->anon_vma, next->anon_vma, NULL)) {
- vma_start_write(next);
- remove = next; /* case 1 */
- vma_end = next->vm_end;
- err = dup_anon_vma(prev, next, &anon_dup);
- if (curr) { /* case 6 */
- vma_start_write(curr);
- remove = curr;
- remove2 = next;
- /*
- * Note that the dup_anon_vma below cannot overwrite err
- * since the first caller would do nothing unless next
- * has an anon_vma.
- */
- if (!next->anon_vma)
- err = dup_anon_vma(prev, curr, &anon_dup);
- }
- } else if (merge_prev) { /* case 2 */
- if (curr) {
- vma_start_write(curr);
- if (end == curr->vm_end) { /* case 7 */
- /*
- * can_vma_merge_after() assumed we would not be
- * removing prev vma, so it skipped the check
- * for vm_ops->close, but we are removing curr
- */
- if (curr->vm_ops && curr->vm_ops->close)
- err = -EINVAL;
- remove = curr;
- } else { /* case 5 */
- adjust = curr;
- adj_start = end - curr->vm_start;
- }
- if (!err)
- err = dup_anon_vma(prev, curr, &anon_dup);
- }
- } else { /* merge_next */
- vma_start_write(next);
- res = next;
- if (prev && addr < prev->vm_end) { /* case 4 */
- vma_start_write(prev);
- vma_end = addr;
- adjust = next;
- adj_start = -(prev->vm_end - addr);
- err = dup_anon_vma(next, prev, &anon_dup);
- } else {
- /*
- * Note that cases 3 and 8 are the ONLY ones where prev
- * is permitted to be (but is not necessarily) NULL.
- */
- vma = next; /* case 3 */
- vma_start = addr;
- vma_end = next->vm_end;
- vma_pgoff = next->vm_pgoff - pglen;
- if (curr) { /* case 8 */
- vma_pgoff = curr->vm_pgoff;
- vma_start_write(curr);
- remove = curr;
- err = dup_anon_vma(next, curr, &anon_dup);
- }
- }
- }
-
- /* Error in anon_vma clone. */
- if (err)
- goto anon_vma_fail;
-
- if (vma_start < vma->vm_start || vma_end > vma->vm_end)
- vma_expanded = true;
-
- if (vma_expanded) {
- vma_iter_config(vmg->vmi, vma_start, vma_end);
- } else {
- vma_iter_config(vmg->vmi, adjust->vm_start + adj_start,
- adjust->vm_end);
- }
-
- if (vma_iter_prealloc(vmg->vmi, vma))
- goto prealloc_fail;
-
- init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
- VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma &&
- vp.anon_vma != adjust->anon_vma);
-
- vma_prepare(&vp);
- vma_adjust_trans_huge(vma, vma_start, vma_end, adj_start);
- vma_set_range(vma, vma_start, vma_end, vma_pgoff);
-
- if (vma_expanded)
- vma_iter_store(vmg->vmi, vma);
-
- if (adj_start) {
- adjust->vm_start += adj_start;
- adjust->vm_pgoff += adj_start >> PAGE_SHIFT;
- if (adj_start < 0) {
- WARN_ON(vma_expanded);
- vma_iter_store(vmg->vmi, next);
- }
- }
-
- vma_complete(&vp, vmg->vmi, mm);
- validate_mm(mm);
- khugepaged_enter_vma(res, vmg->flags);
-
- vmg->state = VMA_MERGE_SUCCESS;
- return res;
-
-prealloc_fail:
- vmg->state = VMA_MERGE_ERROR_NOMEM;
- if (anon_dup)
- unlink_anon_vmas(anon_dup);
-
-anon_vma_fail:
- if (err == -ENOMEM)
- vmg->state = VMA_MERGE_ERROR_NOMEM;
-
- vma_iter_set(vmg->vmi, addr);
- vma_iter_load(vmg->vmi);
- return NULL;
-}
-
/*
* We are about to modify one or multiple of a VMA's flags, policy, userfaultfd
* context and anonymous VMA name within the range [start, end).
@@ -1354,7 +1379,7 @@ static struct vm_area_struct *vma_modify(struct vma_merge_struct *vmg)
struct vm_area_struct *merged;
/* First, try to merge. */
- merged = vma_merge(vmg);
+ merged = vma_merge_existing_range(vmg);
if (merged)
return merged;
diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
index 3a3a850d951c..d31bb7bd972a 100644
--- a/tools/testing/vma/vma.c
+++ b/tools/testing/vma/vma.c
@@ -112,7 +112,7 @@ static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
*/
static struct vm_area_struct *merge_existing(struct vma_merge_struct *vmg)
{
- return vma_merge(vmg);
+ return vma_merge_existing_range(vmg);
}
/*
@@ -752,7 +752,12 @@ static bool test_vma_merge_with_close(void)
vmg.vma = vma;
/* Make sure merge does not occur. */
ASSERT_EQ(merge_existing(&vmg), NULL);
- ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
+ /*
+ * Initially this is misapprehended as an out of memory report, as the
+ * close() check is handled in the same way as anon_vma duplication
+ * failures; however, a subsequent patch resolves this.
+ */
+ ASSERT_EQ(vmg.state, VMA_MERGE_ERROR_NOMEM);
cleanup_mm(&mm, &vmi);
return true;
--
2.46.0
* [PATCH v2 10/10] mm: rework vm_ops->close() handling on VMA merge
2024-08-23 20:06 [PATCH v2 00/10] mm: remove vma_merge() Lorenzo Stoakes
` (8 preceding siblings ...)
2024-08-23 20:07 ` [PATCH v2 09/10] mm: refactor vma_merge() into modify-only vma_merge_existing_range() Lorenzo Stoakes
@ 2024-08-23 20:07 ` Lorenzo Stoakes
9 siblings, 0 replies; 26+ messages in thread
From: Lorenzo Stoakes @ 2024-08-23 20:07 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton; +Cc: Liam R . Howlett, Vlastimil Babka
In commit 714965ca8252 ("mm/mmap: start distinguishing if vma can be
removed in mergeability test") we relaxed the VMA merge rules for VMAs
possessing a vm_ops->close() hook, permitting this operation in instances
where we wouldn't delete the VMA as part of the merge operation.
This was later corrected in commit fc0c8f9089c2 ("mm, mmap: fix vma_merge()
case 7 with vma_ops->close") to account for a subtle case that the previous
commit had not taken into account.
In both instances, we first rely on is_mergeable_vma() to determine whether
we might be dealing with a VMA that might be removed, taking advantage of
the fact that a 'previous' VMA will never be deleted, only VMAs that follow
it.
The second patch corrects the instance where a merge of the previous VMA
into a subsequent one did not correctly check whether the subsequent VMA
had a vm_ops->close() handler.
Both changes prevent merge cases that are actually permissible (for
instance a merge of a VMA into a following VMA with a vm_ops->close(), but
with no previous VMA, which would result in the next VMA being extended,
not deleted).
In addition, both changes fail to consider the case where a VMA that would
otherwise be merged with both the previous and next VMAs might itself have
a vm_ops->close() hook, this on the assumption that, for all three to be
mergeable, they would have to share the same vma->vm_file and thus the same
vm_ops.
Both changes also operate at 50,000 feet, trying to guess whether a VMA
will be deleted rather than knowing it for certain.
As we have majorly refactored the VMA merge operation and de-duplicated
code to the point where we know precisely where deletions will occur, this
patch removes the aforementioned checks altogether and instead explicitly
checks whether a VMA will be deleted.
In cases where a reduced merge is still possible (where we would merge both
the previous and next VMAs but the next VMA has a vm_ops->close hook,
meaning we could instead merge only the previous and current VMAs), we do
so; otherwise the merge is not permitted.
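In sketch form, the new check and the reduced-merge fallback, as they
appear in the hunks below:

/* We can only remove VMAs when merging if they do not have a close hook. */
static bool can_merge_remove_vma(struct vm_area_struct *vma)
{
	return !vma->vm_ops || !vma->vm_ops->close;
}

...and, within vma_merge_existing_range():

	/* If vma must be deleted in its entirety but cannot be, abort. */
	if (merge_will_delete_vma && !can_merge_remove_vma(vma))
		return NULL;

	/*
	 * If next would be deleted but cannot be, reduce the operation to
	 * merging prev and vma (thereby deleting vma only).
	 */
	if (merge_will_delete_next && !can_merge_remove_vma(next)) {
		merge_will_delete_next = false;
		merge_right = false;
		merge_both = false;
	}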
We take advantage of our userland testing to assert that this functions
correctly - replacing the previous limited vm_ops->close() tests with tests
for every single case where we delete a VMA.
We also update all testing for both new and modified VMAs to set
vma->vm_ops->close() in every single instance where this would not prevent
the merge, in order to assert that the presence of this hook never blocks a
merge it should not.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/vma.c | 60 ++++++++++-----
tools/testing/vma/vma.c | 166 +++++++++++++++++++++++++++++++---------
2 files changed, 167 insertions(+), 59 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index 31c52598ee50..dcc641a3f979 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -10,14 +10,6 @@
static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_next)
{
struct vm_area_struct *vma = merge_next ? vmg->next : vmg->prev;
- /*
- * If the vma has a ->close operation then the driver probably needs to
- * release per-vma resources, so we don't attempt to merge those if the
- * caller indicates the current vma may be removed as part of the merge,
- * which is the case if we are attempting to merge the next VMA into
- * this one.
- */
- bool may_remove_vma = merge_next;
if (!mpol_equal(vmg->policy, vma_policy(vma)))
return false;
@@ -33,8 +25,6 @@ static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_nex
return false;
if (vma->vm_file != vmg->file)
return false;
- if (may_remove_vma && vma->vm_ops && vma->vm_ops->close)
- return false;
if (!is_mergeable_vm_userfaultfd_ctx(vma, vmg->uffd_ctx))
return false;
if (!anon_vma_name_eq(anon_vma_name(vma), vmg->anon_name))
@@ -587,6 +577,12 @@ static int commit_merge(struct vma_merge_struct *vmg,
return 0;
}
+/* We can only remove VMAs when merging if they do not have a close hook. */
+static bool can_merge_remove_vma(struct vm_area_struct *vma)
+{
+ return !vma->vm_ops || !vma->vm_ops->close;
+}
+
/*
* vma_merge_existing_range - Attempt to merge VMAs based on a VMA having its
* attributes modified.
@@ -699,12 +695,30 @@ static struct vm_area_struct *vma_merge_existing_range(struct vma_merge_struct *
/* If we span the entire VMA, a merge implies it will be deleted. */
merge_will_delete_vma = left_side && right_side;
+
+ /*
+ * If we need to remove vma in its entirety but are unable to do so,
+ * we have no sensible recourse but to abort the merge.
+ */
+ if (merge_will_delete_vma && !can_merge_remove_vma(vma))
+ return NULL;
+
/*
* If we merge both VMAs, then next is also deleted. This implies
* merge_will_delete_vma also.
*/
merge_will_delete_next = merge_both;
+ /*
+ * If we cannot delete next, then we can reduce the operation to merging
+ * prev and vma (thereby deleting vma).
+ */
+ if (merge_will_delete_next && !can_merge_remove_vma(next)) {
+ merge_will_delete_next = false;
+ merge_right = false;
+ merge_both = false;
+ }
+
/* No matter what happens, we will be adjusting vma. */
vma_start_write(vma);
@@ -746,21 +760,12 @@ static struct vm_area_struct *vma_merge_existing_range(struct vma_merge_struct *
vmg->start = prev->vm_start;
vmg->pgoff = prev->vm_pgoff;
- if (merge_will_delete_vma) {
- /*
- * can_vma_merge_after() assumed we would not be
- * removing vma, so it skipped the check for
- * vm_ops->close, but we are removing vma.
- */
- if (vma->vm_ops && vma->vm_ops->close)
- err = -EINVAL;
- } else {
+ if (!merge_will_delete_vma) {
adjust = vma;
adj_start = vmg->end - vma->vm_start;
}
- if (!err)
- err = dup_anon_vma(prev, vma, &anon_dup);
+ err = dup_anon_vma(prev, vma, &anon_dup);
} else { /* merge_right */
/*
* |<----->| OR
@@ -885,6 +890,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
unsigned long end = vmg->end;
pgoff_t pgoff = vmg->pgoff;
pgoff_t pglen = PHYS_PFN(end - start);
+ bool merge_next = false;
bool can_merge_before, can_merge_after;
mmap_assert_write_locked(vmg->mm);
@@ -910,6 +916,8 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
vmg->end = next->vm_end;
vmg->vma = next;
vmg->pgoff = next->vm_pgoff - pglen;
+
+ merge_next = true;
}
/* If we can merge with the previous VMA, adjust vmg accordingly. */
@@ -918,6 +926,14 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
vmg->vma = prev;
vmg->pgoff = prev->vm_pgoff;
+ /*
+ * If this merge would result in removal of the next VMA but we
+ * are not permitted to do so, reduce the operation to merging
+ * prev and vma.
+ */
+ if (merge_next && !can_merge_remove_vma(next))
+ vmg->end = end;
+
vma_prev(vmg->vmi); /* Equivalent to going to the previous range */
}
@@ -973,6 +989,8 @@ int vma_expand(struct vma_merge_struct *vmg)
int ret;
remove_next = true;
+ /* This should already have been checked by this point. */
+ VM_WARN_ON(!can_merge_remove_vma(next));
vma_start_write(next);
ret = dup_anon_vma(vma, next, &anon_dup);
if (ret)
diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
index d31bb7bd972a..ea4f9109ffc2 100644
--- a/tools/testing/vma/vma.c
+++ b/tools/testing/vma/vma.c
@@ -387,6 +387,9 @@ static bool test_merge_new(void)
struct anon_vma_chain dummy_anon_vma_chain_d = {
.anon_vma = &dummy_anon_vma,
};
+ const struct vm_operations_struct vm_ops = {
+ .close = dummy_close,
+ };
int count;
struct vm_area_struct *vma, *vma_a, *vma_b, *vma_c, *vma_d;
bool merged;
@@ -430,6 +433,7 @@ static bool test_merge_new(void)
* 0123456789abc
* AA*B DD CC
*/
+ vma_a->vm_ops = &vm_ops; /* This should have no impact. */
vma_b->anon_vma = &dummy_anon_vma;
vma = try_merge_new_vma(&mm, &vmg, 0x2000, 0x3000, 2, flags, &merged);
ASSERT_EQ(vma, vma_a);
@@ -466,6 +470,7 @@ static bool test_merge_new(void)
* AAAAA *DD CC
*/
vma_d->anon_vma = &dummy_anon_vma;
+ vma_d->vm_ops = &vm_ops; /* This should have no impact. */
vma = try_merge_new_vma(&mm, &vmg, 0x6000, 0x7000, 6, flags, &merged);
ASSERT_EQ(vma, vma_d);
/* Prepend. */
@@ -483,6 +488,7 @@ static bool test_merge_new(void)
* 0123456789abc
* AAAAA*DDD CC
*/
+ vma_d->vm_ops = NULL; /* This would otherwise degrade the merge. */
vma = try_merge_new_vma(&mm, &vmg, 0x5000, 0x6000, 5, flags, &merged);
ASSERT_EQ(vma, vma_a);
/* Merge with A, delete D. */
@@ -640,13 +646,11 @@ static bool test_vma_merge_with_close(void)
const struct vm_operations_struct vm_ops = {
.close = dummy_close,
};
- struct vm_area_struct *vma_next =
- alloc_and_link_vma(&mm, 0x2000, 0x3000, 2, flags);
- struct vm_area_struct *vma;
+ struct vm_area_struct *vma_prev, *vma_next, *vma;
/*
- * When we merge VMAs we sometimes have to delete others as part of the
- * operation.
+ * When merging VMAs we are not permitted to remove any VMA that has a
+ * vm_ops->close() hook.
*
* Considering the two possible adjacent VMAs to which a VMA can be
* merged:
@@ -697,28 +701,52 @@ static bool test_vma_merge_with_close(void)
* would be set too, and thus scenario A would pick this up.
*/
- ASSERT_NE(vma_next, NULL);
-
/*
- * SCENARIO A
+ * The only case of a new VMA merge that results in a VMA being deleted
+ * is one where both the previous and next VMAs are merged - in this
+ * instance the next VMA is deleted, and the previous VMA is extended.
*
- * 0123
- * *N
+ * If we are unable to do so, we reduce the operation to simply
+ * extending the prev VMA and not merging next.
+ *
+ * 0123456789
+ * PPP**NNNN
+ * ->
+ * 0123456789
+ * PPPPPPNNN
*/
- /* Make the next VMA have a close() callback. */
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x5000, 0x9000, 5, flags);
vma_next->vm_ops = &vm_ops;
- /* Our proposed VMA has characteristics that would otherwise be merged. */
- vmg_set_range(&vmg, 0x1000, 0x2000, 1, flags);
+ vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
+ ASSERT_EQ(merge_new(&vmg), vma_prev);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
+ ASSERT_EQ(vma_prev->vm_start, 0);
+ ASSERT_EQ(vma_prev->vm_end, 0x5000);
+ ASSERT_EQ(vma_prev->vm_pgoff, 0);
- /* The next VMA having a close() operator should cause the merge to fail.*/
- ASSERT_EQ(merge_new(&vmg), NULL);
- ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
- /* Now create the VMA so we can merge via modified flags */
- vmg_set_range(&vmg, 0x1000, 0x2000, 1, flags);
- vma = alloc_and_link_vma(&mm, 0x1000, 0x2000, 1, flags);
+ /*
+ * When modifying an existing VMA there are further cases where we
+ * delete VMAs.
+ *
+ * <>
+ * 0123456789
+ * PPPVV
+ *
+ * In this instance, if vma has a close hook, the merge simply cannot
+ * proceed.
+ */
+
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
+ vma->vm_ops = &vm_ops;
+
+ vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
+ vmg.prev = vma_prev;
vmg.vma = vma;
/*
@@ -728,38 +756,90 @@ static bool test_vma_merge_with_close(void)
ASSERT_EQ(merge_existing(&vmg), NULL);
ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
- /* SCENARIO B
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
+
+ /*
+ * This case is mirrored if merging with next.
*
- * 0123
- * P*
+ * <>
+ * 0123456789
+ * VVNNNN
*
- * In order for this scenario to trigger, the VMA currently being
- * modified must also have a .close().
+ * In this instance, if vma has a close hook, the merge simply cannot
+ * proceed.
*/
- /* Reset VMG state. */
- vmg_set_range(&vmg, 0x1000, 0x2000, 1, flags);
- /*
- * Make next unmergeable, and don't let the scenario A check pick this
- * up, we want to reproduce scenario B only.
- */
- vma_next->vm_ops = NULL;
- vma_next->__vm_flags &= ~VM_MAYWRITE;
- /* Allocate prev. */
- vmg.prev = alloc_and_link_vma(&mm, 0, 0x1000, 0, flags);
- /* Assign a vm_ops->close() function to VMA explicitly. */
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x5000, 0x9000, 5, flags);
vma->vm_ops = &vm_ops;
+
+ vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
vmg.vma = vma;
- /* Make sure merge does not occur. */
ASSERT_EQ(merge_existing(&vmg), NULL);
/*
* Initially this is misapprehended as an out of memory report, as the
* close() check is handled in the same way as anon_vma duplication
* failures; however, a subsequent patch resolves this.
*/
- ASSERT_EQ(vmg.state, VMA_MERGE_ERROR_NOMEM);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
+
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
+
+ /*
+ * Finally, we consider two variants of the case where we modify a VMA
+ * to merge with both the previous and next VMAs.
+ *
+ * The first variant is where vma has a close hook. In this instance, no
+ * merge can proceed.
+ *
+ * <>
+ * 0123456789
+ * PPPVVNNNN
+ */
+
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x5000, 0x9000, 5, flags);
+ vma->vm_ops = &vm_ops;
+
+ vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
+ vmg.prev = vma_prev;
+ vmg.vma = vma;
+
+ ASSERT_EQ(merge_existing(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
+
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 3);
+
+ /*
+ * The second variant is where next has a close hook. In this instance,
+ * we reduce the operation to a merge between prev and vma.
+ *
+ * <>
+ * 0123456789
+ * PPPVVNNNN
+ * ->
+ * 0123456789
+ * PPPPPNNNN
+ */
+
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x5000, 0x9000, 5, flags);
+ vma_next->vm_ops = &vm_ops;
+
+ vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
+ vmg.prev = vma_prev;
+ vmg.vma = vma;
+
+ ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
+ ASSERT_EQ(vma_prev->vm_start, 0);
+ ASSERT_EQ(vma_prev->vm_end, 0x5000);
+ ASSERT_EQ(vma_prev->vm_pgoff, 0);
+
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
- cleanup_mm(&mm, &vmi);
return true;
}
@@ -828,6 +908,9 @@ static bool test_merge_existing(void)
.mm = &mm,
.vmi = &vmi,
};
+ const struct vm_operations_struct vm_ops = {
+ .close = dummy_close,
+ };
/*
* Merge right case - partial span.
@@ -840,7 +923,9 @@ static bool test_merge_existing(void)
* VNNNNNN
*/
vma = alloc_and_link_vma(&mm, 0x2000, 0x6000, 2, flags);
+ vma->vm_ops = &vm_ops; /* This should have no impact. */
vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, flags);
+ vma_next->vm_ops = &vm_ops; /* This should have no impact. */
vmg_set_range(&vmg, 0x3000, 0x6000, 3, flags);
vmg.vma = vma;
vmg.prev = vma;
@@ -873,6 +958,7 @@ static bool test_merge_existing(void)
*/
vma = alloc_and_link_vma(&mm, 0x2000, 0x6000, 2, flags);
vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, flags);
+ vma_next->vm_ops = &vm_ops; /* This should have no impact. */
vmg_set_range(&vmg, 0x2000, 0x6000, 2, flags);
vmg.vma = vma;
vma->anon_vma = &dummy_anon_vma;
@@ -899,7 +985,9 @@ static bool test_merge_existing(void)
* PPPPPPV
*/
vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
+ vma->vm_ops = &vm_ops; /* This should have no impact. */
vmg_set_range(&vmg, 0x3000, 0x6000, 3, flags);
vmg.prev = vma_prev;
vmg.vma = vma;
@@ -932,6 +1020,7 @@ static bool test_merge_existing(void)
* PPPPPPP
*/
vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
vmg.prev = vma_prev;
@@ -960,6 +1049,7 @@ static bool test_merge_existing(void)
* PPPPPPPPPP
*/
vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, flags);
vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
--
2.46.0
* Re: [PATCH v2 06/10] mm: avoid using vma_merge() for new VMAs
2024-08-23 20:07 ` [PATCH v2 06/10] mm: avoid using vma_merge() for new VMAs Lorenzo Stoakes
@ 2024-08-27 11:41 ` Lorenzo Stoakes
2024-08-28 20:52 ` Liam R. Howlett
2024-08-29 19:46 ` Mark Brown
2 siblings, 0 replies; 26+ messages in thread
From: Lorenzo Stoakes @ 2024-08-27 11:41 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton; +Cc: Liam R . Howlett, Vlastimil Babka
On Fri, Aug 23, 2024 at 09:07:01PM GMT, Lorenzo Stoakes wrote:
[snip]
> void unlink_file_vma_batch_init(struct unlink_vma_file_batch *vb)
> @@ -1426,9 +1536,10 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
> struct vm_area_struct *vma = *vmap;
> unsigned long vma_start = vma->vm_start;
> struct mm_struct *mm = vma->vm_mm;
> - struct vm_area_struct *new_vma, *prev;
> + struct vm_area_struct *new_vma;
> bool faulted_in_anon_vma = true;
> VMA_ITERATOR(vmi, mm, addr);
> + VMG_VMA_STATE(vmg, &vmi, NULL, vma, addr, addr + len);
>
> /*
> * If anonymous vma has not yet been faulted, update new pgoff
> @@ -1439,11 +1550,18 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
> faulted_in_anon_vma = false;
> }
>
> - new_vma = find_vma_prev(mm, addr, &prev);
> + new_vma = find_vma_prev(mm, addr, &vmg.prev);
> if (new_vma && new_vma->vm_start < addr + len)
> return NULL; /* should never get here */
>
> - new_vma = vma_merge_new_vma(&vmi, prev, vma, addr, addr + len, pgoff);
> + vmg.vma = NULL; /* New VMA range. */
> + vmg.pgoff = pgoff;
> + vmg.next = vma_next(&vmi);
> + vma_prev(&vmi);
> + vma_iter_next_range(&vmi);
> +
> + new_vma = vma_merge_new_range(&vmg);
> +
> if (new_vma) {
> /*
> * Source vma may have been merged into new_vma
[snip]
Hi Andrew - could you squash the attached fix-patch into this please? There
is an issue with a CONFIG_DEBUG_VM check firing when copy_vma()
unnecessarily moves the VMA iterator, as reported at [0].
Thanks!
[0]: https://lore.kernel.org/linux-mm/202408271452.c842a71d-lkp@intel.com/
----8<----
From 53b41cc9ddfaf30f8a037f466686d942e0e64943 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Tue, 27 Aug 2024 11:59:27 +0100
Subject: [PATCH] mm: only advance iterator if prev exists
If we have no VMAs prior to us, such as in the case where we are
mremap()'ing a VMA backwards, then vma_prev() will wind the iterator back
to 0, and the subsequent vma_iter_next_range() will move it onto the
original range again.
The intent is to position the iterator at or before the gap; we must
therefore avoid this, which is simply addressed by only advancing the
iterator should vma_prev() yield a result.
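Annotated, the corrected positioning in copy_vma() reads as below - this
restates the fix-patch that follows with comments and adds no new logic:

	vmg.next = vma_next(&vmi);	/* Iterator lands on the VMA after the gap. */
	if (vma_prev(&vmi))		/* Step back to the VMA before the gap... */
		vma_iter_next_range(&vmi); /* ...then forward onto the gap itself. */

	new_vma = vma_merge_new_range(&vmg);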
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202408271452.c842a71d-lkp@intel.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/vma.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index 8a5fa15f46a2..7d948edbbb9e 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -1557,8 +1557,8 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
vmg.vma = NULL; /* New VMA range. */
vmg.pgoff = pgoff;
vmg.next = vma_next(&vmi);
- vma_prev(&vmi);
- vma_iter_next_range(&vmi);
+ if (vma_prev(&vmi))
+ vma_iter_next_range(&vmi);
new_vma = vma_merge_new_range(&vmg);
--
2.46.0
* Re: [PATCH v2 01/10] tools: improve vma test Makefile
2024-08-23 20:06 ` [PATCH v2 01/10] tools: improve vma test Makefile Lorenzo Stoakes
@ 2024-08-28 19:16 ` Liam R. Howlett
0 siblings, 0 replies; 26+ messages in thread
From: Liam R. Howlett @ 2024-08-28 19:16 UTC (permalink / raw)
To: Lorenzo Stoakes; +Cc: linux-mm, linux-kernel, Andrew Morton, Vlastimil Babka
* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [240823 16:07]:
> Have vma.o depend on its source dependencies explicitly, as previously
> these were simply being ignored as existing object files were up to date.
>
> This now correctly re-triggers the build if mm/ source is changed as well
> as local source code.
>
> Also set clean as a phony rule.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> tools/testing/vma/Makefile | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/vma/Makefile b/tools/testing/vma/Makefile
> index bfc905d222cf..860fd2311dcc 100644
> --- a/tools/testing/vma/Makefile
> +++ b/tools/testing/vma/Makefile
> @@ -1,6 +1,6 @@
> # SPDX-License-Identifier: GPL-2.0-or-later
>
> -.PHONY: default
> +.PHONY: default clean
>
> default: vma
>
> @@ -9,7 +9,9 @@ include ../shared/shared.mk
> OFILES = $(SHARED_OFILES) vma.o maple-shim.o
> TARGETS = vma
>
> -vma: $(OFILES) vma_internal.h ../../../mm/vma.c ../../../mm/vma.h
> +vma.o: vma.c vma_internal.h ../../../mm/vma.c ../../../mm/vma.h
> +
> +vma: $(OFILES)
> $(CC) $(CFLAGS) -o $@ $(OFILES) $(LDLIBS)
>
> clean:
> --
> 2.46.0
>
* Re: [PATCH v2 02/10] tools: add VMA merge tests
2024-08-23 20:06 ` [PATCH v2 02/10] tools: add VMA merge tests Lorenzo Stoakes
@ 2024-08-28 19:16 ` Liam R. Howlett
0 siblings, 0 replies; 26+ messages in thread
From: Liam R. Howlett @ 2024-08-28 19:16 UTC (permalink / raw)
To: Lorenzo Stoakes; +Cc: linux-mm, linux-kernel, Andrew Morton, Vlastimil Babka
* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [240823 16:07]:
> Add a variety of VMA merge unit tests to assert that the behaviour of VMA
> merge is correct at an abstract level and VMAs are merged or not merged as
> expected.
>
> These are intentionally added _before_ we start refactoring vma_merge() in
> order that we can continually assert correctness throughout the rest of the
> series.
>
> In order to reduce churn going forward, we backport the vma_merge_struct
> data type to the test code which we introduce and use in a future commit,
> and add wrappers around the merge new and existing VMA cases.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> tools/testing/vma/vma.c | 1218 +++++++++++++++++++++++++++++-
> tools/testing/vma/vma_internal.h | 45 +-
> 2 files changed, 1253 insertions(+), 10 deletions(-)
>
> diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
> index 48e033c60d87..9b272633ca9e 100644
> --- a/tools/testing/vma/vma.c
> +++ b/tools/testing/vma/vma.c
> @@ -7,13 +7,43 @@
> #include "maple-shared.h"
> #include "vma_internal.h"
>
> +/* Include so header guard set. */
> +#include "../../../mm/vma.h"
> +
> +static bool fail_prealloc;
> +
> +/* Then override vma_iter_prealloc() so we can choose to fail it. */
> +#define vma_iter_prealloc(vmi, vma) \
> + (fail_prealloc ? -ENOMEM : mas_preallocate(&(vmi)->mas, (vma), GFP_KERNEL))
> +
> /*
> * Directly import the VMA implementation here. Our vma_internal.h wrapper
> * provides userland-equivalent functionality for everything vma.c uses.
> */
> #include "../../../mm/vma.c"
>
> +/*
> + * Temporarily forward-ported from a future in which vmg's are used for merging.
> + */
> +struct vma_merge_struct {
> + struct mm_struct *mm;
> + struct vma_iterator *vmi;
> + pgoff_t pgoff;
> + struct vm_area_struct *prev;
> + struct vm_area_struct *next; /* Modified by vma_merge(). */
> + struct vm_area_struct *vma; /* Either a new VMA or the one being modified. */
> + unsigned long start;
> + unsigned long end;
> + unsigned long flags;
> + struct file *file;
> + struct anon_vma *anon_vma;
> + struct mempolicy *policy;
> + struct vm_userfaultfd_ctx uffd_ctx;
> + struct anon_vma_name *anon_name;
> +};
> +
> const struct vm_operations_struct vma_dummy_vm_ops;
> +static struct anon_vma dummy_anon_vma;
>
> #define ASSERT_TRUE(_expr) \
> do { \
> @@ -28,6 +58,14 @@ const struct vm_operations_struct vma_dummy_vm_ops;
> #define ASSERT_EQ(_val1, _val2) ASSERT_TRUE((_val1) == (_val2))
> #define ASSERT_NE(_val1, _val2) ASSERT_TRUE((_val1) != (_val2))
>
> +static struct task_struct __current;
> +
> +struct task_struct *get_current(void)
> +{
> + return &__current;
> +}
> +
> +/* Helper function to simply allocate a VMA. */
> static struct vm_area_struct *alloc_vma(struct mm_struct *mm,
> unsigned long start,
> unsigned long end,
> @@ -47,22 +85,201 @@ static struct vm_area_struct *alloc_vma(struct mm_struct *mm,
> return ret;
> }
>
> +/* Helper function to allocate a VMA and link it to the tree. */
> +static struct vm_area_struct *alloc_and_link_vma(struct mm_struct *mm,
> + unsigned long start,
> + unsigned long end,
> + pgoff_t pgoff,
> + vm_flags_t flags)
> +{
> + struct vm_area_struct *vma = alloc_vma(mm, start, end, pgoff, flags);
> +
> + if (vma == NULL)
> + return NULL;
> +
> + if (vma_link(mm, vma)) {
> + vm_area_free(vma);
> + return NULL;
> + }
> +
> + /*
> + * Reset this counter which we use to track whether writes have
> + * begun. Linking to the tree will have caused this to be incremented,
> + * which means we will get a false positive otherwise.
> + */
> + vma->vm_lock_seq = -1;
> +
> + return vma;
> +}
> +
> +/* Helper function which provides a wrapper around a merge new VMA operation. */
> +static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
> +{
> + /* vma_merge() needs a VMA to determine mm, anon_vma, and file. */
> + struct vm_area_struct dummy = {
> + .vm_mm = vmg->mm,
> + .vm_flags = vmg->flags,
> + .anon_vma = vmg->anon_vma,
> + .vm_file = vmg->file,
> + };
> +
> + /*
> +	 * For convenience, get the prev and next VMAs, which the new VMA
> +	 * operation requires.
> + */
> + vmg->next = vma_next(vmg->vmi);
> + vmg->prev = vma_prev(vmg->vmi);
> +
> + vma_iter_set(vmg->vmi, vmg->start);
> + return vma_merge_new_vma(vmg->vmi, vmg->prev, &dummy, vmg->start,
> + vmg->end, vmg->pgoff);
> +}
> +
> +/*
> + * Helper function which provides a wrapper around a merge existing VMA
> + * operation.
> + */
> +static struct vm_area_struct *merge_existing(struct vma_merge_struct *vmg)
> +{
> + /* vma_merge() needs a VMA to determine mm, anon_vma, and file. */
> + struct vm_area_struct dummy = {
> + .vm_mm = vmg->mm,
> + .vm_flags = vmg->flags,
> + .anon_vma = vmg->anon_vma,
> + .vm_file = vmg->file,
> + };
> +
> + return vma_merge(vmg->vmi, vmg->prev, &dummy, vmg->start, vmg->end,
> + vmg->flags, vmg->pgoff, vmg->policy, vmg->uffd_ctx,
> + vmg->anon_name);
> +}
> +
> +/*
> + * Helper function which provides a wrapper around the expansion of an existing
> + * VMA.
> + */
> +static int expand_existing(struct vma_merge_struct *vmg)
> +{
> + return vma_expand(vmg->vmi, vmg->vma, vmg->start, vmg->end, vmg->pgoff,
> + vmg->next);
> +}
> +
> +/*
> + * Helper function to reset merge state and the associated VMA iterator to
> + * a specified new range.
> + */
> +static void vmg_set_range(struct vma_merge_struct *vmg, unsigned long start,
> + unsigned long end, pgoff_t pgoff, vm_flags_t flags)
> +{
> + vma_iter_set(vmg->vmi, start);
> +
> + vmg->prev = NULL;
> + vmg->next = NULL;
> + vmg->vma = NULL;
> +
> + vmg->start = start;
> + vmg->end = end;
> + vmg->pgoff = pgoff;
> + vmg->flags = flags;
> +}
> +
> +/*
> + * Helper function to try to merge a new VMA.
> + *
> + * Update vmg and its iterator, then try to merge; otherwise allocate a new
> + * VMA, link it to the maple tree and return it.
> + */
> +static struct vm_area_struct *try_merge_new_vma(struct mm_struct *mm,
> + struct vma_merge_struct *vmg,
> + unsigned long start, unsigned long end,
> + pgoff_t pgoff, vm_flags_t flags,
> + bool *was_merged)
> +{
> + struct vm_area_struct *merged;
> +
> + vmg_set_range(vmg, start, end, pgoff, flags);
> +
> + merged = merge_new(vmg);
> + if (merged) {
> + *was_merged = true;
> + return merged;
> + }
> +
> + *was_merged = false;
> + return alloc_and_link_vma(mm, start, end, pgoff, flags);
> +}
> +
> +/*
> + * Helper function to reset the dummy anon_vma to indicate it has not been
> + * duplicated.
> + */
> +static void reset_dummy_anon_vma(void)
> +{
> + dummy_anon_vma.was_cloned = false;
> + dummy_anon_vma.was_unlinked = false;
> +}
> +
> +/*
> + * Helper function to remove all VMAs and destroy the maple tree associated with
> + * a virtual address space. Returns a count of VMAs in the tree.
> + */
> +static int cleanup_mm(struct mm_struct *mm, struct vma_iterator *vmi)
> +{
> + struct vm_area_struct *vma;
> + int count = 0;
> +
> + fail_prealloc = false;
> + reset_dummy_anon_vma();
> +
> + vma_iter_set(vmi, 0);
> + for_each_vma(*vmi, vma) {
> + vm_area_free(vma);
> + count++;
> + }
> +
> + mtree_destroy(&mm->mm_mt);
> + mm->map_count = 0;
> + return count;
> +}
> +
> +/* Helper function to determine if VMA has had vma_start_write() performed. */
> +static bool vma_write_started(struct vm_area_struct *vma)
> +{
> + int seq = vma->vm_lock_seq;
> +
> + /* We reset after each check. */
> + vma->vm_lock_seq = -1;
> +
> + /* The vma_start_write() stub simply increments this value. */
> + return seq > -1;
> +}
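(For reference, this pairs with the instrumented stub added to
vma_internal.h later in this patch:

	static inline void vma_start_write(struct vm_area_struct *vma)
	{
		/* Bump the sequence so tests can detect that a write began. */
		vma->vm_lock_seq++;
	}

so resetting to -1 after each check gives a cheap "was this VMA
write-locked since I last looked?" probe.)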
> +
> +/* Helper function providing a dummy vm_ops->close() method. */
> +static void dummy_close(struct vm_area_struct *)
> +{
> +}
> +
> static bool test_simple_merge(void)
> {
> struct vm_area_struct *vma;
> unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
> struct mm_struct mm = {};
> struct vm_area_struct *vma_left = alloc_vma(&mm, 0, 0x1000, 0, flags);
> - struct vm_area_struct *vma_middle = alloc_vma(&mm, 0x1000, 0x2000, 1, flags);
> struct vm_area_struct *vma_right = alloc_vma(&mm, 0x2000, 0x3000, 2, flags);
> VMA_ITERATOR(vmi, &mm, 0x1000);
> + struct vma_merge_struct vmg = {
> + .mm = &mm,
> + .vmi = &vmi,
> + .start = 0x1000,
> + .end = 0x2000,
> + .flags = flags,
> + .pgoff = 1,
> + };
>
> ASSERT_FALSE(vma_link(&mm, vma_left));
> - ASSERT_FALSE(vma_link(&mm, vma_middle));
> ASSERT_FALSE(vma_link(&mm, vma_right));
>
> - vma = vma_merge_new_vma(&vmi, vma_left, vma_middle, 0x1000,
> - 0x2000, 1);
> + vma = merge_new(&vmg);
> ASSERT_NE(vma, NULL);
>
> ASSERT_EQ(vma->vm_start, 0);
> @@ -142,10 +359,17 @@ static bool test_simple_expand(void)
> struct mm_struct mm = {};
> struct vm_area_struct *vma = alloc_vma(&mm, 0, 0x1000, 0, flags);
> VMA_ITERATOR(vmi, &mm, 0);
> + struct vma_merge_struct vmg = {
> + .vmi = &vmi,
> + .vma = vma,
> + .start = 0,
> + .end = 0x3000,
> + .pgoff = 0,
> + };
>
> ASSERT_FALSE(vma_link(&mm, vma));
>
> - ASSERT_FALSE(vma_expand(&vmi, vma, 0, 0x3000, 0, NULL));
> + ASSERT_FALSE(expand_existing(&vmg));
>
> ASSERT_EQ(vma->vm_start, 0);
> ASSERT_EQ(vma->vm_end, 0x3000);
> @@ -178,6 +402,980 @@ static bool test_simple_shrink(void)
> return true;
> }
>
> +static bool test_merge_new(void)
> +{
> + unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
> + struct mm_struct mm = {};
> + VMA_ITERATOR(vmi, &mm, 0);
> + struct vma_merge_struct vmg = {
> + .mm = &mm,
> + .vmi = &vmi,
> + };
> + struct anon_vma_chain dummy_anon_vma_chain_a = {
> + .anon_vma = &dummy_anon_vma,
> + };
> + struct anon_vma_chain dummy_anon_vma_chain_b = {
> + .anon_vma = &dummy_anon_vma,
> + };
> + struct anon_vma_chain dummy_anon_vma_chain_c = {
> + .anon_vma = &dummy_anon_vma,
> + };
> + struct anon_vma_chain dummy_anon_vma_chain_d = {
> + .anon_vma = &dummy_anon_vma,
> + };
> + int count;
> + struct vm_area_struct *vma, *vma_a, *vma_b, *vma_c, *vma_d;
> + bool merged;
> +
> + /*
> + * 0123456789abc
> + * AA B CC
> + */
> + vma_a = alloc_and_link_vma(&mm, 0, 0x2000, 0, flags);
> + ASSERT_NE(vma_a, NULL);
> + /* We give each VMA a single avc so we can test anon_vma duplication. */
> + INIT_LIST_HEAD(&vma_a->anon_vma_chain);
> + list_add(&dummy_anon_vma_chain_a.same_vma, &vma_a->anon_vma_chain);
> +
> + vma_b = alloc_and_link_vma(&mm, 0x3000, 0x4000, 3, flags);
> + ASSERT_NE(vma_b, NULL);
> + INIT_LIST_HEAD(&vma_b->anon_vma_chain);
> + list_add(&dummy_anon_vma_chain_b.same_vma, &vma_b->anon_vma_chain);
> +
> + vma_c = alloc_and_link_vma(&mm, 0xb000, 0xc000, 0xb, flags);
> + ASSERT_NE(vma_c, NULL);
> + INIT_LIST_HEAD(&vma_c->anon_vma_chain);
> + list_add(&dummy_anon_vma_chain_c.same_vma, &vma_c->anon_vma_chain);
> +
> + /*
> + * NO merge.
> + *
> + * 0123456789abc
> + * AA B ** CC
> + */
> + vma_d = try_merge_new_vma(&mm, &vmg, 0x7000, 0x9000, 7, flags, &merged);
> + ASSERT_NE(vma_d, NULL);
> + INIT_LIST_HEAD(&vma_d->anon_vma_chain);
> + list_add(&dummy_anon_vma_chain_d.same_vma, &vma_d->anon_vma_chain);
> + ASSERT_FALSE(merged);
> + ASSERT_EQ(mm.map_count, 4);
> +
> + /*
> + * Merge BOTH sides.
> + *
> + * 0123456789abc
> + * AA*B DD CC
> + */
> + vma_b->anon_vma = &dummy_anon_vma;
> + vma = try_merge_new_vma(&mm, &vmg, 0x2000, 0x3000, 2, flags, &merged);
> + ASSERT_EQ(vma, vma_a);
> + /* Merge with A, delete B. */
> + ASSERT_TRUE(merged);
> + ASSERT_EQ(vma->vm_start, 0);
> + ASSERT_EQ(vma->vm_end, 0x4000);
> + ASSERT_EQ(vma->vm_pgoff, 0);
> + ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
> + ASSERT_TRUE(vma_write_started(vma));
> + ASSERT_EQ(mm.map_count, 3);
> +
> + /*
> + * Merge to PREVIOUS VMA.
> + *
> + * 0123456789abc
> + * AAAA* DD CC
> + */
> + vma = try_merge_new_vma(&mm, &vmg, 0x4000, 0x5000, 4, flags, &merged);
> + ASSERT_EQ(vma, vma_a);
> + /* Extend A. */
> + ASSERT_TRUE(merged);
> + ASSERT_EQ(vma->vm_start, 0);
> + ASSERT_EQ(vma->vm_end, 0x5000);
> + ASSERT_EQ(vma->vm_pgoff, 0);
> + ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
> + ASSERT_TRUE(vma_write_started(vma));
> + ASSERT_EQ(mm.map_count, 3);
> +
> + /*
> + * Merge to NEXT VMA.
> + *
> + * 0123456789abc
> + * AAAAA *DD CC
> + */
> + vma_d->anon_vma = &dummy_anon_vma;
> + vma = try_merge_new_vma(&mm, &vmg, 0x6000, 0x7000, 6, flags, &merged);
> + ASSERT_EQ(vma, vma_d);
> + /* Prepend. */
> + ASSERT_TRUE(merged);
> + ASSERT_EQ(vma->vm_start, 0x6000);
> + ASSERT_EQ(vma->vm_end, 0x9000);
> + ASSERT_EQ(vma->vm_pgoff, 6);
> + ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
> + ASSERT_TRUE(vma_write_started(vma));
> + ASSERT_EQ(mm.map_count, 3);
> +
> + /*
> + * Merge BOTH sides.
> + *
> + * 0123456789abc
> + * AAAAA*DDD CC
> + */
> + vma = try_merge_new_vma(&mm, &vmg, 0x5000, 0x6000, 5, flags, &merged);
> + ASSERT_EQ(vma, vma_a);
> + /* Merge with A, delete D. */
> + ASSERT_TRUE(merged);
> + ASSERT_EQ(vma->vm_start, 0);
> + ASSERT_EQ(vma->vm_end, 0x9000);
> + ASSERT_EQ(vma->vm_pgoff, 0);
> + ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
> + ASSERT_TRUE(vma_write_started(vma));
> + ASSERT_EQ(mm.map_count, 2);
> +
> + /*
> + * Merge to NEXT VMA.
> + *
> + * 0123456789abc
> + * AAAAAAAAA *CC
> + */
> + vma_c->anon_vma = &dummy_anon_vma;
> + vma = try_merge_new_vma(&mm, &vmg, 0xa000, 0xb000, 0xa, flags, &merged);
> + ASSERT_EQ(vma, vma_c);
> + /* Prepend C. */
> + ASSERT_TRUE(merged);
> + ASSERT_EQ(vma->vm_start, 0xa000);
> + ASSERT_EQ(vma->vm_end, 0xc000);
> + ASSERT_EQ(vma->vm_pgoff, 0xa);
> + ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
> + ASSERT_TRUE(vma_write_started(vma));
> + ASSERT_EQ(mm.map_count, 2);
> +
> + /*
> + * Merge BOTH sides.
> + *
> + * 0123456789abc
> + * AAAAAAAAA*CCC
> + */
> + vma = try_merge_new_vma(&mm, &vmg, 0x9000, 0xa000, 0x9, flags, &merged);
> + ASSERT_EQ(vma, vma_a);
> + /* Extend A and delete C. */
> + ASSERT_TRUE(merged);
> + ASSERT_EQ(vma->vm_start, 0);
> + ASSERT_EQ(vma->vm_end, 0xc000);
> + ASSERT_EQ(vma->vm_pgoff, 0);
> + ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
> + ASSERT_TRUE(vma_write_started(vma));
> + ASSERT_EQ(mm.map_count, 1);
> +
> + /*
> + * Final state.
> + *
> + * 0123456789abc
> + * AAAAAAAAAAAAA
> + */
> +
> + count = 0;
> + vma_iter_set(&vmi, 0);
> + for_each_vma(vmi, vma) {
> + ASSERT_NE(vma, NULL);
> + ASSERT_EQ(vma->vm_start, 0);
> + ASSERT_EQ(vma->vm_end, 0xc000);
> + ASSERT_EQ(vma->vm_pgoff, 0);
> + ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
> +
> + vm_area_free(vma);
> + count++;
> + }
> +
> +	/* Should only have one VMA left (though freed) after all is done. */
> + ASSERT_EQ(count, 1);
> +
> + mtree_destroy(&mm.mm_mt);
> + return true;
> +}
> +
> +static bool test_vma_merge_special_flags(void)
> +{
> + unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
> + struct mm_struct mm = {};
> + VMA_ITERATOR(vmi, &mm, 0);
> + struct vma_merge_struct vmg = {
> + .mm = &mm,
> + .vmi = &vmi,
> + };
> + vm_flags_t special_flags[] = { VM_IO, VM_DONTEXPAND, VM_PFNMAP, VM_MIXEDMAP };
> + vm_flags_t all_special_flags = 0;
> + int i;
> + struct vm_area_struct *vma_left, *vma;
> +
> + /* Make sure there aren't new VM_SPECIAL flags. */
> + for (i = 0; i < ARRAY_SIZE(special_flags); i++) {
> + all_special_flags |= special_flags[i];
> + }
> + ASSERT_EQ(all_special_flags, VM_SPECIAL);
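(For reference, in include/linux/mm.h:

	#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)

so this assertion will trip if a new flag ever joins the set without the
test being updated.)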
> +
> + /*
> + * 01234
> + * AAA
> + */
> + vma_left = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
> + ASSERT_NE(vma_left, NULL);
> +
> + /* 1. Set up new VMA with special flag that would otherwise merge. */
> +
> + /*
> + * 01234
> + * AAA*
> + *
> + * This should merge if not for the VM_SPECIAL flag.
> + */
> + vmg_set_range(&vmg, 0x3000, 0x4000, 3, flags);
> + for (i = 0; i < ARRAY_SIZE(special_flags); i++) {
> + vm_flags_t special_flag = special_flags[i];
> +
> + vma_left->__vm_flags = flags | special_flag;
> + vmg.flags = flags | special_flag;
> + vma = merge_new(&vmg);
> + ASSERT_EQ(vma, NULL);
> + }
> +
> + /* 2. Modify VMA with special flag that would otherwise merge. */
> +
> + /*
> + * 01234
> + * AAAB
> + *
> + * Create a VMA to modify.
> + */
> + vma = alloc_and_link_vma(&mm, 0x3000, 0x4000, 3, flags);
> + ASSERT_NE(vma, NULL);
> + vmg.vma = vma;
> +
> + for (i = 0; i < ARRAY_SIZE(special_flags); i++) {
> + vm_flags_t special_flag = special_flags[i];
> +
> + vma_left->__vm_flags = flags | special_flag;
> + vmg.flags = flags | special_flag;
> + vma = merge_existing(&vmg);
> + ASSERT_EQ(vma, NULL);
> + }
> +
> + cleanup_mm(&mm, &vmi);
> + return true;
> +}
> +
> +static bool test_vma_merge_with_close(void)
> +{
> + unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
> + struct mm_struct mm = {};
> + VMA_ITERATOR(vmi, &mm, 0);
> + struct vma_merge_struct vmg = {
> + .mm = &mm,
> + .vmi = &vmi,
> + };
> + const struct vm_operations_struct vm_ops = {
> + .close = dummy_close,
> + };
> + struct vm_area_struct *vma_next =
> + alloc_and_link_vma(&mm, 0x2000, 0x3000, 2, flags);
> + struct vm_area_struct *vma;
> +
> + /*
> + * When we merge VMAs we sometimes have to delete others as part of the
> + * operation.
> + *
> + * Considering the two possible adjacent VMAs to which a VMA can be
> + * merged:
> + *
> + * [ prev ][ vma ][ next ]
> + *
> + * In no case will we need to delete prev. If the operation is
> + * mergeable, then prev will be extended with one or both of vma and
> + * next deleted.
> + *
> + * As a result, during initial mergeability checks, only
> + * can_vma_merge_before() (which implies the VMA being merged with is
> +	 * 'next' as shown above) bothers to check whether the next VMA
> + * has a vm_ops->close() callback that will need to be called when
> + * removed.
> + *
> + * If it does, then we cannot merge as the resources that the close()
> + * operation potentially clears down are tied only to the existing VMA
> +	 * range and we have no way of extending those to the newly merged one.
> + *
> + * We must consider two scenarios:
> + *
> + * A.
> + *
> + * vm_ops->close: - - !NULL
> + * [ prev ][ vma ][ next ]
> + *
> + * Where prev may or may not be present/mergeable.
> + *
> + * This is picked up by a specific check in can_vma_merge_before().
> + *
> + * B.
> + *
> + * vm_ops->close: - !NULL
> + * [ prev ][ vma ]
> + *
> + * Where prev and vma are present and mergeable.
> + *
> + * This is picked up by a specific check in the modified VMA merge.
> + *
> + * IMPORTANT NOTE: We make the assumption that the following case:
> + *
> + * - !NULL NULL
> + * [ prev ][ vma ][ next ]
> + *
> + * Cannot occur, because vma->vm_ops being the same implies the same
> + * vma->vm_file, and therefore this would mean that next->vm_ops->close
> + * would be set too, and thus scenario A would pick this up.
> + */
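To make the 'tied resources' point concrete, a hypothetical driver might
look like this (illustrative only, not from this series):

	/* Resource is sized to exactly [vm_start, vm_end). */
	static void my_close(struct vm_area_struct *vma)
	{
		release_range_resource(vma->vm_private_data);
	}

	static const struct vm_operations_struct my_vm_ops = {
		.close = my_close,
	};

Deleting such a VMA as part of a merge would release the resource while
the merged range lives on, hence the refusal to merge.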
> +
> + ASSERT_NE(vma_next, NULL);
> +
> + /*
> + * SCENARIO A
> + *
> + * 0123
> + * *N
> + */
> +
> + /* Make the next VMA have a close() callback. */
> + vma_next->vm_ops = &vm_ops;
> +
> + /* Our proposed VMA has characteristics that would otherwise be merged. */
> + vmg_set_range(&vmg, 0x1000, 0x2000, 1, flags);
> +
> +	/* The next VMA having a close() operation should cause the merge to fail. */
> + ASSERT_EQ(merge_new(&vmg), NULL);
> +
> + /* Now create the VMA so we can merge via modified flags */
> + vmg_set_range(&vmg, 0x1000, 0x2000, 1, flags);
> + vma = alloc_and_link_vma(&mm, 0x1000, 0x2000, 1, flags);
> + vmg.vma = vma;
> +
> + /*
> + * The VMA being modified in a way that would otherwise merge should
> + * also fail.
> + */
> + ASSERT_EQ(merge_existing(&vmg), NULL);
> +
> + /* SCENARIO B
> + *
> + * 0123
> + * P*
> + *
> + * In order for this scenario to trigger, the VMA currently being
> + * modified must also have a .close().
> + */
> +
> + /* Reset VMG state. */
> + vmg_set_range(&vmg, 0x1000, 0x2000, 1, flags);
> + /*
> + * Make next unmergeable, and don't let the scenario A check pick this
> +	 * up, as we want to reproduce scenario B only.
> + */
> + vma_next->vm_ops = NULL;
> + vma_next->__vm_flags &= ~VM_MAYWRITE;
> + /* Allocate prev. */
> + vmg.prev = alloc_and_link_vma(&mm, 0, 0x1000, 0, flags);
> + /* Assign a vm_ops->close() function to VMA explicitly. */
> + vma->vm_ops = &vm_ops;
> + vmg.vma = vma;
> + /* Make sure merge does not occur. */
> + ASSERT_EQ(merge_existing(&vmg), NULL);
> +
> + cleanup_mm(&mm, &vmi);
> + return true;
> +}
> +
> +static bool test_vma_merge_new_with_close(void)
> +{
> + unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
> + struct mm_struct mm = {};
> + VMA_ITERATOR(vmi, &mm, 0);
> + struct vma_merge_struct vmg = {
> + .mm = &mm,
> + .vmi = &vmi,
> + };
> + struct vm_area_struct *vma_prev = alloc_and_link_vma(&mm, 0, 0x2000, 0, flags);
> + struct vm_area_struct *vma_next = alloc_and_link_vma(&mm, 0x5000, 0x7000, 5, flags);
> + const struct vm_operations_struct vm_ops = {
> + .close = dummy_close,
> + };
> + struct vm_area_struct *vma;
> +
> + /*
> + * We should allow the partial merge of a proposed new VMA if the
> + * surrounding VMAs have vm_ops->close() hooks (but are otherwise
> + * compatible), e.g.:
> + *
> + * New VMA
> + * A v-------v B
> + * |-----| |-----|
> + * close close
> + *
> + * Since the rule is to not DELETE a VMA with a close operation, this
> +	 * should be permitted. However, rather than expanding A and deleting B,
> +	 * we should simply expand A and leave B intact, e.g.:
> + *
> + * New VMA
> + * A B
> + * |------------||-----|
> + * close close
> + */
> +
> + /* Have prev and next have a vm_ops->close() hook. */
> + vma_prev->vm_ops = &vm_ops;
> + vma_next->vm_ops = &vm_ops;
> +
> + vmg_set_range(&vmg, 0x2000, 0x5000, 2, flags);
> + vma = merge_new(&vmg);
> + ASSERT_NE(vma, NULL);
> + ASSERT_EQ(vma->vm_start, 0);
> + ASSERT_EQ(vma->vm_end, 0x5000);
> + ASSERT_EQ(vma->vm_pgoff, 0);
> + ASSERT_EQ(vma->vm_ops, &vm_ops);
> + ASSERT_TRUE(vma_write_started(vma));
> + ASSERT_EQ(mm.map_count, 2);
> +
> + cleanup_mm(&mm, &vmi);
> + return true;
> +}
> +
> +static bool test_merge_existing(void)
> +{
> + unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
> + struct mm_struct mm = {};
> + VMA_ITERATOR(vmi, &mm, 0);
> + struct vm_area_struct *vma, *vma_prev, *vma_next;
> + struct vma_merge_struct vmg = {
> + .mm = &mm,
> + .vmi = &vmi,
> + };
> +
> + /*
> + * Merge right case - partial span.
> + *
> + * <->
> + * 0123456789
> + * VVVVNNN
> + * ->
> + * 0123456789
> + * VNNNNNN
> + */
> + vma = alloc_and_link_vma(&mm, 0x2000, 0x6000, 2, flags);
> + vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, flags);
> + vmg_set_range(&vmg, 0x3000, 0x6000, 3, flags);
> + vmg.vma = vma;
> + vmg.prev = vma;
> + vma->anon_vma = &dummy_anon_vma;
> + ASSERT_EQ(merge_existing(&vmg), vma_next);
> + ASSERT_EQ(vma_next->vm_start, 0x3000);
> + ASSERT_EQ(vma_next->vm_end, 0x9000);
> + ASSERT_EQ(vma_next->vm_pgoff, 3);
> + ASSERT_EQ(vma_next->anon_vma, &dummy_anon_vma);
> + ASSERT_EQ(vma->vm_start, 0x2000);
> + ASSERT_EQ(vma->vm_end, 0x3000);
> + ASSERT_EQ(vma->vm_pgoff, 2);
> + ASSERT_TRUE(vma_write_started(vma));
> + ASSERT_TRUE(vma_write_started(vma_next));
> + ASSERT_EQ(mm.map_count, 2);
> +
> + /* Clear down and reset. */
> + ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
> +
> + /*
> + * Merge right case - full span.
> + *
> + * <-->
> + * 0123456789
> + * VVVVNNN
> + * ->
> + * 0123456789
> + * NNNNNNN
> + */
> + vma = alloc_and_link_vma(&mm, 0x2000, 0x6000, 2, flags);
> + vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, flags);
> + vmg_set_range(&vmg, 0x2000, 0x6000, 2, flags);
> + vmg.vma = vma;
> + vma->anon_vma = &dummy_anon_vma;
> + ASSERT_EQ(merge_existing(&vmg), vma_next);
> + ASSERT_EQ(vma_next->vm_start, 0x2000);
> + ASSERT_EQ(vma_next->vm_end, 0x9000);
> + ASSERT_EQ(vma_next->vm_pgoff, 2);
> + ASSERT_EQ(vma_next->anon_vma, &dummy_anon_vma);
> + ASSERT_TRUE(vma_write_started(vma_next));
> + ASSERT_EQ(mm.map_count, 1);
> +
> + /* Clear down and reset. We should have deleted vma. */
> + ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
> +
> + /*
> + * Merge left case - partial span.
> + *
> + * <->
> + * 0123456789
> + * PPPVVVV
> + * ->
> + * 0123456789
> + * PPPPPPV
> + */
> + vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
> + vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
> + vmg_set_range(&vmg, 0x3000, 0x6000, 3, flags);
> + vmg.prev = vma_prev;
> + vmg.vma = vma;
> + vma->anon_vma = &dummy_anon_vma;
> +
> + ASSERT_EQ(merge_existing(&vmg), vma_prev);
> + ASSERT_EQ(vma_prev->vm_start, 0);
> + ASSERT_EQ(vma_prev->vm_end, 0x6000);
> + ASSERT_EQ(vma_prev->vm_pgoff, 0);
> + ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
> + ASSERT_EQ(vma->vm_start, 0x6000);
> + ASSERT_EQ(vma->vm_end, 0x7000);
> + ASSERT_EQ(vma->vm_pgoff, 6);
> + ASSERT_TRUE(vma_write_started(vma_prev));
> + ASSERT_TRUE(vma_write_started(vma));
> + ASSERT_EQ(mm.map_count, 2);
> +
> + /* Clear down and reset. */
> + ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
> +
> + /*
> + * Merge left case - full span.
> + *
> + * <-->
> + * 0123456789
> + * PPPVVVV
> + * ->
> + * 0123456789
> + * PPPPPPP
> + */
> + vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
> + vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
> + vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
> + vmg.prev = vma_prev;
> + vmg.vma = vma;
> + vma->anon_vma = &dummy_anon_vma;
> + ASSERT_EQ(merge_existing(&vmg), vma_prev);
> + ASSERT_EQ(vma_prev->vm_start, 0);
> + ASSERT_EQ(vma_prev->vm_end, 0x7000);
> + ASSERT_EQ(vma_prev->vm_pgoff, 0);
> + ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
> + ASSERT_TRUE(vma_write_started(vma_prev));
> + ASSERT_EQ(mm.map_count, 1);
> +
> + /* Clear down and reset. We should have deleted vma. */
> + ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
> +
> + /*
> + * Merge both case.
> + *
> + * <-->
> + * 0123456789
> + * PPPVVVVNNN
> + * ->
> + * 0123456789
> + * PPPPPPPPPP
> + */
> + vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
> + vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
> + vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, flags);
> + vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
> + vmg.prev = vma_prev;
> + vmg.vma = vma;
> + vma->anon_vma = &dummy_anon_vma;
> + ASSERT_EQ(merge_existing(&vmg), vma_prev);
> + ASSERT_EQ(vma_prev->vm_start, 0);
> + ASSERT_EQ(vma_prev->vm_end, 0x9000);
> + ASSERT_EQ(vma_prev->vm_pgoff, 0);
> + ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
> + ASSERT_TRUE(vma_write_started(vma_prev));
> + ASSERT_EQ(mm.map_count, 1);
> +
> + /* Clear down and reset. We should have deleted prev and next. */
> + ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
> +
> + /*
> +	 * Non-merge ranges. The modified VMA merge operation assumes that the
> +	 * caller always specifies ranges within the input VMA, so we need only
> + * examine these cases.
> + *
> + * -
> + * -
> + * -
> + * <->
> + * <>
> + * <>
> + * 0123456789a
> + * PPPVVVVVNNN
> + */
> +
> + vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
> + vma = alloc_and_link_vma(&mm, 0x3000, 0x8000, 3, flags);
> + vma_next = alloc_and_link_vma(&mm, 0x8000, 0xa000, 8, flags);
> +
> + vmg_set_range(&vmg, 0x4000, 0x5000, 4, flags);
> + vmg.prev = vma;
> + vmg.vma = vma;
> + ASSERT_EQ(merge_existing(&vmg), NULL);
> +
> + vmg_set_range(&vmg, 0x5000, 0x6000, 5, flags);
> + vmg.prev = vma;
> + vmg.vma = vma;
> + ASSERT_EQ(merge_existing(&vmg), NULL);
> +
> + vmg_set_range(&vmg, 0x6000, 0x7000, 6, flags);
> + vmg.prev = vma;
> + vmg.vma = vma;
> + ASSERT_EQ(merge_existing(&vmg), NULL);
> +
> + vmg_set_range(&vmg, 0x4000, 0x7000, 4, flags);
> + vmg.prev = vma;
> + vmg.vma = vma;
> + ASSERT_EQ(merge_existing(&vmg), NULL);
> +
> + vmg_set_range(&vmg, 0x4000, 0x6000, 4, flags);
> + vmg.prev = vma;
> + vmg.vma = vma;
> + ASSERT_EQ(merge_existing(&vmg), NULL);
> +
> + vmg_set_range(&vmg, 0x5000, 0x6000, 5, flags);
> + vmg.prev = vma;
> + vmg.vma = vma;
> + ASSERT_EQ(merge_existing(&vmg), NULL);
> +
> + ASSERT_EQ(cleanup_mm(&mm, &vmi), 3);
> +
> + return true;
> +}
> +
> +static bool test_anon_vma_non_mergeable(void)
> +{
> + unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
> + struct mm_struct mm = {};
> + VMA_ITERATOR(vmi, &mm, 0);
> + struct vm_area_struct *vma, *vma_prev, *vma_next;
> + struct vma_merge_struct vmg = {
> + .mm = &mm,
> + .vmi = &vmi,
> + };
> + struct anon_vma_chain dummy_anon_vma_chain1 = {
> + .anon_vma = &dummy_anon_vma,
> + };
> + struct anon_vma_chain dummy_anon_vma_chain2 = {
> + .anon_vma = &dummy_anon_vma,
> + };
> +
> + /*
> + * In the case of modified VMA merge, merging both left and right VMAs
> + * but where prev and next have incompatible anon_vma objects, we revert
> + * to a merge of prev and VMA:
> + *
> + * <-->
> + * 0123456789
> + * PPPVVVVNNN
> + * ->
> + * 0123456789
> + * PPPPPPPNNN
> + */
> + vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
> + vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
> + vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, flags);
> +
> + /*
> +	 * Give both prev and next a single anon_vma_chain entry, so they will
> + * merge with the NULL vmg->anon_vma.
> + *
> + * However, when prev is compared to next, the merge should fail.
> + */
> +
> + INIT_LIST_HEAD(&vma_prev->anon_vma_chain);
> + list_add(&dummy_anon_vma_chain1.same_vma, &vma_prev->anon_vma_chain);
> + ASSERT_TRUE(list_is_singular(&vma_prev->anon_vma_chain));
> + vma_prev->anon_vma = &dummy_anon_vma;
> + ASSERT_TRUE(is_mergeable_anon_vma(NULL, vma_prev->anon_vma, vma_prev));
> +
> + INIT_LIST_HEAD(&vma_next->anon_vma_chain);
> + list_add(&dummy_anon_vma_chain2.same_vma, &vma_next->anon_vma_chain);
> + ASSERT_TRUE(list_is_singular(&vma_next->anon_vma_chain));
> + vma_next->anon_vma = (struct anon_vma *)2;
> + ASSERT_TRUE(is_mergeable_anon_vma(NULL, vma_next->anon_vma, vma_next));
> +
> + ASSERT_FALSE(is_mergeable_anon_vma(vma_prev->anon_vma, vma_next->anon_vma, NULL));
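(As I read is_mergeable_anon_vma(), a NULL/non-NULL pairing is permitted
only when the VMA's anon_vma_chain is empty or singular, and two non-NULL
anon_vmas must be strictly equal - roughly:

	if ((!anon_vma1 || !anon_vma2) &&
	    (!vma || list_is_singular(&vma->anon_vma_chain)))
		return true;	/* the NULL side can adopt the other */
	return anon_vma1 == anon_vma2;

hence the singular-list setup above and this cross-check failing for two
distinct anon_vmas.)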
> +
> + vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
> + vmg.prev = vma_prev;
> + vmg.vma = vma;
> +
> + ASSERT_EQ(merge_existing(&vmg), vma_prev);
> + ASSERT_EQ(vma_prev->vm_start, 0);
> + ASSERT_EQ(vma_prev->vm_end, 0x7000);
> + ASSERT_EQ(vma_prev->vm_pgoff, 0);
> + ASSERT_TRUE(vma_write_started(vma_prev));
> + ASSERT_FALSE(vma_write_started(vma_next));
> +
> + /* Clear down and reset. */
> + ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
> +
> + /*
> +	 * Now consider the new VMA case. This is equivalent, except that we add
> +	 * a new VMA in the gap between prev and next.
> + *
> + * <-->
> + * 0123456789
> + * PPP****NNN
> + * ->
> + * 0123456789
> + * PPPPPPPNNN
> + */
> + vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
> + vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, flags);
> +
> + INIT_LIST_HEAD(&vma_prev->anon_vma_chain);
> + list_add(&dummy_anon_vma_chain1.same_vma, &vma_prev->anon_vma_chain);
> + vma_prev->anon_vma = (struct anon_vma *)1;
> +
> + INIT_LIST_HEAD(&vma_next->anon_vma_chain);
> + list_add(&dummy_anon_vma_chain2.same_vma, &vma_next->anon_vma_chain);
> + vma_next->anon_vma = (struct anon_vma *)2;
> +
> + vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
> + vmg.prev = vma_prev;
> +
> + ASSERT_EQ(merge_new(&vmg), vma_prev);
> + ASSERT_EQ(vma_prev->vm_start, 0);
> + ASSERT_EQ(vma_prev->vm_end, 0x7000);
> + ASSERT_EQ(vma_prev->vm_pgoff, 0);
> + ASSERT_TRUE(vma_write_started(vma_prev));
> + ASSERT_FALSE(vma_write_started(vma_next));
> +
> + /* Final cleanup. */
> + ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
> +
> + return true;
> +}
> +
> +static bool test_dup_anon_vma(void)
> +{
> + unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
> + struct mm_struct mm = {};
> + VMA_ITERATOR(vmi, &mm, 0);
> + struct vma_merge_struct vmg = {
> + .mm = &mm,
> + .vmi = &vmi,
> + };
> + struct anon_vma_chain dummy_anon_vma_chain = {
> + .anon_vma = &dummy_anon_vma,
> + };
> + struct vm_area_struct *vma_prev, *vma_next, *vma;
> +
> + reset_dummy_anon_vma();
> +
> + /*
> +	 * Expanding a VMA to delete the next one duplicates next's anon_vma and
> + * assigns it to the expanded VMA.
> + *
> + * This covers new VMA merging, as these operations amount to a VMA
> + * expand.
> + */
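(Observable because of the instrumented stub added to vma_internal.h
later in this patch, roughly:

	static inline int anon_vma_clone(struct vm_area_struct *dst,
					 struct vm_area_struct *src)
	{
		if (src->anon_vma != NULL) {
			dst->anon_vma = src->anon_vma;
			dst->anon_vma->was_cloned = true;
		}
		return 0;
	}

which is what the was_cloned assertions below key off.)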
> + vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
> + vma_next = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
> + vma_next->anon_vma = &dummy_anon_vma;
> +
> + vmg_set_range(&vmg, 0, 0x5000, 0, flags);
> + vmg.vma = vma_prev;
> + vmg.next = vma_next;
> +
> + ASSERT_EQ(expand_existing(&vmg), 0);
> +
> + /* Will have been cloned. */
> + ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
> + ASSERT_TRUE(vma_prev->anon_vma->was_cloned);
> +
> + /* Cleanup ready for next run. */
> + cleanup_mm(&mm, &vmi);
> +
> + /*
> + * next has anon_vma, we assign to prev.
> + *
> + * |<----->|
> + * |-------*********-------|
> + * prev vma next
> + * extend delete delete
> + */
> +
> + vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
> + vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
> + vma_next = alloc_and_link_vma(&mm, 0x5000, 0x8000, 5, flags);
> +
> + /* Initialise avc so mergeability check passes. */
> + INIT_LIST_HEAD(&vma_next->anon_vma_chain);
> + list_add(&dummy_anon_vma_chain.same_vma, &vma_next->anon_vma_chain);
> +
> + vma_next->anon_vma = &dummy_anon_vma;
> + vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
> + vmg.prev = vma_prev;
> + vmg.vma = vma;
> +
> + ASSERT_EQ(merge_existing(&vmg), vma_prev);
> +
> + ASSERT_EQ(vma_prev->vm_start, 0);
> + ASSERT_EQ(vma_prev->vm_end, 0x8000);
> +
> + ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
> + ASSERT_TRUE(vma_prev->anon_vma->was_cloned);
> +
> + cleanup_mm(&mm, &vmi);
> +
> + /*
> + * vma has anon_vma, we assign to prev.
> + *
> + * |<----->|
> + * |-------*********-------|
> + * prev vma next
> + * extend delete delete
> + */
> +
> + vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
> + vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
> + vma_next = alloc_and_link_vma(&mm, 0x5000, 0x8000, 5, flags);
> +
> + vma->anon_vma = &dummy_anon_vma;
> + vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
> + vmg.prev = vma_prev;
> + vmg.vma = vma;
> +
> + ASSERT_EQ(merge_existing(&vmg), vma_prev);
> +
> + ASSERT_EQ(vma_prev->vm_start, 0);
> + ASSERT_EQ(vma_prev->vm_end, 0x8000);
> +
> + ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
> + ASSERT_TRUE(vma_prev->anon_vma->was_cloned);
> +
> + cleanup_mm(&mm, &vmi);
> +
> + /*
> + * vma has anon_vma, we assign to prev.
> + *
> + * |<----->|
> + * |-------*************
> + * prev vma
> + * extend shrink/delete
> + */
> +
> + vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
> + vma = alloc_and_link_vma(&mm, 0x3000, 0x8000, 3, flags);
> +
> + vma->anon_vma = &dummy_anon_vma;
> + vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
> + vmg.prev = vma_prev;
> + vmg.vma = vma;
> +
> + ASSERT_EQ(merge_existing(&vmg), vma_prev);
> +
> + ASSERT_EQ(vma_prev->vm_start, 0);
> + ASSERT_EQ(vma_prev->vm_end, 0x5000);
> +
> + ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
> + ASSERT_TRUE(vma_prev->anon_vma->was_cloned);
> +
> + cleanup_mm(&mm, &vmi);
> +
> + /*
> + * vma has anon_vma, we assign to next.
> + *
> + * |<----->|
> + * *************-------|
> + * vma next
> + * shrink/delete extend
> + */
> +
> + vma = alloc_and_link_vma(&mm, 0, 0x5000, 0, flags);
> + vma_next = alloc_and_link_vma(&mm, 0x5000, 0x8000, 5, flags);
> +
> + vma->anon_vma = &dummy_anon_vma;
> + vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
> + vmg.prev = vma;
> + vmg.vma = vma;
> +
> + ASSERT_EQ(merge_existing(&vmg), vma_next);
> +
> + ASSERT_EQ(vma_next->vm_start, 0x3000);
> + ASSERT_EQ(vma_next->vm_end, 0x8000);
> +
> + ASSERT_EQ(vma_next->anon_vma, &dummy_anon_vma);
> + ASSERT_TRUE(vma_next->anon_vma->was_cloned);
> +
> + cleanup_mm(&mm, &vmi);
> + return true;
> +}
> +
> +static bool test_vmi_prealloc_fail(void)
> +{
> + unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
> + struct mm_struct mm = {};
> + VMA_ITERATOR(vmi, &mm, 0);
> + struct vma_merge_struct vmg = {
> + .mm = &mm,
> + .vmi = &vmi,
> + };
> + struct vm_area_struct *vma_prev, *vma;
> +
> + /*
> + * We are merging vma into prev, with vma possessing an anon_vma, which
> + * will be duplicated. We cause the vmi preallocation to fail and assert
> + * the duplicated anon_vma is unlinked.
> + */
> +
> + vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
> + vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
> + vma->anon_vma = &dummy_anon_vma;
> +
> + vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
> + vmg.prev = vma_prev;
> + vmg.vma = vma;
> +
> + fail_prealloc = true;
> +
> + /* This will cause the merge to fail. */
> + ASSERT_EQ(merge_existing(&vmg), NULL);
> + /* We will already have assigned the anon_vma. */
> + ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
> + /* And it was both cloned and unlinked. */
> + ASSERT_TRUE(dummy_anon_vma.was_cloned);
> + ASSERT_TRUE(dummy_anon_vma.was_unlinked);
> +
> + cleanup_mm(&mm, &vmi); /* Resets fail_prealloc too. */
> +
> + /*
> + * We repeat the same operation for expanding a VMA, which is what new
> +	 * VMA merging ultimately uses too. This asserts that unlinking is
> +	 * performed in this case as well.
> + */
> +
> + vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
> + vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
> + vma->anon_vma = &dummy_anon_vma;
> +
> + vmg_set_range(&vmg, 0, 0x5000, 3, flags);
> + vmg.vma = vma_prev;
> + vmg.next = vma;
> +
> + fail_prealloc = true;
> + ASSERT_EQ(expand_existing(&vmg), -ENOMEM);
> +
> + ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
> + ASSERT_TRUE(dummy_anon_vma.was_cloned);
> + ASSERT_TRUE(dummy_anon_vma.was_unlinked);
> +
> + cleanup_mm(&mm, &vmi);
> + return true;
> +}
> +
> int main(void)
> {
> int num_tests = 0, num_fail = 0;
> @@ -193,11 +1391,21 @@ int main(void)
> } \
> } while (0)
>
> + /* Very simple tests to kick the tyres. */
> TEST(simple_merge);
> TEST(simple_modify);
> TEST(simple_expand);
> TEST(simple_shrink);
>
> + TEST(merge_new);
> + TEST(vma_merge_special_flags);
> + TEST(vma_merge_with_close);
> + TEST(vma_merge_new_with_close);
> + TEST(merge_existing);
> + TEST(anon_vma_non_mergeable);
> + TEST(dup_anon_vma);
> + TEST(vmi_prealloc_fail);
> +
> #undef TEST
>
> printf("%d tests run, %d passed, %d failed.\n",
> diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
> index 093560e5b2ac..a3c262c6eb73 100644
> --- a/tools/testing/vma/vma_internal.h
> +++ b/tools/testing/vma/vma_internal.h
> @@ -81,8 +81,6 @@
>
> #define AS_MM_ALL_LOCKS 2
>
> -#define current NULL
> -
> /* We hardcode this for now. */
> #define sysctl_max_map_count 0x1000000UL
>
> @@ -92,6 +90,12 @@ typedef struct pgprot { pgprotval_t pgprot; } pgprot_t;
> typedef unsigned long vm_flags_t;
> typedef __bitwise unsigned int vm_fault_t;
>
> +/*
> + * The shared stubs do not implement this; it amounts to an fprintf(stderr, ...)
> + * either way :)
> + */
> +#define pr_warn_once pr_err
> +
> typedef struct refcount_struct {
> atomic_t refs;
> } refcount_t;
> @@ -100,9 +104,30 @@ struct kref {
> refcount_t refcount;
> };
>
> +/*
> + * Define the task command name length as enum, then it can be visible to
> + * BPF programs.
> + */
> +enum {
> + TASK_COMM_LEN = 16,
> +};
> +
> +struct task_struct {
> + char comm[TASK_COMM_LEN];
> + pid_t pid;
> + struct mm_struct *mm;
> +};
> +
> +struct task_struct *get_current(void);
> +#define current get_current()
> +
> struct anon_vma {
> struct anon_vma *root;
> struct rb_root_cached rb_root;
> +
> + /* Test fields. */
> + bool was_cloned;
> + bool was_unlinked;
> };
>
> struct anon_vma_chain {
> @@ -682,13 +707,21 @@ static inline int vma_dup_policy(struct vm_area_struct *, struct vm_area_struct
> return 0;
> }
>
> -static inline int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *)
> +static inline int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> {
> +	/* For testing purposes, indicate that an anon_vma has been cloned. */
> + if (src->anon_vma != NULL) {
> + dst->anon_vma = src->anon_vma;
> + dst->anon_vma->was_cloned = true;
> + }
> +
> return 0;
> }
>
> -static inline void vma_start_write(struct vm_area_struct *)
> +static inline void vma_start_write(struct vm_area_struct *vma)
> {
> + /* Used to indicate to tests that a write operation has begun. */
> + vma->vm_lock_seq++;
> }
>
> static inline void vma_adjust_trans_huge(struct vm_area_struct *vma,
> @@ -759,8 +792,10 @@ static inline void vma_assert_write_locked(struct vm_area_struct *)
> {
> }
>
> -static inline void unlink_anon_vmas(struct vm_area_struct *)
> +static inline void unlink_anon_vmas(struct vm_area_struct *vma)
> {
> + /* For testing purposes, indicate that the anon_vma was unlinked. */
> + vma->anon_vma->was_unlinked = true;
> }
>
> static inline void anon_vma_unlock_write(struct anon_vma *)
> --
> 2.46.0
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2 03/10] mm: introduce vma_merge_struct and abstract vma_merge(),vma_modify()
2024-08-23 20:06 ` [PATCH v2 03/10] mm: introduce vma_merge_struct and abstract vma_merge(),vma_modify() Lorenzo Stoakes
@ 2024-08-28 19:35 ` Liam R. Howlett
2024-08-30 13:28 ` Lorenzo Stoakes
0 siblings, 1 reply; 26+ messages in thread
From: Liam R. Howlett @ 2024-08-28 19:35 UTC (permalink / raw)
To: Lorenzo Stoakes; +Cc: linux-mm, linux-kernel, Andrew Morton, Vlastimil Babka
* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [240823 16:07]:
> Rather than passing around huge numbers of parameters to numerous helper
> functions, abstract them into a single struct that we thread through the
> operation, the vma_merge_struct ('vmg').
>
> Adjust vma_merge() and vma_modify() to accept this parameter, as well as
> predicate functions can_vma_merge_before(), can_vma_merge_after(), and the
> vma_modify_...() helper functions.
>
> Also introduce VMG_STATE() and VMG_VMA_STATE() helper macros to allow for
> easy vmg declaration.
>
> We additionally remove the requirement that vma_merge() is passed a VMA
> object representing the candidate new VMA. Previously it used this to
> obtain the mm_struct, file and anon_vma properties of the proposed range (a
> rather confusing state of affairs), which are now provided by the vmg
> directly.
>
> We also remove the pgoff calculation previously performed in vma_modify(), and
> instead calculate this in VMG_VMA_STATE() via the vma_pgoff_offset()
> helper.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Some small things...
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> mm/mmap.c | 77 ++++++++-------
> mm/vma.c | 209 ++++++++++++++++++++++++----------------
> mm/vma.h | 128 ++++++++++++++----------
> tools/testing/vma/vma.c | 43 +--------
> 4 files changed, 249 insertions(+), 208 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index e495b0381265..4066c0444495 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1373,9 +1373,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> unsigned long end = addr + len;
> unsigned long merge_start = addr, merge_end = end;
> bool writable_file_mapping = false;
> - pgoff_t vm_pgoff;
> int error = -ENOMEM;
> VMA_ITERATOR(vmi, mm, addr);
> + VMG_STATE(vmg, mm, &vmi, addr, end, vm_flags, pgoff);
> +
> + vmg.file = file;
>
Extra whitespace here.
> /* Find the first overlapping VMA */
> vma = vma_find(&vmi, end);
> @@ -1388,12 +1390,12 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> if (vms_gather_munmap_vmas(&vms, &mas_detach))
> return -ENOMEM;
>
> - next = vms.next;
> - prev = vms.prev;
> + next = vmg.next = vms.next;
> + prev = vmg.prev = vms.prev;
> vma = NULL;
> } else {
> - next = vma_next(&vmi);
> - prev = vma_prev(&vmi);
> + next = vmg.next = vma_next(&vmi);
> + prev = vmg.prev = vma_prev(&vmi);
> if (prev)
> vma_iter_next_range(&vmi);
> }
> @@ -1413,6 +1415,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>
> vms.nr_accounted = 0;
> vm_flags |= VM_ACCOUNT;
> + vmg.flags = vm_flags;
> }
>
> if (vm_flags & VM_SPECIAL)
> @@ -1421,28 +1424,31 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> /* Attempt to expand an old mapping */
> /* Check next */
> if (next && next->vm_start == end && !vma_policy(next) &&
> - can_vma_merge_before(next, vm_flags, NULL, file, pgoff+pglen,
> - NULL_VM_UFFD_CTX, NULL)) {
> + can_vma_merge_before(&vmg)) {
> merge_end = next->vm_end;
> vma = next;
> - vm_pgoff = next->vm_pgoff - pglen;
> + vmg.pgoff = next->vm_pgoff - pglen;
> + /*
> +		 * We set this here so that, if we merge with the previous VMA in
> + * the code below, can_vma_merge_after() ensures anon_vma
> + * compatibility between prev and next.
> + */
> + vmg.anon_vma = vma->anon_vma;
> + vmg.uffd_ctx = vma->vm_userfaultfd_ctx;
> }
>
> /* Check prev */
> if (prev && prev->vm_end == addr && !vma_policy(prev) &&
> - (vma ? can_vma_merge_after(prev, vm_flags, vma->anon_vma, file,
> - pgoff, vma->vm_userfaultfd_ctx, NULL) :
> - can_vma_merge_after(prev, vm_flags, NULL, file, pgoff,
> - NULL_VM_UFFD_CTX, NULL))) {
> + can_vma_merge_after(&vmg)) {
> merge_start = prev->vm_start;
> vma = prev;
> - vm_pgoff = prev->vm_pgoff;
> + vmg.pgoff = prev->vm_pgoff;
> vma_prev(&vmi); /* Equivalent to going to the previous range */
> }
>
> if (vma) {
> /* Actually expand, if possible */
> - if (!vma_expand(&vmi, vma, merge_start, merge_end, vm_pgoff, next)) {
> + if (!vma_expand(&vmi, vma, merge_start, merge_end, vmg.pgoff, next)) {
> khugepaged_enter_vma(vma, vm_flags);
> goto expanded;
> }
> @@ -1772,26 +1778,29 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> * Expand the existing vma if possible; Note that singular lists do not
> * occur after forking, so the expand will only happen on new VMAs.
> */
> - if (vma && vma->vm_end == addr && !vma_policy(vma) &&
> - can_vma_merge_after(vma, flags, NULL, NULL,
> - addr >> PAGE_SHIFT, NULL_VM_UFFD_CTX, NULL)) {
> - vma_iter_config(vmi, vma->vm_start, addr + len);
> - if (vma_iter_prealloc(vmi, vma))
> - goto unacct_fail;
> -
> - vma_start_write(vma);
> -
> - init_vma_prep(&vp, vma);
> - vma_prepare(&vp);
> - vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
> - vma->vm_end = addr + len;
> - vm_flags_set(vma, VM_SOFTDIRTY);
> - vma_iter_store(vmi, vma);
> -
> - vma_complete(&vp, vmi, mm);
> - validate_mm(mm);
> - khugepaged_enter_vma(vma, flags);
> - goto out;
> + if (vma && vma->vm_end == addr && !vma_policy(vma)) {
> + VMG_STATE(vmg, mm, vmi, addr, addr + len, flags, PHYS_PFN(addr));
> +
> + vmg.prev = vma;
> + if (can_vma_merge_after(&vmg)) {
> + vma_iter_config(vmi, vma->vm_start, addr + len);
> + if (vma_iter_prealloc(vmi, vma))
> + goto unacct_fail;
> +
> + vma_start_write(vma);
> +
> + init_vma_prep(&vp, vma);
> + vma_prepare(&vp);
> + vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
> + vma->vm_end = addr + len;
> + vm_flags_set(vma, VM_SOFTDIRTY);
> + vma_iter_store(vmi, vma);
> +
> + vma_complete(&vp, vmi, mm);
> + validate_mm(mm);
> + khugepaged_enter_vma(vma, flags);
> + goto out;
> + }
> }
>
> if (vma)
> diff --git a/mm/vma.c b/mm/vma.c
> index e5a5e418c7cb..74c627ff0313 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -7,16 +7,18 @@
> #include "vma_internal.h"
> #include "vma.h"
>
> -/*
> - * If the vma has a ->close operation then the driver probably needs to release
> - * per-vma resources, so we don't attempt to merge those if the caller indicates
> - * the current vma may be removed as part of the merge.
> - */
> -static inline bool is_mergeable_vma(struct vm_area_struct *vma,
> - struct file *file, unsigned long vm_flags,
> - struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> - struct anon_vma_name *anon_name, bool may_remove_vma)
> +static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_next)
> {
> + struct vm_area_struct *vma = merge_next ? vmg->next : vmg->prev;
> + /*
> + * If the vma has a ->close operation then the driver probably needs to
> + * release per-vma resources, so we don't attempt to merge those if the
> + * caller indicates the current vma may be removed as part of the merge,
> + * which is the case if we are attempting to merge the next VMA into
> + * this one.
> + */
> + bool may_remove_vma = merge_next;
> +
> /*
> * VM_SOFTDIRTY should not prevent from VMA merging, if we
> * match the flags but dirty bit -- the caller should mark
> @@ -25,15 +27,15 @@ static inline bool is_mergeable_vma(struct vm_area_struct *vma,
> * the kernel to generate new VMAs when old one could be
> * extended instead.
> */
> - if ((vma->vm_flags ^ vm_flags) & ~VM_SOFTDIRTY)
> + if ((vma->vm_flags ^ vmg->flags) & ~VM_SOFTDIRTY)
> return false;
> - if (vma->vm_file != file)
> + if (vma->vm_file != vmg->file)
> return false;
> if (may_remove_vma && vma->vm_ops && vma->vm_ops->close)
> return false;
> - if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
> + if (!is_mergeable_vm_userfaultfd_ctx(vma, vmg->uffd_ctx))
> return false;
> - if (!anon_vma_name_eq(anon_vma_name(vma), anon_name))
> + if (!anon_vma_name_eq(anon_vma_name(vma), vmg->anon_name))
> return false;
> return true;
> }
> @@ -94,16 +96,16 @@ static void init_multi_vma_prep(struct vma_prepare *vp,
> * We assume the vma may be removed as part of the merge.
> */
> bool
> -can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
> - struct anon_vma *anon_vma, struct file *file,
> - pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> - struct anon_vma_name *anon_name)
> +can_vma_merge_before(struct vma_merge_struct *vmg)
> {
> - if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name, true) &&
> - is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
> - if (vma->vm_pgoff == vm_pgoff)
> + pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start);
> +
> + if (is_mergeable_vma(vmg, true) &&
> + is_mergeable_anon_vma(vmg->anon_vma, vmg->next->anon_vma, vmg->next)) {
> + if (vmg->next->vm_pgoff == vmg->pgoff + pglen)
> return true;
> }
> +
> return false;
> }
>
> @@ -116,18 +118,11 @@ can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
> *
> * We assume that vma is not removed as part of the merge.
> */
> -bool
> -can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
> - struct anon_vma *anon_vma, struct file *file,
> - pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> - struct anon_vma_name *anon_name)
> +bool can_vma_merge_after(struct vma_merge_struct *vmg)
> {
> - if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name, false) &&
> - is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
> - pgoff_t vm_pglen;
> -
> - vm_pglen = vma_pages(vma);
> - if (vma->vm_pgoff + vm_pglen == vm_pgoff)
> + if (is_mergeable_vma(vmg, false) &&
I started putting /* merge_next = */ false, so that it's easier to read.
Not a whole lot of arguments, but it is still useful to remember what is
passed.
> + is_mergeable_anon_vma(vmg->anon_vma, vmg->prev->anon_vma, vmg->prev)) {
> + if (vmg->prev->vm_pgoff + vma_pages(vmg->prev) == vmg->pgoff)
> return true;
> }
> return false;
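A worked example of the offset check, assuming 4KiB pages (illustrative
numbers only):

	/*
	 * prev: [0x2000, 0x5000), vm_pgoff = 2  =>  vma_pages(prev) = 3
	 * proposed range starts at 0x5000 with pgoff = 5
	 * 2 + 3 == 5, so the file offsets run contiguously and the merge
	 * is permitted.
	 */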
> @@ -1022,16 +1017,10 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
> * **** is not represented - it will be merged and the vma containing the
> * area is returned, or the function will return NULL
> */
> -static struct vm_area_struct
> -*vma_merge(struct vma_iterator *vmi, struct vm_area_struct *prev,
> - struct vm_area_struct *src, unsigned long addr, unsigned long end,
> - unsigned long vm_flags, pgoff_t pgoff, struct mempolicy *policy,
> - struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> - struct anon_vma_name *anon_name)
> +static struct vm_area_struct *vma_merge(struct vma_merge_struct *vmg)
> {
> - struct mm_struct *mm = src->vm_mm;
> - struct anon_vma *anon_vma = src->anon_vma;
> - struct file *file = src->vm_file;
> + struct mm_struct *mm = vmg->mm;
> + struct vm_area_struct *prev = vmg->prev;
> struct vm_area_struct *curr, *next, *res;
> struct vm_area_struct *vma, *adjust, *remove, *remove2;
> struct vm_area_struct *anon_dup = NULL;
> @@ -1041,16 +1030,18 @@ static struct vm_area_struct
> bool merge_prev = false;
> bool merge_next = false;
> bool vma_expanded = false;
> + unsigned long addr = vmg->start;
> + unsigned long end = vmg->end;
> unsigned long vma_start = addr;
> unsigned long vma_end = end;
> - pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
> + pgoff_t pglen = PHYS_PFN(end - addr);
> long adj_start = 0;
>
> /*
> * We later require that vma->vm_flags == vm_flags,
> * so this tests vma->vm_flags & VM_SPECIAL, too.
> */
> - if (vm_flags & VM_SPECIAL)
> + if (vmg->flags & VM_SPECIAL)
> return NULL;
>
> /* Does the input range span an existing VMA? (cases 5 - 8) */
> @@ -1058,27 +1049,26 @@ static struct vm_area_struct
>
> if (!curr || /* cases 1 - 4 */
> end == curr->vm_end) /* cases 6 - 8, adjacent VMA */
> - next = vma_lookup(mm, end);
> + next = vmg->next = vma_lookup(mm, end);
> else
> - next = NULL; /* case 5 */
> + next = vmg->next = NULL; /* case 5 */
>
> if (prev) {
> vma_start = prev->vm_start;
> vma_pgoff = prev->vm_pgoff;
>
> /* Can we merge the predecessor? */
> - if (addr == prev->vm_end && mpol_equal(vma_policy(prev), policy)
> - && can_vma_merge_after(prev, vm_flags, anon_vma, file,
> - pgoff, vm_userfaultfd_ctx, anon_name)) {
> + if (addr == prev->vm_end && mpol_equal(vma_policy(prev), vmg->policy)
> + && can_vma_merge_after(vmg)) {
> +
> merge_prev = true;
> - vma_prev(vmi);
> + vma_prev(vmg->vmi);
> }
> }
>
> /* Can we merge the successor? */
> - if (next && mpol_equal(policy, vma_policy(next)) &&
> - can_vma_merge_before(next, vm_flags, anon_vma, file, pgoff+pglen,
> - vm_userfaultfd_ctx, anon_name)) {
> + if (next && mpol_equal(vmg->policy, vma_policy(next)) &&
> + can_vma_merge_before(vmg)) {
> merge_next = true;
> }
>
> @@ -1129,7 +1119,7 @@ static struct vm_area_struct
> remove = curr;
> } else { /* case 5 */
> adjust = curr;
> - adj_start = (end - curr->vm_start);
> + adj_start = end - curr->vm_start;
Unnecessary change, if you respin... but it's temporary.
> }
> if (!err)
> err = dup_anon_vma(prev, curr, &anon_dup);
> @@ -1169,13 +1159,13 @@ static struct vm_area_struct
> vma_expanded = true;
>
> if (vma_expanded) {
> - vma_iter_config(vmi, vma_start, vma_end);
> + vma_iter_config(vmg->vmi, vma_start, vma_end);
> } else {
> - vma_iter_config(vmi, adjust->vm_start + adj_start,
> + vma_iter_config(vmg->vmi, adjust->vm_start + adj_start,
> adjust->vm_end);
> }
>
> - if (vma_iter_prealloc(vmi, vma))
> + if (vma_iter_prealloc(vmg->vmi, vma))
> goto prealloc_fail;
>
> init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
> @@ -1187,20 +1177,20 @@ static struct vm_area_struct
> vma_set_range(vma, vma_start, vma_end, vma_pgoff);
>
> if (vma_expanded)
> - vma_iter_store(vmi, vma);
> + vma_iter_store(vmg->vmi, vma);
>
> if (adj_start) {
> adjust->vm_start += adj_start;
> adjust->vm_pgoff += adj_start >> PAGE_SHIFT;
> if (adj_start < 0) {
> WARN_ON(vma_expanded);
> - vma_iter_store(vmi, next);
> + vma_iter_store(vmg->vmi, next);
> }
> }
>
> - vma_complete(&vp, vmi, mm);
> + vma_complete(&vp, vmg->vmi, mm);
> validate_mm(mm);
> - khugepaged_enter_vma(res, vm_flags);
> + khugepaged_enter_vma(res, vmg->flags);
> return res;
>
> prealloc_fail:
> @@ -1208,8 +1198,8 @@ static struct vm_area_struct
> unlink_anon_vmas(anon_dup);
>
> anon_vma_fail:
> - vma_iter_set(vmi, addr);
> - vma_iter_load(vmi);
> + vma_iter_set(vmg->vmi, addr);
> + vma_iter_load(vmg->vmi);
> return NULL;
> }
>
> @@ -1226,32 +1216,27 @@ static struct vm_area_struct
> * The function returns either the merged VMA, the original VMA if a split was
> * required instead, or an error if the split failed.
> */
> -struct vm_area_struct *vma_modify(struct vma_iterator *vmi,
> - struct vm_area_struct *prev,
> - struct vm_area_struct *vma,
> - unsigned long start, unsigned long end,
> - unsigned long vm_flags,
> - struct mempolicy *policy,
> - struct vm_userfaultfd_ctx uffd_ctx,
> - struct anon_vma_name *anon_name)
> +static struct vm_area_struct *vma_modify(struct vma_merge_struct *vmg)
> {
> - pgoff_t pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
> + struct vm_area_struct *vma = vmg->vma;
> struct vm_area_struct *merged;
>
> - merged = vma_merge(vmi, prev, vma, start, end, vm_flags,
> - pgoff, policy, uffd_ctx, anon_name);
> + /* First, try to merge. */
> + merged = vma_merge(vmg);
> if (merged)
> return merged;
>
> - if (vma->vm_start < start) {
> - int err = split_vma(vmi, vma, start, 1);
> + /* Split any preceding portion of the VMA. */
> + if (vma->vm_start < vmg->start) {
> + int err = split_vma(vmg->vmi, vma, vmg->start, 1);
>
> if (err)
> return ERR_PTR(err);
> }
>
> - if (vma->vm_end > end) {
> - int err = split_vma(vmi, vma, end, 0);
> + /* Split any trailing portion of the VMA. */
> + if (vma->vm_end > vmg->end) {
> + int err = split_vma(vmg->vmi, vma, vmg->end, 0);
>
> if (err)
> return ERR_PTR(err);
> @@ -1260,6 +1245,65 @@ struct vm_area_struct *vma_modify(struct vma_iterator *vmi,
> return vma;
> }
>
> +struct vm_area_struct *vma_modify_flags(
> + struct vma_iterator *vmi, struct vm_area_struct *prev,
> + struct vm_area_struct *vma, unsigned long start, unsigned long end,
> + unsigned long new_flags)
> +{
> + VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
> +
> + vmg.flags = new_flags;
> +
> + return vma_modify(&vmg);
> +}
> +
> +struct vm_area_struct
> +*vma_modify_flags_name(struct vma_iterator *vmi,
> + struct vm_area_struct *prev,
> + struct vm_area_struct *vma,
> + unsigned long start,
> + unsigned long end,
> + unsigned long new_flags,
> + struct anon_vma_name *new_name)
> +{
> + VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
> +
> + vmg.flags = new_flags;
> + vmg.anon_name = new_name;
> +
> + return vma_modify(&vmg);
> +}
> +
> +struct vm_area_struct
> +*vma_modify_policy(struct vma_iterator *vmi,
> + struct vm_area_struct *prev,
> + struct vm_area_struct *vma,
> + unsigned long start, unsigned long end,
> + struct mempolicy *new_pol)
> +{
> + VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
> +
> + vmg.policy = new_pol;
> +
> + return vma_modify(&vmg);
> +}
> +
> +struct vm_area_struct
> +*vma_modify_flags_uffd(struct vma_iterator *vmi,
> + struct vm_area_struct *prev,
> + struct vm_area_struct *vma,
> + unsigned long start, unsigned long end,
> + unsigned long new_flags,
> + struct vm_userfaultfd_ctx new_ctx)
> +{
> + VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
> +
> + vmg.flags = new_flags;
> + vmg.uffd_ctx = new_ctx;
> +
> + return vma_modify(&vmg);
> +}
> +
> /*
> * Attempt to merge a newly mapped VMA with those adjacent to it. The caller
> * must ensure that [start, end) does not overlap any existing VMA.
> @@ -1269,8 +1313,11 @@ struct vm_area_struct
> struct vm_area_struct *vma, unsigned long start,
> unsigned long end, pgoff_t pgoff)
> {
> - return vma_merge(vmi, prev, vma, start, end, vma->vm_flags, pgoff,
> - vma_policy(vma), vma->vm_userfaultfd_ctx, anon_vma_name(vma));
> + VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
> +
> + vmg.pgoff = pgoff;
> +
> + return vma_merge(&vmg);
> }
>
> /*
> @@ -1281,12 +1328,10 @@ struct vm_area_struct *vma_merge_extend(struct vma_iterator *vmi,
> struct vm_area_struct *vma,
> unsigned long delta)
> {
> - pgoff_t pgoff = vma->vm_pgoff + vma_pages(vma);
> + VMG_VMA_STATE(vmg, vmi, vma, vma, vma->vm_end, vma->vm_end + delta);
>
> /* vma is specified as prev, so case 1 or 2 will apply. */
> - return vma_merge(vmi, vma, vma, vma->vm_end, vma->vm_end + delta,
> - vma->vm_flags, pgoff, vma_policy(vma),
> - vma->vm_userfaultfd_ctx, anon_vma_name(vma));
> + return vma_merge(&vmg);
> }
Many of these are small and could be made static inline, some (all?) of
them in the header.
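E.g. a sketch of what the header variant could look like - note this
assumes vma_modify() remains callable from mm/vma.h rather than being
made static in vma.c:

/* Hypothetical static inline wrapper in mm/vma.h. */
static inline struct vm_area_struct
*vma_modify_flags(struct vma_iterator *vmi, struct vm_area_struct *prev,
		  struct vm_area_struct *vma, unsigned long start,
		  unsigned long end, unsigned long new_flags)
{
	VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);

	vmg.flags = new_flags;
	return vma_modify(&vmg);
}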
>
> void unlink_file_vma_batch_init(struct unlink_vma_file_batch *vb)
> diff --git a/mm/vma.h b/mm/vma.h
> index c774642697a0..9060a1a1edfa 100644
> --- a/mm/vma.h
> +++ b/mm/vma.h
> @@ -52,6 +52,59 @@ struct vma_munmap_struct {
> unsigned long data_vm;
> };
>
> +/* Represents a VMA merge operation. */
> +struct vma_merge_struct {
> + struct mm_struct *mm;
> + struct vma_iterator *vmi;
> + pgoff_t pgoff;
> + struct vm_area_struct *prev;
> + struct vm_area_struct *next; /* Modified by vma_merge(). */
> + struct vm_area_struct *vma; /* Either a new VMA or the one being modified. */
> + unsigned long start;
> + unsigned long end;
> + unsigned long flags;
> + struct file *file;
> + struct anon_vma *anon_vma;
> + struct mempolicy *policy;
> + struct vm_userfaultfd_ctx uffd_ctx;
> + struct anon_vma_name *anon_name;
> +};
> +
> +/* Assumes addr >= vma->vm_start. */
> +static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
> + unsigned long addr)
> +{
> + return vma->vm_pgoff + PHYS_PFN(addr - vma->vm_start);
> +}
> +
> +#define VMG_STATE(name, mm_, vmi_, start_, end_, flags_, pgoff_) \
> + struct vma_merge_struct name = { \
> + .mm = mm_, \
> + .vmi = vmi_, \
> + .start = start_, \
> + .end = end_, \
> + .flags = flags_, \
> + .pgoff = pgoff_, \
> + }
> +
> +#define VMG_VMA_STATE(name, vmi_, prev_, vma_, start_, end_) \
> + struct vma_merge_struct name = { \
> + .mm = vma_->vm_mm, \
> + .vmi = vmi_, \
> + .prev = prev_, \
> + .next = NULL, \
> + .vma = vma_, \
> + .start = start_, \
> + .end = end_, \
> + .flags = vma_->vm_flags, \
> + .pgoff = vma_pgoff_offset(vma_, start_), \
> + .file = vma_->vm_file, \
> + .anon_vma = vma_->anon_vma, \
> + .policy = vma_policy(vma_), \
> + .uffd_ctx = vma_->vm_userfaultfd_ctx, \
> + .anon_name = anon_vma_name(vma_), \
> + }
> +
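For reference, a sketch of the calling convention these macros set up,
following the pattern of the vma_modify_*() helpers above:

	VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);

	vmg.flags = new_flags;	/* override only the fields that change */
	merged = vma_merge(&vmg);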
> #ifdef CONFIG_DEBUG_VM_MAPLE_TREE
> void validate_mm(struct mm_struct *mm);
> #else
> @@ -208,80 +261,53 @@ void remove_vma(struct vm_area_struct *vma, bool unreachable, bool closed);
> void unmap_region(struct ma_state *mas, struct vm_area_struct *vma,
> struct vm_area_struct *prev, struct vm_area_struct *next);
>
> -/* Required by mmap_region(). */
> -bool
> -can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
> - struct anon_vma *anon_vma, struct file *file,
> - pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> - struct anon_vma_name *anon_name);
> -
> -/* Required by mmap_region() and do_brk_flags(). */
> -bool
> -can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
> - struct anon_vma *anon_vma, struct file *file,
> - pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> - struct anon_vma_name *anon_name);
> -
> -struct vm_area_struct *vma_modify(struct vma_iterator *vmi,
> - struct vm_area_struct *prev,
> - struct vm_area_struct *vma,
> - unsigned long start, unsigned long end,
> - unsigned long vm_flags,
> - struct mempolicy *policy,
> - struct vm_userfaultfd_ctx uffd_ctx,
> - struct anon_vma_name *anon_name);
> +/*
> + * Can we merge the VMA described by vmg into the following VMA vmg->next?
> + *
> + * Required by mmap_region().
> + */
> +bool can_vma_merge_before(struct vma_merge_struct *vmg);
> +
> +/*
> + * Can we merge the VMA described by vmg into the preceding VMA vmg->prev?
> + *
> + * Required by mmap_region() and do_brk_flags().
> + */
> +bool can_vma_merge_after(struct vma_merge_struct *vmg);
>
> /* We are about to modify the VMA's flags. */
> -static inline struct vm_area_struct
> -*vma_modify_flags(struct vma_iterator *vmi,
> - struct vm_area_struct *prev,
> - struct vm_area_struct *vma,
> - unsigned long start, unsigned long end,
> - unsigned long new_flags)
> -{
> - return vma_modify(vmi, prev, vma, start, end, new_flags,
> - vma_policy(vma), vma->vm_userfaultfd_ctx,
> - anon_vma_name(vma));
> -}
> +struct vm_area_struct *vma_modify_flags(struct vma_iterator *vmi,
> + struct vm_area_struct *prev,
> + struct vm_area_struct *vma,
> + unsigned long start, unsigned long end,
> + unsigned long new_flags);
Does this get better with a two-tab indent?
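i.e. something like:

struct vm_area_struct *vma_modify_flags(struct vma_iterator *vmi,
		struct vm_area_struct *prev, struct vm_area_struct *vma,
		unsigned long start, unsigned long end,
		unsigned long new_flags);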
>
> /* We are about to modify the VMA's flags and/or anon_name. */
> -static inline struct vm_area_struct
> +struct vm_area_struct
> *vma_modify_flags_name(struct vma_iterator *vmi,
> struct vm_area_struct *prev,
> struct vm_area_struct *vma,
> unsigned long start,
> unsigned long end,
> unsigned long new_flags,
> - struct anon_vma_name *new_name)
> -{
> - return vma_modify(vmi, prev, vma, start, end, new_flags,
> - vma_policy(vma), vma->vm_userfaultfd_ctx, new_name);
> -}
> + struct anon_vma_name *new_name);
>
> /* We are about to modify the VMA's memory policy. */
> -static inline struct vm_area_struct
> +struct vm_area_struct
> *vma_modify_policy(struct vma_iterator *vmi,
> struct vm_area_struct *prev,
> struct vm_area_struct *vma,
> unsigned long start, unsigned long end,
> - struct mempolicy *new_pol)
> -{
> - return vma_modify(vmi, prev, vma, start, end, vma->vm_flags,
> - new_pol, vma->vm_userfaultfd_ctx, anon_vma_name(vma));
> -}
> + struct mempolicy *new_pol);
>
> /* We are about to modify the VMA's flags and/or uffd context. */
> -static inline struct vm_area_struct
> +struct vm_area_struct
> *vma_modify_flags_uffd(struct vma_iterator *vmi,
> struct vm_area_struct *prev,
> struct vm_area_struct *vma,
> unsigned long start, unsigned long end,
> unsigned long new_flags,
> - struct vm_userfaultfd_ctx new_ctx)
> -{
> - return vma_modify(vmi, prev, vma, start, end, new_flags,
> - vma_policy(vma), new_ctx, anon_vma_name(vma));
> -}
> + struct vm_userfaultfd_ctx new_ctx);
>
> struct vm_area_struct
> *vma_merge_new_vma(struct vma_iterator *vmi, struct vm_area_struct *prev,
> diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
> index 9b272633ca9e..b12f72979478 100644
> --- a/tools/testing/vma/vma.c
> +++ b/tools/testing/vma/vma.c
> @@ -22,26 +22,6 @@ static bool fail_prealloc;
> */
> #include "../../../mm/vma.c"
>
> -/*
> - * Temporarily forward-ported from a future in which vmg's are used for merging.
the future is soon.
> - */
> -struct vma_merge_struct {
> - struct mm_struct *mm;
> - struct vma_iterator *vmi;
> - pgoff_t pgoff;
> - struct vm_area_struct *prev;
> - struct vm_area_struct *next; /* Modified by vma_merge(). */
> - struct vm_area_struct *vma; /* Either a new VMA or the one being modified. */
> - unsigned long start;
> - unsigned long end;
> - unsigned long flags;
> - struct file *file;
> - struct anon_vma *anon_vma;
> - struct mempolicy *policy;
> - struct vm_userfaultfd_ctx uffd_ctx;
> - struct anon_vma_name *anon_name;
> -};
> -
> const struct vm_operations_struct vma_dummy_vm_ops;
> static struct anon_vma dummy_anon_vma;
>
> @@ -115,14 +95,6 @@ static struct vm_area_struct *alloc_and_link_vma(struct mm_struct *mm,
> /* Helper function which provides a wrapper around a merge new VMA operation. */
> static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
> {
> - /* vma_merge() needs a VMA to determine mm, anon_vma, and file. */
> - struct vm_area_struct dummy = {
> - .vm_mm = vmg->mm,
> - .vm_flags = vmg->flags,
> - .anon_vma = vmg->anon_vma,
> - .vm_file = vmg->file,
> - };
> -
> /*
> * For convenience, get prev and next VMAs. Which the new VMA operation
> * requires.
> @@ -131,8 +103,7 @@ static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
> vmg->prev = vma_prev(vmg->vmi);
>
> vma_iter_set(vmg->vmi, vmg->start);
> - return vma_merge_new_vma(vmg->vmi, vmg->prev, &dummy, vmg->start,
> - vmg->end, vmg->pgoff);
> + return vma_merge(vmg);
> }
>
> /*
> @@ -141,17 +112,7 @@ static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
> */
> static struct vm_area_struct *merge_existing(struct vma_merge_struct *vmg)
> {
> - /* vma_merge() needs a VMA to determine mm, anon_vma, and file. */
> - struct vm_area_struct dummy = {
> - .vm_mm = vmg->mm,
> - .vm_flags = vmg->flags,
> - .anon_vma = vmg->anon_vma,
> - .vm_file = vmg->file,
> - };
> -
> - return vma_merge(vmg->vmi, vmg->prev, &dummy, vmg->start, vmg->end,
> - vmg->flags, vmg->pgoff, vmg->policy, vmg->uffd_ctx,
> - vmg->anon_name);
> + return vma_merge(vmg);
> }
>
> /*
> --
> 2.46.0
>
* Re: [PATCH v2 04/10] mm: remove duplicated open-coded VMA policy check
2024-08-23 20:06 ` [PATCH v2 04/10] mm: remove duplicated open-coded VMA policy check Lorenzo Stoakes
@ 2024-08-28 19:42 ` Liam R. Howlett
0 siblings, 0 replies; 26+ messages in thread
From: Liam R. Howlett @ 2024-08-28 19:42 UTC (permalink / raw)
To: Lorenzo Stoakes; +Cc: linux-mm, linux-kernel, Andrew Morton, Vlastimil Babka
* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [240823 16:07]:
> Both can_vma_merge_before() and can_vma_merge_after() are invoked after
> checking for compatible VMA NUMA policy, we can simply move this to
> is_mergeable_vma() and abstract this altogether.
>
> In mmap_region() we set vmg->policy to NULL, so the policy comparisons
> checked in can_vma_merge_before() and can_vma_merge_after() are exactly
> equivalent to !vma_policy(vmg.next) and !vma_policy(vmg.prev).
>
> Equally, in do_brk_flags(), vmg->policy is NULL, so the
> can_vma_merge_after() is checking !vma_policy(vma), as we set vmg.prev to
> vma.
>
> In vma_merge(), we compare prev and next policies with vmg->policy before
> checking can_vma_merge_after() and can_vma_merge_before() respectively,
> which this patch causes to be checked in precisely the same way.
>
> This therefore maintains precisely the same logic as before, only now
> abstracted into is_mergeable_vma().
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> mm/mmap.c | 8 +++-----
> mm/vma.c | 9 ++++-----
> 2 files changed, 7 insertions(+), 10 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 4066c0444495..c72f50feb98a 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1423,8 +1423,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>
> /* Attempt to expand an old mapping */
> /* Check next */
> - if (next && next->vm_start == end && !vma_policy(next) &&
> - can_vma_merge_before(&vmg)) {
> + if (next && next->vm_start == end && can_vma_merge_before(&vmg)) {
It occurs to me that we could use the vma iterator's index/last to check
the limits prior to needing to move at all... but I'll leave that for
future work.
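Something like this, perhaps (purely a sketch of the idea, untested -
after walking to a gap, the iterator's index/last describe that gap, so
adjacency could be tested before moving to prev/next at all; try_prev
and try_next are hypothetical locals):

	bool try_prev = vma_iter_addr(&vmi) == addr; /* prev ends at addr? */
	bool try_next = vma_iter_end(&vmi) == end;   /* next starts at end? */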
> merge_end = next->vm_end;
> vma = next;
> vmg.pgoff = next->vm_pgoff - pglen;
> @@ -1438,8 +1437,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> }
>
> /* Check prev */
> - if (prev && prev->vm_end == addr && !vma_policy(prev) &&
> - can_vma_merge_after(&vmg)) {
> + if (prev && prev->vm_end == addr && can_vma_merge_after(&vmg)) {
> merge_start = prev->vm_start;
> vma = prev;
> vmg.pgoff = prev->vm_pgoff;
> @@ -1778,7 +1776,7 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> * Expand the existing vma if possible; Note that singular lists do not
> * occur after forking, so the expand will only happen on new VMAs.
> */
> - if (vma && vma->vm_end == addr && !vma_policy(vma)) {
> + if (vma && vma->vm_end == addr) {
> VMG_STATE(vmg, mm, vmi, addr, addr + len, flags, PHYS_PFN(addr));
>
> vmg.prev = vma;
> diff --git a/mm/vma.c b/mm/vma.c
> index 74c627ff0313..b1ec412fac7f 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -19,6 +19,8 @@ static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_nex
> */
> bool may_remove_vma = merge_next;
>
> + if (!mpol_equal(vmg->policy, vma_policy(vma)))
> + return false;
Should we ensure there is any vma to check here? Maybe just an assert?
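Something minimal would do, e.g. (sketch only - assumes vma is the
candidate selected via merge_next):

	struct vm_area_struct *vma = merge_next ? vmg->next : vmg->prev;

	VM_WARN_ON(!vma);	/* the suggested assert */
	if (!mpol_equal(vmg->policy, vma_policy(vma)))
		return false;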
> /*
> * VM_SOFTDIRTY should not prevent from VMA merging, if we
> * match the flags but dirty bit -- the caller should mark
> @@ -1058,17 +1060,14 @@ static struct vm_area_struct *vma_merge(struct vma_merge_struct *vmg)
> vma_pgoff = prev->vm_pgoff;
>
> /* Can we merge the predecessor? */
> - if (addr == prev->vm_end && mpol_equal(vma_policy(prev), vmg->policy)
> - && can_vma_merge_after(vmg)) {
> -
> + if (addr == prev->vm_end && can_vma_merge_after(vmg)) {
> merge_prev = true;
> vma_prev(vmg->vmi);
> }
> }
>
> /* Can we merge the successor? */
> - if (next && mpol_equal(vmg->policy, vma_policy(next)) &&
> - can_vma_merge_before(vmg)) {
> + if (next && can_vma_merge_before(vmg)) {
> merge_next = true;
> }
>
> --
> 2.46.0
>
* Re: [PATCH v2 05/10] mm: abstract vma_expand() to use vma_merge_struct
2024-08-23 20:07 ` [PATCH v2 05/10] mm: abstract vma_expand() to use vma_merge_struct Lorenzo Stoakes
@ 2024-08-28 20:03 ` Liam R. Howlett
0 siblings, 0 replies; 26+ messages in thread
From: Liam R. Howlett @ 2024-08-28 20:03 UTC (permalink / raw)
To: Lorenzo Stoakes; +Cc: linux-mm, linux-kernel, Andrew Morton, Vlastimil Babka
* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [240823 16:07]:
> The purpose of the vmg is to thread merge state through functions and avoid
> egregious parameter lists. We expand this to vma_expand(), which is used
> for a number of merge cases.
>
> Accordingly, adjust its callers, mmap_region() and relocate_vma_down(), to
> use a vmg.
>
> An added purpose of this change is the ability in a future commit to
> perform all new VMA range merging using vma_expand().
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Just one extra whitespace below.
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> mm/mmap.c | 16 +++++++++-------
> mm/vma.c | 39 +++++++++++++++++----------------------
> mm/vma.h | 5 +----
> tools/testing/vma/vma.c | 3 +--
> 4 files changed, 28 insertions(+), 35 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index c72f50feb98a..0d242c9b1f4c 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1371,7 +1371,6 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> struct ma_state mas_detach;
> struct maple_tree mt_detach;
> unsigned long end = addr + len;
> - unsigned long merge_start = addr, merge_end = end;
> bool writable_file_mapping = false;
> int error = -ENOMEM;
> VMA_ITERATOR(vmi, mm, addr);
> @@ -1424,8 +1423,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> /* Attempt to expand an old mapping */
> /* Check next */
> if (next && next->vm_start == end && can_vma_merge_before(&vmg)) {
> - merge_end = next->vm_end;
> - vma = next;
> + vmg.end = next->vm_end;
> + vma = vmg.vma = next;
> vmg.pgoff = next->vm_pgoff - pglen;
> /*
> * We set this here so if we will merge with the previous VMA in
> @@ -1438,15 +1437,15 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>
> /* Check prev */
> if (prev && prev->vm_end == addr && can_vma_merge_after(&vmg)) {
> - merge_start = prev->vm_start;
> - vma = prev;
> + vmg.start = prev->vm_start;
> + vma = vmg.vma = prev;
> vmg.pgoff = prev->vm_pgoff;
> vma_prev(&vmi); /* Equivalent to going to the previous range */
> }
>
> if (vma) {
> /* Actually expand, if possible */
> - if (!vma_expand(&vmi, vma, merge_start, merge_end, vmg.pgoff, next)) {
> + if (!vma_expand(&vmg)) {
> khugepaged_enter_vma(vma, vm_flags);
> goto expanded;
> }
> @@ -2319,6 +2318,7 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
> unsigned long new_start = old_start - shift;
> unsigned long new_end = old_end - shift;
> VMA_ITERATOR(vmi, mm, new_start);
> + VMG_STATE(vmg, mm, &vmi, new_start, old_end, 0, vma->vm_pgoff);
> struct vm_area_struct *next;
> struct mmu_gather tlb;
>
> @@ -2335,7 +2335,8 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
> /*
> * cover the whole range: [new_start, old_end)
> */
> - if (vma_expand(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
> + vmg.vma = vma;
> + if (vma_expand(&vmg))
> return -ENOMEM;
>
> /*
> @@ -2368,6 +2369,7 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
> tlb_finish_mmu(&tlb);
>
> vma_prev(&vmi);
> +
Unnecessary whitespace change.
> /* Shrink the vma to just the new range */
> return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff);
> }
> diff --git a/mm/vma.c b/mm/vma.c
> index b1ec412fac7f..4867ae722a9a 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -467,30 +467,25 @@ void validate_mm(struct mm_struct *mm)
> /*
> * vma_expand - Expand an existing VMA
> *
> - * @vmi: The vma iterator
> - * @vma: The vma to expand
> - * @start: The start of the vma
> - * @end: The exclusive end of the vma
> - * @pgoff: The page offset of vma
> - * @next: The current of next vma.
> + * @vmg: Describes a VMA expansion operation.
> *
> - * Expand @vma to @start and @end. Can expand off the start and end. Will
> - * expand over @next if it's different from @vma and @end == @next->vm_end.
> - * Checking if the @vma can expand and merge with @next needs to be handled by
> - * the caller.
> + * Expand @vma to vmg->start and vmg->end. Can expand off the start and end.
> + * Will expand over vmg->next if it's different from vmg->vma and vmg->end ==
> + * vmg->next->vm_end. Checking if the vmg->vma can expand and merge with
> + * vmg->next needs to be handled by the caller.
> *
> * Returns: 0 on success
> */
> -int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
> - unsigned long start, unsigned long end, pgoff_t pgoff,
> - struct vm_area_struct *next)
> +int vma_expand(struct vma_merge_struct *vmg)
> {
> struct vm_area_struct *anon_dup = NULL;
> bool remove_next = false;
> + struct vm_area_struct *vma = vmg->vma;
> + struct vm_area_struct *next = vmg->next;
> struct vma_prepare vp;
>
> vma_start_write(vma);
> - if (next && (vma != next) && (end == next->vm_end)) {
> + if (next && (vma != next) && (vmg->end == next->vm_end)) {
> int ret;
>
> remove_next = true;
> @@ -503,21 +498,21 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
> init_multi_vma_prep(&vp, vma, NULL, remove_next ? next : NULL, NULL);
> /* Not merging but overwriting any part of next is not handled. */
> VM_WARN_ON(next && !vp.remove &&
> - next != vma && end > next->vm_start);
> + next != vma && vmg->end > next->vm_start);
> /* Only handles expanding */
> - VM_WARN_ON(vma->vm_start < start || vma->vm_end > end);
> + VM_WARN_ON(vma->vm_start < vmg->start || vma->vm_end > vmg->end);
>
> /* Note: vma iterator must be pointing to 'start' */
> - vma_iter_config(vmi, start, end);
> - if (vma_iter_prealloc(vmi, vma))
> + vma_iter_config(vmg->vmi, vmg->start, vmg->end);
> + if (vma_iter_prealloc(vmg->vmi, vma))
> goto nomem;
>
> vma_prepare(&vp);
> - vma_adjust_trans_huge(vma, start, end, 0);
> - vma_set_range(vma, start, end, pgoff);
> - vma_iter_store(vmi, vma);
> + vma_adjust_trans_huge(vma, vmg->start, vmg->end, 0);
> + vma_set_range(vma, vmg->start, vmg->end, vmg->pgoff);
> + vma_iter_store(vmg->vmi, vma);
>
> - vma_complete(&vp, vmi, vma->vm_mm);
> + vma_complete(&vp, vmg->vmi, vma->vm_mm);
> return 0;
>
> nomem:
> diff --git a/mm/vma.h b/mm/vma.h
> index 9060a1a1edfa..8f01fbc20fe7 100644
> --- a/mm/vma.h
> +++ b/mm/vma.h
> @@ -128,10 +128,7 @@ void init_vma_prep(struct vma_prepare *vp,
> void vma_complete(struct vma_prepare *vp,
> struct vma_iterator *vmi, struct mm_struct *mm);
>
> -int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
> - unsigned long start, unsigned long end, pgoff_t pgoff,
> - struct vm_area_struct *next);
> -
> +int vma_expand(struct vma_merge_struct *vmg);
> int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
> unsigned long start, unsigned long end, pgoff_t pgoff);
>
> diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
> index b12f72979478..cce1ba78c57f 100644
> --- a/tools/testing/vma/vma.c
> +++ b/tools/testing/vma/vma.c
> @@ -121,8 +121,7 @@ static struct vm_area_struct *merge_existing(struct vma_merge_struct *vmg)
> */
> static int expand_existing(struct vma_merge_struct *vmg)
> {
> - return vma_expand(vmg->vmi, vmg->vma, vmg->start, vmg->end, vmg->pgoff,
> - vmg->next);
> + return vma_expand(vmg);
> }
>
> /*
> --
> 2.46.0
>
* Re: [PATCH v2 06/10] mm: avoid using vma_merge() for new VMAs
2024-08-23 20:07 ` [PATCH v2 06/10] mm: avoid using vma_merge() for new VMAs Lorenzo Stoakes
2024-08-27 11:41 ` Lorenzo Stoakes
@ 2024-08-28 20:52 ` Liam R. Howlett
2024-08-30 15:19 ` Lorenzo Stoakes
2024-08-29 19:46 ` Mark Brown
2 siblings, 1 reply; 26+ messages in thread
From: Liam R. Howlett @ 2024-08-28 20:52 UTC (permalink / raw)
To: Lorenzo Stoakes; +Cc: linux-mm, linux-kernel, Andrew Morton, Vlastimil Babka
* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [240823 16:07]:
> Abstract vma_merge_new_vma() to use vma_merge_struct and rename the
> resultant function vma_merge_new_range() to be clear what the purpose of
> this function is - a new VMA is desired in the specified range, and we wish
> to see if it is possible to 'merge' surrounding VMAs into this range rather
> than having to allocate a new VMA.
>
> Note that this function uses vma_extend() exclusively, so adopts its
> requirement that the iterator point at or before the gap. We add an assert
> to this effect.
>
> This is as opposed to vma_merge_existing_range(), which will be introduced
> in a subsequent commit, and provide the same functionality for cases in
> which we are modifying an existing VMA.
>
> In mmap_region() and do_brk_flags() we open code scenarios where we prefer
> to use vma_expand() rather than invoke a full vma_merge() operation.
>
> Abstract this logic and eliminate all of the open-coding, and also use the
> same logic for all cases where we add new VMAs to, rather than ultimately
> use vma_merge(), rather use vma_expand().
>
> Doing so removes duplication and simplifies VMA merging in all such cases,
> laying the ground for us to eliminate the merging of new VMAs in
> vma_merge() altogether.
>
> Also add the ability for the vmg to track state, and able to report errors,
> allowing for us to differentiate a failed merge from an inability to
> allocate memory in callers.
>
> This makes it far easier to understand what is happening in these cases
> avoiding confusion, bugs and allowing for future optimisation.
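Concretely, the error differentiation this enables takes the following
shape (a sketch of the new do_brk_flags() path from the diff below):

	if (vma_merge_new_range(&vmg))
		goto out;		/* merged */
	else if (vmg_nomem(&vmg))
		goto unacct_fail;	/* OOM, as opposed to "no merge" */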
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> mm/mmap.c | 93 +++++-------------
> mm/vma.c | 162 ++++++++++++++++++++++++++-----
> mm/vma.h | 20 +++-
> tools/testing/vma/vma.c | 33 ++++++-
> tools/testing/vma/vma_internal.h | 6 ++
> 5 files changed, 216 insertions(+), 98 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 0d242c9b1f4c..80d70ed099cf 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1364,8 +1364,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> {
> struct mm_struct *mm = current->mm;
> struct vm_area_struct *vma = NULL;
> - struct vm_area_struct *next, *prev, *merge;
> pgoff_t pglen = PHYS_PFN(len);
> + struct vm_area_struct *merge;
> unsigned long charged = 0;
> struct vma_munmap_struct vms;
> struct ma_state mas_detach;
> @@ -1389,13 +1389,13 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> if (vms_gather_munmap_vmas(&vms, &mas_detach))
> return -ENOMEM;
>
> - next = vmg.next = vms.next;
> - prev = vmg.prev = vms.prev;
> + vmg.next = vms.next;
> + vmg.prev = vms.prev;
> vma = NULL;
> } else {
> - next = vmg.next = vma_next(&vmi);
> - prev = vmg.prev = vma_prev(&vmi);
> - if (prev)
> + vmg.next = vma_next(&vmi);
> + vmg.prev = vma_prev(&vmi);
> + if (vmg.prev)
> vma_iter_next_range(&vmi);
> }
>
> @@ -1417,45 +1417,9 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> vmg.flags = vm_flags;
> }
>
> - if (vm_flags & VM_SPECIAL)
> - goto cannot_expand;
> -
> - /* Attempt to expand an old mapping */
> - /* Check next */
> - if (next && next->vm_start == end && can_vma_merge_before(&vmg)) {
> - vmg.end = next->vm_end;
> - vma = vmg.vma = next;
> - vmg.pgoff = next->vm_pgoff - pglen;
> - /*
> - * We set this here so if we will merge with the previous VMA in
> - * the code below, can_vma_merge_after() ensures anon_vma
> - * compatibility between prev and next.
> - */
> - vmg.anon_vma = vma->anon_vma;
> - vmg.uffd_ctx = vma->vm_userfaultfd_ctx;
> - }
> -
> - /* Check prev */
> - if (prev && prev->vm_end == addr && can_vma_merge_after(&vmg)) {
> - vmg.start = prev->vm_start;
> - vma = vmg.vma = prev;
> - vmg.pgoff = prev->vm_pgoff;
> - vma_prev(&vmi); /* Equivalent to going to the previous range */
> - }
> -
> - if (vma) {
> - /* Actually expand, if possible */
> - if (!vma_expand(&vmg)) {
> - khugepaged_enter_vma(vma, vm_flags);
> - goto expanded;
> - }
> -
> - /* If the expand fails, then reposition the vma iterator */
> - if (unlikely(vma == prev))
> - vma_iter_set(&vmi, addr);
> - }
> -
> -cannot_expand:
> + vma = vma_merge_new_range(&vmg);
> + if (vma)
> + goto expanded;
>
> /*
> * Determine the object being mapped and call the appropriate
> @@ -1503,10 +1467,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> * If vm_flags changed after call_mmap(), we should try merge
> * vma again as we may succeed this time.
> */
> - if (unlikely(vm_flags != vma->vm_flags && prev)) {
> - merge = vma_merge_new_vma(&vmi, prev, vma,
> - vma->vm_start, vma->vm_end,
> - vma->vm_pgoff);
> + if (unlikely(vm_flags != vma->vm_flags && vmg.prev)) {
> + vmg.flags = vma->vm_flags;
> + /* If this fails, state is reset ready for a reattempt. */
> + merge = vma_merge_new_range(&vmg);
> +
Extra whitespace.
> if (merge) {
> /*
> * ->mmap() can change vma->vm_file and fput
> @@ -1521,6 +1486,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> /* Update vm_flags to pick up the change. */
> vm_flags = vma->vm_flags;
> goto unmap_writable;
> + } else {
> + vma_iter_config(&vmi, addr, end);
This else can be dropped since the if ends in a goto. I guess what you
are trying to fix is the failed merge with prev (which moved the
iterator). Dropping the else might be easier to read; either way it is
correct.
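i.e. something like (a sketch of the else-less shape, eliding the
unchanged success path):

		if (merge) {
			/* ... existing fput()/vm_flags handling ... */
			vm_flags = vma->vm_flags;
			goto unmap_writable;
		}

		/*
		 * The failed merge with prev moved the iterator -
		 * reposition it before carrying on.
		 */
		vma_iter_config(&vmi, addr, end);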
> }
> }
>
> @@ -1554,7 +1521,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> vma_link_file(vma);
>
> /*
> - * vma_merge() calls khugepaged_enter_vma() either, the below
> + * vma_merge_new_range() calls khugepaged_enter_vma() too, the below
> * call covers the non-merge case.
> */
> khugepaged_enter_vma(vma, vma->vm_flags);
> @@ -1609,7 +1576,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>
> vma_iter_set(&vmi, vma->vm_end);
> /* Undo any partial mapping done by a device driver. */
> - unmap_region(&vmi.mas, vma, prev, next);
> + unmap_region(&vmi.mas, vma, vmg.prev, vmg.next);
> }
> if (writable_file_mapping)
> mapping_unmap_writable(file->f_mapping);
> @@ -1755,7 +1722,6 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> unsigned long addr, unsigned long len, unsigned long flags)
> {
> struct mm_struct *mm = current->mm;
> - struct vma_prepare vp;
>
> /*
> * Check against address space limits by the changed size
> @@ -1779,25 +1745,12 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> VMG_STATE(vmg, mm, vmi, addr, addr + len, flags, PHYS_PFN(addr));
>
> vmg.prev = vma;
> - if (can_vma_merge_after(&vmg)) {
> - vma_iter_config(vmi, vma->vm_start, addr + len);
> - if (vma_iter_prealloc(vmi, vma))
> - goto unacct_fail;
> -
> - vma_start_write(vma);
> -
> - init_vma_prep(&vp, vma);
> - vma_prepare(&vp);
> - vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
> - vma->vm_end = addr + len;
> - vm_flags_set(vma, VM_SOFTDIRTY);
> - vma_iter_store(vmi, vma);
> -
> - vma_complete(&vp, vmi, mm);
> - validate_mm(mm);
> - khugepaged_enter_vma(vma, flags);
> + vma_iter_next_range(vmi);
> +
> + if (vma_merge_new_range(&vmg))
> goto out;
> - }
> + else if (vmg_nomem(&vmg))
> + goto unacct_fail;
> }
>
> if (vma)
> diff --git a/mm/vma.c b/mm/vma.c
> index 4867ae722a9a..8a5fa15f46a2 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -464,6 +464,116 @@ void validate_mm(struct mm_struct *mm)
> }
> #endif /* CONFIG_DEBUG_VM_MAPLE_TREE */
>
> +/*
> + * vma_merge_new_range - Attempt to merge a new VMA into address space
> + *
> + * @vmg: Describes the VMA we are adding, in the range @vmg->start to @vmg->end
> + * (exclusive), which we try to merge with any adjacent VMAs if possible.
> + *
> + * We are about to add a VMA to the address space starting at @vmg->start and
> + * ending at @vmg->end. There are three different possible scenarios:
> + *
> + * 1. There is a VMA with identical properties immediately adjacent to the
> + * proposed new VMA [@vmg->start, @vmg->end) either before or after it -
> + * EXPAND that VMA:
> + *
> + * Proposed: |-----| or |-----|
> + * Existing: |----| |----|
> + *
> + * 2. There are VMAs with identical properties immediately adjacent to the
> + * proposed new VMA [@vmg->start, @vmg->end) both before AND after it -
> + * EXPAND the former and REMOVE the latter:
> + *
> + * Proposed: |-----|
> + * Existing: |----| |----|
> + *
> + * 3. There are no VMAs immediately adjacent to the proposed new VMA or those
> + * VMAs do not have identical attributes - NO MERGE POSSIBLE.
> + *
> + * In instances where we can merge, this function returns the expanded VMA which
> + * will have its range adjusted accordingly and the underlying maple tree also
> + * adjusted.
> + *
> + * Returns: In instances where no merge was possible, NULL. Otherwise, a pointer
> + * to the VMA we expanded.
> + *
> + * This function adjusts @vmg to provide @vmg->next if not already specified,
> + * and adjusts [@vmg->start, @vmg->end) to span the expanded range.
> + *
> + * ASSUMPTIONS:
> + * - The caller must hold a WRITE lock on the mm_struct->mmap_lock.
> + * - The caller must have determined that [@vmg->start, @vmg->end) is empty,
> + *   other than VMAs that will be unmapped should the operation succeed.
> + * - The caller must have specified the previous vma in @vmg->prev.
> + * - The caller must have specified the next vma in @vmg->next.
> + * - The caller must have positioned the vmi at or before the gap.
> + */
> +struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
> +{
> + struct vm_area_struct *prev = vmg->prev;
> + struct vm_area_struct *next = vmg->next;
> + unsigned long start = vmg->start;
> + unsigned long end = vmg->end;
> + pgoff_t pgoff = vmg->pgoff;
> + pgoff_t pglen = PHYS_PFN(end - start);
> + bool can_merge_before, can_merge_after;
> +
> + mmap_assert_write_locked(vmg->mm);
> + VM_WARN_ON(vmg->vma);
> + /* vmi must point at or before the gap. */
> + VM_WARN_ON(vma_iter_addr(vmg->vmi) > end);
> +
> + vmg->state = VMA_MERGE_NOMERGE;
> +
> + /* Special VMAs are unmergeable, also if no prev/next. */
> + if ((vmg->flags & VM_SPECIAL) || (!prev && !next))
> + return NULL;
> +
> + can_merge_before = next && next->vm_start == end &&
> + can_vma_merge_before(vmg);
> + can_merge_after = prev && prev->vm_end == start &&
> + can_vma_merge_after(vmg);
Can we please rewrite this as if statements for clarity?
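e.g. (one possible shape):

	can_merge_before = false;
	if (next && next->vm_start == end)
		can_merge_before = can_vma_merge_before(vmg);

	can_merge_after = false;
	if (prev && prev->vm_end == start)
		can_merge_after = can_vma_merge_after(vmg);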
> +
> + /* If we can merge with the next VMA, adjust vmg accordingly. */
> + if (can_merge_before &&
> + (!can_merge_after || is_mergeable_anon_vma(prev->anon_vma,
> + next->anon_vma, NULL))) {
> + vmg->end = next->vm_end;
> + vmg->vma = next;
> + vmg->pgoff = next->vm_pgoff - pglen;
> + }
> +
> + /* If we can merge with the previous VMA, adjust vmg accordingly. */
> + if (can_merge_after) {
> + vmg->start = prev->vm_start;
> + vmg->vma = prev;
> + vmg->pgoff = prev->vm_pgoff;
> +
> + vma_prev(vmg->vmi); /* Equivalent to going to the previous range */
> + }
> +
> + /*
> + * Now try to expand adjacent VMA(s). This takes care of removing the
> + * following VMA if we have VMAs on both sides.
> + */
> + if (vmg->vma && !vma_expand(vmg)) {
> + khugepaged_enter_vma(vmg->vma, vmg->flags);
This could probably be moved into vma_expand() itself.
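i.e. the tail of vma_expand() might become something like (sketch only):

	vma_complete(&vp, vmg->vmi, vma->vm_mm);
	khugepaged_enter_vma(vma, vmg->flags);	/* hypothetical: moved here */
	return 0;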
> +
Extra whitespace
> + vmg->state = VMA_MERGE_SUCCESS;
> + return vmg->vma;
> + }
> +
> + /* If expansion failed, reset state. Allows us to retry merge later. */
> +	if (vmg->vma == prev)
> +		vma_iter_set(vmg->vmi, start);
> +	vmg->vma = NULL;
> +	vmg->start = start;
> +	vmg->end = end;
> +	vmg->pgoff = pgoff;
> +
> + return NULL;
> +}
> +
> /*
> * vma_expand - Expand an existing VMA
> *
> @@ -474,7 +584,11 @@ void validate_mm(struct mm_struct *mm)
> * vmg->next->vm_end. Checking if the vmg->vma can expand and merge with
> * vmg->next needs to be handled by the caller.
> *
> - * Returns: 0 on success
> + * Returns: 0 on success.
> + *
> + * ASSUMPTIONS:
> + * - The caller must hold a WRITE lock on vmg->vma->mm->mmap_lock.
> + * - The caller must have set @vmg->vma and @vmg->next.
> */
> int vma_expand(struct vma_merge_struct *vmg)
> {
> @@ -484,6 +598,8 @@ int vma_expand(struct vma_merge_struct *vmg)
> struct vm_area_struct *next = vmg->next;
> struct vma_prepare vp;
>
> + mmap_assert_write_locked(vmg->mm);
> +
There are a few unnecessary whitespace changes here...
> vma_start_write(vma);
> if (next && (vma != next) && (vmg->end == next->vm_end)) {
> int ret;
> @@ -516,6 +632,7 @@ int vma_expand(struct vma_merge_struct *vmg)
> return 0;
>
> nomem:
> + vmg->state = VMA_MERGE_ERROR_NOMEM;
> if (anon_dup)
> unlink_anon_vmas(anon_dup);
> return -ENOMEM;
> @@ -1034,6 +1151,8 @@ static struct vm_area_struct *vma_merge(struct vma_merge_struct *vmg)
> pgoff_t pglen = PHYS_PFN(end - addr);
> long adj_start = 0;
>
> + vmg->state = VMA_MERGE_NOMERGE;
> +
> /*
> * We later require that vma->vm_flags == vm_flags,
> * so this tests vma->vm_flags & VM_SPECIAL, too.
> @@ -1185,13 +1304,19 @@ static struct vm_area_struct *vma_merge(struct vma_merge_struct *vmg)
> vma_complete(&vp, vmg->vmi, mm);
> validate_mm(mm);
> khugepaged_enter_vma(res, vmg->flags);
> +
> + vmg->state = VMA_MERGE_SUCCESS;
> return res;
>
> prealloc_fail:
> + vmg->state = VMA_MERGE_ERROR_NOMEM;
> if (anon_dup)
> unlink_anon_vmas(anon_dup);
>
> anon_vma_fail:
> + if (err == -ENOMEM)
> + vmg->state = VMA_MERGE_ERROR_NOMEM;
> +
> vma_iter_set(vmg->vmi, addr);
> vma_iter_load(vmg->vmi);
> return NULL;
> @@ -1298,22 +1423,6 @@ struct vm_area_struct
> return vma_modify(&vmg);
> }
>
> -/*
> - * Attempt to merge a newly mapped VMA with those adjacent to it. The caller
> - * must ensure that [start, end) does not overlap any existing VMA.
> - */
> -struct vm_area_struct
> -*vma_merge_new_vma(struct vma_iterator *vmi, struct vm_area_struct *prev,
> - struct vm_area_struct *vma, unsigned long start,
> - unsigned long end, pgoff_t pgoff)
> -{
> - VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
> -
> - vmg.pgoff = pgoff;
> -
> - return vma_merge(&vmg);
> -}
> -
> /*
> * Expand vma by delta bytes, potentially merging with an immediately adjacent
> * VMA with identical properties.
> @@ -1324,8 +1433,9 @@ struct vm_area_struct *vma_merge_extend(struct vma_iterator *vmi,
> {
> VMG_VMA_STATE(vmg, vmi, vma, vma, vma->vm_end, vma->vm_end + delta);
>
> - /* vma is specified as prev, so case 1 or 2 will apply. */
> - return vma_merge(&vmg);
> + /* We use the VMA to populate VMG fields only. */
> + vmg.vma = NULL;
> + return vma_merge_new_range(&vmg);
> }
>
> void unlink_file_vma_batch_init(struct unlink_vma_file_batch *vb)
> @@ -1426,9 +1536,10 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
> struct vm_area_struct *vma = *vmap;
> unsigned long vma_start = vma->vm_start;
> struct mm_struct *mm = vma->vm_mm;
> - struct vm_area_struct *new_vma, *prev;
> + struct vm_area_struct *new_vma;
> bool faulted_in_anon_vma = true;
> VMA_ITERATOR(vmi, mm, addr);
> + VMG_VMA_STATE(vmg, &vmi, NULL, vma, addr, addr + len);
>
> /*
> * If anonymous vma has not yet been faulted, update new pgoff
> @@ -1439,11 +1550,18 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
> faulted_in_anon_vma = false;
> }
>
> - new_vma = find_vma_prev(mm, addr, &prev);
> + new_vma = find_vma_prev(mm, addr, &vmg.prev);
> if (new_vma && new_vma->vm_start < addr + len)
> return NULL; /* should never get here */
>
> - new_vma = vma_merge_new_vma(&vmi, prev, vma, addr, addr + len, pgoff);
> + vmg.vma = NULL; /* New VMA range. */
> + vmg.pgoff = pgoff;
> + vmg.next = vma_next(&vmi);
> + vma_prev(&vmi);
> + vma_iter_next_range(&vmi);
You have already fixed this.
> +
> + new_vma = vma_merge_new_range(&vmg);
> +
> if (new_vma) {
> /*
> * Source vma may have been merged into new_vma
> diff --git a/mm/vma.h b/mm/vma.h
> index 8f01fbc20fe7..dbcdf1431014 100644
> --- a/mm/vma.h
> +++ b/mm/vma.h
> @@ -52,6 +52,13 @@ struct vma_munmap_struct {
> unsigned long data_vm;
> };
>
> +enum vma_merge_state {
> + VMA_MERGE_START,
> + VMA_MERGE_ERROR_NOMEM,
> + VMA_MERGE_NOMERGE,
> + VMA_MERGE_SUCCESS,
> +};
> +
> /* Represents a VMA merge operation. */
> struct vma_merge_struct {
> struct mm_struct *mm;
> @@ -68,8 +75,14 @@ struct vma_merge_struct {
> struct mempolicy *policy;
> struct vm_userfaultfd_ctx uffd_ctx;
> struct anon_vma_name *anon_name;
> + enum vma_merge_state state;
> };
>
> +static inline bool vmg_nomem(struct vma_merge_struct *vmg)
> +{
> + return vmg->state == VMA_MERGE_ERROR_NOMEM;
> +}
> +
> /* Assumes addr >= vma->vm_start. */
> static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
> unsigned long addr)
> @@ -85,6 +98,7 @@ static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
> .end = end_, \
> .flags = flags_, \
> .pgoff = pgoff_, \
> + .state = VMA_MERGE_START, \
> }
>
> #define VMG_VMA_STATE(name, vmi_, prev_, vma_, start_, end_) \
> @@ -103,6 +117,7 @@ static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
> .policy = vma_policy(vma_), \
> .uffd_ctx = vma_->vm_userfaultfd_ctx, \
> .anon_name = anon_vma_name(vma_), \
> + .state = VMA_MERGE_START, \
> }
>
> #ifdef CONFIG_DEBUG_VM_MAPLE_TREE
> @@ -306,10 +321,7 @@ struct vm_area_struct
> unsigned long new_flags,
> struct vm_userfaultfd_ctx new_ctx);
>
> -struct vm_area_struct
> -*vma_merge_new_vma(struct vma_iterator *vmi, struct vm_area_struct *prev,
> - struct vm_area_struct *vma, unsigned long start,
> - unsigned long end, pgoff_t pgoff);
> +struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg);
>
> struct vm_area_struct *vma_merge_extend(struct vma_iterator *vmi,
> struct vm_area_struct *vma,
> diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
> index cce1ba78c57f..3a3a850d951c 100644
> --- a/tools/testing/vma/vma.c
> +++ b/tools/testing/vma/vma.c
> @@ -101,9 +101,9 @@ static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
> */
> vmg->next = vma_next(vmg->vmi);
> vmg->prev = vma_prev(vmg->vmi);
> + vma_iter_next_range(vmg->vmi);
>
> - vma_iter_set(vmg->vmi, vmg->start);
> - return vma_merge(vmg);
> + return vma_merge_new_range(vmg);
> }
>
> /*
> @@ -162,10 +162,14 @@ static struct vm_area_struct *try_merge_new_vma(struct mm_struct *mm,
> merged = merge_new(vmg);
> if (merged) {
> *was_merged = true;
> + ASSERT_EQ(vmg->state, VMA_MERGE_SUCCESS);
> return merged;
> }
>
> *was_merged = false;
> +
> + ASSERT_EQ(vmg->state, VMA_MERGE_NOMERGE);
> +
> return alloc_and_link_vma(mm, start, end, pgoff, flags);
> }
>
> @@ -595,6 +599,7 @@ static bool test_vma_merge_special_flags(void)
> vmg.flags = flags | special_flag;
> vma = merge_new(&vmg);
> ASSERT_EQ(vma, NULL);
> + ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
> }
>
> /* 2. Modify VMA with special flag that would otherwise merge. */
> @@ -616,6 +621,7 @@ static bool test_vma_merge_special_flags(void)
> vmg.flags = flags | special_flag;
> vma = merge_existing(&vmg);
> ASSERT_EQ(vma, NULL);
> + ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
> }
>
> cleanup_mm(&mm, &vmi);
> @@ -708,6 +714,7 @@ static bool test_vma_merge_with_close(void)
>
> /* The next VMA having a close() operator should cause the merge to fail.*/
> ASSERT_EQ(merge_new(&vmg), NULL);
> + ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
>
> /* Now create the VMA so we can merge via modified flags */
> vmg_set_range(&vmg, 0x1000, 0x2000, 1, flags);
> @@ -719,6 +726,7 @@ static bool test_vma_merge_with_close(void)
> * also fail.
> */
> ASSERT_EQ(merge_existing(&vmg), NULL);
> + ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
>
> /* SCENARIO B
> *
> @@ -744,6 +752,7 @@ static bool test_vma_merge_with_close(void)
> vmg.vma = vma;
> /* Make sure merge does not occur. */
> ASSERT_EQ(merge_existing(&vmg), NULL);
> + ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
>
> cleanup_mm(&mm, &vmi);
> return true;
> @@ -792,6 +801,7 @@ static bool test_vma_merge_new_with_close(void)
> vmg_set_range(&vmg, 0x2000, 0x5000, 2, flags);
> vma = merge_new(&vmg);
> ASSERT_NE(vma, NULL);
> + ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
> ASSERT_EQ(vma->vm_start, 0);
> ASSERT_EQ(vma->vm_end, 0x5000);
> ASSERT_EQ(vma->vm_pgoff, 0);
> @@ -831,6 +841,7 @@ static bool test_merge_existing(void)
> vmg.prev = vma;
> vma->anon_vma = &dummy_anon_vma;
> ASSERT_EQ(merge_existing(&vmg), vma_next);
> + ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
> ASSERT_EQ(vma_next->vm_start, 0x3000);
> ASSERT_EQ(vma_next->vm_end, 0x9000);
> ASSERT_EQ(vma_next->vm_pgoff, 3);
> @@ -861,6 +872,7 @@ static bool test_merge_existing(void)
> vmg.vma = vma;
> vma->anon_vma = &dummy_anon_vma;
> ASSERT_EQ(merge_existing(&vmg), vma_next);
> + ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
> ASSERT_EQ(vma_next->vm_start, 0x2000);
> ASSERT_EQ(vma_next->vm_end, 0x9000);
> ASSERT_EQ(vma_next->vm_pgoff, 2);
> @@ -889,6 +901,7 @@ static bool test_merge_existing(void)
> vma->anon_vma = &dummy_anon_vma;
>
> ASSERT_EQ(merge_existing(&vmg), vma_prev);
> + ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
> ASSERT_EQ(vma_prev->vm_start, 0);
> ASSERT_EQ(vma_prev->vm_end, 0x6000);
> ASSERT_EQ(vma_prev->vm_pgoff, 0);
> @@ -920,6 +933,7 @@ static bool test_merge_existing(void)
> vmg.vma = vma;
> vma->anon_vma = &dummy_anon_vma;
> ASSERT_EQ(merge_existing(&vmg), vma_prev);
> + ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
> ASSERT_EQ(vma_prev->vm_start, 0);
> ASSERT_EQ(vma_prev->vm_end, 0x7000);
> ASSERT_EQ(vma_prev->vm_pgoff, 0);
> @@ -948,6 +962,7 @@ static bool test_merge_existing(void)
> vmg.vma = vma;
> vma->anon_vma = &dummy_anon_vma;
> ASSERT_EQ(merge_existing(&vmg), vma_prev);
> + ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
> ASSERT_EQ(vma_prev->vm_start, 0);
> ASSERT_EQ(vma_prev->vm_end, 0x9000);
> ASSERT_EQ(vma_prev->vm_pgoff, 0);
> @@ -981,31 +996,37 @@ static bool test_merge_existing(void)
> vmg.prev = vma;
> vmg.vma = vma;
> ASSERT_EQ(merge_existing(&vmg), NULL);
> + ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
>
> vmg_set_range(&vmg, 0x5000, 0x6000, 5, flags);
> vmg.prev = vma;
> vmg.vma = vma;
> ASSERT_EQ(merge_existing(&vmg), NULL);
> + ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
>
> vmg_set_range(&vmg, 0x6000, 0x7000, 6, flags);
> vmg.prev = vma;
> vmg.vma = vma;
> ASSERT_EQ(merge_existing(&vmg), NULL);
> + ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
>
> vmg_set_range(&vmg, 0x4000, 0x7000, 4, flags);
> vmg.prev = vma;
> vmg.vma = vma;
> ASSERT_EQ(merge_existing(&vmg), NULL);
> + ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
>
> vmg_set_range(&vmg, 0x4000, 0x6000, 4, flags);
> vmg.prev = vma;
> vmg.vma = vma;
> ASSERT_EQ(merge_existing(&vmg), NULL);
> + ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
>
> vmg_set_range(&vmg, 0x5000, 0x6000, 5, flags);
> vmg.prev = vma;
> vmg.vma = vma;
> ASSERT_EQ(merge_existing(&vmg), NULL);
> + ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
>
> ASSERT_EQ(cleanup_mm(&mm, &vmi), 3);
>
> @@ -1071,6 +1092,7 @@ static bool test_anon_vma_non_mergeable(void)
> vmg.vma = vma;
>
> ASSERT_EQ(merge_existing(&vmg), vma_prev);
> + ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
> ASSERT_EQ(vma_prev->vm_start, 0);
> ASSERT_EQ(vma_prev->vm_end, 0x7000);
> ASSERT_EQ(vma_prev->vm_pgoff, 0);
> @@ -1106,6 +1128,7 @@ static bool test_anon_vma_non_mergeable(void)
> vmg.prev = vma_prev;
>
> ASSERT_EQ(merge_new(&vmg), vma_prev);
> + ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
> ASSERT_EQ(vma_prev->vm_start, 0);
> ASSERT_EQ(vma_prev->vm_end, 0x7000);
> ASSERT_EQ(vma_prev->vm_pgoff, 0);
> @@ -1181,6 +1204,7 @@ static bool test_dup_anon_vma(void)
> vmg.vma = vma;
>
> ASSERT_EQ(merge_existing(&vmg), vma_prev);
> + ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
>
> ASSERT_EQ(vma_prev->vm_start, 0);
> ASSERT_EQ(vma_prev->vm_end, 0x8000);
> @@ -1209,6 +1233,7 @@ static bool test_dup_anon_vma(void)
> vmg.vma = vma;
>
> ASSERT_EQ(merge_existing(&vmg), vma_prev);
> + ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
>
> ASSERT_EQ(vma_prev->vm_start, 0);
> ASSERT_EQ(vma_prev->vm_end, 0x8000);
> @@ -1236,6 +1261,7 @@ static bool test_dup_anon_vma(void)
> vmg.vma = vma;
>
> ASSERT_EQ(merge_existing(&vmg), vma_prev);
> + ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
>
> ASSERT_EQ(vma_prev->vm_start, 0);
> ASSERT_EQ(vma_prev->vm_end, 0x5000);
> @@ -1263,6 +1289,7 @@ static bool test_dup_anon_vma(void)
> vmg.vma = vma;
>
> ASSERT_EQ(merge_existing(&vmg), vma_next);
> + ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
>
> ASSERT_EQ(vma_next->vm_start, 0x3000);
> ASSERT_EQ(vma_next->vm_end, 0x8000);
> @@ -1303,6 +1330,7 @@ static bool test_vmi_prealloc_fail(void)
>
> /* This will cause the merge to fail. */
> ASSERT_EQ(merge_existing(&vmg), NULL);
> + ASSERT_EQ(vmg.state, VMA_MERGE_ERROR_NOMEM);
> /* We will already have assigned the anon_vma. */
> ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
> /* And it was both cloned and unlinked. */
> @@ -1327,6 +1355,7 @@ static bool test_vmi_prealloc_fail(void)
>
> fail_prealloc = true;
> ASSERT_EQ(expand_existing(&vmg), -ENOMEM);
> + ASSERT_EQ(vmg.state, VMA_MERGE_ERROR_NOMEM);
>
> ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
> ASSERT_TRUE(dummy_anon_vma.was_cloned);
> diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
> index a3c262c6eb73..c5b9da034511 100644
> --- a/tools/testing/vma/vma_internal.h
> +++ b/tools/testing/vma/vma_internal.h
> @@ -740,6 +740,12 @@ static inline void vma_iter_free(struct vma_iterator *vmi)
> mas_destroy(&vmi->mas);
> }
>
> +static inline
> +struct vm_area_struct *vma_iter_next_range(struct vma_iterator *vmi)
> +{
> + return mas_next_range(&vmi->mas, ULONG_MAX);
> +}
> +
> static inline void vm_acct_memory(long pages)
> {
> }
> --
> 2.46.0
>
* Re: [PATCH v2 06/10] mm: avoid using vma_merge() for new VMAs
2024-08-23 20:07 ` [PATCH v2 06/10] mm: avoid using vma_merge() for new VMAs Lorenzo Stoakes
2024-08-27 11:41 ` Lorenzo Stoakes
2024-08-28 20:52 ` Liam R. Howlett
@ 2024-08-29 19:46 ` Mark Brown
2024-08-29 21:22 ` Lorenzo Stoakes
2 siblings, 1 reply; 26+ messages in thread
From: Mark Brown @ 2024-08-29 19:46 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linux-mm, linux-kernel, Andrew Morton, Liam R . Howlett,
Vlastimil Babka, Ryan Roberts, Aishwarya TCV, dev.jain
On Fri, Aug 23, 2024 at 09:07:01PM +0100, Lorenzo Stoakes wrote:
> Abstract vma_merge_new_vma() to use vma_merge_struct and rename the
> resultant function vma_merge_new_range() to be clear what the purpose of
> this function is - a new VMA is desired in the specified range, and we wish
> to see if it is possible to 'merge' surrounding VMAs into this range rather
> than having to allocate a new VMA.
This patch, which is in -next today with the fixup Lorenzo posted as
commit 8c9d0f8b1e9a42586, seems to be causing problems with the mremap
expand merge selftest. The test has been failing for a few days. It
unfortunately doesn't log anything about why it's upset:
# # ok 15 5MB mremap - Source 1MB-aligned, Dest 1MB-aligned with 40MB Preamble
# # not ok 16 mremap expand merge
# # ok 18 mremap mremap move within range
I identified this commit using a bisect, which appeared to converge fairly
smoothly; I didn't do any other analysis:
git bisect start
# status: waiting for both good and bad commits
# bad: [b18bbfc14a38b5234e09c2adcf713e38063a7e6e] Add linux-next specific files for 20240829
git bisect bad b18bbfc14a38b5234e09c2adcf713e38063a7e6e
# status: waiting for good commit(s), bad commit known
# good: [559a93afece952cb129a236febe5d1b8f7c79367] Merge branch 'for-linux-next-fixes' of https://gitlab.freedesktop.org/drm/misc/kernel.git
git bisect good 559a93afece952cb129a236febe5d1b8f7c79367
# bad: [47b9fb7c1f3efb63f2a5ed614385476856196527] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath.git
git bisect bad 47b9fb7c1f3efb63f2a5ed614385476856196527
# bad: [3582529887df7b4d397a05584ceb9d9944b54328] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git
git bisect bad 3582529887df7b4d397a05584ceb9d9944b54328
# bad: [a97442baf3538d884696539023a068269c983182] Merge branch 'at91-next' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux.git
git bisect bad a97442baf3538d884696539023a068269c983182
# bad: [9287e4adbc6ab8fa04d25eb82e097fed877a4642] mm: optimization on page allocation when CMA enabled
git bisect bad 9287e4adbc6ab8fa04d25eb82e097fed877a4642
# good: [becee36d72a57fd245033efdfd94cb54ac24472e] mm: swap: extend swap_shmem_alloc() to support batch SWAP_MAP_SHMEM flag setting
git bisect good becee36d72a57fd245033efdfd94cb54ac24472e
# good: [8939071cd2fdfd69f48631845613f5d841e0e97e] selftests-test_zswap-add-test-for-hierarchical-zswapwriteback-fix
git bisect good 8939071cd2fdfd69f48631845613f5d841e0e97e
# bad: [42c5d3ae23db3086624db99c52498d5843d3b88a] mm/damon/core-test: fix damon_test_ops_registration() for DAMON_VADDR unset case
git bisect bad 42c5d3ae23db3086624db99c52498d5843d3b88a
# good: [6abe8def9543d42a31fa6560410a1ab45b3e1917] mm/vma: drop incorrect comment from vms_gather_munmap_vmas()
git bisect good 6abe8def9543d42a31fa6560410a1ab45b3e1917
# bad: [bdf4d125288d2ac24a49f7f52d882261bed6e1e1] mm: vmalloc: refactor vm_area_alloc_pages() function
git bisect bad bdf4d125288d2ac24a49f7f52d882261bed6e1e1
# good: [5cc3d95b7d3a1cd7c09772d6b7aaa0371cc37236] mm: abstract vma_expand() to use vma_merge_struct
git bisect good 5cc3d95b7d3a1cd7c09772d6b7aaa0371cc37236
# bad: [042a3d61fddaff155b6595d8e47323f1e85aefb3] mm: make vma_prepare() and friends static and internal to vma.c
git bisect bad 042a3d61fddaff155b6595d8e47323f1e85aefb3
# bad: [b46446ab789bc2cfc3d18ddcc0e93fa533f6b479] mm: only advance iterator if prev exists
git bisect bad b46446ab789bc2cfc3d18ddcc0e93fa533f6b479
# bad: [8c9d0f8b1e9a4258676714557c8d69fbb85578ab] mm: avoid using vma_merge() for new VMAs
git bisect bad 8c9d0f8b1e9a4258676714557c8d69fbb85578ab
# first bad commit: [8c9d0f8b1e9a4258676714557c8d69fbb85578ab] mm: avoid using vma_merge() for new VMAs
* Re: [PATCH v2 06/10] mm: avoid using vma_merge() for new VMAs
2024-08-29 19:46 ` Mark Brown
@ 2024-08-29 21:22 ` Lorenzo Stoakes
2024-08-30 12:59 ` Mark Brown
0 siblings, 1 reply; 26+ messages in thread
From: Lorenzo Stoakes @ 2024-08-29 21:22 UTC (permalink / raw)
To: Mark Brown, Andrew Morton
Cc: linux-mm, linux-kernel, Liam R . Howlett, Vlastimil Babka,
Ryan Roberts, Aishwarya TCV, dev.jain
On Thu, Aug 29, 2024 at 08:46:28PM GMT, Mark Brown wrote:
> On Fri, Aug 23, 2024 at 09:07:01PM +0100, Lorenzo Stoakes wrote:
> > Abstract vma_merge_new_vma() to use vma_merge_struct and rename the
> > resultant function vma_merge_new_range() to be clear what the purpose of
> > this function is - a new VMA is desired in the specified range, and we wish
> > to see if it is possible to 'merge' surrounding VMAs into this range rather
> > than having to allocate a new VMA.
>
> This patch, which is in -next today with the fixup Lorenzo posted as
> commit 8c9d0f8b1e9a42586, seems to be causing problems with the mremap
> expand merge selftest. The test has been failing for a few days. It
> unfortunately doesn't log anything about why it's upset:
>
> # # ok 15 5MB mremap - Source 1MB-aligned, Dest 1MB-aligned with 40MB Preamble
> # # not ok 16 mremap expand merge
> # # ok 18 mremap mremap move within range
[snip]
Thanks, I figured out the problem - it's not arm-specific. I was running
the self-tests, but failed to eyeball the results properly and so missed this.
This is a product of vma_merge_extend() invoking vma_merge_new_range() without
having determined the next VMA correctly, after moving from vma_merge() (which
looked this up for us) to vma_merge_new_range() (which does not).
This is after having adjusted the assumptions between v1 and v2 of the series in
each merge function, and I simply missed this mremap()-specific case.
Andrew - I enclose a fix-patch to get a fix out for this asap, but I am due a
respin relatively soon and will fold the fix into that as well.
----8<----
From 3678f8a53f98de52f11946d4d32e6fb239d11c2f Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Thu, 29 Aug 2024 22:18:02 +0100
Subject: [PATCH] mm: correctly determine vmg.next in vma_merge_extend()
vma_merge_new_range() requires that the caller specify prev AND next.
Failure to specify results in missed merges. Fix this by explicitly looking
up next.
This function is explicitly used by mremap() in extend cases.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Mark Brown <broonie@kernel.org>
---
mm/vma.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/vma.c b/mm/vma.c
index 7cddeea907f8..bd35abc70ed8 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -1489,6 +1489,10 @@ struct vm_area_struct *vma_merge_extend(struct vma_iterator *vmi,
{
VMG_VMA_STATE(vmg, vmi, vma, vma, vma->vm_end, vma->vm_end + delta);
+ vmg.next = vma_next(vmi);
+ if (vma_prev(vmi))
+ vma_iter_next_range(vmi);
+
/* We use the VMA to populate VMG fields only. */
vmg.vma = NULL;
return vma_merge_new_range(&vmg);
--
2.46.0
* Re: [PATCH v2 06/10] mm: avoid using vma_merge() for new VMAs
2024-08-29 21:22 ` Lorenzo Stoakes
@ 2024-08-30 12:59 ` Mark Brown
2024-08-30 13:02 ` Lorenzo Stoakes
0 siblings, 1 reply; 26+ messages in thread
From: Mark Brown @ 2024-08-30 12:59 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, linux-mm, linux-kernel, Liam R . Howlett,
Vlastimil Babka, Ryan Roberts, Aishwarya TCV, dev.jain
On Thu, Aug 29, 2024 at 10:22:53PM +0100, Lorenzo Stoakes wrote:
> Thanks, I figured out the problem - it's not arm-specific. I was running
> the self-tests, but failed to eyeball the results properly and so missed this.
>
> This is a product of vma_merge_extend() invoking vma_merge_new_range() without
> having determined the next VMA correctly, after moving from vma_merge() (which
> looked this up for us) to vma_merge_new_range() (which does not).
>
> This is after having adjusted the assumptions between v1 and v2 of the series in
> each merge function, and I simply missed this mremap()-specific case.
>
> Andrew - I enclose a fix-patch to get a fix out for this asap, but I am due a
> respin relatively soon and will fold the fix into that as well.
>
> ----8<----
> From 3678f8a53f98de52f11946d4d32e6fb239d11c2f Mon Sep 17 00:00:00 2001
> From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Date: Thu, 29 Aug 2024 22:18:02 +0100
> Subject: [PATCH] mm: correctly determine vmg.next in vma_merge_extend()
>
> > vma_merge_new_range() requires that the caller specify prev AND next.
This fixes the problem for me.
Tested-by: Mark Brown <broonie@kernel.org>
* Re: [PATCH v2 06/10] mm: avoid using vma_merge() for new VMAs
2024-08-30 12:59 ` Mark Brown
@ 2024-08-30 13:02 ` Lorenzo Stoakes
2024-08-30 13:05 ` Mark Brown
0 siblings, 1 reply; 26+ messages in thread
From: Lorenzo Stoakes @ 2024-08-30 13:02 UTC (permalink / raw)
To: Mark Brown
Cc: Andrew Morton, linux-mm, linux-kernel, Liam R . Howlett,
Vlastimil Babka, Ryan Roberts, Aishwarya TCV, dev.jain
On Fri, Aug 30, 2024 at 01:59:37PM GMT, Mark Brown wrote:
> On Thu, Aug 29, 2024 at 10:22:53PM +0100, Lorenzo Stoakes wrote:
>
> > Thanks, I figured out the problem - it's not arm-specific. I was running
> > the self-tests, but failed to eyeball the results properly and so missed this.
> >
> > This is a product of vma_merge_extend() invoking vma_merge_new_range() without
> > having determined the next VMA correctly, after moving from vma_merge() (which
> > looked this up for us) to vma_merge_new_range() (which does not).
> >
> > This is after having adjusted the assumptions between v1 and v2 of the series in
> > each merge function, and I simply missed this mremap()-specific case.
> >
> > Andrew - I enclose a fix-patch to get a fix out for this asap, but I am due a
> > respin relatively soon and will fold the fix into that as well.
> >
> > ----8<----
> > From 3678f8a53f98de52f11946d4d32e6fb239d11c2f Mon Sep 17 00:00:00 2001
> > From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Date: Thu, 29 Aug 2024 22:18:02 +0100
> > Subject: [PATCH] mm: correctly determine vmg.next in vma_merge_extend()
> >
> > vma_merge_new_range() requires that the caller specify prev AND next.
>
> This fixes the problem for me.
>
> Tested-by: Mark Brown <broonie@kernel.org>
Thanks! I will be folding this into a respin soon - are you good with me
adding this tag to patch 6/10 in general? No worries if not, as the fix
will be subsumed there.
* Re: [PATCH v2 06/10] mm: avoid using vma_merge() for new VMAs
2024-08-30 13:02 ` Lorenzo Stoakes
@ 2024-08-30 13:05 ` Mark Brown
2024-08-30 13:10 ` Lorenzo Stoakes
0 siblings, 1 reply; 26+ messages in thread
From: Mark Brown @ 2024-08-30 13:05 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, linux-mm, linux-kernel, Liam R . Howlett,
Vlastimil Babka, Ryan Roberts, Aishwarya TCV, dev.jain
On Fri, Aug 30, 2024 at 02:02:33PM +0100, Lorenzo Stoakes wrote:
> On Fri, Aug 30, 2024 at 01:59:37PM GMT, Mark Brown wrote:
> > Tested-by: Mark Brown <broonie@kernel.org>
> Thanks! I will be folding this into a respin soon - are you good with me
> adding this tag to patch 6/10 in general? No worries if not, as the fix
> will be subsumed there.
Sure.
* Re: [PATCH v2 06/10] mm: avoid using vma_merge() for new VMAs
2024-08-30 13:05 ` Mark Brown
@ 2024-08-30 13:10 ` Lorenzo Stoakes
0 siblings, 0 replies; 26+ messages in thread
From: Lorenzo Stoakes @ 2024-08-30 13:10 UTC (permalink / raw)
To: Mark Brown
Cc: Andrew Morton, linux-mm, linux-kernel, Liam R . Howlett,
Vlastimil Babka, Ryan Roberts, Aishwarya TCV, dev.jain
On Fri, Aug 30, 2024 at 02:05:23PM GMT, Mark Brown wrote:
> On Fri, Aug 30, 2024 at 02:02:33PM +0100, Lorenzo Stoakes wrote:
> > On Fri, Aug 30, 2024 at 01:59:37PM GMT, Mark Brown wrote:
>
> > > Tested-by: Mark Brown <broonie@kernel.org>
>
> > Thanks! I will be folding this into a respin soon - are you good with me
> > adding this tag to patch 6/10 in general? No worries if not, as the fix
> > will be subsumed there.
>
> Sure.
Thanks!
* Re: [PATCH v2 03/10] mm: introduce vma_merge_struct and abstract vma_merge(),vma_modify()
2024-08-28 19:35 ` Liam R. Howlett
@ 2024-08-30 13:28 ` Lorenzo Stoakes
0 siblings, 0 replies; 26+ messages in thread
From: Lorenzo Stoakes @ 2024-08-30 13:28 UTC (permalink / raw)
To: Liam R. Howlett, linux-mm, linux-kernel, Andrew Morton,
Vlastimil Babka
On Wed, Aug 28, 2024 at 03:35:06PM GMT, Liam R. Howlett wrote:
> * Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [240823 16:07]:
> > Rather than passing around huge numbers of parameters to numerous helper
> > functions, abstract them into a single struct that we thread through the
> > operation, the vma_merge_struct ('vmg').
> >
> > Adjust vma_merge() and vma_modify() to accept this parameter, as well as
> > predicate functions can_vma_merge_before(), can_vma_merge_after(), and the
> > vma_modify_...() helper functions.
> >
> > Also introduce VMG_STATE() and VMG_VMA_STATE() helper macros to allow for
> > easy vmg declaration.
> >
> > We additionally remove the requirement that vma_merge() is passed a VMA
> > object representing the candidate new VMA. Previously it used this to
> > obtain the mm_struct, file and anon_vma properties of the proposed range (a
> > rather confusing state of affairs), which are now provided by the vmg
> > directly.
> >
> > We also remove the pgoff calculation previously performed in vma_modify(), and
> > instead calculate this in VMG_VMA_STATE() via the vma_pgoff_offset()
> > helper.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Some small things..
All addressed except one thing, see below...
>
> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Thanks!
>
> > ---
> > mm/mmap.c | 77 ++++++++-------
> > mm/vma.c | 209 ++++++++++++++++++++++++----------------
> > mm/vma.h | 128 ++++++++++++++----------
> > tools/testing/vma/vma.c | 43 +--------
> > 4 files changed, 249 insertions(+), 208 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index e495b0381265..4066c0444495 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1373,9 +1373,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> > unsigned long end = addr + len;
> > unsigned long merge_start = addr, merge_end = end;
> > bool writable_file_mapping = false;
> > - pgoff_t vm_pgoff;
> > int error = -ENOMEM;
> > VMA_ITERATOR(vmi, mm, addr);
> > + VMG_STATE(vmg, mm, &vmi, addr, end, vm_flags, pgoff);
> > +
> > + vmg.file = file;
> >
>
> Extra whitespace here.
>
> > /* Find the first overlapping VMA */
> > vma = vma_find(&vmi, end);
> > @@ -1388,12 +1390,12 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> > if (vms_gather_munmap_vmas(&vms, &mas_detach))
> > return -ENOMEM;
> >
> > - next = vms.next;
> > - prev = vms.prev;
> > + next = vmg.next = vms.next;
> > + prev = vmg.prev = vms.prev;
> > vma = NULL;
> > } else {
> > - next = vma_next(&vmi);
> > - prev = vma_prev(&vmi);
> > + next = vmg.next = vma_next(&vmi);
> > + prev = vmg.prev = vma_prev(&vmi);
> > if (prev)
> > vma_iter_next_range(&vmi);
> > }
> > @@ -1413,6 +1415,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> >
> > vms.nr_accounted = 0;
> > vm_flags |= VM_ACCOUNT;
> > + vmg.flags = vm_flags;
> > }
> >
> > if (vm_flags & VM_SPECIAL)
> > @@ -1421,28 +1424,31 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> > /* Attempt to expand an old mapping */
> > /* Check next */
> > if (next && next->vm_start == end && !vma_policy(next) &&
> > - can_vma_merge_before(next, vm_flags, NULL, file, pgoff+pglen,
> > - NULL_VM_UFFD_CTX, NULL)) {
> > + can_vma_merge_before(&vmg)) {
> > merge_end = next->vm_end;
> > vma = next;
> > - vm_pgoff = next->vm_pgoff - pglen;
> > + vmg.pgoff = next->vm_pgoff - pglen;
> > + /*
> > + * We set this here so if we will merge with the previous VMA in
> > + * the code below, can_vma_merge_after() ensures anon_vma
> > + * compatibility between prev and next.
> > + */
> > + vmg.anon_vma = vma->anon_vma;
> > + vmg.uffd_ctx = vma->vm_userfaultfd_ctx;
> > }
> >
> > /* Check prev */
> > if (prev && prev->vm_end == addr && !vma_policy(prev) &&
> > - (vma ? can_vma_merge_after(prev, vm_flags, vma->anon_vma, file,
> > - pgoff, vma->vm_userfaultfd_ctx, NULL) :
> > - can_vma_merge_after(prev, vm_flags, NULL, file, pgoff,
> > - NULL_VM_UFFD_CTX, NULL))) {
> > + can_vma_merge_after(&vmg)) {
> > merge_start = prev->vm_start;
> > vma = prev;
> > - vm_pgoff = prev->vm_pgoff;
> > + vmg.pgoff = prev->vm_pgoff;
> > vma_prev(&vmi); /* Equivalent to going to the previous range */
> > }
> >
> > if (vma) {
> > /* Actually expand, if possible */
> > - if (!vma_expand(&vmi, vma, merge_start, merge_end, vm_pgoff, next)) {
> > + if (!vma_expand(&vmi, vma, merge_start, merge_end, vmg.pgoff, next)) {
> > khugepaged_enter_vma(vma, vm_flags);
> > goto expanded;
> > }
> > @@ -1772,26 +1778,29 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > * Expand the existing vma if possible; Note that singular lists do not
> > * occur after forking, so the expand will only happen on new VMAs.
> > */
> > - if (vma && vma->vm_end == addr && !vma_policy(vma) &&
> > - can_vma_merge_after(vma, flags, NULL, NULL,
> > - addr >> PAGE_SHIFT, NULL_VM_UFFD_CTX, NULL)) {
> > - vma_iter_config(vmi, vma->vm_start, addr + len);
> > - if (vma_iter_prealloc(vmi, vma))
> > - goto unacct_fail;
> > -
> > - vma_start_write(vma);
> > -
> > - init_vma_prep(&vp, vma);
> > - vma_prepare(&vp);
> > - vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
> > - vma->vm_end = addr + len;
> > - vm_flags_set(vma, VM_SOFTDIRTY);
> > - vma_iter_store(vmi, vma);
> > -
> > - vma_complete(&vp, vmi, mm);
> > - validate_mm(mm);
> > - khugepaged_enter_vma(vma, flags);
> > - goto out;
> > + if (vma && vma->vm_end == addr && !vma_policy(vma)) {
> > + VMG_STATE(vmg, mm, vmi, addr, addr + len, flags, PHYS_PFN(addr));
> > +
> > + vmg.prev = vma;
> > + if (can_vma_merge_after(&vmg)) {
> > + vma_iter_config(vmi, vma->vm_start, addr + len);
> > + if (vma_iter_prealloc(vmi, vma))
> > + goto unacct_fail;
> > +
> > + vma_start_write(vma);
> > +
> > + init_vma_prep(&vp, vma);
> > + vma_prepare(&vp);
> > + vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
> > + vma->vm_end = addr + len;
> > + vm_flags_set(vma, VM_SOFTDIRTY);
> > + vma_iter_store(vmi, vma);
> > +
> > + vma_complete(&vp, vmi, mm);
> > + validate_mm(mm);
> > + khugepaged_enter_vma(vma, flags);
> > + goto out;
> > + }
> > }
> >
> > if (vma)
> > diff --git a/mm/vma.c b/mm/vma.c
> > index e5a5e418c7cb..74c627ff0313 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -7,16 +7,18 @@
> > #include "vma_internal.h"
> > #include "vma.h"
> >
> > -/*
> > - * If the vma has a ->close operation then the driver probably needs to release
> > - * per-vma resources, so we don't attempt to merge those if the caller indicates
> > - * the current vma may be removed as part of the merge.
> > - */
> > -static inline bool is_mergeable_vma(struct vm_area_struct *vma,
> > - struct file *file, unsigned long vm_flags,
> > - struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> > - struct anon_vma_name *anon_name, bool may_remove_vma)
> > +static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_next)
> > {
> > + struct vm_area_struct *vma = merge_next ? vmg->next : vmg->prev;
> > + /*
> > + * If the vma has a ->close operation then the driver probably needs to
> > + * release per-vma resources, so we don't attempt to merge those if the
> > + * caller indicates the current vma may be removed as part of the merge,
> > + * which is the case if we are attempting to merge the next VMA into
> > + * this one.
> > + */
> > + bool may_remove_vma = merge_next;
> > +
> > /*
> > * VM_SOFTDIRTY should not prevent from VMA merging, if we
> > * match the flags but dirty bit -- the caller should mark
> > @@ -25,15 +27,15 @@ static inline bool is_mergeable_vma(struct vm_area_struct *vma,
> > * the kernel to generate new VMAs when old one could be
> > * extended instead.
> > */
> > - if ((vma->vm_flags ^ vm_flags) & ~VM_SOFTDIRTY)
> > + if ((vma->vm_flags ^ vmg->flags) & ~VM_SOFTDIRTY)
> > return false;
> > - if (vma->vm_file != file)
> > + if (vma->vm_file != vmg->file)
> > return false;
> > if (may_remove_vma && vma->vm_ops && vma->vm_ops->close)
> > return false;
> > - if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
> > + if (!is_mergeable_vm_userfaultfd_ctx(vma, vmg->uffd_ctx))
> > return false;
> > - if (!anon_vma_name_eq(anon_vma_name(vma), anon_name))
> > + if (!anon_vma_name_eq(anon_vma_name(vma), vmg->anon_name))
> > return false;
> > return true;
> > }
> > @@ -94,16 +96,16 @@ static void init_multi_vma_prep(struct vma_prepare *vp,
> > * We assume the vma may be removed as part of the merge.
> > */
> > bool
> > -can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
> > - struct anon_vma *anon_vma, struct file *file,
> > - pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> > - struct anon_vma_name *anon_name)
> > +can_vma_merge_before(struct vma_merge_struct *vmg)
> > {
> > - if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name, true) &&
> > - is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
> > - if (vma->vm_pgoff == vm_pgoff)
> > + pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start);
> > +
> > + if (is_mergeable_vma(vmg, true) &&
> > + is_mergeable_anon_vma(vmg->anon_vma, vmg->next->anon_vma, vmg->next)) {
> > + if (vmg->next->vm_pgoff == vmg->pgoff + pglen)
> > return true;
> > }
> > +
> > return false;
> > }
> >
> > @@ -116,18 +118,11 @@ can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
> > *
> > * We assume that vma is not removed as part of the merge.
> > */
> > -bool
> > -can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
> > - struct anon_vma *anon_vma, struct file *file,
> > - pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> > - struct anon_vma_name *anon_name)
> > +bool can_vma_merge_after(struct vma_merge_struct *vmg)
> > {
> > - if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name, false) &&
> > - is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
> > - pgoff_t vm_pglen;
> > -
> > - vm_pglen = vma_pages(vma);
> > - if (vma->vm_pgoff + vm_pglen == vm_pgoff)
> > + if (is_mergeable_vma(vmg, false) &&
>
> I started putting /* merge_next = */ false, so that it's easier to read.
> Not a whole lot of arguments, but it is still useful to remember what is
> passed.
>
> > + is_mergeable_anon_vma(vmg->anon_vma, vmg->prev->anon_vma, vmg->prev)) {
> > + if (vmg->prev->vm_pgoff + vma_pages(vmg->prev) == vmg->pgoff)
> > return true;
> > }
> > return false;
> > @@ -1022,16 +1017,10 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
> > * **** is not represented - it will be merged and the vma containing the
> > * area is returned, or the function will return NULL
> > */
> > -static struct vm_area_struct
> > -*vma_merge(struct vma_iterator *vmi, struct vm_area_struct *prev,
> > - struct vm_area_struct *src, unsigned long addr, unsigned long end,
> > - unsigned long vm_flags, pgoff_t pgoff, struct mempolicy *policy,
> > - struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> > - struct anon_vma_name *anon_name)
> > +static struct vm_area_struct *vma_merge(struct vma_merge_struct *vmg)
> > {
> > - struct mm_struct *mm = src->vm_mm;
> > - struct anon_vma *anon_vma = src->anon_vma;
> > - struct file *file = src->vm_file;
> > + struct mm_struct *mm = vmg->mm;
> > + struct vm_area_struct *prev = vmg->prev;
> > struct vm_area_struct *curr, *next, *res;
> > struct vm_area_struct *vma, *adjust, *remove, *remove2;
> > struct vm_area_struct *anon_dup = NULL;
> > @@ -1041,16 +1030,18 @@ static struct vm_area_struct
> > bool merge_prev = false;
> > bool merge_next = false;
> > bool vma_expanded = false;
> > + unsigned long addr = vmg->start;
> > + unsigned long end = vmg->end;
> > unsigned long vma_start = addr;
> > unsigned long vma_end = end;
> > - pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
> > + pgoff_t pglen = PHYS_PFN(end - addr);
> > long adj_start = 0;
> >
> > /*
> > * We later require that vma->vm_flags == vm_flags,
> > * so this tests vma->vm_flags & VM_SPECIAL, too.
> > */
> > - if (vm_flags & VM_SPECIAL)
> > + if (vmg->flags & VM_SPECIAL)
> > return NULL;
> >
> > /* Does the input range span an existing VMA? (cases 5 - 8) */
> > @@ -1058,27 +1049,26 @@ static struct vm_area_struct
> >
> > if (!curr || /* cases 1 - 4 */
> > end == curr->vm_end) /* cases 6 - 8, adjacent VMA */
> > - next = vma_lookup(mm, end);
> > + next = vmg->next = vma_lookup(mm, end);
> > else
> > - next = NULL; /* case 5 */
> > + next = vmg->next = NULL; /* case 5 */
> >
> > if (prev) {
> > vma_start = prev->vm_start;
> > vma_pgoff = prev->vm_pgoff;
> >
> > /* Can we merge the predecessor? */
> > - if (addr == prev->vm_end && mpol_equal(vma_policy(prev), policy)
> > - && can_vma_merge_after(prev, vm_flags, anon_vma, file,
> > - pgoff, vm_userfaultfd_ctx, anon_name)) {
> > + if (addr == prev->vm_end && mpol_equal(vma_policy(prev), vmg->policy)
> > + && can_vma_merge_after(vmg)) {
> > +
> > merge_prev = true;
> > - vma_prev(vmi);
> > + vma_prev(vmg->vmi);
> > }
> > }
> >
> > /* Can we merge the successor? */
> > - if (next && mpol_equal(policy, vma_policy(next)) &&
> > - can_vma_merge_before(next, vm_flags, anon_vma, file, pgoff+pglen,
> > - vm_userfaultfd_ctx, anon_name)) {
> > + if (next && mpol_equal(vmg->policy, vma_policy(next)) &&
> > + can_vma_merge_before(vmg)) {
> > merge_next = true;
> > }
> >
> > @@ -1129,7 +1119,7 @@ static struct vm_area_struct
> > remove = curr;
> > } else { /* case 5 */
> > adjust = curr;
> > - adj_start = (end - curr->vm_start);
> > + adj_start = end - curr->vm_start;
>
> unnecessary change, if you respin.. but it's temporary.
>
> > }
> > if (!err)
> > err = dup_anon_vma(prev, curr, &anon_dup);
> > @@ -1169,13 +1159,13 @@ static struct vm_area_struct
> > vma_expanded = true;
> >
> > if (vma_expanded) {
> > - vma_iter_config(vmi, vma_start, vma_end);
> > + vma_iter_config(vmg->vmi, vma_start, vma_end);
> > } else {
> > - vma_iter_config(vmi, adjust->vm_start + adj_start,
> > + vma_iter_config(vmg->vmi, adjust->vm_start + adj_start,
> > adjust->vm_end);
> > }
> >
> > - if (vma_iter_prealloc(vmi, vma))
> > + if (vma_iter_prealloc(vmg->vmi, vma))
> > goto prealloc_fail;
> >
> > init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
> > @@ -1187,20 +1177,20 @@ static struct vm_area_struct
> > vma_set_range(vma, vma_start, vma_end, vma_pgoff);
> >
> > if (vma_expanded)
> > - vma_iter_store(vmi, vma);
> > + vma_iter_store(vmg->vmi, vma);
> >
> > if (adj_start) {
> > adjust->vm_start += adj_start;
> > adjust->vm_pgoff += adj_start >> PAGE_SHIFT;
> > if (adj_start < 0) {
> > WARN_ON(vma_expanded);
> > - vma_iter_store(vmi, next);
> > + vma_iter_store(vmg->vmi, next);
> > }
> > }
> >
> > - vma_complete(&vp, vmi, mm);
> > + vma_complete(&vp, vmg->vmi, mm);
> > validate_mm(mm);
> > - khugepaged_enter_vma(res, vm_flags);
> > + khugepaged_enter_vma(res, vmg->flags);
> > return res;
> >
> > prealloc_fail:
> > @@ -1208,8 +1198,8 @@ static struct vm_area_struct
> > unlink_anon_vmas(anon_dup);
> >
> > anon_vma_fail:
> > - vma_iter_set(vmi, addr);
> > - vma_iter_load(vmi);
> > + vma_iter_set(vmg->vmi, addr);
> > + vma_iter_load(vmg->vmi);
> > return NULL;
> > }
> >
> > @@ -1226,32 +1216,27 @@ static struct vm_area_struct
> > * The function returns either the merged VMA, the original VMA if a split was
> > * required instead, or an error if the split failed.
> > */
> > -struct vm_area_struct *vma_modify(struct vma_iterator *vmi,
> > - struct vm_area_struct *prev,
> > - struct vm_area_struct *vma,
> > - unsigned long start, unsigned long end,
> > - unsigned long vm_flags,
> > - struct mempolicy *policy,
> > - struct vm_userfaultfd_ctx uffd_ctx,
> > - struct anon_vma_name *anon_name)
> > +static struct vm_area_struct *vma_modify(struct vma_merge_struct *vmg)
> > {
> > - pgoff_t pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
> > + struct vm_area_struct *vma = vmg->vma;
> > struct vm_area_struct *merged;
> >
> > - merged = vma_merge(vmi, prev, vma, start, end, vm_flags,
> > - pgoff, policy, uffd_ctx, anon_name);
> > + /* First, try to merge. */
> > + merged = vma_merge(vmg);
> > if (merged)
> > return merged;
> >
> > - if (vma->vm_start < start) {
> > - int err = split_vma(vmi, vma, start, 1);
> > + /* Split any preceding portion of the VMA. */
> > + if (vma->vm_start < vmg->start) {
> > + int err = split_vma(vmg->vmi, vma, vmg->start, 1);
> >
> > if (err)
> > return ERR_PTR(err);
> > }
> >
> > - if (vma->vm_end > end) {
> > - int err = split_vma(vmi, vma, end, 0);
> > + /* Split any trailing portion of the VMA. */
> > + if (vma->vm_end > vmg->end) {
> > + int err = split_vma(vmg->vmi, vma, vmg->end, 0);
> >
> > if (err)
> > return ERR_PTR(err);
> > @@ -1260,6 +1245,65 @@ struct vm_area_struct *vma_modify(struct vma_iterator *vmi,
> > return vma;
> > }
> >
> > +struct vm_area_struct *vma_modify_flags(
> > + struct vma_iterator *vmi, struct vm_area_struct *prev,
> > + struct vm_area_struct *vma, unsigned long start, unsigned long end,
> > + unsigned long new_flags)
> > +{
> > + VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
> > +
> > + vmg.flags = new_flags;
> > +
> > + return vma_modify(&vmg);
> > +}
> > +
> > +struct vm_area_struct
> > +*vma_modify_flags_name(struct vma_iterator *vmi,
> > + struct vm_area_struct *prev,
> > + struct vm_area_struct *vma,
> > + unsigned long start,
> > + unsigned long end,
> > + unsigned long new_flags,
> > + struct anon_vma_name *new_name)
> > +{
> > + VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
> > +
> > + vmg.flags = new_flags;
> > + vmg.anon_name = new_name;
> > +
> > + return vma_modify(&vmg);
> > +}
> > +
> > +struct vm_area_struct
> > +*vma_modify_policy(struct vma_iterator *vmi,
> > + struct vm_area_struct *prev,
> > + struct vm_area_struct *vma,
> > + unsigned long start, unsigned long end,
> > + struct mempolicy *new_pol)
> > +{
> > + VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
> > +
> > + vmg.policy = new_pol;
> > +
> > + return vma_modify(&vmg);
> > +}
> > +
> > +struct vm_area_struct
> > +*vma_modify_flags_uffd(struct vma_iterator *vmi,
> > + struct vm_area_struct *prev,
> > + struct vm_area_struct *vma,
> > + unsigned long start, unsigned long end,
> > + unsigned long new_flags,
> > + struct vm_userfaultfd_ctx new_ctx)
> > +{
> > + VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
> > +
> > + vmg.flags = new_flags;
> > + vmg.uffd_ctx = new_ctx;
> > +
> > + return vma_modify(&vmg);
> > +}
> > +
> > /*
> > * Attempt to merge a newly mapped VMA with those adjacent to it. The caller
> > * must ensure that [start, end) does not overlap any existing VMA.
> > @@ -1269,8 +1313,11 @@ struct vm_area_struct
> > struct vm_area_struct *vma, unsigned long start,
> > unsigned long end, pgoff_t pgoff)
> > {
> > - return vma_merge(vmi, prev, vma, start, end, vma->vm_flags, pgoff,
> > - vma_policy(vma), vma->vm_userfaultfd_ctx, anon_vma_name(vma));
> > + VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
> > +
> > + vmg.pgoff = pgoff;
> > +
> > + return vma_merge(&vmg);
> > }
> >
> > /*
> > @@ -1281,12 +1328,10 @@ struct vm_area_struct *vma_merge_extend(struct vma_iterator *vmi,
> > struct vm_area_struct *vma,
> > unsigned long delta)
> > {
> > - pgoff_t pgoff = vma->vm_pgoff + vma_pages(vma);
> > + VMG_VMA_STATE(vmg, vmi, vma, vma, vma->vm_end, vma->vm_end + delta);
> >
> > /* vma is specified as prev, so case 1 or 2 will apply. */
> > - return vma_merge(vmi, vma, vma, vma->vm_end, vma->vm_end + delta,
> > - vma->vm_flags, pgoff, vma_policy(vma),
> > - vma->vm_userfaultfd_ctx, anon_vma_name(vma));
> > + return vma_merge(&vmg);
> > }
>
> Many of these are small and can be static inline, some (all?) in the
> header.
I don't like exposing the raw underlying merge function, so this is on
purpose. There is no indication of a perf hit.
>
> >
> > void unlink_file_vma_batch_init(struct unlink_vma_file_batch *vb)
> > diff --git a/mm/vma.h b/mm/vma.h
> > index c774642697a0..9060a1a1edfa 100644
> > --- a/mm/vma.h
> > +++ b/mm/vma.h
> > @@ -52,6 +52,59 @@ struct vma_munmap_struct {
> > unsigned long data_vm;
> > };
> >
> > +/* Represents a VMA merge operation. */
> > +struct vma_merge_struct {
> > + struct mm_struct *mm;
> > + struct vma_iterator *vmi;
> > + pgoff_t pgoff;
> > + struct vm_area_struct *prev;
> > + struct vm_area_struct *next; /* Modified by vma_merge(). */
> > + struct vm_area_struct *vma; /* Either a new VMA or the one being modified. */
> > + unsigned long start;
> > + unsigned long end;
> > + unsigned long flags;
> > + struct file *file;
> > + struct anon_vma *anon_vma;
> > + struct mempolicy *policy;
> > + struct vm_userfaultfd_ctx uffd_ctx;
> > + struct anon_vma_name *anon_name;
> > +};
> > +
> > +/* Assumes addr >= vma->vm_start. */
> > +static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
> > + unsigned long addr)
> > +{
> > + return vma->vm_pgoff + PHYS_PFN(addr - vma->vm_start);
> > +}
> > +
> > +#define VMG_STATE(name, mm_, vmi_, start_, end_, flags_, pgoff_) \
> > + struct vma_merge_struct name = { \
> > + .mm = mm_, \
> > + .vmi = vmi_, \
> > + .start = start_, \
> > + .end = end_, \
> > + .flags = flags_, \
> > + .pgoff = pgoff_, \
> > + }
> > +
> > +#define VMG_VMA_STATE(name, vmi_, prev_, vma_, start_, end_) \
> > + struct vma_merge_struct name = { \
> > + .mm = vma_->vm_mm, \
> > + .vmi = vmi_, \
> > + .prev = prev_, \
> > + .next = NULL, \
> > + .vma = vma_, \
> > + .start = start_, \
> > + .end = end_, \
> > + .flags = vma_->vm_flags, \
> > + .pgoff = vma_pgoff_offset(vma_, start_), \
> > + .file = vma_->vm_file, \
> > + .anon_vma = vma_->anon_vma, \
> > + .policy = vma_policy(vma_), \
> > + .uffd_ctx = vma_->vm_userfaultfd_ctx, \
> > + .anon_name = anon_vma_name(vma_), \
> > + }
> > +
> > #ifdef CONFIG_DEBUG_VM_MAPLE_TREE
> > void validate_mm(struct mm_struct *mm);
> > #else
> > @@ -208,80 +261,53 @@ void remove_vma(struct vm_area_struct *vma, bool unreachable, bool closed);
> > void unmap_region(struct ma_state *mas, struct vm_area_struct *vma,
> > struct vm_area_struct *prev, struct vm_area_struct *next);
> >
> > -/* Required by mmap_region(). */
> > -bool
> > -can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
> > - struct anon_vma *anon_vma, struct file *file,
> > - pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> > - struct anon_vma_name *anon_name);
> > -
> > -/* Required by mmap_region() and do_brk_flags(). */
> > -bool
> > -can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
> > - struct anon_vma *anon_vma, struct file *file,
> > - pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> > - struct anon_vma_name *anon_name);
> > -
> > -struct vm_area_struct *vma_modify(struct vma_iterator *vmi,
> > - struct vm_area_struct *prev,
> > - struct vm_area_struct *vma,
> > - unsigned long start, unsigned long end,
> > - unsigned long vm_flags,
> > - struct mempolicy *policy,
> > - struct vm_userfaultfd_ctx uffd_ctx,
> > - struct anon_vma_name *anon_name);
> > +/*
> > + * Can we merge the VMA described by vmg into the following VMA vmg->next?
> > + *
> > + * Required by mmap_region().
> > + */
> > +bool can_vma_merge_before(struct vma_merge_struct *vmg);
> > +
> > +/*
> > + * Can we merge the VMA described by vmg into the preceding VMA vmg->prev?
> > + *
> > + * Required by mmap_region() and do_brk_flags().
> > + */
> > +bool can_vma_merge_after(struct vma_merge_struct *vmg);
> >
> > /* We are about to modify the VMA's flags. */
> > -static inline struct vm_area_struct
> > -*vma_modify_flags(struct vma_iterator *vmi,
> > - struct vm_area_struct *prev,
> > - struct vm_area_struct *vma,
> > - unsigned long start, unsigned long end,
> > - unsigned long new_flags)
> > -{
> > - return vma_modify(vmi, prev, vma, start, end, new_flags,
> > - vma_policy(vma), vma->vm_userfaultfd_ctx,
> > - anon_vma_name(vma));
> > -}
> > +struct vm_area_struct *vma_modify_flags(struct vma_iterator *vmi,
> > + struct vm_area_struct *prev,
> > + struct vm_area_struct *vma,
> > + unsigned long start, unsigned long end,
> > + unsigned long new_flags);
>
> Does this get better with a two-tab indent?
>
> >
> > /* We are about to modify the VMA's flags and/or anon_name. */
> > -static inline struct vm_area_struct
> > +struct vm_area_struct
> > *vma_modify_flags_name(struct vma_iterator *vmi,
> > struct vm_area_struct *prev,
> > struct vm_area_struct *vma,
> > unsigned long start,
> > unsigned long end,
> > unsigned long new_flags,
> > - struct anon_vma_name *new_name)
> > -{
> > - return vma_modify(vmi, prev, vma, start, end, new_flags,
> > - vma_policy(vma), vma->vm_userfaultfd_ctx, new_name);
> > -}
> > + struct anon_vma_name *new_name);
> >
> > /* We are about to modify the VMA's memory policy. */
> > -static inline struct vm_area_struct
> > +struct vm_area_struct
> > *vma_modify_policy(struct vma_iterator *vmi,
> > struct vm_area_struct *prev,
> > struct vm_area_struct *vma,
> > unsigned long start, unsigned long end,
> > - struct mempolicy *new_pol)
> > -{
> > - return vma_modify(vmi, prev, vma, start, end, vma->vm_flags,
> > - new_pol, vma->vm_userfaultfd_ctx, anon_vma_name(vma));
> > -}
> > + struct mempolicy *new_pol);
> >
> > /* We are about to modify the VMA's flags and/or uffd context. */
> > -static inline struct vm_area_struct
> > +struct vm_area_struct
> > *vma_modify_flags_uffd(struct vma_iterator *vmi,
> > struct vm_area_struct *prev,
> > struct vm_area_struct *vma,
> > unsigned long start, unsigned long end,
> > unsigned long new_flags,
> > - struct vm_userfaultfd_ctx new_ctx)
> > -{
> > - return vma_modify(vmi, prev, vma, start, end, new_flags,
> > - vma_policy(vma), new_ctx, anon_vma_name(vma));
> > -}
> > + struct vm_userfaultfd_ctx new_ctx);
> >
> > struct vm_area_struct
> > *vma_merge_new_vma(struct vma_iterator *vmi, struct vm_area_struct *prev,
> > diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
> > index 9b272633ca9e..b12f72979478 100644
> > --- a/tools/testing/vma/vma.c
> > +++ b/tools/testing/vma/vma.c
> > @@ -22,26 +22,6 @@ static bool fail_prealloc;
> > */
> > #include "../../../mm/vma.c"
> >
> > -/*
> > - * Temporarily forward-ported from a future in which vmg's are used for merging.
>
> the future is soon.
>
> > - */
> > -struct vma_merge_struct {
> > - struct mm_struct *mm;
> > - struct vma_iterator *vmi;
> > - pgoff_t pgoff;
> > - struct vm_area_struct *prev;
> > - struct vm_area_struct *next; /* Modified by vma_merge(). */
> > - struct vm_area_struct *vma; /* Either a new VMA or the one being modified. */
> > - unsigned long start;
> > - unsigned long end;
> > - unsigned long flags;
> > - struct file *file;
> > - struct anon_vma *anon_vma;
> > - struct mempolicy *policy;
> > - struct vm_userfaultfd_ctx uffd_ctx;
> > - struct anon_vma_name *anon_name;
> > -};
> > -
> > const struct vm_operations_struct vma_dummy_vm_ops;
> > static struct anon_vma dummy_anon_vma;
> >
> > @@ -115,14 +95,6 @@ static struct vm_area_struct *alloc_and_link_vma(struct mm_struct *mm,
> > /* Helper function which provides a wrapper around a merge new VMA operation. */
> > static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
> > {
> > - /* vma_merge() needs a VMA to determine mm, anon_vma, and file. */
> > - struct vm_area_struct dummy = {
> > - .vm_mm = vmg->mm,
> > - .vm_flags = vmg->flags,
> > - .anon_vma = vmg->anon_vma,
> > - .vm_file = vmg->file,
> > - };
> > -
> > /*
> > * For convenience, get prev and next VMAs. Which the new VMA operation
> > * requires.
> > @@ -131,8 +103,7 @@ static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
> > vmg->prev = vma_prev(vmg->vmi);
> >
> > vma_iter_set(vmg->vmi, vmg->start);
> > - return vma_merge_new_vma(vmg->vmi, vmg->prev, &dummy, vmg->start,
> > - vmg->end, vmg->pgoff);
> > + return vma_merge(vmg);
> > }
> >
> > /*
> > @@ -141,17 +112,7 @@ static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
> > */
> > static struct vm_area_struct *merge_existing(struct vma_merge_struct *vmg)
> > {
> > - /* vma_merge() needs a VMA to determine mm, anon_vma, and file. */
> > - struct vm_area_struct dummy = {
> > - .vm_mm = vmg->mm,
> > - .vm_flags = vmg->flags,
> > - .anon_vma = vmg->anon_vma,
> > - .vm_file = vmg->file,
> > - };
> > -
> > - return vma_merge(vmg->vmi, vmg->prev, &dummy, vmg->start, vmg->end,
> > - vmg->flags, vmg->pgoff, vmg->policy, vmg->uffd_ctx,
> > - vmg->anon_name);
> > + return vma_merge(vmg);
> > }
> >
> > /*
> > --
> > 2.46.0
> >
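To make the shape of the new interface concrete, a userland test built on the
vmg would look something like this (an illustrative sketch using the
vma_merge_struct and merge_new() helper quoted above; the literal values are
invented for the example):

	struct mm_struct mm = {};
	VMA_ITERATOR(vmi, &mm, 0x1000);
	struct vma_merge_struct vmg = {
		.mm = &mm,
		.vmi = &vmi,
		.start = 0x1000,
		.end = 0x2000,
		.flags = VM_READ | VM_WRITE,
		.pgoff = 1,
	};
	struct vm_area_struct *vma;

	/* merge_new() looks up prev/next, then invokes vma_merge(). */
	vma = merge_new(&vmg);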
* Re: [PATCH v2 06/10] mm: avoid using vma_merge() for new VMAs
2024-08-28 20:52 ` Liam R. Howlett
@ 2024-08-30 15:19 ` Lorenzo Stoakes
0 siblings, 0 replies; 26+ messages in thread
From: Lorenzo Stoakes @ 2024-08-30 15:19 UTC (permalink / raw)
To: Liam R. Howlett, linux-mm, linux-kernel, Andrew Morton,
Vlastimil Babka
On Wed, Aug 28, 2024 at 04:52:07PM GMT, Liam R. Howlett wrote:
[snip]
> > + /*
> > + * Now try to expand adjacent VMA(s). This takes care of removing the
> > + * following VMA if we have VMAs on both sides.
> > + */
> > + if (vmg->vma && !vma_expand(vmg)) {
> > + khugepaged_enter_vma(vmg->vma, vmg->flags);
>
> This might be able to be moved into vma_expand().
Sorry for the multiple emails, but I am working my way through.
Can't do this, as relocate_vma_down() (and the original implementation in
fs/exec.c - I checked) does not invoke khugepaged_enter_vma(), sadly.
>
> > +
>
> Extra whitespace
Ack. A lot of these are a matter of subjective taste as regards clarity, but
I am happy to adjust them for the most part...
>
> > + vmg->state = VMA_MERGE_SUCCESS;
> > + return vmg->vma;
> > + }
> > +
> > + /* If expansion failed, reset state. Allows us to retry merge later. */
> > + vmg->vma = NULL;
> > + vmg->start = start;
> > + vmg->end = end;
> > + vmg->pgoff = pgoff;
> > + if (vmg->vma == prev)
> > + vma_iter_set(vmg->vmi, start);
> > +
> > + return NULL;
> > +}
> > +
> > /*
> > * vma_expand - Expand an existing VMA
> > *
> > @@ -474,7 +584,11 @@ void validate_mm(struct mm_struct *mm)
> > * vmg->next->vm_end. Checking if the vmg->vma can expand and merge with
> > * vmg->next needs to be handled by the caller.
> > *
> > - * Returns: 0 on success
> > + * Returns: 0 on success.
> > + *
> > + * ASSUMPTIONS:
> > + * - The caller must hold a WRITE lock on vmg->vma->mm->mmap_lock.
> > + * - The caller must have set @vmg->vma and @vmg->next.
> > */
> > int vma_expand(struct vma_merge_struct *vmg)
> > {
> > @@ -484,6 +598,8 @@ int vma_expand(struct vma_merge_struct *vmg)
> > struct vm_area_struct *next = vmg->next;
> > struct vma_prepare vp;
> >
> > + mmap_assert_write_locked(vmg->mm);
> > +
>
> There are a few unnecessary whitespaces here..
...except here :) I like to keep the asserts separate from the rest of the
logic, and local declarations on their own lines.
[snip]
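Taken together, the ASSUMPTIONS quoted above give vma_expand() callers a
contract along these lines (an illustrative sketch against the quoted
interface; the 'nomem' label is a placeholder):

	/* The mmap write lock must already be held at this point. */
	vmg->vma = vma;		/* The VMA to be expanded. */
	vmg->next = next;	/* May be deleted if spanned by the expansion. */
	if (vma_expand(vmg))	/* Returns 0 on success. */
		goto nomem;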
Thread overview: 26+ messages (newest: 2024-08-30 15:20 UTC)
2024-08-23 20:06 [PATCH v2 00/10] mm: remove vma_merge() Lorenzo Stoakes
2024-08-23 20:06 ` [PATCH v2 01/10] tools: improve vma test Makefile Lorenzo Stoakes
2024-08-28 19:16 ` Liam R. Howlett
2024-08-23 20:06 ` [PATCH v2 02/10] tools: add VMA merge tests Lorenzo Stoakes
2024-08-28 19:16 ` Liam R. Howlett
2024-08-23 20:06 ` [PATCH v2 03/10] mm: introduce vma_merge_struct and abstract vma_merge(),vma_modify() Lorenzo Stoakes
2024-08-28 19:35 ` Liam R. Howlett
2024-08-30 13:28 ` Lorenzo Stoakes
2024-08-23 20:06 ` [PATCH v2 04/10] mm: remove duplicated open-coded VMA policy check Lorenzo Stoakes
2024-08-28 19:42 ` Liam R. Howlett
2024-08-23 20:07 ` [PATCH v2 05/10] mm: abstract vma_expand() to use vma_merge_struct Lorenzo Stoakes
2024-08-28 20:03 ` Liam R. Howlett
2024-08-23 20:07 ` [PATCH v2 06/10] mm: avoid using vma_merge() for new VMAs Lorenzo Stoakes
2024-08-27 11:41 ` Lorenzo Stoakes
2024-08-28 20:52 ` Liam R. Howlett
2024-08-30 15:19 ` Lorenzo Stoakes
2024-08-29 19:46 ` Mark Brown
2024-08-29 21:22 ` Lorenzo Stoakes
2024-08-30 12:59 ` Mark Brown
2024-08-30 13:02 ` Lorenzo Stoakes
2024-08-30 13:05 ` Mark Brown
2024-08-30 13:10 ` Lorenzo Stoakes
2024-08-23 20:07 ` [PATCH v2 07/10] mm: make vma_prepare() and friends static and internal to vma.c Lorenzo Stoakes
2024-08-23 20:07 ` [PATCH v2 08/10] mm: introduce commit_merge(), abstracting final commit of merge Lorenzo Stoakes
2024-08-23 20:07 ` [PATCH v2 09/10] mm: refactor vma_merge() into modify-only vma_merge_existing_range() Lorenzo Stoakes
2024-08-23 20:07 ` [PATCH v2 10/10] mm: rework vm_ops->close() handling on VMA merge Lorenzo Stoakes