* [PATCH 0/3] fix incorrectly disallowed anonymous VMA merges
@ 2025-03-17 21:15 Lorenzo Stoakes
  2025-03-17 21:15 ` [PATCH 1/3] mm/vma: " Lorenzo Stoakes
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Lorenzo Stoakes @ 2025-03-17 21:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Jann Horn, Liam R . Howlett, Suren Baghdasaryan,
	David Hildenbrand, Matthew Wilcox, Rik van Riel, linux-mm,
	linux-kernel

It appears that we have been incorrectly rejecting merge cases for 15
years, apparently by mistake.

Imagine a range of anonymous mapped memory divided into two VMAs like
this, with incompatible protection bits:

              RW         RWX
	  unfaulted    faulted
	|-----------|-----------|
	|    prev   |    vma    |
	|-----------|-----------|
	             mprotect(RW)

Now imagine mprotect()'ing vma so it is RW. It appears as if this should
merge, but it does not.

Neither does this case, again mprotect()'ing vma RW:

              RWX        RW
	   faulted    unfaulted
	|-----------|-----------|
	|    vma    |   next    |
	|-----------|-----------|
	 mprotect(RW)

Nor:

              RW         RWX          RW
	  unfaulted    faulted    unfaulted
	|-----------|-----------|-----------|
	|    prev   |    vma    |    next   |
	|-----------|-----------|-----------|
	             mprotect(RW)

What's going on here?

In commit 5beb49305251 ("mm: change anon_vma linking to fix multi-process
server scalability issue"), from 2010, Rik van Riel took careful care to
account for these cases - commenting that '[this is] easily overlooked:
when mprotect shifts the boundary, make sure the expanding vma has anon_vma
set if the shrinking vma had, to cover any anon pages imported.'

However, commit 965f55dea0e3 ("mmap: avoid merging cloned VMAs"), introduced
a little over a year later, appears to have accidentally disallowed this.

By adjusting the is_mergeable_anon_vma() function to avoid lock contention
across large trees of forked anon_vma's, this commit wrongly assumed the
VMA being checked (the ostensible merge 'target') should be faulted, that
is, have an anon_vma, and thus an anon_vma_chain list established, but only
of length 1.

This appears to have been unintentional, as disallowing empty target VMAs
like this across the board makes no sense.

We already have logic that accounts for this case, the same logic Rik
introduced in 2010, now via dup_anon_vma() (and ultimately
anon_vma_clone()), so there is no problem permitting this.

This series fixes this mistake and also ensures that scalability concerns
remain addressed by explicitly checking that whatever VMA is being merged
has not been forked.

A full set of self tests which reproduce the issue is provided, as well as
updates to the userland VMA tests to assert this behaviour.

The self tests additionally assert scalability concerns are addressed.
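
For reference, the first scenario above can be reproduced from userspace
roughly as follows (an illustrative sketch only - the self tests added in
patch 3/3 are the canonical reproducers; MAP_NORESERVE is used so that
differing VM_ACCOUNT state does not itself prevent the merge):

	#define _GNU_SOURCE
	#include <stdio.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		long page_size = sysconf(_SC_PAGESIZE);
		char *ptr;

		/* Two pages, RW, unfaulted. */
		ptr = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
			   MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE, -1, 0);
		if (ptr == MAP_FAILED)
			return 1;

		/* Split: first page RW unfaulted, second page RWX unfaulted. */
		mprotect(&ptr[page_size], page_size,
			 PROT_READ | PROT_WRITE | PROT_EXEC);
		/* Fault in the second VMA so it gains an anon_vma. */
		ptr[page_size] = 'x';
		/* mprotect() the faulted VMA back to RW - this should merge. */
		mprotect(&ptr[page_size], page_size, PROT_READ | PROT_WRITE);

		/* Pause so /proc/$pid/maps can be inspected for the merge. */
		getchar();
		return 0;
	}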


Lorenzo Stoakes (3):
  mm/vma: fix incorrectly disallowed anonymous VMA merges
  tools/testing: add PROCMAP_QUERY helper functions in mm self tests
  tools/testing/selftests: assert that anon merge cases behave as
    expected

 mm/vma.c                                  |  81 ++--
 tools/testing/selftests/mm/.gitignore     |   1 +
 tools/testing/selftests/mm/Makefile       |   1 +
 tools/testing/selftests/mm/merge.c        | 454 ++++++++++++++++++++++
 tools/testing/selftests/mm/run_vmtests.sh |   2 +
 tools/testing/selftests/mm/vm_util.c      |  62 +++
 tools/testing/selftests/mm/vm_util.h      |  21 +
 tools/testing/vma/vma.c                   | 100 ++---
 8 files changed, 652 insertions(+), 70 deletions(-)
 create mode 100644 tools/testing/selftests/mm/merge.c

--
2.48.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] mm/vma: fix incorrectly disallowed anonymous VMA merges
  2025-03-17 21:15 [PATCH 0/3] fix incorrectly disallowed anonymous VMA merges Lorenzo Stoakes
@ 2025-03-17 21:15 ` Lorenzo Stoakes
  2025-04-04 12:53   ` Wei Yang
  2025-03-17 21:15 ` [PATCH 2/3] tools/testing: add PROCMAP_QUERY helper functions in mm self tests Lorenzo Stoakes
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Lorenzo Stoakes @ 2025-03-17 21:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Jann Horn, Liam R . Howlett, Suren Baghdasaryan,
	David Hildenbrand, Matthew Wilcox, Rik van Riel, linux-mm,
	linux-kernel

anon_vma_chain's were introduced by Rik van Riel in commit 5beb49305251 ("mm:
change anon_vma linking to fix multi-process server scalability issue").

This patch was introduced in March 2010. As part of this change, careful
attention was paid to the instance of mprotect() causing a VMA merge, with one
faulted (i.e. having anon_vma set) and another not:

		/*
		 * Easily overlooked: when mprotect shifts the boundary,
		 * make sure the expanding vma has anon_vma set if the
		 * shrinking vma had, to cover any anon pages imported.
		 */

In the modern VMA code, this is handled in dup_anon_vma() (and ultimately
anon_vma_clone()).
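
For reference, the core of dup_anon_vma() behaves roughly as follows (a
simplified sketch, not the verbatim source - the VMA write-lock assertion and
surrounding comments are elided; see the hunk below for the actual function):

	static int dup_anon_vma(struct vm_area_struct *dst,
				struct vm_area_struct *src,
				struct vm_area_struct **dup)
	{
		/* Only act where the source is faulted but the destination is not. */
		if (src->anon_vma && !dst->anon_vma) {
			int ret;

			dst->anon_vma = src->anon_vma;
			ret = anon_vma_clone(dst, src);
			if (ret)
				return ret;

			*dup = dst;
		}

		return 0;
	}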

This case is one of the three configurations of adjacent VMA anon_vma state
that we might encounter on merge (where dst is the VMA which will be merged
into and src the one being merged into dst):

1.  dst->anon_vma,  src->anon_vma - These must be equal, no-op.
2.  dst->anon_vma, !src->anon_vma - We simply use dst->anon_vma, no-op.
3. !dst->anon_vma,  src->anon_vma - The case in question here.

In case 3 - the instance addressed here - we duplicate the AVC connections
from src and place them into dst.

However, in practice, we very often do NOT do this.

This appears to be due to an inadvertent consequence of the change
introduced by commit 965f55dea0e3 ("mmap: avoid merging cloned VMAs"),
introduced in May 2011.

This implies that this merge case was functional only for a little over a
year, and has since been broken for ~15 years.

Here, lock scalability concerns led to us restricting anonymous merges
only to those VMAs with 1 entry in their vma->anon_vma_chain, that is, a
VMA that is not connected to any parent process's anon_vma.

The mergeability test looks like this:

static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1,
		 struct anon_vma *anon_vma2, struct vm_area_struct *vma)
{
	if ((!anon_vma1 || !anon_vma2) && (!vma ||
		list_is_singular(&vma->anon_vma_chain)))
		return true;
	return anon_vma1 == anon_vma2;
}

However, we have a problem here - typically the vma passed here is the
destination VMA.

For instance in vma_merge_existing_range() we invoke:

can_vma_merge_left()
-> [ check that there is an immediately adjacent prior VMA ]
-> can_vma_merge_after()
  -> is_mergeable_vma() for general attribute check
-> is_mergeable_anon_vma([ proposed anon_vma ], prev->anon_vma, prev)

So if we were considering a target unfaulted 'prev':

	  unfaulted    faulted
	|-----------|-----------|
	|    prev   |    vma    |
	|-----------|-----------|

This would call is_mergeable_anon_vma(NULL, vma->anon_vma, prev).

The list_is_singular() check on vma->anon_vma_chain - an empty list for an
unfaulted VMA such as prev here - would cause this merge to _fail_ even
though all else indicates a merge.

Equally a simple merge into a next VMA would hit the same problem:

	   faulted    unfaulted
	|-----------|-----------|
	|    vma    |    next   |
	|-----------|-----------|

can_vma_merge_right()
-> [ check that there is an immediately adjacent succeeding VMA ]
-> can_vma_merge_before()
  -> is_mergeable_vma() for general attribute check
-> is_mergeable_anon_vma([ proposed anon_vma ], next->anon_vma, next)

For a 3-way merge, we'd also hit the same problem if it was configured like
this for instance:

	  unfaulted    faulted    unfaulted
	|-----------|-----------|-----------|
	|    prev   |    vma    |    next   |
	|-----------|-----------|-----------|

As we'd call can_vma_merge_left() for prev, and can_vma_merge_right() for
next, both of which would fail.

vma_merge_new_range() (and relatedly, vma_expand()) are not impacted, as
the new VMA would never already be faulted (it is a proposed new range).

Because we already handle each of the aforementioned merge cases, and can
therefore deal with an existing VMA merge with !dst->anon_vma,
src->anon_vma, there is no reason to disallow this kind of merge.

It seems that the intention of commit 965f55dea0e3 was to ensure that, when
merging unfaulted VMAs with faulted ones, we never do so where the faulted
VMA has multiple AVCs, due to the fact that anon_vma locks are held across
both parent and child anon_vma's (actually, the 'root' parent anon_vma's
lock is used).

In fact, the original commit alludes to this - "find_mergeable_anon_vma()
already considers this case".

In find_mergeable_anon_vma(), however, we check the anon_vma which will be
merged from and, if it is set, then check
list_is_singular(vma->anon_vma_chain).
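
For reference, that check lives in the reusable_anon_vma() helper in
mm/mmap.c, which looks roughly like this (a from-memory sketch, not the
verbatim source):

	static struct anon_vma *reusable_anon_vma(struct vm_area_struct *old,
						  struct vm_area_struct *a,
						  struct vm_area_struct *b)
	{
		if (anon_vma_compatible(a, b)) {
			struct anon_vma *anon_vma = READ_ONCE(old->anon_vma);

			/* Only reuse an anon_vma whose VMA has not been forked. */
			if (anon_vma && list_is_singular(&old->anon_vma_chain))
				return anon_vma;
		}

		return NULL;
	}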

So to match this logic, update is_mergeable_anon_vma() to perform this
scalability check on the VMA whose anon_vma we ultimately merge into.

This matches existing behaviour with forked VMAs, except that we no longer
wrongly disallow ALL empty target merges.

So we both allow merge cases and ensure the scalability check is correctly
applied.

We may wish to revisit these lock scalability concerns at a later date and
ensure they are still valid.

Additionally, correct the userland VMA tests, which previously failed to
assert these cases correctly, so that they now do, and ensure vmg->anon_vma
state is always consistent to account for the newly introduced asserts.

Fixes: 965f55dea0e3 ("mmap: avoid merging cloned VMAs")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/vma.c                |  81 +++++++++++++++++++++++---------
 tools/testing/vma/vma.c | 100 +++++++++++++++++++++-------------------
 2 files changed, 111 insertions(+), 70 deletions(-)

diff --git a/mm/vma.c b/mm/vma.c
index 5cdc5612bfc1..5418eef3a852 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -57,6 +57,22 @@ struct mmap_state {
 		.state = VMA_MERGE_START,				\
 	}
 
+/*
+ * If, at any point, the VMA had unCoW'd mappings from parents, it will maintain
+ * more than one anon_vma_chain connecting it to more than one anon_vma. A merge
+ * would mean a wider range of folios sharing the root anon_vma lock, and thus
+ * potential lock contention, we do not wish to encourage merging such that this
+ * scales to a problem.
+ */
+static bool vma_had_uncowed_parents(struct vm_area_struct *vma)
+{
+	/*
+	 * The list_is_singular() test is to avoid merging VMA cloned from
+	 * parents. This can improve scalability caused by anon_vma lock.
+	 */
+	return vma && vma->anon_vma && !list_is_singular(&vma->anon_vma_chain);
+}
+
 static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_next)
 {
 	struct vm_area_struct *vma = merge_next ? vmg->next : vmg->prev;
@@ -82,24 +98,28 @@ static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_nex
 	return true;
 }
 
-static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1,
-		 struct anon_vma *anon_vma2, struct vm_area_struct *vma)
+static bool is_mergeable_anon_vma(struct vma_merge_struct *vmg, bool merge_next)
 {
+	struct vm_area_struct *tgt = merge_next ? vmg->next : vmg->prev;
+	struct vm_area_struct *src = vmg->middle; /* exisitng merge case. */
+	struct anon_vma *tgt_anon = tgt->anon_vma;
+	struct anon_vma *src_anon = vmg->anon_vma;
+
 	/*
-	 * The list_is_singular() test is to avoid merging VMA cloned from
-	 * parents. This can improve scalability caused by anon_vma lock.
+	 * We _can_ have !src, vmg->anon_vma via copy_vma(). In this instance we
+	 * will remove the existing VMA's anon_vma's so there's no scalability
+	 * concerns.
 	 */
-	if ((!anon_vma1 || !anon_vma2) && (!vma ||
-		list_is_singular(&vma->anon_vma_chain)))
-		return true;
-	return anon_vma1 == anon_vma2;
-}
+	VM_WARN_ON(src && src_anon != src->anon_vma);
 
-/* Are the anon_vma's belonging to each VMA compatible with one another? */
-static inline bool are_anon_vmas_compatible(struct vm_area_struct *vma1,
-					    struct vm_area_struct *vma2)
-{
-	return is_mergeable_anon_vma(vma1->anon_vma, vma2->anon_vma, NULL);
+	/* Case 1 - we will dup_anon_vma() from src into tgt. */
+	if (!tgt_anon && src_anon)
+		return !vma_had_uncowed_parents(src);
+	/* Case 2 - we will simply use tgt's anon_vma. */
+	if (tgt_anon && !src_anon)
+		return !vma_had_uncowed_parents(tgt);
+	/* Case 3 - the anon_vma's are already shared. */
+	return src_anon == tgt_anon;
 }
 
 /*
@@ -164,7 +184,7 @@ static bool can_vma_merge_before(struct vma_merge_struct *vmg)
 	pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start);
 
 	if (is_mergeable_vma(vmg, /* merge_next = */ true) &&
-	    is_mergeable_anon_vma(vmg->anon_vma, vmg->next->anon_vma, vmg->next)) {
+	    is_mergeable_anon_vma(vmg, /* merge_next = */ true)) {
 		if (vmg->next->vm_pgoff == vmg->pgoff + pglen)
 			return true;
 	}
@@ -184,7 +204,7 @@ static bool can_vma_merge_before(struct vma_merge_struct *vmg)
 static bool can_vma_merge_after(struct vma_merge_struct *vmg)
 {
 	if (is_mergeable_vma(vmg, /* merge_next = */ false) &&
-	    is_mergeable_anon_vma(vmg->anon_vma, vmg->prev->anon_vma, vmg->prev)) {
+	    is_mergeable_anon_vma(vmg, /* merge_next = */ false)) {
 		if (vmg->prev->vm_pgoff + vma_pages(vmg->prev) == vmg->pgoff)
 			return true;
 	}
@@ -400,8 +420,10 @@ static bool can_vma_merge_left(struct vma_merge_struct *vmg)
 static bool can_vma_merge_right(struct vma_merge_struct *vmg,
 				bool can_merge_left)
 {
-	if (!vmg->next || vmg->end != vmg->next->vm_start ||
-	    !can_vma_merge_before(vmg))
+	struct vm_area_struct *next = vmg->next;
+	struct vm_area_struct *prev;
+
+	if (!next || vmg->end != next->vm_start || !can_vma_merge_before(vmg))
 		return false;
 
 	if (!can_merge_left)
@@ -414,7 +436,9 @@ static bool can_vma_merge_right(struct vma_merge_struct *vmg,
 	 *
 	 * We therefore check this in addition to mergeability to either side.
 	 */
-	return are_anon_vmas_compatible(vmg->prev, vmg->next);
+	prev = vmg->prev;
+	return !prev->anon_vma || !next->anon_vma ||
+		prev->anon_vma == next->anon_vma;
 }
 
 /*
@@ -554,7 +578,9 @@ static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
 }
 
 /*
- * dup_anon_vma() - Helper function to duplicate anon_vma
+ * dup_anon_vma() - Helper function to duplicate anon_vma on VMA merge in the
+ * instance that the destination VMA has no anon_vma but the source does.
+ *
  * @dst: The destination VMA
  * @src: The source VMA
  * @dup: Pointer to the destination VMA when successful.
@@ -565,9 +591,18 @@ static int dup_anon_vma(struct vm_area_struct *dst,
 			struct vm_area_struct *src, struct vm_area_struct **dup)
 {
 	/*
-	 * Easily overlooked: when mprotect shifts the boundary, make sure the
-	 * expanding vma has anon_vma set if the shrinking vma had, to cover any
-	 * anon pages imported.
+	 * There are three cases to consider for correctly propagating
+	 * anon_vma's on merge.
+	 *
+	 * The first is trivial - neither VMA has anon_vma, we need not do
+	 * anything.
+	 *
+	 * The second where both have anon_vma is also a no-op, as they must
+	 * then be the same, so there is simply nothing to copy.
+	 *
+	 * Here we cover the third - if the destination VMA has no anon_vma,
+	 * that is it is unfaulted, we need to ensure that the newly merged
+	 * range is referenced by the anon_vma's of the source.
 	 */
 	if (src->anon_vma && !dst->anon_vma) {
 		int ret;
diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
index 11f761769b5b..7cfd6e31db10 100644
--- a/tools/testing/vma/vma.c
+++ b/tools/testing/vma/vma.c
@@ -185,6 +185,15 @@ static void vmg_set_range(struct vma_merge_struct *vmg, unsigned long start,
 	vmg->__adjust_next_start = false;
 }
 
+/* Helper function to set both the VMG range and its anon_vma. */
+static void vmg_set_range_anon_vma(struct vma_merge_struct *vmg, unsigned long start,
+				   unsigned long end, pgoff_t pgoff, vm_flags_t flags,
+				   struct anon_vma *anon_vma)
+{
+	vmg_set_range(vmg, start, end, pgoff, flags);
+	vmg->anon_vma = anon_vma;
+}
+
 /*
  * Helper function to try to merge a new VMA.
  *
@@ -265,6 +274,22 @@ static void dummy_close(struct vm_area_struct *)
 {
 }
 
+static void __vma_set_dummy_anon_vma(struct vm_area_struct *vma,
+				     struct anon_vma_chain *avc,
+				     struct anon_vma *anon_vma)
+{
+	vma->anon_vma = anon_vma;
+	INIT_LIST_HEAD(&vma->anon_vma_chain);
+	list_add(&avc->same_vma, &vma->anon_vma_chain);
+	avc->anon_vma = vma->anon_vma;
+}
+
+static void vma_set_dummy_anon_vma(struct vm_area_struct *vma,
+				   struct anon_vma_chain *avc)
+{
+	__vma_set_dummy_anon_vma(vma, avc, &dummy_anon_vma);
+}
+
 static bool test_simple_merge(void)
 {
 	struct vm_area_struct *vma;
@@ -953,6 +978,7 @@ static bool test_merge_existing(void)
 	const struct vm_operations_struct vm_ops = {
 		.close = dummy_close,
 	};
+	struct anon_vma_chain avc = {};
 
 	/*
 	 * Merge right case - partial span.
@@ -968,10 +994,10 @@ static bool test_merge_existing(void)
 	vma->vm_ops = &vm_ops; /* This should have no impact. */
 	vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, flags);
 	vma_next->vm_ops = &vm_ops; /* This should have no impact. */
-	vmg_set_range(&vmg, 0x3000, 0x6000, 3, flags);
+	vmg_set_range_anon_vma(&vmg, 0x3000, 0x6000, 3, flags, &dummy_anon_vma);
 	vmg.middle = vma;
 	vmg.prev = vma;
-	vma->anon_vma = &dummy_anon_vma;
+	vma_set_dummy_anon_vma(vma, &avc);
 	ASSERT_EQ(merge_existing(&vmg), vma_next);
 	ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
 	ASSERT_EQ(vma_next->vm_start, 0x3000);
@@ -1001,9 +1027,9 @@ static bool test_merge_existing(void)
 	vma = alloc_and_link_vma(&mm, 0x2000, 0x6000, 2, flags);
 	vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, flags);
 	vma_next->vm_ops = &vm_ops; /* This should have no impact. */
-	vmg_set_range(&vmg, 0x2000, 0x6000, 2, flags);
+	vmg_set_range_anon_vma(&vmg, 0x2000, 0x6000, 2, flags, &dummy_anon_vma);
 	vmg.middle = vma;
-	vma->anon_vma = &dummy_anon_vma;
+	vma_set_dummy_anon_vma(vma, &avc);
 	ASSERT_EQ(merge_existing(&vmg), vma_next);
 	ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
 	ASSERT_EQ(vma_next->vm_start, 0x2000);
@@ -1030,11 +1056,10 @@ static bool test_merge_existing(void)
 	vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
 	vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
 	vma->vm_ops = &vm_ops; /* This should have no impact. */
-	vmg_set_range(&vmg, 0x3000, 0x6000, 3, flags);
+	vmg_set_range_anon_vma(&vmg, 0x3000, 0x6000, 3, flags, &dummy_anon_vma);
 	vmg.prev = vma_prev;
 	vmg.middle = vma;
-	vma->anon_vma = &dummy_anon_vma;
-
+	vma_set_dummy_anon_vma(vma, &avc);
 	ASSERT_EQ(merge_existing(&vmg), vma_prev);
 	ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
 	ASSERT_EQ(vma_prev->vm_start, 0);
@@ -1064,10 +1089,10 @@ static bool test_merge_existing(void)
 	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
 	vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
 	vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
-	vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
+	vmg_set_range_anon_vma(&vmg, 0x3000, 0x7000, 3, flags, &dummy_anon_vma);
 	vmg.prev = vma_prev;
 	vmg.middle = vma;
-	vma->anon_vma = &dummy_anon_vma;
+	vma_set_dummy_anon_vma(vma, &avc);
 	ASSERT_EQ(merge_existing(&vmg), vma_prev);
 	ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
 	ASSERT_EQ(vma_prev->vm_start, 0);
@@ -1094,10 +1119,10 @@ static bool test_merge_existing(void)
 	vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
 	vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
 	vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, flags);
-	vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
+	vmg_set_range_anon_vma(&vmg, 0x3000, 0x7000, 3, flags, &dummy_anon_vma);
 	vmg.prev = vma_prev;
 	vmg.middle = vma;
-	vma->anon_vma = &dummy_anon_vma;
+	vma_set_dummy_anon_vma(vma, &avc);
 	ASSERT_EQ(merge_existing(&vmg), vma_prev);
 	ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
 	ASSERT_EQ(vma_prev->vm_start, 0);
@@ -1180,12 +1205,9 @@ static bool test_anon_vma_non_mergeable(void)
 		.mm = &mm,
 		.vmi = &vmi,
 	};
-	struct anon_vma_chain dummy_anon_vma_chain1 = {
-		.anon_vma = &dummy_anon_vma,
-	};
-	struct anon_vma_chain dummy_anon_vma_chain2 = {
-		.anon_vma = &dummy_anon_vma,
-	};
+	struct anon_vma_chain dummy_anon_vma_chain_1 = {};
+	struct anon_vma_chain dummy_anon_vma_chain_2 = {};
+	struct anon_vma dummy_anon_vma_2;
 
 	/*
 	 * In the case of modified VMA merge, merging both left and right VMAs
@@ -1209,24 +1231,11 @@ static bool test_anon_vma_non_mergeable(void)
 	 *
 	 * However, when prev is compared to next, the merge should fail.
 	 */
-
-	INIT_LIST_HEAD(&vma_prev->anon_vma_chain);
-	list_add(&dummy_anon_vma_chain1.same_vma, &vma_prev->anon_vma_chain);
-	ASSERT_TRUE(list_is_singular(&vma_prev->anon_vma_chain));
-	vma_prev->anon_vma = &dummy_anon_vma;
-	ASSERT_TRUE(is_mergeable_anon_vma(NULL, vma_prev->anon_vma, vma_prev));
-
-	INIT_LIST_HEAD(&vma_next->anon_vma_chain);
-	list_add(&dummy_anon_vma_chain2.same_vma, &vma_next->anon_vma_chain);
-	ASSERT_TRUE(list_is_singular(&vma_next->anon_vma_chain));
-	vma_next->anon_vma = (struct anon_vma *)2;
-	ASSERT_TRUE(is_mergeable_anon_vma(NULL, vma_next->anon_vma, vma_next));
-
-	ASSERT_FALSE(is_mergeable_anon_vma(vma_prev->anon_vma, vma_next->anon_vma, NULL));
-
-	vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
+	vmg_set_range_anon_vma(&vmg, 0x3000, 0x7000, 3, flags, NULL);
 	vmg.prev = vma_prev;
 	vmg.middle = vma;
+	vma_set_dummy_anon_vma(vma_prev, &dummy_anon_vma_chain_1);
+	__vma_set_dummy_anon_vma(vma_next, &dummy_anon_vma_chain_2, &dummy_anon_vma_2);
 
 	ASSERT_EQ(merge_existing(&vmg), vma_prev);
 	ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
@@ -1253,17 +1262,12 @@ static bool test_anon_vma_non_mergeable(void)
 	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
 	vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, flags);
 
-	INIT_LIST_HEAD(&vma_prev->anon_vma_chain);
-	list_add(&dummy_anon_vma_chain1.same_vma, &vma_prev->anon_vma_chain);
-	vma_prev->anon_vma = (struct anon_vma *)1;
-
-	INIT_LIST_HEAD(&vma_next->anon_vma_chain);
-	list_add(&dummy_anon_vma_chain2.same_vma, &vma_next->anon_vma_chain);
-	vma_next->anon_vma = (struct anon_vma *)2;
-
-	vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
+	vmg_set_range_anon_vma(&vmg, 0x3000, 0x7000, 3, flags, NULL);
 	vmg.prev = vma_prev;
+	vma_set_dummy_anon_vma(vma_prev, &dummy_anon_vma_chain_1);
+	__vma_set_dummy_anon_vma(vma_next, &dummy_anon_vma_chain_2, &dummy_anon_vma_2);
 
+	vmg.anon_vma = NULL;
 	ASSERT_EQ(merge_new(&vmg), vma_prev);
 	ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
 	ASSERT_EQ(vma_prev->vm_start, 0);
@@ -1363,8 +1367,8 @@ static bool test_dup_anon_vma(void)
 	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
 	vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
 	vma_next = alloc_and_link_vma(&mm, 0x5000, 0x8000, 5, flags);
-
-	vma->anon_vma = &dummy_anon_vma;
+	vmg.anon_vma = &dummy_anon_vma;
+	vma_set_dummy_anon_vma(vma, &dummy_anon_vma_chain);
 	vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
 	vmg.prev = vma_prev;
 	vmg.middle = vma;
@@ -1392,7 +1396,7 @@ static bool test_dup_anon_vma(void)
 	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
 	vma = alloc_and_link_vma(&mm, 0x3000, 0x8000, 3, flags);
 
-	vma->anon_vma = &dummy_anon_vma;
+	vma_set_dummy_anon_vma(vma, &dummy_anon_vma_chain);
 	vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
 	vmg.prev = vma_prev;
 	vmg.middle = vma;
@@ -1420,7 +1424,7 @@ static bool test_dup_anon_vma(void)
 	vma = alloc_and_link_vma(&mm, 0, 0x5000, 0, flags);
 	vma_next = alloc_and_link_vma(&mm, 0x5000, 0x8000, 5, flags);
 
-	vma->anon_vma = &dummy_anon_vma;
+	vma_set_dummy_anon_vma(vma, &dummy_anon_vma_chain);
 	vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
 	vmg.prev = vma;
 	vmg.middle = vma;
@@ -1447,6 +1451,7 @@ static bool test_vmi_prealloc_fail(void)
 		.mm = &mm,
 		.vmi = &vmi,
 	};
+	struct anon_vma_chain avc = {};
 	struct vm_area_struct *vma_prev, *vma;
 
 	/*
@@ -1459,9 +1464,10 @@ static bool test_vmi_prealloc_fail(void)
 	vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
 	vma->anon_vma = &dummy_anon_vma;
 
-	vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
+	vmg_set_range_anon_vma(&vmg, 0x3000, 0x5000, 3, flags, &dummy_anon_vma);
 	vmg.prev = vma_prev;
 	vmg.middle = vma;
+	vma_set_dummy_anon_vma(vma, &avc);
 
 	fail_prealloc = true;
 
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] tools/testing: add PROCMAP_QUERY helper functions in mm self tests
  2025-03-17 21:15 [PATCH 0/3] fix incorrectly disallowed anonymous VMA merges Lorenzo Stoakes
  2025-03-17 21:15 ` [PATCH 1/3] mm/vma: " Lorenzo Stoakes
@ 2025-03-17 21:15 ` Lorenzo Stoakes
  2025-03-18 15:18   ` Lorenzo Stoakes
  2025-04-07  2:42   ` Wei Yang
  2025-03-17 21:15 ` [PATCH 3/3] tools/testing/selftests: assert that anon merge cases behave as expected Lorenzo Stoakes
  2025-03-20 13:33 ` [PATCH 0/3] fix incorrectly disallowed anonymous VMA merges Yeoreum Yun
  3 siblings, 2 replies; 15+ messages in thread
From: Lorenzo Stoakes @ 2025-03-17 21:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Jann Horn, Liam R . Howlett, Suren Baghdasaryan,
	David Hildenbrand, Matthew Wilcox, Rik van Riel, linux-mm,
	linux-kernel

The PROCMAP_QUERY ioctl() is very useful - it allows for binary access to
/proc/$pid/[s]maps data and thus convenient lookup of data contained there.

This patch exposes this for convenient use by mm self tests so the state of
VMAs can easily be queried.
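
For instance, a test within the kselftest harness can then verify the extent
of the VMA containing a given address along these lines (illustrative usage
only - ptr and page_size here are stand-ins for whatever the test mapped):

	struct procmap_fd procmap;

	ASSERT_EQ(open_self_procmap(&procmap), 0);
	ASSERT_TRUE(find_vma_procmap(&procmap, ptr));
	ASSERT_EQ(procmap.query.vma_start, (unsigned long)ptr);
	ASSERT_EQ(procmap.query.vma_end, (unsigned long)ptr + 10 * page_size);
	ASSERT_EQ(close_procmap(&procmap), 0);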

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 tools/testing/selftests/mm/vm_util.c | 62 ++++++++++++++++++++++++++++
 tools/testing/selftests/mm/vm_util.h | 21 ++++++++++
 2 files changed, 83 insertions(+)

diff --git a/tools/testing/selftests/mm/vm_util.c b/tools/testing/selftests/mm/vm_util.c
index a36734fb62f3..891ce17453cd 100644
--- a/tools/testing/selftests/mm/vm_util.c
+++ b/tools/testing/selftests/mm/vm_util.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <string.h>
+#include <errno.h>
 #include <fcntl.h>
 #include <dirent.h>
 #include <inttypes.h>
@@ -424,3 +425,64 @@ bool check_vmflag_io(void *addr)
 		flags += flaglen;
 	}
 }
+
+/*
+ * Open an fd at /proc/$pid/maps and configure procmap_out ready for
+ * PROCMAP_QUERY query. Returns 0 on success, or an error code otherwise.
+ */
+int open_procmap(pid_t pid, struct procmap_fd *procmap_out)
+{
+	char path[256];
+	int ret = 0;
+
+	memset(procmap_out, '\0', sizeof(*procmap_out));
+	sprintf(path, "/proc/%d/maps", pid);
+	procmap_out->query.size = sizeof(procmap_out->query);
+	procmap_out->fd = open(path, O_RDONLY);
+	if (procmap_out < 0)
+		ret = errno;
+
+	return ret;
+}
+
+/* Perform PROCMAP_QUERY. Returns 0 on success, or an error code otherwise. */
+int query_procmap(struct procmap_fd *procmap)
+{
+	int ret = 0;
+
+	if (ioctl(procmap->fd, PROCMAP_QUERY, &procmap->query) == -1)
+		ret = errno;
+
+	return ret;
+}
+
+/*
+ * Try to find the VMA at specified address, returns true if found, false if not
+ * found, and the test is failed if any other error occurs.
+ *
+ * On success, procmap->query is populated with the results.
+ */
+bool find_vma_procmap(struct procmap_fd *procmap, void *address)
+{
+	int err;
+
+	procmap->query.query_flags = 0;
+	procmap->query.query_addr = (unsigned long)address;
+	err = query_procmap(procmap);
+	if (!err)
+		return true;
+
+	if (err != -ENOENT)
+		ksft_exit_fail_msg("%s: Error %d on ioctl(PROCMAP_QUERY)\n",
+				   __func__, err);
+	return false;
+}
+
+/*
+ * Close fd used by PROCMAP_QUERY mechanism. Returns 0 on success, or an error
+ * code otherwise.
+ */
+int close_procmap(struct procmap_fd *procmap)
+{
+	return close(procmap->fd);
+}
diff --git a/tools/testing/selftests/mm/vm_util.h b/tools/testing/selftests/mm/vm_util.h
index 0e629586556b..aaea0ef78322 100644
--- a/tools/testing/selftests/mm/vm_util.h
+++ b/tools/testing/selftests/mm/vm_util.h
@@ -5,6 +5,7 @@
 #include <err.h>
 #include <strings.h> /* ffsl() */
 #include <unistd.h> /* _SC_PAGESIZE */
+#include <linux/fs.h>
 
 #define BIT_ULL(nr)                   (1ULL << (nr))
 #define PM_SOFT_DIRTY                 BIT_ULL(55)
@@ -18,6 +19,15 @@
 extern unsigned int __page_size;
 extern unsigned int __page_shift;
 
+/*
+ * Represents an open fd and PROCMAP_QUERY state for binary (via ioctl)
+ * /proc/$pid/[s]maps lookup.
+ */
+struct procmap_fd {
+	int fd;
+	struct procmap_query query;
+};
+
 static inline unsigned int psize(void)
 {
 	if (!__page_size)
@@ -55,6 +65,17 @@ int uffd_register_with_ioctls(int uffd, void *addr, uint64_t len,
 			      bool miss, bool wp, bool minor, uint64_t *ioctls);
 unsigned long get_free_hugepages(void);
 bool check_vmflag_io(void *addr);
+int open_procmap(pid_t pid, struct procmap_fd *procmap_out);
+int query_procmap(struct procmap_fd *procmap);
+bool find_vma_procmap(struct procmap_fd *procmap, void *address);
+int close_procmap(struct procmap_fd *procmap);
+
+static inline int open_self_procmap(struct procmap_fd *procmap_out)
+{
+	pid_t pid = getpid();
+
+	return open_procmap(pid, procmap_out);
+}
 
 /*
  * On ppc64 this will only work with radix 2M hugepage size
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] tools/testing/selftests: assert that anon merge cases behave as expected
  2025-03-17 21:15 [PATCH 0/3] fix incorrectly disallowed anonymous VMA merges Lorenzo Stoakes
  2025-03-17 21:15 ` [PATCH 1/3] mm/vma: " Lorenzo Stoakes
  2025-03-17 21:15 ` [PATCH 2/3] tools/testing: add PROCMAP_QUERY helper functions in mm self tests Lorenzo Stoakes
@ 2025-03-17 21:15 ` Lorenzo Stoakes
  2025-04-07  2:54   ` Wei Yang
  2025-03-20 13:33 ` [PATCH 0/3] fix incorrectly disallowed anonymous VMA merges Yeoreum Yun
  3 siblings, 1 reply; 15+ messages in thread
From: Lorenzo Stoakes @ 2025-03-17 21:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Jann Horn, Liam R . Howlett, Suren Baghdasaryan,
	David Hildenbrand, Matthew Wilcox, Rik van Riel, linux-mm,
	linux-kernel

Prior to the recently applied commit that permits this merge,
mprotect()'ing a faulted VMA adjacent to an unfaulted VMA such that the two
share characteristics would fail to result in a merge, due to what appear
to be unintended consequences of commit 965f55dea0e3 ("mmap: avoid merging
cloned VMAs").

Now we have fixed this bug, assert that we can indeed merge anonymous VMAs
this way.

Also assert that forked source/target VMAs are equally rejected.
Previously, all empty target anon merges with one VMA faulted and the other
unfaulted would be incorrectly rejected; now we ensure that unforked VMAs
merge, but forked ones do not.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 tools/testing/selftests/mm/.gitignore     |   1 +
 tools/testing/selftests/mm/Makefile       |   1 +
 tools/testing/selftests/mm/merge.c        | 454 ++++++++++++++++++++++
 tools/testing/selftests/mm/run_vmtests.sh |   2 +
 4 files changed, 458 insertions(+)
 create mode 100644 tools/testing/selftests/mm/merge.c

diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore
index c5241b193db8..91db34941a14 100644
--- a/tools/testing/selftests/mm/.gitignore
+++ b/tools/testing/selftests/mm/.gitignore
@@ -58,3 +58,4 @@ hugetlb_dio
 pkey_sighandler_tests_32
 pkey_sighandler_tests_64
 guard-regions
+merge
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 8270895039d1..ad4d6043a60f 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -98,6 +98,7 @@ TEST_GEN_FILES += hugetlb_madv_vs_map
 TEST_GEN_FILES += hugetlb_dio
 TEST_GEN_FILES += droppable
 TEST_GEN_FILES += guard-regions
+TEST_GEN_FILES += merge
 
 ifneq ($(ARCH),arm64)
 TEST_GEN_FILES += soft-dirty
diff --git a/tools/testing/selftests/mm/merge.c b/tools/testing/selftests/mm/merge.c
new file mode 100644
index 000000000000..9cc61bdbfba8
--- /dev/null
+++ b/tools/testing/selftests/mm/merge.c
@@ -0,0 +1,454 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#define _GNU_SOURCE
+#include "../kselftest_harness.h"
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+#include "vm_util.h"
+
+FIXTURE(merge)
+{
+	unsigned int page_size;
+	char *carveout;
+	struct procmap_fd procmap;
+};
+
+FIXTURE_SETUP(merge)
+{
+	self->page_size = psize();
+	/* Carve out PROT_NONE region to map over. */
+	self->carveout = mmap(NULL, 12 * self->page_size, PROT_NONE,
+			      MAP_ANON | MAP_PRIVATE, -1, 0);
+	ASSERT_NE(self->carveout, MAP_FAILED);
+	/* Setup PROCMAP_QUERY interface. */
+	ASSERT_EQ(open_self_procmap(&self->procmap), 0);
+}
+
+FIXTURE_TEARDOWN(merge)
+{
+	ASSERT_EQ(munmap(self->carveout, 12 * self->page_size), 0);
+	ASSERT_EQ(close_procmap(&self->procmap), 0);
+}
+
+TEST_F(merge, mprotect_unfaulted_left)
+{
+	unsigned int page_size = self->page_size;
+	char *carveout = self->carveout;
+	struct procmap_fd *procmap = &self->procmap;
+	char *ptr;
+
+	/*
+	 * Map 10 pages of R/W memory within. MAP_NORESERVE so we don't hit
+	 * merge failure due to lack of VM_ACCOUNT flag by mistake.
+	 *
+	 * |-----------------------|
+	 * |       unfaulted       |
+	 * |-----------------------|
+	 */
+	ptr = mmap(&carveout[page_size], 10 * page_size, PROT_READ | PROT_WRITE,
+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
+	ASSERT_NE(ptr, MAP_FAILED);
+	/*
+	 * Now make the first 5 pages read-only, splitting the VMA:
+	 *
+	 *      RO          RW
+	 * |-----------|-----------|
+	 * | unfaulted | unfaulted |
+	 * |-----------|-----------|
+	 */
+	ASSERT_EQ(mprotect(ptr, 5 * page_size, PROT_READ), 0);
+	/*
+	 * Fault in the first of the last 5 pages so it gets an anon_vma and
+	 * thus the whole VMA becomes 'faulted':
+	 *
+	 *      RO          RW
+	 * |-----------|-----------|
+	 * | unfaulted |  faulted  |
+	 * |-----------|-----------|
+	 */
+	ptr[5 * page_size] = 'x';
+	/*
+	 * Now mprotect() the RW region read-only, we should merge (though for
+	 * ~15 years we did not! :):
+	 *
+	 *             RO
+	 * |-----------------------|
+	 * |        faulted        |
+	 * |-----------------------|
+	 */
+	ASSERT_EQ(mprotect(&ptr[5 * page_size], 5 * page_size, PROT_READ), 0);
+
+	/* Assert that the merge succeeded using PROCMAP_QUERY. */
+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 10 * page_size);
+}
+
+TEST_F(merge, mprotect_unfaulted_right)
+{
+	unsigned int page_size = self->page_size;
+	char *carveout = self->carveout;
+	struct procmap_fd *procmap = &self->procmap;
+	char *ptr;
+
+	/*
+	 * |-----------------------|
+	 * |       unfaulted       |
+	 * |-----------------------|
+	 */
+	ptr = mmap(&carveout[page_size], 10 * page_size, PROT_READ | PROT_WRITE,
+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
+	ASSERT_NE(ptr, MAP_FAILED);
+	/*
+	 * Now make the last 5 pages read-only, splitting the VMA:
+	 *
+	 *      RW          RO
+	 * |-----------|-----------|
+	 * | unfaulted | unfaulted |
+	 * |-----------|-----------|
+	 */
+	ASSERT_EQ(mprotect(&ptr[5 * page_size], 5 * page_size, PROT_READ), 0);
+	/*
+	 * Fault in the first of the first 5 pages so it gets an anon_vma and
+	 * thus the whole VMA becomes 'faulted':
+	 *
+	 *      RW          RO
+	 * |-----------|-----------|
+	 * |  faulted  | unfaulted |
+	 * |-----------|-----------|
+	 */
+	ptr[0] = 'x';
+	/*
+	 * Now mprotect() the RW region read-only, we should merge:
+	 *
+	 *             RO
+	 * |-----------------------|
+	 * |        faulted        |
+	 * |-----------------------|
+	 */
+	ASSERT_EQ(mprotect(ptr, 5 * page_size, PROT_READ), 0);
+
+	/* Assert that the merge succeeded using PROCMAP_QUERY. */
+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 10 * page_size);
+}
+
+TEST_F(merge, mprotect_unfaulted_both)
+{
+	unsigned int page_size = self->page_size;
+	char *carveout = self->carveout;
+	struct procmap_fd *procmap = &self->procmap;
+	char *ptr;
+
+	/*
+	 * |-----------------------|
+	 * |       unfaulted       |
+	 * |-----------------------|
+	 */
+	ptr = mmap(&carveout[2 * page_size], 9 * page_size, PROT_READ | PROT_WRITE,
+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
+	ASSERT_NE(ptr, MAP_FAILED);
+	/*
+	 * Now make the first and last 3 pages read-only, splitting the VMA:
+	 *
+	 *      RO          RW          RO
+	 * |-----------|-----------|-----------|
+	 * | unfaulted | unfaulted | unfaulted |
+	 * |-----------|-----------|-----------|
+	 */
+	ASSERT_EQ(mprotect(ptr, 3 * page_size, PROT_READ), 0);
+	ASSERT_EQ(mprotect(&ptr[6 * page_size], 3 * page_size, PROT_READ), 0);
+	/*
+	 * Fault in the first of the middle 3 pages so it gets an anon_vma and
+	 * thus the whole VMA becomes 'faulted':
+	 *
+	 *      RO          RW          RO
+	 * |-----------|-----------|-----------|
+	 * | unfaulted |  faulted  | unfaulted |
+	 * |-----------|-----------|-----------|
+	 */
+	ptr[3 * page_size] = 'x';
+	/*
+	 * Now mprotect() the RW region read-only, we should merge:
+	 *
+	 *             RO
+	 * |-----------------------|
+	 * |        faulted        |
+	 * |-----------------------|
+	 */
+	ASSERT_EQ(mprotect(&ptr[3 * page_size], 3 * page_size, PROT_READ), 0);
+
+	/* Assert that the merge succeeded using PROCMAP_QUERY. */
+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 9 * page_size);
+}
+
+TEST_F(merge, mprotect_faulted_left_unfaulted_right)
+{
+	unsigned int page_size = self->page_size;
+	char *carveout = self->carveout;
+	struct procmap_fd *procmap = &self->procmap;
+	char *ptr;
+
+	/*
+	 * |-----------------------|
+	 * |       unfaulted       |
+	 * |-----------------------|
+	 */
+	ptr = mmap(&carveout[2 * page_size], 9 * page_size, PROT_READ | PROT_WRITE,
+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
+	ASSERT_NE(ptr, MAP_FAILED);
+	/*
+	 * Now make the last 3 pages read-only, splitting the VMA:
+	 *
+	 *             RW               RO
+	 * |-----------------------|-----------|
+	 * |       unfaulted       | unfaulted |
+	 * |-----------------------|-----------|
+	 */
+	ASSERT_EQ(mprotect(&ptr[6 * page_size], 3 * page_size, PROT_READ), 0);
+	/*
+	 * Fault in the first of the first 6 pages so it gets an anon_vma and
+	 * thus the whole VMA becomes 'faulted':
+	 *
+	 *             RW               RO
+	 * |-----------------------|-----------|
+	 * |        faulted        | unfaulted |
+	 * |-----------------------|-----------|
+	 */
+	ptr[0] = 'x';
+	/*
+	 * Now make the first 3 pages read-only, splitting the VMA:
+	 *
+	 *      RO          RW          RO
+	 * |-----------|-----------|-----------|
+	 * |  faulted  |  faulted  | unfaulted |
+	 * |-----------|-----------|-----------|
+	 */
+	ASSERT_EQ(mprotect(ptr, 3 * page_size, PROT_READ), 0);
+	/*
+	 * Now mprotect() the RW region read-only, we should merge:
+	 *
+	 *             RO
+	 * |-----------------------|
+	 * |        faulted        |
+	 * |-----------------------|
+	 */
+	ASSERT_EQ(mprotect(&ptr[3 * page_size], 3 * page_size, PROT_READ), 0);
+
+	/* Assert that the merge succeeded using PROCMAP_QUERY. */
+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 9 * page_size);
+}
+
+TEST_F(merge, mprotect_unfaulted_left_faulted_right)
+{
+	unsigned int page_size = self->page_size;
+	char *carveout = self->carveout;
+	struct procmap_fd *procmap = &self->procmap;
+	char *ptr;
+
+	/*
+	 * |-----------------------|
+	 * |       unfaulted       |
+	 * |-----------------------|
+	 */
+	ptr = mmap(&carveout[2 * page_size], 9 * page_size, PROT_READ | PROT_WRITE,
+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
+	ASSERT_NE(ptr, MAP_FAILED);
+	/*
+	 * Now make the first 3 pages read-only, splitting the VMA:
+	 *
+	 *      RO                RW
+	 * |-----------|-----------------------|
+	 * | unfaulted |       unfaulted       |
+	 * |-----------|-----------------------|
+	 */
+	ASSERT_EQ(mprotect(ptr, 3 * page_size, PROT_READ), 0);
+	/*
+	 * Fault in the first of the last 6 pages so it gets an anon_vma and
+	 * thus the whole VMA becomes 'faulted':
+	 *
+	 *      RO                RW
+	 * |-----------|-----------------------|
+	 * | unfaulted |        faulted        |
+	 * |-----------|-----------------------|
+	 */
+	ptr[3 * page_size] = 'x';
+	/*
+	 * Now make the last 3 pages read-only, splitting the VMA:
+	 *
+	 *      RO          RW          RO
+	 * |-----------|-----------|-----------|
+	 * | unfaulted |  faulted  |  faulted  |
+	 * |-----------|-----------|-----------|
+	 */
+	ASSERT_EQ(mprotect(&ptr[6 * page_size], 3 * page_size, PROT_READ), 0);
+	/*
+	 * Now mprotect() the RW region read-only, we should merge:
+	 *
+	 *             RO
+	 * |-----------------------|
+	 * |        faulted        |
+	 * |-----------------------|
+	 */
+	ASSERT_EQ(mprotect(&ptr[3 * page_size], 3 * page_size, PROT_READ), 0);
+
+	/* Assert that the merge succeeded using PROCMAP_QUERY. */
+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 9 * page_size);
+}
+
+TEST_F(merge, forked_target_vma)
+{
+	unsigned int page_size = self->page_size;
+	char *carveout = self->carveout;
+	struct procmap_fd *procmap = &self->procmap;
+	pid_t pid;
+	char *ptr, *ptr2;
+	int i;
+
+	/*
+	 * |-----------|
+	 * | unfaulted |
+	 * |-----------|
+	 */
+	ptr = mmap(&carveout[page_size], 5 * page_size, PROT_READ | PROT_WRITE,
+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
+	ASSERT_NE(ptr, MAP_FAILED);
+
+	/*
+	 * Fault in process.
+	 *
+	 * |-----------|
+	 * |  faulted  |
+	 * |-----------|
+	 */
+	ptr[0] = 'x';
+
+	pid = fork();
+	ASSERT_NE(pid, -1);
+
+	if (pid != 0) {
+		wait(NULL);
+		return;
+	}
+
+	/* Child process below: */
+
+	/* Reopen for child. */
+	ASSERT_EQ(close_procmap(&self->procmap), 0);
+	ASSERT_EQ(open_self_procmap(&self->procmap), 0);
+
+	/* unCOWing everything does not cause the AVC to go away. */
+	for (i = 0; i < 5 * page_size; i += page_size)
+		ptr[i] = 'x';
+
+	/*
+	 * Map in adjacent VMA in child.
+	 *
+	 *     forked
+	 * |-----------|-----------|
+	 * |  faulted  | unfaulted |
+	 * |-----------|-----------|
+	 *      ptr         ptr2
+	 */
+	ptr2 = mmap(&ptr[5 * page_size], 5 * page_size, PROT_READ | PROT_WRITE,
+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
+	ASSERT_NE(ptr2, MAP_FAILED);
+
+	/* Make sure not merged. */
+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 5 * page_size);
+}
+
+TEST_F(merge, forked_source_vma)
+{
+	unsigned int page_size = self->page_size;
+	char *carveout = self->carveout;
+	struct procmap_fd *procmap = &self->procmap;
+	pid_t pid;
+	char *ptr, *ptr2;
+	int i;
+
+	/*
+	 * |............|-----------|
+	 * | <unmapped> | unfaulted |
+	 * |............|-----------|
+	 */
+	ptr = mmap(&carveout[page_size], 5 * page_size, PROT_READ | PROT_WRITE,
+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
+	ASSERT_NE(ptr, MAP_FAILED);
+
+	/*
+	 * Fault in process.
+	 *
+	 * |............||-----------|
+	 * | <unmapped> ||  faulted  |
+	 * |............||-----------|
+	 */
+	ptr[0] = 'x';
+
+	pid = fork();
+	ASSERT_NE(pid, -1);
+
+	if (pid != 0) {
+		wait(NULL);
+		return;
+	}
+
+	/* Child process below: */
+
+	/* Reopen for child. */
+	ASSERT_EQ(close_procmap(&self->procmap), 0);
+	ASSERT_EQ(open_self_procmap(&self->procmap), 0);
+
+	/* unCOWing everything does not cause the AVC to go away. */
+	for (i = 0; i < 5 * page_size; i += page_size)
+		ptr[i] = 'x';
+
+	/*
+	 * Map in adjacent VMA in child, ptr2 before ptr, but incompatible.
+	 *
+	 *      RWX      forked RW
+	 * |-----------|-----------|
+	 * | unfaulted |  faulted  |
+	 * |-----------|-----------|
+	 *      ptr2        ptr
+	 */
+	ptr2 = mmap(&carveout[6 * page_size], 5 * page_size, PROT_READ | PROT_WRITE | PROT_EXEC,
+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
+	ASSERT_NE(ptr2, MAP_FAILED);
+
+
+	/* Make sure not merged. */
+	ASSERT_TRUE(find_vma_procmap(procmap, ptr2));
+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr2);
+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr2 + 5 * page_size);
+
+	/*
+	 * Now mprotect forked region to RWX so it becomes the source for the
+	 * merge to unfaulted region:
+	 *
+	 *      RWX      forked RWX
+	 * |-----------|-----------|
+	 * | unfaulted |  faulted  |
+	 * |-----------|-----------|
+	 *      ptr2        ptr
+	 */
+	ASSERT_EQ(mprotect(ptr, 5 * page_size, PROT_READ | PROT_WRITE | PROT_EXEC), 0);
+	/* Again, make sure not merged. */
+	ASSERT_TRUE(find_vma_procmap(procmap, ptr2));
+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr2);
+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr2 + 5 * page_size);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 9aff33b10999..0b2f8bb91433 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -421,6 +421,8 @@ CATEGORY="madv_guard" run_test ./guard-regions
 # MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
 CATEGORY="madv_populate" run_test ./madv_populate
 
+CATEGORY="vma_merge" run_test ./merge
+
 if [ -x ./memfd_secret ]
 then
 (echo 0 > /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] tools/testing: add PROCMAP_QUERY helper functions in mm self tests
  2025-03-17 21:15 ` [PATCH 2/3] tools/testing: add PROCMAP_QUERY helper functions in mm self tests Lorenzo Stoakes
@ 2025-03-18 15:18   ` Lorenzo Stoakes
  2025-04-07  2:42   ` Wei Yang
  1 sibling, 0 replies; 15+ messages in thread
From: Lorenzo Stoakes @ 2025-03-18 15:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Jann Horn, Liam R . Howlett, Suren Baghdasaryan,
	David Hildenbrand, Matthew Wilcox, Rik van Riel, linux-mm,
	linux-kernel

On Mon, Mar 17, 2025 at 09:15:04PM +0000, Lorenzo Stoakes wrote:
> The PROCMAP_QUERY ioctl() is very useful - it allows for binary access to
> /proc/$pid/[s]maps data and thus convenient lookup of data contained there.
>
> This patch exposes this for convenient use by mm self tests so the state of
> VMAs can easily be queried.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Quick fix on this:

----8<----
From 02d674de3736e3726a97a27625d4598c3d49d08a Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Tue, 18 Mar 2025 15:16:29 +0000
Subject: [PATCH] correctly return errno

---
 tools/testing/selftests/mm/vm_util.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/mm/vm_util.c b/tools/testing/selftests/mm/vm_util.c
index 891ce17453cd..1357e2d6a7b6 100644
--- a/tools/testing/selftests/mm/vm_util.c
+++ b/tools/testing/selftests/mm/vm_util.c
@@ -440,7 +440,7 @@ int open_procmap(pid_t pid, struct procmap_fd *procmap_out)
 	procmap_out->query.size = sizeof(procmap_out->query);
 	procmap_out->fd = open(path, O_RDONLY);
 	if (procmap_out < 0)
-		ret = errno;
+		ret = -errno;

 	return ret;
 }
@@ -451,7 +451,7 @@ int query_procmap(struct procmap_fd *procmap)
 	int ret = 0;

 	if (ioctl(procmap->fd, PROCMAP_QUERY, &procmap->query) == -1)
-		ret = errno;
+		ret = -errno;

 	return ret;
 }
--
2.48.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] fix incorrectly disallowed anonymous VMA merges
  2025-03-17 21:15 [PATCH 0/3] fix incorrectly disallowed anonymous VMA merges Lorenzo Stoakes
                   ` (2 preceding siblings ...)
  2025-03-17 21:15 ` [PATCH 3/3] tools/testing/selftests: assert that anon merge cases behave as expected Lorenzo Stoakes
@ 2025-03-20 13:33 ` Yeoreum Yun
  3 siblings, 0 replies; 15+ messages in thread
From: Yeoreum Yun @ 2025-03-20 13:33 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Vlastimil Babka, Jann Horn, Liam R . Howlett,
	Suren Baghdasaryan, David Hildenbrand, Matthew Wilcox,
	Rik van Riel, linux-mm, linux-kernel

On Mon, Mar 17, 2025 at 09:15:02PM +0000, Lorenzo Stoakes wrote:
> It appears that we have been incorrectly rejecting merge cases for 15
> years, apparently by mistake.
>
> Imagine a range of anonymous mapped memory divided into two VMAs like
> this, with incompatible protection bits:
>
>               RW         RWX
> 	  unfaulted    faulted
> 	|-----------|-----------|
> 	|    prev   |    vma    |
> 	|-----------|-----------|
> 	             mprotect(RW)
>
> Now imagine mprotect()'ing vma so it is RW. This appears as if it should
> merge, it does not.
>
> Neither does this case, again mprotect()'ing vma RW:
>
>               RWX        RW
> 	   faulted    unfaulted
> 	|-----------|-----------|
> 	|    vma    |   next    |
> 	|-----------|-----------|
> 	 mprotect(RW)
>
> Nor:
>
>               RW         RWX          RW
> 	  unfaulted    faulted    unfaulted
> 	|-----------|-----------|-----------|
> 	|    prev   |    vma    |    next   |
> 	|-----------|-----------|-----------|
> 	             mprotect(RW)
>
> What's going on here?
>
> In commit 5beb49305251 ("mm: change anon_vma linking to fix multi-process
> server scalability issue"), from 2010, Rik von Riel took careful care to
> account for these cases - commenting that '[this is] easily overlooked:
> when mprotect shifts the boundary, make sure the expanding vma has anon_vma
> set if the shrinking vma had, to cover any anon pages imported.'
>
> However, commit 965f55dea0e3 ("mmap: avoid merging cloned VMAs") introduced
> a little over a year later, appears to have accidentally disallowed this.
>
> By adjusting the is_mergeable_anon_vma() function to avoid lock contention
> across large trees of forked anon_vma's, this commit wrongly assumed the
> VMA being checked (the ostensible merge 'target') should be faulted, that
> is, have an anon_vma, and thus an anon_vma_chain list established, but only
> of length 1.
>
> This appears to have been unintentional, as disallowing empty target VMAs
> like this across the board makes no sense.
>
> We already have logic that accounts for this case, the same logic Rik
> introduced in 2010, now via dup_anon_vma() (and ultimately
> anon_vma_clone()), so there is no problem permitting this.
>
> This series fixes this mistake and also ensures that scalability concerns
> remain addressed by explicitly checking that whatever VMA is being merged
> has not been forked.
>
> A full set of self tests which reproduce the issue are provided, as well as
> updating userland VMA tests to assert this behaviour.
>
> The self tests additionally assert scalability concerns are addressed.
>
>
> Lorenzo Stoakes (3):
>   mm/vma: fix incorrectly disallowed anonymous VMA merges
>   tools/testing: add PROCMAP_QUERY helper functions in mm self tests
>   tools/testing/selftests: assert that anon merge cases behave as
>     expected
>
>  mm/vma.c                                  |  81 ++--
>  tools/testing/selftests/mm/.gitignore     |   1 +
>  tools/testing/selftests/mm/Makefile       |   1 +
>  tools/testing/selftests/mm/merge.c        | 454 ++++++++++++++++++++++
>  tools/testing/selftests/mm/run_vmtests.sh |   2 +
>  tools/testing/selftests/mm/vm_util.c      |  62 +++
>  tools/testing/selftests/mm/vm_util.h      |  21 +
>  tools/testing/vma/vma.c                   | 100 ++---
>  8 files changed, 652 insertions(+), 70 deletions(-)
>  create mode 100644 tools/testing/selftests/mm/merge.c
>
> --
> 2.48.1
>

Looks good to me.
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] mm/vma: fix incorrectly disallowed anonymous VMA merges
  2025-03-17 21:15 ` [PATCH 1/3] mm/vma: " Lorenzo Stoakes
@ 2025-04-04 12:53   ` Wei Yang
  2025-04-04 13:04     ` Lorenzo Stoakes
  0 siblings, 1 reply; 15+ messages in thread
From: Wei Yang @ 2025-04-04 12:53 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Vlastimil Babka, Jann Horn, Liam R . Howlett,
	Suren Baghdasaryan, David Hildenbrand, Matthew Wilcox,
	Rik van Riel, linux-mm, linux-kernel

On Mon, Mar 17, 2025 at 09:15:03PM +0000, Lorenzo Stoakes wrote:
[...]
>However, we have a problem here - typically the vma passed here is the
>destination VMA.
>
>For instance in vma_merge_existing_range() we invoke:
>
>can_vma_merge_left()
>-> [ check that there is an immediately adjacent prior VMA ]
>-> can_vma_merge_after()
>  -> is_mergeable_vma() for general attribute check
>-> is_mergeable_anon_vma([ proposed anon_vma ], prev->anon_vma, prev)
>
>So if we were considering a target unfaulted 'prev':
>
>	  unfaulted    faulted
>	|-----------|-----------|
>	|    prev   |    vma    |
>	|-----------|-----------|
>
>This would call is_mergeable_anon_vma(NULL, vma->anon_vma, prev).
>
>The list_is_singular() check for vma->anon_vma_chain, an empty list on
>fault, would cause this merge to _fail_ even though all else indicates a
>merge.
>

Great spot. It has been hiding there for 15 years.

>Equally a simple merge into a next VMA would hit the same problem:
>
>	   faulted    unfaulted
>	|-----------|-----------|
>	|    vma    |    next   |
>	|-----------|-----------|
>
[...]
>---
> mm/vma.c                |  81 +++++++++++++++++++++++---------
> tools/testing/vma/vma.c | 100 +++++++++++++++++++++-------------------
> 2 files changed, 111 insertions(+), 70 deletions(-)
>
>diff --git a/mm/vma.c b/mm/vma.c
>index 5cdc5612bfc1..5418eef3a852 100644
>--- a/mm/vma.c
>+++ b/mm/vma.c
>@@ -57,6 +57,22 @@ struct mmap_state {
> 		.state = VMA_MERGE_START,				\
> 	}
> 
>+/*
>+ * If, at any point, the VMA had unCoW'd mappings from parents, it will maintain
>+ * more than one anon_vma_chain connecting it to more than one anon_vma. A merge
>+ * would mean a wider range of folios sharing the root anon_vma lock, and thus
>+ * potential lock contention, we do not wish to encourage merging such that this
>+ * scales to a problem.
>+ */

I don't follow here. Taking a look into do_wp_page(), where CoW happens, I
don't find where it would unlink a parent anon_vma from vma->anon_vma_chain.

Per my understanding, the unlink behaviour happens in unlink_anon_vmas(),
which unlinks all anon_vmas on vma->anon_vma_chain. And the normal caller
of unlink_anon_vmas() is free_pgtables(). Other callers are on error paths
to release prepared data. From this perspective, I don't see where a parent
anon_vma would be unlinked from vma->anon_vma_chain either.

But maybe I missed something. If it is not too much bother, would you mind
giving me a hint?

>+static bool vma_had_uncowed_parents(struct vm_area_struct *vma)
>+{
>+	/*
>+	 * The list_is_singular() test is to avoid merging VMA cloned from
>+	 * parents. This can improve scalability caused by anon_vma lock.
>+	 */
>+	return vma && vma->anon_vma && !list_is_singular(&vma->anon_vma_chain);
>+}
>+
> static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_next)
> {
> 	struct vm_area_struct *vma = merge_next ? vmg->next : vmg->prev;
>@@ -82,24 +98,28 @@ static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_nex
> 	return true;
> }
> 
>-static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1,
>-		 struct anon_vma *anon_vma2, struct vm_area_struct *vma)
>+static bool is_mergeable_anon_vma(struct vma_merge_struct *vmg, bool merge_next)
> {
>+	struct vm_area_struct *tgt = merge_next ? vmg->next : vmg->prev;
>+	struct vm_area_struct *src = vmg->middle; /* exisitng merge case. */
                                                     ^^^

A trivial typo here.

>+	struct anon_vma *tgt_anon = tgt->anon_vma;
>+	struct anon_vma *src_anon = vmg->anon_vma;
>+
> 	/*
>-	 * The list_is_singular() test is to avoid merging VMA cloned from
>-	 * parents. This can improve scalability caused by anon_vma lock.
>+	 * We _can_ have !src, vmg->anon_vma via copy_vma(). In this instance we
>+	 * will remove the existing VMA's anon_vma's so there's no scalability
>+	 * concerns.
> 	 */
>-	if ((!anon_vma1 || !anon_vma2) && (!vma ||
>-		list_is_singular(&vma->anon_vma_chain)))
>-		return true;
>-	return anon_vma1 == anon_vma2;
>-}
>+	VM_WARN_ON(src && src_anon != src->anon_vma);
> 
>-/* Are the anon_vma's belonging to each VMA compatible with one another? */
>-static inline bool are_anon_vmas_compatible(struct vm_area_struct *vma1,
>-					    struct vm_area_struct *vma2)
>-{
>-	return is_mergeable_anon_vma(vma1->anon_vma, vma2->anon_vma, NULL);
>+	/* Case 1 - we will dup_anon_vma() from src into tgt. */
>+	if (!tgt_anon && src_anon)
>+		return !vma_had_uncowed_parents(src);
>+	/* Case 2 - we will simply use tgt's anon_vma. */
>+	if (tgt_anon && !src_anon)
>+		return !vma_had_uncowed_parents(tgt);
>+	/* Case 3 - the anon_vma's are already shared. */
>+	return src_anon == tgt_anon;
> }
> 
> /*
>@@ -164,7 +184,7 @@ static bool can_vma_merge_before(struct vma_merge_struct *vmg)
> 	pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start);
> 
> 	if (is_mergeable_vma(vmg, /* merge_next = */ true) &&
>-	    is_mergeable_anon_vma(vmg->anon_vma, vmg->next->anon_vma, vmg->next)) {
>+	    is_mergeable_anon_vma(vmg, /* merge_next = */ true)) {
> 		if (vmg->next->vm_pgoff == vmg->pgoff + pglen)
> 			return true;
> 	}
>@@ -184,7 +204,7 @@ static bool can_vma_merge_before(struct vma_merge_struct *vmg)
> static bool can_vma_merge_after(struct vma_merge_struct *vmg)
> {
> 	if (is_mergeable_vma(vmg, /* merge_next = */ false) &&
>-	    is_mergeable_anon_vma(vmg->anon_vma, vmg->prev->anon_vma, vmg->prev)) {
>+	    is_mergeable_anon_vma(vmg, /* merge_next = */ false)) {
> 		if (vmg->prev->vm_pgoff + vma_pages(vmg->prev) == vmg->pgoff)
> 			return true;
> 	}

We have two sets of APIs to check a vma's mergeability:

  * can_vma_merge_before/after
  * can_vma_merge_left/right

And xxx_merge_right() calls xxx_merge_before(), which is a little confusing.

Now can_vma_merge_before/after look almost the same. Do you think it would be
easier to read if we consolidated them into one function?
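
Something like this is what I had in mind, a rough sketch on top of your
change (untested):

	static bool can_vma_merge(struct vma_merge_struct *vmg, bool merge_next)
	{
		struct vm_area_struct *tgt = merge_next ? vmg->next : vmg->prev;
		pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start);

		if (!is_mergeable_vma(vmg, merge_next) ||
		    !is_mergeable_anon_vma(vmg, merge_next))
			return false;

		/* Only the adjacency check differs for prev/next. */
		if (merge_next)
			return tgt->vm_pgoff == vmg->pgoff + pglen;

		return tgt->vm_pgoff + vma_pages(tgt) == vmg->pgoff;
	}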

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] mm/vma: fix incorrectly disallowed anonymous VMA merges
  2025-04-04 12:53   ` Wei Yang
@ 2025-04-04 13:04     ` Lorenzo Stoakes
  2025-04-04 23:32       ` Wei Yang
  0 siblings, 1 reply; 15+ messages in thread
From: Lorenzo Stoakes @ 2025-04-04 13:04 UTC (permalink / raw)
  To: Wei Yang
  Cc: Andrew Morton, Vlastimil Babka, Jann Horn, Liam R . Howlett,
	Suren Baghdasaryan, David Hildenbrand, Matthew Wilcox,
	Rik van Riel, linux-mm, linux-kernel

On Fri, Apr 04, 2025 at 12:53:15PM +0000, Wei Yang wrote:
> On Mon, Mar 17, 2025 at 09:15:03PM +0000, Lorenzo Stoakes wrote:
> [...]
> >However, we have a problem here - typically the vma passed here is the
> >destination VMA.
> >
> >For instance in vma_merge_existing_range() we invoke:
> >
> >can_vma_merge_left()
> >-> [ check that there is an immediately adjacent prior VMA ]
> >-> can_vma_merge_after()
> >  -> is_mergeable_vma() for general attribute check
> >-> is_mergeable_anon_vma([ proposed anon_vma ], prev->anon_vma, prev)
> >
> >So if we were considering a target unfaulted 'prev':
> >
> >	  unfaulted    faulted
> >	|-----------|-----------|
> >	|    prev   |    vma    |
> >	|-----------|-----------|
> >
> >This would call is_mergeable_anon_vma(NULL, vma->anon_vma, prev).
> >
> >The list_is_singular() check for vma->anon_vma_chain, an empty list on
> >fault, would cause this merge to _fail_ even though all else indicates a
> >merge.
> >
>
> Great spot. It is hiding there for 15 years.

Thanks!

>
> >Equally a simple merge into a next VMA would hit the same problem:
> >
> >	   faulted    unfaulted
> >	|-----------|-----------|
> >	|    vma    |    next   |
> >	|-----------|-----------|
> >
> [...]
> >---
> > mm/vma.c                |  81 +++++++++++++++++++++++---------
> > tools/testing/vma/vma.c | 100 +++++++++++++++++++++-------------------
> > 2 files changed, 111 insertions(+), 70 deletions(-)
> >
> >diff --git a/mm/vma.c b/mm/vma.c
> >index 5cdc5612bfc1..5418eef3a852 100644
> >--- a/mm/vma.c
> >+++ b/mm/vma.c
> >@@ -57,6 +57,22 @@ struct mmap_state {
> > 		.state = VMA_MERGE_START,				\
> > 	}
> >
> >+/*
> >+ * If, at any point, the VMA had unCoW'd mappings from parents, it will maintain
> >+ * more than one anon_vma_chain connecting it to more than one anon_vma. A merge
> >+ * would mean a wider range of folios sharing the root anon_vma lock, and thus
> >+ * potential lock contention, we do not wish to encourage merging such that this
> >+ * scales to a problem.
> >+ */
>
> I don't follow here. Take a look into do_wp_page(), where CoW happens. But I
> don't find where it will unlink parent anon_vma from vma->anon_vma_chain.

Look at anon_vma_clone() in fork case. It's not the point of CoW that's the
issue, it's propagation of AVC's upon fork.
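
To sketch it (simplified, and assuming the parent VMA had a single anon_vma
before the fork):

	parent vma->anon_vma_chain: { avc -> anon_vma A }          <- singular
	child  vma->anon_vma_chain: { avc -> A, avc -> A' (new) }  <- not singular

anon_vma_fork() clones the parent's AVCs onto the child and then links one
more for the child's own anon_vma, so the child's chain stops being singular
even though the parent's stays singular. That is what
vma_had_uncowed_parents() keys off.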

>
> Per my understanding, the unlink behavior happens in unlink_anon_vma() which
> unlink all anon_vma on vma->anon_vma_chain. And the normal caller of
> unlink_anon_vma() is free_pgtables(). Other callers are on error path to
> release prepared data. From this perspective, I don't see the chance to unlink
> parent anon_vma from vma->anon_vma_chain either.
>
> But maybe I missed something. If it is not too bother, would you mind giving
> me a hint?

What you're saying is 'we never go back and fix this up once unCoW'd' which is
true, but I don't want to write several page-length essays in comments, and this
is a sensible way of looking at things for the purposes of this check.

In future, we may want to actually do something like this, if it makes sense.

>
> >+static bool vma_had_uncowed_parents(struct vm_area_struct *vma)
> >+{
> >+	/*
> >+	 * The list_is_singular() test is to avoid merging VMA cloned from
> >+	 * parents. This can improve scalability caused by anon_vma lock.
> >+	 */
> >+	return vma && vma->anon_vma && !list_is_singular(&vma->anon_vma_chain);
> >+}
> >+
> > static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_next)
> > {
> > 	struct vm_area_struct *vma = merge_next ? vmg->next : vmg->prev;
> >@@ -82,24 +98,28 @@ static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_nex
> > 	return true;
> > }
> >
> >-static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1,
> >-		 struct anon_vma *anon_vma2, struct vm_area_struct *vma)
> >+static bool is_mergeable_anon_vma(struct vma_merge_struct *vmg, bool merge_next)
> > {
> >+	struct vm_area_struct *tgt = merge_next ? vmg->next : vmg->prev;
> >+	struct vm_area_struct *src = vmg->middle; /* exisitng merge case. */
>                                                      ^^^
>
> A trivial typo here.

Thanks, if I have to respin I'll fix it.

>
> >+	struct anon_vma *tgt_anon = tgt->anon_vma;
> >+	struct anon_vma *src_anon = vmg->anon_vma;
> >+
> > 	/*
> >-	 * The list_is_singular() test is to avoid merging VMA cloned from
> >-	 * parents. This can improve scalability caused by anon_vma lock.
> >+	 * We _can_ have !src, vmg->anon_vma via copy_vma(). In this instance we
> >+	 * will remove the existing VMA's anon_vma's so there's no scalability
> >+	 * concerns.
> > 	 */
> >-	if ((!anon_vma1 || !anon_vma2) && (!vma ||
> >-		list_is_singular(&vma->anon_vma_chain)))
> >-		return true;
> >-	return anon_vma1 == anon_vma2;
> >-}
> >+	VM_WARN_ON(src && src_anon != src->anon_vma);
> >
> >-/* Are the anon_vma's belonging to each VMA compatible with one another? */
> >-static inline bool are_anon_vmas_compatible(struct vm_area_struct *vma1,
> >-					    struct vm_area_struct *vma2)
> >-{
> >-	return is_mergeable_anon_vma(vma1->anon_vma, vma2->anon_vma, NULL);
> >+	/* Case 1 - we will dup_anon_vma() from src into tgt. */
> >+	if (!tgt_anon && src_anon)
> >+		return !vma_had_uncowed_parents(src);
> >+	/* Case 2 - we will simply use tgt's anon_vma. */
> >+	if (tgt_anon && !src_anon)
> >+		return !vma_had_uncowed_parents(tgt);
> >+	/* Case 3 - the anon_vma's are already shared. */
> >+	return src_anon == tgt_anon;
> > }
> >
> > /*
> >@@ -164,7 +184,7 @@ static bool can_vma_merge_before(struct vma_merge_struct *vmg)
> > 	pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start);
> >
> > 	if (is_mergeable_vma(vmg, /* merge_next = */ true) &&
> >-	    is_mergeable_anon_vma(vmg->anon_vma, vmg->next->anon_vma, vmg->next)) {
> >+	    is_mergeable_anon_vma(vmg, /* merge_next = */ true)) {
> > 		if (vmg->next->vm_pgoff == vmg->pgoff + pglen)
> > 			return true;
> > 	}
> >@@ -184,7 +204,7 @@ static bool can_vma_merge_before(struct vma_merge_struct *vmg)
> > static bool can_vma_merge_after(struct vma_merge_struct *vmg)
> > {
> > 	if (is_mergeable_vma(vmg, /* merge_next = */ false) &&
> >-	    is_mergeable_anon_vma(vmg->anon_vma, vmg->prev->anon_vma, vmg->prev)) {
> >+	    is_mergeable_anon_vma(vmg, /* merge_next = */ false)) {
> > 		if (vmg->prev->vm_pgoff + vma_pages(vmg->prev) == vmg->pgoff)
> > 			return true;
> > 	}
>
> We have two sets API to check vma's mergeability:
>
>   * can_vma_merge_before/after
>   * can_vma_merge_left/right
>
> And xxx_merge_right() calls xxx_merge_before(), which is a little confusing.
>
> Now can_vma_merge_before/after looks almost same. Do you think it would be
> easier for reading to consolidate to one function?

Yeah it's a bit of a mess, some of it is 'maintaining the previous
structure' since people were at least familiar with it.

But this is something I do plan to address in future. Right now I think
with the heavy commenting and improvements on merge logic in general there
should really not be _too_ much confusion.

So a sensible fixup would be as part of a larger refactor. There is nothing
to do here at the moment, in my view.

>
> --
> Wei Yang
> Help you, Help me


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] mm/vma: fix incorrectly disallowed anonymous VMA merges
  2025-04-04 13:04     ` Lorenzo Stoakes
@ 2025-04-04 23:32       ` Wei Yang
  2025-04-07 10:24         ` Lorenzo Stoakes
  0 siblings, 1 reply; 15+ messages in thread
From: Wei Yang @ 2025-04-04 23:32 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Wei Yang, Andrew Morton, Vlastimil Babka, Jann Horn,
	Liam R . Howlett, Suren Baghdasaryan, David Hildenbrand,
	Matthew Wilcox, Rik van Riel, linux-mm, linux-kernel

On Fri, Apr 04, 2025 at 02:04:10PM +0100, Lorenzo Stoakes wrote:
>On Fri, Apr 04, 2025 at 12:53:15PM +0000, Wei Yang wrote:
>> On Mon, Mar 17, 2025 at 09:15:03PM +0000, Lorenzo Stoakes wrote:
>> [...]
>> >However, we have a problem here - typically the vma passed here is the
>> >destination VMA.
>> >
>> >For instance in vma_merge_existing_range() we invoke:
>> >
>> >can_vma_merge_left()
>> >-> [ check that there is an immediately adjacent prior VMA ]
>> >-> can_vma_merge_after()
>> >  -> is_mergeable_vma() for general attribute check
>> >-> is_mergeable_anon_vma([ proposed anon_vma ], prev->anon_vma, prev)
>> >
>> >So if we were considering a target unfaulted 'prev':
>> >
>> >	  unfaulted    faulted
>> >	|-----------|-----------|
>> >	|    prev   |    vma    |
>> >	|-----------|-----------|
>> >
>> >This would call is_mergeable_anon_vma(NULL, vma->anon_vma, prev).
>> >
>> >The list_is_singular() check for vma->anon_vma_chain, an empty list on
>> >fault, would cause this merge to _fail_ even though all else indicates a
>> >merge.
>> >
>>
>> Great spot. It is hiding there for 15 years.
>
>Thanks!
>
>>
>> >Equally a simple merge into a next VMA would hit the same problem:
>> >
>> >	   faulted    unfaulted
>> >	|-----------|-----------|
>> >	|    vma    |    next   |
>> >	|-----------|-----------|
>> >
>> [...]
>> >---
>> > mm/vma.c                |  81 +++++++++++++++++++++++---------
>> > tools/testing/vma/vma.c | 100 +++++++++++++++++++++-------------------
>> > 2 files changed, 111 insertions(+), 70 deletions(-)
>> >
>> >diff --git a/mm/vma.c b/mm/vma.c
>> >index 5cdc5612bfc1..5418eef3a852 100644
>> >--- a/mm/vma.c
>> >+++ b/mm/vma.c
>> >@@ -57,6 +57,22 @@ struct mmap_state {
>> > 		.state = VMA_MERGE_START,				\
>> > 	}
>> >
>> >+/*
>> >+ * If, at any point, the VMA had unCoW'd mappings from parents, it will maintain
>> >+ * more than one anon_vma_chain connecting it to more than one anon_vma. A merge
>> >+ * would mean a wider range of folios sharing the root anon_vma lock, and thus
>> >+ * potential lock contention, we do not wish to encourage merging such that this
>> >+ * scales to a problem.
>> >+ */
>>
>> I don't follow here. Take a look into do_wp_page(), where CoW happens. But I
>> don't find where it will unlink parent anon_vma from vma->anon_vma_chain.
>
>Look at anon_vma_clone() in fork case. It's not the point of CoW that's the
>issue, it's propagation of AVC's upon fork.
>
>>
>> Per my understanding, the unlink behavior happens in unlink_anon_vma() which
>> unlink all anon_vma on vma->anon_vma_chain. And the normal caller of
>> unlink_anon_vma() is free_pgtables(). Other callers are on error path to
>> release prepared data. From this perspective, I don't see the chance to unlink
>> parent anon_vma from vma->anon_vma_chain either.
>>
>> But maybe I missed something. If it is not too bother, would you mind giving
>> me a hint?
>
>What you're saying is 'we never go back and fix this up once unCoW'd' which is
>true, but I don't want to write several page-length essays in comments, and this
>is a sensible way of looking at things for the purposes of this check.
>
>In future, we may want to actually do something like this, if it makes sense.
>

Ok, this is the future plan instead of current behavior.

My personal feeling is that it would be misleading to readers. I would think
that if all pages mapped in the VMA are CoW'd, the vma->anon_vma_chain becomes
singular in the current kernel.

A page-length comment is not what we want, so how about "maybe_root_anon_vma"?
When vma->anon_vma_chain is empty or singular, it means the (future)
vma->anon_vma is the root anon_vma.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] tools/testing: add PROCMAP_QUERY helper functions in mm self tests
  2025-03-17 21:15 ` [PATCH 2/3] tools/testing: add PROCMAP_QUERY helper functions in mm self tests Lorenzo Stoakes
  2025-03-18 15:18   ` Lorenzo Stoakes
@ 2025-04-07  2:42   ` Wei Yang
  1 sibling, 0 replies; 15+ messages in thread
From: Wei Yang @ 2025-04-07  2:42 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Vlastimil Babka, Jann Horn, Liam R . Howlett,
	Suren Baghdasaryan, David Hildenbrand, Matthew Wilcox,
	Rik van Riel, linux-mm, linux-kernel

On Mon, Mar 17, 2025 at 09:15:04PM +0000, Lorenzo Stoakes wrote:
>The PROCMAP_QUERY ioctl() is very useful - it allows for binary access to
>/proc/$pid/[s]maps data and thus convenient lookup of data contained there.
>
>This patch exposes this for convenient use by mm self tests so the state of
>VMAs can easily be queried.
>
>Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Reviewed-by: Wei Yang <richard.weiyang@gmail.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] tools/testing/selftests: assert that anon merge cases behave as expected
  2025-03-17 21:15 ` [PATCH 3/3] tools/testing/selftests: assert that anon merge cases behave as expected Lorenzo Stoakes
@ 2025-04-07  2:54   ` Wei Yang
  2025-04-07 11:02     ` Lorenzo Stoakes
  0 siblings, 1 reply; 15+ messages in thread
From: Wei Yang @ 2025-04-07  2:54 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Vlastimil Babka, Jann Horn, Liam R . Howlett,
	Suren Baghdasaryan, David Hildenbrand, Matthew Wilcox,
	Rik van Riel, linux-mm, linux-kernel

On Mon, Mar 17, 2025 at 09:15:05PM +0000, Lorenzo Stoakes wrote:
>Prior to the recently applied commit that permits this merge,
>mprotect()'ing a faulted VMA, adjacent to an unfaulted VMA, such that the
>two share characteristics would fail to merge due to what appear to be
>unintended consequences of commit 965f55dea0e3 ("mmap: avoid merging cloned
>VMAs").
>
>Now we have fixed this bug, assert that we can indeed merge anonymous VMAs
>this way.
>
>Also assert that forked source/target VMAs are equally
>rejected. Previously, all empty target anon merges with one VMA faulted and
>the other unfaulted would be rejected incorrectly, now we ensure that
>unforked merge, but forked do not.
>
>Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>---
> tools/testing/selftests/mm/.gitignore     |   1 +
> tools/testing/selftests/mm/Makefile       |   1 +
> tools/testing/selftests/mm/merge.c        | 454 ++++++++++++++++++++++
> tools/testing/selftests/mm/run_vmtests.sh |   2 +
> 4 files changed, 458 insertions(+)
> create mode 100644 tools/testing/selftests/mm/merge.c
>
>diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore
>index c5241b193db8..91db34941a14 100644
>--- a/tools/testing/selftests/mm/.gitignore
>+++ b/tools/testing/selftests/mm/.gitignore
>@@ -58,3 +58,4 @@ hugetlb_dio
> pkey_sighandler_tests_32
> pkey_sighandler_tests_64
> guard-regions
>+merge
>diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
>index 8270895039d1..ad4d6043a60f 100644
>--- a/tools/testing/selftests/mm/Makefile
>+++ b/tools/testing/selftests/mm/Makefile
>@@ -98,6 +98,7 @@ TEST_GEN_FILES += hugetlb_madv_vs_map
> TEST_GEN_FILES += hugetlb_dio
> TEST_GEN_FILES += droppable
> TEST_GEN_FILES += guard-regions
>+TEST_GEN_FILES += merge
> 
> ifneq ($(ARCH),arm64)
> TEST_GEN_FILES += soft-dirty
>diff --git a/tools/testing/selftests/mm/merge.c b/tools/testing/selftests/mm/merge.c
>new file mode 100644
>index 000000000000..9cc61bdbfba8
>--- /dev/null
>+++ b/tools/testing/selftests/mm/merge.c
>@@ -0,0 +1,454 @@
>+// SPDX-License-Identifier: GPL-2.0-or-later
>+
>+#define _GNU_SOURCE
>+#include "../kselftest_harness.h"
>+#include <stdio.h>
>+#include <stdlib.h>
>+#include <unistd.h>
>+#include <sys/mman.h>
>+#include <sys/wait.h>
>+#include "vm_util.h"
>+
>+FIXTURE(merge)
>+{
>+	unsigned int page_size;
>+	char *carveout;
>+	struct procmap_fd procmap;
>+};
>+
>+FIXTURE_SETUP(merge)
>+{
>+	self->page_size = psize();
>+	/* Carve out PROT_NONE region to map over. */
>+	self->carveout = mmap(NULL, 12 * self->page_size, PROT_NONE,
>+			      MAP_ANON | MAP_PRIVATE, -1, 0);
>+	ASSERT_NE(self->carveout, MAP_FAILED);
>+	/* Setup PROCMAP_QUERY interface. */
>+	ASSERT_EQ(open_self_procmap(&self->procmap), 0);
>+}
>+
>+FIXTURE_TEARDOWN(merge)
>+{
>+	ASSERT_EQ(munmap(self->carveout, 12 * self->page_size), 0);
>+	ASSERT_EQ(close_procmap(&self->procmap), 0);
>+}
>+
>+TEST_F(merge, mprotect_unfaulted_left)
>+{
>+	unsigned int page_size = self->page_size;
>+	char *carveout = self->carveout;
>+	struct procmap_fd *procmap = &self->procmap;
>+	char *ptr;
>+
>+	/*
>+	 * Map 10 pages of R/W memory within. MAP_NORESERVE so we don't hit
>+	 * merge failure due to lack of VM_ACCOUNT flag by mistake.
>+	 *
>+	 * |-----------------------|
>+	 * |       unfaulted       |
>+	 * |-----------------------|
>+	 */
>+	ptr = mmap(&carveout[page_size], 10 * page_size, PROT_READ | PROT_WRITE,
>+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
>+	ASSERT_NE(ptr, MAP_FAILED);
>+	/*
>+	 * Now make the first 5 pages read-only, splitting the VMA:
>+	 *
>+	 *      RO          RW
>+	 * |-----------|-----------|
>+	 * | unfaulted | unfaulted |
>+	 * |-----------|-----------|
>+	 */
>+	ASSERT_EQ(mprotect(ptr, 5 * page_size, PROT_READ), 0);
>+	/*
>+	 * Fault in the first of the last 5 pages so it gets an anon_vma and
>+	 * thus the whole VMA becomes 'faulted':
>+	 *
>+	 *      RO          RW
>+	 * |-----------|-----------|
>+	 * | unfaulted |  faulted  |
>+	 * |-----------|-----------|
>+	 */
>+	ptr[5 * page_size] = 'x';
>+	/*
>+	 * Now mprotect() the RW region read-only, we should merge (though for
>+	 * ~15 years we did not! :):
>+	 *
>+	 *             RO
>+	 * |-----------------------|
>+	 * |        faulted        |
>+	 * |-----------------------|
>+	 */
>+	ASSERT_EQ(mprotect(&ptr[5 * page_size], 5 * page_size, PROT_READ), 0);
>+
>+	/* Assert that the merge succeeded using PROCMAP_QUERY. */
>+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
>+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
>+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 10 * page_size);
>+}
>+
>+TEST_F(merge, mprotect_unfaulted_right)
>+{
>+	unsigned int page_size = self->page_size;
>+	char *carveout = self->carveout;
>+	struct procmap_fd *procmap = &self->procmap;
>+	char *ptr;
>+
>+	/*
>+	 * |-----------------------|
>+	 * |       unfaulted       |
>+	 * |-----------------------|
>+	 */
>+	ptr = mmap(&carveout[page_size], 10 * page_size, PROT_READ | PROT_WRITE,
>+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
>+	ASSERT_NE(ptr, MAP_FAILED);
>+	/*
>+	 * Now make the last 5 pages read-only, splitting the VMA:
>+	 *
>+	 *      RW          RO
>+	 * |-----------|-----------|
>+	 * | unfaulted | unfaulted |
>+	 * |-----------|-----------|
>+	 */
>+	ASSERT_EQ(mprotect(&ptr[5 * page_size], 5 * page_size, PROT_READ), 0);
>+	/*
>+	 * Fault in the first of the first 5 pages so it gets an anon_vma and
>+	 * thus the whole VMA becomes 'faulted':
>+	 *
>+	 *      RW          RO
>+	 * |-----------|-----------|
>+	 * |  faulted  | unfaulted |
>+	 * |-----------|-----------|
>+	 */
>+	ptr[0] = 'x';
>+	/*
>+	 * Now mprotect() the RW region read-only, we should merge:
>+	 *
>+	 *             RO
>+	 * |-----------------------|
>+	 * |        faulted        |
>+	 * |-----------------------|
>+	 */
>+	ASSERT_EQ(mprotect(ptr, 5 * page_size, PROT_READ), 0);
>+
>+	/* Assert that the merge succeeded using PROCMAP_QUERY. */
>+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
>+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
>+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 10 * page_size);
>+}
>+
>+TEST_F(merge, mprotect_unfaulted_both)
>+{
>+	unsigned int page_size = self->page_size;
>+	char *carveout = self->carveout;
>+	struct procmap_fd *procmap = &self->procmap;
>+	char *ptr;
>+
>+	/*
>+	 * |-----------------------|
>+	 * |       unfaulted       |
>+	 * |-----------------------|
>+	 */
>+	ptr = mmap(&carveout[2 * page_size], 9 * page_size, PROT_READ | PROT_WRITE,
>+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
>+	ASSERT_NE(ptr, MAP_FAILED);
>+	/*
>+	 * Now make the first and last 3 pages read-only, splitting the VMA:
>+	 *
>+	 *      RO          RW          RO
>+	 * |-----------|-----------|-----------|
>+	 * | unfaulted | unfaulted | unfaulted |
>+	 * |-----------|-----------|-----------|
>+	 */
>+	ASSERT_EQ(mprotect(ptr, 3 * page_size, PROT_READ), 0);
>+	ASSERT_EQ(mprotect(&ptr[6 * page_size], 3 * page_size, PROT_READ), 0);
>+	/*
>+	 * Fault in the first of the middle 3 pages so it gets an anon_vma and
>+	 * thus the whole VMA becomes 'faulted':
>+	 *
>+	 *      RO          RW          RO
>+	 * |-----------|-----------|-----------|
>+	 * | unfaulted |  faulted  | unfaulted |
>+	 * |-----------|-----------|-----------|
>+	 */
>+	ptr[3 * page_size] = 'x';
>+	/*
>+	 * Now mprotect() the RW region read-only, we should merge:
>+	 *
>+	 *             RO
>+	 * |-----------------------|
>+	 * |        faulted        |
>+	 * |-----------------------|
>+	 */
>+	ASSERT_EQ(mprotect(&ptr[3 * page_size], 3 * page_size, PROT_READ), 0);
>+
>+	/* Assert that the merge succeeded using PROCMAP_QUERY. */
>+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
>+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
>+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 9 * page_size);
>+}
>+
>+TEST_F(merge, mprotect_faulted_left_unfaulted_right)
>+{
>+	unsigned int page_size = self->page_size;
>+	char *carveout = self->carveout;
>+	struct procmap_fd *procmap = &self->procmap;
>+	char *ptr;
>+
>+	/*
>+	 * |-----------------------|
>+	 * |       unfaulted       |
>+	 * |-----------------------|
>+	 */
>+	ptr = mmap(&carveout[2 * page_size], 9 * page_size, PROT_READ | PROT_WRITE,
>+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
>+	ASSERT_NE(ptr, MAP_FAILED);
>+	/*
>+	 * Now make the last 3 pages read-only, splitting the VMA:
>+	 *
>+	 *             RW               RO
>+	 * |-----------------------|-----------|
>+	 * |       unfaulted       | unfaulted |
>+	 * |-----------------------|-----------|
>+	 */
>+	ASSERT_EQ(mprotect(&ptr[6 * page_size], 3 * page_size, PROT_READ), 0);
>+	/*
>+	 * Fault in the first of the first 6 pages so it gets an anon_vma and
>+	 * thus the whole VMA becomes 'faulted':
>+	 *
>+	 *             RW               RO
>+	 * |-----------------------|-----------|
>+	 * |       unfaulted       | unfaulted |
                   ^^^

According to your previous comment convention, the comment here describes the
result after ptr[0] = 'x'.

Is 'faulted' the correct description here?

>+	 * |-----------------------|-----------|
>+	 */
>+	ptr[0] = 'x';
>+	/*
>+	 * Now make the first 3 pages read-only, splitting the VMA:
>+	 *
>+	 *      RO          RW          RO
>+	 * |-----------|-----------|-----------|
>+	 * |  faulted  |  faulted  | unfaulted |
>+	 * |-----------|-----------|-----------|
>+	 */
>+	ASSERT_EQ(mprotect(ptr, 3 * page_size, PROT_READ), 0);
>+	/*
>+	 * Now mprotect() the RW region read-only, we should merge:
>+	 *
>+	 *             RO
>+	 * |-----------------------|
>+	 * |        faulted        |
>+	 * |-----------------------|
>+	 */
>+	ASSERT_EQ(mprotect(&ptr[3 * page_size], 3 * page_size, PROT_READ), 0);
>+
>+	/* Assert that the merge succeeded using PROCMAP_QUERY. */
>+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
>+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
>+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 9 * page_size);
>+}
>+
>+TEST_F(merge, mprotect_unfaulted_left_faulted_right)
>+{
>+	unsigned int page_size = self->page_size;
>+	char *carveout = self->carveout;
>+	struct procmap_fd *procmap = &self->procmap;
>+	char *ptr;
>+
>+	/*
>+	 * |-----------------------|
>+	 * |       unfaulted       |
>+	 * |-----------------------|
>+	 */
>+	ptr = mmap(&carveout[2 * page_size], 9 * page_size, PROT_READ | PROT_WRITE,
>+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
>+	ASSERT_NE(ptr, MAP_FAILED);
>+	/*
>+	 * Now make the first 3 pages read-only, splitting the VMA:
>+	 *
>+	 *      RO                RW
>+	 * |-----------|-----------------------|
>+	 * | unfaulted |       unfaulted       |
>+	 * |-----------|-----------------------|
>+	 */
>+	ASSERT_EQ(mprotect(ptr, 3 * page_size, PROT_READ), 0);
>+	/*
>+	 * FAult in the first of the last 6 pages so it gets an anon_vma and
            ^

s/A/a/

>+	 * thus the whole VMA becomes 'faulted':
>+	 *
>+	 *      RO                RW
>+	 * |-----------|-----------------------|
>+	 * | unfaulted |        faulted        |
>+	 * |-----------|-----------------------|
>+	 */
>+	ptr[3 * page_size] = 'x';
>+	/*
>+	 * Now make the last 3 pages read-only, splitting the VMA:
>+	 *
>+	 *      RO          RW          RO
>+	 * |-----------|-----------|-----------|
>+	 * | unfaulted |  faulted  |  faulted  |
>+	 * |-----------|-----------|-----------|
>+	 */
>+	ASSERT_EQ(mprotect(&ptr[6 * page_size], 3 * page_size, PROT_READ), 0);
>+	/*
>+	 * Now mprotect() the RW region read-only, we should merge:
>+	 *
>+	 *             RO
>+	 * |-----------------------|
>+	 * |        faulted        |
>+	 * |-----------------------|
>+	 */
>+	ASSERT_EQ(mprotect(&ptr[3 * page_size], 3 * page_size, PROT_READ), 0);
>+
>+	/* Assert that the merge succeeded using PROCMAP_QUERY. */
>+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
>+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
>+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 9 * page_size);
>+}
>+
>+TEST_F(merge, forked_target_vma)
>+{
>+	unsigned int page_size = self->page_size;
>+	char *carveout = self->carveout;
>+	struct procmap_fd *procmap = &self->procmap;
>+	pid_t pid;
>+	char *ptr, *ptr2;
>+	int i;
>+
>+	/*
>+	 * |-----------|
>+	 * | unfaulted |
>+	 * |-----------|
>+	 */
>+	ptr = mmap(&carveout[page_size], 5 * page_size, PROT_READ | PROT_WRITE,
>+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
>+	ASSERT_NE(ptr, MAP_FAILED);
>+
>+	/*
>+	 * Fault in process.
>+	 *
>+	 * |-----------|
>+	 * |  faulted  |
>+	 * |-----------|
>+	 */
>+	ptr[0] = 'x';
>+
>+	pid = fork();
>+	ASSERT_NE(pid, -1);
>+
>+	if (pid != 0) {
>+		wait(NULL);
>+		return;
>+	}
>+
>+	/* Child process below: */
>+
>+	/* Reopen for child. */
>+	ASSERT_EQ(close_procmap(&self->procmap), 0);
>+	ASSERT_EQ(open_self_procmap(&self->procmap), 0);
>+
>+	/* unCOWing everything does not cause the AVC to go away. */
           ^^^

Before ptr[i] = 'x', we have unCOWed pages in vma. What we are doing here is
COWing, right?

>+	for (i = 0; i < 5 * page_size; i += page_size)
>+		ptr[i] = 'x';
>+
>+	/*
>+	 * Map in adjacent VMA in child.
>+	 *
>+	 *     forked
>+	 * |-----------|-----------|
>+	 * |  faulted  | unfaulted |
>+	 * |-----------|-----------|
>+	 *      ptr         ptr2
>+	 */
>+	ptr2 = mmap(&ptr[5 * page_size], 5 * page_size, PROT_READ | PROT_WRITE,
>+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
>+	ASSERT_NE(ptr2, MAP_FAILED);
>+
>+	/* Make sure not merged. */
>+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
>+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
>+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 5 * page_size);
>+}
>+
>+TEST_F(merge, forked_source_vma)
>+{
>+	unsigned int page_size = self->page_size;
>+	char *carveout = self->carveout;
>+	struct procmap_fd *procmap = &self->procmap;
>+	pid_t pid;
>+	char *ptr, *ptr2;
>+	int i;
>+
>+	/*
>+	 * |............|-----------|
>+	 * | <unmapped> | unfaulted |
>+	 * |............|-----------|

I am not sure "unmapped" is correct here. The range has already been mapped by
FIXTURE_SETUP(merge).

>+	 */
>+	ptr = mmap(&carveout[page_size], 5 * page_size, PROT_READ | PROT_WRITE,
>+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
>+	ASSERT_NE(ptr, MAP_FAILED);
>+
>+	/*
>+	 * Fault in process.
>+	 *
>+	 * |............||-----------|
>+	 * | <unmapped> ||  faulted  |
>+	 * |............||-----------|
                         ^

Extra line here?

>+	 */
>+	ptr[0] = 'x';
>+
>+	pid = fork();
>+	ASSERT_NE(pid, -1);
>+
>+	if (pid != 0) {
>+		wait(NULL);
>+		return;
>+	}
>+
>+	/* Child process below: */
>+
>+	/* Reopen for child. */
>+	ASSERT_EQ(close_procmap(&self->procmap), 0);
>+	ASSERT_EQ(open_self_procmap(&self->procmap), 0);
>+
>+	/* unCOWing everything does not cause the AVC to go away. */

Same as above.

>+	for (i = 0; i < 5 * page_size; i += page_size)
>+		ptr[i] = 'x';
>+
>+	/*
>+	 * Map in adjacent VMA in child, ptr2 before ptr, but incompatible.
>+	 *
>+	 *      RWX      forked RW
>+	 * |-----------|-----------|
>+	 * | unfaulted |  faulted  |
>+	 * |-----------|-----------|
>+	 *      ptr2        ptr
>+	 */
>+	ptr2 = mmap(&carveout[6 * page_size], 5 * page_size, PROT_READ | PROT_WRITE | PROT_EXEC,
>+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
>+	ASSERT_NE(ptr2, MAP_FAILED);
>+

I think ptr2 is after ptr. Am I missing something?

>+
>+	/* Make sure not merged. */
>+	ASSERT_TRUE(find_vma_procmap(procmap, ptr2));
>+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr2);
>+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr2 + 5 * page_size);
>+
>+	/*
>+	 * Now mprotect forked region to RWX so it becomes the source for the
>+	 * merge to unfaulted region:
>+	 *
>+	 *      RWX      forked RWX
>+	 * |-----------|-----------|
>+	 * | unfaulted |  faulted  |
>+	 * |-----------|-----------|
>+	 *      ptr2        ptr
>+	 */
>+	ASSERT_EQ(mprotect(ptr, 5 * page_size, PROT_READ | PROT_WRITE | PROT_EXEC), 0);
>+	/* Again, make sure not merged. */
>+	ASSERT_TRUE(find_vma_procmap(procmap, ptr2));
>+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr2);
>+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr2 + 5 * page_size);
>+}
>+
>+TEST_HARNESS_MAIN
>diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
>index 9aff33b10999..0b2f8bb91433 100755
>--- a/tools/testing/selftests/mm/run_vmtests.sh
>+++ b/tools/testing/selftests/mm/run_vmtests.sh
>@@ -421,6 +421,8 @@ CATEGORY="madv_guard" run_test ./guard-regions
> # MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
> CATEGORY="madv_populate" run_test ./madv_populate
> 
>+CATEGORY="vma_merge" run_test ./merge
>+

./run_vmtests.sh -h would show a list of categories. 

How about adding the new category "vma_merge" into the list?
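
Something along these lines (exact wording aside):

	- vma_merge
		test merging of anonymous VMAs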

> if [ -x ./memfd_secret ]
> then
> (echo 0 > /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
>-- 
>2.48.1
>

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] mm/vma: fix incorrectly disallowed anonymous VMA merges
  2025-04-04 23:32       ` Wei Yang
@ 2025-04-07 10:24         ` Lorenzo Stoakes
  2025-04-07 16:44           ` Lorenzo Stoakes
  0 siblings, 1 reply; 15+ messages in thread
From: Lorenzo Stoakes @ 2025-04-07 10:24 UTC (permalink / raw)
  To: Wei Yang
  Cc: Andrew Morton, Vlastimil Babka, Jann Horn, Liam R . Howlett,
	Suren Baghdasaryan, David Hildenbrand, Matthew Wilcox,
	Rik van Riel, linux-mm, linux-kernel

On Fri, Apr 04, 2025 at 11:32:31PM +0000, Wei Yang wrote:
> On Fri, Apr 04, 2025 at 02:04:10PM +0100, Lorenzo Stoakes wrote:
> >On Fri, Apr 04, 2025 at 12:53:15PM +0000, Wei Yang wrote:
> >> On Mon, Mar 17, 2025 at 09:15:03PM +0000, Lorenzo Stoakes wrote:
> >> [...]
> >> >However, we have a problem here - typically the vma passed here is the
> >> >destination VMA.
> >> >
> >> >For instance in vma_merge_existing_range() we invoke:
> >> >
> >> >can_vma_merge_left()
> >> >-> [ check that there is an immediately adjacent prior VMA ]
> >> >-> can_vma_merge_after()
> >> >  -> is_mergeable_vma() for general attribute check
> >> >-> is_mergeable_anon_vma([ proposed anon_vma ], prev->anon_vma, prev)
> >> >
> >> >So if we were considering a target unfaulted 'prev':
> >> >
> >> >	  unfaulted    faulted
> >> >	|-----------|-----------|
> >> >	|    prev   |    vma    |
> >> >	|-----------|-----------|
> >> >
> >> >This would call is_mergeable_anon_vma(NULL, vma->anon_vma, prev).
> >> >
> >> >The list_is_singular() check for vma->anon_vma_chain, an empty list on
> >> >fault, would cause this merge to _fail_ even though all else indicates a
> >> >merge.
> >> >
> >>
> >> Great spot. It is hiding there for 15 years.
> >
> >Thanks!
> >
> >>
> >> >Equally a simple merge into a next VMA would hit the same problem:
> >> >
> >> >	   faulted    unfaulted
> >> >	|-----------|-----------|
> >> >	|    vma    |    next   |
> >> >	|-----------|-----------|
> >> >
> >> [...]
> >> >---
> >> > mm/vma.c                |  81 +++++++++++++++++++++++---------
> >> > tools/testing/vma/vma.c | 100 +++++++++++++++++++++-------------------
> >> > 2 files changed, 111 insertions(+), 70 deletions(-)
> >> >
> >> >diff --git a/mm/vma.c b/mm/vma.c
> >> >index 5cdc5612bfc1..5418eef3a852 100644
> >> >--- a/mm/vma.c
> >> >+++ b/mm/vma.c
> >> >@@ -57,6 +57,22 @@ struct mmap_state {
> >> > 		.state = VMA_MERGE_START,				\
> >> > 	}
> >> >
> >> >+/*
> >> >+ * If, at any point, the VMA had unCoW'd mappings from parents, it will maintain
> >> >+ * more than one anon_vma_chain connecting it to more than one anon_vma. A merge
> >> >+ * would mean a wider range of folios sharing the root anon_vma lock, and thus
> >> >+ * potential lock contention, we do not wish to encourage merging such that this
> >> >+ * scales to a problem.
> >> >+ */
> >>
> >> I don't follow here. Take a look into do_wp_page(), where CoW happens. But I
> >> don't find where it will unlink parent anon_vma from vma->anon_vma_chain.
> >
> >Look at anon_vma_clone() in fork case. It's not the point of CoW that's the
> >issue, it's propagation of AVC's upon fork.
> >
> >>
> >> Per my understanding, the unlink behavior happens in unlink_anon_vma() which
> >> unlink all anon_vma on vma->anon_vma_chain. And the normal caller of
> >> unlink_anon_vma() is free_pgtables(). Other callers are on error path to
> >> release prepared data. From this perspective, I don't see the chance to unlink
> >> parent anon_vma from vma->anon_vma_chain either.
> >>
> >> But maybe I missed something. If it is not too bother, would you mind giving
> >> me a hint?
> >
> >What you're saying is 'we never go back and fix this up once unCoW'd' which is
> >true, but I don't want to write several page-length essays in comments, and this
> >is a sensible way of looking at things for the purposes of this check.
> >
> >In future, we may want to actually do something like this, if it makes sense.
> >
>
> Ok, this is the future plan instead of current behavior.

No, it also describes the current situation - there just isn't the later
fix-up that you've decided 'should' happen, or that you feel this naming
implies.

I find this to be rather pedantic, honestly. If you're trying to base your
understanding of an incredibly subtle and difficult part of mm on function
names, then you are making a mistake.

The name describes reality closely enough.

>
> My personal feeling is it would misleading to readers. I would think if all
> pages mapping in VMA is Cow'd, the vma->anon_vma_chain becomes singular in
> current kernel.

Yup, you've said this already... :)

>
> A page-length comment is not we want, how about "maybe_root_anon_vma"? When
> vma->anon_vma_chain is empty or singular, it means the (future) vma->anon_vma
> is the root anon_vma.

No, that's a horrible name and your explanation is extremely
confusing. This is already a very confusing area of the kernel (one I am
really trying to improve upon, FWIW), and I don't think this change would
serve to improve things here.

Since you seem dead-set on this issue, I think the best solution here is
just for me to update the comment a little. I'll send a fix-patch.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] tools/testing/selftests: assert that anon merge cases behave as expected
  2025-04-07  2:54   ` Wei Yang
@ 2025-04-07 11:02     ` Lorenzo Stoakes
  2025-04-07 12:09       ` Wei Yang
  0 siblings, 1 reply; 15+ messages in thread
From: Lorenzo Stoakes @ 2025-04-07 11:02 UTC (permalink / raw)
  To: Wei Yang
  Cc: Andrew Morton, Vlastimil Babka, Jann Horn, Liam R . Howlett,
	Suren Baghdasaryan, David Hildenbrand, Matthew Wilcox,
	Rik van Riel, linux-mm, linux-kernel

I know you mean well Wei,

But drive-by extremely pedantic review on minor details isn't really
useful. I can't tell you not to do this, but I can at least ask. I don't
think this is a great use of either of our time.

Thanks.

On Mon, Apr 07, 2025 at 02:54:56AM +0000, Wei Yang wrote:
> On Mon, Mar 17, 2025 at 09:15:05PM +0000, Lorenzo Stoakes wrote:
> >Prior to the recently applied commit that permits this merge,
> >mprotect()'ing a faulted VMA, adjacent to an unfaulted VMA, such that the
> >two share characteristics would fail to merge due to what appear to be
> >unintended consequences of commit 965f55dea0e3 ("mmap: avoid merging cloned
> >VMAs").
> >
> >Now we have fixed this bug, assert that we can indeed merge anonymous VMAs
> >this way.
> >
> >Also assert that forked source/target VMAs are equally
> >rejected. Previously, all empty target anon merges with one VMA faulted and
> >the other unfaulted would be rejected incorrectly, now we ensure that
> >unforked merge, but forked do not.
> >
> >Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> >---
> > tools/testing/selftests/mm/.gitignore     |   1 +
> > tools/testing/selftests/mm/Makefile       |   1 +
> > tools/testing/selftests/mm/merge.c        | 454 ++++++++++++++++++++++
> > tools/testing/selftests/mm/run_vmtests.sh |   2 +
> > 4 files changed, 458 insertions(+)
> > create mode 100644 tools/testing/selftests/mm/merge.c
> >
> >diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore
> >index c5241b193db8..91db34941a14 100644
> >--- a/tools/testing/selftests/mm/.gitignore
> >+++ b/tools/testing/selftests/mm/.gitignore
> >@@ -58,3 +58,4 @@ hugetlb_dio
> > pkey_sighandler_tests_32
> > pkey_sighandler_tests_64
> > guard-regions
> >+merge
> >diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
> >index 8270895039d1..ad4d6043a60f 100644
> >--- a/tools/testing/selftests/mm/Makefile
> >+++ b/tools/testing/selftests/mm/Makefile
> >@@ -98,6 +98,7 @@ TEST_GEN_FILES += hugetlb_madv_vs_map
> > TEST_GEN_FILES += hugetlb_dio
> > TEST_GEN_FILES += droppable
> > TEST_GEN_FILES += guard-regions
> >+TEST_GEN_FILES += merge
> >
> > ifneq ($(ARCH),arm64)
> > TEST_GEN_FILES += soft-dirty
> >diff --git a/tools/testing/selftests/mm/merge.c b/tools/testing/selftests/mm/merge.c
> >new file mode 100644
> >index 000000000000..9cc61bdbfba8
> >--- /dev/null
> >+++ b/tools/testing/selftests/mm/merge.c
> >@@ -0,0 +1,454 @@
> >+// SPDX-License-Identifier: GPL-2.0-or-later
> >+
> >+#define _GNU_SOURCE
> >+#include "../kselftest_harness.h"
> >+#include <stdio.h>
> >+#include <stdlib.h>
> >+#include <unistd.h>
> >+#include <sys/mman.h>
> >+#include <sys/wait.h>
> >+#include "vm_util.h"
> >+
> >+FIXTURE(merge)
> >+{
> >+	unsigned int page_size;
> >+	char *carveout;
> >+	struct procmap_fd procmap;
> >+};
> >+
> >+FIXTURE_SETUP(merge)
> >+{
> >+	self->page_size = psize();
> >+	/* Carve out PROT_NONE region to map over. */
> >+	self->carveout = mmap(NULL, 12 * self->page_size, PROT_NONE,
> >+			      MAP_ANON | MAP_PRIVATE, -1, 0);
> >+	ASSERT_NE(self->carveout, MAP_FAILED);
> >+	/* Setup PROCMAP_QUERY interface. */
> >+	ASSERT_EQ(open_self_procmap(&self->procmap), 0);
> >+}
> >+
> >+FIXTURE_TEARDOWN(merge)
> >+{
> >+	ASSERT_EQ(munmap(self->carveout, 12 * self->page_size), 0);
> >+	ASSERT_EQ(close_procmap(&self->procmap), 0);
> >+}
> >+
> >+TEST_F(merge, mprotect_unfaulted_left)
> >+{
> >+	unsigned int page_size = self->page_size;
> >+	char *carveout = self->carveout;
> >+	struct procmap_fd *procmap = &self->procmap;
> >+	char *ptr;
> >+
> >+	/*
> >+	 * Map 10 pages of R/W memory within. MAP_NORESERVE so we don't hit
> >+	 * merge failure due to lack of VM_ACCOUNT flag by mistake.
> >+	 *
> >+	 * |-----------------------|
> >+	 * |       unfaulted       |
> >+	 * |-----------------------|
> >+	 */
> >+	ptr = mmap(&carveout[page_size], 10 * page_size, PROT_READ | PROT_WRITE,
> >+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
> >+	ASSERT_NE(ptr, MAP_FAILED);
> >+	/*
> >+	 * Now make the first 5 pages read-only, splitting the VMA:
> >+	 *
> >+	 *      RO          RW
> >+	 * |-----------|-----------|
> >+	 * | unfaulted | unfaulted |
> >+	 * |-----------|-----------|
> >+	 */
> >+	ASSERT_EQ(mprotect(ptr, 5 * page_size, PROT_READ), 0);
> >+	/*
> >+	 * Fault in the first of the last 5 pages so it gets an anon_vma and
> >+	 * thus the whole VMA becomes 'faulted':
> >+	 *
> >+	 *      RO          RW
> >+	 * |-----------|-----------|
> >+	 * | unfaulted |  faulted  |
> >+	 * |-----------|-----------|
> >+	 */
> >+	ptr[5 * page_size] = 'x';
> >+	/*
> >+	 * Now mprotect() the RW region read-only, we should merge (though for
> >+	 * ~15 years we did not! :):
> >+	 *
> >+	 *             RO
> >+	 * |-----------------------|
> >+	 * |        faulted        |
> >+	 * |-----------------------|
> >+	 */
> >+	ASSERT_EQ(mprotect(&ptr[5 * page_size], 5 * page_size, PROT_READ), 0);
> >+
> >+	/* Assert that the merge succeeded using PROCMAP_QUERY. */
> >+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
> >+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
> >+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 10 * page_size);
> >+}
> >+
> >+TEST_F(merge, mprotect_unfaulted_right)
> >+{
> >+	unsigned int page_size = self->page_size;
> >+	char *carveout = self->carveout;
> >+	struct procmap_fd *procmap = &self->procmap;
> >+	char *ptr;
> >+
> >+	/*
> >+	 * |-----------------------|
> >+	 * |       unfaulted       |
> >+	 * |-----------------------|
> >+	 */
> >+	ptr = mmap(&carveout[page_size], 10 * page_size, PROT_READ | PROT_WRITE,
> >+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
> >+	ASSERT_NE(ptr, MAP_FAILED);
> >+	/*
> >+	 * Now make the last 5 pages read-only, splitting the VMA:
> >+	 *
> >+	 *      RW          RO
> >+	 * |-----------|-----------|
> >+	 * | unfaulted | unfaulted |
> >+	 * |-----------|-----------|
> >+	 */
> >+	ASSERT_EQ(mprotect(&ptr[5 * page_size], 5 * page_size, PROT_READ), 0);
> >+	/*
> >+	 * Fault in the first of the first 5 pages so it gets an anon_vma and
> >+	 * thus the whole VMA becomes 'faulted':
> >+	 *
> >+	 *      RW          RO
> >+	 * |-----------|-----------|
> >+	 * |  faulted  | unfaulted |
> >+	 * |-----------|-----------|
> >+	 */
> >+	ptr[0] = 'x';
> >+	/*
> >+	 * Now mprotect() the RW region read-only, we should merge:
> >+	 *
> >+	 *             RO
> >+	 * |-----------------------|
> >+	 * |        faulted        |
> >+	 * |-----------------------|
> >+	 */
> >+	ASSERT_EQ(mprotect(ptr, 5 * page_size, PROT_READ), 0);
> >+
> >+	/* Assert that the merge succeeded using PROCMAP_QUERY. */
> >+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
> >+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
> >+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 10 * page_size);
> >+}
> >+
> >+TEST_F(merge, mprotect_unfaulted_both)
> >+{
> >+	unsigned int page_size = self->page_size;
> >+	char *carveout = self->carveout;
> >+	struct procmap_fd *procmap = &self->procmap;
> >+	char *ptr;
> >+
> >+	/*
> >+	 * |-----------------------|
> >+	 * |       unfaulted       |
> >+	 * |-----------------------|
> >+	 */
> >+	ptr = mmap(&carveout[2 * page_size], 9 * page_size, PROT_READ | PROT_WRITE,
> >+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
> >+	ASSERT_NE(ptr, MAP_FAILED);
> >+	/*
> >+	 * Now make the first and last 3 pages read-only, splitting the VMA:
> >+	 *
> >+	 *      RO          RW          RO
> >+	 * |-----------|-----------|-----------|
> >+	 * | unfaulted | unfaulted | unfaulted |
> >+	 * |-----------|-----------|-----------|
> >+	 */
> >+	ASSERT_EQ(mprotect(ptr, 3 * page_size, PROT_READ), 0);
> >+	ASSERT_EQ(mprotect(&ptr[6 * page_size], 3 * page_size, PROT_READ), 0);
> >+	/*
> >+	 * Fault in the first of the middle 3 pages so it gets an anon_vma and
> >+	 * thus the whole VMA becomes 'faulted':
> >+	 *
> >+	 *      RO          RW          RO
> >+	 * |-----------|-----------|-----------|
> >+	 * | unfaulted |  faulted  | unfaulted |
> >+	 * |-----------|-----------|-----------|
> >+	 */
> >+	ptr[3 * page_size] = 'x';
> >+	/*
> >+	 * Now mprotect() the RW region read-only, we should merge:
> >+	 *
> >+	 *             RO
> >+	 * |-----------------------|
> >+	 * |        faulted        |
> >+	 * |-----------------------|
> >+	 */
> >+	ASSERT_EQ(mprotect(&ptr[3 * page_size], 3 * page_size, PROT_READ), 0);
> >+
> >+	/* Assert that the merge succeeded using PROCMAP_QUERY. */
> >+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
> >+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
> >+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 9 * page_size);
> >+}
> >+
> >+TEST_F(merge, mprotect_faulted_left_unfaulted_right)
> >+{
> >+	unsigned int page_size = self->page_size;
> >+	char *carveout = self->carveout;
> >+	struct procmap_fd *procmap = &self->procmap;
> >+	char *ptr;
> >+
> >+	/*
> >+	 * |-----------------------|
> >+	 * |       unfaulted       |
> >+	 * |-----------------------|
> >+	 */
> >+	ptr = mmap(&carveout[2 * page_size], 9 * page_size, PROT_READ | PROT_WRITE,
> >+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
> >+	ASSERT_NE(ptr, MAP_FAILED);
> >+	/*
> >+	 * Now make the last 3 pages read-only, splitting the VMA:
> >+	 *
> >+	 *             RW               RO
> >+	 * |-----------------------|-----------|
> >+	 * |       unfaulted       | unfaulted |
> >+	 * |-----------------------|-----------|
> >+	 */
> >+	ASSERT_EQ(mprotect(&ptr[6 * page_size], 3 * page_size, PROT_READ), 0);
> >+	/*
> >+	 * Fault in the first of the first 6 pages so it gets an anon_vma and
> >+	 * thus the whole VMA becomes 'faulted':
> >+	 *
> >+	 *             RW               RO
> >+	 * |-----------------------|-----------|
> >+	 * |       unfaulted       | unfaulted |
>                    ^^^
>
> According to your previous comment convention, the comment here describe the
> result after ptr[0] = 'x'.
>
> faulted is the correct description here?

My 'previous comment convention'? What? You mean my describing what
happens?

Yes, this is incorrect - this should read 'faulted' in the RW section. I will
fix it if I respin.
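
i.e. something like:

	 *             RW               RO
	 * |-----------------------|-----------|
	 * |        faulted        | unfaulted |
	 * |-----------------------|-----------|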

>
> >+	 * |-----------------------|-----------|
> >+	 */
> >+	ptr[0] = 'x';
> >+	/*
> >+	 * Now make the first 3 pages read-only, splitting the VMA:
> >+	 *
> >+	 *      RO          RW          RO
> >+	 * |-----------|-----------|-----------|
> >+	 * |  faulted  |  faulted  | unfaulted |
> >+	 * |-----------|-----------|-----------|
> >+	 */
> >+	ASSERT_EQ(mprotect(ptr, 3 * page_size, PROT_READ), 0);
> >+	/*
> >+	 * Now mprotect() the RW region read-only, we should merge:
> >+	 *
> >+	 *             RO
> >+	 * |-----------------------|
> >+	 * |        faulted        |
> >+	 * |-----------------------|
> >+	 */
> >+	ASSERT_EQ(mprotect(&ptr[3 * page_size], 3 * page_size, PROT_READ), 0);
> >+
> >+	/* Assert that the merge succeeded using PROCMAP_QUERY. */
> >+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
> >+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
> >+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 9 * page_size);
> >+}
> >+
> >+TEST_F(merge, mprotect_unfaulted_left_faulted_right)
> >+{
> >+	unsigned int page_size = self->page_size;
> >+	char *carveout = self->carveout;
> >+	struct procmap_fd *procmap = &self->procmap;
> >+	char *ptr;
> >+
> >+	/*
> >+	 * |-----------------------|
> >+	 * |       unfaulted       |
> >+	 * |-----------------------|
> >+	 */
> >+	ptr = mmap(&carveout[2 * page_size], 9 * page_size, PROT_READ | PROT_WRITE,
> >+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
> >+	ASSERT_NE(ptr, MAP_FAILED);
> >+	/*
> >+	 * Now make the first 3 pages read-only, splitting the VMA:
> >+	 *
> >+	 *      RO                RW
> >+	 * |-----------|-----------------------|
> >+	 * | unfaulted |       unfaulted       |
> >+	 * |-----------|-----------------------|
> >+	 */
> >+	ASSERT_EQ(mprotect(ptr, 3 * page_size, PROT_READ), 0);
> >+	/*
> >+	 * FAult in the first of the last 6 pages so it gets an anon_vma and
>             ^
>
> s/A/a/

Right, I'll fix that if I respin.

>
> >+	 * thus the whole VMA becomes 'faulted':
> >+	 *
> >+	 *      RO                RW
> >+	 * |-----------|-----------------------|
> >+	 * | unfaulted |        faulted        |
> >+	 * |-----------|-----------------------|
> >+	 */
> >+	ptr[3 * page_size] = 'x';
> >+	/*
> >+	 * Now make the last 3 pages read-only, splitting the VMA:
> >+	 *
> >+	 *      RO          RW          RO
> >+	 * |-----------|-----------|-----------|
> >+	 * | unfaulted |  faulted  |  faulted  |
> >+	 * |-----------|-----------|-----------|
> >+	 */
> >+	ASSERT_EQ(mprotect(&ptr[6 * page_size], 3 * page_size, PROT_READ), 0);
> >+	/*
> >+	 * Now mprotect() the RW region read-only, we should merge:
> >+	 *
> >+	 *             RO
> >+	 * |-----------------------|
> >+	 * |        faulted        |
> >+	 * |-----------------------|
> >+	 */
> >+	ASSERT_EQ(mprotect(&ptr[3 * page_size], 3 * page_size, PROT_READ), 0);
> >+
> >+	/* Assert that the merge succeeded using PROCMAP_QUERY. */
> >+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
> >+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
> >+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 9 * page_size);
> >+}
> >+
> >+TEST_F(merge, forked_target_vma)
> >+{
> >+	unsigned int page_size = self->page_size;
> >+	char *carveout = self->carveout;
> >+	struct procmap_fd *procmap = &self->procmap;
> >+	pid_t pid;
> >+	char *ptr, *ptr2;
> >+	int i;
> >+
> >+	/*
> >+	 * |-----------|
> >+	 * | unfaulted |
> >+	 * |-----------|
> >+	 */
> >+	ptr = mmap(&carveout[page_size], 5 * page_size, PROT_READ | PROT_WRITE,
> >+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
> >+	ASSERT_NE(ptr, MAP_FAILED);
> >+
> >+	/*
> >+	 * Fault in process.
> >+	 *
> >+	 * |-----------|
> >+	 * |  faulted  |
> >+	 * |-----------|
> >+	 */
> >+	ptr[0] = 'x';
> >+
> >+	pid = fork();
> >+	ASSERT_NE(pid, -1);
> >+
> >+	if (pid != 0) {
> >+		wait(NULL);
> >+		return;
> >+	}
> >+
> >+	/* Child process below: */
> >+
> >+	/* Reopen for child. */
> >+	ASSERT_EQ(close_procmap(&self->procmap), 0);
> >+	ASSERT_EQ(open_self_procmap(&self->procmap), 0);
> >+
> >+	/* unCOWing everything does not cause the AVC to go away. */
>            ^^^
>
> Before ptr[i] = 'x', we have unCOWed pages in vma. What we are doing here is
> COWing, right?

Nope, it's the other way round, as commented. A 'CoW' page is one marked
for copy-on-write, right? We now make it just a normal mapping by writing to
it.
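
To put the same thing in standalone terms (a minimal sketch, not the test
itself):

	#define _GNU_SOURCE
	#include <stdio.h>
	#include <sys/mman.h>
	#include <sys/wait.h>
	#include <unistd.h>

	int main(void)
	{
		size_t psz = getpagesize();
		char *p = mmap(NULL, psz, PROT_READ | PROT_WRITE,
			       MAP_ANON | MAP_PRIVATE, -1, 0);

		p[0] = 'x';         /* fault in parent - anon page established */
		if (fork() == 0) {
			/* here the page is still CoW-shared with the parent */
			p[0] = 'y'; /* the write 'unCOWs' it - child gets its own copy */
			_exit(0);
		}
		wait(NULL);
		printf("parent still sees '%c'\n", p[0]);   /* 'x' */
		return 0;
	}

The loop in the test is just doing that child-side write for every page.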

>
> >+	for (i = 0; i < 5 * page_size; i += page_size)
> >+		ptr[i] = 'x';
> >+
> >+	/*
> >+	 * Map in adjacent VMA in child.
> >+	 *
> >+	 *     forked
> >+	 * |-----------|-----------|
> >+	 * |  faulted  | unfaulted |
> >+	 * |-----------|-----------|
> >+	 *      ptr         ptr2
> >+	 */
> >+	ptr2 = mmap(&ptr[5 * page_size], 5 * page_size, PROT_READ | PROT_WRITE,
> >+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
> >+	ASSERT_NE(ptr2, MAP_FAILED);
> >+
> >+	/* Make sure not merged. */
> >+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
> >+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
> >+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 5 * page_size);
> >+}
> >+
> >+TEST_F(merge, forked_source_vma)
> >+{
> >+	unsigned int page_size = self->page_size;
> >+	char *carveout = self->carveout;
> >+	struct procmap_fd *procmap = &self->procmap;
> >+	pid_t pid;
> >+	char *ptr, *ptr2;
> >+	int i;
> >+
> >+	/*
> >+	 * |............|-----------|
> >+	 * | <unmapped> | unfaulted |
> >+	 * |............|-----------|
>
> I am not sure "unmapped" is correct here. The range has already been mapped by
> FIXTURE_SETUP(merge).

This is pointless and actually misleading pedantry.

For the purposes of what we are doing here, this is unmapped. Do you truly
think mentioning a PROT_NONE mapping here would be useful, meaningful, or
add anything but noise?

>
> >+	 */
> >+	ptr = mmap(&carveout[page_size], 5 * page_size, PROT_READ | PROT_WRITE,
> >+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
> >+	ASSERT_NE(ptr, MAP_FAILED);
> >+
> >+	/*
> >+	 * Fault in process.
> >+	 *
> >+	 * |............||-----------|
> >+	 * | <unmapped> ||  faulted  |
> >+	 * |............||-----------|
>                          ^
>
> Extra line here?

Eh? I don't understand what you mean... you mean an extra '-'? This is to
fit both unfaulted/faulted in the same size ASCII 'VMA', a convention I've
kept (hopefully) consistently...

>
> >+	 */
> >+	ptr[0] = 'x';
> >+
> >+	pid = fork();
> >+	ASSERT_NE(pid, -1);
> >+
> >+	if (pid != 0) {
> >+		wait(NULL);
> >+		return;
> >+	}
> >+
> >+	/* Child process below: */
> >+
> >+	/* Reopen for child. */
> >+	ASSERT_EQ(close_procmap(&self->procmap), 0);
> >+	ASSERT_EQ(open_self_procmap(&self->procmap), 0);
> >+
> >+	/* unCOWing everything does not cause the AVC to go away. */
>
> Same as above.

And you're equally wrong here.

I appreciate it's confusing and perhaps a poor way of expressing it, but
I'm not sure there's a gloriously wonderful means of doing so that will
bring clarity to everybody, as is the nature of mm. I do my best.

>
> >+	for (i = 0; i < 5 * page_size; i += page_size)
> >+		ptr[i] = 'x';
> >+
> >+	/*
> >+	 * Map in adjacent VMA in child, ptr2 before ptr, but incompatible.
> >+	 *
> >+	 *      RWX      forked RW
> >+	 * |-----------|-----------|
> >+	 * | unfaulted |  faulted  |
> >+	 * |-----------|-----------|
> >+	 *      ptr2        ptr
> >+	 */
> >+	ptr2 = mmap(&carveout[6 * page_size], 5 * page_size, PROT_READ | PROT_WRITE | PROT_EXEC,
> >+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
> >+	ASSERT_NE(ptr2, MAP_FAILED);
> >+
>
> I think pt2 is after ptr. Do I miss something?

Yup, didn't update the comments but clearly updated the method. I'll update
if there's a respin.
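
i.e. the comments should read something like:

	 *   forked RW       RWX
	 * |-----------|-----------|
	 * |  faulted  | unfaulted |
	 * |-----------|-----------|
	 *      ptr         ptr2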

>
> >+
> >+	/* Make sure not merged. */
> >+	ASSERT_TRUE(find_vma_procmap(procmap, ptr2));
> >+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr2);
> >+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr2 + 5 * page_size);
> >+
> >+	/*
> >+	 * Now mprotect forked region to RWX so it becomes the source for the
> >+	 * merge to unfaulted region:
> >+	 *
> >+	 *      RWX      forked RWX
> >+	 * |-----------|-----------|
> >+	 * | unfaulted |  faulted  |
> >+	 * |-----------|-----------|
> >+	 *      ptr2        ptr
> >+	 */
> >+	ASSERT_EQ(mprotect(ptr, 5 * page_size, PROT_READ | PROT_WRITE | PROT_EXEC), 0);
> >+	/* Again, make sure not merged. */
> >+	ASSERT_TRUE(find_vma_procmap(procmap, ptr2));
> >+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr2);
> >+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr2 + 5 * page_size);
> >+}
> >+
> >+TEST_HARNESS_MAIN
> >diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
> >index 9aff33b10999..0b2f8bb91433 100755
> >--- a/tools/testing/selftests/mm/run_vmtests.sh
> >+++ b/tools/testing/selftests/mm/run_vmtests.sh
> >@@ -421,6 +421,8 @@ CATEGORY="madv_guard" run_test ./guard-regions
> > # MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
> > CATEGORY="madv_populate" run_test ./madv_populate
> >
> >+CATEGORY="vma_merge" run_test ./merge
> >+
>
> ./run_vmtests.sh -h would show a list of categories.
>
> How about add the new category "vma_merge" into the list.

Probably unintended but this sounds pretty rude, especially when you are
performing unrequested, rather pedantic, 'drive-by' review like this.

A polite way of putting it would be 'perhaps additionally update the usage
to add vma_merge to the list?'.

Again, this is not very important; sure, if I respin I'll do it.

>
> > if [ -x ./memfd_secret ]
> > then
> > (echo 0 > /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
> >--
> >2.48.1
> >
>
> --
> Wei Yang
> Help you, Help me


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] tools/testing/selftests: assert that anon merge cases behave as expected
  2025-04-07 11:02     ` Lorenzo Stoakes
@ 2025-04-07 12:09       ` Wei Yang
  0 siblings, 0 replies; 15+ messages in thread
From: Wei Yang @ 2025-04-07 12:09 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Wei Yang, Andrew Morton, Vlastimil Babka, Jann Horn,
	Liam R . Howlett, Suren Baghdasaryan, David Hildenbrand,
	Matthew Wilcox, Rik van Riel, linux-mm, linux-kernel

On Mon, Apr 07, 2025 at 12:02:00PM +0100, Lorenzo Stoakes wrote:
>I know you mean well Wei,
>
>But drive-by extremely pedantic review on minor details isn't really
>useful. I can't tell you not to do this, but I can at least ask. I don't
>think this is a great use of either of our time.
>
>Thanks.
>
[...]
>> >+
>> >+	/* unCOWing everything does not cause the AVC to go away. */
>>            ^^^
>>
>> Before ptr[i] = 'x', we have unCOWed pages in vma. What we are doing here is
>> COWing, right?
>
>Nope, it's the other way round, as commented. A 'CoW' page is one marked
>for copy-on-write right? we now make it just a normal mapping by writing to
>it.
>

Oh, I misunderstood the meaning of a 'CoW' page. It is the page before the
copy; I thought it was the page after. Sorry for bothering you.
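
Right - so the picture is something like this (a standalone illustration,
not from the patch): after fork() the anonymous pages are shared CoW; when
the child writes, it takes a CoW fault and gets its own copy, i.e. the page
is unCoW'd in the child, while the parent's page is untouched.

#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_ANON | MAP_PRIVATE, -1, 0);

        if (p == MAP_FAILED)
                return 1;

        p[0] = 'p';             /* fault the page in the parent */

        if (fork() == 0) {
                p[0] = 'c';     /* CoW fault: child now has its own copy */
                _exit(0);
        }

        wait(NULL);
        printf("parent still sees '%c'\n", p[0]);       /* prints 'p' */
        return 0;
}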

>>
>> >+	for (i = 0; i < 5 * page_size; i += page_size)
>> >+		ptr[i] = 'x';
>> >+
>> >+	/*
>> >+	 * Map in adjacent VMA in child.
>> >+	 *
>> >+	 *     forked
>> >+	 * |-----------|-----------|
>> >+	 * |  faulted  | unfaulted |
>> >+	 * |-----------|-----------|
>> >+	 *      ptr         ptr2
>> >+	 */
>> >+	ptr2 = mmap(&ptr[5 * page_size], 5 * page_size, PROT_READ | PROT_WRITE,
>> >+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
>> >+	ASSERT_NE(ptr2, MAP_FAILED);
>> >+
>> >+	/* Make sure not merged. */
>> >+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
>> >+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
>> >+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 5 * page_size);
>> >+}
>> >+
>> >+TEST_F(merge, forked_source_vma)
>> >+{
>> >+	unsigned int page_size = self->page_size;
>> >+	char *carveout = self->carveout;
>> >+	struct procmap_fd *procmap = &self->procmap;
>> >+	pid_t pid;
>> >+	char *ptr, *ptr2;
>> >+	int i;
>> >+
>> >+	/*
>> >+	 * |............|-----------|
>> >+	 * | <unmapped> | unfaulted |
>> >+	 * |............|-----------|
>>
>> I am not sure "unmapped" is correct here. The range has already been mapped by
>> FIXTURE_SETUP(merge).
>
>This is pointless and actually misleading pedantry.
>
>For the purposes of what we are doing here, this is unmapped. Do you truly
>think mentioning a PROT_NONE mapping here would be useful, meaningful, or
>add anything but noise?
>
>>
>> >+	 */
>> >+	ptr = mmap(&carveout[page_size], 5 * page_size, PROT_READ | PROT_WRITE,
>> >+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
>> >+	ASSERT_NE(ptr, MAP_FAILED);
>> >+
>> >+	/*
>> >+	 * Fault in process.
>> >+	 *
>> >+	 * |............||-----------|
>> >+	 * | <unmapped> ||  faulted  |
>> >+	 * |............||-----------|
>>                          ^
>>
>> Extra line here?
>
>Eh? I don't understand what you mean... you mean an extra '-'? This is to
>fit both unfaulted/faulted in the same size ASCII 'VMA', a convention I've
>kept (hopefully) consistently...
>

Sounds like the character formatting is corrupted.

The extra line I meant is the "||" between the unmapped and faulted areas.
Well, it is trivial, just forget it.

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] mm/vma: fix incorrectly disallowed anonymous VMA merges
  2025-04-07 10:24         ` Lorenzo Stoakes
@ 2025-04-07 16:44           ` Lorenzo Stoakes
  0 siblings, 0 replies; 15+ messages in thread
From: Lorenzo Stoakes @ 2025-04-07 16:44 UTC (permalink / raw)
  To: Wei Yang
  Cc: Andrew Morton, Vlastimil Babka, Jann Horn, Liam R . Howlett,
	Suren Baghdasaryan, David Hildenbrand, Matthew Wilcox,
	Rik van Riel, linux-mm, linux-kernel

On Mon, Apr 07, 2025 at 11:24:19AM +0100, Lorenzo Stoakes wrote:
> Since you seem dead-set on this issue, I think the best solution here is
> just for me to update the comment a little. I'll send a fix-patch.

Actually, re-reading the comment I added:

 * If, at any point, the VMA had unCoW'd mappings from parents, it will maintain
 * more than one anon_vma_chain connecting it to more than one anon_vma. A merge
 * would mean a wider range of folios sharing the root anon_vma lock, and thus
 * potential lock contention; we do not wish to encourage merging such that this
 * scales to a problem.

'If _at any point_' - we already clarify the situation.

So I don't think any change is required here, sorry.
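
For anybody following along, the situation the comment describes boils down
to a check of roughly this shape (kernel-side sketch only; the helper name
and exact placement are illustrative, not necessarily what mm/vma.c does):

static bool vma_has_forked_anon_vmas(struct vm_area_struct *vma)
{
        /*
         * A faulted VMA always has at least one anon_vma_chain entry; more
         * than one means anon_vmas were cloned from a parent at fork time,
         * so merging it would widen contention on the root anon_vma lock.
         */
        return vma && vma->anon_vma && !list_is_singular(&vma->anon_vma_chain);
}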


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-04-07 16:44 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-17 21:15 [PATCH 0/3] fix incorrectly disallowed anonymous VMA merges Lorenzo Stoakes
2025-03-17 21:15 ` [PATCH 1/3] mm/vma: " Lorenzo Stoakes
2025-04-04 12:53   ` Wei Yang
2025-04-04 13:04     ` Lorenzo Stoakes
2025-04-04 23:32       ` Wei Yang
2025-04-07 10:24         ` Lorenzo Stoakes
2025-04-07 16:44           ` Lorenzo Stoakes
2025-03-17 21:15 ` [PATCH 2/3] tools/testing: add PROCMAP_QUERY helper functions in mm self tests Lorenzo Stoakes
2025-03-18 15:18   ` Lorenzo Stoakes
2025-04-07  2:42   ` Wei Yang
2025-03-17 21:15 ` [PATCH 3/3] tools/testing/selftests: assert that anon merge cases behave as expected Lorenzo Stoakes
2025-04-07  2:54   ` Wei Yang
2025-04-07 11:02     ` Lorenzo Stoakes
2025-04-07 12:09       ` Wei Yang
2025-03-20 13:33 ` [PATCH 0/3] fix incorrectly disallowed anonymous VMA merges Yeoreum Yun

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).