linux-mm.kvack.org archive mirror
* [RFC Patch 0/5] Make anon_vma operations testable
@ 2025-04-29  9:06 Wei Yang
  2025-04-29  9:06 ` [RFC Patch 1/5] mm: move anon_vma manipulation functions to own file Wei Yang
                   ` (5 more replies)
  0 siblings, 6 replies; 29+ messages in thread
From: Wei Yang @ 2025-04-29  9:06 UTC (permalink / raw)
  To: akpm
  Cc: david, lorenzo.stoakes, riel, vbabka, harry.yoo, jannh, baohua,
	linux-mm, Wei Yang

There are several anon_vma manipulation functions implemented in mm/rmap.c,
namely those concerning anon_vma preparation, cloning and forking, which
could logically stand alone.

This patch series isolates anon_vma manipulation functionality into its own
file, mm/anon_vma.c, and provides an API to the rest of the kernel in
include/linux/anon_vma.h.

It also introduces mm/anon_vma_internal.h, which specifies the headers that
anon_vma.c needs to import, giving us the very useful property that
anon_vma.c depends only on include/linux/anon_vma.h and
mm/anon_vma_internal.h.

This means we can then re-implement anon_vma_internal.h in userland, adding
shims for kernel mechanisms as required, allowing us to unit test internal
anon_vma functionality.

This patch series takes advantage of existing shim logic and full userland
interval tree support contained in tools/testing/rbtree/ and
tools/include/linux/.

Kernel functionality is stubbed and shimmed as needed in
tools/testing/anon_vma/, which contains a fully functional userland
anon_vma_internal.h file and imports mm/anon_vma.c so it can be tested
directly from userland.

Patch 1 splits the anon_vma related logic out into mm/anon_vma.c
Patch 2 adds a simple testing skeleton covering simple fault and fork
Patches 3/4 add tests for mergeable and reusable anon_vmas
Patch 5 asserts that the anon_vma double-reuse is fixed

Wei Yang (5):
  mm: move anon_vma manipulation functions to own file
  anon_vma: add skeleton code for userland testing of anon_vma logic
  anon_vma: add test for mergeable anon_vma
  anon_vma: add test for reusable anon_vma
  anon_vma: add test to assert no double-reuse

 MAINTAINERS                                |   3 +
 include/linux/anon_vma.h                   | 163 +++++
 include/linux/rmap.h                       | 147 +---
 mm/Makefile                                |   2 +-
 mm/anon_vma.c                              | 396 +++++++++++
 mm/anon_vma_internal.h                     |  14 +
 mm/rmap.c                                  | 391 -----------
 tools/include/linux/rwsem.h                |  10 +
 tools/include/linux/slab.h                 |   4 +
 tools/testing/anon_vma/.gitignore          |   3 +
 tools/testing/anon_vma/Makefile            |  25 +
 tools/testing/anon_vma/anon_vma.c          | 773 +++++++++++++++++++++
 tools/testing/anon_vma/anon_vma_internal.h |  88 +++
 tools/testing/anon_vma/interval_tree.c     |  53 ++
 tools/testing/anon_vma/linux/atomic.h      |  18 +
 tools/testing/anon_vma/linux/fs.h          |   6 +
 tools/testing/anon_vma/linux/mm.h          |  44 ++
 tools/testing/anon_vma/linux/mm_types.h    |  57 ++
 tools/testing/anon_vma/linux/mmzone.h      |   6 +
 tools/testing/anon_vma/linux/rmap.h        |   8 +
 tools/testing/shared/linux/anon_vma.h      |   7 +
 21 files changed, 1680 insertions(+), 538 deletions(-)
 create mode 100644 include/linux/anon_vma.h
 create mode 100644 mm/anon_vma.c
 create mode 100644 mm/anon_vma_internal.h
 create mode 100644 tools/testing/anon_vma/.gitignore
 create mode 100644 tools/testing/anon_vma/Makefile
 create mode 100644 tools/testing/anon_vma/anon_vma.c
 create mode 100644 tools/testing/anon_vma/anon_vma_internal.h
 create mode 100644 tools/testing/anon_vma/interval_tree.c
 create mode 100644 tools/testing/anon_vma/linux/atomic.h
 create mode 100644 tools/testing/anon_vma/linux/fs.h
 create mode 100644 tools/testing/anon_vma/linux/mm.h
 create mode 100644 tools/testing/anon_vma/linux/mm_types.h
 create mode 100644 tools/testing/anon_vma/linux/mmzone.h
 create mode 100644 tools/testing/anon_vma/linux/rmap.h
 create mode 100644 tools/testing/shared/linux/anon_vma.h

-- 
2.34.1




* [RFC Patch 1/5] mm: move anon_vma manipulation functions to own file
  2025-04-29  9:06 [RFC Patch 0/5] Make anon_vma operations testable Wei Yang
@ 2025-04-29  9:06 ` Wei Yang
  2025-04-29  9:06 ` [RFC Patch 2/5] anon_vma: add skeleton code for userland testing of anon_vma logic Wei Yang
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 29+ messages in thread
From: Wei Yang @ 2025-04-29  9:06 UTC (permalink / raw)
  To: akpm
  Cc: david, lorenzo.stoakes, riel, vbabka, harry.yoo, jannh, baohua,
	linux-mm, Wei Yang

This patch introduces anon_vma.c and moves the anon_vma manipulation
functions from rmap.c into this file.

This allows us to create userland testing code and verify the
functionality before making further changes.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Harry Yoo <harry.yoo@oracle.com>
---
 MAINTAINERS              |   3 +
 include/linux/anon_vma.h | 163 ++++++++++++++++
 include/linux/rmap.h     | 147 +--------------
 mm/Makefile              |   2 +-
 mm/anon_vma.c            | 396 +++++++++++++++++++++++++++++++++++++++
 mm/anon_vma_internal.h   |  14 ++
 mm/rmap.c                | 391 --------------------------------------
 7 files changed, 578 insertions(+), 538 deletions(-)
 create mode 100644 include/linux/anon_vma.h
 create mode 100644 mm/anon_vma.c
 create mode 100644 mm/anon_vma_internal.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 395cfe3c757d..2b4edb27307f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15575,7 +15575,10 @@ R:	Harry Yoo <harry.yoo@oracle.com>
 L:	linux-mm@kvack.org
 S:	Maintained
 F:	include/linux/rmap.h
+F:	include/linux/anon_vma.h
 F:	mm/rmap.c
+F:	mm/anon_vma.c
+F:	mm/anon_vma_internal.h
 
 MEMORY MANAGEMENT - SECRETMEM
 M:	Andrew Morton <akpm@linux-foundation.org>
diff --git a/include/linux/anon_vma.h b/include/linux/anon_vma.h
new file mode 100644
index 000000000000..c2f190c786a9
--- /dev/null
+++ b/include/linux/anon_vma.h
@@ -0,0 +1,163 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * anon_vma.h
+ */
+#ifndef __ANON_VMA_H
+#define __ANON_VMA_H
+
+#include <linux/mm_types.h>
+
+/*
+ * The anon_vma heads a list of private "related" vmas, to scan if
+ * an anonymous page pointing to this anon_vma needs to be unmapped:
+ * the vmas on the list will be related by forking, or by splitting.
+ *
+ * Since vmas come and go as they are split and merged (particularly
+ * in mprotect), the mapping field of an anonymous page cannot point
+ * directly to a vma: instead it points to an anon_vma, on whose list
+ * the related vmas can be easily linked or unlinked.
+ *
+ * After unlinking the last vma on the list, we must garbage collect
+ * the anon_vma object itself: we're guaranteed no page can be
+ * pointing to this anon_vma once its vma list is empty.
+ */
+struct anon_vma {
+	struct anon_vma *root;		/* Root of this anon_vma tree */
+	struct rw_semaphore rwsem;	/* W: modification, R: walking the list */
+	/*
+	 * The refcount is taken on an anon_vma when there is no
+	 * guarantee that the vma of page tables will exist for
+	 * the duration of the operation. A caller that takes
+	 * the reference is responsible for clearing up the
+	 * anon_vma if they are the last user on release
+	 */
+	atomic_t refcount;
+
+	/*
+	 * Count of child anon_vmas. Equals to the count of all anon_vmas that
+	 * have ->parent pointing to this one, including itself.
+	 *
+	 * This counter is used for making decision about reusing anon_vma
+	 * instead of forking new one. See comments in function anon_vma_clone.
+	 */
+	unsigned long num_children;
+	/* Count of VMAs whose ->anon_vma pointer points to this object. */
+	unsigned long num_active_vmas;
+
+	struct anon_vma *parent;	/* Parent of this anon_vma */
+
+	/*
+	 * NOTE: the LSB of the rb_root.rb_node is set by
+	 * mm_take_all_locks() _after_ taking the above lock. So the
+	 * rb_root must only be read/written after taking the above lock
+	 * to be sure to see a valid next pointer. The LSB bit itself
+	 * is serialized by a system wide lock only visible to
+	 * mm_take_all_locks() (mm_all_locks_mutex).
+	 */
+
+	/* Interval tree of private "related" vmas */
+	struct rb_root_cached rb_root;
+};
+
+/*
+ * The copy-on-write semantics of fork mean that an anon_vma
+ * can become associated with multiple processes. Furthermore,
+ * each child process will have its own anon_vma, where new
+ * pages for that process are instantiated.
+ *
+ * This structure allows us to find the anon_vmas associated
+ * with a VMA, or the VMAs associated with an anon_vma.
+ * The "same_vma" list contains the anon_vma_chains linking
+ * all the anon_vmas associated with this VMA.
+ * The "rb" field indexes on an interval tree the anon_vma_chains
+ * which link all the VMAs associated with this anon_vma.
+ */
+struct anon_vma_chain {
+	struct vm_area_struct *vma;
+	struct anon_vma *anon_vma;
+	struct list_head same_vma;   /* locked by mmap_lock & page_table_lock */
+	struct rb_node rb;			/* locked by anon_vma->rwsem */
+	unsigned long rb_subtree_last;
+#ifdef CONFIG_DEBUG_VM_RB
+	unsigned long cached_vma_start, cached_vma_last;
+#endif
+};
+
+#ifdef CONFIG_MMU
+
+static inline void get_anon_vma(struct anon_vma *anon_vma)
+{
+	atomic_inc(&anon_vma->refcount);
+}
+
+void __put_anon_vma(struct anon_vma *anon_vma);
+
+static inline void put_anon_vma(struct anon_vma *anon_vma)
+{
+	if (atomic_dec_and_test(&anon_vma->refcount))
+		__put_anon_vma(anon_vma);
+}
+
+static inline void anon_vma_lock_write(struct anon_vma *anon_vma)
+{
+	down_write(&anon_vma->root->rwsem);
+}
+
+static inline int anon_vma_trylock_write(struct anon_vma *anon_vma)
+{
+	return down_write_trylock(&anon_vma->root->rwsem);
+}
+
+static inline void anon_vma_unlock_write(struct anon_vma *anon_vma)
+{
+	up_write(&anon_vma->root->rwsem);
+}
+
+static inline void anon_vma_lock_read(struct anon_vma *anon_vma)
+{
+	down_read(&anon_vma->root->rwsem);
+}
+
+static inline int anon_vma_trylock_read(struct anon_vma *anon_vma)
+{
+	return down_read_trylock(&anon_vma->root->rwsem);
+}
+
+static inline void anon_vma_unlock_read(struct anon_vma *anon_vma)
+{
+	up_read(&anon_vma->root->rwsem);
+}
+
+
+/*
+ * anon_vma helper functions.
+ */
+void anon_vma_init(void);	/* create anon_vma_cachep */
+int  __anon_vma_prepare(struct vm_area_struct *);
+void unlink_anon_vmas(struct vm_area_struct *);
+int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *);
+int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
+
+static inline int anon_vma_prepare(struct vm_area_struct *vma)
+{
+	if (likely(vma->anon_vma))
+		return 0;
+
+	return __anon_vma_prepare(vma);
+}
+
+static inline void anon_vma_merge(struct vm_area_struct *vma,
+				  struct vm_area_struct *next)
+{
+	VM_BUG_ON_VMA(vma->anon_vma != next->anon_vma, vma);
+	unlink_anon_vmas(next);
+}
+
+#else	/* !CONFIG_MMU */
+
+#define anon_vma_init()		do {} while (0)
+#define anon_vma_prepare(vma)	(0)
+
+#endif	/* CONFIG_MMU */
+
+#endif /* __ANON_VMA_H */
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 6b82b618846e..5116d72b8f79 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -14,82 +14,7 @@
 #include <linux/pagemap.h>
 #include <linux/memremap.h>
 #include <linux/bit_spinlock.h>
-
-/*
- * The anon_vma heads a list of private "related" vmas, to scan if
- * an anonymous page pointing to this anon_vma needs to be unmapped:
- * the vmas on the list will be related by forking, or by splitting.
- *
- * Since vmas come and go as they are split and merged (particularly
- * in mprotect), the mapping field of an anonymous page cannot point
- * directly to a vma: instead it points to an anon_vma, on whose list
- * the related vmas can be easily linked or unlinked.
- *
- * After unlinking the last vma on the list, we must garbage collect
- * the anon_vma object itself: we're guaranteed no page can be
- * pointing to this anon_vma once its vma list is empty.
- */
-struct anon_vma {
-	struct anon_vma *root;		/* Root of this anon_vma tree */
-	struct rw_semaphore rwsem;	/* W: modification, R: walking the list */
-	/*
-	 * The refcount is taken on an anon_vma when there is no
-	 * guarantee that the vma of page tables will exist for
-	 * the duration of the operation. A caller that takes
-	 * the reference is responsible for clearing up the
-	 * anon_vma if they are the last user on release
-	 */
-	atomic_t refcount;
-
-	/*
-	 * Count of child anon_vmas. Equals to the count of all anon_vmas that
-	 * have ->parent pointing to this one, including itself.
-	 *
-	 * This counter is used for making decision about reusing anon_vma
-	 * instead of forking new one. See comments in function anon_vma_clone.
-	 */
-	unsigned long num_children;
-	/* Count of VMAs whose ->anon_vma pointer points to this object. */
-	unsigned long num_active_vmas;
-
-	struct anon_vma *parent;	/* Parent of this anon_vma */
-
-	/*
-	 * NOTE: the LSB of the rb_root.rb_node is set by
-	 * mm_take_all_locks() _after_ taking the above lock. So the
-	 * rb_root must only be read/written after taking the above lock
-	 * to be sure to see a valid next pointer. The LSB bit itself
-	 * is serialized by a system wide lock only visible to
-	 * mm_take_all_locks() (mm_all_locks_mutex).
-	 */
-
-	/* Interval tree of private "related" vmas */
-	struct rb_root_cached rb_root;
-};
-
-/*
- * The copy-on-write semantics of fork mean that an anon_vma
- * can become associated with multiple processes. Furthermore,
- * each child process will have its own anon_vma, where new
- * pages for that process are instantiated.
- *
- * This structure allows us to find the anon_vmas associated
- * with a VMA, or the VMAs associated with an anon_vma.
- * The "same_vma" list contains the anon_vma_chains linking
- * all the anon_vmas associated with this VMA.
- * The "rb" field indexes on an interval tree the anon_vma_chains
- * which link all the VMAs associated with this anon_vma.
- */
-struct anon_vma_chain {
-	struct vm_area_struct *vma;
-	struct anon_vma *anon_vma;
-	struct list_head same_vma;   /* locked by mmap_lock & page_table_lock */
-	struct rb_node rb;			/* locked by anon_vma->rwsem */
-	unsigned long rb_subtree_last;
-#ifdef CONFIG_DEBUG_VM_RB
-	unsigned long cached_vma_start, cached_vma_last;
-#endif
-};
+#include <linux/anon_vma.h>
 
 enum ttu_flags {
 	TTU_SPLIT_HUGE_PMD	= 0x4,	/* split huge PMD if any */
@@ -104,73 +29,6 @@ enum ttu_flags {
 };
 
 #ifdef CONFIG_MMU
-static inline void get_anon_vma(struct anon_vma *anon_vma)
-{
-	atomic_inc(&anon_vma->refcount);
-}
-
-void __put_anon_vma(struct anon_vma *anon_vma);
-
-static inline void put_anon_vma(struct anon_vma *anon_vma)
-{
-	if (atomic_dec_and_test(&anon_vma->refcount))
-		__put_anon_vma(anon_vma);
-}
-
-static inline void anon_vma_lock_write(struct anon_vma *anon_vma)
-{
-	down_write(&anon_vma->root->rwsem);
-}
-
-static inline int anon_vma_trylock_write(struct anon_vma *anon_vma)
-{
-	return down_write_trylock(&anon_vma->root->rwsem);
-}
-
-static inline void anon_vma_unlock_write(struct anon_vma *anon_vma)
-{
-	up_write(&anon_vma->root->rwsem);
-}
-
-static inline void anon_vma_lock_read(struct anon_vma *anon_vma)
-{
-	down_read(&anon_vma->root->rwsem);
-}
-
-static inline int anon_vma_trylock_read(struct anon_vma *anon_vma)
-{
-	return down_read_trylock(&anon_vma->root->rwsem);
-}
-
-static inline void anon_vma_unlock_read(struct anon_vma *anon_vma)
-{
-	up_read(&anon_vma->root->rwsem);
-}
-
-
-/*
- * anon_vma helper functions.
- */
-void anon_vma_init(void);	/* create anon_vma_cachep */
-int  __anon_vma_prepare(struct vm_area_struct *);
-void unlink_anon_vmas(struct vm_area_struct *);
-int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *);
-int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
-
-static inline int anon_vma_prepare(struct vm_area_struct *vma)
-{
-	if (likely(vma->anon_vma))
-		return 0;
-
-	return __anon_vma_prepare(vma);
-}
-
-static inline void anon_vma_merge(struct vm_area_struct *vma,
-				  struct vm_area_struct *next)
-{
-	VM_BUG_ON_VMA(vma->anon_vma != next->anon_vma, vma);
-	unlink_anon_vmas(next);
-}
 
 struct anon_vma *folio_get_anon_vma(const struct folio *folio);
 
@@ -1020,9 +878,6 @@ struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
 
 #else	/* !CONFIG_MMU */
 
-#define anon_vma_init()		do {} while (0)
-#define anon_vma_prepare(vma)	(0)
-
 static inline int folio_referenced(struct folio *folio, int is_locked,
 				  struct mem_cgroup *memcg,
 				  unsigned long *vm_flags)
diff --git a/mm/Makefile b/mm/Makefile
index e7f6bbf8ae5f..468a4a076832 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -37,7 +37,7 @@ mmu-y			:= nommu.o
 mmu-$(CONFIG_MMU)	:= highmem.o memory.o mincore.o \
 			   mlock.o mmap.o mmu_gather.o mprotect.o mremap.o \
 			   msync.o page_vma_mapped.o pagewalk.o \
-			   pgtable-generic.o rmap.o vmalloc.o vma.o
+			   pgtable-generic.o rmap.o vmalloc.o vma.o anon_vma.o
 
 
 ifdef CONFIG_CROSS_MEMORY_ATTACH
diff --git a/mm/anon_vma.c b/mm/anon_vma.c
new file mode 100644
index 000000000000..321784e1c3eb
--- /dev/null
+++ b/mm/anon_vma.c
@@ -0,0 +1,396 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include "anon_vma_internal.h"
+#include <linux/anon_vma.h>
+
+static struct kmem_cache *anon_vma_cachep;
+static struct kmem_cache *anon_vma_chain_cachep;
+
+static inline struct anon_vma *anon_vma_alloc(void)
+{
+	struct anon_vma *anon_vma;
+
+	anon_vma = kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL);
+	if (anon_vma) {
+		atomic_set(&anon_vma->refcount, 1);
+		anon_vma->num_children = 0;
+		anon_vma->num_active_vmas = 0;
+		anon_vma->parent = anon_vma;
+		/*
+		 * Initialise the anon_vma root to point to itself. If called
+		 * from fork, the root will be reset to the parents anon_vma.
+		 */
+		anon_vma->root = anon_vma;
+	}
+
+	return anon_vma;
+}
+
+static inline void anon_vma_free(struct anon_vma *anon_vma)
+{
+	VM_BUG_ON(atomic_read(&anon_vma->refcount));
+
+	/*
+	 * Synchronize against folio_lock_anon_vma_read() such that
+	 * we can safely hold the lock without the anon_vma getting
+	 * freed.
+	 *
+	 * Relies on the full mb implied by the atomic_dec_and_test() from
+	 * put_anon_vma() against the acquire barrier implied by
+	 * down_read_trylock() from folio_lock_anon_vma_read(). This orders:
+	 *
+	 * folio_lock_anon_vma_read()	VS	put_anon_vma()
+	 *   down_read_trylock()		  atomic_dec_and_test()
+	 *   LOCK				  MB
+	 *   atomic_read()			  rwsem_is_locked()
+	 *
+	 * LOCK should suffice since the actual taking of the lock must
+	 * happen _before_ what follows.
+	 */
+	might_sleep();
+	if (rwsem_is_locked(&anon_vma->root->rwsem)) {
+		anon_vma_lock_write(anon_vma);
+		anon_vma_unlock_write(anon_vma);
+	}
+
+	kmem_cache_free(anon_vma_cachep, anon_vma);
+}
+
+void __put_anon_vma(struct anon_vma *anon_vma)
+{
+	struct anon_vma *root = anon_vma->root;
+
+	anon_vma_free(anon_vma);
+	if (root != anon_vma && atomic_dec_and_test(&root->refcount))
+		anon_vma_free(root);
+}
+
+static inline struct anon_vma_chain *anon_vma_chain_alloc(gfp_t gfp)
+{
+	return kmem_cache_alloc(anon_vma_chain_cachep, gfp);
+}
+
+static void anon_vma_chain_free(struct anon_vma_chain *anon_vma_chain)
+{
+	kmem_cache_free(anon_vma_chain_cachep, anon_vma_chain);
+}
+
+static void anon_vma_chain_link(struct vm_area_struct *vma,
+				struct anon_vma_chain *avc,
+				struct anon_vma *anon_vma)
+{
+	avc->vma = vma;
+	avc->anon_vma = anon_vma;
+	list_add(&avc->same_vma, &vma->anon_vma_chain);
+	anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
+}
+
+/**
+ * __anon_vma_prepare - attach an anon_vma to a memory region
+ * @vma: the memory region in question
+ *
+ * This makes sure the memory mapping described by 'vma' has
+ * an 'anon_vma' attached to it, so that we can associate the
+ * anonymous pages mapped into it with that anon_vma.
+ *
+ * The common case will be that we already have one, which
+ * is handled inline by anon_vma_prepare(). But if
+ * not we either need to find an adjacent mapping that we
+ * can re-use the anon_vma from (very common when the only
+ * reason for splitting a vma has been mprotect()), or we
+ * allocate a new one.
+ *
+ * Anon-vma allocations are very subtle, because we may have
+ * optimistically looked up an anon_vma in folio_lock_anon_vma_read()
+ * and that may actually touch the rwsem even in the newly
+ * allocated vma (it depends on RCU to make sure that the
+ * anon_vma isn't actually destroyed).
+ *
+ * As a result, we need to do proper anon_vma locking even
+ * for the new allocation. At the same time, we do not want
+ * to do any locking for the common case of already having
+ * an anon_vma.
+ */
+int __anon_vma_prepare(struct vm_area_struct *vma)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	struct anon_vma *anon_vma, *allocated;
+	struct anon_vma_chain *avc;
+
+	mmap_assert_locked(mm);
+	might_sleep();
+
+	avc = anon_vma_chain_alloc(GFP_KERNEL);
+	if (!avc)
+		goto out_enomem;
+
+	anon_vma = find_mergeable_anon_vma(vma);
+	allocated = NULL;
+	if (!anon_vma) {
+		anon_vma = anon_vma_alloc();
+		if (unlikely(!anon_vma))
+			goto out_enomem_free_avc;
+		anon_vma->num_children++; /* self-parent link for new root */
+		allocated = anon_vma;
+	}
+
+	anon_vma_lock_write(anon_vma);
+	/* page_table_lock to protect against threads */
+	spin_lock(&mm->page_table_lock);
+	if (likely(!vma->anon_vma)) {
+		vma->anon_vma = anon_vma;
+		anon_vma_chain_link(vma, avc, anon_vma);
+		anon_vma->num_active_vmas++;
+		allocated = NULL;
+		avc = NULL;
+	}
+	spin_unlock(&mm->page_table_lock);
+	anon_vma_unlock_write(anon_vma);
+
+	if (unlikely(allocated))
+		put_anon_vma(allocated);
+	if (unlikely(avc))
+		anon_vma_chain_free(avc);
+
+	return 0;
+
+ out_enomem_free_avc:
+	anon_vma_chain_free(avc);
+ out_enomem:
+	return -ENOMEM;
+}
+
+/*
+ * This is a useful helper function for locking the anon_vma root as
+ * we traverse the vma->anon_vma_chain, looping over anon_vma's that
+ * have the same vma.
+ *
+ * Such anon_vma's should have the same root, so you'd expect to see
+ * just a single mutex_lock for the whole traversal.
+ */
+static inline struct anon_vma *lock_anon_vma_root(struct anon_vma *root, struct anon_vma *anon_vma)
+{
+	struct anon_vma *new_root = anon_vma->root;
+	if (new_root != root) {
+		if (WARN_ON_ONCE(root))
+			up_write(&root->rwsem);
+		root = new_root;
+		down_write(&root->rwsem);
+	}
+	return root;
+}
+
+static inline void unlock_anon_vma_root(struct anon_vma *root)
+{
+	if (root)
+		up_write(&root->rwsem);
+}
+
+/*
+ * Attach the anon_vmas from src to dst.
+ * Returns 0 on success, -ENOMEM on failure.
+ *
+ * anon_vma_clone() is called by vma_expand(), vma_merge(), __split_vma(),
+ * copy_vma() and anon_vma_fork(). The first four want an exact copy of src,
+ * while the last one, anon_vma_fork(), may try to reuse an existing anon_vma to
+ * prevent endless growth of anon_vma. Since dst->anon_vma is set to NULL before
+ * call, we can identify this case by checking (!dst->anon_vma &&
+ * src->anon_vma).
+ *
+ * If (!dst->anon_vma && src->anon_vma) is true, this function tries to find
+ * and reuse existing anon_vma which has no vmas and only one child anon_vma.
+ * This prevents degradation of anon_vma hierarchy to endless linear chain in
+ * case of constantly forking task. On the other hand, an anon_vma with more
+ * than one child isn't reused even if there was no alive vma, thus rmap
+ * walker has a good chance of avoiding scanning the whole hierarchy when it
+ * searches where page is mapped.
+ */
+int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
+{
+	struct anon_vma_chain *avc, *pavc;
+	struct anon_vma *root = NULL;
+
+	list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
+		struct anon_vma *anon_vma;
+
+		avc = anon_vma_chain_alloc(GFP_NOWAIT | __GFP_NOWARN);
+		if (unlikely(!avc)) {
+			unlock_anon_vma_root(root);
+			root = NULL;
+			avc = anon_vma_chain_alloc(GFP_KERNEL);
+			if (!avc)
+				goto enomem_failure;
+		}
+		anon_vma = pavc->anon_vma;
+		root = lock_anon_vma_root(root, anon_vma);
+		anon_vma_chain_link(dst, avc, anon_vma);
+
+		/*
+		 * Reuse existing anon_vma if it has no vma and only one
+		 * anon_vma child.
+		 *
+		 * Root anon_vma is never reused:
+		 * it has self-parent reference and at least one child.
+		 */
+		if (!dst->anon_vma && src->anon_vma &&
+		    anon_vma->num_children < 2 &&
+		    anon_vma->num_active_vmas == 0)
+			dst->anon_vma = anon_vma;
+	}
+	if (dst->anon_vma)
+		dst->anon_vma->num_active_vmas++;
+	unlock_anon_vma_root(root);
+	return 0;
+
+ enomem_failure:
+	/*
+	 * dst->anon_vma is dropped here otherwise its num_active_vmas can
+	 * be incorrectly decremented in unlink_anon_vmas().
+	 * We can safely do this because callers of anon_vma_clone() don't care
+	 * about dst->anon_vma if anon_vma_clone() failed.
+	 */
+	dst->anon_vma = NULL;
+	unlink_anon_vmas(dst);
+	return -ENOMEM;
+}
+
+/*
+ * Attach vma to its own anon_vma, as well as to the anon_vmas that
+ * the corresponding VMA in the parent process is attached to.
+ * Returns 0 on success, non-zero on failure.
+ */
+int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
+{
+	struct anon_vma_chain *avc;
+	struct anon_vma *anon_vma;
+	int error;
+
+	/* Don't bother if the parent process has no anon_vma here. */
+	if (!pvma->anon_vma)
+		return 0;
+
+	/* Drop inherited anon_vma, we'll reuse existing or allocate new. */
+	vma->anon_vma = NULL;
+
+	/*
+	 * First, attach the new VMA to the parent VMA's anon_vmas,
+	 * so rmap can find non-COWed pages in child processes.
+	 */
+	error = anon_vma_clone(vma, pvma);
+	if (error)
+		return error;
+
+	/* An existing anon_vma has been reused, all done then. */
+	if (vma->anon_vma)
+		return 0;
+
+	/* Then add our own anon_vma. */
+	anon_vma = anon_vma_alloc();
+	if (!anon_vma)
+		goto out_error;
+	anon_vma->num_active_vmas++;
+	avc = anon_vma_chain_alloc(GFP_KERNEL);
+	if (!avc)
+		goto out_error_free_anon_vma;
+
+	/*
+	 * The root anon_vma's rwsem is the lock actually used when we
+	 * lock any of the anon_vmas in this anon_vma tree.
+	 */
+	anon_vma->root = pvma->anon_vma->root;
+	anon_vma->parent = pvma->anon_vma;
+	/*
+	 * With refcounts, an anon_vma can stay around longer than the
+	 * process it belongs to. The root anon_vma needs to be pinned until
+	 * this anon_vma is freed, because the lock lives in the root.
+	 */
+	get_anon_vma(anon_vma->root);
+	/* Mark this anon_vma as the one where our new (COWed) pages go. */
+	vma->anon_vma = anon_vma;
+	anon_vma_lock_write(anon_vma);
+	anon_vma_chain_link(vma, avc, anon_vma);
+	anon_vma->parent->num_children++;
+	anon_vma_unlock_write(anon_vma);
+
+	return 0;
+
+ out_error_free_anon_vma:
+	put_anon_vma(anon_vma);
+ out_error:
+	unlink_anon_vmas(vma);
+	return -ENOMEM;
+}
+
+void unlink_anon_vmas(struct vm_area_struct *vma)
+{
+	struct anon_vma_chain *avc, *next;
+	struct anon_vma *root = NULL;
+
+	/*
+	 * Unlink each anon_vma chained to the VMA.  This list is ordered
+	 * from newest to oldest, ensuring the root anon_vma gets freed last.
+	 */
+	list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
+		struct anon_vma *anon_vma = avc->anon_vma;
+
+		root = lock_anon_vma_root(root, anon_vma);
+		anon_vma_interval_tree_remove(avc, &anon_vma->rb_root);
+
+		/*
+		 * Leave empty anon_vmas on the list - we'll need
+		 * to free them outside the lock.
+		 */
+		if (RB_EMPTY_ROOT(&anon_vma->rb_root.rb_root)) {
+			anon_vma->parent->num_children--;
+			continue;
+		}
+
+		list_del(&avc->same_vma);
+		anon_vma_chain_free(avc);
+	}
+	if (vma->anon_vma) {
+		vma->anon_vma->num_active_vmas--;
+
+		/*
+		 * vma would still be needed after unlink, and anon_vma will be prepared
+		 * when handle fault.
+		 */
+		vma->anon_vma = NULL;
+	}
+	unlock_anon_vma_root(root);
+
+	/*
+	 * Iterate the list once more, it now only contains empty and unlinked
+	 * anon_vmas, destroy them. Could not do before due to __put_anon_vma()
+	 * needing to write-acquire the anon_vma->root->rwsem.
+	 */
+	list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
+		struct anon_vma *anon_vma = avc->anon_vma;
+
+		VM_WARN_ON(anon_vma->num_children);
+		VM_WARN_ON(anon_vma->num_active_vmas);
+		put_anon_vma(anon_vma);
+
+		list_del(&avc->same_vma);
+		anon_vma_chain_free(avc);
+	}
+}
+
+static void anon_vma_ctor(void *data)
+{
+	struct anon_vma *anon_vma = data;
+
+	init_rwsem(&anon_vma->rwsem);
+	atomic_set(&anon_vma->refcount, 0);
+	anon_vma->rb_root = RB_ROOT_CACHED;
+}
+
+void __init anon_vma_init(void)
+{
+	anon_vma_cachep = kmem_cache_create("anon_vma", sizeof(struct anon_vma),
+			0, SLAB_TYPESAFE_BY_RCU|SLAB_PANIC|SLAB_ACCOUNT,
+			anon_vma_ctor);
+	anon_vma_chain_cachep = KMEM_CACHE(anon_vma_chain,
+			SLAB_PANIC|SLAB_ACCOUNT);
+}
+
diff --git a/mm/anon_vma_internal.h b/mm/anon_vma_internal.h
new file mode 100644
index 000000000000..fa364649dc96
--- /dev/null
+++ b/mm/anon_vma_internal.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * anon_vma_internal.h
+ *
+ * Headers required by anon_vma.c, which can be substituted accordingly when
+ * testing anon_vma functionality.
+ */
+
+#ifndef __MM_ANON_VMA_INTERNAL_H
+#define __MM_ANON_VMA_INTERNAL_H
+
+#include "internal.h"
+
+#endif	/* __MM_ANON_VMA_INTERNAL_H */
diff --git a/mm/rmap.c b/mm/rmap.c
index 67bb273dfb80..ec70360b51f2 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -58,7 +58,6 @@
 #include <linux/pagemap.h>
 #include <linux/swap.h>
 #include <linux/swapops.h>
-#include <linux/slab.h>
 #include <linux/init.h>
 #include <linux/ksm.h>
 #include <linux/rmap.h>
@@ -84,387 +83,6 @@
 
 #include "internal.h"
 
-static struct kmem_cache *anon_vma_cachep;
-static struct kmem_cache *anon_vma_chain_cachep;
-
-static inline struct anon_vma *anon_vma_alloc(void)
-{
-	struct anon_vma *anon_vma;
-
-	anon_vma = kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL);
-	if (anon_vma) {
-		atomic_set(&anon_vma->refcount, 1);
-		anon_vma->num_children = 0;
-		anon_vma->num_active_vmas = 0;
-		anon_vma->parent = anon_vma;
-		/*
-		 * Initialise the anon_vma root to point to itself. If called
-		 * from fork, the root will be reset to the parents anon_vma.
-		 */
-		anon_vma->root = anon_vma;
-	}
-
-	return anon_vma;
-}
-
-static inline void anon_vma_free(struct anon_vma *anon_vma)
-{
-	VM_BUG_ON(atomic_read(&anon_vma->refcount));
-
-	/*
-	 * Synchronize against folio_lock_anon_vma_read() such that
-	 * we can safely hold the lock without the anon_vma getting
-	 * freed.
-	 *
-	 * Relies on the full mb implied by the atomic_dec_and_test() from
-	 * put_anon_vma() against the acquire barrier implied by
-	 * down_read_trylock() from folio_lock_anon_vma_read(). This orders:
-	 *
-	 * folio_lock_anon_vma_read()	VS	put_anon_vma()
-	 *   down_read_trylock()		  atomic_dec_and_test()
-	 *   LOCK				  MB
-	 *   atomic_read()			  rwsem_is_locked()
-	 *
-	 * LOCK should suffice since the actual taking of the lock must
-	 * happen _before_ what follows.
-	 */
-	might_sleep();
-	if (rwsem_is_locked(&anon_vma->root->rwsem)) {
-		anon_vma_lock_write(anon_vma);
-		anon_vma_unlock_write(anon_vma);
-	}
-
-	kmem_cache_free(anon_vma_cachep, anon_vma);
-}
-
-static inline struct anon_vma_chain *anon_vma_chain_alloc(gfp_t gfp)
-{
-	return kmem_cache_alloc(anon_vma_chain_cachep, gfp);
-}
-
-static void anon_vma_chain_free(struct anon_vma_chain *anon_vma_chain)
-{
-	kmem_cache_free(anon_vma_chain_cachep, anon_vma_chain);
-}
-
-static void anon_vma_chain_link(struct vm_area_struct *vma,
-				struct anon_vma_chain *avc,
-				struct anon_vma *anon_vma)
-{
-	avc->vma = vma;
-	avc->anon_vma = anon_vma;
-	list_add(&avc->same_vma, &vma->anon_vma_chain);
-	anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
-}
-
-/**
- * __anon_vma_prepare - attach an anon_vma to a memory region
- * @vma: the memory region in question
- *
- * This makes sure the memory mapping described by 'vma' has
- * an 'anon_vma' attached to it, so that we can associate the
- * anonymous pages mapped into it with that anon_vma.
- *
- * The common case will be that we already have one, which
- * is handled inline by anon_vma_prepare(). But if
- * not we either need to find an adjacent mapping that we
- * can re-use the anon_vma from (very common when the only
- * reason for splitting a vma has been mprotect()), or we
- * allocate a new one.
- *
- * Anon-vma allocations are very subtle, because we may have
- * optimistically looked up an anon_vma in folio_lock_anon_vma_read()
- * and that may actually touch the rwsem even in the newly
- * allocated vma (it depends on RCU to make sure that the
- * anon_vma isn't actually destroyed).
- *
- * As a result, we need to do proper anon_vma locking even
- * for the new allocation. At the same time, we do not want
- * to do any locking for the common case of already having
- * an anon_vma.
- */
-int __anon_vma_prepare(struct vm_area_struct *vma)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	struct anon_vma *anon_vma, *allocated;
-	struct anon_vma_chain *avc;
-
-	mmap_assert_locked(mm);
-	might_sleep();
-
-	avc = anon_vma_chain_alloc(GFP_KERNEL);
-	if (!avc)
-		goto out_enomem;
-
-	anon_vma = find_mergeable_anon_vma(vma);
-	allocated = NULL;
-	if (!anon_vma) {
-		anon_vma = anon_vma_alloc();
-		if (unlikely(!anon_vma))
-			goto out_enomem_free_avc;
-		anon_vma->num_children++; /* self-parent link for new root */
-		allocated = anon_vma;
-	}
-
-	anon_vma_lock_write(anon_vma);
-	/* page_table_lock to protect against threads */
-	spin_lock(&mm->page_table_lock);
-	if (likely(!vma->anon_vma)) {
-		vma->anon_vma = anon_vma;
-		anon_vma_chain_link(vma, avc, anon_vma);
-		anon_vma->num_active_vmas++;
-		allocated = NULL;
-		avc = NULL;
-	}
-	spin_unlock(&mm->page_table_lock);
-	anon_vma_unlock_write(anon_vma);
-
-	if (unlikely(allocated))
-		put_anon_vma(allocated);
-	if (unlikely(avc))
-		anon_vma_chain_free(avc);
-
-	return 0;
-
- out_enomem_free_avc:
-	anon_vma_chain_free(avc);
- out_enomem:
-	return -ENOMEM;
-}
-
-/*
- * This is a useful helper function for locking the anon_vma root as
- * we traverse the vma->anon_vma_chain, looping over anon_vma's that
- * have the same vma.
- *
- * Such anon_vma's should have the same root, so you'd expect to see
- * just a single mutex_lock for the whole traversal.
- */
-static inline struct anon_vma *lock_anon_vma_root(struct anon_vma *root, struct anon_vma *anon_vma)
-{
-	struct anon_vma *new_root = anon_vma->root;
-	if (new_root != root) {
-		if (WARN_ON_ONCE(root))
-			up_write(&root->rwsem);
-		root = new_root;
-		down_write(&root->rwsem);
-	}
-	return root;
-}
-
-static inline void unlock_anon_vma_root(struct anon_vma *root)
-{
-	if (root)
-		up_write(&root->rwsem);
-}
-
-/*
- * Attach the anon_vmas from src to dst.
- * Returns 0 on success, -ENOMEM on failure.
- *
- * anon_vma_clone() is called by vma_expand(), vma_merge(), __split_vma(),
- * copy_vma() and anon_vma_fork(). The first four want an exact copy of src,
- * while the last one, anon_vma_fork(), may try to reuse an existing anon_vma to
- * prevent endless growth of anon_vma. Since dst->anon_vma is set to NULL before
- * call, we can identify this case by checking (!dst->anon_vma &&
- * src->anon_vma).
- *
- * If (!dst->anon_vma && src->anon_vma) is true, this function tries to find
- * and reuse existing anon_vma which has no vmas and only one child anon_vma.
- * This prevents degradation of anon_vma hierarchy to endless linear chain in
- * case of constantly forking task. On the other hand, an anon_vma with more
- * than one child isn't reused even if there was no alive vma, thus rmap
- * walker has a good chance of avoiding scanning the whole hierarchy when it
- * searches where page is mapped.
- */
-int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
-{
-	struct anon_vma_chain *avc, *pavc;
-	struct anon_vma *root = NULL;
-
-	list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
-		struct anon_vma *anon_vma;
-
-		avc = anon_vma_chain_alloc(GFP_NOWAIT | __GFP_NOWARN);
-		if (unlikely(!avc)) {
-			unlock_anon_vma_root(root);
-			root = NULL;
-			avc = anon_vma_chain_alloc(GFP_KERNEL);
-			if (!avc)
-				goto enomem_failure;
-		}
-		anon_vma = pavc->anon_vma;
-		root = lock_anon_vma_root(root, anon_vma);
-		anon_vma_chain_link(dst, avc, anon_vma);
-
-		/*
-		 * Reuse existing anon_vma if it has no vma and only one
-		 * anon_vma child.
-		 *
-		 * Root anon_vma is never reused:
-		 * it has self-parent reference and at least one child.
-		 */
-		if (!dst->anon_vma && src->anon_vma &&
-		    anon_vma->num_children < 2 &&
-		    anon_vma->num_active_vmas == 0)
-			dst->anon_vma = anon_vma;
-	}
-	if (dst->anon_vma)
-		dst->anon_vma->num_active_vmas++;
-	unlock_anon_vma_root(root);
-	return 0;
-
- enomem_failure:
-	/*
-	 * dst->anon_vma is dropped here otherwise its num_active_vmas can
-	 * be incorrectly decremented in unlink_anon_vmas().
-	 * We can safely do this because callers of anon_vma_clone() don't care
-	 * about dst->anon_vma if anon_vma_clone() failed.
-	 */
-	dst->anon_vma = NULL;
-	unlink_anon_vmas(dst);
-	return -ENOMEM;
-}
-
-/*
- * Attach vma to its own anon_vma, as well as to the anon_vmas that
- * the corresponding VMA in the parent process is attached to.
- * Returns 0 on success, non-zero on failure.
- */
-int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
-{
-	struct anon_vma_chain *avc;
-	struct anon_vma *anon_vma;
-	int error;
-
-	/* Don't bother if the parent process has no anon_vma here. */
-	if (!pvma->anon_vma)
-		return 0;
-
-	/* Drop inherited anon_vma, we'll reuse existing or allocate new. */
-	vma->anon_vma = NULL;
-
-	/*
-	 * First, attach the new VMA to the parent VMA's anon_vmas,
-	 * so rmap can find non-COWed pages in child processes.
-	 */
-	error = anon_vma_clone(vma, pvma);
-	if (error)
-		return error;
-
-	/* An existing anon_vma has been reused, all done then. */
-	if (vma->anon_vma)
-		return 0;
-
-	/* Then add our own anon_vma. */
-	anon_vma = anon_vma_alloc();
-	if (!anon_vma)
-		goto out_error;
-	anon_vma->num_active_vmas++;
-	avc = anon_vma_chain_alloc(GFP_KERNEL);
-	if (!avc)
-		goto out_error_free_anon_vma;
-
-	/*
-	 * The root anon_vma's rwsem is the lock actually used when we
-	 * lock any of the anon_vmas in this anon_vma tree.
-	 */
-	anon_vma->root = pvma->anon_vma->root;
-	anon_vma->parent = pvma->anon_vma;
-	/*
-	 * With refcounts, an anon_vma can stay around longer than the
-	 * process it belongs to. The root anon_vma needs to be pinned until
-	 * this anon_vma is freed, because the lock lives in the root.
-	 */
-	get_anon_vma(anon_vma->root);
-	/* Mark this anon_vma as the one where our new (COWed) pages go. */
-	vma->anon_vma = anon_vma;
-	anon_vma_lock_write(anon_vma);
-	anon_vma_chain_link(vma, avc, anon_vma);
-	anon_vma->parent->num_children++;
-	anon_vma_unlock_write(anon_vma);
-
-	return 0;
-
- out_error_free_anon_vma:
-	put_anon_vma(anon_vma);
- out_error:
-	unlink_anon_vmas(vma);
-	return -ENOMEM;
-}
-
-void unlink_anon_vmas(struct vm_area_struct *vma)
-{
-	struct anon_vma_chain *avc, *next;
-	struct anon_vma *root = NULL;
-
-	/*
-	 * Unlink each anon_vma chained to the VMA.  This list is ordered
-	 * from newest to oldest, ensuring the root anon_vma gets freed last.
-	 */
-	list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
-		struct anon_vma *anon_vma = avc->anon_vma;
-
-		root = lock_anon_vma_root(root, anon_vma);
-		anon_vma_interval_tree_remove(avc, &anon_vma->rb_root);
-
-		/*
-		 * Leave empty anon_vmas on the list - we'll need
-		 * to free them outside the lock.
-		 */
-		if (RB_EMPTY_ROOT(&anon_vma->rb_root.rb_root)) {
-			anon_vma->parent->num_children--;
-			continue;
-		}
-
-		list_del(&avc->same_vma);
-		anon_vma_chain_free(avc);
-	}
-	if (vma->anon_vma) {
-		vma->anon_vma->num_active_vmas--;
-
-		/*
-		 * vma would still be needed after unlink, and anon_vma will be prepared
-		 * when handle fault.
-		 */
-		vma->anon_vma = NULL;
-	}
-	unlock_anon_vma_root(root);
-
-	/*
-	 * Iterate the list once more, it now only contains empty and unlinked
-	 * anon_vmas, destroy them. Could not do before due to __put_anon_vma()
-	 * needing to write-acquire the anon_vma->root->rwsem.
-	 */
-	list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
-		struct anon_vma *anon_vma = avc->anon_vma;
-
-		VM_WARN_ON(anon_vma->num_children);
-		VM_WARN_ON(anon_vma->num_active_vmas);
-		put_anon_vma(anon_vma);
-
-		list_del(&avc->same_vma);
-		anon_vma_chain_free(avc);
-	}
-}
-
-static void anon_vma_ctor(void *data)
-{
-	struct anon_vma *anon_vma = data;
-
-	init_rwsem(&anon_vma->rwsem);
-	atomic_set(&anon_vma->refcount, 0);
-	anon_vma->rb_root = RB_ROOT_CACHED;
-}
-
-void __init anon_vma_init(void)
-{
-	anon_vma_cachep = kmem_cache_create("anon_vma", sizeof(struct anon_vma),
-			0, SLAB_TYPESAFE_BY_RCU|SLAB_PANIC|SLAB_ACCOUNT,
-			anon_vma_ctor);
-	anon_vma_chain_cachep = KMEM_CACHE(anon_vma_chain,
-			SLAB_PANIC|SLAB_ACCOUNT);
-}
 
 /*
  * Getting a lock on a stable anon_vma from a page off the LRU is tricky!
@@ -2749,15 +2367,6 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
 EXPORT_SYMBOL_GPL(make_device_exclusive);
 #endif
 
-void __put_anon_vma(struct anon_vma *anon_vma)
-{
-	struct anon_vma *root = anon_vma->root;
-
-	anon_vma_free(anon_vma);
-	if (root != anon_vma && atomic_dec_and_test(&root->refcount))
-		anon_vma_free(root);
-}
-
 static struct anon_vma *rmap_walk_anon_lock(const struct folio *folio,
 					    struct rmap_walk_control *rwc)
 {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC Patch 2/5] anon_vma: add skeleton code for userland testing of anon_vma logic
  2025-04-29  9:06 [RFC Patch 0/5] Make anon_vma operations testable Wei Yang
  2025-04-29  9:06 ` [RFC Patch 1/5] mm: move anon_vma manipulation functions to own file Wei Yang
@ 2025-04-29  9:06 ` Wei Yang
  2025-05-01  1:31   ` Wei Yang
  2025-04-29  9:06 ` [RFC Patch 3/5] anon_vma: add test for mergeable anon_vma Wei Yang
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 29+ messages in thread
From: Wei Yang @ 2025-04-29  9:06 UTC (permalink / raw)
  To: akpm
  Cc: david, lorenzo.stoakes, riel, vbabka, harry.yoo, jannh, baohua,
	linux-mm, Wei Yang

Establish a new userland anon_vma unit testing implementation under
tools/testing.

Currently this is a skeleton with simple test cases, providing a
foundation for additional tests.
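As a usage sketch (not part of the patch itself): given the Makefile added
below, the suite should build and run like the other userland mm test
harnesses under tools/testing, presumably along these lines:

```shell
# Build and run the userland anon_vma test suite from the kernel tree root.
# DEBUG=1 is the knob the new Makefile uses to enable interval-tree dumps.
make -C tools/testing/anon_vma DEBUG=1
./tools/testing/anon_vma/anon_vma
```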

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Harry Yoo <harry.yoo@oracle.com>
---
 tools/include/linux/rwsem.h                |  10 +
 tools/include/linux/slab.h                 |   4 +
 tools/testing/anon_vma/.gitignore          |   3 +
 tools/testing/anon_vma/Makefile            |  25 ++
 tools/testing/anon_vma/anon_vma.c          | 438 +++++++++++++++++++++
 tools/testing/anon_vma/anon_vma_internal.h |  55 +++
 tools/testing/anon_vma/interval_tree.c     |  53 +++
 tools/testing/anon_vma/linux/atomic.h      |  18 +
 tools/testing/anon_vma/linux/fs.h          |   6 +
 tools/testing/anon_vma/linux/mm.h          |  44 +++
 tools/testing/anon_vma/linux/mm_types.h    |  57 +++
 tools/testing/anon_vma/linux/mmzone.h      |   6 +
 tools/testing/anon_vma/linux/rmap.h        |   8 +
 tools/testing/shared/linux/anon_vma.h      |   7 +
 14 files changed, 734 insertions(+)
 create mode 100644 tools/testing/anon_vma/.gitignore
 create mode 100644 tools/testing/anon_vma/Makefile
 create mode 100644 tools/testing/anon_vma/anon_vma.c
 create mode 100644 tools/testing/anon_vma/anon_vma_internal.h
 create mode 100644 tools/testing/anon_vma/interval_tree.c
 create mode 100644 tools/testing/anon_vma/linux/atomic.h
 create mode 100644 tools/testing/anon_vma/linux/fs.h
 create mode 100644 tools/testing/anon_vma/linux/mm.h
 create mode 100644 tools/testing/anon_vma/linux/mm_types.h
 create mode 100644 tools/testing/anon_vma/linux/mmzone.h
 create mode 100644 tools/testing/anon_vma/linux/rmap.h
 create mode 100644 tools/testing/shared/linux/anon_vma.h

diff --git a/tools/include/linux/rwsem.h b/tools/include/linux/rwsem.h
index f8bffd4a987c..a74e424e4cfe 100644
--- a/tools/include/linux/rwsem.h
+++ b/tools/include/linux/rwsem.h
@@ -38,6 +38,16 @@ static inline int up_write(struct rw_semaphore *sem)
 	return pthread_rwlock_unlock(&sem->lock);
 }
 
+static inline int down_read_trylock(struct rw_semaphore *sem)
+{
+	return pthread_rwlock_tryrdlock(&sem->lock) == 0;
+}
+
+static inline int down_write_trylock(struct rw_semaphore *sem)
+{
+	return pthread_rwlock_trywrlock(&sem->lock) == 0;
+}
+
 #define down_read_nested(sem, subclass)		down_read(sem)
 #define down_write_nested(sem, subclass)	down_write(sem)
 
diff --git a/tools/include/linux/slab.h b/tools/include/linux/slab.h
index c87051e2b26f..6ebdda7aadbf 100644
--- a/tools/include/linux/slab.h
+++ b/tools/include/linux/slab.h
@@ -41,6 +41,10 @@ struct kmem_cache *kmem_cache_create(const char *name, unsigned int size,
 			unsigned int align, unsigned int flags,
 			void (*ctor)(void *));
 
+#define KMEM_CACHE(__struct, __flags)					\
+	kmem_cache_create(#__struct, sizeof(struct __struct),		\
+		__alignof__(struct __struct), __flags, NULL)
+
 void kmem_cache_free_bulk(struct kmem_cache *cachep, size_t size, void **list);
 int kmem_cache_alloc_bulk(struct kmem_cache *cachep, gfp_t gfp, size_t size,
 			  void **list);
diff --git a/tools/testing/anon_vma/.gitignore b/tools/testing/anon_vma/.gitignore
new file mode 100644
index 000000000000..aa2885a029f9
--- /dev/null
+++ b/tools/testing/anon_vma/.gitignore
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-or-later
+generated/
+anon_vma
diff --git a/tools/testing/anon_vma/Makefile b/tools/testing/anon_vma/Makefile
new file mode 100644
index 000000000000..120e2217e957
--- /dev/null
+++ b/tools/testing/anon_vma/Makefile
@@ -0,0 +1,25 @@
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+.PHONY: default clean
+
+default: anon_vma
+
+include ../shared/shared.mk
+
+CFLAGS += -D CONFIG_PHYS_ADDR_T_64BIT
+ifeq ($(DEBUG), 1)
+	CFLAGS += -D ANON_VMA_INTERVAL_TREE_DEBUG -g
+endif
+
+OFILES = $(LIBS) linux.o anon_vma.o rbtree-shim.o \
+	 interval_tree-shim.o interval_tree.o
+TARGETS = anon_vma
+
+interval_tree.o: ../../../mm/interval_tree.c
+anon_vma.o: anon_vma.c anon_vma_internal.h ../../../mm/anon_vma.c \
+	 ../../../include/linux/anon_vma.h
+
+anon_vma: $(OFILES)
+
+clean:
+	$(RM) $(TARGETS) *.o generated/*
diff --git a/tools/testing/anon_vma/anon_vma.c b/tools/testing/anon_vma/anon_vma.c
new file mode 100644
index 000000000000..f3ca193857ec
--- /dev/null
+++ b/tools/testing/anon_vma/anon_vma.c
@@ -0,0 +1,438 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <linux/bitmap.h>
+
+#include "anon_vma_internal.h"
+#include "../../../mm/anon_vma.c"
+
+#define ASSERT_TRUE(_expr)						\
+	do {								\
+		if (!(_expr)) {						\
+			fprintf(stderr,					\
+				"Assert FAILED at %s:%d:%s(): %s is FALSE.\n", \
+				__FILE__, __LINE__, __func__, #_expr); \
+			return false;					\
+		}							\
+	} while (0)
+#define ASSERT_FALSE(_expr) ASSERT_TRUE(!(_expr))
+#define ASSERT_EQ(_val1, _val2) ASSERT_TRUE((_val1) == (_val2))
+#define ASSERT_NE(_val1, _val2) ASSERT_TRUE((_val1) != (_val2))
+
+static struct mm_struct dummy_mm = {
+	.page_table_lock = __SPIN_LOCK_UNLOCKED(dummy_mm.page_table_lock),
+};
+
+#define NUM_VMAS 50
+static int vmas_idx;
+static struct vm_area_struct *vmas[NUM_VMAS];
+static struct kmem_cache *vm_area_cachep;
+
+static void vma_ctor(void *data)
+{
+	struct vm_area_struct *vma;
+
+	vma = data;
+	INIT_LIST_HEAD(&vma->anon_vma_chain);
+	vma->vm_mm = &dummy_mm;
+	vma->anon_vma = NULL;
+}
+
+void vma_cache_init(void)
+{
+	/* vma's kmem_cache for test */
+	vm_area_cachep = kmem_cache_create("vm_area_struct",
+			sizeof(struct vm_area_struct), 0,
+			SLAB_PANIC|SLAB_ACCOUNT, vma_ctor);
+}
+
+struct vm_area_struct *alloc_vma(unsigned long start, unsigned long end, pgoff_t pgoff)
+{
+	if (vmas_idx >= NUM_VMAS)
+		return NULL;
+
+	vmas[vmas_idx] = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
+	vmas[vmas_idx]->index = vmas_idx;
+	vma_set_range(vmas[vmas_idx], start, end, pgoff);
+
+	return vmas[vmas_idx++];
+}
+
+void vm_area_free(struct vm_area_struct *vma)
+{
+	kmem_cache_free(vm_area_cachep, vma);
+}
+
+void cleanup(void)
+{
+	int i;
+
+	for (i = 0; i < NUM_VMAS; i++) {
+		if (!vmas[i])
+			continue;
+
+		unlink_anon_vmas(vmas[i]);
+		vm_area_free(vmas[i]);
+		vmas[i] = NULL;
+	}
+
+	vmas_idx = 0;
+}
+
+static bool test_simple_fault(void)
+{
+	struct vm_area_struct *root_vma;
+	struct anon_vma_chain *avc;
+
+	root_vma = alloc_vma(0x3000, 0x5000, 3);
+	/* The first fault on an anonymous vma creates its anon_vma. */
+	__anon_vma_prepare(root_vma);
+	ASSERT_NE(NULL, root_vma->anon_vma);
+	dump_anon_vma_interval_tree(root_vma->anon_vma);
+
+	/*
+	 *  anon_vma           root_vma
+	 *  +-----------+      +-----------+
+	 *  |           | ---> |           |
+	 *  +-----------+      +-----------+
+	 */
+
+	anon_vma_interval_tree_foreach(avc, &root_vma->anon_vma->rb_root, 3, 4) {
+		/* Expect to find itself from anon_vma interval tree */
+		ASSERT_EQ(avc->vma, root_vma);
+	}
+
+	cleanup();
+
+	ASSERT_EQ(0, nr_allocated);
+	return true;
+}
+
+static bool test_simple_fork(void)
+{
+	struct vm_area_struct *root_vma, *child_vma;
+	struct anon_vma_chain *avc;
+	DECLARE_BITMAP(expected, 10);
+	DECLARE_BITMAP(found, 10);
+
+	bitmap_zero(expected, 10);
+	bitmap_zero(found, 10);
+
+	/*
+	 *  anon_vma           root_vma
+	 *  +-----------+      +-----------+
+	 *  |           | ---> |           |
+	 *  +-----------+      +-----------+
+	 */
+
+	root_vma = alloc_vma(0x3000, 0x5000, 3);
+	/* First fault on parent anonymous vma. */
+	__anon_vma_prepare(root_vma);
+	ASSERT_NE(NULL, root_vma->anon_vma);
+	bitmap_set(expected, root_vma->index, 1);
+
+	/*
+	 *  anon_vma           root_vma
+	 *  +-----------+      +-----------+
+	 *  |           | ---> |           |
+	 *  +-----------+      +-----------+
+	 *                \
+	 *                 \   child_vma
+	 *                  \  +-----------+
+	 *                   > |           |
+	 *                     +-----------+
+	 */
+
+	child_vma = alloc_vma(0x3000, 0x5000, 3);
+	/* Forking links the child to the parent and may create its own anon_vma. */
+	anon_vma_fork(child_vma, root_vma);
+	ASSERT_NE(NULL, child_vma->anon_vma);
+	bitmap_set(expected, child_vma->index, 1);
+	/* Parent/Root is root_vma->anon_vma */
+	ASSERT_EQ(child_vma->anon_vma->parent, root_vma->anon_vma);
+	ASSERT_EQ(child_vma->anon_vma->root, root_vma->anon_vma);
+	dump_anon_vma_interval_tree(root_vma->anon_vma);
+
+	anon_vma_interval_tree_foreach(avc, &root_vma->anon_vma->rb_root, 3, 4) {
+		bitmap_set(found, avc->vma->index, 1);
+	}
+
+	/* Expect to find all vmas, including the forked one. */
+	ASSERT_TRUE(bitmap_equal(expected, found, 10));
+
+	cleanup();
+
+	ASSERT_EQ(0, nr_allocated);
+	return true;
+}
+
+static bool test_fork_two(void)
+{
+	struct vm_area_struct *root_vma, *vma1, *vma2;
+	struct anon_vma_chain *avc;
+	DECLARE_BITMAP(expected, 10);
+	DECLARE_BITMAP(found, 10);
+
+	bitmap_zero(expected, 10);
+	bitmap_zero(found, 10);
+
+	/*
+	 *  anon_vma           root_vma
+	 *  +-----------+      +-----------+
+	 *  |           | ---> |           |
+	 *  +-----------+      +-----------+
+	 */
+
+	root_vma = alloc_vma(0x3000, 0x5000, 3);
+	/* First fault on parent anonymous vma. */
+	__anon_vma_prepare(root_vma);
+	ASSERT_NE(NULL, root_vma->anon_vma);
+	bitmap_set(expected, root_vma->index, 1);
+
+	/* First fork */
+	/*
+	 *  anon_vma           root_vma
+	 *  +-----------+      +-----------+
+	 *  |           | ---> |           |
+	 *  +-----------+      +-----------+
+	 *                \
+	 *                 \   vma1
+	 *                  \  +-----------+
+	 *                   > |           |
+	 *                     +-----------+
+	 */
+	vma1 = alloc_vma(0x3000, 0x5000, 3);
+	anon_vma_fork(vma1, root_vma);
+	ASSERT_NE(NULL, vma1->anon_vma);
+	bitmap_set(expected, vma1->index, 1);
+	/* Parent/Root is root_vma->anon_vma */
+	ASSERT_EQ(vma1->anon_vma->parent, root_vma->anon_vma);
+	ASSERT_EQ(vma1->anon_vma->root, root_vma->anon_vma);
+
+	/* Second fork */
+	/*
+	 *  anon_vma           root_vma
+	 *  +-----------+      +-----------+
+	 *  |           | ---> |           |
+	 *  +-----------+      +-----------+
+	 *               \
+	 *                \------------------+
+	 *                 \   vma1           \   vma2
+	 *                  \  +-----------+   \  +-----------+
+	 *                   > |           |    > |           |
+	 *                     +-----------+      +-----------+
+	 */
+	vma2 = alloc_vma(0x3000, 0x5000, 3);
+	anon_vma_fork(vma2, root_vma);
+	ASSERT_NE(NULL, vma2->anon_vma);
+	bitmap_set(expected, vma2->index, 1);
+	/* Parent/Root is root_vma->anon_vma */
+	ASSERT_EQ(vma2->anon_vma->parent, root_vma->anon_vma);
+	ASSERT_EQ(vma2->anon_vma->root, root_vma->anon_vma);
+	dump_anon_vma_interval_tree(root_vma->anon_vma);
+
+	anon_vma_interval_tree_foreach(avc, &root_vma->anon_vma->rb_root, 3, 4) {
+		bitmap_set(found, avc->vma->index, 1);
+	}
+
+	/* Expect to find all vmas, including the forked one. */
+	ASSERT_TRUE(bitmap_equal(expected, found, 10));
+
+	/*
+	 *  vma1->anon_vma     vma1
+	 *  +-----------+      +-----------+
+	 *  |           | ---> |           |
+	 *  +-----------+      +-----------+
+	 */
+	anon_vma_interval_tree_foreach(avc, &vma1->anon_vma->rb_root, 3, 4) {
+		/* Expect to find only itself from its anon_vma interval tree */
+		ASSERT_EQ(avc->vma, vma1);
+	}
+
+	/*
+	 *  vma2->anon_vma     vma2
+	 *  +-----------+      +-----------+
+	 *  |           | ---> |           |
+	 *  +-----------+      +-----------+
+	 */
+	anon_vma_interval_tree_foreach(avc, &vma2->anon_vma->rb_root, 3, 4) {
+		/* Expect to find only itself from its anon_vma interval tree */
+		ASSERT_EQ(avc->vma, vma2);
+	}
+
+	cleanup();
+
+	ASSERT_EQ(0, nr_allocated);
+	return true;
+}
+
+static bool test_fork_grand_child(void)
+{
+	struct vm_area_struct *root_vma, *grand_vma, *vma1, *vma2;
+	struct anon_vma_chain *avc;
+	struct anon_vma *root_anon_vma;
+	DECLARE_BITMAP(expected, 10);
+	DECLARE_BITMAP(found, 10);
+
+	bitmap_zero(expected, 10);
+	bitmap_zero(found, 10);
+
+	/*
+	 *  root_anon_vma      root_vma
+	 *  +-----------+      +-----------+
+	 *  |           | ---> |           |
+	 *  +-----------+      +-----------+
+	 */
+
+	root_vma = alloc_vma(0x3000, 0x5000, 3);
+	/* First fault on parent anonymous vma. */
+	__anon_vma_prepare(root_vma);
+	root_anon_vma = root_vma->anon_vma;
+	ASSERT_NE(NULL, root_anon_vma);
+	bitmap_set(expected, root_vma->index, 1);
+
+	/* First fork */
+	/*
+	 *  root_anon_vma      root_vma
+	 *  +-----------+      +-----------+
+	 *  |           | ---> |           |
+	 *  +-----------+      +-----------+
+	 *                \
+	 *                 \   vma1
+	 *                  \  +-----------+
+	 *                   > |           |
+	 *                     +-----------+
+	 */
+	vma1 = alloc_vma(0x3000, 0x5000, 3);
+	anon_vma_fork(vma1, root_vma);
+	ASSERT_NE(NULL, vma1->anon_vma);
+	bitmap_set(expected, vma1->index, 1);
+	/* Parent/Root is root_vma->anon_vma */
+	ASSERT_EQ(vma1->anon_vma->parent, root_vma->anon_vma);
+	ASSERT_EQ(vma1->anon_vma->root, root_vma->anon_vma);
+
+	/* Second fork */
+	/*
+	 *  root_anon_vma      root_vma
+	 *  +-----------+      +-----------+
+	 *  |           | ---> |           |
+	 *  +-----------+      +-----------+
+	 *               \
+	 *                \------------------+
+	 *                 \   vma1           \   vma2
+	 *                  \  +-----------+   \  +-----------+
+	 *                   > |           |    > |           |
+	 *                     +-----------+      +-----------+
+	 */
+	vma2 = alloc_vma(0x3000, 0x5000, 3);
+	anon_vma_fork(vma2, root_vma);
+	ASSERT_NE(NULL, vma2->anon_vma);
+	bitmap_set(expected, vma2->index, 1);
+	/* Parent/Root is root_vma->anon_vma */
+	ASSERT_EQ(vma2->anon_vma->parent, root_vma->anon_vma);
+	ASSERT_EQ(vma2->anon_vma->root, root_vma->anon_vma);
+	dump_anon_vma_interval_tree(root_vma->anon_vma);
+
+	/* Fork grand child from second child */
+	/*
+	 *  root_anon_vma      root_vma
+	 *  +-----------+      +-----------+
+	 *  |           | ---> |           |
+	 *  +-----------+      +-----------+
+	 *               \
+	 *                \------------------+
+	 *                |\   vma1           \   vma2
+	 *                | \  +-----------+   \  +-----------+
+	 *                |  > |           |    > |           |
+	 *                |    +-----------+      +-----------+
+	 *                \
+	 *                 \   grand_vma
+	 *                  \  +-----------+
+	 *                   > |           |
+	 *                     +-----------+
+	 */
+	grand_vma = alloc_vma(0x3000, 0x5000, 3);
+	anon_vma_fork(grand_vma, vma2);
+	ASSERT_NE(NULL, grand_vma->anon_vma);
+	bitmap_set(expected, grand_vma->index, 1);
+	/* Root is root_vma->anon_vma */
+	ASSERT_EQ(grand_vma->anon_vma->root, root_vma->anon_vma);
+	/* Parent is vma2->anon_vma */
+	ASSERT_EQ(grand_vma->anon_vma->parent, vma2->anon_vma);
+
+	/* Expect to find only the vmas from the second fork. */
+	anon_vma_interval_tree_foreach(avc, &vma2->anon_vma->rb_root, 3, 4) {
+		ASSERT_TRUE(avc->vma == vma2 || avc->vma == grand_vma);
+	}
+
+	anon_vma_interval_tree_foreach(avc, &root_vma->anon_vma->rb_root, 3, 4) {
+		bitmap_set(found, avc->vma->index, 1);
+	}
+	/* Expect to find all vmas, including the child and grandchild. */
+	ASSERT_TRUE(bitmap_equal(expected, found, 10));
+
+	/* Root process exit or unmap root_vma. */
+	/*
+	 *  root_anon_vma
+	 *  +-----------+
+	 *  |           |
+	 *  +-----------+
+	 *               \
+	 *                \------------------+
+	 *                |\   vma1           \   vma2
+	 *                | \  +-----------+   \  +-----------+
+	 *                |  > |           |    > |           |
+	 *                |    +-----------+      +-----------+
+	 *                \
+	 *                 \   grand_vma
+	 *                  \  +-----------+
+	 *                   > |           |
+	 *                     +-----------+
+	 */
+	bitmap_clear(expected, root_vma->index, 1);
+	unlink_anon_vmas(root_vma);
+	ASSERT_EQ(0, root_anon_vma->num_active_vmas);
+
+	bitmap_zero(found, 10);
+	anon_vma_interval_tree_foreach(avc, &root_anon_vma->rb_root, 3, 4) {
+		bitmap_set(found, avc->vma->index, 1);
+	}
+	/* Expect to find all remaining vmas even after root_vma is released. */
+	ASSERT_TRUE(bitmap_equal(expected, found, 10));
+
+	cleanup();
+
+	ASSERT_EQ(0, nr_allocated);
+	return true;
+}
+
+int main(void)
+{
+	int num_tests = 0, num_fail = 0;
+
+	vma_cache_init();
+	anon_vma_init();
+
+#define TEST(name)							\
+	do {								\
+		num_tests++;						\
+		if (!test_##name()) {					\
+			num_fail++;					\
+			fprintf(stderr, "Test " #name " FAILED\n");	\
+		}							\
+	} while (0)
+
+	TEST(simple_fault);
+	TEST(simple_fork);
+	TEST(fork_two);
+	TEST(fork_grand_child);
+
+#undef TEST
+
+	printf("%d tests run, %d passed, %d failed.\n",
+	       num_tests, num_tests - num_fail, num_fail);
+
+	return num_fail == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
+}
diff --git a/tools/testing/anon_vma/anon_vma_internal.h b/tools/testing/anon_vma/anon_vma_internal.h
new file mode 100644
index 000000000000..296c1df71f7c
--- /dev/null
+++ b/tools/testing/anon_vma/anon_vma_internal.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef __MM_ANON_VMA_INTERNAL_H
+#define __MM_ANON_VMA_INTERNAL_H
+
+#define CONFIG_MMU
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mm_types.h>
+#include <linux/mm.h>
+#include <linux/bug.h>
+
+#define SLAB_TYPESAFE_BY_RCU	0
+#define SLAB_ACCOUNT		0
+
+static inline void might_sleep(void)
+{
+}
+
+static inline void mmap_assert_locked(struct mm_struct *mm)
+{
+}
+
+/*
+ * rwsem_is_locked() is only used in anon_vma_free() to synchronize against
+ * folio_lock_anon_vma_read(), which is not exercised by the current tests.
+ *
+ * So just return 0 for simplicity.
+ */
+static inline int rwsem_is_locked(struct rw_semaphore *sem)
+{
+	return 0;
+}
+
+static inline struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *vma)
+{
+	return NULL;
+}
+
+#ifndef pgoff_t
+#define pgoff_t unsigned long
+#endif
+static inline void vma_set_range(struct vm_area_struct *vma,
+					  unsigned long start, unsigned long end,
+					  pgoff_t pgoff)
+{
+	vma->vm_start = start;
+	vma->vm_end = end;
+	vma->vm_pgoff = pgoff;
+}
+
+void dump_anon_vma_interval_tree(struct anon_vma *anon_vma);
+extern int nr_allocated;
+#endif // __MM_ANON_VMA_INTERNAL_H
diff --git a/tools/testing/anon_vma/interval_tree.c b/tools/testing/anon_vma/interval_tree.c
new file mode 100644
index 000000000000..154c9ccbf3e1
--- /dev/null
+++ b/tools/testing/anon_vma/interval_tree.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include "../../../mm/interval_tree.c"
+
+#ifdef ANON_VMA_INTERVAL_TREE_DEBUG
+enum child_dir {
+	left_child,
+	right_child,
+	root_node
+};
+
+typedef void (*dump_node)(struct rb_node *node, int level);
+
+void dump_avc(struct rb_node *node, int level)
+{
+	struct anon_vma_chain *avc;
+
+	avc = rb_entry(node, struct anon_vma_chain, rb);
+
+	printf(" -%02d [%lu, %lu] %lu\n", avc->vma->index,
+		avc_start_pgoff(avc), avc_last_pgoff(avc),
+		avc->rb_subtree_last);
+}
+
+void dump_rb_tree(struct rb_node *node, int level,
+		  enum child_dir state, dump_node print)
+{
+	char prefix[40] = "                                        ";
+
+	if (!node)
+		return;
+
+	dump_rb_tree(node->rb_right, level+1, right_child, print);
+
+	if (state == left_child)
+		printf("%.*s|\n", min_t(int, level * 2 + 2, ARRAY_SIZE(prefix)), prefix);
+
+	printf("%02d%.*s", level, min_t(int, level * 2, ARRAY_SIZE(prefix)), prefix);
+	(*print)(node, level);
+
+	if (state == right_child)
+		printf("%.*s|\n", min_t(int, level * 2 + 2, ARRAY_SIZE(prefix)), prefix);
+
+	dump_rb_tree(node->rb_left, level+1, left_child, print);
+}
+
+void dump_anon_vma_interval_tree(struct anon_vma *anon_vma)
+{
+	dump_rb_tree(anon_vma->rb_root.rb_root.rb_node, 0, root_node, dump_avc);
+}
+#else
+void dump_anon_vma_interval_tree(struct anon_vma *anon_vma) { return; }
+#endif
diff --git a/tools/testing/anon_vma/linux/atomic.h b/tools/testing/anon_vma/linux/atomic.h
new file mode 100644
index 000000000000..b9f7c2e91f15
--- /dev/null
+++ b/tools/testing/anon_vma/linux/atomic.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef __TOOLS_LINUX_ATOMIC_H
+#define __TOOLS_LINUX_ATOMIC_H
+
+#include <urcu/uatomic.h>
+
+#define atomic_t int32_t
+#define atomic_inc(x) uatomic_inc(x)
+#define atomic_read(x) uatomic_read(x)
+#define atomic_set(x, y) uatomic_set(x, y)
+
+static inline bool atomic_dec_and_test(atomic_t *v)
+{
+	return uatomic_sub_return(v, 1) == 0;
+}
+
+#endif	/* __TOOLS_LINUX_ATOMIC_H */
diff --git a/tools/testing/anon_vma/linux/fs.h b/tools/testing/anon_vma/linux/fs.h
new file mode 100644
index 000000000000..9e9addd3310c
--- /dev/null
+++ b/tools/testing/anon_vma/linux/fs.h
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef __TOOLS_LINUX_FS_H
+#define __TOOLS_LINUX_FS_H
+
+#endif /* __TOOLS_LINUX_FS_H */
diff --git a/tools/testing/anon_vma/linux/mm.h b/tools/testing/anon_vma/linux/mm.h
new file mode 100644
index 000000000000..3bfe232f298a
--- /dev/null
+++ b/tools/testing/anon_vma/linux/mm.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef __TOOLS_LINUX_MM_H
+#define __TOOLS_LINUX_MM_H
+
+#include <linux/kernel.h>
+#include <linux/mm_types.h>
+#include <linux/interval_tree_generic.h>
+#include <linux/types.h>
+#include "../../include/linux/mm.h"
+
+#define VM_WARN_ON(_expr) (WARN_ON(_expr))
+#define VM_BUG_ON(_expr) (BUG_ON(_expr))
+#define VM_BUG_ON_VMA(_expr, _vma) (BUG_ON(_expr))
+
+static inline unsigned long vma_pages(struct vm_area_struct *vma)
+{
+	return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+}
+
+#include <linux/anon_vma.h>
+
+#define vma_interval_tree_foreach(vma, root, start, last)		\
+	for (vma = vma_interval_tree_iter_first(root, start, last);	\
+	     vma; vma = vma_interval_tree_iter_next(vma, start, last))
+
+void anon_vma_interval_tree_insert(struct anon_vma_chain *node,
+				   struct rb_root_cached *root);
+void anon_vma_interval_tree_remove(struct anon_vma_chain *node,
+				   struct rb_root_cached *root);
+struct anon_vma_chain *
+anon_vma_interval_tree_iter_first(struct rb_root_cached *root,
+				  unsigned long start, unsigned long last);
+struct anon_vma_chain *anon_vma_interval_tree_iter_next(
+	struct anon_vma_chain *node, unsigned long start, unsigned long last);
+#ifdef CONFIG_DEBUG_VM_RB
+void anon_vma_interval_tree_verify(struct anon_vma_chain *node);
+#endif
+
+#define anon_vma_interval_tree_foreach(avc, root, start, last)		 \
+	for (avc = anon_vma_interval_tree_iter_first(root, start, last); \
+	     avc; avc = anon_vma_interval_tree_iter_next(avc, start, last))
+
+#endif /* __TOOLS_LINUX_MM_H */
diff --git a/tools/testing/anon_vma/linux/mm_types.h b/tools/testing/anon_vma/linux/mm_types.h
new file mode 100644
index 000000000000..9cfbedecdc35
--- /dev/null
+++ b/tools/testing/anon_vma/linux/mm_types.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef __TOOLS_LINUX_MM_TYPES_H
+#define __TOOLS_LINUX_MM_TYPES_H
+
+#include <linux/list.h>
+#include <linux/rbtree.h>
+#include <linux/slab.h>
+#include <linux/rwsem.h>
+#include <linux/spinlock.h>
+#include <linux/atomic.h>
+
+struct mm_struct {
+	spinlock_t page_table_lock;
+};
+
+struct vm_area_struct {
+	/* For test purpose */
+	int	index;
+	/* VMA covers [vm_start; vm_end) addresses within mm */
+	unsigned long vm_start;
+	unsigned long vm_end;
+
+	/* The address space we belong to. */
+	struct mm_struct *vm_mm;
+
+	/*
+	 * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
+	 * list, after a COW of one of the file pages.	A MAP_SHARED vma
+	 * can only be in the i_mmap tree.  An anonymous MAP_PRIVATE, stack
+	 * or brk vma (with NULL file) can only be in an anon_vma list.
+	 */
+	struct list_head anon_vma_chain; /* Serialized by mmap_lock &
+					  * page_table_lock */
+	struct anon_vma *anon_vma;	/* Serialized by page_table_lock */
+	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
+					   units */
+	/*
+	 * For areas with an address space and backing store,
+	 * linkage into the address_space->i_mmap interval tree.
+	 *
+	 */
+	struct {
+		struct rb_node rb;
+		unsigned long rb_subtree_last;
+	} shared;
+#ifdef CONFIG_ANON_VMA_NAME
+	/*
+	 * For private and shared anonymous mappings, a pointer to a null
+	 * terminated string containing the name given to the vma, or NULL if
+	 * unnamed. Serialized by mmap_lock. Use anon_vma_name to access.
+	 */
+	struct anon_vma_name *anon_name;
+#endif
+};
+
+#endif	/* __TOOLS_LINUX_MM_TYPES_H */
diff --git a/tools/testing/anon_vma/linux/mmzone.h b/tools/testing/anon_vma/linux/mmzone.h
new file mode 100644
index 000000000000..e6be2d214777
--- /dev/null
+++ b/tools/testing/anon_vma/linux/mmzone.h
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef __TOOLS_LINUX_MMZONE_H
+#define __TOOLS_LINUX_MMZONE_H
+
+#endif /* __TOOLS_LINUX_MMZONE_H */
diff --git a/tools/testing/anon_vma/linux/rmap.h b/tools/testing/anon_vma/linux/rmap.h
new file mode 100644
index 000000000000..d29511ae0294
--- /dev/null
+++ b/tools/testing/anon_vma/linux/rmap.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef __TOOLS_LINUX_RMAP_H
+#define __TOOLS_LINUX_RMAP_H
+
+#include <linux/anon_vma.h>
+
+#endif /* __TOOLS_LINUX_RMAP_H */
diff --git a/tools/testing/shared/linux/anon_vma.h b/tools/testing/shared/linux/anon_vma.h
new file mode 100644
index 000000000000..a5bac325402a
--- /dev/null
+++ b/tools/testing/shared/linux/anon_vma.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _TEST_ANON_VMA_H
+#define _TEST_ANON_VMA_H
+
+#include "../../../../include/linux/anon_vma.h"
+
+#endif /* _TEST_ANON_VMA_H */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC Patch 3/5] anon_vma: add test for mergeable anon_vma
  2025-04-29  9:06 [RFC Patch 0/5] Make anon_vma operations testable Wei Yang
  2025-04-29  9:06 ` [RFC Patch 1/5] mm: move anon_vma manipulation functions to own file Wei Yang
  2025-04-29  9:06 ` [RFC Patch 2/5] anon_vma: add skeleton code for userland testing of anon_vma logic Wei Yang
@ 2025-04-29  9:06 ` Wei Yang
  2025-04-29  9:06 ` [RFC Patch 4/5] anon_vma: add test for reusable anon_vma Wei Yang
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 29+ messages in thread
From: Wei Yang @ 2025-04-29  9:06 UTC (permalink / raw)
  To: akpm
  Cc: david, lorenzo.stoakes, riel, vbabka, harry.yoo, jannh, baohua,
	linux-mm, Wei Yang

Add a test to assert that an anon_vma is mergeable at the first level.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Barry Song <baohua@kernel.org>
---
 tools/testing/anon_vma/anon_vma.c          | 149 +++++++++++++++++++++
 tools/testing/anon_vma/anon_vma_internal.h |  35 ++++-
 2 files changed, 183 insertions(+), 1 deletion(-)

diff --git a/tools/testing/anon_vma/anon_vma.c b/tools/testing/anon_vma/anon_vma.c
index f3ca193857ec..2e6a1200e6c7 100644
--- a/tools/testing/anon_vma/anon_vma.c
+++ b/tools/testing/anon_vma/anon_vma.c
@@ -29,6 +29,7 @@ static struct mm_struct dummy_mm = {
 static int vmas_idx;
 static struct vm_area_struct *vmas[NUM_VMAS];
 static struct kmem_cache *vm_area_cachep;
+struct vm_area_struct *mergeable_vma;
 
 static void vma_ctor(void *data)
 {
@@ -79,6 +80,7 @@ void cleanup(void)
 	}
 
 	vmas_idx = 0;
+	mergeable_vma = NULL;
 }
 
 static bool test_simple_fault(void)
@@ -408,6 +410,152 @@ static bool test_fork_grand_child(void)
 	return true;
 }
 
+static bool test_mergeable_vma(void)
+{
+	struct vm_area_struct *root_vma, *child_vma, *vma1, *vma2;
+	struct anon_vma_chain *avc;
+	struct anon_vma *root_anon_vma;
+	DECLARE_BITMAP(expected, 10);
+	DECLARE_BITMAP(found, 10);
+
+	bitmap_zero(expected, 10);
+	bitmap_zero(found, 10);
+
+	/*
+	 *  root_anon_vma
+	 *  +-----------+
+	 *  |           |
+	 *  +-----------+
+	 *               \
+	 *                \
+	 *                 \   root_vma
+	 *                  \  +-----------+
+	 *                   > |           |
+	 *                     +-----------+
+	 */
+	root_vma = alloc_vma(0x3000, 0x5000, 3);
+	/* First fault on anonymous vma. */
+	__anon_vma_prepare(root_vma);
+	bitmap_set(expected, root_vma->index, 1);
+	root_anon_vma = root_vma->anon_vma;
+	ASSERT_NE(NULL, root_anon_vma);
+	ASSERT_EQ(1, root_anon_vma->num_active_vmas);
+
+	mergeable_vma = root_vma;
+
+	vma1 = alloc_vma(0x5000, 0x7000, 5);
+	/* First fault on next adjacent anonymous vma. */
+	/*
+	 *  root_anon_vma
+	 *  +-----------+
+	 *  |           |
+	 *  +-----------+
+	 *               \
+	 *                \------------------+
+	 *                 \   root_vma       \   vma1
+	 *                  \  +-----------+   \  +-----------+
+	 *                   > |           |    > |           |
+	 *                     +-----------+      +-----------+
+	 */
+	__anon_vma_prepare(vma1);
+	bitmap_set(expected, vma1->index, 1);
+	ASSERT_EQ(vma1->anon_vma, root_anon_vma);
+	ASSERT_EQ(2, root_anon_vma->num_active_vmas);
+
+	vma2 = alloc_vma(0x2000, 0x3000, 2);
+	/* First fault on previous adjacent anonymous vma. */
+	/*
+	 *  root_anon_vma
+	 *  +-----------+
+	 *  |           |
+	 *  +-----------+
+	 *               \
+	 *                \------------------+------------------+
+	 *                 \   vma2           \   root_vma       \   vma1
+	 *                  \  +-----------+   \  +-----------+   \  +-----------+
+	 *                   > |           |    > |           |    > |           |
+	 *                     +-----------+      +-----------+      +-----------+
+	 */
+	__anon_vma_prepare(vma2);
+	bitmap_set(expected, vma2->index, 1);
+	ASSERT_EQ(vma2->anon_vma, root_anon_vma);
+	ASSERT_EQ(3, root_anon_vma->num_active_vmas);
+	dump_anon_vma_interval_tree(root_anon_vma);
+
+	anon_vma_interval_tree_foreach(avc, &root_anon_vma->rb_root, 2, 7) {
+		bitmap_set(found, avc->vma->index, 1);
+	}
+	/* Expect to find all vmas in range [2, 7] */
+	ASSERT_TRUE(bitmap_equal(expected, found, 10));
+
+	/* Expect to find only root_vma in range [3, 4] */
+	anon_vma_interval_tree_foreach(avc, &root_anon_vma->rb_root, 3, 4) {
+		ASSERT_EQ(avc->vma, root_vma);
+	}
+
+	/* unmap adjacent vmas before fork */
+	unlink_anon_vmas(vma1);
+	unlink_anon_vmas(vma2);
+	ASSERT_EQ(1, root_anon_vma->num_active_vmas);
+
+	/* Fork a child from root_vma */
+	/*
+	 *  root_anon_vma      root_vma
+	 *  +-----------+      +-----------+
+	 *  |           | ---> |           |
+	 *  +-----------+      +-----------+
+	 *                \
+	 *                 \   child_vma
+	 *                  \  +-----------+
+	 *                   > |           |
+	 *                     +-----------+
+	 */
+	child_vma = alloc_vma(0x3000, 0x5000, 3);
+	anon_vma_fork(child_vma, root_vma);
+	ASSERT_NE(NULL, child_vma->anon_vma);
+	/* Parent/Root is root_vma->anon_vma */
+	ASSERT_EQ(child_vma->anon_vma->parent, root_vma->anon_vma);
+	ASSERT_EQ(child_vma->anon_vma->root, root_vma->anon_vma);
+
+	mergeable_vma = child_vma;
+
+	vma1 = alloc_vma(0x5000, 0x7000, 5);
+	/* First fault on next adjacent anonymous vma in child. */
+	/*
+	 *  vma1->anon_vma     vma1
+	 *  +-----------+      +-----------+
+	 *  |           | ---> |           |
+	 *  +-----------+      +-----------+
+	 */
+	__anon_vma_prepare(vma1);
+	ASSERT_NE(vma1->anon_vma, child_vma->anon_vma);
+	ASSERT_EQ(1, child_vma->anon_vma->num_active_vmas);
+	ASSERT_EQ(1, root_anon_vma->num_active_vmas);
+
+	vma2 = alloc_vma(0x2000, 0x3000, 2);
+	/* First fault on previous adjacent anonymous vma in child. */
+	/*
+	 *  vma2->anon_vma     vma2
+	 *  +-----------+      +-----------+
+	 *  |           | ---> |           |
+	 *  +-----------+      +-----------+
+	 */
+	__anon_vma_prepare(vma2);
+	ASSERT_NE(vma2->anon_vma, child_vma->anon_vma);
+	ASSERT_EQ(1, child_vma->anon_vma->num_active_vmas);
+	ASSERT_EQ(1, root_anon_vma->num_active_vmas);
+
+	/* Expect to find only 'child_vma' in range [2, 7] */
+	anon_vma_interval_tree_foreach(avc, &child_vma->anon_vma->rb_root, 2, 7) {
+		ASSERT_EQ(avc->vma, child_vma);
+	}
+
+	cleanup();
+
+	ASSERT_EQ(0, nr_allocated);
+	return true;
+}
+
 int main(void)
 {
 	int num_tests = 0, num_fail = 0;
@@ -428,6 +576,7 @@ int main(void)
 	TEST(simple_fork);
 	TEST(fork_two);
 	TEST(fork_grand_child);
+	TEST(mergeable_vma);
 
 #undef TEST
 
diff --git a/tools/testing/anon_vma/anon_vma_internal.h b/tools/testing/anon_vma/anon_vma_internal.h
index 296c1df71f7c..a761048a2f34 100644
--- a/tools/testing/anon_vma/anon_vma_internal.h
+++ b/tools/testing/anon_vma/anon_vma_internal.h
@@ -33,11 +33,44 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
 	return 0;
 }
 
-static inline struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *vma)
+extern struct vm_area_struct *mergeable_vma;
+
+static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct *b)
 {
+	return a->vm_end == b->vm_start &&
+		b->vm_pgoff == a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT);
+}
+
+static struct anon_vma *reusable_anon_vma(struct vm_area_struct *old,
+					  struct vm_area_struct *a,
+					  struct vm_area_struct *b)
+{
+	if (anon_vma_compatible(a, b)) {
+		struct anon_vma *anon_vma = READ_ONCE(old->anon_vma);
+
+		if (anon_vma && list_is_singular(&old->anon_vma_chain))
+			return anon_vma;
+	}
 	return NULL;
 }
 
+static inline struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *vma)
+{
+	struct anon_vma *anon_vma = NULL;
+
+	if (!mergeable_vma)
+		return NULL;
+
+	/* Try next first. */
+	if (mergeable_vma->vm_start >= vma->vm_end) {
+		anon_vma = reusable_anon_vma(mergeable_vma, vma, mergeable_vma);
+		if (anon_vma)
+			return anon_vma;
+	}
+
+	return reusable_anon_vma(mergeable_vma, mergeable_vma, vma);
+}
+
 #ifndef pgoff_t
 #define pgoff_t unsigned long
 #endif
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC Patch 4/5] anon_vma: add test for reusable anon_vma
  2025-04-29  9:06 [RFC Patch 0/5] Make anon_vma operations testable Wei Yang
                   ` (2 preceding siblings ...)
  2025-04-29  9:06 ` [RFC Patch 3/5] anon_vma: add test for mergeable anon_vma Wei Yang
@ 2025-04-29  9:06 ` Wei Yang
  2025-04-29  9:06 ` [RFC Patch 5/5] anon_vma: add test to assert no double-reuse Wei Yang
  2025-04-29  9:31 ` [RFC Patch 0/5] Make anon_vma operations testable Lorenzo Stoakes
  5 siblings, 0 replies; 29+ messages in thread
From: Wei Yang @ 2025-04-29  9:06 UTC (permalink / raw)
  To: akpm
  Cc: david, lorenzo.stoakes, riel, vbabka, harry.yoo, jannh, baohua,
	linux-mm, Wei Yang

Add a test to assert that an anon_vma with no remaining active vmas is
reusable, except for the root anon_vma.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Harry Yoo <harry.yoo@oracle.com>
---
 tools/testing/anon_vma/anon_vma.c | 141 ++++++++++++++++++++++++++++++
 1 file changed, 141 insertions(+)

diff --git a/tools/testing/anon_vma/anon_vma.c b/tools/testing/anon_vma/anon_vma.c
index 2e6a1200e6c7..495cd02ea661 100644
--- a/tools/testing/anon_vma/anon_vma.c
+++ b/tools/testing/anon_vma/anon_vma.c
@@ -556,6 +556,146 @@ static bool test_mergeable_vma(void)
 	return true;
 }
 
+static bool test_reuse_anon_vma(void)
+{
+	struct vm_area_struct *root_vma, *vma, *vma1, *vma2;
+	struct anon_vma *root_anon_vma, *reused_anon_vma;
+	struct anon_vma_chain *avc;
+
+	/*
+	 *  root_anon_vma      root_vma
+	 *  +-----------+      +-----------+
+	 *  |           | ---> |         av| = root_anon_vma
+	 *  +-----------+      +-----------+
+	 */
+	root_vma = alloc_vma(0x3000, 0x5000, 3);
+	__anon_vma_prepare(root_vma);
+	root_anon_vma = root_vma->anon_vma;
+	ASSERT_NE(NULL, root_anon_vma);
+	ASSERT_EQ(1, root_anon_vma->num_active_vmas);
+
+	/* First fork */
+	/*
+	 *  root_anon_vma      root_vma
+	 *  +-----------+      +-----------+
+	 *  |           | ---> |         av| = root_anon_vma
+	 *  +-----------+      +-----------+
+	 *                \
+	 *                 \   vma
+	 *                  \  +-----------+
+	 *                   > |         av| != root_anon_vma
+	 *                     +-----------+
+	 */
+	vma = alloc_vma(0x3000, 0x5000, 3);
+	anon_vma_fork(vma, root_vma);
+	ASSERT_NE(NULL, vma->anon_vma);
+	/* Parent/Root is root_vma->anon_vma */
+	ASSERT_EQ(vma->anon_vma->parent, root_vma->anon_vma);
+	ASSERT_EQ(vma->anon_vma->root, root_vma->anon_vma);
+
+	/* unlink the root */
+	/*
+	 *  root_anon_vma
+	 *  +-----------+
+	 *  |           |
+	 *  +-----------+
+	 *                \
+	 *                 \   vma
+	 *                  \  +-----------+
+	 *                   > |         av| != root_anon_vma
+	 *                     +-----------+
+	 */
+	unlink_anon_vmas(root_vma);
+	ASSERT_EQ(0, root_anon_vma->num_active_vmas);
+
+	/* Fork grand child from vma */
+	/*
+	 *  root_anon_vma
+	 *  +-----------+
+	 *  |           |
+	 *  +-----------+
+	 *                \
+	 *                |\   vma
+	 *                | \  +-----------+
+	 *                |  > |         av| != root_anon_vma
+	 *                |    +-----------+
+	 *                \
+	 *                 \   vma1
+	 *                  \  +-----------+
+	 *                   > |         av| != root_anon_vma
+	 *                     +-----------+
+	 */
+	vma1 = alloc_vma(0x3000, 0x5000, 3);
+	anon_vma_fork(vma1, vma);
+	ASSERT_NE(NULL, vma1->anon_vma);
+	/* Root is root_anon_vma */
+	ASSERT_EQ(vma1->anon_vma->root, root_anon_vma);
+	/* Parent is vma1->anon_vma */
+	ASSERT_EQ(vma1->anon_vma->parent, vma->anon_vma);
+	/* vma1->anon_vma != root_anon_vma, since we don't reuse root */
+	ASSERT_NE(vma1->anon_vma, root_anon_vma);
+
+	/* unlink vma */
+	/*
+	 *  root_anon_vma
+	 *  +-----------+
+	 *  |           |
+	 *  +-----------+
+	 *                \
+	 *                |
+	 *                \
+	 *                 \   vma1
+	 *                  \  +-----------+
+	 *                   > |         av| != root_anon_vma
+	 *                     +-----------+
+	 */
+	reused_anon_vma = vma->anon_vma;
+	unlink_anon_vmas(vma);
+	ASSERT_EQ(0, reused_anon_vma->num_active_vmas);
+
+	/* Fork from vma1 */
+	/*
+	 *  root_anon_vma
+	 *  +-----------+
+	 *  |           |
+	 *  +-----------+
+	 *                \
+	 *                |
+	 *                \
+	 *                |\   vma1
+	 *                | \  +-----------+
+	 *                |  > |         av| != root_anon_vma
+	 *                |    +-----------+
+	 *                \
+	 *                 \   vma2
+	 *                  \  +-----------+
+	 *                   > |         av| == reused_anon_vma
+	 *                     +-----------+
+	 */
+	vma2 = alloc_vma(0x3000, 0x5000, 3);
+	anon_vma_fork(vma2, vma1);
+	ASSERT_NE(NULL, vma2->anon_vma);
+	/* Root is root_vma->anon_vma */
+	ASSERT_EQ(vma2->anon_vma->root, root_anon_vma);
+	/* vma->anon_vma (reused_anon_vma) is reused here */
+	ASSERT_EQ(vma2->anon_vma, reused_anon_vma);
+
+	/* Expect to find vma1 and vma2 in reused_anon_vma */
+	anon_vma_interval_tree_foreach(avc, &reused_anon_vma->rb_root, 3, 4) {
+		ASSERT_TRUE(avc->vma == vma1 || avc->vma == vma2);
+	}
+
+	/* Expect to find vma1 and vma2 in root_anon_vma */
+	anon_vma_interval_tree_foreach(avc, &root_anon_vma->rb_root, 3, 4) {
+		ASSERT_TRUE(avc->vma == vma1 || avc->vma == vma2);
+	}
+
+	cleanup();
+
+	ASSERT_EQ(0, nr_allocated);
+	return true;
+}
+
 int main(void)
 {
 	int num_tests = 0, num_fail = 0;
@@ -577,6 +717,7 @@ int main(void)
 	TEST(fork_two);
 	TEST(fork_grand_child);
 	TEST(mergeable_vma);
+	TEST(reuse_anon_vma);
 
 #undef TEST
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC Patch 5/5] anon_vma: add test to assert no double-reuse
  2025-04-29  9:06 [RFC Patch 0/5] Make anon_vma operations testable Wei Yang
                   ` (3 preceding siblings ...)
  2025-04-29  9:06 ` [RFC Patch 4/5] anon_vma: add test for reusable anon_vma Wei Yang
@ 2025-04-29  9:06 ` Wei Yang
  2025-04-29  9:31 ` [RFC Patch 0/5] Make anon_vma operations testable Lorenzo Stoakes
  5 siblings, 0 replies; 29+ messages in thread
From: Wei Yang @ 2025-04-29  9:06 UTC (permalink / raw)
  To: akpm
  Cc: david, lorenzo.stoakes, riel, vbabka, harry.yoo, jannh, baohua,
	linux-mm, Wei Yang

commit 2555283eb40d ("mm/rmap: Fix anon_vma->degree ambiguity leading to
double-reuse") fixed the anon_vma double-reuse issue introduced by commit
7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy").

Add a test case to assert that the double-reuse no longer occurs.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Harry Yoo <harry.yoo@oracle.com>
---
 tools/testing/anon_vma/anon_vma.c | 47 ++++++++++++++++++++++++++++++-
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/tools/testing/anon_vma/anon_vma.c b/tools/testing/anon_vma/anon_vma.c
index 495cd02ea661..633a62f5acba 100644
--- a/tools/testing/anon_vma/anon_vma.c
+++ b/tools/testing/anon_vma/anon_vma.c
@@ -558,7 +558,7 @@ static bool test_mergeable_vma(void)
 
 static bool test_reuse_anon_vma(void)
 {
-	struct vm_area_struct *root_vma, *vma, *vma1, *vma2;
+	struct vm_area_struct *root_vma, *vma, *vma1, *vma2, *vma3;
 	struct anon_vma *root_anon_vma, *reused_anon_vma;
 	struct anon_vma_chain *avc;
 
@@ -690,6 +690,51 @@ static bool test_reuse_anon_vma(void)
 		ASSERT_TRUE(avc->vma == vma1 || avc->vma == vma2);
 	}
 
+	/* Fork from vma2 */
+	/*
+	 * When commit 7a3ef208e662 ("mm: prevent endless growth of anon_vma
+	 * hierarchy") introduced anon_vma reuse, it embedded a double-reuse
+	 * issue.
+	 *
+	 * It happens when vma2 reuses an existing anon_vma and we fork from
+	 * vma2 later. Before commit 2555283eb40d ("mm/rmap: Fix
+	 * anon_vma->degree ambiguity leading to double-reuse"), the forked
+	 * vma3 would reuse vma1->anon_vma, which is already in use.
+	 *
+	 * The following case asserts that vma1->anon_vma is not double-reused.
+	 *
+	 *  root_anon_vma
+	 *  +-----------+
+	 *  |           |
+	 *  +-----------+
+	 *                \
+	 *                |
+	 *                \
+	 *                |\   vma1
+	 *                | \  +-----------+
+	 *                |  > |         av| != root_anon_vma
+	 *                |    +-----------+
+	 *                \
+	 *                |\   vma2
+	 *                | \  +-----------+
+	 *                |  > |         av| == reused_anon_vma
+	 *                |    +-----------+
+	 *                \
+	 *                 \   vma3
+	 *                  \  +-----------+
+	 *                   > |         av| != vma1->anon_vma
+	 *                     +-----------+
+	 */
+	/* vma1->anon_vma already has an active vma */
+	ASSERT_NE(NULL, vma1->anon_vma);
+	vma3 = alloc_vma(0x3000, 0x5000, 3);
+	anon_vma_fork(vma3, vma2);
+	ASSERT_NE(NULL, vma3->anon_vma);
+	/* Root is root_vma->anon_vma */
+	ASSERT_EQ(vma3->anon_vma->root, root_anon_vma);
+	/* vma1->anon_vma is NOT reused here */
+	ASSERT_NE(vma3->anon_vma, vma1->anon_vma);
+
 	cleanup();
 
 	ASSERT_EQ(0, nr_allocated);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/5] Make anon_vma operations testable
  2025-04-29  9:06 [RFC Patch 0/5] Make anon_vma operations testable Wei Yang
                   ` (4 preceding siblings ...)
  2025-04-29  9:06 ` [RFC Patch 5/5] anon_vma: add test to assert no double-reuse Wei Yang
@ 2025-04-29  9:31 ` Lorenzo Stoakes
  2025-04-29  9:38   ` David Hildenbrand
  2025-04-29 23:15   ` Wei Yang
  5 siblings, 2 replies; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-04-29  9:31 UTC (permalink / raw)
  To: Wei Yang; +Cc: akpm, david, riel, vbabka, harry.yoo, jannh, baohua, linux-mm

Wei,

NACK the whole series.

I'm really not sure how to get through to you. You were _explicitly_
advised not to send this series. And yet you've sent it anyway.

I mean, I appreciate your enthusiasm and the fact you've made tests here
etc. obviously. And you've clearly put a TON of work in. But I just don't
know why you would when explicitly told not to without at least discussing
it first?

This just isn't a great way of interacting with the community. We're all
human, please try to have some empathy for others here, as I really do try
to have with you as best I can.

This adds a ton of churn and LOCKS IN assumptions about how anon_vma works,
clashes with other series (most notably series I've been working on), takes
away from efforts I want to make to start to join file-backed and anon
reverse mapping logic, separates the two in such a way as to encourage this
to only grow and generally isn't conducive to where I want to go with
rmap.

This is part of why I explicitly told you please don't go down this road,
because you're likely to end up doing work that doesn't get used. It's not
a great use of your time either.

Since there's something useful here in tests, I may at a later date come
back to those.

So in order for it not to be an _entirely_ wasted effort, I will come back
to this later when the time is right and progress is made with rmap, and
see if we can extract some value from the testing.

Lorenzo

On Tue, Apr 29, 2025 at 09:06:34AM +0000, Wei Yang wrote:
> There are several anon_vma manipulation functions implemented in mm/rmap.c,
> those concerning anon_vma preparing, cloning and forking, which logically
> could be stand-alone.
>
> This patch series isolates anon_vma manipulation functionality into its own
> file, mm/anon_vma.c, and provides an API to the rest of the kernel in
> include/linux/anon_vma.h.
>
> It also introduce mm/anon_vma_internal.h, which specifies which headers need
> to be imported by anon_vma.c, leading to the very useful property that
> anon_vma.c depends only on include/linux/anon_vma.h and
> mm/anon_vma_internal.h.
>
> This means we can then re-implement anon_vma_internal.h in userland, adding
> shims for kernel mechanisms as required, allowing us to unit test internal
> anon_vma functionality.
>
> This patch series takes advantage of existing shim logic and full userland
> interval tree support contained in tools/testing/rbtree/ and
> tools/include/linux/.
>
> Kernel functionality is stubbed and shimmed as needed in
> tools/testing/anon_vma/ which contains a fully functional userland
> anon_vma_internal.h file and which imports mm/anon_vma.c to be directly tested
> from userland.
>
> Patch 1 split anon_vma related logic to mm/anon_vma.c
> Patch 2 add a simple skeleton testing on simple fault and fork
> Patch 3/4 add tests for mergeable and reusable anon_vma
> Patch 5 assert the anon_vma double-reuse is fixed
>
> Wei Yang (5):
>   mm: move anon_vma manipulation functions to own file
>   anon_vma: add skeleton code for userland testing of anon_vma logic
>   anon_vma: add test for mergeable anon_vma
>   anon_vma: add test for reusable anon_vma
>   anon_vma: add test to assert no double-reuse
>
>  MAINTAINERS                                |   3 +
>  include/linux/anon_vma.h                   | 163 +++++
>  include/linux/rmap.h                       | 147 +---
>  mm/Makefile                                |   2 +-
>  mm/anon_vma.c                              | 396 +++++++++++
>  mm/anon_vma_internal.h                     |  14 +
>  mm/rmap.c                                  | 391 -----------
>  tools/include/linux/rwsem.h                |  10 +
>  tools/include/linux/slab.h                 |   4 +
>  tools/testing/anon_vma/.gitignore          |   3 +
>  tools/testing/anon_vma/Makefile            |  25 +
>  tools/testing/anon_vma/anon_vma.c          | 773 +++++++++++++++++++++
>  tools/testing/anon_vma/anon_vma_internal.h |  88 +++
>  tools/testing/anon_vma/interval_tree.c     |  53 ++
>  tools/testing/anon_vma/linux/atomic.h      |  18 +
>  tools/testing/anon_vma/linux/fs.h          |   6 +
>  tools/testing/anon_vma/linux/mm.h          |  44 ++
>  tools/testing/anon_vma/linux/mm_types.h    |  57 ++
>  tools/testing/anon_vma/linux/mmzone.h      |   6 +
>  tools/testing/anon_vma/linux/rmap.h        |   8 +
>  tools/testing/shared/linux/anon_vma.h      |   7 +
>  21 files changed, 1680 insertions(+), 538 deletions(-)
>  create mode 100644 include/linux/anon_vma.h
>  create mode 100644 mm/anon_vma.c
>  create mode 100644 mm/anon_vma_internal.h
>  create mode 100644 tools/testing/anon_vma/.gitignore
>  create mode 100644 tools/testing/anon_vma/Makefile
>  create mode 100644 tools/testing/anon_vma/anon_vma.c
>  create mode 100644 tools/testing/anon_vma/anon_vma_internal.h
>  create mode 100644 tools/testing/anon_vma/interval_tree.c
>  create mode 100644 tools/testing/anon_vma/linux/atomic.h
>  create mode 100644 tools/testing/anon_vma/linux/fs.h
>  create mode 100644 tools/testing/anon_vma/linux/mm.h
>  create mode 100644 tools/testing/anon_vma/linux/mm_types.h
>  create mode 100644 tools/testing/anon_vma/linux/mmzone.h
>  create mode 100644 tools/testing/anon_vma/linux/rmap.h
>  create mode 100644 tools/testing/shared/linux/anon_vma.h
>
> --
> 2.34.1
>
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/5] Make anon_vma operations testable
  2025-04-29  9:31 ` [RFC Patch 0/5] Make anon_vma operations testable Lorenzo Stoakes
@ 2025-04-29  9:38   ` David Hildenbrand
  2025-04-29  9:41     ` Lorenzo Stoakes
  2025-04-29 23:15   ` Wei Yang
  1 sibling, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2025-04-29  9:38 UTC (permalink / raw)
  To: Lorenzo Stoakes, Wei Yang
  Cc: akpm, riel, vbabka, harry.yoo, jannh, baohua, linux-mm

On 29.04.25 11:31, Lorenzo Stoakes wrote:
> Wei,
> 
> NACK the whole series.
> 
> I'm really not sure how to get through to you. You were _explicitly_
> advised not to send this series. And yet you've sent it anyway.
> 
> I mean, I appreciate your enthusiasm and the fact you've made tests here
> etc. obviously. And you've clearly put a TON of work in. But I just don't
> know why you would when explicitly told not to without at least discussing
> it first?
> 
> This just isn't a great way of interacting with the community. We're all
> human, please try to have some empathy for others here, as I really do try
> to have with you as best I can.
> 
> This adds a ton of churn and LOCKS IN assumptions about how anon_vma works,
> clashes with other series (most notably series I've been working on), takes
> away from efforts I want to make to start to join file-backed and anon
> reverse mapping logic, separates the two in such a way as to encourage this
> to only grow and generally isn't conducive to where I want to go with
> rmap.

anon_vma, the unloved child. :)

I would love to see a simplification that makes it less special, and I 
can understand how adding tests for the ways it is special can be 
counter-productive.

> 
> This is part of why I explicitly told you please don't go down this road,
> because you're likely to end up doing work that doesn't get used. It's not
> a great use of your time either.
> 
> Since there's something useful here in tests, I may at a later date come
> back to those.

Agreed, skimming over the tests there are some nice diagrams and cases.

But I would hope that for most of these cases we could test on a higher 
level: test our expectations when running real programs that we want to 
check, especially when performing internal changes on how we handle anon 
memory + rmap.

E.g., do fork(), then test if we can successfully perform rmap 
lookups/updates (e.g., migrate folio to a different numa node etc).

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/5] Make anon_vma operations testable
  2025-04-29  9:38   ` David Hildenbrand
@ 2025-04-29  9:41     ` Lorenzo Stoakes
  2025-04-29 23:56       ` Wei Yang
  0 siblings, 1 reply; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-04-29  9:41 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, akpm, riel, vbabka, harry.yoo, jannh, baohua, linux-mm

On Tue, Apr 29, 2025 at 11:38:23AM +0200, David Hildenbrand wrote:
> On 29.04.25 11:31, Lorenzo Stoakes wrote:
> > Wei,
> >
> > NACK the whole series.
> >
> > I'm really not sure how to get through to you. You were _explicitly_
> > advised not to send this series. And yet you've sent it anyway.
> >
> > I mean, I appreciate your enthusiasm and the fact you've made tests here
> > etc. obviously. And you've clearly put a TON of work in. But I just don't
> > know why you would when explicitly told not to without at least discussing
> > it first?
> >
> > This just isn't a great way of interacting with the community. We're all
> > human, please try to have some empathy for others here, as I really do try
> > to have with you as best I can.
> >
> > This adds a ton of churn and LOCKS IN assumptions about how anon_vma works,
> > clashes with other series (most notably series I've been working on), takes
> > away from efforts I want to make to start to join file-backed and anon
> > reverse mapping logic, separates the two in such a way as to encourage this
> > to only grow and generally isn't conducive to where I want to go with
> > rmap.
>
> anon_vma, the unloved child. :)
>
> I would love to see a simplification that makes it less special, and I can
> understand how adding tests for the ways it is special can be
> counter-productive.
>
> >
> > This is part of why I explicitly told you please don't go down this road,
> > because you're likely to end up doing work that doesn't get used. It's not
> > a great use of your time either.
> >
> > Since there's something useful here in tests, I may at a later date come
> > back to those.
>
> Agreed, skimming over the tests there are some nice diagrams and cases.
>
> But I would hope that for most of these cases we could test on a higher
> level: test our expectations when running real programs that we want to
> check, especially when performing internal changes on how we handle anon
> memory + rmap.
>
> E.g., do fork(), then test if we can successfully perform rmap
> lookups/updates (e.g., migrate folio to a different numa node etc).
>

That's a great point! Wei - if you could look at making some self-tests
(i.e. that live in tools/testing/selftests/mm) that try to recreate _real_
scenarios that use the rmap like this and assert correct behaviour there,
that could be a positive way of moving forward with this.

We'd be absolutely happy to take patches like that!

> --
> Cheers,
>
> David / dhildenb
>
>

Cheers, Lorenzo



* Re: [RFC Patch 0/5] Make anon_vma operations testable
  2025-04-29  9:31 ` [RFC Patch 0/5] Make anon_vma operations testable Lorenzo Stoakes
  2025-04-29  9:38   ` David Hildenbrand
@ 2025-04-29 23:15   ` Wei Yang
  2025-04-30 14:38     ` Lorenzo Stoakes
  1 sibling, 1 reply; 29+ messages in thread
From: Wei Yang @ 2025-04-29 23:15 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Wei Yang, akpm, david, riel, vbabka, harry.yoo, jannh, baohua,
	linux-mm

On Tue, Apr 29, 2025 at 10:31:07AM +0100, Lorenzo Stoakes wrote:
>Wei,
>
>NACK the whole series.
>
>I'm really not sure how to get through to you. You were _explicitly_
>advised not to send this series. And yet you've sent it anyway.
>
>I mean, I appreciate your enthusiasm and the fact you've made tests here
>etc. obviously. And you've clearly put a TON of work in. But I just don't
>know why you would when explicitly told not to without at least discussing
>it first?
>

Would you mind letting me know what the preferred way of discussion is?

Send a question to the mailing list?

---
Wei Yang
Help you, Help me



* Re: [RFC Patch 0/5] Make anon_vma operations testable
  2025-04-29  9:41     ` Lorenzo Stoakes
@ 2025-04-29 23:56       ` Wei Yang
  2025-04-30  7:47         ` David Hildenbrand
  0 siblings, 1 reply; 29+ messages in thread
From: Wei Yang @ 2025-04-29 23:56 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: David Hildenbrand, Wei Yang, akpm, riel, vbabka, harry.yoo, jannh,
	baohua, linux-mm

On Tue, Apr 29, 2025 at 10:41:27AM +0100, Lorenzo Stoakes wrote:
>On Tue, Apr 29, 2025 at 11:38:23AM +0200, David Hildenbrand wrote:
>> On 29.04.25 11:31, Lorenzo Stoakes wrote:
>> > Wei,
>> >
>> > NACK the whole series.
>> >
>> > I'm really not sure how to get through to you. You were _explicitly_
>> > advised not to send this series. And yet you've sent it anyway.
>> >
>> > I mean, I appreciate your enthusiasm and the fact you've made tests here
>> > etc. obviously. And you've clearly put a TON of work in. But I just don't
>> > know why you would when explicitly told not to without at least discussing
>> > it first?
>> >
>> > This just isn't a great way of interacting with the community. We're all
>> > human, please try to have some empathy for others here, as I really do try
>> > to have with you as best I can.
>> >
>> > This adds a ton of churn and LOCKS IN assumptions about how anon_vma works,
>> > clashes with other series (most notably series I've been working on), takes
>> > away from efforts I want to make to start to join file-backed and anon
>> > reverse mapping logic, separates the two in such a way as to encourage this
>> > to only grow and generally isn't conducive to where I want to go with
>> > rmap.
>>
>> anon_vma, the unloved child. :)
>>
>> I would love to see a simplification that makes it less special, and I can
>> understand how adding tests for the ways it is special can be
>> counter-productive.
>>
>> >
>> > This is part of why I explicitly told you please don't go down this road,
>> > because you're likely to end up doing work that doesn't get used. It's not
>> > a great use of your time either.
>> >
>> > Since there's something useful here in tests, I may at a later date come
>> > back to those.
>>
>> Agreed, skimming over the tests there are some nice diagrams and cases.
>>
>> But I would hope that for most of these cases we could test on a higher
>> level: test our expectations when running real programs that we want to
>> check, especially when performing internal changes on how we handle anon
>> memory + rmap.
>>
>> E.g., do fork(), then test if we can successfully perform rmap
>> lookups/updates (e.g., migrate folio to a different numa node etc).
>>
>
>That's a great point! Wei - if you could look at making some self-tests
>(i.e. that live in tools/testing/selftests/mm) that try to recreate _real_
>scenarios that use the rmap like this and assert correct behaviour there,
>that could be a positive way of moving forward with this.
>

I am trying to understand what scenarios you want.

Something like below?

  * fork and migrate a range in child
  * fork/unmap in parent and migrate a range in child

If the operation is successful, then we are good, right?

>We'd be absolutely happy to take patches like that!
>
>> --
>> Cheers,
>>
>> David / dhildenb
>>
>>
>
>Cheers, Lorenzo

-- 
Wei Yang
Help you, Help me



* Re: [RFC Patch 0/5] Make anon_vma operations testable
  2025-04-29 23:56       ` Wei Yang
@ 2025-04-30  7:47         ` David Hildenbrand
  2025-04-30 15:44           ` Wei Yang
  2025-05-14  1:23           ` Wei Yang
  0 siblings, 2 replies; 29+ messages in thread
From: David Hildenbrand @ 2025-04-30  7:47 UTC (permalink / raw)
  To: Wei Yang, Lorenzo Stoakes
  Cc: akpm, riel, vbabka, harry.yoo, jannh, baohua, linux-mm

On 30.04.25 01:56, Wei Yang wrote:
> On Tue, Apr 29, 2025 at 10:41:27AM +0100, Lorenzo Stoakes wrote:
>> On Tue, Apr 29, 2025 at 11:38:23AM +0200, David Hildenbrand wrote:
>>> On 29.04.25 11:31, Lorenzo Stoakes wrote:
>>>> Wei,
>>>>
>>>> NACK the whole series.
>>>>
>>>> I'm really not sure how to get through to you. You were _explicitly_
>>>> advised not to send this series. And yet you've sent it anyway.
>>>>
>>>> I mean, I appreciate your enthusiasm and the fact you've made tests here
>>>> etc. obviously. And you've clearly put a TON of work in. But I just don't
>>>> know why you would when explicitly told not to without at least discussing
>>>> it first?
>>>>
>>>> This just isn't a great way of interacting with the community. We're all
>>>> human, please try to have some empathy for others here, as I really do try
>>>> to have with you as best I can.
>>>>
>>>> This adds a ton of churn and LOCKS IN assumptions about how anon_vma works,
>>>> clashes with other series (most notably series I've been working on), takes
>>>> away from efforts I want to make to start to join file-backed and anon
>>>> reverse mapping logic, separates the two in such a way as to encourage this
>>>> to only grow and generally isn't conducive to where I want to go with
>>>> rmap.
>>>
>>> anon_vma, the unloved child. :)
>>>
>>> I would love to see a simplification that makes it less special, and I can
>>> understand how adding tests for the ways it is special can be
>>> counter-productive.
>>>
>>>>
>>>> This is part of why I explicitly told you please don't go down this road,
>>>> because you're likely to end up doing work that doesn't get used. It's not
>>>> a great use of your time either.
>>>>
>>>> Since there's something useful here in tests, I may at a later date come
>>>> back to those.
>>>
>>> Agreed, skimming over the tests there are some nice diagrams and cases.
>>>
>>> But I would hope that for most of these cases we could test on a higher
>>> level: test our expectations when running real programs that we want to
>>> check, especially when performing internal changes on how we handle anon
>>> memory + rmap.
>>>
>>> E.g., do fork(), then test if we can successfully perform rmap
>>> lookups/updates (e.g., migrate folio to a different numa node etc).
>>>
>>
>> That's a great point! Wei - if you could look at making some self-tests
>> (i.e. that live in tools/testing/selftests/mm) that try to recreate _real_
>> scenarios that use the rmap like this and assert correct behaviour there,
>> that could be a positive way of moving forward with this.
>>
> 
> I am trying to understand what scenarios you want.

That is exactly the task to figure out: how can we actually test our 
rmap implementation from a higher level. The example regarding fork and 
migration is possibly a low-hanging fruit.

We might already have the functionality to achieve it, *maybe* we'd even 
want some extensions to make it all even easier to test.

For example, MADV_PAGEOUT is refused on folios that are mapped into 
multiple processes. Maybe we'd want the option to *still* page it out, 
just like MPOL_MF_MOVE_ALL allows with CAP_SYS_NICE to *still* migrate a 
folio that is mapped into multiple processes.

Some rmap tests could make sense for both, anon and pagecache folios.

> 
> Something like below?
> 
>    * fork and migrate a range in child
>    * fork/unmap in parent and migrate a range in child
> 
> If the operation is successful, then we are good, right?

Yes. And one can come up with a bunch of similar rmap test cases, like 
doing a partial mremap() of a THP, then testing if the rmap walk still 
works as expected, pairing the whole thing with fork() etc.

One "problem" here is that even with MPOL_MF_MOVE_ALL,
move_pages() will not move a folio if it already resides on the target 
node. So one always needs two NUMA nodes, which is a bit suboptimal for 
testing purposes.

For testing purposes, it could have been helpful a couple of times 
already to just have a way of migrating a folio even if it already 
resides on the expected node.

-- 
Cheers,

David / dhildenb




* Re: [RFC Patch 0/5] Make anon_vma operations testable
  2025-04-29 23:15   ` Wei Yang
@ 2025-04-30 14:38     ` Lorenzo Stoakes
  2025-04-30 15:41       ` Wei Yang
  0 siblings, 1 reply; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-04-30 14:38 UTC (permalink / raw)
  To: Wei Yang; +Cc: akpm, david, riel, vbabka, harry.yoo, jannh, baohua, linux-mm

On Tue, Apr 29, 2025 at 11:15:03PM +0000, Wei Yang wrote:
> On Tue, Apr 29, 2025 at 10:31:07AM +0100, Lorenzo Stoakes wrote:
> >Wei,
> >
> >NACK the whole series.
> >
> >I'm really not sure how to get through to you. You were _explicitly_
> >advised not to send this series. And yet you've sent it anyway.
> >
> >I mean, I appreciate your enthusiasm and the fact you've made tests here
> >etc. obviously. And you've clearly put a TON of work in. But I just don't
> >know why you would when explicitly told not to without at least discussing
> >it first?
> >
>
> Would you mind letting me know what the preferred way of discussion is?
>
> Send a question to the mailing list?

Yeah, you can use [DISCUSSION] tags for such things.

>
> ---
> Wei Yang
> Help you, Help me
>



* Re: [RFC Patch 0/5] Make anon_vma operations testable
  2025-04-30 14:38     ` Lorenzo Stoakes
@ 2025-04-30 15:41       ` Wei Yang
  0 siblings, 0 replies; 29+ messages in thread
From: Wei Yang @ 2025-04-30 15:41 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Wei Yang, akpm, david, riel, vbabka, harry.yoo, jannh, baohua,
	linux-mm

On Wed, Apr 30, 2025 at 03:38:59PM +0100, Lorenzo Stoakes wrote:
>On Tue, Apr 29, 2025 at 11:15:03PM +0000, Wei Yang wrote:
>> On Tue, Apr 29, 2025 at 10:31:07AM +0100, Lorenzo Stoakes wrote:
>> >Wei,
>> >
>> >NACK the whole series.
>> >
>> >I'm really not sure how to get through to you. You were _explicitly_
>> >advised not to send this series. And yet you've sent it anyway.
>> >
>> >I mean, I appreciate your enthusiasm and the fact you've made tests here
>> >etc. obviously. And you've clearly put a TON of work in. But I just don't
>> >know why you would when explicitly told not to without at least discussing
>> >it first?
>> >
>>
>> Would you mind letting me know what the preferred way of discussion is?
>>
>> Send a question to the mailing list?
>
>Yeah, you can use [DISCUSSION] tags for such things.
>

Thanks for your information.

>>
>> ---
>> Wei Yang
>> Help you, Help me
>>

-- 
Wei Yang
Help you, Help me



* Re: [RFC Patch 0/5] Make anon_vma operations testable
  2025-04-30  7:47         ` David Hildenbrand
@ 2025-04-30 15:44           ` Wei Yang
  2025-04-30 21:36             ` David Hildenbrand
  2025-05-14  1:23           ` Wei Yang
  1 sibling, 1 reply; 29+ messages in thread
From: Wei Yang @ 2025-04-30 15:44 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, Lorenzo Stoakes, akpm, riel, vbabka, harry.yoo, jannh,
	baohua, linux-mm

On Wed, Apr 30, 2025 at 09:47:16AM +0200, David Hildenbrand wrote:
>On 30.04.25 01:56, Wei Yang wrote:
>> On Tue, Apr 29, 2025 at 10:41:27AM +0100, Lorenzo Stoakes wrote:
>> > On Tue, Apr 29, 2025 at 11:38:23AM +0200, David Hildenbrand wrote:
>> > > On 29.04.25 11:31, Lorenzo Stoakes wrote:
>> > > > Wei,
>> > > > 
>> > > > NACK the whole series.
>> > > > 
>> > > > I'm really not sure how to get through to you. You were _explicitly_
>> > > > advised not to send this series. And yet you've sent it anyway.
>> > > > 
>> > > > I mean, I appreciate your enthusiasm and the fact you've made tests here
>> > > > etc. obviously. And you've clearly put a TON of work in. But I just don't
>> > > > know why you would when explicitly told not to without at least discussing
>> > > > it first?
>> > > > 
>> > > > This just isn't a great way of interacting with the community. We're all
>> > > > human, please try to have some empathy for others here, as I really do try
>> > > > to have with you as best I can.
>> > > > 
>> > > > This adds a ton of churn and LOCKS IN assumptions about how anon_vma works,
>> > > > clashes with other series (most notably series I've been working on), takes
>> > > > away from efforts I want to make to start to join file-backed and anon
>> > > > reverse mapping logic, separates the two in such a way as to encourage this
>> > > > to only grow and generally isn't conducive to where I want to go with
>> > > > rmap.
>> > > 
>> > > anon_vma, the unloved child. :)
>> > > 
>> > > I would love to see a simplification that makes it less special, and I can
>> > > understand how adding tests for the ways it is special can be
>> > > counter-productive.
>> > > 
>> > > > 
>> > > > This is part of why I explicitly told you please don't go down this road,
>> > > > because you're likely to end up doing work that doesn't get used. It's not
>> > > > a great use of your time either.
>> > > > 
>> > > > Since there's something useful here in tests, I may at a later date come
>> > > > back to those.
>> > > 
>> > > Agreed, skimming over the tests there are some nice diagrams and cases.
>> > > 
>> > > But I would hope that for most of these cases we could test on a higher
>> > > level: test our expectations when running real programs that we want to
>> > > check, especially when performing internal changes on how we handle anon
>> > > memory + rmap.
>> > > 
>> > > E.g., do fork(), then test if we can successfully perform rmap
>> > > lookups/updates (e.g., migrate folio to a different numa node etc).
>> > > 
>> > 
>> > That's a great point! Wei - if you could look at making some self-tests
>> > (i.e. that live in tools/testing/selftests/mm) that try to recreate _real_
>> > scenarios that use the rmap like this and assert correct behaviour there,
>> > that could be a positive way of moving forward with this.
>> > 
>> 
>> I am trying to understand what scenarios you want.
>
>That is exactly the task to figure out: how can we actually test our rmap
>implementation from a higher level. The example regarding fork and migration
>is possibly a low-hanging fruit.
>
>We might already have the functionality to achieve it, *maybe* we'd even want
>some extensions to make it all even easier to test.
>
>For example, MADV_PAGEOUT is refused on folios that are mapped into multiple
>processes. Maybe we'd want the option to *still* page it out, just like
>MPOL_MF_MOVE_ALL allows with CAP_SYS_NICE to *still* migrate a folio that is
>mapped into multiple processes.
>
>Some rmap tests could make sense for both, anon and pagecache folios.
>
>> 
>> Something like below?
>> 
>>    * fork and migrate a range in child
>>    * fork/unmap in parent and migrate a range in child
>> 
>> If the operation is successful, then we are good, right?
>
>Yes. And one can come up with a bunch of similar rmap test cases, like doing
>a partial mremap() of a THP, then testing if the rmap walk still works as
>expected, pairing the whole thing with fork() etc.
>
>One "problem" here is that even with MPOL_MF_MOVE_ALL,
>move_pages() will not move a folio if it already resides on the target node.
>So one always needs two NUMA nodes, which is a bit suboptimal for testing
>purposes.
>
>For testing purposes, it could have been helpful a couple of times already to
>just have a way of migrating a folio even if it already resides on the
>expected node.
>

Thanks for all those detailed explanations. I need some time to digest them.

Since I lack some background knowledge, I may have further questions on this.
Hope that won't bother you too much.

>-- 
>Cheers,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me



* Re: [RFC Patch 0/5] Make anon_vma operations testable
  2025-04-30 15:44           ` Wei Yang
@ 2025-04-30 21:36             ` David Hildenbrand
  0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2025-04-30 21:36 UTC (permalink / raw)
  To: Wei Yang
  Cc: Lorenzo Stoakes, akpm, riel, vbabka, harry.yoo, jannh, baohua,
	linux-mm

On 30.04.25 17:44, Wei Yang wrote:
> On Wed, Apr 30, 2025 at 09:47:16AM +0200, David Hildenbrand wrote:
>> On 30.04.25 01:56, Wei Yang wrote:
>>> On Tue, Apr 29, 2025 at 10:41:27AM +0100, Lorenzo Stoakes wrote:
>>>> On Tue, Apr 29, 2025 at 11:38:23AM +0200, David Hildenbrand wrote:
>>>>> On 29.04.25 11:31, Lorenzo Stoakes wrote:
>>>>>> Wei,
>>>>>>
>>>>>> NACK the whole series.
>>>>>>
>>>>>> I'm really not sure how to get through to you. You were _explicitly_
>>>>>> advised not to send this series. And yet you've sent it anyway.
>>>>>>
>>>>>> I mean, I appreciate your enthusiasm and the fact you've made tests here
>>>>>> etc. obviously. And you've clearly put a TON of work in. But I just don't
>>>>>> know why you would when explicitly told not to without at least discussing
>>>>>> it first?
>>>>>>
>>>>>> This just isn't a great way of interacting with the community. We're all
>>>>>> human, please try to have some empathy for others here, as I really do try
>>>>>> to have with you as best I can.
>>>>>>
>>>>>> This adds a ton of churn and LOCKS IN assumptions about how anon_vma works,
>>>>>> clashes with other series (most notably series I've been working on), takes
>>>>>> away from efforts I want to make to start to join file-backed and anon
>>>>>> reverse mapping logic, separates the two in such a way as to encourage this
>>>>>> to only grow and generally isn't conducive to where I want to go with
>>>>>> rmap.
>>>>>
>>>>> anon_vma, the unloved child. :)
>>>>>
>>>>> I would love to see a simplification that makes it less special, and I can
>>>>> understand how adding tests for the ways it is special can be
>>>>> counter-productive.
>>>>>
>>>>>>
>>>>>> This is part of why I explicitly told you please don't go down this road,
>>>>>> because you're likely to end up doing work that doesn't get used. It's not
>>>>>> a great use of your time either.
>>>>>>
>>>>>> Since there's something useful here in tests, I may at a later date come
>>>>>> back to those.
>>>>>
>>>>> Agreed, skimming over the tests there are some nice diagrams and cases.
>>>>>
>>>>> But I would hope that for most of these cases we could test on a higher
>>>>> level: test our expectations when running real programs that we want to
>>>>> check, especially when performing internal changes on how we handle anon
>>>>> memory + rmap.
>>>>>
>>>>> E.g., do fork(), then test if we can successfully perform rmap
>>>>> lookups/updates (e.g., migrate folio to a different numa node etc).
>>>>>
>>>>
>>>> That's a great point! Wei - if you could look at making some self-tests
>>>> (i.e. that live in tools/testing/selftests/mm) that try to recreate _real_
>>>> scenarios that use the rmap like this and assert correct behaviour there,
>>>> that could be a positive way of moving forward with this.
>>>>
>>>
>>> I am trying to understand what scenarios you want.
>>
>> That is exactly the task to figure out: how can we actually test our rmap
>> implementation from a higher level. The example regarding fork and migration
>> is possibly a low-hanging fruit.
>>
>> We might already have the functionality to achieve it, *maybe* we'd even want
>> some extensions to make it all even easier to test.
>>
>> For example, MADV_PAGEOUT is refused on folios that are mapped into multiple
>> processes. Maybe we'd want the option to *still* page it out, just like
>> MPOL_MF_MOVE_ALL allows with CAP_SYS_NICE to *still* migrate a folio that is
>> mapped into multiple processes.
>>
>> Some rmap tests could make sense for both, anon and pagecache folios.
>>
>>>
>>> Something like below?
>>>
>>>     * fork and migrate a range in child
>>>     * fork/unmap in parent and migrate a range in child
>>>
>>> If the operation is successful, then we are good, right?
>>
>> Yes. And one can come up with a bunch of similar rmap test cases, like doing
>> a partial mremap() of a THP, then testing if the rmap walk still works as
>> expected, pairing the whole thing with fork() etc.
>>
>> One "problem" here is that even with MPOL_MF_MOVE_ALL,
>> move_pages() will not move a folio if it already resides on the target node.
>> So one always needs two NUMA nodes, which is a bit suboptimal for testing
>> purposes.
>>
>> For testing purposes, it could have been helpful a couple of times already to
>> just have a way of migrating a folio even if it already resides on the
>> expected node.
>>
> 
> Thanks for all those detailed explanations. I need some time to digest them.
> 
> Since I lack some background knowledge, I may have further questions on this.
> Hope that won't bother you too much.

Sure, feel free to reach out. Having more selftests that test rmap 
behavior could be really helpful.

-- 
Cheers,

David / dhildenb




* Re: [RFC Patch 2/5] anon_vma: add skeleton code for userland testing of anon_vma logic
  2025-04-29  9:06 ` [RFC Patch 2/5] anon_vma: add skeleton code for userland testing of anon_vma logic Wei Yang
@ 2025-05-01  1:31   ` Wei Yang
  2025-05-01  9:41     ` Lorenzo Stoakes
  0 siblings, 1 reply; 29+ messages in thread
From: Wei Yang @ 2025-05-01  1:31 UTC (permalink / raw)
  To: Wei Yang
  Cc: akpm, david, lorenzo.stoakes, riel, vbabka, harry.yoo, jannh,
	baohua, linux-mm

On Tue, Apr 29, 2025 at 09:06:36AM +0000, Wei Yang wrote:
[...]
>+
>+static bool test_fork_grand_child(void)
>+{
>+	struct vm_area_struct *root_vma, *grand_vma, *vma1, *vma2;
>+	struct anon_vma_chain *avc;
>+	struct anon_vma *root_anon_vma;
>+	DECLARE_BITMAP(expected, 10);
>+	DECLARE_BITMAP(found, 10);
>+
>+	bitmap_zero(expected, 10);
>+	bitmap_zero(found, 10);
>+
>+	/*
>+	 *  root_anon_vma      root_vma
>+	 *  +-----------+      +-----------+
>+	 *  |           | ---> |           |
>+	 *  +-----------+      +-----------+
>+	 */
>+
>+	root_vma = alloc_vma(0x3000, 0x5000, 3);
>+	/* First fault on parent anonymous vma. */
>+	__anon_vma_prepare(root_vma);
>+	root_anon_vma = root_vma->anon_vma;
>+	ASSERT_NE(NULL, root_anon_vma);
>+	bitmap_set(expected, root_vma->index, 1);
>+
>+	/* First fork */
>+	/*
>+	 *  root_anon_vma      root_vma
>+	 *  +-----------+      +-----------+
>+	 *  |           | ---> |           |
>+	 *  +-----------+      +-----------+
>+	 *                \
>+	 *                 \   vma1
>+	 *                  \  +-----------+
>+	 *                   > |           |
>+	 *                     +-----------+
>+	 */
>+	vma1 = alloc_vma(0x3000, 0x5000, 3);
>+	anon_vma_fork(vma1, root_vma);
>+	ASSERT_NE(NULL, vma1->anon_vma);
>+	bitmap_set(expected, vma1->index, 1);
>+	/* Parent/Root is root_vma->anon_vma */
>+	ASSERT_EQ(vma1->anon_vma->parent, root_vma->anon_vma);
>+	ASSERT_EQ(vma1->anon_vma->root, root_vma->anon_vma);
>+
>+	/* Second fork */
>+	/*
>+	 *  root_anon_vma      root_vma
>+	 *  +-----------+      +-----------+
>+	 *  |           | ---> |           |
>+	 *  +-----------+      +-----------+
>+	 *               \
>+	 *                \------------------+
>+	 *                 \   vma1           \   vma2
>+	 *                  \  +-----------+   \  +-----------+
>+	 *                   > |           |    > |           |
>+	 *                     +-----------+      +-----------+
>+	 */
>+	vma2 = alloc_vma(0x3000, 0x5000, 3);
>+	anon_vma_fork(vma2, root_vma);
>+	ASSERT_NE(NULL, vma2->anon_vma);
>+	bitmap_set(expected, vma2->index, 1);
>+	/* Parent/Root is root_vma->anon_vma */
>+	ASSERT_EQ(vma2->anon_vma->parent, root_vma->anon_vma);
>+	ASSERT_EQ(vma2->anon_vma->root, root_vma->anon_vma);
>+	dump_anon_vma_interval_tree(root_vma->anon_vma);
>+
>+	/* Fork grand child from second child */
>+	/*
>+	 *  root_anon_vma      root_vma
>+	 *  +-----------+      +-----------+
>+	 *  |           | ---> |           |
>+	 *  +-----------+      +-----------+
>+	 *               \
>+	 *                \------------------+
>+	 *                |\   vma1           \   vma2
>+	 *                | \  +-----------+   \  +-----------+
>+	 *                |  > |           |    > |           |
>+	 *                |    +-----------+      +-----------+
>+	 *                \
>+	 *                 \   grand_vma
>+	 *                  \  +-----------+
>+	 *                   > |           |
>+	 *                     +-----------+
>+	 */
>+	grand_vma = alloc_vma(0x3000, 0x5000, 3);
>+	anon_vma_fork(grand_vma, vma2);
>+	ASSERT_NE(NULL, grand_vma->anon_vma);
>+	bitmap_set(expected, grand_vma->index, 1);
>+	/* Root is root_vma->anon_vma */
>+	ASSERT_EQ(grand_vma->anon_vma->root, root_vma->anon_vma);
>+	/* Parent is vma2->anon_vma */
>+	ASSERT_EQ(grand_vma->anon_vma->parent, vma2->anon_vma);

Hi, Lorenzo

Here is the case I am talking about in another thread[1].

The naming is a little different from that.

  * root_vma  is VMA A
  * vma2      is VMA B
  * grand_vma is VMA C

If you add the following debug code here:

```
	printf("root num_children %d\n", root_vma->anon_vma->num_children);
	printf("vma2 num_children %d\n", vma2->anon_vma->num_children);
	printf("grand_vma num_children %d\n", grand_vma->anon_vma->num_children);
```

You would see that vma2's num_children is 1, even though it has a child.

If I missed something, feel free to correct me.

[1]: https://lkml.kernel.org/r/20250501011845.ktbfgymor4oz5sok@master

>+
>+	/* Expect to find only vmas from second fork */
>+	anon_vma_interval_tree_foreach(avc, &vma2->anon_vma->rb_root, 3, 4) {
>+		ASSERT_TRUE(avc->vma == vma2 || avc->vma == grand_vma);
>+	}
>+
>+	anon_vma_interval_tree_foreach(avc, &root_vma->anon_vma->rb_root, 3, 4) {
>+		bitmap_set(found, avc->vma->index, 1);
>+	}
>+	/* Expect to find all vma including child and grand child. */
>+	ASSERT_TRUE(bitmap_equal(expected, found, 10));
>+
>+	/* Root process exit or unmap root_vma. */
>+	/*
>+	 *  root_anon_vma
>+	 *  +-----------+
>+	 *  |           |
>+	 *  +-----------+
>+	 *               \
>+	 *                \------------------+
>+	 *                |\   vma1           \   vma2
>+	 *                | \  +-----------+   \  +-----------+
>+	 *                |  > |           |    > |           |
>+	 *                |    +-----------+      +-----------+
>+	 *                \
>+	 *                 \   grand_vma
>+	 *                  \  +-----------+
>+	 *                   > |           |
>+	 *                     +-----------+
>+	 */
>+	bitmap_clear(expected, root_vma->index, 1);
>+	unlink_anon_vmas(root_vma);
>+	ASSERT_EQ(0, root_anon_vma->num_active_vmas);
>+
>+	bitmap_zero(found, 10);
>+	anon_vma_interval_tree_foreach(avc, &root_anon_vma->rb_root, 3, 4) {
>+		bitmap_set(found, avc->vma->index, 1);
>+	}
>+	/* Expect to find all vmas even root_vma released. */
>+	ASSERT_TRUE(bitmap_equal(expected, found, 10));
>+
>+	cleanup();
>+
>+	ASSERT_EQ(0, nr_allocated);
>+	return true;
>+}
>+


-- 
Wei Yang
Help you, Help me



* Re: [RFC Patch 2/5] anon_vma: add skeleton code for userland testing of anon_vma logic
  2025-05-01  1:31   ` Wei Yang
@ 2025-05-01  9:41     ` Lorenzo Stoakes
  2025-05-01 14:45       ` Wei Yang
  0 siblings, 1 reply; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-05-01  9:41 UTC (permalink / raw)
  To: Wei Yang; +Cc: akpm, david, riel, vbabka, harry.yoo, jannh, baohua, linux-mm

On Thu, May 01, 2025 at 01:31:27AM +0000, Wei Yang wrote:
> On Tue, Apr 29, 2025 at 09:06:36AM +0000, Wei Yang wrote:
> [...]
> >+
> >+static bool test_fork_grand_child(void)
> >+{
> >+	struct vm_area_struct *root_vma, *grand_vma, *vma1, *vma2;
> >+	struct anon_vma_chain *avc;
> >+	struct anon_vma *root_anon_vma;
> >+	DECLARE_BITMAP(expected, 10);
> >+	DECLARE_BITMAP(found, 10);
> >+
> >+	bitmap_zero(expected, 10);
> >+	bitmap_zero(found, 10);
> >+
> >+	/*
> >+	 *  root_anon_vma      root_vma
> >+	 *  +-----------+      +-----------+
> >+	 *  |           | ---> |           |
> >+	 *  +-----------+      +-----------+
> >+	 */
> >+
> >+	root_vma = alloc_vma(0x3000, 0x5000, 3);
> >+	/* First fault on parent anonymous vma. */
> >+	__anon_vma_prepare(root_vma);
> >+	root_anon_vma = root_vma->anon_vma;
> >+	ASSERT_NE(NULL, root_anon_vma);
> >+	bitmap_set(expected, root_vma->index, 1);
> >+
> >+	/* First fork */
> >+	/*
> >+	 *  root_anon_vma      root_vma
> >+	 *  +-----------+      +-----------+
> >+	 *  |           | ---> |           |
> >+	 *  +-----------+      +-----------+
> >+	 *                \
> >+	 *                 \   vma1
> >+	 *                  \  +-----------+
> >+	 *                   > |           |
> >+	 *                     +-----------+
> >+	 */
> >+	vma1 = alloc_vma(0x3000, 0x5000, 3);
> >+	anon_vma_fork(vma1, root_vma);
> >+	ASSERT_NE(NULL, vma1->anon_vma);
> >+	bitmap_set(expected, vma1->index, 1);
> >+	/* Parent/Root is root_vma->anon_vma */
> >+	ASSERT_EQ(vma1->anon_vma->parent, root_vma->anon_vma);
> >+	ASSERT_EQ(vma1->anon_vma->root, root_vma->anon_vma);
> >+
> >+	/* Second fork */
> >+	/*
> >+	 *  root_anon_vma      root_vma
> >+	 *  +-----------+      +-----------+
> >+	 *  |           | ---> |           |
> >+	 *  +-----------+      +-----------+
> >+	 *               \
> >+	 *                \------------------+
> >+	 *                 \   vma1           \   vma2
> >+	 *                  \  +-----------+   \  +-----------+
> >+	 *                   > |           |    > |           |
> >+	 *                     +-----------+      +-----------+
> >+	 */
> >+	vma2 = alloc_vma(0x3000, 0x5000, 3);
> >+	anon_vma_fork(vma2, root_vma);
> >+	ASSERT_NE(NULL, vma2->anon_vma);
> >+	bitmap_set(expected, vma2->index, 1);
> >+	/* Parent/Root is root_vma->anon_vma */
> >+	ASSERT_EQ(vma2->anon_vma->parent, root_vma->anon_vma);
> >+	ASSERT_EQ(vma2->anon_vma->root, root_vma->anon_vma);
> >+	dump_anon_vma_interval_tree(root_vma->anon_vma);
> >+
> >+	/* Fork grand child from second child */
> >+	/*
> >+	 *  root_anon_vma      root_vma
> >+	 *  +-----------+      +-----------+
> >+	 *  |           | ---> |           |
> >+	 *  +-----------+      +-----------+
> >+	 *               \
> >+	 *                \------------------+
> >+	 *                |\   vma1           \   vma2
> >+	 *                | \  +-----------+   \  +-----------+
> >+	 *                |  > |           |    > |           |
> >+	 *                |    +-----------+      +-----------+
> >+	 *                \
> >+	 *                 \   grand_vma
> >+	 *                  \  +-----------+
> >+	 *                   > |           |
> >+	 *                     +-----------+
> >+	 */
> >+	grand_vma = alloc_vma(0x3000, 0x5000, 3);
> >+	anon_vma_fork(grand_vma, vma2);
> >+	ASSERT_NE(NULL, grand_vma->anon_vma);
> >+	bitmap_set(expected, grand_vma->index, 1);
> >+	/* Root is root_vma->anon_vma */
> >+	ASSERT_EQ(grand_vma->anon_vma->root, root_vma->anon_vma);
> >+	/* Parent is vma2->anon_vma */
> >+	ASSERT_EQ(grand_vma->anon_vma->parent, vma2->anon_vma);
>
> Hi, Lorenzo
>
> Here is the case I am talking about in another thread[1].
>
> The naming is a little different from that.
>
>   * root_vma  is VMA A
>   * vma2      is VMA B
>   * grand_vma is VMA C
>
> If you add following debug code here.
>
> ```
> 	printf("root num_children %d\n", root_vma->anon_vma->num_children);
> 	printf("vma2 num_children %d\n", vma2->anon_vma->num_children);
> 	printf("grand_vma num_children %d\n", grand_vma->anon_vma->num_children);
> ```
>
> You would see vma2 num_children is 1, but it has a child.
>
> If I missed something, feel free to correct me.
>
> [1]: https://lkml.kernel.org/r/20250501011845.ktbfgymor4oz5sok@master
>

See the thread over there. This explanation isn't quite right, and the
diagram is wrong, but indeed anon_vma reuse is a thing that needs
addressing.

Having looked at this more, I'm not a huge fan of these tests: you're
essentially open-coding fork over and over again, and any change in the
overlying fork logic will break them.

The VMA, maple tree, etc. unit tests work much better because the whole
thing can essentially be separated out, but in this case you really
can't. This is obviously in addition to the aforementioned issues with
introducing separation in an area where I want to try to unify.

So I definitely think that, again, you'd be better off looking at tests as
suggested by David, but moreover I think looking at bugs etc. is more
helpful.

This report was, in effect, reporting a bug before it happened, so all
good for this kind of thing :>) and that is _always_ welcome. But do try
to be more direct and to the point if you can.

Thanks!


> >+
> >+	/* Expect to find only vmas from second fork */
> >+	anon_vma_interval_tree_foreach(avc, &vma2->anon_vma->rb_root, 3, 4) {
> >+		ASSERT_TRUE(avc->vma == vma2 || avc->vma == grand_vma);
> >+	}
> >+
> >+	anon_vma_interval_tree_foreach(avc, &root_vma->anon_vma->rb_root, 3, 4) {
> >+		bitmap_set(found, avc->vma->index, 1);
> >+	}
> >+	/* Expect to find all vma including child and grand child. */
> >+	ASSERT_TRUE(bitmap_equal(expected, found, 10));
> >+
> >+	/* Root process exit or unmap root_vma. */
> >+	/*
> >+	 *  root_anon_vma
> >+	 *  +-----------+
> >+	 *  |           |
> >+	 *  +-----------+
> >+	 *               \
> >+	 *                \------------------+
> >+	 *                |\   vma1           \   vma2
> >+	 *                | \  +-----------+   \  +-----------+
> >+	 *                |  > |           |    > |           |
> >+	 *                |    +-----------+      +-----------+
> >+	 *                \
> >+	 *                 \   grand_vma
> >+	 *                  \  +-----------+
> >+	 *                   > |           |
> >+	 *                     +-----------+
> >+	 */
> >+	bitmap_clear(expected, root_vma->index, 1);
> >+	unlink_anon_vmas(root_vma);
> >+	ASSERT_EQ(0, root_anon_vma->num_active_vmas);
> >+
> >+	bitmap_zero(found, 10);
> >+	anon_vma_interval_tree_foreach(avc, &root_anon_vma->rb_root, 3, 4) {
> >+		bitmap_set(found, avc->vma->index, 1);
> >+	}
> >+	/* Expect to find all vmas even root_vma released. */
> >+	ASSERT_TRUE(bitmap_equal(expected, found, 10));
> >+
> >+	cleanup();
> >+
> >+	ASSERT_EQ(0, nr_allocated);
> >+	return true;
> >+}
> >+
>
>
> --
> Wei Yang
> Help you, Help me
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 2/5] anon_vma: add skeleton code for userland testing of anon_vma logic
  2025-05-01  9:41     ` Lorenzo Stoakes
@ 2025-05-01 14:45       ` Wei Yang
  0 siblings, 0 replies; 29+ messages in thread
From: Wei Yang @ 2025-05-01 14:45 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Wei Yang, akpm, david, riel, vbabka, harry.yoo, jannh, baohua,
	linux-mm

On Thu, May 01, 2025 at 10:41:54AM +0100, Lorenzo Stoakes wrote:
>On Thu, May 01, 2025 at 01:31:27AM +0000, Wei Yang wrote:
>> On Tue, Apr 29, 2025 at 09:06:36AM +0000, Wei Yang wrote:
>> [...]
>> >+
>> >+static bool test_fork_grand_child(void)
>> >+{
>> >+	struct vm_area_struct *root_vma, *grand_vma, *vma1, *vma2;
>> >+	struct anon_vma_chain *avc;
>> >+	struct anon_vma *root_anon_vma;
>> >+	DECLARE_BITMAP(expected, 10);
>> >+	DECLARE_BITMAP(found, 10);
>> >+
>> >+	bitmap_zero(expected, 10);
>> >+	bitmap_zero(found, 10);
>> >+
>> >+	/*
>> >+	 *  root_anon_vma      root_vma
>> >+	 *  +-----------+      +-----------+
>> >+	 *  |           | ---> |           |
>> >+	 *  +-----------+      +-----------+
>> >+	 */
>> >+
>> >+	root_vma = alloc_vma(0x3000, 0x5000, 3);
>> >+	/* First fault on parent anonymous vma. */
>> >+	__anon_vma_prepare(root_vma);
>> >+	root_anon_vma = root_vma->anon_vma;
>> >+	ASSERT_NE(NULL, root_anon_vma);
>> >+	bitmap_set(expected, root_vma->index, 1);
>> >+
>> >+	/* First fork */
>> >+	/*
>> >+	 *  root_anon_vma      root_vma
>> >+	 *  +-----------+      +-----------+
>> >+	 *  |           | ---> |           |
>> >+	 *  +-----------+      +-----------+
>> >+	 *                \
>> >+	 *                 \   vma1
>> >+	 *                  \  +-----------+
>> >+	 *                   > |           |
>> >+	 *                     +-----------+
>> >+	 */
>> >+	vma1 = alloc_vma(0x3000, 0x5000, 3);
>> >+	anon_vma_fork(vma1, root_vma);
>> >+	ASSERT_NE(NULL, vma1->anon_vma);
>> >+	bitmap_set(expected, vma1->index, 1);
>> >+	/* Parent/Root is root_vma->anon_vma */
>> >+	ASSERT_EQ(vma1->anon_vma->parent, root_vma->anon_vma);
>> >+	ASSERT_EQ(vma1->anon_vma->root, root_vma->anon_vma);
>> >+
>> >+	/* Second fork */
>> >+	/*
>> >+	 *  root_anon_vma      root_vma
>> >+	 *  +-----------+      +-----------+
>> >+	 *  |           | ---> |           |
>> >+	 *  +-----------+      +-----------+
>> >+	 *               \
>> >+	 *                \------------------+
>> >+	 *                 \   vma1           \   vma2
>> >+	 *                  \  +-----------+   \  +-----------+
>> >+	 *                   > |           |    > |           |
>> >+	 *                     +-----------+      +-----------+
>> >+	 */
>> >+	vma2 = alloc_vma(0x3000, 0x5000, 3);
>> >+	anon_vma_fork(vma2, root_vma);
>> >+	ASSERT_NE(NULL, vma2->anon_vma);
>> >+	bitmap_set(expected, vma2->index, 1);
>> >+	/* Parent/Root is root_vma->anon_vma */
>> >+	ASSERT_EQ(vma2->anon_vma->parent, root_vma->anon_vma);
>> >+	ASSERT_EQ(vma2->anon_vma->root, root_vma->anon_vma);
>> >+	dump_anon_vma_interval_tree(root_vma->anon_vma);
>> >+
>> >+	/* Fork grand child from second child */
>> >+	/*
>> >+	 *  root_anon_vma      root_vma
>> >+	 *  +-----------+      +-----------+
>> >+	 *  |           | ---> |           |
>> >+	 *  +-----------+      +-----------+
>> >+	 *               \
>> >+	 *                \------------------+
>> >+	 *                |\   vma1           \   vma2
>> >+	 *                | \  +-----------+   \  +-----------+
>> >+	 *                |  > |           |    > |           |
>> >+	 *                |    +-----------+      +-----------+
>> >+	 *                \
>> >+	 *                 \   grand_vma
>> >+	 *                  \  +-----------+
>> >+	 *                   > |           |
>> >+	 *                     +-----------+
>> >+	 */
>> >+	grand_vma = alloc_vma(0x3000, 0x5000, 3);
>> >+	anon_vma_fork(grand_vma, vma2);
>> >+	ASSERT_NE(NULL, grand_vma->anon_vma);
>> >+	bitmap_set(expected, grand_vma->index, 1);
>> >+	/* Root is root_vma->anon_vma */
>> >+	ASSERT_EQ(grand_vma->anon_vma->root, root_vma->anon_vma);
>> >+	/* Parent is vma2->anon_vma */
>> >+	ASSERT_EQ(grand_vma->anon_vma->parent, vma2->anon_vma);
>>
>> Hi, Lorenzo
>>
>> Here is the case I am talking about in another thread[1].
>>
>> The naming is a little different from that.
>>
>>   * root_vma  is VMA A
>>   * vma2      is VMA B
>>   * grand_vma is VMA C
>>
>> If you add following debug code here.
>>
>> ```
>> 	printf("root num_children %d\n", root_vma->anon_vma->num_children);
>> 	printf("vma2 num_children %d\n", vma2->anon_vma->num_children);
>> 	printf("grand_vma num_children %d\n", grand_vma->anon_vma->num_children);
>> ```
>>
>> You would see vma2 num_children is 1, but it has a child.
>>
>> If I missed something, feel free to correct me.
>>
>> [1]: https://lkml.kernel.org/r/20250501011845.ktbfgymor4oz5sok@master
>>
>
>See the thread over there. This explanation isn't quite right, and the
>diagram is wrong, but indeed anon_vma reuse is a thing that needs
>addressing.
>
>Having looked at this more I'm not a huge fan of these tests, you're
>essentially open-coding fork over and over again, any change in overlying
>fork logic will break.
>
>The VMA and maple tree, etc. unit tests work much better as the whole thing
>essentially can be separated out, but in this case you really can't. This
>is obviously in addition to the aforementioned issues with causing
>separation in an area where I want to try to unify.
>
>So I definitely think that, again, you'd be better off looking at tests as
>suggested by David, but moreover I think looking at bugs etc. is more
>helpful.

Sure. Thank you and David for pointing me in the right direction. And yes,
bugs have higher priority.

>
>This report was in effect reporting a bug before it happened so all good
>for this kind of thing :>) and that is _always_ welcome. But do try to be
>more direct and to the point if you can be.

Ah, I am glad you like it. I will try to be clearer in expressing my point
next time. And thanks for your patience in reading my "confusing" mail.

>
>Thanks!
>
>
>> >+
>> >+	/* Expect to find only vmas from second fork */
>> >+	anon_vma_interval_tree_foreach(avc, &vma2->anon_vma->rb_root, 3, 4) {
>> >+		ASSERT_TRUE(avc->vma == vma2 || avc->vma == grand_vma);
>> >+	}
>> >+
>> >+	anon_vma_interval_tree_foreach(avc, &root_vma->anon_vma->rb_root, 3, 4) {
>> >+		bitmap_set(found, avc->vma->index, 1);
>> >+	}
>> >+	/* Expect to find all vma including child and grand child. */
>> >+	ASSERT_TRUE(bitmap_equal(expected, found, 10));
>> >+
>> >+	/* Root process exit or unmap root_vma. */
>> >+	/*
>> >+	 *  root_anon_vma
>> >+	 *  +-----------+
>> >+	 *  |           |
>> >+	 *  +-----------+
>> >+	 *               \
>> >+	 *                \------------------+
>> >+	 *                |\   vma1           \   vma2
>> >+	 *                | \  +-----------+   \  +-----------+
>> >+	 *                |  > |           |    > |           |
>> >+	 *                |    +-----------+      +-----------+
>> >+	 *                \
>> >+	 *                 \   grand_vma
>> >+	 *                  \  +-----------+
>> >+	 *                   > |           |
>> >+	 *                     +-----------+
>> >+	 */
>> >+	bitmap_clear(expected, root_vma->index, 1);
>> >+	unlink_anon_vmas(root_vma);
>> >+	ASSERT_EQ(0, root_anon_vma->num_active_vmas);
>> >+
>> >+	bitmap_zero(found, 10);
>> >+	anon_vma_interval_tree_foreach(avc, &root_anon_vma->rb_root, 3, 4) {
>> >+		bitmap_set(found, avc->vma->index, 1);
>> >+	}
>> >+	/* Expect to find all vmas even root_vma released. */
>> >+	ASSERT_TRUE(bitmap_equal(expected, found, 10));
>> >+
>> >+	cleanup();
>> >+
>> >+	ASSERT_EQ(0, nr_allocated);
>> >+	return true;
>> >+}
>> >+
>>
>>
>> --
>> Wei Yang
>> Help you, Help me
>>

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/5] Make anon_vma operations testable
  2025-04-30  7:47         ` David Hildenbrand
  2025-04-30 15:44           ` Wei Yang
@ 2025-05-14  1:23           ` Wei Yang
  2025-05-27  6:34             ` Wei Yang
  1 sibling, 1 reply; 29+ messages in thread
From: Wei Yang @ 2025-05-14  1:23 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, Lorenzo Stoakes, akpm, riel, vbabka, harry.yoo, jannh,
	baohua, linux-mm

On Wed, Apr 30, 2025 at 09:47:16AM +0200, David Hildenbrand wrote:
[...]
>> > > Agreed, skimming over the tests there are some nice diagrams and cases.
>> > > 
>> > > But I would hope that for most of these cases we could test on a higher
>> > > level: test our expectations when running real programs that we want to
>> > > check, especially when performing internal changes on how we handle anon
>> > > memory + rmap.
>> > > 
>> > > E.g., do fork(), then test if we can successfully perform rmap
>> > > lookups/updates (e.g., migrate folio to a different numa node etc).
>> > > 
>> > 
>> > That's a great point! Wei - if you could look at making some self-tests
>> > (i.e. that live in tools/testing/selftests/mm) that try to recreate _real_
>> > scenarios that use the rmap like this and assert correct behaviour there,
>> > that could be a positive way of moving forward with this.
>> > 
>> 
>> I am trying to understand what scenarios you want.
>

Sorry for the late reply, I was handling other things for a while.

>That is exactly the task to figure out: how can we actually test our rmap
>implementation from a higher level. The example regarding fork and migration
>is possibly a low-hanging fruit.

If my understanding is correct, you suggested two high-level ways:

1. fork + migrate (move_pages)

>
>We might already have the functionality to achieve it, *maybe* we'd even want
>some extensions to make it all even easier to test.
>
>For example, MADV_PAGEOUT is refused on folios that are mapped into multiple
>processes. Maybe we'd want the option to *still* page it out, just like
>MPOL_MF_MOVE_ALL allows with CAP_SYS_NICE to *still* migrate a folio that is
>mapped into multiple processes.
>

2. madvise(MADV_PAGEOUT)

I don't fully get it here. Do you mean fork + madvise(MADV_PAGEOUT) + migrate?

But we need to enable pageout this way first.

I am not sure why this one is an easier way to test. Would you mind sharing
more ideas on this?

>Some rmap tests could make sense for both, anon and pagecache folios.
>
>> 
>> Something like below?
>> 
>>    * fork and migrate a range in child
>>    * fork/unmap in parent and migrate a range in child
>> 
>> If the operation is successful, then we are good, right?
>
>Yes. And one can come up with a bunch of similar rmap test cases, like doing
>a partial mremap() of a THP, then testing if the rmap walk still works as
>expected, pairing the whole thing with fork etc.
>

For both ways, we could arrange all those scenarios and also do a partial
mremap() during them.

>One "problem" here is that even with MPOL_MF_MOVE_ALL,
>move_pages() will not move a folio if it already resides on the target node.
>So one always needs two NUMA nodes, which is a bit suboptimal for testing
>purposes.
>
>For testing purposes, it could have been helpful a couple of times already to
>just have a way of migrating a folio even if it already resides on the
>expected node.
>

This looks like we need a new flag for it?

Here is my plan if my understanding is correct.

1. Add test cases for fork + migrate. We may limit them to only run on
   machines with 2 NUMA nodes.
2. Enable move_pages() on local node, then remove the test limitation
3. Enable madvise(MADV_PAGEOUT) with multiple mapping, then add related cases
4. Add mremap() or other cases

In general, to verify rmap does the work correctly, my idea is to

  * mmap(MAP_SHARED)
  * write some initial data before fork
  * after fork and migrate, we write some different data to it
  * if each process does see the new data, rmap is good.

Does it sound good to you?

>-- 
>Cheers,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/5] Make anon_vma operations testable
  2025-05-14  1:23           ` Wei Yang
@ 2025-05-27  6:34             ` Wei Yang
  2025-05-27 11:31               ` David Hildenbrand
  0 siblings, 1 reply; 29+ messages in thread
From: Wei Yang @ 2025-05-27  6:34 UTC (permalink / raw)
  To: Wei Yang
  Cc: David Hildenbrand, Lorenzo Stoakes, akpm, riel, vbabka, harry.yoo,
	jannh, baohua, linux-mm

On Wed, May 14, 2025 at 01:23:18AM +0000, Wei Yang wrote:
>On Wed, Apr 30, 2025 at 09:47:16AM +0200, David Hildenbrand wrote:
>[...]
>>> > > Agreed, skimming over the tests there are some nice diagrams and cases.
>>> > > 
>>> > > But I would hope that for most of these cases we could test on a higher
>>> > > level: test our expectations when running real programs that we want to
>>> > > check, especially when performing internal changes on how we handle anon
>>> > > memory + rmap.
>>> > > 
>>> > > E.g., do fork(), then test if we can successfully perform rmap
>>> > > lookups/updates (e.g., migrate folio to a different numa node etc).
>>> > > 
>>> > 
>>> > That's a great point! Wei - if you could look at making some self-tests
>>> > (i.e. that live in tools/testing/selftests/mm) that try to recreate _real_
>>> > scenarios that use the rmap like this and assert correct behaviour there,
>>> > that could be a positive way of moving forward with this.
>>> > 
>>> 

Ping

>>> I am trying to understand what scenarios you want.
>>
>
>Sorry for the late reply, I handled other things a while.
>
>>That is exactly the task to figure out: how can we actually test our rmap
>>implementation from a higher level. The example regarding fork and migration
>>is possibly a low-hanging fruit.
>
>If my understanding is correct, you suggested two high level way:
>
>1. fork + migrate (move_pages)
>
>>
>>We might already have the functionality to achieve it, *maybe* we'd even want
>>some extensions to make it all even easier to test.
>>
>>For example, MADV_PAGEOUT is refused on folios that are mapped into multiple
>>processes. Maybe we'd want the option to *still* page it out, just like
>>MPOL_MF_MOVE_ALL allows with CAP_SYS_NICE to *still* migrate a folio that is
>>mapped into multiple processes.
>>
>
>2. madvise(MADV_PAGEOUT)
>
>Not fully get it here. You mean fork + madvise(MADV_PAGEOUT) + migrate ?
>
>But we need to enable pageout in this way first.
>
>I am not sure why this one is easier way to test. Would you mind sharing more
>idea on this?
>
>>Some rmap tests could make sense for both, anon and pagecache folios.
>>
>>> 
>>> Something like below?
>>> 
>>>    * fork and migrate a range in child
>>>    * fork/unmap in parent and migrate a range in child
>>> 
>>> If the operation is successful, then we are good, right?
>>
>>Yes. And one can come up with a bunch of similar rmap test cases, like doing
>>a partial mremap() of a THP, then testing if the rmap walk still works as
>>expected, pairing the whole thing with fork etc.
>>
>
>For both way, we could arrange all those scenarios and also do partial
>mremap() during it. 
>
>>One "problem" here is that even with MPOL_MF_MOVE_ALL,
>>move_pages() will not move a folio if it already resides on the target node.
>>So one always needs two NUMA nodes, which is a bit suboptimal for testing
>>purposes.
>>
>>For testing purposes, it could have been helpful a couple of times already to
>>just have a way of migrating a folio even if it already resides on the
>>expected node.
>>
>
>This looks we need a new flag for it?
>
>Here is my plan if my understanding is correct.
>
>1. Add test cases for fork + migrate. We may limit it only works on machine
>   with 2 NUMA nodes.
>2. Enable move_pages() on local node, then remove the test limitation
>3. Enable madvise(MADV_PAGEOUT) with multiple mapping, then add related cases
>4. Add mremap() or other cases
>
>In general, to verify rmap does the work correctly, my idea is to
>
>  * mmap(MAP_SHARED)
>  * write some initial data before fork
>  * after fork and migrate, we write some different data to it
>  * if each process do see the new data, rmap is good.
>
>Does it sound good to you?
>
>>-- 
>>Cheers,
>>
>>David / dhildenb
>
>-- 
>Wei Yang
>Help you, Help me

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/5] Make anon_vma operations testable
  2025-05-27  6:34             ` Wei Yang
@ 2025-05-27 11:31               ` David Hildenbrand
  2025-05-28  1:17                 ` Wei Yang
  2025-05-30  2:11                 ` Wei Yang
  0 siblings, 2 replies; 29+ messages in thread
From: David Hildenbrand @ 2025-05-27 11:31 UTC (permalink / raw)
  To: Wei Yang
  Cc: Lorenzo Stoakes, akpm, riel, vbabka, harry.yoo, jannh, baohua,
	linux-mm

On 27.05.25 08:34, Wei Yang wrote:
> On Wed, May 14, 2025 at 01:23:18AM +0000, Wei Yang wrote:
>> On Wed, Apr 30, 2025 at 09:47:16AM +0200, David Hildenbrand wrote:
>> [...]
>>>>>> Agreed, skimming over the tests there are some nice diagrams and cases.
>>>>>>
>>>>>> But I would hope that for most of these cases we could test on a higher
>>>>>> level: test our expectations when running real programs that we want to
>>>>>> check, especially when performing internal changes on how we handle anon
>>>>>> memory + rmap.
>>>>>>
>>>>>> E.g., do fork(), then test if we can successfully perform rmap
>>>>>> lookups/updates (e.g., migrate folio to a different numa node etc).
>>>>>>
>>>>>
>>>>> That's a great point! Wei - if you could look at making some self-tests
>>>>> (i.e. that live in tools/testing/selftests/mm) that try to recreate _real_
>>>>> scenarios that use the rmap like this and assert correct behaviour there,
>>>>> that could be a positive way of moving forward with this.
>>>>>
>>>>
> 
> Ping

Thanks for reminding me and sorry for the late reply.

> 
>>>> I am trying to understand what scenarios you want.
>>>
>>
>> Sorry for the late reply, I handled other things a while.
>>
>>> That is exactly the task to figure out: how can we actually test our rmap
>>> implementation from a higher level. The example regarding fork and migration
>>> is possibly a low-hanging fruit.
>>
>> If my understanding is correct, you suggested two high level way:
>>
>> 1. fork + migrate (move_pages)

Yes, that should be one way of testing it. We could even get multiple 
child processes into place.

>>
>>>
>>> We might already have the functionality to achieve it, *maybe* we'd even want
>>> some extensions to make it all even easier to test.
>>>
>>> For example, MADV_PAGEOUT is refused on folios that are mapped into multiple
>>> processes. Maybe we'd want the option to *still* page it out, just like
>>> MPOL_MF_MOVE_ALL allows with CAP_SYS_NICE to *still* migrate a folio that is
>>> mapped into multiple processes.
>>>
>>
>> 2. madvise(MADV_PAGEOUT)
>>
>> Not fully get it here. You mean fork + madvise(MADV_PAGEOUT) + migrate ?

fork + madvise(MADV_PAGEOUT) only.

Then verify, that it was actually paged out.

For example, we could have 10 processes mapping the page, then call 
madvise(MADV_PAGEOUT) from one process. We can verify in all processes 
if the page is actually no longer mapped using /proc/self/pagemap.

We could likely do it with anon pages (swap), shmem pages (swap) and 
pagecache pages (file backed).

... and possibly even with KSM pages!

>>
>> But we need to enable pageout in this way first.

Yes, madvise(MADV_PAGEOUT) needs to be extended to allow forcing a 
pageout (e.g., a new MADVISE_PAGEOUT_SHARED that only works with 
CAP_SYS_ADMIN).

>>
>> I am not sure why this one is easier way to test. Would you mind sharing more
>> idea on this?

It might be easier than the migration case, because for migration we 
currently need 2 NUMA nodes ...

>>
>>> Some rmap tests could make sense for both, anon and pagecache folios.
>>>
>>>>
>>>> Something like below?
>>>>
>>>>     * fork and migrate a range in child
>>>>     * fork/unmap in parent and migrate a range in child
>>>>
>>>> If the operation is successful, then we are good, right?
>>>
>>> Yes. And one can come up with a bunch of similar rmap test cases, like doing
>>> a partial mremap() of a THP, then testing if the rmap walk still works as
>>> expected, pairing the whole thing with fork etc.
>>>
>>
>> For both way, we could arrange all those scenarios and also do partial
>> mremap() during it.

Exactly.

>>
>>> One "problem" here is that even with MPOL_MF_MOVE_ALL,
>>> move_pages() will not move a folio if it already resides on the target node.
>>> So one always needs two NUMA nodes, which is a bit suboptimal for testing
>>> purposes.
>>>
>>> For testing purposes, it could have been helpful a couple of times already to
>>> just have a way of migrating a folio even if it already resides on the
>>> expected node.
>>>
>>
>> This looks we need a new flag for it?
>>
>> Here is my plan if my understanding is correct.
>>
>> 1. Add test cases for fork + migrate. We may limit it only works on machine
>>    with 2 NUMA nodes.
>> 2. Enable move_pages() on local node, then remove the test limitation
>> 3. Enable madvise(MADV_PAGEOUT) with multiple mapping, then add related cases
>> 4. Add mremap() or other cases
>>
>> In general, to verify rmap does the work correctly, my idea is to
>>
>>   * mmap(MAP_SHARED)
>>   * write some initial data before fork
>>   * after fork and migrate, we write some different data to it
>>   * if each process do see the new data, rmap is good.

With the pageout test (see above), we can just verify using 
/proc/self/pagemap in all processes whether the page was paged out (IOW: 
the rmap was able to identify all page locations). And that should work 
with anon/shmem/file/ksm pages IIUC.

>>
>> Does it sound good to you?

Yes, that absolutely goes into the right direction. Having also rmap 
tests for KSM as raised above might be a real benefit.


-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/5] Make anon_vma operations testable
  2025-05-27 11:31               ` David Hildenbrand
@ 2025-05-28  1:17                 ` Wei Yang
  2025-05-30  2:11                 ` Wei Yang
  1 sibling, 0 replies; 29+ messages in thread
From: Wei Yang @ 2025-05-28  1:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, Lorenzo Stoakes, akpm, riel, vbabka, harry.yoo, jannh,
	baohua, linux-mm

On Tue, May 27, 2025 at 01:31:47PM +0200, David Hildenbrand wrote:
>On 27.05.25 08:34, Wei Yang wrote:
>> On Wed, May 14, 2025 at 01:23:18AM +0000, Wei Yang wrote:
>> > On Wed, Apr 30, 2025 at 09:47:16AM +0200, David Hildenbrand wrote:
>> > [...]
>> > > > > > Agreed, skimming over the tests there are some nice diagrams and cases.
>> > > > > > 
>> > > > > > But I would hope that for most of these cases we could test on a higher
>> > > > > > level: test our expectations when running real programs that we want to
>> > > > > > check, especially when performing internal changes on how we handle anon
>> > > > > > memory + rmap.
>> > > > > > 
>> > > > > > E.g., do fork(), then test if we can successfully perform rmap
>> > > > > > lookups/updates (e.g., migrate folio to a different numa node etc).
>> > > > > > 
>> > > > > 
>> > > > > That's a great point! Wei - if you could look at making some self-tests
>> > > > > (i.e. that live in tools/testing/selftests/mm) that try to recreate _real_
>> > > > > scenarios that use the rmap like this and assert correct behaviour there,
>> > > > > that could be a positive way of moving forward with this.
>> > > > > 
>> > > > 
>> 
>> Ping
>
>Thanks for reminding me and sorry for the late reply.
>
>> 
>> > > > I am trying to understand what scenarios you want.
>> > > 
>> > 
>> > Sorry for the late reply, I handled other things a while.
>> > 
>> > > That is exactly the task to figure out: how can we actually test our rmap
>> > > implementation from a higher level. The example regarding fork and migration
>> > > is possibly a low-hanging fruit.
>> > 
>> > If my understanding is correct, you suggested two high level way:
>> > 
>> > 1. fork + migrate (move_pages)
>
>Yes, that should be one way of testing it. We could even get multiple child
>processes into place.
>
>> > 
>> > > 
>> > > We might already have the functionality to achieve it, *maybe* we'd even want
>> > > some extensions to make it all even easier to test.
>> > > 
>> > > For example, MADV_PAGEOUT is refused on folios that are mapped into multiple
>> > > processes. Maybe we'd want the option to *still* page it out, just like
>> > > MPOL_MF_MOVE_ALL allows with CAP_SYS_NICE to *still* migrate a folio that is
>> > > mapped into multiple processes.
>> > > 
>> > 
>> > 2. madvise(MADV_PAGEOUT)
>> > 
>> > Not fully get it here. You mean fork + madvise(MADV_PAGEOUT) + migrate ?
>
>fork + madvise(MADV_PAGEOUT) only.
>
>Then verify, that it was actually paged out.
>
>For example, we could have 10 processes mapping the page, then call
>madvise(MADV_PAGEOUT) from one process. We can verify in all processes if the
>page is actually no longer mapped using /proc/self/pagemap

Got it, thanks.

Will prepare the test cases.

>
>We could likely do it with anon pages (swap), shmem pages (swap) and
>pagecache pages (file backed).
>
>... and possibly even with KSM pages!
>
>> > 
>> > But we need to enable pageout in this way first.
>
>Yes, madvise(MADV_PAGEOUT) needs to be extended to allow forcing a pageout
>(e.g., a new MADVISE_PAGEOUT_SHARED that only works with CAP_SYS_ADMIN).
>
>> > 
>> > I am not sure why this one is easier way to test. Would you mind sharing more
>> > idea on this?
>
>It might be easier than the migration case, because for migration we
>currently need 2 NUMA nodes ...
>
>> > 
>> > > Some rmap tests could make sense for both, anon and pagecache folios.
>> > > 
>> > > > 
>> > > > Something like below?
>> > > > 
>> > > >     * fork and migrate a range in child
>> > > >     * fork/unmap in parent and migrate a range in child
>> > > > 
>> > > > If the operation is successful, then we are good, right?
>> > > 
>> > > Yes. And one can come up with a bunch of similar rmap test cases, like doing
>> > > a partial mremap() of a THP, then testing if the rmap walk still works as
>> > > expected, pairing the whole thing with fork etc.
>> > > 
>> > 
>> > For both way, we could arrange all those scenarios and also do partial
>> > mremap() during it.
>
>Exactly.
>
>> > 
>> > > One "problem" here is that even with MPOL_MF_MOVE_ALL,
>> > > move_pages() will not move a folio if it already resides on the target node.
>> > > So one always needs two NUMA nodes, which is a bit suboptimal for testing
>> > > purposes.
>> > > 
>> > > For testing purposes, it could have been helpful a couple of times already to
>> > > just have a way of migrating a folio even if it already resides on the
>> > > expected node.
>> > > 
>> > 
>> > This looks we need a new flag for it?
>> > 
>> > Here is my plan if my understanding is correct.
>> > 
>> > 1. Add test cases for fork + migrate. We may limit it to only run on
>> >    machines with 2 NUMA nodes.
>> > 2. Enable move_pages() on the local node, then remove the test limitation
>> > 3. Enable madvise(MADV_PAGEOUT) with multiple mappings, then add related cases
>> > 4. Add mremap() or other cases
>> > 
>> > In general, to verify rmap does its work correctly, my idea is to
>> > 
>> >   * mmap(MAP_SHARED)
>> >   * write some initial data before fork
>> >   * after fork and migrate, we write some different data to it
>> >   * if each process does see the new data, rmap is good.
>
>With pageout (see above) test, we can just verify using /proc/self/pagemap in
>all processes whether the page was paged out (IOW: the rmap was able to
>identify all page locations). And that should work with anon/shmem/file/ksm
>pages IIUC.
>
>> > 
>> > Does it sound good to you?
>
>Yes, that absolutely goes into the right direction. Having also rmap tests
>for KSM as raised above might be a real benefit.
>
>
>-- 
>Cheers,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/5] Make anon_vma operations testable
  2025-05-27 11:31               ` David Hildenbrand
  2025-05-28  1:17                 ` Wei Yang
@ 2025-05-30  2:11                 ` Wei Yang
  2025-05-30  8:00                   ` David Hildenbrand
  1 sibling, 1 reply; 29+ messages in thread
From: Wei Yang @ 2025-05-30  2:11 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, Lorenzo Stoakes, akpm, riel, vbabka, harry.yoo, jannh,
	baohua, linux-mm

On Tue, May 27, 2025 at 01:31:47PM +0200, David Hildenbrand wrote:
>On 27.05.25 08:34, Wei Yang wrote:
>> On Wed, May 14, 2025 at 01:23:18AM +0000, Wei Yang wrote:
>> > On Wed, Apr 30, 2025 at 09:47:16AM +0200, David Hildenbrand wrote:
>> > [...]
>> > > > > > Agreed, skimming over the tests there are some nice diagrams and cases.
>> > > > > > 
>> > > > > > But I would hope that for most of these cases we could test on a higher
>> > > > > > level: test our expectations when running real programs that we want to
>> > > > > > check, especially when performing internal changes on how we handle anon
>> > > > > > memory + rmap.
>> > > > > > 
>> > > > > > E.g., do fork(), then test if we can successfully perform rmap
>> > > > > > lookups/updates (e.g., migrate folio to a different numa node etc).
>> > > > > > 
>> > > > > 
>> > > > > That's a great point! Wei - if you could look at making some self-tests
>> > > > > (i.e. that live in tools/testing/selftests/mm) that try to recreate _real_
>> > > > > scenarios that use the rmap like this and assert correct behaviour there,
>> > > > > that could be a positive way of moving forward with this.
>> > > > > 
>> > > > 
>> 
>> Ping
>
>Thanks for reminding me and sorry for the late reply.
>
>> 
>> > > > I am trying to understand what scenarios you want.
>> > > 
>> > 
>> > Sorry for the late reply, I was handling other things for a while.
>> > 
>> > > That is exactly the task to figure out: how can we actually test our rmap
>> > > implementation from a higher level. The example regarding fork and migration
>> > > is possibly a low-hanging fruit.
>> > 
>> > If my understanding is correct, you suggested two high-level ways:
>> > 
>> > 1. fork + migrate (move_pages)
>
>Yes, that should be one way of testing it. We could even get multiple child
>processes into place.
>
>> > 
>> > > 
>> > > We might already have the functionality to achieve it, *maybe* we'd even want
>> > > some extensions to make it all even easier to test.
>> > > 
>> > > For example, MADV_PAGEOUT is refused on folios that are mapped into multiple
>> > > processes. Maybe we'd want the option to *still* page it out, just like
>> > > MPOL_MF_MOVE_ALL allows with CAP_SYS_NICE to *still* migrate a folio that is
>> > > mapped into multiple processes.
>> > > 
>> > 
>> > 2. madvise(MADV_PAGEOUT)
>> > 
>> > I don't fully get it here. You mean fork + madvise(MADV_PAGEOUT) + migrate?
>
>fork + madvise(MADV_PAGEOUT) only.
>
>Then verify, that it was actually paged out.
>
>For example, we could have 10 processes mapping the page, then call
>madvise(MADV_PAGEOUT) from one process. We can verify in all processes if the
>page is actually no longer mapped using /proc/self/pagemap
>

Hi, David

I'd like to clarify some detail here.

>We could likely do it with anon pages (swap), shmem pages (swap) and
>pagecache pages (file backed).
>

I understand the three cases here, but the (swap) part is not clear to me. Do
you mean we force the page to be swapped out before migration? I am not sure
how to ensure that.

>... and possibly even with KSM pages!
>

For KSM, it looks like a general background routine. I guess you want
something like:

  * fork process tree
  * do KSM and wait for it to finish?
  * migrate

I guess the KSM scan duration would be related to the total memory on the
system: the bigger, the longer. Meanwhile, selftests generally have a timeout
setting, so I am not sure it would be suitable for all situations.

Would you mind sharing more on KSM case?

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/5] Make anon_vma operations testable
  2025-05-30  2:11                 ` Wei Yang
@ 2025-05-30  8:00                   ` David Hildenbrand
  2025-05-30 14:05                     ` Wei Yang
  0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2025-05-30  8:00 UTC (permalink / raw)
  To: Wei Yang
  Cc: Lorenzo Stoakes, akpm, riel, vbabka, harry.yoo, jannh, baohua,
	linux-mm

On 30.05.25 04:11, Wei Yang wrote:
> On Tue, May 27, 2025 at 01:31:47PM +0200, David Hildenbrand wrote:
>> On 27.05.25 08:34, Wei Yang wrote:
>>> On Wed, May 14, 2025 at 01:23:18AM +0000, Wei Yang wrote:
>>>> On Wed, Apr 30, 2025 at 09:47:16AM +0200, David Hildenbrand wrote:
>>>> [...]
>>>>>>>> Agreed, skimming over the tests there are some nice diagrams and cases.
>>>>>>>>
>>>>>>>> But I would hope that for most of these cases we could test on a higher
>>>>>>>> level: test our expectations when running real programs that we want to
>>>>>>>> check, especially when performing internal changes on how we handle anon
>>>>>>>> memory + rmap.
>>>>>>>>
>>>>>>>> E.g., do fork(), then test if we can successfully perform rmap
>>>>>>>> lookups/updates (e.g., migrate folio to a different numa node etc).
>>>>>>>>
>>>>>>>
>>>>>>> That's a great point! Wei - if you could look at making some self-tests
>>>>>>> (i.e. that live in tools/testing/selftests/mm) that try to recreate _real_
>>>>>>> scenarios that use the rmap like this and assert correct behaviour there,
>>>>>>> that could be a positive way of moving forward with this.
>>>>>>>
>>>>>>
>>>
>>> Ping
>>
>> Thanks for reminding me and sorry for the late reply.
>>
>>>
>>>>>> I am trying to understand what scenarios you want.
>>>>>
>>>>
>>>> Sorry for the late reply, I handled other things a while.
>>>>
>>>>> That is exactly the task to figure out: how can we actually test our rmap
>>>>> implementation from a higher level. The example regarding fork and migration
>>>>> is possibly a low-hanging fruit.
>>>>
>>>> If my understanding is correct, you suggested two high level way:
>>>>
>>>> 1. fork + migrate (move_pages)
>>
>> Yes, that should be one way of testing it. We could even get multiple child
>> processes into place.
>>
>>>>
>>>>>
>>>>> We might already have the functionality to achieve it, *maybe* we'd even want
>>>>> some extensions to make it all even easier to test.
>>>>>
>>>>> For example, MADV_PAGEOUT is refused on folios that are mapped into multiple
>>>>> processes. Maybe we'd want the option to *still* page it out, just like
>>>>> MPOL_MF_MOVE_ALL allows with CAP_SYS_NICE to *still* migrate a folio that is
>>>>> mapped into multiple processes.
>>>>>
>>>>
>>>> 2. madvise(MADV_PAGEOUT)
>>>>
>>>> Not fully get it here. You mean fork + madvise(MADV_PAGEOUT) + migrate ?
>>
>> fork + madvise(MADV_PAGEOUT) only.
>>
>> Then verify, that it was actually paged out.
>>
>> For example, we could have 10 processes mapping the page, then call
>> madvise(MADV_PAGEOUT) from one process. We can verify in all processes if the
>> page is actually no longer mapped using /proc/self/pagemap
>>
> 
> Hi, David

Hi!

> 
> I'd like to clarify some detail here.
> 
>> We could likely do it with anon pages (swap), shmem pages (swap) and
>> pagecache pages (file backed).
>>
> 
> I could understand the three cases here, but for the (swap) case, it is not
> clear to me. You mean we force the page to be swapped out before migration? I
> may not have an idea to ensure that.

No need to involve migration for that test. It's sufficient to trigger 
pageout and then check if the page was unmapped in all other processes.

> 
>> ... and possibly even with KSM pages!
>>
> 
> For KSM, it looks like an general background routine. I guess you want
> something like:
> 
>    * fork process tree
>    * do KSM and wait for it to finish?

Probably allocate memory in each process and fill it with the same value.

>Then wait for KSM to deduplicate.

>    * migrate

Yes, or instead of migrate, trigger pageout from one process, then check 
if pageout in all other processes (/proc/self/pagemap)

> I guess the KSM duration would related to the total memory on the
> system. The bigger, the longer. While the selftest generally have timeout
> setting. Not sure it would be suitable for all situation.
> 
> Would you mind sharing more on KSM case?

Worth looking at tools/testing/selftests/mm/ksm_functional_tests.c (and 
the other KSM selftests)

In particular, in ksm_merge() we can detect that KSM completed the 
scan once two full scans have passed.

A lot of that functionality could probably be factored out into 
vm_utils.c to be reused in other tests.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/5] Make anon_vma operations testable
  2025-05-30  8:00                   ` David Hildenbrand
@ 2025-05-30 14:05                     ` Wei Yang
  2025-05-30 14:39                       ` David Hildenbrand
  0 siblings, 1 reply; 29+ messages in thread
From: Wei Yang @ 2025-05-30 14:05 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, Lorenzo Stoakes, akpm, riel, vbabka, harry.yoo, jannh,
	baohua, linux-mm

On Fri, May 30, 2025 at 10:00:18AM +0200, David Hildenbrand wrote:
[...]
>> > 
>> > Thanks for reminding me and sorry for the late reply.
>> > 
>> > > 
>> > > > > > I am trying to understand what scenarios you want.
>> > > > > 
>> > > > 
>> > > > Sorry for the late reply, I handled other things a while.
>> > > > 
>> > > > > That is exactly the task to figure out: how can we actually test our rmap
>> > > > > implementation from a higher level. The example regarding fork and migration
>> > > > > is possibly a low-hanging fruit.
>> > > > 
>> > > > If my understanding is correct, you suggested two high level way:
>> > > > 
>> > > > 1. fork + migrate (move_pages)
>> > 
>> > Yes, that should be one way of testing it. We could even get multiple child
>> > processes into place.
>> > 
>> > > > 
>> > > > > 
>> > > > > We might already have the functionality to achieve it, *maybe* we'd even want
>> > > > > some extensions to make it all even easier to test.
>> > > > > 
>> > > > > For example, MADV_PAGEOUT is refused on folios that are mapped into multiple
>> > > > > processes. Maybe we'd want the option to *still* page it out, just like
>> > > > > MPOL_MF_MOVE_ALL allows with CAP_SYS_NICE to *still* migrate a folio that is
>> > > > > mapped into multiple processes.
>> > > > > 
>> > > > 
>> > > > 2. madvise(MADV_PAGEOUT)
>> > > > 
>> > > > Not fully get it here. You mean fork + madvise(MADV_PAGEOUT) + migrate ?
>> > 
>> > fork + madvise(MADV_PAGEOUT) only.
>> > 
>> > Then verify, that it was actually paged out.
>> > 
>> > For example, we could have 10 processes mapping the page, then call
>> > madvise(MADV_PAGEOUT) from one process. We can verify in all processes if the
>> > page is actually no longer mapped using /proc/self/pagemap
>> > 
>> 
>> Hi, David
>
>Hi!
>
>> 
>> I'd like to clarify some detail here.
>> 
>> > We could likely do it with anon pages (swap), shmem pages (swap) and
>> > pagecache pages (file backed).
>> > 
>> 
>> I could understand the three cases here, but for the (swap) case, it is not
>> clear to me. You mean we force the page to be swapped out before migration? I
>> may not have an idea to ensure that.
>
>No need to involve migration for that test. It's sufficient to trigger
>pageout to then check if the page was unmapped in all other processes.
>

Here you mean for all anon/shmem/pagecache pages, we could have two
categories of tests:

  * migrate and write different data from one process, verify each process
    sees the new data
  * trigger pageout from one process, verify each process sees it paged out
    (/proc/self/pagemap)

And one of these categories is enough, right? Just like the KSM case below:
migrate or pageout.

>> 
>> > ... and possibly even with KSM pages!
>> > 
>> 
>> For KSM, it looks like an general background routine. I guess you want
>> something like:
>> 
>>    * fork process tree
>>    * do KSM and wait for it to finish?
>
>Probably allocate memory in each process and fill it with the same value.
>
>Then wait for KSM to deduplicate.
>
>>    * migrate
>
>Yes, or instead of migrate, trigger pageout from one process, then check if
>pageout in all other processes (/proc/self/pagemap)
>
>> > I guess the KSM duration would related to the total memory on the system.
>The
>> bigger, the longer. While the selftest generally have timeout setting. Not
>> sure it would be suitable for all situation.
>> 
>> Would you mind sharing more on KSM case?
>
>Worth looking at tools/testing/selftests/mm/ksm_functional_tests.c (and the
>other KSM selftests)

Thanks, I would take a look into this first.

>
>In particular, in ksm_merge() we can detect whether KSM completed the scan
>when two full scans passed.
>
>A lot of that functionality could probably be factored out into vm_utils.c to
>be reused in other tests.
>
>-- 
>Cheers,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/5] Make anon_vma operations testable
  2025-05-30 14:05                     ` Wei Yang
@ 2025-05-30 14:39                       ` David Hildenbrand
  2025-05-30 23:23                         ` Wei Yang
  0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2025-05-30 14:39 UTC (permalink / raw)
  To: Wei Yang
  Cc: Lorenzo Stoakes, akpm, riel, vbabka, harry.yoo, jannh, baohua,
	linux-mm

On 30.05.25 16:05, Wei Yang wrote:
> On Fri, May 30, 2025 at 10:00:18AM +0200, David Hildenbrand wrote:
> [...]
>>>>
>>>> Thanks for reminding me and sorry for the late reply.
>>>>
>>>>>
>>>>>>>> I am trying to understand what scenarios you want.
>>>>>>>
>>>>>>
>>>>>> Sorry for the late reply, I handled other things a while.
>>>>>>
>>>>>>> That is exactly the task to figure out: how can we actually test our rmap
>>>>>>> implementation from a higher level. The example regarding fork and migration
>>>>>>> is possibly a low-hanging fruit.
>>>>>>
>>>>>> If my understanding is correct, you suggested two high level way:
>>>>>>
>>>>>> 1. fork + migrate (move_pages)
>>>>
>>>> Yes, that should be one way of testing it. We could even get multiple child
>>>> processes into place.
>>>>
>>>>>>
>>>>>>>
>>>>>>> We might already have the functionality to achieve it, *maybe* we'd even want
>>>>>>> some extensions to make it all even easier to test.
>>>>>>>
>>>>>>> For example, MADV_PAGEOUT is refused on folios that are mapped into multiple
>>>>>>> processes. Maybe we'd want the option to *still* page it out, just like
>>>>>>> MPOL_MF_MOVE_ALL allows with CAP_SYS_NICE to *still* migrate a folio that is
>>>>>>> mapped into multiple processes.
>>>>>>>
>>>>>>
>>>>>> 2. madvise(MADV_PAGEOUT)
>>>>>>
>>>>>> Not fully get it here. You mean fork + madvise(MADV_PAGEOUT) + migrate ?
>>>>
>>>> fork + madvise(MADV_PAGEOUT) only.
>>>>
>>>> Then verify, that it was actually paged out.
>>>>
>>>> For example, we could have 10 processes mapping the page, then call
>>>> madvise(MADV_PAGEOUT) from one process. We can verify in all processes if the
>>>> page is actually no longer mapped using /proc/self/pagemap
>>>>
>>>
>>> Hi, David
>>
>> Hi!
>>
>>>
>>> I'd like to clarify some detail here.
>>>
>>>> We could likely do it with anon pages (swap), shmem pages (swap) and
>>>> pagecache pages (file backed).
>>>>
>>>
>>> I could understand the three cases here, but for the (swap) case, it is not
>>> clear to me. You mean we force the page to be swapped out before migration? I
>>> may not have an idea to ensure that.
>>
>> No need to involve migration for that test. It's sufficient to trigger
>> pageout to then check if the page was unmapped in all other processes.
>>
> 
> Here you mean for all anon/shmem/pagecache pages, we could have two category
> tests:
> 
>    * migrate and write different data from on process, verify each process see
>      new data
>    * trigger pageout from one process, verify each process has it pageout
>      (/proc/self/pagemap)
> 
> While one of these category is enough, right? Just like KSM below, migrate or
> pageout.

Both cases will trigger different code paths: for example, migration 
will trigger restoring of migration entries, which is a different rmap 
operation not triggered by pageout/pagein :)

Pageout is probably easier to implement, but we couldn't test hugetlb 
with it. With migration we probably could also test the hugetlb rmap 
code (yet another case we should probably cover).

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/5] Make anon_vma operations testable
  2025-05-30 14:39                       ` David Hildenbrand
@ 2025-05-30 23:23                         ` Wei Yang
  2025-06-03 21:31                           ` David Hildenbrand
  0 siblings, 1 reply; 29+ messages in thread
From: Wei Yang @ 2025-05-30 23:23 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, Lorenzo Stoakes, akpm, riel, vbabka, harry.yoo, jannh,
	baohua, linux-mm

On Fri, May 30, 2025 at 04:39:21PM +0200, David Hildenbrand wrote:
[...]
>> 
>> Here you mean for all anon/shmem/pagecache pages, we could have two category
>> tests:
>> 
>>    * migrate and write different data from on process, verify each process see
>>      new data
>>    * trigger pageout from one process, verify each process has it pageout
>>      (/proc/self/pagemap)
>> 
>> While one of these category is enough, right? Just like KSM below, migrate or
>> pageout.
>
>Both cases will trigger different code paths: for example, migration will
>trigger restoring of migration entries, which is a different rmap operation
>not triggered by pageout/pagein :)
>
>Pageout is probably easier to implement: but we couldn't test hugetlb. With
>migration we probably could also test hugetlb rmap code (yet another case we
>should probably cover).
>

Oh, the pageout/pagein here is the madvise(MADV_PAGEOUT) one you mentioned in
your previous reply?

I thought you propose another scenario. Sorry for not following your message.

>-- 
>Cheers,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/5] Make anon_vma operations testable
  2025-05-30 23:23                         ` Wei Yang
@ 2025-06-03 21:31                           ` David Hildenbrand
  0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2025-06-03 21:31 UTC (permalink / raw)
  To: Wei Yang
  Cc: Lorenzo Stoakes, akpm, riel, vbabka, harry.yoo, jannh, baohua,
	linux-mm

On 31.05.25 01:23, Wei Yang wrote:
> On Fri, May 30, 2025 at 04:39:21PM +0200, David Hildenbrand wrote:
> [...]
>>>
>>> Here you mean for all anon/shmem/pagecache pages, we could have two category
>>> tests:
>>>
>>>     * migrate and write different data from on process, verify each process see
>>>       new data
>>>     * trigger pageout from one process, verify each process has it pageout
>>>       (/proc/self/pagemap)
>>>
>>> While one of these category is enough, right? Just like KSM below, migrate or
>>> pageout.
>>
>> Both cases will trigger different code paths: for example, migration will
>> trigger restoring of migration entries, which is a different rmap operation
>> not triggered by pageout/pagein :)
>>
>> Pageout is probably easier to implement: but we couldn't test hugetlb. With
>> migration we probably could also test hugetlb rmap code (yet another case we
>> should probably cover).
>>
> 
> Oh, the pageout/pagein here is the madvise(MADV_PAGEOUT) one you mention in
> your previous reply?

Yes, migration (via move_pages(), but requiring two NUMA nodes) vs. 
pageout (via madvise(MADV_PAGEOUT) which would require a similar "force" 
way of succeeding on shared pages).

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2025-06-03 21:31 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-29  9:06 [RFC Patch 0/5] Make anon_vma operations testable Wei Yang
2025-04-29  9:06 ` [RFC Patch 1/5] mm: move anon_vma manipulation functions to own file Wei Yang
2025-04-29  9:06 ` [RFC Patch 2/5] anon_vma: add skeleton code for userland testing of anon_vma logic Wei Yang
2025-05-01  1:31   ` Wei Yang
2025-05-01  9:41     ` Lorenzo Stoakes
2025-05-01 14:45       ` Wei Yang
2025-04-29  9:06 ` [RFC Patch 3/5] anon_vma: add test for mergeable anon_vma Wei Yang
2025-04-29  9:06 ` [RFC Patch 4/5] anon_vma: add test for reusable anon_vma Wei Yang
2025-04-29  9:06 ` [RFC Patch 5/5] anon_vma: add test to assert no double-reuse Wei Yang
2025-04-29  9:31 ` [RFC Patch 0/5] Make anon_vma operations testable Lorenzo Stoakes
2025-04-29  9:38   ` David Hildenbrand
2025-04-29  9:41     ` Lorenzo Stoakes
2025-04-29 23:56       ` Wei Yang
2025-04-30  7:47         ` David Hildenbrand
2025-04-30 15:44           ` Wei Yang
2025-04-30 21:36             ` David Hildenbrand
2025-05-14  1:23           ` Wei Yang
2025-05-27  6:34             ` Wei Yang
2025-05-27 11:31               ` David Hildenbrand
2025-05-28  1:17                 ` Wei Yang
2025-05-30  2:11                 ` Wei Yang
2025-05-30  8:00                   ` David Hildenbrand
2025-05-30 14:05                     ` Wei Yang
2025-05-30 14:39                       ` David Hildenbrand
2025-05-30 23:23                         ` Wei Yang
2025-06-03 21:31                           ` David Hildenbrand
2025-04-29 23:15   ` Wei Yang
2025-04-30 14:38     ` Lorenzo Stoakes
2025-04-30 15:41       ` Wei Yang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).