* [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
@ 2026-05-27 11:01 tao
2026-05-27 11:01 ` [PATCH 01/15] mm/rmap: introduce anon_rmap APIs for anonymous folios tao
` (17 more replies)
0 siblings, 18 replies; 22+ messages in thread
From: tao @ 2026-05-27 11:01 UTC
To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, tao.wangtao, kas,
ljs
Cc: hpa, liam, vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh,
jgg, jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
TL;DR
-----
This series introduces ANON_VMA_LAZY, which defers anon_vma creation
until it is actually required.
- anon_vma slab usage reduced by ~92-97%, anon_vma_chain slab usage by ~50-57%
- rmap operations on ANON_VMA_LAZY VMAs do not require anon_vma locking
Background
----------
Currently an anon_vma, together with the anon_vma_chain that links it to
the VMA, is allocated as soon as a VMA maps its first anonymous folio
(typically on the first anonymous page fault, via anon_vma_prepare()).
However, many VMAs never participate in fork or in rmap operations that
require anon_vma chains, so these anon_vma and anon_vma_chain objects
are often unnecessary.
Design overview
---------------
ANON_VMA_LAZY defers anon_vma allocation until it is actually needed
(for example during fork). VMAs that never participate in sharing can
avoid creating anon_vma structures entirely.
Before an anon_vma exists, rmap operations rely directly on VMA
information, so no anon_vma locking is required. An anon_vma is created
and linked only when sharing semantics are required.
This series introduces anon_rmap helpers to make rmap less dependent on
direct anon_vma access. It also introduces anon_vma_tree_t as a container
to support both the lazy and the existing anon_vma layouts.
Once a VMA becomes associated with an anon_vma, the normal behavior
remains unchanged.
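A rough userspace model of the deferral described above, under heavy simplifying assumptions: model_vma, model_anon_vma, model_anon_fault() and model_fork() are hypothetical names, not kernel APIs. In the model the child simply takes a reference on the parent's anon_vma, whereas the existing anon_vma_fork() path typically gives the child its own anon_vma linked through anon_vma_chain entries. It only illustrates that faults on an unshared VMA allocate nothing and that the first fork triggers the single allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins, not the kernel's structures. */
struct model_anon_vma {
	int refcount;
};

struct model_vma {
	struct model_anon_vma *anon_vma; /* stays NULL while unshared */
};

/* Counts anon_vma allocations so the effect of deferral is visible. */
static int model_anon_vma_allocs;

/* Allocate the anon_vma only on first demand, then reuse it. */
static struct model_anon_vma *model_vma_need_anon_vma(struct model_vma *vma)
{
	if (!vma->anon_vma) {
		vma->anon_vma = calloc(1, sizeof(*vma->anon_vma));
		if (!vma->anon_vma)
			return NULL;
		vma->anon_vma->refcount = 1;
		model_anon_vma_allocs++;
	}
	return vma->anon_vma;
}

/*
 * Anonymous fault on an unshared VMA: in this model the folio's rmap
 * would be derived from the VMA itself, so nothing is allocated.
 */
static void model_anon_fault(struct model_vma *vma)
{
	(void)vma;
}

/*
 * Fork: parent and child now share folios, so sharing semantics are
 * needed. Simplification: the child just takes a reference on the
 * parent's anon_vma instead of getting its own.
 */
static bool model_fork(struct model_vma *parent, struct model_vma *child)
{
	struct model_anon_vma *av = model_vma_need_anon_vma(parent);

	if (!av)
		return false;
	av->refcount++;
	child->anon_vma = av;
	return true;
}
```

In this model, any number of VMAs that fault but never fork keep anon_vma == NULL and cost no allocation at all.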
Memory impact
-------------
Preliminary measurements show significant reductions in anon_vma-related
slab allocations.
After boot:
Object         | Before (active KB) | After (active KB) | Change
---------------+--------------------+-------------------+-------
vm_area_struct | 117035             | 118176            | +1.0%
anon_vma_chain | 18865.8            | 8112.06           | -57.0%
anon_vma       | 20426.4            | 613.75            | -97.0%
After launching 24 apps:
Object         | Before (active KB) | After (active KB) | Change
---------------+--------------------+-------------------+-------
vm_area_struct | 196873             | 197345            | +0.2%
anon_vma_chain | 31477.1            | 15576.8           | -50.5%
anon_vma       | 33280              | 2648.12           | -92.0%
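The Change column can be re-derived from the raw numbers as (after - before) / before. The small helper below (an illustrative name, not part of the series) works in tenths of a percent so it needs no libm, and it reproduces every entry in both tables.

```c
#include <assert.h>

/*
 * Percentage change from before to after, returned in tenths of a
 * percent and rounded half away from zero, i.e. the precision used in
 * the Change column. Avoids libm so it builds without -lm.
 */
static long pct_change_tenths(double before, double after)
{
	double tenths = (after - before) / before * 1000.0;

	return (long)(tenths >= 0 ? tenths + 0.5 : tenths - 0.5);
}
```

For example, the boot-time anon_vma row gives pct_change_tenths(20426.4, 613.75) == -970, i.e. -97.0%.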
Simple fork microbenchmarks also show a slight improvement in fork
performance, since child VMAs do not need to allocate anon_vma
structures during fork.
Feedback and suggestions are welcome.
tao (15):
mm/rmap: introduce anon_rmap APIs for anonymous folios
mm: convert anon_vma rmap APIs to anon_rmap
mm: introduce anon_vma_tree_t for multiple anon_vma topologies
mm: switch to anon_vma_tree_t APIs in preparation for ANON_VMA_LAZY
mm: add CONFIG_ANON_VMA_LAZY and folio helpers
mm: add CONFIG_VMA_REF and VMA helpers
mm: replace direct FOLIO_MAPPING_ANON usage with helpers
mm: prepare rmap infrastructure for ANON_VMA_LAZY
mm: implement ANON_VMA_LAZY rmap semantics
mm: defer anon_vma creation with ANON_VMA_LAZY
mm: handle ANON_VMA_LAZY in huge page operations
mm: handle ANON_VMA_LAZY during migration
mm: support setup and upgrade of ANON_VMA_LAZY folios
mm: support merging of ANON_VMA_LAZY VMAs
mm: enable CONFIG_ANON_VMA_LAZY on arm64 and x86_64
arch/arm64/Kconfig | 1 +
arch/x86/Kconfig | 1 +
fs/proc/page.c | 6 +-
include/linux/mm.h | 38 ++
include/linux/mm_types.h | 9 +-
include/linux/page-flags.h | 34 +-
include/linux/pagemap.h | 2 +-
include/linux/rmap.h | 165 ++++++++-
mm/Kconfig | 22 ++
mm/damon/ops-common.c | 4 +-
mm/debug.c | 2 +-
mm/debug_vm_pgtable.c | 2 +-
mm/gup.c | 6 +-
mm/huge_memory.c | 16 +-
mm/internal.h | 171 +++++++++
mm/khugepaged.c | 13 +-
mm/ksm.c | 43 ++-
mm/memory-failure.c | 11 +-
mm/memory.c | 19 +-
mm/migrate.c | 126 ++++---
mm/mmap.c | 15 +-
mm/mremap.c | 4 +-
mm/page_idle.c | 2 +-
mm/rmap.c | 690 ++++++++++++++++++++++++++++++++++---
mm/vma.c | 76 ++--
mm/vma.h | 4 +-
mm/vma_exec.c | 2 +-
mm/vma_init.c | 1 +
28 files changed, 1279 insertions(+), 206 deletions(-)
--
2.17.1
* [PATCH 01/15] mm/rmap: introduce anon_rmap APIs for anonymous folios
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
@ 2026-05-27 11:01 ` tao
2026-05-27 11:44 ` Lorenzo Stoakes
2026-05-27 11:01 ` [PATCH 02/15] mm: convert anon_vma rmap APIs to anon_rmap tao
` (16 subsequent siblings)
17 siblings, 1 reply; 22+ messages in thread
From: tao @ 2026-05-27 11:01 UTC
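Add a set of anon_rmap APIs to operate on the reverse mappings of
anonymous folios. For now, anon_rmap_t is a thin wrapper around a
struct anon_vma pointer, and each helper forwards to the matching
anon_vma function.
Introduce anon_rmap_foreach_vma() as a wrapper around
anon_vma_interval_tree_foreach(), so callers iterate over VMAs directly
and no longer access the anon_vma interval tree.
Also add folio_maybe_same_anon_vma() to check whether a folio and a VMA
share the same root anon_vma.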
This prepares the rmap code for upcoming ANON_VMA_LAZY support and
RCU-based lockless rmap traversal.
No functional change intended.
Signed-off-by: tao <tao.wangtao@honor.com>
---
include/linux/rmap.h | 68 +++++++++++++++++++++++++++++++++++++++++
mm/rmap.c | 73 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 141 insertions(+)
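To illustrate why the handle is a one-member struct rather than a bare pointer or integer, here is a userspace sketch that mirrors the helpers from this patch with a stand-in struct anon_vma (an assumption; the real definition lives in include/linux/rmap.h). Because anon_rmap_t is a distinct struct type, assigning a raw struct anon_vma * to it fails to compile, so every conversion has to go through anon_vma_to_anon_rmap() or anon_rmap_to_anon_vma(), and NULL checks go through anon_rmap_value().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct anon_vma. */
struct anon_vma {
	struct anon_vma *root;
};

/*
 * Same shape as the handle in this patch. A raw struct anon_vma *
 * cannot be assigned to this type directly; the compiler rejects it.
 */
typedef struct anon_rmap {
	unsigned long rmap;
} anon_rmap_t;

static inline anon_rmap_t make_anon_rmap(const void *anon_mapping)
{
	return (anon_rmap_t){ .rmap = (unsigned long)anon_mapping };
}

#define ANON_RMAP_NULL make_anon_rmap(NULL)

static inline unsigned long anon_rmap_value(anon_rmap_t anon_rmap)
{
	return anon_rmap.rmap;
}

static inline anon_rmap_t anon_vma_to_anon_rmap(const struct anon_vma *anon_vma)
{
	return make_anon_rmap(anon_vma);
}

static inline struct anon_vma *anon_rmap_to_anon_vma(anon_rmap_t anon_rmap)
{
	return (struct anon_vma *)anon_rmap_value(anon_rmap);
}
```

Keeping the representation behind these helpers means a later patch can change what the unsigned long encodes without touching the converted callers.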
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 8dc0871e5f00..c42314ea4362 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -937,6 +937,44 @@ int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff,
void remove_migration_ptes(struct folio *src, struct folio *dst,
enum ttu_flags flags);
+/* Reverse mapping handle for anonymous folio rmap helpers. */
+typedef struct anon_rmap {
+ unsigned long rmap;
+} anon_rmap_t;
+
+#define ANON_RMAP_NULL make_anon_rmap(NULL)
+
+static inline anon_rmap_t make_anon_rmap(const void *anon_mapping)
+{
+ return (anon_rmap_t){ .rmap = (unsigned long)anon_mapping, };
+}
+
+static inline unsigned long anon_rmap_value(anon_rmap_t anon_rmap)
+{
+ return anon_rmap.rmap;
+}
+
+static inline anon_rmap_t anon_vma_to_anon_rmap(const struct anon_vma *anon_vma)
+{
+ return make_anon_rmap(anon_vma);
+}
+
+static inline struct anon_vma *anon_rmap_to_anon_vma(anon_rmap_t anon_rmap)
+{
+ unsigned long rmap = anon_rmap_value(anon_rmap);
+
+ return (struct anon_vma *)rmap;
+}
+
+anon_rmap_t vma_get_anon_rmap(struct vm_area_struct *vma);
+void put_anon_rmap(anon_rmap_t anon_rmap);
+void anon_rmap_lock_write(anon_rmap_t anon_rmap);
+int anon_rmap_trylock_write(anon_rmap_t anon_rmap);
+void anon_rmap_unlock_write(anon_rmap_t anon_rmap);
+void anon_rmap_lock_read(anon_rmap_t anon_rmap);
+int anon_rmap_trylock_read(anon_rmap_t anon_rmap);
+void anon_rmap_unlock_read(anon_rmap_t anon_rmap);
+
/*
* rmap_walk_control: To control rmap traversing for specific needs
*
@@ -969,6 +1007,36 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc);
struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
struct rmap_walk_control *rwc);
+bool folio_maybe_same_anon_vma(const struct folio *folio,
+ const struct vm_area_struct *vma);
+anon_rmap_t folio_get_anon_rmap(const struct folio *folio);
+anon_rmap_t folio_lock_anon_rmap_read(const struct folio *folio,
+ struct rmap_walk_control *rwc);
+
+static inline struct vm_area_struct *anon_rmap_iter_first_vma(
+ anon_rmap_t anon_rmap, unsigned long start, unsigned long last,
+ struct anon_vma_chain **avc)
+{
+ struct anon_vma *anon_vma = anon_rmap_to_anon_vma(anon_rmap);
+
+ *avc = anon_vma_interval_tree_iter_first(&anon_vma->rb_root, start, last);
+ return *avc ? (*avc)->vma : NULL;
+}
+
+static inline struct vm_area_struct *anon_rmap_iter_next_vma(
+ anon_rmap_t anon_rmap, unsigned long start, unsigned long last,
+ struct anon_vma_chain **avc)
+{
+ if (!*avc)
+ return NULL;
+ *avc = anon_vma_interval_tree_iter_next(*avc, start, last);
+ return *avc ? (*avc)->vma : NULL;
+}
+
+#define anon_rmap_foreach_vma(vma, avc, anon_rmap, start, last) \
+ for (vma = anon_rmap_iter_first_vma(anon_rmap, start, last, &avc); \
+ vma; vma = anon_rmap_iter_next_vma(anon_rmap, start, last, &avc))
+
#else /* !CONFIG_MMU */
#define anon_vma_init() do {} while (0)
diff --git a/mm/rmap.c b/mm/rmap.c
index 78b7fb5f367c..1b2dada71778 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -701,6 +701,79 @@ struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
return anon_vma;
}
+anon_rmap_t vma_get_anon_rmap(struct vm_area_struct *vma)
+{
+ mmap_assert_locked(vma->vm_mm);
+ VM_BUG_ON(!vma->anon_vma);
+ get_anon_vma(vma->anon_vma);
+ return anon_vma_to_anon_rmap(vma->anon_vma);
+}
+
+void put_anon_rmap(anon_rmap_t anon_rmap)
+{
+ put_anon_vma(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+void anon_rmap_lock_write(anon_rmap_t anon_rmap)
+{
+ anon_vma_lock_write(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+int anon_rmap_trylock_write(anon_rmap_t anon_rmap)
+{
+ return anon_vma_trylock_write(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+void anon_rmap_unlock_write(anon_rmap_t anon_rmap)
+{
+ anon_vma_unlock_write(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+void anon_rmap_lock_read(anon_rmap_t anon_rmap)
+{
+ anon_vma_lock_read(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+int anon_rmap_trylock_read(anon_rmap_t anon_rmap)
+{
+ return anon_vma_trylock_read(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+void anon_rmap_unlock_read(anon_rmap_t anon_rmap)
+{
+ anon_vma_unlock_read(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+bool folio_maybe_same_anon_vma(const struct folio *folio,
+ const struct vm_area_struct *vma)
+{
+ struct anon_vma *anon_vma;
+ struct anon_vma *tgt_anon_vma = vma->anon_vma;
+ bool same = false;
+
+ rcu_read_lock();
+ anon_vma = folio_anon_vma(folio);
+ if (anon_vma && tgt_anon_vma)
+ same = anon_vma->root == tgt_anon_vma->root;
+ rcu_read_unlock();
+ return same;
+}
+
+anon_rmap_t folio_get_anon_rmap(const struct folio *folio)
+{
+ struct anon_vma *anon_vma = folio_get_anon_vma(folio);
+
+ return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
+}
+
+anon_rmap_t folio_lock_anon_rmap_read(const struct folio *folio,
+ struct rmap_walk_control *rwc)
+{
+ struct anon_vma *anon_vma = folio_lock_anon_vma_read(folio, rwc);
+
+ return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
+}
+
#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
/*
* Flush TLB entries for recently unmapped pages from remote CPUs. It is
--
2.17.1
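anon_rmap_foreach_vma() in the patch above hands the caller a VMA while keeping the anon_vma_chain as a cursor passed by pointer. The sketch below reproduces that first/next shape in userspace, with a hypothetical singly linked list standing in for the anon_vma interval tree and page-offset ranges standing in for the tree's keys; none of these names are kernel APIs.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins: a list replaces the interval tree. */
struct vma {
	unsigned long pgoff_start, pgoff_last;
};

struct chain {
	struct vma *vma;
	struct chain *next;
};

struct root {
	struct chain *head;
};

static int overlaps(const struct vma *v, unsigned long start,
		    unsigned long last)
{
	return v->pgoff_start <= last && start <= v->pgoff_last;
}

/* Advance to the first node at or after c whose VMA overlaps the range. */
static struct chain *scan(struct chain *c, unsigned long start,
			  unsigned long last)
{
	while (c && !overlaps(c->vma, start, last))
		c = c->next;
	return c;
}

static struct vma *iter_first_vma(struct root *r, unsigned long start,
				  unsigned long last, struct chain **avc)
{
	*avc = scan(r->head, start, last);
	return *avc ? (*avc)->vma : NULL;
}

static struct vma *iter_next_vma(unsigned long start, unsigned long last,
				 struct chain **avc)
{
	if (!*avc)
		return NULL;
	*avc = scan((*avc)->next, start, last);
	return *avc ? (*avc)->vma : NULL;
}

/* The caller sees VMAs; the chain node is only an opaque cursor. */
#define foreach_vma(vma, avc, r, start, last)				\
	for (vma = iter_first_vma(r, start, last, &avc); vma;		\
	     vma = iter_next_vma(start, last, &avc))
```

Because the cursor is an out-parameter, callers that only need VMAs never dereference the chain node, which presumably is what allows the underlying container to change later.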
* [PATCH 02/15] mm: convert anon_vma rmap APIs to anon_rmap
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
2026-05-27 11:01 ` [PATCH 01/15] mm/rmap: introduce anon_rmap APIs for anonymous folios tao
@ 2026-05-27 11:01 ` tao
2026-05-27 11:49 ` Lorenzo Stoakes
2026-05-27 11:01 ` [PATCH 03/15] mm: introduce anon_vma_tree_t for multiple anon_vma topologies tao
` (15 subsequent siblings)
17 siblings, 1 reply; 22+ messages in thread
From: tao @ 2026-05-27 11:01 UTC
Convert the rmap anon_vma interfaces to anon_rmap APIs to clarify the
semantics of anonymous rmap operations and prepare for upcoming
ANON_VMA_LAZY support and RCU-based lockless rmap traversal.
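Replace folio_anon_vma(), folio_get_anon_vma(), folio_lock_anon_vma_read(),
get_anon_vma(), put_anon_vma(), anon_vma_trylock_read(),
anon_vma_lock_read(), anon_vma_unlock_read(), anon_vma_trylock_write(),
anon_vma_lock_write(), anon_vma_unlock_write(), and
anon_vma_interval_tree_foreach() with the anon_rmap APIs, and make
folio_lock_anon_vma_read() static to mm/rmap.c. Callers that compared
anon_vma roots directly now use folio_maybe_same_anon_vma().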
No functional change intended.
Signed-off-by: tao <tao.wangtao@honor.com>
---
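__migrate_folio_record() and __migrate_folio_extract() in this patch still pack the anon_vma pointer and the old page state into dst->private, relying on the pointer's low bits being zero. A userspace sketch of that encoding follows; the flag values are assumed to match the PAGE_WAS_MAPPED/PAGE_WAS_MLOCKED enum in mm/migrate.c, and record()/extract() are hypothetical names. It uses | where the kernel adds the state to the pointer; the two agree while the low bits are clear. Presumably any future anon_rmap_t encoding stored on this path would need to keep those bits clear as well.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the old-state flags used by mm/migrate.c. */
enum {
	PAGE_WAS_MAPPED  = 1 << 0,
	PAGE_WAS_MLOCKED = 1 << 1,
	PAGE_OLD_STATES  = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED,
};

/* Hypothetical stand-in; only its (at least 4-byte) alignment matters. */
struct anon_vma {
	long refcount;
};

/* Pack pointer and state into one word, as dst->private does. */
static void *record(struct anon_vma *av, int old_state)
{
	/* The low two bits must be free for the state to fit. */
	assert(((uintptr_t)av & PAGE_OLD_STATES) == 0);
	return (void *)((uintptr_t)av | (uintptr_t)old_state);
}

/* Split the word back into pointer and state. */
static void extract(void *private, struct anon_vma **avp, int *old_state)
{
	uintptr_t v = (uintptr_t)private;

	*avp = (struct anon_vma *)(v & ~(uintptr_t)PAGE_OLD_STATES);
	*old_state = (int)(v & PAGE_OLD_STATES);
}
```

A NULL anon_vma round-trips as well, which matters because migrate_folio_unmap() records the state even when no reference was taken.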
include/linux/rmap.h | 6 ++--
mm/damon/ops-common.c | 4 +--
mm/huge_memory.c | 16 +++++------
mm/ksm.c | 43 ++++++++++++++---------------
mm/memory-failure.c | 11 ++++----
mm/migrate.c | 64 +++++++++++++++++++++----------------------
mm/page_idle.c | 2 +-
mm/rmap.c | 51 ++++++++++++++++++----------------
8 files changed, 98 insertions(+), 99 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index c42314ea4362..9802bce92695 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -997,15 +997,13 @@ struct rmap_walk_control {
bool (*rmap_one)(struct folio *folio, struct vm_area_struct *vma,
unsigned long addr, void *arg);
int (*done)(struct folio *folio);
- struct anon_vma *(*anon_lock)(const struct folio *folio,
- struct rmap_walk_control *rwc);
+ anon_rmap_t (*anon_lock)(const struct folio *folio,
+ struct rmap_walk_control *rwc);
bool (*invalid_vma)(struct vm_area_struct *vma, void *arg);
};
void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc);
void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc);
-struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
- struct rmap_walk_control *rwc);
bool folio_maybe_same_anon_vma(const struct folio *folio,
const struct vm_area_struct *vma);
diff --git a/mm/damon/ops-common.c b/mm/damon/ops-common.c
index 8c6d613425c1..5788410965b8 100644
--- a/mm/damon/ops-common.c
+++ b/mm/damon/ops-common.c
@@ -172,7 +172,7 @@ void damon_folio_mkold(struct folio *folio)
{
struct rmap_walk_control rwc = {
.rmap_one = damon_folio_mkold_one,
- .anon_lock = folio_lock_anon_vma_read,
+ .anon_lock = folio_lock_anon_rmap_read,
};
if (!folio_mapped(folio) || !folio_raw_mapping(folio)) {
@@ -236,7 +236,7 @@ bool damon_folio_young(struct folio *folio)
struct rmap_walk_control rwc = {
.arg = &accessed,
.rmap_one = damon_folio_young_one,
- .anon_lock = folio_lock_anon_vma_read,
+ .anon_lock = folio_lock_anon_rmap_read,
};
if (!folio_mapped(folio) || !folio_raw_mapping(folio)) {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 970e077019b7..ab3c2397449a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4051,7 +4051,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
struct folio *end_folio = folio_next(folio);
bool is_anon = folio_test_anon(folio);
struct address_space *mapping = NULL;
- struct anon_vma *anon_vma = NULL;
+ anon_rmap_t anon_rmap = ANON_RMAP_NULL;
int old_order = folio_order(folio);
struct folio *new_folio, *next;
int nr_shmem_dropped = 0;
@@ -4087,12 +4087,12 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
* is taken to serialise against parallel split or collapse
* operations.
*/
- anon_vma = folio_get_anon_vma(folio);
- if (!anon_vma) {
+ anon_rmap = folio_get_anon_rmap(folio);
+ if (!anon_rmap_value(anon_rmap)) {
ret = -EBUSY;
goto out;
}
- anon_vma_lock_write(anon_vma);
+ anon_rmap_lock_write(anon_rmap);
mapping = NULL;
} else {
unsigned int min_order;
@@ -4122,7 +4122,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
}
}
- anon_vma = NULL;
+ anon_rmap = ANON_RMAP_NULL;
i_mmap_lock_read(mapping);
/*
@@ -4200,9 +4200,9 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
}
out_unlock:
- if (anon_vma) {
- anon_vma_unlock_write(anon_vma);
- put_anon_vma(anon_vma);
+ if (anon_rmap_value(anon_rmap)) {
+ anon_rmap_unlock_write(anon_rmap);
+ put_anon_rmap(anon_rmap);
}
if (mapping)
i_mmap_unlock_read(mapping);
diff --git a/mm/ksm.c b/mm/ksm.c
index 7d5b76478f0b..f4c204a8a379 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -187,7 +187,7 @@ struct ksm_stable_node {
/**
* struct ksm_rmap_item - reverse mapping item for virtual addresses
* @rmap_list: next rmap_item in mm_slot's singly-linked rmap_list
- * @anon_vma: pointer to anon_vma for this mm,address, when in stable tree
+ * @anon_rmap: anonymous folio rmap for this mm,address, when in stable tree
* @nid: NUMA node id of unstable tree in which linked (may not match page)
* @mm: the memory structure this rmap_item is pointing into
* @address: the virtual address this rmap_item tracks (+ flags in low bits)
@@ -201,7 +201,7 @@ struct ksm_stable_node {
struct ksm_rmap_item {
struct ksm_rmap_item *rmap_list;
union {
- struct anon_vma *anon_vma; /* when stable */
+ anon_rmap_t anon_rmap; /* when stable */
#ifdef CONFIG_NUMA
int nid; /* when node of unstable tree */
#endif
@@ -786,7 +786,7 @@ static void break_cow(struct ksm_rmap_item *rmap_item)
* It is not an accident that whenever we want to break COW
* to undo, we also need to drop a reference to the anon_vma.
*/
- put_anon_vma(rmap_item->anon_vma);
+ put_anon_rmap(rmap_item->anon_rmap);
mmap_read_lock(mm);
vma = find_mergeable_vma(mm, addr);
@@ -898,7 +898,7 @@ static void remove_node_from_stable_tree(struct ksm_stable_node *stable_node)
VM_BUG_ON(stable_node->rmap_hlist_len <= 0);
stable_node->rmap_hlist_len--;
- put_anon_vma(rmap_item->anon_vma);
+ put_anon_rmap(rmap_item->anon_rmap);
rmap_item->address &= PAGE_MASK;
cond_resched();
}
@@ -1051,7 +1051,7 @@ static void remove_rmap_item_from_tree(struct ksm_rmap_item *rmap_item)
VM_BUG_ON(stable_node->rmap_hlist_len <= 0);
stable_node->rmap_hlist_len--;
- put_anon_vma(rmap_item->anon_vma);
+ put_anon_rmap(rmap_item->anon_rmap);
rmap_item->head = NULL;
rmap_item->address &= PAGE_MASK;
@@ -1598,9 +1598,8 @@ static int try_to_merge_with_ksm_page(struct ksm_rmap_item *rmap_item,
/* Unstable nid is in union with stable anon_vma: remove first */
remove_rmap_item_from_tree(rmap_item);
- /* Must get reference to anon_vma while still holding mmap_lock */
- rmap_item->anon_vma = vma->anon_vma;
- get_anon_vma(vma->anon_vma);
+ /* Must get reference to anon_rmap while still holding mmap_lock */
+ rmap_item->anon_rmap = vma_get_anon_rmap(vma);
out:
mmap_read_unlock(mm);
trace_ksm_merge_with_ksm_page(kpage, page_to_pfn(kpage ? kpage : page),
@@ -3108,7 +3107,6 @@ struct folio *ksm_might_need_to_copy(struct folio *folio,
struct vm_area_struct *vma, unsigned long addr)
{
struct page *page = folio_page(folio, 0);
- struct anon_vma *anon_vma = folio_anon_vma(folio);
struct folio *new_folio;
if (folio_test_large(folio))
@@ -3118,10 +3116,10 @@ struct folio *ksm_might_need_to_copy(struct folio *folio,
if (folio_stable_node(folio) &&
!(ksm_run & KSM_RUN_UNMERGE))
return folio; /* no need to copy it */
- } else if (!anon_vma) {
+ } else if (!folio_test_anon(folio)) {
return folio; /* no need to copy it */
} else if (folio->index == linear_page_index(vma, addr) &&
- anon_vma->root == vma->anon_vma->root) {
+ folio_maybe_same_anon_vma(folio, vma)) {
return folio; /* still no need to copy it */
}
if (PageHWPoison(page))
@@ -3173,20 +3171,20 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
/* Ignore the stable/unstable/sqnr flags */
const unsigned long addr = rmap_item->address & PAGE_MASK;
- struct anon_vma *anon_vma = rmap_item->anon_vma;
+ anon_rmap_t anon_rmap = rmap_item->anon_rmap;
struct anon_vma_chain *vmac;
struct vm_area_struct *vma;
cond_resched();
- if (!anon_vma_trylock_read(anon_vma)) {
+ if (!anon_rmap_trylock_read(anon_rmap)) {
if (rwc->try_lock) {
rwc->contended = true;
return;
}
- anon_vma_lock_read(anon_vma);
+ anon_rmap_lock_read(anon_rmap);
}
- anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
+ anon_rmap_foreach_vma(vma, vmac, anon_rmap,
0, ULONG_MAX) {
cond_resched();
@@ -3207,15 +3205,15 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
continue;
if (!rwc->rmap_one(folio, vma, addr, rwc->arg)) {
- anon_vma_unlock_read(anon_vma);
+ anon_rmap_unlock_read(anon_rmap);
return;
}
if (rwc->done && rwc->done(folio)) {
- anon_vma_unlock_read(anon_vma);
+ anon_rmap_unlock_read(anon_rmap);
return;
}
}
- anon_vma_unlock_read(anon_vma);
+ anon_rmap_unlock_read(anon_rmap);
}
if (!search_new_forks++)
goto again;
@@ -3237,9 +3235,9 @@ void collect_procs_ksm(const struct folio *folio, const struct page *page,
if (!stable_node)
return;
hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
- struct anon_vma *av = rmap_item->anon_vma;
+ anon_rmap_t anon_rmap = rmap_item->anon_rmap;
- anon_vma_lock_read(av);
+ anon_rmap_lock_read(anon_rmap);
rcu_read_lock();
for_each_process(tsk) {
struct anon_vma_chain *vmac;
@@ -3248,10 +3246,9 @@ void collect_procs_ksm(const struct folio *folio, const struct page *page,
task_early_kill(tsk, force_early);
if (!t)
continue;
- anon_vma_interval_tree_foreach(vmac, &av->rb_root, 0,
+ anon_rmap_foreach_vma(vma, vmac, anon_rmap, 0,
ULONG_MAX)
{
- vma = vmac->vma;
if (vma->vm_mm == t->mm) {
addr = rmap_item->address & PAGE_MASK;
add_to_kill_ksm(t, page, vma, to_kill,
@@ -3260,7 +3257,7 @@ void collect_procs_ksm(const struct folio *folio, const struct page *page,
}
}
rcu_read_unlock();
- anon_vma_unlock_read(av);
+ anon_rmap_unlock_read(anon_rmap);
}
}
#endif
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ee42d4361309..bc9abba75b5d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -547,11 +547,11 @@ static void collect_procs_anon(const struct folio *folio,
int force_early)
{
struct task_struct *tsk;
- struct anon_vma *av;
+ anon_rmap_t anon_rmap;
pgoff_t pgoff;
- av = folio_lock_anon_vma_read(folio, NULL);
- if (av == NULL) /* Not actually mapped anymore */
+ anon_rmap = folio_lock_anon_rmap_read(folio, NULL);
+ if (!anon_rmap_value(anon_rmap)) /* Not actually mapped anymore */
return;
pgoff = page_pgoff(folio, page);
@@ -564,9 +564,8 @@ static void collect_procs_anon(const struct folio *folio,
if (!t)
continue;
- anon_vma_interval_tree_foreach(vmac, &av->rb_root,
+ anon_rmap_foreach_vma(vma, vmac, anon_rmap,
pgoff, pgoff) {
- vma = vmac->vma;
if (vma->vm_mm != t->mm)
continue;
addr = page_mapped_in_vma(page, vma);
@@ -574,7 +573,7 @@ static void collect_procs_anon(const struct folio *folio,
}
}
rcu_read_unlock();
- anon_vma_unlock_read(av);
+ anon_rmap_unlock_read(anon_rmap);
}
/*
diff --git a/mm/migrate.c b/mm/migrate.c
index 8a64291ab5b4..769983cf14e0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1142,18 +1142,18 @@ enum {
static void __migrate_folio_record(struct folio *dst,
int old_page_state,
- struct anon_vma *anon_vma)
+ anon_rmap_t anon_rmap)
{
- dst->private = (void *)anon_vma + old_page_state;
+ dst->private = (void *)anon_rmap_to_anon_vma(anon_rmap) + old_page_state;
}
static void __migrate_folio_extract(struct folio *dst,
int *old_page_state,
- struct anon_vma **anon_vmap)
+ anon_rmap_t *anon_rmapp)
{
unsigned long private = (unsigned long)dst->private;
- *anon_vmap = (struct anon_vma *)(private & ~PAGE_OLD_STATES);
+ *anon_rmapp = anon_vma_to_anon_rmap((void *)(private & ~PAGE_OLD_STATES));
*old_page_state = private & PAGE_OLD_STATES;
dst->private = NULL;
}
@@ -1161,15 +1161,15 @@ static void __migrate_folio_extract(struct folio *dst,
/* Restore the source folio to the original state upon failure */
static void migrate_folio_undo_src(struct folio *src,
int page_was_mapped,
- struct anon_vma *anon_vma,
+ anon_rmap_t anon_rmap,
bool locked,
struct list_head *ret)
{
if (page_was_mapped)
remove_migration_ptes(src, src, 0);
- /* Drop an anon_vma reference if we took one */
- if (anon_vma)
- put_anon_vma(anon_vma);
+ /* Drop an anon_rmap reference if we took one */
+ if (anon_rmap_value(anon_rmap))
+ put_anon_rmap(anon_rmap);
if (locked)
folio_unlock(src);
if (ret)
@@ -1210,7 +1210,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
struct folio *dst;
int rc = -EAGAIN;
int old_page_state = 0;
- struct anon_vma *anon_vma = NULL;
+ anon_rmap_t anon_rmap = ANON_RMAP_NULL;
bool locked = false;
bool dst_locked = false;
@@ -1275,19 +1275,19 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
/*
* By try_to_migrate(), src->mapcount goes down to 0 here. In this case,
* we cannot notice that anon_vma is freed while we migrate a page.
- * This get_anon_vma() delays freeing anon_vma pointer until the end
+ * This folio_get_anon_rmap() delays freeing the anon_vma until the end
* of migration. File cache pages are no problem because of page_lock()
* File Caches may use write_page() or lock_page() in migration, then,
* just care Anon page here.
*
- * Only folio_get_anon_vma() understands the subtleties of
- * getting a hold on an anon_vma from outside one of its mms.
- * But if we cannot get anon_vma, then we won't need it anyway,
+ * Only folio_get_anon_rmap() understands the subtleties of
+ * getting a hold on an anon_rmap from outside one of its mms.
+ * But if we cannot get anon_rmap, then we won't need it anyway,
* because that implies that the anon page is no longer mapped
* (and cannot be remapped so long as we hold the page lock).
*/
if (folio_test_anon(src) && !folio_test_ksm(src))
- anon_vma = folio_get_anon_vma(src);
+ anon_rmap = folio_get_anon_rmap(src);
/*
* Block others from accessing the new page when we get around to
@@ -1302,7 +1302,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
dst_locked = true;
if (unlikely(page_has_movable_ops(&src->page))) {
- __migrate_folio_record(dst, old_page_state, anon_vma);
+ __migrate_folio_record(dst, old_page_state, anon_rmap);
return 0;
}
@@ -1326,13 +1326,13 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
} else if (folio_mapped(src)) {
/* Establish migration ptes */
VM_BUG_ON_FOLIO(folio_test_anon(src) &&
- !folio_test_ksm(src) && !anon_vma, src);
+ !folio_test_ksm(src) && !anon_rmap_value(anon_rmap), src);
try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0);
old_page_state |= PAGE_WAS_MAPPED;
}
if (!folio_mapped(src)) {
- __migrate_folio_record(dst, old_page_state, anon_vma);
+ __migrate_folio_record(dst, old_page_state, anon_rmap);
return 0;
}
@@ -1345,7 +1345,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
ret = NULL;
migrate_folio_undo_src(src, old_page_state & PAGE_WAS_MAPPED,
- anon_vma, locked, ret);
+ anon_rmap, locked, ret);
migrate_folio_undo_dst(dst, dst_locked, put_new_folio, private);
return rc;
@@ -1359,12 +1359,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
{
int rc;
int old_page_state = 0;
- struct anon_vma *anon_vma = NULL;
+ anon_rmap_t anon_rmap = ANON_RMAP_NULL;
bool src_deferred_split = false;
bool src_partially_mapped = false;
struct list_head *prev;
- __migrate_folio_extract(dst, &old_page_state, &anon_vma);
+ __migrate_folio_extract(dst, &old_page_state, &anon_rmap);
prev = dst->lru.prev;
list_del(&dst->lru);
@@ -1425,9 +1425,9 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
* and will be freed.
*/
list_del(&src->lru);
- /* Drop an anon_vma reference if we took one */
- if (anon_vma)
- put_anon_vma(anon_vma);
+ /* Drop an anon_rmap reference if we took one */
+ if (anon_rmap_value(anon_rmap))
+ put_anon_rmap(anon_rmap);
folio_unlock(src);
migrate_folio_done(src, reason);
@@ -1439,12 +1439,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
*/
if (rc == -EAGAIN) {
list_add(&dst->lru, prev);
- __migrate_folio_record(dst, old_page_state, anon_vma);
+ __migrate_folio_record(dst, old_page_state, anon_rmap);
return rc;
}
migrate_folio_undo_src(src, old_page_state & PAGE_WAS_MAPPED,
- anon_vma, true, ret);
+ anon_rmap, true, ret);
migrate_folio_undo_dst(dst, true, put_new_folio, private);
return rc;
@@ -1476,7 +1476,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
struct folio *dst;
int rc = -EAGAIN;
int page_was_mapped = 0;
- struct anon_vma *anon_vma = NULL;
+ anon_rmap_t anon_rmap = ANON_RMAP_NULL;
struct address_space *mapping = NULL;
enum ttu_flags ttu = 0;
@@ -1513,7 +1513,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
}
if (folio_test_anon(src))
- anon_vma = folio_get_anon_vma(src);
+ anon_rmap = folio_get_anon_rmap(src);
if (unlikely(!folio_trylock(dst)))
goto put_anon;
@@ -1550,8 +1550,8 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
folio_unlock(dst);
put_anon:
- if (anon_vma)
- put_anon_vma(anon_vma);
+ if (anon_rmap_value(anon_rmap))
+ put_anon_rmap(anon_rmap);
if (!rc) {
move_hugetlb_state(src, dst, reason);
@@ -1778,11 +1778,11 @@ static void migrate_folios_undo(struct list_head *src_folios,
dst2 = list_next_entry(dst, lru);
list_for_each_entry_safe(folio, folio2, src_folios, lru) {
int old_page_state = 0;
- struct anon_vma *anon_vma = NULL;
+ anon_rmap_t anon_rmap = ANON_RMAP_NULL;
- __migrate_folio_extract(dst, &old_page_state, &anon_vma);
+ __migrate_folio_extract(dst, &old_page_state, &anon_rmap);
migrate_folio_undo_src(folio, old_page_state & PAGE_WAS_MAPPED,
- anon_vma, true, ret_folios);
+ anon_rmap, true, ret_folios);
list_del(&dst->lru);
migrate_folio_undo_dst(dst, true, put_new_folio, private);
dst = dst2;
diff --git a/mm/page_idle.c b/mm/page_idle.c
index 9c67cbac2965..d4103f20f526 100644
--- a/mm/page_idle.c
+++ b/mm/page_idle.c
@@ -102,7 +102,7 @@ static void page_idle_clear_pte_refs(struct folio *folio)
*/
static struct rmap_walk_control rwc = {
.rmap_one = page_idle_clear_pte_refs_one,
- .anon_lock = folio_lock_anon_vma_read,
+ .anon_lock = folio_lock_anon_rmap_read,
};
if (!folio_mapped(folio) || !folio_raw_mapping(folio))
diff --git a/mm/rmap.c b/mm/rmap.c
index 1b2dada71778..41607168e00e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -630,8 +630,8 @@ struct anon_vma *folio_get_anon_vma(const struct folio *folio)
* reference like with folio_get_anon_vma() and then block on the mutex
* on !rwc->try_lock case.
*/
-struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
- struct rmap_walk_control *rwc)
+static struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
+ struct rmap_walk_control *rwc)
{
struct anon_vma *anon_vma = NULL;
struct anon_vma *root_anon_vma;
@@ -744,6 +744,14 @@ void anon_rmap_unlock_read(anon_rmap_t anon_rmap)
anon_vma_unlock_read(anon_rmap_to_anon_vma(anon_rmap));
}
+static anon_rmap_t folio_anon_rmap(const struct folio *folio)
+{
+ struct anon_vma *anon_vma;
+
+ anon_vma = folio_anon_vma(folio);
+ return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
+}
+
bool folio_maybe_same_anon_vma(const struct folio *folio,
const struct vm_area_struct *vma)
{
@@ -930,13 +938,11 @@ unsigned long page_address_in_vma(const struct folio *folio,
const struct page *page, const struct vm_area_struct *vma)
{
if (folio_test_anon(folio)) {
- struct anon_vma *anon_vma = folio_anon_vma(folio);
/*
* Note: swapoff's unuse_vma() is more efficient with this
* check, and needs it to match anon_vma when KSM is active.
*/
- if (!vma->anon_vma || !anon_vma ||
- vma->anon_vma->root != anon_vma->root)
+ if (!vma->anon_vma || !folio_maybe_same_anon_vma(folio, vma))
return -EFAULT;
} else if (!vma->vm_file) {
return -EFAULT;
@@ -944,7 +950,7 @@ unsigned long page_address_in_vma(const struct folio *folio,
return -EFAULT;
}
- /* KSM folios don't reach here because of the !anon_vma check */
+ /* KSM folios don't reach here: folio_maybe_same_anon_vma() fails for them */
return vma_address(vma, page_pgoff(folio, page), 1);
}
@@ -1145,7 +1151,7 @@ int folio_referenced(struct folio *folio, int is_locked,
struct rmap_walk_control rwc = {
.rmap_one = folio_referenced_one,
.arg = (void *)&pra,
- .anon_lock = folio_lock_anon_vma_read,
+ .anon_lock = folio_lock_anon_rmap_read,
.try_lock = true,
.invalid_vma = invalid_folio_referenced_vma,
};
@@ -1580,8 +1586,7 @@ static void __page_check_anon_rmap(const struct folio *folio,
* are initially only visible via the pagetables, and the pte is locked
* over the call to folio_add_new_anon_rmap.
*/
- VM_BUG_ON_FOLIO(folio_anon_vma(folio)->root != vma->anon_vma->root,
- folio);
+ VM_BUG_ON_FOLIO(!folio_maybe_same_anon_vma(folio, vma), folio);
VM_BUG_ON_PAGE(page_pgoff(folio, page) != linear_page_index(vma, address),
page);
}
@@ -2468,7 +2473,7 @@ void try_to_unmap(struct folio *folio, enum ttu_flags flags)
.rmap_one = try_to_unmap_one,
.arg = (void *)flags,
.done = folio_not_mapped,
- .anon_lock = folio_lock_anon_vma_read,
+ .anon_lock = folio_lock_anon_rmap_read,
};
if (flags & TTU_RMAP_LOCKED)
@@ -2813,7 +2818,7 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
.rmap_one = try_to_migrate_one,
.arg = (void *)flags,
.done = folio_not_mapped,
- .anon_lock = folio_lock_anon_vma_read,
+ .anon_lock = folio_lock_anon_rmap_read,
};
/*
@@ -2990,8 +2995,8 @@ void __put_anon_vma(struct anon_vma *anon_vma)
anon_vma_free(root);
}
-static struct anon_vma *rmap_walk_anon_lock(const struct folio *folio,
- struct rmap_walk_control *rwc)
+static anon_rmap_t rmap_walk_anon_lock(const struct folio *folio,
+ struct rmap_walk_control *rwc)
{
struct anon_vma *anon_vma;
@@ -3006,7 +3011,7 @@ static struct anon_vma *rmap_walk_anon_lock(const struct folio *folio,
*/
anon_vma = folio_anon_vma(folio);
if (!anon_vma)
- return NULL;
+ return ANON_RMAP_NULL;
if (anon_vma_trylock_read(anon_vma))
goto out;
@@ -3019,7 +3024,7 @@ static struct anon_vma *rmap_walk_anon_lock(const struct folio *folio,
anon_vma_lock_read(anon_vma);
out:
- return anon_vma;
+ return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
}
/*
@@ -3035,9 +3040,10 @@ static struct anon_vma *rmap_walk_anon_lock(const struct folio *folio,
static void rmap_walk_anon(struct folio *folio,
struct rmap_walk_control *rwc, bool locked)
{
- struct anon_vma *anon_vma;
+ anon_rmap_t anon_rmap;
pgoff_t pgoff_start, pgoff_end;
struct anon_vma_chain *avc;
+ struct vm_area_struct *vma;
/*
* The folio lock ensures that folio->mapping can't be changed under us
@@ -3046,20 +3052,19 @@ static void rmap_walk_anon(struct folio *folio,
VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
if (locked) {
- anon_vma = folio_anon_vma(folio);
+ anon_rmap = folio_anon_rmap(folio);
/* anon_vma disappear under us? */
- VM_BUG_ON_FOLIO(!anon_vma, folio);
+ VM_BUG_ON_FOLIO(!anon_rmap_value(anon_rmap), folio);
} else {
- anon_vma = rmap_walk_anon_lock(folio, rwc);
+ anon_rmap = rmap_walk_anon_lock(folio, rwc);
}
- if (!anon_vma)
+ if (!anon_rmap_value(anon_rmap))
return;
pgoff_start = folio_pgoff(folio);
pgoff_end = pgoff_start + folio_nr_pages(folio) - 1;
- anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
+ anon_rmap_foreach_vma(vma, avc, anon_rmap,
pgoff_start, pgoff_end) {
- struct vm_area_struct *vma = avc->vma;
unsigned long address = vma_address(vma, pgoff_start,
folio_nr_pages(folio));
@@ -3076,7 +3081,7 @@ static void rmap_walk_anon(struct folio *folio,
}
if (!locked)
- anon_vma_unlock_read(anon_vma);
+ anon_rmap_unlock_read(anon_rmap);
}
/**
--
2.17.1
* [PATCH 03/15] mm: introduce anon_vma_tree_t for multiple anon_vma topologies
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
2026-05-27 11:01 ` [PATCH 01/15] mm/rmap: introduce anon_rmap APIs for anonymous folios tao
2026-05-27 11:01 ` [PATCH 02/15] mm: convert anon_vma rmap APIs to anon_rmap tao
@ 2026-05-27 11:01 ` tao
2026-05-27 11:56 ` Lorenzo Stoakes
2026-05-27 11:01 ` [PATCH 04/15] mm: switch to anon_vma_tree_t APIs in preparation for ANON_VMA_LAZY tao
` (14 subsequent siblings)
17 siblings, 1 reply; 22+ messages in thread
From: tao @ 2026-05-27 11:01 UTC (permalink / raw)
To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, tao.wangtao, kas,
ljs
Cc: hpa, liam, vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh,
jgg, jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
Prepare for ANON_VMA_LAZY support and RCU-based lockless rmap traversal
by separating anon_vma topology handling from anon_rmap semantics, and
introduce lightweight abstractions in the VMA and rmap code so that
multiple anon_vma topologies can be supported.
Introduce anon_vma_tree_t as the type to be stored in vma->anon_vma:
typedef unsigned long anon_vma_tree_t;
It is a tagged pointer that references the anon_vma topology. The low
bits are reserved as type tags that distinguish implementations (e.g. a
regular anon_vma and a lazy anon_vma). This keeps the VMA representation
compact while allowing the topology to evolve without changing the VMA
layout.
No tags are defined yet: make_anon_vma_tree() and anon_vma_tree_anon_vma()
are plain casts, and the lock helpers wrap the existing anon_vma lock
functions. vma->anon_vma itself is converted in the next patch.
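The tagging scheme can be sketched as a small userspace C model. This is a hypothetical illustration, not the kernel helpers: the tag values and ANON_VMA_TREE_MASK mirror the ANON_VMA_TREE_* definitions added later in this series, while tree_encode(), tree_type() and tree_ptr() are illustrative names. It assumes the tagged object is aligned to at least 4 bytes, so the two low pointer bits are free.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model of anon_vma_tree_t tagging. */
typedef unsigned long anon_vma_tree_t;

/* Values mirror the ANON_VMA_TREE_* definitions later in the series. */
#define ANON_VMA_TREE_REGULAR	0UL
#define ANON_VMA_TREE_VMA	1UL
#define ANON_VMA_TREE_PARENT	2UL
#define ANON_VMA_TREE_BITS	2
#define ANON_VMA_TREE_MASK	((1UL << ANON_VMA_TREE_BITS) - 1)

/* Pack an aligned pointer and a type tag into one word. */
static inline anon_vma_tree_t tree_encode(const void *ptr, unsigned long type)
{
	/* The low ANON_VMA_TREE_BITS of the pointer must be zero. */
	assert(((uintptr_t)ptr & ANON_VMA_TREE_MASK) == 0);
	assert(type <= ANON_VMA_TREE_MASK);
	return (anon_vma_tree_t)(uintptr_t)ptr | type;
}

/* Extract the type tag from the low bits. */
static inline unsigned long tree_type(anon_vma_tree_t tree)
{
	return tree & ANON_VMA_TREE_MASK;
}

/* Strip the tag to recover the original pointer. */
static inline void *tree_ptr(anon_vma_tree_t tree)
{
	return (void *)(uintptr_t)(tree & ~ANON_VMA_TREE_MASK);
}
```

Because ANON_VMA_TREE_REGULAR is 0, a regular anon_vma encodes to the same value as the plain cast used by make_anon_vma_tree() in this patch, so existing users are unaffected until the other tags come into use.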
Signed-off-by: tao <tao.wangtao@honor.com>
---
include/linux/mm_types.h | 3 +++
mm/internal.h | 54 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 57 insertions(+)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index a308e2c23b82..5f4961ea1572 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -917,6 +917,9 @@ struct vm_area_desc {
struct mmap_action action;
};
+/* Tagged pointer stored in vma->anon_vma. Low bits encode anon_vma type. */
+typedef unsigned long anon_vma_tree_t;
+
/*
* This struct describes a virtual memory area. There is one of these
* per VM-area/task. A VM area is any part of the process virtual memory
diff --git a/mm/internal.h b/mm/internal.h
index 5a2ddcf68e0b..76544ad44ff0 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -246,6 +246,60 @@ static inline void anon_vma_unlock_read(struct anon_vma *anon_vma)
up_read(&anon_vma->root->rwsem);
}
+/* anon_vma_tree_t APIs */
+
+static inline anon_vma_tree_t make_anon_vma_tree(struct anon_vma *anon_vma)
+{
+ return (anon_vma_tree_t)anon_vma;
+}
+
+static inline struct anon_vma *anon_vma_tree_anon_vma(anon_vma_tree_t anon_tree)
+{
+ return (struct anon_vma *)anon_tree;
+}
+
+static inline void anon_vma_tree_lock_write(anon_vma_tree_t anon_tree)
+{
+ struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
+
+ anon_vma_lock_write(anon_vma);
+}
+
+static inline int anon_vma_tree_trylock_write(anon_vma_tree_t anon_tree)
+{
+ struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
+
+ return anon_vma_trylock_write(anon_vma);
+}
+
+static inline void anon_vma_tree_unlock_write(anon_vma_tree_t anon_tree)
+{
+ struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
+
+ anon_vma_unlock_write(anon_vma);
+}
+
+static inline void anon_vma_tree_lock_read(anon_vma_tree_t anon_tree)
+{
+ struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
+
+ anon_vma_lock_read(anon_vma);
+}
+
+static inline int anon_vma_tree_trylock_read(anon_vma_tree_t anon_tree)
+{
+ struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
+
+ return anon_vma_trylock_read(anon_vma);
+}
+
+static inline void anon_vma_tree_unlock_read(anon_vma_tree_t anon_tree)
+{
+ struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
+
+ anon_vma_unlock_read(anon_vma);
+}
+
struct anon_vma *folio_get_anon_vma(const struct folio *folio);
/* Operations which modify VMAs. */
--
2.17.1
* [PATCH 04/15] mm: switch to anon_vma_tree_t APIs in preparation for ANON_VMA_LAZY
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
` (2 preceding siblings ...)
2026-05-27 11:01 ` [PATCH 03/15] mm: introduce anon_vma_tree_t for multiple anon_vma topologies tao
@ 2026-05-27 11:01 ` tao
2026-05-27 11:01 ` [PATCH 05/15] mm: add CONFIG_ANON_VMA_LAZY and folio helpers tao
` (13 subsequent siblings)
17 siblings, 0 replies; 22+ messages in thread
From: tao @ 2026-05-27 11:01 UTC (permalink / raw)
To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, tao.wangtao, kas,
ljs
Cc: hpa, liam, vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh,
jgg, jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
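Convert vma->anon_vma, vma_prepare->anon_vma and vma_merge_struct->anon_vma
to anon_vma_tree_t, and replace direct anon_vma accesses with the
anon_vma_tree_t helpers (vma_set_anon_vma(), vma_anon_vma() and the
anon_vma_tree_*lock*() wrappers). This prepares for ANON_VMA_LAZY and
prevents code outside mm/, including external modules, from dereferencing
vma->anon_vma directly, since the decoding helpers live in mm/internal.h.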
Signed-off-by: tao <tao.wangtao@honor.com>
---
include/linux/mm_types.h | 2 +-
mm/debug.c | 2 +-
mm/internal.h | 16 +++++++++++
mm/khugepaged.c | 8 +++---
mm/memory.c | 2 +-
mm/mmap.c | 2 +-
mm/mremap.c | 4 +--
mm/rmap.c | 59 ++++++++++++++++++++++------------------
mm/vma.c | 26 +++++++++---------
mm/vma.h | 4 +--
10 files changed, 73 insertions(+), 52 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5f4961ea1572..e7f5debac98e 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -987,7 +987,7 @@ struct vm_area_struct {
*/
struct list_head anon_vma_chain; /* Serialized by mmap_lock &
* page_table_lock */
- struct anon_vma *anon_vma; /* Serialized by page_table_lock */
+ anon_vma_tree_t anon_vma; /* Serialized by page_table_lock */
/* Function pointers to deal with this struct. */
const struct vm_operations_struct *vm_ops;
diff --git a/mm/debug.c b/mm/debug.c
index 77fa8fe1d641..f64cf9c9abbb 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -163,7 +163,7 @@ void dump_vma(const struct vm_area_struct *vma)
"flags: %#lx(%pGv)\n",
vma, (void *)vma->vm_start, (void *)vma->vm_end, vma->vm_mm,
(unsigned long)pgprot_val(vma->vm_page_prot),
- vma->anon_vma, vma->vm_ops, vma->vm_pgoff,
+ (void *)vma->anon_vma, vma->vm_ops, vma->vm_pgoff,
vma->vm_file, vma->vm_private_data,
#ifdef CONFIG_PER_VMA_LOCK
refcount_read(&vma->vm_refcnt),
diff --git a/mm/internal.h b/mm/internal.h
index 76544ad44ff0..3dbbd118a78c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -258,6 +258,22 @@ static inline struct anon_vma *anon_vma_tree_anon_vma(anon_vma_tree_t anon_tree)
return (struct anon_vma *)anon_tree;
}
+/* Store anon_vma in vma->anon_vma using a tagged pointer. */
+static inline void vma_set_anon_vma(struct vm_area_struct *vma,
+ struct anon_vma *anon_vma)
+{
+ vma->anon_vma = make_anon_vma_tree(anon_vma);
+}
+
+/* Return the VMA's anon_vma. */
+static inline struct anon_vma *vma_anon_vma(const struct vm_area_struct *vma)
+{
+ /* READ_ONCE(): reusable_anon_vma() reads this without page_table_lock */
+ anon_vma_tree_t anon_tree = READ_ONCE(vma->anon_vma);
+
+ return anon_vma_tree_anon_vma(anon_tree);
+}
+
static inline void anon_vma_tree_lock_write(anon_vma_tree_t anon_tree)
{
struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b8452dbdb043..747748eace91 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -761,7 +761,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
* Re-establish the PMD to point to the original page table
* entry. Restoring PMD needs to be done prior to releasing
* pages. Since pages are still isolated and locked here,
- * acquiring anon_vma_lock_write is unnecessary.
+ * acquiring anon_vma_tree_lock_write is unnecessary.
*/
pmd_ptl = pmd_lock(vma->vm_mm, pmd);
pmd_populate(vma->vm_mm, pmd, pmd_pgtable(orig_pmd));
@@ -1164,7 +1164,7 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
if (result != SCAN_SUCCEED)
goto out_up_write;
- anon_vma_lock_write(vma->anon_vma);
+ anon_vma_tree_lock_write(vma->anon_vma);
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address,
address + HPAGE_PMD_SIZE);
@@ -1205,7 +1205,7 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
*/
pmd_populate(mm, pmd, pmd_pgtable(_pmd));
spin_unlock(pmd_ptl);
- anon_vma_unlock_write(vma->anon_vma);
+ anon_vma_tree_unlock_write(vma->anon_vma);
goto out_up_write;
}
@@ -1213,7 +1213,7 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
* All pages are isolated and locked so anon_vma rmap
* can't run anymore.
*/
- anon_vma_unlock_write(vma->anon_vma);
+ anon_vma_tree_unlock_write(vma->anon_vma);
result = __collapse_huge_page_copy(pte, folio, pmd, _pmd,
vma, address, pte_ptl,
diff --git a/mm/memory.c b/mm/memory.c
index 86a973119bd4..c13b79987b26 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -602,7 +602,7 @@ static void print_bad_page_map(struct vm_area_struct *vma,
if (page)
dump_page(page, "bad page map");
pr_alert("addr:%px vm_flags:%08lx anon_vma:%px mapping:%px index:%lx\n",
- (void *)addr, vma->vm_flags, vma->anon_vma, mapping, index);
+ (void *)addr, vma->vm_flags, (void *)vma->anon_vma, mapping, index);
pr_alert("file:%pD fault:%ps mmap:%ps mmap_prepare: %ps read_folio:%ps\n",
vma->vm_file,
vma->vm_ops ? vma->vm_ops->fault : NULL,
diff --git a/mm/mmap.c b/mm/mmap.c
index 5754d1c36462..eac1fb3823eb 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1799,7 +1799,7 @@ __latent_entropy int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
* Don't prepare anon_vma until fault since we don't
* copy page for current vma.
*/
- tmp->anon_vma = NULL;
+ vma_set_anon_vma(tmp, NULL);
} else if (anon_vma_fork(tmp, mpnt))
goto fail_nomem_anon_vma_fork;
vm_flags_clear(tmp, VM_LOCKED_MASK);
diff --git a/mm/mremap.c b/mm/mremap.c
index e9c8b1d05832..6af41e58f79f 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -145,13 +145,13 @@ static void take_rmap_locks(struct vm_area_struct *vma)
if (vma->vm_file)
i_mmap_lock_write(vma->vm_file->f_mapping);
if (vma->anon_vma)
- anon_vma_lock_write(vma->anon_vma);
+ anon_vma_tree_lock_write(vma->anon_vma);
}
static void drop_rmap_locks(struct vm_area_struct *vma)
{
if (vma->anon_vma)
- anon_vma_unlock_write(vma->anon_vma);
+ anon_vma_tree_unlock_write(vma->anon_vma);
if (vma->vm_file)
i_mmap_unlock_write(vma->vm_file->f_mapping);
}
diff --git a/mm/rmap.c b/mm/rmap.c
index 41607168e00e..5c4eb090c801 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -186,6 +186,7 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
{
struct mm_struct *mm = vma->vm_mm;
struct anon_vma *anon_vma, *allocated;
+ anon_vma_tree_t anon_tree;
struct anon_vma_chain *avc;
mmap_assert_locked(mm);
@@ -205,11 +206,12 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
allocated = anon_vma;
}
- anon_vma_lock_write(anon_vma);
+ anon_tree = make_anon_vma_tree(anon_vma);
+ anon_vma_tree_lock_write(anon_tree);
/* page_table_lock to protect against threads */
spin_lock(&mm->page_table_lock);
if (likely(!vma->anon_vma)) {
- vma->anon_vma = anon_vma;
+ vma->anon_vma = anon_tree;
anon_vma_chain_assign(vma, avc, anon_vma);
anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
anon_vma->num_active_vmas++;
@@ -217,7 +219,7 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
avc = NULL;
}
spin_unlock(&mm->page_table_lock);
- anon_vma_unlock_write(anon_vma);
+ anon_vma_tree_unlock_write(anon_tree);
if (unlikely(allocated))
put_anon_vma(allocated);
@@ -283,7 +285,7 @@ static void maybe_reuse_anon_vma(struct vm_area_struct *dst,
if (anon_vma->num_children > 1)
return;
- dst->anon_vma = anon_vma;
+ vma_set_anon_vma(dst, anon_vma);
anon_vma->num_active_vmas++;
}
@@ -321,11 +323,11 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src,
enum vma_operation operation)
{
struct anon_vma_chain *avc, *pavc;
- struct anon_vma *active_anon_vma = src->anon_vma;
+ anon_vma_tree_t active_anon_tree = src->anon_vma;
check_anon_vma_clone(dst, src, operation);
- if (!active_anon_vma)
+ if (!active_anon_tree)
return 0;
/*
@@ -350,7 +352,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src,
* Now link the anon_vma's back to the newly inserted AVCs.
* Note that all anon_vma's share the same root.
*/
- anon_vma_lock_write(src->anon_vma);
+ anon_vma_tree_lock_write(active_anon_tree);
list_for_each_entry_reverse(avc, &dst->anon_vma_chain, same_vma) {
struct anon_vma *anon_vma = avc->anon_vma;
@@ -360,9 +362,9 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src,
}
if (operation != VMA_OP_FORK)
- dst->anon_vma->num_active_vmas++;
+ vma_anon_vma(dst)->num_active_vmas++;
- anon_vma_unlock_write(active_anon_vma);
+ anon_vma_tree_unlock_write(active_anon_tree);
return 0;
enomem_failure:
@@ -379,6 +381,7 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
{
struct anon_vma_chain *avc;
struct anon_vma *anon_vma;
+ anon_vma_tree_t anon_tree;
int rc;
/* Don't bother if the parent process has no anon_vma here. */
@@ -386,7 +389,7 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
return 0;
/* Drop inherited anon_vma, we'll reuse existing or allocate new. */
- vma->anon_vma = NULL;
+ vma_set_anon_vma(vma, NULL);
anon_vma = anon_vma_alloc();
if (!anon_vma)
@@ -421,8 +424,8 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
* The root anon_vma's rwsem is the lock actually used when we
* lock any of the anon_vmas in this anon_vma tree.
*/
- anon_vma->root = pvma->anon_vma->root;
- anon_vma->parent = pvma->anon_vma;
+ anon_vma->parent = vma_anon_vma(pvma);
+ anon_vma->root = anon_vma->parent->root;
/*
* With refcounts, an anon_vma can stay around longer than the
* process it belongs to. The root anon_vma needs to be pinned until
@@ -430,13 +433,13 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
*/
get_anon_vma(anon_vma->root);
/* Mark this anon_vma as the one where our new (COWed) pages go. */
- vma->anon_vma = anon_vma;
+ vma->anon_vma = anon_tree = make_anon_vma_tree(anon_vma);
anon_vma_chain_assign(vma, avc, anon_vma);
/* Now let rmap see it. */
- anon_vma_lock_write(anon_vma);
+ anon_vma_tree_lock_write(anon_tree);
anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
anon_vma->parent->num_children++;
- anon_vma_unlock_write(anon_vma);
+ anon_vma_tree_unlock_write(anon_tree);
return 0;
}
@@ -463,7 +466,7 @@ static void cleanup_partial_anon_vmas(struct vm_area_struct *vma)
* able to correctly clone AVC state. Avoid inconsistent anon_vma tree
* state by resetting.
*/
- vma->anon_vma = NULL;
+ vma_set_anon_vma(vma, NULL);
}
/**
@@ -479,18 +482,18 @@ static void cleanup_partial_anon_vmas(struct vm_area_struct *vma)
void unlink_anon_vmas(struct vm_area_struct *vma)
{
struct anon_vma_chain *avc, *next;
- struct anon_vma *active_anon_vma = vma->anon_vma;
+ anon_vma_tree_t active_anon_tree = vma->anon_vma;
/* Always hold mmap lock, read-lock on unmap possibly. */
mmap_assert_locked(vma->vm_mm);
/* Unfaulted is a no-op. */
- if (!active_anon_vma) {
+ if (!active_anon_tree) {
VM_WARN_ON_ONCE(!list_empty(&vma->anon_vma_chain));
return;
}
- anon_vma_lock_write(active_anon_vma);
+ anon_vma_tree_lock_write(active_anon_tree);
/*
* Unlink each anon_vma chained to the VMA. This list is ordered
@@ -514,13 +517,13 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
anon_vma_chain_free(avc);
}
- active_anon_vma->num_active_vmas--;
+ vma_anon_vma(vma)->num_active_vmas--;
/*
* vma would still be needed after unlink, and anon_vma will be prepared
* when handle fault.
*/
- vma->anon_vma = NULL;
- anon_vma_unlock_write(active_anon_vma);
+ vma_set_anon_vma(vma, NULL);
+ anon_vma_tree_unlock_write(active_anon_tree);
/*
@@ -703,10 +706,12 @@ static struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
anon_rmap_t vma_get_anon_rmap(struct vm_area_struct *vma)
{
+ struct anon_vma *anon_vma = anon_vma_tree_anon_vma(vma->anon_vma);
+
mmap_assert_locked(vma->vm_mm);
VM_BUG_ON(!vma->anon_vma);
- get_anon_vma(vma->anon_vma);
- return anon_vma_to_anon_rmap(vma->anon_vma);
+ get_anon_vma(anon_vma);
+ return anon_vma_to_anon_rmap(anon_vma);
}
void put_anon_rmap(anon_rmap_t anon_rmap)
@@ -756,7 +761,7 @@ bool folio_maybe_same_anon_vma(const struct folio *folio,
const struct vm_area_struct *vma)
{
struct anon_vma *anon_vma;
- struct anon_vma *tgt_anon_vma = vma->anon_vma;
+ struct anon_vma *tgt_anon_vma = vma_anon_vma(vma);
bool same = false;
rcu_read_lock();
@@ -1518,7 +1523,7 @@ static __always_inline void __folio_add_rmap(struct folio *folio,
*/
void folio_move_anon_rmap(struct folio *folio, struct vm_area_struct *vma)
{
- void *anon_vma = vma->anon_vma;
+ void *anon_vma = vma_anon_vma(vma);
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
VM_BUG_ON_VMA(!anon_vma, vma);
@@ -1542,7 +1547,7 @@ void folio_move_anon_rmap(struct folio *folio, struct vm_area_struct *vma)
static void __folio_set_anon(struct folio *folio, struct vm_area_struct *vma,
unsigned long address, bool exclusive)
{
- struct anon_vma *anon_vma = vma->anon_vma;
+ struct anon_vma *anon_vma = vma_anon_vma(vma);
BUG_ON(!anon_vma);
diff --git a/mm/vma.c b/mm/vma.c
index d90791b00a7b..3501617085b0 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -107,8 +107,8 @@ static bool is_mergeable_anon_vma(struct vma_merge_struct *vmg, bool merge_next)
{
struct vm_area_struct *tgt = merge_next ? vmg->next : vmg->prev;
struct vm_area_struct *src = vmg->middle; /* existing merge case. */
- struct anon_vma *tgt_anon = tgt->anon_vma;
- struct anon_vma *src_anon = vmg->anon_vma;
+ anon_vma_tree_t tgt_anon = tgt->anon_vma;
+ anon_vma_tree_t src_anon = vmg->anon_vma;
/*
* We _can_ have !src, vmg->anon_vma via copy_vma(). In this instance we
@@ -311,7 +311,7 @@ static void vma_prepare(struct vma_prepare *vp)
}
if (vp->anon_vma) {
- anon_vma_lock_write(vp->anon_vma);
+ anon_vma_tree_lock_write(vp->anon_vma);
anon_vma_interval_tree_pre_update_vma(vp->vma);
if (vp->adj_next)
anon_vma_interval_tree_pre_update_vma(vp->adj_next);
@@ -364,7 +364,7 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
anon_vma_interval_tree_post_update_vma(vp->vma);
if (vp->adj_next)
anon_vma_interval_tree_post_update_vma(vp->adj_next);
- anon_vma_unlock_write(vp->anon_vma);
+ anon_vma_tree_unlock_write(vp->anon_vma);
}
if (vp->file) {
@@ -652,7 +652,7 @@ void validate_mm(struct mm_struct *mm)
mt_validate(&mm->mm_mt);
for_each_vma(vmi, vma) {
#ifdef CONFIG_DEBUG_VM_RB
- struct anon_vma *anon_vma = vma->anon_vma;
+ anon_vma_tree_t anon_tree = vma->anon_vma;
struct anon_vma_chain *avc;
#endif
unsigned long vmi_start, vmi_end;
@@ -676,11 +676,11 @@ void validate_mm(struct mm_struct *mm)
}
#ifdef CONFIG_DEBUG_VM_RB
- if (anon_vma) {
- anon_vma_lock_read(anon_vma);
+ if (anon_tree) {
+ anon_vma_tree_lock_read(anon_tree);
list_for_each_entry(avc, &vma->anon_vma_chain, same_vma)
anon_vma_interval_tree_verify(avc);
- anon_vma_unlock_read(anon_vma);
+ anon_vma_tree_unlock_read(anon_tree);
}
#endif
/* Check for a infinite loop */
@@ -2009,7 +2009,7 @@ static struct anon_vma *reusable_anon_vma(struct vm_area_struct *old,
struct vm_area_struct *b)
{
if (anon_vma_compatible(a, b)) {
- struct anon_vma *anon_vma = READ_ONCE(old->anon_vma);
+ struct anon_vma *anon_vma = vma_anon_vma(old);
if (anon_vma && list_is_singular(&old->anon_vma_chain))
return anon_vma;
@@ -3160,7 +3160,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
/* Lock the VMA before expanding to prevent concurrent page faults */
vma_start_write(vma);
/* We update the anon VMA tree. */
- anon_vma_lock_write(vma->anon_vma);
+ anon_vma_tree_lock_write(vma->anon_vma);
/* Somebody else might have raced and expanded it already */
if (address > vma->vm_end) {
@@ -3186,7 +3186,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
}
}
}
- anon_vma_unlock_write(vma->anon_vma);
+ anon_vma_tree_unlock_write(vma->anon_vma);
vma_iter_free(&vmi);
validate_mm(mm);
return error;
@@ -3239,7 +3239,7 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
/* Lock the VMA before expanding to prevent concurrent page faults */
vma_start_write(vma);
/* We update the anon VMA tree. */
- anon_vma_lock_write(vma->anon_vma);
+ anon_vma_tree_lock_write(vma->anon_vma);
/* Somebody else might have raced and expanded it already */
if (address < vma->vm_start) {
@@ -3266,7 +3266,7 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
}
}
}
- anon_vma_unlock_write(vma->anon_vma);
+ anon_vma_tree_unlock_write(vma->anon_vma);
vma_iter_free(&vmi);
validate_mm(mm);
return error;
diff --git a/mm/vma.h b/mm/vma.h
index 8e4b61a7304c..d3bd83299219 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -15,7 +15,7 @@ struct vma_prepare {
struct vm_area_struct *adj_next;
struct file *file;
struct address_space *mapping;
- struct anon_vma *anon_vma;
+ anon_vma_tree_t anon_vma;
struct vm_area_struct *insert;
struct vm_area_struct *remove;
struct vm_area_struct *remove2;
@@ -104,7 +104,7 @@ struct vma_merge_struct {
vma_flags_t vma_flags;
};
struct file *file;
- struct anon_vma *anon_vma;
+ anon_vma_tree_t anon_vma;
struct mempolicy *policy;
struct vm_userfaultfd_ctx uffd_ctx;
struct anon_vma_name *anon_name;
--
2.17.1
* [PATCH 05/15] mm: add CONFIG_ANON_VMA_LAZY and folio helpers
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
` (3 preceding siblings ...)
2026-05-27 11:01 ` [PATCH 04/15] mm: switch to anon_vma_tree_t APIs in preparation for ANON_VMA_LAZY tao
@ 2026-05-27 11:01 ` tao
2026-05-27 11:01 ` [PATCH 06/15] mm: add CONFIG_VMA_REF and VMA helpers tao
` (12 subsequent siblings)
17 siblings, 0 replies; 22+ messages in thread
From: tao @ 2026-05-27 11:01 UTC (permalink / raw)
To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, tao.wangtao, kas,
ljs
Cc: hpa, liam, vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh,
jgg, jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
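Add the ANON_VMA_LAZY optimization foundation:
- the CONFIG_ANON_VMA_LAZY Kconfig option (requires
  ARCH_SUPPORTS_ANON_VMA_LAZY)
- the FOLIO_MAPPING_ANON_VMA_LAZY flag for folio->mapping and the
  folio_test_anon_vma_lazy() helper
- the ANON_VMA_TREE_* type tags for anon_vma_tree_t
- a runtime switch, /proc/sys/vm/anon_vma_lazy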
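The folio->mapping flag rules can be checked with a small userspace model. The FOLIO_MAPPING_* values are copied from the page-flags.h hunk in this patch; mapping_is_anon(), mapping_is_anon_vma_lazy() and mapping_is_ksm() are hypothetical stand-ins for folio_test_anon(), folio_test_anon_vma_lazy() and folio_test_ksm() with CONFIG_ANON_VMA_LAZY=y, operating on a raw mapping value instead of a folio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from the page-flags.h hunk in this patch. */
#define FOLIO_MAPPING_ANON		0x1UL
#define FOLIO_MAPPING_ANON_KSM		0x2UL
#define FOLIO_MAPPING_KSM		(FOLIO_MAPPING_ANON | FOLIO_MAPPING_ANON_KSM)
#define FOLIO_MAPPING_ANON_VMA_LAZY	FOLIO_MAPPING_ANON_KSM
#define FOLIO_MAPPING_FLAGS		(FOLIO_MAPPING_ANON | FOLIO_MAPPING_ANON_KSM)

/* 'mapping' stands in for (unsigned long)folio->mapping. */
static inline bool mapping_is_anon(uintptr_t mapping)
{
	/* With CONFIG_ANON_VMA_LAZY, either low bit marks anonymous memory. */
	return (mapping & FOLIO_MAPPING_FLAGS) != 0;
}

/* Lazy: FOLIO_MAPPING_ANON_KSM set, FOLIO_MAPPING_ANON clear. */
static inline bool mapping_is_anon_vma_lazy(uintptr_t mapping)
{
	return (mapping & FOLIO_MAPPING_FLAGS) == FOLIO_MAPPING_ANON_VMA_LAZY;
}

/* KSM: both low bits set. */
static inline bool mapping_is_ksm(uintptr_t mapping)
{
	return (mapping & FOLIO_MAPPING_FLAGS) == FOLIO_MAPPING_KSM;
}
```

Since lazy folios reuse the FOLIO_MAPPING_ANON_KSM bit, a KSM check has to compare both low bits against FOLIO_MAPPING_KSM; testing FOLIO_MAPPING_ANON_KSM alone would misclassify lazy folios as KSM folios.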
This feature delays anon_vma allocation until fork, reducing memory
overhead for VMAs without children.
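The state transitions described in the rmap.c comment of this patch can be summarised as a toy model. This is a hypothetical sketch, not kernel code: enum lazy_state, struct vma_model and the model_*() helpers are illustrative names. It models only the first anonymous fault (assuming that with the switch off the VMA gets a regular anon_vma as today) and fork() from a parent that has no anon_vma or is in the ANON_VMA_TREE_VMA state; other transitions are out of scope.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the vma->anon_vma states; not kernel code. */
enum lazy_state {
	STATE_NONE,	/* NULL: no anonymous or CoW pages yet */
	STATE_VMA,	/* anon_vma_lazy_root | ANON_VMA_TREE_VMA */
	STATE_PARENT,	/* parent_anon_vma | ANON_VMA_TREE_PARENT */
	STATE_REGULAR,	/* regular anon_vma + anon_vma_chain */
};

struct vma_model {
	enum lazy_state state;
	int anon_vma_allocs;	/* anon_vma objects allocated for this VMA */
};

/* First anonymous fault; lazy_enabled mirrors anon_vma_lazy_enabled(). */
static inline void model_first_fault(struct vma_model *vma, bool lazy_enabled)
{
	if (vma->state != STATE_NONE)
		return;
	if (lazy_enabled) {
		vma->state = STATE_VMA;		/* defer: allocate nothing */
	} else {
		vma->state = STATE_REGULAR;	/* eager allocation */
		vma->anon_vma_allocs++;
	}
}

/*
 * fork(): returns false for parent states this sketch does not model.
 * A lazy parent is upgraded to a regular anon_vma; the child starts in
 * the PARENT state and allocates no anon_vma of its own.
 */
static inline bool model_fork(struct vma_model *parent, struct vma_model *child)
{
	child->anon_vma_allocs = 0;
	switch (parent->state) {
	case STATE_NONE:
		child->state = STATE_NONE;	/* nothing to share yet */
		return true;
	case STATE_VMA:
		parent->state = STATE_REGULAR;
		parent->anon_vma_allocs++;
		child->state = STATE_PARENT;
		return true;
	default:
		return false;
	}
}
```

In this model a VMA that is never forked ends with zero anon_vma allocations, and only a fork() from the lazy state pays for one, in the parent.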
Signed-off-by: tao <tao.wangtao@honor.com>
---
include/linux/page-flags.h | 23 +++++++++++
mm/Kconfig | 14 +++++++
mm/internal.h | 16 ++++++++
mm/mmap.c | 9 ++++
mm/rmap.c | 84 ++++++++++++++++++++++++++++++++++++++
5 files changed, 146 insertions(+)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0e03d816e8b9..c0cc43118877 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -696,6 +696,12 @@ PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
* the FOLIO_MAPPING_ANON_KSM bit may be set along with the FOLIO_MAPPING_ANON
* bit; and then folio->mapping points, not to an anon_vma, but to a private
* structure which KSM associates with that merged folio. See ksm.h.
+ *
+ * If CONFIG_ANON_VMA_LAZY is enabled, FOLIO_MAPPING_ANON_VMA_LAZY (the
+ * FOLIO_MAPPING_ANON_KSM bit set without FOLIO_MAPPING_ANON) marks a lazy
+ * anonymous folio: folio->mapping then points to the ANON_VMA_LAZY root
+ * VMA rather than to an anon_vma, and folio_test_anon() treats either bit
+ * as anonymous.
*
* Please note that, confusingly, "folio_mapping" refers to the inode
* address_space which maps the folio from disk; whereas "folio_mapped"
@@ -711,11 +717,16 @@ PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
#define FOLIO_MAPPING_ANON 0x1
#define FOLIO_MAPPING_ANON_KSM 0x2
#define FOLIO_MAPPING_KSM (FOLIO_MAPPING_ANON | FOLIO_MAPPING_ANON_KSM)
+#define FOLIO_MAPPING_ANON_VMA_LAZY FOLIO_MAPPING_ANON_KSM
#define FOLIO_MAPPING_FLAGS (FOLIO_MAPPING_ANON | FOLIO_MAPPING_ANON_KSM)
static __always_inline bool folio_test_anon(const struct folio *folio)
{
+#ifdef CONFIG_ANON_VMA_LAZY
+ return ((unsigned long)folio->mapping & FOLIO_MAPPING_FLAGS) != 0;
+#else
return ((unsigned long)folio->mapping & FOLIO_MAPPING_ANON) != 0;
+#endif
}
static __always_inline bool folio_test_lazyfree(const struct folio *folio)
@@ -734,6 +745,18 @@ static __always_inline bool PageAnon(const struct page *page)
{
return folio_test_anon(page_folio(page));
}
+
+static inline bool folio_test_anon_vma_lazy(const struct folio *folio)
+{
+#ifdef CONFIG_ANON_VMA_LAZY
+ unsigned long flags = (unsigned long)folio->mapping;
+
+ return (flags & FOLIO_MAPPING_FLAGS) == FOLIO_MAPPING_ANON_VMA_LAZY;
+#else
+ return false;
+#endif
+}
+
#ifdef CONFIG_KSM
/*
* A KSM page is one of those write-protected "shared pages" or "merged pages"
diff --git a/mm/Kconfig b/mm/Kconfig
index e8bf1e9e6ad9..c16b5d9b3ce9 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1412,6 +1412,20 @@ config LOCK_MM_AND_FIND_VMA
bool
depends on !STACK_GROWSUP
+config ARCH_SUPPORTS_ANON_VMA_LAZY
+ def_bool n
+
+config ANON_VMA_LAZY
+ bool "Lazy allocation of anon_vma"
+ default y
+ depends on ARCH_SUPPORTS_ANON_VMA_LAZY && MMU
+ help
+ For anonymous VMAs without children, avoid allocating anon_vma
+ and anon_vma_chain to reduce memory overhead.
+
+ Say Y to enable this optimization for anonymous VMAs without
+ children.
+
config IOMMU_MM_DATA
bool
diff --git a/mm/internal.h b/mm/internal.h
index 3dbbd118a78c..639f9c287f4c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -248,6 +248,22 @@ static inline void anon_vma_unlock_read(struct anon_vma *anon_vma)
/* anon_vma_tree_t APIs */
+/* Encoded anon_vma tree type. Must fit within ANON_VMA_TREE_BITS. */
+#define ANON_VMA_TREE_REGULAR 0 /* regular anon_vma */
+#define ANON_VMA_TREE_VMA 1 /* lazy root VMA, no anon_vma */
+#define ANON_VMA_TREE_PARENT 2 /* lazy child of a parent anon_vma */
+#define ANON_VMA_TREE_INVALID 3 /* reserved */
+
+#define ANON_VMA_TREE_BITS 2
+#define ANON_VMA_TREE_MASK ((1UL << ANON_VMA_TREE_BITS) - 1)
+
+#ifdef CONFIG_ANON_VMA_LAZY
+extern bool anon_vma_lazy_enable;
+static inline bool anon_vma_lazy_enabled(void) { return anon_vma_lazy_enable; }
+#else
+static inline bool anon_vma_lazy_enabled(void) { return false; }
+#endif
+
static inline anon_vma_tree_t make_anon_vma_tree(struct anon_vma *anon_vma)
{
return (anon_vma_tree_t)anon_vma;
diff --git a/mm/mmap.c b/mm/mmap.c
index eac1fb3823eb..2ae733eb39f0 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1558,6 +1558,15 @@ static const struct ctl_table mmap_table[] = {
.extra2 = (void *)&mmap_rnd_compat_bits_max,
},
#endif
+#ifdef CONFIG_ANON_VMA_LAZY
+ {
+ .procname = "anon_vma_lazy",
+ .data = &anon_vma_lazy_enable,
+ .maxlen = sizeof(anon_vma_lazy_enable),
+ .mode = 0600,
+ .proc_handler = proc_dobool,
+ },
+#endif
};
#endif /* CONFIG_SYSCTL */
diff --git a/mm/rmap.c b/mm/rmap.c
index 5c4eb090c801..48c4463d8b2c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -87,6 +87,90 @@
static struct kmem_cache *anon_vma_cachep;
static struct kmem_cache *anon_vma_chain_cachep;
+#ifdef CONFIG_ANON_VMA_LAZY
+/*
+ * ANON_VMA_LAZY: defer anon_vma allocation until fork().
+ *
+ * anon_vma and anon_vma_chain exist mainly to support reverse mapping
+ * across multiple processes. For VMAs that belong to a single process,
+ * eagerly creating anon_vma introduces unnecessary memory and setup
+ * overhead.
+ *
+ * This optimization delays anon_vma creation until fork(). Until then,
+ * the VMA stays in a lazy state and no anon_vma or anon_vma_chain
+ * topology is created.
+ *
+ * vma->anon_vma encodes the anonymous VMA state. Low bits of the pointer
+ * distinguish lazy states:
+ *
+ * NULL
+ * VMA has no anonymous or CoW pages.
+ *
+ * regular anon_vma
+ * Standard anon_vma with anon_vma_chain topology.
+ *
+ * anon_vma_lazy_root | ANON_VMA_TREE_VMA
+ * Lazy root for the VMA that first faults anonymous pages.
+ * No anon_vma or anon_vma_chain topology exists.
+ *
+ * parent_anon_vma | ANON_VMA_TREE_PARENT
+ * Lazy state for child VMAs created during fork(). parent_anon_vma is
+ * the parent VMA's anon_vma; the child allocates no anon_vma of its own.
+ *
+ * Anonymous folios extend folio->mapping with FOLIO_MAPPING_ANON_VMA_LAZY:
+ *
+ * anon_vma | FOLIO_MAPPING_ANON
+ * regular anonymous mapping
+ *
+ * anon_vma_lazy_root | FOLIO_MAPPING_ANON_VMA_LAZY
+ * lazy anonymous mapping
+ *
+ * In typical workloads most VMAs remain in the ANON_VMA_TREE_VMA state.
+ * Such VMAs have no anon_vma and no anon_vma_chain, and their folios
+ * are mapped by a single VMA, so reverse mapping needs no anon_vma
+ * locking. This gives the common case a faster rmap path.
+ *
+ * During fork(), VMAs in ANON_VMA_TREE_VMA are upgraded to regular
+ * anon_vma in the parent to establish sharing topology. Child VMAs are
+ * created as ANON_VMA_TREE_PARENT and do not allocate anon_vma,
+ * avoiding additional fork overhead.
+ *
+ * Folio mapping rules:
+ *
+ * Lazy anonymous folios store the lazy root in folio->mapping using
+ * FOLIO_MAPPING_ANON_VMA_LAZY. This allows rmap walkers to resolve the
+ * owning VMA without requiring anon_vma topology.
+ *
+ * folio->mapping may be updated during fork() when lazy VMAs are
+ * upgraded to regular anon_vma. dup_anon_rmap() in copy_page_range()
+ * performs the upgrade and installs the new anon_vma mapping.
+ *
+ * folio_move_anon_rmap() updates folio->mapping when anonymous folios
+ * move between VMAs.
+ *
+ * As with regular anonymous memory, __folio_remove_rmap() does not
+ * clear folio->mapping. Rmap walkers validate mappings using
+ * folio_mapped().
+ *
+ * VMA split keeps vma->anon_vma unchanged. The lazy root holds an extra
+ * reference so folio->mapping remains valid without scanning folios.
+ *
+ * Internal helpers:
+ *
+ * anon_vma_tree_t
+ * Stored in vma->anon_vma. The low ANON_VMA_TREE_BITS bits are a type
+ * tag distinguishing a regular anon_vma from the lazy states described
+ * above.
+ *
+ * anon_rmap_t
+ * anon_rmap_t wraps the tagged pointer used by the rmap code and
+ * provides a type-safe interface for reverse mapping operations,
+ * covering both regular anon_vma and lazy anon_vma mappings.
+ */
+
+bool anon_vma_lazy_enable;
+#endif
+
static inline struct anon_vma *anon_vma_alloc(void)
{
struct anon_vma *anon_vma;
--
2.17.1
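The vma->anon_vma encoding described in the ANON_VMA_LAZY comment can be modeled in user space. This is a sketch under stated assumptions, not kernel code: only ANON_VMA_TREE_INVALID (3) and the 2-bit ANON_VMA_TREE_MASK appear in this excerpt, so the values used here for the regular, VMA and parent states (0, 1 and 2) are assumed, and every identifier below (struct fake_vma, tree_make(), tree_lazy_root() and so on) is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model of the vma->anon_vma tag; not kernel code. */
#define TREE_REGULAR 0UL /* assumed value of ANON_VMA_TREE_REGULAR */
#define TREE_VMA     1UL /* assumed value of ANON_VMA_TREE_VMA */
#define TREE_PARENT  2UL /* assumed value of ANON_VMA_TREE_PARENT */
#define TREE_INVALID 3UL /* matches ANON_VMA_TREE_INVALID */
#define TREE_BITS    2
#define TREE_MASK    ((1UL << TREE_BITS) - 1)

/* Hypothetical stand-ins; 8-byte alignment keeps the low two bits free. */
struct fake_anon_vma { _Alignas(8) int dummy; };
struct fake_vma { _Alignas(8) int dummy; };

typedef uintptr_t tree_t;

static tree_t tree_make(const void *ptr, unsigned long type)
{
	return (uintptr_t)ptr | type;
}

static unsigned long tree_type(tree_t t)
{
	return t & TREE_MASK;
}

static void *tree_ptr(tree_t t)
{
	return (void *)(t & ~(uintptr_t)TREE_MASK);
}

/* Like vma_anon_vma(): only an untagged value is a regular anon_vma. */
static struct fake_anon_vma *tree_regular_anon_vma(tree_t t)
{
	return tree_type(t) == TREE_REGULAR ? tree_ptr(t) : NULL;
}

/* Like anon_vma_tree_anon_vma(): every tag except VMA carries an anon_vma. */
static struct fake_anon_vma *tree_any_anon_vma(tree_t t)
{
	return tree_type(t) == TREE_VMA ? NULL : tree_ptr(t);
}

/* Like anon_vma_tree_vma(): only the VMA tag carries a lazy root VMA. */
static struct fake_vma *tree_lazy_root(tree_t t)
{
	return tree_type(t) == TREE_VMA ? tree_ptr(t) : NULL;
}
```

Keeping both target structs at least 4-byte aligned is what frees the two low bits; the patch asserts the same property with BUILD_BUG_ON() in anon_vma_tree_vma() and anon_vma_tree_anon_vma().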
* [PATCH 06/15] mm: add CONFIG_VMA_REF and VMA helpers
From: tao @ 2026-05-27 11:01 UTC (permalink / raw)
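Add a reference count (vm_rcuref) to struct vm_area_struct, together
with vma_get() and vma_put() helpers (CONFIG_VMA_REF), and convert the
VMA-freeing paths from vm_area_free() to vma_put(). The rcuref manages
only the lifetime of a VMA; it does not track the VMA's state.
This prepares for ANON_VMA_LAZY, which records a VMA in folio->mapping
and must keep that VMA valid while rmap walkers use it.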
Signed-off-by: tao <tao.wangtao@honor.com>
---
include/linux/mm.h | 38 ++++++++++++++++++++++++++++++++++++++
include/linux/mm_types.h | 4 ++++
mm/Kconfig | 8 ++++++++
mm/debug_vm_pgtable.c | 2 +-
mm/mmap.c | 4 ++--
mm/vma.c | 12 ++++++------
mm/vma_exec.c | 2 +-
mm/vma_init.c | 1 +
8 files changed, 61 insertions(+), 10 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index af23453e9dbd..e98bdb414e43 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -918,6 +918,43 @@ static inline void assert_fault_locked(const struct vm_fault *vmf)
}
#endif /* CONFIG_PER_VMA_LOCK */
+#ifdef CONFIG_VMA_REF
+static inline void vma_rcuref_init(struct vm_area_struct *vma)
+{
+ rcuref_init(&vma->vm_rcuref, 1);
+}
+
+static inline struct vm_area_struct *vma_get(struct vm_area_struct *vma)
+{
+ if (rcuref_get(&vma->vm_rcuref))
+ return vma;
+ return NULL;
+}
+
+static inline bool vma_put(struct vm_area_struct *vma)
+{
+ bool release = rcuref_put(&vma->vm_rcuref);
+
+ if (unlikely(release))
+ vm_area_free(vma);
+ return release;
+}
+#else
+static inline void vma_rcuref_init(struct vm_area_struct *vma) {}
+
+static inline struct vm_area_struct *vma_get(struct vm_area_struct *vma)
+{
+ VM_WARN_ON_ONCE(true); /* not allowed */
+ return NULL;
+}
+
+static inline bool vma_put(struct vm_area_struct *vma)
+{
+ vm_area_free(vma);
+ return true;
+}
+#endif /* CONFIG_VMA_REF */
+
static inline bool mm_flags_test(int flag, const struct mm_struct *mm)
{
return test_bit(flag, ACCESS_PRIVATE(&mm->flags, __mm_flags));
@@ -957,6 +994,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
vma->vm_ops = &vma_dummy_vm_ops;
INIT_LIST_HEAD(&vma->anon_vma_chain);
vma_lock_init(vma, false);
+ vma_rcuref_init(vma);
}
/* Use when VMA is not part of the VMA tree and needs no locking */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index e7f5debac98e..a2bf17a42b55 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -6,6 +6,7 @@
#include <linux/auxvec.h>
#include <linux/kref.h>
+#include <linux/rcuref.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/rbtree.h>
@@ -978,6 +979,9 @@ struct vm_area_struct {
* slowpath.
*/
unsigned int vm_lock_seq;
+#endif
+#ifdef CONFIG_VMA_REF
+ rcuref_t vm_rcuref; /* Ensures the VMA stays valid. */
#endif
/*
* A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
diff --git a/mm/Kconfig b/mm/Kconfig
index c16b5d9b3ce9..c039ce583924 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1419,13 +1419,21 @@ config ANON_VMA_LAZY
bool "Lazy allocation of anon_vma"
def_bool y
depends on ARCH_SUPPORTS_ANON_VMA_LAZY && MMU
+ select VMA_REF
help
For anonymous VMAs without children, avoid allocating anon_vma
and anon_vma_chain to reduce memory overhead.
+ ANON_VMA_LAZY records the VMA in folio->mapping, while VMA_REF
+ ensures that the recorded VMA remains valid.
+
Say Y to enable this optimization for anonymous VMAs without
children.
+config VMA_REF
+ def_bool n
+ depends on MMU
+
config IOMMU_MM_DATA
bool
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 23dc3ee09561..cab8a4e71243 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -1036,7 +1036,7 @@ static void __init destroy_args(struct pgtable_debug_args *args)
/* Free vma and mm struct */
if (args->vma)
- vm_area_free(args->vma);
+ vma_put(args->vma);
if (args->mm)
mmput(args->mm);
diff --git a/mm/mmap.c b/mm/mmap.c
index 2ae733eb39f0..ccedebc87cd5 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1481,7 +1481,7 @@ static struct vm_area_struct *__install_special_mapping(
return vma;
out:
- vm_area_free(vma);
+ vma_put(vma);
return ERR_PTR(ret);
}
@@ -1922,7 +1922,7 @@ __latent_entropy int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
fail_nomem_anon_vma_fork:
mpol_put(vma_policy(tmp));
fail_nomem_policy:
- vm_area_free(tmp);
+ vma_put(tmp);
fail_nomem:
retval = -ENOMEM;
vm_unacct_memory(charge);
diff --git a/mm/vma.c b/mm/vma.c
index 3501617085b0..ed15968a5891 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -392,7 +392,7 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
mpol_put(vma_policy(vp->remove));
if (!vp->remove2)
WARN_ON_ONCE(vp->vma->vm_end < vp->remove->vm_end);
- vm_area_free(vp->remove);
+ vma_put(vp->remove);
/*
* In mprotect's case 6 (see comments on vma_merge),
@@ -470,7 +470,7 @@ void remove_vma(struct vm_area_struct *vma)
if (vma->vm_file)
fput(vma->vm_file);
mpol_put(vma_policy(vma));
- vm_area_free(vma);
+ vma_put(vma);
}
/*
@@ -582,7 +582,7 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
out_free_vmi:
vma_iter_free(vmi);
out_free_vma:
- vm_area_free(new);
+ vma_put(new);
return err;
}
@@ -1950,7 +1950,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
out_free_mempol:
mpol_put(vma_policy(new_vma));
out_free_vma:
- vm_area_free(new_vma);
+ vma_put(new_vma);
out:
return NULL;
}
@@ -2596,7 +2596,7 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap,
free_iter_vma:
vma_iter_free(vmi);
free_vma:
- vm_area_free(vma);
+ vma_put(vma);
return error;
}
@@ -2946,7 +2946,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
return 0;
mas_store_fail:
- vm_area_free(vma);
+ vma_put(vma);
unacct_fail:
vm_unacct_memory(len >> PAGE_SHIFT);
return -ENOMEM;
diff --git a/mm/vma_exec.c b/mm/vma_exec.c
index 5cee8b7efa0f..e7f388010488 100644
--- a/mm/vma_exec.c
+++ b/mm/vma_exec.c
@@ -160,6 +160,6 @@ int create_init_stack_vma(struct mm_struct *mm, struct vm_area_struct **vmap,
mmap_write_unlock(mm);
err_free:
*vmap = NULL;
- vm_area_free(vma);
+ vma_put(vma);
return err;
}
diff --git a/mm/vma_init.c b/mm/vma_init.c
index 3c0b65950510..1300d813d61b 100644
--- a/mm/vma_init.c
+++ b/mm/vma_init.c
@@ -137,6 +137,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
INIT_LIST_HEAD(&new->anon_vma_chain);
vma_numab_state_init(new);
dup_anon_vma_name(orig, new);
+ vma_rcuref_init(new);
return new;
}
--
2.17.1
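The lifetime rule that vma_get() and vma_put() rely on can be illustrated with a simplified user-space model. This sketch is not rcuref itself: it uses a C11 compare-and-swap loop for clarity, whereas the kernel's rcuref is implemented differently, and struct model_vma, model_vma_get(), model_vma_put() and model_frees are hypothetical names. It shows the contract: a get succeeds only while the count is nonzero, and the put that drops the last reference frees the VMA.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a VMA with vm_rcuref; not kernel code. */
struct model_vma {
	atomic_int refs;
};

static int model_frees; /* counts simulated vm_area_free() calls */

static void model_vma_init(struct model_vma *vma)
{
	atomic_init(&vma->refs, 1); /* like rcuref_init(&vma->vm_rcuref, 1) */
}

/* Take a reference only while the VMA is still live (refs > 0). */
static struct model_vma *model_vma_get(struct model_vma *vma)
{
	int old = atomic_load(&vma->refs);

	while (old > 0) {
		/* On failure, old is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&vma->refs, &old, old + 1))
			return vma;
	}
	return NULL; /* already dead: the caller must not use the VMA */
}

/* Return true when the last reference is dropped and the VMA is "freed". */
static bool model_vma_put(struct model_vma *vma)
{
	if (atomic_fetch_sub(&vma->refs, 1) == 1) {
		model_frees++; /* stands in for vm_area_free(vma) */
		return true;
	}
	return false;
}
```

With CONFIG_VMA_REF disabled, the patch instead makes vma_put() free the VMA unconditionally and has vma_get() warn and return NULL, so only the single-owner path remains valid.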
* [PATCH 07/15] mm: replace direct FOLIO_MAPPING_ANON usage with helpers
From: tao @ 2026-05-27 11:01 UTC (permalink / raw)
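Replace open-coded FOLIO_MAPPING_ANON tests outside the rmap core
(fs/proc/page.c, include/linux/pagemap.h and mm/gup.c) with helpers,
adding mapping_is_anon() for callers that only have a mapping value.
This prepares for ANON_VMA_LAZY, where an anonymous folio->mapping may
be tagged with FOLIO_MAPPING_ANON_VMA_LAZY instead of FOLIO_MAPPING_ANON.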
Signed-off-by: tao <tao.wangtao@honor.com>
---
fs/proc/page.c | 6 ++----
include/linux/page-flags.h | 15 ++++++++++++---
include/linux/pagemap.h | 2 +-
mm/gup.c | 6 ++----
4 files changed, 17 insertions(+), 12 deletions(-)
diff --git a/fs/proc/page.c b/fs/proc/page.c
index f9b2c2c906cd..93ddfda9fa1d 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -148,7 +148,6 @@ u64 stable_page_flags(const struct page *page)
const struct folio *folio;
struct page_snapshot ps;
unsigned long k;
- unsigned long mapping;
bool is_anon;
u64 u = 0;
@@ -163,8 +162,7 @@ u64 stable_page_flags(const struct page *page)
folio = &ps.folio_snapshot;
k = folio->flags.f;
- mapping = (unsigned long)folio->mapping;
- is_anon = mapping & FOLIO_MAPPING_ANON;
+ is_anon = folio_test_anon(folio);
/*
* pseudo flags for the well known (anonymous) memory mapped pages
@@ -173,7 +171,7 @@ u64 stable_page_flags(const struct page *page)
u |= 1 << KPF_MMAP;
if (is_anon) {
u |= 1 << KPF_ANON;
- if (mapping & FOLIO_MAPPING_KSM)
+ if (folio_test_ksm(folio))
u |= 1 << KPF_KSM;
}
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index c0cc43118877..50c80a1e2c7c 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -720,15 +720,20 @@ PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
#define FOLIO_MAPPING_ANON_VMA_LAZY FOLIO_MAPPING_ANON_KSM
#define FOLIO_MAPPING_FLAGS (FOLIO_MAPPING_ANON | FOLIO_MAPPING_ANON_KSM)
-static __always_inline bool folio_test_anon(const struct folio *folio)
+static __always_inline bool mapping_is_anon(unsigned long mapping)
{
#ifdef CONFIG_ANON_VMA_LAZY
- return ((unsigned long)folio->mapping & FOLIO_MAPPING_FLAGS) != 0;
+ return (mapping & FOLIO_MAPPING_FLAGS) != 0;
#else
- return ((unsigned long)folio->mapping & FOLIO_MAPPING_ANON) != 0;
+ return (mapping & FOLIO_MAPPING_ANON) != 0;
#endif
}
+static __always_inline bool folio_test_anon(const struct folio *folio)
+{
+ return mapping_is_anon((unsigned long)folio->mapping);
+}
+
static __always_inline bool folio_test_lazyfree(const struct folio *folio)
{
return folio_test_anon(folio) && !folio_test_swapbacked(folio);
@@ -738,7 +743,11 @@ static __always_inline bool PageAnonNotKsm(const struct page *page)
{
unsigned long flags = (unsigned long)page_folio(page)->mapping;
+#ifdef CONFIG_ANON_VMA_LAZY
+ return mapping_is_anon(flags) && (flags & FOLIO_MAPPING_FLAGS) != FOLIO_MAPPING_KSM;
+#else
return (flags & FOLIO_MAPPING_FLAGS) == FOLIO_MAPPING_ANON;
+#endif
}
static __always_inline bool PageAnon(const struct page *page)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 31a848485ad9..746939872ac4 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -507,7 +507,7 @@ static inline pgoff_t mapping_align_index(const struct address_space *mapping,
static inline bool mapping_large_folio_support(const struct address_space *mapping)
{
/* AS_FOLIO_ORDER is only reasonable for pagecache folios */
- VM_WARN_ONCE((unsigned long)mapping & FOLIO_MAPPING_ANON,
+ VM_WARN_ONCE(mapping_is_anon((unsigned long)mapping),
"Anonymous mapping always supports large folio");
return mapping_max_folio_order(mapping) > 0;
diff --git a/mm/gup.c b/mm/gup.c
index ad9ded39609c..69dda325b082 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2740,7 +2740,6 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
bool reject_file_backed = false;
struct address_space *mapping;
bool check_secretmem = false;
- unsigned long mapping_flags;
/*
* If we aren't pinning then no problematic write can occur. A long term
@@ -2792,9 +2791,8 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
return false;
/* Anonymous folios pose no problem. */
- mapping_flags = (unsigned long)mapping & FOLIO_MAPPING_FLAGS;
- if (mapping_flags)
- return mapping_flags & FOLIO_MAPPING_ANON;
+ if (mapping_is_anon((unsigned long)mapping))
+ return true;
/*
* At this point, we know the mapping is non-null and points to an
--
2.17.1
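The folio->mapping tag checks that these helpers centralize can be sketched as plain bit tests. The flag values below are those of mainline include/linux/page-flags.h (FOLIO_MAPPING_ANON = 0x1, FOLIO_MAPPING_ANON_KSM = 0x2, FOLIO_MAPPING_KSM = both bits), with FOLIO_MAPPING_ANON_VMA_LAZY aliased to the ANON_KSM bit as in this series; the function names are hypothetical. An "anonymous but not KSM" test must also reject a mapping with no tag bits (a page-cache folio), which is why lazy_anon_not_ksm() checks for a nonzero tag first.

```c
#include <stdbool.h>

/* Low-bit tags in folio->mapping (mainline values). */
#define MAPPING_ANON          0x1UL
#define MAPPING_ANON_KSM      0x2UL
#define MAPPING_KSM           (MAPPING_ANON | MAPPING_ANON_KSM)
#define MAPPING_FLAGS         (MAPPING_ANON | MAPPING_ANON_KSM)
#define MAPPING_ANON_VMA_LAZY MAPPING_ANON_KSM /* alias used by this series */

/* Hypothetical model of mapping_is_anon() with ANON_VMA_LAZY: any tag is anon. */
static bool lazy_mapping_is_anon(unsigned long mapping)
{
	return (mapping & MAPPING_FLAGS) != 0;
}

/* Hypothetical: anonymous but not KSM, i.e. regular (0x1) or lazy (0x2). */
static bool lazy_anon_not_ksm(unsigned long mapping)
{
	unsigned long flags = mapping & MAPPING_FLAGS;

	return flags != 0 && flags != MAPPING_KSM;
}

/* Hypothetical: only the ANON_KSM bit set means the lazy encoding. */
static bool lazy_is_anon_vma_lazy(unsigned long mapping)
{
	return (mapping & MAPPING_FLAGS) == MAPPING_ANON_VMA_LAZY;
}
```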
* [PATCH 08/15] mm: prepare rmap infrastructure for ANON_VMA_LAZY
From: tao @ 2026-05-27 11:01 UTC (permalink / raw)
Introduce ANON_VMA_LAZY helpers and prepare the anon_rmap and
anon_vma_tree infrastructure for the upcoming ANON_VMA_LAZY feature.
Implement the core ANON_VMA_LAZY rmap semantics by updating
anon_rmap_trylock_read(), anon_rmap_lock_read(), anon_rmap_unlock_read()
and anon_rmap_for_each_vma().
Also update __migrate_folio_record(). dst->private packs old_page_state
into the low bits of the anon pointer, but those bits overlap the new
anon_rmap type tag. Keep recording the pointer and old_page_state in
dst->private, and additionally store the full tagged anon_rmap in
dst->mapping so that __migrate_folio_extract() can recover the type.
folio_lock_anon_rmap_read() and related functions are converted in the
next patch to keep this change small and easier to review.
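Bit 0 of a tagged anon_rmap (the ANON_RMAP_ANON_VMA_LAZY type) falls inside PAGE_OLD_STATES, so dst->private alone cannot hold both the tag and old_page_state. A small user-space sketch of the record/extract round trip, assuming the mainline mm/migrate.c values PAGE_WAS_MAPPED = BIT(0) and PAGE_WAS_MLOCKED = BIT(1); struct fake_dst, record() and extract() are hypothetical stand-ins for the folio fields and the two migrate helpers, and extract() returns -1 where the patch uses VM_BUG_ON().

```c
#include <stdint.h>

/* Assumed mainline state bits, stored in the low bits of dst->private. */
#define PAGE_WAS_MAPPED  (1UL << 0)
#define PAGE_WAS_MLOCKED (1UL << 1)
#define PAGE_OLD_STATES  (PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED)
#define RMAP_TYPE_MASK   1UL /* ANON_RMAP_TYPE_MASK: bit 0 is the lazy tag */

/* Hypothetical stand-in for the two folio fields used as scratch space. */
struct fake_dst {
	uintptr_t priv;
	uintptr_t mapping;
};

static void record(struct fake_dst *dst, int old_state, uintptr_t rmap)
{
	/* The tag in bit 0 is overwritten by old_state here... */
	dst->priv = (rmap & ~PAGE_OLD_STATES) + old_state;
	/* ...so the full tagged value is kept in ->mapping as well. */
	dst->mapping = rmap;
}

/* Return 0 on success, -1 if the two copies of the pointer disagree. */
static int extract(struct fake_dst *dst, int *old_state, uintptr_t *rmap)
{
	if ((dst->priv & ~PAGE_OLD_STATES) != (dst->mapping & ~RMAP_TYPE_MASK))
		return -1;
	*rmap = dst->mapping;
	*old_state = (int)(dst->priv & PAGE_OLD_STATES);
	dst->mapping = 0;
	dst->priv = 0;
	return 0;
}
```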
Signed-off-by: tao <tao.wangtao@honor.com>
---
include/linux/rmap.h | 53 +++++++++++++++++++++---
mm/internal.h | 99 +++++++++++++++++++++++++++++++++++++-------
mm/migrate.c | 11 ++++-
mm/rmap.c | 42 +++++++++++++++++++
4 files changed, 183 insertions(+), 22 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 9802bce92695..ebe9f3f61170 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -938,15 +938,23 @@ void remove_migration_ptes(struct folio *src, struct folio *dst,
enum ttu_flags flags);
/* Reverse mapping handle for anonymous folio rmap helpers. */
+enum anon_rmap_type {
+ ANON_RMAP_ANON_VMA = 0,
+ ANON_RMAP_ANON_VMA_LAZY = 1,
+};
+#define ANON_RMAP_TYPE_BITS 1
+#define ANON_RMAP_TYPE_MASK ((1UL << ANON_RMAP_TYPE_BITS) - 1)
+
typedef struct anon_rmap {
unsigned long rmap;
} anon_rmap_t;
-#define ANON_RMAP_NULL make_anon_rmap(0)
+#define ANON_RMAP_NULL (make_anon_rmap(0, ANON_RMAP_ANON_VMA))
-static inline anon_rmap_t make_anon_rmap(const void *anon_mapping)
+static inline anon_rmap_t make_anon_rmap(const void *anon_mapping,
+ enum anon_rmap_type type)
{
- return (anon_rmap_t){ .rmap = (unsigned long)anon_mapping, };
+ return (anon_rmap_t){ .rmap = (unsigned long)anon_mapping + type, };
}
static inline unsigned long anon_rmap_value(anon_rmap_t anon_rmap)
@@ -956,14 +964,38 @@ static inline unsigned long anon_rmap_value(anon_rmap_t anon_rmap)
static inline anon_rmap_t anon_vma_to_anon_rmap(const struct anon_vma *anon_vma)
{
- return make_anon_rmap(anon_vma);
+ return make_anon_rmap(anon_vma, ANON_RMAP_ANON_VMA);
}
static inline struct anon_vma *anon_rmap_to_anon_vma(anon_rmap_t anon_rmap)
{
unsigned long rmap = anon_rmap_value(anon_rmap);
- return (struct anon_vma *)rmap;
+ return (struct anon_vma *)(rmap - ANON_RMAP_ANON_VMA);
+}
+
+static inline anon_rmap_t vma_to_anon_rmap(const struct vm_area_struct *vma)
+{
+ return make_anon_rmap(vma, ANON_RMAP_ANON_VMA_LAZY);
+}
+
+static inline struct vm_area_struct *anon_rmap_to_vma(anon_rmap_t anon_rmap)
+{
+ unsigned long rmap = anon_rmap_value(anon_rmap);
+
+ VM_BUG_ON((rmap & ANON_RMAP_TYPE_MASK) != ANON_RMAP_ANON_VMA_LAZY);
+ return (struct vm_area_struct *)(rmap - ANON_RMAP_ANON_VMA_LAZY);
+}
+
+static inline bool anon_rmap_is_anon_vma(anon_rmap_t anon_rmap)
+{
+#ifdef CONFIG_ANON_VMA_LAZY
+ unsigned long rmap = anon_rmap_value(anon_rmap);
+
+ return (rmap & ANON_RMAP_TYPE_MASK) == ANON_RMAP_ANON_VMA;
+#else
+ return true;
+#endif
}
anon_rmap_t vma_get_anon_rmap(struct vm_area_struct *vma);
@@ -1015,8 +1047,17 @@ static inline struct vm_area_struct *anon_rmap_iter_first_vma(
anon_rmap_t anon_rmap, unsigned long start, unsigned long last,
struct anon_vma_chain **avc)
{
- struct anon_vma *anon_vma = anon_rmap_to_anon_vma(anon_rmap);
+ struct anon_vma *anon_vma;
+
+ *avc = NULL;
+ if (!anon_rmap_is_anon_vma(anon_rmap)) {
+ struct vm_area_struct *vma = anon_rmap_to_vma(anon_rmap);
+ if (vma->vm_pgoff + vma_pages(vma) <= start || vma->vm_pgoff > last)
+ return NULL; /* No overlap in the VMA range. */
+ return vma;
+ } else
+ anon_vma = anon_rmap_to_anon_vma(anon_rmap);
*avc = anon_vma_interval_tree_iter_first(&anon_vma->rb_root, start, last);
return *avc ? (*avc)->vma : NULL;
}
diff --git a/mm/internal.h b/mm/internal.h
index 639f9c287f4c..6b703646f66d 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -260,76 +260,147 @@ static inline void anon_vma_unlock_read(struct anon_vma *anon_vma)
#ifdef CONFIG_ANON_VMA_LAZY
extern bool anon_vma_lazy_enable;
static inline bool anon_vma_lazy_enabled(void) { return anon_vma_lazy_enable; }
-#else
-static inline bool anon_vma_lazy_enabled(void) { return false; }
-#endif
-static inline anon_vma_tree_t make_anon_vma_tree(struct anon_vma *anon_vma)
+static inline int anon_vma_tree_type(anon_vma_tree_t anon_tree)
{
- return (anon_vma_tree_t)anon_vma;
+ VM_WARN_ON(((unsigned long)anon_tree & ANON_VMA_TREE_MASK) ==
+ ANON_VMA_TREE_INVALID);
+ return (unsigned long)anon_tree & ANON_VMA_TREE_MASK;
+}
+
+static inline bool anon_vma_tree_is_vma(anon_vma_tree_t anon_tree)
+{
+ return anon_vma_tree_type(anon_tree) == ANON_VMA_TREE_VMA;
+}
+
+static inline bool anon_vma_tree_is_parent(anon_vma_tree_t anon_tree)
+{
+ return anon_vma_tree_type(anon_tree) == ANON_VMA_TREE_PARENT;
+}
+
+static inline struct vm_area_struct *anon_vma_tree_vma(anon_vma_tree_t anon_tree)
+{
+ BUILD_BUG_ON(__alignof__(struct vm_area_struct) <= ANON_VMA_TREE_MASK);
+ if (!anon_vma_tree_is_vma(anon_tree))
+ return NULL;
+ return (struct vm_area_struct *)(
+ (unsigned long)anon_tree & ~ANON_VMA_TREE_MASK);
}
static inline struct anon_vma *anon_vma_tree_anon_vma(anon_vma_tree_t anon_tree)
{
- return (struct anon_vma *)anon_tree;
+ BUILD_BUG_ON(__alignof__(struct anon_vma) <= ANON_VMA_TREE_MASK);
+ if (anon_vma_tree_is_vma(anon_tree))
+ return NULL;
+ return (struct anon_vma *)((unsigned long)anon_tree & ~ANON_VMA_TREE_MASK);
+}
+
+#else
+static inline bool anon_vma_lazy_enabled(void) { return false; }
+static inline int anon_vma_tree_type(anon_vma_tree_t anon_tree) { return 0; }
+static inline bool anon_vma_tree_is_vma(anon_vma_tree_t anon_tree) { return false; }
+static inline bool anon_vma_tree_is_parent(
+ anon_vma_tree_t anon_tree) { return false; }
+static inline struct vm_area_struct *anon_vma_tree_vma(
+ anon_vma_tree_t anon_tree) { return NULL; }
+static inline struct anon_vma *anon_vma_tree_anon_vma(
+ anon_vma_tree_t anon_tree) { return (struct anon_vma *)anon_tree; }
+#endif
+
+static inline anon_vma_tree_t make_anon_vma_tree(const struct anon_vma *anon_vma)
+{
+ return (anon_vma_tree_t)anon_vma;
}
/* Store anon_vma in vma->anon_vma using a tagged pointer. */
static inline void vma_set_anon_vma(struct vm_area_struct *vma,
- struct anon_vma *anon_vma)
+ const struct anon_vma *anon_vma)
{
vma->anon_vma = (anon_vma_tree_t)anon_vma;
}
-/* Return the VMA's anon_vma. */
+/* Return the VMA's anon_vma, or NULL if it is marked lazy. */
static inline struct anon_vma *vma_anon_vma(const struct vm_area_struct *vma)
{
/* Use READ_ONCE() for reusable_anon_vma */
anon_vma_tree_t anon_tree = READ_ONCE(vma->anon_vma);
+ if (anon_vma_tree_type(anon_tree) != ANON_VMA_TREE_REGULAR)
+ return NULL;
return anon_vma_tree_anon_vma(anon_tree);
}
+static inline bool vma_is_anon_vma_lazy(const struct vm_area_struct *vma)
+{
+ return anon_vma_tree_type((anon_vma_tree_t)vma->anon_vma);
+}
+
+static inline const struct vm_area_struct *vma_anon_vma_lazy_root(
+ const struct vm_area_struct *vma)
+{
+ anon_vma_tree_t anon_tree = (anon_vma_tree_t)vma->anon_vma;
+ int lazy_type = anon_vma_tree_type(anon_tree);
+
+ if (!lazy_type)
+ return NULL;
+ if (anon_vma_tree_is_parent(anon_tree))
+ return vma;
+ return anon_vma_tree_vma(anon_tree);
+}
+
+static inline bool vma_is_anon_vma_lazy_root(const struct vm_area_struct *vma)
+{
+ return vma == vma_anon_vma_lazy_root(vma);
+}
+
+/*
+ * A VMA in the ANON_VMA_TREE_VMA state has no anon_vma or anon_vma_chain,
+ * so there is nothing to lock: these helpers are no-ops (trylock succeeds).
+ */
static inline void anon_vma_tree_lock_write(anon_vma_tree_t anon_tree)
{
struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
- anon_vma_lock_write(anon_vma);
+ if (anon_vma)
+ anon_vma_lock_write(anon_vma);
}
static inline int anon_vma_tree_trylock_write(anon_vma_tree_t anon_tree)
{
struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
- return anon_vma_trylock_write(anon_vma);
+ return anon_vma ? anon_vma_trylock_write(anon_vma) : 1;
}
static inline void anon_vma_tree_unlock_write(anon_vma_tree_t anon_tree)
{
struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
- anon_vma_unlock_write(anon_vma);
+ if (anon_vma)
+ anon_vma_unlock_write(anon_vma);
}
static inline void anon_vma_tree_lock_read(anon_vma_tree_t anon_tree)
{
struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
- anon_vma_lock_read(anon_vma);
+ if (anon_vma)
+ anon_vma_lock_read(anon_vma);
}
static inline int anon_vma_tree_trylock_read(anon_vma_tree_t anon_tree)
{
struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
- return anon_vma_trylock_read(anon_vma);
+ return anon_vma ? anon_vma_trylock_read(anon_vma) : 1;
}
static inline void anon_vma_tree_unlock_read(anon_vma_tree_t anon_tree)
{
struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
- anon_vma_unlock_read(anon_vma);
+ if (anon_vma)
+ anon_vma_unlock_read(anon_vma);
}
struct anon_vma *folio_get_anon_vma(const struct folio *folio);
diff --git a/mm/migrate.c b/mm/migrate.c
index 769983cf14e0..b397cdeab09a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1144,7 +1144,10 @@ static void __migrate_folio_record(struct folio *dst,
int old_page_state,
anon_rmap_t anon_rmap)
{
- dst->private = (void *)anon_rmap_to_anon_vma(anon_rmap) + old_page_state;
+ unsigned long rmap = anon_rmap_value(anon_rmap);
+
+ dst->private = (void *)(rmap & ~PAGE_OLD_STATES) + old_page_state;
+ dst->mapping = (struct address_space *)rmap;
}
static void __migrate_folio_extract(struct folio *dst,
@@ -1152,8 +1155,12 @@ static void __migrate_folio_extract(struct folio *dst,
anon_rmap_t *anon_rmapp)
{
unsigned long private = (unsigned long)dst->private;
+ unsigned long mapping = (unsigned long)dst->mapping;
- *anon_rmapp = anon_vma_to_anon_rmap((void *)(private & ~PAGE_OLD_STATES));
+ VM_BUG_ON((private & ~PAGE_OLD_STATES) != (mapping & ~ANON_RMAP_TYPE_MASK));
+ *anon_rmapp = make_anon_rmap((void *)(mapping & ~ANON_RMAP_TYPE_MASK),
+ mapping & ANON_RMAP_TYPE_MASK);
+ dst->mapping = NULL;
*old_page_state = private & PAGE_OLD_STATES;
dst->private = NULL;
}
diff --git a/mm/rmap.c b/mm/rmap.c
index 48c4463d8b2c..001c44570df8 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -794,42 +794,84 @@ anon_rmap_t vma_get_anon_rmap(struct vm_area_struct *vma)
mmap_assert_locked(vma->vm_mm);
VM_BUG_ON(!vma->anon_vma);
+ if (!anon_vma) {
+ vma_get(vma);
+ return vma_to_anon_rmap(vma);
+ }
get_anon_vma(anon_vma);
return anon_vma_to_anon_rmap(anon_vma);
}
void put_anon_rmap(anon_rmap_t anon_rmap)
{
+ if (!anon_rmap_is_anon_vma(anon_rmap)) {
+ vma_put(anon_rmap_to_vma(anon_rmap));
+ return;
+ }
put_anon_vma(anon_rmap_to_anon_vma(anon_rmap));
}
+/*
+ * Anonymous rmap walks normally need only read protection. THP splitting
+ * in mm/huge_memory.c needs the rmap write lock to exclude concurrent
+ * walkers, so a lazy VMA must first be upgraded to a regular anon_vma;
+ * the write-lock helpers below accept only regular anon_vma handles.
+ */
void anon_rmap_lock_write(anon_rmap_t anon_rmap)
{
+ VM_BUG_ON(!anon_rmap_is_anon_vma(anon_rmap));
anon_vma_lock_write(anon_rmap_to_anon_vma(anon_rmap));
}
int anon_rmap_trylock_write(anon_rmap_t anon_rmap)
{
+ VM_BUG_ON(!anon_rmap_is_anon_vma(anon_rmap));
return anon_vma_trylock_write(anon_rmap_to_anon_vma(anon_rmap));
}
void anon_rmap_unlock_write(anon_rmap_t anon_rmap)
{
+ VM_BUG_ON(!anon_rmap_is_anon_vma(anon_rmap));
anon_vma_unlock_write(anon_rmap_to_anon_vma(anon_rmap));
}
+static void anon_vma_lazy_lock_read(struct vm_area_struct *vma)
+{
+ vma_get(vma);
+}
+
+static bool anon_vma_lazy_trylock_read(struct vm_area_struct *vma)
+{
+ return (bool)vma_get(vma);
+}
+
+static void anon_vma_lazy_unlock_read(struct vm_area_struct *vma)
+{
+ vma_put(vma);
+}
+
void anon_rmap_lock_read(anon_rmap_t anon_rmap)
{
+ if (!anon_rmap_is_anon_vma(anon_rmap)) {
+ anon_vma_lazy_lock_read(anon_rmap_to_vma(anon_rmap));
+ return;
+ }
anon_vma_lock_read(anon_rmap_to_anon_vma(anon_rmap));
}
int anon_rmap_trylock_read(anon_rmap_t anon_rmap)
{
+ if (!anon_rmap_is_anon_vma(anon_rmap))
+ return anon_vma_lazy_trylock_read(anon_rmap_to_vma(anon_rmap));
return anon_vma_trylock_read(anon_rmap_to_anon_vma(anon_rmap));
}
void anon_rmap_unlock_read(anon_rmap_t anon_rmap)
{
+ if (!anon_rmap_is_anon_vma(anon_rmap)) {
+ anon_vma_lazy_unlock_read(anon_rmap_to_vma(anon_rmap));
+ return;
+ }
anon_vma_unlock_read(anon_rmap_to_anon_vma(anon_rmap));
}
--
2.17.1
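The anon_rmap_t tagging and the lazy branch of anon_rmap_iter_first_vma() can be checked in isolation. In this hedged sketch, fake_anon_rmap_t, struct fake_vma (reduced to vm_pgoff and a page count standing in for vma_pages()) and the helper names are hypothetical. The overlap test treats both the VMA's page-offset range and [start, last] as inclusive, the convention the anon_vma interval tree uses, so a range that begins exactly one page past the VMA is rejected.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of enum anon_rmap_type and ANON_RMAP_TYPE_MASK. */
enum fake_rmap_type { RMAP_ANON_VMA = 0, RMAP_ANON_VMA_LAZY = 1 };
#define RMAP_TYPE_MASK 1UL

/* Hypothetical VMA with just the fields the overlap test needs. */
struct fake_vma {
	unsigned long vm_pgoff; /* first page offset covered */
	unsigned long nr_pages; /* stands in for vma_pages(vma) */
};

typedef struct { uintptr_t rmap; } fake_anon_rmap_t;

static fake_anon_rmap_t make_rmap(const void *p, enum fake_rmap_type type)
{
	return (fake_anon_rmap_t){ .rmap = (uintptr_t)p + type };
}

static bool rmap_is_lazy(fake_anon_rmap_t r)
{
	return (r.rmap & RMAP_TYPE_MASK) == RMAP_ANON_VMA_LAZY;
}

static struct fake_vma *rmap_to_vma(fake_anon_rmap_t r)
{
	return (struct fake_vma *)(r.rmap - RMAP_ANON_VMA_LAZY);
}

/*
 * Lazy branch: the VMA covers [vm_pgoff, vm_pgoff + nr_pages - 1] and the
 * query covers [start, last]; return the VMA only if the ranges overlap.
 */
static struct fake_vma *lazy_first_vma(fake_anon_rmap_t r,
				       unsigned long start, unsigned long last)
{
	struct fake_vma *vma = rmap_to_vma(r);

	if (vma->vm_pgoff + vma->nr_pages <= start || vma->vm_pgoff > last)
		return NULL;
	return vma;
}
```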
* [PATCH 09/15] mm: implement ANON_VMA_LAZY rmap semantics
From: tao @ 2026-05-27 11:01 UTC (permalink / raw)
Implement ANON_VMA_LAZY anon_rmap semantics by updating
folio_anon_rmap(), folio_maybe_same_anon_vma(), folio_get_anon_rmap(),
and folio_lock_anon_rmap_read().
For ANON_VMA_LAZY folios, the target VMA is resolved from the lazy root
VMA recorded in folio->mapping. Because this path does not involve any
anon_vma topology, taking a reference with vma_get() is sufficient to
ensure that the VMA still exists.
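When the lazy root no longer covers the folio (for example after a VMA split, which per the ANON_VMA_LAZY comment leaves vma->anon_vma unchanged), folio_resolve_anon_vma_lazy() in this patch derives the folio's address from the root's linear mapping and looks it up. A user-space sketch of just that arithmetic: struct fake_vma, fake_vma_address(), fake_vma_lookup() and resolve() are hypothetical; locking, RCU and reference counting are omitted; and fake_vma_address() handles a single page, whereas the real vma_address() also takes nr_pages.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_EFAULT ((unsigned long)-14) /* -EFAULT returned as an address */

/* Hypothetical VMA: [vm_start, vm_end) mapped linearly from vm_pgoff. */
struct fake_vma {
	unsigned long vm_start, vm_end, vm_pgoff;
};

/* Simplified single-page vma_address(): -EFAULT when pgoff is outside. */
static unsigned long fake_vma_address(const struct fake_vma *vma,
				      unsigned long pgoff)
{
	unsigned long addr;

	if (pgoff < vma->vm_pgoff)
		return SKETCH_EFAULT;
	addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << SKETCH_PAGE_SHIFT);
	return (addr >= vma->vm_start && addr < vma->vm_end) ? addr : SKETCH_EFAULT;
}

/* Linear scan standing in for vma_lookup(mm, addr). */
static struct fake_vma *fake_vma_lookup(struct fake_vma *vmas, size_t n,
					unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= vmas[i].vm_start && addr < vmas[i].vm_end)
			return &vmas[i];
	return NULL;
}

/*
 * Resolve the VMA mapping a lazy folio at index @pgoff, starting from the
 * lazy root. Unsigned wraparound keeps the address correct even when
 * @pgoff lies below the root's vm_pgoff.
 */
static struct fake_vma *resolve(struct fake_vma *root, struct fake_vma *vmas,
				size_t n, unsigned long pgoff)
{
	unsigned long addr;

	if (fake_vma_address(root, pgoff) != SKETCH_EFAULT)
		return root;
	addr = root->vm_start + ((pgoff - root->vm_pgoff) << SKETCH_PAGE_SHIFT);
	return fake_vma_lookup(vmas, n, addr);
}
```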
Signed-off-by: tao <tao.wangtao@honor.com>
---
mm/rmap.c | 126 +++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 120 insertions(+), 6 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index 001c44570df8..f70e3cb9812e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -875,9 +875,97 @@ void anon_rmap_unlock_read(anon_rmap_t anon_rmap)
anon_vma_unlock_read(anon_rmap_to_anon_vma(anon_rmap));
}
+static inline bool test_folio_unmapped(const struct folio *folio, bool test)
+{
+ return test && !folio_mapped(folio);
+}
+
+/*
+ * Must be called under rcu_read_lock().
+ *
+ * For FOLIO_MAPPING_ANON_VMA_LAZY, take a reference on the lazy root VMA
+ * recorded in folio->mapping with vma_get() so its fields can be accessed
+ * safely. If the folio's index lies outside it (e.g. after a split),
+ * derive the address from the root's linear mapping and look up that VMA.
+ */
+static struct vm_area_struct *folio_resolve_anon_vma_lazy(
+ const struct folio *folio, bool tryget, bool test_map)
+{
+ struct vm_area_struct *vma, *anon_lazy_root;
+ struct mm_struct *mm;
+ unsigned long anon_mapping;
+ pgoff_t pgoff;
+ unsigned long addr;
+
+ anon_mapping = (unsigned long)READ_ONCE(folio->mapping);
+ if ((anon_mapping & FOLIO_MAPPING_FLAGS) != FOLIO_MAPPING_ANON_VMA_LAZY)
+ return NULL;
+ if (test_folio_unmapped(folio, test_map))
+ return NULL;
+
+ anon_lazy_root = vma = (struct vm_area_struct *)(anon_mapping -
+ FOLIO_MAPPING_ANON_VMA_LAZY);
+ mm = vma->vm_mm;
+ if (!mm || !vma->anon_vma || !vma_get(anon_lazy_root))
+ return NULL;
+ pgoff = folio->index;
+ if (vma_address(vma, pgoff, folio_nr_pages(folio)) == -EFAULT) {
+ addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ vma = vma_lookup(mm, addr);
+ if (vma && tryget && !vma_get(vma))
+ vma = NULL;
+ }
+ if (!tryget || anon_lazy_root != vma)
+ vma_put(anon_lazy_root);
+ if (test_folio_unmapped(folio, test_map) && vma) {
+ vma_put(vma);
+ vma = NULL;
+ }
+ return vma;
+}
+
+/* Like folio_get_anon_vma(), but for ANON_VMA_LAZY VMAs. */
+static struct vm_area_struct *folio_get_anon_vma_lazy(const struct folio *folio)
+{
+ struct vm_area_struct *vma = NULL;
+
+ rcu_read_lock();
+ vma = folio_resolve_anon_vma_lazy(folio, true, true);
+ rcu_read_unlock();
+ return vma;
+}
+
+/*
+ * Like folio_lock_anon_vma_read(), but for ANON_VMA_LAZY VMAs.
+ *
+ * These VMAs do not have an anon_vma or anon_vma_chain and correspond
+ * to only a single VMA. Therefore, reverse mapping can be performed
+ * without taking the anon_vma lock, providing a faster rmap path for
+ * this common case.
+ */
+static struct vm_area_struct *folio_lock_anon_vma_lazy_read(
+ const struct folio *folio, struct rmap_walk_control *rwc, bool test_map)
+{
+ struct vm_area_struct *vma = NULL;
+
+ rcu_read_lock();
+ vma = folio_resolve_anon_vma_lazy(folio, true, test_map);
+ rcu_read_unlock();
+ return vma;
+}
+
static anon_rmap_t folio_anon_rmap(const struct folio *folio)
{
struct anon_vma *anon_vma;
+ struct vm_area_struct *vma;
+
+ if (folio_test_anon_vma_lazy(folio)) {
+ rcu_read_lock();
+ vma = folio_resolve_anon_vma_lazy(folio, false, false);
+ rcu_read_unlock();
+ if (vma)
+ return vma_to_anon_rmap(vma);
+ }
anon_vma = folio_anon_vma(folio);
return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
@@ -887,29 +975,49 @@ bool folio_maybe_same_anon_vma(const struct folio *folio,
const struct vm_area_struct *vma)
{
struct anon_vma *anon_vma;
- struct anon_vma *tgt_anon_vma = vma_anon_vma(vma);
+ struct anon_vma *tgt_anon_vma = anon_vma_tree_anon_vma(vma->anon_vma);
bool same = false;
rcu_read_lock();
- anon_vma = folio_anon_vma(folio);
- if (anon_vma && tgt_anon_vma)
- same = anon_vma->root == tgt_anon_vma->root;
+ if (folio_test_anon_vma_lazy(folio)) {
+ same = vma == folio_resolve_anon_vma_lazy(folio, false, false);
+ } else {
+ anon_vma = folio_anon_vma(folio);
+ if (anon_vma && tgt_anon_vma)
+ same = anon_vma->root == tgt_anon_vma->root;
+ }
rcu_read_unlock();
return same;
}
anon_rmap_t folio_get_anon_rmap(const struct folio *folio)
{
- struct anon_vma *anon_vma = folio_get_anon_vma(folio);
+ struct anon_vma *anon_vma;
+ struct vm_area_struct *vma;
+
+ if (folio_test_anon_vma_lazy(folio)) {
+ vma = folio_get_anon_vma_lazy(folio);
+ if (vma)
+ return vma_to_anon_rmap(vma);
+ }
+ anon_vma = folio_get_anon_vma(folio);
return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
}
anon_rmap_t folio_lock_anon_rmap_read(const struct folio *folio,
struct rmap_walk_control *rwc)
{
- struct anon_vma *anon_vma = folio_lock_anon_vma_read(folio, rwc);
+ struct anon_vma *anon_vma;
+ struct vm_area_struct *vma;
+
+ if (folio_test_anon_vma_lazy(folio)) {
+ vma = folio_lock_anon_vma_lazy_read(folio, rwc, true);
+ if (vma)
+ return vma_to_anon_rmap(vma);
+ }
+ anon_vma = folio_lock_anon_vma_read(folio, rwc);
return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
}
@@ -3140,6 +3248,12 @@ static anon_rmap_t rmap_walk_anon_lock(const struct folio *folio,
* are holding mmap_lock. Users without mmap_lock are required to
* take a reference count to prevent the anon_vma disappearing
*/
+ if (folio_test_anon_vma_lazy(folio)) {
+ struct vm_area_struct *vma;
+
+ vma = folio_lock_anon_vma_lazy_read(folio, rwc, false);
+ return vma ? vma_to_anon_rmap(vma) : ANON_RMAP_NULL;
+ }
anon_vma = folio_anon_vma(folio);
if (!anon_vma)
return ANON_RMAP_NULL;
--
2.17.1
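The lookup done by folio_resolve_anon_vma_lazy() can be modelled in userspace as shown below. This is a simplified sketch under stated assumptions, not the kernel code. struct model_vma, the MODEL_MAPPING_* tag values, model_vma_address() and the array-based model_vma_lookup() are hypothetical stand-ins for struct vm_area_struct, FOLIO_MAPPING_ANON_VMA_LAZY, vma_address() and vma_lookup(). RCU, reference counting and folios that straddle a VMA boundary are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12
/* Hypothetical tag values; the real FOLIO_MAPPING_* bits live in kernel headers. */
#define MODEL_MAPPING_ANON          0x1UL
#define MODEL_MAPPING_ANON_VMA_LAZY 0x2UL
#define MODEL_MAPPING_FLAGS         0x3UL

struct model_vma {
	unsigned long vm_start;	/* first address (inclusive) */
	unsigned long vm_end;	/* end address (exclusive) */
	unsigned long vm_pgoff;	/* page offset mapped at vm_start */
};

/* Simplified vma_address(): succeed only if the whole folio lies in @vma. */
static int model_vma_address(const struct model_vma *vma, unsigned long pgoff,
			     unsigned long nr, unsigned long *addr)
{
	if (pgoff < vma->vm_pgoff)
		return -1;
	*addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << MODEL_PAGE_SHIFT);
	if (*addr >= vma->vm_end ||
	    vma->vm_end - *addr < (nr << MODEL_PAGE_SHIFT))
		return -1;
	return 0;
}

/* Linear search standing in for vma_lookup(). */
static const struct model_vma *model_vma_lookup(const struct model_vma *vmas,
						size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= vmas[i].vm_start && addr < vmas[i].vm_end)
			return &vmas[i];
	return NULL;
}

/* Return the VMA mapping folio page @index, or NULL if not lazy or unmapped. */
static const struct model_vma *model_resolve_lazy(uintptr_t mapping,
		unsigned long index, unsigned long nr,
		const struct model_vma *vmas, size_t n)
{
	const struct model_vma *root;
	unsigned long addr;

	if ((mapping & MODEL_MAPPING_FLAGS) != MODEL_MAPPING_ANON_VMA_LAZY)
		return NULL;
	root = (const struct model_vma *)(mapping - MODEL_MAPPING_ANON_VMA_LAZY);
	if (model_vma_address(root, index, nr, &addr) == 0)
		return root;	/* folio still lies inside the root VMA */
	/* Root was split or shrunk: follow its linear mapping instead. */
	addr = root->vm_start + ((index - root->vm_pgoff) << MODEL_PAGE_SHIFT);
	return model_vma_lookup(vmas, n, addr);
}
```

For example, suppose the root VMA was shrunk to page offsets 0x10-0x11 and a second VMA now covers 0x12-0x13. A folio at index 0x13 whose mapping still points at the root then resolves to the second VMA, because the root's linear mapping places that index at 0x13000.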
* [PATCH 10/15] mm: defer anon_vma creation with ANON_VMA_LAZY
From: tao @ 2026-05-27 11:01 UTC
Mark VMAs as ANON_VMA_LAZY on the first anonymous fault and defer
anon_vma creation until it is actually required (for example at fork).
This avoids allocating anon_vma and anon_vma_chain objects for VMAs
that never need them.
During fork(), an ANON_VMA_LAZY VMA in the parent is first upgraded to
a regular anon_vma to establish the sharing topology. Child VMAs are
created as ANON_VMA_TREE_PARENT, referencing the parent's anon_vma, and
do not allocate an anon_vma of their own, which avoids additional fork
overhead.
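The anon_vma states introduced by this patch can be pictured as a tagged word in the VMA. The sketch below is a hypothetical userspace model, not the series' code. The MODEL_TREE_* tag values, struct model_vma, struct model_anon_vma and the model_* functions stand in for ANON_VMA_TREE_VMA, ANON_VMA_TREE_PARENT, vma_prepare_anon_vma_lazy(), vma_upgrade_anon_vma_lazy() and vma_fork_anon_vma_lazy(). Locking, reference counting, anon_vma_chain handling and reuse of mergeable anon_vmas are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tag values stored in the low bits of the anon_vma word. */
#define MODEL_TREE_REGULAR 0x0UL	/* plain anon_vma pointer */
#define MODEL_TREE_VMA     0x1UL	/* lazy: points at the VMA itself */
#define MODEL_TREE_PARENT  0x2UL	/* lazy child: points at parent's anon_vma */
#define MODEL_TREE_MASK    0x3UL

struct model_anon_vma {
	struct model_anon_vma *root;
	struct model_anon_vma *parent;
};

struct model_vma {
	uintptr_t anon_vma;	/* tagged word, 0 when no anon_vma yet */
};

static unsigned long model_tree_type(uintptr_t word)
{
	return word & MODEL_TREE_MASK;
}

static void *model_tree_ptr(uintptr_t word)
{
	return (void *)(word & ~(uintptr_t)MODEL_TREE_MASK);
}

/* First anonymous fault: tag the VMA as its own lazy root, no allocation. */
static void model_prepare_lazy(struct model_vma *vma)
{
	if (!vma->anon_vma)
		vma->anon_vma = (uintptr_t)vma | MODEL_TREE_VMA;
}

/* Replace a lazy state with a newly allocated regular anon_vma. */
static int model_upgrade(struct model_vma *vma)
{
	unsigned long type = model_tree_type(vma->anon_vma);
	struct model_anon_vma *av;

	if (!vma->anon_vma || type == MODEL_TREE_REGULAR)
		return 0;
	av = calloc(1, sizeof(*av));
	if (!av)
		return -1;	/* -ENOMEM in the kernel */
	if (type == MODEL_TREE_PARENT) {
		av->parent = model_tree_ptr(vma->anon_vma);
		av->root = av->parent->root;	/* stay in the parent's tree */
	} else {
		av->root = av;	/* new root of its own tree */
	}
	vma->anon_vma = (uintptr_t)av;
	return 0;
}

/* fork(): upgrade the parent VMA, then link the child lazily to it. */
static int model_fork(struct model_vma *child, struct model_vma *parent)
{
	child->anon_vma = 0;
	if (!parent->anon_vma)
		return 0;
	if (model_upgrade(parent))
		return -1;
	child->anon_vma = parent->anon_vma | MODEL_TREE_PARENT;
	return 0;
}
```

Forking from a child that is still ANON_VMA_TREE_PARENT first upgrades that child. The new anon_vma keeps the parent's root, mirroring how vma_prepare_anon_vma() sets root and parent when parent_anon_vma is non-NULL.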
Signed-off-by: tao <tao.wangtao@honor.com>
---
mm/internal.h | 9 +++
mm/memory.c | 4 +
mm/rmap.c | 209 ++++++++++++++++++++++++++++++++++++++++++++++++--
mm/vma.c | 9 ++-
4 files changed, 222 insertions(+), 9 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 6b703646f66d..0a36eba3f63c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -417,6 +417,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src,
enum vma_operation operation);
int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma);
int __anon_vma_prepare(struct vm_area_struct *vma);
+/* Called on first anon fault or from anon_vma_prepare(). */
+void vma_prepare_anon_vma_lazy(struct vm_area_struct *vma);
void unlink_anon_vmas(struct vm_area_struct *vma);
static inline int anon_vma_prepare(struct vm_area_struct *vma)
@@ -424,6 +426,13 @@ static inline int anon_vma_prepare(struct vm_area_struct *vma)
if (likely(vma->anon_vma))
return 0;
+#ifdef CONFIG_ANON_VMA_LAZY
+ if (anon_vma_lazy_enabled()) {
+ vma_prepare_anon_vma_lazy(vma);
+ return 0;
+ }
+#endif
+
return __anon_vma_prepare(vma);
}
diff --git a/mm/memory.c b/mm/memory.c
index c13b79987b26..8fd3877f69fb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3822,6 +3822,10 @@ vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf)
if (likely(vma->anon_vma))
return 0;
+ if (anon_vma_lazy_enabled()) {
+ vma_prepare_anon_vma_lazy(vma);
+ return 0;
+ }
if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
if (!mmap_read_trylock(vma->vm_mm))
return VM_FAULT_RETRY;
diff --git a/mm/rmap.c b/mm/rmap.c
index f70e3cb9812e..d9424f4eb6d0 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -240,9 +240,118 @@ static void anon_vma_chain_assign(struct vm_area_struct *vma,
list_add(&avc->same_vma, &vma->anon_vma_chain);
}
+#ifdef CONFIG_ANON_VMA_LAZY
+/* Called on first anon fault or from anon_vma_prepare(). */
+void vma_prepare_anon_vma_lazy(struct vm_area_struct *vma)
+{
+ struct mm_struct *mm = vma->vm_mm;
+
+ spin_lock(&mm->page_table_lock);
+ if (!vma->anon_vma) {
+ vma_get(vma);
+ vma->anon_vma = (anon_vma_tree_t)(
+ (unsigned long)vma + ANON_VMA_TREE_VMA);
+ }
+ spin_unlock(&mm->page_table_lock);
+}
+
+/*
+ * Link @vma to the root ANON_VMA_TREE_VMA of @src. Take a reference on the
+ * root so it is not freed while folios still reference it via folio->mapping.
+ */
+static bool vma_link_anon_vma_lazy_root(struct vm_area_struct *vma,
+ struct vm_area_struct *src)
+{
+ struct mm_struct *mm = src->vm_mm;
+ struct vm_area_struct *root_vma;
+ bool ret = false;
+
+ VM_BUG_ON_VMA(vma->vm_mm != src->vm_mm, vma);
+ /* src may be upgraded concurrently */
+ spin_lock(&mm->page_table_lock);
+ root_vma = anon_vma_tree_vma(src->anon_vma);
+ if (root_vma) {
+ vma_get(root_vma);
+ vma->anon_vma = src->anon_vma;
+ ret = true;
+ } else {
+ vma_set_anon_vma(vma, NULL);
+ }
+ spin_unlock(&mm->page_table_lock);
+ return ret;
+}
+
+/* Link VMA to its ANON_VMA_TREE_PARENT. */
+static void vma_link_anon_vma_lazy_parent(struct vm_area_struct *vma,
+ struct vm_area_struct *src)
+{
+ struct anon_vma *parent_anon_vma = vma_anon_vma(src);
+
+ vma_assert_write_locked(src);
+ VM_BUG_ON_VMA(vma->anon_vma, vma);
+ VM_BUG_ON_VMA(!parent_anon_vma, src);
+
+ get_anon_vma(parent_anon_vma);
+ vma->anon_vma = (anon_vma_tree_t)(
+ (unsigned long)parent_anon_vma + ANON_VMA_TREE_PARENT);
+}
+
+/* Unlink VMA from anon_vma, dropping root/parent reference. */
+static bool vma_unlink_anon_vma_lazy(struct vm_area_struct *vma,
+ anon_vma_tree_t new_anon_vma_tree)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ anon_vma_tree_t anon_tree_mutable = READ_ONCE(vma->anon_vma);
+ anon_vma_tree_t anon_tree;
+ bool is_lazy = true;
+ struct vm_area_struct *root_vma = NULL;
+ struct anon_vma *parent_anon_vma = NULL;
+
+ VM_BUG_ON_VMA(anon_vma_tree_type(new_anon_vma_tree), vma);
+
+ anon_vma_tree_lock_write(anon_tree_mutable);
+ spin_lock(&mm->page_table_lock);
+ anon_tree = vma->anon_vma;
+ if (anon_vma_tree_is_vma(anon_tree)) {
+ root_vma = anon_vma_tree_vma(anon_tree);
+ vma->anon_vma = new_anon_vma_tree;
+ } else if (anon_vma_tree_is_parent(anon_tree)) {
+ parent_anon_vma = anon_vma_tree_anon_vma(anon_tree);
+ vma->anon_vma = new_anon_vma_tree;
+ } else {
+ is_lazy = false;
+ }
+ spin_unlock(&mm->page_table_lock);
+ anon_vma_tree_unlock_write(anon_tree_mutable);
+ if (!is_lazy)
+ return false;
+
+ /* drop reference after unlock */
+ VM_BUG_ON_VMA(!parent_anon_vma && !root_vma, vma);
+ if (parent_anon_vma) {
+ /* There must be nodes; it cannot be the last reference. */
+ VM_BUG_ON(RB_EMPTY_ROOT(&parent_anon_vma->rb_root.rb_root));
+ put_anon_vma(parent_anon_vma);
+ }
+ if (root_vma)
+ vma_put(root_vma);
+ return is_lazy;
+}
+#else
+static inline bool vma_link_anon_vma_lazy_root(struct vm_area_struct *vma,
+ struct vm_area_struct *src) { return false; }
+static inline void vma_link_anon_vma_lazy_parent(struct vm_area_struct *vma,
+ struct vm_area_struct *src) {}
+static inline bool vma_unlink_anon_vma_lazy(struct vm_area_struct *vma,
+ anon_vma_tree_t new_anon_vma_tree) { return false; }
+#endif
+
/**
- * __anon_vma_prepare - attach an anon_vma to a memory region
+ * vma_prepare_anon_vma - attach an anon_vma to a memory region
* @vma: the memory region in question
+ * @upgrade_lazy: true when upgrading a lazy VMA to a regular anon_vma.
+ * @parent_anon_vma: non-NULL if the VMA is inherited from its parent,
+ * otherwise NULL.
*
* This makes sure the memory mapping described by 'vma' has
* an 'anon_vma' attached to it, so that we can associate the
@@ -266,12 +375,14 @@ static void anon_vma_chain_assign(struct vm_area_struct *vma,
* to do any locking for the common case of already having
* an anon_vma.
*/
-int __anon_vma_prepare(struct vm_area_struct *vma)
+static int vma_prepare_anon_vma(struct vm_area_struct *vma, bool upgrade_lazy,
+ struct anon_vma *parent_anon_vma)
{
struct mm_struct *mm = vma->vm_mm;
struct anon_vma *anon_vma, *allocated;
anon_vma_tree_t anon_tree;
struct anon_vma_chain *avc;
+ bool is_lazy = false;
mmap_assert_locked(mm);
might_sleep();
@@ -282,19 +393,30 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
anon_vma = find_mergeable_anon_vma(vma);
allocated = NULL;
- if (!anon_vma) {
+ /* If parent_anon_vma exists, mergeable anon_vma root must match it. */
+ if (!anon_vma ||
+ (parent_anon_vma && anon_vma->root != parent_anon_vma->root)) {
anon_vma = anon_vma_alloc();
if (unlikely(!anon_vma))
goto out_enomem_free_avc;
- anon_vma->num_children++; /* self-parent link for new root */
allocated = anon_vma;
+ if (parent_anon_vma) {
+ anon_vma->root = parent_anon_vma->root;
+ anon_vma->parent = parent_anon_vma;
+ }
}
anon_tree = make_anon_vma_tree(anon_vma);
+ if (upgrade_lazy)
+ is_lazy = vma_unlink_anon_vma_lazy(vma, anon_tree);
anon_vma_tree_lock_write(anon_tree);
/* page_table_lock to protect against threads */
spin_lock(&mm->page_table_lock);
- if (likely(!vma->anon_vma)) {
+ if (likely(!vma->anon_vma || is_lazy)) {
+ if (anon_vma->root != anon_vma)
+ get_anon_vma(anon_vma->root);
+ if (allocated)
+ anon_vma->parent->num_children++;
vma->anon_vma = anon_tree;
anon_vma_chain_assign(vma, avc, anon_vma);
anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
@@ -318,6 +440,28 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
return -ENOMEM;
}
+/**
+ * __anon_vma_prepare - attach an anon_vma to a memory region
+ * @vma: the memory region in question
+ *
+ * Wrapper around vma_prepare_anon_vma() for the non-lazy case.
+ * Called when ANON_VMA_LAZY is disabled.
+ */
+int __anon_vma_prepare(struct vm_area_struct *vma)
+{
+ return vma_prepare_anon_vma(vma, false, NULL);
+}
+
+static int vma_upgrade_anon_vma_lazy(struct vm_area_struct *vma)
+{
+ anon_vma_tree_t vma_tree = vma->anon_vma;
+ struct anon_vma *parent_anon_vma = NULL;
+
+ if (anon_vma_tree_is_parent(vma_tree))
+ parent_anon_vma = anon_vma_tree_anon_vma(vma_tree);
+ return vma_prepare_anon_vma(vma, true, parent_anon_vma);
+}
+
static void check_anon_vma_clone(struct vm_area_struct *dst,
struct vm_area_struct *src,
enum vma_operation operation)
@@ -414,6 +558,20 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src,
if (!active_anon_tree)
return 0;
+ /* Check ANON_VMA_LAZY first. */
+ if (anon_vma_tree_is_vma(active_anon_tree)) {
+ if (vma_link_anon_vma_lazy_root(dst, src))
+ return 0;
+ } else if (anon_vma_tree_is_parent(active_anon_tree)) {
+ /* split from tree_parent is rare; promote to regular. */
+ int err = vma_upgrade_anon_vma_lazy(src);
+
+ if (err)
+ return err;
+ VM_BUG_ON_VMA(vma_is_anon_vma_lazy(src), src);
+ dst->anon_vma = src->anon_vma;
+ }
+
/*
* Allocate AVCs. We don't need an anon_vma lock for this as we
* are not updating the anon_vma rbtree nor are we changing
@@ -445,7 +603,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src,
maybe_reuse_anon_vma(dst, anon_vma);
}
- if (operation != VMA_OP_FORK)
+ if (operation != VMA_OP_FORK && vma_anon_vma(dst))
vma_anon_vma(dst)->num_active_vmas++;
anon_vma_tree_unlock_write(active_anon_tree);
@@ -456,9 +614,38 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src,
return -ENOMEM;
}
+static int vma_fork_anon_vma_lazy(struct vm_area_struct *vma,
+ struct vm_area_struct *pvma)
+{
+ int error;
+
+ if (vma_is_anon_vma_lazy(pvma)) {
+ error = vma_upgrade_anon_vma_lazy(pvma);
+ if (error)
+ return error;
+ VM_BUG_ON_VMA(vma_is_anon_vma_lazy(pvma), pvma);
+ }
+
+ vma_set_anon_vma(vma, NULL);
+ error = anon_vma_clone(vma, pvma, VMA_OP_FORK);
+ if (error)
+ return error;
+
+ if (vma->anon_vma)
+ return 0;
+ /* Defer the child's anon_vma: link it lazily to the parent's. */
+ vma_link_anon_vma_lazy_parent(vma, pvma);
+ return 0;
+}
+
/*
* Attach vma to its own anon_vma, as well as to the anon_vmas that
* the corresponding VMA in the parent process is attached to.
+ *
+ * For ANON_VMA_LAZY: if the parent VMA is lazy, upgrade it to a regular
+ * anon_vma before cloning. The child VMA may also be marked lazy when
+ * ANON_VMA_LAZY is enabled, deferring anon_vma allocation.
+ *
* Returns 0 on success, non-zero on failure.
*/
int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
@@ -472,6 +659,9 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
if (!pvma->anon_vma)
return 0;
+ if (anon_vma_lazy_enabled())
+ return vma_fork_anon_vma_lazy(vma, pvma);
+
/* Drop inherited anon_vma, we'll reuse existing or allocate new. */
vma_set_anon_vma(vma, NULL);
@@ -577,6 +767,10 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
return;
}
+ /* Unlink ANON_VMA_LAZY first, then ancestor anon_vma. */
+ if (vma_is_anon_vma_lazy(vma))
+ vma_unlink_anon_vma_lazy(vma, (anon_vma_tree_t)NULL);
+
anon_vma_tree_lock_write(active_anon_tree);
/*
@@ -601,7 +795,8 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
anon_vma_chain_free(avc);
}
- vma_anon_vma(vma)->num_active_vmas--;
+ if (vma_anon_vma(vma))
+ vma_anon_vma(vma)->num_active_vmas--;
/*
* vma would still be needed after unlink, and anon_vma will be prepared
* when handle fault.
diff --git a/mm/vma.c b/mm/vma.c
index ed15968a5891..0a31ef82a90c 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -1995,6 +1995,8 @@ static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct *
* acceptable for merging, so we can do all of this optimistically. But
* we do that READ_ONCE() to make sure that we never re-load the pointer.
*
+ * For upgrading ANON_VMA_LAZY VMAs, follow the same reuse rules as splitting.
+ *
* IOW: that the "list_is_singular()" test on the anon_vma_chain only
* matters for the 'stable anon_vma' case (ie the thing we want to avoid
* is to return an anon_vma that is "complex" due to having gone through
@@ -2005,12 +2007,15 @@ static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct *
* a read lock on the mmap_lock.
*/
static struct anon_vma *reusable_anon_vma(struct vm_area_struct *old,
+ struct vm_area_struct *vma,
struct vm_area_struct *a,
struct vm_area_struct *b)
{
if (anon_vma_compatible(a, b)) {
struct anon_vma *anon_vma = vma_anon_vma(old);
+ if (anon_vma && vma_is_anon_vma_lazy(vma))
+ return anon_vma;
if (anon_vma && list_is_singular(&old->anon_vma_chain))
return anon_vma;
}
@@ -2034,7 +2039,7 @@ struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *vma)
/* Try next first. */
next = vma_iter_load(&vmi);
if (next) {
- anon_vma = reusable_anon_vma(next, vma, next);
+ anon_vma = reusable_anon_vma(next, vma, vma, next);
if (anon_vma)
return anon_vma;
}
@@ -2044,7 +2049,7 @@ struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *vma)
prev = vma_prev(&vmi);
/* Try prev next. */
if (prev)
- anon_vma = reusable_anon_vma(prev, prev, vma);
+ anon_vma = reusable_anon_vma(prev, vma, prev, vma);
/*
* We might reach here with anon_vma == NULL if we can't find
--
2.17.1
* [PATCH 11/15] mm: handle ANON_VMA_LAZY in huge page operations
From: tao @ 2026-05-27 11:01 UTC
When a huge page is split, the folio is converted into multiple
subpages, and holding only folio_lock(folio) cannot guarantee that the
split completes atomically.
Upgrade ANON_VMA_LAZY VMAs to a regular anon_vma on faults that may
install a huge page and before khugepaged collapses a range, so that
these operations are protected by the anon_vma lock.
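The change to __vmf_anon_prepare() in this patch reduces to a small decision table. The sketch below is a hypothetical, side-effect-free model. The enums and model_anon_prepare() are illustrative names, "lazy" covers both ANON_VMA_TREE_VMA and ANON_VMA_TREE_PARENT, and maybe_huge stands for pmd_none(*vmf->pmd). Locking and error paths are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* "Lazy" covers both ANON_VMA_TREE_VMA and ANON_VMA_TREE_PARENT. */
enum model_anon_state { MODEL_NONE, MODEL_LAZY, MODEL_REGULAR };

enum model_action {
	MODEL_ACT_NOTHING,	/* existing anon_vma state is sufficient */
	MODEL_ACT_MARK_LAZY,	/* like vma_prepare_anon_vma_lazy() */
	MODEL_ACT_ALLOC,	/* like __anon_vma_prepare() */
	MODEL_ACT_UPGRADE,	/* like vma_upgrade_anon_vma_lazy() */
};

/* @maybe_huge models pmd_none(*vmf->pmd): a huge page may be installed. */
static enum model_action model_anon_prepare(enum model_anon_state state,
					    bool lazy_enabled, bool maybe_huge)
{
	if (state == MODEL_REGULAR)
		return MODEL_ACT_NOTHING;
	if (state == MODEL_LAZY)
		return maybe_huge ? MODEL_ACT_UPGRADE : MODEL_ACT_NOTHING;
	/* No anon_vma yet: stay lazy unless a huge page may be mapped. */
	if (lazy_enabled && !maybe_huge)
		return MODEL_ACT_MARK_LAZY;
	return MODEL_ACT_ALLOC;
}
```

For example, a fault on a lazy VMA whose PMD is still empty upgrades the VMA, while the same fault on a populated PMD leaves the lazy state alone.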
Signed-off-by: tao <tao.wangtao@honor.com>
---
mm/internal.h | 5 +++++
mm/khugepaged.c | 5 +++++
mm/memory.c | 17 +++++++++++++----
mm/rmap.c | 15 +++++++++++----
4 files changed, 34 insertions(+), 8 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 0a36eba3f63c..a746f5272aa6 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -419,6 +419,11 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma);
int __anon_vma_prepare(struct vm_area_struct *vma);
/* Called on first anon fault or from anon_vma_prepare(). */
void vma_prepare_anon_vma_lazy(struct vm_area_struct *vma);
+/*
+ * Upgrade an ANON_VMA_LAZY VMA to a regular anon_vma during fork, when
+ * cloning an ANON_VMA_TREE_PARENT VMA, or before huge page operations.
+ */
+int vma_upgrade_anon_vma_lazy(struct vm_area_struct *vma);
void unlink_anon_vmas(struct vm_area_struct *vma);
static inline int anon_vma_prepare(struct vm_area_struct *vma)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 747748eace91..a33cda026be7 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1164,6 +1164,11 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
if (result != SCAN_SUCCEED)
goto out_up_write;
+ /* Upgrade anon_vma_lazy to protect the anon_vma. */
+ if (vma_upgrade_anon_vma_lazy(vma)) {
+ result = SCAN_FAIL;
+ goto out_up_write;
+ }
anon_vma_tree_lock_write(vma->anon_vma);
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address,
diff --git a/mm/memory.c b/mm/memory.c
index 8fd3877f69fb..26d116b3393c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3819,19 +3819,28 @@ vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
vm_fault_t ret = 0;
+ bool maybe_huge = pmd_none(*vmf->pmd);
- if (likely(vma->anon_vma))
- return 0;
- if (anon_vma_lazy_enabled()) {
+ if (likely(vma->anon_vma)) {
+ if (!vma_is_anon_vma_lazy(vma) || !maybe_huge)
+ return 0;
+ }
+#ifdef CONFIG_ANON_VMA_LAZY
+ if (anon_vma_lazy_enabled() && !maybe_huge) {
vma_prepare_anon_vma_lazy(vma);
return 0;
}
+#endif
if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
if (!mmap_read_trylock(vma->vm_mm))
return VM_FAULT_RETRY;
}
- if (__anon_vma_prepare(vma))
+ if (!vma->anon_vma && __anon_vma_prepare(vma))
+ ret = VM_FAULT_OOM;
+#ifdef CONFIG_ANON_VMA_LAZY
+ if (vma->anon_vma && maybe_huge && vma_upgrade_anon_vma_lazy(vma))
ret = VM_FAULT_OOM;
+#endif
if (vmf->flags & FAULT_FLAG_VMA_LOCK)
mmap_read_unlock(vma->vm_mm);
return ret;
diff --git a/mm/rmap.c b/mm/rmap.c
index d9424f4eb6d0..57cd85efc50a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -452,13 +452,20 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
return vma_prepare_anon_vma(vma, false, NULL);
}
-static int vma_upgrade_anon_vma_lazy(struct vm_area_struct *vma)
+/**
+ * vma_upgrade_anon_vma_lazy - upgrade a VMA's lazy anon_vma to a regular one
+ * @vma: the VMA whose anon_vma_lazy is being upgraded
+ */
+int vma_upgrade_anon_vma_lazy(struct vm_area_struct *vma)
{
- anon_vma_tree_t vma_tree = vma->anon_vma;
+ anon_vma_tree_t anon_tree = READ_ONCE(vma->anon_vma);
struct anon_vma *parent_anon_vma = NULL;
- if (anon_vma_tree_is_parent(vma_tree))
- parent_anon_vma = anon_vma_tree_anon_vma(vma_tree);
+ VM_BUG_ON_VMA(!anon_tree, vma);
+ if (!anon_vma_tree_type(anon_tree))
+ return 0;
+ if (anon_vma_tree_is_parent(anon_tree))
+ parent_anon_vma = anon_vma_tree_anon_vma(anon_tree);
return vma_prepare_anon_vma(vma, true, parent_anon_vma);
}
--
2.17.1
* [PATCH 12/15] mm: handle ANON_VMA_LAZY during migration
From: tao @ 2026-05-27 11:01 UTC
To make folio migration atomic, introduce
folio_trylock_get_anon_rmap(), which takes a reference on the folio's
anon_rmap and read-locks it. Holding this lock makes migration mutually
exclusive with free_pgtables(). For ANON_VMA_LAZY, the helper uses
vma_start_read() (through lock_vma_under_rcu()) to prevent the VMA from
being modified or removed during migration.
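folio_trylock_get_anon_rmap() and anon_rmap_unlock_put() follow a "get a reference, then trylock; undo on failure" pattern, which can be sketched with C11 atomics. struct model_obj, its two counters and all model_* functions are hypothetical. The kernel instead uses the anon_vma refcount and rwsem, or vm_rcuref and the VMA read lock. In the migration paths above, a NULL result makes the caller bail out instead of proceeding without the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an anon_vma or a lazy VMA. */
struct model_obj {
	atomic_int refcount;	/* 0 means the object is being freed */
	atomic_int readers;	/* >= 0: number of readers, -1: write-locked */
};

/* Take a reference unless the count already dropped to zero. */
static bool model_tryget(struct model_obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old > 0) {
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void model_put(struct model_obj *o)
{
	atomic_fetch_sub(&o->refcount, 1);
}

/* Non-blocking read lock: fails while a writer holds the object. */
static bool model_trylock_read(struct model_obj *o)
{
	int old = atomic_load(&o->readers);

	while (old >= 0) {
		if (atomic_compare_exchange_weak(&o->readers, &old, old + 1))
			return true;
	}
	return false;
}

static void model_unlock_read(struct model_obj *o)
{
	atomic_fetch_sub(&o->readers, 1);
}

/* Reference first, then trylock; undo the reference if the lock fails. */
static struct model_obj *model_trylock_get(struct model_obj *o)
{
	if (!o || !model_tryget(o))
		return NULL;
	if (!model_trylock_read(o)) {
		model_put(o);
		return NULL;
	}
	return o;
}

/* Release in reverse order: drop the read lock, then the reference. */
static void model_unlock_put(struct model_obj *o)
{
	model_unlock_read(o);
	model_put(o);
}
```

Taking the reference before the trylock means the lock is only attempted on an object that cannot be freed underneath the caller. Dropping the reference on failure leaves no state behind.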
Signed-off-by: tao <tao.wangtao@honor.com>
---
include/linux/rmap.h | 12 ++++++++
mm/migrate.c | 71 +++++++++++++++++++++++++-------------------
mm/rmap.c | 40 +++++++++++++++++++++++++
3 files changed, 92 insertions(+), 31 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index ebe9f3f61170..59244481a8c1 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -1042,6 +1042,18 @@ bool folio_maybe_same_anon_vma(const struct folio *folio,
anon_rmap_t folio_get_anon_rmap(const struct folio *folio);
anon_rmap_t folio_lock_anon_rmap_read(const struct folio *folio,
struct rmap_walk_control *rwc);
+/*
+ * folio_trylock_get_anon_rmap() takes a reference on the folio's anon_rmap
+ * and read-locks it, making migration mutually exclusive with free_pgtables().
+ *
+ * Note: for ANON_VMA_LAZY, this is not equivalent to
+ * anon_rmap_trylock_read() + folio_get_anon_rmap(), because
+ * anon_rmap_trylock_read() only increments the VMA reference count,
+ * while this helper uses vma_start_read() to prevent the VMA from
+ * being modified or removed.
+ */
+anon_rmap_t folio_trylock_get_anon_rmap(const struct folio *folio);
+void anon_rmap_unlock_put(anon_rmap_t anon_rmap);
static inline struct vm_area_struct *anon_rmap_iter_first_vma(
anon_rmap_t anon_rmap, unsigned long start, unsigned long last,
diff --git a/mm/migrate.c b/mm/migrate.c
index b397cdeab09a..4abbfd1faea2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1173,10 +1173,11 @@ static void migrate_folio_undo_src(struct folio *src,
struct list_head *ret)
{
if (page_was_mapped)
- remove_migration_ptes(src, src, 0);
+ remove_migration_ptes(src, src,
+ anon_rmap_value(anon_rmap) ? TTU_RMAP_LOCKED : 0);
/* Drop an anon_rmap reference if we took one */
if (anon_rmap_value(anon_rmap))
- put_anon_rmap(anon_rmap);
+ anon_rmap_unlock_put(anon_rmap);
if (locked)
folio_unlock(src);
if (ret)
@@ -1279,6 +1280,18 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
folio_wait_writeback(src);
}
+ /*
+ * Block others from accessing the new page when we get around to
+ * establishing additional references. We are usually the only one
+ * holding a reference to dst at this point. We used to have a BUG
+ * here if folio_trylock(dst) fails, but would like to allow for
+ * cases where there might be a race with the previous use of dst.
+ * This is much like races on refcount of oldpage: just don't BUG().
+ */
+ if (unlikely(!folio_trylock(dst)))
+ goto out;
+ dst_locked = true;
+
/*
* By try_to_migrate(), src->mapcount goes down to 0 here. In this case,
* we cannot notice that anon_vma is freed while we migrate a page.
@@ -1287,26 +1300,17 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
* File Caches may use write_page() or lock_page() in migration, then,
* just care Anon page here.
*
- * Only folio_get_anon_rmap() understands the subtleties of
- * getting a hold on an anon_rmap from outside one of its mms.
+ * Only folio_trylock_get_anon_rmap() understands the subtleties of
+ * getting and locking an anon_rmap from outside one of its mms.
* But if we cannot get anon_rmap, then we won't need it anyway,
* because that implies that the anon page is no longer mapped
* (and cannot be remapped so long as we hold the page lock).
*/
- if (folio_test_anon(src) && !folio_test_ksm(src))
- anon_rmap = folio_get_anon_rmap(src);
-
- /*
- * Block others from accessing the new page when we get around to
- * establishing additional references. We are usually the only one
- * holding a reference to dst at this point. We used to have a BUG
- * here if folio_trylock(dst) fails, but would like to allow for
- * cases where there might be a race with the previous use of dst.
- * This is much like races on refcount of oldpage: just don't BUG().
- */
- if (unlikely(!folio_trylock(dst)))
- goto out;
- dst_locked = true;
+ if (folio_test_anon(src) && !folio_test_ksm(src)) {
+ anon_rmap = folio_trylock_get_anon_rmap(src);
+ if (!anon_rmap_value(anon_rmap))
+ goto out;
+ }
if (unlikely(page_has_movable_ops(&src->page))) {
__migrate_folio_record(dst, old_page_state, anon_rmap);
@@ -1331,10 +1335,14 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
goto out;
}
} else if (folio_mapped(src)) {
+ enum ttu_flags ttu = mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0;
+
+ if (anon_rmap_value(anon_rmap))
+ ttu |= TTU_RMAP_LOCKED;
/* Establish migration ptes */
VM_BUG_ON_FOLIO(folio_test_anon(src) &&
!folio_test_ksm(src) && !anon_rmap_value(anon_rmap), src);
- try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0);
+ try_to_migrate(src, ttu);
old_page_state |= PAGE_WAS_MAPPED;
}
@@ -1415,7 +1423,8 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
lru_add_drain();
if (old_page_state & PAGE_WAS_MAPPED)
- remove_migration_ptes(src, dst, 0);
+ remove_migration_ptes(src, dst,
+ anon_rmap_value(anon_rmap) ? TTU_RMAP_LOCKED : 0);
out_unlock_both:
folio_unlock(dst);
@@ -1434,7 +1443,7 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
list_del(&src->lru);
/* Drop an anon_rmap reference if we took one */
if (anon_rmap_value(anon_rmap))
- put_anon_rmap(anon_rmap);
+ anon_rmap_unlock_put(anon_rmap);
folio_unlock(src);
migrate_folio_done(src, reason);
@@ -1485,7 +1494,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
int page_was_mapped = 0;
anon_rmap_t anon_rmap = ANON_RMAP_NULL;
struct address_space *mapping = NULL;
- enum ttu_flags ttu = 0;
+ enum ttu_flags ttu = TTU_RMAP_LOCKED;
if (folio_ref_count(src) == 1) {
/* page was freed from under us. So we are done. */
@@ -1519,11 +1528,14 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
goto out_unlock;
}
- if (folio_test_anon(src))
- anon_rmap = folio_get_anon_rmap(src);
-
if (unlikely(!folio_trylock(dst)))
- goto put_anon;
+ goto out_unlock;
+
+ if (folio_test_anon(src)) {
+ anon_rmap = folio_trylock_get_anon_rmap(src);
+ if (!anon_rmap_value(anon_rmap))
+ goto unlock_put_anon;
+ }
if (folio_mapped(src)) {
if (!folio_test_anon(src)) {
@@ -1536,8 +1548,6 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
mapping = hugetlb_folio_mapping_lock_write(src);
if (unlikely(!mapping))
goto unlock_put_anon;
-
- ttu = TTU_RMAP_LOCKED;
}
try_to_migrate(src, ttu);
@@ -1550,15 +1560,14 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
if (page_was_mapped)
remove_migration_ptes(src, !rc ? dst : src, ttu);
- if (ttu & TTU_RMAP_LOCKED)
+ if (page_was_mapped && !folio_test_anon(src))
i_mmap_unlock_write(mapping);
unlock_put_anon:
folio_unlock(dst);
-put_anon:
if (anon_rmap_value(anon_rmap))
- put_anon_rmap(anon_rmap);
+ anon_rmap_unlock_put(anon_rmap);
if (!rc) {
move_hugetlb_state(src, dst, reason);
diff --git a/mm/rmap.c b/mm/rmap.c
index 57cd85efc50a..46876b3dbfbc 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1223,6 +1223,46 @@ anon_rmap_t folio_lock_anon_rmap_read(const struct folio *folio,
return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
}
+anon_rmap_t folio_trylock_get_anon_rmap(const struct folio *folio)
+{
+ struct anon_vma *anon_vma;
+ struct vm_area_struct *vma;
+
+ if (folio_test_anon_vma_lazy(folio)) {
+ vma = folio_get_anon_vma_lazy(folio);
+ if (vma && !lock_vma_under_rcu(vma->vm_mm, vma->vm_start)) {
+ vma_put(vma);
+ vma = NULL;
+ }
+ if (vma)
+ return vma_to_anon_rmap(vma);
+ }
+
+ anon_vma = folio_get_anon_vma(folio);
+ if (anon_vma && !anon_vma_trylock_read(anon_vma)) {
+ put_anon_vma(anon_vma);
+ anon_vma = NULL;
+ }
+ return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
+}
+
+void anon_rmap_unlock_put(anon_rmap_t anon_rmap)
+{
+ struct anon_vma *anon_vma;
+
+ if (!anon_rmap_is_anon_vma(anon_rmap)) {
+ struct vm_area_struct *vma = anon_rmap_to_vma(anon_rmap);
+
+ vma_end_read(vma);
+ vma_put(vma);
+ return;
+ }
+
+ anon_vma = anon_rmap_to_anon_vma(anon_rmap);
+ anon_vma_unlock_read(anon_vma);
+ put_anon_vma(anon_vma);
+}
+
#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
/*
* Flush TLB entries for recently unmapped pages from remote CPUs. It is
--
2.17.1
* [PATCH 13/15] mm: support setup and upgrade of ANON_VMA_LAZY folios
From: tao @ 2026-05-27 11:01 UTC
To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, tao.wangtao, kas,
ljs
Cc: hpa, liam, vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh,
jgg, jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
__folio_set_anon() (used when a new anonymous folio is mapped) and
folio_move_anon_rmap() decide whether to set FOLIO_MAPPING_ANON_VMA_LAZY.
__folio_try_dup_anon_rmap() and hugetlb_try_dup_anon_rmap() upgrade a
still-lazy folio to FOLIO_MAPPING_ANON during fork().
rmap_walk_anon() detects ANON_VMA_LAZY upgrades and retries
the walk so that the mapping is handled correctly.
remove_rmap() needs no special handling, since folio_mapped()
is checked before use.
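The one-way encoding described above can be illustrated with a userspace model of folio->mapping as a tagged word. This is a sketch under stated assumptions, not the series' code. The tag values MAP_ANON and MAP_LAZY_BIT are hypothetical, because the real FOLIO_MAPPING_ANON_VMA_LAZY value is not shown in this excerpt. The setup rule is simplified to "no anon_vma yet means lazy", whereas the real __folio_set_anon() also looks at the anon_vma_tree_t state and the exclusive flag. The retry loop shows only the shape of detecting an upgrade that raced with a walk, and the visitor that simulates fork() is also invented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag values; both keep the "anonymous" bit set. */
#define MAP_ANON      0x1UL
#define MAP_LAZY_BIT  0x4UL
#define MAP_ANON_LAZY (MAP_ANON | MAP_LAZY_BIT)
#define MAP_TAG_MASK  0x7UL

struct anon_vma_m { _Alignas(8) int id; };

struct vma_m {
	_Alignas(8) struct anon_vma_m *anon_vma; /* NULL until fork() needs one */
	struct vma_m *lazy_root;                 /* VMA that lazy folios point at */
};

struct folio_m { _Atomic uintptr_t mapping; };

static bool folio_is_lazy(struct folio_m *f)
{
	return (atomic_load(&f->mapping) & MAP_TAG_MASK) == MAP_ANON_LAZY;
}

/* Setup: point at the lazy root while no anon_vma exists (simplified rule). */
static void folio_set_anon_m(struct folio_m *f, struct vma_m *vma)
{
	if (!vma->anon_vma && vma->lazy_root)
		atomic_store(&f->mapping, (uintptr_t)vma->lazy_root | MAP_ANON_LAZY);
	else
		atomic_store(&f->mapping, (uintptr_t)vma->anon_vma | MAP_ANON);
}

/* fork(): upgrade lazy -> anon_vma. There is deliberately no reverse path. */
static void folio_upgrade_m(struct folio_m *f, struct vma_m *vma)
{
	assert(vma->anon_vma != NULL);
	if (folio_is_lazy(f))
		atomic_store(&f->mapping, (uintptr_t)vma->anon_vma | MAP_ANON);
}

typedef void (*visit_fn)(struct folio_m *f, uintptr_t seen, void *arg);

/* Walk from a snapshot; if a lazy snapshot was upgraded meanwhile, walk again. */
static int rmap_walk_m(struct folio_m *f, visit_fn visit, void *arg)
{
	int walks = 0;

	for (;;) {
		uintptr_t seen = atomic_load(&f->mapping);

		walks++;
		visit(f, seen, arg);
		if ((seen & MAP_TAG_MASK) != MAP_ANON_LAZY ||
		    atomic_load(&f->mapping) == seen)
			return walks;
	}
}

/* Example visitor that simulates fork() upgrading the folio mid-walk. */
static void upgrade_during_visit(struct folio_m *f, uintptr_t seen, void *arg)
{
	(void)seen;
	folio_upgrade_m(f, arg);
}
```

The first walk sees the lazy encoding, the simulated fork() upgrades the folio during that walk, and the walk is therefore repeated once against the anon_vma encoding. No function in the model ever stores a lazy value over an anon_vma value, which is what "strictly one-way" means here.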
Signed-off-by: tao <tao.wangtao@honor.com>
---
include/linux/rmap.h | 38 ++++++++++++++++++++++++++++++++++++++
mm/rmap.c | 21 ++++++++++++++++++++-
2 files changed, 58 insertions(+), 1 deletion(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 59244481a8c1..9b1970698204 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -392,6 +392,14 @@ static __always_inline void __folio_rmap_sanity_checks(const struct folio *folio
unsigned long mapping = (unsigned long)folio->mapping;
struct anon_vma *anon_vma;
+ if (folio_test_anon_vma_lazy(folio)) {
+ struct vm_area_struct *root_vma =
+ (void *)(mapping - FOLIO_MAPPING_ANON_VMA_LAZY);
+
+ VM_WARN_ON_FOLIO(!rcuref_read(&root_vma->vm_rcuref), folio);
+ return;
+ }
+
anon_vma = (void *)(mapping - FOLIO_MAPPING_ANON);
VM_WARN_ON_FOLIO(atomic_read(&anon_vma->refcount) == 0, folio);
}
@@ -431,6 +439,31 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);
+/**
+ * folio_upgrade_anon_vma_lazy - upgrade folio->mapping from ANON_VMA_LAZY to
+ * an anon_vma
+ * @folio: The folio to upgrade
+ * @vma: The VMA the folio currently belongs to
+ *
+ * Upgrade folio->mapping from ANON_VMA_LAZY to an anon_vma.
+ * This transition is strictly one-way and never reverts back to a lazy
+ * mapping.
+ *
+ * Called during fork() while holding the mmap lock and the VMA write lock,
+ * but without taking the folio lock. Concurrent readers may briefly observe
+ * the old lazy mapping. Migration relies on folio_trylock_get_anon_rmap()
+ * to ensure atomicity, while other rmap operations remain unaffected.
+ */
+static inline void folio_upgrade_anon_vma_lazy(struct folio *folio,
+ struct vm_area_struct *vma)
+{
+ unsigned long anon_tree = (unsigned long)vma->anon_vma;
+
+ VM_BUG_ON_VMA(!anon_tree || !IS_ALIGNED(anon_tree, sizeof(long)), vma);
+ anon_tree = anon_tree + FOLIO_MAPPING_ANON;
+ WRITE_ONCE(folio->mapping, (struct address_space *)anon_tree);
+}
+
/* See folio_try_dup_anon_rmap_*() */
static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
struct vm_area_struct *vma)
@@ -438,6 +471,9 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+ if (folio_test_anon_vma_lazy(folio))
+ folio_upgrade_anon_vma_lazy(folio, vma);
+
if (PageAnonExclusive(&folio->page)) {
if (unlikely(folio_needs_cow_for_dma(vma, folio)))
return -EBUSY;
@@ -573,6 +609,8 @@ static __always_inline int __folio_try_dup_anon_rmap(struct folio *folio,
int i;
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+ if (folio_test_anon_vma_lazy(folio))
+ folio_upgrade_anon_vma_lazy(folio, src_vma);
__folio_rmap_sanity_checks(folio, page, nr_pages, level);
/*
diff --git a/mm/rmap.c b/mm/rmap.c
index 46876b3dbfbc..e14509b47412 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2002,6 +2002,16 @@ void folio_move_anon_rmap(struct folio *folio, struct vm_area_struct *vma)
void *anon_vma = vma_anon_vma(vma);
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
+
+ if (!anon_vma) {
+ const struct vm_area_struct *root_vma = vma_anon_vma_lazy_root(vma);
+
+ VM_BUG_ON_VMA(!root_vma, vma);
+ root_vma = (void *)root_vma + FOLIO_MAPPING_ANON_VMA_LAZY;
+ WRITE_ONCE(folio->mapping, (struct address_space *)root_vma);
+ return;
+ }
+
VM_BUG_ON_VMA(!anon_vma, vma);
anon_vma += FOLIO_MAPPING_ANON;
@@ -2023,7 +2033,16 @@ void folio_move_anon_rmap(struct folio *folio, struct vm_area_struct *vma)
static void __folio_set_anon(struct folio *folio, struct vm_area_struct *vma,
unsigned long address, bool exclusive)
{
- struct anon_vma *anon_vma = vma_anon_vma(vma);
+ anon_vma_tree_t anon_tree = vma->anon_vma;
+ const struct vm_area_struct *root_vma = vma_anon_vma_lazy_root(vma);
+ struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
+
+ if (root_vma && (anon_vma_tree_is_vma(anon_tree) || exclusive)) {
+ root_vma = (void *)root_vma + FOLIO_MAPPING_ANON_VMA_LAZY;
+ WRITE_ONCE(folio->mapping, (struct address_space *)root_vma);
+ folio->index = linear_page_index(vma, address);
+ return;
+ }
BUG_ON(!anon_vma);
--
2.17.1
* [PATCH 14/15] mm: support merging of ANON_VMA_LAZY VMAs
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
` (12 preceding siblings ...)
2026-05-27 11:01 ` [PATCH 13/15] mm: support setup and upgrade of ANON_VMA_LAZY folios tao
@ 2026-05-27 11:01 ` tao
2026-05-27 11:01 ` [PATCH 15/15] mm: enable CONFIG_ANON_VMA_LAZY on arm64 and x86_64 tao
` (3 subsequent siblings)
17 siblings, 0 replies; 22+ messages in thread
From: tao @ 2026-05-27 11:01 UTC
To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, tao.wangtao, kas,
ljs
Cc: hpa, liam, vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh,
jgg, jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
Allow ANON_VMA_LAZY VMAs to merge if they share the same lazy root, or if
one side has no root.
When merging ANON_VMA_LAZY VMAs, never delete the lazy root VMA, because
folio->mapping may still reference it; make the root the merge target
instead.
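The merge rule above can be illustrated with a hypothetical userspace model. Two checks are involved: whether the lazy roots are compatible, and, if the VMA that would be deleted is a lazy root, swapping roles so that the root survives. The second check is what SWAP_VMG_TARGET_IF_DELETE_ANON_VMA_LAZY_ROOT() does to vmg->target in the diff. This sketch is an assumption-laden illustration only: struct vma_m and merge_lazy_m() are invented, and the real vma_merge_existing_range()/vma_expand() paths also handle anon_vma duplication, VMA flags, locking and more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct vma_m {
	unsigned long start, end;
	struct vma_m *lazy_root; /* NULL: this VMA has no lazy root yet */
};

static bool is_lazy_root(const struct vma_m *v)
{
	return v->lazy_root == v;
}

/* Mergeable if either side has no root, or both share the same root. */
static bool lazy_roots_compatible(const struct vma_m *a, const struct vma_m *b)
{
	return !a->lazy_root || !b->lazy_root || a->lazy_root == b->lazy_root;
}

/*
 * Merge two adjacent VMAs. If the VMA that would be deleted is a lazy root,
 * swap roles so the root survives, since folio->mapping may reference it.
 * Returns the surviving VMA, or NULL if the merge is not allowed.
 */
static struct vma_m *merge_lazy_m(struct vma_m *target, struct vma_m *victim)
{
	if (!lazy_roots_compatible(target, victim))
		return NULL;
	if (is_lazy_root(victim)) {
		struct vma_m *tmp = target;

		target = victim;
		victim = tmp;
	}
	if (victim->start < target->start)
		target->start = victim->start;
	if (victim->end > target->end)
		target->end = victim->end;
	if (!target->lazy_root)
		target->lazy_root = victim->lazy_root;
	return target; /* the caller would now free victim */
}
```

Keeping the root as the survivor means no folio ever needs to be rewritten to point elsewhere, which matches the patch's comment that keeping the root avoids allocating an extra VMA.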
Signed-off-by: tao <tao.wangtao@honor.com>
---
mm/vma.c | 29 ++++++++++++++++++++++++-----
1 file changed, 24 insertions(+), 5 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index 0a31ef82a90c..ae1047dcfbc2 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -76,9 +76,10 @@ static bool vma_is_fork_child(struct vm_area_struct *vma)
/*
* The list_is_singular() test is to avoid merging VMA cloned from
* parents. This can improve scalability caused by the anon_vma root
- * lock.
+ * lock. ANON_VMA_TREE_VMA has no anon_vma_chain.
*/
- return vma && vma->anon_vma && !list_is_singular(&vma->anon_vma_chain);
+ return vma && vma->anon_vma && !anon_vma_tree_is_vma(vma->anon_vma) &&
+ !list_is_singular(&vma->anon_vma_chain);
}
static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_next)
@@ -776,6 +777,17 @@ static bool can_merge_remove_vma(struct vm_area_struct *vma)
return !vma->vm_ops || !vma->vm_ops->close;
}
+/*
+ * The ANON_VMA_LAZY root VMA may still be referenced by folio->mapping.
+ * Keeping the root avoids allocating an extra VMA.
+ */
+#define SWAP_VMG_TARGET_IF_DELETE_ANON_VMA_LAZY_ROOT(vmg, delete_vma) do { \
+ if (anon_vma_lazy_enabled()) { \
+ if (delete_vma && vma_is_anon_vma_lazy_root(delete_vma)) \
+ swap(vmg->target, delete_vma); \
+ } \
+} while (0)
+
/*
* vma_merge_existing_range - Attempt to merge VMAs based on a VMA having its
* attributes modified.
@@ -933,12 +945,15 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
vmg->end = next->vm_end;
vmg->pgoff = prev->vm_pgoff;
+ SWAP_VMG_TARGET_IF_DELETE_ANON_VMA_LAZY_ROOT(vmg, middle);
+ SWAP_VMG_TARGET_IF_DELETE_ANON_VMA_LAZY_ROOT(vmg, next);
+
/*
* We already ensured anon_vma compatibility above, so now it's
* simply a case of, if prev has no anon_vma object, which of
* next or middle contains the anon_vma we must duplicate.
*/
- err = dup_anon_vma(prev, next->anon_vma ? next : middle,
+ err = dup_anon_vma(vmg->target, next->anon_vma ? next : middle,
&anon_dup);
} else if (merge_left) {
/*
@@ -954,8 +969,10 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
if (!vmg->__remove_middle)
vmg->__adjust_middle_start = true;
+ else
+ SWAP_VMG_TARGET_IF_DELETE_ANON_VMA_LAZY_ROOT(vmg, middle);
- err = dup_anon_vma(prev, middle, &anon_dup);
+ err = dup_anon_vma(vmg->target, middle, &anon_dup);
} else { /* merge_right */
/*
* |<------------->| OR
@@ -974,6 +991,7 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
if (vmg->__remove_middle) {
vmg->end = next->vm_end;
vmg->pgoff = next->vm_pgoff - pglen;
+ SWAP_VMG_TARGET_IF_DELETE_ANON_VMA_LAZY_ROOT(vmg, middle);
} else {
/* We shrink middle and expand next. */
vmg->__adjust_next_start = true;
@@ -982,7 +1000,7 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
vmg->pgoff = middle->vm_pgoff;
}
- err = dup_anon_vma(next, middle, &anon_dup);
+ err = dup_anon_vma(vmg->target, middle, &anon_dup);
}
if (err || commit_merge(vmg))
@@ -1212,6 +1230,7 @@ int vma_expand(struct vma_merge_struct *vmg)
vma_start_write(next);
vmg->__remove_next = true;
+ SWAP_VMG_TARGET_IF_DELETE_ANON_VMA_LAZY_ROOT(vmg, next);
next_sticky = vma_flags_and_mask(&next->flags, VMA_STICKY_FLAGS);
vma_flags_set_mask(&sticky_flags, next_sticky);
--
2.17.1
* [PATCH 15/15] mm: enable CONFIG_ANON_VMA_LAZY on arm64 and x86_64
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
` (13 preceding siblings ...)
2026-05-27 11:01 ` [PATCH 14/15] mm: support merging of ANON_VMA_LAZY VMAs tao
@ 2026-05-27 11:01 ` tao
2026-05-27 11:23 ` [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation Pedro Falcato
` (2 subsequent siblings)
17 siblings, 0 replies; 22+ messages in thread
From: tao @ 2026-05-27 11:01 UTC
To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, tao.wangtao, kas,
ljs
Cc: hpa, liam, vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh,
jgg, jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
All prerequisites are in place, so enable CONFIG_ANON_VMA_LAZY for
arm64 and x86_64: select ARCH_SUPPORTS_ANON_VMA_LAZY on both, and make
anon_vma_lazy_enable default to true.
Signed-off-by: tao <tao.wangtao@honor.com>
---
arch/arm64/Kconfig | 1 +
arch/x86/Kconfig | 1 +
mm/rmap.c | 2 +-
3 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fe60738e5943..9517883f0aaf 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -81,6 +81,7 @@ config ARM64
select ARCH_SUPPORTS_NUMA_BALANCING
select ARCH_SUPPORTS_PAGE_TABLE_CHECK
select ARCH_SUPPORTS_PER_VMA_LOCK
+ select ARCH_SUPPORTS_ANON_VMA_LAZY
select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
select ARCH_SUPPORTS_RT
select ARCH_SUPPORTS_SCHED_SMT
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f3f7cb01d69d..cc3430eaa7b4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -28,6 +28,7 @@ config X86_64
select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
select ARCH_SUPPORTS_PER_VMA_LOCK
+ select ARCH_SUPPORTS_ANON_VMA_LAZY
select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
select HAVE_ARCH_SOFT_DIRTY
select MODULES_USE_ELF_RELA
diff --git a/mm/rmap.c b/mm/rmap.c
index e14509b47412..77e2ab95671a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -168,7 +168,7 @@ static struct kmem_cache *anon_vma_chain_cachep;
* covering both regular anon_vma and lazy anon_vma mappings.
*/
-bool anon_vma_lazy_enable;
+bool anon_vma_lazy_enable = true;
#endif
static inline struct anon_vma *anon_vma_alloc(void)
--
2.17.1
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
` (14 preceding siblings ...)
2026-05-27 11:01 ` [PATCH 15/15] mm: enable CONFIG_ANON_VMA_LAZY on arm64 and x86_64 tao
@ 2026-05-27 11:23 ` Pedro Falcato
2026-05-27 11:30 ` Lorenzo Stoakes
2026-05-27 14:33 ` Lorenzo Stoakes
17 siblings, 0 replies; 22+ messages in thread
From: Pedro Falcato @ 2026-05-27 11:23 UTC
To: tao
Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, kas, ljs, hpa, liam,
vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh, jgg,
jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, linux-arm-kernel, linux-kernel,
linux-fsdevel, linux-mm, damon, shakeel.butt, ryncsn, 21cnbao,
jparsana, dvander, zhangji1, wangzicheng
On Wed, May 27, 2026 at 07:01:32PM +0800, tao wrote:
> TL;DR
> -----
>
> This series introduces ANON_VMA_LAZY, which defers anon_vma creation
> until it is actually required.
>
> - anon_vma memory reduced by ~92-97%, anon_vma_chain reduced by ~50-57%
> - rmap operations on ANON_VMA_LAZY VMAs do not require anon_vma locking
>
> Background
> ----------
>
> Currently anon_vma structures are created eagerly when anonymous VMAs
> are initialized. However, many VMAs never participate in fork or rmap
This is not true, they are created on fault + a few other places.
> operations that require anon_vma chains, so the allocated anon_vma and
> anon_vma_chain objects are often unnecessary.
>
> Design overview
> ---------------
>
> ANON_VMA_LAZY defers anon_vma allocation until it is actually needed
> (for example during fork). VMAs that never participate in sharing can
> avoid creating anon_vma structures entirely.
>
> Before an anon_vma exists, rmap operations rely directly on VMA
> information, so no anon_vma locking is required. An anon_vma is created
> and linked only when sharing semantics are required.
>
> This series introduces anon_rmap helpers to make rmap less dependent on
> direct anon_vma access. It also introduces anon_vma_tree_t as a container
> to support both the lazy and the existing anon_vma layouts.
>
> Once a VMA becomes associated with an anon_vma, the normal behavior
> remains unchanged.
>
> Memory impact
> -------------
>
> Preliminary measurements show significant reductions in anon_vma-related
> slab allocations.
>
> After boot:
>
> Object | Before (active KB) | After (active KB) | Change
> vm_area_struct | 117035 | 118176 | +1.0%
> anon_vma_chain | 18865.8 | 8112.06 | -57.0%
> anon_vma | 20426.4 | 613.75 | -97.0%
>
> After launching 24 apps:
>
> Object | Before (active KB) | After (active KB) | Change
> vm_area_struct | 196873 | 197345 | +0.2%
> anon_vma_chain | 31477.1 | 15576.8 | -50.5%
> anon_vma | 33280 | 2648.12 | -92.0%
>
> Simple fork microbenchmarks also show a slight improvement in fork
> performance, since child VMAs do not need to allocate anon_vma
> structures during fork.
>
> Feedback and suggestions are welcome.
I'm afraid, per previous discussions[1], that no one is really willing to
maintain extra complexity for the current state of anon rmap and anon vmas.
Sorry :/
Also, please don't send series this large without previous discussion and
_at least_ an RFC tag.
[1] https://lore.kernel.org/all/aec533b2-37a7-4f44-a279-c4aa604206ac@lucifer.local/
--
Pedro
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
` (15 preceding siblings ...)
2026-05-27 11:23 ` [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation Pedro Falcato
@ 2026-05-27 11:30 ` Lorenzo Stoakes
2026-05-27 14:33 ` Lorenzo Stoakes
17 siblings, 0 replies; 22+ messages in thread
From: Lorenzo Stoakes @ 2026-05-27 11:30 UTC
To: tao
Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, kas, hpa, liam,
vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh, jgg,
jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
I'm sorry but this is not how kernel development is done.
You're sending a series that's very invasive, that you've not coordinated
with anybody else, nor have you mentioned it at a conference, nor engaged
in discussion with anybody else in the community in any way.
And sending it without an RFC, at -rc5, is... quite something.
We do NOT want to extend or expand or hack in anything like this on top of
the existing anon_vma machinery. It's a mess that requires replacement, not
more hacks or expansion.
I've been working on a replacement for the anonymous rmap, which I recently
presented at LSF/MM, and all of that has been very public.
In fact I have engaged in recent work which reduced lock contention in
anon_vma, it's really quite discourteous for you not to have contacted me
or the community in addition to the above.
On Wed, May 27, 2026 at 07:01:32PM +0800, tao wrote:
> TL;DR
> -----
>
> This series introduces ANON_VMA_LAZY, which defers anon_vma creation
> until it is actually required.
>
> - anon_vma memory reduced by ~92-97%, anon_vma_chain reduced by ~50-57%
> - rmap operations on ANON_VMA_LAZY VMAs do not require anon_vma locking
>
> Background
> ----------
>
> Currently anon_vma structures are created eagerly when anonymous VMAs
> are initialized. However, many VMAs never participate in fork or rmap
What are you talking about? 'Initialized'? They are created when memory is
faulted in, and we explicitly need to know that that's the case.
Also the folio->mapping is required to point to something to allow for anon
rmap...
> operations that require anon_vma chains, so the allocated anon_vma and
> anon_vma_chain objects are often unnecessary.
Right, because we never split or merge VMAs nor require anon rmap?
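The point that folio->mapping must point at something can be made concrete. In upstream kernels, an anonymous folio's mapping field is a tagged pointer. The low FOLIO_MAPPING_ANON bit (formerly PAGE_MAPPING_ANON) marks the rest of the word as an anon_vma rather than an address_space, and KSM folios set a second low bit as well. The userspace sketch below mimics that decode step with locally defined, illustrative constants. MODEL_MAPPING_*, enum rmap_kind and decode_mapping() are hypothetical names, not kernel API. The sketch shows why a folio whose mapping encodes no pointer leaves rmap with nowhere to start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants mirroring the upstream low-bit convention. */
#define MODEL_MAPPING_ANON  0x1UL
#define MODEL_MAPPING_KSM   0x3UL /* anon bit plus a KSM bit */
#define MODEL_MAPPING_FLAGS 0x3UL

enum rmap_kind { RMAP_NONE, RMAP_FILE, RMAP_ANON, RMAP_KSM };

struct rmap_target {
	enum rmap_kind kind;
	void *ptr; /* address_space, anon_vma, or KSM stable node */
};

/* Split a mapping word into "what kind of object" and "which object". */
static struct rmap_target decode_mapping(uintptr_t mapping)
{
	struct rmap_target t = { RMAP_NONE, NULL };
	uintptr_t flags = mapping & MODEL_MAPPING_FLAGS;

	t.ptr = (void *)(mapping & ~MODEL_MAPPING_FLAGS);
	if (!t.ptr)
		t.kind = RMAP_NONE; /* no object: an rmap walk has no start point */
	else if (flags == MODEL_MAPPING_KSM)
		t.kind = RMAP_KSM;
	else if (flags & MODEL_MAPPING_ANON)
		t.kind = RMAP_ANON;
	else
		t.kind = RMAP_FILE; /* in this model, any other flag value is file-backed */
	return t;
}
```

The tagging works only because the pointed-to objects are at least 4-byte aligned, leaving the two low bits free.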
>
> Design overview
> ---------------
>
> ANON_VMA_LAZY defers anon_vma allocation until it is actually needed
> (for example during fork). VMAs that never participate in sharing can
> avoid creating anon_vma structures entirely.
Well, it's needed the second something's faulted in so you can perform anon
rmap.
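Both replies note that, today, the anon_vma is set up on the first fault rather than when the VMA is created. Upstream does this through anon_vma_prepare(). It returns early if vma->anon_vma is already set; otherwise __anon_vma_prepare() allocates an anon_vma or reuses a suitable neighbour's via find_mergeable_anon_vma(). It then takes the relevant locks and re-checks before publishing. Below is a minimal userspace sketch of that "fast check, lock, re-check, publish" shape. The struct names, the mutex, and the allocation counter are hypothetical, and neighbour reuse is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct anon_vma_m { int refcount; };

struct vma_m {
	_Atomic(struct anon_vma_m *) anon_vma; /* NULL until the first fault */
	pthread_mutex_t *lock;                 /* serializes the slow path */
};

static int anon_vma_allocations; /* counts successful publications */

/* Slow path: allocate outside the lock, publish under it, re-checking. */
static int prepare_slow(struct vma_m *vma)
{
	struct anon_vma_m *av = calloc(1, sizeof(*av));

	if (!av)
		return -1;
	pthread_mutex_lock(vma->lock);
	if (!atomic_load(&vma->anon_vma)) {
		av->refcount = 1;
		atomic_store(&vma->anon_vma, av);
		anon_vma_allocations++;
		av = NULL; /* ownership moved to the VMA */
	}
	pthread_mutex_unlock(vma->lock);
	free(av); /* non-NULL only if another faulting thread won the race */
	return 0;
}

/* Called on every anonymous fault; cheap once the anon_vma exists. */
static int prepare_on_fault(struct vma_m *vma)
{
	if (atomic_load(&vma->anon_vma))
		return 0;
	return prepare_slow(vma);
}
```

The fast path is a single load, so repeated faults on the same VMA pay nothing after the first one. The re-check under the lock is what prevents two concurrent first faults from both publishing an anon_vma.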
>
> Before an anon_vma exists, rmap operations rely directly on VMA
> information, so no anon_vma locking is required. An anon_vma is created
> and linked only when sharing semantics are required.
Err 'directly on VMA information'... a VMA pointer? That can change at any
point? What about remaps?...
I guess I'll see in the code.
>
> This series introduces anon_rmap helpers to make rmap less dependent on
> direct anon_vma access. It also introduces anon_vma_tree_t as a container
> to support both the lazy and the existing anon_vma layouts.
Super invasive, extending the already broken abstraction further. We don't
want this.
>
> Once a VMA becomes associated with an anon_vma, the normal behavior
> remains unchanged.
>
> Memory impact
> -------------
>
> Preliminary measurements show significant reductions in anon_vma-related
> slab allocations.
>
> After boot:
>
> Object | Before (active KB) | After (active KB) | Change
> vm_area_struct | 117035 | 118176 | +1.0%
> anon_vma_chain | 18865.8 | 8112.06 | -57.0%
> anon_vma | 20426.4 | 613.75 | -97.0%
>
> After launching 24 apps:
>
> Object | Before (active KB) | After (active KB) | Change
> vm_area_struct | 196873 | 197345 | +0.2%
> anon_vma_chain | 31477.1 | 15576.8 | -50.5%
> anon_vma | 33280 | 2648.12 | -92.0%
>
> Simple fork microbenchmarks also show a slight improvement in fork
> performance, since child VMAs do not need to allocate anon_vma
> structures during fork.
This seems completely broken too re: anon_vma propagation on fork?
The above is only meaningful if you're not fundamentally breaking anon rmap
which is very easily done, but in addition, I'm not interested in seeing
the anon_vma machinery extended further.
>
> Feedback and suggestions are welcome.
This is what you should have sought AHEAD of sending this.
I'll look at the code, but in general you've gone about this in a really
unfortunate way with respect to the community. This is not how to collaborate.
>
>
> tao (15):
> mm/rmap: introduce anon_rmap APIs for anonymous folios
> mm: convert anon_vma rmap APIs to anon_rmap
> mm: introduce anon_vma_tree_t for multiple anon_vma topologies
> mm: switch to anon_vma_tree_t APIs in preparation for ANON_VMA_LAZY
> mm: add CONFIG_ANON_VMA_LAZY and folio helpers
> mm: add CONFIG_VMA_REF and VMA helpers
> mm: replace direct FOLIO_MAPPING_ANON usage with helpers
> mm: prepare rmap infrastructure for ANON_VMA_LAZY
> mm: implement ANON_VMA_LAZY rmap semantics
> mm: defer anon_vma creation with ANON_VMA_LAZY
> mm: handle ANON_VMA_LAZY in huge page operations
> mm: handle ANON_VMA_LAZY during migration
> mm: support setup and upgrade of ANON_VMA_LAZY folios
> mm: support merging of ANON_VMA_LAZY VMAs
> mm: enable CONFIG_ANON_VMA_LAZY on arm64 and x86_64
>
> arch/arm64/Kconfig | 1 +
> arch/x86/Kconfig | 1 +
> fs/proc/page.c | 6 +-
> include/linux/mm.h | 38 ++
> include/linux/mm_types.h | 9 +-
> include/linux/page-flags.h | 34 +-
> include/linux/pagemap.h | 2 +-
> include/linux/rmap.h | 165 ++++++++-
> mm/Kconfig | 22 ++
> mm/damon/ops-common.c | 4 +-
> mm/debug.c | 2 +-
> mm/debug_vm_pgtable.c | 2 +-
> mm/gup.c | 6 +-
> mm/huge_memory.c | 16 +-
> mm/internal.h | 171 +++++++++
> mm/khugepaged.c | 13 +-
> mm/ksm.c | 43 ++-
> mm/memory-failure.c | 11 +-
> mm/memory.c | 19 +-
> mm/migrate.c | 126 ++++---
> mm/mmap.c | 15 +-
> mm/mremap.c | 4 +-
> mm/page_idle.c | 2 +-
> mm/rmap.c | 690 ++++++++++++++++++++++++++++++++++---
> mm/vma.c | 76 ++--
> mm/vma.h | 4 +-
> mm/vma_exec.c | 2 +-
> mm/vma_init.c | 1 +
> 28 files changed, 1279 insertions(+), 206 deletions(-)
>
> --
> 2.17.1
>
>
Lorenzo
* Re: [PATCH 01/15] mm/rmap: introduce anon_rmap APIs for anonymous folios
2026-05-27 11:01 ` [PATCH 01/15] mm/rmap: introduce anon_rmap APIs for anonymous folios tao
@ 2026-05-27 11:44 ` Lorenzo Stoakes
0 siblings, 0 replies; 22+ messages in thread
From: Lorenzo Stoakes @ 2026-05-27 11:44 UTC
To: tao
Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, kas, hpa, liam,
vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh, jgg,
jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
On Wed, May 27, 2026 at 07:01:33PM +0800, tao wrote:
> Add a set of anon_rmap APIs to operate on the reverse mappings of
> anonymous folios.
>
> Introduce anon_rmap_for_each_vma() as a wrapper around
> vma_interval_tree_foreach(), so callers no longer access the
> interval tree directly.
>
> This prepares the rmap code for upcoming ANON_VMA_LAZY support and
> RCU-based lockless rmap traversal.
>
> No functional change intended.
This commit message is total garbage. You're not explaining WHY; you're
using words to describe what the code does. I can read the code?
>
> Signed-off-by: tao <tao.wangtao@honor.com>
This is all horrible, horribly invasive, and adding a pile of crap on machinery
we want to get rid of.
You've added zero explanation or comments. This is just not upstreamable,
and even if you did explain yourself we don't want to extend a broken
abstraction with more broken complexity?
You're also seemingly introducing a typesafe wrapper to wrap an arbitrary value?
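For context on the wrapper being questioned here: a single-member struct around a tagged word is a common C idiom for making a handle a distinct type. It generally only adds safety when construction and decoding both go through helpers that check the tag. The userspace sketch below shows that idiom in isolation. rmap_handle_t, RMAP_HANDLE_VMA and all helper names are hypothetical, and this is not the anon_rmap_t encoding from the series, which this excerpt does not show in full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct anon_vma_m { _Alignas(4) int id; };
struct vma_m { _Alignas(4) int id; };

/* A distinct struct type: plain integers or pointers cannot be assigned to it. */
typedef struct {
	uintptr_t word;
} rmap_handle_t;

#define RMAP_HANDLE_NULL ((rmap_handle_t){ .word = 0 })
#define RMAP_HANDLE_VMA  0x1UL /* hypothetical tag: low bit set means VMA */

static rmap_handle_t handle_from_anon_vma(const struct anon_vma_m *av)
{
	assert(((uintptr_t)av & RMAP_HANDLE_VMA) == 0);
	return (rmap_handle_t){ .word = (uintptr_t)av };
}

static rmap_handle_t handle_from_vma(const struct vma_m *vma)
{
	assert(vma && ((uintptr_t)vma & RMAP_HANDLE_VMA) == 0);
	return (rmap_handle_t){ .word = (uintptr_t)vma | RMAP_HANDLE_VMA };
}

static bool handle_is_null(rmap_handle_t h)
{
	return h.word == 0;
}

static bool handle_is_vma(rmap_handle_t h)
{
	return (h.word & RMAP_HANDLE_VMA) != 0;
}

/* Decoders check the tag instead of trusting the caller. */
static struct anon_vma_m *handle_to_anon_vma(rmap_handle_t h)
{
	assert(!handle_is_vma(h));
	return (struct anon_vma_m *)h.word;
}

static struct vma_m *handle_to_vma(rmap_handle_t h)
{
	assert(handle_is_vma(h));
	return (struct vma_m *)(h.word & ~RMAP_HANDLE_VMA);
}
```

Any accessor that returns the raw word, as anon_rmap_value() does in the quoted patch, lets callers bypass these checks.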
> ---
> include/linux/rmap.h | 68 +++++++++++++++++++++++++++++++++++++++++
> mm/rmap.c | 73 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 141 insertions(+)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 8dc0871e5f00..c42314ea4362 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -937,6 +937,44 @@ int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff,
> void remove_migration_ptes(struct folio *src, struct folio *dst,
> enum ttu_flags flags);
>
> +/* Reverse mapping handle for anonymous folio rmap helpers. */
> +typedef struct anon_rmap {
> + unsigned long rmap;
> +} anon_rmap_t;
I do not know why you're using a typedef when you just treat it as an
arbitrary value?
> +
> +#define ANON_RMAP_NULL make_anon_rmap(0)
This is just equivalent to a NULL value?...
> +
> +static inline anon_rmap_t make_anon_rmap(const void *anon_mapping)
> +{
> + return (anon_rmap_t){ .rmap = (unsigned long)anon_mapping, };
> +}
You're intentionally defeating type safety to store arbitrary values?...
> +
> +static inline unsigned long anon_rmap_value(anon_rmap_t anon_rmap)
> +{
> + return anon_rmap.rmap;
> +}
'Untype safe my arbitrarily type safe wrapped type'...?
> +
> +static inline anon_rmap_t anon_vma_to_anon_rmap(const struct anon_vma *anon_vma)
> +{
> + return make_anon_rmap(anon_vma);
> +}
> +
> +static inline struct anon_vma *anon_rmap_to_anon_vma(anon_rmap_t anon_rmap)
> +{
> + unsigned long rmap = anon_rmap_value(anon_rmap);
> +
> + return (struct anon_vma *)rmap;
> +}
A ton of noise for seemingly no value?
> +
> +anon_rmap_t vma_get_anon_rmap(struct vm_area_struct *vma);
> +void put_anon_rmap(anon_rmap_t anon_rmap);
> +void anon_rmap_lock_write(anon_rmap_t anon_rmap);
> +int anon_rmap_trylock_write(anon_rmap_t anon_rmap);
> +void anon_rmap_unlock_write(anon_rmap_t anon_rmap);
> +void anon_rmap_lock_read(anon_rmap_t anon_rmap);
> +int anon_rmap_trylock_read(anon_rmap_t anon_rmap);
> +void anon_rmap_unlock_read(anon_rmap_t anon_rmap);
Yes let's add a bunch of extra broken abstractions on the broken abstraction.
And let's not comment anything!
> +
> /*
> * rmap_walk_control: To control rmap traversing for specific needs
> *
> @@ -969,6 +1007,36 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc);
> struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
> struct rmap_walk_control *rwc);
>
> +bool folio_maybe_same_anon_vma(const struct folio *folio,
> + const struct vm_area_struct *vma);
What the hell is this?
> +anon_rmap_t folio_get_anon_rmap(const struct folio *folio);
> +anon_rmap_t folio_lock_anon_rmap_read(const struct folio *folio,
> + struct rmap_walk_control *rwc);
> +
> +static inline struct vm_area_struct *anon_rmap_iter_first_vma(
> + anon_rmap_t anon_rmap, unsigned long start, unsigned long last,
> + struct anon_vma_chain **avc)
> +{
> + struct anon_vma *anon_vma = anon_rmap_to_anon_vma(anon_rmap);
> +
> + *avc = anon_vma_interval_tree_iter_first(&anon_vma->rb_root, start, last);
> + return *avc ? (*avc)->vma : NULL;
> +}
So we're allowing for folios to have NULL entries (really the commit
message should have that, rather than me scanning through uncommented
code), but in what world are we ok with an anon folio NOT BEING LINKED BACK
TO ITS VMA?
That's broken no?
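The anon_rmap_foreach_vma() macro quoted above follows the usual kernel shape of a for-loop driven by first/next helpers that carry a cursor (here, the anon_vma_chain). The userspace sketch below reproduces that shape, but uses a linked list scanned linearly in place of the anon_vma interval tree, which upstream walks with anon_vma_interval_tree_iter_first()/iter_next(). All names are hypothetical. In this model, an empty starting list simply yields zero iterations.

```c
#include <assert.h>
#include <stddef.h>

struct vma_m { int id; };

/* Stand-in for an anon_vma_chain: one VMA and the page-offset range it covers. */
struct chain_m {
	unsigned long start, last; /* inclusive range */
	struct vma_m *vma;
	struct chain_m *next;
};

/* Linear scan replacing an interval-tree lookup: first node overlapping [start, last]. */
static struct chain_m *overlap_from(struct chain_m *c, unsigned long start,
				    unsigned long last)
{
	for (; c; c = c->next)
		if (c->start <= last && start <= c->last)
			return c;
	return NULL;
}

static struct vma_m *iter_first_vma(struct chain_m *head, unsigned long start,
				    unsigned long last, struct chain_m **cur)
{
	*cur = overlap_from(head, start, last);
	return *cur ? (*cur)->vma : NULL;
}

static struct vma_m *iter_next_vma(unsigned long start, unsigned long last,
				   struct chain_m **cur)
{
	if (!*cur)
		return NULL;
	*cur = overlap_from((*cur)->next, start, last);
	return *cur ? (*cur)->vma : NULL;
}

#define for_each_overlapping_vma(vma, cur, head, start, last)		\
	for (vma = iter_first_vma(head, start, last, &(cur)); vma;	\
	     vma = iter_next_vma(start, last, &(cur)))
```

A caller declares a struct vma_m pointer and a struct chain_m cursor, then writes for_each_overlapping_vma(vma, cur, head, first_pgoff, last_pgoff) { ... }. The cursor lives outside the loop because the next helper needs it to resume the scan.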
> +
> +static inline struct vm_area_struct *anon_rmap_iter_next_vma(
> + anon_rmap_t anon_rmap, unsigned long start, unsigned long last,
> + struct anon_vma_chain **avc)
> +{
> + if (!*avc)
> + return NULL;
> + *avc = anon_vma_interval_tree_iter_next(*avc, start, last);
> + return *avc ? (*avc)->vma : NULL;
> +}
> +
> +#define anon_rmap_foreach_vma(vma, avc, anon_rmap, start, last) \
> + for (vma = anon_rmap_iter_first_vma(anon_rmap, start, last, &avc); \
> + vma; vma = anon_rmap_iter_next_vma(anon_rmap, start, last, &avc))
> +
> #else /* !CONFIG_MMU */
>
> #define anon_vma_init() do {} while (0)
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 78b7fb5f367c..1b2dada71778 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -701,6 +701,79 @@ struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
> return anon_vma;
> }
>
> +anon_rmap_t vma_get_anon_rmap(struct vm_area_struct *vma)
> +{
> + mmap_assert_locked(vma->vm_mm);
> + VM_BUG_ON(!vma->anon_vma);
We don't use BUG_ON() especially VM_BUG_ON().
> + get_anon_vma(vma->anon_vma);
> + return anon_vma_to_anon_rmap(vma->anon_vma);
> +}
> +
> +void put_anon_rmap(anon_rmap_t anon_rmap)
> +{
> + put_anon_vma(anon_rmap_to_anon_vma(anon_rmap));
> +}
> +
> +void anon_rmap_lock_write(anon_rmap_t anon_rmap)
> +{
> + anon_vma_lock_write(anon_rmap_to_anon_vma(anon_rmap));
> +}
> +
> +int anon_rmap_trylock_write(anon_rmap_t anon_rmap)
> +{
> + return anon_vma_trylock_write(anon_rmap_to_anon_vma(anon_rmap));
> +}
> +
> +void anon_rmap_unlock_write(anon_rmap_t anon_rmap)
> +{
> + anon_vma_unlock_write(anon_rmap_to_anon_vma(anon_rmap));
> +}
> +
> +void anon_rmap_lock_read(anon_rmap_t anon_rmap)
> +{
> + anon_vma_lock_read(anon_rmap_to_anon_vma(anon_rmap));
> +}
> +
> +int anon_rmap_trylock_read(anon_rmap_t anon_rmap)
> +{
> + return anon_vma_trylock_read(anon_rmap_to_anon_vma(anon_rmap));
> +}
> +
> +void anon_rmap_unlock_read(anon_rmap_t anon_rmap)
> +{
> + anon_vma_unlock_read(anon_rmap_to_anon_vma(anon_rmap));
> +}
All of these assume that you're getting an anon_vma even though you have
established above that you can put arbitrary other values?
This is terrible?
> +
> +bool folio_maybe_same_anon_vma(const struct folio *folio,
> + const struct vm_area_struct *vma)
> +{
> + struct anon_vma *anon_vma;
> + struct anon_vma *tgt_anon_vma = vma->anon_vma;
> + bool same = false;
> +
> + rcu_read_lock();
> + anon_vma = folio_anon_vma(folio);
> + if (anon_vma && tgt_anon_vma)
> + same = anon_vma->root == tgt_anon_vma->root;
> + rcu_read_unlock();
> + return same;
What VMA locks are being held at this point? You assert none.
Why is it maybe?
Why are you taking the RCU lock?
> +}
> +
> +anon_rmap_t folio_get_anon_rmap(const struct folio *folio)
> +{
> + struct anon_vma *anon_vma = folio_get_anon_vma(folio);
> +
> + return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
> +}
> +
> +anon_rmap_t folio_lock_anon_rmap_read(const struct folio *folio,
> + struct rmap_walk_control *rwc)
> +{
> + struct anon_vma *anon_vma = folio_lock_anon_vma_read(folio, rwc);
> +
> + return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
> +}
> +
> #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> /*
> * Flush TLB entries for recently unmapped pages from remote CPUs. It is
> --
> 2.17.1
>
* Re: [PATCH 02/15] mm: convert anon_vma rmap APIs to anon_rmap
2026-05-27 11:01 ` [PATCH 02/15] mm: convert anon_vma rmap APIs to anon_rmap tao
@ 2026-05-27 11:49 ` Lorenzo Stoakes
0 siblings, 0 replies; 22+ messages in thread
From: Lorenzo Stoakes @ 2026-05-27 11:49 UTC
To: tao
Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, kas, hpa, liam,
vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh, jgg,
jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
On Wed, May 27, 2026 at 07:01:34PM +0800, tao wrote:
> Convert the rmap anon_vma interfaces to anon_rmap APIs to clarify the
> semantics of anonymous rmap operations and prepare for upcoming
> ANON_VMA_LAZY support and RCU-based lockless rmap traversal.
>
> Replace folio_anon_vma(), folio_get_anon_vma(), folio_lock_anon_vma_read(),
> anon_vma_trylock_read(), anon_vma_lock_read(), anon_vma_unlock_read(),
> anon_vma_trylock_write(), anon_vma_lock_write(), anon_vma_unlock_write(),
> and vma_interval_tree_foreach() with the anon_rmap APIs.
This is another worthless commit message. You're again just writing what you did,
not why or what for. This gives no help whatsoever.
>
> No functional change intended.
Err, there is a functional change, since you're literally changing how things
iterate?
>
> Signed-off-by: tao <tao.wangtao@honor.com>
All of this is terrible, you're replacing a broken abstraction with something
that assumes something completely broken with zero explanation.
No to this.
> ---
> include/linux/rmap.h | 6 ++--
> mm/damon/ops-common.c | 4 +--
> mm/huge_memory.c | 16 +++++------
> mm/ksm.c | 43 ++++++++++++++---------------
> mm/memory-failure.c | 11 ++++----
> mm/migrate.c | 64 +++++++++++++++++++++----------------------
> mm/page_idle.c | 2 +-
> mm/rmap.c | 51 ++++++++++++++++++----------------
> 8 files changed, 98 insertions(+), 99 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index c42314ea4362..9802bce92695 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -997,15 +997,13 @@ struct rmap_walk_control {
> bool (*rmap_one)(struct folio *folio, struct vm_area_struct *vma,
> unsigned long addr, void *arg);
> int (*done)(struct folio *folio);
> - struct anon_vma *(*anon_lock)(const struct folio *folio,
> - struct rmap_walk_control *rwc);
> + anon_rmap_t (*anon_lock)(const struct folio *folio,
> + struct rmap_walk_control *rwc);
> bool (*invalid_vma)(struct vm_area_struct *vma, void *arg);
> };
>
> void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc);
> void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc);
> -struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
> - struct rmap_walk_control *rwc);
>
> bool folio_maybe_same_anon_vma(const struct folio *folio,
> const struct vm_area_struct *vma);
> diff --git a/mm/damon/ops-common.c b/mm/damon/ops-common.c
> index 8c6d613425c1..5788410965b8 100644
> --- a/mm/damon/ops-common.c
> +++ b/mm/damon/ops-common.c
> @@ -172,7 +172,7 @@ void damon_folio_mkold(struct folio *folio)
> {
> struct rmap_walk_control rwc = {
> .rmap_one = damon_folio_mkold_one,
> - .anon_lock = folio_lock_anon_vma_read,
> + .anon_lock = folio_lock_anon_rmap_read,
> };
>
> if (!folio_mapped(folio) || !folio_raw_mapping(folio)) {
> @@ -236,7 +236,7 @@ bool damon_folio_young(struct folio *folio)
> struct rmap_walk_control rwc = {
> .arg = &accessed,
> .rmap_one = damon_folio_young_one,
> - .anon_lock = folio_lock_anon_vma_read,
> + .anon_lock = folio_lock_anon_rmap_read,
> };
>
> if (!folio_mapped(folio) || !folio_raw_mapping(folio)) {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 970e077019b7..ab3c2397449a 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -4051,7 +4051,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
> struct folio *end_folio = folio_next(folio);
> bool is_anon = folio_test_anon(folio);
> struct address_space *mapping = NULL;
> - struct anon_vma *anon_vma = NULL;
> + anon_rmap_t anon_rmap = ANON_RMAP_NULL;
> int old_order = folio_order(folio);
> struct folio *new_folio, *next;
> int nr_shmem_dropped = 0;
> @@ -4087,12 +4087,12 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
> * is taken to serialise against parallel split or collapse
> * operations.
> */
> - anon_vma = folio_get_anon_vma(folio);
> - if (!anon_vma) {
> + anon_rmap = folio_get_anon_rmap(folio);
> + if (!anon_rmap_value(anon_rmap)) {
> ret = -EBUSY;
> goto out;
> }
> - anon_vma_lock_write(anon_vma);
> + anon_rmap_lock_write(anon_rmap);
> mapping = NULL;
> } else {
> unsigned int min_order;
> @@ -4122,7 +4122,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
> }
> }
>
> - anon_vma = NULL;
> + anon_rmap = ANON_RMAP_NULL;
> i_mmap_lock_read(mapping);
>
> /*
> @@ -4200,9 +4200,9 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
> }
>
> out_unlock:
> - if (anon_vma) {
> - anon_vma_unlock_write(anon_vma);
> - put_anon_vma(anon_vma);
> + if (anon_rmap_value(anon_rmap)) {
> + anon_rmap_unlock_write(anon_rmap);
> + put_anon_rmap(anon_rmap);
> }
> if (mapping)
> i_mmap_unlock_read(mapping);
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 7d5b76478f0b..f4c204a8a379 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -187,7 +187,7 @@ struct ksm_stable_node {
> /**
> * struct ksm_rmap_item - reverse mapping item for virtual addresses
> * @rmap_list: next rmap_item in mm_slot's singly-linked rmap_list
> - * @anon_vma: pointer to anon_vma for this mm,address, when in stable tree
> + * @anon_rmap: anonymous folio rmap for this mm,address, when in stable tree
> * @nid: NUMA node id of unstable tree in which linked (may not match page)
> * @mm: the memory structure this rmap_item is pointing into
> * @address: the virtual address this rmap_item tracks (+ flags in low bits)
> @@ -201,7 +201,7 @@ struct ksm_stable_node {
> struct ksm_rmap_item {
> struct ksm_rmap_item *rmap_list;
> union {
> - struct anon_vma *anon_vma; /* when stable */
> + anon_rmap_t anon_rmap; /* when stable */
> #ifdef CONFIG_NUMA
> int nid; /* when node of unstable tree */
> #endif
> @@ -786,7 +786,7 @@ static void break_cow(struct ksm_rmap_item *rmap_item)
> * It is not an accident that whenever we want to break COW
> * to undo, we also need to drop a reference to the anon_vma.
> */
> - put_anon_vma(rmap_item->anon_vma);
> + put_anon_rmap(rmap_item->anon_rmap);
>
> mmap_read_lock(mm);
> vma = find_mergeable_vma(mm, addr);
> @@ -898,7 +898,7 @@ static void remove_node_from_stable_tree(struct ksm_stable_node *stable_node)
>
> VM_BUG_ON(stable_node->rmap_hlist_len <= 0);
> stable_node->rmap_hlist_len--;
> - put_anon_vma(rmap_item->anon_vma);
> + put_anon_rmap(rmap_item->anon_rmap);
> rmap_item->address &= PAGE_MASK;
> cond_resched();
> }
> @@ -1051,7 +1051,7 @@ static void remove_rmap_item_from_tree(struct ksm_rmap_item *rmap_item)
> VM_BUG_ON(stable_node->rmap_hlist_len <= 0);
> stable_node->rmap_hlist_len--;
>
> - put_anon_vma(rmap_item->anon_vma);
> + put_anon_rmap(rmap_item->anon_rmap);
> rmap_item->head = NULL;
> rmap_item->address &= PAGE_MASK;
>
> @@ -1598,9 +1598,8 @@ static int try_to_merge_with_ksm_page(struct ksm_rmap_item *rmap_item,
> /* Unstable nid is in union with stable anon_vma: remove first */
> remove_rmap_item_from_tree(rmap_item);
>
> - /* Must get reference to anon_vma while still holding mmap_lock */
> - rmap_item->anon_vma = vma->anon_vma;
> - get_anon_vma(vma->anon_vma);
> + /* Must get reference to anon_rmap while still holding mmap_lock */
> + rmap_item->anon_rmap = vma_get_anon_rmap(vma);
> out:
> mmap_read_unlock(mm);
> trace_ksm_merge_with_ksm_page(kpage, page_to_pfn(kpage ? kpage : page),
> @@ -3108,7 +3107,6 @@ struct folio *ksm_might_need_to_copy(struct folio *folio,
> struct vm_area_struct *vma, unsigned long addr)
> {
> struct page *page = folio_page(folio, 0);
> - struct anon_vma *anon_vma = folio_anon_vma(folio);
> struct folio *new_folio;
>
> if (folio_test_large(folio))
> @@ -3118,10 +3116,10 @@ struct folio *ksm_might_need_to_copy(struct folio *folio,
> if (folio_stable_node(folio) &&
> !(ksm_run & KSM_RUN_UNMERGE))
> return folio; /* no need to copy it */
> - } else if (!anon_vma) {
> + } else if (!folio_test_anon(folio)) {
> return folio; /* no need to copy it */
> } else if (folio->index == linear_page_index(vma, addr) &&
> - anon_vma->root == vma->anon_vma->root) {
> + folio_maybe_same_anon_vma(folio, vma)) {
> return folio; /* still no need to copy it */
> }
> if (PageHWPoison(page))
> @@ -3173,20 +3171,20 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
> hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
> /* Ignore the stable/unstable/sqnr flags */
> const unsigned long addr = rmap_item->address & PAGE_MASK;
> - struct anon_vma *anon_vma = rmap_item->anon_vma;
> + anon_rmap_t anon_rmap = rmap_item->anon_rmap;
> struct anon_vma_chain *vmac;
> struct vm_area_struct *vma;
>
> cond_resched();
> - if (!anon_vma_trylock_read(anon_vma)) {
> + if (!anon_rmap_trylock_read(anon_rmap)) {
> if (rwc->try_lock) {
> rwc->contended = true;
> return;
> }
> - anon_vma_lock_read(anon_vma);
> + anon_rmap_lock_read(anon_rmap);
> }
>
> - anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
> + anon_rmap_foreach_vma(vma, vmac, anon_rmap,
> 0, ULONG_MAX) {
>
> cond_resched();
> @@ -3207,15 +3205,15 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
> continue;
>
> if (!rwc->rmap_one(folio, vma, addr, rwc->arg)) {
> - anon_vma_unlock_read(anon_vma);
> + anon_rmap_unlock_read(anon_rmap);
> return;
> }
> if (rwc->done && rwc->done(folio)) {
> - anon_vma_unlock_read(anon_vma);
> + anon_rmap_unlock_read(anon_rmap);
> return;
> }
> }
> - anon_vma_unlock_read(anon_vma);
> + anon_rmap_unlock_read(anon_rmap);
> }
> if (!search_new_forks++)
> goto again;
> @@ -3237,9 +3235,9 @@ void collect_procs_ksm(const struct folio *folio, const struct page *page,
> if (!stable_node)
> return;
> hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
> - struct anon_vma *av = rmap_item->anon_vma;
> + anon_rmap_t anon_rmap = rmap_item->anon_rmap;
>
> - anon_vma_lock_read(av);
> + anon_rmap_lock_read(anon_rmap);
> rcu_read_lock();
> for_each_process(tsk) {
> struct anon_vma_chain *vmac;
> @@ -3248,10 +3246,9 @@ void collect_procs_ksm(const struct folio *folio, const struct page *page,
> task_early_kill(tsk, force_early);
> if (!t)
> continue;
> - anon_vma_interval_tree_foreach(vmac, &av->rb_root, 0,
> + anon_rmap_foreach_vma(vma, vmac, anon_rmap, 0,
> ULONG_MAX)
> {
> - vma = vmac->vma;
> if (vma->vm_mm == t->mm) {
> addr = rmap_item->address & PAGE_MASK;
> add_to_kill_ksm(t, page, vma, to_kill,
> @@ -3260,7 +3257,7 @@ void collect_procs_ksm(const struct folio *folio, const struct page *page,
> }
> }
> rcu_read_unlock();
> - anon_vma_unlock_read(av);
> + anon_rmap_unlock_read(anon_rmap);
> }
> }
> #endif
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index ee42d4361309..bc9abba75b5d 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -547,11 +547,11 @@ static void collect_procs_anon(const struct folio *folio,
> int force_early)
> {
> struct task_struct *tsk;
> - struct anon_vma *av;
> + anon_rmap_t anon_rmap;
> pgoff_t pgoff;
>
> - av = folio_lock_anon_vma_read(folio, NULL);
> - if (av == NULL) /* Not actually mapped anymore */
> + anon_rmap = folio_lock_anon_rmap_read(folio, NULL);
> + if (!anon_rmap_value(anon_rmap)) /* Not actually mapped anymore */
> return;
>
> pgoff = page_pgoff(folio, page);
> @@ -564,9 +564,8 @@ static void collect_procs_anon(const struct folio *folio,
>
> if (!t)
> continue;
> - anon_vma_interval_tree_foreach(vmac, &av->rb_root,
> + anon_rmap_foreach_vma(vma, vmac, anon_rmap,
> pgoff, pgoff) {
> - vma = vmac->vma;
> if (vma->vm_mm != t->mm)
> continue;
> addr = page_mapped_in_vma(page, vma);
> @@ -574,7 +573,7 @@ static void collect_procs_anon(const struct folio *folio,
> }
> }
> rcu_read_unlock();
> - anon_vma_unlock_read(av);
> + anon_rmap_unlock_read(anon_rmap);
> }
>
> /*
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 8a64291ab5b4..769983cf14e0 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1142,18 +1142,18 @@ enum {
>
> static void __migrate_folio_record(struct folio *dst,
> int old_page_state,
> - struct anon_vma *anon_vma)
> + anon_rmap_t anon_rmap)
> {
> - dst->private = (void *)anon_vma + old_page_state;
> + dst->private = (void *)anon_rmap_to_anon_vma(anon_rmap) + old_page_state;
> }
>
> static void __migrate_folio_extract(struct folio *dst,
> int *old_page_state,
> - struct anon_vma **anon_vmap)
> + anon_rmap_t *anon_rmapp)
> {
> unsigned long private = (unsigned long)dst->private;
>
> - *anon_vmap = (struct anon_vma *)(private & ~PAGE_OLD_STATES);
> + *anon_rmapp = anon_vma_to_anon_rmap((void *)(private & ~PAGE_OLD_STATES));
> *old_page_state = private & PAGE_OLD_STATES;
> dst->private = NULL;
> }
> @@ -1161,15 +1161,15 @@ static void __migrate_folio_extract(struct folio *dst,
> /* Restore the source folio to the original state upon failure */
> static void migrate_folio_undo_src(struct folio *src,
> int page_was_mapped,
> - struct anon_vma *anon_vma,
> + anon_rmap_t anon_rmap,
> bool locked,
> struct list_head *ret)
> {
> if (page_was_mapped)
> remove_migration_ptes(src, src, 0);
> - /* Drop an anon_vma reference if we took one */
> - if (anon_vma)
> - put_anon_vma(anon_vma);
> + /* Drop an anon_rmap reference if we took one */
> + if (anon_rmap_value(anon_rmap))
> + put_anon_rmap(anon_rmap);
> if (locked)
> folio_unlock(src);
> if (ret)
> @@ -1210,7 +1210,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
> struct folio *dst;
> int rc = -EAGAIN;
> int old_page_state = 0;
> - struct anon_vma *anon_vma = NULL;
> + anon_rmap_t anon_rmap = ANON_RMAP_NULL;
> bool locked = false;
> bool dst_locked = false;
>
> @@ -1275,19 +1275,19 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
> /*
> * By try_to_migrate(), src->mapcount goes down to 0 here. In this case,
> * we cannot notice that anon_vma is freed while we migrate a page.
> - * This get_anon_vma() delays freeing anon_vma pointer until the end
> + * This get_anon_rmap() delays freeing anon_rmap pointer until the end
> * of migration. File cache pages are no problem because of page_lock()
> * File Caches may use write_page() or lock_page() in migration, then,
> * just care Anon page here.
> *
> - * Only folio_get_anon_vma() understands the subtleties of
> - * getting a hold on an anon_vma from outside one of its mms.
> - * But if we cannot get anon_vma, then we won't need it anyway,
> + * Only folio_get_anon_rmap() understands the subtleties of
> + * getting a hold on an anon_rmap from outside one of its mms.
> + * But if we cannot get anon_rmap, then we won't need it anyway,
> * because that implies that the anon page is no longer mapped
> * (and cannot be remapped so long as we hold the page lock).
> */
> if (folio_test_anon(src) && !folio_test_ksm(src))
> - anon_vma = folio_get_anon_vma(src);
> + anon_rmap = folio_get_anon_rmap(src);
>
> /*
> * Block others from accessing the new page when we get around to
> @@ -1302,7 +1302,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
> dst_locked = true;
>
> if (unlikely(page_has_movable_ops(&src->page))) {
> - __migrate_folio_record(dst, old_page_state, anon_vma);
> + __migrate_folio_record(dst, old_page_state, anon_rmap);
> return 0;
> }
>
> @@ -1326,13 +1326,13 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
> } else if (folio_mapped(src)) {
> /* Establish migration ptes */
> VM_BUG_ON_FOLIO(folio_test_anon(src) &&
> - !folio_test_ksm(src) && !anon_vma, src);
> + !folio_test_ksm(src) && !anon_rmap_value(anon_rmap), src);
> try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0);
> old_page_state |= PAGE_WAS_MAPPED;
> }
>
> if (!folio_mapped(src)) {
> - __migrate_folio_record(dst, old_page_state, anon_vma);
> + __migrate_folio_record(dst, old_page_state, anon_rmap);
> return 0;
> }
>
> @@ -1345,7 +1345,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
> ret = NULL;
>
> migrate_folio_undo_src(src, old_page_state & PAGE_WAS_MAPPED,
> - anon_vma, locked, ret);
> + anon_rmap, locked, ret);
> migrate_folio_undo_dst(dst, dst_locked, put_new_folio, private);
>
> return rc;
> @@ -1359,12 +1359,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
> {
> int rc;
> int old_page_state = 0;
> - struct anon_vma *anon_vma = NULL;
> + anon_rmap_t anon_rmap = ANON_RMAP_NULL;
> bool src_deferred_split = false;
> bool src_partially_mapped = false;
> struct list_head *prev;
>
> - __migrate_folio_extract(dst, &old_page_state, &anon_vma);
> + __migrate_folio_extract(dst, &old_page_state, &anon_rmap);
> prev = dst->lru.prev;
> list_del(&dst->lru);
>
> @@ -1425,9 +1425,9 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
> * and will be freed.
> */
> list_del(&src->lru);
> - /* Drop an anon_vma reference if we took one */
> - if (anon_vma)
> - put_anon_vma(anon_vma);
> + /* Drop an anon_rmap reference if we took one */
> + if (anon_rmap_value(anon_rmap))
> + put_anon_rmap(anon_rmap);
> folio_unlock(src);
> migrate_folio_done(src, reason);
>
> @@ -1439,12 +1439,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
> */
> if (rc == -EAGAIN) {
> list_add(&dst->lru, prev);
> - __migrate_folio_record(dst, old_page_state, anon_vma);
> + __migrate_folio_record(dst, old_page_state, anon_rmap);
> return rc;
> }
>
> migrate_folio_undo_src(src, old_page_state & PAGE_WAS_MAPPED,
> - anon_vma, true, ret);
> + anon_rmap, true, ret);
> migrate_folio_undo_dst(dst, true, put_new_folio, private);
>
> return rc;
> @@ -1476,7 +1476,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
> struct folio *dst;
> int rc = -EAGAIN;
> int page_was_mapped = 0;
> - struct anon_vma *anon_vma = NULL;
> + anon_rmap_t anon_rmap = ANON_RMAP_NULL;
> struct address_space *mapping = NULL;
> enum ttu_flags ttu = 0;
>
> @@ -1513,7 +1513,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
> }
>
> if (folio_test_anon(src))
> - anon_vma = folio_get_anon_vma(src);
> + anon_rmap = folio_get_anon_rmap(src);
>
> if (unlikely(!folio_trylock(dst)))
> goto put_anon;
> @@ -1550,8 +1550,8 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
> folio_unlock(dst);
>
> put_anon:
> - if (anon_vma)
> - put_anon_vma(anon_vma);
> + if (anon_rmap_value(anon_rmap))
> + put_anon_rmap(anon_rmap);
>
> if (!rc) {
> move_hugetlb_state(src, dst, reason);
> @@ -1778,11 +1778,11 @@ static void migrate_folios_undo(struct list_head *src_folios,
> dst2 = list_next_entry(dst, lru);
> list_for_each_entry_safe(folio, folio2, src_folios, lru) {
> int old_page_state = 0;
> - struct anon_vma *anon_vma = NULL;
> + anon_rmap_t anon_rmap = ANON_RMAP_NULL;
>
> - __migrate_folio_extract(dst, &old_page_state, &anon_vma);
> + __migrate_folio_extract(dst, &old_page_state, &anon_rmap);
> migrate_folio_undo_src(folio, old_page_state & PAGE_WAS_MAPPED,
> - anon_vma, true, ret_folios);
> + anon_rmap, true, ret_folios);
> list_del(&dst->lru);
> migrate_folio_undo_dst(dst, true, put_new_folio, private);
> dst = dst2;
> diff --git a/mm/page_idle.c b/mm/page_idle.c
> index 9c67cbac2965..d4103f20f526 100644
> --- a/mm/page_idle.c
> +++ b/mm/page_idle.c
> @@ -102,7 +102,7 @@ static void page_idle_clear_pte_refs(struct folio *folio)
> */
> static struct rmap_walk_control rwc = {
> .rmap_one = page_idle_clear_pte_refs_one,
> - .anon_lock = folio_lock_anon_vma_read,
> + .anon_lock = folio_lock_anon_rmap_read,
> };
>
> if (!folio_mapped(folio) || !folio_raw_mapping(folio))
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 1b2dada71778..41607168e00e 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -630,8 +630,8 @@ struct anon_vma *folio_get_anon_vma(const struct folio *folio)
> * reference like with folio_get_anon_vma() and then block on the mutex
> * on !rwc->try_lock case.
> */
> -struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
> - struct rmap_walk_control *rwc)
> +static struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
> + struct rmap_walk_control *rwc)
> {
> struct anon_vma *anon_vma = NULL;
> struct anon_vma *root_anon_vma;
> @@ -744,6 +744,14 @@ void anon_rmap_unlock_read(anon_rmap_t anon_rmap)
> anon_vma_unlock_read(anon_rmap_to_anon_vma(anon_rmap));
> }
>
> +static anon_rmap_t folio_anon_rmap(const struct folio *folio)
> +{
> + struct anon_vma *anon_vma;
> +
> + anon_vma = folio_anon_vma(folio);
> + return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
> +}
> +
> bool folio_maybe_same_anon_vma(const struct folio *folio,
> const struct vm_area_struct *vma)
> {
> @@ -930,13 +938,11 @@ unsigned long page_address_in_vma(const struct folio *folio,
> const struct page *page, const struct vm_area_struct *vma)
> {
> if (folio_test_anon(folio)) {
> - struct anon_vma *anon_vma = folio_anon_vma(folio);
> /*
> * Note: swapoff's unuse_vma() is more efficient with this
> * check, and needs it to match anon_vma when KSM is active.
> */
> - if (!vma->anon_vma || !anon_vma ||
> - vma->anon_vma->root != anon_vma->root)
> + if (!vma->anon_vma || !folio_maybe_same_anon_vma(folio, vma))
> return -EFAULT;
> } else if (!vma->vm_file) {
> return -EFAULT;
> @@ -944,7 +950,7 @@ unsigned long page_address_in_vma(const struct folio *folio,
> return -EFAULT;
> }
>
> - /* KSM folios don't reach here because of the !anon_vma check */
> + /* The !folio_maybe_same_anon_vma() above handles KSM folios */
> return vma_address(vma, page_pgoff(folio, page), 1);
> }
>
> @@ -1145,7 +1151,7 @@ int folio_referenced(struct folio *folio, int is_locked,
> struct rmap_walk_control rwc = {
> .rmap_one = folio_referenced_one,
> .arg = (void *)&pra,
> - .anon_lock = folio_lock_anon_vma_read,
> + .anon_lock = folio_lock_anon_rmap_read,
> .try_lock = true,
> .invalid_vma = invalid_folio_referenced_vma,
> };
> @@ -1580,8 +1586,7 @@ static void __page_check_anon_rmap(const struct folio *folio,
> * are initially only visible via the pagetables, and the pte is locked
> * over the call to folio_add_new_anon_rmap.
> */
> - VM_BUG_ON_FOLIO(folio_anon_vma(folio)->root != vma->anon_vma->root,
> - folio);
> + VM_BUG_ON_FOLIO(!folio_maybe_same_anon_vma(folio, vma), folio);
> VM_BUG_ON_PAGE(page_pgoff(folio, page) != linear_page_index(vma, address),
> page);
> }
> @@ -2468,7 +2473,7 @@ void try_to_unmap(struct folio *folio, enum ttu_flags flags)
> .rmap_one = try_to_unmap_one,
> .arg = (void *)flags,
> .done = folio_not_mapped,
> - .anon_lock = folio_lock_anon_vma_read,
> + .anon_lock = folio_lock_anon_rmap_read,
> };
>
> if (flags & TTU_RMAP_LOCKED)
> @@ -2813,7 +2818,7 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
> .rmap_one = try_to_migrate_one,
> .arg = (void *)flags,
> .done = folio_not_mapped,
> - .anon_lock = folio_lock_anon_vma_read,
> + .anon_lock = folio_lock_anon_rmap_read,
> };
>
> /*
> @@ -2990,8 +2995,8 @@ void __put_anon_vma(struct anon_vma *anon_vma)
> anon_vma_free(root);
> }
>
> -static struct anon_vma *rmap_walk_anon_lock(const struct folio *folio,
> - struct rmap_walk_control *rwc)
> +static anon_rmap_t rmap_walk_anon_lock(const struct folio *folio,
> + struct rmap_walk_control *rwc)
> {
> struct anon_vma *anon_vma;
>
> @@ -3006,7 +3011,7 @@ static struct anon_vma *rmap_walk_anon_lock(const struct folio *folio,
> */
> anon_vma = folio_anon_vma(folio);
> if (!anon_vma)
> - return NULL;
> + return ANON_RMAP_NULL;
>
> if (anon_vma_trylock_read(anon_vma))
> goto out;
> @@ -3019,7 +3024,7 @@ static struct anon_vma *rmap_walk_anon_lock(const struct folio *folio,
>
> anon_vma_lock_read(anon_vma);
> out:
> - return anon_vma;
> + return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
> }
>
> /*
> @@ -3035,9 +3040,10 @@ static struct anon_vma *rmap_walk_anon_lock(const struct folio *folio,
> static void rmap_walk_anon(struct folio *folio,
> struct rmap_walk_control *rwc, bool locked)
> {
> - struct anon_vma *anon_vma;
> + anon_rmap_t anon_rmap;
> pgoff_t pgoff_start, pgoff_end;
> struct anon_vma_chain *avc;
> + struct vm_area_struct *vma;
I have no idea why you put the VMA at this scope...
>
> /*
> * The folio lock ensures that folio->mapping can't be changed under us
> @@ -3046,20 +3052,19 @@ static void rmap_walk_anon(struct folio *folio,
> VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
>
> if (locked) {
> - anon_vma = folio_anon_vma(folio);
> + anon_rmap = folio_anon_rmap(folio);
> /* anon_vma disappear under us? */
> - VM_BUG_ON_FOLIO(!anon_vma, folio);
> + VM_BUG_ON_FOLIO(!anon_rmap_value(anon_rmap), folio);
> } else {
> - anon_vma = rmap_walk_anon_lock(folio, rwc);
> + anon_rmap = rmap_walk_anon_lock(folio, rwc);
> }
> - if (!anon_vma)
> + if (!anon_rmap_value(anon_rmap))
> return;
>
> pgoff_start = folio_pgoff(folio);
> pgoff_end = pgoff_start + folio_nr_pages(folio) - 1;
> - anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
> + anon_rmap_foreach_vma(vma, avc, anon_rmap,
> pgoff_start, pgoff_end) {
> - struct vm_area_struct *vma = avc->vma;
Don't throw random changes like this in with a general replacement patch.
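The hoisted declaration follows from the macro's signature. Judging by the removed `vma = vmac->vma;` lines elsewhere in the patch, anon_rmap_foreach_vma(vma, avc, ...) assigns to a caller-supplied vma, so that variable must exist before the loop. The original loop declared vma inside the body from avc->vma. The hypothetical userspace sketch below uses a macro with the same argument shape, iterating over an array of chain nodes instead of the interval tree. The macro body is an assumption, not the patch's definition.

```c
#include <assert.h>
#include <stddef.h>

struct model_vma {
	int id;
};

/* Hypothetical chain node standing in for struct anon_vma_chain. */
struct model_avc {
	struct model_vma *vma;
};

/*
 * Hypothetical macro shaped like anon_rmap_foreach_vma(): it advances avc and
 * assigns vma from avc->vma on each iteration (the && short-circuits, so no
 * element past the end is read). Both cursors must be declared by the caller.
 */
#define model_foreach_vma(vma, avc, chain, n)				\
	for ((avc) = (chain);						\
	     (avc) < (chain) + (n) && (((vma) = (avc)->vma), 1);	\
	     (avc)++)

static int model_sum_ids(struct model_avc *chain, size_t n)
{
	struct model_avc *avc;
	struct model_vma *vma;	/* must live at function scope for the macro */
	int sum = 0;

	model_foreach_vma(vma, avc, chain, n)
		sum += vma->id;
	return sum;
}
```

Keeping vma loop-local would need a different macro shape. Either way, it is a separate change from the mechanical API conversion, which is the objection above.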
> unsigned long address = vma_address(vma, pgoff_start,
> folio_nr_pages(folio));
>
> @@ -3076,7 +3081,7 @@ static void rmap_walk_anon(struct folio *folio,
> }
>
> if (!locked)
> - anon_vma_unlock_read(anon_vma);
> + anon_rmap_unlock_read(anon_rmap);
> }
>
> /**
> --
> 2.17.1
>
* Re: [PATCH 03/15] mm: introduce anon_vma_tree_t for multiple anon_vma topologies
2026-05-27 11:01 ` [PATCH 03/15] mm: introduce anon_vma_tree_t for multiple anon_vma topologies tao
@ 2026-05-27 11:56 ` Lorenzo Stoakes
0 siblings, 0 replies; 22+ messages in thread
From: Lorenzo Stoakes @ 2026-05-27 11:56 UTC (permalink / raw)
To: tao
Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, kas, hpa, liam,
vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh, jgg,
jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
On Wed, May 27, 2026 at 07:01:35PM +0800, tao wrote:
> Prepare for upcoming ANON_VMA_LAZY support and RCU-based lockless rmap
> traversal by clearly separating anon_vma topology handling from the
> anon_rmap semantics.
RCU is not 'lockless'... and if you truly get RCU semantics, you break a bunch of
stuff, as I found out.
>
> Prepare for supporting multiple anon_vma topologies by introducing
> lightweight abstractions used by the VMA and rmap code.
>
> Introduce anon_vma_tree_t as the type stored in vma->anon_vma:
>
> typedef unsigned long anon_vma_tree_t;
>
> It represents a tagged pointer encoding a reference to the anon_vma
> topology. The low bits are reserved as type tags to distinguish
> different implementations (e.g. regular anon_vma and lazy anon_vma).
> This keeps the VMA representation compact while allowing the topology
> to evolve without changing the VMA layout.
>
> Signed-off-by: tao <tao.wangtao@honor.com>
The commit message is at least better on this one, but this approach is, again,
predicated on extending a broken abstraction.
You could have saved time and effort by coming forward with this earlier to the
community.
You're also adding a bunch more messy code on top of anon_vma. It's just the
wrong direction.
> ---
> include/linux/mm_types.h | 3 +++
> mm/internal.h | 54 ++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 57 insertions(+)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index a308e2c23b82..5f4961ea1572 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -917,6 +917,9 @@ struct vm_area_desc {
> struct mmap_action action;
> };
>
> +/* Tagged pointer stored in vma->anon_vma. Low bits encode anon_vma type. */
> +typedef unsigned long anon_vma_tree_t;
> +
> /*
> * This struct describes a virtual memory area. There is one of these
> * per VM-area/task. A VM area is any part of the process virtual memory
> diff --git a/mm/internal.h b/mm/internal.h
> index 5a2ddcf68e0b..76544ad44ff0 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -246,6 +246,60 @@ static inline void anon_vma_unlock_read(struct anon_vma *anon_vma)
> up_read(&anon_vma->root->rwsem);
> }
>
> +/* anon_vma_tree_t APIs */
> +
> +static inline anon_vma_tree_t make_anon_vma_tree(struct anon_vma *anon_vma)
> +{
> + return (anon_vma_tree_t)anon_vma;
> +}
You're literally returning an unsigned long of an anon_vma here?
Why is the anon_rmap_t a wrapped struct and this an unsigned long?
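The practical difference behind this question is type checking. A one-member struct handle cannot be silently mixed with other integers, but an unsigned long typedef can. Below is a minimal sketch with hypothetical names. model_tree_t mirrors how the patch declares anon_vma_tree_t, and model_rmap_t mirrors the wrapped style.

```c
#include <assert.h>

/* Plain integer typedef, the way the patch declares anon_vma_tree_t. */
typedef unsigned long model_tree_t;

/* One-member struct wrapper (hypothetical name, not taken from the patch). */
typedef struct {
	unsigned long val;
} model_rmap_t;

/* Any integer converts implicitly, so a pgoff or a flags word compiles too. */
static int model_tree_is_null(model_tree_t t)
{
	return t == 0;
}

/* Only a model_rmap_t is accepted; a bare integer argument fails to compile. */
static int model_rmap_is_null(model_rmap_t r)
{
	return r.val == 0;
}

/* Explicit accessor: the only way to get the raw bits out of the wrapper. */
static unsigned long model_rmap_raw(model_rmap_t r)
{
	return r.val;
}
```

With the typedef, passing an unrelated integer such as a page offset to model_tree_is_null() compiles without complaint. The equivalent call to model_rmap_is_null() is rejected by the compiler. This is the usual argument for the wrapped form, and the reason several architectures define pte_t as a one-member struct. It is also why using both styles side by side is inconsistent.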
> +
> +static inline struct anon_vma *anon_vma_tree_anon_vma(anon_vma_tree_t anon_tree)
> +{
> + return (struct anon_vma *)anon_tree;
> +}
The anon_tree is an anon_vma? What?
And it's a tagged pointer but we don't bother clearing any bits right?...!
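For reference, a tagged pointer usually stores its tag in bits that alignment guarantees are zero, and every decode masks those bits off. Below is a minimal sketch. It assumes a two-bit tag field and hypothetical tag values (MODEL_TREE_ANON_VMA, MODEL_TREE_LAZY). The patch defines neither, and as quoted, its decode path is a bare cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned long model_tree_t;

/* Two low bits carry the type tag; pointees must be at least 4-byte aligned. */
#define MODEL_TREE_TAG_BITS 2UL
#define MODEL_TREE_TAG_MASK ((1UL << MODEL_TREE_TAG_BITS) - 1)

enum model_tree_type {
	MODEL_TREE_ANON_VMA = 0,	/* hypothetical: regular anon_vma */
	MODEL_TREE_LAZY = 1,		/* hypothetical: lazy topology */
};

/* Stand-in for struct anon_vma; the long member gives >= 4-byte alignment. */
struct model_anon_vma {
	long refcount;
};

static model_tree_t model_make_tree(void *ptr, enum model_tree_type type)
{
	unsigned long raw = (unsigned long)(uintptr_t)ptr;

	/* Neither the pointer nor the tag may spill into the other's bits. */
	assert((raw & MODEL_TREE_TAG_MASK) == 0);
	assert(((unsigned long)type & ~MODEL_TREE_TAG_MASK) == 0);
	return raw | (unsigned long)type;
}

static enum model_tree_type model_tree_get_type(model_tree_t t)
{
	return (enum model_tree_type)(t & MODEL_TREE_TAG_MASK);
}

/* Untyped decode: always clears the tag bits before use. */
static void *model_tree_ptr(model_tree_t t)
{
	return (void *)(uintptr_t)(t & ~MODEL_TREE_TAG_MASK);
}

/* Typed decode: refuses to hand back an anon_vma for a different tag. */
static struct model_anon_vma *model_tree_anon_vma(model_tree_t t)
{
	if (model_tree_get_type(t) != MODEL_TREE_ANON_VMA)
		return NULL;
	return model_tree_ptr(t);
}
```

In this sketch, the tag value 0 keeps a regular anon_vma's encoding identical to its plain pointer. That is the only case in which a bare cast happens to work.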
> +
> +static inline void anon_vma_tree_lock_write(anon_vma_tree_t anon_tree)
> +{
> + struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
> +
> + anon_vma_lock_write(anon_vma);
> +}
> +
> +static inline int anon_vma_tree_trylock_write(anon_vma_tree_t anon_tree)
> +{
> + struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
> +
> + return anon_vma_trylock_write(anon_vma);
> +}
> +
> +static inline void anon_vma_tree_unlock_write(anon_vma_tree_t anon_tree)
> +{
> + struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
> +
> + anon_vma_unlock_write(anon_vma);
> +}
> +
> +static inline void anon_vma_tree_lock_read(anon_vma_tree_t anon_tree)
> +{
> + struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
> +
> + anon_vma_lock_read(anon_vma);
> +}
> +
> +static inline int anon_vma_tree_trylock_read(anon_vma_tree_t anon_tree)
> +{
> + struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
> +
> + return anon_vma_trylock_read(anon_vma);
> +}
> +
> +static inline void anon_vma_tree_unlock_read(anon_vma_tree_t anon_tree)
> +{
> + struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
> +
> + anon_vma_unlock_read(anon_vma);
> +}
> +
You keep adding more and more code on top of the existing mess. This is NOT what
we want.
> struct anon_vma *folio_get_anon_vma(const struct folio *folio);
>
> /* Operations which modify VMAs. */
> --
> 2.17.1
>
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
` (16 preceding siblings ...)
2026-05-27 11:30 ` Lorenzo Stoakes
@ 2026-05-27 14:33 ` Lorenzo Stoakes
17 siblings, 0 replies; 22+ messages in thread
From: Lorenzo Stoakes @ 2026-05-27 14:33 UTC (permalink / raw)
To: tao
Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, kas, hpa, liam,
vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh, jgg,
jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
OK I've had a look through more thoroughly now and:
NAK and NAK any approach like this.
Not only is this structurally all wrong, it does some insane stuff (pinning
VMAs - no), the RCU usage is highly dubious and I suspect you've completely
broken the anon rmap for things like migration, or have at least added very
dubious edge cases.
You've added insane complexity, and also have failed to add even
perfunctory tests, which is also totally unacceptable.
The implementation is wrong, and the approach is wrong - we do not want to
extend or build on anon_vma. So this, or any approach like it, is
unmergeable.
I also, unfortunately, strongly suspect AI here. The turn of phrase, the
poor commit messages, the fact that you did this out of nowhere with
absolutely no prior rmap experience, and your total lack of communication
beforehand all point that way.
Claude puts the probability of heavy AI usage at 85-90%, and I'm pretty
convinced. Either way it's utterly unmergeable, but the fact that you
(likely) used AI to generate this much work for us actually makes me pretty
annoyed.
As a result, I would strongly suggest you no longer submit patches for the
reverse mapping part of mm, as there is now a real lack of trust.
If you wish to rebuild that, I suggest you _discuss_ concepts and ideas,
e.g. send stuff on-list with a [DISCUSSION] tag, and engage with the
community, and go from there.
It's also important to synchronise - I'm working on an anon rmap
replacement which should achieve the same numbers in an architecturally
sound way, and which I'm more than happy to discuss with you or anybody else.
You going off and, in a vacuum, generating a bunch of code with an
unacceptable approach is not a civil way of engaging nor is it a good use
of your time, or maintainer time looking at it.
Thanks, Lorenzo
Thread overview: 22+ messages
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
2026-05-27 11:01 ` [PATCH 01/15] mm/rmap: introduce anon_rmap APIs for anonymous folios tao
2026-05-27 11:44 ` Lorenzo Stoakes
2026-05-27 11:01 ` [PATCH 02/15] mm: convert anon_vma rmap APIs to anon_rmap tao
2026-05-27 11:49 ` Lorenzo Stoakes
2026-05-27 11:01 ` [PATCH 03/15] mm: introduce anon_vma_tree_t for multiple anon_vma topologies tao
2026-05-27 11:56 ` Lorenzo Stoakes
2026-05-27 11:01 ` [PATCH 04/15] mm: switch to anon_vma_tree_t APIs in preparation for ANON_VMA_LAZY tao
2026-05-27 11:01 ` [PATCH 05/15] mm: add CONFIG_ANON_VMA_LAZY and folio helpers tao
2026-05-27 11:01 ` [PATCH 06/15] mm: add CONFIG_VMA_REF and VMA helpers tao
2026-05-27 11:01 ` [PATCH 07/15] mm: replace direct FOLIO_MAPPING_ANON usage with helpers tao
2026-05-27 11:01 ` [PATCH 08/15] mm: prepare rmap infrastructure for ANON_VMA_LAZY tao
2026-05-27 11:01 ` [PATCH 09/15] mm: implement ANON_VMA_LAZY rmap semantics tao
2026-05-27 11:01 ` [PATCH 10/15] mm: defer anon_vma creation with ANON_VMA_LAZY tao
2026-05-27 11:01 ` [PATCH 11/15] mm: handle ANON_VMA_LAZY in huge page operations tao
2026-05-27 11:01 ` [PATCH 12/15] mm: handle ANON_VMA_LAZY during migration tao
2026-05-27 11:01 ` [PATCH 13/15] mm: support setup and upgrade of ANON_VMA_LAZY folios tao
2026-05-27 11:01 ` [PATCH 14/15] mm: support merging of ANON_VMA_LAZY VMAs tao
2026-05-27 11:01 ` [PATCH 15/15] mm: enable CONFIG_ANON_VMA_LAZY on arm64 and x86_64 tao
2026-05-27 11:23 ` [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation Pedro Falcato
2026-05-27 11:30 ` Lorenzo Stoakes
2026-05-27 14:33 ` Lorenzo Stoakes