* [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
@ 2026-05-27 11:01 tao
2026-05-27 11:01 ` [PATCH 01/15] mm/rmap: introduce anon_rmap APIs for anonymous folios tao
From: tao @ 2026-05-27 11:01 UTC
To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, tao.wangtao, kas,
ljs
Cc: hpa, liam, vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh,
jgg, jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
TL;DR
-----
This series introduces ANON_VMA_LAZY, which defers anon_vma creation
until it is actually required.
- anon_vma slab usage reduced by ~92-97% and anon_vma_chain usage by
  ~50-57% in the measurements below
- rmap operations on lazy VMAs that do not have an anon_vma yet need no
  anon_vma locking
Background
----------
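Currently anon_vma structures are created eagerly. A VMA gets an
anon_vma and an anon_vma_chain as soon as it maps its first anonymous
folio (anon_vma_prepare()), and fork() allocates them for the child VMAs
as well. However, many VMAs are never shared through fork and never need
an anon_vma chain for rmap, so the allocated anon_vma and anon_vma_chain
objects are often unnecessary.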
Design overview
---------------
ANON_VMA_LAZY defers anon_vma allocation until it is actually needed
(for example during fork). VMAs that never participate in sharing can
avoid creating anon_vma structures entirely.
Before an anon_vma exists, rmap operations rely directly on VMA
information, so no anon_vma locking is required. An anon_vma is created
and linked only when sharing semantics are required.
This series introduces anon_rmap helpers to make rmap less dependent on
direct anon_vma access. It also introduces anon_vma_tree_t as a container
to support both the lazy and the existing anon_vma layouts.
Once a VMA becomes associated with an anon_vma, the normal behavior
remains unchanged.
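The lazy scheme can be pictured with a small user-space toy model. It only illustrates the idea under stated assumptions and is not code from this series. `struct toy_vma`, `struct toy_anon_vma`, `toy_anon_vma_get_or_create()`, `toy_rmap_needs_anon_vma_lock()` and `toy_fork_vma()` are hypothetical names. The real fork path also builds a hierarchy of anon_vma and anon_vma_chain objects, which the toy collapses into one shared object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical toy model of deferred anon_vma creation (user space).
 * These types only mimic the idea; they are not the kernel structures.
 */
struct toy_anon_vma {
	int nr_vmas;			/* VMAs sharing this anon_vma */
};

struct toy_vma {
	struct toy_anon_vma *anon_vma;	/* stays NULL while unshared */
};

static int toy_nr_anon_vma_allocs;	/* anon_vma allocations so far */

/* Allocate the anon_vma only when a caller actually needs one. */
static struct toy_anon_vma *toy_anon_vma_get_or_create(struct toy_vma *vma)
{
	if (!vma->anon_vma) {
		vma->anon_vma = calloc(1, sizeof(*vma->anon_vma));
		if (!vma->anon_vma)
			abort();
		vma->anon_vma->nr_vmas = 1;
		toy_nr_anon_vma_allocs++;
	}
	return vma->anon_vma;
}

/*
 * With no anon_vma, the VMA is the only possible mapper of its anonymous
 * folios, so a reverse-mapping walk can visit it without an anon_vma lock.
 */
static bool toy_rmap_needs_anon_vma_lock(const struct toy_vma *vma)
{
	return vma->anon_vma != NULL;
}

/* Fork is where sharing starts, so it is where allocation happens. */
static void toy_fork_vma(struct toy_vma *parent, struct toy_vma *child)
{
	struct toy_anon_vma *av = toy_anon_vma_get_or_create(parent);

	child->anon_vma = av;
	av->nr_vmas++;
}
```

Under this model, a VMA that is never forked keeps `anon_vma == NULL` for its whole lifetime. That is the case ANON_VMA_LAZY is meant to make allocation-free.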
Memory impact
-------------
Preliminary measurements of active slab memory (KB) show significant
reductions in anon_vma-related allocations, with a small increase in
vm_area_struct.
After boot:
Object         | Before (active KB) | After (active KB) | Change
---------------+--------------------+-------------------+-------
vm_area_struct |             117035 |            118176 |  +1.0%
anon_vma_chain |            18865.8 |           8112.06 | -57.0%
anon_vma       |            20426.4 |            613.75 | -97.0%
After launching 24 apps:
Object         | Before (active KB) | After (active KB) | Change
---------------+--------------------+-------------------+-------
vm_area_struct |             196873 |            197345 |  +0.2%
anon_vma_chain |            31477.1 |           15576.8 | -50.5%
anon_vma       |              33280 |           2648.12 | -92.0%
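The Change column is (after - before) / before * 100. The sketch below is a plain C helper written for this note, not part of the series. It recomputes that column from the table values and sums the net saving across the three caches, counting the small vm_area_struct growth against it. From the numbers above, that comes to about 29,425 KB after boot and about 46,060 KB after launching 24 apps.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the tables above: active slab KB before and after. */
struct slab_row {
	const char *name;
	double before_kb;
	double after_kb;
};

/* Relative change in percent; negative means memory was saved. */
static double pct_change(const struct slab_row *r)
{
	return (r->after_kb - r->before_kb) / r->before_kb * 100.0;
}

/* Net KB saved across all rows (sum of before minus after). */
static double net_saved_kb(const struct slab_row *rows, int n)
{
	double saved = 0.0;

	for (int i = 0; i < n; i++)
		saved += rows[i].before_kb - rows[i].after_kb;
	return saved;
}

static const struct slab_row after_boot[] = {
	{ "vm_area_struct", 117035.0, 118176.0 },
	{ "anon_vma_chain", 18865.8, 8112.06 },
	{ "anon_vma", 20426.4, 613.75 },
};

static const struct slab_row after_24_apps[] = {
	{ "vm_area_struct", 196873.0, 197345.0 },
	{ "anon_vma_chain", 31477.1, 15576.8 },
	{ "anon_vma", 33280.0, 2648.12 },
};

/* Print per-cache changes followed by the net saving. */
void print_table(const char *title, const struct slab_row *rows, int n)
{
	printf("%s\n", title);
	for (int i = 0; i < n; i++)
		printf("  %-15s %+6.1f%%\n", rows[i].name, pct_change(&rows[i]));
	printf("  net saved: %.1f KB\n", net_saved_kb(rows, n));
}
```

For example, `print_table("After boot", after_boot, 3)` prints the three per-cache changes and then the net figure.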
Simple fork microbenchmarks also show a slight improvement in fork
performance, since child VMAs do not need to allocate anon_vma
structures during fork.
Feedback and suggestions are welcome.
tao (15):
mm/rmap: introduce anon_rmap APIs for anonymous folios
mm: convert anon_vma rmap APIs to anon_rmap
mm: introduce anon_vma_tree_t for multiple anon_vma topologies
mm: switch to anon_vma_tree_t APIs in preparation for ANON_VMA_LAZY
mm: add CONFIG_ANON_VMA_LAZY and folio helpers
mm: add CONFIG_VMA_REF and VMA helpers
mm: replace direct FOLIO_MAPPING_ANON usage with helpers
mm: prepare rmap infrastructure for ANON_VMA_LAZY
mm: implement ANON_VMA_LAZY rmap semantics
mm: defer anon_vma creation with ANON_VMA_LAZY
mm: handle ANON_VMA_LAZY in huge page operations
mm: handle ANON_VMA_LAZY during migration
mm: support setup and upgrade of ANON_VMA_LAZY folios
mm: support merging of ANON_VMA_LAZY VMAs
mm: enable CONFIG_ANON_VMA_LAZY on arm64 and x86_64
arch/arm64/Kconfig | 1 +
arch/x86/Kconfig | 1 +
fs/proc/page.c | 6 +-
include/linux/mm.h | 38 ++
include/linux/mm_types.h | 9 +-
include/linux/page-flags.h | 34 +-
include/linux/pagemap.h | 2 +-
include/linux/rmap.h | 165 ++++++++-
mm/Kconfig | 22 ++
mm/damon/ops-common.c | 4 +-
mm/debug.c | 2 +-
mm/debug_vm_pgtable.c | 2 +-
mm/gup.c | 6 +-
mm/huge_memory.c | 16 +-
mm/internal.h | 171 +++++++++
mm/khugepaged.c | 13 +-
mm/ksm.c | 43 ++-
mm/memory-failure.c | 11 +-
mm/memory.c | 19 +-
mm/migrate.c | 126 ++++---
mm/mmap.c | 15 +-
mm/mremap.c | 4 +-
mm/page_idle.c | 2 +-
mm/rmap.c | 690 ++++++++++++++++++++++++++++++++++---
mm/vma.c | 76 ++--
mm/vma.h | 4 +-
mm/vma_exec.c | 2 +-
mm/vma_init.c | 1 +
28 files changed, 1279 insertions(+), 206 deletions(-)
--
2.17.1
* [PATCH 01/15] mm/rmap: introduce anon_rmap APIs for anonymous folios
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
@ 2026-05-27 11:01 ` tao
2026-05-27 11:44 ` Lorenzo Stoakes
2026-05-27 11:01 ` [PATCH 02/15] mm: convert anon_vma rmap APIs to anon_rmap tao
From: tao @ 2026-05-27 11:01 UTC
To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, tao.wangtao, kas,
ljs
Cc: hpa, liam, vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh,
jgg, jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
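Add anon_rmap_t, an opaque handle for the reverse mapping of an
anonymous folio, and helpers that operate on it:
- reference handling: vma_get_anon_rmap(), folio_get_anon_rmap(),
  put_anon_rmap()
- locking: anon_rmap_{lock,trylock,unlock}_{read,write}(),
  folio_lock_anon_rmap_read()
- folio_maybe_same_anon_vma(), which checks whether a folio and a VMA
  share an anon_vma root
For now each helper wraps the existing anon_vma operation.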
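For readers less familiar with the pattern, the handle helpers added below can be built outside the kernel to check the round trip. This is a user-space sketch. `struct anon_vma` is a minimal stand-in, and only `anon_rmap_t`, `make_anon_rmap()`, `anon_rmap_value()`, `ANON_RMAP_NULL` and the two conversion helpers mirror the patch. Wrapping the word in a one-member struct makes the compiler reject a `struct anon_vma *` where an `anon_rmap_t` is expected. The handle still fits in one machine word passed by value, similar to `pte_t` or `swp_entry_t`.

```c
#include <assert.h>
#include <stddef.h>

struct anon_vma {
	int unused;	/* minimal stand-in for the kernel structure */
};

/* Distinct type: raw pointers and handles cannot be mixed by accident. */
typedef struct anon_rmap {
	unsigned long rmap;
} anon_rmap_t;

_Static_assert(sizeof(anon_rmap_t) == sizeof(unsigned long),
	       "the handle must stay one machine word");

static inline anon_rmap_t make_anon_rmap(const void *anon_mapping)
{
	return (anon_rmap_t){ .rmap = (unsigned long)anon_mapping, };
}

#define ANON_RMAP_NULL make_anon_rmap(0)

static inline unsigned long anon_rmap_value(anon_rmap_t anon_rmap)
{
	return anon_rmap.rmap;
}

static inline anon_rmap_t anon_vma_to_anon_rmap(const struct anon_vma *anon_vma)
{
	return make_anon_rmap(anon_vma);
}

static inline struct anon_vma *anon_rmap_to_anon_vma(anon_rmap_t anon_rmap)
{
	return (struct anon_vma *)anon_rmap_value(anon_rmap);
}
```

Callers test for "no mapping" with `!anon_rmap_value(h)` rather than comparing a pointer against NULL, which is the form the following patch uses.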
Introduce anon_rmap_foreach_vma() as a wrapper around
anon_vma_interval_tree_foreach() that yields each VMA directly, so
callers no longer walk the interval tree themselves.
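The iterator follows the usual first/next-in-a-for-loop shape. The user-space sketch below shows that shape and what a caller sees. It is a simplified model, not the kernel code. A singly linked list with a linear overlap check stands in for the anon_vma interval tree, and `toy_foreach_vma()`, `avc_from()` and `count_vmas_mapping()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel types (assumptions for this sketch). */
struct vm_area_struct {
	unsigned long pgoff_start, pgoff_last;	/* pgoff range mapped */
};

struct anon_vma_chain {
	struct vm_area_struct *vma;
	struct anon_vma_chain *next;	/* a list instead of an interval tree */
};

struct anon_vma {
	struct anon_vma_chain *head;
};

/* Linear-scan stand-in for anon_vma_interval_tree_iter_{first,next}(). */
static struct anon_vma_chain *avc_from(struct anon_vma_chain *avc,
				       unsigned long start, unsigned long last)
{
	for (; avc; avc = avc->next)
		if (avc->vma->pgoff_start <= last &&
		    start <= avc->vma->pgoff_last)
			return avc;
	return NULL;
}

static struct vm_area_struct *iter_first_vma(struct anon_vma *av,
		unsigned long start, unsigned long last,
		struct anon_vma_chain **avc)
{
	*avc = avc_from(av->head, start, last);
	return *avc ? (*avc)->vma : NULL;
}

static struct vm_area_struct *iter_next_vma(struct anon_vma *av,
		unsigned long start, unsigned long last,
		struct anon_vma_chain **avc)
{
	(void)av;	/* the list walk continues from the cursor alone */
	if (!*avc)
		return NULL;
	*avc = avc_from((*avc)->next, start, last);
	return *avc ? (*avc)->vma : NULL;
}

/* Same shape as anon_rmap_foreach_vma(): the body gets the VMA directly. */
#define toy_foreach_vma(vma, avc, av, start, last)			\
	for (vma = iter_first_vma(av, start, last, &avc); vma;		\
	     vma = iter_next_vma(av, start, last, &avc))

/* Count VMAs covering one pgoff, the way collect_procs_anon() walks them. */
static int count_vmas_mapping(struct anon_vma *av, unsigned long pgoff)
{
	struct anon_vma_chain *avc;
	struct vm_area_struct *vma;
	int n = 0;

	toy_foreach_vma(vma, avc, av, pgoff, pgoff)
		n++;
	return n;
}
```

Because the macro hands the loop body a `vma` directly, converted callers such as collect_procs_ksm() can drop their `vma = vmac->vma;` line, as the next patch does.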
This prepares the rmap code for upcoming ANON_VMA_LAZY support and
RCU-based lockless rmap traversal.
No functional change intended.
Signed-off-by: tao <tao.wangtao@honor.com>
---
include/linux/rmap.h | 68 +++++++++++++++++++++++++++++++++++++++++
mm/rmap.c | 73 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 141 insertions(+)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 8dc0871e5f00..c42314ea4362 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -937,6 +937,44 @@ int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff,
void remove_migration_ptes(struct folio *src, struct folio *dst,
enum ttu_flags flags);
+/* Reverse mapping handle for anonymous folio rmap helpers. */
+typedef struct anon_rmap {
+ unsigned long rmap;
+} anon_rmap_t;
+
+#define ANON_RMAP_NULL make_anon_rmap(0)
+
+static inline anon_rmap_t make_anon_rmap(const void *anon_mapping)
+{
+ return (anon_rmap_t){ .rmap = (unsigned long)anon_mapping, };
+}
+
+static inline unsigned long anon_rmap_value(anon_rmap_t anon_rmap)
+{
+ return anon_rmap.rmap;
+}
+
+static inline anon_rmap_t anon_vma_to_anon_rmap(const struct anon_vma *anon_vma)
+{
+ return make_anon_rmap(anon_vma);
+}
+
+static inline struct anon_vma *anon_rmap_to_anon_vma(anon_rmap_t anon_rmap)
+{
+ unsigned long rmap = anon_rmap_value(anon_rmap);
+
+ return (struct anon_vma *)rmap;
+}
+
+anon_rmap_t vma_get_anon_rmap(struct vm_area_struct *vma);
+void put_anon_rmap(anon_rmap_t anon_rmap);
+void anon_rmap_lock_write(anon_rmap_t anon_rmap);
+int anon_rmap_trylock_write(anon_rmap_t anon_rmap);
+void anon_rmap_unlock_write(anon_rmap_t anon_rmap);
+void anon_rmap_lock_read(anon_rmap_t anon_rmap);
+int anon_rmap_trylock_read(anon_rmap_t anon_rmap);
+void anon_rmap_unlock_read(anon_rmap_t anon_rmap);
+
/*
* rmap_walk_control: To control rmap traversing for specific needs
*
@@ -969,6 +1007,36 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc);
struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
struct rmap_walk_control *rwc);
+bool folio_maybe_same_anon_vma(const struct folio *folio,
+ const struct vm_area_struct *vma);
+anon_rmap_t folio_get_anon_rmap(const struct folio *folio);
+anon_rmap_t folio_lock_anon_rmap_read(const struct folio *folio,
+ struct rmap_walk_control *rwc);
+
+static inline struct vm_area_struct *anon_rmap_iter_first_vma(
+ anon_rmap_t anon_rmap, unsigned long start, unsigned long last,
+ struct anon_vma_chain **avc)
+{
+ struct anon_vma *anon_vma = anon_rmap_to_anon_vma(anon_rmap);
+
+ *avc = anon_vma_interval_tree_iter_first(&anon_vma->rb_root, start, last);
+ return *avc ? (*avc)->vma : NULL;
+}
+
+static inline struct vm_area_struct *anon_rmap_iter_next_vma(
+ anon_rmap_t anon_rmap, unsigned long start, unsigned long last,
+ struct anon_vma_chain **avc)
+{
+ if (!*avc)
+ return NULL;
+ *avc = anon_vma_interval_tree_iter_next(*avc, start, last);
+ return *avc ? (*avc)->vma : NULL;
+}
+
+#define anon_rmap_foreach_vma(vma, avc, anon_rmap, start, last) \
+ for (vma = anon_rmap_iter_first_vma(anon_rmap, start, last, &avc); \
+ vma; vma = anon_rmap_iter_next_vma(anon_rmap, start, last, &avc))
+
#else /* !CONFIG_MMU */
#define anon_vma_init() do {} while (0)
diff --git a/mm/rmap.c b/mm/rmap.c
index 78b7fb5f367c..1b2dada71778 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -701,6 +701,79 @@ struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
return anon_vma;
}
+anon_rmap_t vma_get_anon_rmap(struct vm_area_struct *vma)
+{
+ mmap_assert_locked(vma->vm_mm);
+ VM_BUG_ON(!vma->anon_vma);
+ get_anon_vma(vma->anon_vma);
+ return anon_vma_to_anon_rmap(vma->anon_vma);
+}
+
+void put_anon_rmap(anon_rmap_t anon_rmap)
+{
+ put_anon_vma(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+void anon_rmap_lock_write(anon_rmap_t anon_rmap)
+{
+ anon_vma_lock_write(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+int anon_rmap_trylock_write(anon_rmap_t anon_rmap)
+{
+ return anon_vma_trylock_write(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+void anon_rmap_unlock_write(anon_rmap_t anon_rmap)
+{
+ anon_vma_unlock_write(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+void anon_rmap_lock_read(anon_rmap_t anon_rmap)
+{
+ anon_vma_lock_read(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+int anon_rmap_trylock_read(anon_rmap_t anon_rmap)
+{
+ return anon_vma_trylock_read(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+void anon_rmap_unlock_read(anon_rmap_t anon_rmap)
+{
+ anon_vma_unlock_read(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+bool folio_maybe_same_anon_vma(const struct folio *folio,
+ const struct vm_area_struct *vma)
+{
+ struct anon_vma *anon_vma;
+ struct anon_vma *tgt_anon_vma = vma->anon_vma;
+ bool same = false;
+
+ rcu_read_lock();
+ anon_vma = folio_anon_vma(folio);
+ if (anon_vma && tgt_anon_vma)
+ same = anon_vma->root == tgt_anon_vma->root;
+ rcu_read_unlock();
+ return same;
+}
+
+anon_rmap_t folio_get_anon_rmap(const struct folio *folio)
+{
+ struct anon_vma *anon_vma = folio_get_anon_vma(folio);
+
+ return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
+}
+
+anon_rmap_t folio_lock_anon_rmap_read(const struct folio *folio,
+ struct rmap_walk_control *rwc)
+{
+ struct anon_vma *anon_vma = folio_lock_anon_vma_read(folio, rwc);
+
+ return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
+}
+
#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
/*
* Flush TLB entries for recently unmapped pages from remote CPUs. It is
--
2.17.1
* [PATCH 02/15] mm: convert anon_vma rmap APIs to anon_rmap
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
2026-05-27 11:01 ` [PATCH 01/15] mm/rmap: introduce anon_rmap APIs for anonymous folios tao
@ 2026-05-27 11:01 ` tao
2026-05-27 11:49 ` Lorenzo Stoakes
2026-05-27 11:01 ` [PATCH 03/15] mm: introduce anon_vma_tree_t for multiple anon_vma topologies tao
From: tao @ 2026-05-27 11:01 UTC
To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, tao.wangtao, kas,
ljs
Cc: hpa, liam, vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh,
jgg, jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
Convert the rmap anon_vma interfaces to anon_rmap APIs to clarify the
semantics of anonymous rmap operations and prepare for upcoming
ANON_VMA_LAZY support and RCU-based lockless rmap traversal.
Replace folio_anon_vma(), folio_get_anon_vma(), folio_lock_anon_vma_read(),
anon_vma_trylock_read(), anon_vma_lock_read(), anon_vma_unlock_read(),
anon_vma_trylock_write(), anon_vma_lock_write(), anon_vma_unlock_write(),
and anon_vma_interval_tree_foreach() with the anon_rmap APIs. The
rmap_walk_control ->anon_lock callback now returns anon_rmap_t, and
folio_lock_anon_vma_read() becomes static to mm/rmap.c.
No functional change intended.
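One conversion worth a closer look is mm/migrate.c. It still packs the reference and the old page state into `dst->private`, which only works because the handle's low bits are zero, as they were for a raw `struct anon_vma *`. Below is a user-space sketch of that round trip. `struct toy_folio`, `record()` and `extract()` are hypothetical stand-ins for the folio and for __migrate_folio_record()/__migrate_folio_extract(). The flag values are assumed to match the PAGE_WAS_MAPPED/PAGE_WAS_MLOCKED enum in mm/migrate.c.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match the PAGE_* enum in mm/migrate.c. */
#define PAGE_WAS_MAPPED		(1UL << 0)
#define PAGE_WAS_MLOCKED	(1UL << 1)
#define PAGE_OLD_STATES		(PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED)

typedef struct anon_rmap {
	unsigned long rmap;
} anon_rmap_t;

/* Hypothetical stand-in for struct folio: only ->private matters here. */
struct toy_folio {
	void *private;
};

/* Like __migrate_folio_record(): the handle's low bits must be clear. */
static void record(struct toy_folio *dst, int old_page_state,
		   anon_rmap_t anon_rmap)
{
	assert((anon_rmap.rmap & PAGE_OLD_STATES) == 0);
	assert(((unsigned long)old_page_state & ~PAGE_OLD_STATES) == 0);
	dst->private = (void *)(anon_rmap.rmap | (unsigned long)old_page_state);
}

/* Like __migrate_folio_extract(): split the word and clear ->private. */
static void extract(struct toy_folio *dst, int *old_page_state,
		    anon_rmap_t *anon_rmapp)
{
	unsigned long private = (unsigned long)dst->private;

	anon_rmapp->rmap = private & ~PAGE_OLD_STATES;
	*old_page_state = (int)(private & PAGE_OLD_STATES);
	dst->private = NULL;
}
```

If a later handle encoding starts using those low bits, this packing breaks. An assertion on the incoming handle, as in `record()`, is one way to catch that early.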
Signed-off-by: tao <tao.wangtao@honor.com>
---
include/linux/rmap.h | 6 ++--
mm/damon/ops-common.c | 4 +--
mm/huge_memory.c | 16 +++++------
mm/ksm.c | 43 ++++++++++++++---------------
mm/memory-failure.c | 11 ++++----
mm/migrate.c | 64 +++++++++++++++++++++----------------------
mm/page_idle.c | 2 +-
mm/rmap.c | 51 ++++++++++++++++++----------------
8 files changed, 98 insertions(+), 99 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index c42314ea4362..9802bce92695 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -997,15 +997,13 @@ struct rmap_walk_control {
bool (*rmap_one)(struct folio *folio, struct vm_area_struct *vma,
unsigned long addr, void *arg);
int (*done)(struct folio *folio);
- struct anon_vma *(*anon_lock)(const struct folio *folio,
- struct rmap_walk_control *rwc);
+ anon_rmap_t (*anon_lock)(const struct folio *folio,
+ struct rmap_walk_control *rwc);
bool (*invalid_vma)(struct vm_area_struct *vma, void *arg);
};
void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc);
void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc);
-struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
- struct rmap_walk_control *rwc);
bool folio_maybe_same_anon_vma(const struct folio *folio,
const struct vm_area_struct *vma);
diff --git a/mm/damon/ops-common.c b/mm/damon/ops-common.c
index 8c6d613425c1..5788410965b8 100644
--- a/mm/damon/ops-common.c
+++ b/mm/damon/ops-common.c
@@ -172,7 +172,7 @@ void damon_folio_mkold(struct folio *folio)
{
struct rmap_walk_control rwc = {
.rmap_one = damon_folio_mkold_one,
- .anon_lock = folio_lock_anon_vma_read,
+ .anon_lock = folio_lock_anon_rmap_read,
};
if (!folio_mapped(folio) || !folio_raw_mapping(folio)) {
@@ -236,7 +236,7 @@ bool damon_folio_young(struct folio *folio)
struct rmap_walk_control rwc = {
.arg = &accessed,
.rmap_one = damon_folio_young_one,
- .anon_lock = folio_lock_anon_vma_read,
+ .anon_lock = folio_lock_anon_rmap_read,
};
if (!folio_mapped(folio) || !folio_raw_mapping(folio)) {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 970e077019b7..ab3c2397449a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4051,7 +4051,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
struct folio *end_folio = folio_next(folio);
bool is_anon = folio_test_anon(folio);
struct address_space *mapping = NULL;
- struct anon_vma *anon_vma = NULL;
+ anon_rmap_t anon_rmap = ANON_RMAP_NULL;
int old_order = folio_order(folio);
struct folio *new_folio, *next;
int nr_shmem_dropped = 0;
@@ -4087,12 +4087,12 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
* is taken to serialise against parallel split or collapse
* operations.
*/
- anon_vma = folio_get_anon_vma(folio);
- if (!anon_vma) {
+ anon_rmap = folio_get_anon_rmap(folio);
+ if (!anon_rmap_value(anon_rmap)) {
ret = -EBUSY;
goto out;
}
- anon_vma_lock_write(anon_vma);
+ anon_rmap_lock_write(anon_rmap);
mapping = NULL;
} else {
unsigned int min_order;
@@ -4122,7 +4122,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
}
}
- anon_vma = NULL;
+ anon_rmap = ANON_RMAP_NULL;
i_mmap_lock_read(mapping);
/*
@@ -4200,9 +4200,9 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
}
out_unlock:
- if (anon_vma) {
- anon_vma_unlock_write(anon_vma);
- put_anon_vma(anon_vma);
+ if (anon_rmap_value(anon_rmap)) {
+ anon_rmap_unlock_write(anon_rmap);
+ put_anon_rmap(anon_rmap);
}
if (mapping)
i_mmap_unlock_read(mapping);
diff --git a/mm/ksm.c b/mm/ksm.c
index 7d5b76478f0b..f4c204a8a379 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -187,7 +187,7 @@ struct ksm_stable_node {
/**
* struct ksm_rmap_item - reverse mapping item for virtual addresses
* @rmap_list: next rmap_item in mm_slot's singly-linked rmap_list
- * @anon_vma: pointer to anon_vma for this mm,address, when in stable tree
+ * @anon_rmap: anonymous folio rmap for this mm,address, when in stable tree
* @nid: NUMA node id of unstable tree in which linked (may not match page)
* @mm: the memory structure this rmap_item is pointing into
* @address: the virtual address this rmap_item tracks (+ flags in low bits)
@@ -201,7 +201,7 @@ struct ksm_stable_node {
struct ksm_rmap_item {
struct ksm_rmap_item *rmap_list;
union {
- struct anon_vma *anon_vma; /* when stable */
+ anon_rmap_t anon_rmap; /* when stable */
#ifdef CONFIG_NUMA
int nid; /* when node of unstable tree */
#endif
@@ -786,7 +786,7 @@ static void break_cow(struct ksm_rmap_item *rmap_item)
* It is not an accident that whenever we want to break COW
* to undo, we also need to drop a reference to the anon_vma.
*/
- put_anon_vma(rmap_item->anon_vma);
+ put_anon_rmap(rmap_item->anon_rmap);
mmap_read_lock(mm);
vma = find_mergeable_vma(mm, addr);
@@ -898,7 +898,7 @@ static void remove_node_from_stable_tree(struct ksm_stable_node *stable_node)
VM_BUG_ON(stable_node->rmap_hlist_len <= 0);
stable_node->rmap_hlist_len--;
- put_anon_vma(rmap_item->anon_vma);
+ put_anon_rmap(rmap_item->anon_rmap);
rmap_item->address &= PAGE_MASK;
cond_resched();
}
@@ -1051,7 +1051,7 @@ static void remove_rmap_item_from_tree(struct ksm_rmap_item *rmap_item)
VM_BUG_ON(stable_node->rmap_hlist_len <= 0);
stable_node->rmap_hlist_len--;
- put_anon_vma(rmap_item->anon_vma);
+ put_anon_rmap(rmap_item->anon_rmap);
rmap_item->head = NULL;
rmap_item->address &= PAGE_MASK;
@@ -1598,9 +1598,8 @@ static int try_to_merge_with_ksm_page(struct ksm_rmap_item *rmap_item,
/* Unstable nid is in union with stable anon_vma: remove first */
remove_rmap_item_from_tree(rmap_item);
- /* Must get reference to anon_vma while still holding mmap_lock */
- rmap_item->anon_vma = vma->anon_vma;
- get_anon_vma(vma->anon_vma);
+ /* Must get reference to anon_rmap while still holding mmap_lock */
+ rmap_item->anon_rmap = vma_get_anon_rmap(vma);
out:
mmap_read_unlock(mm);
trace_ksm_merge_with_ksm_page(kpage, page_to_pfn(kpage ? kpage : page),
@@ -3108,7 +3107,6 @@ struct folio *ksm_might_need_to_copy(struct folio *folio,
struct vm_area_struct *vma, unsigned long addr)
{
struct page *page = folio_page(folio, 0);
- struct anon_vma *anon_vma = folio_anon_vma(folio);
struct folio *new_folio;
if (folio_test_large(folio))
@@ -3118,10 +3116,10 @@ struct folio *ksm_might_need_to_copy(struct folio *folio,
if (folio_stable_node(folio) &&
!(ksm_run & KSM_RUN_UNMERGE))
return folio; /* no need to copy it */
- } else if (!anon_vma) {
+ } else if (!folio_test_anon(folio)) {
return folio; /* no need to copy it */
} else if (folio->index == linear_page_index(vma, addr) &&
- anon_vma->root == vma->anon_vma->root) {
+ folio_maybe_same_anon_vma(folio, vma)) {
return folio; /* still no need to copy it */
}
if (PageHWPoison(page))
@@ -3173,20 +3171,20 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
/* Ignore the stable/unstable/sqnr flags */
const unsigned long addr = rmap_item->address & PAGE_MASK;
- struct anon_vma *anon_vma = rmap_item->anon_vma;
+ anon_rmap_t anon_rmap = rmap_item->anon_rmap;
struct anon_vma_chain *vmac;
struct vm_area_struct *vma;
cond_resched();
- if (!anon_vma_trylock_read(anon_vma)) {
+ if (!anon_rmap_trylock_read(anon_rmap)) {
if (rwc->try_lock) {
rwc->contended = true;
return;
}
- anon_vma_lock_read(anon_vma);
+ anon_rmap_lock_read(anon_rmap);
}
- anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
+ anon_rmap_foreach_vma(vma, vmac, anon_rmap,
0, ULONG_MAX) {
cond_resched();
@@ -3207,15 +3205,15 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
continue;
if (!rwc->rmap_one(folio, vma, addr, rwc->arg)) {
- anon_vma_unlock_read(anon_vma);
+ anon_rmap_unlock_read(anon_rmap);
return;
}
if (rwc->done && rwc->done(folio)) {
- anon_vma_unlock_read(anon_vma);
+ anon_rmap_unlock_read(anon_rmap);
return;
}
}
- anon_vma_unlock_read(anon_vma);
+ anon_rmap_unlock_read(anon_rmap);
}
if (!search_new_forks++)
goto again;
@@ -3237,9 +3235,9 @@ void collect_procs_ksm(const struct folio *folio, const struct page *page,
if (!stable_node)
return;
hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
- struct anon_vma *av = rmap_item->anon_vma;
+ anon_rmap_t anon_rmap = rmap_item->anon_rmap;
- anon_vma_lock_read(av);
+ anon_rmap_lock_read(anon_rmap);
rcu_read_lock();
for_each_process(tsk) {
struct anon_vma_chain *vmac;
@@ -3248,10 +3246,9 @@ void collect_procs_ksm(const struct folio *folio, const struct page *page,
task_early_kill(tsk, force_early);
if (!t)
continue;
- anon_vma_interval_tree_foreach(vmac, &av->rb_root, 0,
+ anon_rmap_foreach_vma(vma, vmac, anon_rmap, 0,
ULONG_MAX)
{
- vma = vmac->vma;
if (vma->vm_mm == t->mm) {
addr = rmap_item->address & PAGE_MASK;
add_to_kill_ksm(t, page, vma, to_kill,
@@ -3260,7 +3257,7 @@ void collect_procs_ksm(const struct folio *folio, const struct page *page,
}
}
rcu_read_unlock();
- anon_vma_unlock_read(av);
+ anon_rmap_unlock_read(anon_rmap);
}
}
#endif
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ee42d4361309..bc9abba75b5d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -547,11 +547,11 @@ static void collect_procs_anon(const struct folio *folio,
int force_early)
{
struct task_struct *tsk;
- struct anon_vma *av;
+ anon_rmap_t anon_rmap;
pgoff_t pgoff;
- av = folio_lock_anon_vma_read(folio, NULL);
- if (av == NULL) /* Not actually mapped anymore */
+ anon_rmap = folio_lock_anon_rmap_read(folio, NULL);
+ if (!anon_rmap_value(anon_rmap)) /* Not actually mapped anymore */
return;
pgoff = page_pgoff(folio, page);
@@ -564,9 +564,8 @@ static void collect_procs_anon(const struct folio *folio,
if (!t)
continue;
- anon_vma_interval_tree_foreach(vmac, &av->rb_root,
+ anon_rmap_foreach_vma(vma, vmac, anon_rmap,
pgoff, pgoff) {
- vma = vmac->vma;
if (vma->vm_mm != t->mm)
continue;
addr = page_mapped_in_vma(page, vma);
@@ -574,7 +573,7 @@ static void collect_procs_anon(const struct folio *folio,
}
}
rcu_read_unlock();
- anon_vma_unlock_read(av);
+ anon_rmap_unlock_read(anon_rmap);
}
/*
diff --git a/mm/migrate.c b/mm/migrate.c
index 8a64291ab5b4..769983cf14e0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1142,18 +1142,18 @@ enum {
static void __migrate_folio_record(struct folio *dst,
int old_page_state,
- struct anon_vma *anon_vma)
+ anon_rmap_t anon_rmap)
{
- dst->private = (void *)anon_vma + old_page_state;
+ dst->private = (void *)anon_rmap_to_anon_vma(anon_rmap) + old_page_state;
}
static void __migrate_folio_extract(struct folio *dst,
int *old_page_state,
- struct anon_vma **anon_vmap)
+ anon_rmap_t *anon_rmapp)
{
unsigned long private = (unsigned long)dst->private;
- *anon_vmap = (struct anon_vma *)(private & ~PAGE_OLD_STATES);
+ *anon_rmapp = anon_vma_to_anon_rmap((void *)(private & ~PAGE_OLD_STATES));
*old_page_state = private & PAGE_OLD_STATES;
dst->private = NULL;
}
@@ -1161,15 +1161,15 @@ static void __migrate_folio_extract(struct folio *dst,
/* Restore the source folio to the original state upon failure */
static void migrate_folio_undo_src(struct folio *src,
int page_was_mapped,
- struct anon_vma *anon_vma,
+ anon_rmap_t anon_rmap,
bool locked,
struct list_head *ret)
{
if (page_was_mapped)
remove_migration_ptes(src, src, 0);
- /* Drop an anon_vma reference if we took one */
- if (anon_vma)
- put_anon_vma(anon_vma);
+ /* Drop an anon_rmap reference if we took one */
+ if (anon_rmap_value(anon_rmap))
+ put_anon_rmap(anon_rmap);
if (locked)
folio_unlock(src);
if (ret)
@@ -1210,7 +1210,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
struct folio *dst;
int rc = -EAGAIN;
int old_page_state = 0;
- struct anon_vma *anon_vma = NULL;
+ anon_rmap_t anon_rmap = ANON_RMAP_NULL;
bool locked = false;
bool dst_locked = false;
@@ -1275,19 +1275,19 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
/*
* By try_to_migrate(), src->mapcount goes down to 0 here. In this case,
* we cannot notice that anon_vma is freed while we migrate a page.
- * This get_anon_vma() delays freeing anon_vma pointer until the end
+ * This folio_get_anon_rmap() delays freeing the anon_vma until the end
* of migration. File cache pages are no problem because of page_lock()
* File Caches may use write_page() or lock_page() in migration, then,
* just care Anon page here.
*
- * Only folio_get_anon_vma() understands the subtleties of
- * getting a hold on an anon_vma from outside one of its mms.
- * But if we cannot get anon_vma, then we won't need it anyway,
+ * Only folio_get_anon_rmap() understands the subtleties of
+ * getting a hold on an anon_rmap from outside one of its mms.
+ * But if we cannot get anon_rmap, then we won't need it anyway,
* because that implies that the anon page is no longer mapped
* (and cannot be remapped so long as we hold the page lock).
*/
if (folio_test_anon(src) && !folio_test_ksm(src))
- anon_vma = folio_get_anon_vma(src);
+ anon_rmap = folio_get_anon_rmap(src);
/*
* Block others from accessing the new page when we get around to
@@ -1302,7 +1302,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
dst_locked = true;
if (unlikely(page_has_movable_ops(&src->page))) {
- __migrate_folio_record(dst, old_page_state, anon_vma);
+ __migrate_folio_record(dst, old_page_state, anon_rmap);
return 0;
}
@@ -1326,13 +1326,13 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
} else if (folio_mapped(src)) {
/* Establish migration ptes */
VM_BUG_ON_FOLIO(folio_test_anon(src) &&
- !folio_test_ksm(src) && !anon_vma, src);
+ !folio_test_ksm(src) && !anon_rmap_value(anon_rmap), src);
try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0);
old_page_state |= PAGE_WAS_MAPPED;
}
if (!folio_mapped(src)) {
- __migrate_folio_record(dst, old_page_state, anon_vma);
+ __migrate_folio_record(dst, old_page_state, anon_rmap);
return 0;
}
@@ -1345,7 +1345,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
ret = NULL;
migrate_folio_undo_src(src, old_page_state & PAGE_WAS_MAPPED,
- anon_vma, locked, ret);
+ anon_rmap, locked, ret);
migrate_folio_undo_dst(dst, dst_locked, put_new_folio, private);
return rc;
@@ -1359,12 +1359,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
{
int rc;
int old_page_state = 0;
- struct anon_vma *anon_vma = NULL;
+ anon_rmap_t anon_rmap = ANON_RMAP_NULL;
bool src_deferred_split = false;
bool src_partially_mapped = false;
struct list_head *prev;
- __migrate_folio_extract(dst, &old_page_state, &anon_vma);
+ __migrate_folio_extract(dst, &old_page_state, &anon_rmap);
prev = dst->lru.prev;
list_del(&dst->lru);
@@ -1425,9 +1425,9 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
* and will be freed.
*/
list_del(&src->lru);
- /* Drop an anon_vma reference if we took one */
- if (anon_vma)
- put_anon_vma(anon_vma);
+ /* Drop an anon_rmap reference if we took one */
+ if (anon_rmap_value(anon_rmap))
+ put_anon_rmap(anon_rmap);
folio_unlock(src);
migrate_folio_done(src, reason);
@@ -1439,12 +1439,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
*/
if (rc == -EAGAIN) {
list_add(&dst->lru, prev);
- __migrate_folio_record(dst, old_page_state, anon_vma);
+ __migrate_folio_record(dst, old_page_state, anon_rmap);
return rc;
}
migrate_folio_undo_src(src, old_page_state & PAGE_WAS_MAPPED,
- anon_vma, true, ret);
+ anon_rmap, true, ret);
migrate_folio_undo_dst(dst, true, put_new_folio, private);
return rc;
@@ -1476,7 +1476,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
struct folio *dst;
int rc = -EAGAIN;
int page_was_mapped = 0;
- struct anon_vma *anon_vma = NULL;
+ anon_rmap_t anon_rmap = ANON_RMAP_NULL;
struct address_space *mapping = NULL;
enum ttu_flags ttu = 0;
@@ -1513,7 +1513,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
}
if (folio_test_anon(src))
- anon_vma = folio_get_anon_vma(src);
+ anon_rmap = folio_get_anon_rmap(src);
if (unlikely(!folio_trylock(dst)))
goto put_anon;
@@ -1550,8 +1550,8 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
folio_unlock(dst);
put_anon:
- if (anon_vma)
- put_anon_vma(anon_vma);
+ if (anon_rmap_value(anon_rmap))
+ put_anon_rmap(anon_rmap);
if (!rc) {
move_hugetlb_state(src, dst, reason);
@@ -1778,11 +1778,11 @@ static void migrate_folios_undo(struct list_head *src_folios,
dst2 = list_next_entry(dst, lru);
list_for_each_entry_safe(folio, folio2, src_folios, lru) {
int old_page_state = 0;
- struct anon_vma *anon_vma = NULL;
+ anon_rmap_t anon_rmap = ANON_RMAP_NULL;
- __migrate_folio_extract(dst, &old_page_state, &anon_vma);
+ __migrate_folio_extract(dst, &old_page_state, &anon_rmap);
migrate_folio_undo_src(folio, old_page_state & PAGE_WAS_MAPPED,
- anon_vma, true, ret_folios);
+ anon_rmap, true, ret_folios);
list_del(&dst->lru);
migrate_folio_undo_dst(dst, true, put_new_folio, private);
dst = dst2;
diff --git a/mm/page_idle.c b/mm/page_idle.c
index 9c67cbac2965..d4103f20f526 100644
--- a/mm/page_idle.c
+++ b/mm/page_idle.c
@@ -102,7 +102,7 @@ static void page_idle_clear_pte_refs(struct folio *folio)
*/
static struct rmap_walk_control rwc = {
.rmap_one = page_idle_clear_pte_refs_one,
- .anon_lock = folio_lock_anon_vma_read,
+ .anon_lock = folio_lock_anon_rmap_read,
};
if (!folio_mapped(folio) || !folio_raw_mapping(folio))
diff --git a/mm/rmap.c b/mm/rmap.c
index 1b2dada71778..41607168e00e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -630,8 +630,8 @@ struct anon_vma *folio_get_anon_vma(const struct folio *folio)
* reference like with folio_get_anon_vma() and then block on the mutex
* on !rwc->try_lock case.
*/
-struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
- struct rmap_walk_control *rwc)
+static struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
+ struct rmap_walk_control *rwc)
{
struct anon_vma *anon_vma = NULL;
struct anon_vma *root_anon_vma;
@@ -744,6 +744,14 @@ void anon_rmap_unlock_read(anon_rmap_t anon_rmap)
anon_vma_unlock_read(anon_rmap_to_anon_vma(anon_rmap));
}
+static anon_rmap_t folio_anon_rmap(const struct folio *folio)
+{
+ struct anon_vma *anon_vma;
+
+ anon_vma = folio_anon_vma(folio);
+ return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
+}
+
bool folio_maybe_same_anon_vma(const struct folio *folio,
const struct vm_area_struct *vma)
{
@@ -930,13 +938,11 @@ unsigned long page_address_in_vma(const struct folio *folio,
const struct page *page, const struct vm_area_struct *vma)
{
if (folio_test_anon(folio)) {
- struct anon_vma *anon_vma = folio_anon_vma(folio);
/*
* Note: swapoff's unuse_vma() is more efficient with this
* check, and needs it to match anon_vma when KSM is active.
*/
- if (!vma->anon_vma || !anon_vma ||
- vma->anon_vma->root != anon_vma->root)
+ if (!vma->anon_vma || !folio_maybe_same_anon_vma(folio, vma))
return -EFAULT;
} else if (!vma->vm_file) {
return -EFAULT;
@@ -944,7 +950,7 @@ unsigned long page_address_in_vma(const struct folio *folio,
return -EFAULT;
}
- /* KSM folios don't reach here because of the !anon_vma check */
+ /* The !folio_maybe_same_anon_vma() above handles KSM folios */
return vma_address(vma, page_pgoff(folio, page), 1);
}
@@ -1145,7 +1151,7 @@ int folio_referenced(struct folio *folio, int is_locked,
struct rmap_walk_control rwc = {
.rmap_one = folio_referenced_one,
.arg = (void *)&pra,
- .anon_lock = folio_lock_anon_vma_read,
+ .anon_lock = folio_lock_anon_rmap_read,
.try_lock = true,
.invalid_vma = invalid_folio_referenced_vma,
};
@@ -1580,8 +1586,7 @@ static void __page_check_anon_rmap(const struct folio *folio,
* are initially only visible via the pagetables, and the pte is locked
* over the call to folio_add_new_anon_rmap.
*/
- VM_BUG_ON_FOLIO(folio_anon_vma(folio)->root != vma->anon_vma->root,
- folio);
+ VM_BUG_ON_FOLIO(!folio_maybe_same_anon_vma(folio, vma), folio);
VM_BUG_ON_PAGE(page_pgoff(folio, page) != linear_page_index(vma, address),
page);
}
@@ -2468,7 +2473,7 @@ void try_to_unmap(struct folio *folio, enum ttu_flags flags)
.rmap_one = try_to_unmap_one,
.arg = (void *)flags,
.done = folio_not_mapped,
- .anon_lock = folio_lock_anon_vma_read,
+ .anon_lock = folio_lock_anon_rmap_read,
};
if (flags & TTU_RMAP_LOCKED)
@@ -2813,7 +2818,7 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
.rmap_one = try_to_migrate_one,
.arg = (void *)flags,
.done = folio_not_mapped,
- .anon_lock = folio_lock_anon_vma_read,
+ .anon_lock = folio_lock_anon_rmap_read,
};
/*
@@ -2990,8 +2995,8 @@ void __put_anon_vma(struct anon_vma *anon_vma)
anon_vma_free(root);
}
-static struct anon_vma *rmap_walk_anon_lock(const struct folio *folio,
- struct rmap_walk_control *rwc)
+static anon_rmap_t rmap_walk_anon_lock(const struct folio *folio,
+ struct rmap_walk_control *rwc)
{
struct anon_vma *anon_vma;
@@ -3006,7 +3011,7 @@ static struct anon_vma *rmap_walk_anon_lock(const struct folio *folio,
*/
anon_vma = folio_anon_vma(folio);
if (!anon_vma)
- return NULL;
+ return ANON_RMAP_NULL;
if (anon_vma_trylock_read(anon_vma))
goto out;
@@ -3019,7 +3024,7 @@ static struct anon_vma *rmap_walk_anon_lock(const struct folio *folio,
anon_vma_lock_read(anon_vma);
out:
- return anon_vma;
+ return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
}
/*
@@ -3035,9 +3040,10 @@ static struct anon_vma *rmap_walk_anon_lock(const struct folio *folio,
static void rmap_walk_anon(struct folio *folio,
struct rmap_walk_control *rwc, bool locked)
{
- struct anon_vma *anon_vma;
+ anon_rmap_t anon_rmap;
pgoff_t pgoff_start, pgoff_end;
struct anon_vma_chain *avc;
+ struct vm_area_struct *vma;
/*
* The folio lock ensures that folio->mapping can't be changed under us
@@ -3046,20 +3052,19 @@ static void rmap_walk_anon(struct folio *folio,
VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
if (locked) {
- anon_vma = folio_anon_vma(folio);
+ anon_rmap = folio_anon_rmap(folio);
/* anon_vma disappear under us? */
- VM_BUG_ON_FOLIO(!anon_vma, folio);
+ VM_BUG_ON_FOLIO(!anon_rmap_value(anon_rmap), folio);
} else {
- anon_vma = rmap_walk_anon_lock(folio, rwc);
+ anon_rmap = rmap_walk_anon_lock(folio, rwc);
}
- if (!anon_vma)
+ if (!anon_rmap_value(anon_rmap))
return;
pgoff_start = folio_pgoff(folio);
pgoff_end = pgoff_start + folio_nr_pages(folio) - 1;
- anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
+ anon_rmap_foreach_vma(vma, avc, anon_rmap,
pgoff_start, pgoff_end) {
- struct vm_area_struct *vma = avc->vma;
unsigned long address = vma_address(vma, pgoff_start,
folio_nr_pages(folio));
@@ -3076,7 +3081,7 @@ static void rmap_walk_anon(struct folio *folio,
}
if (!locked)
- anon_vma_unlock_read(anon_vma);
+ anon_rmap_unlock_read(anon_rmap);
}
/**
--
2.17.1
* [PATCH 03/15] mm: introduce anon_vma_tree_t for multiple anon_vma topologies
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
2026-05-27 11:01 ` [PATCH 01/15] mm/rmap: introduce anon_rmap APIs for anonymous folios tao
2026-05-27 11:01 ` [PATCH 02/15] mm: convert anon_vma rmap APIs to anon_rmap tao
@ 2026-05-27 11:01 ` tao
2026-05-27 11:56 ` Lorenzo Stoakes
2026-05-27 11:01 ` [PATCH 04/15] mm: switch to anon_vma_tree_t APIs in preparation for ANON_VMA_LAZY tao
` (16 subsequent siblings)
19 siblings, 1 reply; 64+ messages in thread
From: tao @ 2026-05-27 11:01 UTC
Prepare for the upcoming ANON_VMA_LAZY support and RCU-based lockless
rmap traversal by separating anon_vma topology handling from anon_rmap
semantics, so that the VMA and rmap code can support more than one
anon_vma topology behind a lightweight abstraction.
Introduce anon_vma_tree_t, the type that vma->anon_vma is converted to
in the next patch:
typedef unsigned long anon_vma_tree_t;
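The value is a tagged pointer to the anon_vma topology. Its low bits are
reserved for a type tag that distinguishes implementations (e.g. a
regular anon_vma versus a lazy anon_vma). No tags are defined yet, so
make_anon_vma_tree() and anon_vma_tree_anon_vma() are plain casts and
the anon_vma_tree_*() lock helpers forward to the existing anon_vma
lock functions. Keeping vma->anon_vma a single word keeps the VMA
representation compact while allowing the topology to evolve without
changing the VMA layout.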
Signed-off-by: tao <tao.wangtao@honor.com>
---
include/linux/mm_types.h | 3 +++
mm/internal.h | 54 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 57 insertions(+)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index a308e2c23b82..5f4961ea1572 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -917,6 +917,9 @@ struct vm_area_desc {
struct mmap_action action;
};
+/* Tagged pointer stored in vma->anon_vma. Low bits encode anon_vma type. */
+typedef unsigned long anon_vma_tree_t;
+
/*
* This struct describes a virtual memory area. There is one of these
* per VM-area/task. A VM area is any part of the process virtual memory
diff --git a/mm/internal.h b/mm/internal.h
index 5a2ddcf68e0b..76544ad44ff0 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -246,6 +246,60 @@ static inline void anon_vma_unlock_read(struct anon_vma *anon_vma)
up_read(&anon_vma->root->rwsem);
}
+/* anon_vma_tree_t APIs */
+
+static inline anon_vma_tree_t make_anon_vma_tree(struct anon_vma *anon_vma)
+{
+ return (anon_vma_tree_t)anon_vma;
+}
+
+static inline struct anon_vma *anon_vma_tree_anon_vma(anon_vma_tree_t anon_tree)
+{
+ return (struct anon_vma *)anon_tree;
+}
+
+static inline void anon_vma_tree_lock_write(anon_vma_tree_t anon_tree)
+{
+ struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
+
+ anon_vma_lock_write(anon_vma);
+}
+
+static inline int anon_vma_tree_trylock_write(anon_vma_tree_t anon_tree)
+{
+ struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
+
+ return anon_vma_trylock_write(anon_vma);
+}
+
+static inline void anon_vma_tree_unlock_write(anon_vma_tree_t anon_tree)
+{
+ struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
+
+ anon_vma_unlock_write(anon_vma);
+}
+
+static inline void anon_vma_tree_lock_read(anon_vma_tree_t anon_tree)
+{
+ struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
+
+ anon_vma_lock_read(anon_vma);
+}
+
+static inline int anon_vma_tree_trylock_read(anon_vma_tree_t anon_tree)
+{
+ struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
+
+ return anon_vma_trylock_read(anon_vma);
+}
+
+static inline void anon_vma_tree_unlock_read(anon_vma_tree_t anon_tree)
+{
+ struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
+
+ anon_vma_unlock_read(anon_vma);
+}
+
struct anon_vma *folio_get_anon_vma(const struct folio *folio);
/* Operations which modify VMAs. */
--
2.17.1
* [PATCH 04/15] mm: switch to anon_vma_tree_t APIs in preparation for ANON_VMA_LAZY
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
` (2 preceding siblings ...)
2026-05-27 11:01 ` [PATCH 03/15] mm: introduce anon_vma_tree_t for multiple anon_vma topologies tao
@ 2026-05-27 11:01 ` tao
2026-05-27 11:01 ` [PATCH 05/15] mm: add CONFIG_ANON_VMA_LAZY and folio helpers tao
` (15 subsequent siblings)
19 siblings, 0 replies; 64+ messages in thread
From: tao @ 2026-05-27 11:01 UTC
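Convert vma->anon_vma, and the anon_vma fields of struct vma_prepare and
struct vma_merge_struct, to anon_vma_tree_t, and replace direct anon_vma
accesses with the anon_vma_tree_t helpers (vma_anon_vma(),
vma_set_anon_vma(), anon_vma_tree_lock_write(), etc.). This prepares
for ANON_VMA_LAZY. Because the field is now an opaque unsigned long and
the helpers live in mm/internal.h, code outside mm/, including external
modules, can no longer dereference vma->anon_vma as a struct anon_vma
pointer.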
Signed-off-by: tao <tao.wangtao@honor.com>
---
include/linux/mm_types.h | 2 +-
mm/debug.c | 2 +-
mm/internal.h | 16 +++++++++++
mm/khugepaged.c | 8 +++---
mm/memory.c | 2 +-
mm/mmap.c | 2 +-
mm/mremap.c | 4 +--
mm/rmap.c | 59 ++++++++++++++++++++++------------------
mm/vma.c | 26 +++++++++---------
mm/vma.h | 4 +--
10 files changed, 73 insertions(+), 52 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5f4961ea1572..e7f5debac98e 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -987,7 +987,7 @@ struct vm_area_struct {
*/
struct list_head anon_vma_chain; /* Serialized by mmap_lock &
* page_table_lock */
- struct anon_vma *anon_vma; /* Serialized by page_table_lock */
+ anon_vma_tree_t anon_vma; /* Serialized by page_table_lock */
/* Function pointers to deal with this struct. */
const struct vm_operations_struct *vm_ops;
diff --git a/mm/debug.c b/mm/debug.c
index 77fa8fe1d641..f64cf9c9abbb 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -163,7 +163,7 @@ void dump_vma(const struct vm_area_struct *vma)
"flags: %#lx(%pGv)\n",
vma, (void *)vma->vm_start, (void *)vma->vm_end, vma->vm_mm,
(unsigned long)pgprot_val(vma->vm_page_prot),
- vma->anon_vma, vma->vm_ops, vma->vm_pgoff,
+ (void *)vma->anon_vma, vma->vm_ops, vma->vm_pgoff,
vma->vm_file, vma->vm_private_data,
#ifdef CONFIG_PER_VMA_LOCK
refcount_read(&vma->vm_refcnt),
diff --git a/mm/internal.h b/mm/internal.h
index 76544ad44ff0..3dbbd118a78c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -258,6 +258,22 @@ static inline struct anon_vma *anon_vma_tree_anon_vma(anon_vma_tree_t anon_tree)
return (struct anon_vma *)anon_tree;
}
+/* Store anon_vma in vma->anon_vma using a tagged pointer. */
+static inline void vma_set_anon_vma(struct vm_area_struct *vma,
+ struct anon_vma *anon_vma)
+{
+ vma->anon_vma = (anon_vma_tree_t)anon_vma;
+}
+
+/* Return the VMA's anon_vma. */
+static inline struct anon_vma *vma_anon_vma(const struct vm_area_struct *vma)
+{
+ /* READ_ONCE(): reusable_anon_vma() may race with __anon_vma_prepare() */
+ anon_vma_tree_t anon_tree = READ_ONCE(vma->anon_vma);
+
+ return anon_vma_tree_anon_vma(anon_tree);
+}
+
static inline void anon_vma_tree_lock_write(anon_vma_tree_t anon_tree)
{
struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b8452dbdb043..747748eace91 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -761,7 +761,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
* Re-establish the PMD to point to the original page table
* entry. Restoring PMD needs to be done prior to releasing
* pages. Since pages are still isolated and locked here,
- * acquiring anon_vma_lock_write is unnecessary.
+ * acquiring anon_vma_tree_lock_write is unnecessary.
*/
pmd_ptl = pmd_lock(vma->vm_mm, pmd);
pmd_populate(vma->vm_mm, pmd, pmd_pgtable(orig_pmd));
@@ -1164,7 +1164,7 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
if (result != SCAN_SUCCEED)
goto out_up_write;
- anon_vma_lock_write(vma->anon_vma);
+ anon_vma_tree_lock_write(vma->anon_vma);
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address,
address + HPAGE_PMD_SIZE);
@@ -1205,7 +1205,7 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
*/
pmd_populate(mm, pmd, pmd_pgtable(_pmd));
spin_unlock(pmd_ptl);
- anon_vma_unlock_write(vma->anon_vma);
+ anon_vma_tree_unlock_write(vma->anon_vma);
goto out_up_write;
}
@@ -1213,7 +1213,7 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
* All pages are isolated and locked so anon_vma rmap
* can't run anymore.
*/
- anon_vma_unlock_write(vma->anon_vma);
+ anon_vma_tree_unlock_write(vma->anon_vma);
result = __collapse_huge_page_copy(pte, folio, pmd, _pmd,
vma, address, pte_ptl,
diff --git a/mm/memory.c b/mm/memory.c
index 86a973119bd4..c13b79987b26 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -602,7 +602,7 @@ static void print_bad_page_map(struct vm_area_struct *vma,
if (page)
dump_page(page, "bad page map");
pr_alert("addr:%px vm_flags:%08lx anon_vma:%px mapping:%px index:%lx\n",
- (void *)addr, vma->vm_flags, vma->anon_vma, mapping, index);
+ (void *)addr, vma->vm_flags, (void *)vma->anon_vma, mapping, index);
pr_alert("file:%pD fault:%ps mmap:%ps mmap_prepare: %ps read_folio:%ps\n",
vma->vm_file,
vma->vm_ops ? vma->vm_ops->fault : NULL,
diff --git a/mm/mmap.c b/mm/mmap.c
index 5754d1c36462..eac1fb3823eb 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1799,7 +1799,7 @@ __latent_entropy int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
* Don't prepare anon_vma until fault since we don't
* copy page for current vma.
*/
- tmp->anon_vma = NULL;
+ vma_set_anon_vma(tmp, NULL);
} else if (anon_vma_fork(tmp, mpnt))
goto fail_nomem_anon_vma_fork;
vm_flags_clear(tmp, VM_LOCKED_MASK);
diff --git a/mm/mremap.c b/mm/mremap.c
index e9c8b1d05832..6af41e58f79f 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -145,13 +145,13 @@ static void take_rmap_locks(struct vm_area_struct *vma)
if (vma->vm_file)
i_mmap_lock_write(vma->vm_file->f_mapping);
if (vma->anon_vma)
- anon_vma_lock_write(vma->anon_vma);
+ anon_vma_tree_lock_write(vma->anon_vma);
}
static void drop_rmap_locks(struct vm_area_struct *vma)
{
if (vma->anon_vma)
- anon_vma_unlock_write(vma->anon_vma);
+ anon_vma_tree_unlock_write(vma->anon_vma);
if (vma->vm_file)
i_mmap_unlock_write(vma->vm_file->f_mapping);
}
diff --git a/mm/rmap.c b/mm/rmap.c
index 41607168e00e..5c4eb090c801 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -186,6 +186,7 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
{
struct mm_struct *mm = vma->vm_mm;
struct anon_vma *anon_vma, *allocated;
+ anon_vma_tree_t anon_tree;
struct anon_vma_chain *avc;
mmap_assert_locked(mm);
@@ -205,11 +206,12 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
allocated = anon_vma;
}
- anon_vma_lock_write(anon_vma);
+ anon_tree = make_anon_vma_tree(anon_vma);
+ anon_vma_tree_lock_write(anon_tree);
/* page_table_lock to protect against threads */
spin_lock(&mm->page_table_lock);
if (likely(!vma->anon_vma)) {
- vma->anon_vma = anon_vma;
+ vma->anon_vma = anon_tree;
anon_vma_chain_assign(vma, avc, anon_vma);
anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
anon_vma->num_active_vmas++;
@@ -217,7 +219,7 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
avc = NULL;
}
spin_unlock(&mm->page_table_lock);
- anon_vma_unlock_write(anon_vma);
+ anon_vma_tree_unlock_write(anon_tree);
if (unlikely(allocated))
put_anon_vma(allocated);
@@ -283,7 +285,7 @@ static void maybe_reuse_anon_vma(struct vm_area_struct *dst,
if (anon_vma->num_children > 1)
return;
- dst->anon_vma = anon_vma;
+ vma_set_anon_vma(dst, anon_vma);
anon_vma->num_active_vmas++;
}
@@ -321,11 +323,11 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src,
enum vma_operation operation)
{
struct anon_vma_chain *avc, *pavc;
- struct anon_vma *active_anon_vma = src->anon_vma;
+ anon_vma_tree_t active_anon_tree = src->anon_vma;
check_anon_vma_clone(dst, src, operation);
- if (!active_anon_vma)
+ if (!active_anon_tree)
return 0;
/*
@@ -350,7 +352,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src,
* Now link the anon_vma's back to the newly inserted AVCs.
* Note that all anon_vma's share the same root.
*/
- anon_vma_lock_write(src->anon_vma);
+ anon_vma_tree_lock_write(active_anon_tree);
list_for_each_entry_reverse(avc, &dst->anon_vma_chain, same_vma) {
struct anon_vma *anon_vma = avc->anon_vma;
@@ -360,9 +362,9 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src,
}
if (operation != VMA_OP_FORK)
- dst->anon_vma->num_active_vmas++;
+ vma_anon_vma(dst)->num_active_vmas++;
- anon_vma_unlock_write(active_anon_vma);
+ anon_vma_tree_unlock_write(active_anon_tree);
return 0;
enomem_failure:
@@ -379,6 +381,7 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
{
struct anon_vma_chain *avc;
struct anon_vma *anon_vma;
+ anon_vma_tree_t anon_tree;
int rc;
/* Don't bother if the parent process has no anon_vma here. */
@@ -386,7 +389,7 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
return 0;
/* Drop inherited anon_vma, we'll reuse existing or allocate new. */
- vma->anon_vma = NULL;
+ vma_set_anon_vma(vma, NULL);
anon_vma = anon_vma_alloc();
if (!anon_vma)
@@ -421,8 +424,8 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
* The root anon_vma's rwsem is the lock actually used when we
* lock any of the anon_vmas in this anon_vma tree.
*/
- anon_vma->root = pvma->anon_vma->root;
- anon_vma->parent = pvma->anon_vma;
+ anon_vma->parent = vma_anon_vma(pvma);
+ anon_vma->root = anon_vma->parent->root;
/*
* With refcounts, an anon_vma can stay around longer than the
* process it belongs to. The root anon_vma needs to be pinned until
@@ -430,13 +433,13 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
*/
get_anon_vma(anon_vma->root);
/* Mark this anon_vma as the one where our new (COWed) pages go. */
- vma->anon_vma = anon_vma;
+ vma->anon_vma = anon_tree = make_anon_vma_tree(anon_vma);
anon_vma_chain_assign(vma, avc, anon_vma);
/* Now let rmap see it. */
- anon_vma_lock_write(anon_vma);
+ anon_vma_tree_lock_write(anon_tree);
anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
anon_vma->parent->num_children++;
- anon_vma_unlock_write(anon_vma);
+ anon_vma_tree_unlock_write(anon_tree);
return 0;
}
@@ -463,7 +466,7 @@ static void cleanup_partial_anon_vmas(struct vm_area_struct *vma)
* able to correctly clone AVC state. Avoid inconsistent anon_vma tree
* state by resetting.
*/
- vma->anon_vma = NULL;
+ vma_set_anon_vma(vma, NULL);
}
/**
@@ -479,18 +482,18 @@ static void cleanup_partial_anon_vmas(struct vm_area_struct *vma)
void unlink_anon_vmas(struct vm_area_struct *vma)
{
struct anon_vma_chain *avc, *next;
- struct anon_vma *active_anon_vma = vma->anon_vma;
+ anon_vma_tree_t active_anon_tree = vma->anon_vma;
/* Always hold mmap lock, read-lock on unmap possibly. */
mmap_assert_locked(vma->vm_mm);
/* Unfaulted is a no-op. */
- if (!active_anon_vma) {
+ if (!active_anon_tree) {
VM_WARN_ON_ONCE(!list_empty(&vma->anon_vma_chain));
return;
}
- anon_vma_lock_write(active_anon_vma);
+ anon_vma_tree_lock_write(active_anon_tree);
/*
* Unlink each anon_vma chained to the VMA. This list is ordered
@@ -514,13 +517,13 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
anon_vma_chain_free(avc);
}
- active_anon_vma->num_active_vmas--;
+ vma_anon_vma(vma)->num_active_vmas--;
/*
* vma would still be needed after unlink, and anon_vma will be prepared
* when handle fault.
*/
- vma->anon_vma = NULL;
- anon_vma_unlock_write(active_anon_vma);
+ vma_set_anon_vma(vma, NULL);
+ anon_vma_tree_unlock_write(active_anon_tree);
/*
@@ -703,10 +706,12 @@ static struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
anon_rmap_t vma_get_anon_rmap(struct vm_area_struct *vma)
{
+ struct anon_vma *anon_vma = anon_vma_tree_anon_vma(vma->anon_vma);
+
mmap_assert_locked(vma->vm_mm);
VM_BUG_ON(!vma->anon_vma);
- get_anon_vma(vma->anon_vma);
- return anon_vma_to_anon_rmap(vma->anon_vma);
+ get_anon_vma(anon_vma);
+ return anon_vma_to_anon_rmap(anon_vma);
}
void put_anon_rmap(anon_rmap_t anon_rmap)
@@ -756,7 +761,7 @@ bool folio_maybe_same_anon_vma(const struct folio *folio,
const struct vm_area_struct *vma)
{
struct anon_vma *anon_vma;
- struct anon_vma *tgt_anon_vma = vma->anon_vma;
+ struct anon_vma *tgt_anon_vma = vma_anon_vma(vma);
bool same = false;
rcu_read_lock();
@@ -1518,7 +1523,7 @@ static __always_inline void __folio_add_rmap(struct folio *folio,
*/
void folio_move_anon_rmap(struct folio *folio, struct vm_area_struct *vma)
{
- void *anon_vma = vma->anon_vma;
+ void *anon_vma = vma_anon_vma(vma);
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
VM_BUG_ON_VMA(!anon_vma, vma);
@@ -1542,7 +1547,7 @@ void folio_move_anon_rmap(struct folio *folio, struct vm_area_struct *vma)
static void __folio_set_anon(struct folio *folio, struct vm_area_struct *vma,
unsigned long address, bool exclusive)
{
- struct anon_vma *anon_vma = vma->anon_vma;
+ struct anon_vma *anon_vma = vma_anon_vma(vma);
BUG_ON(!anon_vma);
diff --git a/mm/vma.c b/mm/vma.c
index d90791b00a7b..3501617085b0 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -107,8 +107,8 @@ static bool is_mergeable_anon_vma(struct vma_merge_struct *vmg, bool merge_next)
{
struct vm_area_struct *tgt = merge_next ? vmg->next : vmg->prev;
struct vm_area_struct *src = vmg->middle; /* existing merge case. */
- struct anon_vma *tgt_anon = tgt->anon_vma;
- struct anon_vma *src_anon = vmg->anon_vma;
+ anon_vma_tree_t tgt_anon = tgt->anon_vma;
+ anon_vma_tree_t src_anon = vmg->anon_vma;
/*
* We _can_ have !src, vmg->anon_vma via copy_vma(). In this instance we
@@ -311,7 +311,7 @@ static void vma_prepare(struct vma_prepare *vp)
}
if (vp->anon_vma) {
- anon_vma_lock_write(vp->anon_vma);
+ anon_vma_tree_lock_write(vp->anon_vma);
anon_vma_interval_tree_pre_update_vma(vp->vma);
if (vp->adj_next)
anon_vma_interval_tree_pre_update_vma(vp->adj_next);
@@ -364,7 +364,7 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
anon_vma_interval_tree_post_update_vma(vp->vma);
if (vp->adj_next)
anon_vma_interval_tree_post_update_vma(vp->adj_next);
- anon_vma_unlock_write(vp->anon_vma);
+ anon_vma_tree_unlock_write(vp->anon_vma);
}
if (vp->file) {
@@ -652,7 +652,7 @@ void validate_mm(struct mm_struct *mm)
mt_validate(&mm->mm_mt);
for_each_vma(vmi, vma) {
#ifdef CONFIG_DEBUG_VM_RB
- struct anon_vma *anon_vma = vma->anon_vma;
+ anon_vma_tree_t anon_tree = vma->anon_vma;
struct anon_vma_chain *avc;
#endif
unsigned long vmi_start, vmi_end;
@@ -676,11 +676,11 @@ void validate_mm(struct mm_struct *mm)
}
#ifdef CONFIG_DEBUG_VM_RB
- if (anon_vma) {
- anon_vma_lock_read(anon_vma);
+ if (anon_tree) {
+ anon_vma_tree_lock_read(anon_tree);
list_for_each_entry(avc, &vma->anon_vma_chain, same_vma)
anon_vma_interval_tree_verify(avc);
- anon_vma_unlock_read(anon_vma);
+ anon_vma_tree_unlock_read(anon_tree);
}
#endif
/* Check for a infinite loop */
@@ -2009,7 +2009,7 @@ static struct anon_vma *reusable_anon_vma(struct vm_area_struct *old,
struct vm_area_struct *b)
{
if (anon_vma_compatible(a, b)) {
- struct anon_vma *anon_vma = READ_ONCE(old->anon_vma);
+ struct anon_vma *anon_vma = vma_anon_vma(old);
if (anon_vma && list_is_singular(&old->anon_vma_chain))
return anon_vma;
@@ -3160,7 +3160,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
/* Lock the VMA before expanding to prevent concurrent page faults */
vma_start_write(vma);
/* We update the anon VMA tree. */
- anon_vma_lock_write(vma->anon_vma);
+ anon_vma_tree_lock_write(vma->anon_vma);
/* Somebody else might have raced and expanded it already */
if (address > vma->vm_end) {
@@ -3186,7 +3186,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
}
}
}
- anon_vma_unlock_write(vma->anon_vma);
+ anon_vma_tree_unlock_write(vma->anon_vma);
vma_iter_free(&vmi);
validate_mm(mm);
return error;
@@ -3239,7 +3239,7 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
/* Lock the VMA before expanding to prevent concurrent page faults */
vma_start_write(vma);
/* We update the anon VMA tree. */
- anon_vma_lock_write(vma->anon_vma);
+ anon_vma_tree_lock_write(vma->anon_vma);
/* Somebody else might have raced and expanded it already */
if (address < vma->vm_start) {
@@ -3266,7 +3266,7 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
}
}
}
- anon_vma_unlock_write(vma->anon_vma);
+ anon_vma_tree_unlock_write(vma->anon_vma);
vma_iter_free(&vmi);
validate_mm(mm);
return error;
diff --git a/mm/vma.h b/mm/vma.h
index 8e4b61a7304c..d3bd83299219 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -15,7 +15,7 @@ struct vma_prepare {
struct vm_area_struct *adj_next;
struct file *file;
struct address_space *mapping;
- struct anon_vma *anon_vma;
+ anon_vma_tree_t anon_vma;
struct vm_area_struct *insert;
struct vm_area_struct *remove;
struct vm_area_struct *remove2;
@@ -104,7 +104,7 @@ struct vma_merge_struct {
vma_flags_t vma_flags;
};
struct file *file;
- struct anon_vma *anon_vma;
+ anon_vma_tree_t anon_vma;
struct mempolicy *policy;
struct vm_userfaultfd_ctx uffd_ctx;
struct anon_vma_name *anon_name;
--
2.17.1
* [PATCH 05/15] mm: add CONFIG_ANON_VMA_LAZY and folio helpers
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
` (3 preceding siblings ...)
2026-05-27 11:01 ` [PATCH 04/15] mm: switch to anon_vma_tree_t APIs in preparation for ANON_VMA_LAZY tao
@ 2026-05-27 11:01 ` tao
2026-05-27 11:01 ` [PATCH 06/15] mm: add CONFIG_VMA_REF and VMA helpers tao
` (14 subsequent siblings)
19 siblings, 0 replies; 64+ messages in thread
From: tao @ 2026-05-27 11:01 UTC
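Add the foundation for the ANON_VMA_LAZY optimization:
- CONFIG_ANON_VMA_LAZY Kconfig option, available only when the
  architecture selects ARCH_SUPPORTS_ANON_VMA_LAZY
- FOLIO_MAPPING_ANON_VMA_LAZY flag for folio->mapping, plus the
  folio_test_anon_vma_lazy() helper
- ANON_VMA_TREE_* type tags for anon_vma_tree_t
- a runtime switch, the vm.anon_vma_lazy sysctl
This feature defers anon_vma allocation until fork(), reducing memory
overhead for VMAs that never gain children.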
Signed-off-by: tao <tao.wangtao@honor.com>
---
include/linux/page-flags.h | 23 +++++++++++
mm/Kconfig | 14 +++++++
mm/internal.h | 16 ++++++++
mm/mmap.c | 9 ++++
mm/rmap.c | 84 ++++++++++++++++++++++++++++++++++++++
5 files changed, 146 insertions(+)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0e03d816e8b9..c0cc43118877 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -696,6 +696,12 @@ PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
* the FOLIO_MAPPING_ANON_KSM bit may be set along with the FOLIO_MAPPING_ANON
* bit; and then folio->mapping points, not to an anon_vma, but to a private
* structure which KSM associates with that merged folio. See ksm.h.
+ *
+ * If CONFIG_ANON_VMA_LAZY is enabled, the FOLIO_MAPPING_ANON_KSM bit set
+ * without the FOLIO_MAPPING_ANON bit marks an ANON_VMA_LAZY folio: its
+ * folio->mapping points to the ANON_VMA_LAZY root VMA rather than to an
+ * anon_vma. folio_test_anon() therefore treats either bit as marking an
+ * anonymous folio.
*
* Please note that, confusingly, "folio_mapping" refers to the inode
* address_space which maps the folio from disk; whereas "folio_mapped"
@@ -711,11 +717,16 @@ PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
#define FOLIO_MAPPING_ANON 0x1
#define FOLIO_MAPPING_ANON_KSM 0x2
#define FOLIO_MAPPING_KSM (FOLIO_MAPPING_ANON | FOLIO_MAPPING_ANON_KSM)
+#define FOLIO_MAPPING_ANON_VMA_LAZY FOLIO_MAPPING_ANON_KSM
#define FOLIO_MAPPING_FLAGS (FOLIO_MAPPING_ANON | FOLIO_MAPPING_ANON_KSM)
static __always_inline bool folio_test_anon(const struct folio *folio)
{
+#ifdef CONFIG_ANON_VMA_LAZY
+ return ((unsigned long)folio->mapping & FOLIO_MAPPING_FLAGS) != 0;
+#else
return ((unsigned long)folio->mapping & FOLIO_MAPPING_ANON) != 0;
+#endif
}
static __always_inline bool folio_test_lazyfree(const struct folio *folio)
@@ -734,6 +745,18 @@ static __always_inline bool PageAnon(const struct page *page)
{
return folio_test_anon(page_folio(page));
}
+
+static inline bool folio_test_anon_vma_lazy(const struct folio *folio)
+{
+#ifdef CONFIG_ANON_VMA_LAZY
+ unsigned long flags = (unsigned long)folio->mapping;
+
+ return (flags & FOLIO_MAPPING_FLAGS) == FOLIO_MAPPING_ANON_VMA_LAZY;
+#else
+ return false;
+#endif
+}
+
#ifdef CONFIG_KSM
/*
* A KSM page is one of those write-protected "shared pages" or "merged pages"
diff --git a/mm/Kconfig b/mm/Kconfig
index e8bf1e9e6ad9..c16b5d9b3ce9 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1412,6 +1412,20 @@ config LOCK_MM_AND_FIND_VMA
bool
depends on !STACK_GROWSUP
+config ARCH_SUPPORTS_ANON_VMA_LAZY
+ def_bool n
+
+config ANON_VMA_LAZY
+ bool "Lazy allocation of anon_vma"
+ default y
+ depends on ARCH_SUPPORTS_ANON_VMA_LAZY && MMU
+ help
+ For anonymous VMAs without children, avoid allocating anon_vma
+ and anon_vma_chain to reduce memory overhead.
+
+ Say Y to enable this optimization for anonymous VMAs without
+ children.
+
config IOMMU_MM_DATA
bool
diff --git a/mm/internal.h b/mm/internal.h
index 3dbbd118a78c..639f9c287f4c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -248,6 +248,22 @@ static inline void anon_vma_unlock_read(struct anon_vma *anon_vma)
/* anon_vma_tree_t APIs */
+/* Encoded anon_vma tree type. Must fit within ANON_VMA_TREE_BITS. */
+#define ANON_VMA_TREE_REGULAR 0 /* regular anon_vma */
+#define ANON_VMA_TREE_VMA 1
+#define ANON_VMA_TREE_PARENT 2
+#define ANON_VMA_TREE_INVALID 3 /* reserved */
+
+#define ANON_VMA_TREE_BITS 2
+#define ANON_VMA_TREE_MASK ((1UL << ANON_VMA_TREE_BITS) - 1)
+
+#ifdef CONFIG_ANON_VMA_LAZY
+extern bool anon_vma_lazy_enable;
+static inline bool anon_vma_lazy_enabled(void) { return anon_vma_lazy_enable; }
+#else
+static inline bool anon_vma_lazy_enabled(void) { return false; }
+#endif
+
static inline anon_vma_tree_t make_anon_vma_tree(struct anon_vma *anon_vma)
{
return (anon_vma_tree_t)anon_vma;
diff --git a/mm/mmap.c b/mm/mmap.c
index eac1fb3823eb..2ae733eb39f0 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1558,6 +1558,15 @@ static const struct ctl_table mmap_table[] = {
.extra2 = (void *)&mmap_rnd_compat_bits_max,
},
#endif
+#ifdef CONFIG_ANON_VMA_LAZY
+ {
+ .procname = "anon_vma_lazy",
+ .data = &anon_vma_lazy_enable,
+ .maxlen = sizeof(anon_vma_lazy_enable),
+ .mode = 0600,
+ .proc_handler = proc_dobool,
+ },
+#endif
};
#endif /* CONFIG_SYSCTL */
diff --git a/mm/rmap.c b/mm/rmap.c
index 5c4eb090c801..48c4463d8b2c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -87,6 +87,90 @@
static struct kmem_cache *anon_vma_cachep;
static struct kmem_cache *anon_vma_chain_cachep;
+#ifdef CONFIG_ANON_VMA_LAZY
+/*
+ * ANON_VMA_LAZY: defer anon_vma allocation until fork().
+ *
+ * anon_vma and anon_vma_chain exist mainly to support reverse mapping
+ * across multiple processes. For VMAs that belong to a single process,
+ * eagerly creating anon_vma introduces unnecessary memory and setup
+ * overhead.
+ *
+ * This optimization delays anon_vma creation until fork(). Before that
+ * the VMA stays in a lazy state and no anon_vma or anon_vma_chain
+ * topology is created.
+ *
+ * vma->anon_vma encodes the anonymous VMA state. Low bits of the pointer
+ * distinguish lazy states:
+ *
+ * NULL
+ * VMA has no anonymous or CoW pages.
+ *
+ * regular anon_vma
+ * Standard anon_vma with anon_vma_chain topology.
+ *
+ * anon_vma_lazy_root | ANON_VMA_TREE_VMA
+ * Lazy root for the VMA that first faults anonymous pages.
+ * No anon_vma or anon_vma_chain topology exists.
+ *
+ * parent_anon_vma | ANON_VMA_TREE_PARENT
+ * Lazy state for VMAs created during fork(). The lazy parent_anon_vma
+ * refers to the anon_vma of the parent VMA.
+ *
+ * Anonymous folios extend folio->mapping with FOLIO_MAPPING_ANON_VMA_LAZY:
+ *
+ * anon_vma | FOLIO_MAPPING_ANON
+ * regular anonymous mapping
+ *
+ * anon_vma_lazy_root | FOLIO_MAPPING_ANON_VMA_LAZY
+ * lazy anonymous mapping
+ *
+ * In typical workloads most VMAs remain in ANON_VMA_TREE_VMA state.
+ * These VMAs have no anon_vma and no anon_vma_chain, and rmap only has to consider a single VMA.
+ * Reverse mapping can therefore be performed without anon_vma locking,
+ * providing a faster rmap path for the common case.
+ *
+ * During fork(), VMAs in ANON_VMA_TREE_VMA are upgraded to regular
+ * anon_vma in the parent to establish sharing topology. Child VMAs are
+ * created as ANON_VMA_TREE_PARENT and do not allocate anon_vma,
+ * avoiding additional fork overhead.
+ *
+ * Folio mapping rules:
+ *
+ * Lazy anonymous folios store the lazy root in folio->mapping using
+ * FOLIO_MAPPING_ANON_VMA_LAZY. This allows rmap walkers to resolve the
+ * owning VMA without requiring anon_vma topology.
+ *
+ * folio->mapping may be updated during fork() when lazy VMAs are
+ * upgraded to regular anon_vma. dup_anon_rmap() in copy_page_range()
+ * performs the upgrade and installs the new anon_vma mapping.
+ *
+ * folio_move_anon_rmap() updates folio->mapping when anonymous folios
+ * move between VMAs.
+ *
+ * As with regular anonymous memory, __folio_remove_rmap() does not
+ * clear folio->mapping. Rmap walkers validate mappings using
+ * folio_mapped().
+ *
+ * VMA split keeps vma->anon_vma unchanged. The lazy root holds an extra
+ * reference so folio->mapping remains valid without scanning folios.
+ *
+ * Internal helpers:
+ *
+ * anon_vma_tree_t
+ * The value encodes a reference to anon_vma topology. Low bits
+ * are used as type tags to distinguish different anon_vma
+ * implementations (e.g. regular anon_vma or anon_vma_lazy).
+ *
+ * anon_rmap_t
+ * anon_rmap_t wraps the tagged pointer used by the rmap code and
+ * provides a type-safe interface for reverse mapping operations,
+ * covering both regular anon_vma and lazy anon_vma mappings.
+ */
+
+bool anon_vma_lazy_enable;
+#endif
+
static inline struct anon_vma *anon_vma_alloc(void)
{
struct anon_vma *anon_vma;
--
2.17.1
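The tagged vma->anon_vma states described in the ANON_VMA_LAZY comment above can be modelled outside the kernel. The following user-space C sketch stores a two-bit tag in the low bits of a suitably aligned pointer and returns a usable anon_vma only for the untagged (regular) state, mirroring vma_anon_vma(). It is a hypothetical model, not kernel code: ANON_VMA_TREE_INVALID = 3 and the two-bit mask come from this patch, while the values used for the regular, VMA and parent tags and every model_*/tree_* name are assumptions.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the vma->anon_vma encoding. Only the INVALID
 * value (3) and the two-bit width match the patch; the other tag values
 * are assumptions for illustration.
 */
enum tree_tag {
	TREE_REGULAR = 0,	/* assumed: plain anon_vma pointer */
	TREE_VMA = 1,		/* assumed: lazy root VMA */
	TREE_PARENT = 2,	/* assumed: parent's anon_vma after fork() */
	TREE_INVALID = 3,	/* reserved, as in the patch */
};

#define TREE_BITS 2
#define TREE_MASK (((uintptr_t)1 << TREE_BITS) - 1)

/* Stand-ins with enough alignment to leave the low two bits free. */
struct model_anon_vma { alignas(8) int dummy; };
struct model_vma { alignas(8) unsigned long vm_start; };

typedef uintptr_t tree_t;

static tree_t tree_make(const void *ptr, enum tree_tag tag)
{
	/* The pointee must be aligned so the tag bits are zero. */
	assert(((uintptr_t)ptr & TREE_MASK) == 0);
	assert(tag != TREE_INVALID);
	return (uintptr_t)ptr | (uintptr_t)tag;
}

static enum tree_tag tree_tag(tree_t t)
{
	return (enum tree_tag)(t & TREE_MASK);
}

/* Strip the tag to recover the pointer, whatever it points to. */
static void *tree_ptr(tree_t t)
{
	return (void *)(t & ~TREE_MASK);
}

/* Like vma_anon_vma(): lazy states do not expose an anon_vma. */
static struct model_anon_vma *tree_regular_anon_vma(tree_t t)
{
	if (tree_tag(t) != TREE_REGULAR)
		return NULL;
	return tree_ptr(t);
}
```

Keeping tag value 3 reserved lets a corrupted tag be caught cheaply, which is what the VM_WARN_ON() in anon_vma_tree_type(), added later in the series, checks.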
* [PATCH 06/15] mm: add CONFIG_VMA_REF and VMA helpers
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
` (4 preceding siblings ...)
2026-05-27 11:01 ` [PATCH 05/15] mm: add CONFIG_ANON_VMA_LAZY and folio helpers tao
@ 2026-05-27 11:01 ` tao
2026-05-27 11:01 ` [PATCH 07/15] mm: replace direct FOLIO_MAPPING_ANON usage with helpers tao
` (13 subsequent siblings)
19 siblings, 0 replies; 64+ messages in thread
From: tao @ 2026-05-27 11:01 UTC (permalink / raw)
To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, tao.wangtao, kas,
ljs
Cc: hpa, liam, vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh,
jgg, jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
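Add CONFIG_VMA_REF, which embeds an rcuref_t in struct vm_area_struct,
together with vma_get()/vma_put() helpers, and convert vm_area_free()
callers to vma_put(). The reference only manages the lifetime of a VMA;
it does not track the VMA's state.
This prepares for ANON_VMA_LAZY, which records a VMA in folio->mapping
and needs that VMA to stay valid while rmap walkers use it.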
Signed-off-by: tao <tao.wangtao@honor.com>
---
include/linux/mm.h | 38 ++++++++++++++++++++++++++++++++++++++
include/linux/mm_types.h | 4 ++++
mm/Kconfig | 8 ++++++++
mm/debug_vm_pgtable.c | 2 +-
mm/mmap.c | 4 ++--
mm/vma.c | 12 ++++++------
mm/vma_exec.c | 2 +-
mm/vma_init.c | 1 +
8 files changed, 61 insertions(+), 10 deletions(-)
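The lifetime contract that vma_get() and vma_put() provide can be sketched in user space with C11 atomics: a get succeeds only while the count is non-zero, and the put that drops the last reference frees the object, as vma_put() does with vm_area_free(). This is a simplified, hypothetical model; the kernel's rcuref_t uses its own encoding and relies on RCU so that rcuref_get() can safely race with the final put, which a plain malloc()/free() model does not capture. The model_* names are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_vma {
	atomic_uint refcnt;	/* stands in for vma->vm_rcuref */
	unsigned long vm_start;
};

static unsigned int model_frees;	/* counts frees, for illustration */

/* Like vma_rcuref_init(): the creator holds the first reference. */
static void model_vma_init(struct model_vma *vma)
{
	atomic_init(&vma->refcnt, 1);
}

/* Take a reference unless the count has already dropped to zero. */
static struct model_vma *model_vma_get(struct model_vma *vma)
{
	unsigned int old = atomic_load(&vma->refcnt);

	do {
		if (old == 0)
			return NULL;	/* being freed: caller must not use it */
	} while (!atomic_compare_exchange_weak(&vma->refcnt, &old, old + 1));
	return vma;
}

/* Drop a reference; the last put frees the object and returns true. */
static bool model_vma_put(struct model_vma *vma)
{
	unsigned int old = atomic_fetch_sub(&vma->refcnt, 1);

	assert(old != 0);	/* put without a matching get */
	if (old != 1)
		return false;
	model_frees++;
	free(vma);
	return true;
}
```

The boolean returned by the put mirrors vma_put(), which reports whether it released the VMA.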
diff --git a/include/linux/mm.h b/include/linux/mm.h
index af23453e9dbd..e98bdb414e43 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -918,6 +918,43 @@ static inline void assert_fault_locked(const struct vm_fault *vmf)
}
#endif /* CONFIG_PER_VMA_LOCK */
+#ifdef CONFIG_VMA_REF
+static inline void vma_rcuref_init(struct vm_area_struct *vma)
+{
+ rcuref_init(&vma->vm_rcuref, 1);
+}
+
+static inline struct vm_area_struct *vma_get(struct vm_area_struct *vma)
+{
+ if (rcuref_get(&vma->vm_rcuref))
+ return vma;
+ return NULL;
+}
+
+static inline bool vma_put(struct vm_area_struct *vma)
+{
+ bool release = rcuref_put(&vma->vm_rcuref);
+
+ if (unlikely(release))
+ vm_area_free(vma);
+ return release;
+}
+#else
+static inline void vma_rcuref_init(struct vm_area_struct *vma) {}
+
+static inline struct vm_area_struct *vma_get(struct vm_area_struct *vma)
+{
+ VM_WARN_ON_ONCE(true); /* not allowed */
+ return NULL;
+}
+
+static inline bool vma_put(struct vm_area_struct *vma)
+{
+ vm_area_free(vma);
+ return true;
+}
+#endif /* CONFIG_VMA_REF */
+
static inline bool mm_flags_test(int flag, const struct mm_struct *mm)
{
return test_bit(flag, ACCESS_PRIVATE(&mm->flags, __mm_flags));
@@ -957,6 +994,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
vma->vm_ops = &vma_dummy_vm_ops;
INIT_LIST_HEAD(&vma->anon_vma_chain);
vma_lock_init(vma, false);
+ vma_rcuref_init(vma);
}
/* Use when VMA is not part of the VMA tree and needs no locking */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index e7f5debac98e..a2bf17a42b55 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -6,6 +6,7 @@
#include <linux/auxvec.h>
#include <linux/kref.h>
+#include <linux/rcuref.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/rbtree.h>
@@ -978,6 +979,9 @@ struct vm_area_struct {
* slowpath.
*/
unsigned int vm_lock_seq;
+#endif
+#ifdef CONFIG_VMA_REF
+ rcuref_t vm_rcuref; /* Ensures the VMA stays valid. */
#endif
/*
* A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
diff --git a/mm/Kconfig b/mm/Kconfig
index c16b5d9b3ce9..c039ce583924 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1419,13 +1419,21 @@ config ANON_VMA_LAZY
bool "Lazy allocation of anon_vma"
def_bool y
depends on ARCH_SUPPORTS_ANON_VMA_LAZY && MMU
+ select VMA_REF
help
For anonymous VMAs without children, avoid allocating anon_vma
and anon_vma_chain to reduce memory overhead.
+ ANON_VMA_LAZY records the VMA in folio->mapping, while VMA_REF
+ ensures that the recorded VMA remains valid.
+
Say Y to enable this optimization for anonymous VMAs without
children.
+config VMA_REF
+ def_bool n
+ depends on MMU
+
config IOMMU_MM_DATA
bool
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 23dc3ee09561..cab8a4e71243 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -1036,7 +1036,7 @@ static void __init destroy_args(struct pgtable_debug_args *args)
/* Free vma and mm struct */
if (args->vma)
- vm_area_free(args->vma);
+ vma_put(args->vma);
if (args->mm)
mmput(args->mm);
diff --git a/mm/mmap.c b/mm/mmap.c
index 2ae733eb39f0..ccedebc87cd5 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1481,7 +1481,7 @@ static struct vm_area_struct *__install_special_mapping(
return vma;
out:
- vm_area_free(vma);
+ vma_put(vma);
return ERR_PTR(ret);
}
@@ -1922,7 +1922,7 @@ __latent_entropy int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
fail_nomem_anon_vma_fork:
mpol_put(vma_policy(tmp));
fail_nomem_policy:
- vm_area_free(tmp);
+ vma_put(tmp);
fail_nomem:
retval = -ENOMEM;
vm_unacct_memory(charge);
diff --git a/mm/vma.c b/mm/vma.c
index 3501617085b0..ed15968a5891 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -392,7 +392,7 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
mpol_put(vma_policy(vp->remove));
if (!vp->remove2)
WARN_ON_ONCE(vp->vma->vm_end < vp->remove->vm_end);
- vm_area_free(vp->remove);
+ vma_put(vp->remove);
/*
* In mprotect's case 6 (see comments on vma_merge),
@@ -470,7 +470,7 @@ void remove_vma(struct vm_area_struct *vma)
if (vma->vm_file)
fput(vma->vm_file);
mpol_put(vma_policy(vma));
- vm_area_free(vma);
+ vma_put(vma);
}
/*
@@ -582,7 +582,7 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
out_free_vmi:
vma_iter_free(vmi);
out_free_vma:
- vm_area_free(new);
+ vma_put(new);
return err;
}
@@ -1950,7 +1950,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
out_free_mempol:
mpol_put(vma_policy(new_vma));
out_free_vma:
- vm_area_free(new_vma);
+ vma_put(new_vma);
out:
return NULL;
}
@@ -2596,7 +2596,7 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap,
free_iter_vma:
vma_iter_free(vmi);
free_vma:
- vm_area_free(vma);
+ vma_put(vma);
return error;
}
@@ -2946,7 +2946,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
return 0;
mas_store_fail:
- vm_area_free(vma);
+ vma_put(vma);
unacct_fail:
vm_unacct_memory(len >> PAGE_SHIFT);
return -ENOMEM;
diff --git a/mm/vma_exec.c b/mm/vma_exec.c
index 5cee8b7efa0f..e7f388010488 100644
--- a/mm/vma_exec.c
+++ b/mm/vma_exec.c
@@ -160,6 +160,6 @@ int create_init_stack_vma(struct mm_struct *mm, struct vm_area_struct **vmap,
mmap_write_unlock(mm);
err_free:
*vmap = NULL;
- vm_area_free(vma);
+ vma_put(vma);
return err;
}
diff --git a/mm/vma_init.c b/mm/vma_init.c
index 3c0b65950510..1300d813d61b 100644
--- a/mm/vma_init.c
+++ b/mm/vma_init.c
@@ -137,6 +137,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
INIT_LIST_HEAD(&new->anon_vma_chain);
vma_numab_state_init(new);
dup_anon_vma_name(orig, new);
+ vma_rcuref_init(new);
return new;
}
--
2.17.1
* [PATCH 07/15] mm: replace direct FOLIO_MAPPING_ANON usage with helpers
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
` (5 preceding siblings ...)
2026-05-27 11:01 ` [PATCH 06/15] mm: add CONFIG_VMA_REF and VMA helpers tao
@ 2026-05-27 11:01 ` tao
2026-05-27 11:01 ` [PATCH 08/15] mm: prepare rmap infrastructure for ANON_VMA_LAZY tao
` (12 subsequent siblings)
19 siblings, 0 replies; 64+ messages in thread
From: tao @ 2026-05-27 11:01 UTC (permalink / raw)
To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, tao.wangtao, kas,
ljs
Cc: hpa, liam, vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh,
jgg, jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
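Replace direct uses of FOLIO_MAPPING_ANON outside the folio mapping
helpers (fs/proc/page.c, pagemap.h and GUP) with helper functions, in
preparation for ANON_VMA_LAZY, where an anonymous folio may carry either
FOLIO_MAPPING_ANON or FOLIO_MAPPING_ANON_VMA_LAZY.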
Signed-off-by: tao <tao.wangtao@honor.com>
---
fs/proc/page.c | 6 ++----
include/linux/page-flags.h | 15 ++++++++++++---
include/linux/pagemap.h | 2 +-
mm/gup.c | 6 ++----
4 files changed, 17 insertions(+), 12 deletions(-)
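The low-bit classification these helpers perform can be checked with a small user-space sketch. Flag values are assumed to match include/linux/page-flags.h (FOLIO_MAPPING_ANON = 0x1, FOLIO_MAPPING_ANON_KSM = 0x2, FOLIO_MAPPING_KSM = both bits), with FOLIO_MAPPING_ANON_VMA_LAZY aliased to FOLIO_MAPPING_ANON_KSM as in this series; the lazy_* function names are hypothetical. Once lazy folios exist, comparing the low bits against FOLIO_MAPPING_KSM alone would also accept file-backed folios (low bits 0x0), so an "anonymous but not KSM" test has to check anonymity as well.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match include/linux/page-flags.h. */
#define FOLIO_MAPPING_ANON		0x1UL
#define FOLIO_MAPPING_ANON_KSM		0x2UL
#define FOLIO_MAPPING_KSM		(FOLIO_MAPPING_ANON | FOLIO_MAPPING_ANON_KSM)
#define FOLIO_MAPPING_FLAGS		(FOLIO_MAPPING_ANON | FOLIO_MAPPING_ANON_KSM)
/* With ANON_VMA_LAZY, the otherwise unused value 0x2 marks a lazy folio. */
#define FOLIO_MAPPING_ANON_VMA_LAZY	FOLIO_MAPPING_ANON_KSM

/* Mirrors mapping_is_anon() with CONFIG_ANON_VMA_LAZY=y. */
static bool lazy_mapping_is_anon(unsigned long mapping)
{
	return (mapping & FOLIO_MAPPING_FLAGS) != 0;
}

/* Anonymous but not KSM: plain anon (0x1) or lazy anon (0x2). */
static bool lazy_anon_not_ksm(unsigned long mapping)
{
	return lazy_mapping_is_anon(mapping) &&
	       (mapping & FOLIO_MAPPING_FLAGS) != FOLIO_MAPPING_KSM;
}
```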
diff --git a/fs/proc/page.c b/fs/proc/page.c
index f9b2c2c906cd..93ddfda9fa1d 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -148,7 +148,6 @@ u64 stable_page_flags(const struct page *page)
const struct folio *folio;
struct page_snapshot ps;
unsigned long k;
- unsigned long mapping;
bool is_anon;
u64 u = 0;
@@ -163,8 +162,7 @@ u64 stable_page_flags(const struct page *page)
folio = &ps.folio_snapshot;
k = folio->flags.f;
- mapping = (unsigned long)folio->mapping;
- is_anon = mapping & FOLIO_MAPPING_ANON;
+ is_anon = folio_test_anon(folio);
/*
* pseudo flags for the well known (anonymous) memory mapped pages
@@ -173,7 +171,7 @@ u64 stable_page_flags(const struct page *page)
u |= 1 << KPF_MMAP;
if (is_anon) {
u |= 1 << KPF_ANON;
- if (mapping & FOLIO_MAPPING_KSM)
+ if (folio_test_ksm(folio))
u |= 1 << KPF_KSM;
}
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index c0cc43118877..50c80a1e2c7c 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -720,15 +720,20 @@ PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
#define FOLIO_MAPPING_ANON_VMA_LAZY FOLIO_MAPPING_ANON_KSM
#define FOLIO_MAPPING_FLAGS (FOLIO_MAPPING_ANON | FOLIO_MAPPING_ANON_KSM)
-static __always_inline bool folio_test_anon(const struct folio *folio)
+static __always_inline bool mapping_is_anon(unsigned long mapping)
{
#ifdef CONFIG_ANON_VMA_LAZY
- return ((unsigned long)folio->mapping & FOLIO_MAPPING_FLAGS) != 0;
+ return (mapping & FOLIO_MAPPING_FLAGS) != 0;
#else
- return ((unsigned long)folio->mapping & FOLIO_MAPPING_ANON) != 0;
+ return (mapping & FOLIO_MAPPING_ANON) != 0;
#endif
}
+static __always_inline bool folio_test_anon(const struct folio *folio)
+{
+ return mapping_is_anon((unsigned long)folio->mapping);
+}
+
static __always_inline bool folio_test_lazyfree(const struct folio *folio)
{
return folio_test_anon(folio) && !folio_test_swapbacked(folio);
@@ -738,7 +743,11 @@ static __always_inline bool PageAnonNotKsm(const struct page *page)
{
unsigned long flags = (unsigned long)page_folio(page)->mapping;
+#ifdef CONFIG_ANON_VMA_LAZY
+ return mapping_is_anon(flags) && (flags & FOLIO_MAPPING_FLAGS) != FOLIO_MAPPING_KSM;
+#else
return (flags & FOLIO_MAPPING_FLAGS) == FOLIO_MAPPING_ANON;
+#endif
}
static __always_inline bool PageAnon(const struct page *page)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 31a848485ad9..746939872ac4 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -507,7 +507,7 @@ static inline pgoff_t mapping_align_index(const struct address_space *mapping,
static inline bool mapping_large_folio_support(const struct address_space *mapping)
{
/* AS_FOLIO_ORDER is only reasonable for pagecache folios */
- VM_WARN_ONCE((unsigned long)mapping & FOLIO_MAPPING_ANON,
+ VM_WARN_ONCE(mapping_is_anon((unsigned long)mapping),
"Anonymous mapping always supports large folio");
return mapping_max_folio_order(mapping) > 0;
diff --git a/mm/gup.c b/mm/gup.c
index ad9ded39609c..69dda325b082 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2740,7 +2740,6 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
bool reject_file_backed = false;
struct address_space *mapping;
bool check_secretmem = false;
- unsigned long mapping_flags;
/*
* If we aren't pinning then no problematic write can occur. A long term
@@ -2792,9 +2791,8 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
return false;
/* Anonymous folios pose no problem. */
- mapping_flags = (unsigned long)mapping & FOLIO_MAPPING_FLAGS;
- if (mapping_flags)
- return mapping_flags & FOLIO_MAPPING_ANON;
+ if (mapping_is_anon((unsigned long)mapping))
+ return true;
/*
* At this point, we know the mapping is non-null and points to an
--
2.17.1
* [PATCH 08/15] mm: prepare rmap infrastructure for ANON_VMA_LAZY
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
` (6 preceding siblings ...)
2026-05-27 11:01 ` [PATCH 07/15] mm: replace direct FOLIO_MAPPING_ANON usage with helpers tao
@ 2026-05-27 11:01 ` tao
2026-05-27 11:01 ` [PATCH 09/15] mm: implement ANON_VMA_LAZY rmap semantics tao
` (11 subsequent siblings)
19 siblings, 0 replies; 64+ messages in thread
From: tao @ 2026-05-27 11:01 UTC (permalink / raw)
To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, tao.wangtao, kas,
ljs
Cc: hpa, liam, vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh,
jgg, jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
Introduce ANON_VMA_LAZY helpers and prepare the anon_rmap and
anon_vma_tree infrastructure for the upcoming ANON_VMA_LAZY feature.
Implement the core ANON_VMA_LAZY rmap semantics by updating
anon_rmap_trylock_read(), anon_rmap_lock_read(), anon_rmap_unlock_read(),
and anon_rmap_for_each_vma().
Also update __migrate_folio_record(): the low bits of dst->private hold
old_page_state and would clobber the anon_rmap type tag, so keep
old_page_state (plus the untagged pointer, used as a consistency check)
in dst->private and store the full tagged anon_rmap in dst->mapping.
folio_lock_anon_rmap_read() and related functions are converted in the
next patch to keep this change small and easier to review.
Signed-off-by: tao <tao.wangtao@honor.com>
---
include/linux/rmap.h | 53 +++++++++++++++++++++---
mm/internal.h | 99 +++++++++++++++++++++++++++++++++++++-------
mm/migrate.c | 11 ++++-
mm/rmap.c | 42 +++++++++++++++++++
4 files changed, 183 insertions(+), 22 deletions(-)
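The anon_rmap_t tagging added here, and the pgoff overlap test that a lazy anon_rmap needs in place of the interval tree, can be sketched in user space. The sketch assumes, as in this patch, a one-bit type tag (0 for anon_vma, 1 for a lazy VMA) added to a pointer whose bit 0 is free; the range [start, last] is treated as inclusive, like the interval tree query. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum model_rmap_type { RMAP_ANON_VMA = 0, RMAP_ANON_VMA_LAZY = 1 };
#define RMAP_TYPE_MASK ((uintptr_t)1)

typedef struct { uintptr_t rmap; } model_rmap_t;

static model_rmap_t model_make_rmap(const void *p, enum model_rmap_type type)
{
	assert(((uintptr_t)p & RMAP_TYPE_MASK) == 0);	/* tag bit must be free */
	return (model_rmap_t){ .rmap = (uintptr_t)p + type };
}

static bool model_rmap_is_anon_vma(model_rmap_t r)
{
	return (r.rmap & RMAP_TYPE_MASK) == RMAP_ANON_VMA;
}

static void *model_rmap_ptr(model_rmap_t r)
{
	return (void *)(r.rmap & ~RMAP_TYPE_MASK);
}

/*
 * Does a VMA covering pgoffs [vm_pgoff, vm_pgoff + nr_pages - 1]
 * intersect the inclusive query range [start, last]?
 */
static bool model_vma_overlaps(unsigned long vm_pgoff, unsigned long nr_pages,
			       unsigned long start, unsigned long last)
{
	return !(vm_pgoff + nr_pages <= start || vm_pgoff > last);
}
```

Because the VMA's last pgoff is vm_pgoff + nr_pages - 1, the "entirely before start" case is vm_pgoff + nr_pages <= start; using < would report an overlap when the VMA ends exactly one page before start.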
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 9802bce92695..ebe9f3f61170 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -938,15 +938,23 @@ void remove_migration_ptes(struct folio *src, struct folio *dst,
enum ttu_flags flags);
/* Reverse mapping handle for anonymous folio rmap helpers. */
+enum anon_rmap_type {
+ ANON_RMAP_ANON_VMA = 0,
+ ANON_RMAP_ANON_VMA_LAZY = 1,
+};
+#define ANON_RMAP_TYPE_BITS 1
+#define ANON_RMAP_TYPE_MASK ((1UL << ANON_RMAP_TYPE_BITS) - 1)
+
typedef struct anon_rmap {
unsigned long rmap;
} anon_rmap_t;
-#define ANON_RMAP_NULL make_anon_rmap(0)
+#define ANON_RMAP_NULL (make_anon_rmap(0, ANON_RMAP_ANON_VMA))
-static inline anon_rmap_t make_anon_rmap(const void *anon_mapping)
+static inline anon_rmap_t make_anon_rmap(const void *anon_mapping,
+ enum anon_rmap_type type)
{
- return (anon_rmap_t){ .rmap = (unsigned long)anon_mapping, };
+ return (anon_rmap_t){ .rmap = (unsigned long)anon_mapping + type, };
}
static inline unsigned long anon_rmap_value(anon_rmap_t anon_rmap)
@@ -956,14 +964,38 @@ static inline unsigned long anon_rmap_value(anon_rmap_t anon_rmap)
static inline anon_rmap_t anon_vma_to_anon_rmap(const struct anon_vma *anon_vma)
{
- return make_anon_rmap(anon_vma);
+ return make_anon_rmap(anon_vma, ANON_RMAP_ANON_VMA);
}
static inline struct anon_vma *anon_rmap_to_anon_vma(anon_rmap_t anon_rmap)
{
unsigned long rmap = anon_rmap_value(anon_rmap);
- return (struct anon_vma *)rmap;
+ return (struct anon_vma *)(rmap - ANON_RMAP_ANON_VMA);
+}
+
+static inline anon_rmap_t vma_to_anon_rmap(const struct vm_area_struct *vma)
+{
+ return make_anon_rmap(vma, ANON_RMAP_ANON_VMA_LAZY);
+}
+
+static inline struct vm_area_struct *anon_rmap_to_vma(anon_rmap_t anon_rmap)
+{
+ unsigned long rmap = anon_rmap_value(anon_rmap);
+
+ VM_BUG_ON((rmap & ANON_RMAP_TYPE_MASK) != ANON_RMAP_ANON_VMA_LAZY);
+ return (struct vm_area_struct *)(rmap - ANON_RMAP_ANON_VMA_LAZY);
+}
+
+static inline bool anon_rmap_is_anon_vma(anon_rmap_t anon_rmap)
+{
+#ifdef CONFIG_ANON_VMA_LAZY
+ unsigned long rmap = anon_rmap_value(anon_rmap);
+
+ return (rmap & ANON_RMAP_TYPE_MASK) == ANON_RMAP_ANON_VMA;
+#else
+ return true;
+#endif
}
anon_rmap_t vma_get_anon_rmap(struct vm_area_struct *vma);
@@ -1015,8 +1047,17 @@ static inline struct vm_area_struct *anon_rmap_iter_first_vma(
anon_rmap_t anon_rmap, unsigned long start, unsigned long last,
struct anon_vma_chain **avc)
{
- struct anon_vma *anon_vma = anon_rmap_to_anon_vma(anon_rmap);
+ struct anon_vma *anon_vma;
+
+ *avc = NULL;
+ if (!anon_rmap_is_anon_vma(anon_rmap)) {
+ struct vm_area_struct *vma = anon_rmap_to_vma(anon_rmap);
+ if (vma->vm_pgoff + vma_pages(vma) <= start || vma->vm_pgoff > last)
+ return NULL; /* No overlap in the VMA range. */
+ return vma;
+ }
+ anon_vma = anon_rmap_to_anon_vma(anon_rmap);
*avc = anon_vma_interval_tree_iter_first(&anon_vma->rb_root, start, last);
return *avc ? (*avc)->vma : NULL;
}
diff --git a/mm/internal.h b/mm/internal.h
index 639f9c287f4c..6b703646f66d 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -260,76 +260,147 @@ static inline void anon_vma_unlock_read(struct anon_vma *anon_vma)
#ifdef CONFIG_ANON_VMA_LAZY
extern bool anon_vma_lazy_enable;
static inline bool anon_vma_lazy_enabled(void) { return anon_vma_lazy_enable; }
-#else
-static inline bool anon_vma_lazy_enabled(void) { return false; }
-#endif
-static inline anon_vma_tree_t make_anon_vma_tree(struct anon_vma *anon_vma)
+static inline int anon_vma_tree_type(anon_vma_tree_t anon_tree)
{
- return (anon_vma_tree_t)anon_vma;
+ VM_WARN_ON(((unsigned long)anon_tree & ANON_VMA_TREE_MASK) ==
+ ANON_VMA_TREE_INVALID);
+ return (unsigned long)anon_tree & ANON_VMA_TREE_MASK;
+}
+
+static inline bool anon_vma_tree_is_vma(anon_vma_tree_t anon_tree)
+{
+ return anon_vma_tree_type(anon_tree) == ANON_VMA_TREE_VMA;
+}
+
+static inline bool anon_vma_tree_is_parent(anon_vma_tree_t anon_tree)
+{
+ return anon_vma_tree_type(anon_tree) == ANON_VMA_TREE_PARENT;
+}
+
+static inline struct vm_area_struct *anon_vma_tree_vma(anon_vma_tree_t anon_tree)
+{
+ BUILD_BUG_ON(__alignof__(struct vm_area_struct) <= ANON_VMA_TREE_MASK);
+ if (!anon_vma_tree_is_vma(anon_tree))
+ return NULL;
+ return (struct vm_area_struct *)(
+ (unsigned long)anon_tree & ~ANON_VMA_TREE_MASK);
}
static inline struct anon_vma *anon_vma_tree_anon_vma(anon_vma_tree_t anon_tree)
{
- return (struct anon_vma *)anon_tree;
+ BUILD_BUG_ON(__alignof__(struct anon_vma) <= ANON_VMA_TREE_MASK);
+ if (anon_vma_tree_is_vma(anon_tree))
+ return NULL;
+ return (struct anon_vma *)((unsigned long)anon_tree & ~ANON_VMA_TREE_MASK);
+}
+
+#else
+static inline bool anon_vma_lazy_enabled(void) { return false; }
+static inline int anon_vma_tree_type(anon_vma_tree_t anon_tree) { return 0; }
+static inline bool anon_vma_tree_is_vma(anon_vma_tree_t anon_tree) { return false; }
+static inline bool anon_vma_tree_is_parent(
+ anon_vma_tree_t anon_tree) { return false; }
+static inline struct vm_area_struct *anon_vma_tree_vma(
+ anon_vma_tree_t anon_tree) { return NULL; }
+static inline struct anon_vma *anon_vma_tree_anon_vma(
+ anon_vma_tree_t anon_tree) { return (struct anon_vma *)anon_tree; }
+#endif
+
+static inline anon_vma_tree_t make_anon_vma_tree(const struct anon_vma *anon_vma)
+{
+ return (anon_vma_tree_t)anon_vma;
}
/* Store anon_vma in vma->anon_vma using a tagged pointer. */
static inline void vma_set_anon_vma(struct vm_area_struct *vma,
- struct anon_vma *anon_vma)
+ const struct anon_vma *anon_vma)
{
vma->anon_vma = (anon_vma_tree_t)anon_vma;
}
-/* Return the VMA's anon_vma. */
+/* Return the VMA's anon_vma, or NULL if it is marked lazy. */
static inline struct anon_vma *vma_anon_vma(const struct vm_area_struct *vma)
{
/* Use READ_ONCE() for reusable_anon_vma */
anon_vma_tree_t anon_tree = READ_ONCE(vma->anon_vma);
+ if (anon_vma_tree_type(anon_tree) != ANON_VMA_TREE_REGULAR)
+ return NULL;
return anon_vma_tree_anon_vma(anon_tree);
}
+static inline bool vma_is_anon_vma_lazy(const struct vm_area_struct *vma)
+{
+ return anon_vma_tree_type((anon_vma_tree_t)vma->anon_vma);
+}
+
+static inline const struct vm_area_struct *vma_anon_vma_lazy_root(
+ const struct vm_area_struct *vma)
+{
+ anon_vma_tree_t anon_tree = (anon_vma_tree_t)vma->anon_vma;
+ int lazy_type = anon_vma_tree_type(anon_tree);
+
+ if (!lazy_type)
+ return NULL;
+ if (anon_vma_tree_is_parent(anon_tree))
+ return vma;
+ return anon_vma_tree_vma(anon_tree);
+}
+
+static inline bool vma_is_anon_vma_lazy_root(const struct vm_area_struct *vma)
+{
+ return vma == vma_anon_vma_lazy_root(vma);
+}
+
+/*
+ * ANON_VMA_TREE_VMA is a bare VMA with no anon_vma or anon_vma_chain, so
+ * there is no anon_vma lock to take and the helpers below are no-ops for it.
+ */
static inline void anon_vma_tree_lock_write(anon_vma_tree_t anon_tree)
{
struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
- anon_vma_lock_write(anon_vma);
+ if (anon_vma)
+ anon_vma_lock_write(anon_vma);
}
static inline int anon_vma_tree_trylock_write(anon_vma_tree_t anon_tree)
{
struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
- return anon_vma_trylock_write(anon_vma);
+ return anon_vma ? anon_vma_trylock_write(anon_vma) : 1;
}
static inline void anon_vma_tree_unlock_write(anon_vma_tree_t anon_tree)
{
struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
- anon_vma_unlock_write(anon_vma);
+ if (anon_vma)
+ anon_vma_unlock_write(anon_vma);
}
static inline void anon_vma_tree_lock_read(anon_vma_tree_t anon_tree)
{
struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
- anon_vma_lock_read(anon_vma);
+ if (anon_vma)
+ anon_vma_lock_read(anon_vma);
}
static inline int anon_vma_tree_trylock_read(anon_vma_tree_t anon_tree)
{
struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
- return anon_vma_trylock_read(anon_vma);
+ return anon_vma ? anon_vma_trylock_read(anon_vma) : 1;
}
static inline void anon_vma_tree_unlock_read(anon_vma_tree_t anon_tree)
{
struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
- anon_vma_unlock_read(anon_vma);
+ if (anon_vma)
+ anon_vma_unlock_read(anon_vma);
}
struct anon_vma *folio_get_anon_vma(const struct folio *folio);
diff --git a/mm/migrate.c b/mm/migrate.c
index 769983cf14e0..b397cdeab09a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1144,7 +1144,10 @@ static void __migrate_folio_record(struct folio *dst,
int old_page_state,
anon_rmap_t anon_rmap)
{
- dst->private = (void *)anon_rmap_to_anon_vma(anon_rmap) + old_page_state;
+ unsigned long rmap = anon_rmap_value(anon_rmap);
+
+ dst->private = (void *)(rmap & ~PAGE_OLD_STATES) + old_page_state;
+ dst->mapping = (struct address_space *)rmap;
}
static void __migrate_folio_extract(struct folio *dst,
@@ -1152,8 +1155,12 @@ static void __migrate_folio_extract(struct folio *dst,
anon_rmap_t *anon_rmapp)
{
unsigned long private = (unsigned long)dst->private;
+ unsigned long mapping = (unsigned long)dst->mapping;
- *anon_rmapp = anon_vma_to_anon_rmap((void *)(private & ~PAGE_OLD_STATES));
+ VM_BUG_ON((private & ~PAGE_OLD_STATES) != (mapping & ~ANON_RMAP_TYPE_MASK));
+ *anon_rmapp = make_anon_rmap((void *)(mapping & ~ANON_RMAP_TYPE_MASK),
+ mapping & ANON_RMAP_TYPE_MASK);
+ dst->mapping = NULL;
*old_page_state = private & PAGE_OLD_STATES;
dst->private = NULL;
}
diff --git a/mm/rmap.c b/mm/rmap.c
index 48c4463d8b2c..001c44570df8 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -794,42 +794,84 @@ anon_rmap_t vma_get_anon_rmap(struct vm_area_struct *vma)
mmap_assert_locked(vma->vm_mm);
VM_BUG_ON(!vma->anon_vma);
+ if (!anon_vma) {
+ vma_get(vma);
+ return vma_to_anon_rmap(vma);
+ }
get_anon_vma(anon_vma);
return anon_vma_to_anon_rmap(anon_vma);
}
void put_anon_rmap(anon_rmap_t anon_rmap)
{
+ if (!anon_rmap_is_anon_vma(anon_rmap)) {
+ vma_put(anon_rmap_to_vma(anon_rmap));
+ return;
+ }
put_anon_vma(anon_rmap_to_anon_vma(anon_rmap));
}
+/*
+ * Anonymous rmap walks normally need only read protection. Splitting a
+ * huge page (mm/huge_memory.c) instead takes the rmap write lock to
+ * exclude concurrent walkers; a lazy anon_rmap has no lock, so such
+ * callers must first upgrade the VMA to a regular anon_vma.
+ */
void anon_rmap_lock_write(anon_rmap_t anon_rmap)
{
+ VM_BUG_ON(!anon_rmap_is_anon_vma(anon_rmap));
anon_vma_lock_write(anon_rmap_to_anon_vma(anon_rmap));
}
int anon_rmap_trylock_write(anon_rmap_t anon_rmap)
{
+ VM_BUG_ON(!anon_rmap_is_anon_vma(anon_rmap));
return anon_vma_trylock_write(anon_rmap_to_anon_vma(anon_rmap));
}
void anon_rmap_unlock_write(anon_rmap_t anon_rmap)
{
+ VM_BUG_ON(!anon_rmap_is_anon_vma(anon_rmap));
anon_vma_unlock_write(anon_rmap_to_anon_vma(anon_rmap));
}
+static void anon_vma_lazy_lock_read(struct vm_area_struct *vma)
+{
+ vma_get(vma);
+}
+
+static bool anon_vma_lazy_trylock_read(struct vm_area_struct *vma)
+{
+ return vma_get(vma) != NULL;
+}
+
+static void anon_vma_lazy_unlock_read(struct vm_area_struct *vma)
+{
+ vma_put(vma);
+}
+
void anon_rmap_lock_read(anon_rmap_t anon_rmap)
{
+ if (!anon_rmap_is_anon_vma(anon_rmap)) {
+ anon_vma_lazy_lock_read(anon_rmap_to_vma(anon_rmap));
+ return;
+ }
anon_vma_lock_read(anon_rmap_to_anon_vma(anon_rmap));
}
int anon_rmap_trylock_read(anon_rmap_t anon_rmap)
{
+ if (!anon_rmap_is_anon_vma(anon_rmap))
+ return anon_vma_lazy_trylock_read(anon_rmap_to_vma(anon_rmap));
return anon_vma_trylock_read(anon_rmap_to_anon_vma(anon_rmap));
}
void anon_rmap_unlock_read(anon_rmap_t anon_rmap)
{
+ if (!anon_rmap_is_anon_vma(anon_rmap)) {
+ anon_vma_lazy_unlock_read(anon_rmap_to_vma(anon_rmap));
+ return;
+ }
anon_vma_unlock_read(anon_rmap_to_anon_vma(anon_rmap));
}
--
2.17.1
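The __migrate_folio_record()/__migrate_folio_extract() change in this patch can be modelled as a round trip through two fields. The sketch assumes PAGE_OLD_STATES covers the two low bits (PAGE_WAS_MAPPED = 0x1, PAGE_WAS_MLOCKED = 0x2), which is presumably why the anon_rmap type tag (bit 0) cannot survive in dst->private and is kept in dst->mapping instead. The struct and the model_* names are hypothetical stand-ins for the folio fields.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the migrate.c state flags: two low bits. */
#define MODEL_PAGE_WAS_MAPPED	((uintptr_t)0x1)
#define MODEL_PAGE_WAS_MLOCKED	((uintptr_t)0x2)
#define MODEL_PAGE_OLD_STATES	(MODEL_PAGE_WAS_MAPPED | MODEL_PAGE_WAS_MLOCKED)
#define MODEL_RMAP_TYPE_MASK	((uintptr_t)0x1)

/* Stand-in for the two dst folio fields used while migration is in flight. */
struct model_dst {
	uintptr_t private;
	uintptr_t mapping;
};

static void model_record(struct model_dst *dst, unsigned long state,
			 uintptr_t rmap)
{
	assert((state & ~MODEL_PAGE_OLD_STATES) == 0);
	/* The state bits overlap the rmap tag bit, so the tag goes in ->mapping. */
	dst->private = (rmap & ~MODEL_PAGE_OLD_STATES) + state;
	dst->mapping = rmap;
}

static void model_extract(struct model_dst *dst, unsigned long *state,
			  uintptr_t *rmap)
{
	/* Both fields must describe the same, suitably aligned object. */
	assert((dst->private & ~MODEL_PAGE_OLD_STATES) ==
	       (dst->mapping & ~MODEL_RMAP_TYPE_MASK));
	*rmap = dst->mapping;
	*state = (unsigned long)(dst->private & MODEL_PAGE_OLD_STATES);
	dst->private = 0;
	dst->mapping = 0;
}
```

The consistency assertion only holds if the tagged object is at least 4-byte aligned, since private masks two bits while mapping masks one.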
* [PATCH 09/15] mm: implement ANON_VMA_LAZY rmap semantics
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
` (7 preceding siblings ...)
2026-05-27 11:01 ` [PATCH 08/15] mm: prepare rmap infrastructure for ANON_VMA_LAZY tao
@ 2026-05-27 11:01 ` tao
2026-05-27 11:01 ` [PATCH 10/15] mm: defer anon_vma creation with ANON_VMA_LAZY tao
` (10 subsequent siblings)
19 siblings, 0 replies; 64+ messages in thread
From: tao @ 2026-05-27 11:01 UTC (permalink / raw)
To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, tao.wangtao, kas,
ljs
Cc: hpa, liam, vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh,
jgg, jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
Implement ANON_VMA_LAZY anon_rmap semantics by updating
folio_anon_rmap(), folio_maybe_same_anon_vma(), folio_get_anon_rmap(),
and folio_lock_anon_rmap_read().
For ANON_VMA_LAZY folios, the target VMA is resolved through the lazy
root VMA recorded in folio->mapping. As this path does not involve
anon_vma topology, a reference taken with vma_get() is sufficient to
keep the VMA from being freed.
Signed-off-by: tao <tao.wangtao@honor.com>
---
mm/rmap.c | 126 +++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 120 insertions(+), 6 deletions(-)
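The fallback in folio_resolve_anon_vma_lazy(), taken when the folio's index no longer falls inside the lazy root VMA, can be sketched in user space: the address is derived from the root's linear mapping (vm_start plus the pgoff delta shifted by PAGE_SHIFT) and then looked up among the process's VMAs. The sketch assumes 4 KiB pages and replaces vma_lookup() with a linear scan over a hypothetical array; locking, RCU and reference counting are omitted.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

struct model_vma {
	unsigned long vm_start, vm_end;	/* [vm_start, vm_end) */
	unsigned long vm_pgoff;
};

/* Address of @pgoff if it follows @vma's linear pgoff-to-address mapping. */
static unsigned long model_pgoff_address(const struct model_vma *vma,
					 unsigned long pgoff)
{
	return vma->vm_start + ((pgoff - vma->vm_pgoff) << MODEL_PAGE_SHIFT);
}

/* Stand-in for vma_lookup(): the VMA containing @addr, if any. */
static const struct model_vma *model_lookup(const struct model_vma *vmas,
					    size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= vmas[i].vm_start && addr < vmas[i].vm_end)
			return &vmas[i];
	return NULL;
}

/*
 * Resolve the VMA mapping @nr pages at @pgoff: use the lazy root if the
 * range fits in it, otherwise derive the address from the root's linear
 * mapping and look it up (e.g. after the root was split).
 */
static const struct model_vma *model_resolve(const struct model_vma *root,
					     const struct model_vma *vmas,
					     size_t n, unsigned long pgoff,
					     unsigned long nr)
{
	unsigned long addr = model_pgoff_address(root, pgoff);

	if (pgoff >= root->vm_pgoff &&
	    addr + (nr << MODEL_PAGE_SHIFT) <= root->vm_end)
		return root;
	return model_lookup(vmas, n, addr);
}
```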
diff --git a/mm/rmap.c b/mm/rmap.c
index 001c44570df8..f70e3cb9812e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -875,9 +875,97 @@ void anon_rmap_unlock_read(anon_rmap_t anon_rmap)
anon_vma_unlock_read(anon_rmap_to_anon_vma(anon_rmap));
}
+static inline bool test_folio_unmapped(const struct folio *folio, bool test)
+{
+ return test && !folio_mapped(folio);
+}
+
+/*
+ * Must be called under rcu_read_lock().
+ *
+ * For FOLIO_MAPPING_ANON_VMA_LAZY, take a reference on the lazy root VMA
+ * recorded in folio->mapping. If the folio lies outside that VMA (e.g.
+ * after a split), look up the VMA covering it via the root's mapping.
+ * With @tryget the result holds a reference dropped by vma_put().
+ */
+static struct vm_area_struct *folio_resolve_anon_vma_lazy(
+ const struct folio *folio, bool tryget, bool test_map)
+{
+ struct vm_area_struct *vma, *anon_lazy_root;
+ struct mm_struct *mm;
+ unsigned long anon_mapping;
+ pgoff_t pgoff;
+ unsigned long addr;
+
+ anon_mapping = (unsigned long)READ_ONCE(folio->mapping);
+ if ((anon_mapping & FOLIO_MAPPING_FLAGS) != FOLIO_MAPPING_ANON_VMA_LAZY)
+ return NULL;
+ if (test_folio_unmapped(folio, test_map))
+ return NULL;
+
+ anon_lazy_root = vma = (struct vm_area_struct *)(anon_mapping -
+ FOLIO_MAPPING_ANON_VMA_LAZY);
+ mm = vma->vm_mm;
+ if (!mm || !vma->anon_vma || !vma_get(anon_lazy_root))
+ return NULL;
+ pgoff = folio->index;
+ if (vma_address(vma, pgoff, folio_nr_pages(folio)) == -EFAULT) {
+ addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ vma = vma_lookup(mm, addr);
+ if (vma && tryget && !vma_get(vma))
+ vma = NULL;
+ }
+ if (!tryget || anon_lazy_root != vma)
+ vma_put(anon_lazy_root);
+ if (tryget && vma && test_folio_unmapped(folio, test_map)) {
+ vma_put(vma);
+ vma = NULL;
+ }
+ return vma;
+}
+
+/* Like folio_get_anon_vma(), but for ANON_VMA_LAZY VMAs. */
+static struct vm_area_struct *folio_get_anon_vma_lazy(const struct folio *folio)
+{
+ struct vm_area_struct *vma = NULL;
+
+ rcu_read_lock();
+ vma = folio_resolve_anon_vma_lazy(folio, true, true);
+ rcu_read_unlock();
+ return vma;
+}
+
+/*
+ * For ANON_VMA_LAZY VMAs, similar to folio_get_anon_vma_lazy().
+ *
+ * These VMAs do not have an anon_vma or anon_vma_chain and correspond
+ * to only a single VMA. Therefore, reverse mapping can be performed
+ * without taking the anon_vma lock, providing a faster rmap path for
+ * this common case.
+ */
+static struct vm_area_struct *folio_lock_anon_vma_lazy_read(
+ const struct folio *folio, struct rmap_walk_control *rwc, bool test_map)
+{
+ struct vm_area_struct *vma = NULL;
+
+ rcu_read_lock();
+ vma = folio_resolve_anon_vma_lazy(folio, true, test_map);
+ rcu_read_unlock();
+ return vma;
+}
+
static anon_rmap_t folio_anon_rmap(const struct folio *folio)
{
struct anon_vma *anon_vma;
+ struct vm_area_struct *vma;
+
+ if (folio_test_anon_vma_lazy(folio)) {
+ rcu_read_lock();
+ vma = folio_resolve_anon_vma_lazy(folio, false, false);
+ rcu_read_unlock();
+ if (vma)
+ return vma_to_anon_rmap(vma);
+ }
anon_vma = folio_anon_vma(folio);
return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
@@ -887,29 +975,49 @@ bool folio_maybe_same_anon_vma(const struct folio *folio,
const struct vm_area_struct *vma)
{
struct anon_vma *anon_vma;
- struct anon_vma *tgt_anon_vma = vma_anon_vma(vma);
+ struct anon_vma *tgt_anon_vma = anon_vma_tree_anon_vma(vma->anon_vma);
bool same = false;
rcu_read_lock();
- anon_vma = folio_anon_vma(folio);
- if (anon_vma && tgt_anon_vma)
- same = anon_vma->root == tgt_anon_vma->root;
+ if (folio_test_anon_vma_lazy(folio)) {
+ same = vma == folio_resolve_anon_vma_lazy(folio, false, false);
+ } else {
+ anon_vma = folio_anon_vma(folio);
+ if (anon_vma && tgt_anon_vma)
+ same = anon_vma->root == tgt_anon_vma->root;
+ }
rcu_read_unlock();
return same;
}
anon_rmap_t folio_get_anon_rmap(const struct folio *folio)
{
- struct anon_vma *anon_vma = folio_get_anon_vma(folio);
+ struct anon_vma *anon_vma;
+ struct vm_area_struct *vma;
+
+ if (folio_test_anon_vma_lazy(folio)) {
+ vma = folio_get_anon_vma_lazy(folio);
+ if (vma)
+ return vma_to_anon_rmap(vma);
+ }
+ anon_vma = folio_get_anon_vma(folio);
return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
}
anon_rmap_t folio_lock_anon_rmap_read(const struct folio *folio,
struct rmap_walk_control *rwc)
{
- struct anon_vma *anon_vma = folio_lock_anon_vma_read(folio, rwc);
+ struct anon_vma *anon_vma;
+ struct vm_area_struct *vma;
+
+ if (folio_test_anon_vma_lazy(folio)) {
+ vma = folio_lock_anon_vma_lazy_read(folio, rwc, true);
+ if (vma)
+ return vma_to_anon_rmap(vma);
+ }
+ anon_vma = folio_lock_anon_vma_read(folio, rwc);
return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
}
@@ -3140,6 +3248,12 @@ static anon_rmap_t rmap_walk_anon_lock(const struct folio *folio,
* are holding mmap_lock. Users without mmap_lock are required to
* take a reference count to prevent the anon_vma disappearing
*/
+ if (folio_test_anon_vma_lazy(folio)) {
+ struct vm_area_struct *vma;
+
+ vma = folio_lock_anon_vma_lazy_read(folio, rwc, false);
+ return vma ? vma_to_anon_rmap(vma) : ANON_RMAP_NULL;
+ }
anon_vma = folio_anon_vma(folio);
if (!anon_vma)
return ANON_RMAP_NULL;
--
2.17.1
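The lazy rmap helpers above depend on the low bits of folio->mapping. A lazy folio stores its root VMA pointer plus FOLIO_MAPPING_ANON_VMA_LAZY, and a regular anonymous folio stores its anon_vma plus FOLIO_MAPPING_ANON. folio_resolve_anon_vma_lazy() only decodes the lazy form. The userspace sketch below models just that encoding. The tag values, the *_model types and the encode/decode helpers are hypothetical stand-ins, not the series' definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag values for illustration only; the real FOLIO_MAPPING_*
 * constants come from the kernel headers and the series. Objects are
 * assumed to be at least 4-byte aligned, so the low two bits are free. */
#define MAPPING_ANON          0x1UL
#define MAPPING_ANON_VMA_LAZY 0x2UL
#define MAPPING_FLAGS         0x3UL

struct anon_vma_model { int dummy; };
struct vma_model { unsigned long vm_start; };

/* Lazy folio: folio->mapping points at the root VMA plus the lazy tag. */
static uintptr_t encode_lazy(struct vma_model *vma)
{
	assert(((uintptr_t)vma & MAPPING_FLAGS) == 0);
	return (uintptr_t)vma + MAPPING_ANON_VMA_LAZY;
}

/* Regular anonymous folio: folio->mapping points at the anon_vma. */
static uintptr_t encode_anon(struct anon_vma_model *av)
{
	assert(((uintptr_t)av & MAPPING_FLAGS) == 0);
	return (uintptr_t)av + MAPPING_ANON;
}

/* Mirrors the first check in folio_resolve_anon_vma_lazy(): only a
 * mapping whose tag bits equal the lazy tag decodes to a VMA. */
static struct vma_model *decode_lazy(uintptr_t mapping)
{
	if ((mapping & MAPPING_FLAGS) != MAPPING_ANON_VMA_LAZY)
		return NULL;
	return (struct vma_model *)(mapping - MAPPING_ANON_VMA_LAZY);
}

/* Regular path: only the plain anon tag decodes to an anon_vma. */
static struct anon_vma_model *decode_anon(uintptr_t mapping)
{
	if ((mapping & MAPPING_FLAGS) != MAPPING_ANON)
		return NULL;
	return (struct anon_vma_model *)(mapping - MAPPING_ANON);
}
```

Comparing the whole tag field, rather than testing a single bit, keeps the two encodings mutually exclusive. This is why a regular folio never resolves to a VMA, and a lazy one never resolves to an anon_vma.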
* [PATCH 10/15] mm: defer anon_vma creation with ANON_VMA_LAZY
From: tao @ 2026-05-27 11:01 UTC (permalink / raw)
On the first anonymous fault (or from anon_vma_prepare()), mark the VMA
as ANON_VMA_LAZY instead of allocating an anon_vma, and defer anon_vma
creation until it is required, typically at fork(). This avoids
allocating anon_vma and anon_vma_chain objects for VMAs that may never
need them.
During fork(), an ANON_VMA_LAZY VMA in the parent is first upgraded to
a regular anon_vma to establish the sharing topology. The child VMA is
linked as ANON_VMA_TREE_PARENT and does not allocate its own anon_vma,
avoiding additional fork overhead.
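The anon_vma_tree_t states used during fork() can be modelled as a tagged pointer stored in vma->anon_vma. The sketch below is a simplified, single-threaded illustration and makes several assumptions. The tag values and the vma_m/anon_vma_m types are hypothetical. The integer refcounts stand in for vma_get()/vma_put() and get_anon_vma(). Only a lazy root (ANON_VMA_TREE_VMA) parent or a regular parent is handled, and no locking is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tags in the low bits of vma->anon_vma; tag 0 means a
 * plain pointer to a regular anon_vma. */
#define TREE_VMA    0x1UL /* lazy root: points at the VMA itself */
#define TREE_PARENT 0x2UL /* forked child: points at the parent's anon_vma */
#define TREE_MASK   0x3UL

struct anon_vma_m {
	int refcount;
};

struct vma_m {
	uintptr_t anon_vma; /* 0, a regular pointer, or a tagged pointer */
	int refcount;       /* stands in for vma_get()/vma_put() */
};

static unsigned long tree_type(uintptr_t tree)
{
	return tree & TREE_MASK;
}

/* First anonymous fault: tag the VMA as its own lazy root and pin it. */
static void prepare_lazy(struct vma_m *vma)
{
	if (!vma->anon_vma) {
		vma->refcount++;
		vma->anon_vma = (uintptr_t)vma + TREE_VMA;
	}
}

/* Replace a lazy root with a caller-provided regular anon_vma and drop
 * the self-reference; anything else is left untouched. */
static void upgrade_lazy(struct vma_m *vma, struct anon_vma_m *fresh)
{
	if (tree_type(vma->anon_vma) != TREE_VMA)
		return;
	fresh->refcount = 1;
	vma->anon_vma = (uintptr_t)fresh;
	vma->refcount--;
}

/* fork(): upgrade the parent if it is lazy, then link the child to the
 * parent's anon_vma with the PARENT tag instead of allocating one. */
static void fork_lazy(struct vma_m *child, struct vma_m *parent,
		      struct anon_vma_m *fresh)
{
	struct anon_vma_m *pav;

	upgrade_lazy(parent, fresh);
	assert(parent->anon_vma && tree_type(parent->anon_vma) == 0);
	pav = (struct anon_vma_m *)parent->anon_vma;
	pav->refcount++;
	child->anon_vma = (uintptr_t)pav + TREE_PARENT;
}
```

In this model, after fork_lazy() both VMAs share a single anon_vma. The parent points at it directly, and the child points at it through the PARENT tag.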
Signed-off-by: tao <tao.wangtao@honor.com>
---
mm/internal.h | 9 +++
mm/memory.c | 4 +
mm/rmap.c | 209 ++++++++++++++++++++++++++++++++++++++++++++++++--
mm/vma.c | 9 ++-
4 files changed, 222 insertions(+), 9 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 6b703646f66d..0a36eba3f63c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -417,6 +417,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src,
enum vma_operation operation);
int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma);
int __anon_vma_prepare(struct vm_area_struct *vma);
+/* Called on first anon fault or from anon_vma_prepare(). */
+void vma_prepare_anon_vma_lazy(struct vm_area_struct *vma);
void unlink_anon_vmas(struct vm_area_struct *vma);
static inline int anon_vma_prepare(struct vm_area_struct *vma)
@@ -424,6 +426,13 @@ static inline int anon_vma_prepare(struct vm_area_struct *vma)
if (likely(vma->anon_vma))
return 0;
+#ifdef CONFIG_ANON_VMA_LAZY
+ if (anon_vma_lazy_enabled()) {
+ vma_prepare_anon_vma_lazy(vma);
+ return 0;
+ }
+#endif
+
return __anon_vma_prepare(vma);
}
diff --git a/mm/memory.c b/mm/memory.c
index c13b79987b26..8fd3877f69fb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3822,6 +3822,10 @@ vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf)
if (likely(vma->anon_vma))
return 0;
+ if (anon_vma_lazy_enabled()) {
+ vma_prepare_anon_vma_lazy(vma);
+ return 0;
+ }
if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
if (!mmap_read_trylock(vma->vm_mm))
return VM_FAULT_RETRY;
diff --git a/mm/rmap.c b/mm/rmap.c
index f70e3cb9812e..d9424f4eb6d0 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -240,9 +240,118 @@ static void anon_vma_chain_assign(struct vm_area_struct *vma,
list_add(&avc->same_vma, &vma->anon_vma_chain);
}
+#ifdef CONFIG_ANON_VMA_LAZY
+/* Called on first anon fault or from anon_vma_prepare(). */
+void vma_prepare_anon_vma_lazy(struct vm_area_struct *vma)
+{
+ struct mm_struct *mm = vma->vm_mm;
+
+ spin_lock(&mm->page_table_lock);
+ if (!vma->anon_vma) {
+ vma_get(vma);
+ vma->anon_vma = (anon_vma_tree_t)(
+ (unsigned long)vma + ANON_VMA_TREE_VMA);
+ }
+ spin_unlock(&mm->page_table_lock);
+}
+
+/*
+ * Link VMA to its root ANON_VMA_TREE_VMA. Root holds reference to prevent
+ * premature freeing while folios reference it via folio->mapping.
+ */
+static bool vma_link_anon_vma_lazy_root(struct vm_area_struct *vma,
+ struct vm_area_struct *src)
+{
+ struct mm_struct *mm = src->vm_mm;
+ struct vm_area_struct *root_vma;
+ bool ret = false;
+
+ VM_BUG_ON_VMA(vma->vm_mm != src->vm_mm, vma);
+ /* src may be upgraded concurrently */
+ spin_lock(&mm->page_table_lock);
+ root_vma = anon_vma_tree_vma(src->anon_vma);
+ if (root_vma) {
+ vma_get(root_vma);
+ vma->anon_vma = src->anon_vma;
+ ret = true;
+ } else {
+ vma_set_anon_vma(vma, NULL);
+ }
+ spin_unlock(&mm->page_table_lock);
+ return ret;
+}
+
+/* Link VMA to its ANON_VMA_TREE_PARENT. */
+static void vma_link_anon_vma_lazy_parent(struct vm_area_struct *vma,
+ struct vm_area_struct *src)
+{
+ struct anon_vma *parent_anon_vma = vma_anon_vma(src);
+
+ vma_assert_write_locked(src);
+ VM_BUG_ON_VMA(vma->anon_vma, vma);
+ VM_BUG_ON_VMA(!parent_anon_vma, src);
+
+ get_anon_vma(parent_anon_vma);
+ vma->anon_vma = (anon_vma_tree_t)(
+ (unsigned long)parent_anon_vma + ANON_VMA_TREE_PARENT);
+}
+
+/* Unlink VMA from anon_vma, dropping root/parent reference. */
+static bool vma_unlink_anon_vma_lazy(struct vm_area_struct *vma,
+ anon_vma_tree_t new_anon_vma_tree)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ anon_vma_tree_t anon_tree_mutable = READ_ONCE(vma->anon_vma);
+ anon_vma_tree_t anon_tree;
+ bool is_lazy = true;
+ struct vm_area_struct *root_vma = NULL;
+ struct anon_vma *parent_anon_vma = NULL;
+
+ VM_BUG_ON_VMA(anon_vma_tree_type(new_anon_vma_tree), vma);
+
+ anon_vma_tree_lock_write(anon_tree_mutable);
+ spin_lock(&mm->page_table_lock);
+ anon_tree = vma->anon_vma;
+ if (anon_vma_tree_is_vma(anon_tree)) {
+ root_vma = anon_vma_tree_vma(anon_tree);
+ vma->anon_vma = new_anon_vma_tree;
+ } else if (anon_vma_tree_is_parent(anon_tree)) {
+ parent_anon_vma = anon_vma_tree_anon_vma(anon_tree);
+ vma->anon_vma = new_anon_vma_tree;
+ } else {
+ is_lazy = false;
+ }
+ spin_unlock(&mm->page_table_lock);
+ anon_vma_tree_unlock_write(anon_tree_mutable);
+ if (!is_lazy)
+ return false;
+
+ /* drop reference after unlock */
+ VM_BUG_ON_VMA(!parent_anon_vma && !root_vma, vma);
+ if (parent_anon_vma) {
+ /* There must be nodes; it cannot be the last reference. */
+ VM_BUG_ON(RB_EMPTY_ROOT(&parent_anon_vma->rb_root.rb_root));
+ put_anon_vma(parent_anon_vma);
+ }
+ if (root_vma)
+ vma_put(root_vma);
+ return is_lazy;
+}
+#else
+static inline bool vma_link_anon_vma_lazy_root(struct vm_area_struct *vma,
+ struct vm_area_struct *src) { return false; }
+static inline void vma_link_anon_vma_lazy_parent(struct vm_area_struct *vma,
+ struct vm_area_struct *src) {}
+static inline bool vma_unlink_anon_vma_lazy(struct vm_area_struct *vma,
+ anon_vma_tree_t new_anon_vma_tree) { return false; }
+#endif
+
/**
- * __anon_vma_prepare - attach an anon_vma to a memory region
+ * vma_prepare_anon_vma - attach an anon_vma to a memory region
* @vma: the memory region in question
+ * @upgrade_lazy: true when upgrading a lazy VMA to a regular anon_vma.
+ * @parent_anon_vma: non-NULL if the VMA is inherited from its parent,
+ * otherwise NULL.
*
* This makes sure the memory mapping described by 'vma' has
* an 'anon_vma' attached to it, so that we can associate the
@@ -266,12 +375,14 @@ static void anon_vma_chain_assign(struct vm_area_struct *vma,
* to do any locking for the common case of already having
* an anon_vma.
*/
-int __anon_vma_prepare(struct vm_area_struct *vma)
+static int vma_prepare_anon_vma(struct vm_area_struct *vma, bool upgrade_lazy,
+ struct anon_vma *parent_anon_vma)
{
struct mm_struct *mm = vma->vm_mm;
struct anon_vma *anon_vma, *allocated;
anon_vma_tree_t anon_tree;
struct anon_vma_chain *avc;
+ bool is_lazy = false;
mmap_assert_locked(mm);
might_sleep();
@@ -282,19 +393,30 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
anon_vma = find_mergeable_anon_vma(vma);
allocated = NULL;
- if (!anon_vma) {
+ /* If parent_anon_vma exists, mergeable anon_vma root must match it. */
+ if (!anon_vma ||
+ (parent_anon_vma && anon_vma->root != parent_anon_vma->root)) {
anon_vma = anon_vma_alloc();
if (unlikely(!anon_vma))
goto out_enomem_free_avc;
- anon_vma->num_children++; /* self-parent link for new root */
allocated = anon_vma;
+ if (parent_anon_vma) {
+ anon_vma->root = parent_anon_vma->root;
+ anon_vma->parent = parent_anon_vma;
+ }
}
anon_tree = make_anon_vma_tree(anon_vma);
+ if (upgrade_lazy)
+ is_lazy = vma_unlink_anon_vma_lazy(vma, anon_tree);
anon_vma_tree_lock_write(anon_tree);
/* page_table_lock to protect against threads */
spin_lock(&mm->page_table_lock);
- if (likely(!vma->anon_vma)) {
+ if (likely(!vma->anon_vma || is_lazy)) {
+ if (anon_vma->root != anon_vma)
+ get_anon_vma(anon_vma->root);
+ if (allocated)
+ anon_vma->parent->num_children++;
vma->anon_vma = anon_tree;
anon_vma_chain_assign(vma, avc, anon_vma);
anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
@@ -318,6 +440,28 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
return -ENOMEM;
}
+/**
+ * __anon_vma_prepare - attach an anon_vma to a memory region
+ * @vma: the memory region in question
+ *
+ * Wrapper around vma_prepare_anon_vma() for the non-lazy case.
+ * Called when ANON_VMA_LAZY is disabled.
+ */
+int __anon_vma_prepare(struct vm_area_struct *vma)
+{
+ return vma_prepare_anon_vma(vma, false, NULL);
+}
+
+static int vma_upgrade_anon_vma_lazy(struct vm_area_struct *vma)
+{
+ anon_vma_tree_t vma_tree = vma->anon_vma;
+ struct anon_vma *parent_anon_vma = NULL;
+
+ if (anon_vma_tree_is_parent(vma_tree))
+ parent_anon_vma = anon_vma_tree_anon_vma(vma_tree);
+ return vma_prepare_anon_vma(vma, true, parent_anon_vma);
+}
+
static void check_anon_vma_clone(struct vm_area_struct *dst,
struct vm_area_struct *src,
enum vma_operation operation)
@@ -414,6 +558,20 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src,
if (!active_anon_tree)
return 0;
+ /* Check ANON_VMA_LAZY first. */
+ if (anon_vma_tree_is_vma(active_anon_tree)) {
+ if (vma_link_anon_vma_lazy_root(dst, src))
+ return 0;
+ } else if (anon_vma_tree_is_parent(active_anon_tree)) {
+ /* Cloning an ANON_VMA_TREE_PARENT VMA (e.g. on split) is rare; upgrade src. */
+ int err = vma_upgrade_anon_vma_lazy(src);
+
+ if (err)
+ return err;
+ VM_BUG_ON_VMA(vma_is_anon_vma_lazy(src), src);
+ dst->anon_vma = src->anon_vma;
+ }
+
/*
* Allocate AVCs. We don't need an anon_vma lock for this as we
* are not updating the anon_vma rbtree nor are we changing
@@ -445,7 +603,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src,
maybe_reuse_anon_vma(dst, anon_vma);
}
- if (operation != VMA_OP_FORK)
+ if (operation != VMA_OP_FORK && vma_anon_vma(dst))
vma_anon_vma(dst)->num_active_vmas++;
anon_vma_tree_unlock_write(active_anon_tree);
@@ -456,9 +614,38 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src,
return -ENOMEM;
}
+static int vma_fork_anon_vma_lazy(struct vm_area_struct *vma,
+ struct vm_area_struct *pvma)
+{
+ int error;
+
+ if (vma_is_anon_vma_lazy(pvma)) {
+ error = vma_upgrade_anon_vma_lazy(pvma);
+ if (error)
+ return error;
+ VM_BUG_ON_VMA(vma_is_anon_vma_lazy(pvma), pvma);
+ }
+
+ vma_set_anon_vma(vma, NULL);
+ error = anon_vma_clone(vma, pvma, VMA_OP_FORK);
+ if (error)
+ return error;
+
+ if (vma->anon_vma)
+ return 0;
+ /* Defer the child's anon_vma: link it to the parent's anon_vma. */
+ vma_link_anon_vma_lazy_parent(vma, pvma);
+ return 0;
+}
+
/*
* Attach vma to its own anon_vma, as well as to the anon_vmas that
* the corresponding VMA in the parent process is attached to.
+ *
+ * For ANON_VMA_LAZY: if the parent VMA is lazy, upgrade it to a regular
+ * anon_vma before cloning. The child VMA may also be marked lazy when
+ * ANON_VMA_LAZY is enabled, deferring anon_vma allocation.
+ *
* Returns 0 on success, non-zero on failure.
*/
int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
@@ -472,6 +659,9 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
if (!pvma->anon_vma)
return 0;
+ if (anon_vma_lazy_enabled())
+ return vma_fork_anon_vma_lazy(vma, pvma);
+
/* Drop inherited anon_vma, we'll reuse existing or allocate new. */
vma_set_anon_vma(vma, NULL);
@@ -577,6 +767,10 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
return;
}
+ /* Unlink ANON_VMA_LAZY first, then ancestor anon_vma. */
+ if (vma_is_anon_vma_lazy(vma))
+ vma_unlink_anon_vma_lazy(vma, (anon_vma_tree_t)NULL);
+
anon_vma_tree_lock_write(active_anon_tree);
/*
@@ -601,7 +795,8 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
anon_vma_chain_free(avc);
}
- vma_anon_vma(vma)->num_active_vmas--;
+ if (vma_anon_vma(vma))
+ vma_anon_vma(vma)->num_active_vmas--;
/*
* vma would still be needed after unlink, and anon_vma will be prepared
* when handle fault.
diff --git a/mm/vma.c b/mm/vma.c
index ed15968a5891..0a31ef82a90c 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -1995,6 +1995,8 @@ static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct *
* acceptable for merging, so we can do all of this optimistically. But
* we do that READ_ONCE() to make sure that we never re-load the pointer.
*
+ * For upgrading ANON_VMA_LAZY VMAs, follow the same reuse rules as splitting.
+ *
* IOW: that the "list_is_singular()" test on the anon_vma_chain only
* matters for the 'stable anon_vma' case (ie the thing we want to avoid
* is to return an anon_vma that is "complex" due to having gone through
@@ -2005,12 +2007,15 @@ static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct *
* a read lock on the mmap_lock.
*/
static struct anon_vma *reusable_anon_vma(struct vm_area_struct *old,
+ struct vm_area_struct *vma,
struct vm_area_struct *a,
struct vm_area_struct *b)
{
if (anon_vma_compatible(a, b)) {
struct anon_vma *anon_vma = vma_anon_vma(old);
+ if (anon_vma && vma_is_anon_vma_lazy(vma))
+ return anon_vma;
if (anon_vma && list_is_singular(&old->anon_vma_chain))
return anon_vma;
}
@@ -2034,7 +2039,7 @@ struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *vma)
/* Try next first. */
next = vma_iter_load(&vmi);
if (next) {
- anon_vma = reusable_anon_vma(next, vma, next);
+ anon_vma = reusable_anon_vma(next, vma, vma, next);
if (anon_vma)
return anon_vma;
}
@@ -2044,7 +2049,7 @@ struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *vma)
prev = vma_prev(&vmi);
/* Try prev next. */
if (prev)
- anon_vma = reusable_anon_vma(prev, prev, vma);
+ anon_vma = reusable_anon_vma(prev, vma, prev, vma);
/*
* We might reach here with anon_vma == NULL if we can't find
--
2.17.1
* [PATCH 11/15] mm: handle ANON_VMA_LAZY in huge page operations
From: tao @ 2026-05-27 11:01 UTC (permalink / raw)
When splitting a huge page, the folio must be split into multiple
smaller folios. Holding only folio_lock(folio) cannot guarantee that
the split completes atomically.
Upgrade ANON_VMA_LAZY VMAs to a regular anon_vma before huge page
allocation in the fault path and before collapse in khugepaged, so
that the anon_vma lock can protect these operations.
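The fault-path logic that this patch adds to __vmf_anon_prepare() can be summarized as a decision table. The sketch below restates that table as a pure function. The enum names are hypothetical. "Lazy" covers both ANON_VMA_TREE_VMA and ANON_VMA_TREE_PARENT, and maybe_huge corresponds to pmd_none(*vmf->pmd) in the patch. Locking and error handling are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

enum anon_state { ANON_NONE, ANON_LAZY, ANON_REGULAR };

enum prep_action {
	PREP_NOTHING,   /* existing anon_vma is usable as-is */
	PREP_MARK_LAZY, /* cf. vma_prepare_anon_vma_lazy() */
	PREP_ALLOC,     /* cf. __anon_vma_prepare() */
	PREP_UPGRADE,   /* cf. vma_upgrade_anon_vma_lazy() */
};

static enum prep_action anon_prepare_action(enum anon_state state,
					    bool lazy_enabled, bool maybe_huge)
{
	/* A regular anon_vma is always sufficient. */
	if (state == ANON_REGULAR)
		return PREP_NOTHING;
	/* A lazy VMA only needs upgrading if a huge page may be mapped. */
	if (state == ANON_LAZY)
		return maybe_huge ? PREP_UPGRADE : PREP_NOTHING;
	/* No anon_vma yet: go lazy only for small-page faults. */
	if (lazy_enabled && !maybe_huge)
		return PREP_MARK_LAZY;
	return PREP_ALLOC;
}
```

Under this reading, a VMA with no anon_vma that may receive a huge page goes straight to a regular anon_vma. The upgrade step that follows in the patch is then a no-op, because vma_upgrade_anon_vma_lazy() returns 0 for a non-lazy anon_vma_tree_t.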
Signed-off-by: tao <tao.wangtao@honor.com>
---
mm/internal.h | 5 +++++
mm/khugepaged.c | 5 +++++
mm/memory.c | 17 +++++++++++++----
mm/rmap.c | 15 +++++++++++----
4 files changed, 34 insertions(+), 8 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 0a36eba3f63c..a746f5272aa6 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -419,6 +419,11 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma);
int __anon_vma_prepare(struct vm_area_struct *vma);
/* Called on first anon fault or from anon_vma_prepare(). */
void vma_prepare_anon_vma_lazy(struct vm_area_struct *vma);
+/*
+ * Upgrade an ANON_VMA_LAZY VMA to a regular anon_vma during fork(), when
+ * cloning an ANON_VMA_TREE_PARENT VMA, or before hugepage operations.
+ */
+int vma_upgrade_anon_vma_lazy(struct vm_area_struct *vma);
void unlink_anon_vmas(struct vm_area_struct *vma);
static inline int anon_vma_prepare(struct vm_area_struct *vma)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 747748eace91..a33cda026be7 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1164,6 +1164,11 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
if (result != SCAN_SUCCEED)
goto out_up_write;
+ /* Upgrade an ANON_VMA_LAZY VMA so the anon_vma lock covers the collapse. */
+ if (vma_upgrade_anon_vma_lazy(vma)) {
+ result = SCAN_FAIL;
+ goto out_up_write;
+ }
anon_vma_tree_lock_write(vma->anon_vma);
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address,
diff --git a/mm/memory.c b/mm/memory.c
index 8fd3877f69fb..26d116b3393c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3819,19 +3819,28 @@ vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
vm_fault_t ret = 0;
+ bool maybe_huge = pmd_none(*vmf->pmd);
- if (likely(vma->anon_vma))
- return 0;
- if (anon_vma_lazy_enabled()) {
+ if (likely(vma->anon_vma)) {
+ if (!vma_is_anon_vma_lazy(vma) || !maybe_huge)
+ return 0;
+ }
+#ifdef CONFIG_ANON_VMA_LAZY
+ if (anon_vma_lazy_enabled() && !maybe_huge) {
vma_prepare_anon_vma_lazy(vma);
return 0;
}
+#endif
if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
if (!mmap_read_trylock(vma->vm_mm))
return VM_FAULT_RETRY;
}
- if (__anon_vma_prepare(vma))
+ if (!vma->anon_vma && __anon_vma_prepare(vma))
+ ret = VM_FAULT_OOM;
+#ifdef CONFIG_ANON_VMA_LAZY
+ if (vma->anon_vma && maybe_huge && vma_upgrade_anon_vma_lazy(vma))
ret = VM_FAULT_OOM;
+#endif
if (vmf->flags & FAULT_FLAG_VMA_LOCK)
mmap_read_unlock(vma->vm_mm);
return ret;
diff --git a/mm/rmap.c b/mm/rmap.c
index d9424f4eb6d0..57cd85efc50a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -452,13 +452,20 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
return vma_prepare_anon_vma(vma, false, NULL);
}
-static int vma_upgrade_anon_vma_lazy(struct vm_area_struct *vma)
+/**
+ * vma_upgrade_anon_vma_lazy - upgrade a VMA's lazy anon_vma to a regular one
+ * @vma: the VMA whose ANON_VMA_LAZY state is being upgraded
+ */
+int vma_upgrade_anon_vma_lazy(struct vm_area_struct *vma)
{
- anon_vma_tree_t vma_tree = vma->anon_vma;
+ anon_vma_tree_t anon_tree = READ_ONCE(vma->anon_vma);
struct anon_vma *parent_anon_vma = NULL;
- if (anon_vma_tree_is_parent(vma_tree))
- parent_anon_vma = anon_vma_tree_anon_vma(vma_tree);
+ VM_BUG_ON_VMA(!anon_tree, vma);
+ if (!anon_vma_tree_type(anon_tree))
+ return 0;
+ if (anon_vma_tree_is_parent(anon_tree))
+ parent_anon_vma = anon_vma_tree_anon_vma(anon_tree);
return vma_prepare_anon_vma(vma, true, parent_anon_vma);
}
--
2.17.1
* [PATCH 12/15] mm: handle ANON_VMA_LAZY during migration
From: tao @ 2026-05-27 11:01 UTC (permalink / raw)
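To make folio migration atomic, introduce folio_trylock_get_anon_rmap().
The helper takes a reference on the folio's anon_rmap and tries to
read-lock it, making migration mutually exclusive with free_pgtables().
For ANON_VMA_LAZY, it read-locks the VMA through lock_vma_under_rcu(),
which uses vma_start_read(), to prevent the VMA from being modified or
removed during migration. Migration callers release it with
anon_rmap_unlock_put() and pass TTU_RMAP_LOCKED to the rmap walks.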
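folio_trylock_get_anon_rmap() and anon_rmap_unlock_put() follow a common pairing. First, pin the object with a reference and try the read lock. If the lock is contended, drop the reference again. On release, undo both steps in reverse order. The C11 sketch below illustrates only that pattern. struct rmap_obj and its helpers are hypothetical, and freeing on the last put is omitted. The sketch also assumes the object is already known to be live. The kernel's folio_get_anon_vma(), by contrast, uses atomic_inc_not_zero() under RCU for this step.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an anon_vma or a lazy VMA: a reference
 * count plus a lock word (-1 = write-locked, n >= 0 = n readers). */
struct rmap_obj {
	atomic_int refcount;
	atomic_int lock;
};

static int read_trylock(struct rmap_obj *o)
{
	int v = atomic_load(&o->lock);

	while (v >= 0) {
		/* On failure, v is reloaded with the current lock value. */
		if (atomic_compare_exchange_weak(&o->lock, &v, v + 1))
			return 1;
	}
	return 0; /* a writer holds the lock */
}

static int write_trylock(struct rmap_obj *o)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&o->lock, &expected, -1);
}

/* Pin first so the object cannot go away, then try the read lock;
 * undo the reference if the lock is contended. */
static struct rmap_obj *trylock_get(struct rmap_obj *o)
{
	atomic_fetch_add(&o->refcount, 1);
	if (!read_trylock(o)) {
		atomic_fetch_sub(&o->refcount, 1);
		return NULL;
	}
	return o;
}

/* Release in reverse order: drop the read lock, then the reference. */
static void unlock_put(struct rmap_obj *o)
{
	atomic_fetch_sub(&o->lock, 1);
	atomic_fetch_sub(&o->refcount, 1);
}
```

The trylock, rather than a blocking lock, matches the patch's choice to fail the migration attempt (goto out) instead of waiting while the folio lock is held.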
Signed-off-by: tao <tao.wangtao@honor.com>
---
include/linux/rmap.h | 12 ++++++++
mm/migrate.c | 71 +++++++++++++++++++++++++-------------------
mm/rmap.c | 40 +++++++++++++++++++++++++
3 files changed, 92 insertions(+), 31 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index ebe9f3f61170..59244481a8c1 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -1042,6 +1042,18 @@ bool folio_maybe_same_anon_vma(const struct folio *folio,
anon_rmap_t folio_get_anon_rmap(const struct folio *folio);
anon_rmap_t folio_lock_anon_rmap_read(const struct folio *folio,
struct rmap_walk_control *rwc);
+/*
+ * folio_trylock_get_anon_rmap() ensures that the migration operation
+ * completes atomically and is mutually exclusive with free_pgtables().
+ *
+ * Note: for ANON_VMA_LAZY, this is not equivalent to
+ * anon_rmap_trylock_read() + folio_get_anon_rmap(), because
+ * anon_rmap_trylock_read() only increments the VMA reference count,
+ * while this helper uses vma_start_read() to prevent the VMA from
+ * being modified or removed.
+ */
+anon_rmap_t folio_trylock_get_anon_rmap(const struct folio *folio);
+void anon_rmap_unlock_put(anon_rmap_t anon_rmap);
static inline struct vm_area_struct *anon_rmap_iter_first_vma(
anon_rmap_t anon_rmap, unsigned long start, unsigned long last,
diff --git a/mm/migrate.c b/mm/migrate.c
index b397cdeab09a..4abbfd1faea2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1173,10 +1173,11 @@ static void migrate_folio_undo_src(struct folio *src,
struct list_head *ret)
{
if (page_was_mapped)
- remove_migration_ptes(src, src, 0);
+ remove_migration_ptes(src, src,
+ anon_rmap_value(anon_rmap) ? TTU_RMAP_LOCKED : 0);
/* Drop an anon_rmap reference if we took one */
if (anon_rmap_value(anon_rmap))
- put_anon_rmap(anon_rmap);
+ anon_rmap_unlock_put(anon_rmap);
if (locked)
folio_unlock(src);
if (ret)
@@ -1279,6 +1280,18 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
folio_wait_writeback(src);
}
+ /*
+ * Block others from accessing the new page when we get around to
+ * establishing additional references. We are usually the only one
+ * holding a reference to dst at this point. We used to have a BUG
+ * here if folio_trylock(dst) fails, but would like to allow for
+ * cases where there might be a race with the previous use of dst.
+ * This is much like races on refcount of oldpage: just don't BUG().
+ */
+ if (unlikely(!folio_trylock(dst)))
+ goto out;
+ dst_locked = true;
+
/*
* By try_to_migrate(), src->mapcount goes down to 0 here. In this case,
* we cannot notice that anon_vma is freed while we migrate a page.
@@ -1287,26 +1300,17 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
* File Caches may use write_page() or lock_page() in migration, then,
* just care Anon page here.
*
- * Only folio_get_anon_rmap() understands the subtleties of
- * getting a hold on an anon_rmap from outside one of its mms.
+ * Only folio_trylock_get_anon_rmap() understands the subtleties of
+ * getting and locking an anon_rmap from outside one of its mms.
* But if we cannot get anon_rmap, then we won't need it anyway,
* because that implies that the anon page is no longer mapped
* (and cannot be remapped so long as we hold the page lock).
*/
- if (folio_test_anon(src) && !folio_test_ksm(src))
- anon_rmap = folio_get_anon_rmap(src);
-
- /*
- * Block others from accessing the new page when we get around to
- * establishing additional references. We are usually the only one
- * holding a reference to dst at this point. We used to have a BUG
- * here if folio_trylock(dst) fails, but would like to allow for
- * cases where there might be a race with the previous use of dst.
- * This is much like races on refcount of oldpage: just don't BUG().
- */
- if (unlikely(!folio_trylock(dst)))
- goto out;
- dst_locked = true;
+ if (folio_test_anon(src) && !folio_test_ksm(src)) {
+ anon_rmap = folio_trylock_get_anon_rmap(src);
+ if (!anon_rmap_value(anon_rmap))
+ goto out;
+ }
if (unlikely(page_has_movable_ops(&src->page))) {
__migrate_folio_record(dst, old_page_state, anon_rmap);
@@ -1331,10 +1335,14 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
goto out;
}
} else if (folio_mapped(src)) {
+ enum ttu_flags ttu = mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0;
+
+ if (anon_rmap_value(anon_rmap))
+ ttu |= TTU_RMAP_LOCKED;
/* Establish migration ptes */
VM_BUG_ON_FOLIO(folio_test_anon(src) &&
!folio_test_ksm(src) && !anon_rmap_value(anon_rmap), src);
- try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0);
+ try_to_migrate(src, ttu);
old_page_state |= PAGE_WAS_MAPPED;
}
@@ -1415,7 +1423,8 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
lru_add_drain();
if (old_page_state & PAGE_WAS_MAPPED)
- remove_migration_ptes(src, dst, 0);
+ remove_migration_ptes(src, dst,
+ anon_rmap_value(anon_rmap) ? TTU_RMAP_LOCKED : 0);
out_unlock_both:
folio_unlock(dst);
@@ -1434,7 +1443,7 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
list_del(&src->lru);
/* Drop an anon_rmap reference if we took one */
if (anon_rmap_value(anon_rmap))
- put_anon_rmap(anon_rmap);
+ anon_rmap_unlock_put(anon_rmap);
folio_unlock(src);
migrate_folio_done(src, reason);
@@ -1485,7 +1494,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
int page_was_mapped = 0;
anon_rmap_t anon_rmap = ANON_RMAP_NULL;
struct address_space *mapping = NULL;
- enum ttu_flags ttu = 0;
+ enum ttu_flags ttu = TTU_RMAP_LOCKED;
if (folio_ref_count(src) == 1) {
/* page was freed from under us. So we are done. */
@@ -1519,11 +1528,14 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
goto out_unlock;
}
- if (folio_test_anon(src))
- anon_rmap = folio_get_anon_rmap(src);
-
if (unlikely(!folio_trylock(dst)))
- goto put_anon;
+ goto out_unlock;
+
+ if (folio_test_anon(src)) {
+ anon_rmap = folio_trylock_get_anon_rmap(src);
+ if (!anon_rmap_value(anon_rmap))
+ goto unlock_put_anon;
+ }
if (folio_mapped(src)) {
if (!folio_test_anon(src)) {
@@ -1536,8 +1548,6 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
mapping = hugetlb_folio_mapping_lock_write(src);
if (unlikely(!mapping))
goto unlock_put_anon;
-
- ttu = TTU_RMAP_LOCKED;
}
try_to_migrate(src, ttu);
@@ -1550,15 +1560,14 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
if (page_was_mapped)
remove_migration_ptes(src, !rc ? dst : src, ttu);
- if (ttu & TTU_RMAP_LOCKED)
+ if (page_was_mapped && !folio_test_anon(src))
i_mmap_unlock_write(mapping);
unlock_put_anon:
folio_unlock(dst);
-put_anon:
if (anon_rmap_value(anon_rmap))
- put_anon_rmap(anon_rmap);
+ anon_rmap_unlock_put(anon_rmap);
if (!rc) {
move_hugetlb_state(src, dst, reason);
diff --git a/mm/rmap.c b/mm/rmap.c
index 57cd85efc50a..46876b3dbfbc 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1223,6 +1223,46 @@ anon_rmap_t folio_lock_anon_rmap_read(const struct folio *folio,
return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
}
+anon_rmap_t folio_trylock_get_anon_rmap(const struct folio *folio)
+{
+ struct anon_vma *anon_vma;
+ struct vm_area_struct *vma;
+
+ if (folio_test_anon_vma_lazy(folio)) {
+ vma = folio_get_anon_vma_lazy(folio);
+ if (vma && !lock_vma_under_rcu(vma->vm_mm, vma->vm_start)) {
+ vma_put(vma);
+ vma = NULL;
+ }
+ if (vma)
+ return vma_to_anon_rmap(vma);
+ }
+
+ anon_vma = folio_get_anon_vma(folio);
+ if (anon_vma && !anon_vma_trylock_read(anon_vma)) {
+ put_anon_vma(anon_vma);
+ anon_vma = NULL;
+ }
+ return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
+}
+
+void anon_rmap_unlock_put(anon_rmap_t anon_rmap)
+{
+ struct anon_vma *anon_vma;
+
+ if (!anon_rmap_is_anon_vma(anon_rmap)) {
+ struct vm_area_struct *vma = anon_rmap_to_vma(anon_rmap);
+
+ vma_end_read(vma);
+ vma_put(vma);
+ return;
+ }
+
+ anon_vma = anon_rmap_to_anon_vma(anon_rmap);
+ anon_vma_unlock_read(anon_vma);
+ put_anon_vma(anon_vma);
+}
+
#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
/*
* Flush TLB entries for recently unmapped pages from remote CPUs. It is
--
2.17.1
* [PATCH 13/15] mm: support setup and upgrade of ANON_VMA_LAZY folios
From: tao @ 2026-05-27 11:01 UTC (permalink / raw)
To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, tao.wangtao, kas,
ljs
Cc: hpa, liam, vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh,
jgg, jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
__folio_set_anon() and folio_move_anon_rmap() decide whether to set
FOLIO_MAPPING_ANON_VMA_LAZY.
__folio_try_dup_anon_rmap() and hugetlb_try_dup_anon_rmap() upgrade the
folio to FOLIO_MAPPING_ANON during fork() when required.
rmap_walk_anon() detects ANON_VMA_LAZY upgrades and retries
the walk so that the mapping is handled correctly.
remove_rmap() needs no special handling, since folio_mapped()
is checked before use.
Signed-off-by: tao <tao.wangtao@honor.com>
---
include/linux/rmap.h | 38 ++++++++++++++++++++++++++++++++++++++
mm/rmap.c | 21 ++++++++++++++++++++-
2 files changed, 58 insertions(+), 1 deletion(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 59244481a8c1..9b1970698204 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -392,6 +392,14 @@ static __always_inline void __folio_rmap_sanity_checks(const struct folio *folio
unsigned long mapping = (unsigned long)folio->mapping;
struct anon_vma *anon_vma;
+ if (folio_test_anon_vma_lazy(folio)) {
+ struct vm_area_struct *root_vma =
+ (void *)(mapping - FOLIO_MAPPING_ANON_VMA_LAZY);
+
+ VM_WARN_ON_FOLIO(!rcuref_read(&root_vma->vm_rcuref), folio);
+ return;
+ }
+
anon_vma = (void *)(mapping - FOLIO_MAPPING_ANON);
VM_WARN_ON_FOLIO(atomic_read(&anon_vma->refcount) == 0, folio);
}
@@ -431,6 +439,31 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);
+/**
+ * folio_upgrade_anon_vma_lazy - upgrade folio->mapping from ANON_VMA_LAZY to
+ * an anon_vma
+ * @folio: The folio to upgrade
+ * @vma: The VMA the folio currently belongs to
+ *
+ * Upgrade folio->mapping from ANON_VMA_LAZY to an anon_vma.
+ * This transition is strictly one-way and never reverts back to a lazy
+ * mapping.
+ *
+ * Called during fork() while holding the mmap lock and the VMA write lock,
+ * but without taking the folio lock. Concurrent readers may briefly observe
+ * the old lazy mapping. Migration relies on folio_trylock_get_anon_rmap()
+ * to ensure atomicity, while other rmap operations remain unaffected.
+ */
+static inline void folio_upgrade_anon_vma_lazy(struct folio *folio,
+ struct vm_area_struct *vma)
+{
+ unsigned long anon_tree = (unsigned long)vma->anon_vma;
+
+ VM_BUG_ON_VMA(!anon_tree || !IS_ALIGNED(anon_tree, sizeof(long)), vma);
+ anon_tree = anon_tree + FOLIO_MAPPING_ANON;
+ WRITE_ONCE(folio->mapping, (struct address_space *)anon_tree);
+}
+
/* See folio_try_dup_anon_rmap_*() */
static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
struct vm_area_struct *vma)
@@ -438,6 +471,9 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+ if (folio_test_anon_vma_lazy(folio))
+ folio_upgrade_anon_vma_lazy(folio, vma);
+
if (PageAnonExclusive(&folio->page)) {
if (unlikely(folio_needs_cow_for_dma(vma, folio)))
return -EBUSY;
@@ -573,6 +609,8 @@ static __always_inline int __folio_try_dup_anon_rmap(struct folio *folio,
int i;
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+ if (folio_test_anon_vma_lazy(folio))
+ folio_upgrade_anon_vma_lazy(folio, src_vma);
__folio_rmap_sanity_checks(folio, page, nr_pages, level);
/*
diff --git a/mm/rmap.c b/mm/rmap.c
index 46876b3dbfbc..e14509b47412 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2002,6 +2002,16 @@ void folio_move_anon_rmap(struct folio *folio, struct vm_area_struct *vma)
void *anon_vma = vma_anon_vma(vma);
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
+
+ if (!anon_vma) {
+ const struct vm_area_struct *root_vma = vma_anon_vma_lazy_root(vma);
+
+ VM_BUG_ON_VMA(!root_vma, vma);
+ root_vma = (void *)root_vma + FOLIO_MAPPING_ANON_VMA_LAZY;
+ WRITE_ONCE(folio->mapping, (struct address_space *)root_vma);
+ return;
+ }
+
VM_BUG_ON_VMA(!anon_vma, vma);
anon_vma += FOLIO_MAPPING_ANON;
@@ -2023,7 +2033,16 @@ void folio_move_anon_rmap(struct folio *folio, struct vm_area_struct *vma)
static void __folio_set_anon(struct folio *folio, struct vm_area_struct *vma,
unsigned long address, bool exclusive)
{
- struct anon_vma *anon_vma = vma_anon_vma(vma);
+ anon_vma_tree_t anon_tree = vma->anon_vma;
+ const struct vm_area_struct *root_vma = vma_anon_vma_lazy_root(vma);
+ struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
+
+ if (root_vma && (anon_vma_tree_is_vma(anon_tree) || exclusive)) {
+ root_vma = (void *)root_vma + FOLIO_MAPPING_ANON_VMA_LAZY;
+ WRITE_ONCE(folio->mapping, (struct address_space *)root_vma);
+ folio->index = linear_page_index(vma, address);
+ return;
+ }
BUG_ON(!anon_vma);
--
2.17.1
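The commit message and the folio_upgrade_anon_vma_lazy() hunk treat folio->mapping as a tagged pointer. It points either at a lazy root VMA or, after fork(), at an anon_vma, and the upgrade is one-way. The following userspace model of that encoding is for illustration only. `MAPPING_ANON` uses the kernel's FOLIO_MAPPING_ANON value (0x1). The lazy tag value 0x5 is an assumption, because the series' FOLIO_MAPPING_ANON_VMA_LAZY value is not shown in this excerpt. It is chosen to keep the ANON bit set. That choice is consistent with hugetlb_try_dup_anon_rmap() checking folio_test_anon() before folio_test_anon_vma_lazy(). All `*_m` types and functions are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MAPPING_TAG_MASK	0x7UL	/* low bits freed by 8-byte alignment */
#define MAPPING_ANON		0x1UL	/* same value as FOLIO_MAPPING_ANON */
#define MAPPING_LAZY_HYP	0x5UL	/* hypothetical lazy tag: ANON bit | 0x4 */

struct anon_vma_m { _Alignas(8) int id; };
struct vma_m {
	_Alignas(8) int id;
	struct anon_vma_m *anon_vma;	/* NULL until an anon_vma is needed */
};
/* Relaxed atomics stand in for READ_ONCE()/WRITE_ONCE() on folio->mapping. */
struct folio_m { _Atomic uintptr_t mapping; };

static uintptr_t mapping_of(struct folio_m *f)
{
	return atomic_load_explicit(&f->mapping, memory_order_relaxed);
}

static int folio_m_is_anon(struct folio_m *f)
{
	return (mapping_of(f) & MAPPING_ANON) != 0;
}

static int folio_m_is_lazy(struct folio_m *f)
{
	return (mapping_of(f) & MAPPING_TAG_MASK) == MAPPING_LAZY_HYP;
}

/* New folio in a VMA without an anon_vma: point at the lazy root VMA. */
static void folio_m_set_lazy(struct folio_m *f, struct vma_m *root)
{
	uintptr_t p = (uintptr_t)root;

	assert(p && (p & MAPPING_TAG_MASK) == 0);
	atomic_store_explicit(&f->mapping, p + MAPPING_LAZY_HYP,
			      memory_order_relaxed);
}

/* fork(): one-way switch from the lazy root VMA to the VMA's anon_vma. */
static void folio_m_upgrade(struct folio_m *f, const struct vma_m *vma)
{
	uintptr_t p = (uintptr_t)vma->anon_vma;

	assert(p && (p & MAPPING_TAG_MASK) == 0);
	atomic_store_explicit(&f->mapping, p + MAPPING_ANON,
			      memory_order_relaxed);
}

/* Decode helpers: each returns NULL if the tag does not match. */
static struct vma_m *folio_m_lazy_root(struct folio_m *f)
{
	uintptr_t m = mapping_of(f);

	if ((m & MAPPING_TAG_MASK) != MAPPING_LAZY_HYP)
		return NULL;
	return (struct vma_m *)(m - MAPPING_LAZY_HYP);
}

static struct anon_vma_m *folio_m_anon_vma(struct folio_m *f)
{
	uintptr_t m = mapping_of(f);

	if ((m & MAPPING_TAG_MASK) != MAPPING_ANON)
		return NULL;
	return (struct anon_vma_m *)(m - MAPPING_ANON);
}
```

Because the whole tagged word is stored and loaded as a single value, a concurrent reader sees either the old lazy value or the new anon_vma value, never a mix. This matches the kernel-doc note that readers may briefly observe the old lazy mapping.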
* [PATCH 14/15] mm: support merging of ANON_VMA_LAZY VMAs
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
` (12 preceding siblings ...)
2026-05-27 11:01 ` [PATCH 13/15] mm: support setup and upgrade of ANON_VMA_LAZY folios tao
@ 2026-05-27 11:01 ` tao
2026-05-27 11:01 ` [PATCH 15/15] mm: enable CONFIG_ANON_VMA_LAZY on arm64 and x86_64 tao
` (5 subsequent siblings)
19 siblings, 0 replies; 64+ messages in thread
From: tao @ 2026-05-27 11:01 UTC (permalink / raw)
To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, tao.wangtao, kas,
ljs
Cc: hpa, liam, vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh,
jgg, jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
Allow ANON_VMA_LAZY VMAs to merge if they share the same root or if one
side has no root.
For ANON_VMA_LAZY merges, do not delete the lazy root VMA. The lazy root
VMA may still be referenced by folio->mapping.
Signed-off-by: tao <tao.wangtao@honor.com>
---
mm/vma.c | 29 ++++++++++++++++++++++++-----
1 file changed, 24 insertions(+), 5 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index 0a31ef82a90c..ae1047dcfbc2 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -76,9 +76,10 @@ static bool vma_is_fork_child(struct vm_area_struct *vma)
/*
* The list_is_singular() test is to avoid merging VMA cloned from
* parents. This can improve scalability caused by the anon_vma root
- * lock.
+ * lock. ANON_VMA_TREE_VMA has no anon_vma_chain.
*/
- return vma && vma->anon_vma && !list_is_singular(&vma->anon_vma_chain);
+ return vma && vma->anon_vma && !anon_vma_tree_is_vma(vma->anon_vma) &&
+ !list_is_singular(&vma->anon_vma_chain);
}
static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_next)
@@ -776,6 +777,17 @@ static bool can_merge_remove_vma(struct vm_area_struct *vma)
return !vma->vm_ops || !vma->vm_ops->close;
}
+/*
+ * The ANON_VMA_LAZY root VMA may still be referenced by folio->mapping.
+ * Keeping the root avoids allocating an extra VMA.
+ */
+#define SWAP_VMG_TARGET_IF_DELETE_ANON_VMA_LAZY_ROOT(vmg, delete_vma) do { \
+ if (anon_vma_lazy_enabled()) { \
+ if (delete_vma && vma_is_anon_vma_lazy_root(delete_vma)) \
+ swap(vmg->target, delete_vma); \
+ } \
+} while (0)
+
/*
* vma_merge_existing_range - Attempt to merge VMAs based on a VMA having its
* attributes modified.
@@ -933,12 +945,15 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
vmg->end = next->vm_end;
vmg->pgoff = prev->vm_pgoff;
+ SWAP_VMG_TARGET_IF_DELETE_ANON_VMA_LAZY_ROOT(vmg, middle);
+ SWAP_VMG_TARGET_IF_DELETE_ANON_VMA_LAZY_ROOT(vmg, next);
+
/*
* We already ensured anon_vma compatibility above, so now it's
* simply a case of, if prev has no anon_vma object, which of
* next or middle contains the anon_vma we must duplicate.
*/
- err = dup_anon_vma(prev, next->anon_vma ? next : middle,
+ err = dup_anon_vma(vmg->target, next->anon_vma ? next : middle,
&anon_dup);
} else if (merge_left) {
/*
@@ -954,8 +969,10 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
if (!vmg->__remove_middle)
vmg->__adjust_middle_start = true;
+ else
+ SWAP_VMG_TARGET_IF_DELETE_ANON_VMA_LAZY_ROOT(vmg, middle);
- err = dup_anon_vma(prev, middle, &anon_dup);
+ err = dup_anon_vma(vmg->target, middle, &anon_dup);
} else { /* merge_right */
/*
* |<------------->| OR
@@ -974,6 +991,7 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
if (vmg->__remove_middle) {
vmg->end = next->vm_end;
vmg->pgoff = next->vm_pgoff - pglen;
+ SWAP_VMG_TARGET_IF_DELETE_ANON_VMA_LAZY_ROOT(vmg, middle);
} else {
/* We shrink middle and expand next. */
vmg->__adjust_next_start = true;
@@ -982,7 +1000,7 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
vmg->pgoff = middle->vm_pgoff;
}
- err = dup_anon_vma(next, middle, &anon_dup);
+ err = dup_anon_vma(vmg->target, middle, &anon_dup);
}
if (err || commit_merge(vmg))
@@ -1212,6 +1230,7 @@ int vma_expand(struct vma_merge_struct *vmg)
vma_start_write(next);
vmg->__remove_next = true;
+ SWAP_VMG_TARGET_IF_DELETE_ANON_VMA_LAZY_ROOT(vmg, next);
next_sticky = vma_flags_and_mask(&next->flags, VMA_STICKY_FLAGS);
vma_flags_set_mask(&sticky_flags, next_sticky);
--
2.17.1
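Patch 14 states two rules. VMAs may merge if both share a lazy root or one side has none, and the lazy root is never the VMA that gets deleted. Those rules can be sketched in plain C for illustration. This is a simplified, hypothetical model: `struct vma_m`, `lazy_roots_compatible()` and `keep_lazy_root()` are not kernel APIs. The real code must also adjust ranges and pgoff and duplicate the anon_vma via dup_anon_vma().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical VMA model: lazy_root is NULL when no lazy root is assigned. */
struct vma_m {
	int id;
	const struct vma_m *lazy_root;
};

static bool vma_m_is_lazy_root(const struct vma_m *vma)
{
	return vma && vma->lazy_root == vma;
}

/* Mergeable if either side has no root, or both share the same root. */
static bool lazy_roots_compatible(const struct vma_m *a, const struct vma_m *b)
{
	return !a->lazy_root || !b->lazy_root || a->lazy_root == b->lazy_root;
}

/*
 * If the VMA about to be deleted is a lazy root, make it the merge target
 * instead, in the spirit of SWAP_VMG_TARGET_IF_DELETE_ANON_VMA_LAZY_ROOT().
 */
static void keep_lazy_root(struct vma_m **target, struct vma_m **victim)
{
	if (*victim && vma_m_is_lazy_root(*victim)) {
		struct vma_m *tmp = *target;

		*target = *victim;
		*victim = tmp;
	}
}
```

Swapping instead of deleting keeps the root's vm_area_struct alive, so a folio->mapping that still points at that root stays valid. The cost is that the surviving VMA may not be the one the merge code would normally keep.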
* [PATCH 15/15] mm: enable CONFIG_ANON_VMA_LAZY on arm64 and x86_64
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
` (13 preceding siblings ...)
2026-05-27 11:01 ` [PATCH 14/15] mm: support merging of ANON_VMA_LAZY VMAs tao
@ 2026-05-27 11:01 ` tao
2026-05-27 11:23 ` [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation Pedro Falcato
` (4 subsequent siblings)
19 siblings, 0 replies; 64+ messages in thread
From: tao @ 2026-05-27 11:01 UTC (permalink / raw)
To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, tao.wangtao, kas,
ljs
Cc: hpa, liam, vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh,
jgg, jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
All prerequisites are in place, so enable CONFIG_ANON_VMA_LAZY for
arm64 and x86_64.
Signed-off-by: tao <tao.wangtao@honor.com>
---
arch/arm64/Kconfig | 1 +
arch/x86/Kconfig | 1 +
mm/rmap.c | 2 +-
3 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fe60738e5943..9517883f0aaf 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -81,6 +81,7 @@ config ARM64
select ARCH_SUPPORTS_NUMA_BALANCING
select ARCH_SUPPORTS_PAGE_TABLE_CHECK
select ARCH_SUPPORTS_PER_VMA_LOCK
+ select ARCH_SUPPORTS_ANON_VMA_LAZY
select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
select ARCH_SUPPORTS_RT
select ARCH_SUPPORTS_SCHED_SMT
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f3f7cb01d69d..cc3430eaa7b4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -28,6 +28,7 @@ config X86_64
select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
select ARCH_SUPPORTS_PER_VMA_LOCK
+ select ARCH_SUPPORTS_ANON_VMA_LAZY
select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
select HAVE_ARCH_SOFT_DIRTY
select MODULES_USE_ELF_RELA
diff --git a/mm/rmap.c b/mm/rmap.c
index e14509b47412..77e2ab95671a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -168,7 +168,7 @@ static struct kmem_cache *anon_vma_chain_cachep;
* covering both regular anon_vma and lazy anon_vma mappings.
*/
-bool anon_vma_lazy_enable;
+bool anon_vma_lazy_enable = true;
#endif
static inline struct anon_vma *anon_vma_alloc(void)
--
2.17.1
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
` (14 preceding siblings ...)
2026-05-27 11:01 ` [PATCH 15/15] mm: enable CONFIG_ANON_VMA_LAZY on arm64 and x86_64 tao
@ 2026-05-27 11:23 ` Pedro Falcato
2026-05-28 6:45 ` wangtao
2026-05-27 11:30 ` Lorenzo Stoakes
` (3 subsequent siblings)
19 siblings, 1 reply; 64+ messages in thread
From: Pedro Falcato @ 2026-05-27 11:23 UTC (permalink / raw)
To: tao
Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, kas, ljs, hpa, liam,
vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh, jgg,
jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, linux-arm-kernel, linux-kernel,
linux-fsdevel, linux-mm, damon, shakeel.butt, ryncsn, 21cnbao,
jparsana, dvander, zhangji1, wangzicheng
On Wed, May 27, 2026 at 07:01:32PM +0800, tao wrote:
> TL;DR
> -----
>
> This series introduces ANON_VMA_LAZY, which defers anon_vma creation
> until it is actually required.
>
> - anon_vma memory reduced by ~92-97%, anon_vma_chain reduced by ~50-57%
> - rmap operations on ANON_VMA_LAZY VMAs do not require anon_vma locking
>
> Background
> ----------
>
> Currently anon_vma structures are created eagerly when anonymous VMAs
> are initialized. However, many VMAs never participate in fork or rmap
This is not true, they are created on fault + a few other places.
> operations that require anon_vma chains, so the allocated anon_vma and
> anon_vma_chain objects are often unnecessary.
>
> Design overview
> ---------------
>
> ANON_VMA_LAZY defers anon_vma allocation until it is actually needed
> (for example during fork). VMAs that never participate in sharing can
> avoid creating anon_vma structures entirely.
>
> Before an anon_vma exists, rmap operations rely directly on VMA
> information, so no anon_vma locking is required. An anon_vma is created
> and linked only when sharing semantics are required.
>
> This series introduces anon_rmap helpers to make rmap less dependent on
> direct anon_vma access. It also introduces anon_vma_tree_t as a container
> to support both the lazy and the existing anon_vma layouts.
>
> Once a VMA becomes associated with an anon_vma, the normal behavior
> remains unchanged.
>
> Memory impact
> -------------
>
> Preliminary measurements show significant reductions in anon_vma-related
> slab allocations.
>
> After boot:
>
> Object | Before (active KB) | After (active KB) | Change
> vm_area_struct | 117035 | 118176 | +1.0%
> anon_vma_chain | 18865.8 | 8112.06 | -57.0%
> anon_vma | 20426.4 | 613.75 | -97.0%
>
> After launching 24 apps:
>
> Object | Before (active KB) | After (active KB) | Change
> vm_area_struct | 196873 | 197345 | +0.2%
> anon_vma_chain | 31477.1 | 15576.8 | -50.5%
> anon_vma | 33280 | 2648.12 | -92.0%
>
> Simple fork microbenchmarks also show a slight improvement in fork
> performance, since child VMAs do not need to allocate anon_vma
> structures during fork.
>
> Feedback and suggestions are welcome.
I'm afraid, per previous discussions[1], that no one is really willing to
maintain extra complexity for the current state of anon rmap and anon vmas.
Sorry :/
Also, please don't send series this large without previous discussion and
_at least_ an RFC tag.
[1] https://lore.kernel.org/all/aec533b2-37a7-4f44-a279-c4aa604206ac@lucifer.local/
--
Pedro
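Pedro's point is that upstream does not allocate an anon_vma when an anonymous VMA is set up. Allocation happens on the first anonymous fault (and a few other paths) via anon_vma_prepare(), whose inline fast path returns at once if vma->anon_vma is already set. The userspace model below illustrates that existing behaviour in simplified form. Its slow path only calls calloc(). The real __anon_vma_prepare() may reuse a neighbouring VMA's anon_vma via find_mergeable_anon_vma(), links an anon_vma_chain, and takes locks. The `*_m` names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct anon_vma_m { int refcount; };
struct vma_m { struct anon_vma_m *anon_vma; };

static int anon_vma_allocs;	/* counts slow-path allocations */

/* Slow path: allocate on first use (no reuse, no chain, no locking here). */
static int anon_vma_prepare_slow_m(struct vma_m *vma)
{
	struct anon_vma_m *av = calloc(1, sizeof(*av));

	if (!av)
		return -1;	/* the kernel returns -ENOMEM */
	av->refcount = 1;
	vma->anon_vma = av;
	anon_vma_allocs++;
	return 0;
}

/* Fast path, shaped like anon_vma_prepare(): nothing to do once set. */
static int anon_vma_prepare_m(struct vma_m *vma)
{
	if (vma->anon_vma)
		return 0;
	return anon_vma_prepare_slow_m(vma);
}

/* Models an anonymous page fault in this VMA. */
static int handle_anon_fault_m(struct vma_m *vma)
{
	return anon_vma_prepare_m(vma);
}
```

In this model, a VMA that is never faulted never allocates an anon_vma, and repeated faults allocate only once.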
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
` (15 preceding siblings ...)
2026-05-27 11:23 ` [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation Pedro Falcato
@ 2026-05-27 11:30 ` Lorenzo Stoakes
2026-05-28 7:11 ` wangtao
2026-05-27 14:33 ` Lorenzo Stoakes
` (2 subsequent siblings)
19 siblings, 1 reply; 64+ messages in thread
From: Lorenzo Stoakes @ 2026-05-27 11:30 UTC (permalink / raw)
To: tao
Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, kas, hpa, liam,
vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh, jgg,
jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
I'm sorry but this is not how kernel development is done.
You're sending a series that's very invasive, that you've not coordinated
with anybody else, nor have you mentioned it at a conference, nor engaged
in discussion with anybody else in the community in any way.
And sending it without an RFC, at -rc5, is... quite something.
We do NOT want to extend or expand or hack in anything like this on top of
the existing anon_vma machinery. It's a mess that requires replacement, not
more hacks or expansion.
I've been working on a replacement for the anonymous rmap, recently
presenting at LSF/MM, and all of that has been very public.
In fact I have engaged in recent work which reduced lock contention in
anon_vma; it's really quite discourteous for you not to have contacted me
or the community in addition to the above.
On Wed, May 27, 2026 at 07:01:32PM +0800, tao wrote:
> TL;DR
> -----
>
> This series introduces ANON_VMA_LAZY, which defers anon_vma creation
> until it is actually required.
>
> - anon_vma memory reduced by ~92-97%, anon_vma_chain reduced by ~50-57%
> - rmap operations on ANON_VMA_LAZY VMAs do not require anon_vma locking
>
> Background
> ----------
>
> Currently anon_vma structures are created eagerly when anonymous VMAs
> are initialized. However, many VMAs never participate in fork or rmap
What are you talking about? 'Initialized'? They are created when memory is
faulted in, and we explicitly need to know that that's the case.
Also the folio->mapping is required to point to something to allow for anon
rmap...
> operations that require anon_vma chains, so the allocated anon_vma and
> anon_vma_chain objects are often unnecessary.
Right, because we never split or merge VMAs nor require anon rmap?
>
> Design overview
> ---------------
>
> ANON_VMA_LAZY defers anon_vma allocation until it is actually needed
> (for example during fork). VMAs that never participate in sharing can
> avoid creating anon_vma structures entirely.
Well, it's needed the second something's faulted in so you can perform anon
rmap.
>
> Before an anon_vma exists, rmap operations rely directly on VMA
> information, so no anon_vma locking is required. An anon_vma is created
> and linked only when sharing semantics are required.
Err 'directly on VMA information'... a VMA pointer? That can change at any
point? What about remaps?...
I guess I'll see in the code.
>
> This series introduces anon_rmap helpers to make rmap less dependent on
> direct anon_vma access. It also introduces anon_vma_tree_t as a container
> to support both the lazy and the existing anon_vma layouts.
Super invasive, extending the already broken abstraction further. We don't
want this.
>
> Once a VMA becomes associated with an anon_vma, the normal behavior
> remains unchanged.
>
> Memory impact
> -------------
>
> Preliminary measurements show significant reductions in anon_vma-related
> slab allocations.
>
> After boot:
>
> Object | Before (active KB) | After (active KB) | Change
> vm_area_struct | 117035 | 118176 | +1.0%
> anon_vma_chain | 18865.8 | 8112.06 | -57.0%
> anon_vma | 20426.4 | 613.75 | -97.0%
>
> After launching 24 apps:
>
> Object | Before (active KB) | After (active KB) | Change
> vm_area_struct | 196873 | 197345 | +0.2%
> anon_vma_chain | 31477.1 | 15576.8 | -50.5%
> anon_vma | 33280 | 2648.12 | -92.0%
>
> Simple fork microbenchmarks also show a slight improvement in fork
> performance, since child VMAs do not need to allocate anon_vma
> structures during fork.
This seems completely broken too re: anon_vma propagation on fork?
The above is only meaningful if you're not fundamentally breaking anon rmap
which is very easily done, but in addition, I'm not interested in seeing
the anon_vma machinery extended further.
>
> Feedback and suggestions are welcome.
This is what you should have sought AHEAD of sending this.
I'll look at the code, but in general you've gone about this in a really
unfortunate way with respect to the community. This is not how to collaborate.
>
>
> tao (15):
> mm/rmap: introduce anon_rmap APIs for anonymous folios
> mm: convert anon_vma rmap APIs to anon_rmap
> mm: introduce anon_vma_tree_t for multiple anon_vma topologies
> mm: switch to anon_vma_tree_t APIs in preparation for ANON_VMA_LAZY
> mm: add CONFIG_ANON_VMA_LAZY and folio helpers
> mm: add CONFIG_VMA_REF and VMA helpers
> mm: replace direct FOLIO_MAPPING_ANON usage with helpers
> mm: prepare rmap infrastructure for ANON_VMA_LAZY
> mm: implement ANON_VMA_LAZY rmap semantics
> mm: defer anon_vma creation with ANON_VMA_LAZY
> mm: handle ANON_VMA_LAZY in huge page operations
> mm: handle ANON_VMA_LAZY during migration
> mm: support setup and upgrade of ANON_VMA_LAZY folios
> mm: support merging of ANON_VMA_LAZY VMAs
> mm: enable CONFIG_ANON_VMA_LAZY on arm64 and x86_64
>
> arch/arm64/Kconfig | 1 +
> arch/x86/Kconfig | 1 +
> fs/proc/page.c | 6 +-
> include/linux/mm.h | 38 ++
> include/linux/mm_types.h | 9 +-
> include/linux/page-flags.h | 34 +-
> include/linux/pagemap.h | 2 +-
> include/linux/rmap.h | 165 ++++++++-
> mm/Kconfig | 22 ++
> mm/damon/ops-common.c | 4 +-
> mm/debug.c | 2 +-
> mm/debug_vm_pgtable.c | 2 +-
> mm/gup.c | 6 +-
> mm/huge_memory.c | 16 +-
> mm/internal.h | 171 +++++++++
> mm/khugepaged.c | 13 +-
> mm/ksm.c | 43 ++-
> mm/memory-failure.c | 11 +-
> mm/memory.c | 19 +-
> mm/migrate.c | 126 ++++---
> mm/mmap.c | 15 +-
> mm/mremap.c | 4 +-
> mm/page_idle.c | 2 +-
> mm/rmap.c | 690 ++++++++++++++++++++++++++++++++++---
> mm/vma.c | 76 ++--
> mm/vma.h | 4 +-
> mm/vma_exec.c | 2 +-
> mm/vma_init.c | 1 +
> 28 files changed, 1279 insertions(+), 206 deletions(-)
>
> --
> 2.17.1
>
>
Lorenzo
* Re: [PATCH 01/15] mm/rmap: introduce anon_rmap APIs for anonymous folios
2026-05-27 11:01 ` [PATCH 01/15] mm/rmap: introduce anon_rmap APIs for anonymous folios tao
@ 2026-05-27 11:44 ` Lorenzo Stoakes
2026-05-28 7:47 ` wangtao
0 siblings, 1 reply; 64+ messages in thread
From: Lorenzo Stoakes @ 2026-05-27 11:44 UTC (permalink / raw)
To: tao
Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, kas, hpa, liam,
vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh, jgg,
jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
On Wed, May 27, 2026 at 07:01:33PM +0800, tao wrote:
> Add a set of anon_rmap APIs to operate on the reverse mappings of
> anonymous folios.
>
> Introduce anon_rmap_for_each_vma() as a wrapper around
> vma_interval_tree_foreach(), so callers no longer access the
> interval tree directly.
>
> This prepares the rmap code for upcoming ANON_VMA_LAZY support and
> RCU-based lockless rmap traversal.
>
> No functional change intended.
This commit message is total garbage. You're not explaining WHY; you're
using words to describe what the code does. I can read the code?
>
> Signed-off-by: tao <tao.wangtao@honor.com>
This is all horrible, horribly invasive, and adds a pile of crap on top of
machinery we want to get rid of.
You've added zero explanation or comments. This is just not upstreamable,
and even if you did explain yourself we don't want to extend a broken
abstraction with more broken complexity?
You're also seemingly introducing a typesafe wrapper to wrap an arbitrary value?
> ---
> include/linux/rmap.h | 68 +++++++++++++++++++++++++++++++++++++++++
> mm/rmap.c | 73 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 141 insertions(+)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 8dc0871e5f00..c42314ea4362 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -937,6 +937,44 @@ int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff,
> void remove_migration_ptes(struct folio *src, struct folio *dst,
> enum ttu_flags flags);
>
> +/* Reverse mapping handle for anonymous folio rmap helpers. */
> +typedef struct anon_rmap {
> + unsigned long rmap;
> +} anon_rmap_t;
I do not know why you're using a typedef when you just treat it as an
arbitrary value?
> +
> +#define ANON_RMAP_NULL make_anon_rmap(0)
This is just equivalent to a NULL value?...
> +
> +static inline anon_rmap_t make_anon_rmap(const void *anon_mapping)
> +{
> + return (anon_rmap_t){ .rmap = (unsigned long)anon_mapping, };
> +}
You're intentionally defeating type safety to store arbitrary values?...
> +
> +static inline unsigned long anon_rmap_value(anon_rmap_t anon_rmap)
> +{
> + return anon_rmap.rmap;
> +}
'Untype safe my arbitrarily type safe wrapped type'...?
> +
> +static inline anon_rmap_t anon_vma_to_anon_rmap(const struct anon_vma *anon_vma)
> +{
> + return make_anon_rmap(anon_vma);
> +}
> +
> +static inline struct anon_vma *anon_rmap_to_anon_vma(anon_rmap_t anon_rmap)
> +{
> + unsigned long rmap = anon_rmap_value(anon_rmap);
> +
> + return (struct anon_vma *)rmap;
> +}
A ton of noise for seemingly no value?
> +
> +anon_rmap_t vma_get_anon_rmap(struct vm_area_struct *vma);
> +void put_anon_rmap(anon_rmap_t anon_rmap);
> +void anon_rmap_lock_write(anon_rmap_t anon_rmap);
> +int anon_rmap_trylock_write(anon_rmap_t anon_rmap);
> +void anon_rmap_unlock_write(anon_rmap_t anon_rmap);
> +void anon_rmap_lock_read(anon_rmap_t anon_rmap);
> +int anon_rmap_trylock_read(anon_rmap_t anon_rmap);
> +void anon_rmap_unlock_read(anon_rmap_t anon_rmap);
Yes let's add a bunch of extra broken abstractions on the broken abstraction.
And let's not comment anything!
> +
> /*
> * rmap_walk_control: To control rmap traversing for specific needs
> *
> @@ -969,6 +1007,36 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc);
> struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
> struct rmap_walk_control *rwc);
>
> +bool folio_maybe_same_anon_vma(const struct folio *folio,
> + const struct vm_area_struct *vma);
What the hell is this?
> +anon_rmap_t folio_get_anon_rmap(const struct folio *folio);
> +anon_rmap_t folio_lock_anon_rmap_read(const struct folio *folio,
> + struct rmap_walk_control *rwc);
> +
> +static inline struct vm_area_struct *anon_rmap_iter_first_vma(
> + anon_rmap_t anon_rmap, unsigned long start, unsigned long last,
> + struct anon_vma_chain **avc)
> +{
> + struct anon_vma *anon_vma = anon_rmap_to_anon_vma(anon_rmap);
> +
> + *avc = anon_vma_interval_tree_iter_first(&anon_vma->rb_root, start, last);
> + return *avc ? (*avc)->vma : NULL;
> +}
So we're allowing for folios to have NULL entries (really the commit
message should have that, rather than me scanning through uncommented
code), but in what world are we ok with an anon folio NOT BEING LINKED BACK
TO ITS VMA?
That's broken no?
> +
> +static inline struct vm_area_struct *anon_rmap_iter_next_vma(
> + anon_rmap_t anon_rmap, unsigned long start, unsigned long last,
> + struct anon_vma_chain **avc)
> +{
> + if (!*avc)
> + return NULL;
> + *avc = anon_vma_interval_tree_iter_next(*avc, start, last);
> + return *avc ? (*avc)->vma : NULL;
> +}
> +
> +#define anon_rmap_foreach_vma(vma, avc, anon_rmap, start, last) \
> + for (vma = anon_rmap_iter_first_vma(anon_rmap, start, last, &avc); \
> + vma; vma = anon_rmap_iter_next_vma(anon_rmap, start, last, &avc))
> +
> #else /* !CONFIG_MMU */
>
> #define anon_vma_init() do {} while (0)
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 78b7fb5f367c..1b2dada71778 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -701,6 +701,79 @@ struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
> return anon_vma;
> }
>
> +anon_rmap_t vma_get_anon_rmap(struct vm_area_struct *vma)
> +{
> + mmap_assert_locked(vma->vm_mm);
> + VM_BUG_ON(!vma->anon_vma);
We don't use BUG_ON(), especially VM_BUG_ON().
> + get_anon_vma(vma->anon_vma);
> + return anon_vma_to_anon_rmap(vma->anon_vma);
> +}
> +
> +void put_anon_rmap(anon_rmap_t anon_rmap)
> +{
> + put_anon_vma(anon_rmap_to_anon_vma(anon_rmap));
> +}
> +
> +void anon_rmap_lock_write(anon_rmap_t anon_rmap)
> +{
> + anon_vma_lock_write(anon_rmap_to_anon_vma(anon_rmap));
> +}
> +
> +int anon_rmap_trylock_write(anon_rmap_t anon_rmap)
> +{
> + return anon_vma_trylock_write(anon_rmap_to_anon_vma(anon_rmap));
> +}
> +
> +void anon_rmap_unlock_write(anon_rmap_t anon_rmap)
> +{
> + anon_vma_unlock_write(anon_rmap_to_anon_vma(anon_rmap));
> +}
> +
> +void anon_rmap_lock_read(anon_rmap_t anon_rmap)
> +{
> + anon_vma_lock_read(anon_rmap_to_anon_vma(anon_rmap));
> +}
> +
> +int anon_rmap_trylock_read(anon_rmap_t anon_rmap)
> +{
> + return anon_vma_trylock_read(anon_rmap_to_anon_vma(anon_rmap));
> +}
> +
> +void anon_rmap_unlock_read(anon_rmap_t anon_rmap)
> +{
> + anon_vma_unlock_read(anon_rmap_to_anon_vma(anon_rmap));
> +}
All of these assume that you're getting an anon_vma even though you have
established above that you can put arbitrary other values?
This is terrible?
> +
> +bool folio_maybe_same_anon_vma(const struct folio *folio,
> + const struct vm_area_struct *vma)
> +{
> + struct anon_vma *anon_vma;
> + struct anon_vma *tgt_anon_vma = vma->anon_vma;
> + bool same = false;
> +
> + rcu_read_lock();
> + anon_vma = folio_anon_vma(folio);
> + if (anon_vma && tgt_anon_vma)
> + same = anon_vma->root == tgt_anon_vma->root;
> + rcu_read_unlock();
> + return same;
What VMA locks are being held at this point? You assert none.
Why is it maybe?
Why are you taking the RCU lock?
> +}
> +
> +anon_rmap_t folio_get_anon_rmap(const struct folio *folio)
> +{
> + struct anon_vma *anon_vma = folio_get_anon_vma(folio);
> +
> + return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
> +}
> +
> +anon_rmap_t folio_lock_anon_rmap_read(const struct folio *folio,
> + struct rmap_walk_control *rwc)
> +{
> + struct anon_vma *anon_vma = folio_lock_anon_vma_read(folio, rwc);
> +
> + return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
> +}
> +
> #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> /*
> * Flush TLB entries for recently unmapped pages from remote CPUs. It is
> --
> 2.17.1
>
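Lorenzo's objection to anon_rmap_t is that wrapping an opaque unsigned long gives no real type safety, and that the lock/unlock wrappers simply assume the value is an anon_vma. For comparison, the hypothetical sketch below shows a handle that records its kind explicitly. Its accessors check that kind and return NULL on a mismatch instead of casting blindly. This is not the series' encoding: later patches add anon_rmap_is_anon_vma() and anon_rmap_to_vma(), but their layout is not shown here. All names below are illustrative.

```c
#include <assert.h>
#include <stddef.h>

struct anon_vma_m { int id; };
struct vma_m { int id; };

enum rmap_kind { RMAP_NONE, RMAP_ANON_VMA, RMAP_VMA };

/* Hypothetical tagged handle: the kind travels with the pointer. */
struct rmap_handle {
	enum rmap_kind kind;
	union {
		struct anon_vma_m *anon_vma;
		struct vma_m *vma;
	} u;
};

static struct rmap_handle handle_from_anon_vma(struct anon_vma_m *av)
{
	struct rmap_handle h = { .kind = av ? RMAP_ANON_VMA : RMAP_NONE };

	h.u.anon_vma = av;
	return h;
}

static struct rmap_handle handle_from_vma(struct vma_m *vma)
{
	struct rmap_handle h = { .kind = vma ? RMAP_VMA : RMAP_NONE };

	h.u.vma = vma;
	return h;
}

/* Checked accessors: a mismatched kind yields NULL rather than a bad cast. */
static struct anon_vma_m *handle_anon_vma(struct rmap_handle h)
{
	return h.kind == RMAP_ANON_VMA ? h.u.anon_vma : NULL;
}

static struct vma_m *handle_vma(struct rmap_handle h)
{
	return h.kind == RMAP_VMA ? h.u.vma : NULL;
}
```

The trade-off is size. A kind field plus a pointer takes two words. Folding the kind into the pointer's low alignment bits, as folio->mapping already does, keeps it to one word, but every accessor then needs the same checking discipline.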
* Re: [PATCH 02/15] mm: convert anon_vma rmap APIs to anon_rmap
2026-05-27 11:01 ` [PATCH 02/15] mm: convert anon_vma rmap APIs to anon_rmap tao
@ 2026-05-27 11:49 ` Lorenzo Stoakes
2026-05-28 8:55 ` wangtao
0 siblings, 1 reply; 64+ messages in thread
From: Lorenzo Stoakes @ 2026-05-27 11:49 UTC (permalink / raw)
To: tao
Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, kas, hpa, liam,
vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh, jgg,
jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
On Wed, May 27, 2026 at 07:01:34PM +0800, tao wrote:
> Convert the rmap anon_vma interfaces to anon_rmap APIs to clarify the
> semantics of anonymous rmap operations and prepare for upcoming
> ANON_VMA_LAZY support and RCU-based lockless rmap traversal.
>
> Replace folio_anon_vma(), folio_get_anon_vma(), folio_lock_anon_vma_read(),
> anon_vma_trylock_read(), anon_vma_lock_read(), anon_vma_unlock_read(),
> anon_vma_trylock_write(), anon_vma_lock_write(), anon_vma_unlock_write(),
> and vma_interval_tree_foreach() with the anon_rmap APIs.
This is another worthless commit message. You're again just writing what you did,
not why or what for. This gives no help whatsoever.
>
> No functional change intended.
Err, there is a functional change, since you're literally changing how things
iterate?
>
> Signed-off-by: tao <tao.wangtao@honor.com>
All of this is terrible, you're replacing a broken abstraction with something
that assumes something completely broken with zero explanation.
No to this.
> ---
> include/linux/rmap.h | 6 ++--
> mm/damon/ops-common.c | 4 +--
> mm/huge_memory.c | 16 +++++------
> mm/ksm.c | 43 ++++++++++++++---------------
> mm/memory-failure.c | 11 ++++----
> mm/migrate.c | 64 +++++++++++++++++++++----------------------
> mm/page_idle.c | 2 +-
> mm/rmap.c | 51 ++++++++++++++++++----------------
> 8 files changed, 98 insertions(+), 99 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index c42314ea4362..9802bce92695 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -997,15 +997,13 @@ struct rmap_walk_control {
> bool (*rmap_one)(struct folio *folio, struct vm_area_struct *vma,
> unsigned long addr, void *arg);
> int (*done)(struct folio *folio);
> - struct anon_vma *(*anon_lock)(const struct folio *folio,
> - struct rmap_walk_control *rwc);
> + anon_rmap_t (*anon_lock)(const struct folio *folio,
> + struct rmap_walk_control *rwc);
> bool (*invalid_vma)(struct vm_area_struct *vma, void *arg);
> };
>
> void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc);
> void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc);
> -struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
> - struct rmap_walk_control *rwc);
>
> bool folio_maybe_same_anon_vma(const struct folio *folio,
> const struct vm_area_struct *vma);
> diff --git a/mm/damon/ops-common.c b/mm/damon/ops-common.c
> index 8c6d613425c1..5788410965b8 100644
> --- a/mm/damon/ops-common.c
> +++ b/mm/damon/ops-common.c
> @@ -172,7 +172,7 @@ void damon_folio_mkold(struct folio *folio)
> {
> struct rmap_walk_control rwc = {
> .rmap_one = damon_folio_mkold_one,
> - .anon_lock = folio_lock_anon_vma_read,
> + .anon_lock = folio_lock_anon_rmap_read,
> };
>
> if (!folio_mapped(folio) || !folio_raw_mapping(folio)) {
> @@ -236,7 +236,7 @@ bool damon_folio_young(struct folio *folio)
> struct rmap_walk_control rwc = {
> .arg = &accessed,
> .rmap_one = damon_folio_young_one,
> - .anon_lock = folio_lock_anon_vma_read,
> + .anon_lock = folio_lock_anon_rmap_read,
> };
>
> if (!folio_mapped(folio) || !folio_raw_mapping(folio)) {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 970e077019b7..ab3c2397449a 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -4051,7 +4051,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
> struct folio *end_folio = folio_next(folio);
> bool is_anon = folio_test_anon(folio);
> struct address_space *mapping = NULL;
> - struct anon_vma *anon_vma = NULL;
> + anon_rmap_t anon_rmap = ANON_RMAP_NULL;
> int old_order = folio_order(folio);
> struct folio *new_folio, *next;
> int nr_shmem_dropped = 0;
> @@ -4087,12 +4087,12 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
> * is taken to serialise against parallel split or collapse
> * operations.
> */
> - anon_vma = folio_get_anon_vma(folio);
> - if (!anon_vma) {
> + anon_rmap = folio_get_anon_rmap(folio);
> + if (!anon_rmap_value(anon_rmap)) {
> ret = -EBUSY;
> goto out;
> }
> - anon_vma_lock_write(anon_vma);
> + anon_rmap_lock_write(anon_rmap);
> mapping = NULL;
> } else {
> unsigned int min_order;
> @@ -4122,7 +4122,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
> }
> }
>
> - anon_vma = NULL;
> + anon_rmap = ANON_RMAP_NULL;
> i_mmap_lock_read(mapping);
>
> /*
> @@ -4200,9 +4200,9 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
> }
>
> out_unlock:
> - if (anon_vma) {
> - anon_vma_unlock_write(anon_vma);
> - put_anon_vma(anon_vma);
> + if (anon_rmap_value(anon_rmap)) {
> + anon_rmap_unlock_write(anon_rmap);
> + put_anon_rmap(anon_rmap);
> }
> if (mapping)
> i_mmap_unlock_read(mapping);
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 7d5b76478f0b..f4c204a8a379 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -187,7 +187,7 @@ struct ksm_stable_node {
> /**
> * struct ksm_rmap_item - reverse mapping item for virtual addresses
> * @rmap_list: next rmap_item in mm_slot's singly-linked rmap_list
> - * @anon_vma: pointer to anon_vma for this mm,address, when in stable tree
> + * @anon_rmap: anonymous folio rmap for this mm,address, when in stable tree
> * @nid: NUMA node id of unstable tree in which linked (may not match page)
> * @mm: the memory structure this rmap_item is pointing into
> * @address: the virtual address this rmap_item tracks (+ flags in low bits)
> @@ -201,7 +201,7 @@ struct ksm_stable_node {
> struct ksm_rmap_item {
> struct ksm_rmap_item *rmap_list;
> union {
> - struct anon_vma *anon_vma; /* when stable */
> + anon_rmap_t anon_rmap; /* when stable */
> #ifdef CONFIG_NUMA
> int nid; /* when node of unstable tree */
> #endif
> @@ -786,7 +786,7 @@ static void break_cow(struct ksm_rmap_item *rmap_item)
> * It is not an accident that whenever we want to break COW
> * to undo, we also need to drop a reference to the anon_vma.
> */
> - put_anon_vma(rmap_item->anon_vma);
> + put_anon_rmap(rmap_item->anon_rmap);
>
> mmap_read_lock(mm);
> vma = find_mergeable_vma(mm, addr);
> @@ -898,7 +898,7 @@ static void remove_node_from_stable_tree(struct ksm_stable_node *stable_node)
>
> VM_BUG_ON(stable_node->rmap_hlist_len <= 0);
> stable_node->rmap_hlist_len--;
> - put_anon_vma(rmap_item->anon_vma);
> + put_anon_rmap(rmap_item->anon_rmap);
> rmap_item->address &= PAGE_MASK;
> cond_resched();
> }
> @@ -1051,7 +1051,7 @@ static void remove_rmap_item_from_tree(struct ksm_rmap_item *rmap_item)
> VM_BUG_ON(stable_node->rmap_hlist_len <= 0);
> stable_node->rmap_hlist_len--;
>
> - put_anon_vma(rmap_item->anon_vma);
> + put_anon_rmap(rmap_item->anon_rmap);
> rmap_item->head = NULL;
> rmap_item->address &= PAGE_MASK;
>
> @@ -1598,9 +1598,8 @@ static int try_to_merge_with_ksm_page(struct ksm_rmap_item *rmap_item,
> /* Unstable nid is in union with stable anon_vma: remove first */
> remove_rmap_item_from_tree(rmap_item);
>
> - /* Must get reference to anon_vma while still holding mmap_lock */
> - rmap_item->anon_vma = vma->anon_vma;
> - get_anon_vma(vma->anon_vma);
> + /* Must get reference to anon_rmap while still holding mmap_lock */
> + rmap_item->anon_rmap = vma_get_anon_rmap(vma);
> out:
> mmap_read_unlock(mm);
> trace_ksm_merge_with_ksm_page(kpage, page_to_pfn(kpage ? kpage : page),
> @@ -3108,7 +3107,6 @@ struct folio *ksm_might_need_to_copy(struct folio *folio,
> struct vm_area_struct *vma, unsigned long addr)
> {
> struct page *page = folio_page(folio, 0);
> - struct anon_vma *anon_vma = folio_anon_vma(folio);
> struct folio *new_folio;
>
> if (folio_test_large(folio))
> @@ -3118,10 +3116,10 @@ struct folio *ksm_might_need_to_copy(struct folio *folio,
> if (folio_stable_node(folio) &&
> !(ksm_run & KSM_RUN_UNMERGE))
> return folio; /* no need to copy it */
> - } else if (!anon_vma) {
> + } else if (!folio_test_anon(folio)) {
> return folio; /* no need to copy it */
> } else if (folio->index == linear_page_index(vma, addr) &&
> - anon_vma->root == vma->anon_vma->root) {
> + folio_maybe_same_anon_vma(folio, vma)) {
> return folio; /* still no need to copy it */
> }
> if (PageHWPoison(page))
> @@ -3173,20 +3171,20 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
> hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
> /* Ignore the stable/unstable/sqnr flags */
> const unsigned long addr = rmap_item->address & PAGE_MASK;
> - struct anon_vma *anon_vma = rmap_item->anon_vma;
> + anon_rmap_t anon_rmap = rmap_item->anon_rmap;
> struct anon_vma_chain *vmac;
> struct vm_area_struct *vma;
>
> cond_resched();
> - if (!anon_vma_trylock_read(anon_vma)) {
> + if (!anon_rmap_trylock_read(anon_rmap)) {
> if (rwc->try_lock) {
> rwc->contended = true;
> return;
> }
> - anon_vma_lock_read(anon_vma);
> + anon_rmap_lock_read(anon_rmap);
> }
>
> - anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
> + anon_rmap_foreach_vma(vma, vmac, anon_rmap,
> 0, ULONG_MAX) {
>
> cond_resched();
> @@ -3207,15 +3205,15 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
> continue;
>
> if (!rwc->rmap_one(folio, vma, addr, rwc->arg)) {
> - anon_vma_unlock_read(anon_vma);
> + anon_rmap_unlock_read(anon_rmap);
> return;
> }
> if (rwc->done && rwc->done(folio)) {
> - anon_vma_unlock_read(anon_vma);
> + anon_rmap_unlock_read(anon_rmap);
> return;
> }
> }
> - anon_vma_unlock_read(anon_vma);
> + anon_rmap_unlock_read(anon_rmap);
> }
> if (!search_new_forks++)
> goto again;
> @@ -3237,9 +3235,9 @@ void collect_procs_ksm(const struct folio *folio, const struct page *page,
> if (!stable_node)
> return;
> hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
> - struct anon_vma *av = rmap_item->anon_vma;
> + anon_rmap_t anon_rmap = rmap_item->anon_rmap;
>
> - anon_vma_lock_read(av);
> + anon_rmap_lock_read(anon_rmap);
> rcu_read_lock();
> for_each_process(tsk) {
> struct anon_vma_chain *vmac;
> @@ -3248,10 +3246,9 @@ void collect_procs_ksm(const struct folio *folio, const struct page *page,
> task_early_kill(tsk, force_early);
> if (!t)
> continue;
> - anon_vma_interval_tree_foreach(vmac, &av->rb_root, 0,
> + anon_rmap_foreach_vma(vma, vmac, anon_rmap, 0,
> ULONG_MAX)
> {
> - vma = vmac->vma;
> if (vma->vm_mm == t->mm) {
> addr = rmap_item->address & PAGE_MASK;
> add_to_kill_ksm(t, page, vma, to_kill,
> @@ -3260,7 +3257,7 @@ void collect_procs_ksm(const struct folio *folio, const struct page *page,
> }
> }
> rcu_read_unlock();
> - anon_vma_unlock_read(av);
> + anon_rmap_unlock_read(anon_rmap);
> }
> }
> #endif
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index ee42d4361309..bc9abba75b5d 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -547,11 +547,11 @@ static void collect_procs_anon(const struct folio *folio,
> int force_early)
> {
> struct task_struct *tsk;
> - struct anon_vma *av;
> + anon_rmap_t anon_rmap;
> pgoff_t pgoff;
>
> - av = folio_lock_anon_vma_read(folio, NULL);
> - if (av == NULL) /* Not actually mapped anymore */
> + anon_rmap = folio_lock_anon_rmap_read(folio, NULL);
> + if (!anon_rmap_value(anon_rmap)) /* Not actually mapped anymore */
> return;
>
> pgoff = page_pgoff(folio, page);
> @@ -564,9 +564,8 @@ static void collect_procs_anon(const struct folio *folio,
>
> if (!t)
> continue;
> - anon_vma_interval_tree_foreach(vmac, &av->rb_root,
> + anon_rmap_foreach_vma(vma, vmac, anon_rmap,
> pgoff, pgoff) {
> - vma = vmac->vma;
> if (vma->vm_mm != t->mm)
> continue;
> addr = page_mapped_in_vma(page, vma);
> @@ -574,7 +573,7 @@ static void collect_procs_anon(const struct folio *folio,
> }
> }
> rcu_read_unlock();
> - anon_vma_unlock_read(av);
> + anon_rmap_unlock_read(anon_rmap);
> }
>
> /*
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 8a64291ab5b4..769983cf14e0 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1142,18 +1142,18 @@ enum {
>
> static void __migrate_folio_record(struct folio *dst,
> int old_page_state,
> - struct anon_vma *anon_vma)
> + anon_rmap_t anon_rmap)
> {
> - dst->private = (void *)anon_vma + old_page_state;
> + dst->private = (void *)anon_rmap_to_anon_vma(anon_rmap) + old_page_state;
> }
>
> static void __migrate_folio_extract(struct folio *dst,
> int *old_page_state,
> - struct anon_vma **anon_vmap)
> + anon_rmap_t *anon_rmapp)
> {
> unsigned long private = (unsigned long)dst->private;
>
> - *anon_vmap = (struct anon_vma *)(private & ~PAGE_OLD_STATES);
> + *anon_rmapp = anon_vma_to_anon_rmap((void *)(private & ~PAGE_OLD_STATES));
> *old_page_state = private & PAGE_OLD_STATES;
> dst->private = NULL;
> }
> @@ -1161,15 +1161,15 @@ static void __migrate_folio_extract(struct folio *dst,
> /* Restore the source folio to the original state upon failure */
> static void migrate_folio_undo_src(struct folio *src,
> int page_was_mapped,
> - struct anon_vma *anon_vma,
> + anon_rmap_t anon_rmap,
> bool locked,
> struct list_head *ret)
> {
> if (page_was_mapped)
> remove_migration_ptes(src, src, 0);
> - /* Drop an anon_vma reference if we took one */
> - if (anon_vma)
> - put_anon_vma(anon_vma);
> + /* Drop an anon_rmap reference if we took one */
> + if (anon_rmap_value(anon_rmap))
> + put_anon_rmap(anon_rmap);
> if (locked)
> folio_unlock(src);
> if (ret)
> @@ -1210,7 +1210,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
> struct folio *dst;
> int rc = -EAGAIN;
> int old_page_state = 0;
> - struct anon_vma *anon_vma = NULL;
> + anon_rmap_t anon_rmap = ANON_RMAP_NULL;
> bool locked = false;
> bool dst_locked = false;
>
> @@ -1275,19 +1275,19 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
> /*
> * By try_to_migrate(), src->mapcount goes down to 0 here. In this case,
> * we cannot notice that anon_vma is freed while we migrate a page.
> - * This get_anon_vma() delays freeing anon_vma pointer until the end
> + * This get_anon_rmap() delays freeing anon_rmap pointer until the end
> * of migration. File cache pages are no problem because of page_lock()
> * File Caches may use write_page() or lock_page() in migration, then,
> * just care Anon page here.
> *
> - * Only folio_get_anon_vma() understands the subtleties of
> - * getting a hold on an anon_vma from outside one of its mms.
> - * But if we cannot get anon_vma, then we won't need it anyway,
> + * Only folio_get_anon_rmap() understands the subtleties of
> + * getting a hold on an anon_rmap from outside one of its mms.
> + * But if we cannot get anon_rmap, then we won't need it anyway,
> * because that implies that the anon page is no longer mapped
> * (and cannot be remapped so long as we hold the page lock).
> */
> if (folio_test_anon(src) && !folio_test_ksm(src))
> - anon_vma = folio_get_anon_vma(src);
> + anon_rmap = folio_get_anon_rmap(src);
>
> /*
> * Block others from accessing the new page when we get around to
> @@ -1302,7 +1302,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
> dst_locked = true;
>
> if (unlikely(page_has_movable_ops(&src->page))) {
> - __migrate_folio_record(dst, old_page_state, anon_vma);
> + __migrate_folio_record(dst, old_page_state, anon_rmap);
> return 0;
> }
>
> @@ -1326,13 +1326,13 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
> } else if (folio_mapped(src)) {
> /* Establish migration ptes */
> VM_BUG_ON_FOLIO(folio_test_anon(src) &&
> - !folio_test_ksm(src) && !anon_vma, src);
> + !folio_test_ksm(src) && !anon_rmap_value(anon_rmap), src);
> try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0);
> old_page_state |= PAGE_WAS_MAPPED;
> }
>
> if (!folio_mapped(src)) {
> - __migrate_folio_record(dst, old_page_state, anon_vma);
> + __migrate_folio_record(dst, old_page_state, anon_rmap);
> return 0;
> }
>
> @@ -1345,7 +1345,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
> ret = NULL;
>
> migrate_folio_undo_src(src, old_page_state & PAGE_WAS_MAPPED,
> - anon_vma, locked, ret);
> + anon_rmap, locked, ret);
> migrate_folio_undo_dst(dst, dst_locked, put_new_folio, private);
>
> return rc;
> @@ -1359,12 +1359,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
> {
> int rc;
> int old_page_state = 0;
> - struct anon_vma *anon_vma = NULL;
> + anon_rmap_t anon_rmap = ANON_RMAP_NULL;
> bool src_deferred_split = false;
> bool src_partially_mapped = false;
> struct list_head *prev;
>
> - __migrate_folio_extract(dst, &old_page_state, &anon_vma);
> + __migrate_folio_extract(dst, &old_page_state, &anon_rmap);
> prev = dst->lru.prev;
> list_del(&dst->lru);
>
> @@ -1425,9 +1425,9 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
> * and will be freed.
> */
> list_del(&src->lru);
> - /* Drop an anon_vma reference if we took one */
> - if (anon_vma)
> - put_anon_vma(anon_vma);
> + /* Drop an anon_rmap reference if we took one */
> + if (anon_rmap_value(anon_rmap))
> + put_anon_rmap(anon_rmap);
> folio_unlock(src);
> migrate_folio_done(src, reason);
>
> @@ -1439,12 +1439,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
> */
> if (rc == -EAGAIN) {
> list_add(&dst->lru, prev);
> - __migrate_folio_record(dst, old_page_state, anon_vma);
> + __migrate_folio_record(dst, old_page_state, anon_rmap);
> return rc;
> }
>
> migrate_folio_undo_src(src, old_page_state & PAGE_WAS_MAPPED,
> - anon_vma, true, ret);
> + anon_rmap, true, ret);
> migrate_folio_undo_dst(dst, true, put_new_folio, private);
>
> return rc;
> @@ -1476,7 +1476,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
> struct folio *dst;
> int rc = -EAGAIN;
> int page_was_mapped = 0;
> - struct anon_vma *anon_vma = NULL;
> + anon_rmap_t anon_rmap = ANON_RMAP_NULL;
> struct address_space *mapping = NULL;
> enum ttu_flags ttu = 0;
>
> @@ -1513,7 +1513,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
> }
>
> if (folio_test_anon(src))
> - anon_vma = folio_get_anon_vma(src);
> + anon_rmap = folio_get_anon_rmap(src);
>
> if (unlikely(!folio_trylock(dst)))
> goto put_anon;
> @@ -1550,8 +1550,8 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
> folio_unlock(dst);
>
> put_anon:
> - if (anon_vma)
> - put_anon_vma(anon_vma);
> + if (anon_rmap_value(anon_rmap))
> + put_anon_rmap(anon_rmap);
>
> if (!rc) {
> move_hugetlb_state(src, dst, reason);
> @@ -1778,11 +1778,11 @@ static void migrate_folios_undo(struct list_head *src_folios,
> dst2 = list_next_entry(dst, lru);
> list_for_each_entry_safe(folio, folio2, src_folios, lru) {
> int old_page_state = 0;
> - struct anon_vma *anon_vma = NULL;
> + anon_rmap_t anon_rmap = ANON_RMAP_NULL;
>
> - __migrate_folio_extract(dst, &old_page_state, &anon_vma);
> + __migrate_folio_extract(dst, &old_page_state, &anon_rmap);
> migrate_folio_undo_src(folio, old_page_state & PAGE_WAS_MAPPED,
> - anon_vma, true, ret_folios);
> + anon_rmap, true, ret_folios);
> list_del(&dst->lru);
> migrate_folio_undo_dst(dst, true, put_new_folio, private);
> dst = dst2;
> diff --git a/mm/page_idle.c b/mm/page_idle.c
> index 9c67cbac2965..d4103f20f526 100644
> --- a/mm/page_idle.c
> +++ b/mm/page_idle.c
> @@ -102,7 +102,7 @@ static void page_idle_clear_pte_refs(struct folio *folio)
> */
> static struct rmap_walk_control rwc = {
> .rmap_one = page_idle_clear_pte_refs_one,
> - .anon_lock = folio_lock_anon_vma_read,
> + .anon_lock = folio_lock_anon_rmap_read,
> };
>
> if (!folio_mapped(folio) || !folio_raw_mapping(folio))
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 1b2dada71778..41607168e00e 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -630,8 +630,8 @@ struct anon_vma *folio_get_anon_vma(const struct folio *folio)
> * reference like with folio_get_anon_vma() and then block on the mutex
> * on !rwc->try_lock case.
> */
> -struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
> - struct rmap_walk_control *rwc)
> +static struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
> + struct rmap_walk_control *rwc)
> {
> struct anon_vma *anon_vma = NULL;
> struct anon_vma *root_anon_vma;
> @@ -744,6 +744,14 @@ void anon_rmap_unlock_read(anon_rmap_t anon_rmap)
> anon_vma_unlock_read(anon_rmap_to_anon_vma(anon_rmap));
> }
>
> +static anon_rmap_t folio_anon_rmap(const struct folio *folio)
> +{
> + struct anon_vma *anon_vma;
> +
> + anon_vma = folio_anon_vma(folio);
> + return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
> +}
> +
> bool folio_maybe_same_anon_vma(const struct folio *folio,
> const struct vm_area_struct *vma)
> {
> @@ -930,13 +938,11 @@ unsigned long page_address_in_vma(const struct folio *folio,
> const struct page *page, const struct vm_area_struct *vma)
> {
> if (folio_test_anon(folio)) {
> - struct anon_vma *anon_vma = folio_anon_vma(folio);
> /*
> * Note: swapoff's unuse_vma() is more efficient with this
> * check, and needs it to match anon_vma when KSM is active.
> */
> - if (!vma->anon_vma || !anon_vma ||
> - vma->anon_vma->root != anon_vma->root)
> + if (!vma->anon_vma || !folio_maybe_same_anon_vma(folio, vma))
> return -EFAULT;
> } else if (!vma->vm_file) {
> return -EFAULT;
> @@ -944,7 +950,7 @@ unsigned long page_address_in_vma(const struct folio *folio,
> return -EFAULT;
> }
>
> - /* KSM folios don't reach here because of the !anon_vma check */
> + /* The !folio_maybe_same_anon_vma() above handles KSM folios */
> return vma_address(vma, page_pgoff(folio, page), 1);
> }
>
> @@ -1145,7 +1151,7 @@ int folio_referenced(struct folio *folio, int is_locked,
> struct rmap_walk_control rwc = {
> .rmap_one = folio_referenced_one,
> .arg = (void *)&pra,
> - .anon_lock = folio_lock_anon_vma_read,
> + .anon_lock = folio_lock_anon_rmap_read,
> .try_lock = true,
> .invalid_vma = invalid_folio_referenced_vma,
> };
> @@ -1580,8 +1586,7 @@ static void __page_check_anon_rmap(const struct folio *folio,
> * are initially only visible via the pagetables, and the pte is locked
> * over the call to folio_add_new_anon_rmap.
> */
> - VM_BUG_ON_FOLIO(folio_anon_vma(folio)->root != vma->anon_vma->root,
> - folio);
> + VM_BUG_ON_FOLIO(!folio_maybe_same_anon_vma(folio, vma), folio);
> VM_BUG_ON_PAGE(page_pgoff(folio, page) != linear_page_index(vma, address),
> page);
> }
> @@ -2468,7 +2473,7 @@ void try_to_unmap(struct folio *folio, enum ttu_flags flags)
> .rmap_one = try_to_unmap_one,
> .arg = (void *)flags,
> .done = folio_not_mapped,
> - .anon_lock = folio_lock_anon_vma_read,
> + .anon_lock = folio_lock_anon_rmap_read,
> };
>
> if (flags & TTU_RMAP_LOCKED)
> @@ -2813,7 +2818,7 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
> .rmap_one = try_to_migrate_one,
> .arg = (void *)flags,
> .done = folio_not_mapped,
> - .anon_lock = folio_lock_anon_vma_read,
> + .anon_lock = folio_lock_anon_rmap_read,
> };
>
> /*
> @@ -2990,8 +2995,8 @@ void __put_anon_vma(struct anon_vma *anon_vma)
> anon_vma_free(root);
> }
>
> -static struct anon_vma *rmap_walk_anon_lock(const struct folio *folio,
> - struct rmap_walk_control *rwc)
> +static anon_rmap_t rmap_walk_anon_lock(const struct folio *folio,
> + struct rmap_walk_control *rwc)
> {
> struct anon_vma *anon_vma;
>
> @@ -3006,7 +3011,7 @@ static struct anon_vma *rmap_walk_anon_lock(const struct folio *folio,
> */
> anon_vma = folio_anon_vma(folio);
> if (!anon_vma)
> - return NULL;
> + return ANON_RMAP_NULL;
>
> if (anon_vma_trylock_read(anon_vma))
> goto out;
> @@ -3019,7 +3024,7 @@ static struct anon_vma *rmap_walk_anon_lock(const struct folio *folio,
>
> anon_vma_lock_read(anon_vma);
> out:
> - return anon_vma;
> + return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
> }
>
> /*
> @@ -3035,9 +3040,10 @@ static struct anon_vma *rmap_walk_anon_lock(const struct folio *folio,
> static void rmap_walk_anon(struct folio *folio,
> struct rmap_walk_control *rwc, bool locked)
> {
> - struct anon_vma *anon_vma;
> + anon_rmap_t anon_rmap;
> pgoff_t pgoff_start, pgoff_end;
> struct anon_vma_chain *avc;
> + struct vm_area_struct *vma;
I have no idea why you put the VMA at this scope...
>
> /*
> * The folio lock ensures that folio->mapping can't be changed under us
> @@ -3046,20 +3052,19 @@ static void rmap_walk_anon(struct folio *folio,
> VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
>
> if (locked) {
> - anon_vma = folio_anon_vma(folio);
> + anon_rmap = folio_anon_rmap(folio);
> /* anon_vma disappear under us? */
> - VM_BUG_ON_FOLIO(!anon_vma, folio);
> + VM_BUG_ON_FOLIO(!anon_rmap_value(anon_rmap), folio);
> } else {
> - anon_vma = rmap_walk_anon_lock(folio, rwc);
> + anon_rmap = rmap_walk_anon_lock(folio, rwc);
> }
> - if (!anon_vma)
> + if (!anon_rmap_value(anon_rmap))
> return;
>
> pgoff_start = folio_pgoff(folio);
> pgoff_end = pgoff_start + folio_nr_pages(folio) - 1;
> - anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
> + anon_rmap_foreach_vma(vma, avc, anon_rmap,
> pgoff_start, pgoff_end) {
> - struct vm_area_struct *vma = avc->vma;
Don't throw random changes like this in with a general replacement patch.
> unsigned long address = vma_address(vma, pgoff_start,
> folio_nr_pages(folio));
>
> @@ -3076,7 +3081,7 @@ static void rmap_walk_anon(struct folio *folio,
> }
>
> if (!locked)
> - anon_vma_unlock_read(anon_vma);
> + anon_rmap_unlock_read(anon_rmap);
> }
>
> /**
> --
> 2.17.1
>
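This concerns the `struct vm_area_struct *vma` declaration that moved to function scope in rmap_walk_anon(). The call sites pass the cursor in (`anon_rmap_foreach_vma(vma, avc, anon_rmap, ...)`), and the `vma = vmac->vma;` lines were removed from ksm.c and memory-failure.c. Together these suggest the new iterator assigns the VMA cursor itself, which forces the caller to declare it before the loop. The macro's definition is not quoted in this thread, so the sketch below rests on that assumption. It uses a hypothetical `chain_foreach_vma()` over a plain linked list rather than the anon_vma interval tree.

```c
#include <assert.h>
#include <stddef.h>

struct vm_area_struct {
	unsigned long vm_pgoff;
};

/* Simplified stand-in for anon_vma_chain: a linked list, not an interval tree. */
struct anon_vma_chain {
	struct vm_area_struct *vma;
	struct anon_vma_chain *next;
};

/*
 * Hypothetical iterator shaped like anon_rmap_foreach_vma(vma, avc, ...):
 * it assigns both cursors itself, so the caller must declare vma before the
 * for statement instead of inside the loop body.
 */
#define chain_foreach_vma(vma, avc, head) \
	for ((avc) = (head); (avc) && (((vma) = (avc)->vma), 1); \
	     (avc) = (avc)->next)

static int count_vmas_with_pgoff(struct anon_vma_chain *head,
				 unsigned long pgoff)
{
	struct anon_vma_chain *avc;
	struct vm_area_struct *vma;	/* must exist before the loop starts */
	int n = 0;

	chain_foreach_vma(vma, avc, head) {
		if (vma->vm_pgoff == pgoff)
			n++;
	}
	return n;
}
```

Whether a cursor-assigning iterator belongs in a mechanical rename patch is the separate process point raised in the review above.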
* Re: [PATCH 03/15] mm: introduce anon_vma_tree_t for multiple anon_vma topologies
2026-05-27 11:01 ` [PATCH 03/15] mm: introduce anon_vma_tree_t for multiple anon_vma topologies tao
@ 2026-05-27 11:56 ` Lorenzo Stoakes
2026-05-28 9:00 ` wangtao
0 siblings, 1 reply; 64+ messages in thread
From: Lorenzo Stoakes @ 2026-05-27 11:56 UTC (permalink / raw)
To: tao
Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, kas, hpa, liam,
vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh, jgg,
jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
On Wed, May 27, 2026 at 07:01:35PM +0800, tao wrote:
> Prepare for upcoming ANON_VMA_LAZY support and RCU-based lockless rmap
> traversal by clearly separating anon_vma topology handling from the
> anon_rmap semantics.
RCU is not 'lockless'... and if you truly get RCU semantics, you break a bunch of
stuff, as I found out.
>
> Prepare for supporting multiple anon_vma topologies by introducing
> lightweight abstractions used by the VMA and rmap code.
>
> Introduce anon_vma_tree_t as the type stored in vma->anon_vma:
>
> typedef unsigned long anon_vma_tree_t;
>
> It represents a tagged pointer encoding a reference to the anon_vma
> topology. The low bits are reserved as type tags to distinguish
> different implementations (e.g. regular anon_vma and lazy anon_vma).
> This keeps the VMA representation compact while allowing the topology
> to evolve without changing the VMA layout.
>
> Signed-off-by: tao <tao.wangtao@honor.com>
The commit message is at least better on this one, but this approach is, again,
predicated on extending a broken abstraction.
You could have saved time and effort by coming forward with this earlier to the
community.
You're also adding a bunch more messy code on top of anon_vma. It's just the
wrong direction.
> ---
> include/linux/mm_types.h | 3 +++
> mm/internal.h | 54 ++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 57 insertions(+)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index a308e2c23b82..5f4961ea1572 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -917,6 +917,9 @@ struct vm_area_desc {
> struct mmap_action action;
> };
>
> +/* Tagged pointer stored in vma->anon_vma. Low bits encode anon_vma type. */
> +typedef unsigned long anon_vma_tree_t;
> +
> /*
> * This struct describes a virtual memory area. There is one of these
> * per VM-area/task. A VM area is any part of the process virtual memory
> diff --git a/mm/internal.h b/mm/internal.h
> index 5a2ddcf68e0b..76544ad44ff0 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -246,6 +246,60 @@ static inline void anon_vma_unlock_read(struct anon_vma *anon_vma)
> up_read(&anon_vma->root->rwsem);
> }
>
> +/* anon_vma_tree_t APIs */
> +
> +static inline anon_vma_tree_t make_anon_vma_tree(struct anon_vma *anon_vma)
> +{
> + return (anon_vma_tree_t)anon_vma;
> +}
You're literally returning an unsigned long of an anon_vma here?
Why is the anon_rmap_t a wrapped struct and this an unsigned long?
> +
> +static inline struct anon_vma *anon_vma_tree_anon_vma(anon_vma_tree_t anon_tree)
> +{
> + return (struct anon_vma *)anon_tree;
> +}
The anon_tree is an anon_vma? What?
And it's a tagged pointer, but we don't bother clearing any bits, right?...!
> +
> +static inline void anon_vma_tree_lock_write(anon_vma_tree_t anon_tree)
> +{
> + struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
> +
> + anon_vma_lock_write(anon_vma);
> +}
> +
> +static inline int anon_vma_tree_trylock_write(anon_vma_tree_t anon_tree)
> +{
> + struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
> +
> + return anon_vma_trylock_write(anon_vma);
> +}
> +
> +static inline void anon_vma_tree_unlock_write(anon_vma_tree_t anon_tree)
> +{
> + struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
> +
> + anon_vma_unlock_write(anon_vma);
> +}
> +
> +static inline void anon_vma_tree_lock_read(anon_vma_tree_t anon_tree)
> +{
> + struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
> +
> + anon_vma_lock_read(anon_vma);
> +}
> +
> +static inline int anon_vma_tree_trylock_read(anon_vma_tree_t anon_tree)
> +{
> + struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
> +
> + return anon_vma_trylock_read(anon_vma);
> +}
> +
> +static inline void anon_vma_tree_unlock_read(anon_vma_tree_t anon_tree)
> +{
> + struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
> +
> + anon_vma_unlock_read(anon_vma);
> +}
> +
You keep adding more and more code on top of the existing mess. This is NOT what
we want.
> struct anon_vma *folio_get_anon_vma(const struct folio *folio);
>
> /* Operations which modify VMAs. */
> --
> 2.17.1
>
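A small userspace sketch can illustrate two of the objections to this patch: `anon_vma_tree_t` is a bare `unsigned long` while anon_rmap_t is struct-wrapped, and the helpers call the value a tagged pointer but never clear the tag bits. The sketch assumes a two-bit tag in the low bits of a suitably aligned pointer. It wraps the word in a single-member struct, in the style of the kernel's pte_t/pte_val(), so the value cannot be mixed with plain integers by accident. All names here (`anon_tree_t`, `ANON_TREE_*`, `anon_tree_encode()`, `anon_tree_ptr()`, `anon_tree_to_anon_vma()`) are hypothetical and are not part of the posted series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the kernel structure; only its alignment matters here. */
struct anon_vma {
	long refcount;
};

/* Single-member wrapper: still one word, but not implicitly an integer. */
typedef struct {
	unsigned long val;	/* as in the kernel, assumes a pointer fits */
} anon_tree_t;

/* Hypothetical layout: the two low bits select the topology type. */
#define ANON_TREE_TAG_MASK     0x3UL
#define ANON_TREE_TYPE_REGULAR 0x0UL
#define ANON_TREE_TYPE_LAZY    0x1UL

static anon_tree_t anon_tree_encode(void *ptr, unsigned long type)
{
	anon_tree_t t;

	/* Tagging is only safe if the pointer's low bits are known to be zero. */
	assert(((uintptr_t)ptr & ANON_TREE_TAG_MASK) == 0);
	assert((type & ~ANON_TREE_TAG_MASK) == 0);
	t.val = (unsigned long)(uintptr_t)ptr | type;
	return t;
}

static unsigned long anon_tree_type(anon_tree_t t)
{
	return t.val & ANON_TREE_TAG_MASK;
}

/* Clear the tag bits before the word is ever used as a pointer. */
static void *anon_tree_ptr(anon_tree_t t)
{
	return (void *)(uintptr_t)(t.val & ~ANON_TREE_TAG_MASK);
}

/* Only hand out a struct anon_vma when the tag says that is what is stored. */
static struct anon_vma *anon_tree_to_anon_vma(anon_tree_t t)
{
	if (anon_tree_type(t) != ANON_TREE_TYPE_REGULAR)
		return NULL;
	return anon_tree_ptr(t);
}
```

With the wrapper, `anon_tree_t t = 42;` is rejected at compile time. The posted `typedef unsigned long anon_vma_tree_t` accepts any integer silently.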
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
` (16 preceding siblings ...)
2026-05-27 11:30 ` Lorenzo Stoakes
@ 2026-05-27 14:33 ` Lorenzo Stoakes
2026-05-28 7:57 ` wangtao
2026-06-02 16:07 ` Harry Yoo
2026-06-03 20:25 ` David Hildenbrand (Arm)
19 siblings, 1 reply; 64+ messages in thread
From: Lorenzo Stoakes @ 2026-05-27 14:33 UTC (permalink / raw)
To: tao
Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, akpm,
david, willy, sj, kees, luizcap, zhangjiao2, kas, hpa, liam,
vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh, jgg,
jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
OK I've had a look through more thoroughly now and:
NAK and NAK any approach like this.
Not only is this structurally all wrong, it does some insane stuff (pinning
VMAs - no), the RCU usage is highly dubious and I suspect you've completely
broken the anon rmap for things like migration, or have at least added very
dubious edge cases.
You've added insane complexity, and also have failed to add even
perfunctory tests, which is also totally unacceptable.
The implementation is wrong, and the approach is wrong - we do not want to
extend or build on anon_vma. So this is unmergeable, or any approach like
it.
I also, unfortunately, strongly suspect AI here. The turn of phrase, and
poor commit messages, you doing this out of nowhere with absolutely no rmap
experience before, your total lack of communication before.
Claude puts the probability of heavy AI usage at 85-90%, and I'm pretty
convinced. Either way it's utterly unmergeable but that you (likely) used
AI to generate this much work for us makes me actually pretty annoyed.
As a result, I would strongly suggest you no longer submit patches for the
reverse mapping part of mm, as there is now a real lack of trust.
If you wish to rebuild that, I suggest you _discuss_ concepts and ideas,
e.g. send stuff on-list with a [DISCUSSION] tag, and engage with the
community, and go from there.
It's also important to synchronise - I'm working on an anon rmap
replacement that I'm more than happy to discuss with you or anybody else
which should achieve the same numbers in an architecturally sound way.
You going off and, in a vacuum, generating a bunch of code with an
unacceptable approach is not a civil way of engaging nor is it a good use
of your time, or maintainer time looking at it.
Thanks, Lorenzo
^ permalink raw reply [flat|nested] 64+ messages in thread
* RE: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-05-27 11:23 ` [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation Pedro Falcato
@ 2026-05-28 6:45 ` wangtao
2026-05-28 7:14 ` Lorenzo Stoakes
0 siblings, 1 reply; 64+ messages in thread
From: wangtao @ 2026-05-28 6:45 UTC (permalink / raw)
To: Pedro Falcato
Cc: catalin.marinas@arm.com, will@kernel.org, tglx@kernel.org,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
x86@kernel.org, akpm@linux-foundation.org, david@kernel.org,
willy@infradead.org, sj@kernel.org, kees@kernel.org,
luizcap@redhat.com, zhangjiao2@cmss.chinamobile.com,
kas@kernel.org, ljs@kernel.org, hpa@zytor.com, liam@infradead.org,
vbabka@kernel.org, rppt@kernel.org, surenb@google.com,
mhocko@suse.com, jack@suse.cz, riel@surriel.com, harry@kernel.org,
jannh@google.com, jgg@ziepe.ca, jhubbard@nvidia.com,
peterx@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com,
npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
baohua@kernel.org, lance.yang@linux.dev, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, 21cnbao@gmail.com, jparsana@google.com,
dvander@google.com, zhangji, wangzicheng
> >
> > Feedback and suggestions are welcome.
>
> I'm afraid, per previous discussions[1], that no one is really willing to maintain
> extra complexity for the current state of anon rmap and anon vmas.
> Sorry :/
>
> Also, please don't send series this large without previous discussion and _at
> least_ an RFC tag.
>
> [1] https://lore.kernel.org/all/aec533b2-37a7-4f44-a279-
> c4aa604206ac@lucifer.local/
>
> --
> Pedro
Thank you very much for your reply.
As I am not very good at English, I haven't participated much in community discussions before and I'm still not very familiar with the usual process.
I realize now that I should probably have started with a discussion thread first, and that this patch series would have been more appropriate with an RFC tag.
I apologize for that.
I will read the discussion in [1] more carefully. I also noticed the related code here:
https://git.kernel.org/pub/scm/linux/kernel/git/ljs/linux.git/log/?h=project/cow-context
Many years ago I had already noticed that data structures such as vma, page_table, and anon_vma consume a significant amount of memory.
However, since the mm subsystem is quite complex, I didn't look into it in depth at the time.
Recently, with memory costs increasing, I revisited these structures and analyzed their memory usage again.
Since anon_vma seems to have a relatively smaller impact compared to vma and page tables, I started by exploring possible optimizations for anon_vma first.
Although anon_vma is relatively simple, there are still quite a few uncertainties.
So I waited until the basic functionality was implemented before sending the patches for discussion.
Thanks again for taking the time to reply.
--
Tao
* RE: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-05-27 11:30 ` Lorenzo Stoakes
@ 2026-05-28 7:11 ` wangtao
2026-05-28 7:22 ` Lorenzo Stoakes
0 siblings, 1 reply; 64+ messages in thread
From: wangtao @ 2026-05-28 7:11 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: catalin.marinas@arm.com, will@kernel.org, tglx@kernel.org,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
x86@kernel.org, akpm@linux-foundation.org, david@kernel.org,
willy@infradead.org, sj@kernel.org, kees@kernel.org,
luizcap@redhat.com, zhangjiao2@cmss.chinamobile.com,
kas@kernel.org, hpa@zytor.com, liam@infradead.org,
vbabka@kernel.org, rppt@kernel.org, surenb@google.com,
mhocko@suse.com, jack@suse.cz, riel@surriel.com, harry@kernel.org,
jannh@google.com, jgg@ziepe.ca, jhubbard@nvidia.com,
peterx@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com,
npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
baohua@kernel.org, lance.yang@linux.dev, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
pfalcato@suse.de, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, 21cnbao@gmail.com, jparsana@google.com,
dvander@google.com, zhangji, wangzicheng
> Subject: Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred
> anon_vma creation
>
> I'm sorry but this is not how kernel development is done.
>
> You're sending a series that's very invasive, that you've not coordinated with
> anybody else, nor have you mentioned it at a conference, nor engaged with
> in discussion with anybody else in the community in any way.
>
> And you've sent it without an RFC, at -rc5 is... quite something.
>
> We do NOT want to extend or expand or hack in anything like this on top of
> the existing anon_vma machinery. It's a mess that requires replacement, not
> more hacks or expansion.
>
> I've been working on a replacement for the anonymous rmap, recently
> presenting at LSF/MM, and all of that has been very public.
>
> In fact I have engaged in recent work which reduced lock contention in
> anon_vma, it's really quite discourteous for you not to have contacted me or
> the community in addition to the above.
>
First of all, thank you very much for your reply.
I'm also glad to learn that you have been working on optimizations
related to anon_vma. I noticed your work from another email in the
thread.
I apologize if my approach caused any inconvenience. My English is not
very good, and I have rarely participated in community discussions
before, so I'm still learning how things are usually done in the
kernel community. If anything I did came across as discourteous, please
understand that it was not my intention.
Recently, due to increasing memory costs, I revisited the memory usage
of anon_vma and found that it might be possible to reduce its
overhead.
My original intention was simply to experiment with an anon_vma_lazy
mechanism to reduce the memory footprint of anon_vma. However, while
working on it I realized that anon_vma handling is quite complex. In
particular, after multiple levels of fork the topology of anonymous
pages can become quite complicated, and it also interacts with other
subsystems such as reclaim, KSM, and migration.
Because of this, I tried separating the functionality of anon_vma
into two parts: anon_rmap_t for anonymous page reverse mapping, and
anon_vma_tree_t for topology management.
anon_rmap_t provides the reverse mapping interface used by reclaim,
KSM, migration, and similar components.
anon_vma_tree_t manages the topology internally for operations such
as fork, clone, split, and merge, and can also indicate whether a
VMA has experienced page faults.
My hope is that by separating these responsibilities, it may become
easier to reason about and further improve the anon_vma design in the
future.
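Here is a minimal userspace sketch of the deferral idea described above, under stated assumptions. `struct fake_vma`, `struct fake_anon_vma`, `lazy_anon_fault()` and `lazy_anon_prepare_fork()` are hypothetical names, not code from the series. anon_vma_chain linking, locking, refcounting and the rmap side are all left out. The sketch only shows allocation moving from the fault path to the fork path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct anon_vma: only the root pointer. */
struct fake_anon_vma {
	struct fake_anon_vma *root;
};

/* Hypothetical stand-in for an anonymous VMA under lazy creation. */
struct fake_vma {
	struct fake_anon_vma *anon_vma; /* stays NULL until sharing is needed */
	bool has_anon_pages;            /* set on the first anonymous fault */
};

/* Fault path: note that anonymous pages exist, but allocate nothing. */
static void lazy_anon_fault(struct fake_vma *vma)
{
	assert(vma != NULL);
	vma->has_anon_pages = true;
}

/*
 * Fork path: parent and child will share pages, so an anon_vma is
 * materialized here. Returns 0 on success, -1 if allocation fails.
 */
static int lazy_anon_prepare_fork(struct fake_vma *vma)
{
	assert(vma != NULL);
	if (!vma->has_anon_pages || vma->anon_vma)
		return 0; /* nothing to share yet, or already created */

	vma->anon_vma = malloc(sizeof(*vma->anon_vma));
	if (!vma->anon_vma)
		return -1;
	vma->anon_vma->root = vma->anon_vma; /* first in its hierarchy */
	return 0;
}
```

In this model, a VMA that faults but never forks never allocates an anon_vma. Keeping rmap walks correct for such VMAs is the hard part, and the sketch does not model it.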
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-05-28 6:45 ` wangtao
@ 2026-05-28 7:14 ` Lorenzo Stoakes
0 siblings, 0 replies; 64+ messages in thread
From: Lorenzo Stoakes @ 2026-05-28 7:14 UTC (permalink / raw)
To: wangtao
Cc: Pedro Falcato, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, harry@kernel.org, jannh@google.com,
jgg@ziepe.ca, jhubbard@nvidia.com, peterx@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org,
lance.yang@linux.dev, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, 21cnbao@gmail.com, jparsana@google.com,
dvander@google.com, zhangji, wangzicheng
On Thu, May 28, 2026 at 06:45:07AM +0000, wangtao wrote:
> > >
> > > Feedback and suggestions are welcome.
> >
> > I'm afraid, per previous discussions[1], that no one is really willing to maintain
> > extra complexity for the current state of anon rmap and anon vmas.
> > Sorry :/
> >
> > Also, please don't send series this large without previous discussion and _at
> > least_ an RFC tag.
> >
> > [1] https://lore.kernel.org/all/aec533b2-37a7-4f44-a279-
> > c4aa604206ac@lucifer.local/
> >
> > --
> > Pedro
>
> Thank you very much for your reply.
>
> As I am not very good at English, I haven't participated much in community discussions before and I'm still not very familiar with the usual process.
> I realize now that I should probably have started with a discussion thread first, and that this patch series would have been more appropriate with an RFC tag.
> I apologize for that.
Thanks, appreciate it.
It's also for your benefit - regardless of AI usage or not, you've spent time on
this needlessly, which a discussion could have avoided.
Also as I said, going this way has damaged trust, which also doesn't benefit
anybody.
mm is a welcoming and open community, the best approach when looking at
something like this is to engage with us :)
>
> I will read the discussion in [1] more carefully. I also noticed the related code here:
> https://git.kernel.org/pub/scm/linux/kernel/git/ljs/linux.git/log/?h=project/cow-context
>
> Many years ago I had already noticed that data structures such as vma, page_table, and anon_vma consume a significant amount of memory.
> However, since the mm subsystem is quite complex, I didn't look into it in depth at the time.
> Recently, with memory costs increasing, I revisited these structures and analyzed their memory usage again.
> Since anon_vma seems to have a relatively smaller impact compared to vma and page tables, I started by exploring possible optimizations for anon_vma first.
>
> Although anon_vma is relatively simple, there are still quite a few uncertainties.
> So I waited until the basic functionality was implemented before sending the patches for discussion.
Thanks, I am more than happy to discuss my approach. You can also see slides
from my talk on this at LSF/MM at https://ljs.io/talks
(Note that the code linked is an incomplete implementation, simply some early
code to give a sense of the approach taken!)
I would ask, however, in general that you hold off on anything code-wise before
I am able to issue my own RFC implementing this approach so we can avoid any
overlap/confusion.
>
> Thanks again for taking the time to reply.
>
> --
> Tao
>
Cheers, lorenzo
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-05-28 7:11 ` wangtao
@ 2026-05-28 7:22 ` Lorenzo Stoakes
0 siblings, 0 replies; 64+ messages in thread
From: Lorenzo Stoakes @ 2026-05-28 7:22 UTC (permalink / raw)
To: wangtao
Cc: catalin.marinas@arm.com, will@kernel.org, tglx@kernel.org,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
x86@kernel.org, akpm@linux-foundation.org, david@kernel.org,
willy@infradead.org, sj@kernel.org, kees@kernel.org,
luizcap@redhat.com, zhangjiao2@cmss.chinamobile.com,
kas@kernel.org, hpa@zytor.com, liam@infradead.org,
vbabka@kernel.org, rppt@kernel.org, surenb@google.com,
mhocko@suse.com, jack@suse.cz, riel@surriel.com, harry@kernel.org,
jannh@google.com, jgg@ziepe.ca, jhubbard@nvidia.com,
peterx@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com,
npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
baohua@kernel.org, lance.yang@linux.dev, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
pfalcato@suse.de, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, 21cnbao@gmail.com, jparsana@google.com,
dvander@google.com, zhangji, wangzicheng
On Thu, May 28, 2026 at 07:11:19AM +0000, wangtao wrote:
>
>
> > Subject: Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred
> > anon_vma creation
> >
> > I'm sorry but this is not how kernel development is done.
> >
> > You're sending a series that's very invasive, that you've not coordinated with
> > anybody else, nor have you mentioned it at a conference, nor engaged with
> > in discussion with anybody else in the community in any way.
> >
> > And you've sent it without an RFC, at -rc5 is... quite something.
> >
> > We do NOT want to extend or expand or hack in anything like this on top of
> > the existing anon_vma machinery. It's a mess that requires replacement, not
> > more hacks or expansion.
> >
> > I've been working on a replacement for the anonymous rmap, recently
> > presenting at LSF/MM, and all of that has been very public.
> >
> > In fact I have engaged in recent work which reduced lock contention in
> > anon_vma, it's really quite discourteous for you not to have contacted me or
> > the community in addition to the above.
> >
> First of all, thank you very much for your reply.
>
> I'm also glad to learn that you have been working on optimizations
> related to anon_vma. I noticed your work from another email in the
> thread.
>
> I apologize if my approach caused any inconvenience. My English is not
> very good, and I have rarely participated in community discussions
> before, so I'm still learning how things are usually done in the
> kernel community. If anything I did came across as discourteous, please
> understand that it was not my intention.
>
> Recently, due to increasing memory costs, I revisited the memory usage
> of anon_vma and found that it might be possible to reduce its
> overhead.
>
> My original intention was simply to experiment with an anon_vma_lazy
> mechanism to reduce the memory footprint of anon_vma. However, while
> working on it I realized that anon_vma handling is quite complex. In
> particular, after multiple levels of fork the topology of anonymous
> pages can become quite complicated, and it also interacts with other
> subsystems such as reclaim, KSM, and migration.
>
> Because of this, I tried separating the functionality of anon_vma
> into two parts: anon_rmap_t for anonymous page reverse mapping, and
> anon_vma_tree_t for topology management.
>
> anon_rmap_t provides the reverse mapping interface used by reclaim,
> KSM, migration, and similar components.
>
> anon_vma_tree_t manages the topology internally for operations such
> as fork, clone, split, and merge, and can also indicate whether a
> VMA has experienced page faults.
>
> My hope is that by separating these responsibilities, it may become
> easier to reason about and further improve the anon_vma design in the
> future.
I understand your approach, as discussed it's not viable.
As mentioned elsewhere, please refrain from further code contributions to
rmap, as there is now a trust issue due to a high likelihood of undeclared
AI-generated code.
You are, however, welcome to engage in discussion, and I'm happy to discuss
the approach publicly on list or via private email whichever you prefer :)
You are also welcome to engage in discussion/review/critique once I produce
my RFC for this.
Thanks, Lorenzo
* RE: [PATCH 01/15] mm/rmap: introduce anon_rmap APIs for anonymous folios
2026-05-27 11:44 ` Lorenzo Stoakes
@ 2026-05-28 7:47 ` wangtao
0 siblings, 0 replies; 64+ messages in thread
From: wangtao @ 2026-05-28 7:47 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: catalin.marinas@arm.com, will@kernel.org, tglx@kernel.org,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
x86@kernel.org, akpm@linux-foundation.org, david@kernel.org,
willy@infradead.org, sj@kernel.org, kees@kernel.org,
luizcap@redhat.com, zhangjiao2@cmss.chinamobile.com,
kas@kernel.org, hpa@zytor.com, liam@infradead.org,
vbabka@kernel.org, rppt@kernel.org, surenb@google.com,
mhocko@suse.com, jack@suse.cz, riel@surriel.com, harry@kernel.org,
jannh@google.com, jgg@ziepe.ca, jhubbard@nvidia.com,
peterx@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com,
npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
baohua@kernel.org, lance.yang@linux.dev, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
pfalcato@suse.de, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, 21cnbao@gmail.com, jparsana@google.com,
dvander@google.com, zhangji, wangzicheng
>
> On Wed, May 27, 2026 at 07:01:33PM +0800, tao wrote:
> > Add a set of anon_rmap APIs to operate on the reverse mappings of
> > anonymous folios.
> >
> > Introduce anon_rmap_for_each_vma() as a wrapper around
> > vma_interval_tree_foreach(), so callers no longer access the interval
> > tree directly.
> >
> > This prepares the rmap code for upcoming ANON_VMA_LAZY support and
> > RCU-based lockless rmap traversal.
> >
> > No functional change intended.
>
> This commit message is total garbage. You're not explaining WHY you're using
> words to describe what the code does. I can read the code?
>
> >
> > Signed-off-by: tao <tao.wangtao@honor.com>
>
> This is all horrible, horribly invasive, and adding a pile of crap on machinery
> we want to get rid of.
>
> You've added zero explanation or comments. This is just not upstreamable,
> and even if you did explain yourself we don't want to extend a broken
> abstraction with more broken complexity?
>
> You're also seemingly introducing a typesafe wrapper to wrap an arbitrary
> value?
>
I thought these were just simple, straightforward wrappers, so I added
them in this new patch.
> > ---
> > include/linux/rmap.h | 68 +++++++++++++++++++++++++++++++++++++++++
> > mm/rmap.c            | 73 ++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 141 insertions(+)
> >
> > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > index 8dc0871e5f00..c42314ea4362 100644
> > --- a/include/linux/rmap.h
> > +++ b/include/linux/rmap.h
> > @@ -937,6 +937,44 @@ int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff,
> > void remove_migration_ptes(struct folio *src, struct folio *dst,
> > enum ttu_flags flags);
> >
> > +/* Reverse mapping handle for anonymous folio rmap helpers. */
> > +typedef struct anon_rmap {
> > + unsigned long rmap;
> > +} anon_rmap_t;
>
> I do not know why you're using a typedef when you just treat it as an
> arbitrary value?
>
anon_rmap_t is exposed as an external interface, so wrapping the value
in a struct is more robust and helps prevent misuse.
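The trade-off being argued here can be seen in a small userspace sketch. Wrapping the value in a single-member struct makes the compiler reject mixing the handle with a plain integer. However, a constructor that takes `const void *` (as `make_anon_rmap()` does in the quoted patch) accepts any pointer, so the type check only holds if every caller goes through a typed helper. All `demo_` names are hypothetical, and the struct types are simplified stand-ins.

```c
#include <assert.h>
#include <stddef.h>

struct anon_vma { int dummy; };      /* simplified stand-in */
struct address_space { int dummy; }; /* unrelated type, for contrast */

/*
 * Single-member struct handle: unlike a bare unsigned long, it cannot be
 * implicitly assigned from an integer or from a different handle type.
 */
typedef struct { unsigned long rmap; } demo_anon_rmap_t;

/* Loose constructor: accepts ANY pointer, like make_anon_rmap(const void *). */
static demo_anon_rmap_t demo_make_rmap(const void *p)
{
	return (demo_anon_rmap_t){ .rmap = (unsigned long)p };
}

/* Typed constructor: passing another pointer type draws a compiler diagnostic. */
static demo_anon_rmap_t demo_from_anon_vma(const struct anon_vma *av)
{
	return demo_make_rmap(av);
}

static struct anon_vma *demo_to_anon_vma(demo_anon_rmap_t r)
{
	return (struct anon_vma *)r.rmap;
}

static int demo_rmap_is_null(demo_anon_rmap_t r)
{
	return r.rmap == 0;
}

/* Usage: round-trip a typed handle and show the loose constructor's gap. */
static void demo_handle_usage(void)
{
	struct anon_vma av;
	struct address_space as;

	/* demo_anon_rmap_t bad = 0UL;  would not compile: struct from integer */
	assert(demo_to_anon_vma(demo_from_anon_vma(&av)) == &av);
	assert(demo_rmap_is_null(demo_make_rmap(NULL)));
	/* Compiles without complaint even though &as is not an anon_vma. */
	assert(!demo_rmap_is_null(demo_make_rmap(&as)));
}
```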
> > +
> > +#define ANON_RMAP_NULL make_anon_rmap(0)
>
> This is just equivalent to a NULL value?...
>
> > +
> > +static inline anon_rmap_t make_anon_rmap(const void *anon_mapping)
> > +{
> > + return (anon_rmap_t){ .rmap = (unsigned long)anon_mapping, };
> > +}
>
> You're intentionally defeating type safety to store arbitrary values?...
>
> > +
> > +static inline unsigned long anon_rmap_value(anon_rmap_t anon_rmap)
> > +{
> > + return anon_rmap.rmap;
> > +}
>
> 'Untype safe my arbitrarily type safe wrapped type'...?
>
> > +
> > +static inline anon_rmap_t anon_vma_to_anon_rmap(const struct anon_vma *anon_vma)
> > +{
> > + return make_anon_rmap(anon_vma);
> > +}
> > +
> > +static inline struct anon_vma *anon_rmap_to_anon_vma(anon_rmap_t anon_rmap)
> > +{
> > + unsigned long rmap = anon_rmap_value(anon_rmap);
> > +
> > + return (struct anon_vma *)rmap;
> > +}
>
> A ton of noise for seemingly no value?
>
> > +
> > +anon_rmap_t vma_get_anon_rmap(struct vm_area_struct *vma);
> > +void put_anon_rmap(anon_rmap_t anon_rmap);
> > +void anon_rmap_lock_write(anon_rmap_t anon_rmap);
> > +int anon_rmap_trylock_write(anon_rmap_t anon_rmap);
> > +void anon_rmap_unlock_write(anon_rmap_t anon_rmap);
> > +void anon_rmap_lock_read(anon_rmap_t anon_rmap);
> > +int anon_rmap_trylock_read(anon_rmap_t anon_rmap);
> > +void anon_rmap_unlock_read(anon_rmap_t anon_rmap);
>
> Yes let's add a bunch of extra broken abstractions on the broken abstraction.
>
> And let's not comment anything!
>
> > +
> > /*
> > * rmap_walk_control: To control rmap traversing for specific needs
> > *
> > @@ -969,6 +1007,36 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc);
> > struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
> > struct rmap_walk_control *rwc);
> >
> > +bool folio_maybe_same_anon_vma(const struct folio *folio,
> > + const struct vm_area_struct *vma);
>
> What the hell is this?
>
>
> > +anon_rmap_t folio_get_anon_rmap(const struct folio *folio);
> > +anon_rmap_t folio_lock_anon_rmap_read(const struct folio *folio,
> > + struct rmap_walk_control *rwc);
> > +
> > +static inline struct vm_area_struct *anon_rmap_iter_first_vma(
> > + anon_rmap_t anon_rmap, unsigned long start, unsigned long last,
> > + struct anon_vma_chain **avc)
> > +{
> > + struct anon_vma *anon_vma = anon_rmap_to_anon_vma(anon_rmap);
> > +
> > + *avc = anon_vma_interval_tree_iter_first(&anon_vma->rb_root, start, last);
> > + return *avc ? (*avc)->vma : NULL;
> > +}
>
> So we're allowing for folios to have NULL entries (really the commit message
> should have that, rather than me scanning through uncommented code), but
> in what world are we ok with an anon folio NOT BEING LINKED BACK TO ITS
> VMA?
>
> That's broken no?
>
I’m not sure I understand your question. Are you asking whether we should check that *avc is non‑NULL?
Here, anon_rmap_foreach_vma() is used to replace anon_vma_interval_tree_foreach.
After obtaining avc, we then check that it is non‑NULL.
#define anon_vma_interval_tree_foreach(avc, root, start, last)		 \
	for (avc = anon_vma_interval_tree_iter_first(root, start, last); \
	     avc; avc = anon_vma_interval_tree_iter_next(avc, start, last))
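Here is a rough userspace model of the point being made. In a `for (x = first(); x; x = next(x))` iterator, a NULL return only ends the loop. By itself, it does not indicate whether a folio is linked to a VMA. The sketch replaces the interval tree with an array scan and uses hypothetical names (`demo_avc`, `demo_iter_first`, `demo_iter_next`, `demo_foreach`). Unlike the kernel's `anon_vma_interval_tree_iter_next()`, the `next` helper here also takes the root, because it scans an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for anon_vma_chain: a VMA covering [start, last]. */
struct demo_avc {
	unsigned long start, last; /* inclusive pgoff range */
	int vma_id;                /* stands in for avc->vma */
};

struct demo_root {
	struct demo_avc *avcs;
	size_t nr;
};

/* Return the first entry at or after index i overlapping [start, last]. */
static struct demo_avc *demo_scan(struct demo_root *root, size_t i,
				  unsigned long start, unsigned long last)
{
	for (; i < root->nr; i++) {
		struct demo_avc *avc = &root->avcs[i];

		if (avc->start <= last && start <= avc->last)
			return avc;
	}
	return NULL; /* no (more) overlapping entries: end of iteration */
}

static struct demo_avc *demo_iter_first(struct demo_root *root,
					unsigned long start, unsigned long last)
{
	return demo_scan(root, 0, start, last);
}

static struct demo_avc *demo_iter_next(struct demo_root *root,
				       struct demo_avc *avc,
				       unsigned long start, unsigned long last)
{
	assert(avc >= root->avcs && avc < root->avcs + root->nr);
	return demo_scan(root, (size_t)(avc - root->avcs) + 1, start, last);
}

/* Same shape as anon_vma_interval_tree_foreach(): NULL terminates the loop. */
#define demo_foreach(avc, root, start, last)			\
	for (avc = demo_iter_first(root, start, last);		\
	     avc; avc = demo_iter_next(root, avc, start, last))
```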
> > +
> > +bool folio_maybe_same_anon_vma(const struct folio *folio,
> > + const struct vm_area_struct *vma)
> > +{
> > + struct anon_vma *anon_vma;
> > + struct anon_vma *tgt_anon_vma = vma->anon_vma;
> > + bool same = false;
> > +
> > + rcu_read_lock();
> > + anon_vma = folio_anon_vma(folio);
> > + if (anon_vma && tgt_anon_vma)
> > + same = anon_vma->root == tgt_anon_vma->root;
> > + rcu_read_unlock();
> > + return same;
>
> What VMA locks are being held at this point? You assert none.
>
> Why is it maybe?
>
> Why are you taking the RCU lock?
>
Comparing only anon_vma->root is just a simple preliminary check; the
PTE still has to be looked up to determine whether the page is actually
used by this anon_vma.
The anon_vma obtained from folio_anon_vma(folio) must be accessed
under RCU; otherwise it may already have been freed.
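The following sketch illustrates why the quoted helper can only answer "maybe". anon_vmas created through fork share a common root. Equal roots therefore mean the folio and the VMA belong to the same fork hierarchy, not that the folio is mapped in that VMA. In this userspace model, `rcu_read_lock()` and `rcu_read_unlock()` are replaced by hypothetical no-op stubs (`demo_rcu_*`). The model therefore shows only the comparison, not the lifetime guarantee RCU provides in the kernel, and it does not model the PTE check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_anon_vma {
	struct demo_anon_vma *root; /* shared by a whole fork hierarchy */
};

struct demo_folio { struct demo_anon_vma *anon_vma; }; /* NULL: not anon-mapped */
struct demo_vma { struct demo_anon_vma *anon_vma; };   /* NULL: none yet */

/* No-op stand-ins: in the kernel, RCU keeps the anon_vma from being freed. */
static void demo_rcu_read_lock(void) { }
static void demo_rcu_read_unlock(void) { }

/*
 * Same structure as the quoted folio_maybe_same_anon_vma(): true means
 * "possibly mapped here"; a page table lookup is still needed to confirm.
 */
static bool demo_maybe_same_anon_vma(const struct demo_folio *folio,
				     const struct demo_vma *vma)
{
	struct demo_anon_vma *tgt;
	bool same = false;

	assert(folio && vma);
	tgt = vma->anon_vma;
	demo_rcu_read_lock();
	if (folio->anon_vma && tgt)
		same = folio->anon_vma->root == tgt->root;
	demo_rcu_read_unlock();
	return same;
}
```

Here, a folio anonymously mapped in the parent is reported as "maybe" for a child VMA whose anon_vma shares the parent's root, even if the child never mapped that folio.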
* RE: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-05-27 14:33 ` Lorenzo Stoakes
@ 2026-05-28 7:57 ` wangtao
2026-05-28 8:14 ` Lorenzo Stoakes
0 siblings, 1 reply; 64+ messages in thread
From: wangtao @ 2026-05-28 7:57 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: catalin.marinas@arm.com, will@kernel.org, tglx@kernel.org,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
x86@kernel.org, akpm@linux-foundation.org, david@kernel.org,
willy@infradead.org, sj@kernel.org, kees@kernel.org,
luizcap@redhat.com, zhangjiao2@cmss.chinamobile.com,
kas@kernel.org, hpa@zytor.com, liam@infradead.org,
vbabka@kernel.org, rppt@kernel.org, surenb@google.com,
mhocko@suse.com, jack@suse.cz, riel@surriel.com, harry@kernel.org,
jannh@google.com, jgg@ziepe.ca, jhubbard@nvidia.com,
peterx@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com,
npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
baohua@kernel.org, lance.yang@linux.dev, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
pfalcato@suse.de, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, 21cnbao@gmail.com, jparsana@google.com,
dvander@google.com, zhangji, wangzicheng
> Subject: Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred
> anon_vma creation
>
> OK I've had a look through more thoroughly now and:
>
> NAK and NAK any approach like this.
>
>
> Not only is this structurally all wrong, it does some insane stuff (pinning
> VMAs - no), the RCU usage is highly dubious and I suspect you've completely
> broken the anon rmap for things like migration, or have at least added very
> dubious edge cases.
>
> You've added insane complexity, and also have failed to add even
> perfunctory tests, which is also totally unacceptable.
>
> The implementation is wrong, and the approach is wrong - we do not want to
> extend or build on anon_vma. So this is unmergeable, or any approach like it.
>
> I also, unfortunately, strongly suspect AI here. The turn of phrase, and poor
> commit messages, you doing this out of nowhere with absolutely no rmap
> experience before, your total lack of communication before.
>
> Claude puts the probability of heavy AI usage at 85-90%, and I'm pretty
> convinced. Either way it's utterly unmergeable but that you (likely) used AI to
> generate this much work for us makes me actually pretty annoyed.
>
> As a result, I would strongly suggest you no longer submit patches for the
> reverse mapping part of mm, as there is now a real lack of trust.
>
> If you wish to rebuild that, I suggest you _discuss_ concepts and ideas, e.g.
> send stuff on-list with a [DISCUSSION] tag, and engage with the community,
> and go from there.
>
> It's also important to synchronise - I'm working on an anon rmap replacement
> that I'm more than happy to discuss with you or anybody else which should
> achieve the same numbers in an architecturally sound way.
>
> You going off and, in a vacuum, generating a bunch of code with an
> unacceptable approach is not a civil way of engaging nor is it a good use of
> your time, or maintainer time looking at it.
>
> Thanks, Lorenzo
Your email is very unfriendly. I hope you can point out the specific
problems so we can discuss how to solve them.
I am not good at English and need to use AI to translate commit
messages and comments. This reply email is also translated with AI.
However, the code is written by me. I do not know which AI you are
referring to, but the AI tools I use currently cannot effectively
write kernel code.
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-05-28 7:57 ` wangtao
@ 2026-05-28 8:14 ` Lorenzo Stoakes
[not found] ` <CAGsJ_4zy=-m5wjm0BC-vQXMHGRkHymC-5S_L9Oi708v339vvPw@mail.gmail.com>
0 siblings, 1 reply; 64+ messages in thread
From: Lorenzo Stoakes @ 2026-05-28 8:14 UTC (permalink / raw)
To: wangtao
Cc: catalin.marinas@arm.com, will@kernel.org, tglx@kernel.org,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
x86@kernel.org, akpm@linux-foundation.org, david@kernel.org,
willy@infradead.org, sj@kernel.org, kees@kernel.org,
luizcap@redhat.com, zhangjiao2@cmss.chinamobile.com,
kas@kernel.org, hpa@zytor.com, liam@infradead.org,
vbabka@kernel.org, rppt@kernel.org, surenb@google.com,
mhocko@suse.com, jack@suse.cz, riel@surriel.com, harry@kernel.org,
jannh@google.com, jgg@ziepe.ca, jhubbard@nvidia.com,
peterx@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com,
npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
baohua@kernel.org, lance.yang@linux.dev, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
pfalcato@suse.de, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, 21cnbao@gmail.com, jparsana@google.com,
dvander@google.com, zhangji, wangzicheng
On Thu, May 28, 2026 at 07:57:31AM +0000, wangtao wrote:
> > Subject: Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred
> > anon_vma creation
> >
> > OK I've had a look through more thoroughly now and:
> >
> > NAK and NAK any approach like this.
> >
> >
> > Not only is this structurally all wrong, it does some insane stuff (pinning
> > VMAs - no), the RCU usage is highly dubious and I suspect you've completely
> > broken the anon rmap for things like migration, or have at least added very
> > dubious edge cases.
> >
> > You've added insane complexity, and also have failed to add even
> > perfunctory tests, which is also totally unacceptable.
> >
> > The implementation is wrong, and the approach is wrong - we do not want to
> > extend or build on anon_vma. So this is unmergeable, or any approach like it.
> >
> > I also, unfortunately, strongly suspect AI here. The turn of phrase, and poor
> > commit messages, you doing this out of nowhere with absolutely no rmap
> > experience before, your total lack of communication before.
> >
> > Claude puts the probability of heavy AI usage at 85-90%, and I'm pretty
> > convinced. Either way it's utterly unmergeable but that you (likely) used AI to
> > generate this much work for us makes me actually pretty annoyed.
> >
> > As a result, I would strongly suggest you no longer submit patches for the
> > reverse mapping part of mm, as there is now a real lack of trust.
> >
> > If you wish to rebuild that, I suggest you _discuss_ concepts and ideas, e.g.
> > send stuff on-list with a [DISCUSSION] tag, and engage with the community,
> > and go from there.
> >
> > It's also important to synchronise - I'm working on an anon rmap replacement
> > that I'm more than happy to discuss with you or anybody else which should
> > achieve the same numbers in an architecturally sound way.
> >
> > You going off and, in a vacuum, generating a bunch of code with an
> > unacceptable approach is not a civil way of engaging nor is it a good use of
> > your time, or maintainer time looking at it.
> >
> > Thanks, Lorenzo
>
> Your email is very unfriendly. I hope you can point out the specific
> problems so we can discuss how to solve them.
I already did, you've not responded to any of them, and I'm simply not
spending any more time on this.
The series is totally unmergeable, please do not make further rmap
submissions.
>
> I am not good at English and need to use AI to translate commit
> messages and comments. This reply email is also translated with AI.
> However, the code is written by me. I do not know which AI you are
> referring to, but the AI tools I use currently cannot effectively
> write kernel code.
>
We're fine with using AI for language, or in general as long as there's a
clear understanding of what's being submitted.
However I'm very unconvinced that this series wasn't generated.
You have 2 patches in the kernel for the entirety of 2026. One in bluetooth
and one in the scheduler.
Prior to that you have patches from 2018 in device tree drivers.
You have exactly 0 contributions to mm.
Out of nowhere this year you have a big series for DMA, this series for
anon_vma, having done no work or any contributions to rmap, let alone one
of the trickiest and most complicated areas of mm.
You have a total of 39 mails on the linux-mm mailing list.
Suddenly doing a giant bit of work like this using code that looks entirely
like it's AI-generated, and which after assessment by AI gives an 85-90%
probability of AI generation is really suspicious.
Now, if I'm mistaken, and you have a different name/email/identity I missed
with many mm contributions - I will eat my words here (the series is still
unmergeable either way though).
So sorry, there's simply no trust and as a maintainer of rmap again I must
strongly suggest that you no longer submit patches for this part of the
kernel.
If you wish to build trust up again, begin with discussions, and maybe try
some smaller patches in mm to demonstrate that you're genuinely acting in
good faith?
Thanks, Lorenzo
* RE: [PATCH 02/15] mm: convert anon_vma rmap APIs to anon_rmap
2026-05-27 11:49 ` Lorenzo Stoakes
@ 2026-05-28 8:55 ` wangtao
0 siblings, 0 replies; 64+ messages in thread
From: wangtao @ 2026-05-28 8:55 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: catalin.marinas@arm.com, will@kernel.org, tglx@kernel.org,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
x86@kernel.org, akpm@linux-foundation.org, david@kernel.org,
willy@infradead.org, sj@kernel.org, kees@kernel.org,
luizcap@redhat.com, zhangjiao2@cmss.chinamobile.com,
kas@kernel.org, hpa@zytor.com, liam@infradead.org,
vbabka@kernel.org, rppt@kernel.org, surenb@google.com,
mhocko@suse.com, jack@suse.cz, riel@surriel.com, harry@kernel.org,
jannh@google.com, jgg@ziepe.ca, jhubbard@nvidia.com,
peterx@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com,
npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
baohua@kernel.org, lance.yang@linux.dev, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
pfalcato@suse.de, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, 21cnbao@gmail.com, jparsana@google.com,
dvander@google.com, zhangji, wangzicheng
> Subject: Re: [PATCH 02/15] mm: convert anon_vma rmap APIs to anon_rmap
>
> On Wed, May 27, 2026 at 07:01:34PM +0800, tao wrote:
> > Convert the rmap anon_vma interfaces to anon_rmap APIs to clarify the
> > semantics of anonymous rmap operations and prepare for upcoming
> > ANON_VMA_LAZY support and RCU-based lockless rmap traversal.
> >
> > Replace folio_anon_vma(), folio_get_anon_vma(),
> > folio_lock_anon_vma_read(), anon_vma_trylock_read(),
> > anon_vma_lock_read(), anon_vma_unlock_read(),
> > anon_vma_trylock_write(), anon_vma_lock_write(),
> > anon_vma_unlock_write(), and vma_interval_tree_foreach() with the
> > anon_rmap APIs.
>
> This is another worthless commit message. You're again just writing what you
> did not why or what for. This gives no help whatsoever.
>
I will update the commit message to add more explanation.
> >
> > No functional change intended.
>
> Err, there is a functional change, since you're literally changing how things
> iterate?
>
> >
> > Signed-off-by: tao <tao.wangtao@honor.com>
>
> All of this is terrible, you're replacing a broken abstraction with something
> that assumes something completely broken with zero explanation.
>
> No to this.
>
Before supporting ANON_VMA_LAZY, these were indeed just simple wrappers.
> > /*
> > @@ -3035,9 +3040,10 @@ static struct anon_vma *rmap_walk_anon_lock(const struct folio *folio,
> > static void rmap_walk_anon(struct folio *folio,
> > struct rmap_walk_control *rwc, bool locked)
> > {
> > - struct anon_vma *anon_vma;
> > + anon_rmap_t anon_rmap;
> > pgoff_t pgoff_start, pgoff_end;
> > struct anon_vma_chain *avc;
> > + struct vm_area_struct *vma;
>
> I have no idea why you put the VMA at this scope...
The vma is the cursor passed to anon_rmap_foreach_vma(), so it has to be declared before the loop.
>
> >
> > /*
> > * The folio lock ensures that folio->mapping can't be changed under us
> > @@ -3046,20 +3052,19 @@ static void rmap_walk_anon(struct folio *folio,
> > VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
> >
> > if (locked) {
> > - anon_vma = folio_anon_vma(folio);
> > + anon_rmap = folio_anon_rmap(folio);
> > /* anon_vma disappear under us? */
> > - VM_BUG_ON_FOLIO(!anon_vma, folio);
> > + VM_BUG_ON_FOLIO(!anon_rmap_value(anon_rmap), folio);
> > } else {
> > - anon_vma = rmap_walk_anon_lock(folio, rwc);
> > + anon_rmap = rmap_walk_anon_lock(folio, rwc);
> > }
> > - if (!anon_vma)
> > + if (!anon_rmap_value(anon_rmap))
> > return;
> >
> > pgoff_start = folio_pgoff(folio);
> > pgoff_end = pgoff_start + folio_nr_pages(folio) - 1;
> > - anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
> > + anon_rmap_foreach_vma(vma, avc, anon_rmap,
> > pgoff_start, pgoff_end) {
> > - struct vm_area_struct *vma = avc->vma;
>
> Don't throw random changes like this in with a general replacement patch.
These were replaced carefully, but there may still be omissions or mistakes. Please point them out specifically.
>
> > unsigned long address = vma_address(vma, pgoff_start,
> > folio_nr_pages(folio));
> >
> > @@ -3076,7 +3081,7 @@ static void rmap_walk_anon(struct folio *folio,
> > }
> >
> > if (!locked)
> > - anon_vma_unlock_read(anon_vma);
> > + anon_rmap_unlock_read(anon_rmap);
> > }
> >
> > /**
> > --
> > 2.17.1
> >
* RE: [PATCH 03/15] mm: introduce anon_vma_tree_t for multiple anon_vma topologies
2026-05-27 11:56 ` Lorenzo Stoakes
@ 2026-05-28 9:00 ` wangtao
0 siblings, 0 replies; 64+ messages in thread
From: wangtao @ 2026-05-28 9:00 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: catalin.marinas@arm.com, will@kernel.org, tglx@kernel.org,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
x86@kernel.org, akpm@linux-foundation.org, david@kernel.org,
willy@infradead.org, sj@kernel.org, kees@kernel.org,
luizcap@redhat.com, zhangjiao2@cmss.chinamobile.com,
kas@kernel.org, hpa@zytor.com, liam@infradead.org,
vbabka@kernel.org, rppt@kernel.org, surenb@google.com,
mhocko@suse.com, jack@suse.cz, riel@surriel.com, harry@kernel.org,
jannh@google.com, jgg@ziepe.ca, jhubbard@nvidia.com,
peterx@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com,
npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
baohua@kernel.org, lance.yang@linux.dev, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
pfalcato@suse.de, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, 21cnbao@gmail.com, jparsana@google.com,
dvander@google.com, zhangji, wangzicheng
> Subject: Re: [PATCH 03/15] mm: introduce anon_vma_tree_t for multiple
> anon_vma topologies
>
> On Wed, May 27, 2026 at 07:01:35PM +0800, tao wrote:
> > Prepare for upcoming ANON_VMA_LAZY support and RCU-based lockless rmap
> > traversal by clearly separating anon_vma topology handling from the
> > anon_rmap semantics.
>
> RCU is not 'lockless'... and if you truly get RCU semantics you break a bunch
> of stuff as I found out.
>
RCU is required when looking up the anon_vma or vma. The RCU read lock
is not required while calling rmap_one; by then the lock has been
taken with anon_rmap_lock_read() or folio_lock_anon_rmap_read().
For a regular anon_vma, the anon_vma lock is still used. For
ANON_VMA_LAZY there is only one vma, so the anon_vma lock is not
needed; we only need to ensure that the vma remains valid.
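A minimal userspace sketch of the two-way dispatch described above, for readers following the locking argument. All names here (`struct anon_rmap_handle`, `sketch_rmap_lock_read()`, the `readers` counter, the `valid` flag) are hypothetical and not taken from the series: a regular anon_vma takes the shared lock (modelled as a reader count), while the lazy case takes no anon_vma lock and only checks that its single VMA is still valid. It deliberately does not model how the VMA would be kept alive, which is exactly the part the NAK above questions (VMA pinning, RCU usage).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; not the kernel definitions. */
struct anon_vma {
	int readers;		/* models read holders of the anon_vma rwsem */
};

struct vm_area_struct {
	bool valid;		/* models "this VMA is still usable" */
};

enum anon_rmap_kind { ANON_RMAP_REGULAR, ANON_RMAP_LAZY };

struct anon_rmap_handle {
	enum anon_rmap_kind kind;
	struct anon_vma *anon_vma;	/* used when kind == ANON_RMAP_REGULAR */
	struct vm_area_struct *vma;	/* used when kind == ANON_RMAP_LAZY */
};

/* Returns true if the caller may go on to walk the mapping(s). */
static bool sketch_rmap_lock_read(struct anon_rmap_handle *h)
{
	if (h->kind == ANON_RMAP_REGULAR) {
		h->anon_vma->readers++;	/* stands in for taking the read lock */
		return true;
	}
	/*
	 * Lazy case: a single VMA, so no anon_vma lock is taken. Real code
	 * must also keep the VMA from going away here; this plain flag check
	 * does not model that synchronization.
	 */
	return h->vma->valid;
}

static void sketch_rmap_unlock_read(struct anon_rmap_handle *h)
{
	if (h->kind == ANON_RMAP_REGULAR) {
		assert(h->anon_vma->readers > 0);
		h->anon_vma->readers--;
	}
	/* Lazy case: nothing was locked, so nothing to release. */
}
```

The asymmetry is the point of the design as described: only the shared-tree case pays for a lock, and all of the correctness burden in the lazy case moves to "ensure the vma remains valid".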
> >
> > Prepare for supporting multiple anon_vma topologies by introducing
> > lightweight abstractions used by the VMA and rmap code.
> >
> > Introduce anon_vma_tree_t as the type stored in vma->anon_vma:
> >
> > typedef unsigned long anon_vma_tree_t;
> >
> > It represents a tagged pointer encoding a reference to the anon_vma
> > topology. The low bits are reserved as type tags to distinguish
> > different implementations (e.g. regular anon_vma and lazy anon_vma).
> > This keeps the VMA representation compact while allowing the topology
> > to evolve without changing the VMA layout.
> >
> > Signed-off-by: tao <tao.wangtao@honor.com>
>
> The commit message is at least better on this one, but this approach is again,
> predicated on extending a broken abstraction.
>
> You could have saved time and effort by coming forward with this earlier to
> the community.
>
> You're also adding a bunch more messy code on top of anon_vma. It's just
> the wrong direction.
>
I will update the commit message to add more explanation.
> >
> > +/* anon_vma_tree_t APIs */
> > +
> > +static inline anon_vma_tree_t make_anon_vma_tree(struct anon_vma *anon_vma)
> > +{
> > + return (anon_vma_tree_t)anon_vma;
> > +}
>
> You're literally returning an unsigned long of an anon_vma here?
>
> Why is the anon_rmap_t a wrapped struct and this an unsigned long?
>
anon_vma_tree_t uses unsigned long because it is used internally
by rmap.c and vma.c. In other places it is mainly used to check
whether a fault has occurred.
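To make the question above concrete, here is a sketch of the two representations side by side. The one-member struct shape of `anon_rmap_t` and the `make_anon_rmap()` helper are assumptions (the series' actual definition is not quoted in this thread); `anon_rmap_value()` mirrors the accessor used in the patch 02 diff. A common reason for such a wrapper, as with the kernel's struct-wrapped `pte_t` on several architectures, is compiler type checking at no size cost.

```c
#include <assert.h>

struct anon_vma { int id; };	/* hypothetical stand-in */

/* As in the quoted patch 03: a raw integer carrying a (tagged) pointer. */
typedef unsigned long anon_vma_tree_t;

/* Assumed shape of anon_rmap_t: a one-member struct wrapper. */
typedef struct {
	unsigned long val;
} anon_rmap_t;

/* The wrapper costs nothing in size (C11 static_assert from <assert.h>). */
static_assert(sizeof(anon_rmap_t) == sizeof(unsigned long),
	      "anon_rmap_t must stay one word");

/* Hypothetical constructor. */
static inline anon_rmap_t make_anon_rmap(struct anon_vma *anon_vma)
{
	return (anon_rmap_t){ .val = (unsigned long)anon_vma };
}

/* Accessor matching the anon_rmap_value() calls in the patch 02 diff. */
static inline unsigned long anon_rmap_value(anon_rmap_t anon_rmap)
{
	return anon_rmap.val;
}
```

With this shape, `!anon_rmap` does not compile, which is consistent with the diff testing `!anon_rmap_value(anon_rmap)` instead; a plain `anon_vma_tree_t`, by contrast, can be compared against 0 or mixed with other integers without any diagnostic.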
> > +
> > +static inline struct anon_vma *anon_vma_tree_anon_vma(anon_vma_tree_t anon_tree)
> > +{
> > + return (struct anon_vma *)anon_tree;
> > +}
>
> The anon_tree is an anon_vma? What?
>
> And it's a tagged pointer but we don't bother clearing any bits right?...!
>
The low-bit tag definitions are added by the later patch that enables
ANON_VMA_LAZY. I will add comments to clarify this.
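Lorenzo's point that the tagged pointer is never masked can be illustrated with a userspace sketch of encode and decode. It assumes bit 0 as the lazy tag and a lazy tree that points straight at its single VMA; `ANON_VMA_TREE_LAZY`, `ANON_VMA_TREE_TAG_MASK`, `make_lazy_anon_vma_tree()`, `anon_vma_tree_is_lazy()` and `anon_vma_tree_vma()` are hypothetical names, not the series' definitions. Unlike the quoted `anon_vma_tree_anon_vma()`, the decode helpers here check the tag and clear the tag bits before casting back to a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; only their alignment matters here. */
struct anon_vma { int id; };
struct vm_area_struct { int id; };

typedef unsigned long anon_vma_tree_t;

/* Assumed layout: bit 0 set means "lazy, points at a single VMA". */
#define ANON_VMA_TREE_LAZY	0x1UL
#define ANON_VMA_TREE_TAG_MASK	0x1UL

static inline anon_vma_tree_t make_anon_vma_tree(struct anon_vma *anon_vma)
{
	return (anon_vma_tree_t)anon_vma;	/* tag bit left clear */
}

static inline anon_vma_tree_t make_lazy_anon_vma_tree(struct vm_area_struct *vma)
{
	/* The tag only fits if the pointer's low bit is free. */
	assert(((anon_vma_tree_t)vma & ANON_VMA_TREE_TAG_MASK) == 0);
	return (anon_vma_tree_t)vma | ANON_VMA_TREE_LAZY;
}

static inline int anon_vma_tree_is_lazy(anon_vma_tree_t anon_tree)
{
	return (anon_tree & ANON_VMA_TREE_TAG_MASK) == ANON_VMA_TREE_LAZY;
}

/* Decode: check the tag, then clear it before turning the value back into a pointer. */
static inline struct anon_vma *anon_vma_tree_anon_vma(anon_vma_tree_t anon_tree)
{
	if (anon_vma_tree_is_lazy(anon_tree))
		return NULL;
	return (struct anon_vma *)(anon_tree & ~ANON_VMA_TREE_TAG_MASK);
}

static inline struct vm_area_struct *anon_vma_tree_vma(anon_vma_tree_t anon_tree)
{
	if (!anon_vma_tree_is_lazy(anon_tree))
		return NULL;
	return (struct vm_area_struct *)(anon_tree & ~ANON_VMA_TREE_TAG_MASK);
}
```

If further tags were added later, only the mask would widen; callers that go through the decode helpers would not need to change.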
> > +static inline void anon_vma_tree_unlock_read(anon_vma_tree_t anon_tree)
> > +{
> > + struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
> > +
> > + anon_vma_unlock_read(anon_vma);
> > +}
> > +
>
> You keep adding more and more code on top of the existing mess. This is
> NOT what we want.
>
Additional handling is introduced when enabling ANON_VMA_LAZY;
I will add comments to clarify this.
* RE: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
[not found] ` <CAGsJ_4zy=-m5wjm0BC-vQXMHGRkHymC-5S_L9Oi708v339vvPw@mail.gmail.com>
@ 2026-05-29 2:20 ` wangzicheng
2026-05-29 6:56 ` Lorenzo Stoakes
2026-05-29 6:45 ` Lorenzo Stoakes
` (2 subsequent siblings)
3 siblings, 1 reply; 64+ messages in thread
From: wangzicheng @ 2026-05-29 2:20 UTC (permalink / raw)
To: Barry Song
Cc: wangtao, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, harry@kernel.org, jannh@google.com,
jgg@ziepe.ca, jhubbard@nvidia.com, peterx@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, lance.yang@linux.dev,
xu.xin16@zte.com.cn, Lorenzo Stoakes, chengming.zhou@linux.dev,
nao.horiguchi@gmail.com, matthew.brost@intel.com,
joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
gourry@gourry.net, ying.huang@linux.alibaba.com,
apopple@nvidia.com, pfalcato@suse.de,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com,
zhangji
> -----Original Message-----
> From: Barry Song <baohua@kernel.org>
> Sent: Friday, May 29, 2026 7:31 AM
> To: Lorenzo Stoakes <ljs@kernel.org>
> Cc: wangtao <tao.wangtao@honor.com>; catalin.marinas@arm.com;
> will@kernel.org; tglx@kernel.org; mingo@redhat.com; bp@alien8.de;
> dave.hansen@linux.intel.com; x86@kernel.org; akpm@linux-foundation.org;
> david@kernel.org; willy@infradead.org; sj@kernel.org; kees@kernel.org;
> luizcap@redhat.com; zhangjiao2@cmss.chinamobile.com; kas@kernel.org;
> hpa@zytor.com; liam@infradead.org; vbabka@kernel.org; rppt@kernel.org;
> surenb@google.com; mhocko@suse.com; jack@suse.cz; riel@surriel.com;
> harry@kernel.org; jannh@google.com; jgg@ziepe.ca; jhubbard@nvidia.com;
> peterx@redhat.com; ziy@nvidia.com; baolin.wang@linux.alibaba.com;
> npache@redhat.com; ryan.roberts@arm.com; dev.jain@arm.com;
> lance.yang@linux.dev; xu.xin16@zte.com.cn; chengming.zhou@linux.dev;
> nao.horiguchi@gmail.com; matthew.brost@intel.com;
> joshua.hahnjy@gmail.com; rakie.kim@sk.com; byungchul@sk.com;
> gourry@gourry.net; ying.huang@linux.alibaba.com; apopple@nvidia.com;
> pfalcato@suse.de; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org;
> linux-mm@kvack.org; damon@lists.linux.dev; shakeel.butt@linux.dev;
> ryncsn@gmail.com; jparsana@google.com; dvander@google.com; zhangji
> <zhangji1@honor.com>; wangzicheng <wangzicheng@honor.com>
> Subject: Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred
> anon_vma creation
>
> On Thu, May 28, 2026 at 4:15 PM Lorenzo Stoakes <ljs@kernel.org> wrote:
> >
> > On Thu, May 28, 2026 at 07:57:31AM +0000, wangtao wrote:
> > > > Subject: Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred
> > > > anon_vma creation
> > > >
> > > > OK I've had a look through more thoroughly now and:
> > > >
> > > > NAK and NAK any approach like this.
> > > >
> > > >
> > > > Not only is this structurally all wrong, it does some insane stuff
> > > > (pinning VMAs - no), the RCU usage is highly dubious and I suspect
> > > > you've completely broken the anon rmap for things like migration,
> > > > or have at least added very dubious edge cases.
> > > >
> > > > You've added insane complexity, and also have failed to add even
> > > > perfunctory tests, which is also totally unacceptable.
> > > >
> > > > The implementation is wrong, and the approach is wrong - we do not
> > > > want to extend or build on anon_vma. So this is unmergeable, or any
> > > > approach like it.
> > > >
> > > > I also, unfortunately, strongly suspect AI here. The turn of
> > > > phrase, and poor commit messages, you doing this out of nowhere
> > > > with absolutely no rmap experience before, your total lack of
> > > > communication before.
> > > >
> > > > Claude puts the probability of heavy AI usage at 85-90%, and I'm
> > > > pretty convinced. Either way it's utterly unmergeable but that you
> > > > (likely) used AI to generate this much work for us makes me actually
> > > > pretty annoyed.
> > > >
> > > > As a result, I would strongly suggest you no longer submit patches
> > > > for the reverse mapping part of mm, as there is now a real lack of trust.
> > > >
> > > > If you wish to rebuild that, I suggest you _discuss_ concepts and ideas,
> > > > e.g.
> > > > send stuff on-list with a [DISCUSSION] tag, and engage with the
> > > > community, and go from there.
> > > >
> > > > It's also important to synchronise - I'm working on an anon rmap
> > > > replacement that I'm more than happy to discuss with you or
> > > > anybody else which should achieve the same numbers in an
> > > > architecturally sound way.
> > > >
> > > > You going off and, in a vacuum, generating a bunch of code with an
> > > > unacceptable approach is not a civil way of engaging nor is it a
> > > > good use of your time, or maintainer time looking at it.
> > > >
> > > > Thanks, Lorenzo
> > >
> > > Your email is very unfriendly. I hope you can point out the specific
> > > problems so we can discuss how to solve them.
>
> Hi Tao,
>
> Lorenzo had a discussion about rmap in Zagreb here:
> https://lore.kernel.org/linux-mm/aec533b2-37a7-4f44-a279-c4aa604206ac@lucifer.local/
>
> He also shared the PoC code here:
> https://git.kernel.org/pub/scm/linux/kernel/git/ljs/linux.git/log/?h=project/cow-context
>
> and the slides were shared as well. In case you can't find them on linux-mm (I
> actually couldn't find them myself), I am attaching them again here -
> "scalable-cow-lsf-longer-version.pdf"
>
> After coming back from Zagreb, I kept trying to find one or two full days to
> read Lorenzo's code and slides carefully and write a blog about them.
> Unfortunately, I have been completely busy with other work. Sigh... we
> always seem to have too many non-upstream tasks.
>
> If possible, I'd really appreciate it if you could take a deep dive into it and
> write a detailed blog post. I'd be very eager to read it and better understand
> the overall design.
> Otherwise, I'll try to find some time next week or later to go through it
> myself.
>
Hi Barry,
Thank you for your guidance, it is very much appreciated.
I work with Tao at Honor. The motivation behind this work is genuine and practical.
The memory cost has increased significantly, and we have spent real effort investigating
and prototyping solutions to reduce it.
We're happy to join "constructive" discussions and learn from the community.
Thanks,
Zicheng
> >
> > I already did, you've not responded to any of them, and I'm simply not
> > spending any more time on this.
> >
> > The series is totally unmergeable, please do not make further rmap
> > submissions.
> >
> > >
> > > I am not good at English and need to use AI to translate commit
> > > messages and comments. This reply email is also translated with AI.
> > > However, the code is written by me. I do not know which AI you are
> > > referring to, but the AI tools I use currently cannot effectively
> > > write kernel code.
> > >
> >
> > We're fine with using AI for language, or in general as long as
> > there's a clear understanding of what's being submitted.
> >
> > However I'm very unconvinced that this series wasn't generated.
> >
> > You have 2 patches in the kernel for the entirety of 2026. One in
> > bluetooth and one in the scheduler.
> >
> > Prior to that you have patches from 2018 in device tree drivers.
> >
> > You have exactly 0 contributions to mm.
> >
> > Out of nowhere this year you have a big series for DMA, this series
> > for anon_vma, having done no work or any contributions to rmap, let
> > alone one of the trickiest and most complicated areas of mm.
> >
> > You have a total of 39 mails on the linux-mm mailing list.
> >
> > Suddenly doing a giant bit of work like this using code that looks
> > entirely like it's AI-generated, and which after assessment by AI
> > gives an 85-90% probability of AI generation is really suspicious.
> >
> > Now, if I'm mistaken, and you have a different name/email/identity I
> > missed with many mm contributions - I will eat my words here (the series
> > is still unmergeable either way though).
> >
> > So sorry, there's simply no trust and as a maintainer of rmap again I
> > must strongly suggest that you no longer submit patches for this part
> > of the kernel.
> >
> > If you wish to build trust up again, begin with discussions, and maybe
> > try some smaller patches in mm to demonstrate that you're genuinely
> > acting in good faith?
>
> Hi Lorenzo,
>
> I truly believe Tao is acting with good intentions, although the way this is
> being done is quite messy.
>
> Memory costs are increasing significantly these days, and as I understand the
> patchset, he is trying to save memory.
>
> However, I don't think this is being done at the right time or in the right way.
> This may also be due to cultural differences, language barriers, information
> gaps, and a lack of familiarity with the mm community.
> As a non-native speaker, I can see how difficult this can sometimes be.
>
> I would really ask you to give Tao more chances to build trust step by step.
>
> Best Regards
> Barry
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
[not found] ` <CAGsJ_4zy=-m5wjm0BC-vQXMHGRkHymC-5S_L9Oi708v339vvPw@mail.gmail.com>
2026-05-29 2:20 ` wangzicheng
@ 2026-05-29 6:45 ` Lorenzo Stoakes
2026-05-29 9:41 ` wangtao
2026-05-29 15:07 ` Jonathan Corbet
3 siblings, 0 replies; 64+ messages in thread
From: Lorenzo Stoakes @ 2026-05-29 6:45 UTC (permalink / raw)
To: Barry Song
Cc: wangtao, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, harry@kernel.org, jannh@google.com,
jgg@ziepe.ca, jhubbard@nvidia.com, peterx@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, lance.yang@linux.dev,
xu.xin16@zte.com.cn, chengming.zhou@linux.dev,
nao.horiguchi@gmail.com, matthew.brost@intel.com,
joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
gourry@gourry.net, ying.huang@linux.alibaba.com,
apopple@nvidia.com, pfalcato@suse.de,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com,
zhangji, wangzicheng
On Fri, May 29, 2026 at 07:31:12AM +0800, Barry Song wrote:
> Hi Tao,
>
> Lorenzo had a discussion about rmap in Zagreb here:
> https://lore.kernel.org/linux-mm/aec533b2-37a7-4f44-a279-c4aa604206ac@lucifer.local/
>
> He also shared the PoC code here:
> https://git.kernel.org/pub/scm/linux/kernel/git/ljs/linux.git/log/?h=project/cow-context
>
> and the slides were shared as well. In case you can't find
> them on linux-mm (I actually couldn't find them myself), I am
> attaching them again here -
> "scalable-cow-lsf-longer-version.pdf"
>
> After coming back from Zagreb, I kept trying to find one or
> two full days to read Lorenzo's code and slides carefully and
> write a blog about them. Unfortunately, I have been completely
> busy with other work. Sigh... we always seem to have too many
> non-upstream tasks.
>
> If possible, I'd really appreciate it if you could take a
> deep dive into it and write a detailed blog post. I'd be
> very eager to read it and better understand the overall design.
> Otherwise, I'll try to find some time next week or later to
> go through it myself.
Not sure if you're asking Tao or me about a blog post here? :)
> Hi Lorenzo,
>
> I truly believe Tao is acting with good intentions, although
> the way this is being done is quite messy.
>
> Memory costs are increasing significantly these days, and as I
> understand the patchset, he is trying to save memory.
I think there's broad awareness (from myself in particular...!) of this.
>
> However, I don't think this is being done at the right time
> or in the right way. This may also be due to cultural
> differences, language barriers, information gaps, and a lack
> of familiarity with the mm community.
> As a non-native speaker, I can see how difficult this can
> sometimes be.
>
> I would really ask you to give Tao more chances to build
> trust step by step.
>
> Best Regards
> Barry
I understand and empathise with language difficulties - I have zero
objection to using LLMs to assist with that.
But none of my objections relate to this.
We have received a huge, invasive, unmergeable series with code that reads
exactly as you'd expect from LLM-generated code, that Claude assigns a high
probability of being AI generated, from somebody with:
- 0 previous mm contributions
- 0 interactions in rmap
- 2 patches in 2026 (neither mm)
- prior to that only devicetree contributions from 8 years ago
What would you have me do under those circumstances?
Unfortunately this means I have very little trust in Tao, and given limited
maintainership resource, as I said, I suggest he attempts no further code
contributions to rmap.
And as I said elsewhere, he can rebuild trust through constructive
discussion. Also perhaps building up credibility in mm through smaller
series showing understanding?
Thanks, Lorenzo
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-05-29 2:20 ` wangzicheng
@ 2026-05-29 6:56 ` Lorenzo Stoakes
0 siblings, 0 replies; 64+ messages in thread
From: Lorenzo Stoakes @ 2026-05-29 6:56 UTC (permalink / raw)
To: wangzicheng
Cc: Barry Song, wangtao, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, harry@kernel.org, jannh@google.com,
jgg@ziepe.ca, jhubbard@nvidia.com, peterx@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, lance.yang@linux.dev,
xu.xin16@zte.com.cn, chengming.zhou@linux.dev,
nao.horiguchi@gmail.com, matthew.brost@intel.com,
joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
gourry@gourry.net, ying.huang@linux.alibaba.com,
apopple@nvidia.com, pfalcato@suse.de,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com,
zhangji
On Fri, May 29, 2026 at 02:20:38AM +0000, wangzicheng wrote:
> Hi Barry,
>
> Thank you for your guidance, it is very much appreciated.
>
> I work with Tao at Honor. The motivation behind this work is genuine and practical.
> The memory cost has increased significantly, and we have spent real effort investigating
> and prototyping solutions to reduce it.
Thanks, appreciate the input from your side Zicheng.
The series is unmergeable as-is, regardless of provenance.
What's unfortunate is that early discussion could have saved effort (and/or
tokens :). This is often the case with firms that develop something in-house in
isolation then present it to the community suddenly.
And as I said to Barry (+ Tao previously), the circumstances surrounding
this series are additionally very suspicious, and while we are fine with
LLM assistance where the authors fully understand it, it feels that this is
not the case here.
>
> We're happy to join "constructive" discussions and learn from the community.
So with the negativity above said, I'd really like us to move to a more
positive and constructive situation :)
One thing that is clear is that - we all want the same thing.
Reduced memory usage, reduced lock contention in the anon rmap.
So one thing that could be very useful is for you guys to help assist me
with testing of my anon rmap approach as it develops, and also provide
input, critique, and review.
I'd also love to be made aware of any testing you guys have done or input
on that, also any workloads where you have observed particularly
problematic memory usage or lock contention.
Please note that my work is currently under heavy development so the
proof-of-concept code provided is incomplete and not yet functional (it's
just there to give a sense of the shape).
I will likely provide a pre-RFC series to interested parties prior to
sending an RFC on-list.
I'd be more than happy to include you guys in that.
And as I said previously, I'm more than happy to engage in discussion
on-list or privately regarding this work :)
>
> Thanks,
> Zicheng
Cheers, Lorenzo
* RE: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
[not found] ` <CAGsJ_4zy=-m5wjm0BC-vQXMHGRkHymC-5S_L9Oi708v339vvPw@mail.gmail.com>
2026-05-29 2:20 ` wangzicheng
2026-05-29 6:45 ` Lorenzo Stoakes
@ 2026-05-29 9:41 ` wangtao
2026-05-29 12:03 ` Lorenzo Stoakes
2026-05-29 15:07 ` Jonathan Corbet
3 siblings, 1 reply; 64+ messages in thread
From: wangtao @ 2026-05-29 9:41 UTC (permalink / raw)
To: Barry Song, Lorenzo Stoakes
Cc: catalin.marinas@arm.com, will@kernel.org, tglx@kernel.org,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
x86@kernel.org, akpm@linux-foundation.org, david@kernel.org,
willy@infradead.org, sj@kernel.org, kees@kernel.org,
luizcap@redhat.com, zhangjiao2@cmss.chinamobile.com,
kas@kernel.org, hpa@zytor.com, liam@infradead.org,
vbabka@kernel.org, rppt@kernel.org, surenb@google.com,
mhocko@suse.com, jack@suse.cz, riel@surriel.com, harry@kernel.org,
jannh@google.com, jgg@ziepe.ca, jhubbard@nvidia.com,
peterx@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com,
npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
lance.yang@linux.dev, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
pfalcato@suse.de, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com,
zhangji, wangzicheng
> > On Thu, May 28, 2026 at 07:57:31AM +0000, wangtao wrote:
> > > > Subject: Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred
> > > > anon_vma creation
> > > >
> > > > OK I've had a look through more thoroughly now and:
> > > >
> > > > NAK and NAK any approach like this.
> > > >
> > > >
> > > > Not only is this structurally all wrong, it does some insane stuff
> > > > (pinning VMAs - no), the RCU usage is highly dubious and I suspect
> > > > you've completely broken the anon rmap for things like migration,
> > > > or have at least added very dubious edge cases.
> > > >
> > > > You've added insane complexity, and also have failed to add even
> > > > perfunctory tests, which is also totally unacceptable.
> > > >
> > > > The implementation is wrong, and the approach is wrong - we do not
> > > > want to extend or build on anon_vma. So this is unmergeable, or any
> > > > approach like it.
> > > >
> > > > I also, unfortunately, strongly suspect AI here. The turn of
> > > > phrase, and poor commit messages, you doing this out of nowhere
> > > > with absolutely no rmap experience before, your total lack of
> > > > communication before.
> > > >
> > > > Claude puts the probability of heavy AI usage at 85-90%, and I'm
> > > > pretty convinced. Either way it's utterly unmergeable but that you
> > > > (likely) used AI to generate this much work for us makes me actually
> > > > pretty annoyed.
> > > >
> > > > As a result, I would strongly suggest you no longer submit patches
> > > > for the reverse mapping part of mm, as there is now a real lack of trust.
> > > >
> > > > If you wish to rebuild that, I suggest you _discuss_ concepts and ideas,
> > > > e.g. send stuff on-list with a [DISCUSSION] tag, and engage with the
> > > > community, and go from there.
> > > >
> > > > It's also important to synchronise - I'm working on an anon rmap
> > > > replacement that I'm more than happy to discuss with you or
> > > > anybody else which should achieve the same numbers in an
> > > > architecturally sound way.
> > > >
> > > > You going off and, in a vacuum, generating a bunch of code with an
> > > > unacceptable approach is not a civil way of engaging nor is it a
> > > > good use of your time, or maintainer time looking at it.
> > > >
> > > > Thanks, Lorenzo
> > >
> > > Your email is very unfriendly. I hope you can point out the specific
> > > problems so we can discuss how to solve them.
>
> Hi Tao,
>
> Lorenzo had a discussion about rmap in Zagreb here:
> https://lore.kernel.org/linux-mm/aec533b2-37a7-4f44-a279-c4aa604206ac@lucifer.local/
>
> He also shared the PoC code here:
> https://git.kernel.org/pub/scm/linux/kernel/git/ljs/linux.git/log/?h=project/cow-context
>
> and the slides were shared as well. In case you can't find them on linux-mm (I
> actually couldn't find them myself), I am attaching them again here -
> "scalable-cow-lsf-longer-version.pdf"
>
> After coming back from Zagreb, I kept trying to find one or two full days to
> read Lorenzo's code and slides carefully and write a blog about them.
> Unfortunately, I have been completely busy with other work. Sigh... we
> always seem to have too many non-upstream tasks.
>
> If possible, I'd really appreciate it if you could take a deep dive into it and
> write a detailed blog post. I'd be very eager to read it and better understand
> the overall design.
> Otherwise, I'll try to find some time next week or later to go through it
> myself.
>
Hi Barry,

Thank you very much for your reply.

I took an initial look at the cow-context code, and a few points
might be worth noting:

1. cow_context_walk currently assumes that the rmap walk runs
under RCU protection. This may need to be adjusted early,
since paths such as try_to_unmap_one, page_vma_mkclean_one,
and try_to_migrate_one may sleep (i.e. involve a task switch),
which is not allowed inside an RCU read-side critical section.

2. In cow_context_walk, traverse_contexts appears to involve
multiple nested loops. When there are many child processes
across several fork layers, it may not be as simple or
efficient as the current anon_vma approach.

It needs to traverse all child cow_ctx, and within each
cow_ctx, remaps_for_each() has two levels of iteration:
remaps_for_each_entry and remaps_for_each_entry_offset.

In other words, it first iterates over cow_ctx and then
traverses rmap_mt inside each one. The rough complexity
seems to be O(#proc * log(#rmap_entries_in_cow)), which
may be somewhat higher than anon_vma's
O(#vmas_in_anon_vma). However, in most cases the number
of processes is not large, so the impact may be limited.

Previously, I also considered converting anon_vma's rb_tree
to a maple tree. If one entry records a single VMA, the
average overhead could be less than two longs per VMA.

However, unlike rb_tree, the maple tree does not support storing
multiple elements under a single key. The key would need to
look like (vma_id/mm_id + pgoff). On 32-bit platforms, since
64-bit maple tree keys are not supported yet, the 12 bits
left over after pgoff are not enough for vma_id/mm_id.

Because of this limitation, I later started thinking about
ways to reduce anon_vma allocations instead.

I will try to find some time next week to analyze the
cow-context design and code more thoroughly, and then
write up a summary.

Thanks,
Tao
> >
> > I already did, you've not responded to any of them, and I'm simply not
> > spending any more time on this.
> >
> > The series is totally unmergeable, please do not make further rmap
> > submissions.
> >
> > >
> > > I am not good at English and need to use AI to translate commit
> > > messages and comments. This reply email is also translated with AI.
> > > However, the code is written by me. I do not know which AI you are
> > > referring to, but the AI tools I use currently cannot effectively
> > > write kernel code.
> > >
> >
> > We're fine with using AI for language, or in general as long as
> > there's a clear understanding of what's being submitted.
> >
> > However I'm very unconvinced that this series wasn't generated.
> >
> > You have 2 patches in the kernel for the entirety of 2026. One in
> > bluetooth and one in the scheduler.
> >
> > Prior to that you have patches from 2018 in device tree drivers.
> >
> > You have exactly 0 contributions to mm.
> >
> > Out of nowhere this year you have a big series for DMA, this series
> > for anon_vma, having done no work or any contributions to rmap, let
> > alone one of the trickiest and most complicated areas of mm.
> >
> > You have a total of 39 mails on the linux-mm mailing list.
> >
> > Suddenly doing a giant bit of work like this using code that looks
> > entirely like it's AI-generated, and which after assessment by AI
> > gives an 85-90% probability of AI generation is really suspicious.
> >
> > Now, if I'm mistaken, and you have a different name/email/identity I
> > missed with many mm contributions - I will eat my words here (the series
> > is still unmergeable either way though).
> >
> > So sorry, there's simply no trust and as a maintainer of rmap again I
> > must strongly suggest that you no longer submit patches for this part
> > of the kernel.
> >
> > If you wish to build trust up again, begin with discussions, and maybe
> > try some smaller patches in mm to demonstrate that you're genuinely
> > acting in good faith?
>
> Hi Lorenzo,
>
> I truly believe Tao is acting with good intentions, although the way this is
> being done is quite messy.
>
> Memory costs are increasing significantly these days, and as I understand the
> patchset, he is trying to save memory.
>
> However, I don't think this is being done at the right time or in the right way.
> This may also be due to cultural differences, language barriers, information
> gaps, and a lack of familiarity with the mm community.
> As a non-native speaker, I can see how difficult this can sometimes be.
>
> I would really ask you to give Tao more chances to build trust step by step.
>
> Best Regards
> Barry
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-05-29 9:41 ` wangtao
@ 2026-05-29 12:03 ` Lorenzo Stoakes
2026-06-01 1:46 ` wangtao
2026-06-02 20:47 ` Lorenzo Stoakes
0 siblings, 2 replies; 64+ messages in thread
From: Lorenzo Stoakes @ 2026-05-29 12:03 UTC (permalink / raw)
To: wangtao
Cc: Barry Song, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, harry@kernel.org, jannh@google.com,
jgg@ziepe.ca, jhubbard@nvidia.com, peterx@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, lance.yang@linux.dev,
xu.xin16@zte.com.cn, chengming.zhou@linux.dev,
nao.horiguchi@gmail.com, matthew.brost@intel.com,
joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
gourry@gourry.net, ying.huang@linux.alibaba.com,
apopple@nvidia.com, pfalcato@suse.de,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com,
zhangji, wangzicheng
On Fri, May 29, 2026 at 09:41:20AM +0000, wangtao wrote:
> > Hi Tao,
> >
> > Lorenzo had a discussion about rmap in Zagreb here:
> > https://lore.kernel.org/linux-mm/aec533b2-37a7-4f44-a279-c4aa604206ac@lucifer.local/
> >
> > He also shared the PoC code here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/ljs/linux.git/log/?h=project/cow-context
> >
> > and the slides were shared as well. In case you can't find them on linux-mm (I
> > actually couldn't find them myself), I am attaching them again here -
> > "scalable-cow-lsf-longer-version.pdf"
> >
> > After coming back from Zagreb, I kept trying to find one or two full days to
> > read Lorenzo's code and slides carefully and write a blog about them.
> > Unfortunately, I have been completely busy with other work. Sigh... we
> > always seem to have too many non-upstream tasks.
> >
> > If possible, I'd really appreciate it if you could take a deep dive into it and
> > write a detailed blog post. I'd be very eager to read it and better understand
> > the overall design.
> > Otherwise, I'll try to find some time next week or later to go through it
> > myself.
> >
> Hi Barry,
>
> Thank you very much for your reply.
>
> I took an initial look at the cow-context code, and a few points
> might be worth noting:
>
> 1. cow_context_walk currently assumes that the rmap walk runs
> under RCU protection. This may need to be adjusted early,
> since paths such as try_to_unmap_one, page_vma_mkclean_one,
> and try_to_migrate_one may sleep (i.e. involve a task switch),
> which is not allowed inside an RCU read-side critical section.
>
> 2. In cow_context_walk, traverse_contexts appears to involve
> multiple nested loops. When there are many child processes
> across several fork layers, it may not be as simple or
> efficient as the current anon_vma approach.
>
> It needs to traverse all child cow_ctx, and within each
> cow_ctx, remaps_for_each() has two levels of iteration:
> remaps_for_each_entry and remaps_for_each_entry_offset.
>
> In other words, it first iterates over cow_ctx and then
> traverses rmap_mt inside each one. The rough complexity
> seems to be O(#proc * log(#rmap_entries_in_cow)), which
> may be somewhat higher than anon_vma's
> O(#vmas_in_anon_vma). However, in most cases the number
> of processes is not large, so the impact may be limited.
>
> Previously, I also considered converting anon_vma's rb_tree
> to a maple tree. If one entry records a single VMA, the
> average overhead could be less than two longs per VMA.
>
> However, unlike rb_tree, the maple tree does not support storing
> multiple elements under a single key. The key would need to
> look like (vma_id/mm_id + pgoff). On 32-bit platforms, since
> 64-bit maple tree keys are not supported yet, the 12 bits
> left over after pgoff are not enough for vma_id/mm_id.
>
> Because of this limitation, I later started thinking about
> ways to reduce anon_vma allocations instead.
>
> I will try to find some time next week to analyze the
> cow-context design and code more thoroughly, and then
> write up a summary.
Tao,

This response is so full of misunderstandings it's not really worth me
responding to any of it. You've even hallucinated an imaginary field which
is REALLY suspicious.

You've no mm expertise or history and came up with this in a few hours. I
asked Claude to analyse it and it puts it at 75-80% chance of being solely
LLM-generated from cow_context.c.

I simply don't have the time to deal with this, so unfortunately I'm going
to have to withdraw the suggestion of further discussion with you on this
topic.

I am working on the scalable CoW project and will solicit opinions of those
with relevant expertise.

We are not interested in your approach or analysis.

Thanks, Lorenzo
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
[not found] ` <CAGsJ_4zy=-m5wjm0BC-vQXMHGRkHymC-5S_L9Oi708v339vvPw@mail.gmail.com>
` (2 preceding siblings ...)
2026-05-29 9:41 ` wangtao
@ 2026-05-29 15:07 ` Jonathan Corbet
2026-05-29 15:40 ` Lorenzo Stoakes
3 siblings, 1 reply; 64+ messages in thread
From: Jonathan Corbet @ 2026-05-29 15:07 UTC (permalink / raw)
To: Barry Song, Lorenzo Stoakes
Cc: wangtao, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, harry@kernel.org, jannh@google.com,
jgg@ziepe.ca, jhubbard@nvidia.com, peterx@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, lance.yang@linux.dev,
xu.xin16@zte.com.cn, chengming.zhou@linux.dev,
nao.horiguchi@gmail.com, matthew.brost@intel.com,
joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
gourry@gourry.net, ying.huang@linux.alibaba.com,
apopple@nvidia.com, pfalcato@suse.de,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com,
zhangji, wangzicheng
Barry Song <baohua@kernel.org> writes:
> After coming back from Zagreb, I kept trying to find one or
> two full days to read Lorenzo's code and slides carefully and
> write a blog about them. Unfortunately, I have been completely
> busy with other work. Sigh... we always seem to have too many
> non-upstream tasks.
>
> If possible, I'd really appreciate it if you could take a
> deep dive into it and write a detailed blog post. I'd be
> very eager to read it and better understand the overall design.
> Otherwise, I'll try to find some time next week or later to
> go through it myself.
It's still somewhat superficial, but in case it's helpful:

https://lwn.net/Articles/1072378/

jon
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-05-29 15:07 ` Jonathan Corbet
@ 2026-05-29 15:40 ` Lorenzo Stoakes
2026-05-30 11:28 ` Barry Song
0 siblings, 1 reply; 64+ messages in thread
From: Lorenzo Stoakes @ 2026-05-29 15:40 UTC (permalink / raw)
To: Jonathan Corbet
Cc: Barry Song, wangtao, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, harry@kernel.org, jannh@google.com,
jgg@ziepe.ca, jhubbard@nvidia.com, peterx@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, lance.yang@linux.dev,
xu.xin16@zte.com.cn, chengming.zhou@linux.dev,
nao.horiguchi@gmail.com, matthew.brost@intel.com,
joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
gourry@gourry.net, ying.huang@linux.alibaba.com,
apopple@nvidia.com, pfalcato@suse.de,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com,
zhangji, wangzicheng
On Fri, May 29, 2026 at 09:07:30AM -0600, Jonathan Corbet wrote:
> Barry Song <baohua@kernel.org> writes:
>
> > After coming back from Zagreb, I kept trying to find one or
> > two full days to read Lorenzo's code and slides carefully and
> > write a blog about them. Unfortunately, I have been completely
> > busy with other work. Sigh... we always seem to have too many
> > non-upstream tasks.
> >
> > If possible, I'd really appreciate it if you could take a
> > deep dive into it and write a detailed blog post. I'd be
> > very eager to read it and better understand the overall design.
> > Otherwise, I'll try to find some time next week or later to
> > go through it myself.
>
> It's still somewhat superficial, but in case it's helpful:
>
> https://lwn.net/Articles/1072378/
>
I found it to be great as usual :)

> jon

Cheers, Lorenzo
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-05-29 15:40 ` Lorenzo Stoakes
@ 2026-05-30 11:28 ` Barry Song
0 siblings, 0 replies; 64+ messages in thread
From: Barry Song @ 2026-05-30 11:28 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Jonathan Corbet, wangtao, catalin.marinas@arm.com,
will@kernel.org, tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, harry@kernel.org, jannh@google.com,
jgg@ziepe.ca, jhubbard@nvidia.com, peterx@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, lance.yang@linux.dev,
xu.xin16@zte.com.cn, chengming.zhou@linux.dev,
nao.horiguchi@gmail.com, matthew.brost@intel.com,
joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
gourry@gourry.net, ying.huang@linux.alibaba.com,
apopple@nvidia.com, pfalcato@suse.de,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com,
zhangji, wangzicheng
On Fri, May 29, 2026 at 11:40 PM Lorenzo Stoakes <ljs@kernel.org> wrote:
>
> On Fri, May 29, 2026 at 09:07:30AM -0600, Jonathan Corbet wrote:
> > Barry Song <baohua@kernel.org> writes:
> >
> > > After coming back from Zagreb, I kept trying to find one or
> > > two full days to read Lorenzo's code and slides carefully and
> > > write a blog about them. Unfortunately, I have been completely
> > > busy with other work. Sigh... we always seem to have too many
> > > non-upstream tasks.
> > >
> > > If possible, I'd really appreciate it if you could take a
> > > deep dive into it and write a detailed blog post. I'd be
> > > very eager to read it and better understand the overall design.
> > > Otherwise, I'll try to find some time next week or later to
> > > go through it myself.
> >
> > It's still somewhat superficial, but in case it's helpful:
> >
> > https://lwn.net/Articles/1072378/
> >
>
> I found it to be great as usual :)
+1

Thanks very much, Jon. That's really helpful.

>
> > jon
>
> Cheers, Lorenzo

Best Regards
Barry
* RE: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-05-29 12:03 ` Lorenzo Stoakes
@ 2026-06-01 1:46 ` wangtao
2026-06-02 2:15 ` Barry Song
2026-06-02 20:47 ` Lorenzo Stoakes
1 sibling, 1 reply; 64+ messages in thread
From: wangtao @ 2026-06-01 1:46 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Barry Song, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, harry@kernel.org, jannh@google.com,
jgg@ziepe.ca, jhubbard@nvidia.com, peterx@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, lance.yang@linux.dev,
xu.xin16@zte.com.cn, chengming.zhou@linux.dev,
nao.horiguchi@gmail.com, matthew.brost@intel.com,
joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
gourry@gourry.net, ying.huang@linux.alibaba.com,
apopple@nvidia.com, pfalcato@suse.de,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com,
zhangji, wangzicheng
> > Previously, I also considered converting anon_vma's rb_tree to a
> > maple tree. If one entry records a single VMA, the average overhead
> > could be less than two longs per VMA.
> >
> > However, unlike rb_tree, the maple tree does not support storing multiple
> > elements under a single key. The key would need to look like
> > (vma_id/mm_id + pgoff). On 32-bit platforms, since 64-bit maple tree
> > keys are not supported yet, the 12 bits left over after pgoff are not
> > enough for vma_id/mm_id.
> >
> > Because of this limitation, I later started thinking about ways to
> > reduce anon_vma allocations instead.
> >
> > I will try to find some time next week to analyze the cow-context
> > design and code more thoroughly, and then write up a summary.
>
> Tao,
>
> This response is so full of misunderstandings it's not really worth me
> responding to any of it. You've even hallucinated an imaginary field which is
> REALLY suspicious.
>
> You've no mm expertise or history and came up with this in a few hours. I
> asked Claude to analyse it and it puts it at 75-80% chance of being solely
> LLM-generated from cow_context.c.
>
> I simply don't have the time to deal with this, so unfortunately I'm going to
> have to withdraw the suggestion of further discussion with you on this topic.
>
> I am working on the scalable CoW project and will solicit opinions of those
> with relevant expertise.
>
> We are not interested in your approach or analysis.
>
> Thanks, Lorenzo
You said discussion was welcome, yet when someone offered even a
small comment, you refused to continue the discussion.

If I had known you would be this inconsistent, I would not have
replied to you in the first place.

This will be my last reply to you. I will not respond again.

Consider the following test case:

Process P creates 1000 VMAs with mmap, named vma_1, vma_2, ...,
vma_1000.

Then it forks child processes C_1, C_2, ..., C_1000. Each child
process C_k keeps only vma_k and munmaps all other vma_i.

With the current anon_vma, when reclaim walks each page, it only
needs to handle two VMAs (vma_k in process P and vma_k in process C_k).

But under the CoW approach, reclaiming each page needs to walk
1000 processes, then spend O(log(#remap_entries)) time to check
whether a remap_entry exists, and then O(log(#vmas)) time to
locate the VMA.

Both the code complexity and the time complexity of the reverse
walk are much higher than those of the current anon_vma approach.
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-01 1:46 ` wangtao
@ 2026-06-02 2:15 ` Barry Song
2026-06-02 2:46 ` Lance Yang
2026-06-02 19:56 ` Harry Yoo
0 siblings, 2 replies; 64+ messages in thread
From: Barry Song @ 2026-06-02 2:15 UTC (permalink / raw)
To: wangtao
Cc: Lorenzo Stoakes, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, harry@kernel.org, jannh@google.com,
jgg@ziepe.ca, jhubbard@nvidia.com, peterx@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, lance.yang@linux.dev,
xu.xin16@zte.com.cn, chengming.zhou@linux.dev,
nao.horiguchi@gmail.com, matthew.brost@intel.com,
joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
gourry@gourry.net, ying.huang@linux.alibaba.com,
apopple@nvidia.com, pfalcato@suse.de,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com,
zhangji, wangzicheng
On Mon, Jun 1, 2026 at 9:46 AM wangtao <tao.wangtao@honor.com> wrote:
[...]
>
> You said discussion was welcome, yet when someone offered even a
> small comment, you refused to continue the discussion.
>
> If I had known you would be this inconsistent, I would not have
> replied to you in the first place.
>
> This will be my last reply to you. I will not respond again.
>
Hi Tao,

Please don't walk away from the linux-mm community. I read your
patchset and found it quite valuable. It not only reduces memory
overhead, but also eliminates rmap costs for exclusive folios.

Since I'm not very confident discussing technical topics in English,
I wrote a blog post in Chinese about your patchset:

https://mp.weixin.qq.com/s/k00tzhTl8HbL3k4G6ev4SA

I have to admit that I found the implementation quite complex and
in need of significant improvement. However, I think the underlying
idea is very interesting and worth exploring further.

I'm looking forward to seeing a v2 RFC with a cleaner and simpler
implementation while preserving the core concept.

Regardless of whether it ultimately gets merged, I hope the discussion
can continue.

Best regards,
Barry
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-02 2:15 ` Barry Song
@ 2026-06-02 2:46 ` Lance Yang
2026-06-02 15:37 ` Lorenzo Stoakes
2026-06-02 19:56 ` Harry Yoo
1 sibling, 1 reply; 64+ messages in thread
From: Lance Yang @ 2026-06-02 2:46 UTC (permalink / raw)
To: Barry Song, wangtao
Cc: Lorenzo Stoakes, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, harry@kernel.org, jannh@google.com,
jgg@ziepe.ca, jhubbard@nvidia.com, peterx@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
pfalcato@suse.de, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com,
zhangji, wangzicheng
On 2026/6/2 10:15, Barry Song wrote:
> On Mon, Jun 1, 2026 at 9:46 AM wangtao <tao.wangtao@honor.com> wrote:
> [...]
>>
>> You said discussion was welcome, yet when someone offered even a
>> small comment, you refused to continue the discussion.
>>
>> If I had known you would be this inconsistent, I would not have
>> replied to you in the first place.
>>
>> This will be my last reply to you. I will not respond again.
>>
>
> Hi Tao,
>
> Please don't walk away from the linux-mm community. I read your
> patchset and found it quite valuable. It not only reduces memory
> overhead, but also eliminates rmap costs for exclusive folios.
>
> Since I'm not very confident discussing technical topics in English,
> I wrote a blog post in Chinese about your patchset:
>
> https://mp.weixin.qq.com/s/k00tzhTl8HbL3k4G6ev4SA
>
> I have to admit that I found the implementation quite complex and
> in need of significant improvement. However, I think the underlying
> idea is very interesting and worth exploring further.
>
> I'm looking forward to seeing a v2 RFC with a cleaner and simpler
> implementation while preserving the core concept.
>
> Regardless of whether it ultimately gets merged, I hope the discussion
> can continue.
Same here :)

Tao, please don't let this thread get you down. No first RFC is
perfect, and the idea still looks worth discussing :)

Thanks for working on this!

Cheers, Lance
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-02 2:46 ` Lance Yang
@ 2026-06-02 15:37 ` Lorenzo Stoakes
2026-06-02 19:44 ` Pedro Falcato
2026-06-02 23:03 ` Barry Song
0 siblings, 2 replies; 64+ messages in thread
From: Lorenzo Stoakes @ 2026-06-02 15:37 UTC (permalink / raw)
To: Lance Yang
Cc: Barry Song, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, harry@kernel.org, jannh@google.com,
jgg@ziepe.ca, jhubbard@nvidia.com, peterx@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
pfalcato@suse.de, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com
On Tue, Jun 02, 2026 at 10:46:35AM +0800, Lance Yang wrote:
>
>
> On 2026/6/2 10:15, Barry Song wrote:
> > On Mon, Jun 1, 2026 at 9:46 AM wangtao <tao.wangtao@honor.com> wrote:
> > [...]
> > >
> > > You said discussion was welcome, yet when someone offered even a
> > > small comment, you refused to continue the discussion.
> > >
> > > If I had known you would be this inconsistent, I would not have
> > > replied to you in the first place.
> > >
> > > This will be my last reply to you. I will not respond again.
> > >
> >
> > Hi Tao,
> >
> > Please don't walk away from the linux-mm community. I read your
> > patchset and found it quite valuable. It not only reduces memory
> > overhead, but also eliminates rmap costs for exclusive folios.
> >
> > Since I'm not very confident discussing technical topics in English,
> > I wrote a blog post in Chinese about your patchset:
> >
> > https://mp.weixin.qq.com/s/k00tzhTl8HbL3k4G6ev4SA
> >
> > I have to admit that I found the implementation quite complex and
> > in need of significant improvement. However, I think the underlying
> > idea is very interesting and worth exploring further.
> >
> > I'm looking forward to seeing a v2 RFC with a cleaner and simpler
> > implementation while preserving the core concept.
> >
> > Regardless of whether it ultimately gets merged, I hope the discussion
> > can continue.
>
> Same here :)
>
> Tao, please don't let this thread get you down. No first RFC is
> perfect, and the idea still looks worth discussing :)
>
> Thanks for working on this!
Guys, this isn't helpful.
We aren't extending anon_vma, and I am working on replacing it, that's the
bottom line.
I have presented compelling evidence suggesting this is AI generated. In
response I got more AI-generated nonsense. There's no trust, the code and
analysis are all wrong, end of discussion.
>
> Cheers, Lance
>
Thanks, Lorenzo
P.S. maintainership is utterly thankless, and I don't really expect much in
return, but honestly reading this, given the case I've made here, was
really quite disappointing.
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
` (17 preceding siblings ...)
2026-05-27 14:33 ` Lorenzo Stoakes
@ 2026-06-02 16:07 ` Harry Yoo
2026-06-03 2:59 ` wangtao
2026-06-03 20:25 ` David Hildenbrand (Arm)
19 siblings, 1 reply; 64+ messages in thread
From: Harry Yoo @ 2026-06-02 16:07 UTC (permalink / raw)
To: tao, catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86,
akpm, david, willy, sj, kees, luizcap, zhangjiao2, kas, ljs
Cc: hpa, liam, vbabka, rppt, surenb, mhocko, jack, riel, jannh, jgg,
jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
On 5/27/26 8:01 PM, tao wrote:
> Design overview
> ---------------
>
> ANON_VMA_LAZY defers anon_vma allocation until it is actually needed
> (for example during fork). VMAs that never participate in sharing can
> avoid creating anon_vma structures entirely.
>
> Before an anon_vma exists, rmap operations rely directly on VMA
> information, so no anon_vma locking is required. An anon_vma is created
> and linked only when sharing semantics are required.
It is unfortunate that the design overview doesn't cover correctness
aspect at all. VMAs are subject to change (even before being shared with
other processes), and rmap needs something that doesn't go away across
VMA merging, split, etc.
I'm not sure how the idea is supposed to work correctly.
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 64+ messages in thread
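Harry's correctness concern can be illustrated with a small userspace model. This is a sketch under stated assumptions, not code from the series: it assumes, purely for illustration, that before an anon_vma exists a folio's rmap anchor is the VMA it was faulted into. Every name here (struct vma, split_vma(), rmap_via_vma(), rmap_via_anon_vma()) is a hypothetical simplification, not a kernel API. A page faulted in at index 105 stays mapped after the VMA is split at page 103, but the VMA captured at fault time no longer covers it, whereas a walk that starts from a shared anon_vma still finds the correct half.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LINKED 8

struct vma;

/* Hypothetical stand-in: the set of VMAs that may map this anon memory. */
struct anon_vma {
	struct vma *vmas[MAX_LINKED];
	int nr;
};

/* Hypothetical stand-in for a VMA; addresses are page numbers. */
struct vma {
	unsigned long start, end;	/* [start, end) */
	unsigned long pgoff;		/* linear page index of 'start' */
	struct anon_vma *anon_vma;	/* NULL until one is needed */
};

/* Does 'vma' map the page whose linear index is 'index'? */
static int vma_covers(const struct vma *vma, unsigned long index)
{
	return index >= vma->pgoff &&
	       index < vma->pgoff + (vma->end - vma->start);
}

/* Split at page 'addr': 'vma' keeps [start, addr), 'tail' takes [addr, end). */
static void split_vma(struct vma *vma, struct vma *tail, unsigned long addr)
{
	assert(addr > vma->start && addr < vma->end);
	tail->start = addr;
	tail->end = vma->end;
	tail->pgoff = vma->pgoff + (addr - vma->start);
	tail->anon_vma = vma->anon_vma;
	vma->end = addr;
	/* If an anon_vma exists, both halves stay reachable through it. */
	if (tail->anon_vma) {
		assert(tail->anon_vma->nr < MAX_LINKED);
		tail->anon_vma->vmas[tail->anon_vma->nr++] = tail;
	}
}

/* rmap anchored on the VMA recorded when the page was faulted in. */
static struct vma *rmap_via_vma(struct vma *anchor, unsigned long index)
{
	return vma_covers(anchor, index) ? anchor : NULL;
}

/* rmap anchored on the anon_vma: check every linked VMA. */
static struct vma *rmap_via_anon_vma(const struct anon_vma *av,
				     unsigned long index)
{
	for (int i = 0; i < av->nr; i++)
		if (vma_covers(av->vmas[i], index))
			return av->vmas[i];
	return NULL;
}
```

After splitting a VMA covering pages [100, 110) at page 103, rmap_via_vma(&head, 105) returns NULL while rmap_via_anon_vma(&av, 105) returns the tail VMA. A merge can be worse still, since the anchoring VMA may be freed outright, leaving a dangling pointer.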
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-02 15:37 ` Lorenzo Stoakes
@ 2026-06-02 19:44 ` Pedro Falcato
2026-06-02 23:03 ` Barry Song
1 sibling, 0 replies; 64+ messages in thread
From: Pedro Falcato @ 2026-06-02 19:44 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Lance Yang, Barry Song, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, harry@kernel.org, jannh@google.com,
jgg@ziepe.ca, jhubbard@nvidia.com, peterx@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com
On Tue, Jun 02, 2026 at 04:37:14PM +0100, Lorenzo Stoakes wrote:
> On Tue, Jun 02, 2026 at 10:46:35AM +0800, Lance Yang wrote:
> >
> >
> > On 2026/6/2 10:15, Barry Song wrote:
> > > On Mon, Jun 1, 2026 at 9:46 AM wangtao <tao.wangtao@honor.com> wrote:
> > > [...]
> > > >
> > > > You said discussion was welcome, yet when someone offered even a
> > > > small comment, you refused to continue the discussion.
> > > >
> > > > If I had known you would be this inconsistent, I would not have
> > > > replied to you in the first place.
> > > >
> > > > This will be my last reply to you. I will not respond again.
> > > >
> > >
> > > Hi Tao,
> > >
> > > Please don't walk away from the linux-mm community. I read your
> > > patchset and found it quite valuable. It not only reduces memory
> > > overhead, but also eliminates rmap costs for exclusive folios.
> > >
> > > Since I'm not very confident discussing technical topics in English,
> > > I wrote a blog post in Chinese about your patchset:
> > >
> > > https://mp.weixin.qq.com/s/k00tzhTl8HbL3k4G6ev4SA
> > >
> > > I have to admit that I found the implementation quite complex and
> > > in need of significant improvement. However, I think the underlying
> > > idea is very interesting and worth exploring further.
> > >
> > > I'm looking forward to seeing a v2 RFC with a cleaner and simpler
> > > implementation while preserving the core concept.
> > >
> > > Regardless of whether it ultimately gets merged, I hope the discussion
> > > can continue.
> >
> > Same here :)
> >
> > Tao, please don't let this thread get you down. No first RFC is
> > perfect, and the idea still looks worth discussing :)
> >
> > Thanks for working on this!
>
> Guys, this isn't helpful.
>
> We aren't extending anon_vma, and I am working on replacing it, that's the
> bottom line.
>
> I have presented compelling evidence suggesting this is AI generated. In
> response I got more AI-generated nonsense. There's no trust, the code and
> analysis are all wrong, end of discussion.
100% agree. I think plenty of technical/process/etc reasons as to why this
idea/contribution is not mergeable have been listed. Overriding this with
"keep it up!!!111!11!!" is not helpful.
--
Pedro
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-02 2:15 ` Barry Song
2026-06-02 2:46 ` Lance Yang
@ 2026-06-02 19:56 ` Harry Yoo
2026-06-02 22:27 ` Barry Song
1 sibling, 1 reply; 64+ messages in thread
From: Harry Yoo @ 2026-06-02 19:56 UTC (permalink / raw)
To: Barry Song, wangtao
Cc: Lorenzo Stoakes, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, jannh@google.com, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, ziy@nvidia.com,
baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, lance.yang@linux.dev,
xu.xin16@zte.com.cn, chengming.zhou@linux.dev,
nao.horiguchi@gmail.com, matthew.brost@intel.com,
joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
gourry@gourry.net, ying.huang@linux.alibaba.com,
apopple@nvidia.com, pfalcato@suse.de,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com,
zhangji, wangzicheng
On 6/2/26 11:15 AM, Barry Song wrote:
> On Mon, Jun 1, 2026 at 9:46 AM wangtao <tao.wangtao@honor.com> wrote:
> [...]
>>
>> You said discussion was welcome, yet when someone offered even a
>> small comment, you refused to continue the discussion.
>>
>> If I had known you would be this inconsistent, I would not have
>> replied to you in the first place.
>>
>> This will be my last reply to you. I will not respond again.
>
> Hi Tao,
>
> Please don't walk away from the linux-mm community. I read your
> patchset and found it quite valuable. It not only reduces memory
> overhead, but also eliminates rmap costs for exclusive folios.
>
> Since I'm not very confident discussing technical topics in English,
> I wrote a blog post in Chinese about your patchset:
>
> https://mp.weixin.qq.com/s/k00tzhTl8HbL3k4G6ev4SA
The cover letter and commit messages should have been elaborated to a
much greater degree instead of making people guess the design and intent
from the code.
> I have to admit that I found the implementation quite complex and
> in need of significant improvement.
> However, I think the underlying
> idea is very interesting and worth exploring further.
No. What it is trying to achieve is ambitious, but the idea itself is
not worth exploring further as-is unless the correctness and complexity
concerns are addressed.
> I'm looking forward to seeing a v2 RFC with a cleaner and simpler
> implementation while preserving the core concept.
I'm afraid this encouragement would mislead us in the wrong direction,
where all of us end up wasting time.
There isn't much point in posting v2 without addressing fundamental
questions about the design.
> Regardless of whether it ultimately gets merged, I hope the discussion
> can continue.
Regarding the "improving the reverse mapping subsystem" topic, a more
constructive direction would be to carefully revisit the design
decisions and discuss what we can do about them (that's exactly what
Lorenzo has been doing).
But that's not the first thing I would recommend to a relatively new
contributor given that it's really complicated and even the people who
have designed and reworked the reverse mapping subsystem over the past
20+ years haven't come up with a fundamentally better design.
Reverse mapping is a frustratingly complicated subsystem. Without
carefully revisiting the current design, there is not much hope of
improving things at the design level, even slightly.
What I would recommend to new people instead is:
1) starting by reviewing other people's work, so that you have enough
time to learn the historical context and subtleties of the subsystem
without making intrusive changes (which also keeps you in touch with
the community), and
2) making progress on smaller tasks with less intrusive changes, to
gradually build trust and be able to do more valuable work.
Unfortunately, looking at how this thread went, I see that the author is
now in a worse position than an entirely new contributor.
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-05-29 12:03 ` Lorenzo Stoakes
2026-06-01 1:46 ` wangtao
@ 2026-06-02 20:47 ` Lorenzo Stoakes
1 sibling, 0 replies; 64+ messages in thread
From: Lorenzo Stoakes @ 2026-06-02 20:47 UTC (permalink / raw)
To: wangtao
Cc: Barry Song, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, harry@kernel.org, jannh@google.com,
jgg@ziepe.ca, jhubbard@nvidia.com, peterx@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, lance.yang@linux.dev,
xu.xin16@zte.com.cn, chengming.zhou@linux.dev,
nao.horiguchi@gmail.com, matthew.brost@intel.com,
joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
gourry@gourry.net, ying.huang@linux.alibaba.com,
apopple@nvidia.com, pfalcato@suse.de,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com,
zhangji, wangzicheng
On Fri, May 29, 2026 at 01:04:08PM +0100, Lorenzo Stoakes wrote:
> On Fri, May 29, 2026 at 09:41:20AM +0000, wangtao wrote:
> > > Hi Tao,
> > >
> > > Lorenzo had a discussion about rmap in Zagreb here:
> > > https://lore.kernel.org/linux-mm/aec533b2-37a7-4f44-a279-c4aa604206ac@lucifer.local/
> > >
> > > He also shared the PoC code here:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/ljs/linux.git/log/?h=project/cow-context
> > >
> > > and the slides were shared as well. In case you can't find them on linux-mm (I
> > > actually couldn't find them myself), I am attaching them again here -
> > > "scalable-cow-lsf-longer-version.pdf"
> > >
> > > After coming back from Zagreb, I kept trying to find one or two full days to
> > > read Lorenzo's code and slides carefully and write a blog about them.
> > > Unfortunately, I have been completely busy with other work. Sigh... we
> > > always seem to have too many non-upstream tasks.
> > >
> > > If possible, I'd really appreciate it if you could take a deep dive into it and
> > > write a detailed blog post. I'd be very eager to read it and better understand
> > > the overall design.
> > > Otherwise, I'll try to find some time next week or later to go through it
> > > myself.
> > >
> > Hi Barry,
> >
> > Thank you very much for your reply.
> >
> > I took an initial look at the cow-context code, and a few points
> > might be worth noting:
> >
> > 1. cow_context_walk currently assumes that the rmap walk runs
> > under RCU protection. This may need to be adjusted early,
> > since paths such as try_to_unmap_one, page_vma_mkclean_one,
> > and try_to_migrate_one may involve task switching.
> >
> > 2. In cow_context_walk, traverse_contexts appears to involve
> > multiple nested loops. When there are many child processes
> > across several fork layers, it may not be as simple or
> > efficient as the current anon_vma approach.
> >
> > It needs to traverse all child cow_ctx, and within each
> > cow_ctx, remaps_for_each() has two levels of iteration:
> > remaps_for_each_entry and remaps_for_each_entry_offset.
> >
> > In other words, it first iterates over cow_ctx and then
> > traverses rmap_mt inside each one. The rough complexity
> > seems to be O(#proc * log(#rmap_entries_in_cow)), which
> > may be somewhat higher than anon_vma's
> > O(#vmas_in_anon_vma). However, in most cases the number
> > of processes is not large, so the impact may be limited.
> >
> > Previously, I also considered converting anon_vma's rb_tree
> > to a maple tree. If one entry records a single VMA, the
> > average overhead could be less than two longs per VMA.
> >
> > However, unlike rb_tree, the maple tree does not support storing
> > multiple elements under a single key. The key would need to
> > look like (vma_id/mm_id + pgoff). On 32-bit platforms, since
> > 64-bit maple tree keys are not supported yet, the remaining
> > 12 bits are not enough for vma_id/mm_id.
> >
> > Because of this limitation, I later started thinking about
> > ways to reduce anon_vma allocations instead.
> >
> > I will try to find some time next week to analyze the
> > cow-context design and code more thoroughly, and then
> > write up a summary.
>
> Tao,
>
> This response is so full of misunderstandings it's not really worth me
> responding to any of it. You've even hallucinated an imaginary field which
> is REALLY suspicious.
>
> You've no mm expertise or history and came up with this in a few hours. I
> asked Claude to analyse it and it puts it at 75-80% chance of being solely
> LLM-generated from cow_context.c.
>
> I simply don't have the time to deal with this, so unfortunately I'm going
> to have to withdraw the suggestion of further discussion with you on this
> topic.
>
> I am working on the scalable CoW project and will solicit opinions of those
> with relevant expertise.
>
> We are not interested in your approach or analysis.
>
> Thanks, Lorenzo
Apparently there's some misunderstanding about this situation here, sigh.
So for avoidance of doubt - I've now spent many hours on this, and unfortunately
(as I've already said in multiple places) this series has serious architectural
and code flaws.
And unfortunately, the anon_vma approach is not something we wish to extend, for
reasons I've gone into elsewhere - but broadly because it's a broken
abstraction, that uses lots of memory and causes lock contention.
The approach here has multiple technical issues, so many that getting into each
one would require hours more of my time to analyse, maybe all week?
And then if there were further replies and replies to the replies and respins...
However, I also feel there's substantive, overlapping evidence of the _logic_
(not the text, we are FINE with using AI to assist text for non-native speakers)
being LLM-generated.
However, you can never prove this with 100% certainty. But you can certainly be more
or less sure. I would never suggest this unless I was really pretty certain.
I am very keen to avoid 'witch hunts', or rash accusations. This is not
that. It's a _carefully considered_ opinion, based on evidence.
But of course - I do not know for SURE. You can never know.
The big problem here is asymmetry of maintainer resource. I simply _cannot_
respond to every single issue here. And when the architecture is something we
don't want, then it's not really necessary to.
And my big deep underlying concern with all this is - people can generate a very
significant amount of this kind of work, and we have limited reviewer time.
I've already dealt with burnout recently that I'm thankfully recovering
from. I'm not really keen to go back to that.
I really truly worry that if we don't have a means by which we can quickly
dismiss/deprioritise things when we have _significant_ evidence of wholesale
AI generation, then maintainer overload will increase exponentially.
And that's really a serious problem.
If we treat it like simply a technically incorrect solution, then it means we
open it up to further discussion on and on, as we're actually observing here. If
the responses are also LLM-generated then it's even more problematic.
This is why I bring it up, and proactively say it's led to a real loss in trust
in this case, and why, after there was a response that included a hallucinated
field in it, I went further and said that I really don't want to have a
discussion either.
It's because of this asymmetry.
And even this reply, written at 9.45pm at night, after several hours of
discussion about this off-list, is evidence of the problem we have with this
kind of asymmetry.
It's nothing personal, it's about managing time and resources.
Thanks, Lorenzo
^ permalink raw reply [flat|nested] 64+ messages in thread
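The "remaining 12 bits" figure in wangtao's quoted maple tree analysis follows from packing two fields into a single 32-bit index. A minimal sketch, assuming 4 KiB pages (so a page offset within a 4 GiB space needs 20 bits) and a key laid out with the id in the high bits and pgoff in the low bits. The layout and the helper names make_key() and max_ids() are hypothetical illustrations of the quoted idea, not code from the series or from the kernel's maple tree API.

```c
#include <assert.h>
#include <stdint.h>

/* A maple tree index is an unsigned long: 32 bits on a 32-bit kernel. */
#define KEY_BITS	32u
/* Assumed: 4 GiB / 4 KiB pages = 2^20 possible page offsets. */
#define PGOFF_BITS	20u
/* Whatever is left over is all there is for vma_id/mm_id. */
#define ID_BITS		(KEY_BITS - PGOFF_BITS)
#define PGOFF_MASK	((1u << PGOFF_BITS) - 1)

static_assert(PGOFF_BITS < KEY_BITS, "pgoff must leave room for an id");

/* Hypothetical packing of (id, pgoff) into one 32-bit key. */
static uint32_t make_key(uint32_t id, uint32_t pgoff)
{
	/* Unsigned shift: bits of 'id' above ID_BITS are silently lost. */
	return (uint32_t)(id << PGOFF_BITS) | (pgoff & PGOFF_MASK);
}

/* Number of distinct ids that fit in the key. */
static uint32_t max_ids(void)
{
	return (uint32_t)1 << ID_BITS;
}
```

Under these assumptions ID_BITS is 12, so only 4096 ids fit, and make_key(4096, 5) collides with make_key(0, 5).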
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-02 19:56 ` Harry Yoo
@ 2026-06-02 22:27 ` Barry Song
0 siblings, 0 replies; 64+ messages in thread
From: Barry Song @ 2026-06-02 22:27 UTC (permalink / raw)
To: Harry Yoo
Cc: wangtao, Lorenzo Stoakes, catalin.marinas@arm.com,
will@kernel.org, tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, jannh@google.com, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, ziy@nvidia.com,
baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, lance.yang@linux.dev,
xu.xin16@zte.com.cn, chengming.zhou@linux.dev,
nao.horiguchi@gmail.com, matthew.brost@intel.com,
joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
gourry@gourry.net, ying.huang@linux.alibaba.com,
apopple@nvidia.com, pfalcato@suse.de,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com,
zhangji, wangzicheng
On Wed, Jun 3, 2026 at 3:57 AM Harry Yoo <harry@kernel.org> wrote:
>
>
>
> On 6/2/26 11:15 AM, Barry Song wrote:
> > On Mon, Jun 1, 2026 at 9:46 AM wangtao <tao.wangtao@honor.com> wrote:
> > [...]
> >>
> >> You said discussion was welcome, yet when someone offered even a
> >> small comment, you refused to continue the discussion.
> >>
> >> If I had known you would be this inconsistent, I would not have
> >> replied to you in the first place.
> >>
> >> This will be my last reply to you. I will not respond again.
> >
> > Hi Tao,
> >
> > Please don't walk away from the linux-mm community. I read your
> > patchset and found it quite valuable. It not only reduces memory
> > overhead, but also eliminates rmap costs for exclusive folios.
> >
> > Since I'm not very confident discussing technical topics in English,
> > I wrote a blog post in Chinese about your patchset:
> >
> > https://mp.weixin.qq.com/s/k00tzhTl8HbL3k4G6ev4SA
> The cover letter and commit messages should have been elaborated to a
> much greater degree instead of making people guess the design and intent
> from the code.
Indeed. The cover letter does not clearly tell the story, and yesterday
I needed quite some time to understand what the patchset
was trying to achieve.
>
> > I have to admit that I found the implementation quite complex and
> > in need of significant improvement.
>
> > However, I think the underlying
> > idea is very interesting and worth exploring further.
>
> No. What it is trying to achieve is ambitious, but the idea itself is
> not worth exploring further as-is unless the correctness and complexity
> concerns are addressed.
Can we give Tao more time to address the concerns and explain
the correctness of the approach?
That said, I don't think the patchset is entirely without merit.
The idea that caught my attention is whether knowing that a
process is guaranteed to be a leaf process could allow us to
simplify parts of the rmap machinery and reduce some of the
associated overhead.
Assuming that a fork server (e.g. systemd or zygote) is preferable
to having each application perform its own fork(), Linux already
largely relies on fork servers in practice. Matthew also pointed
out that calling fork() in multithreaded applications is a
terrible idea [1]. This may suggest that, in general, processes
outside of a fork-server model should avoid using fork().
If we were to introduce an API such as prctl(PR_SET_NOFORK) or
something similar, could we eliminate a significant portion of
the rmap-related overhead for such leaf processes, while still
avoiding the complexity of the lazy allocation scheme proposed
by Tao?
I assume that the vast majority of processes in a real system
are leaf processes?
It also seems somewhat unusual that a few Android applications
invoke fork() directly in a multithreaded context, while most
use the zygote to create multiple processes for an app. Perhaps
the Android framework should discourage this pattern entirely,
and require applications to create child processes via the zygote?
If, in real-world systems, more than 95% of processes are leaf
processes, could that imply that the rmap design might be
reconsidered for a different optimization path?
[1] https://marc.info/?l=linuxppc-embedded&m=177912107460825&w=2
>
> > I'm looking forward to seeing a v2 RFC with a cleaner and simpler
> > implementation while preserving the core concept.
>
> I'm afraid this encouragement would mislead us in the wrong direction,
> where all of us end up wasting time.
>
> There isn't much point in posting v2 without addressing fundamental
> questions about the design.
I suggested a v2 because the current patchset does not clearly
state what it is trying to achieve. A revised version might help
clarify the intent and make it easier to understand. Even if the
overall complexity (such as lazy allocation) makes it hard to
move forward, we may still be able to learn from it and gain
some useful inspiration.
>
> > Regardless of whether it ultimately gets merged, I hope the discussion
> > can continue.
>
> Regarding the "improving the reverse mapping subsystem" topic, a more
> constructive direction would be to carefully revisit the design
> decisions and discuss what we can do about them (that's exactly what
> Lorenzo has been doing).
I have no doubt at all about Lorenzo’s expertise in rmap and many
other mm areas. That is well understood and widely recognized.
I just think that hearing more perspectives could help us gain
additional insight and inspiration.
>
> But that's not the first thing I would recommend to a relatively new
> contributor given that it's really complicated and even the people who
> have designed and reworked the reverse mapping subsystem over the past
> 20+ years haven't come up with a fundamentally better design.
>
> Reverse mapping is a frustratingly complicated subsystem. Without
> carefully revisiting the current design, there is not much hope of
> improving things at the design level, even slightly.
>
> What I would recommend to new people instead is:
>
> 1) starting by reviewing other people's work, so that you have enough
> time to learn the historical context and subtleties of the subsystem
> without making intrusive changes (which also keeps you in touch with
> the community), and
>
> 2) making progress on smaller tasks with less intrusive changes, to
> gradually build trust and be able to do more valuable work.
>
Yes, that is a good approach for new contributors.
> Unfortunately, looking at how this thread went, I see that the author is
> now in a worse position than an entirely new contributor.
>
> --
> Cheers,
> Harry / Hyeonggon
Thanks
Barry
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-02 15:37 ` Lorenzo Stoakes
2026-06-02 19:44 ` Pedro Falcato
@ 2026-06-02 23:03 ` Barry Song
2026-06-03 7:07 ` Lorenzo Stoakes
1 sibling, 1 reply; 64+ messages in thread
From: Barry Song @ 2026-06-02 23:03 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Lance Yang, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, harry@kernel.org, jannh@google.com,
jgg@ziepe.ca, jhubbard@nvidia.com, peterx@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
pfalcato@suse.de, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com
On Tue, Jun 2, 2026 at 11:37 PM Lorenzo Stoakes <ljs@kernel.org> wrote:
>
> On Tue, Jun 02, 2026 at 10:46:35AM +0800, Lance Yang wrote:
> >
> >
> > On 2026/6/2 10:15, Barry Song wrote:
> > > On Mon, Jun 1, 2026 at 9:46 AM wangtao <tao.wangtao@honor.com> wrote:
> > > [...]
> > > >
> > > > You said discussion was welcome, yet when someone offered even a
> > > > small comment, you refused to continue the discussion.
> > > >
> > > > If I had known you would be this inconsistent, I would not have
> > > > replied to you in the first place.
> > > >
> > > > This will be my last reply to you. I will not respond again.
> > > >
> > >
> > > Hi Tao,
> > >
> > > Please don't walk away from the linux-mm community. I read your
> > > patchset and found it quite valuable. It not only reduces memory
> > > overhead, but also eliminates rmap costs for exclusive folios.
> > >
> > > Since I'm not very confident discussing technical topics in English,
> > > I wrote a blog post in Chinese about your patchset:
> > >
> > > https://mp.weixin.qq.com/s/k00tzhTl8HbL3k4G6ev4SA
> > >
> > > I have to admit that I found the implementation quite complex and
> > > in need of significant improvement. However, I think the underlying
> > > idea is very interesting and worth exploring further.
> > >
> > > I'm looking forward to seeing a v2 RFC with a cleaner and simpler
> > > implementation while preserving the core concept.
> > >
> > > Regardless of whether it ultimately gets merged, I hope the discussion
> > > can continue.
> >
> > Same here :)
> >
> > Tao, please don't let this thread get you down. No first RFC is
> > perfect, and the idea still looks worth discussing :)
> >
> > Thanks for working on this!
>
> Guys, this isn't helpful.
>
> We aren't extending anon_vma, and I am working on replacing it, that's the
> bottom line.
Not trying to challenge your bottom line. As explained to Harry, I
have no doubt about your expertise in rmap and many other mm
areas, and I deeply respect your work on rmap.
With more discussion, we might gain additional insight and
inspiration. What Tao has inspired me with is the idea that if we
assume most real-world processes are leaf processes, could we
simplify parts of the design?
This is why I suggested a v2, to improve the clarity of the cover
letter and make the code easier to understand, and to see whether
there is something worth considering further, even if it is not
suitable for merging.
>
> I have presented compelling evidence suggesting this is AI generated. In
> response I got more AI-generated nonsense. There's no trust, the code and
> analysis are all wrong, end of discussion.
I am not an AI expert, and I do not really use AI in kernel work,
so I am not really sure what counts as AI versus non-AI. Sorry.
>
> >
> > Cheers, Lance
> >
>
> Thanks, Lorenzo
>
> P.S. maintainership is utterly thankless, and I don't really expect much in
> return, but honestly reading this, given the case I've made here, was
> really quite disappointing.
Understood. I see your position, and I personally have great
respect and appreciation for your work on maintenance. Sorry if
my words came across as disappointing.
Best Regards
Barry
^ permalink raw reply [flat|nested] 64+ messages in thread
* RE: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-02 16:07 ` Harry Yoo
@ 2026-06-03 2:59 ` wangtao
2026-06-03 3:12 ` wangtao
2026-06-03 7:54 ` Lorenzo Stoakes
0 siblings, 2 replies; 64+ messages in thread
From: wangtao @ 2026-06-03 2:59 UTC (permalink / raw)
To: Harry Yoo, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, ljs@kernel.org
Cc: hpa@zytor.com, liam@infradead.org, vbabka@kernel.org,
rppt@kernel.org, surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, jannh@google.com, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, ziy@nvidia.com,
baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org,
lance.yang@linux.dev, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
pfalcato@suse.de, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, 21cnbao@gmail.com, jparsana@google.com,
dvander@google.com, zhangji, wangzicheng
> On 5/27/26 8:01 PM, tao wrote:
> > Design overview
> > ---------------
> >
> > ANON_VMA_LAZY defers anon_vma allocation until it is actually needed
> > (for example during fork). VMAs that never participate in sharing can
> > avoid creating anon_vma structures entirely.
> >
> > Before an anon_vma exists, rmap operations rely directly on VMA
> > information, so no anon_vma locking is required. An anon_vma is
> > created and linked only when sharing semantics are required.
>
> It is unfortunate that the design overview doesn't cover correctness aspect
> at all. VMAs are subject to change (even before being shared with other
> processes), and rmap needs something that doesn't go away across VMA
> merging, split, etc.
>
> I'm not sure how the idea is supposed to work correctly.
>
> --
> Cheers,
> Harry / Hyeonggon
VMA operations can be roughly divided into three categories. The handling
of ANON_VMA_LAZY is briefly described below.
1. fork
fork duplicates the parent's mm/mmap. (exec creates a new mm/mmap and is
not involved here.) This can be viewed as copying the VMAs with identical
virtual addresses into a new address space.
If the parent VMA (pvma) is ANON_VMA_LAZY, it is first upgraded to a
regular anon_vma. The corresponding folio->mapping is then fixed in
try_dup_anon_rmap().
2. mmap / brk / mprotect / munmap
These operations create, modify, or remove VMAs in the current mm. They
may split existing VMAs, merge adjacent VMAs, or remove a VMA from mm_mt.
When a new VMA is created, vm_start, vm_end and vm_pgoff are initialized
and the VMA is inserted into mm_mt. Although these fields may later be
modified, the following value remains invariant:
(vm_start - vm_pgoff * PAGE_SIZE)
We refer to this value as:
vma_mapping_base(vma) = vma->vm_start - vma->vm_pgoff * PAGE_SIZE
This value also remains unchanged when the VMA is removed from mm_mt.
If a VMA is split and produces new_vma, the following holds:
vma_mapping_base(new_vma) == vma_mapping_base(vma)
If two adjacent VMAs vma_a and vma_b are merged into vma_x, then:
vma_mapping_base(vma_a) == vma_mapping_base(vma_b) ==
vma_mapping_base(vma_x)
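A minimal userspace sketch of the invariant claimed above, assuming a
simplified model: struct vma_model, split_vma_model() and
merge_vma_model() are hypothetical stand-ins, not kernel code. Split
advances the upper half's vm_pgoff by the number of pages skipped, and
merge is only allowed when the pgoffs are contiguous, so
vma_mapping_base() comes out the same in both cases. It checks the
arithmetic only and says nothing about locking or VMA lifetime.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)

/* Hypothetical userspace model of the three VMA fields discussed here. */
struct vma_model {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_pgoff;
};

/* Unsigned arithmetic: the value may wrap, which is harmless because it is
 * only ever added back to pgoff * PAGE_SIZE. */
static unsigned long vma_mapping_base(const struct vma_model *vma)
{
	return vma->vm_start - vma->vm_pgoff * PAGE_SIZE;
}

/* Split @vma at page-aligned @addr inside it; @vma keeps the lower half. */
static struct vma_model split_vma_model(struct vma_model *vma, unsigned long addr)
{
	struct vma_model upper = {
		.vm_start = addr,
		.vm_end = vma->vm_end,
		/* Advance pgoff by the pages skipped: this keeps the base fixed. */
		.vm_pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT),
	};

	vma->vm_end = addr;
	return upper;
}

/* Merge adjacent @a and @b into @out, only if @b's pgoff continues @a's. */
static bool merge_vma_model(const struct vma_model *a,
			    const struct vma_model *b, struct vma_model *out)
{
	if (a->vm_end != b->vm_start)
		return false;
	if (b->vm_pgoff != a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT))
		return false;
	out->vm_start = a->vm_start;
	out->vm_end = b->vm_end;
	out->vm_pgoff = a->vm_pgoff;
	return true;
}
```

For a fresh anonymous VMA at 0x400000 (vm_pgoff = 0x400), the base is 0
before the split, for both halves after splitting at 0x408000, and for
the re-merged VMA.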
Assume the VMA where the first page fault occurs is called root_vma, and
ensure that any VMA produced by split or merge holds a reference to
root_vma.
During rmap we can compute the folio address using root_vma:
vma_address(vma, pgoff, 1) =
vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT)
= vma_mapping_base(vma) + pgoff * PAGE_SIZE
= vma_mapping_base(root_vma) + folio_pgoff * PAGE_SIZE
We can then use folio_addr to locate the VMA covering this folio.
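The lookup step can be sketched as follows, again with a hypothetical
struct vma_model and using pgoff throughout (the folio_pgoff above is
corrected to pgoff later in the thread). A linear scan stands in for the
mm_mt maple tree lookup; lazy_rmap_find() and vma_address_model() are
illustrative names, not functions from the series or the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)

/* Hypothetical userspace model of the VMA fields used by the formula. */
struct vma_model {
	unsigned long vm_start, vm_end, vm_pgoff;
};

static unsigned long vma_mapping_base(const struct vma_model *vma)
{
	return vma->vm_start - vma->vm_pgoff * PAGE_SIZE;
}

/* Single-page form of vma_address() as written in the mail. */
static unsigned long vma_address_model(const struct vma_model *vma,
				       unsigned long pgoff)
{
	return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
}

/* Compute the page's address from root_vma alone, then find the VMA that
 * covers it. The linear scan is a stand-in for an mm_mt lookup. */
static const struct vma_model *lazy_rmap_find(const struct vma_model *root_vma,
					      const struct vma_model *vmas,
					      size_t nr, unsigned long pgoff,
					      unsigned long *addr)
{
	*addr = vma_mapping_base(root_vma) + pgoff * PAGE_SIZE;
	for (size_t i = 0; i < nr; i++)
		if (vmas[i].vm_start <= *addr && *addr < vmas[i].vm_end)
			return &vmas[i];
	return NULL;
}
```

Here root_vma keeps the fields it had at first fault, while vmas[]
holds the current (possibly split) VMAs; the address found through the
root matches vma_address() on the covering VMA.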
3. mremap / uffd_move
If only the size changes and the start address remains the same, there
is no impact.
If the start address changes, the page is moved from (vma, addr) to
(new_vma, new_addr). In this case:
vma_mapping_base(new_vma) =
vma_mapping_base(vma) + new_addr - old_addr
We first upgrade the VMA, and then fix folio->mapping in move_ptes().
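A sketch of why the moving case needs the folio->mapping fixup,
assuming (as when mremap moves a whole anonymous VMA) that the moved VMA
keeps its vm_pgoff. move_vma_model() is a hypothetical helper, not the
kernel's move_vma(). The base shifts by new_addr - old_addr, so an
address computed from the old root_vma's base still points at the old
location.

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)

/* Hypothetical userspace model of the VMA fields used by the formula. */
struct vma_model {
	unsigned long vm_start, vm_end, vm_pgoff;
};

static unsigned long vma_mapping_base(const struct vma_model *vma)
{
	return vma->vm_start - vma->vm_pgoff * PAGE_SIZE;
}

static unsigned long vma_address_model(const struct vma_model *vma,
				       unsigned long pgoff)
{
	return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
}

/* Model of moving a whole VMA: a new VMA at @new_start with the same
 * length and the same vm_pgoff. */
static struct vma_model move_vma_model(const struct vma_model *old,
				       unsigned long new_start)
{
	struct vma_model moved = {
		.vm_start = new_start,
		.vm_end = new_start + (old->vm_end - old->vm_start),
		.vm_pgoff = old->vm_pgoff,
	};

	return moved;
}
```

Moving a VMA from 0x400000 to 0x800000, pgoff 0x402 still resolves to
0x402000 through the old base, while the page now lives at 0x802000.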
If performance becomes a concern, ANON_VMA_LAZY can be enabled only for
relatively small VMAs.
^ permalink raw reply [flat|nested] 64+ messages in thread
* RE: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-03 2:59 ` wangtao
@ 2026-06-03 3:12 ` wangtao
2026-06-03 7:54 ` Lorenzo Stoakes
1 sibling, 0 replies; 64+ messages in thread
From: wangtao @ 2026-06-03 3:12 UTC (permalink / raw)
To: Harry Yoo, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, ljs@kernel.org
Cc: hpa@zytor.com, liam@infradead.org, vbabka@kernel.org,
rppt@kernel.org, surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, jannh@google.com, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, ziy@nvidia.com,
baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org,
lance.yang@linux.dev, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
pfalcato@suse.de, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, 21cnbao@gmail.com, jparsana@google.com,
dvander@google.com, zhangji, wangzicheng
> During rmap we can compute the folio address using root_vma:
>
> vma_address(vma, pgoff, 1) =
> vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT)
> = vma_mapping_base(vma) + pgoff * PAGE_SIZE
> = vma_mapping_base(root_vma) + folio_pgoff * PAGE_SIZE
>
It is inconsistent here. The offset should remain pgoff throughout. It should be:
vma_address(vma, pgoff, 1) =
vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT)
= vma_mapping_base(vma) + pgoff * PAGE_SIZE
= vma_mapping_base(root_vma) + pgoff * PAGE_SIZE
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-02 23:03 ` Barry Song
@ 2026-06-03 7:07 ` Lorenzo Stoakes
0 siblings, 0 replies; 64+ messages in thread
From: Lorenzo Stoakes @ 2026-06-03 7:07 UTC (permalink / raw)
To: Barry Song
Cc: Lance Yang, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, harry@kernel.org, jannh@google.com,
jgg@ziepe.ca, jhubbard@nvidia.com, peterx@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
pfalcato@suse.de, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com
On Wed, Jun 03, 2026 at 07:03:53AM +0800, Barry Song wrote:
> On Tue, Jun 2, 2026 at 11:37 PM Lorenzo Stoakes <ljs@kernel.org> wrote:
> >
> > On Tue, Jun 02, 2026 at 10:46:35AM +0800, Lance Yang wrote:
> > >
> > >
> > > On 2026/6/2 10:15, Barry Song wrote:
> > > > On Mon, Jun 1, 2026 at 9:46 AM wangtao <tao.wangtao@honor.com> wrote:
> > > > [...]
> > > > >
> > > > > You said discussion was welcome, yet when someone offered even a
> > > > > small comment, you refused to continue the discussion.
> > > > >
> > > > > If I had known you would be this inconsistent, I would not have
> > > > > replied to you in the first place.
> > > > >
> > > > > This will be my last reply to you. I will not respond again.
> > > > >
> > > >
> > > > Hi Tao,
> > > >
> > > > Please don't walk away from the linux-mm community. I read your
> > > > patchset and found it quite valuable. It not only reduces memory
> > > > overhead, but also eliminates rmap costs for exclusive folios.
> > > >
> > > > Since I'm not very confident discussing technical topics in English,
> > > > I wrote a blog post in Chinese about your patchset:
> > > >
> > > > https://mp.weixin.qq.com/s/k00tzhTl8HbL3k4G6ev4SA
> > > >
> > > > I have to admit that I found the implementation quite complex and
> > > > in need of significant improvement. However, I think the underlying
> > > > idea is very interesting and worth exploring further.
> > > >
> > > > I'm looking forward to seeing a v2 RFC with a cleaner and simpler
> > > > implementation while preserving the core concept.
> > > >
> > > > Regardless of whether it ultimately gets merged, I hope the discussion
> > > > can continue.
> > >
> > > Same here :)
> > >
> > > Tao, please don't let this thread get you down. No first RFC is
> > > perfect, and the idea still looks worth discussing :)
> > >
> > > Thanks for working on this!
> >
> > Guys, this isn't helpful.
> >
> > We aren't extending anon_vma, and I am working on replacing it, that's the
> > bottom line.
>
> Not trying to challenge your bottom line. As explained to Harry, I
> have no doubt about your expertise in rmap and many other mm
> areas, and I deeply respect your work on rmap.
Thanks I appreciate that.
I don't mean to be 'mean' here, I'm only acting in what I feel are the best
interests of mm and the kernel.
>
> With more discussion, we might gain additional insight and
> inspiration. What Tao has inspired me with is the idea that if we
> assume most real-world processes are leaf processes, could we
> simplify parts of the design?
Maybe I didn't express it clearly enough at LSF, but this is entirely a key
point of my CoW context design :)
It's true most stuff is leaf, and yes we can take advantage of this, and CoW
context allows us to do it while also unravelling the issues with anon_vma.
I am actually thinking of doing some incremental changes as part of my work
possibly if I can.
I maybe need to expedite that to bring some clarity to things here...
>
> This is why I suggested a v2, to improve the clarity of the cover
> letter and make the code easier to understand, and to see whether
> there is something worth considering further, even if it is not
> suitable for merging.
Right, I see. Again I'm really trying to tread a fine line here between the
technical discussion and not pouring more and more time into a discussion that's
not useful to me or the community.
See [0] as to my reasoning on this :)
[0]:https://lore.kernel.org/all/ah887A5VkXOcmq-g@lucifer/
>
> >
> > I have presented compelling evidence suggesting this is AI generated. In
> > response I got more AI-generated nonsense. There's no trust, the code and
> > analysis are all wrong, end of discussion.
>
> I am not an AI expert, and I do not really use AI in kernel work,
> so I am not really sure what counts as AI versus non-AI. Sorry.
No worries!
>
> >
> > >
> > > Cheers, Lance
> > >
> >
> > Thanks, Lorenzo
> >
> > P.S. maintainership is utterly thankless, and I don't really expect much in
> > return, but honestly reading this, given the case I've made here, was
> > really quite disappointing.
>
> Understood. I see your position, and I personally have great
> respect and appreciation for your work on maintenance. Sorry if
> my words came across as disappointing.
Thanks, appreciate it. And no worries!
>
> Best Regards
> Barry
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-03 2:59 ` wangtao
2026-06-03 3:12 ` wangtao
@ 2026-06-03 7:54 ` Lorenzo Stoakes
2026-06-03 11:05 ` wangtao
1 sibling, 1 reply; 64+ messages in thread
From: Lorenzo Stoakes @ 2026-06-03 7:54 UTC (permalink / raw)
To: wangtao
Cc: Harry Yoo, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, jannh@google.com, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, ziy@nvidia.com,
baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org,
lance.yang@linux.dev, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
pfalcato@suse.de, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, 21cnbao@gmail.com, jparsana@google.com,
dvander@google.com, zhangji, wangzicheng
On Wed, Jun 03, 2026 at 02:59:04AM +0000, wangtao wrote:
> > On 5/27/26 8:01 PM, tao wrote:
> > > Design overview
> > > ---------------
> > >
> > > ANON_VMA_LAZY defers anon_vma allocation until it is actually needed
> > > (for example during fork). VMAs that never participate in sharing can
> > > avoid creating anon_vma structures entirely.
> > >
> > > Before an anon_vma exists, rmap operations rely directly on VMA
> > > information, so no anon_vma locking is required. An anon_vma is
> > > created and linked only when sharing semantics are required.
> >
> > It is unfortunate that the design overview doesn't cover correctness aspect
> > at all. VMAs are subject to change (even before being shared with other
> > processes), and rmap needs something that doesn't go away across VMA
> > merging, split, etc.
> >
> > I'm not sure how the idea is supposed to work correctly.
> >
> > --
> > Cheers,
> > Harry / Hyeonggon
>
Against my better judgment I'll address the stuff here...
> VMA operations can be roughly divided into three categories. The handling
> of ANON_VMA_LAZY is briefly described below.
I don't agree, there are plenty more VMA operations. But with respect to anon
rmap there are:
- fork
- merge/split
- remap
Your approach seems to completely ignore VMA split and the need to maintain
an interval tree to _multiple_ VMAs from a single anon_vma.
You may also actually split a VMA against a single large folio (waiting on
the deferred shrinker) and have a SINGLE _leaf_ anonymous folio that is
mapped in two places.
The lazy approach doesn't seem to address this properly. And fatally it
ties an actual VMA afaict to the folio and has to implement a VMA reference
count mechanism which interferes with the ordinary VMA lifecycle to do
it.
The fact of us taking advantage of most stuff being AnonExclusive,
i.e. 'leaves' is something that my approach is exactly taking into account.
Of course also extending anon_vma is a real non-starter.
Also the below + the series ignores MAP_PRIVATE file-backed mappings which
is a pretty fatal flaw.
It also, as Harry says, has zero description of correctness in a way we'd
want and no tests.
>
> 1. fork
>
> fork duplicates the parent's mm/mmap. (exec creates a new mm/mmap and is
> not involved here.) This can be viewed as copying the VMAs with identical
> virtual addresses into a new address space.
>
> If the parent VMA (pvma) is ANON_VMA_LAZY, it is first upgraded to a
> regular anon_vma. The corresponding folio->mapping is then fixed in
> try_dup_anon_rmap().
And so we make fork, a very sensitive path in the kernel, more expensive.
I also question the locking situation with the conversion mentioned,
updating folios in this manner is extremely difficult.
>
> 2. mmap / brk / mprotect / munmap
>
> These operations create, modify, or remove VMAs in the current mm. They
> may split existing VMAs, merge adjacent VMAs, or remove a VMA from mm_mt.
mmap and brk are not at all relevant to anon_vma, as no anon_vma is
assigned upon mapping. It's on fault.
mprotect/mlock/munmap/etc. might split, but I don't see how the lazy
approach in any way addresses any of that.
>
> When a new VMA is created, vm_start, vm_end and vm_pgoff are initialized
> and the VMA is inserted into mm_mt. Although these fields may later be
> modified, the following value remains invariant:
>
> (vm_start - vm_pgoff * PAGE_SIZE)
Err no it doesn't at all?
If I fault in a VMA at vm_start, vm_pgoff = vm_start >> PAGE_SHIFT.
Then if I remap it, vm_start changes, vm_pgoff stays the same, so:
vm_start - vm_pgoff * PAGE_SIZE
Changes right? And then that becomes essentially the offset from where it
was faulted in.
>
> We refer to this value as:
>
> vma_mapping_base(vma) = vma->vm_start - vma->vm_pgoff * PAGE_SIZE
This is mysteriously close to being the offset I mention in my CoW context
work...
I'm not sure what 'mapping base' means here.
>
> This value also remains unchanged when the VMA is removed from mm_mt.
Why does it matter what this value is on unmap?
>
> If a VMA is split and produces new_vma, the following holds:
>
> vma_mapping_base(new_vma) == vma_mapping_base(vma)
This is a roundabout way of saying we offset the vma->vm_pgoff after split.
>
> If two adjacent VMAs vma_a and vma_b are merged into vma_x, then:
>
> vma_mapping_base(vma_a) == vma_mapping_base(vma_b) ==
> vma_mapping_base(vma_x)
This is just a roundabout way of saying the pgoff has to be aligned.
>
> Assume the VMA where the first page fault occurs is called root_vma, and
> ensure that any VMA produced by split or merge holds a reference to
> root_vma.
But this VMA can be unmapped later? Or remapped?
Holding on to a VMA and treating it as some kind of canonical reference
with a reference count completely changes what VMAs are, impacts the VMA
lifecycle, and produces unwanted memory overhead in itself.
It also raises concerns and issues around lock order which is very
sensitive.
>
> During rmap we can compute the folio address using root_vma:
>
> vma_address(vma, pgoff, 1) =
What are the parameters here? What's 1?
> vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT)
> = vma_mapping_base(vma) + pgoff * PAGE_SIZE
> = vma_mapping_base(root_vma) + folio_pgoff * PAGE_SIZE
>
> We can then use folio_addr to locate the VMA covering this folio.
I'm really confused by this, you're kind of mixing and matching parameters
here.
What I think you're saying is that, if a folio hasn't been remapped, you
can figure out its address based on page offset.
That's completely broken for MAP_PRIVATE file-backed mappings which also
use anon_vma and also have to keep on working.
It seems that for the lazy approach what you are doing is essentially
caching the 'root' VMA in the folio. But this doesn't account for large
folios and split VMAs.
Even if you disabled it for those cases (which adds a ton of complexity in
itself), you then have issues with locking - the anon_vma lock has to take
a lock (that cannot be a VMA-level lock - results in lock inversion) even
on these leaf entries, or you break locking.
And we can't reasonably start pinning VMAs and using them as a sort of
proto cached thing on top of the existing anon_vma logic.
You also then need to, on remap, undo all this, which requires updating
folio->mapping on remap, something I tried doing previously myself, but
that's fraught with issues around lock inversion itself.
>
> 3. mremap / uffd_move
userfaultfd moving is not relevant as it actually updates the folio
correctly.
>
> If only the size changes and the start address remains the same, there
> is no impact.
>
> If the start address changes, the page is moved from (vma, addr) to
> (new_vma, new_addr). In this case:
>
> vma_mapping_base(new_vma) =
> vma_mapping_base(vma) + new_addr - old_addr
You say above that the mapping base never changes? But here it changes?
>
> We first upgrade the VMA, and then fix folio->mapping in move_ptes().
What's 'upgrading' a VMA? You mean converting the lazy anon_vma to a
'normal' one.
As above, this is fraught with lock inversion issues.
>
> If performance becomes a concern, ANON_VMA_LAZY can be enabled only for
> relatively small VMAs.
I think you've got serious correctness, lock management and complexity
issues and it's all a non-starter as the costs deeply exceed the benefits.
This is one of the fundamental, frustrating aspects of the anon rmap - you
keep thinking that 'surely' you can do sensible thing X, but it turns out
you can't for various annoying reasons.
It's one of the reasons it's really fraught for somebody coming to make
changes, and one of the reasons why I am very keen on fundamentally
changing it.
And also on a not-wasting-time basis - I was already working in parallel on
a rework here, so I think the civil thing is to at least wait for my work
before issuing alternative solutions.
Thanks, Lorenzo
^ permalink raw reply [flat|nested] 64+ messages in thread
* RE: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-03 7:54 ` Lorenzo Stoakes
@ 2026-06-03 11:05 ` wangtao
2026-06-03 11:53 ` Lorenzo Stoakes
0 siblings, 1 reply; 64+ messages in thread
From: wangtao @ 2026-06-03 11:05 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Harry Yoo, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, jannh@google.com, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, ziy@nvidia.com,
baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org,
lance.yang@linux.dev, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
pfalcato@suse.de, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, 21cnbao@gmail.com, jparsana@google.com,
dvander@google.com, zhangji, wangzicheng
> >
>
> Against my better judgment I'll address the stuff here...
>
> > VMA operations can be roughly divided into three categories. The
> > handling of ANON_VMA_LAZY is briefly described below.
>
> I don't agree, there are plenty more VMA operations. But with respect to
> anon rmap there are:
>
> - fork
> - merge/split
> - remap
>
Yes, these are the three categories. I originally intended to explain them
by classifying based on system calls; I should have used mremap instead of move_vma.
> Your approach seems to completely ignore VMA split and the need to
> maintain an interval tree to _multiple_ VMAs from a single anon_vma.
>
The folio uses vma->root_vma to compute folio_address. A VMA split from it,
vma_a, also uses vma_a->root_vma = vma->root_vma to compute folio_address.
During rmap, once folio_address is obtained, the VMA can be found through
mm_mt. Without fork, there is no need to maintain the interval tree.
> You may also actually split a VMA against a single large folio (waiting on the
> deferred shrinker) and have a SINGLE _leaf_ anonymous folio that is mapped
> in two places.
>
> The lazy approach doesn't seem to address this properly. And fatally it ties an
> actual VMA afaict to the folio and has to implement a VMA reference count
> mechanism which interferes with the ordinary VMA lifecycle to do it.
>
> The fact of us taking advantage of most stuff being AnonExclusive, i.e.
> 'leaves' is something that my approach is exactly taking into account.
>
> Of course also extending anon_vma is a real non-starter.
>
> Also the below + the series ignores MAP_PRIVATE file-backed mappings
> which is a pretty fatal flaw.
>
> It also, as Harry says, has zero description of correctness in a way we'd want
> and no tests.
>
It can correctly handle the case where a VMA is split within a large
folio. The address of a sub_page in either split VMA (vma_a or vma_b) is
computed as follows.
For anonymous pages created by CoW in MAP_PRIVATE file-backed VMAs, the
page/folio address is computed the same way.
subpage_address = vma_address(vma_a, subpage_pgoff, 1)
= vma_a->vm_start + (subpage_pgoff - vma_a->vm_pgoff) * PAGE_SIZE
= vma_a->vm_start - vma_a->vm_pgoff * PAGE_SIZE + subpage_pgoff * PAGE_SIZE
= vma_mapping_base(vma_a) + subpage_pgoff * PAGE_SIZE
= vma_mapping_base(root_vma) + subpage_pgoff * PAGE_SIZE
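The derivation above can be checked on a large folio that straddles a
split boundary. This is a sketch under the same hypothetical struct
vma_model; map_folio_subpages() is illustrative only. Each subpage's
address comes from root_vma's base, and an assert confirms it equals
vma_address() computed against whichever split VMA covers it. It does
not model deferred splitting, locking, or file-backed mappings beyond
the arithmetic.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)

/* Hypothetical userspace model of the VMA fields used by the formula. */
struct vma_model {
	unsigned long vm_start, vm_end, vm_pgoff;
};

static unsigned long vma_mapping_base(const struct vma_model *vma)
{
	return vma->vm_start - vma->vm_pgoff * PAGE_SIZE;
}

static unsigned long vma_address_model(const struct vma_model *vma,
				       unsigned long pgoff)
{
	return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
}

/* For each of the @nr subpages of a folio starting at @folio_pgoff, record
 * in @owner[i] the index of the VMA in @vmas that maps it (-1 if none). */
static void map_folio_subpages(const struct vma_model *root_vma,
			       const struct vma_model *vmas, size_t nr_vmas,
			       unsigned long folio_pgoff, unsigned long nr,
			       int *owner)
{
	for (unsigned long i = 0; i < nr; i++) {
		unsigned long pgoff = folio_pgoff + i;
		unsigned long addr = vma_mapping_base(root_vma) + pgoff * PAGE_SIZE;

		owner[i] = -1;
		for (size_t j = 0; j < nr_vmas; j++) {
			if (vmas[j].vm_start <= addr && addr < vmas[j].vm_end) {
				/* Root-based address must agree with the split VMA. */
				assert(vma_address_model(&vmas[j], pgoff) == addr);
				owner[i] = (int)j;
			}
		}
	}
}
```

A 4-page folio at pgoff 0x400, with the VMA split at 0x402000, has its
first two subpages in vma_a and the last two in vma_b.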
> >
> > 1. fork
> >
> > fork duplicates the parent's mm/mmap. (exec creates a new mm/mmap
> and
> > is not involved here.) This can be viewed as copying the VMAs with
> > identical virtual addresses into a new address space.
> >
> > If the parent VMA (pvma) is ANON_VMA_LAZY, it is first upgraded to a
> > regular anon_vma. The corresponding folio->mapping is then fixed in
> > try_dup_anon_rmap().
>
> And so we make fork, a very sensitive path in the kernel more expensive.
>
> I also question the locking situation with the conversion mentioned, updating
> folios in this manner is extremely difficult.
>
rmap takes the PTE lock, while fork takes the mmap write lock, the VMA
write lock, and the PTE lock. Because folio->mapping may only transition
in one direction, from lazy_vma to a regular anon_vma, this race can be
handled correctly even without taking the folio_lock.
When rmap and fork run concurrently:
If rmap observes folio->mapping as a regular anon_vma, there is
obviously no issue.
If rmap observes folio->mapping as lazy_vma, then rmap only processes
the parent's pvma. At the end of rmap_walk_anon(), if we see that folio->mapping has
changed to a regular anon_vma, we simply process it once more. The
various rmap_one implementations are idempotent anyway.
BTW: the commit message of patch 13 says a retry is needed, but the
retry handling was accidentally omitted in the posted patch.
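A single-threaded sketch of the check-and-retry described above,
assuming folio->mapping is reduced to a two-state flag that only moves
from LAZY to REGULAR. Everything here (struct folio_model,
rmap_walk_anon_model(), fork_upgrade()) is hypothetical; the concurrent
fork is simulated by a callback run between the lazy walk and the
recheck, and the PTE lock is not modeled at all, so this shows the
intended control flow rather than proving the race is closed.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical two-state stand-in for folio->mapping. */
enum mapping_kind { MAPPING_LAZY, MAPPING_REGULAR };

struct folio_model {
	_Atomic int kind;	/* enum mapping_kind; LAZY -> REGULAR only */
	unsigned int nr_vmas;	/* VMAs reachable once regular (parent + children) */
};

/* Idempotent stand-in for an rmap_one callback: marks VMA @idx visited. */
static void rmap_one_model(unsigned int *visited, unsigned int idx)
{
	*visited |= 1u << idx;
}

/* @race, if non-NULL, runs after the lazy walk to simulate a concurrent
 * fork upgrading the folio. Returns the bitmask of VMAs visited. */
static unsigned int rmap_walk_anon_model(struct folio_model *folio,
					 void (*race)(struct folio_model *))
{
	unsigned int visited = 0;
	int kind = atomic_load(&folio->kind);

	if (kind == MAPPING_LAZY) {
		rmap_one_model(&visited, 0);	/* parent's VMA only */
		if (race)
			race(folio);
		if (atomic_load(&folio->kind) == MAPPING_LAZY)
			return visited;
		/* Upgraded under us: walk again. Revisiting VMA 0 is harmless
		 * because rmap_one_model() is idempotent. */
	}
	for (unsigned int i = 0; i < folio->nr_vmas; i++)
		rmap_one_model(&visited, i);
	return visited;
}

/* Simulated fork: set up the regular state, then publish it. */
static void fork_upgrade(struct folio_model *folio)
{
	folio->nr_vmas = 2;	/* parent + child */
	atomic_store(&folio->kind, MAPPING_REGULAR);
}
```

Without a racing fork the walk visits only the parent (mask 0x1); when
the upgrade lands mid-walk, the retry also reaches the child (mask 0x3).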
> >
> > 2. mmap / brk / mprotect / munmap
> >
> > These operations create, modify, or remove VMAs in the current mm.
> > They may split existing VMAs, merge adjacent VMAs, or remove a VMA
> from mm_mt.
>
> mmap and brk are not at all relevant to anon_vma, as no anon_vma is
> assigned upon mapping. It's on fault.
>
mmap()/brk() with a specified address may cause anonymous VMA merge or split.
> mprotect/mlock/munmap/etc. might split, but I don't see how the lazy
> approach in any way addresses any of that.
>
As mentioned above, after the split, rmap still uses root_vma to compute
folio_address or page_address.
> >
> > When a new VMA is created, vm_start, vm_end and vm_pgoff are
> > initialized and the VMA is inserted into mm_mt. Although these fields
> > may later be modified, the following value remains invariant:
> >
> > (vm_start - vm_pgoff * PAGE_SIZE)
>
> Err no it doesn't at all?
>
> If I fault in a VMA at vm_start, vm_pgoff = vm_start >> PAGE_SHIFT.
>
> Then if I remap it, vm_start changes, vm_pgoff stays the same, so:
>
> vm_start - vm_pgoff * PAGE_SIZE
>
> Changes right? And then that becomes essentially the offset from where it
> was faulted in.
>
If mremap modifies vm_start, i.e., move_vma, a new VMA will be created.
This corresponds exactly to the third point mentioned later: upgrading
anon_vma_lazy to a regular anon_vma and updating folio->mapping.
> >
> > We refer to this value as:
> >
> > vma_mapping_base(vma) = vma->vm_start - vma->vm_pgoff * PAGE_SIZE
>
> This is mysteriously close to being the offset I mention in my CoW context
> work...
>
> I'm not sure what 'mapping base' means here.
>
vma_address(vma, pgoff, nr_pages)
= vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT)
= vma->vm_start + ((pgoff - vma->vm_pgoff) * PAGE_SIZE)
= vma->vm_start - vma->vm_pgoff * PAGE_SIZE + pgoff * PAGE_SIZE
= vma_mapping_base(vma) + pgoff * PAGE_SIZE
vma_mapping_base depends only on the VMA and is independent of the page.
Alternatively, we could also call it vma_rmap_base.
> >
> > This value also remains unchanged when the VMA is removed from mm_mt.
>
> Why does it matter what this value is on unmap?
>
If root_vma is removed from mm_mt due to munmap, it will still remain
valid as long as other VMAs hold references to it.
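A minimal sketch of the lifetime rule described here. The names (root_alloc(), root_get(), root_put(), root_unmap()) are hypothetical and are not the helpers the series adds under CONFIG_VMA_REF. It shows that once the root VMA is pinned, munmap only removes it from mm_mt; the object itself lives on until the last VMA that refers to it drops its reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a pinned root VMA (not the series' API). */
struct root_vma_model {
	int refcount;		/* one for mm_mt + one per referring VMA */
	bool in_tree;		/* still visible in mm_mt */
};

static int nr_freed;		/* counts frees so the behaviour is observable */

static struct root_vma_model *root_alloc(void)
{
	struct root_vma_model *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->refcount = 1;	/* reference owned by the address space */
	r->in_tree = true;
	return r;
}

/* Taken by each VMA split or merged off the root. */
static void root_get(struct root_vma_model *r)
{
	r->refcount++;
}

static void root_put(struct root_vma_model *r)
{
	assert(r->refcount > 0);
	if (--r->refcount == 0) {
		free(r);
		nr_freed++;
	}
}

/* munmap of the root's own range: leave mm_mt, drop the tree reference. */
static void root_unmap(struct root_vma_model *r)
{
	assert(r->in_tree);
	r->in_tree = false;
	root_put(r);
}
```

Between root_unmap() and the final root_put() the VMA is detached but still allocated, which is the lifecycle state the objections in this thread are about.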
> >
> > If a VMA is split and produces new_vma, the following holds:
> >
> > vma_mapping_base(new_vma) == vma_mapping_base(vma)
>
> This is a roundabout way of saying we offset the vma->vm_pgoff after split.
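A minimal sketch of the split invariant, assuming 4 KiB pages and a hypothetical struct vma_model. split_model() applies to the upper part what the kernel's split path does, advancing vm_pgoff by the number of pages skipped, which is exactly why both halves keep the original vma_mapping_base().

```c
#include <assert.h>

#define PAGE_SHIFT 12			/* model assumption: 4 KiB pages */
#define PAGE_SIZE (1UL << PAGE_SHIFT)

/* Hypothetical stand-in for the vm_area_struct fields used here. */
struct vma_model {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_pgoff;
};

static unsigned long vma_mapping_base(const struct vma_model *vma)
{
	return vma->vm_start - vma->vm_pgoff * PAGE_SIZE;
}

/* Split vma at addr into lo = [vm_start, addr) and hi = [addr, vm_end). */
static void split_model(const struct vma_model *vma, unsigned long addr,
			struct vma_model *lo, struct vma_model *hi)
{
	assert(addr > vma->vm_start && addr < vma->vm_end);
	assert(addr % PAGE_SIZE == 0);

	*lo = *vma;
	lo->vm_end = addr;

	*hi = *vma;
	hi->vm_start = addr;
	/* Offset pgoff by the pages that now belong to the lower part. */
	hi->vm_pgoff += (addr - vma->vm_start) >> PAGE_SHIFT;
}
```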
>
> >
> > If two adjacent VMAs vma_a and vma_b are merged into vma_x, then:
> >
> > vma_mapping_base(vma_a) == vma_mapping_base(vma_b) ==
> > vma_mapping_base(vma_x)
>
> This is just a roundabout way of saying the pgoff has to be aligned.
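Likewise for merge, a minimal sketch under the same assumptions: two neighbours are merged here only if the second continues the first both in address and in page offset, and under that condition their bases are necessarily equal. The helper names are hypothetical; the kernel's real merge checks also cover flags, files, anon_vma compatibility and more.

```c
#include <assert.h>

#define PAGE_SHIFT 12			/* model assumption: 4 KiB pages */
#define PAGE_SIZE (1UL << PAGE_SHIFT)

/* Hypothetical stand-in for the vm_area_struct fields used here. */
struct vma_model {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_pgoff;
};

static unsigned long vma_mapping_base(const struct vma_model *vma)
{
	return vma->vm_start - vma->vm_pgoff * PAGE_SIZE;
}

/* b starts where a ends, and b's pgoff continues a's pgoff. */
static int pgoff_contiguous(const struct vma_model *a,
			    const struct vma_model *b)
{
	assert(a->vm_start < a->vm_end);
	return a->vm_end == b->vm_start &&
	       a->vm_pgoff + ((a->vm_end - a->vm_start) >> PAGE_SHIFT) ==
	       b->vm_pgoff;
}

/* Merge a and b into x if they are compatible; returns 1 on success. */
static int merge_model(const struct vma_model *a, const struct vma_model *b,
		       struct vma_model *x)
{
	if (!pgoff_contiguous(a, b))
		return 0;
	x->vm_start = a->vm_start;
	x->vm_end = b->vm_end;
	x->vm_pgoff = a->vm_pgoff;
	return 1;
}
```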
>
> >
> > Assume the VMA where the first page fault occurs is called root_vma,
> > and ensure that any VMA produced by split or merge holds a reference
> > to root_vma.
>
> But this VMA can be unmapped later? Or remapped?
>
It can be unmapped. As mentioned earlier, if mremap modifies vm_start,
a new VMA will be created.
> Holding on to a VMA and treating it as some kind of canonical reference with
> a reference count completely changes what VMAs are, impacts the VMA
> lifecycle, and produces unwanted memory overhead in itself.
>
During split/merge operations, we can try to preferentially use root_vma
so as to avoid deleting it.
> It also raises concerns and issues around lock order which is very sensitive.
>
Both rmap and fork acquire the PTE lock, which ensures that handling a page
with respect to a particular VMA is atomic.
There is no need to add folio_lock.
When fork converts folio->mapping into a regular anon_vma,
rmap_walk_anon can simply check and retry.
> >
> > During rmap we can compute the folio address using root_vma:
> >
> > vma_address(vma, pgoff, 1) =
>
> What's the parameters here? What's 1?
>
> > vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT)
> > = vma_mapping_base(vma) + pgoff * PAGE_SIZE
> > = vma_mapping_base(root_vma) + folio_pgoff * PAGE_SIZE
> >
> > We can then use folio_addr to locate the VMA covering this folio.
>
I overlooked this earlier. We can unify it by using pgoff as follows.
page_addr = vma_address(vma, pgoff, nr_pages)
= vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT)
= vma->vm_start + ((pgoff - vma->vm_pgoff) * PAGE_SIZE)
= vma->vm_start - vma->vm_pgoff * PAGE_SIZE + pgoff * PAGE_SIZE
= vma_mapping_base(vma) + pgoff * PAGE_SIZE
= vma_mapping_base(root_vma) + pgoff * PAGE_SIZE
> I'm really confused by this, you're kind of mixing and matching parameters here.
>
> What I think you're saying is that, if a folio hasn't been remapped, you can
> figure out its address based on page offset.
>
> That's completely broken for MAP_PRIVATE file-backed mappings which also
> use anon_vma and also have to keep on working.
>
> It seems that for the lazy approach what you are doing is essentially caching
> the 'root' VMA in the folio. But this doesn't account for large folios and split
> VMAs.
>
As mentioned earlier:
subpage_address = vma_address(vma_a, subpage_pgoff, 1)
= vma_mapping_base(vma_a) + subpage_pgoff * PAGE_SIZE
= vma_mapping_base(root_vma) + subpage_pgoff * PAGE_SIZE
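A minimal sketch of the large-folio case, assuming 4 KiB pages, hypothetical helpers, and that no part of the range has been moved since the fault. A four-page folio straddles a split point; for each subpage, the half that maps it computes the same address as the root VMA's base plus the subpage's page offset. The second test case moves the upper half, after which only the two subpages still mapped by the lower half match, i.e. the identity holds only while nothing has been remapped.

```c
#include <assert.h>

#define PAGE_SHIFT 12			/* model assumption: 4 KiB pages */
#define PAGE_SIZE (1UL << PAGE_SHIFT)

/* Hypothetical stand-in for the vm_area_struct fields used here. */
struct vma_model {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_pgoff;
};

static unsigned long vma_pages_model(const struct vma_model *v)
{
	return (v->vm_end - v->vm_start) >> PAGE_SHIFT;
}

static unsigned long vma_mapping_base(const struct vma_model *v)
{
	return v->vm_start - v->vm_pgoff * PAGE_SIZE;
}

/* Does v map page offset pgoff? */
static int covers_pgoff(const struct vma_model *v, unsigned long pgoff)
{
	return pgoff >= v->vm_pgoff && pgoff < v->vm_pgoff + vma_pages_model(v);
}

static unsigned long model_vma_address(const struct vma_model *v,
				       unsigned long pgoff)
{
	return v->vm_start + ((pgoff - v->vm_pgoff) << PAGE_SHIFT);
}

/* For each subpage of a folio starting at folio_pgoff, find the split part
 * that maps it and compare its address with the root-based formula.
 * Returns how many subpages matched. */
static unsigned long check_subpages(const struct vma_model *root,
				    const struct vma_model *parts, int nr_parts,
				    unsigned long folio_pgoff,
				    unsigned long nr_pages)
{
	unsigned long matched = 0;

	assert(nr_parts > 0);
	for (unsigned long i = 0; i < nr_pages; i++) {
		unsigned long pgoff = folio_pgoff + i;

		for (int p = 0; p < nr_parts; p++) {
			if (!covers_pgoff(&parts[p], pgoff))
				continue;
			if (model_vma_address(&parts[p], pgoff) ==
			    vma_mapping_base(root) + pgoff * PAGE_SIZE)
				matched++;
			break;
		}
	}
	return matched;
}
```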
> Even if you disabled it for those cases (which adds a ton of complexity in
> itself), you then have issues with locking - the anon_vma lock has to take a
> lock (that cannot be a VMA-level lock - results in lock inversion) even on
> these leaf entries, or you break locking.
>
When there is no fork/mremap, we do not need the interval tree or the anon_vma lock.
> And we can't reasonably start pinning VMAs and using them as a sort of
> proto cached thing on top of the existing anon_vma logic.
>
In most cases, root_vma is actively used.
Although it may be removed by munmap, overall it still saves memory.
> You also then need to, on remap, undo all this, which requires updating
> folio->mapping on remap, something I tried doing previously myself, but
> that's fraught with issues around lock inversion itself.
>
> >
> > 3. mremap / uffd_move
>
> userfaultfd moving is not relevant as it actually updates the folio correctly.
>
These two operations are different from the previous two types,
as they modify the virtual address of the page/folio.
> >
> > If only the size changes and the start address remains the same, there
> > is no impact.
> >
> > If the start address changes, the page is moved from (vma, addr) to
> > (new_vma, new_addr). In this case:
> >
> > vma_mapping_base(new_vma) =
> > vma_mapping_base(vma) + new_addr - old_addr
>
> You say above that the mapping base never changes? But here it changes?
>
For the newly created new_vma, vma_mapping_base(new_vma) is not equal to vma_mapping_base(vma),
while vma_mapping_base(vma) itself remains unchanged.
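A minimal sketch of the mremap formula, assuming 4 KiB pages, a faulted-in VMA (so the destination keeps the page offsets of the moved range) and hypothetical names. It checks vma_mapping_base(new_vma) == vma_mapping_base(vma) + new_addr - old_addr, and that the source VMA's base is unchanged. Folios in the moved range can then no longer be located from the root VMA's base, which is why the proposal has to rewrite folio->mapping in move_ptes(), the step flagged in this thread for lock inversion.

```c
#include <assert.h>

#define PAGE_SHIFT 12			/* model assumption: 4 KiB pages */
#define PAGE_SIZE (1UL << PAGE_SHIFT)

/* Hypothetical stand-in for the vm_area_struct fields used here. */
struct vma_model {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_pgoff;
};

static unsigned long vma_mapping_base(const struct vma_model *vma)
{
	return vma->vm_start - vma->vm_pgoff * PAGE_SIZE;
}

/* Move [old_addr, old_addr + len) of a faulted-in vma to new_addr.
 * The destination keeps the page offsets of the moved range. */
static struct vma_model move_range_model(const struct vma_model *vma,
					 unsigned long old_addr,
					 unsigned long new_addr,
					 unsigned long len)
{
	struct vma_model new_vma;

	assert(old_addr >= vma->vm_start && old_addr + len <= vma->vm_end);
	new_vma.vm_start = new_addr;
	new_vma.vm_end = new_addr + len;
	new_vma.vm_pgoff = vma->vm_pgoff +
			   ((old_addr - vma->vm_start) >> PAGE_SHIFT);
	return new_vma;
}
```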
> >
> > We first upgrade the VMA, and then fix folio->mapping in move_ptes().
>
> What's 'upgrading' a VMA? You mean converting the lazy anon_vma to a
> 'normal' one.
>
> As above, this is fraught with lock inversion issues.
>
Yes, it upgrades from a lazy_vma to a regular anon_vma.
As mentioned earlier, during this process we hold the mmap write lock, the vma write lock,
and the pte lock, so acquiring the folio_lock is unnecessary.
> >
> > If performance becomes a concern, ANON_VMA_LAZY can be enabled only
> > for relatively small VMAs.
>
> I think you've got serious correctness, lock management and complexity
> issues and it's all a non-starter as the costs deeply exceed the benefits.
>
I think the approach is feasible:
1. During merge/split, the newly created vma_a satisfies
vma_mapping_base(vma_a) == vma_mapping_base(vma) ==
vma_mapping_base(root_vma). Therefore, we can use root_vma to
compute the virtual address of the folio/page mapped by vma_a.
2. During fork and mremap, we hold the mmap write lock, the vma
write lock, and the pte lock. In particular, the pte lock ensures
that rmap and fork operations on a folio/page within a specific
vma are atomic. If folio->mapping is upgraded during
rmap_walk_anon(folio), we can simply let rmap_walk_anon retry
once.
> This is one of the fundamental, frustrating aspects of the anon rmap - you
> keep thinking that 'surely' you can do sensible thing X, but it turns out you
> can't for various annoying reasons.
>
> It's one of the reasons it's really fraught for somebody coming to make
> changes, and one of the reasons why I am very keen on fundamentally
> changing it.
>
> And also on a not-wasting-time basis - I was already working in parallel on a
> rework here, so I think the civil thing is to at least wait for my work before
> issuing alternative solutions.
>
> Thanks, Lorenzo
>
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-03 11:05 ` wangtao
@ 2026-06-03 11:53 ` Lorenzo Stoakes
2026-06-04 3:50 ` wangtao
0 siblings, 1 reply; 64+ messages in thread
From: Lorenzo Stoakes @ 2026-06-03 11:53 UTC (permalink / raw)
To: wangtao
Cc: Harry Yoo, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, jannh@google.com, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, ziy@nvidia.com,
baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org,
lance.yang@linux.dev, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
pfalcato@suse.de, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, 21cnbao@gmail.com, jparsana@google.com,
dvander@google.com, zhangji, wangzicheng
Thanks for your replies, but I really have to stop doing deeper analyses like
these for time management purposes.
I did this more so to make the point from [0] as to why, in lower trust
environments, this is just not feasible.
We could loop around for hours and hours and hours here.
In general as before, even if all worked perfectly (I'm very much not at all
convinced), extending anon_vma and pinning VMAs is simply a no-go for
architectural and complexity reasons.
I also find the locking story dubious and the lack of tests or anything
corroborating correctness is additionally fatal.
And finally, I was already working on a replacement for anon_vma, and the
generally done thing in these situations is for my work to take precedence.
So I'm going to bail out on further deeper analyses here as otherwise I simply
can't work on anything else :)
Thanks, Lorenzo
[0]: https://lore.kernel.org/all/ah887A5VkXOcmq-g@lucifer/
On Wed, Jun 03, 2026 at 11:05:28AM +0000, wangtao wrote:
> > >
> >
> > Against my better judgment I'll address the stuff here...
> >
> > > VMA operations can be roughly divided into three categories. The
> > > handling of ANON_VMA_LAZY is briefly described below.
> >
> > I don't agree, there are plenty more VMA operations. But with respect to
> > anon rmap there are:
> >
> > - fork
> > - merge/split
> > - remap
> >
>
> Yes, these are the three categories. I originally intended to explain them
> by classifying based on system calls; I should have used mremap instead of move_vma.
I don't think you mentioned move_vma()? Maybe I missed it.
The categorisation is most usefully based on callers of anon_vma_clone().
>
> > Your approach seems to completely ignore VMA split and the need to
> > maintain an interval tree to _multiple_ VMAs from a single anon_vma.
> >
>
> The folio uses vma->root_vma to compute folio_address. A VMA split from it,
> vma_a, also uses vma_a->root_vma = vma->root_vma to compute folio_address.
> During rmap, once folio_address is obtained, the VMA can be found through
> mm_mt. Without fork, there is no need to maintain the interval tree.
Well you need to search for every possible split VMA in mm_mt now, so you have
to go page-by-page searching for each page for the rmap walked range.
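A minimal single-threaded sketch of the lookup cost described here, assuming 4 KiB pages and a sorted array as a hypothetical stand-in for mm_mt (lookup_vma() is illustrative, not a maple tree API). Without an interval tree, each page of the folio's range is turned into an address from the root VMA's base and looked up individually. Nothing here is locked, so the model ignores the concurrency questions entirely.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12			/* model assumption: 4 KiB pages */
#define PAGE_SIZE (1UL << PAGE_SHIFT)

/* Hypothetical stand-in for the vm_area_struct fields used here. */
struct vma_model {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_pgoff;
};

/* Stand-in for an mm_mt lookup: binary search a sorted, non-overlapping
 * array for the VMA containing addr. Returns NULL if addr is unmapped. */
static const struct vma_model *lookup_vma(const struct vma_model *vmas,
					  size_t n, unsigned long addr)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < vmas[mid].vm_start)
			hi = mid;
		else if (addr >= vmas[mid].vm_end)
			lo = mid + 1;
		else
			return &vmas[mid];
	}
	return NULL;
}

/* Walk nr_pages starting at folio_pgoff: derive each page's address from
 * the root base, then look up its VMA. Returns the number of distinct VMAs
 * found; *lookups receives the number of tree searches performed. */
static size_t walk_folio_range(const struct vma_model *vmas, size_t n,
			       unsigned long root_base,
			       unsigned long folio_pgoff,
			       unsigned long nr_pages, size_t *lookups)
{
	const struct vma_model *last = NULL;
	size_t distinct = 0;

	assert(lookups != NULL);
	*lookups = 0;
	for (unsigned long i = 0; i < nr_pages; i++) {
		unsigned long addr = root_base + (folio_pgoff + i) * PAGE_SIZE;
		const struct vma_model *vma = lookup_vma(vmas, n, addr);

		(*lookups)++;
		if (vma && vma != last) {
			distinct++;
			last = vma;
		}
	}
	return distinct;
}
```

For an eight-page folio spanning three pieces of a VMA that was split twice, this performs eight lookups to find three VMAs, and each lookup result is only meaningful if mm_mt cannot change during the walk.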
You're also potentially racing against a remap, as you say below you don't folio
lock on remap so concurrent rmap walkers can be present, the VMA can already be
copied.
We already have VMA lifecycle state around detached VMAs, so a VMA could be in a
detached state, assumed by the existing logic to be entirely unavailable for
use, out of the maple tree altogether but kept around in a zombie state.
We'd then have lifecycle issues and races and edge cases around process teardown
otherwise we might leak memory.
Also, presumably you set vma->anon_vma to some lazy sentinel value so that mremap
doesn't change vma->vm_pgoff when unfaulted?
You would need to update any path that manipulates vma->anon_vma also so it
doesn't incorrectly dereference it.
>
> > You may also actually split a VMA against a single large folio (waiting on the
> > deferred shrinker) and have a SINGLE _leaf_ anonymous folio that is mapped
> > in two places.
> >
> > The lazy approach doesn't seem to address this properly. And fatally it ties an
> > actual VMA afaict to the folio and has to implement a VMA reference count
> > mechanism which interferes with the ordinary VMA lifecycle to do it.
> >
> > The fact of us taking advantage of most stuff being AnonExclusive, i.e.
> > 'leaves' is something that my approach is exactly taking into account.
> >
> > Of course also extending anon_vma is a real non-starter.
> >
> > Also the below + the series ignores MAP_PRIVATE file-backed mappings
> > which is a pretty fatal flaw.
> >
> > It also, as Harry says, has zero description of correctness in a way we'd want
> > and no tests.
> >
>
> It can correctly handle the case where a VMA is split within a large
> page. The address of a sub_page in the split VMA (vma_a or vma_b) is
> computed using the following method.
>
> For COW anonymous pages originating from file VMAs, the page/folio
> address is also computed using the same method.
>
> subpage_address = vma_address(vma_a, subpage_pgoff, 1)
> = vma_a->vm_start + (subpage_pgoff - vma_a->vm_pgoff) * PAGE_SIZE
> = vma_a->vm_start - vma_a->vm_pgoff * PAGE_SIZE + subpage_pgoff * PAGE_SIZE
> = vma_mapping_base(vma_a) + subpage_pgoff * PAGE_SIZE
> = vma_mapping_base(root_vma) + subpage_pgoff * PAGE_SIZE
OK but you want to walk entries in a _range_ in the interval tree.
So you are now looking up VMAs (in a racy way) using mm_mt (which is the
whole basis of my work actually) which could change under you.
I guess what you're doing is using the pinned 'root' VMA as the basis of
everything, and the second a VMA is moved you (somehow) walk the page tables to
update the folio->mapping.
Again pinning the VMA like this and putting it in a folio is really not
something we want to do.
It adds a ton of complexity and also impacts VMA lifecycle which is already
fairly fraught.
It makes the VMA no longer just a VMA but rather also a 'memory' of where
something was first faulted in as a hack more or less.
>
> > >
> > > 1. fork
> > >
> > > fork duplicates the parent's mm/mmap. (exec creates a new mm/mmap and
> > > is not involved here.) This can be viewed as copying the VMAs with
> > > identical virtual addresses into a new address space.
> > >
> > > If the parent VMA (pvma) is ANON_VMA_LAZY, it is first upgraded to a
> > > regular anon_vma. The corresponding folio->mapping is then fixed in
> > > try_dup_anon_rmap().
> >
> > And so we make fork, a very sensitive path in the kernel more expensive.
> >
> > I also question the locking situation with the conversion mentioned, updating
> > folios in this manner is extremely difficult.
> >
>
> Because rmap takes the PTE lock, while fork takes the mmap write lock,
> the VMA write lock, and the PTE lock.
The PTE lock is not held for the duration of an anon_vma lock.
You will break anything that needs to hold the anon_vma lock for the duration,
e.g. migration.
This is substantively the issue I am working on in my approach and as per
https://ljs.io/scalable-cow-lsf.pdf you can see that's an open question that I
am currently researching.
>
> Given the rule that folio->mapping can only transition in one direction
> from lazy_vma to a regular anon_vma, the situation can be handled
> correctly even without taking the folio_lock.
Folio lock serialises against concurrent rmap walks, and you can end up reading
a lazy_vma that later gets converted into an anon_vma concurrently.
>
> When rmap and fork run concurrently:
> If rmap observes folio->mapping as a regular anon_vma, there is
> obviously no issue.
> If rmap observes folio->mapping as lazy_vma, then rmap only processes
> the parent's pvma. At the end of rmap_walk_anon(), if we see that folio->mapping has
> changed to a regular anon_vma, we simply process it once more. The
> various rmap_one implementations are idempotent anyway.
Hm, this all seems very racy.
>
> BTW: the commit message of patch 13 says a retry is needed, but the
> retry handling was accidentally omitted in the posted patch.
:))
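A minimal single-threaded sketch of the retry scheme described above, with hypothetical names and a made-up encoding (low bit set means lazy). upgrade_mapping() enforces the one-way lazy-to-regular transition with a compare-and-exchange, and walk_with_one_retry() rereads folio->mapping after the walk and walks again if it changed. It only models the sequencing and assumes every rmap_one callback is idempotent; it does not provide the serialisation said to be missing, e.g. for users such as migration that need a lock held across the entire walk.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: low bit set means "lazy" (refers to the root VMA),
 * clear means a regular anon_vma. */
#define MODEL_MAPPING_LAZY ((uintptr_t)1)

struct folio_model {
	_Atomic uintptr_t mapping;
};

/* Stand-in for an rmap_one callback; assumed idempotent. */
typedef void (*rmap_one_fn)(struct folio_model *folio, uintptr_t mapping,
			    void *arg);

/* One-way transition: replace a lazy value with a regular one, never back. */
static int upgrade_mapping(struct folio_model *folio, uintptr_t lazy,
			   uintptr_t regular)
{
	assert(lazy & MODEL_MAPPING_LAZY);
	assert(!(regular & MODEL_MAPPING_LAZY));
	return atomic_compare_exchange_strong(&folio->mapping, &lazy, regular);
}

/* Walk the mapping that was observed; if it was lazy and has since been
 * upgraded, walk the regular one as well. Returns the number of passes. */
static int walk_with_one_retry(struct folio_model *folio, rmap_one_fn fn,
			       void *arg)
{
	uintptr_t seen = atomic_load(&folio->mapping);
	uintptr_t now;

	fn(folio, seen, arg);
	if (!(seen & MODEL_MAPPING_LAZY))
		return 1;

	now = atomic_load(&folio->mapping);
	if (now == seen)
		return 1;

	fn(folio, now, arg);
	return 2;
}
```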
>
> > >
> > > 2. mmap / brk / mprotect / munmap
> > >
> > > These operations create, modify, or remove VMAs in the current mm.
> > > They may split existing VMAs, merge adjacent VMAs, or remove a VMA from mm_mt.
> >
> > mmap and brk are not at all relevant to anon_vma, as no anon_vma is
> > assigned upon mapping. It's on fault.
> >
> mmap()/brk() with a specified address may cause anonymous VMA merge or split.
>
> > mprotect/mlock/munmap/etc. might split, but I don't see how the lazy
> > approach in any way addresses any of that.
> >
> As mentioned above, after the split, rmap still uses root_vma to compute
> folio_address or page_address.
>
> > >
> > > When a new VMA is created, vm_start, vm_end and vm_pgoff are
> > > initialized and the VMA is inserted into mm_mt. Although these fields
> > > may later be modified, the following value remains invariant:
> > >
> > > (vm_start - vm_pgoff * PAGE_SIZE)
> >
> > Err no it doesn't at all?
> >
> > If I fault in a VMA at vm_start, vm_pgoff = vm_start >> PAGE_SHIFT.
> >
> > Then if I remap it, vm_start changes, vm_pgoff stays the same, so:
> >
> > vm_start - vm_pgoff * PAGE_SIZE
> >
> > Changes right? And then that becomes essentially the offset from where it
> > was faulted in.
> >
> If mremap modifies vm_start, i.e., move_vma, a new VMA will be created.
> This corresponds exactly to the third point mentioned later: upgrading
> anon_vma_lazy to a regular anon_vma and updating folio->mapping.
I think updating folio->mapping here is problematic, I know this because I
worked on this very problem a year or so ago and found locking issues prevented
this from being workable.
>
> > >
> > > We refer to this value as:
> > >
> > > vma_mapping_base(vma) = vma->vm_start - vma->vm_pgoff * PAGE_SIZE
> >
> > This is mysteriously close to being the offset I mention in my CoW context
> > work...
> >
> > I'm not sure what 'mapping base' means here.
> >
>
> vma_address(vma, pgoff, nr_pages)
> = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT)
> = vma->vm_start + ((pgoff - vma->vm_pgoff) * PAGE_SIZE)
> = vma->vm_start - vma->vm_pgoff * PAGE_SIZE + pgoff * PAGE_SIZE
> = vma_mapping_base(vma) + pgoff * PAGE_SIZE
>
> vma_mapping_base depends only on the VMA and is independent of the page.
> Alternatively, we could also call it vma_rmap_base.
>
> > >
> > > This value also remains unchanged when the VMA is removed from mm_mt.
> >
> > Why does it matter what this value is on unmap?
> >
> If root_vma is removed from mm_mt due to munmap, it will still remain
> valid as long as other VMAs hold references to it.
Yeah this is something we don't want.
>
> > >
> > > If a VMA is split and produces new_vma, the following holds:
> > >
> > > vma_mapping_base(new_vma) == vma_mapping_base(vma)
> >
> > This is a roundabout way of saying we offset the vma->vm_pgoff after split.
> >
> > >
> > > If two adjacent VMAs vma_a and vma_b are merged into vma_x, then:
> > >
> > > vma_mapping_base(vma_a) == vma_mapping_base(vma_b) ==
> > > vma_mapping_base(vma_x)
> >
> > This is just a roundabout way of saying the pgoff has to be aligned.
> >
> > >
> > > Assume the VMA where the first page fault occurs is called root_vma,
> > > and ensure that any VMA produced by split or merge holds a reference
> > > to root_vma.
> >
> > But this VMA can be unmapped later? Or remapped?
> >
> It can be unmapped. As mentioned earlier, if mremap modifies vm_start,
> a new VMA will be created.
But everything's racy?
>
> > Holding on to a VMA and treating it as some kind of canonical reference with
> > a reference count completely changes what VMAs are, impacts the VMA
> > lifecycle, and produces unwanted memory overhead in itself.
> >
> During split/merge operations, we can try to preferentially use root_vma
> so as to avoid deleting it.
Adding yet more complexity and edge cases, we really cannot do that, sorry.
>
> > It also raises concerns and issues around lock order which is very sensitive.
> >
> Both rmap and fork acquire the PTE lock, which ensures that handling a page
> with respect to a particular VMA is atomic.
The PTE lock only locks the PTE. That wasn't the issue I was raising at all.
See the top of rmap.c for lock ordering. There's substantial complexity there.
>
> There is no need to add folio_lock.
> When fork converts folio->mapping into a regular anon_vma,
> rmap_walk_anon can simply check and retry.
This seems like it won't work.
And again you're adding a lot of new complexity.
>
> > >
> > > During rmap we can compute the folio address using root_vma:
> > >
> > > vma_address(vma, pgoff, 1) =
> >
> > What's the parameters here? What's 1?
> >
> > > vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT)
> > > = vma_mapping_base(vma) + pgoff * PAGE_SIZE
> > > = vma_mapping_base(root_vma) + folio_pgoff * PAGE_SIZE
> > >
> > > We can then use folio_addr to locate the VMA covering this folio.
> >
>
> I overlooked this earlier. We can unify it by using pgoff as follows.
>
> page_addr = vma_address(vma, pgoff, nr_pages)
> = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT)
> = vma->vm_start + ((pgoff - vma->vm_pgoff) * PAGE_SIZE)
> = vma->vm_start - vma->vm_pgoff * PAGE_SIZE + pgoff * PAGE_SIZE
> = vma_mapping_base(vma) + pgoff * PAGE_SIZE
> = vma_mapping_base(root_vma) + pgoff * PAGE_SIZE
It's really just saying
>
>
> > I'm really confused by this, you're kind of mixing and matching parameters here.
> >
> > What I think you're saying is that, if a folio hasn't been remapped, you can
> > figure out its address based on page offset.
> >
> > That's completely broken for MAP_PRIVATE file-backed mappings which also
> > use anon_vma and also have to keep on working.
> >
> > It seems that for the lazy approach what you are doing is essentially caching
> > the 'root' VMA in the folio. But this doesn't account for large folios and split
> > VMAs.
> >
> As mentioned earlier:
> subpage_address = vma_address(vma_a, subpage_pgoff, 1)
> = vma_mapping_base(vma_a) + subpage_pgoff * PAGE_SIZE
> = vma_mapping_base(root_vma) + subpage_pgoff * PAGE_SIZE
I'm not sure what these
>
> > Even if you disabled it for those cases (which adds a ton of complexity in
> > itself), you then have issues with locking - the anon_vma lock has to take a
> > lock (that cannot be a VMA-level lock - results in lock inversion) even on
> > these leaf entries, or you break locking.
> >
> When there is no fork/mremap, we do not need the interval tree or the anon_vma lock.
We need to stabilise across the VMAs.
>
> > And we can't reasonably start pinning VMAs and using them as a sort of
> > proto cached thing on top of the existing anon_vma logic.
> >
>
> In most cases, root_vma is actively used.
> Although it may be removed by munmap, overall it still saves memory.
For what workloads? Where? How?
It's adding complexity we can't have.
>
> > You also then need to, on remap, undo all this, which requires updating
> > folio->mapping on remap, something I tried doing previously myself, but
> > that's fraught with issues around lock inversion itself.
> >
> > >
> > > 3. mremap / uffd_move
> >
> > userfaultfd moving is not relevant as it actually updates the folio correctly.
> >
> These two operations are different from the previous two types,
> as they modify the virtual address of the page/folio.
>
> > >
> > > If only the size changes and the start address remains the same, there
> > > is no impact.
> > >
> > > If the start address changes, the page is moved from (vma, addr) to
> > > (new_vma, new_addr). In this case:
> > >
> > > vma_mapping_base(new_vma) =
> > > vma_mapping_base(vma) + new_addr - old_addr
> >
> > You say above that the mapping base never changes? But here it changes?
> >
>
> For the newly created new_vma, vma_mapping_base(new_vma) is not equal to vma_mapping_base(vma),
> while vma_mapping_base(vma) itself remains unchanged.
>
> > >
> > > We first upgrade the VMA, and then fix folio->mapping in move_ptes().
> >
> > What's 'upgrading' a VMA? You mean converting the lazy anon_vma to a
> > 'normal' one.
> >
> > As above, this is fraught with lock inversion issues.
> >
> Yes, it upgrades from a lazy_vma to a regular anon_vma.
> As mentioned earlier, during this process we hold the mmap write lock, the vma write lock,
> and the pte lock, so acquiring the folio_lock is unnecessary.
What's preventing a concurrent rmap walk?
>
> > >
> > > If performance becomes a concern, ANON_VMA_LAZY can be enabled only
> > > for relatively small VMAs.
> >
> > I think you've got serious correctness, lock management and complexity
> > issues and it's all a non-starter as the costs deeply exceed the benefits.
> >
>
> I think the approach is feasible:
For complexity and architectural reasons it's not.
>
> 1. During merge/split, the newly created vma_a satisfies
> vma_mapping_base(vma_a) == vma_mapping_base(vma) ==
> vma_mapping_base(root_vma). Therefore, we can use root_vma to
> compute the virtual address of the folio/page mapped by vma_a.
I don't love these formulas. You're storing the originally faulted-in address in
a VMA that you've pinned for the purpose of that.
If you happen to merge right a lot of times you keep around dead VMAs just for
this purpose.
We're not having VMAs have a dual-role as a 'store of the address first faulted
in' as well as being a virtual memory range.
>
> 2. During fork and mremap, we hold the mmap write lock, the vma
> write lock, and the pte lock. In particular, the pte lock ensures
> that rmap and fork operations on a folio/page within a specific
> vma are atomic. If folio->mapping is upgraded during
How does a lock exclude something that doesn't also hold that lock?
This is also adding _yet more_ complexity and subtlety. It's really a hack.
> rmap_walk_anon(folio), we can simply let rmap_walk_anon retry
> once.
Again, 'just repeating' if something changes like this without proper
serialisation is not sufficient.
>
> > This is one of the fundamental, frustrating aspects of the anon rmap - you
> > keep thinking that 'surely' you can do sensible thing X, but it turns out you
> > can't for various annoying reasons.
> >
> > It's one of the reasons it's really fraught for somebody coming to make
> > changes, and one of the reasons why I am very keen on fundamentally
> > changing it.
> >
> > And also on a not-wasting-time basis - I was already working in parallel on a
> > rework here, so I think the civil thing is to at least wait for my work before
> > issuing alternative solutions.
> >
> > Thanks, Lorenzo
> >
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
` (18 preceding siblings ...)
2026-06-02 16:07 ` Harry Yoo
@ 2026-06-03 20:25 ` David Hildenbrand (Arm)
2026-06-03 22:14 ` Barry Song
` (2 more replies)
19 siblings, 3 replies; 64+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-03 20:25 UTC (permalink / raw)
To: tao, catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86,
akpm, willy, sj, kees, luizcap, zhangjiao2, kas, ljs
Cc: hpa, liam, vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh,
jgg, jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
On 5/27/26 13:01, tao wrote:
> TL;DR
> -----
>
> This series introduces ANON_VMA_LAZY, which defers anon_vma creation
> until it is actually required.
>
> - anon_vma memory reduced by ~92-97%, anon_vma_chain reduced by ~50-57%
> - rmap operations on ANON_VMA_LAZY VMAs do not require anon_vma locking
>
> Background
> ----------
>
> Currently anon_vma structures are created eagerly when anonymous VMAs
> are initialized. However, many VMAs never participate in fork or rmap
> operations that require anon_vma chains, so the allocated anon_vma and
> anon_vma_chain objects are often unnecessary.
>
> Design overview
> ---------------
>
> ANON_VMA_LAZY defers anon_vma allocation until it is actually needed
> (for example during fork). VMAs that never participate in sharing can
> avoid creating anon_vma structures entirely.
>
> Before an anon_vma exists, rmap operations rely directly on VMA
> information, so no anon_vma locking is required. An anon_vma is created
> and linked only when sharing semantics are required.
>
> This series introduces anon_rmap helpers to make rmap less dependent on
> direct anon_vma access. It also introduces anon_vma_tree_t as a container
> to support both the lazy and the existing anon_vma layouts.
>
> Once a VMA becomes associated with an anon_vma, the normal behavior
> remains unchanged.
>
> Memory impact
> -------------
>
> Preliminary measurements show significant reductions in anon_vma-related
> slab allocations.
>
> After boot:
>
> Object | Before (active KB) | After (active KB) | Change
> vm_area_struct | 117035 | 118176 | +1.0%
> anon_vma_chain | 18865.8 | 8112.06 | -57.0%
> anon_vma | 20426.4 | 613.75 | -97.0%
>
> After launching 24 apps:
>
> Object | Before (active KB) | After (active KB) | Change
> vm_area_struct | 196873 | 197345 | +0.2%
> anon_vma_chain | 31477.1 | 15576.8 | -50.5%
> anon_vma | 33280 | 2648.12 | -92.0%
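A small check of the arithmetic in the tables above: it recomputes the Change column as (after - before) / before * 100 from the quoted active-KB figures. The recomputed values round to the same percentages as the table (about +1.0, -57.0 and -97.0 after boot; +0.2, -50.5 and -92.0 after launching 24 apps).

```c
#include <assert.h>
#include <stdio.h>

/* Relative change in percent, as used in the "Change" column. */
static double pct_change(double before_kb, double after_kb)
{
	assert(before_kb > 0.0);
	return (after_kb - before_kb) / before_kb * 100.0;
}

struct slab_row {
	const char *phase;
	const char *object;
	double before_kb;
	double after_kb;
};

/* Active-KB figures copied from the two tables above. */
static const struct slab_row rows[] = {
	{ "boot",    "vm_area_struct", 117035,  118176  },
	{ "boot",    "anon_vma_chain", 18865.8, 8112.06 },
	{ "boot",    "anon_vma",       20426.4, 613.75  },
	{ "24 apps", "vm_area_struct", 196873,  197345  },
	{ "24 apps", "anon_vma_chain", 31477.1, 15576.8 },
	{ "24 apps", "anon_vma",       33280,   2648.12 },
};

/* Print one line per row with the recomputed change, one decimal place. */
static void print_change_column(void)
{
	for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		printf("%-8s %-15s %+6.1f%%\n", rows[i].phase, rows[i].object,
		       pct_change(rows[i].before_kb, rows[i].after_kb));
}
```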
>
> Simple fork microbenchmarks also show a slight improvement in fork
> performance, since child VMAs do not need to allocate anon_vma
> structures during fork.
>
> Feedback and suggestions are welcome.
>
>
> tao (15):
> mm/rmap: introduce anon_rmap APIs for anonymous folios
> mm: convert anon_vma rmap APIs to anon_rmap
> mm: introduce anon_vma_tree_t for multiple anon_vma topologies
> mm: switch to anon_vma_tree_t APIs in preparation for ANON_VMA_LAZY
> mm: add CONFIG_ANON_VMA_LAZY and folio helpers
> mm: add CONFIG_VMA_REF and VMA helpers
> mm: replace direct FOLIO_MAPPING_ANON usage with helpers
> mm: prepare rmap infrastructure for ANON_VMA_LAZY
> mm: implement ANON_VMA_LAZY rmap semantics
> mm: defer anon_vma creation with ANON_VMA_LAZY
> mm: handle ANON_VMA_LAZY in huge page operations
> mm: handle ANON_VMA_LAZY during migration
> mm: support setup and upgrade of ANON_VMA_LAZY folios
> mm: support merging of ANON_VMA_LAZY VMAs
> mm: enable CONFIG_ANON_VMA_LAZY on arm64 and x86_64
>
> arch/arm64/Kconfig | 1 +
> arch/x86/Kconfig | 1 +
> fs/proc/page.c | 6 +-
> include/linux/mm.h | 38 ++
> include/linux/mm_types.h | 9 +-
> include/linux/page-flags.h | 34 +-
> include/linux/pagemap.h | 2 +-
> include/linux/rmap.h | 165 ++++++++-
> mm/Kconfig | 22 ++
> mm/damon/ops-common.c | 4 +-
> mm/debug.c | 2 +-
> mm/debug_vm_pgtable.c | 2 +-
> mm/gup.c | 6 +-
> mm/huge_memory.c | 16 +-
> mm/internal.h | 171 +++++++++
> mm/khugepaged.c | 13 +-
> mm/ksm.c | 43 ++-
> mm/memory-failure.c | 11 +-
> mm/memory.c | 19 +-
> mm/migrate.c | 126 ++++---
> mm/mmap.c | 15 +-
> mm/mremap.c | 4 +-
> mm/page_idle.c | 2 +-
> mm/rmap.c | 690 ++++++++++++++++++++++++++++++++++---
> mm/vma.c | 76 ++--
> mm/vma.h | 4 +-
> mm/vma_exec.c | 2 +-
> mm/vma_init.c | 1 +
> 28 files changed, 1279 insertions(+), 206 deletions(-)
Hi!
When I saw the diffstat I was concerned. Going through the patches made me ...
more concerned :)
This is a lot of complexity. On top of something that is already so complicated
that I fail to grasp most details without regularly taking a look at the nice
figures Lorenzo created recently.
For example, I read above "since child VMAs do not need to allocate anon_vma"
and wondered how that could be part of something that is just done lazily. Then
I had to learn in the patches that there is some additional "Child VMAs
are created as ANON_VMA_TREE_PARENT and do not allocate anon_vma" -- excuse me,
what? :)
Reading about VMA refcounts made me shiver. Reading "Holding only
folio_lock(folio) cannot guarantee that the split
operation completes atomically." confused me. Learning that we have to invent
interesting ways to make page migration mutually exclusive to free_pgtables()
concerned me. Figuring out that there are arch-specific config options and
runtime toggles is a clear warning sign.
Seeing test_folio_unmapped() was funny, though (why?! :)).
I think this patch set has a noble goal of reducing anon_vma overhead when anon
pages are not shared during fork. However, using anon_vma for them actually
makes the overall implementation (e.g., rmap walks, locking) more consistent and
simpler.
Even if we could be convinced that most of this here is correct, how should we
reasonably maintain this increasing level of complexity here?
I won't echo what has already been said in this thread (and I didn't manage to
read all, unfortunately), but for such big and invasive work it's often best to
get in touch with the community earlier. Otherwise, you might end up wasting
your time.
Ok, arguably, someone who writes that code learns a lot on the way. And if this
code really was written by one developer only, I tip my hat! I'd be curious if
that code already ran somewhere on some Android kernel out there?
But adding more complexity on top of something that's already extremely
complicated to save some memory looks like the wrong direction, really.
I was excited when Lorenzo started working on a completely new approach that
would focus on improving the common cases while trying to reduce the overall
complexity. Because I think most of us really dislike anon_vma. It's still work
in progress, and I am sure there are some rough edges.
But fundamentally, I think we want to find a new design that is just naturally
simpler.
Lorenzo has been hard at work exploring various design options (and I'm afraid
he might be one of the 3 people on this planet that understand anon_vma in full
detail), so I suggest we wait for a redesign proposal from him and see if that
is doable?
--
Cheers,
David
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-03 20:25 ` David Hildenbrand (Arm)
@ 2026-06-03 22:14 ` Barry Song
2026-06-04 4:03 ` wangtao
2026-06-04 3:10 ` xu.xin16
2026-06-04 9:40 ` Lorenzo Stoakes
2 siblings, 1 reply; 64+ messages in thread
From: Barry Song @ 2026-06-03 22:14 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: tao, catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86,
akpm, willy, sj, kees, luizcap, zhangjiao2, kas, ljs, hpa, liam,
vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh, jgg,
jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, lance.yang, xu.xin16, chengming.zhou, nao.horiguchi,
matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
ying.huang, apopple, pfalcato, linux-arm-kernel, linux-kernel,
linux-fsdevel, linux-mm, damon, shakeel.butt, ryncsn, jparsana,
dvander, zhangji1, wangzicheng
On Thu, Jun 4, 2026 at 4:25 AM David Hildenbrand (Arm) <david@kernel.org> wrote:
[...]
> Hi!
>
> When I saw the diffstat I was concerned. Going through the patches made me ...
> more concerned :)
>
> This is a lot of complexity. On top of something that is already so complicated
> that I fail to grasp most details without regularly taking a look at the nice
> figures Lorenzo created recently.
>
> For example, I read above "since child VMAs do not need to allocate anon_vma"
> and wondered how that could be part of something that is just done lazily. Then
> I had to learn in the patches that there is some additional "Child VMAs
> are created as ANON_VMA_TREE_PARENT and do not allocate anon_vma" -- excuse me,
> what? :)
Yes, that part is quite complicated here. There are two cases here:
1. A forks B, and B inherits a VMA from A. In this case,
B's VMA gets ANON_VMA_TREE_PARENT.
2. A forks B, and B later creates a new VMA via mmap().
If a page fault occurs in this new VMA, it gets
ANON_VMA_TREE_VMA.
In both cases, we need to upgrade B to a regular anon_vma
when B becomes a parent and performs a fork().
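The two cases above can be condensed into a tiny userspace state model. This is a sketch, not code from the series: only ANON_VMA_TREE_VMA and ANON_VMA_TREE_PARENT come from the thread, while TREE_NONE, TREE_ANON_VMA, model_fault() and model_fork_vma() are hypothetical, and it assumes that the forking side is always upgraded to a regular anon_vma before its VMA is duplicated.

```c
#include <assert.h>

/*
 * Per-VMA topology states. ANON_VMA_TREE_VMA and ANON_VMA_TREE_PARENT are
 * the names used in the thread; TREE_NONE and TREE_ANON_VMA are
 * hypothetical labels for this model only.
 */
enum lazy_tree_kind {
	TREE_NONE,		/* no anonymous page faulted in yet */
	ANON_VMA_TREE_VMA,	/* lazy: rmap resolves through the VMA itself */
	ANON_VMA_TREE_PARENT,	/* inherited at fork, no anon_vma of its own */
	TREE_ANON_VMA,		/* upgraded to a regular anon_vma */
};

struct lazy_vma_state {
	enum lazy_tree_kind kind;
};

/* Case 2: first anonymous fault in a VMA that B mmap()ed after fork(). */
static void model_fault(struct lazy_vma_state *vma)
{
	if (vma->kind == TREE_NONE)
		vma->kind = ANON_VMA_TREE_VMA;
}

/*
 * The process owning @parent forks. Assumption: any lazy state is upgraded
 * to a regular anon_vma first, and the child's copy starts out as
 * ANON_VMA_TREE_PARENT (case 1).
 */
static void model_fork_vma(struct lazy_vma_state *parent,
			   struct lazy_vma_state *child)
{
	assert(child->kind == TREE_NONE);	/* child VMA is freshly created */
	if (parent->kind == TREE_NONE)
		return;				/* nothing anonymous to share */
	parent->kind = TREE_ANON_VMA;
	child->kind = ANON_VMA_TREE_PARENT;
}
```

In this model, B only pays for an anon_vma when model_fork_vma() is called with one of B's VMAs as the parent, which is the "B never forks" saving discussed below.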
This may be a bit off-topic, but I'm also considering
whether there is a chance to work with Suren to support
case 2 via a GKI hook in the Android kernel before
Lorenzo's work is ready.
Even then, the optimization would apply only to the case
where B never forks, allowing us to skip the anon_vma
"upgrade" entirely. That assumption holds for most
applications, although there are a few cases where it does
not.
I'm actually hoping Android could eventually disable
forking for UI applications altogether. From what I've
heard, some applications use fork() primarily to evade
LMKD (the Android low-memory killer daemon). For
example, a child process may monitor the main process,
and if the main process is killed, detect that event and
request a relaunch. This is one way some applications
attempt to keep themselves alive indefinitely.
But even if we limit the optimization to the subset of
case 2 where B never forks, we still need to handle
mremap(), VMA merges, VMA splits, and similar cases.
That starts to become quite a headache.
So please just ignore my rambling if it turns out to be
nonsense :-)
>
> Reading about VMA refcounts made me shiver. Reading "Holding only
> folio_lock(folio) cannot guarantee that the split
> operation completes atomically." confused me. Learning that we have to invent
> interesting ways to make page migration mutually exclusive to free_pgtables()
> concerned me. Figuring out that there are arch-specific config options and
> runtime toggles is a clear warning sign.
>
> Seeing test_folio_unmapped() was funny, though (why?! :)).
>
> I think this patch set has a noble goal of reducing anon_vma overhead when anon
> pages are not shared during fork. However, using anon_vma for them actually
> makes the overall implementation (e.g., rmap walks, locking) more consistent and
> simpler.
>
> Even if we could be convinced that most of this here is correct, how should we
> reasonably maintain this increasing level of complexity here?
>
> I won't echo what has already been said in this thread (and I didn't manage to
> read all, unfortunately), but for such big and invasive work it's often best to
> get in touch with the community earlier. Otherwise, you might end up wasting
> your time.
>
> Ok, arguably, someone who writes that code learns a lot on the way. And if this
> code really was written by one developer only, I tip my hat! I'd be curious if
> that code already ran somewhere on some Android kernel out there?
I heard from Zicheng that they have been running this for
months and it seems reasonably stable. Please correct me if
I'm wrong, Zicheng :-). This really should have been
discussed with the community earlier.
>
> But adding more complexity on top of something that's already extremely
> complicated to save some memory looks like the wrong direction, really.
>
> I was excited when Lorenzo started working on a completely new approach that
> would focus on improving the common cases while trying to reduce the overall
> complexity. Because I think most of us really dislike anon_vma. It's still work
> in progress, and I am sure there are some rough edges.
>
> But fundamentally, I think we want to find a new design that is just naturally
> simpler.
>
+1
> Lorenzo has been hard at work exploring various design options (and I'm afraid
> he might be one of the 3 people on this planet that understand anon_vma in full
> detail), so I suggest we wait for a redesign proposal from him and see if that
> is doable?
>
Thanks
Barry
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-03 20:25 ` David Hildenbrand (Arm)
2026-06-03 22:14 ` Barry Song
@ 2026-06-04 3:10 ` xu.xin16
2026-06-04 4:10 ` wangtao
2026-06-04 9:40 ` Lorenzo Stoakes
2 siblings, 1 reply; 64+ messages in thread
From: xu.xin16 @ 2026-06-04 3:10 UTC (permalink / raw)
To: david, tao.wangtao
Cc: tao.wangtao, catalin.marinas, will, tglx, mingo, bp, dave.hansen,
x86, akpm, willy, sj, kees, luizcap, zhangjiao2, kas, ljs, hpa,
liam, vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh, jgg,
jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, chengming.zhou, nao.horiguchi,
matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
ying.huang, apopple, pfalcato, linux-arm-kernel, linux-kernel,
linux-fsdevel, linux-mm, damon, shakeel.butt, ryncsn, dvander,
wangzicheng
> Hi!
>
> When I saw the diffstat I was concerned. Going through the patches made me ...
> more concerned :)
>
> This is a lot of complexity. On top of something that is already so complicated
> that I fail to grasp most details without regularly taking a look at the nice
> figures Lorenzo created recently.
>
> For example, I read above "since child VMAs do not need to allocate anon_vma"
> and wondered how that could be part of something that is just done lazily. Then
> I had to learn in the patches that there is some additional "Child VMAs
> are created as ANON_VMA_TREE_PARENT and do not allocate anon_vma" -- excuse me,
> what? :)
>
> Reading about VMA refcounts made me shiver. Reading "Holding only
> folio_lock(folio) cannot guarantee that the split
> operation completes atomically." confused me. Learning that we have to invent
> interesting ways to make page migration mutually exclusive to free_pgtables()
> concerned me. Figuring out that there are arch-specific config options and
> runtime toggles is a clear warning sign.
>
> Seeing test_folio_unmapped() was funny, though (why?! :)).
>
> I think this patch set has a noble goal of reducing anon_vma overhead when anon
> pages are not shared during fork. However, using anon_vma for them actually
> makes the overall implementation (e.g., rmap walks, locking) more consistent and
> simpler.
>
> Even if we could be convinced that most of this here is correct, how should we
> reasonably maintain this increasing level of complexity here?
Indeed, it's very complex, and having the changes of 15 patches scattered across
various subsystems is really frustrating for reviewers. It took me a whole day to
read through the entire patch set; spreading the changes this way makes an already
complicated matter even more complex (and maintaining such code in the future will be a pain).
However, overall, I think the original intention behind Tao's patch is innovative
and valuable, and Tao could definitely make this patch set simpler and more
readable, because the core changes actually start from PATCH 10.
I believe that if Tao had done the following, things might have gone more smoothly and the
series would have been easier to review. In fact, as I understand it, the motivation behind the
patch is quite simple at its core (avoid allocating the anon_vma structure when a VMA has not
actually been forked, and instead put the VMA information directly into folio->mapping):
1) You could actually simplify your patch significantly — without adding a lot of wrappers
and helper functions that introduce extra review overhead — and keep only the most essential elements.
2) Provide complete test code (in tools/testing/selftests) that covers the affected functionality,
such as VMA handling, huge pages, KSM, etc.
3) Use the RFC tag to start a discussion.
I would be very glad to see Tao post a simpler v2 that does not alter the rmap core data
structures too much and does not introduce excessive complexity, regardless of whether it is
eventually merged.
* RE: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-03 11:53 ` Lorenzo Stoakes
@ 2026-06-04 3:50 ` wangtao
0 siblings, 0 replies; 64+ messages in thread
From: wangtao @ 2026-06-04 3:50 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Harry Yoo, catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, hpa@zytor.com,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, jannh@google.com, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, ziy@nvidia.com,
baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org,
lance.yang@linux.dev, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, nao.horiguchi@gmail.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
pfalcato@suse.de, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, 21cnbao@gmail.com, jparsana@google.com,
dvander@google.com, zhangji, wangzicheng
>
> Thanks for your replies, but I really have to stop doing deeper analyses like
> these for time management purposes.
Of course I will respond to technical discussions.
>
> I did this more so to make the point from [0] as to why, in lower trust
> environments, this is just not feasible.
>
> We could loop around for hours and hours and hours here.
>
> In general as before, even if all worked perfectly (I'm very much not at all
> convinced), extending anon_vma and pinning VMAs is simply a no-go for
> architectural and complexity reasons.
>
> I also find the locking story dubious and the lack of tests or anything
> corroborating correctness is additionally fatal.
>
During rmap, the anon_vma provides a superset of the VMAs that may map the
folio. We first filter them with vma_address(), and then in each rmap_one()
we further check, through page_vma_mapped_walk() and check_pte(), whether
the VMA actually needs to be processed.
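The filtering step described above (an over-approximate candidate set narrowed by vma_address()) can be modelled in userspace C. This is a hypothetical sketch: model_vma_address() only mirrors the single-page arithmetic of the kernel's vma_address(), the struct and function names are illustrative, 4 KiB pages are assumed, and the page-table check done inside rmap_one() is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

struct cand_vma {
	unsigned long vm_start;	/* inclusive */
	unsigned long vm_end;	/* exclusive */
	unsigned long vm_pgoff;	/* page offset mapped at vm_start */
};

/* vma_address()-style arithmetic for one page (nr_pages == 1). */
static bool model_vma_address(const struct cand_vma *vma, unsigned long pgoff,
			      unsigned long *addr)
{
	unsigned long a;

	assert(vma->vm_start < vma->vm_end);
	if (pgoff < vma->vm_pgoff)
		return false;
	a = vma->vm_start + ((pgoff - vma->vm_pgoff) << MODEL_PAGE_SHIFT);
	if (a < vma->vm_start || a >= vma->vm_end)	/* wrapped or beyond */
		return false;
	*addr = a;
	return true;
}

/*
 * Narrow a superset of candidate VMAs (what an anon_vma hands back) to the
 * ones whose range can contain @pgoff, collecting their addresses. A real
 * walk would then call rmap_one() on each, where page_vma_mapped_walk()
 * decides whether the page is really mapped there.
 */
static size_t model_filter_candidates(const struct cand_vma *cands, size_t n,
				      unsigned long pgoff, unsigned long *addrs)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long addr;

		if (model_vma_address(&cands[i], pgoff, &addr))
			addrs[hits++] = addr;
	}
	return hits;
}
```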
With ANON_VMA_LAZY, the lazy mapping provides only one VMA: if there is no
fork or mremap, that single VMA is sufficient. To avoid taking folio_lock
during fork and mremap, we retry once after anon_walk_anon if
folio->mapping has been upgraded to an anon_vma.
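A rough model of the retry-once control flow, assuming folio->mapping is a tagged pointer whose low bits say whether it refers to a lazy VMA or to an anon_vma. The MODEL_MAPPING_LAZY bit and every name below are hypothetical, not the series' actual encoding; the sketch only shows the control flow and does not argue that the scheme is race-free, which is exactly what is questioned in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Tag bits in the low bits of folio->mapping. MODEL_MAPPING_ANON plays the
 * role of the kernel's anon tag; MODEL_MAPPING_LAZY is a hypothetical tag
 * meaning "points at a VMA, not at an anon_vma".
 */
#define MODEL_MAPPING_ANON	0x1UL
#define MODEL_MAPPING_LAZY	0x2UL
#define MODEL_MAPPING_MASK	0x3UL

struct model_folio {
	_Atomic uintptr_t mapping;
};

enum walk_path { WALKED_ANON_VMA, WALKED_LAZY, WALKED_LAZY_THEN_ANON_VMA };

typedef void (*lazy_walk_fn)(struct model_folio *folio, uintptr_t vma);

static int mapping_is_lazy(uintptr_t m)
{
	return (m & MODEL_MAPPING_MASK) == (MODEL_MAPPING_ANON | MODEL_MAPPING_LAZY);
}

/*
 * Walk via the lazy VMA without any anon_vma lock; if fork()/mremap()
 * upgraded folio->mapping to an anon_vma meanwhile, retry once through the
 * anon_vma (that walk itself is elided here).
 */
static enum walk_path model_rmap_walk_anon(struct model_folio *folio,
					   lazy_walk_fn walk_lazy)
{
	uintptr_t m = atomic_load_explicit(&folio->mapping, memory_order_acquire);

	if (!mapping_is_lazy(m))
		return WALKED_ANON_VMA;
	walk_lazy(folio, m & ~MODEL_MAPPING_MASK);
	m = atomic_load_explicit(&folio->mapping, memory_order_acquire);
	return mapping_is_lazy(m) ? WALKED_LAZY : WALKED_LAZY_THEN_ANON_VMA;
}

/* Lazy walker with no interference. */
static void lazy_walk_quiet(struct model_folio *folio, uintptr_t vma)
{
	(void)folio;
	assert(vma != 0);
}

/* Lazy walker that simulates a concurrent fork() upgrading the folio. */
static void lazy_walk_racing_upgrade(struct model_folio *folio, uintptr_t vma)
{
	assert(vma != 0);
	/* 0x1000 stands in for the address of a freshly allocated anon_vma. */
	atomic_store_explicit(&folio->mapping, 0x1000 | MODEL_MAPPING_ANON,
			      memory_order_release);
}
```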
If your concern is about the lack of locking during rmap, you could
also refer to folio_wait_table and add a set of anon_vma_locks. That
was how I handled it during my initial debugging. Later, after
reviewing the code flow, I found that the lock might not be necessary,
so I removed it.
> And finally, I was already working on a replacement for anon_vma, and the
> generally done thing in these situations is for my work to take precedence.
>
> So I'm going to bail out on further deeper analyses here as otherwise I simply
> can't work on anything else :)
>
> Thanks, Lorenzo
>
> [0]:https://lore.kernel.org/all/ah887A5VkXOcmq-g@lucifer/
>
>
> On Wed, Jun 03, 2026 at 11:05:28AM +0000, wangtao wrote:
> > > >
> > >
> > > Against my better judgment I'll address the stuff here...
> > >
> > > > VMA operations can be roughly divided into three categories. The
> > > > handling of ANON_VMA_LAZY is briefly described below.
> > >
> > > I don't agree, there are plenty more VMA operations. But with
> > > respect to anon rmap there are:
> > >
> > > - fork
> > > - merge/split
> > > - remap
> > >
> >
> > Yes, these are the three categories. I originally intended to explain
> > them by classifying them by system call; I should have said mremap
> > instead of move_vma.
>
> I don't think you mentioned move_vma()? Maybe I missed it.
>
> The categorisation is most usefully based on callers of anon_vma_clone().
>
> >
> > > Your approach seems to completely ignore VMA split and the need to
> > > maintain an interval tree to _multiple_ VMAs from a single anon_vma.
> > >
> >
> > The folio uses vma->root_vma to compute folio_address. A VMA split
> > from it, vma_a, also uses vma_a->root_vma = vma->root_vma to compute
> > folio_address.
> > During rmap, once folio_address is obtained, the VMA can be found
> > through mm_mt. Without fork, there is no need to maintain the interval
> > tree.
>
> Well you need to search for every possible split VMA in mm_mt now, so you
> have to go page-by-page searching for each page for the rmap walked range.
>
ANON_VMA_LAZY has only one VMA. When I first looked at
rmap_walk_ksm, I also thought it would need to search page by page,
which seemed unacceptable. Later I realized that it only needs to
check whether this VMA falls within the rmap walk range.
@@ -3173,20 +3171,20 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
- anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
+ anon_rmap_foreach_vma(vma, vmac, anon_rmap,
0, ULONG_MAX) {
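The single-VMA check can be expressed as the same overlap predicate the anon_vma interval tree applies to each node: a VMA covers page offsets [vm_pgoff, vm_pgoff + vma_pages - 1]. A hypothetical userspace sketch (names are illustrative, 4 KiB pages assumed):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

struct walk_vma {
	unsigned long vm_start;	/* first address, inclusive */
	unsigned long vm_end;	/* last address, exclusive */
	unsigned long vm_pgoff;	/* page offset mapped at vm_start */
};

static unsigned long walk_vma_pages(const struct walk_vma *vma)
{
	assert(vma->vm_end > vma->vm_start);
	return (vma->vm_end - vma->vm_start) >> SKETCH_PAGE_SHIFT;
}

/*
 * Does the VMA's page-offset range intersect [pgoff_start, pgoff_last]?
 * With a single lazy VMA this one test stands in for the interval-tree
 * lookup; a walk over 0..ULONG_MAX (as in rmap_walk_ksm) always matches.
 */
static bool lazy_vma_in_walk_range(const struct walk_vma *vma,
				   unsigned long pgoff_start,
				   unsigned long pgoff_last)
{
	unsigned long first = vma->vm_pgoff;
	unsigned long last = first + walk_vma_pages(vma) - 1;

	return first <= pgoff_last && pgoff_start <= last;
}
```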
> You're also potentially racing against a remap, as you say below you don't
> folio lock on remap so concurrent rmap walkers can be present, the VMA can
> already be copied.
>
> We already have VMA lifecycle state around detached VMAs, so a VMA
> could be in a detached state, assumed by the existing logic to be entirely
> unavailable for use, out of the maple tree altogether but kept around in a
> zombie state.
>
> We'd then have lifecycle issues and races and edge cases around process
> teardown otherwise we might leak memory.
>
> Also, presumably you set vma->anon_vma to some lazy sentinel value so
> that mremap doesn't change vma->vm_pgoff when unfaulted?
>
> You would need to update any path that manipulates vma->anon_vma also
> so it doesn't incorrectly dereference it.
>
Yes, most of the code in this patch series is intended to prevent
incorrect dereferencing of anon_vma. If we assume it will not be
misused, some of the code could be simplified or removed.
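One way to keep existing "is vma->anon_vma set?" checks working while forbidding dereferences is a sentinel pointer plus an accessor, roughly what Lorenzo guesses at above. This is a hypothetical sketch; ANON_VMA_LAZY_SENTINEL and all helpers below are illustrative and not taken from the series.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_anon_vma {
	int refcount;
};

/*
 * Hypothetical sentinel meaning "anonymous pages exist but no anon_vma has
 * been allocated yet". It is never a valid object and must not be
 * dereferenced.
 */
#define ANON_VMA_LAZY_SENTINEL ((struct sketch_anon_vma *)0x1)

struct sketch_vma {
	struct sketch_anon_vma *anon_vma;
};

/*
 * Callers that only ask "has this VMA been faulted?" keep testing for
 * non-NULL, e.g. an mremap path that would otherwise treat the VMA as
 * unfaulted and change vm_pgoff.
 */
static int vma_has_anon_pages(const struct sketch_vma *vma)
{
	return vma->anon_vma != NULL;
}

/* Every path that dereferences must go through this accessor. */
static struct sketch_anon_vma *vma_real_anon_vma(const struct sketch_vma *vma)
{
	return vma->anon_vma == ANON_VMA_LAZY_SENTINEL ? NULL : vma->anon_vma;
}

static void sketch_anon_vma_get(struct sketch_anon_vma *av)
{
	assert(av != NULL && av != ANON_VMA_LAZY_SENTINEL);
	av->refcount++;
}
```

The cost Lorenzo points out is visible even here: every existing dereference site has to be audited and switched to the accessor.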
> >
> > > You may also actually split a VMA against a single large folio
> > > (waiting on the deferred shrinker) and have a SINGLE _leaf_
> > > anonymous folio that is mapped in two places.
> > >
> > > The lazy approach doesn't seem to address this properly. And fatally
> > > it ties an actual VMA afaict to the folio and has to implement a VMA
> > > reference count mechanism which interferes with the ordinary VMA
> > > lifecycle to do it.
> > >
> > > The fact of us taking advantage of most stuff being AnonExclusive, i.e.
> > > 'leaves' is something that my approach is exactly taking into account.
> > >
> > > Of course also extending anon_vma is a real non-starter.
> > >
> > > Also the below + the series ignores MAP_PRIVATE file-backed mappings
> > > which is a pretty fatal flaw.
> > >
> > > It also, as Harry says, has zero description of correctness in a way
> > > we'd want and no tests.
> > >
> >
> > It can correctly handle the case where a VMA is split within a large
> > page. The address of a sub_page in the split VMA (vma_a or vma_b) is
> > computed using the following method.
> >
> > For COW anonymous pages originating from file VMAs, the page/folio
> > address is also computed using the same method.
> >
> >   subpage_address = vma_address(vma_a, subpage_pgoff, 1)
> >     = vma_a->vm_start + (subpage_pgoff - vma_a->vm_pgoff) * PAGE_SIZE
> >     = vma_a->vm_start - vma_a->vm_pgoff * PAGE_SIZE + subpage_pgoff * PAGE_SIZE
> >     = vma_mapping_base(vma_a) + subpage_pgoff * PAGE_SIZE
> >     = vma_mapping_base(root_vma) + subpage_pgoff * PAGE_SIZE
>
> OK but you want to walk entries in a _range_ in the interval tree.
>
> So you are then now looking up VMAs (in a racey way) using mm_mt (which
> is the whole basis of my work actually) which could change under you.
>
> I guess what you're doing is using the pinned 'root' VMA as the basis of
> everything, and the second a VMA is moved you (somehow) walk the page
> tables to update the folio->mapping.
>
> Again pinning the VMA like this and putting it in a folio is really not something
> we want to do.
>
> It adds a ton of complexity and also impacts VMA lifecycle which is already
> fairly fraught.
>
> It makes the VMA no longer just a VMA but rather also a 'memory' of where
> something was first faulted in as a hack more or less.
>
Maybe you're right. The mm, mm_mt, VMAs and page tables each play their own
role in implementing virtual memory; considering them together might lead
to better ideas.
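The address derivation quoted above relies on vm_start - vm_pgoff * PAGE_SIZE being preserved when a VMA is split. A small userspace check of that invariant: vma_mapping_base() follows the definition used in the derivation, split_vma_model() is a hypothetical stand-in for the kernel's split (which advances the tail's vm_pgoff by the split distance), and 4 KiB pages are assumed.

```c
#include <assert.h>

#define DERIV_PAGE_SHIFT 12			/* assumption: 4 KiB pages */
#define DERIV_PAGE_SIZE (1UL << DERIV_PAGE_SHIFT)

struct deriv_vma {
	unsigned long vm_start, vm_end, vm_pgoff;
};

/* vm_start - vm_pgoff * PAGE_SIZE, in wrapping unsigned arithmetic. */
static unsigned long vma_mapping_base(const struct deriv_vma *vma)
{
	return vma->vm_start - vma->vm_pgoff * DERIV_PAGE_SIZE;
}

/* The vma_address() arithmetic for one page at @pgoff. */
static unsigned long subpage_address(const struct deriv_vma *vma,
				     unsigned long pgoff)
{
	return vma->vm_start + (pgoff - vma->vm_pgoff) * DERIV_PAGE_SIZE;
}

/*
 * Split @vma at @addr: @vma keeps the head, @tail gets the rest with its
 * vm_pgoff advanced by the number of pages skipped, so the base is kept.
 */
static void split_vma_model(struct deriv_vma *vma, unsigned long addr,
			    struct deriv_vma *tail)
{
	assert(addr > vma->vm_start && addr < vma->vm_end);
	assert((addr & (DERIV_PAGE_SIZE - 1)) == 0);
	tail->vm_start = addr;
	tail->vm_end = vma->vm_end;
	tail->vm_pgoff = vma->vm_pgoff +
			 ((addr - vma->vm_start) >> DERIV_PAGE_SHIFT);
	vma->vm_end = addr;
}
```

Because both halves share the same base, any sub-page address can be recomputed from the root VMA alone, which is the step the derivation uses; it says nothing about the lifecycle concerns raised about pinning that root VMA.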
* RE: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-03 22:14 ` Barry Song
@ 2026-06-04 4:03 ` wangtao
2026-06-04 4:20 ` Barry Song
0 siblings, 1 reply; 64+ messages in thread
From: wangtao @ 2026-06-04 4:03 UTC (permalink / raw)
To: Barry Song, David Hildenbrand (Arm)
Cc: catalin.marinas@arm.com, will@kernel.org, tglx@kernel.org,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
x86@kernel.org, akpm@linux-foundation.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, ljs@kernel.org,
hpa@zytor.com, liam@infradead.org, vbabka@kernel.org,
rppt@kernel.org, surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, harry@kernel.org, jannh@google.com,
jgg@ziepe.ca, jhubbard@nvidia.com, peterx@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, lance.yang@linux.dev,
xu.xin16@zte.com.cn, chengming.zhou@linux.dev,
nao.horiguchi@gmail.com, matthew.brost@intel.com,
joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
gourry@gourry.net, ying.huang@linux.alibaba.com,
apopple@nvidia.com, pfalcato@suse.de,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com,
zhangji, wangzicheng
> >
> > Hi!
> >
> > When I saw the diffstat I was concerned. Going through the patches made
> me ...
> > more concerned :)
> >
> > This is a lot of complexity. On top of something that is already so
> > complicated that I fail to grasp most details without regularly taking
> > a look at the nice figures Lorenzo created recently.
> >
> > For example, I read above "since child VMAs do not need to allocate
> anon_vma"
> > and wondered how that could be part of something that is just done
> > lazily. Then I had to learn in the patches that there is some
> > additional "Child VMAs are created as ANON_VMA_TREE_PARENT and do
> not
> > allocate anon_vma" -- excuse me, what? :)
>
> Yes, that part is quite complicated here. There are two cases here:
> 1. A forks B, and B inherits a VMA from A. In this case, B's VMA gets
> ANON_VMA_TREE_PARENT.
>
> 2. A forks B, and B later creates a new VMA via mmap().
> If a page fault occurs in this new VMA, it gets ANON_VMA_TREE_VMA.
>
> In both cases, we need to upgrade B to a regular anon_vma when B becomes
> a parent and performs a fork().
>
Thank you for helping to explain this.
> This may be a bit off-topic, but I'm also considering whether there is a chance
> to work with Suren to support case 2 via a GKI hook in the Android kernel
> before Lorenzo's work is ready.
>
> Even then, the optimization would apply only to the case where B never
> forks, allowing us to skip the anon_vma "upgrade" entirely. That assumption
> holds for most applications, although there are a few cases where it does not.
>
> I'm actually hoping Android could eventually disable forking for UI
> applications altogether. From what I've heard, some applications use fork()
> primarily to evade LMKD (the Android low-memory killer daemon). For
> example, a child process may monitor the main process, and if the main
> process is killed, detect that event and request a relaunch. This is one way
> some applications attempt to keep themselves alive indefinitely.
>
> But even if we limit the optimization to the subset of case 2 where B never
> forks, we still need to handle mremap(), VMA merges, VMA splits, and
> similar cases.
> That starts to become quite a headache.
>
> So please just ignore my rambling if it turns out to be nonsense :-)
>
> >
> > Reading about VMA refcounts made me shiver. Reading "Holding only
> > folio_lock(folio) cannot guarantee that the split operation completes
> > atomically." confused me. Learning that we have to invent interesting
> > ways to make page migration mutually exclusive to free_pgtables()
> > concerned me. Figuring out that there are arch-specific config options
> > and runtime toggles is a clear warning sign.
> >
> > Seeing test_folio_unmapped() was funny, though (why?! :)).
> >
> > I think this patch set has a noble goal of reducing anon_vma overhead
> > when anon pages are not shared during fork. However, using anon_vma
> > for them actually makes the overall implementation (e.g., rmap walks,
> > locking) more consistent and simpler.
> >
> > Even if we could be convinced that most of this here is correct, how
> > should we reasonably maintain this increasing level of complexity here?
> >
> > I won't echo what has already been said in this thread (and I didn't
> > manage to read all, unfortunately), but for such big and invasive work
> > it's often best to get in touch with the community earlier. Otherwise,
> > you might end up wasting your time.
> >
> > Ok, arguably, someone who writes that code learns a lot on the way.
> > And if this code really was written by one developer only, I tip my
> > hat! I'd be curious if that code already ran somewhere on some Android
> kernel out there?
>
> I heard from Zicheng that they have been running this for months and it
> seems reasonably stable. Please correct me if I'm wrong, Zicheng :-). This
> really should have been discussed with the community earlier.
>
I initially developed and debugged this based on the Android GKI branch
and did some preliminary testing on an Android phone.
Since GKI generally only accepts features merged from the upstream
community, and this memory saving could also benefit the community, I
ported the patch to the Linux master branch.
Because my English is not very good and I rarely participate in the
community, I was not familiar with the community workflow and did not send
an RFC for discussion in advance. I apologize again.
> >
> > But adding more complexity on top of something that's already
> > extremely complicated to save some memory looks like the wrong
> direction, really.
> >
> > I was excited when Lorenzo started working on a completely new
> > approach that would focus on improving the common cases while trying
> > to reduce the overall complexity. Because I think most of us really
> > dislike anon_vma. It's still work in progress, and I am sure there are some
> rough edges.
> >
> > But fundamentally, I think we want to find a new design that is just
> > naturally simpler.
> >
>
> +1
>
> > Lorenzo has been hard at work exploring various design options (and
> > I'm afraid he might be one of the 3 people on this planet that
> > understand anon_vma in full detail), so I suggest we wait for a
> > redesign proposal from him and see if that is doable?
> >
>
> Thanks
> Barry
* RE: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-04 3:10 ` xu.xin16
@ 2026-06-04 4:10 ` wangtao
0 siblings, 0 replies; 64+ messages in thread
From: wangtao @ 2026-06-04 4:10 UTC (permalink / raw)
To: xu.xin16@zte.com.cn, david@kernel.org
Cc: catalin.marinas@arm.com, will@kernel.org, tglx@kernel.org,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
x86@kernel.org, akpm@linux-foundation.org, willy@infradead.org,
sj@kernel.org, kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, ljs@kernel.org,
hpa@zytor.com, liam@infradead.org, vbabka@kernel.org,
rppt@kernel.org, surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, harry@kernel.org, jannh@google.com,
jgg@ziepe.ca, jhubbard@nvidia.com, peterx@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org,
lance.yang@linux.dev, chengming.zhou@linux.dev,
nao.horiguchi@gmail.com, matthew.brost@intel.com,
joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
gourry@gourry.net, ying.huang@linux.alibaba.com,
apopple@nvidia.com, pfalcato@suse.de,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, dvander@google.com, wangzicheng
> >
> > Even if we could be convinced that most of this here is correct, how
> > should we reasonably maintain this increasing level of complexity here?
>
> Indeed, it's very complex, but having the changes of 15 patches scattered
> across various subsystems is really frustrating for reviewers. It took me a
> whole day to read through the entire patch set, which made an already
> complicated matter even more complex (maintaining such complex code in
> the future will be a pain).
>
> However, overall, I think the original intention behind Tao's patch is
> innovative and valuable, and Tao could definitely make this patch set simpler
> and more readable, because the core changes actually start from PATCH 10.
>
Yes, initially the series consisted essentially of patches 9, 10, and 13.
Because some of the fundamental code logic is hard to understand without
distinguishing between anon_rmap and the anon_vma topology, I added these
two logical layers when preparing the series for submission. However, this
also significantly increased the amount of code, which is not ideal.
> I believe that if Tao had done the following, things might have gone better
> and easier for reviewing. In fact, I understand the motivation behind the
> patch is quite simple at its core (just wanting to avoid allocating the
> anon_vma structure when a VMA hasn't been truly forked, and instead put
> the VMA information directly into folio->mapping):
>
> 1) You could actually simplify your patch significantly — without adding a lot
> of wrappers and helper functions that introduce extra review overhead —
> and keep only the most essential elements.
>
> 2) Provide complete test code (in tools/testing/selftests) that covers the
> affected functionality, such as VMA, huge pages, KSM, etc.
>
> 3) Use the RFC tag to start a discussion.
>
>
> I would be very glad to see Tao post a simpler v2 that does not alter the
> rmap core data structures too much and does not introduce excessive
> complexity, whether or not it is eventually merged.
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-04 4:03 ` wangtao
@ 2026-06-04 4:20 ` Barry Song
2026-06-04 7:35 ` wangtao
0 siblings, 1 reply; 64+ messages in thread
From: Barry Song @ 2026-06-04 4:20 UTC (permalink / raw)
To: wangtao
Cc: David Hildenbrand (Arm), catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, willy@infradead.org, sj@kernel.org,
kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, ljs@kernel.org,
hpa@zytor.com, liam@infradead.org, vbabka@kernel.org,
rppt@kernel.org, surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, harry@kernel.org, jannh@google.com,
jgg@ziepe.ca, jhubbard@nvidia.com, peterx@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, lance.yang@linux.dev,
xu.xin16@zte.com.cn, chengming.zhou@linux.dev,
nao.horiguchi@gmail.com, matthew.brost@intel.com,
joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
gourry@gourry.net, ying.huang@linux.alibaba.com,
apopple@nvidia.com, pfalcato@suse.de,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com,
zhangji, wangzicheng
On Thu, Jun 4, 2026 at 12:03 PM wangtao <tao.wangtao@honor.com> wrote:
[...]
> > > I won't echo what has already been said in this thread (and I didn't
> > > manage to read all, unfortunately), but for such big and invasive work
> > > it's often best to get in touch with the community earlier. Otherwise,
> > > you might end up wasting your time.
> > >
> > > Ok, arguably, someone who writes that code learns a lot on the way.
> > > And if this code really was written by one developer only, I tip my
> > > hat! I'd be curious if that code already ran somewhere on some Android
> > > kernel out there?
> >
> > I heard from Zicheng that they have been running this for months and it
> > seems reasonably stable. Please correct me if I'm wrong, Zicheng :-). This
> > really should have been discussed with the community earlier.
> >
> I initially developed and debugged this based on the Android GKI branch
> and did some preliminary testing on an Android phone.
>
> Since GKI generally only accepts features merged from the upstream
> community, and this memory saving could also benefit the community, I
> ported the patch to the Linux master branch.
>
> Because my English is not very good and I rarely participate in the
> community, I am not familiar with the community workflow. I did not send
> an email for discussion in advance with an RFC tag. I apologize again.
>
No worries. I know someone who has worked on the Linux
kernel for many, many years and has excellent kernel
expertise, yet has never submitted a patch throughout his
career for various reasons.
I heard from Zicheng that you are HONOR's key MM expert
and have been guiding them on memory-management related
work. That's really impressive. Personally, I'd love to
see more ideas and contributions from you in the linux-mm
community.
BTW, regarding my earlier suggestion about using GKI
hooks—limiting the optimization to newly created VMAs and
to applications that never call fork()—do you have any
ideas on what the smallest possible hook change would look
like?
Thanks
Barry
* RE: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-04 4:20 ` Barry Song
@ 2026-06-04 7:35 ` wangtao
0 siblings, 0 replies; 64+ messages in thread
From: wangtao @ 2026-06-04 7:35 UTC (permalink / raw)
To: Barry Song
Cc: David Hildenbrand (Arm), catalin.marinas@arm.com, will@kernel.org,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, willy@infradead.org, sj@kernel.org,
kees@kernel.org, luizcap@redhat.com,
zhangjiao2@cmss.chinamobile.com, kas@kernel.org, ljs@kernel.org,
hpa@zytor.com, liam@infradead.org, vbabka@kernel.org,
rppt@kernel.org, surenb@google.com, mhocko@suse.com, jack@suse.cz,
riel@surriel.com, harry@kernel.org, jannh@google.com,
jgg@ziepe.ca, jhubbard@nvidia.com, peterx@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, lance.yang@linux.dev,
xu.xin16@zte.com.cn, chengming.zhou@linux.dev,
nao.horiguchi@gmail.com, matthew.brost@intel.com,
joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
gourry@gourry.net, ying.huang@linux.alibaba.com,
apopple@nvidia.com, pfalcato@suse.de,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, damon@lists.linux.dev, shakeel.butt@linux.dev,
ryncsn@gmail.com, jparsana@google.com, dvander@google.com,
zhangji, wangzicheng
> [...]
> > > > I won't echo what has already been said in this thread (and I
> > > > didn't manage to read all, unfortunately), but for such big and
> > > > invasive work it's often best to get in touch with the community
> > > > earlier. Otherwise, you might end up wasting your time.
> > > >
> > > > Ok, arguably, someone who writes that code learns a lot on the way.
> > > > And if this code really was written by one developer only, I tip
> > > > my hat! I'd be curious if that code already ran somewhere on some
> > > > Android kernel out there?
> > >
> > > I heard from Zicheng that they have been running this for months and
> > > it seems reasonably stable. Please correct me if I'm wrong, Zicheng
> > > :-). This really should have been discussed with the community earlier.
> > >
> > I initially developed and debugged this based on the Android GKI
> > branch and did some preliminary testing on an Android phone.
> >
> > Since GKI generally only accepts features merged from the upstream
> > community, and this memory saving could also benefit the community, I
> > ported the patch to the Linux master branch.
> >
> > Because my English is not very good and I rarely participate in the
> > community, I am not familiar with the community workflow. I did not
> > send an email for discussion in advance with an RFC tag. I apologize again.
> >
>
> No worries. I know someone who has worked on the Linux kernel for many,
> many years and has excellent kernel expertise, yet has never submitted a
> patch throughout his career for various reasons.
>
> I heard from Zicheng that you are HONOR's key MM expert and have been
> guiding them on memory-management related work. That's really impressive.
> Personally, I'd love to see more ideas and contributions from you in the
> linux-mm community.
>
> BTW, regarding my earlier suggestion about using GKI hooks—limiting the
> optimization to newly created VMAs and to applications that never call
> fork()—do you have any ideas on what the smallest possible hook change
> would look like?
Even if Android does not need to handle 32-bit, KSM, or memory-failure,
a large number of hooks would still be required.
A different implementation approach would also be needed. For example,
one anon_vma_lock (or a set of them) could be used to mark whether a VMA
is lazy, so that we can avoid modifying every place that dereferences
anon_vma. Reserved fields in the VMA could be used to record the root_vma
and a reference count, respectively.
Roughly, hooks would be needed in the following places:
1. __vmf_anon_prepare: mark lazy_vma in the fault path.
2. __folio_set_anon and folio_move_anon_rmap: set folio->mapping to the lazy_vma.
3. rmap_walk_anon: handle rmap_walk for lazy_vma.
4. folio_get_anon_vma and put_anon_vma: add hooks to distinguish
lazy_vma handling.
5. anon_vma_fork: decide whether to upgrade pvma->anon_vma.
6. Add a hook in try_dup_anon_rmap to upgrade folio->mapping.
7. anon_vma_clone: handle links when src is a lazy_vma.
8. vm_area_alloc / vm_area_dup / vm_area_free: add VMA reference
counting.
9. mremap: handle upgrades in copy_vma and move_page_tables.
If this can be restricted to apps only, and those apps never call fork(),
items 5 and 6 could also be removed.
If the mapping of every page in the page tables is updated synchronously
when a lazy_vma is upgraded, items 4 and 5 could be merged into one.
If lazy_vma is restricted to anonymous VMAs where
vma_mapping_base(vma) == 0, the root_vma and the VMA reference count
could also be removed.
With this restriction, folio->mapping could record the mm directly and
the VMA could be located using folio->index, which would simplify this
patch series.
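The restricted scheme in the last paragraph (folio->mapping records the mm, and the VMA is found from folio->index) can be modelled in userspace C. This is a hypothetical sketch, not kernel code: struct mm, struct vma, struct folio, vma_lookup_sketch(), lazy_folio_set_anon() and lazy_folio_vma() are simplified stand-ins invented for illustration, MAPPING_ANON only mirrors the idea of an anon tag bit in folio->mapping, and vma_mapping_base() is assumed to mean vm_pgoff minus (vm_start >> PAGE_SHIFT). When that value is 0, folio->index equals the faulting address >> PAGE_SHIFT, so the mm plus the index is enough to find the VMA without an anon_vma.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
/* Mirrors the idea of an anon tag bit in the low bits of folio->mapping. */
#define MAPPING_ANON 0x1UL

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
struct vma {
	unsigned long vm_start;   /* inclusive */
	unsigned long vm_end;     /* exclusive */
	unsigned long vm_pgoff;   /* page offset used to compute folio->index */
	struct vma *next;         /* sorted list standing in for the maple tree */
};

struct mm {
	struct vma *vmas;
};

struct folio {
	uintptr_t mapping;        /* tagged pointer: mm | MAPPING_ANON */
	unsigned long index;
};

/* Assumed meaning of vma_mapping_base(): 0 when vm_pgoff == vm_start >> PAGE_SHIFT. */
long vma_mapping_base(const struct vma *v)
{
	return (long)v->vm_pgoff - (long)(v->vm_start >> PAGE_SHIFT);
}

unsigned long linear_page_index(const struct vma *v, unsigned long addr)
{
	return ((addr - v->vm_start) >> PAGE_SHIFT) + v->vm_pgoff;
}

/* Return the VMA containing addr, or NULL (stand-in for a maple tree lookup). */
struct vma *vma_lookup_sketch(const struct mm *mm, unsigned long addr)
{
	for (struct vma *v = mm->vmas; v; v = v->next)
		if (addr >= v->vm_start && addr < v->vm_end)
			return v;
	return NULL;
}

/* Record mm in folio->mapping; refuse VMAs whose mapping base is nonzero. */
int lazy_folio_set_anon(struct folio *f, struct mm *mm,
			const struct vma *v, unsigned long addr)
{
	if (vma_mapping_base(v) != 0)
		return 0;	/* caller would fall back to a real anon_vma */
	f->mapping = (uintptr_t)mm | MAPPING_ANON;
	f->index = linear_page_index(v, addr);
	return 1;
}

/* Recover the VMA (and address) of a lazy folio from mm + index alone. */
struct vma *lazy_folio_vma(const struct folio *f, unsigned long *addr)
{
	const struct mm *mm;
	struct vma *v;
	unsigned long a;

	if (!(f->mapping & MAPPING_ANON))
		return NULL;
	mm = (const struct mm *)(f->mapping & ~(uintptr_t)MAPPING_ANON);
	a = f->index << PAGE_SHIFT;	/* equals the address only if the base is 0 */
	v = vma_lookup_sketch(mm, a);
	if (!v || vma_mapping_base(v) != 0)
		return NULL;
	*addr = a;
	return v;
}
```

As far as we understand, an anonymous VMA moved by mremap() keeps a vm_pgoff derived from its old address, so its mapping base becomes nonzero and index << PAGE_SHIFT no longer points into it. That is why the sketch refuses such VMAs; a real implementation would presumably fall back to allocating an anon_vma for them.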
>
> Thanks
> Barry
* Re: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
2026-06-03 20:25 ` David Hildenbrand (Arm)
2026-06-03 22:14 ` Barry Song
2026-06-04 3:10 ` xu.xin16
@ 2026-06-04 9:40 ` Lorenzo Stoakes
2 siblings, 0 replies; 64+ messages in thread
From: Lorenzo Stoakes @ 2026-06-04 9:40 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: tao, catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86,
akpm, willy, sj, kees, luizcap, zhangjiao2, kas, hpa, liam,
vbabka, rppt, surenb, mhocko, jack, riel, harry, jannh, jgg,
jhubbard, peterx, ziy, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, xu.xin16, chengming.zhou,
nao.horiguchi, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, pfalcato, linux-arm-kernel,
linux-kernel, linux-fsdevel, linux-mm, damon, shakeel.butt,
ryncsn, 21cnbao, jparsana, dvander, zhangji1, wangzicheng
On Wed, Jun 03, 2026 at 10:25:05PM +0200, David Hildenbrand (Arm) wrote:
> I was excited when Lorenzo started working on a completely new approach that
> would focus on improving the common cases while trying to reduce the overall
> complexity. Because I think most of us really dislike anon_vma. It's still work
> in progress, and I am sure there are some rough edges.
>
> But fundamentally, I think we want to find a new design that is just naturally
> simpler.
>
> Lorenzo has been hard at work exploring various design options (and I'm afraid
> he might be one of the 3 people on this planet that understand anon_vma in full
> detail), so I suggest we wait for a redesign proposal from him and see if that
> is doable?
I'll spend the next month focusing on shipping something as a priority, even if
partial.
Thanks, Lorenzo
end of thread, other threads:[~2026-06-04 9:41 UTC | newest]
Thread overview: 64+ messages
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
2026-05-27 11:01 ` [PATCH 01/15] mm/rmap: introduce anon_rmap APIs for anonymous folios tao
2026-05-27 11:44 ` Lorenzo Stoakes
2026-05-28 7:47 ` wangtao
2026-05-27 11:01 ` [PATCH 02/15] mm: convert anon_vma rmap APIs to anon_rmap tao
2026-05-27 11:49 ` Lorenzo Stoakes
2026-05-28 8:55 ` wangtao
2026-05-27 11:01 ` [PATCH 03/15] mm: introduce anon_vma_tree_t for multiple anon_vma topologies tao
2026-05-27 11:56 ` Lorenzo Stoakes
2026-05-28 9:00 ` wangtao
2026-05-27 11:01 ` [PATCH 04/15] mm: switch to anon_vma_tree_t APIs in preparation for ANON_VMA_LAZY tao
2026-05-27 11:01 ` [PATCH 05/15] mm: add CONFIG_ANON_VMA_LAZY and folio helpers tao
2026-05-27 11:01 ` [PATCH 06/15] mm: add CONFIG_VMA_REF and VMA helpers tao
2026-05-27 11:01 ` [PATCH 07/15] mm: replace direct FOLIO_MAPPING_ANON usage with helpers tao
2026-05-27 11:01 ` [PATCH 08/15] mm: prepare rmap infrastructure for ANON_VMA_LAZY tao
2026-05-27 11:01 ` [PATCH 09/15] mm: implement ANON_VMA_LAZY rmap semantics tao
2026-05-27 11:01 ` [PATCH 10/15] mm: defer anon_vma creation with ANON_VMA_LAZY tao
2026-05-27 11:01 ` [PATCH 11/15] mm: handle ANON_VMA_LAZY in huge page operations tao
2026-05-27 11:01 ` [PATCH 12/15] mm: handle ANON_VMA_LAZY during migration tao
2026-05-27 11:01 ` [PATCH 13/15] mm: support setup and upgrade of ANON_VMA_LAZY folios tao
2026-05-27 11:01 ` [PATCH 14/15] mm: support merging of ANON_VMA_LAZY VMAs tao
2026-05-27 11:01 ` [PATCH 15/15] mm: enable CONFIG_ANON_VMA_LAZY on arm64 and x86_64 tao
2026-05-27 11:23 ` [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation Pedro Falcato
2026-05-28 6:45 ` wangtao
2026-05-28 7:14 ` Lorenzo Stoakes
2026-05-27 11:30 ` Lorenzo Stoakes
2026-05-28 7:11 ` wangtao
2026-05-28 7:22 ` Lorenzo Stoakes
2026-05-27 14:33 ` Lorenzo Stoakes
2026-05-28 7:57 ` wangtao
2026-05-28 8:14 ` Lorenzo Stoakes
[not found] ` <CAGsJ_4zy=-m5wjm0BC-vQXMHGRkHymC-5S_L9Oi708v339vvPw@mail.gmail.com>
2026-05-29 2:20 ` wangzicheng
2026-05-29 6:56 ` Lorenzo Stoakes
2026-05-29 6:45 ` Lorenzo Stoakes
2026-05-29 9:41 ` wangtao
2026-05-29 12:03 ` Lorenzo Stoakes
2026-06-01 1:46 ` wangtao
2026-06-02 2:15 ` Barry Song
2026-06-02 2:46 ` Lance Yang
2026-06-02 15:37 ` Lorenzo Stoakes
2026-06-02 19:44 ` Pedro Falcato
2026-06-02 23:03 ` Barry Song
2026-06-03 7:07 ` Lorenzo Stoakes
2026-06-02 19:56 ` Harry Yoo
2026-06-02 22:27 ` Barry Song
2026-06-02 20:47 ` Lorenzo Stoakes
2026-05-29 15:07 ` Jonathan Corbet
2026-05-29 15:40 ` Lorenzo Stoakes
2026-05-30 11:28 ` Barry Song
2026-06-02 16:07 ` Harry Yoo
2026-06-03 2:59 ` wangtao
2026-06-03 3:12 ` wangtao
2026-06-03 7:54 ` Lorenzo Stoakes
2026-06-03 11:05 ` wangtao
2026-06-03 11:53 ` Lorenzo Stoakes
2026-06-04 3:50 ` wangtao
2026-06-03 20:25 ` David Hildenbrand (Arm)
2026-06-03 22:14 ` Barry Song
2026-06-04 4:03 ` wangtao
2026-06-04 4:20 ` Barry Song
2026-06-04 7:35 ` wangtao
2026-06-04 3:10 ` xu.xin16
2026-06-04 4:10 ` wangtao
2026-06-04 9:40 ` Lorenzo Stoakes