From: wangtao <tao.wangtao@honor.com>
To: Lorenzo Stoakes <ljs@kernel.org>
Cc: Harry Yoo <harry@kernel.org>,
"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
"will@kernel.org" <will@kernel.org>,
"tglx@kernel.org" <tglx@kernel.org>,
"mingo@redhat.com" <mingo@redhat.com>,
"bp@alien8.de" <bp@alien8.de>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"x86@kernel.org" <x86@kernel.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"david@kernel.org" <david@kernel.org>,
"willy@infradead.org" <willy@infradead.org>,
"sj@kernel.org" <sj@kernel.org>,
"kees@kernel.org" <kees@kernel.org>,
"luizcap@redhat.com" <luizcap@redhat.com>,
"zhangjiao2@cmss.chinamobile.com"
<zhangjiao2@cmss.chinamobile.com>,
"kas@kernel.org" <kas@kernel.org>,
"hpa@zytor.com" <hpa@zytor.com>,
"liam@infradead.org" <liam@infradead.org>,
"vbabka@kernel.org" <vbabka@kernel.org>,
"rppt@kernel.org" <rppt@kernel.org>,
"surenb@google.com" <surenb@google.com>,
"mhocko@suse.com" <mhocko@suse.com>,
"jack@suse.cz" <jack@suse.cz>,
"riel@surriel.com" <riel@surriel.com>,
"jannh@google.com" <jannh@google.com>,
"jgg@ziepe.ca" <jgg@ziepe.ca>,
"jhubbard@nvidia.com" <jhubbard@nvidia.com>,
"peterx@redhat.com" <peterx@redhat.com>,
"ziy@nvidia.com" <ziy@nvidia.com>,
"baolin.wang@linux.alibaba.com" <baolin.wang@linux.alibaba.com>,
"npache@redhat.com" <npache@redhat.com>,
"ryan.roberts@arm.com" <ryan.roberts@arm.com>,
"dev.jain@arm.com" <dev.jain@arm.com>,
"baohua@kernel.org" <baohua@kernel.org>,
"lance.yang@linux.dev" <lance.yang@linux.dev>,
"xu.xin16@zte.com.cn" <xu.xin16@zte.com.cn>,
"chengming.zhou@linux.dev" <chengming.zhou@linux.dev>,
"nao.horiguchi@gmail.com" <nao.horiguchi@gmail.com>,
"matthew.brost@intel.com" <matthew.brost@intel.com>,
"joshua.hahnjy@gmail.com" <joshua.hahnjy@gmail.com>,
"rakie.kim@sk.com" <rakie.kim@sk.com>,
"byungchul@sk.com" <byungchul@sk.com>,
"gourry@gourry.net" <gourry@gourry.net>,
"ying.huang@linux.alibaba.com" <ying.huang@linux.alibaba.com>,
"apopple@nvidia.com" <apopple@nvidia.com>,
"pfalcato@suse.de" <pfalcato@suse.de>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"damon@lists.linux.dev" <damon@lists.linux.dev>,
"shakeel.butt@linux.dev" <shakeel.butt@linux.dev>,
"ryncsn@gmail.com" <ryncsn@gmail.com>,
"21cnbao@gmail.com" <21cnbao@gmail.com>,
"jparsana@google.com" <jparsana@google.com>,
"dvander@google.com" <dvander@google.com>,
zhangji <zhangji1@honor.com>, wangzicheng <wangzicheng@honor.com>
Subject: RE: [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation
Date: Thu, 4 Jun 2026 03:50:54 +0000 [thread overview]
Message-ID: <a073c529df7841d98dcec9ddc3dad8bc@honor.com> (raw)
In-Reply-To: <aiAMHhq9QCp-z3V9@lucifer>
>
> Thanks for your replies, but I really have to stop doing deeper analyses like
> these for time management purposes.
Of course I will respond to technical discussions.
>
> I did this more so to make the point from [0] as to why, in lower trust
> environments, this is just not feasible.
>
> We could loop around for hours and hours and hours here.
>
> In general as before, even if all worked perfectly (I'm very much not at all
> convinced), extending anon_vma and pinning VMAs is simply a no-go for
> architectural and complexity reasons.
>
> I also find the locking story dubious and the lack of tests or anything
> corroborating correctness is additionally fatal.
>
During rmap, anon_vma provides a superset of VMAs. We first confirm
with vma_address(), and then in each rmap_one we further check whether
the VMA needs to be processed through page_vma_mapped_walk() and
check_pte().
The lazy VMA used by ANON_VMA_LAZY provides only one VMA: if there is
no fork or mremap, then this single VMA is sufficient. To avoid taking
the folio_lock during fork and mremap, after anon_walk_anon, if
folio->mapping is upgraded to anon_vma, we retry once.
If your concern is about the lack of locking during rmap, you could
also refer to folio_wait_table and add a set of anon_vma_locks. That
was how I handled it during my initial debugging. Later, after
reviewing the code flow, I found that the lock might not be necessary,
so I removed it.
> And finally, I was already working on a replacement for anon_vma, and the
> generally done thing in these situations is for my work to take precedence.
>
> So I'm going to bail out on futher deeper analyses here as otherwise I simply
> can't work on anything else :)
>
> Thanks, Lorenzo
>
> [0]:https://lore.kernel.org/all/ah887A5VkXOcmq-g@lucifer/
>
>
> On Wed, Jun 03, 2026 at 11:05:28AM +0000, wangtao wrote:
> > > >
> > >
> > > Against my better judgment I'll address the stuff here...
> > >
> > > > VMA operations can be roughly divided into three categories. The
> > > > handling of ANON_VMA_LAZY is briefly described below.
> > >
> > > I don't agree, there are plenty more VMA operations. But with
> > > respect to anon rmap there are:
> > >
> > > - fork
> > > - merge/split
> > > - remap
> > >
> >
> > Yes, these are the three categories. I originally intended to explain
> > them by classifying based on system calls; I should have used mremap
> instead of move_vma.
>
> I don't think you mentioned move_vma()? Maybe I missed it.
>
> The categorisation is most usefully based on callers of anon_vma_clone().
>
> >
> > 是的,是这三类,我本想从系统调用去分类说明,应该将move_vma
> 换成mremap的。
> >
> > > Your approach seems to completely ignore VMA split and the need to
> > > maintain an interval tree to _multiple_ VMAs from a single anon_vma.
> > >
> >
> > The folio uses vma->root_vma to compute folio_address. A VMA split
> > from it, vma_a, also uses vma_a->root_vma = vma->root_vma to compute
> folio_address.
> > During rmap, once folio_address is obtained, the VMA can be found
> > through mm_mt. Without fork, there is no need to maintain the interval
> tree.
>
> Well you need to search for every possible split VMA in mm_mt now, so you
> have to go page-by-page searching for each page for the rmap walked range.
>
ANON_VMA_LAZY has only one VMA. When I first looked at
rmap_walk_ksm, I also thought it would need to search page by page,
which seemed unacceptable. Later I realized that it only needs to
check whether this VMA falls within the rmap walk range.
@@ -3173,20 +3171,20 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
- anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
+ anon_rmap_foreach_vma(vma, vmac, anon_rmap,
0, ULONG_MAX) {
> You're also potentially racing against a remap, as you say below you don't
> folio lock on remap so concurrent rmap walkers can be present, the VMA can
> already be copied.
>
> We already have VMA lifecycle state around detached VMAs, so a VMA
> could be in a detached state, assumed by the existing logic to be entirely
> unavailable for use, out of the maple tree altogether but kept around in a
> zombie state.
>
> We'd then have lifecycle issues and races and edge cases around process
> teardown otherwise we might leak memory.
>
> Also, presumably you set vma->anon_vma to some lazy sentinel value so
> that mremap doesn't change vma->vm_pgoff when unfaulted?
>
> You would need to update any path that manipulates vma->anon_vma also
> so it doesn't incorrectly dereference it.
>
Yes, most of the code in this patch series is intended to prevent
incorrect dereferencing of anon_vma. If we assume it will not be
misused, some of the code could be simplified or removed.
> >
> > folio使用vma->root_vma 计算folio_address;从vma拆分出的vma_a,
> 使用vma_a->root_vma =
> > folio使用vma->vma->root_vma计算folio_address。
> > rmap时得到folio_address就可以通过mm_mt查找到vma。
> > 不fork就不需要维护interval tree。
> >
> > > You may also actually split a VMA against a single large folio
> > > (waiting on the deferred shrinker) and have a SINGLE _leaf_
> > > anonymous folio that is mapped in two places.
> > >
> > > The lazy approach doesn't seem to address this properly. And fatally
> > > it ties an actual VMA afaict to the folio and has to implement a VMA
> > > reference count mechanism which interferes with the ordinarily VMA
> lifecycle to do it.
> > >
> > > The fact of us taking advantage of most stuff being AnonExclusive, i.e.
> > > 'leaves' is something that my approach is exactly taking into account.
> > >
> > > Of course also extending anon_vma is a real non-starter.
> > >
> > > Also the below + the series ignores MAP_PRIVATE file-backed mappings
> > > which is a pretty fatal flaw.
> > >
> > > It also, as Harry says, has zero description of correctness in a way
> > > we'd want and no tests.
> > >
> >
> > 可以正确处理拆分vma在一个大页。拆分的vma_a或vma_b上的
> sub_page使用如下方式计算地址。
> > 对于文件vma的cow 匿名页,也用同样方式计算page/folio地址。
> >
> > It can correctly handle the case where a VMA is split within a large
> > page. The address of a sub_page in the split VMA (vma_a or vma_b) is
> > computed using the following method.
> >
> > For COW anonymous pages originating from file VMAs, the page/folio
> > address is also computed using the same method.
> >
> > subpage_address = vma_address(vma_a, subpage_pgoff, 1) =
> > vma_a->vm_start + (subpage_pgoff - vma_a->vm_pgoff) * PAGE_SIZE =
> > vma_a->vm_start - vma_a->vm_pgoff * PAGE_SIZE + subpage_pgoff *
> > PAGE_SIZE = vma_mapping_base(vma_a) + subpage_pgoff * PAGE_SIZE
> =
> > vma_mapping_base(root_vma) + subpage_pgoff * PAGE_SIZE
>
> OK but you want to walk entries in a _range_ in the interval tree.
>
> So you are then now looking up VMAs (in a racey way) using mm_mt (which
> is the whole basis of my work actually) which could change under you.
>
> I guess what you're doing is using the pinned 'root' VMA as the basis of
> everything, and the second a VMA is moved you (somehow) walk the page
> tables to update the folio->mapping.
>
> Again pinning the VMA like this and putting it in a folio is really not something
> we want to do.
>
> It adds a ton of complexity and also impacts VMA lifecycle which is already
> fairly fraught.
>
> It makes the VMA no longer just a VMA but rather also a 'memory' of where
> something was first faulted in as a hack more or less.
>
Maybe you're right. mm/mm_mt/vma/pagetable each have their own roles
in implementing VM. Perhaps considering them together could lead to
better ideas.
next prev parent reply other threads:[~2026-06-04 3:51 UTC|newest]
Thread overview: 82+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-27 11:01 [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation tao
2026-05-27 11:01 ` [PATCH 01/15] mm/rmap: introduce anon_rmap APIs for anonymous folios tao
2026-05-27 11:32 ` sashiko-bot
2026-05-27 11:44 ` Lorenzo Stoakes
2026-05-28 7:47 ` wangtao
2026-05-27 11:01 ` [PATCH 02/15] mm: convert anon_vma rmap APIs to anon_rmap tao
2026-05-27 11:49 ` Lorenzo Stoakes
2026-05-28 8:55 ` wangtao
2026-05-27 11:01 ` [PATCH 03/15] mm: introduce anon_vma_tree_t for multiple anon_vma topologies tao
2026-05-27 11:56 ` Lorenzo Stoakes
2026-05-28 9:00 ` wangtao
2026-05-27 11:01 ` [PATCH 04/15] mm: switch to anon_vma_tree_t APIs in preparation for ANON_VMA_LAZY tao
2026-05-27 11:53 ` sashiko-bot
2026-05-27 11:01 ` [PATCH 05/15] mm: add CONFIG_ANON_VMA_LAZY and folio helpers tao
2026-05-27 11:01 ` [PATCH 06/15] mm: add CONFIG_VMA_REF and VMA helpers tao
2026-05-27 11:42 ` sashiko-bot
2026-05-27 11:01 ` [PATCH 07/15] mm: replace direct FOLIO_MAPPING_ANON usage with helpers tao
2026-05-27 11:43 ` sashiko-bot
2026-05-27 11:01 ` [PATCH 08/15] mm: prepare rmap infrastructure for ANON_VMA_LAZY tao
2026-05-27 14:01 ` sashiko-bot
2026-05-27 11:01 ` [PATCH 09/15] mm: implement ANON_VMA_LAZY rmap semantics tao
2026-05-27 12:15 ` sashiko-bot
2026-05-27 11:01 ` [PATCH 10/15] mm: defer anon_vma creation with ANON_VMA_LAZY tao
2026-05-27 12:29 ` sashiko-bot
2026-05-27 11:01 ` [PATCH 11/15] mm: handle ANON_VMA_LAZY in huge page operations tao
2026-05-27 12:21 ` sashiko-bot
2026-05-27 11:01 ` [PATCH 12/15] mm: handle ANON_VMA_LAZY during migration tao
2026-05-27 12:23 ` sashiko-bot
2026-05-27 11:01 ` [PATCH 13/15] mm: support setup and upgrade of ANON_VMA_LAZY folios tao
2026-05-27 13:02 ` sashiko-bot
2026-05-27 11:01 ` [PATCH 14/15] mm: support merging of ANON_VMA_LAZY VMAs tao
2026-05-27 12:49 ` sashiko-bot
2026-05-27 11:01 ` [PATCH 15/15] mm: enable CONFIG_ANON_VMA_LAZY on arm64 and x86_64 tao
2026-05-27 17:19 ` sashiko-bot
2026-05-27 11:23 ` [PATCH 0/15] mm: introduce ANON_VMA_LAZY for deferred anon_vma creation Pedro Falcato
2026-05-28 6:45 ` wangtao
2026-05-28 7:14 ` Lorenzo Stoakes
2026-05-27 11:30 ` Lorenzo Stoakes
2026-05-28 7:11 ` wangtao
2026-05-28 7:22 ` Lorenzo Stoakes
2026-05-27 14:33 ` Lorenzo Stoakes
2026-05-28 7:57 ` wangtao
2026-05-28 8:14 ` Lorenzo Stoakes
2026-05-28 23:31 ` Barry Song
2026-05-29 2:20 ` wangzicheng
2026-05-29 6:56 ` Lorenzo Stoakes
2026-05-29 6:45 ` Lorenzo Stoakes
2026-05-29 9:41 ` wangtao
2026-05-29 12:03 ` Lorenzo Stoakes
2026-06-01 1:46 ` wangtao
2026-06-02 2:15 ` Barry Song
2026-06-02 2:46 ` Lance Yang
2026-06-02 15:37 ` Lorenzo Stoakes
2026-06-02 19:44 ` Pedro Falcato
2026-06-02 23:03 ` Barry Song
2026-06-03 7:07 ` Lorenzo Stoakes
2026-06-02 19:56 ` Harry Yoo
2026-06-02 22:27 ` Barry Song
2026-06-02 20:47 ` Lorenzo Stoakes
2026-05-29 15:07 ` Jonathan Corbet
2026-05-29 15:40 ` Lorenzo Stoakes
2026-05-30 11:28 ` Barry Song
2026-06-02 16:07 ` Harry Yoo
2026-06-03 2:59 ` wangtao
2026-06-03 3:12 ` wangtao
2026-06-03 7:54 ` Lorenzo Stoakes
2026-06-03 11:05 ` wangtao
2026-06-03 11:53 ` Lorenzo Stoakes
2026-06-04 3:50 ` wangtao [this message]
2026-06-03 20:25 ` David Hildenbrand (Arm)
2026-06-03 22:14 ` Barry Song
2026-06-04 4:03 ` wangtao
2026-06-04 4:20 ` Barry Song
2026-06-04 7:35 ` wangtao
2026-06-09 15:26 ` Suren Baghdasaryan
2026-06-09 15:49 ` David Hildenbrand (Arm)
2026-06-04 3:10 ` xu.xin16
2026-06-04 4:10 ` wangtao
2026-06-05 9:38 ` David Hildenbrand (Arm)
2026-06-05 10:07 ` Lorenzo Stoakes
2026-06-05 10:56 ` David Hildenbrand (Arm)
2026-06-04 9:40 ` Lorenzo Stoakes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a073c529df7841d98dcec9ddc3dad8bc@honor.com \
--to=tao.wangtao@honor.com \
--cc=21cnbao@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=apopple@nvidia.com \
--cc=baohua@kernel.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=bp@alien8.de \
--cc=byungchul@sk.com \
--cc=catalin.marinas@arm.com \
--cc=chengming.zhou@linux.dev \
--cc=damon@lists.linux.dev \
--cc=dave.hansen@linux.intel.com \
--cc=david@kernel.org \
--cc=dev.jain@arm.com \
--cc=dvander@google.com \
--cc=gourry@gourry.net \
--cc=harry@kernel.org \
--cc=hpa@zytor.com \
--cc=jack@suse.cz \
--cc=jannh@google.com \
--cc=jgg@ziepe.ca \
--cc=jhubbard@nvidia.com \
--cc=joshua.hahnjy@gmail.com \
--cc=jparsana@google.com \
--cc=kas@kernel.org \
--cc=kees@kernel.org \
--cc=lance.yang@linux.dev \
--cc=liam@infradead.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ljs@kernel.org \
--cc=luizcap@redhat.com \
--cc=matthew.brost@intel.com \
--cc=mhocko@suse.com \
--cc=mingo@redhat.com \
--cc=nao.horiguchi@gmail.com \
--cc=npache@redhat.com \
--cc=peterx@redhat.com \
--cc=pfalcato@suse.de \
--cc=rakie.kim@sk.com \
--cc=riel@surriel.com \
--cc=rppt@kernel.org \
--cc=ryan.roberts@arm.com \
--cc=ryncsn@gmail.com \
--cc=shakeel.butt@linux.dev \
--cc=sj@kernel.org \
--cc=surenb@google.com \
--cc=tglx@kernel.org \
--cc=vbabka@kernel.org \
--cc=wangzicheng@honor.com \
--cc=will@kernel.org \
--cc=willy@infradead.org \
--cc=x86@kernel.org \
--cc=xu.xin16@zte.com.cn \
--cc=ying.huang@linux.alibaba.com \
--cc=zhangji1@honor.com \
--cc=zhangjiao2@cmss.chinamobile.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.