Linux-mm Archive on lore.kernel.org
From: Lorenzo Stoakes <ljs@kernel.org>
To: lsf-pc@lists.linux-foundation.org
Cc: linux-mm@kvack.org, David Hildenbrand <david@kernel.org>,
	 "Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@kernel.org>,
	 Suren Baghdasaryan <surenb@google.com>,
	Pedro Falcato <pfalcato@suse.de>,
	 Ryan Roberts <ryan.roberts@arm.com>,
	Harry Yoo <harry.yoo@oracle.com>,
	 Rik van Riel <riel@surriel.com>, Jann Horn <jannh@google.com>,
	Chris Li <chriscli@google.com>,  Barry Song <baohua@kernel.org>
Subject: Re: [LSF/MM/BPF TOPIC] The Future of the Anonymous Reverse Mapping [RESEND]
Date: Sat, 2 May 2026 07:53:40 +0100
Message-ID: <afWcMyc5M4vViLJd@lucifer>
In-Reply-To: <aec533b2-37a7-4f44-a279-c4aa604206ac@lucifer.local>

As is time-honoured LSF tradition, I am sharing code for my proposal.

I worked a very long day yesterday and got the _very_ rough PoC code into
some kind of vaguely shareable state.

https://git.kernel.org/pub/scm/linux/kernel/git/ljs/linux.git/log/?h=project/cow-context

CAVEATS:

* The code is not great: it's 'wave your arms, hope for the best' stuff
  used for experimentation.

* I know the dynamic array implementation is probably entirely broken from
  a concurrency point of view, inefficient (an n gets squared, *gasp*!),
  etc. etc. - it is _not_ what I am proposing to actually do, even in an
  RFC of this code, it's just for PoC purposes.

* By default it runs CoW context alongside anon_vma, and will pr_err() if
  there are mismatches between the two.

* However you can enable 'pure' CoW context mode via
  CONFIG_COW_CONTEXT_ANON_RMAP.

* This is, as the talk will cover, currently broken for migration, not
  because of bugs etc. but because I've not decided on the synchronisation
  method yet (_everything_ is RCU in this mode).
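
To illustrate the default mode described in the caveats above, here is a
minimal userspace sketch of a side-by-side cross-check. It is not taken from
the PoC branch: the toy_* names are hypothetical, VMAs are reduced to integer
IDs, and fprintf(stderr, ...) stands in for pr_err().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: true if 'id' appears in ids[0..n). */
static bool toy_contains(const int *ids, size_t n, int id)
{
	for (size_t i = 0; i < n; i++)
		if (ids[i] == id)
			return true;
	return false;
}

/* Count the entries of a[] that are missing from b[], reporting each one. */
static size_t toy_report_missing(const char *what, const int *a, size_t na,
				 const int *b, size_t nb)
{
	size_t missing = 0;

	for (size_t i = 0; i < na; i++) {
		if (toy_contains(b, nb, a[i]))
			continue;
		/* Stand-in for pr_err() in this userspace sketch. */
		fprintf(stderr, "rmap mismatch: VMA %d %s\n", a[i], what);
		missing++;
	}
	return missing;
}

/*
 * Compare the VMAs found by the existing walk with those found by the
 * new walk, in both directions. Returns the number of mismatches.
 */
static size_t toy_cross_check(const int *old_ids, size_t n_old,
			      const int *new_ids, size_t n_new)
{
	return toy_report_missing("only found by the old walk",
				  old_ids, n_old, new_ids, n_new) +
	       toy_report_missing("only found by the new walk",
				  new_ids, n_new, old_ids, n_old);
}
```

The comparison ignores ordering, so two walks that visit the same VMAs in a
different order count as matching.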

The kernel boots in either mode :)

Obviously this is going to go through a lot more changes before any RFC,
but wanted to get this code out there in the 'discussion topic at LSF, have
code for it' tradition.

See all those who are attending in Zagreb! :)

Cheers, Lorenzo

On Mon, Mar 30, 2026 at 10:23:57PM +0100, Lorenzo Stoakes (Oracle) wrote:
> [sorry subject line was typo'd, resending with correct subject line for
> visibility. Original at
> https://lore.kernel.org/linux-mm/8aa41d47-ee41-4af1-a334-587a34fe865d@lucifer.local/]
>
> Currently we track the reverse mapping between folios and VMAs at a VMA level,
> utilising a complicated and confusing combination of anon_vma objects and
> anon_vma_chains linking them, which must be updated when VMAs are split,
> merged, remapped or forked.
>
> It's further complicated by various optimisations intended to avoid scalability
> issues in locking and memory allocation.
>
> I have done recent work to improve the situation [0] which has also led to a
> reported improvement in lock scalability [1], but fundamentally the situation
> remains the same.
>
> The logic is actually, when you think hard enough about it, a fairly
> reasonable means of implementing the reverse mapping at a VMA level.
>
> It is, however, a very broken abstraction as it stands. In order to work with
> the logic, you have to essentially keep a broad understanding of the entire
> implementation in your head at one time - that is, not much is really
> abstracted.
>
> This results in confusion, mistakes, and bit rot. It's also very time-consuming
> to work with - personally I've gone to the lengths of writing a private set of
> slides for myself on the topic as a reminder each time I come back to it.
>
> There are also issues with lock scalability - the use of interval trees to
> maintain a connection between an anon_vma and AVCs connected to VMAs requires
> that a lock be held across the entire 'CoW hierarchy' of parent and child
> VMAs whenever performing an rmap walk or performing a merge, split, remap or
> fork.
>
> This is because we tear down all interval tree mappings and reestablish them
> each time we might see changes in VMA geometry. This is an issue Barry Song
> identified as problematic in a real world use case [2].
>
> So what do we do to improve the situation?
>
> Recently I have been working on an experimental new approach to the anonymous
> reverse mapping, in which we instead track anonymous remaps, and then use the
> VMA's virtual page offset to locate VMAs from the folio.
>
> I have got the implementation working to the point where it tracks the exact
> same VMAs as the anon_vma implementation, and it seems a lot of it can be done
> under RCU.
>
> It avoids the need to maintain expensive mappings at a VMA level, though it
> incurs a cost in tracking remaps, and MAP_PRIVATE files are very much a TODO
> (they maintain a file vma->vm_pgoff, even when CoW'd, so the remap tracking is
> pretty sub-optimal).
>
> I am investigating whether I can change how MAP_PRIVATE file-backed mappings
> work to avoid this issue, and will be developing tests to see how lock
> scalability, throughput and memory usage compare to the anon_vma approach under
> different workloads.
>
> This experiment may or may not work out; either way it will be interesting to
> discuss it.
>
> By the time LSF/MM comes around I may even have already decided on a different
> approach but that's what makes things interesting :)
>
> [0]:https://lore.kernel.org/all/cover.1767711638.git.lorenzo.stoakes@oracle.com/
> [1]:https://lore.kernel.org/all/202602061747.855f053f-lkp@intel.com/
> [2]:https://lore.kernel.org/linux-mm/CAGsJ_4x=YsQR=nNcHA-q=0vg0b7ok=81C_qQqKmoJ+BZ+HVduQ@mail.gmail.com/
>
> Cheers, Lorenzo
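
As a rough illustration of the idea in the quoted proposal of locating VMAs
from a folio by virtual page offset, the toy userspace sketch below mirrors
the arithmetic the kernel already uses to turn a page index into an address
within a VMA. It assumes 4 KiB pages, every toy_* name is hypothetical, and
none of it is taken from the PoC branch; in particular it does not model how
the PoC tracks remaps.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical, heavily simplified stand-in for struct vm_area_struct. */
struct toy_vma {
	uint64_t start; /* first virtual address (inclusive) */
	uint64_t end;   /* last virtual address (exclusive) */
	uint64_t pgoff; /* page offset corresponding to 'start' */
};

/* A fresh anonymous VMA starts with pgoff == start >> PAGE_SHIFT. */
static struct toy_vma toy_anon_vma(uint64_t start, uint64_t end)
{
	struct toy_vma vma = { start, end, start >> TOY_PAGE_SHIFT };

	return vma;
}

/* Moving the VMA keeps pgoff, so existing folio indices stay valid. */
static struct toy_vma toy_remap(struct toy_vma vma, uint64_t new_start)
{
	vma.end = new_start + (vma.end - vma.start);
	vma.start = new_start;
	return vma;
}

/*
 * Translate a folio's page index into an address inside 'vma'.
 * Returns false if the VMA does not cover that index.
 */
static bool toy_vma_address(const struct toy_vma *vma, uint64_t index,
			    uint64_t *addr)
{
	uint64_t nr_pages = (vma->end - vma->start) >> TOY_PAGE_SHIFT;

	if (index < vma->pgoff || index - vma->pgoff >= nr_pages)
		return false;

	*addr = vma->start + ((index - vma->pgoff) << TOY_PAGE_SHIFT);
	return true;
}
```

In this toy model, a VMA that has never moved maps index N to address
N << TOY_PAGE_SHIFT, so the folio's index alone is enough to find it. After
toy_remap() the start address and pgoff no longer agree, which is the case
that tracking remaps would have to cover.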

