Re: [LSF/MM/BPF TOPIC] The Future of the Anonymous Reverse Mapping [RESEND]

From: Lorenzo Stoakes <ljs@kernel.org>
To: Barry Song <baohua@kernel.org>
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	 David Hildenbrand <david@kernel.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	 Vlastimil Babka <vbabka@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	 Pedro Falcato <pfalcato@suse.de>,
	Ryan Roberts <ryan.roberts@arm.com>,
	 Harry Yoo <harry.yoo@oracle.com>,
	Rik van Riel <riel@surriel.com>, Jann Horn <jannh@google.com>,
	 Chris Li <chriscli@google.com>
Subject: Re: [LSF/MM/BPF TOPIC] The Future of the Anonymous Reverse Mapping [RESEND]
Date: Mon, 4 May 2026 09:10:45 +0100
Message-ID: <afhSYrGf4nL53_s9@lucifer>
In-Reply-To: <CAGsJ_4za9WM8=-60OVKraOw8KLCgb+CcJfiKixuFsMxUWQ2nzQ@mail.gmail.com>

Sorry my email is a mess lately, finally catching up after a month or so...

On Fri, Apr 03, 2026 at 05:49:10AM +0800, Barry Song wrote:
> >
> > Umm, memory is not preserved across an exec() :) so it works fine with that.
> >
> > vfork() is CLONE_VM so the mm is shared and everything works fine.
>
> My question is whether we can reuse the process tree, similar to
> walk_tg_tree_from(). With some flags in mm_struct, it might be

That's an interesting bit of code, thanks for pointing me at that :)

> possible to distinguish whether an mm_struct was copied from the
> parent or created by a new exec.

In the new exec case there's no copying, right? You're always overwriting the mm?


> > Well you're missing stuff there, the folio would have to be non-anon exclusive
> > (which is rare). Yes it'd find the new VMA, then traverse, and find the folio
> > does not match, and traverse children.
> >
> > rmap walks _always_ allow for you walking VMAs that a folio does not belong
> > to.
>
> I understand that we can check whether the folio belongs to the new
> VMA, but I’m curious whether this will occur more frequently in practice
> after the change. In the rmap case, I assume the original A’s folio
> anon_vma would be detached from process B once B unmaps and then maps
> a new VMA, so we wouldn’t search B anymore—is that correct?

In both cases folio_move_anon_rmap() changes folio->mapping. In the anon_vma
case, it moves it to the 'leaf' anon_vma; in the CoW context case, it moves it
to the leaf CoW context.

For CoW context we stop looking past the first CoW context if
!folio_maybe_mapped_shared().

So the usual situation will incur just the same amount of walking (though some
edge cases might be slower, yes).
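
To make that early exit concrete, here is a toy userspace model. It is not the
code from the branch, just a sketch: struct cow_ctx, struct toy_folio and
cow_walk_count() are names invented for illustration, and the only thing taken
from the description above is the rule that an exclusive folio never causes us
to look past the first CoW context.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

/* Hypothetical stand-in for a CoW context; children are added at fork. */
struct cow_ctx {
	struct cow_ctx *children[TOY_MAX_CHILDREN];
	size_t nr_children;
};

/* Hypothetical stand-in for a folio; 'root' models folio->mapping. */
struct toy_folio {
	struct cow_ctx *root;
	bool maybe_mapped_shared;
};

/* Visit ctx, and its descendants only if 'descend' is set. */
static size_t toy_visit(const struct cow_ctx *ctx, bool descend)
{
	size_t visited = 1;
	size_t i;

	if (!descend)
		return visited;

	for (i = 0; i < ctx->nr_children; i++)
		visited += toy_visit(ctx->children[i], true);

	return visited;
}

/* Number of contexts a walk for this folio would have to look at. */
size_t cow_walk_count(const struct toy_folio *folio)
{
	/* Exclusive folio: stop at the first context, skip all children. */
	return toy_visit(folio->root, folio->maybe_mapped_shared);
}
```

In this model, moving a folio to a leaf context (what folio_move_anon_rmap()
does to folio->mapping) is just reassigning 'root', which shrinks the walk for
the shared case as well.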

>
> >
> > For instance, with anon_vma, if you CoW a bunch of folios to child process VMAs,
> > the non-CoW'd folio will _still_ traverse all of that uselessly.
> >
> > In any case, this isn't a common case.
> >
> > However note that if a folio _becomes_ anon exclusive, it switches its 'root'
> > cow context to the one associated with the mm which it became exclusive to.
> >
>
> Agreed. I’m curious about the case of A’s folio, whose VMA has been
> completely replaced in B after the unmap and map. In the old anon_vma
> case, we wouldn’t search B anymore, but now we’ll need to check B's
> vm_pgoff since it covers the folio’s address—is that correct?

It's anon exclusive so we wouldn't bother looking past the first CoW context
level.

If it was shared we might do some useless work, yes. But again, I think that's
a rare case.
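
For reference, the check in question looks roughly like the sketch below. This
is a userspace toy loosely modelled on the arithmetic the kernel's vma_address()
helper does, with made-up names (struct toy_vma, toy_vma_covers(),
TOY_PAGE_SHIFT) and a 4 KiB page size assumed. A hit only means the VMA's
offset range covers the folio's index; we'd still have to look at the page
tables to find out the folio isn't mapped there, which is the useless work.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical, cut-down VMA: just the fields the check needs. */
struct toy_vma {
	uint64_t vm_start; /* first address in the range */
	uint64_t vm_end;   /* one past the last address */
	uint64_t vm_pgoff; /* page offset corresponding to vm_start */
};

/*
 * Return true and set *addr if the VMA's page-offset range covers pgoff.
 * This says nothing about whether the page tables map the folio there.
 */
bool toy_vma_covers(const struct toy_vma *vma, uint64_t pgoff, uint64_t *addr)
{
	uint64_t nr_pages = (vma->vm_end - vma->vm_start) >> TOY_PAGE_SHIFT;

	if (pgoff < vma->vm_pgoff || pgoff - vma->vm_pgoff >= nr_pages)
		return false;

	*addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << TOY_PAGE_SHIFT);
	return true;
}
```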


> > > If we have multiple remaps for multiple VMAs within one mm_struct,
> > > will we end up traversing all the dynamic arrays for any folio that
> > > might be located in a VMA that has been remapped?
> >
> > Yup. But there aren't all that many, and it's all under RCU so :)
> >
> > That part of the search should be quick, parts of the search involving page
> > tables, less so.
> >
> > Also I need to figure out how to maintain stabilisation without an rmap lock, an
> > ongoing open problem in all this.
> >
> > In the end, as the original mail said, I may conclude _this_ approach is
> > unworkable and come up with an alternative that's more conventional.
>
> I’m genuinely interested in the new approach. If you have the code, I’d be
> happy to read, test, and work on it.

I've posted on the thread, but it's very much a proof of concept and
stabilisation is currently broken so it's not in a testable state YET. But you
can see the rough shape of it now:

https://git.kernel.org/pub/scm/linux/kernel/git/ljs/linux.git?h=project%2Fcow-context

>
> >
> > BUT. Doing it this way saves 30x the amount of kernel allocated memory. I tried
> > a heavy load case and it was very substantial. That's not to be sniffed at.
> >
> > In any case, all of this is going to be _very_ driven by metrics. How slow is
> > it, how much overhead does it actually produce, is it workable, are the
> > trade-offs right, etc.
> >
> > It's an exploration rather than a fait accompli.
>
> Right now, I’m still at the stage of trying to understand the details of
> your new approach and would like to learn more—so I might have quite a
> few naive questions :-)

No problem, you will never ask anything more naive than what I might ask, and I
may very well have made some very naive mistakes, so :) healthy to discuss it, I
think!

>
> Thanks
> Barry

Cheers, Lorenzo

