From: Sean Christopherson <seanjc@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>,
Anthony Yznaga <anthony.yznaga@oracle.com>,
akpm@linux-foundation.org, willy@infradead.org,
markhemm@googlemail.com, viro@zeniv.linux.org.uk,
khalid@kernel.org, andreyknvl@gmail.com, luto@kernel.org,
brauner@kernel.org, arnd@arndb.de, ebiederm@xmission.com,
catalin.marinas@arm.com, linux-arch@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
mhiramat@kernel.org, rostedt@goodmis.org,
vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com,
neilb@suse.de, maz@kernel.org,
David Rientjes <rientjes@google.com>
Subject: Re: [RFC PATCH v3 00/10] Add support for shared PTEs across processes
Date: Mon, 7 Oct 2024 09:45:44 -0700
Message-ID: <ZwQQOP89Dj5gvbaP@google.com>
In-Reply-To: <7c13be04-1d18-45bd-8cfc-f5d37bd39a8e@redhat.com>
On Mon, Oct 07, 2024, David Hildenbrand wrote:
> On 07.10.24 17:58, Dave Hansen wrote:
> > On 10/7/24 01:44, David Hildenbrand wrote:
> > > On 02.10.24 19:35, Dave Hansen wrote:
> > > > We were just chatting about this on David Rientjes's MM alignment call.
> > >
> > > Unfortunately I was not able to attend this time, my body decided it's a
> > > good idea to stay in bed for a couple of days.
> > >
> > > > I thought I'd try to give a little brain
> > > >
> > > > Let's start by thinking about KVM and secondary MMUs. KVM has a primary
> > > > mm: the QEMU (or whatever) process mm. The virtualization (EPT/NPT)
> > > > tables get entries that effectively mirror the primary mm page tables
> > > > and constitute a secondary MMU. If the primary page tables change,
> > > > mmu_notifiers ensure that the changes get reflected into the
> > > > virtualization tables and also that the virtualization paging structure
> > > > caches are flushed.
> > > >
> > > > msharefs is doing something very similar. But, in the msharefs case,
> > > > the secondary MMUs are actually normal CPU MMUs. The page tables are
> > > > normal old page tables and the caches are the normal old TLB. That's
> > > > what makes it so confusing: we have lots of infrastructure for dealing
> > > > with that "stuff" (CPU page tables and TLB), but msharefs has
> > > > short-circuited the infrastructure and it doesn't work any more.
> > >
> > > It's quite different IMHO, to a degree that I believe they are different
> > > beasts:
> > >
> > > Secondary MMUs:
> > > * "Belongs" to same MM context and the primary MMU (process page tables)
> >
> > I think you're speaking to the ratio here. For each secondary MMU, I
> > think you're saying that there's one and only one mm_struct. Is that right?
>
> Yes, that is my understanding (at least with KVM). It's a secondary MMU
> derived from exactly one primary MMU (MM context -> page table hierarchy).

I don't think the ratio is what's important. I think the important takeaway is
that the secondary MMU is explicitly tied to the primary MMU that it is tracking.
This is enforced in code, as the list of mmu_notifiers is stored in mm_struct.

The 1:1 ratio probably holds true today, e.g. for KVM, each VM is associated with
exactly one mm_struct. But fundamentally, nothing would prevent a secondary MMU
that manages a so-called software TLB from tracking multiple primary MMUs.

E.g. it wouldn't be all that hard to implement in KVM (a bit crazy, but not hard),
because KVM's memslots disallow gfn aliases, i.e. each index into KVM's secondary
MMU would be associated with at most one VMA and thus mm_struct.
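
To make the "multiple primary MMUs" idea a bit more concrete, here is a rough
userspace sketch in C. It is not KVM code: struct mm, struct notifier,
struct smmu, struct slot and every helper below are made-up stand-ins, and the
"software TLB" is just one bool per gfn. The only things it tries to capture are
that the notifier list hangs off each mm, and that rejecting overlapping gfn
ranges means each gfn resolves to at most one backing mm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGES     16   /* size of the toy guest-physical space */
#define MAX_SLOTS 4

struct mm;
struct smmu;

struct notifier {                 /* hypothetical stand-in for an mmu_notifier */
	struct notifier *next;
	struct smmu *owner;
};

struct mm {                       /* hypothetical stand-in for mm_struct */
	struct notifier *notifiers;   /* the list lives in the mm itself */
};

struct slot {                     /* loosely modelled on a memslot */
	unsigned long gfn, npages, hva_pfn;
	struct mm *mm;                /* the one primary MMU backing this slot */
};

struct smmu {                     /* secondary MMU with a software TLB */
	bool present[PAGES];          /* indexed by gfn */
	struct slot slots[MAX_SLOTS];
	int nr_slots;
};

/* Track @mm by hanging a notifier off that mm's own list. */
void smmu_track(struct smmu *s, struct mm *mm, struct notifier *n)
{
	assert(s && mm && n);
	n->owner = s;
	n->next = mm->notifiers;
	mm->notifiers = n;
}

/* Add a slot; refuse gfn overlap so each gfn has at most one backing mm. */
bool smmu_add_slot(struct smmu *s, struct mm *mm, unsigned long gfn,
		   unsigned long npages, unsigned long hva_pfn)
{
	if (s->nr_slots == MAX_SLOTS || npages == 0 ||
	    gfn >= PAGES || npages > PAGES - gfn)
		return false;
	for (int i = 0; i < s->nr_slots; i++) {
		struct slot *o = &s->slots[i];

		if (gfn < o->gfn + o->npages && o->gfn < gfn + npages)
			return false;     /* would create a gfn alias */
	}
	s->slots[s->nr_slots++] = (struct slot){ gfn, npages, hva_pfn, mm };
	return true;
}

/* "Fault in" a gfn, i.e. populate the software TLB if a slot covers it. */
bool smmu_map(struct smmu *s, unsigned long gfn)
{
	for (int i = 0; i < s->nr_slots; i++) {
		struct slot *sl = &s->slots[i];

		if (gfn >= sl->gfn && gfn < sl->gfn + sl->npages) {
			s->present[gfn] = true;
			return true;
		}
	}
	return false;
}

/* @mm's page tables changed for host pages [start, end): tell its notifiers. */
void mm_invalidate(struct mm *mm, unsigned long start, unsigned long end)
{
	for (struct notifier *n = mm->notifiers; n; n = n->next) {
		struct smmu *s = n->owner;

		for (int i = 0; i < s->nr_slots; i++) {
			struct slot *sl = &s->slots[i];

			if (sl->mm != mm)
				continue;     /* slot is backed by a different mm */
			for (unsigned long p = 0; p < sl->npages; p++) {
				unsigned long hva = sl->hva_pfn + p;

				if (hva >= start && hva < end)
					s->present[sl->gfn + p] = false;
			}
		}
	}
}
```

With two mms tracked by one struct smmu, an invalidation in the first mm only
zaps gfns whose slot points at that mm, even if the second mm happens to use the
same host addresses.
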
Pulling Dave's earlier comment in:

: But the short of it is that the msharefs host mm represents a "secondary
: MMU". I don't think it is really that special of an MMU other than the
: fact that it has an mm_struct.

and David's (so. many. Davids):

: I better not think about the complexity of secondary MMUs + mshare (e.g.,
: KVM with mshare in guest memory): MMU notifiers for all MMs must be
: called ...

mshare() is unique because it creates the possibility of chained "secondary" MMUs.
I.e. the fact that it has an mm_struct makes it *very* special, IMO.

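
A toy userspace model of what "chained" means here, in C. To be clear, this is
not what the RFC implements and none of these names exist in the kernel:
struct mm, struct notifier, mm_register(), mshare_attach(), mm_notify() and
mshare_host_invalidate() are all hypothetical. The sketch only assumes that the
mshare host mm knows which process mms map it, and that each process mm may have
its own notifiers (e.g. a KVM instance).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ATTACHED 4

struct notifier {                      /* hypothetical, e.g. one per KVM instance */
	struct notifier *next;
	int nr_invalidations;              /* how many times the callback fired */
};

struct mm {                            /* hypothetical stand-in for mm_struct */
	struct notifier *notifiers;
	struct mm *attached[MAX_ATTACHED]; /* host mm only: mms that map the region */
	int nr_attached;
};

/* Register a notifier on one specific mm. */
void mm_register(struct mm *mm, struct notifier *n)
{
	assert(mm && n);
	n->next = mm->notifiers;
	mm->notifiers = n;
}

/* Record that @proc maps the mshare region owned by @host. */
int mshare_attach(struct mm *host, struct mm *proc)
{
	if (host->nr_attached == MAX_ATTACHED)
		return -1;
	host->attached[host->nr_attached++] = proc;
	return 0;
}

/* Walk only @mm's own list, which is all a regular mm ever needs. */
void mm_notify(struct mm *mm)
{
	for (struct notifier *n = mm->notifiers; n; n = n->next)
		n->nr_invalidations++;
}

/*
 * A page table change in the host mm has to be chained: first the host's own
 * notifiers (if any), then the notifiers of every mm that maps the region.
 */
void mshare_host_invalidate(struct mm *host)
{
	mm_notify(host);
	for (int i = 0; i < host->nr_attached; i++)
		mm_notify(host->attached[i]);
}
```

Notifying only the host mm's list reaches nobody in this model, because the
secondary MMUs registered against the process mms, not against the host mm.
That extra hop is the part that no other "secondary MMU" has today.
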
> > > * Maintains separate tables/PTEs, in completely separate page table
> > > hierarchy
> >
> > This is the case for KVM and the VMX/SVM MMUs, but it's not generally
> > true about hardware. IOMMUs can walk x86 page tables and populate the
> > IOTLB from the _same_ page table hierarchy as the CPU.
>
> Yes, of course.

Yeah, the recent rework of invalidate_range() => arch_invalidate_secondary_tlbs()
sums things up nicely:

  commit 1af5a8109904b7f00828e7f9f63f5695b42f8215
  Author:     Alistair Popple <apopple@nvidia.com>
  AuthorDate: Tue Jul 25 23:42:07 2023 +1000
  Commit:     Andrew Morton <akpm@linux-foundation.org>
  CommitDate: Fri Aug 18 10:12:41 2023 -0700

    mmu_notifiers: rename invalidate_range notifier

    There are two main use cases for mmu notifiers. One is by KVM which uses
    mmu_notifier_invalidate_range_start()/end() to manage a software TLB.

    The other is to manage hardware TLBs which need to use the
    invalidate_range() callback because HW can establish new TLB entries at
    any time. Hence using start/end() can lead to memory corruption as these
    callbacks happen too soon/late during page unmap.

    mmu notifier users should therefore either use the start()/end() callbacks
    or the invalidate_range() callbacks. To make this usage clearer rename
    the invalidate_range() callback to arch_invalidate_secondary_tlbs() and
    update documentation.
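
For anyone who wants the "too soon" half of that spelled out, a deliberately
tiny userspace model in C. Everything in it is hypothetical (struct toy,
hw_walk(), toy_unmap(), the enum); it only assumes the ordering described in the
changelog, i.e. that start() runs before the PTE is cleared and that the
secondary TLB invalidation runs together with the CPU TLB flush, after the PTE
is gone.

```c
#include <assert.h>
#include <stdbool.h>

struct toy {
	bool pte_present;   /* entry in the (shared) page tables */
	bool hw_tlb;        /* device TLB entry, filled by hardware walks */
};

enum flush_point {
	FLUSH_AT_START,     /* flush the device TLB from the start() callback */
	FLUSH_WITH_TLB,     /* flush it alongside the CPU TLB flush */
};

/* Hardware can walk the page tables and refill its TLB at any time. */
void hw_walk(struct toy *t)
{
	if (t->pte_present)
		t->hw_tlb = true;
}

/* Unmap one page; returns true if a stale device TLB entry survives. */
bool toy_unmap(struct toy *t, enum flush_point when)
{
	assert(t);

	/* start(): fine for a software TLB, too early for a hardware one. */
	if (when == FLUSH_AT_START)
		t->hw_tlb = false;

	hw_walk(t);               /* device access: PTE still present, so refill */

	t->pte_present = false;   /* the PTE is cleared */

	/* CPU TLB flush; the secondary TLB invalidation would run here. */
	if (when == FLUSH_WITH_TLB)
		t->hw_tlb = false;

	hw_walk(t);               /* later walks find no PTE, nothing to refill */

	/* end() would run here. */
	return t->hw_tlb && !t->pte_present;
}
```

Flushing at start() leaves a window in which the device re-walks a still-valid
PTE and keeps the translation after the page is unmapped. A software TLB like
KVM's doesn't have that problem because it can simply refuse to install new
entries between start() and end().
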