linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Dave Hansen <dave.hansen@intel.com>,
	Anthony Yznaga <anthony.yznaga@oracle.com>,
	akpm@linux-foundation.org, willy@infradead.org,
	markhemm@googlemail.com, viro@zeniv.linux.org.uk,
	khalid@kernel.org
Cc: andreyknvl@gmail.com, luto@kernel.org, brauner@kernel.org,
	arnd@arndb.de, ebiederm@xmission.com, catalin.marinas@arm.com,
	linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, mhiramat@kernel.org, rostedt@goodmis.org,
	vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com,
	neilb@suse.de, maz@kernel.org,
	David Rientjes <rientjes@google.com>
Subject: Re: [RFC PATCH v3 00/10] Add support for shared PTEs across processes
Date: Mon, 7 Oct 2024 10:44:14 +0200
Message-ID: <8c7fbaf1-61a0-4f55-8466-1ab40464d9db@redhat.com>
In-Reply-To: <9927f9a3-efba-4053-8384-cc69c7949ea6@intel.com>

On 02.10.24 19:35, Dave Hansen wrote:
> We were just chatting about this on David Rientjes's MM alignment call.

Unfortunately I was not able to attend this time; my body decided it
was a good idea to stay in bed for a couple of days.

> I thought I'd try to give a little brain
> 
> Let's start by thinking about KVM and secondary MMUs.  KVM has a primary
> mm: the QEMU (or whatever) process mm.  The virtualization (EPT/NPT)
> tables get entries that effectively mirror the primary mm page tables
> and constitute a secondary MMU.  If the primary page tables change,
> mmu_notifiers ensure that the changes get reflected into the
> virtualization tables and also that the virtualization paging structure
> caches are flushed.
> 
> msharefs is doing something very similar.  But, in the msharefs case,
> the secondary MMUs are actually normal CPU MMUs.  The page tables are
> normal old page tables and the caches are the normal old TLB.  That's
> what makes it so confusing: we have lots of infrastructure for dealing
> with that "stuff" (CPU page tables and TLB), but msharefs has
> short-circuited the infrastructure and it doesn't work any more.

It's quite different IMHO, to a degree that I believe they are different 
beasts:

Secondary MMUs:
* "Belong" to the same MM context as the primary MMU (process page tables)
* Maintain separate tables/PTEs, in a completely separate page table
   hierarchy
* Notifiers make sure the secondary structure stays in sync (update
   PTEs, flush TLB)

mshare:
* Possibly mapped by many different MMs. Likely nothing stops us from
   having one MM map multiple different mshare fds.
* Updating the PTEs directly affects all other MM page table structures
   (and possibly any secondary MMUs! scary)
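To make the contrast above concrete, here is a toy model in Python. It is
only a sketch: plain dicts stand in for page tables, and every name in it
(MM, SecondaryMMU, set_pte, lookup, shared_ptes) is made up for
illustration and is not kernel API. It assumes a secondary MMU simply
drops a stale entry when notified, and that an mshare region is one PTE
container referenced by several MMs.

```python
# Toy model only: dicts stand in for page tables; nothing here is kernel API.

class SecondaryMMU:
    """Own table, kept in sync with ONE primary MM through notifications."""
    def __init__(self):
        self.table = {}              # separate hierarchy (think EPT/NPT)

    def invalidate(self, addr):
        self.table.pop(addr, None)   # drop stale entry, refill on next fault


class MM:
    """One address space: private PTEs plus any shared PTE regions it maps."""
    def __init__(self):
        self.page_table = {}         # private PTEs of this MM
        self.shared = []             # shared PTE containers mapped by this MM
        self.notifiers = []          # secondary MMUs registered on this MM

    def set_pte(self, addr, pfn):
        self.page_table[addr] = pfn
        for n in self.notifiers:     # the notifier keeps the mirror in sync
            n.invalidate(addr)

    def lookup(self, addr):
        if addr in self.page_table:
            return self.page_table[addr]
        for region in self.shared:
            if addr in region:
                return region[addr]
        return None


# Secondary MMU: a separate table that follows one MM via notifiers.
mm = MM()
kvm = SecondaryMMU()
mm.notifiers.append(kvm)
mm.set_pte(0x1000, 7)
kvm.table[0x1000] = mm.lookup(0x1000)   # secondary fault fills from primary
mm.set_pte(0x1000, 9)                   # primary change invalidates the mirror

# mshare: the very same PTE container is visible through several MMs.
shared_ptes = {}
a, b = MM(), MM()
a.shared.append(shared_ptes)
b.shared.append(shared_ptes)
shared_ptes[0x2000] = 42
shared_ptes[0x2000] = 43                # one write, seen by a and b at once
```

Note that the last write reaches both MMs without passing through either
MM's set_pte(), so in this model no notifier registered on a or b would
hear about it.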


I'd better not think about the complexity of secondary MMUs + mshare (e.g., 
KVM with mshare in guest memory): MMU notifiers for all MMs must be 
called ...
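A rough sketch of that fan-out, again as toy Python rather than kernel
code. The names (Notifier, ProcessMM, MshareRegion, update_pte) and the
4096-byte page size are assumptions for illustration; the only point is
that one update to a shared PTE has to reach the notifiers of every MM
that maps the region.

```python
# Toy model only: counts how many notifier callbacks one shared update needs.

class Notifier:
    """Stands in for a secondary MMU (e.g. a VM) registered on one MM."""
    def __init__(self, name):
        self.name = name
        self.invalidated = []        # ranges this notifier was told to drop

    def invalidate_range(self, start, end):
        self.invalidated.append((start, end))


class ProcessMM:
    """One process address space with its registered notifiers."""
    def __init__(self, name):
        self.name = name
        self.notifiers = []


class MshareRegion:
    """Shared PTEs plus the list of every MM that maps them."""
    def __init__(self):
        self.ptes = {}
        self.mappers = []

    def update_pte(self, addr, pfn, page_size=4096):
        self.ptes[addr] = pfn
        calls = 0
        for mm in self.mappers:          # every MM mapping the region ...
            for n in mm.notifiers:       # ... and every notifier on that MM
                n.invalidate_range(addr, addr + page_size)
                calls += 1
        return calls


region = MshareRegion()
vm1, vm2 = Notifier("vm1"), Notifier("vm2")
p1, p2 = ProcessMM("vmm-1"), ProcessMM("vmm-2")
p1.notifiers.append(vm1)
p2.notifiers.append(vm2)
region.mappers.extend([p1, p2])

calls = region.update_pte(0x2000, 42)    # one PTE change, two callbacks
```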


> 
> Basically, I think it makes a lot of sense to check what KVM (or another
> mmu_notifier user) is doing and make sure that msharefs is following its
> lead.  For instance, KVM _should_ have the exact same "page free"
> flushing issue where it gets the MMU notifier call but the page may
> still be in the secondary MMU.  I _think_ KVM fixes it with an extra
> page refcount that it takes when it first walks the primary page tables.
> 
> But the short of it is that the msharefs host mm represents a "secondary
> MMU".  I don't think it is really that special of an MMU other than the
> fact that it has an mm_struct.

Not sure I agree ... IMHO these are two orthogonal things, unless we
want MMU notifiers to "update" MM primary MMUs (there is not really
anything to update ...), but I am not sure that is what we are looking for.

What you note about TLB flushing in the other mail makes sense, not sure 
how this interacts with any secondary MMUs ....

-- 
Cheers,

David / dhildenb


