From: Sean Christopherson <seanjc@google.com>
To: Will Deacon <will@kernel.org>
Cc: David Hildenbrand <david@redhat.com>,
	Vishal Annapurve <vannapurve@google.com>,
	 Quentin Perret <qperret@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Fuad Tabba <tabba@google.com>,
	 kvm@vger.kernel.org, kvmarm@lists.linux.dev,
	pbonzini@redhat.com,  chenhuacai@kernel.org, mpe@ellerman.id.au,
	anup@brainfault.org,  paul.walmsley@sifive.com,
	palmer@dabbelt.com, aou@eecs.berkeley.edu,
	 viro@zeniv.linux.org.uk, brauner@kernel.org,
	akpm@linux-foundation.org,  xiaoyao.li@intel.com,
	yilun.xu@intel.com, chao.p.peng@linux.intel.com,
	 jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
	 yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com,
	mic@digikod.net,  vbabka@suse.cz, ackerleytng@google.com,
	mail@maciej.szmigiero.name,  michael.roth@amd.com,
	wei.w.wang@intel.com, liam.merwick@oracle.com,
	 isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com,
	 suzuki.poulose@arm.com, steven.price@arm.com,
	quic_mnalajal@quicinc.com,  quic_tsoni@quicinc.com,
	quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
	 quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
	catalin.marinas@arm.com,  james.morse@arm.com,
	yuzenghui@huawei.com, oliver.upton@linux.dev,  maz@kernel.org,
	keirf@google.com, linux-mm@kvack.org
Subject: Re: folio_mmapped
Date: Wed, 3 Apr 2024 17:15:19 -0700
Message-ID: <Zg3xF7dTtx6hbmZj@google.com>
In-Reply-To: <20240327193454.GB11880@willie-the-truck>

On Wed, Mar 27, 2024, Will Deacon wrote:
> Hi again, David,
> 
> On Fri, Mar 22, 2024 at 06:52:14PM +0100, David Hildenbrand wrote:
> > On 19.03.24 15:31, Will Deacon wrote:
> > sorry for the late reply!
> 
> Bah, you and me both!

Hold my beer ;-)

> > > On Tue, Mar 19, 2024 at 11:26:05AM +0100, David Hildenbrand wrote:
> > > > On 19.03.24 01:10, Sean Christopherson wrote:
> > > > > On Mon, Mar 18, 2024, Vishal Annapurve wrote:
> > > > > > On Mon, Mar 18, 2024 at 3:02 PM David Hildenbrand <david@redhat.com> wrote:
> > > From the pKVM side, we're working on guest_memfd primarily to avoid
> > > diverging from what other CoCo solutions end up using, but if it gets
> > > de-featured (e.g. no huge pages, no GUP, no mmap) compared to what we do
> > > today with anonymous memory, then it's a really hard sell to switch over
> > > from what we have in production. We're also hoping that, over time,
> > > guest_memfd will become more closely integrated with the mm subsystem to
> > > enable things like hypervisor-assisted page migration, which we would
> > > love to have.
> > 
> > Reading Sean's reply, he has a different view on that. And I think that's
> > the main issue: there are too many different use cases and too many
> > different requirements that could turn guest_memfd into something that maybe
> > it really shouldn't be.
> 
> No argument there, and we're certainly not tied to any specific
> mechanism on the pKVM side. Maybe Sean can chime in, but we've
> definitely spoken about migration being a goal in the past, so I guess
> something changed since then on the guest_memfd side.

What's "hypervisor-assisted page migration"?  More specifically, what's the
mechanism that drives it?

I am not opposed to page migration itself; what I am opposed to is adding deep
integration with core MM to do some of the fancy/complex things that lead to page
migration.

Another thing I want to avoid is taking a hard dependency on "struct page", so
that we can have line of sight to eliminating "struct page" overhead for guest_memfd,
but that's definitely a more distant future concern.

> > This makes sense: shared memory is neither nasty nor special. You can
> > migrate it, swap it out, map it into page tables, GUP it, ... without any
> > issues.
> 
> Slight aside and not wanting to derail the discussion, but we have a few
> different types of sharing which we'll have to consider:
> 
>   * Memory shared from the host to the guest. This remains owned by the
>     host and the normal mm stuff can be made to work with it.

This seems like it should be !guest_memfd, i.e. can't be converted to guest
private (without first unmapping it from the host, but at that point it's
completely different memory, for all intents and purposes).

>   * Memory shared from the guest to the host. This remains owned by the
>     guest, so there's a pin on the pages and the normal mm stuff can't
>     work without co-operation from the guest (see next point).

Do you happen to have a list of exactly what you mean by "normal mm stuff"?  I
am not at all opposed to supporting .mmap(), because long term I also want to
use guest_memfd for non-CoCo VMs.  But I want to be very conservative with respect
to what is allowed for guest_memfd.  E.g. host userspace can map guest_memfd,
and do operations that are directly related to its mapping, but that's about it.

>   * Memory relinquished from the guest to the host. This actually unmaps
>     the pages from the guest and transfers ownership back to the host,
>     after which the pin is dropped and the normal mm stuff can work. We
>     use this to implement ballooning.
> 
> I suppose the main thing is that the architecture backend can deal with
> these states, so the core code shouldn't really care as long as it's
> aware that shared memory may be pinned.
> 
> > So if I would describe some key characteristics of guest_memfd as of today,
> > it would probably be:
> > 
> > 1) Memory is unmovable and unswappable. Right from the beginning, it is
> >    allocated as unmovable (e.g., not placed on ZONE_MOVABLE, CMA, ...).
> > 2) Memory is inaccessible. It cannot be read from user space or the
> >    kernel, and it cannot be GUP'ed ... only some mechanisms (e.g.,
> >    hibernation, /proc/kcore) might end up touching it "by accident",
> >    and we usually can handle these cases.
> > 3) Memory can be discarded in page granularity. There should be no cases
> >    where you cannot discard memory; otherwise, memory would be
> >    over-allocated for private pages that have been replaced by shared pages.
> > 4) Page tables are not required (well, it's a memfd), and the fd could
> >    in theory be passed to other processes.

More broadly, no VMAs are required.  The lack of stage-1 page tables is nice to
have; the lack of VMAs means that guest_memfd isn't playing second fiddle, e.g.
it's not subject to VMA protections, isn't restricted to host mapping size, etc.

> > Having "ordinary shared" memory in there implies that 1) and 2) will have to
> > be adjusted for them, which kind-of turns it "partially" into ordinary shmem
> > again.
> 
> Yes, and we'd also need a way to establish hugepages (where possible)
> even for the *private* memory so as to reduce the depth of the guest's
> stage-2 walk.

Yeah, hugepage support for guest_memfd is very much a WIP.  Getting _something_
is easy, getting the right thing is much harder.
