Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages

linux-arm-msm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Fuad Tabba <tabba@google.com>
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
	linux-mm@kvack.org,  kvmarm@lists.linux.dev, pbonzini@redhat.com,
	chenhuacai@kernel.org,  mpe@ellerman.id.au, anup@brainfault.org,
	paul.walmsley@sifive.com,  palmer@dabbelt.com,
	aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk,
	 brauner@kernel.org, willy@infradead.org,
	akpm@linux-foundation.org,  xiaoyao.li@intel.com,
	yilun.xu@intel.com, chao.p.peng@linux.intel.com,
	 jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
	 isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
	 vannapurve@google.com, ackerleytng@google.com,
	mail@maciej.szmigiero.name,  david@redhat.com,
	michael.roth@amd.com, wei.w.wang@intel.com,
	 liam.merwick@oracle.com, isaku.yamahata@gmail.com,
	 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
	steven.price@arm.com,  quic_eberman@quicinc.com,
	quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
	 quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
	 quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
	catalin.marinas@arm.com,  james.morse@arm.com,
	yuzenghui@huawei.com, oliver.upton@linux.dev,  maz@kernel.org,
	will@kernel.org, qperret@google.com, keirf@google.com,
	 roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org,
	jgg@nvidia.com,  rientjes@google.com, jhubbard@nvidia.com,
	fvdl@google.com, hughd@google.com,  jthoughton@google.com,
	peterx@redhat.com, pankaj.gupta@amd.com,  ira.weiny@intel.com
Subject: Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
Date: Tue, 17 Jun 2025 16:04:20 -0700	[thread overview]
Message-ID: <aFH0dCiljueHeCSp@google.com> (raw)
In-Reply-To: <CA+EHjTyO1tP1uiVkoReZxvV6h2VwfX+1qxBT15JcP3+AXdB8fA@mail.gmail.com>

On Mon, Jun 16, 2025, Fuad Tabba wrote:
> > > This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
> > > and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
> > > flag at creation time.
> >
> > Why?  I can see that from the patch.
> 
> It's in the patch series, not this patch.

Eh, not really.  It doesn't even matter how "Why?" is interpreted, because nothing
in this series covers any of the reasonable interpretations to an acceptable
degree.

These are all the changelogs for generic changes

 : This patch enables support for shared memory in guest_memfd, including
 : mapping that memory from host userspace.
 : 
 : This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
 : and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
 : flag at creation time.

 : Add a new internal flag in the top half of memslot->flags to track when
 : a guest_memfd-backed slot supports shared memory, which is reserved for
 : internal use in KVM.
 : 
 : This avoids repeatedly checking the underlying guest_memfd file for
 : shared memory support, which requires taking a reference on the file.

the small bit of documentation

 +When the capability KVM_CAP_GMEM_SHARED_MEM is supported, the 'flags' field
 +supports GUEST_MEMFD_FLAG_SUPPORT_SHARED.  Setting this flag on guest_memfd
 +creation enables mmap() and faulting of guest_memfd memory to host userspace.
 +
 +When the KVM MMU performs a PFN lookup to service a guest fault and the backing
 +guest_memfd has the GUEST_MEMFD_FLAG_SUPPORT_SHARED set, then the fault will
 +always be consumed from guest_memfd, regardless of whether it is a shared or a
 +private fault.

and the cover letter

 : The purpose of this series is to allow mapping guest_memfd backed memory
 : at the host. This support enables VMMs like Firecracker to run guests
 : backed completely by guest_memfd [2]. Combined with Patrick's series for
 : direct map removal in guest_memfd [3], this would allow running VMs that
 : offer additional hardening against Spectre-like transient execution
 : attacks.
 : 
 : This series will also serve as a base for _restricted_ mmap() support
 : for guest_memfd backed memory at the host for CoCos that allow sharing
 : guest memory in-place with the host [4].

None of those get remotely close to explaining the use cases in sufficient
detail.

Now, it's entirely acceptable, and in this case probably highly preferred, to
link to the relevant use cases, e.g. as opposed to trying to regurgitate and
distill a huge pile of information.

But I want the _changelog_ to do the heavy lifting of capturing the most useful
links and providing context.  E.g. to find the the motiviation for using
guest_memfd to back non-CoCo VMs, I had to follow the [3] link to Patrick's
series, then walk backwards through the versions of _that_ series, and eventually
come across another link in Patrick's very first RFC:

 : This RFC series is a rough draft adding support for running
 : non-confidential compute VMs in guest_memfd, based on prior discussions
 : with Sean [1].

where [1] is the much more helpful:

  https://lore.kernel.org/linux-mm/cc1bb8e9bc3e1ab637700a4d3defeec95b55060a.camel@amazon.com

Now, _I_ am obviously aware of most/all of the use cases and motiviations, but
the changelog isn't just for people like me.  Far from it; the changelog is most
useful for people that are coming in with _zero_ knowledge and context.  Finding
the above link took me quite a bit of effort and digging (and to some extent, I
knew what I was looking for), whereas an explicit reference in the changelog
would (hopefully) take only the few seconds needed to read the blurb and click
the link.

My main argument for why you (and everyone else) should put significant effort
into changelogs (and comments and documentation!) is very simple: writing and
curating a good changelog (comment/documentation) is something the author does
*once*.  If the author skimps out on the changelog, then *every* reader is having
to do that same work *every* time they dig through this code.  We as a community
come out far, far ahead in terms of developer time and understanding by turning a
many-time cost into a one-time cost (and that's not even accounting for the fact
that the author's one-time cost will like be a _lot_ smaller).

There's obviously a balance to strike.  E.g. if the changelog has 50 links, that's
probably going to be counter-productive for most readers.  In this case, 5-7-ish
links with (very) brief contextual references is probably the sweet spot.

> Would it help if I rephrase it along the lines of:
> 
> This functionality isn't enabled until the introduction of the
> KVM_GMEM_SHARED_MEM Kconfig option, and enabled for a given instance
> by the GUEST_MEMFD_FLAG_SUPPORT_SHARED flag at creation time. Both of
> which are introduced in a subsequent patch.
> 
> > This changelog is way, way, waaay too light on details.  Sorry for jumping in at
> > the 11th hour, but we've spent what, 2 years working on this?
> 
> I'll expand this. Just to make sure that I include the right details,
> are you looking for implementation details, motivation, use cases?

Despite my lengthy response, none of the above?

Use cases are good fodder for Documentation and the cover letter, and for *brief*
references in the changelogs.  Implementation details generally don't need to be
explained in the changelog, modulo notable gotchas and edge cases that are worth
calling out.

I _am_ looking for the motivation, but I suspect it's not the motivation you have
in mind.  I'm not terribly concerned with why you want to implement this
functionality; that should be easy to glean from the Documentation and use case
links.

The motivation I'm looking for is why you're adding CONFIG_KVM_GMEM_SHARED_MEM
and GUEST_MEMFD_FLAG_SUPPORT_SHARED.

E.g. CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES was added because it gates large swaths
of code, uAPI, and a field we don't want to access "accidentally" (mem_attr_array),
and because CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES has a hard dependency on
CONFIG_KVM_GENERIC_MMU_NOTIFIER.

For CONFIG_KVM_GMEM_SHARED_MEM, I'm just not seeing the motiviation.   It gates
very little code (though that could be slightly changed by wrapping the mmap()
and fault logic guest_memfd.c), and literally every use is part of a broader
conditional.  I.e. it's effectively an optimization.

Ha!  And it's actively buggy.  Because this will allow shared gmem for DEFAULT_VM,

	#define kvm_arch_supports_gmem_shared_mem(kvm)			\
	(IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&			\
	 ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM ||		\
	  (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM))

but only if CONFIG_KVM_SW_PROTECTED_VM is selected.  That makes no sense.  And
that changelog is also sorely lacking.  It covers the what, but that's quite
useless, because I can very easily see the what from the code.  By covering the
"why" in the changelog, (hopefully) you would have come to the same conclusion
that selecting KVM_GMEM_SHARED_MEM iff KVM_SW_PROTECTED_VM is enabled doesn't
make any sense (because you wouldn't have been able to write a sane justification).

Or, if it somehow does make sense, i.e. if I'm missing something, then that
absolutely needs to in the changelog!

 : Define the architecture-specific macro to enable shared memory support
 : in guest_memfd for ordinary, i.e., non-CoCo, VM types, specifically
 : KVM_X86_DEFAULT_VM and KVM_X86_SW_PROTECTED_VM.
 : 
 : Enable the KVM_GMEM_SHARED_MEM Kconfig option if KVM_SW_PROTECTED_VM is
 : enabled.

As for GUEST_MEMFD_FLAG_SUPPORT_SHARED, after digging through the code, I _think_
the reason we need a flag is so that KVM knows to completely ignore the HVA in
the memslot.  (a) explaining that (again, for future readers) would be super
helpful, and (b) if there is other motiviation for a per-guest_memfd opt-in, then
_that_ is also very interesting.

And for (a), bonus points if you explain why it's a GUEST_MEMFD flag, e.g. as
opposed to a per-VM capability or per-memslot flag.  (Though this may be self-
evident to any readers that understand any of this, so definitely optional).

> > So my vote would be "GUEST_MEMFD_FLAG_MAPPABLE", and then something like
> > KVM_MEMSLOT_GUEST_MEMFD_ONLY.  That will make code like this:
> >
> >         if (kvm_slot_has_gmem(slot) &&
> >             (kvm_gmem_memslot_supports_shared(slot) ||
> >              kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
> >                 return kvm_gmem_max_mapping_level(slot, gfn, max_level);
> >         }
> >
> > much more intutive:
> >
> >         if (kvm_is_memslot_gmem_only(slot) ||
> >             kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE))
> >                 return kvm_gmem_max_mapping_level(slot, gfn, max_level);
> >
> > And then have kvm_gmem_mapping_order() do:
> >
> >         WARN_ON_ONCE(!kvm_slot_has_gmem(slot));
> >         return 0;
> 
> I have no preference really. To me this was intuitive, but I guess I
> have been staring at this way too long.

I agree that SHARED is intuitive for the pKVM use case (and probably all CoCo use
cases).  My objection with the name is that it's misleading/confusing for non-CoCo
VMs (at least for me), and that using SHARED could unnecessarily paint us into a
corner.

Specifically, if there are ever use cases where guest memory is shared between
entities *without* mapping guest memory into host userspace, then we'll be a bit
hosed.  Though as is tradition in KVM, I suppose we could just call it
GUEST_MEMFD_FLAG_SUPPORT_SHARED2 ;-)

Regarding CoCo vs. non-CoCo intuition, it's easy enough to discern that
GUEST_MEMFD_FLAG_MAPPABLE is required to do in-place sharing with host userspace.

But IMO it's not easy to glean that GUEST_MEMFD_FLAG_SUPPORT_SHARED is a
effectively a hard requirement for non-CoCo x86 VMs purely because because many
flows in KVM x86 will fail miserable if KVM can't access guest memory via uaccess,
i.e. if guest memory isn't mapped by host userspace.  In other words, it's as much
about working within KVM's existing design (and not losing support for a wide
swath of features) as it is about "sharing" guest memory with host userspace.

next prev parent reply	other threads:[~2025-06-17 23:04 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 01/18] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 02/18] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 03/18] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem() Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 04/18] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem Fuad Tabba
2025-06-13 13:57   ` Ackerley Tng
2025-06-13 20:35   ` Sean Christopherson
2025-06-16  7:13     ` Fuad Tabba
2025-06-16 14:20       ` David Hildenbrand
2025-06-24 20:51     ` Ackerley Tng
2025-06-25  6:33       ` Roy, Patrick
2025-06-11 13:33 ` [PATCH v12 05/18] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 06/18] KVM: Fix comments that refer to slots_lock Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 07/18] KVM: Fix comment that refers to kvm uapi header path Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages Fuad Tabba
2025-06-12 16:16   ` Shivank Garg
2025-06-13 21:03   ` Sean Christopherson
2025-06-13 21:18     ` David Hildenbrand
2025-06-13 22:48     ` Sean Christopherson
2025-06-16  6:52     ` Fuad Tabba
2025-06-16 14:16       ` David Hildenbrand
2025-06-17 23:04       ` Sean Christopherson [this message]
2025-06-18 11:18         ` Fuad Tabba
2025-06-16 13:44     ` Ira Weiny
2025-06-16 14:03       ` David Hildenbrand
2025-06-16 14:16         ` Fuad Tabba
2025-06-16 14:25           ` David Hildenbrand
2025-06-18  0:40             ` Sean Christopherson
2025-06-18  8:15               ` David Hildenbrand
2025-06-18  9:20                 ` Xiaoyao Li
2025-06-18  9:27                   ` David Hildenbrand
2025-06-18  9:44                     ` Xiaoyao Li
2025-06-18  9:59                       ` David Hildenbrand
2025-06-18 10:42                         ` Xiaoyao Li
2025-06-18 11:14                           ` David Hildenbrand
2025-06-18 12:17                             ` Xiaoyao Li
2025-06-18 13:16                               ` David Hildenbrand
2025-06-19  1:48                 ` Sean Christopherson
2025-06-19  1:50                   ` Sean Christopherson
2025-06-18  9:25     ` David Hildenbrand
2025-06-25 21:47   ` Ackerley Tng
2025-06-11 13:33 ` [PATCH v12 09/18] KVM: guest_memfd: Track shared memory support in memslot Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory Fuad Tabba
2025-06-13 22:08   ` Sean Christopherson
2025-06-24 23:40     ` Ackerley Tng
2025-06-27 15:01       ` Ackerley Tng
2025-06-30  8:07         ` Fuad Tabba
2025-06-30 14:44           ` Ackerley Tng
2025-06-30 15:08             ` Fuad Tabba
2025-06-30 19:26               ` Shivank Garg
2025-06-30 20:03                 ` David Hildenbrand
2025-07-01 14:15                   ` Ackerley Tng
2025-07-01 14:44                     ` David Hildenbrand
2025-07-08  0:05                       ` Sean Christopherson
2025-07-08 13:44                         ` Ackerley Tng
2025-06-11 13:33 ` [PATCH v12 11/18] KVM: x86: Consult guest_memfd when computing max_mapping_level Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 12/18] KVM: x86: Enable guest_memfd shared memory for non-CoCo VMs Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 13/18] KVM: arm64: Refactor user_mem_abort() Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults Fuad Tabba
2025-06-12 17:33   ` James Houghton
2025-06-11 13:33 ` [PATCH v12 15/18] KVM: arm64: Enable host mapping of shared guest_memfd memory Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 16/18] KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 17/18] KVM: selftests: Don't use hardcoded page sizes in guest_memfd test Fuad Tabba
2025-06-12 16:24   ` Shivank Garg
2025-06-11 13:33 ` [PATCH v12 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
2025-06-12 16:23   ` Shivank Garg
2025-06-12 17:38 ` [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs David Hildenbrand
2025-06-24 10:02   ` Fuad Tabba
2025-06-24 10:16     ` David Hildenbrand
2025-06-24 10:25       ` Fuad Tabba
2025-06-24 11:44         ` David Hildenbrand
2025-06-24 11:58           ` Fuad Tabba
2025-06-24 17:50             ` Sean Christopherson
2025-06-25  8:00               ` Fuad Tabba
2025-06-25 14:07                 ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aFH0dCiljueHeCSp@google.com \
    --to=seanjc@google.com \
    --cc=ackerleytng@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=amoorthy@google.com \
    --cc=anup@brainfault.org \
    --cc=aou@eecs.berkeley.edu \
    --cc=brauner@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=chao.p.peng@linux.intel.com \
    --cc=chenhuacai@kernel.org \
    --cc=david@redhat.com \
    --cc=dmatlack@google.com \
    --cc=fvdl@google.com \
    --cc=hch@infradead.org \
    --cc=hughd@google.com \
    --cc=ira.weiny@intel.com \
    --cc=isaku.yamahata@gmail.com \
    --cc=isaku.yamahata@intel.com \
    --cc=james.morse@arm.com \
    --cc=jarkko@kernel.org \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=jthoughton@google.com \
    --cc=keirf@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=liam.merwick@oracle.com \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mail@maciej.szmigiero.name \
    --cc=maz@kernel.org \
    --cc=mic@digikod.net \
    --cc=michael.roth@amd.com \
    --cc=mpe@ellerman.id.au \
    --cc=oliver.upton@linux.dev \
    --cc=palmer@dabbelt.com \
    --cc=pankaj.gupta@amd.com \
    --cc=paul.walmsley@sifive.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qperret@google.com \
    --cc=quic_cvanscha@quicinc.com \
    --cc=quic_eberman@quicinc.com \
    --cc=quic_mnalajal@quicinc.com \
    --cc=quic_pderrin@quicinc.com \
    --cc=quic_pheragu@quicinc.com \
    --cc=quic_svaddagi@quicinc.com \
    --cc=quic_tsoni@quicinc.com \
    --cc=rientjes@google.com \
    --cc=roypat@amazon.co.uk \
    --cc=shuah@kernel.org \
    --cc=steven.price@arm.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tabba@google.com \
    --cc=vannapurve@google.com \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wei.w.wang@intel.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=xiaoyao.li@intel.com \
    --cc=yilun.xu@intel.com \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).