Re: [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Fuad Tabba <tabba@google.com>
To: Ackerley Tng <ackerleytng@google.com>
Cc: Sean Christopherson <seanjc@google.com>,
	kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
	 linux-mm@kvack.org, kvmarm@lists.linux.dev, pbonzini@redhat.com,
	 chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org,
	 paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu,  viro@zeniv.linux.org.uk,
	brauner@kernel.org, willy@infradead.org,
	 akpm@linux-foundation.org, xiaoyao.li@intel.com,
	yilun.xu@intel.com,  chao.p.peng@linux.intel.com,
	jarkko@kernel.org, amoorthy@google.com,  dmatlack@google.com,
	isaku.yamahata@intel.com, mic@digikod.net,  vbabka@suse.cz,
	vannapurve@google.com, mail@maciej.szmigiero.name,
	 david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com,
	 liam.merwick@oracle.com, isaku.yamahata@gmail.com,
	 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
	steven.price@arm.com,  quic_eberman@quicinc.com,
	quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
	 quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
	 quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
	catalin.marinas@arm.com,  james.morse@arm.com,
	yuzenghui@huawei.com, oliver.upton@linux.dev,  maz@kernel.org,
	will@kernel.org, qperret@google.com, keirf@google.com,
	 roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org,
	jgg@nvidia.com,  rientjes@google.com, jhubbard@nvidia.com,
	fvdl@google.com, hughd@google.com,  jthoughton@google.com,
	peterx@redhat.com, pankaj.gupta@amd.com,  ira.weiny@intel.com
Subject: Re: [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
Date: Mon, 30 Jun 2025 16:08:15 +0100	[thread overview]
Message-ID: <CA+EHjTwqOwO2zVd4zTYF7w7reTWMNjmCV6XnKux2JtPwYCAoZQ@mail.gmail.com> (raw)
In-Reply-To: <diqzv7odjnln.fsf@ackerleytng-ctop.c.googlers.com>

Hi Ackerley,

On Mon, 30 Jun 2025 at 15:44, Ackerley Tng <ackerleytng@google.com> wrote:
>
> Fuad Tabba <tabba@google.com> writes:
>
> > Hi Ackerley,
> >
> > On Fri, 27 Jun 2025 at 16:01, Ackerley Tng <ackerleytng@google.com> wrote:
> >>
> >> Ackerley Tng <ackerleytng@google.com> writes:
> >>
> >> > [...]
> >>
> >> >>> +/*
> >> >>> + * Returns true if the given gfn's private/shared status (in the CoCo sense) is
> >> >>> + * private.
> >> >>> + *
> >> >>> + * A return value of false indicates that the gfn is explicitly or implicitly
> >> >>> + * shared (i.e., non-CoCo VMs).
> >> >>> + */
> >> >>>  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> >> >>>  {
> >> >>> -   return IS_ENABLED(CONFIG_KVM_GMEM) &&
> >> >>> -          kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
> >> >>> +   struct kvm_memory_slot *slot;
> >> >>> +
> >> >>> +   if (!IS_ENABLED(CONFIG_KVM_GMEM))
> >> >>> +           return false;
> >> >>> +
> >> >>> +   slot = gfn_to_memslot(kvm, gfn);
> >> >>> +   if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
> >> >>> +           /*
> >> >>> +            * Without in-place conversion support, if a guest_memfd memslot
> >> >>> +            * supports shared memory, then all the slot's memory is
> >> >>> +            * considered not private, i.e., implicitly shared.
> >> >>> +            */
> >> >>> +           return false;
> >> >>
> >> >> Why!?!?  Just make sure KVM_MEMORY_ATTRIBUTE_PRIVATE is mutually exclusive with
> >> >> mappable guest_memfd.  You need to do that no matter what.
> >> >
> >> > Thanks, I agree that setting KVM_MEMORY_ATTRIBUTE_PRIVATE should be
> >> > disallowed for gfn ranges whose slot is guest_memfd-only. Missed that
> >> > out. Where do people think we should check the mutual exclusivity?
> >> >
> >> > In kvm_supported_mem_attributes() I'm thiking that we should still allow
> >> > the use of KVM_MEMORY_ATTRIBUTE_PRIVATE for other non-guest_memfd-only
> >> > gfn ranges. Or do people think we should just disallow
> >> > KVM_MEMORY_ATTRIBUTE_PRIVATE for the entire VM as long as one memslot is
> >> > a guest_memfd-only memslot?
> >> >
> >> > If we check mutually exclusivity when handling
> >> > kvm_vm_set_memory_attributes(), as long as part of the range where
> >> > KVM_MEMORY_ATTRIBUTE_PRIVATE is requested to be set intersects a range
> >> > whose slot is guest_memfd-only, the ioctl will return EINVAL.
> >> >
> >>
> >> At yesterday's (2025-06-26) guest_memfd upstream call discussion,
> >>
> >> * Fuad brought up a possible use case where within the *same* VM, we
> >>   want to allow both memslots that supports and does not support mmap in
> >>   guest_memfd.
> >> * Shivank suggested a concrete use case for this: the user wants a
> >>   guest_memfd memslot that supports mmap just so userspace addresses can
> >>   be used as references for specifying memory policy.
> >> * Sean then added on that allowing both types of guest_memfd memslots
> >>   (support and not supporting mmap) will allow the user to have a second
> >>   layer of protection and ensure that for some memslots, the user
> >>   expects never to be able to mmap from the memslot.
> >>
> >> I agree it will be useful to allow both guest_memfd memslots that
> >> support and do not support mmap in a single VM.
> >>
> >> I think I found an issue with flags, which is that GUEST_MEMFD_FLAG_MMAP
> >> should not imply that the guest_memfd will provide memory for all guest
> >> faults within the memslot's gfn range (KVM_MEMSLOT_GMEM_ONLY).
> >>
> >> For the use case Shivank raised, if the user wants a guest_memfd memslot
> >> that supports mmap just so userspace addresses can be used as references
> >> for specifying memory policy for legacy Coco VMs where shared memory
> >> should still come from other sources, GUEST_MEMFD_FLAG_MMAP will be set,
> >> but KVM can't fault shared memory from guest_memfd. Hence,
> >> GUEST_MEMFD_FLAG_MMAP should not imply KVM_MEMSLOT_GMEM_ONLY.
> >>
> >> Thinking forward, if we want guest_memfd to provide (no-mmap) protection
> >> even for non-CoCo VMs (such that perhaps initial VM image is populated
> >> and then VM memory should never be mmap-ed at all), we will want
> >> guest_memfd to be the source of memory even if GUEST_MEMFD_FLAG_MMAP is
> >> not set.
> >>
> >> I propose that we should have a single VM-level flag to solve this (in
> >> line with Sean's guideline that we should just move towards what we want
> >> and not support non-existent use cases): something like
> >> KVM_CAP_PREFER_GMEM.
> >>
> >> If KVM_CAP_PREFER_GMEM_MEMORY is set,
> >>
> >> * memory for any gfn range in a guest_memfd memslot will be requested
> >>   from guest_memfd
> >> * any privacy status queries will also be directed to guest_memfd
> >> * KVM_MEMORY_ATTRIBUTE_PRIVATE will not be a valid attribute
> >>
> >> KVM_CAP_PREFER_GMEM_MEMORY will be orthogonal with no validation on
> >> GUEST_MEMFD_FLAG_MMAP, which should just purely guard mmap support in
> >> guest_memfd.
> >>
> >> Here's a table that I set up [1]. I believe the proposed
> >> KVM_CAP_PREFER_GMEM_MEMORY (column 7) lines up with requirements
> >> (columns 1 to 4) correctly.
> >>
> >> [1] https://lpc.events/event/18/contributions/1764/attachments/1409/3710/guest_memfd%20use%20cases%20vs%20guest_memfd%20flags%20and%20privacy%20tracking.pdf
> >
> > I'm not sure this naming helps. What does "prefer" imply here? If the
> > caller from user space does not prefer, does it mean that they
> > mind/oppose?
> >
>
> Sorry, bad naming.
>
> I used "prefer" because some memslots may not have guest_memfd at
> all. To clarify, a "guest_memfd memslot" is a memslot that has some
> valid guest_memfd fd and offset. The memslot may also have a valid
> userspace_addr configured, either mmap-ed from the same guest_memfd fd
> or from some other backing memory (for legacy CoCo VMs), or NULL for
> userspace_addr.
>
> I meant to have the CAP enable KVM_MEMSLOT_GMEM_ONLY of this patch
> series for all memslots that have some valid guest_memfd fd and offset,
> except if we have a VM-level CAP, KVM_MEMSLOT_GMEM_ONLY should be moved
> to the VM level.

Regardless of the name, I feel that this functionality at best does
not belong in this series, and potentially adds more confusion.

Userspace should be specific about what it wants, and they know what
kind of memslots there are in the VM: userspace creates them. In that
case, userspace can either create a legacy memslot, no need for any of
the new flags, or it can create a guest_memfd memslot, and then use
any new flags to qualify that. Having a flag/capability that means
something for guest_memfd memslots, but effectively keeps the same
behavior for legacy ones seems to add more confusion.

> > Regarding the use case Shivank mentioned, mmaping for policy, while
> > the use case is a valid one, the raison d'être of mmap is to map into
> > user space (i.e., fault it in). I would argue that if you opt into
> > mmap, you are doing it to be able to access it.
>
> The above is in conflict with what was discussed on 2025-06-26 IIUC.
>
> Shivank brought up the case of enabling mmap *only* to be able to set
> mempolicy using the VMAs, and Sean (IIUC) later agreed we should allow
> userspace to only enable mmap but still disable faults, so that userspace
> is given additional protection, such that even if a (compromised)
> userspace does a private-to-shared conversion, userspace is still not
> allowed to fault in the page.

I don't think there's a conflict :)  What I think is this is outside
of the scope of this series for a few reasons:

- This is prior to the mempolicy work (and is the base for it)
- If we need to, we can add a flag later to restrict mmap faulting
- Once we get in-place conversion, the mempolicy work could use the
ability to disallow mapping for private memory

By actually implementing something now, we would be restricting the
mempolicy work, rather than helping it, since we would effectively be
deciding now how that work should proceed. By keeping this the way it
is now, the mempolicy work can explore various alternatives.

I think we discussed this in the guest_memfd sync of 2025-06-12, and I
think this was roughly our conclusion.

> Hence, if we want to support mmaping just for policy and continue to
> restrict faulting, then GUEST_MEMFD_FLAG_MMAP should not imply
> KVM_MEMSLOT_GMEM_ONLY.
>
> > To me, that seems like
> > something that merits its own flag, rather than mmap. Also, I recall
> > that we said that later on, with inplace conversion, that won't be
> > even necessary.
>
> On x86, as of now I believe we're going with an ioctl that does *not*
> check what the guest prefers and will go ahead to perform the
> private-to-shared conversion, which will go ahead to update
> shareability.

Here I think you're making my case that we're dragging more complexity
from future work/series into this series, since now we're going into
the IOCTLs for the conversion series :)

> > In other words, this would also be trying to solve a
> > problem that we haven't yet encountered and that we have a solution
> > for anyway.
> >
>
> So we don't have a solution for the use case where userspace wants to
> mmap but never fault for userspace's protection from stray
> private-to-shared conversions, unless we decouple GUEST_MEMFD_FLAG_MMAP
> and KVM_MEMSLOT_GMEM_ONLY.
>
> > I think that, unless anyone disagrees, is to go ahead with the names
> > we discussed in the last meeting. They seem to be the ones that make
> > the most sense for the upcoming use cases.
> >
>
> We could also discuss if we really want to support the use case where
> userspace wants to mmap but never fault for userspace's protection from
> stray private-to-shared conversions.

I would really rather defer that work to when it's needed. It seems
that we should aim to land this series as soon as possible, since it's
the one blocking much of the future work. As far as I can tell,
nothing here precludes introducing the mechanism of supporting the
case where userspace wants to mmap but never fault, once it's needed.
This was I believe what we had agreed on in the sync on 2025-06-26.

Cheers,
/fuad

next prev parent reply	other threads:[~2025-06-30 15:08 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 01/18] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 02/18] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 03/18] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem() Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 04/18] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem Fuad Tabba
2025-06-13 13:57   ` Ackerley Tng
2025-06-13 20:35   ` Sean Christopherson
2025-06-16  7:13     ` Fuad Tabba
2025-06-16 14:20       ` David Hildenbrand
2025-06-24 20:51     ` Ackerley Tng
2025-06-25  6:33       ` Roy, Patrick
2025-06-11 13:33 ` [PATCH v12 05/18] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 06/18] KVM: Fix comments that refer to slots_lock Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 07/18] KVM: Fix comment that refers to kvm uapi header path Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages Fuad Tabba
2025-06-12 16:16   ` Shivank Garg
2025-06-13 21:03   ` Sean Christopherson
2025-06-13 21:18     ` David Hildenbrand
2025-06-13 22:48     ` Sean Christopherson
2025-06-16  6:52     ` Fuad Tabba
2025-06-16 14:16       ` David Hildenbrand
2025-06-17 23:04       ` Sean Christopherson
2025-06-18 11:18         ` Fuad Tabba
2025-06-16 13:44     ` Ira Weiny
2025-06-16 14:03       ` David Hildenbrand
2025-06-16 14:16         ` Fuad Tabba
2025-06-16 14:25           ` David Hildenbrand
2025-06-18  0:40             ` Sean Christopherson
2025-06-18  8:15               ` David Hildenbrand
2025-06-18  9:20                 ` Xiaoyao Li
2025-06-18  9:27                   ` David Hildenbrand
2025-06-18  9:44                     ` Xiaoyao Li
2025-06-18  9:59                       ` David Hildenbrand
2025-06-18 10:42                         ` Xiaoyao Li
2025-06-18 11:14                           ` David Hildenbrand
2025-06-18 12:17                             ` Xiaoyao Li
2025-06-18 13:16                               ` David Hildenbrand
2025-06-19  1:48                 ` Sean Christopherson
2025-06-19  1:50                   ` Sean Christopherson
2025-06-18  9:25     ` David Hildenbrand
2025-06-25 21:47   ` Ackerley Tng
2025-06-11 13:33 ` [PATCH v12 09/18] KVM: guest_memfd: Track shared memory support in memslot Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory Fuad Tabba
2025-06-13 22:08   ` Sean Christopherson
2025-06-24 23:40     ` Ackerley Tng
2025-06-27 15:01       ` Ackerley Tng
2025-06-30  8:07         ` Fuad Tabba
2025-06-30 14:44           ` Ackerley Tng
2025-06-30 15:08             ` Fuad Tabba [this message]
2025-06-30 19:26               ` Shivank Garg
2025-06-30 20:03                 ` David Hildenbrand
2025-07-01 14:15                   ` Ackerley Tng
2025-07-01 14:44                     ` David Hildenbrand
2025-07-08  0:05                       ` Sean Christopherson
2025-07-08 13:44                         ` Ackerley Tng
2025-06-11 13:33 ` [PATCH v12 11/18] KVM: x86: Consult guest_memfd when computing max_mapping_level Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 12/18] KVM: x86: Enable guest_memfd shared memory for non-CoCo VMs Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 13/18] KVM: arm64: Refactor user_mem_abort() Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults Fuad Tabba
2025-06-12 17:33   ` James Houghton
2025-06-11 13:33 ` [PATCH v12 15/18] KVM: arm64: Enable host mapping of shared guest_memfd memory Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 16/18] KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 17/18] KVM: selftests: Don't use hardcoded page sizes in guest_memfd test Fuad Tabba
2025-06-12 16:24   ` Shivank Garg
2025-06-11 13:33 ` [PATCH v12 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
2025-06-12 16:23   ` Shivank Garg
2025-06-12 17:38 ` [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs David Hildenbrand
2025-06-24 10:02   ` Fuad Tabba
2025-06-24 10:16     ` David Hildenbrand
2025-06-24 10:25       ` Fuad Tabba
2025-06-24 11:44         ` David Hildenbrand
2025-06-24 11:58           ` Fuad Tabba
2025-06-24 17:50             ` Sean Christopherson
2025-06-25  8:00               ` Fuad Tabba
2025-06-25 14:07                 ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+EHjTwqOwO2zVd4zTYF7w7reTWMNjmCV6XnKux2JtPwYCAoZQ@mail.gmail.com \
    --to=tabba@google.com \
    --cc=ackerleytng@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=amoorthy@google.com \
    --cc=anup@brainfault.org \
    --cc=aou@eecs.berkeley.edu \
    --cc=brauner@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=chao.p.peng@linux.intel.com \
    --cc=chenhuacai@kernel.org \
    --cc=david@redhat.com \
    --cc=dmatlack@google.com \
    --cc=fvdl@google.com \
    --cc=hch@infradead.org \
    --cc=hughd@google.com \
    --cc=ira.weiny@intel.com \
    --cc=isaku.yamahata@gmail.com \
    --cc=isaku.yamahata@intel.com \
    --cc=james.morse@arm.com \
    --cc=jarkko@kernel.org \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=jthoughton@google.com \
    --cc=keirf@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=liam.merwick@oracle.com \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mail@maciej.szmigiero.name \
    --cc=maz@kernel.org \
    --cc=mic@digikod.net \
    --cc=michael.roth@amd.com \
    --cc=mpe@ellerman.id.au \
    --cc=oliver.upton@linux.dev \
    --cc=palmer@dabbelt.com \
    --cc=pankaj.gupta@amd.com \
    --cc=paul.walmsley@sifive.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qperret@google.com \
    --cc=quic_cvanscha@quicinc.com \
    --cc=quic_eberman@quicinc.com \
    --cc=quic_mnalajal@quicinc.com \
    --cc=quic_pderrin@quicinc.com \
    --cc=quic_pheragu@quicinc.com \
    --cc=quic_svaddagi@quicinc.com \
    --cc=quic_tsoni@quicinc.com \
    --cc=rientjes@google.com \
    --cc=roypat@amazon.co.uk \
    --cc=seanjc@google.com \
    --cc=shuah@kernel.org \
    --cc=steven.price@arm.com \
    --cc=suzuki.poulose@arm.com \
    --cc=vannapurve@google.com \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wei.w.wang@intel.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=xiaoyao.li@intel.com \
    --cc=yilun.xu@intel.com \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).