Re: [PATCH RFC gmem v1 4/8] KVM: x86: Add gmem hook for invalidating memory


From: Sean Christopherson <seanjc@google.com>
To: Steven Price <steven.price@arm.com>
Cc: Michael Roth <michael.roth@amd.com>,
	kvm@vger.kernel.org,  Suzuki K Poulose <suzuki.poulose@arm.com>,
	"tabba@google.com" <tabba@google.com>,
	linux-coco@lists.linux.dev,  linux-mm@kvack.org,
	linux-crypto@vger.kernel.org, x86@kernel.org,
	 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	 pbonzini@redhat.com, isaku.yamahata@intel.com,
	ackerleytng@google.com,  vbabka@suse.cz, ashish.kalra@amd.com,
	nikunj.dadhania@amd.com,  jroedel@suse.de, pankaj.gupta@amd.com
Subject: Re: [PATCH RFC gmem v1 4/8] KVM: x86: Add gmem hook for invalidating memory
Date: Fri, 9 Feb 2024 06:28:04 -0800
Message-ID: <ZcY2VRsRd03UQdF7@google.com>
In-Reply-To: <e7125fcb-52b1-4942-9ae7-c85049e92e5c@arm.com>

On Fri, Feb 09, 2024, Steven Price wrote:
> On 16/10/2023 12:50, Michael Roth wrote:
> > In some cases, like with SEV-SNP, guest memory needs to be updated in a
> > platform-specific manner before it can be safely freed back to the host.
> > Wire up arch-defined hooks to the .free_folio kvm_gmem_aops callback to
> > allow for special handling of this sort when freeing memory in response
> > to FALLOC_FL_PUNCH_HOLE operations and when releasing the inode, and go
> > ahead and define an arch-specific hook for x86 since it will be needed
> > for handling memory used for SEV-SNP guests.
> 
> Hi all,
> 
> Arm CCA has a similar need to prepare/unprepare memory (granule
> delegate/undelegate using our terminology) before it is used for
> protected memory.
> 
> However I see a problem with the current gmem implementation that the
> "invalidations" are not precise enough for our RMI API. When punching a
> hole in the memfd the code currently hits the same path (ending in
> kvm_unmap_gfn_range()) as if a VMA is modified in the same range (for
> the shared version).
>
> The Arm CCA architecture doesn't allow the protected memory to be removed and
> refaulted without the permission of the guest (the memory contents would be
> wiped in this case).

TDX behaves almost exactly like CCA.  Strictly speaking that's not quite true, as
there are TDX APIs that do allow for *temporarily* marking mappings !PRESENT, but
those aren't in play for invalidation events like this.

SNP does allow zapping page table mappings, but fully removing a page, as PUNCH_HOLE
would do, is destructive, so SNP also behaves the same way for all intents and
purposes.

> One option that I've considered is to implement a separate CCA ioctl to
> notify KVM whether the memory should be mapped protected.

That's what KVM_SET_MEMORY_ATTRIBUTES+KVM_MEMORY_ATTRIBUTE_PRIVATE is for, no?
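
For reference, a minimal userspace sketch of marking a guest physical range
private with that ioctl.  The helper names (make_mem_attrs(),
gmem_set_range_private()) are hypothetical, and the #ifndef fallback is only
an assumption for building against UAPI headers that predate the ioctl; it is
meant to mirror what newer <linux/kvm.h> headers provide.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Fallback for older UAPI headers (assumption: mirrors newer <linux/kvm.h>). */
#ifndef KVM_SET_MEMORY_ATTRIBUTES
struct kvm_memory_attributes {
	__u64 address;
	__u64 size;
	__u64 attributes;
	__u64 flags;
};
#define KVM_SET_MEMORY_ATTRIBUTES _IOW(KVMIO, 0xd2, struct kvm_memory_attributes)
#endif
#ifndef KVM_MEMORY_ATTRIBUTE_PRIVATE
#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
#endif

/* Hypothetical helper: build the ioctl argument for [gpa, gpa + size). */
struct kvm_memory_attributes make_mem_attrs(uint64_t gpa, uint64_t size,
					    uint64_t attrs)
{
	struct kvm_memory_attributes a = {
		.address = gpa,
		.size = size,
		.attributes = attrs,
		.flags = 0,
	};

	assert(gpa + size >= gpa);	/* the range must not wrap */
	return a;
}

/* Hypothetical helper: ask KVM to treat the range as private (protected). */
int gmem_set_range_private(int vm_fd, uint64_t gpa, uint64_t size)
{
	struct kvm_memory_attributes a =
		make_mem_attrs(gpa, size, KVM_MEMORY_ATTRIBUTE_PRIVATE);

	return ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &a);
}
```

Converting the range back to shared would be the same call with attributes
set to 0; vm_fd is the descriptor returned by KVM_CREATE_VM.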

> The invalidations would then be ignored on ranges that are currently
> protected for this guest.

That's backwards.  Invalidations on a guest_memfd should affect only *protected*
mappings.  And for that, the plan/proposal is to plumb only_{shared,private} flags
into "struct kvm_gfn_range"[1] so that guest_memfd invalidations don't zap shared
mappings, and mmu_notifier invalidations don't zap private mappings.  Sample usage
in the TDX context[2] (disclaimer, I'm pretty sure I didn't write most of that
patch, I only provided a rough sketch).

[1] https://lore.kernel.org/all/20231027182217.3615211-13-seanjc@google.com
[2] https://lore.kernel.org/all/0b308fb6dd52bafe7153086c7f54bfad03da74b1.1705965635.git.isaku.yamahata@intel.com
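
A toy model of the filtering the flags are meant to provide, purely as a
sketch.  struct gfn_range_sketch and should_zap() are hypothetical stand-ins,
not the kernel's "struct kvm_gfn_range" or any real KVM function; only the
only_shared/only_private names come from the proposal in [1].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Hypothetical stand-in for a gfn range carrying the two proposed flags. */
struct gfn_range_sketch {
	gfn_t start;		/* inclusive */
	gfn_t end;		/* exclusive */
	bool only_private;	/* would be set for guest_memfd invalidations */
	bool only_shared;	/* would be set for mmu_notifier invalidations */
};

/* Decide whether a mapping of @gfn should be zapped for this invalidation. */
bool should_zap(const struct gfn_range_sketch *range, gfn_t gfn,
		bool is_private)
{
	/* Both flags at once would filter out every mapping. */
	assert(!(range->only_private && range->only_shared));

	if (gfn < range->start || gfn >= range->end)
		return false;
	if (range->only_private && !is_private)
		return false;
	if (range->only_shared && is_private)
		return false;
	return true;
}
```

With only_private set, a punched hole zaps the private mapping of a gfn in
range and leaves a shared mapping of the same gfn alone; with only_shared set,
the reverse holds.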

> This 'solves' the problem nicely except for the case where the VMM
> deliberately punches holes in memory which the guest is using.

I don't see what problem there is to solve in this case.  PUNCH_HOLE is destructive,
so don't do that.
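
To illustrate "destructive", a small sketch using an ordinary memfd rather
than a guest_memfd (an assumption made so that it runs without KVM): after
FALLOC_FL_PUNCH_HOLE the old contents are gone and the range reads back as
zeroes.  punch_hole_wipes_data() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write a pattern into a memfd, punch a hole over it, and read it back.
 * Returns 1 if the data was wiped, 0 if it survived, -1 on error.
 */
int punch_hole_wipes_data(size_t len)
{
	char buf[4096];
	int ret = -1;
	int fd;

	assert(len > 0 && len <= sizeof(buf));

	fd = memfd_create("punch-hole-demo", 0);
	if (fd < 0)
		return -1;

	memset(buf, 0xaa, len);
	if (pwrite(fd, buf, len, 0) != (ssize_t)len)
		goto out;

	/* KEEP_SIZE is mandatory with PUNCH_HOLE; the file size is unchanged. */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, len))
		goto out;

	if (pread(fd, buf, len, 0) != (ssize_t)len)
		goto out;

	ret = 1;
	for (size_t i = 0; i < len; i++) {
		if (buf[i] != 0) {
			ret = 0;
			break;
		}
	}
out:
	close(fd);
	return ret;
}
```

A guest_memfd additionally tears down the guest's private mappings for the
range, so from the guest's point of view the page is simply lost.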

> The issue in this case is that there's no way of failing the punch hole
> operation - we can detect that the memory is in use and shouldn't be
> freed, but this callback doesn't give the opportunity to actually block
> the freeing of the memory.

Why is this KVM's problem?  E.g. the same exact thing happens without guest_memfd
if userspace munmap()s memory the guest is using.

> Sadly there's no easy way to map from a physical page in a gmem back to
> which VM (and where in the VM) the page is mapped. So actually ripping
> the page out of the appropriate VM isn't really possible in this case.

I don't follow.  guest_memfd has a 1:1 binding with a VM *and* a gfn, how can you
not know what exactly needs to be invalidated?
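
A sketch of why the binding makes the lookup straightforward, assuming each
memslot records the page offset of its backing range inside the guest_memfd.
struct gmem_binding_sketch and gmem_range_to_gfns() are hypothetical,
simplified names used for illustration, not the actual KVM code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Hypothetical, simplified view of one memslot bound to a guest_memfd. */
struct gmem_binding_sketch {
	gfn_t base_gfn;		/* first gfn covered by the memslot */
	uint64_t pgoff;		/* page offset of the slot within the file */
	uint64_t npages;	/* number of pages in the slot */
};

/*
 * Translate the file page range [start, end) into the gfn range it backs.
 * Returns 1 and fills @gfn_start/@gfn_end (end exclusive) on overlap,
 * 0 if the file range does not intersect the binding.
 */
int gmem_range_to_gfns(const struct gmem_binding_sketch *b,
		       uint64_t start, uint64_t end,
		       gfn_t *gfn_start, gfn_t *gfn_end)
{
	uint64_t slot_end = b->pgoff + b->npages;
	uint64_t lo = start > b->pgoff ? start : b->pgoff;
	uint64_t hi = end < slot_end ? end : slot_end;

	assert(start <= end);

	if (lo >= hi)
		return 0;

	*gfn_start = b->base_gfn + (lo - b->pgoff);
	*gfn_end = b->base_gfn + (hi - b->pgoff);
	return 1;
}
```

Given the file range being punched, the affected gfns fall out of simple
offset arithmetic, with no reverse lookup from physical page to VM needed.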

> How is this situation handled on x86? Is it possible to invalidate and
> then refault a protected page without affecting the memory contents? My
> guess is yes and that is a CCA specific problem - is my understanding
> correct?
> 
> My current thoughts for CCA are one of three options:
> 
> 1. Represent shared and protected memory as two separate memslots. This
> matches the underlying architecture more closely (the top address bit is
> repurposed as a 'shared' flag), but I don't like it because it's a
> deviation from other CoCo architectures (notably pKVM).
> 
> 2. Allow punch-hole to fail on CCA if the memory is mapped into the
> guest's protected space. Again, this is CCA being different and also
> creates nasty corner cases where the gmem descriptor could have to
> outlive the VMM - so looks like a potential source of memory leaks.
> 
> 3. 'Fix' the invalidation to provide more precise semantics. I haven't
> yet prototyped it but it might be possible to simply provide a flag from
> kvm_gmem_invalidate_begin specifying that the invalidation is for the
> protected memory. KVM would then only unmap the protected memory when
> this flag is set (avoiding issues with VMA updates causing spurious unmaps).
> 
> Fairly obviously (3) is my preferred option, but it relies on the
> guarantees that the "invalidation" is actually a precise set of
> addresses where the memory is actually being freed.

#3 is what we are planning for x86, and except for the only_{shared,private} flags,
the requisite functionality should already be in Linus' tree, though it does need
to be wired up for ARM.

