From: Sean Christopherson <seanjc@google.com>
To: Rick P Edgecombe <rick.p.edgecombe@intel.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"michael.roth@amd.com" <michael.roth@amd.com>,
Isaku Yamahata <isaku.yamahata@intel.com>
Subject: Re: [PATCH v5 07/17] KVM: x86: add fields to struct kvm_arch for CoCo features
Date: Mon, 8 Apr 2024 18:21:21 -0700 [thread overview]
Message-ID: <ZhSYEVCHqSOpVKMh@google.com> (raw)
In-Reply-To: <43d1ade0461868016165e964e2bc97f280aee9d4.camel@intel.com>
On Fri, Apr 05, 2024, Rick P Edgecombe wrote:
> On Thu, 2024-04-04 at 08:13 -0400, Paolo Bonzini wrote:
> >
> > struct kvm_arch {
> > - unsigned long vm_type;
> > unsigned long n_used_mmu_pages;
> > unsigned long n_requested_mmu_pages;
> > unsigned long n_max_mmu_pages;
> > unsigned int indirect_shadow_pages;
> > u8 mmu_valid_gen;
> > + u8 vm_type;
> > + bool has_private_mem;
> > + bool has_protected_state;
>
> I'm a little late to this conversation, so hopefully not just complicating
> things. But why not deduce has_private_mem and has_protected_state from the
> vm_type during runtime? Like if kvm.arch.vm_type was instead a bit mask with
> the bit position of the KVM_X86_*_VM set, kvm_arch_has_private_mem() could
> bitwise-and with a compile time mask of vm_types that have primate memory.
> This also prevents it from ever transitioning through non-nonsensical states
> like vm_type == KVM_X86_TDX_VM, but !has_private_memory, so would be a little
> more robust.
LOL, time is a circle, or something like that. Paolo actually did this in v2[*],
and I objected, vociferously.
KVM advertises VM types to userspace via a 32-bit field, one bit per type. So
without more uAPI changes, the VM type needs to be <=31. KVM could embed the
"has private memory" information into the type, but then we cut down on the number
of possible VM types *and* bleed has_private_memory into KVM's ABI.
While it's unlikely KVM will ever support TDX without has_private_memory, it's
entirely possible that KVM could add support for an existing VM "base" type that
doesn't currently support private memory. E.g. with some massaging, KVM could
support private memory for SEV and SEV-ES. And then we need to add an entirely
new VM type just so that KVM can let it use private memory.
Obviously KVM could shove in bits after the fact, e.g. store vm_type as a u64
instead of u32 (or u8 as in this patch), but then what's the point? Burning a
byte instead of a bit for per-VM flag is a complete non-issue, and booleans tend
to yield code that's easier to read and easier to maintain.
[*] https://lore.kernel.org/all/ZdjL783FazB6V6Cy@google.com
> Partly why I ask is there is logic in the x86 MMU TDX changes that tries to
> be generic but still needs special handling for it. The current solution is
> to look at kvm_gfn_shared_mask() as TDX is the only vm type that sets it, but
> Isaku and I were discussing if we should check something else, that didn't
> appear to be tying together to unrelated concepts:
> https://lore.kernel.org/kvm/20240319235654.GC1994522@ls.amr.corp.intel.com/
>
> Since it's down the mail, the relevant snippet:
> "
> > > void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> > > struct kvm_memory_slot *slot)
> > > {
> > > - kvm_mmu_zap_all_fast(kvm);
> > > + if (kvm_gfn_shared_mask(kvm))
Whatever you do that is TDX specific and an internal KVM thing is likely the wrong
thing :-)
The main reason KVM doesn't do a targeted zap on memslot removal is because of ABI
baggage that we _think_ is limited to interaction with VFIO. Since KVM doesn't
have any ABI for TDX *or* SNP, I want to at least entertain the option of doing
a target zap for SNP as well as TDX, even though it's only truly "necessary" for
TDX, in quotes because it's not strictly necessary, e.g. KVM could BLOCK the S-EPT
entries without fully removing the mappings.
Whether or not targeted zapping is optimal for SNP (or any VM type) is very much
TBD, and likely highly dependent on use case, but at the same time it would be
nice to not rule it out completely.
E.g. ChromeOS currently has a use case where they frequently delete and recreate
a 2GiB (give or take) memslot. For that use case, zapping _just_ that memslot is
likely far superious than blasting and rebuilding the entire VM. But if userspace
deletes a 1TiB for some reason, e.g. for memory unplug?, then the fast zap is
probably better, even though it requires rebuilding all SPTEs.
> > There seems to be an attempt to abstract away the existence of Secure-
> > EPT in mmu.c, that is not fully successful. In this case the code
> > checks kvm_gfn_shared_mask() to see if it needs to handle the zapping
> > in a way specific needed by S-EPT. It ends up being a little confusing
> > because the actual check is about whether there is a shared bit. It
> > only works because only S-EPT is the only thing that has a
> > kvm_gfn_shared_mask().
> >
> > Doing something like (kvm->arch.vm_type == KVM_X86_TDX_VM) looks wrong,
> > but is more honest about what we are getting up to here. I'm not sure
> > though, what do you think?
>
> Right, I attempted and failed in zapping case. This is due to the restriction
> that the Secure-EPT pages must be removed from the leaves. the VMX case (also
> NPT, even SNP) heavily depends on zapping root entry as optimization.
As above, it's more nuanced than that. KVM has come to depend on the fast zap,
but it got that way *because* KVM has historical zapped everything, and userspace
has (unknowingly) relied on that behavior.
> I can think of
> - add TDX check. Looks wrong
> - Use kvm_gfn_shared_mask(kvm). confusing
Ya, even if we end up making it a hardcoded TDX thing, dress it up a bit. E.g.
even if KVM checks for a shared mask under the hood, add a helper to capture the
logic, e.g. kvm_zap_all_sptes_on_memslot_deletion(kvm).
> - Give other name for this check like zap_from_leafs (or better name?)
> The implementation is same to kvm_gfn_shared_mask() with comment.
> - Or we can add a boolean variable to struct kvm
If we _don't_ hardcode the behavior, a per-memslot flag or a per-VM capability
(and thus boolean) is likely the way to go. My off-the-cuff vote is probably for
a per-memslot flag.
next prev parent reply other threads:[~2024-04-09 1:21 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-04 12:13 [PATCH v5 00/17] KVM: SEV: allow customizing VMSA features Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 01/17] KVM: SVM: Invert handling of SEV and SEV_ES feature flags Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 02/17] KVM: SVM: Compile sev.c if and only if CONFIG_KVM_AMD_SEV=y Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 03/17] KVM: x86: use u64_to_user_ptr() Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 04/17] KVM: introduce new vendor op for KVM_GET_DEVICE_ATTR Paolo Bonzini
2024-04-04 21:30 ` Isaku Yamahata
2024-04-04 12:13 ` [PATCH v5 05/17] KVM: SEV: publish supported VMSA features Paolo Bonzini
2024-04-04 21:32 ` Isaku Yamahata
2024-04-04 12:13 ` [PATCH v5 06/17] KVM: SEV: store VMSA features in kvm_sev_info Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 07/17] KVM: x86: add fields to struct kvm_arch for CoCo features Paolo Bonzini
2024-04-04 21:39 ` Isaku Yamahata
2024-04-05 23:01 ` Edgecombe, Rick P
2024-04-09 1:21 ` Sean Christopherson [this message]
2024-04-09 14:01 ` Edgecombe, Rick P
2024-04-09 14:43 ` Paolo Bonzini
2024-04-09 15:26 ` Sean Christopherson
2024-05-07 23:01 ` Edgecombe, Rick P
2024-05-08 0:21 ` Sean Christopherson
2024-05-08 1:19 ` Edgecombe, Rick P
2024-05-08 14:38 ` Sean Christopherson
2024-05-08 15:04 ` Edgecombe, Rick P
2024-04-04 12:13 ` [PATCH v5 08/17] KVM: x86: Add supported_vm_types to kvm_caps Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 09/17] KVM: SEV: introduce to_kvm_sev_info Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 10/17] KVM: SEV: define VM types for SEV and SEV-ES Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 11/17] KVM: SEV: sync FPU and AVX state at LAUNCH_UPDATE_VMSA time Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 12/17] KVM: SEV: introduce KVM_SEV_INIT2 operation Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 13/17] KVM: SEV: allow SEV-ES DebugSwap again Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 14/17] selftests: kvm: add tests for KVM_SEV_INIT2 Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 15/17] selftests: kvm: switch to using KVM_X86_*_VM Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 16/17] selftests: kvm: split "launch" phase of SEV VM creation Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 17/17] selftests: kvm: add test for transferring FPU state into VMSA Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZhSYEVCHqSOpVKMh@google.com \
--to=seanjc@google.com \
--cc=isaku.yamahata@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=michael.roth@amd.com \
--cc=pbonzini@redhat.com \
--cc=rick.p.edgecombe@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox