From: Sean Christopherson <seanjc@google.com>
To: Dmytro Maluka <dmy@semihalf.com>
Cc: Jason CJ Chen <jason.cj.chen@intel.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"android-kvm@google.com" <android-kvm@google.com>,
Dmitry Torokhov <dtor@chromium.org>,
Tomasz Nowicki <tn@semihalf.com>,
Grzegorz Jaszczyk <jaz@semihalf.com>,
Keir Fraser <keirf@google.com>
Subject: Re: [RFC PATCH part-5 00/22] VMX emulation
Date: Tue, 13 Jun 2023 12:50:52 -0700 [thread overview]
Message-ID: <ZIjInENnK5/L/Jsd@google.com> (raw)
In-Reply-To: <309da807-2fdb-69ea-3b1b-ff36fc1d67ec@semihalf.com>

On Fri, Jun 09, 2023, Dmytro Maluka wrote:
> On 6/9/23 04:07, Chen, Jason CJ wrote:
> > I think with a PV design we can benefit from skipping shadowing. For example, a TLB
> > flush could be done in the hypervisor directly, whereas shadow EPT has to emulate it
> > by destroying shadow EPT page table entries and then re-shadowing on the next EPT
> > violation.

This is a bit misleading. KVM has an effective TLB for nested TDP only for 4KiB
pages; larger shadow pages are never allowed to go out-of-sync, i.e. KVM doesn't
wait until L1 does a TLB flush to update SPTEs. KVM does "unload" roots, e.g. to
emulate INVEPT, but that usually just ends up being an extra slow TLB flush in L0,
because nested TDP SPTEs rarely go unsync in practice. The patterns for hypervisors
managing VM memory don't typically trigger the types of PTE modifications that
result in unsync SPTEs.

I actually have a (very tiny) patch sitting around somewhere to disable unsync support
when TDP is enabled. There is a very, very theoretical bug where KVM might fail
to honor when a guest TDP PTE change is architecturally supposed to be visible,
and the simplest fix (by far) is to disable unsync support. Disabling TDP+unsync
is a viable fix because unsync support is almost never used for nested TDP. Legacy
shadow paging on the other hand *significantly* benefits from unsync support, e.g.
when the guest is managing CoW mappings. I haven't gotten around to posting the
patch to disable unsync on TDP purely because the flaw is almost comically theoretical.

Anyways, the point is that the TLB flushing side of nested TDP isn't all that
interesting.

> Yeah indeed, good point.
>
> Is my understanding correct: TLB flush is still gonna be requested by
> the host VM via a hypercall, but the benefit is that the hypervisor
> merely needs to do INVEPT?

Maybe? A paravirt paging scheme could do whatever it wanted. The APIs could be
designed in such a way that L1 never needs to explicitly request a TLB flush,
e.g. if the contract is that changes must always become immediately visible to L2.

And TLB flushing is but one small aspect of page table shadowing. With PV paging,
L1 wouldn't need to manage hardware-defined page tables, i.e. could use any arbitrary
data type. E.g. KVM as L1 could use an XArray to track L2 mappings. And L0 in
turn wouldn't need to have vendor specific code, i.e. pKVM on x86 (potentially
*all* architectures) could have a single nested paging scheme for both Intel and
AMD, as opposed to needing code to deal with the differences between EPT and NPT.

A few months back, I mentally worked through the flows[*] (I forget why I was
thinking about PV paging), and I'm pretty sure that adapting x86's TDP MMU to
support PV paging would be easy-ish, e.g. kvm_tdp_mmu_map() would become an
XArray insertion (to track the L2 mapping) + hypercall (to inform L1 of the new
mapping).

[*] I even thought of a catchy name, KVM Paravirt Only Paging, a.k.a. KPOP ;-)

Thread overview: 34+ messages
2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 01/22] pkvm: x86: Add memcpy lib Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 02/22] pkvm: x86: Add memory operation APIs for for host VM Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 03/22] pkvm: x86: Do guest address translation per page granularity Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 04/22] pkvm: x86: Add check for guest address translation Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 05/22] pkvm: x86: Add hypercalls for shadow_vm/vcpu init & teardown Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 06/22] KVM: VMX: Add new kvm_x86_ops vm_free Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 07/22] KVM: VMX: Add initialization/teardown for shadow vm/vcpu Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 08/22] pkvm: x86: Add hash table mapping for shadow vcpu based on vmcs12_pa Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 09/22] pkvm: x86: Add VMXON/VMXOFF emulation Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 10/22] pkvm: x86: Add has_vmcs_field() API for physical vmx capability check Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 11/22] KVM: VMX: Add more vmcs and vmcs12 fields definition Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 12/22] pkvm: x86: Init vmcs read/write bitmap for vmcs emulation Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 13/22] pkvm: x86: Initialize emulated fields " Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 14/22] pkvm: x86: Add msr ops for pKVM hypervisor Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 15/22] pkvm: x86: Move _init_host_state_area to " Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 16/22] pkvm: x86: Add vmcs_load/clear_track APIs Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 17/22] pkvm: x86: Add VMPTRLD/VMCLEAR emulation Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 18/22] pkvm: x86: Add VMREAD/VMWRITE emulation Jason Chen CJ
2023-03-12 18:03 ` [RFC PATCH part-5 19/22] pkvm: x86: Add VMLAUNCH/VMRESUME emulation Jason Chen CJ
2023-03-12 18:03 ` [RFC PATCH part-5 20/22] pkvm: x86: Add INVEPT/INVVPID emulation Jason Chen CJ
2023-03-12 18:03 ` [RFC PATCH part-5 21/22] pkvm: x86: Initialize msr_bitmap for vmsr Jason Chen CJ
2023-03-12 18:03 ` [RFC PATCH part-5 22/22] pkvm: x86: Add vmx msr emulation Jason Chen CJ
2023-03-13 16:58 ` [RFC PATCH part-5 00/22] VMX emulation Sean Christopherson
2023-03-14 16:29 ` Jason Chen CJ
2023-06-08 21:38 ` Dmytro Maluka
2023-06-09 2:07 ` Chen, Jason CJ
2023-06-09 8:34 ` Dmytro Maluka
2023-06-13 19:50 ` Sean Christopherson [this message]
2023-06-15 18:07 ` Dmytro Maluka
2023-06-20 15:46 ` Jason Chen CJ
2023-09-05 9:47 ` Jason Chen CJ
2023-06-15 3:59 ` Chen, Jason CJ
2023-06-15 21:13 ` Nadav Amit