From: Jason Chen CJ <jason.cj.chen@intel.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Dmytro Maluka <dmy@semihalf.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"android-kvm@google.com" <android-kvm@google.com>,
Dmitry Torokhov <dtor@chromium.org>,
Tomasz Nowicki <tn@semihalf.com>,
Grzegorz Jaszczyk <jaz@semihalf.com>,
Keir Fraser <keirf@google.com>
Subject: Re: [RFC PATCH part-5 00/22] VMX emulation
Date: Tue, 20 Jun 2023 15:46:21 +0000 [thread overview]
Message-ID: <ZJHJzU607QOYeRM3@jiechen-ubuntu-dev> (raw)
In-Reply-To: <ZIjInENnK5/L/Jsd@google.com>
On Tue, Jun 13, 2023 at 12:50:52PM -0700, Sean Christopherson wrote:
> On Fri, Jun 09, 2023, Dmytro Maluka wrote:
> > On 6/9/23 04:07, Chen, Jason CJ wrote:
> > > I think with a PV design, we can benefit from skipping shadowing. For example, a TLB
> > > flush could be done in the hypervisor directly, while with shadow EPT it has to be
> > > emulated by destroying shadow EPT page table entries and then re-shadowing on the
> > > next EPT violation.
>
> This is a bit misleading. KVM has an effective TLB for nested TDP only for 4KiB
> pages; larger shadow pages are never allowed to go out-of-sync, i.e. KVM doesn't
> wait until L1 does a TLB flush to update SPTEs. KVM does "unload" roots, e.g. to
> emulate INVEPT, but that usually just ends up being an extra slow TLB flush in L0,
> because nested TDP SPTEs rarely go unsync in practice. The patterns for hypervisors
> managing VM memory don't typically trigger the types of PTE modifications that
> result in unsync SPTEs.
>
> I actually have a (very tiny) patch sitting around somewhere to disable unsync support
> when TDP is enabled. There is a very, very theoretical bug where KVM might fail
> to honor when a guest TDP PTE change is architecturally supposed to be visible,
> and the simplest fix (by far) is to disable unsync support. Disabling TDP+unsync
> is a viable fix because unsync support is almost never used for nested TDP. Legacy
> shadow paging on the other hand *significantly* benefits from unsync support, e.g.
> when the guest is managing CoW mappings. I haven't gotten around to posting the
> patch to disable unsync on TDP purely because the flaw is almost comically theoretical.
>
> Anyways, the point is that the TLB flushing side of nested TDP isn't all that
> interesting.
Agreed, thanks for pointing it out! I was reasoning by comparison with the
current pKVM-on-x86 RFC solution. :-(
To me, the KVM page table shadowing mechanism (e.g., unsync & sync pages)
is heavy and complicated. If we have the KPOP solution, IIUC, we may be
able to remove all of the shadowing code entirely, right? :-)
BTW, KPOP raises questions about supporting access tracking and dirty
page logging, which may require extending the PV interfaces. MMIO faults
could be another issue if we want to keep the EPT-MISCONFIG-based
optimization on Intel platforms.
>
> > Yeah indeed, good point.
> >
> > Is my understanding correct: TLB flush is still gonna be requested by
> > the host VM via a hypercall, but the benefit is that the hypervisor
> > merely needs to do INVEPT?
>
> Maybe? A paravirt paging scheme could do whatever it wanted. The APIs could be
> designed in such a way that L1 never needs to explicitly request a TLB flush,
> e.g. if the contract is that changes must always become immediately visible to L2.
>
> And TLB flushing is but one small aspect of page table shadowing. With PV paging,
> L1 wouldn't need to manage hardware-defined page tables, i.e. could use any arbitrary
> data type. E.g. KVM as L1 could use an XArray to track L2 mappings. And L0 in
> turn wouldn't need to have vendor specific code, i.e. pKVM on x86 (potentially
> *all* architectures) could have a single nested paging scheme for both Intel and
> AMD, as opposed to needing code to deal with the differences between EPT and NPT.
>
> A few months back, I mentally worked through the flows[*] (I forget why I was
> thinking about PV paging), and I'm pretty sure that adapting x86's TDP MMU to
> support PV paging would be easy-ish, e.g. kvm_tdp_mmu_map() would become an
> XArray insertion (to track the L2 mapping) + hypercall (to inform L1 of the new
> mapping).
>
> [*] I even thought of a catchy name, KVM Paravirt Only Paging, a.k.a. KPOP ;-)
--
Thanks
Jason CJ Chen
Thread overview: 34+ messages
2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 01/22] pkvm: x86: Add memcpy lib Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 02/22] pkvm: x86: Add memory operation APIs for host VM Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 03/22] pkvm: x86: Do guest address translation per page granularity Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 04/22] pkvm: x86: Add check for guest address translation Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 05/22] pkvm: x86: Add hypercalls for shadow_vm/vcpu init & teardown Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 06/22] KVM: VMX: Add new kvm_x86_ops vm_free Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 07/22] KVM: VMX: Add initialization/teardown for shadow vm/vcpu Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 08/22] pkvm: x86: Add hash table mapping for shadow vcpu based on vmcs12_pa Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 09/22] pkvm: x86: Add VMXON/VMXOFF emulation Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 10/22] pkvm: x86: Add has_vmcs_field() API for physical vmx capability check Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 11/22] KVM: VMX: Add more vmcs and vmcs12 fields definition Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 12/22] pkvm: x86: Init vmcs read/write bitmap for vmcs emulation Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 13/22] pkvm: x86: Initialize emulated fields " Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 14/22] pkvm: x86: Add msr ops for pKVM hypervisor Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 15/22] pkvm: x86: Move _init_host_state_area to " Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 16/22] pkvm: x86: Add vmcs_load/clear_track APIs Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 17/22] pkvm: x86: Add VMPTRLD/VMCLEAR emulation Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 18/22] pkvm: x86: Add VMREAD/VMWRITE emulation Jason Chen CJ
2023-03-12 18:03 ` [RFC PATCH part-5 19/22] pkvm: x86: Add VMLAUNCH/VMRESUME emulation Jason Chen CJ
2023-03-12 18:03 ` [RFC PATCH part-5 20/22] pkvm: x86: Add INVEPT/INVVPID emulation Jason Chen CJ
2023-03-12 18:03 ` [RFC PATCH part-5 21/22] pkvm: x86: Initialize msr_bitmap for vmsr Jason Chen CJ
2023-03-12 18:03 ` [RFC PATCH part-5 22/22] pkvm: x86: Add vmx msr emulation Jason Chen CJ
2023-03-13 16:58 ` [RFC PATCH part-5 00/22] VMX emulation Sean Christopherson
2023-03-14 16:29 ` Jason Chen CJ
2023-06-08 21:38 ` Dmytro Maluka
2023-06-09 2:07 ` Chen, Jason CJ
2023-06-09 8:34 ` Dmytro Maluka
2023-06-13 19:50 ` Sean Christopherson
2023-06-15 18:07 ` Dmytro Maluka
2023-06-20 15:46 ` Jason Chen CJ [this message]
2023-09-05 9:47 ` Jason Chen CJ
2023-06-15 3:59 ` Chen, Jason CJ
2023-06-15 21:13 ` Nadav Amit