From: "Radim Krčmář" <rkrcmar@redhat.com>
To: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>, kvm@vger.kernel.org
Subject: Re: [PATCH v2] KVM: nVMX: Stash L1's CR3 in vmcs01.GUEST_CR3 on nested entry w/o EPT
Date: Mon, 10 Jun 2019 20:01:01 +0200
Message-ID: <20190610180101.GB6604@flask>
In-Reply-To: <20190607185534.24368-1-sean.j.christopherson@intel.com>
2019-06-07 11:55-0700, Sean Christopherson:
> KVM does not have 100% coverage of VMX consistency checks, i.e. some
> checks that cause VM-Fail may only be detected by hardware during a
> nested VM-Entry. In such a case, KVM must restore L1's state to the
> pre-VM-Enter state as L2's state has already been loaded into KVM's
> software model.
>
> L1's CR3 and PDPTRs in particular are loaded from vmcs01.GUEST_*. But
> when EPT is disabled, the associated fields hold KVM's shadow values,
> not L1's "real" values. Fortunately, when EPT is disabled the PDPTRs
> come from memory, i.e. are not cached in the VMCS. Which leaves CR3
> as the sole anomaly.
>
> A previously applied workaround to handle CR3 was to force nested early
> checks if EPT is disabled:
>
> commit 2b27924bb1d48 ("KVM: nVMX: always use early vmcs check when EPT
> is disabled")
>
> Forcing nested early checks is undesirable as doing so adds hundreds of
> cycles to every nested VM-Entry. Rather than take this performance hit,
> handle CR3 by overwriting vmcs01.GUEST_CR3 with L1's CR3 during nested
> VM-Entry when EPT is disabled *and* nested early checks are disabled.
> By stuffing vmcs01.GUEST_CR3, nested_vmx_restore_host_state() will
> naturally restore the correct vcpu->arch.cr3 from vmcs01.GUEST_CR3.
>
> These shenanigans work because nested_vmx_restore_host_state() does a
> full kvm_mmu_reset_context(), i.e. unloads the current MMU, which
> guarantees vmcs01.GUEST_CR3 will be rewritten with a new shadow CR3
> prior to re-entering L1.
>
> vcpu->arch.root_mmu.root_hpa is set to INVALID_PAGE via:
>
>   nested_vmx_restore_host_state() ->
>     kvm_mmu_reset_context() ->
>       kvm_mmu_unload() ->
>         kvm_mmu_free_roots()
>
> kvm_mmu_unload() has WARN_ON(root_hpa != INVALID_PAGE), i.e. we can bank
> on 'root_hpa == INVALID_PAGE' unless the implementation of
> kvm_mmu_reset_context() is changed.
>
> On the way into L1, VMCS.GUEST_CR3 is guaranteed to be written (on a
> successful entry) via:
>
>   vcpu_enter_guest() ->
>     kvm_mmu_reload() ->
>       kvm_mmu_load() ->
>         kvm_mmu_load_cr3() ->
>           vmx_set_cr3()
>
> Stuff vmcs01.GUEST_CR3 if and only if nested early checks are disabled
> as a "late" VM-Fail should never happen in that case (KVM WARNs), and
> the conditional write avoids the need to restore the correct GUEST_CR3
> when nested_vmx_check_vmentry_hw() fails.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
Surprisingly robust, well done.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
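
For readers following along, the stash-and-restore flow described in the
quoted commit message can be sketched as a toy user-space model. This is
only an illustration under stated assumptions, not the patch itself: the
struct toy_vcpu, all toy_*() helpers and the made-up shadow-root formula
are hypothetical, and the real KVM code paths are considerably more
involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INVALID_PAGE (~(uint64_t)0)

/* Toy stand-in for the state discussed above; not KVM's real layout. */
struct toy_vcpu {
	uint64_t arch_cr3;         /* models vcpu->arch.cr3 */
	uint64_t vmcs01_guest_cr3; /* models vmcs01.GUEST_CR3 */
	uint64_t root_hpa;         /* models root_mmu.root_hpa */
	bool enable_ept;
	bool nested_early_check;
};

/* Hypothetical shadow root for a guest CR3; the formula is arbitrary. */
static uint64_t toy_shadow_root(uint64_t guest_cr3)
{
	return guest_cr3 + 0x100000;
}

/*
 * Models kvm_mmu_reload() -> ... -> vmx_set_cr3(): if the root was
 * unloaded, build a new one and rewrite GUEST_CR3. Without EPT the
 * field holds the shadow root, not the guest's own CR3.
 */
static void toy_mmu_reload(struct toy_vcpu *v)
{
	if (v->root_hpa != INVALID_PAGE)
		return;
	v->root_hpa = toy_shadow_root(v->arch_cr3);
	v->vmcs01_guest_cr3 = v->enable_ept ? v->arch_cr3 : v->root_hpa;
}

/* Nested VM-Entry: stash L1's CR3 if needed, then load L2's CR3. */
static void toy_nested_enter(struct toy_vcpu *v, uint64_t l2_cr3)
{
	if (!v->enable_ept && !v->nested_early_check)
		v->vmcs01_guest_cr3 = v->arch_cr3;
	v->arch_cr3 = l2_cr3;
}

/* Late VM-Fail: restore CR3 from vmcs01 and unload the MMU. */
static void toy_restore_host_state(struct toy_vcpu *v)
{
	v->arch_cr3 = v->vmcs01_guest_cr3;
	v->root_hpa = INVALID_PAGE; /* models kvm_mmu_reset_context() */
}

/* One enter / late VM-Fail / back-to-L1 cycle; returns the restored CR3. */
static uint64_t toy_cr3_after_late_vmfail(uint64_t l1_cr3, uint64_t l2_cr3)
{
	struct toy_vcpu v = {
		.arch_cr3 = l1_cr3,
		.root_hpa = INVALID_PAGE,
		.enable_ept = false,
		.nested_early_check = false,
	};

	toy_mmu_reload(&v); /* running L1: GUEST_CR3 is the shadow root */
	assert(v.vmcs01_guest_cr3 == toy_shadow_root(l1_cr3));

	toy_nested_enter(&v, l2_cr3);
	toy_restore_host_state(&v);
	uint64_t restored = v.arch_cr3;

	toy_mmu_reload(&v); /* re-entering L1 rewrites GUEST_CR3 */
	assert(v.vmcs01_guest_cr3 == toy_shadow_root(restored));
	return restored;
}
```

Calling toy_cr3_after_late_vmfail() from a small main() with distinct L1
and L2 values exercises the whole cycle. In this model, dropping the
stash in toy_nested_enter() would make the restore pick up the shadow
root instead of L1's CR3, which is the anomaly the commit message
describes.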
Thread overview: 3+ messages
2019-06-07 18:55 [PATCH v2] KVM: nVMX: Stash L1's CR3 in vmcs01.GUEST_CR3 on nested entry w/o EPT Sean Christopherson
2019-06-10 18:01 ` Radim Krčmář [this message]
2019-07-04 13:17 ` Paolo Bonzini