From: "Radim Krčmář" <rkrcmar@redhat.com>
To: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>, kvm@vger.kernel.org
Subject: Re: [PATCH v2] KVM: nVMX: Stash L1's CR3 in vmcs01.GUEST_CR3 on nested entry w/o EPT
Date: Mon, 10 Jun 2019 20:01:01 +0200
Message-ID: <20190610180101.GB6604@flask>
In-Reply-To: <20190607185534.24368-1-sean.j.christopherson@intel.com>
2019-06-07 11:55-0700, Sean Christopherson:
> KVM does not have 100% coverage of VMX consistency checks, i.e. some
> checks that cause VM-Fail may only be detected by hardware during a
> nested VM-Entry. In such a case, KVM must restore L1's state to the
> pre-VM-Enter state as L2's state has already been loaded into KVM's
> software model.
>
> L1's CR3 and PDPTRs in particular are loaded from vmcs01.GUEST_*. But
> when EPT is disabled, the associated fields hold KVM's shadow values,
> not L1's "real" values. Fortunately, when EPT is disabled the PDPTRs
> come from memory, i.e. are not cached in the VMCS. Which leaves CR3
> as the sole anomaly.
>
> A previously applied workaround to handle CR3 was to force nested early
> checks if EPT is disabled:
>
> commit 2b27924bb1d48 ("KVM: nVMX: always use early vmcs check when EPT
> is disabled")
>
> Forcing nested early checks is undesirable as doing so adds hundreds of
> cycles to every nested VM-Entry. Rather than take this performance hit,
> handle CR3 by overwriting vmcs01.GUEST_CR3 with L1's CR3 during nested
> VM-Entry when EPT is disabled *and* nested early checks are disabled.
> By stuffing vmcs01.GUEST_CR3, nested_vmx_restore_host_state() will
> naturally restore the correct vcpu->arch.cr3 from vmcs01.GUEST_CR3.
>
> These shenanigans work because nested_vmx_restore_host_state() does a
> full kvm_mmu_reset_context(), i.e. unloads the current MMU, which
> guarantees vmcs01.GUEST_CR3 will be rewritten with a new shadow CR3
> prior to re-entering L1.
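
For readers following along, the stash-and-restore idea can be sketched as a
toy model. This is only an illustration: struct toy_vcpu and the toy_*
helpers are hypothetical stand-ins, not KVM's real structures or functions,
and the model ignores everything except CR3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical stand-ins, not the kernel's data structures. */
struct toy_vcpu {
	uint64_t arch_cr3;         /* models vcpu->arch.cr3 */
	uint64_t vmcs01_guest_cr3; /* models vmcs01.GUEST_CR3 (shadow CR3 w/o EPT) */
	bool enable_ept;
	bool nested_early_check;
};

/*
 * Models nested VM-Entry: stash L1's CR3 in vmcs01.GUEST_CR3 only when EPT
 * is disabled *and* nested early checks are disabled, then load L2's CR3
 * into the software model.
 */
static void toy_nested_enter(struct toy_vcpu *v, uint64_t l2_cr3)
{
	if (!v->enable_ept && !v->nested_early_check)
		v->vmcs01_guest_cr3 = v->arch_cr3;
	v->arch_cr3 = l2_cr3;
}

/* Models the late VM-Fail path: L1's CR3 is reloaded from vmcs01.GUEST_CR3. */
static void toy_restore_host_state(struct toy_vcpu *v)
{
	v->arch_cr3 = v->vmcs01_guest_cr3;
}
```

In this model, a vCPU that starts with a shadow value in vmcs01_guest_cr3
still gets L1's own CR3 back after toy_restore_host_state(), because the
entry path overwrote the shadow value first.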
>
> vcpu->arch.root_mmu.root_hpa is set to INVALID_PAGE via:
>
> nested_vmx_restore_host_state() ->
> kvm_mmu_reset_context() ->
> kvm_mmu_unload() ->
> kvm_mmu_free_roots()
>
> kvm_mmu_unload() has WARN_ON(root_hpa != INVALID_PAGE), i.e. we can bank
> on 'root_hpa == INVALID_PAGE' unless the implementation of
> kvm_mmu_reset_context() is changed.
>
> On the way into L1, VMCS.GUEST_CR3 is guaranteed to be written (on a
> successful entry) via:
>
> vcpu_enter_guest() ->
> kvm_mmu_reload() ->
> kvm_mmu_load() ->
> kvm_mmu_load_cr3() ->
> vmx_set_cr3()
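
The two call chains above can be sketched the same way. Again a toy model
under stated assumptions: struct toy_mmu_vcpu, TOY_INVALID_PAGE, the
next_root "allocator" and the toy_* helpers are all hypothetical and only
mimic the shape of the reset/reload logic described in the changelog.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_INVALID_PAGE (~(uint64_t)0) /* hypothetical stand-in for INVALID_PAGE */

/* Toy model only: hypothetical stand-ins, not the kernel's data structures. */
struct toy_mmu_vcpu {
	uint64_t root_hpa;       /* models vcpu->arch.root_mmu.root_hpa */
	uint64_t vmcs_guest_cr3; /* models VMCS.GUEST_CR3 */
	uint64_t next_root;      /* hypothetical source of fresh shadow roots */
};

/* Models kvm_mmu_reset_context(): the current root is dropped. */
static void toy_mmu_reset_context(struct toy_mmu_vcpu *v)
{
	v->root_hpa = TOY_INVALID_PAGE;
}

/*
 * Models kvm_mmu_reload() on the way into the guest: nothing happens while
 * the root is valid; an invalid root forces a new root to be loaded, which
 * in turn rewrites GUEST_CR3.
 */
static void toy_mmu_reload(struct toy_mmu_vcpu *v)
{
	if (v->root_hpa != TOY_INVALID_PAGE)
		return;
	v->root_hpa = v->next_root;
	v->next_root += 0x1000;
	v->vmcs_guest_cr3 = v->root_hpa;
}
```

The point of the sketch: a stashed value left in vmcs_guest_cr3 survives a
reload only while the root is valid. Once toy_mmu_reset_context() has run,
the next toy_mmu_reload() always overwrites it with a fresh shadow root.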
>
> Stuff vmcs01.GUEST_CR3 if and only if nested early checks are disabled
> as a "late" VM-Fail should never happen in that case (KVM WARNs), and
> the conditional write avoids the need to restore the correct GUEST_CR3
> when nested_vmx_check_vmentry_hw() fails.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
Surprisingly robust, well done.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Thread overview:
2019-06-07 18:55 [PATCH v2] KVM: nVMX: Stash L1's CR3 in vmcs01.GUEST_CR3 on nested entry w/o EPT Sean Christopherson
2019-06-10 18:01 ` Radim Krčmář [this message]
2019-07-04 13:17 ` Paolo Bonzini