From: Sean Christopherson <seanjc@google.com>
To: Sean Christopherson <seanjc@google.com>,
Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Jim Mattson <jmattson@google.com>
Subject: [PATCH v2 2/2] KVM: nVMX: Don't use vmcs01.GUEST_CR3 to snapshot L1's CR3 when EPT is disabled
Date: Fri, 12 Jun 2026 07:56:42 -0700
Message-ID: <20260612145642.452392-3-seanjc@google.com>
In-Reply-To: <20260612145642.452392-1-seanjc@google.com>

Add a dedicated field in "struct nested_vmx" to track L1's pre-VM-Enter CR3
instead of using vmcs01.GUEST_CR3, which isn't anywhere near as safe as the
comment purports it to be. E.g. in addition to the warn_on_missed_cc bug
(that was fixed by relocating the consistency check), if getting vmcs12
pages (during actual nested VM-Entry) fails and EPT is disabled (in KVM),
KVM will return control to userspace with vmcs01.GUEST_CR3 holding a guest-
controlled value.

Alternatively, KVM could force a reload of vmcs01.GUEST_CR3 by resetting
the MMU context in the error path, but as above, the safety of the vmcs01
approach is extremely questionable, e.g. it took all of ~4 months for the
code to break.

Fixes: 671ddc700fd0 ("KVM: nVMX: Don't leak L1 MMIO regions to L2")
Cc: stable@vger.kernel.org
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/nested.c | 21 ++++++++-------------
arch/x86/kvm/vmx/vmx.h | 7 +++++++
2 files changed, 15 insertions(+), 13 deletions(-)
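
Not part of the patch: a minimal user-space sketch of why a dedicated
snapshot is safer than stashing L1's CR3 in vmcs01.GUEST_CR3. It assumes a
toy model in which the shadow CR3 is a fixed constant. All toy_* names and
TOY_SHADOW_CR3 are hypothetical stand-ins, not KVM symbols; only
pre_vmenter_cr3 mirrors a field name from the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant standing in for KVM's shadow page-table root. */
#define TOY_SHADOW_CR3 0x1000ULL

/* Toy stand-in for the relevant vCPU state; not a KVM structure. */
struct toy_vcpu {
	uint64_t arch_cr3;         /* software model of the guest's CR3 */
	uint64_t vmcs01_guest_cr3; /* value hardware would load to run L1 */
	uint64_t pre_vmenter_cr3;  /* dedicated snapshot, as in the patch */
};

/* Old scheme (EPT disabled): overwrite vmcs01.GUEST_CR3 with L1's CR3. */
static inline void toy_snapshot_old(struct toy_vcpu *v)
{
	v->vmcs01_guest_cr3 = v->arch_cr3;
}

/* New scheme: stash L1's CR3 in a dedicated field, leave vmcs01 alone. */
static inline void toy_snapshot_new(struct toy_vcpu *v)
{
	v->pre_vmenter_cr3 = v->arch_cr3;
}

/*
 * Models an error path that bails out without resetting the MMU: vmcs01
 * is only safe to run if it still holds the shadow CR3.
 */
static inline bool toy_vmcs01_is_safe(const struct toy_vcpu *v)
{
	return v->vmcs01_guest_cr3 == TOY_SHADOW_CR3;
}

/* Late VM-Fail unwind under the new scheme. */
static inline void toy_unwind_new(struct toy_vcpu *v)
{
	v->arch_cr3 = v->pre_vmenter_cr3;
}
```

In this model, toy_snapshot_old() leaves vmcs01 holding a guest-controlled
value until something rewrites it, whereas toy_snapshot_new() never touches
vmcs01, so no error path needs to remember to reset the MMU.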
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 199b866072c0..7a2251061bfa 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3669,19 +3669,14 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
&vmx->nested.pre_vmenter_ssp_tbl);
/*
- * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled. In the
- * event of a "late" VM-Fail, i.e. a VM-Fail detected by hardware but
- * not KVM, KVM must unwind its software model to the pre-VM-Entry host
- * state. When EPT is disabled, GUEST_CR3 holds KVM's shadow CR3, not
- * L1's "real" CR3, which causes nested_vmx_restore_host_state() to
- * corrupt vcpu->arch.cr3. Stuffing vmcs01.GUEST_CR3 results in the
- * unwind naturally setting arch.cr3 to the correct value. Smashing
- * vmcs01.GUEST_CR3 is safe because nested VM-Exits, and the unwind,
- * reset KVM's MMU, i.e. vmcs01.GUEST_CR3 is guaranteed to be
- * overwritten with a shadow CR3 prior to re-entering L1.
+ * Stash L1's CR3, so that in the event of a "late" VM-Fail, i.e. a
+ * VM-Fail detected by hardware but not KVM, KVM can unwind its
+ * software model to the pre-VM-Entry host state. When EPT is
+ * disabled, GUEST_CR3 holds KVM's shadow CR3, not L1's "real" CR3,
+ * and so simply restoring from vmcs01.GUEST_CR3 would corrupt
+ * vcpu->arch.cr3.
*/
- if (!enable_ept)
- vmcs_writel(GUEST_CR3, vcpu->arch.cr3);
+ vmx->nested.pre_vmenter_cr3 = kvm_read_cr3(vcpu);
vmx_switch_vmcs(vcpu, &vmx->nested.vmcs02);
@@ -4993,7 +4988,7 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
vmx_set_cr4(vcpu, vmcs_readl(CR4_READ_SHADOW));
nested_ept_uninit_mmu_context(vcpu);
- vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
+ vcpu->arch.cr3 = vmx->nested.pre_vmenter_cr3;
kvm_register_mark_available(vcpu, VCPU_REG_CR3);
/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index de9de0d2016c..dc8517f15bc4 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -159,6 +159,13 @@ struct nested_vmx {
bool has_preemption_timer_deadline;
bool preemption_timer_expired;
+ /*
+ * Used to restore L1's CR3 if hardware detects a VM-Fail Consistency
+ * Check that KVM does not, in which case KVM needs to unwind CR3 back
+ * to its pre-VM-Enter state, NOT to vmcs12.HOST_CR3.
+ */
+ unsigned long pre_vmenter_cr3;
+
/*
* Used to snapshot MSRs that are conditionally loaded on VM-Enter in
* order to propagate the guest's pre-VM-Enter value into vmcs02. For
--
2.54.0.1136.gdb2ca164c4-goog
Thread overview:
2026-06-12 14:56 [PATCH v2 0/2] KVM: nVMX: Fix ept=n bugs where KVM runs L2 with guest CR3 Sean Christopherson
2026-06-12 14:56 ` [PATCH v2 1/2] KVM: nVMX: Move vTPR vs. TPR Threshold consistency check into "normal" checks Sean Christopherson
2026-06-12 14:56 ` Sean Christopherson [this message]
2026-06-12 15:16 ` [PATCH v2 2/2] KVM: nVMX: Don't use vmcs01.GUEST_CR3 to snapshot L1's CR3 when EPT is disabled sashiko-bot