From: Konstantin Khorenko <khorenko@virtuozzo.com>
To: Sean Christopherson <seanjc@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org
Cc: Thomas Gleixner <tglx@kernel.org>, Ingo Molnar <mingo@redhat.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
"H . Peter Anvin" <hpa@zytor.com>,
x86@kernel.org, linux-kernel@vger.kernel.org,
Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Subject: [RFC PATCH 1/1] KVM: VMX: restore host CR2 after VM exit
Date: Wed, 22 Apr 2026 19:50:00 +0200
Message-ID: <20260422175000.1544258-2-khorenko@virtuozzo.com>
In-Reply-To: <20260422175000.1544258-1-khorenko@virtuozzo.com>
On Intel VMX, CR2 is not part of the VMCS guest/host state area. The
CPU does not save or restore it automatically across VM transitions,
so KVM manages it in software: before VM entry it writes vcpu->arch.cr2
into the hardware register if it differs from the current value, and
after VM exit it reads the hardware register back into vcpu->arch.cr2.
The host CR2 is intentionally left clobbered by the guest after VM
exit, as an optimization: the expectation is that the next host page
fault will overwrite it before anything else looks at it.
That expectation is fragile. The rest of the kernel assumes that CR2
always holds the address of the most recent host #PF:
- exc_page_fault() reads it at the very start of #PF handling, before
any instruction could have updated it.
- __show_regs() reads and prints it from die()/oops/crash paths.
Any flow that reaches a #PF handler, or that reads CR2 in an oops or
crash context, without the CPU having just taken a real host #PF, will
observe the guest's CR2 instead of the host's.
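
The current flow can be sketched as a small userspace model. This is
only an illustration, not kernel code: hw_cr2, struct vcpu_model,
guest_run() and enter_exit_unpatched() are hypothetical names standing
in for the hardware register, vcpu->arch.cr2, __vmx_vcpu_run() and
vmx_vcpu_enter_exit() respectively.

```c
#include <stdint.h>

/* Hypothetical stand-in for the hardware CR2 register (not kernel code). */
static uint64_t hw_cr2;

/* Hypothetical vCPU model; cr2 mirrors vcpu->arch.cr2. */
struct vcpu_model {
	uint64_t cr2;
};

/* Simulated guest: optionally takes a #PF, which updates CR2. */
static void guest_run(int guest_faults, uint64_t fault_addr)
{
	if (guest_faults)
		hw_cr2 = fault_addr;
}

/* Pre-patch flow: load guest CR2 on entry, read it back on exit. */
static void enter_exit_unpatched(struct vcpu_model *v, int guest_faults,
				 uint64_t fault_addr)
{
	if (v->cr2 != hw_cr2)
		hw_cr2 = v->cr2;	/* host value is overwritten here */

	guest_run(guest_faults, fault_addr);

	v->cr2 = hw_cr2;
	/* hw_cr2 still holds the guest value when this returns. */
}
```

In this model, a host value placed in hw_cr2 before the call is gone
afterwards, and anything that reads hw_cr2 next sees the guest's value.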
On nested setups the stale guest CR2 left in the hardware register
has the form of a kernel virtual address in the inner guest's address
space, which overlaps 1:1 with the outer-guest kernel layout. That
makes the stale value visually indistinguishable from a plausible
outer-guest fault address, which can lead to confusing oops reports
whose CR2 has no relation to the reported faulting RIP.
Fix: save the host CR2 before VM entry into a local variable. After
VM exit, compare the already-read vcpu->arch.cr2 against the saved
host value, and write the host CR2 back if the two differ.
When the two values are equal this is a single register compare with
no write; the restore is placed under unlikely() on the expectation
that most VM-entry/exit cycles do not involve a guest CR2 write.
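
A minimal userspace sketch of the save/compare/restore sequence, under
the same caveat that it is a model and not the kernel code: hw_cr2,
struct vcpu_model, guest_run() and enter_exit_patched() are
hypothetical names chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the hardware CR2 register (not kernel code). */
static uint64_t hw_cr2;

/* Hypothetical vCPU model; cr2 mirrors vcpu->arch.cr2. */
struct vcpu_model {
	uint64_t cr2;
};

/* Simulated guest: optionally takes a #PF, which updates CR2. */
static void guest_run(int guest_faults, uint64_t fault_addr)
{
	if (guest_faults)
		hw_cr2 = fault_addr;
}

/* Patched flow: remember host CR2 on entry, put it back on exit. */
static void enter_exit_patched(struct vcpu_model *v, int guest_faults,
			       uint64_t fault_addr)
{
	uint64_t host_cr2 = hw_cr2;	/* saved before VM entry */

	if (v->cr2 != host_cr2)
		hw_cr2 = v->cr2;

	guest_run(guest_faults, fault_addr);

	v->cr2 = hw_cr2;

	/* Restore only when the register no longer holds the host value. */
	if (v->cr2 != host_cr2)
		hw_cr2 = host_cr2;
}
```

In this model the restore fires whenever the guest's CR2 after exit
differs from the saved host value, whether the guest changed CR2 during
this run or KVM loaded a different guest value on entry.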
The change stays within the existing noinstr region;
native_read_cr2()/native_write_cr2() are plain inline asm with no
instrumentation.
This brings VMX in line with the CR2 invariant the rest of the kernel
already relies on.
AMD SVM is not affected. On SVM, CR2 is part of the VMCB save area
and the CPU saves and restores host and guest CR2 automatically on
VMRUN and #VMEXIT. KVM's SVM code only accesses svm->vmcb->save.cr2
and never touches the hardware CR2 register.
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
---
arch/x86/kvm/vmx/vmx.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a29896a9ef145..dd441b90dfd4a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7458,6 +7458,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 					unsigned int flags)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	unsigned long host_cr2;
 
 	guest_state_enter_irqoff();
 
@@ -7465,13 +7466,25 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 
 	vmx_disable_fb_clear(vmx);
 
-	if (vcpu->arch.cr2 != native_read_cr2())
+	host_cr2 = native_read_cr2();
+	if (vcpu->arch.cr2 != host_cr2)
 		native_write_cr2(vcpu->arch.cr2);
 
 	vmx->fail = __vmx_vcpu_run(vmx, (unsigned long *)&vcpu->arch.regs,
 				   flags);
 
 	vcpu->arch.cr2 = native_read_cr2();
+
+	/*
+	 * Restore host CR2 if the guest modified it. The rest of the
+	 * kernel relies on CR2 holding the address of the last host
+	 * #PF; leaving the guest value there can mislead any code path
+	 * that reads CR2 without the CPU having just taken a real host
+	 * #PF (exc_page_fault(), __show_regs() from oops/crash paths,
+	 * NMI/MCE report, nested-virt corner cases, etc.).
+	 */
+	if (unlikely(vcpu->arch.cr2 != host_cr2))
+		native_write_cr2(host_cr2);
 	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
 
 	vmx->idt_vectoring_info = 0;
--
2.43.0