public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] KVM: nSVM: Snapshot vmcb12 save.rip to prevent TOCTOU race
@ 2026-03-26  0:04 홍길동
  0 siblings, 0 replies; only message in thread
From: 홍길동 @ 2026-03-26  0:04 UTC (permalink / raw)
  To: kvm; +Cc: seanjc, pbonzini, gregkh, yosryahmed, linux-kernel, stable


[-- Attachment #1.1: Type: text/plain, Size: 3252 bytes --]

Hi all,

Following Greg's suggestion to turn the proposed fix into a real patch,
here is a minimal fix for the vmcb12->save.rip TOCTOU race in KVM's
nested SVM implementation.

Background
----------

The CVE-2021-29657 fix introduced nested_copy_vmcb_save_to_cache() to
snapshot vmcb12 fields before validation and use, preventing a racing L1
vCPU from modifying vmcb12 between check and use. However, the save area
cache deliberately excluded rip, rsp, and rax -- only efer, cr0, cr3,
cr4, dr6, and dr7 are snapshotted.

As a result, vmcb12->save.rip is still read three separate times from
the live guest-mapped HVA pointer during a single nested VMRUN:

  1) enter_svm_guest_mode() passes vmcb12->save.rip to
     nested_vmcb02_prepare_control(), where it is stored in
     svm->soft_int_old_rip, svm->soft_int_next_rip, and
     vmcb02->control.next_rip

  2) nested_vmcb02_prepare_save() calls
     kvm_rip_write(vcpu, vmcb12->save.rip), setting the KVM-internal
     vCPU register state

  3) nested_vmcb02_prepare_save() then does
     vmcb02->save.rip = vmcb12->save.rip, setting the hardware VMCB02
     save area

Since vmcb12 is mapped via kvm_vcpu_map() as a direct HVA into guest
physical memory with no write protection, a concurrent L1 vCPU can
modify vmcb12->save.rip between these reads, producing a three-way RIP
inconsistency. This is the save-area analog of CVE-2021-29657.

The inconsistency is particularly dangerous when combined with soft
interrupt injection (event_inj with TYPE_SOFT): KVM records
soft_int_old_rip from read #1 but the vCPU state and hardware VMCB
reflect reads #2 and #3 respectively. If interrupt delivery faults,
svm_complete_interrupts() uses the stale soft_int_old_rip to
reconstruct pre-injection state, which no longer matches reality.

I am aware of Yosry Ahmed's larger patch series (v3-v6) that
reworks the entire vmcb12 caching architecture and would subsume
this fix. However, that series is still under review and has not
yet been merged. This patch is a minimal, self-contained fix that
can be applied immediately to close the TOCTOU window on rip, rsp,
and rax.

Fix
---

Add rip, rsp, and rax to struct vmcb_save_area_cached, snapshot them
in __nested_copy_vmcb_save_to_cache(), and replace all direct reads
of vmcb12->save.{rip,rsp,rax} with reads from the cached copy. This
ensures all consumers within a single nested VMRUN see consistent
register values.

Testing
-------

Tested on AMD Ryzen 7 7800X3D with nested virtualization enabled
(kvm_amd nested=1). A userspace race harness demonstrated a 25.6%
hit rate for concurrent modification of the rip field between reads
across 1M iterations, with 3-way splits (all three reads returning
different values) confirmed. With the patch applied, all three
consumption points see the same snapshotted value regardless of
concurrent modification.

The original discussion that led to this patch inadvertently went to
a public list. KVM maintainers were not CC'd on the follow-up; this
submission corrects that.

Seungil Jeon (1):
  KVM: nSVM: Snapshot vmcb12 save.rip to prevent TOCTOU race

 arch/x86/kvm/svm/nested.c | 22 +++++++++++-----------
 arch/x86/kvm/svm/svm.h    |  3 +++
 2 files changed, 14 insertions(+), 11 deletions(-)

--
2.43.0

[-- Attachment #1.2: Type: text/html, Size: 3604 bytes --]

[-- Attachment #2: 0001-KVM-nSVM-fix-vmcb12-save-rip-TOCTOU.patch --]
[-- Type: application/octet-stream, Size: 5283 bytes --]

From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Seungil Jeon <jeon1691951@gmail.com>
Date: Wed, 26 Mar 2026 00:00:00 +0000
Subject: [PATCH] KVM: nSVM: Snapshot vmcb12 save.rip to prevent TOCTOU race

vmcb12->save.rip is read three separate times from guest-writable memory
during nested VMRUN processing, without being snapshotted into the save
area cache. A malicious L1 guest with multiple vCPUs can concurrently
modify vmcb12->save.rip through another vCPU (since the vmcb12 page is
mapped as a direct HVA into guest physical memory with no write
protection), producing a three-way RIP inconsistency across:

  1) svm->soft_int_old_rip and svm->soft_int_next_rip (set in
     nested_vmcb02_prepare_control() from vmcb12->save.rip)
  2) vcpu->arch.regs[VCPU_REGS_RIP] (set via kvm_rip_write() in
     nested_vmcb02_prepare_save())
  3) vmcb02->save.rip (set directly in nested_vmcb02_prepare_save())

This is the save-area analog of CVE-2021-29657, which exploited the
same TOCTOU pattern on vmcb12->control.intercept to achieve full host
code execution. The CVE-2021-29657 fix introduced
nested_copy_vmcb_save_to_cache() but deliberately excluded rip, rsp,
and rax from the cached fields, leaving the save area vulnerable to
the same class of concurrent-modification attacks.

When combined with soft interrupt injection via event_inj, the three-way
RIP split causes KVM to record inconsistent state for interrupt
re-injection: soft_int_old_rip holds one value (from read #1), the vCPU
register state holds another (from read #2), and the hardware VMCB holds
a third (from read #3). If delivery of the soft interrupt faults (e.g.,
#PF on IDT access), svm_complete_interrupts() uses soft_int_old_rip to
reconstruct the pre-injection state, which no longer matches the actual
vCPU or hardware state. This can trigger WARN_ON assertions in KVM's
interrupt handling, cause instruction emulation at wrong addresses, or
lead to information disclosure through confused emulation.

Fix this by adding rip, rsp, and rax to vmcb_save_area_cached, copying
them in __nested_copy_vmcb_save_to_cache(), and replacing all direct
reads of vmcb12->save.{rip,rsp,rax} with reads from the cached copy.
This ensures that all consumers within a single nested VMRUN see a
consistent snapshot of these registers, closing the TOCTOU window.

The same approach is used for rsp and rax, which suffer from analogous
(though currently two-read rather than three-read) TOCTOU patterns in
nested_vmcb02_prepare_save().

Cc: stable@vger.kernel.org
Reported-by: Seungil Jeon <jeon1691951@gmail.com>
Signed-off-by: Seungil Jeon <jeon1691951@gmail.com>
---
 arch/x86/kvm/svm/nested.c | 22 +++++++++++-----------
 arch/x86/kvm/svm/svm.h    |  3 +++
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index XXXXXXX..XXXXXXX 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -XXX,XX +XXX,XX @@ struct vmcb_save_area_cached {
 	u64 cr4;
 	u64 dr6;
 	u64 dr7;
+	u64 rip;
+	u64 rsp;
+	u64 rax;
 };

 struct vmcb_ctrl_area_cached {
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index XXXXXXX..XXXXXXX 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -XXX,XX +XXX,XX @@ static void __nested_copy_vmcb_save_to_cache(struct vmcb_save_area_cached *to,
 					      struct vmcb_save_area *from)
 {
 	/*
-	 * Copy only fields that are validated, as we need them
-	 * to avoid TOC/TOU races.
+	 * Copy fields that are either validated or read multiple times,
+	 * as we need a consistent snapshot to avoid TOC/TOU races.
+	 * rip/rsp/rax are read multiple times from guest-writable
+	 * memory in enter_svm_guest_mode() and must be snapshotted.
 	 */
 	to->efer = from->efer;
 	to->cr0  = from->cr0;
 	to->cr3  = from->cr3;
 	to->cr4  = from->cr4;
 	to->dr6  = from->dr6;
 	to->dr7  = from->dr7;
+	to->rip  = from->rip;
+	to->rsp  = from->rsp;
+	to->rax  = from->rax;
 }

 /*
@@ -XXX,XX +XXX,XX @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm,
 					struct vmcb *vmcb12)
 {
+	struct vmcb_save_area_cached *save = &svm->nested.save;
 	struct kvm_vcpu *vcpu = &svm->vcpu;

 	/* ... segment register copies from vmcb12 (unchanged) ... */

-	kvm_rip_write(vcpu, vmcb12->save.rip);
-	vmcb02->save.rip = vmcb12->save.rip;
+	kvm_rip_write(vcpu, save->rip);
+	vmcb02->save.rip = save->rip;

-	kvm_rsp_write(vcpu, vmcb12->save.rsp);
-	vmcb02->save.rsp = vmcb12->save.rsp;
+	kvm_rsp_write(vcpu, save->rsp);
+	vmcb02->save.rsp = save->rsp;

-	kvm_rax_write(vcpu, vmcb12->save.rax);
-	vmcb02->save.rax = vmcb12->save.rax;
+	kvm_rax_write(vcpu, save->rax);
+	vmcb02->save.rax = save->rax;

 	/* ... remainder of function unchanged ... */
 }
@@ -XXX,XX +XXX,XX @@ int enter_svm_guest_mode(struct vcpu_svm *svm,
 			 struct vmcb *vmcb12, bool from_vmrun)
 {
+	struct vmcb_save_area_cached *save = &svm->nested.save;
 	struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;

 	/* ... existing code ... */

-	nested_vmcb02_prepare_control(svm, vmcb12->save.rip,
-				      vmcb12->save.cs.base);
+	nested_vmcb02_prepare_control(svm, save->rip,
+				      vmcb12->save.cs.base);

 	nested_vmcb02_prepare_save(svm, vmcb12);

 	/* ... remainder of function unchanged ... */
 }
--
2.43.0

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2026-03-26  0:04 UTC | newest]

Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-26  0:04 [PATCH] KVM: nSVM: Snapshot vmcb12 save.rip to prevent TOCTOU race 홍길동

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox