Hi all,

Following Greg's suggestion to turn the proposed fix into a real patch,
here is a minimal fix for the vmcb12->save.rip TOCTOU race in KVM's
nested SVM implementation.

Background
----------

The CVE-2021-29657 fix introduced nested_copy_vmcb_save_to_cache() to
snapshot vmcb12 fields before validation and use, preventing a racing
L1 vCPU from modifying vmcb12 between check and use. However, the save
area cache deliberately excluded rip, rsp, and rax -- only efer, cr0,
cr3, cr4, dr6, and dr7 are snapshotted.

As a result, vmcb12->save.rip is still read three separate times from
the live guest-mapped HVA pointer during a single nested VMRUN:

  1) enter_svm_guest_mode() passes vmcb12->save.rip to
     nested_vmcb02_prepare_control(), where it is stored in
     svm->soft_int_old_rip, svm->soft_int_next_rip, and
     vmcb02->control.next_rip

  2) nested_vmcb02_prepare_save() calls
     kvm_rip_write(vcpu, vmcb12->save.rip), setting the KVM-internal
     vCPU register state

  3) nested_vmcb02_prepare_save() then does
     vmcb02->save.rip = vmcb12->save.rip, setting the hardware VMCB02
     save area

Since vmcb12 is mapped via kvm_vcpu_map() as a direct HVA into guest
physical memory with no write protection, a concurrent L1 vCPU can
modify vmcb12->save.rip between these reads, producing a three-way RIP
inconsistency. This is the save-area analog of CVE-2021-29657.

The inconsistency is particularly dangerous when combined with soft
interrupt injection (event_inj with TYPE_SOFT): KVM records
soft_int_old_rip from read #1 but the vCPU state and hardware VMCB
reflect reads #2 and #3 respectively. If interrupt delivery faults,
svm_complete_interrupts() uses the stale soft_int_old_rip to
reconstruct pre-injection state, which no longer matches reality.

I am aware of Yosry Ahmed's larger patch series (v3-v6) that reworks
the entire vmcb12 caching architecture and would subsume this fix.
However, that series is still under review and has not yet been merged.
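
For readers who want the three-read pattern from the Background section
in isolation, here is a rough, deterministic userspace sketch. It is not
kernel code: struct save_area, struct save_cached, struct consumers,
racer_fn, enter_racy(), enter_cached(), and bump_rip() are all
hypothetical names made up for illustration, and the "racing L1 vCPU" is
simulated by a callback invoked between reads rather than a real thread.

```c
#include <stdint.h>

/* Hypothetical stand-in for the guest-writable vmcb12 save area. */
struct save_area {
	volatile uint64_t rip;
	volatile uint64_t rsp;
	volatile uint64_t rax;
};

/* Hypothetical stand-in for a host-private snapshot of those fields. */
struct save_cached {
	uint64_t rip;
	uint64_t rsp;
	uint64_t rax;
};

/* The three places the cover letter says RIP ends up. */
struct consumers {
	uint64_t soft_int_old_rip;	/* read #1 */
	uint64_t vcpu_rip;		/* read #2 */
	uint64_t vmcb02_rip;		/* read #3 */
};

/* Simulates a concurrent writer touching the live area between reads. */
typedef void (*racer_fn)(struct save_area *live);

/* Copy each field from the live area exactly once. */
static void snapshot_save(struct save_cached *to, const struct save_area *from)
{
	to->rip = from->rip;
	to->rsp = from->rsp;
	to->rax = from->rax;
}

/* Racy variant: every consumer re-reads the live, guest-writable field. */
static void enter_racy(struct consumers *c, struct save_area *live,
		       racer_fn racer)
{
	c->soft_int_old_rip = live->rip;
	racer(live);
	c->vcpu_rip = live->rip;
	racer(live);
	c->vmcb02_rip = live->rip;
}

/* Snapshot variant: one read up front, all consumers use the copy. */
static void enter_cached(struct consumers *c, struct save_area *live,
			 racer_fn racer)
{
	struct save_cached cache;

	snapshot_save(&cache, live);
	c->soft_int_old_rip = cache.rip;
	racer(live);
	c->vcpu_rip = cache.rip;
	racer(live);
	c->vmcb02_rip = cache.rip;
}

/* Example racer: advance the live RIP by one page on every call. */
static void bump_rip(struct save_area *live)
{
	live->rip += 0x1000;
}
```

With bump_rip() as the racer and a starting rip of 0x1000,
enter_racy() leaves the three consumers holding three different values,
while enter_cached() leaves all three holding the value that was
snapshotted, even though the live field keeps changing underneath.
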
This patch is a minimal, self-contained fix that can be applied
immediately to close the TOCTOU window on rip, rsp, and rax.

Fix
---

Add rip, rsp, and rax to struct vmcb_save_area_cached, snapshot them in
__nested_copy_vmcb_save_to_cache(), and replace all direct reads of
vmcb12->save.{rip,rsp,rax} with reads from the cached copy. This
ensures all consumers within a single nested VMRUN see consistent
register values.

Testing
-------

Tested on AMD Ryzen 7 7800X3D with nested virtualization enabled
(kvm_amd nested=1). A userspace race harness demonstrated a 25.6% hit
rate for concurrent modification of the rip field between reads across
1M iterations, with 3-way splits (all three reads returning different
values) confirmed. With the patch applied, all three consumption points
see the same snapshotted value regardless of concurrent modification.

The original discussion that led to this patch inadvertently went to a
public list. KVM maintainers were not CC'd on the follow-up; this
submission corrects that.

Seungil Jeon (1):
  KVM: nSVM: Snapshot vmcb12 save.rip to prevent TOCTOU race

 arch/x86/kvm/svm/nested.c | 22 +++++++++++-----------
 arch/x86/kvm/svm/svm.h    |  3 +++
 2 files changed, 14 insertions(+), 11 deletions(-)

--
2.43.0