* [PATCH 00/31] Nested SVM fixes, cleanups, and hardening
@ 2026-02-24 22:33 Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 01/31] KVM: nSVM: Avoid clearing VMCB_LBR in vmcb12 Yosry Ahmed
` (31 more replies)
0 siblings, 32 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
A group of semi-related fixes, cleanups, and hardening patches for nSVM.
The series is essentially a group of related mini-series stitched
together for syntactic and semantic dependencies. The first 22 patches
(except patch 3) are all optimistically CC'd to stable as they are fixes
or refactoring leading up to bug fixes. Although I am not sure how much
of that will actually apply to stable trees.
Patches 1-3 here are v2 of the last 3 patches in the LBRV fixes series
[1]. The first 3 patches of [1] are already upstream.
Patches 4-17 are fixes for failure handling in the nested VMRUN and
#VMEXIT code paths, ending with a nice unified code path for handling
VMRUN failures as suggested by Sean. Within this block, patches 7-12 are
refactoring needed for patches 13-14.
Patches 18-22 are fixes for missing or made-up consistency checks.
Patches 23-24 are renames and cleanups.
Patches 25-30 add hardening to reading the VMCB12, caching all used
fields in the save area to prevent theoritical TOC-TOU bugs, sanitizing
used fields in the control area, and restricting accesses to the VMCB12
through guest memory.
Finally, patch 31 is a selftest for nested VMRUN and #VMEXIT failures
due to failing to map vmcb12.
v5 -> v6:
- Set VMCB_LBR dirty when setting LBR registers [Yosry].
- Fix state leakage in LBR save/restore test [Kevin Cheng].
- Do not abort nested #VMEXIT flow if mapping vmcb12 or restoring L1 CR3
fails [Sean].
- Break down the patch sanitizing control fields from vmcb12 and drop
the ASID comment change [Sean].
- Add a selftest for VMRUN and #VMEXIT with unmappable vmcb12 [Yosry].
v5: https://lore.kernel.org/kvm/20260206190851.860662-1-yosry.ahmed@linux.dev/
Yosry Ahmed (31):
KVM: nSVM: Avoid clearing VMCB_LBR in vmcb12
KVM: SVM: Switch svm_copy_lbrs() to a macro
KVM: SVM: Add missing save/restore handling of LBR MSRs
KVM: selftests: Add a test for LBR save/restore (ft. nested)
KVM: nSVM: Always inject a #GP if mapping VMCB12 fails on nested VMRUN
KVM: nSVM: Refactor checking LBRV enablement in vmcb12 into a helper
KVM: nSVM: Refactor writing vmcb12 on nested #VMEXIT as a helper
KVM: nSVM: Triple fault if mapping VMCB12 fails on nested #VMEXIT
KVM: nSVM: Triple fault if restore host CR3 fails on nested #VMEXIT
KVM: nSVM: Drop nested_vmcb_check_{save/control}() wrappers
KVM: nSVM: Call enter_guest_mode() before switching to VMCB02
KVM: nSVM: Make nested_svm_merge_msrpm() return an errno
KVM: nSVM: Call nested_svm_merge_msrpm() from enter_svm_guest_mode()
KVM: nSVM: Call nested_svm_init_mmu_context() before switching to
VMCB02
KVM: nSVM: Refactor minimal #VMEXIT handling out of
nested_svm_vmexit()
KVM: nSVM: Unify handling of VMRUN failures with proper cleanup
KVM: nSVM: Clear EVENTINJ field in VMCB12 on nested #VMEXIT
KVM: nSVM: Drop the non-architectural consistency check for NP_ENABLE
KVM: nSVM: Add missing consistency check for nCR3 validity
KVM: nSVM: Add missing consistency check for hCR0.PG and NP_ENABLE
KVM: nSVM: Add missing consistency check for EFER, CR0, CR4, and CS
KVM: nSVM: Add missing consistency check for EVENTINJ
KVM: SVM: Rename vmcb->nested_ctl to vmcb->misc_ctl
KVM: SVM: Rename vmcb->virt_ext to vmcb->misc_ctl2
KVM: nSVM: Cache all used fields from VMCB12
KVM: nSVM: Restrict mapping VMCB12 on nested VMRUN
KVM: nSVM: Use PAGE_MASK to drop lower bits of bitmap GPAs from vmcb12
KVM: nSVM: Sanitize TLB_CONTROL field when copying from vmcb12
KVM: nSVM: Sanitize INT/EVENTINJ fields when copying from vmcb12
KVM: nSVM: Only copy SVM_MISC_ENABLE_NP from VMCB01's misc_ctl
KVM: selftest: Add a selftest for VMRUN/#VMEXIT with unmappable vmcb12
arch/x86/include/asm/svm.h | 20 +-
arch/x86/kvm/svm/nested.c | 570 +++++++++++-------
arch/x86/kvm/svm/sev.c | 4 +-
arch/x86/kvm/svm/svm.c | 68 ++-
arch/x86/kvm/svm/svm.h | 50 +-
arch/x86/kvm/x86.c | 3 +
tools/testing/selftests/kvm/Makefile.kvm | 2 +
.../selftests/kvm/include/x86/processor.h | 5 +
tools/testing/selftests/kvm/include/x86/svm.h | 14 +-
tools/testing/selftests/kvm/lib/x86/svm.c | 2 +-
.../kvm/x86/nested_vmsave_vmload_test.c | 16 +-
.../selftests/kvm/x86/svm_lbr_nested_state.c | 145 +++++
.../kvm/x86/svm_nested_invalid_vmcb12_gpa.c | 95 +++
13 files changed, 711 insertions(+), 283 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86/svm_lbr_nested_state.c
create mode 100644 tools/testing/selftests/kvm/x86/svm_nested_invalid_vmcb12_gpa.c
base-commit: 183bb0ce8c77b0fd1fb25874112bc8751a461e49
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v6 01/31] KVM: nSVM: Avoid clearing VMCB_LBR in vmcb12
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 02/31] KVM: SVM: Switch svm_copy_lbrs() to a macro Yosry Ahmed
` (30 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
svm_copy_lbrs() always marks VMCB_LBR dirty in the destination VMCB.
However, nested_svm_vmexit() uses it to copy LBRs to vmcb12, and
clearing clean bits in vmcb12 is not architecturally defined.
Move vmcb_mark_dirty() to callers and drop it for vmcb12.
This also facilitates incoming refactoring that does not pass the entire
VMCB to svm_copy_lbrs().
Fixes: d20c796ca370 ("KVM: x86: nSVM: implement nested LBR virtualization")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 7 +++++--
arch/x86/kvm/svm/svm.c | 2 --
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index de90b104a0dd5..a31f3be1e16ec 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -714,6 +714,7 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
} else {
svm_copy_lbrs(vmcb02, vmcb01);
}
+ vmcb_mark_dirty(vmcb02, VMCB_LBR);
svm_update_lbrv(&svm->vcpu);
}
@@ -1232,10 +1233,12 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
- (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK)))
+ (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
svm_copy_lbrs(vmcb12, vmcb02);
- else
+ } else {
svm_copy_lbrs(vmcb01, vmcb02);
+ vmcb_mark_dirty(vmcb01, VMCB_LBR);
+ }
svm_update_lbrv(vcpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8f8bc863e2143..a2452b8ec49db 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -848,8 +848,6 @@ void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
to_vmcb->save.br_to = from_vmcb->save.br_to;
to_vmcb->save.last_excp_from = from_vmcb->save.last_excp_from;
to_vmcb->save.last_excp_to = from_vmcb->save.last_excp_to;
-
- vmcb_mark_dirty(to_vmcb, VMCB_LBR);
}
static void __svm_enable_lbrv(struct kvm_vcpu *vcpu)
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 02/31] KVM: SVM: Switch svm_copy_lbrs() to a macro
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 01/31] KVM: nSVM: Avoid clearing VMCB_LBR in vmcb12 Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 03/31] KVM: SVM: Add missing save/restore handling of LBR MSRs Yosry Ahmed
` (29 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
In preparation for using svm_copy_lbrs() with 'struct vmcb_save_area'
without a containing 'struct vmcb', and later even 'struct
vmcb_save_area_cached', make it a macro.
Macros are generally not preferred compared to functions, mainly due to
type-safety. However, in this case it seems like having a simple macro
copying a few fields is better than copy-pasting the same 5 lines of
code in different places.
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 8 ++++----
arch/x86/kvm/svm/svm.c | 9 ---------
arch/x86/kvm/svm/svm.h | 10 +++++++++-
3 files changed, 13 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index a31f3be1e16ec..f7d5db0af69ac 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -709,10 +709,10 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
* Reserved bits of DEBUGCTL are ignored. Be consistent with
* svm_set_msr's definition of reserved bits.
*/
- svm_copy_lbrs(vmcb02, vmcb12);
+ svm_copy_lbrs(&vmcb02->save, &vmcb12->save);
vmcb02->save.dbgctl &= ~DEBUGCTL_RESERVED_BITS;
} else {
- svm_copy_lbrs(vmcb02, vmcb01);
+ svm_copy_lbrs(&vmcb02->save, &vmcb01->save);
}
vmcb_mark_dirty(vmcb02, VMCB_LBR);
svm_update_lbrv(&svm->vcpu);
@@ -1234,9 +1234,9 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
(svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
- svm_copy_lbrs(vmcb12, vmcb02);
+ svm_copy_lbrs(&vmcb12->save, &vmcb02->save);
} else {
- svm_copy_lbrs(vmcb01, vmcb02);
+ svm_copy_lbrs(&vmcb01->save, &vmcb02->save);
vmcb_mark_dirty(vmcb01, VMCB_LBR);
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a2452b8ec49db..f52e588317fcf 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -841,15 +841,6 @@ static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
*/
}
-void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
-{
- to_vmcb->save.dbgctl = from_vmcb->save.dbgctl;
- to_vmcb->save.br_from = from_vmcb->save.br_from;
- to_vmcb->save.br_to = from_vmcb->save.br_to;
- to_vmcb->save.last_excp_from = from_vmcb->save.last_excp_from;
- to_vmcb->save.last_excp_to = from_vmcb->save.last_excp_to;
-}
-
static void __svm_enable_lbrv(struct kvm_vcpu *vcpu)
{
to_svm(vcpu)->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ebd7b36b1ceb9..44d767cd1d25a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -713,8 +713,16 @@ static inline void *svm_vcpu_alloc_msrpm(void)
return svm_alloc_permissions_map(MSRPM_SIZE, GFP_KERNEL_ACCOUNT);
}
+#define svm_copy_lbrs(to, from) \
+do { \
+ (to)->dbgctl = (from)->dbgctl; \
+ (to)->br_from = (from)->br_from; \
+ (to)->br_to = (from)->br_to; \
+ (to)->last_excp_from = (from)->last_excp_from; \
+ (to)->last_excp_to = (from)->last_excp_to; \
+} while (0)
+
void svm_vcpu_free_msrpm(void *msrpm);
-void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb);
void svm_enable_lbrv(struct kvm_vcpu *vcpu);
void svm_update_lbrv(struct kvm_vcpu *vcpu);
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 03/31] KVM: SVM: Add missing save/restore handling of LBR MSRs
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 01/31] KVM: nSVM: Avoid clearing VMCB_LBR in vmcb12 Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 02/31] KVM: SVM: Switch svm_copy_lbrs() to a macro Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-03-03 0:02 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 04/31] KVM: selftests: Add a test for LBR save/restore (ft. nested) Yosry Ahmed
` (28 subsequent siblings)
31 siblings, 1 reply; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable,
Jim Mattson
MSR_IA32_DEBUGCTLMSR and LBR MSRs are currently not enumerated by
KVM_GET_MSR_INDEX_LIST, and LBR MSRs cannot be set with KVM_SET_MSRS. So
save/restore is completely broken.
Fix it by adding the MSRs to msrs_to_save_base, and allowing writes to
LBR MSRs from userspace only (as they are read-only MSRs). Additionally,
to correctly restore L1's LBRs while L2 is running, make sure the LBRs
are copied from the captured VMCB01 save area in svm_copy_vmrun_state().
Fixes: 24e09cbf480a ("KVM: SVM: enable LBR virtualization")
Cc: stable@vger.kernel.org
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 3 +++
arch/x86/kvm/svm/svm.c | 24 ++++++++++++++++++++++++
arch/x86/kvm/x86.c | 3 +++
3 files changed, 30 insertions(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index f7d5db0af69ac..52d8536845927 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1100,6 +1100,9 @@ void svm_copy_vmrun_state(struct vmcb_save_area *to_save,
to_save->isst_addr = from_save->isst_addr;
to_save->ssp = from_save->ssp;
}
+
+ if (lbrv)
+ svm_copy_lbrs(to_save, from_save);
}
void svm_copy_vmloadsave_state(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f52e588317fcf..cb53174583a26 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3071,6 +3071,30 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
svm_update_lbrv(vcpu);
break;
+ case MSR_IA32_LASTBRANCHFROMIP:
+ if (!msr->host_initiated)
+ return 1;
+ svm->vmcb->save.br_from = data;
+ vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
+ break;
+ case MSR_IA32_LASTBRANCHTOIP:
+ if (!msr->host_initiated)
+ return 1;
+ svm->vmcb->save.br_to = data;
+ vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
+ break;
+ case MSR_IA32_LASTINTFROMIP:
+ if (!msr->host_initiated)
+ return 1;
+ svm->vmcb->save.last_excp_from = data;
+ vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
+ break;
+ case MSR_IA32_LASTINTTOIP:
+ if (!msr->host_initiated)
+ return 1;
+ svm->vmcb->save.last_excp_to = data;
+ vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
+ break;
case MSR_VM_HSAVE_PA:
/*
* Old kernels did not validate the value written to
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index db3f393192d94..416899b5dbe4d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -351,6 +351,9 @@ static const u32 msrs_to_save_base[] = {
MSR_IA32_U_CET, MSR_IA32_S_CET,
MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB,
+ MSR_IA32_DEBUGCTLMSR,
+ MSR_IA32_LASTBRANCHFROMIP, MSR_IA32_LASTBRANCHTOIP,
+ MSR_IA32_LASTINTFROMIP, MSR_IA32_LASTINTTOIP,
};
static const u32 msrs_to_save_pmu[] = {
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 04/31] KVM: selftests: Add a test for LBR save/restore (ft. nested)
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (2 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 03/31] KVM: SVM: Add missing save/restore handling of LBR MSRs Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 05/31] KVM: nSVM: Always inject a #GP if mapping VMCB12 fails on nested VMRUN Yosry Ahmed
` (27 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
Add a selftest exercising save/restore with usage of LBRs in both L1 and
L2, and making sure all LBRs remain intact.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/include/x86/processor.h | 5 +
.../selftests/kvm/x86/svm_lbr_nested_state.c | 145 ++++++++++++++++++
3 files changed, 151 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86/svm_lbr_nested_state.c
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index fdec90e854671..36b48e766e499 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -112,6 +112,7 @@ TEST_GEN_PROGS_x86 += x86/svm_vmcall_test
TEST_GEN_PROGS_x86 += x86/svm_int_ctl_test
TEST_GEN_PROGS_x86 += x86/svm_nested_shutdown_test
TEST_GEN_PROGS_x86 += x86/svm_nested_soft_inject_test
+TEST_GEN_PROGS_x86 += x86/svm_lbr_nested_state
TEST_GEN_PROGS_x86 += x86/tsc_scaling_sync
TEST_GEN_PROGS_x86 += x86/sync_regs_test
TEST_GEN_PROGS_x86 += x86/ucna_injection_test
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 4ebae4269e681..db0171935197d 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1360,6 +1360,11 @@ static inline bool kvm_is_ignore_msrs(void)
return get_kvm_param_bool("ignore_msrs");
}
+static inline bool kvm_is_lbrv_enabled(void)
+{
+ return !!get_kvm_amd_param_integer("lbrv");
+}
+
uint64_t *vm_get_pte(struct kvm_vm *vm, uint64_t vaddr);
uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2,
diff --git a/tools/testing/selftests/kvm/x86/svm_lbr_nested_state.c b/tools/testing/selftests/kvm/x86/svm_lbr_nested_state.c
new file mode 100644
index 0000000000000..bf16abb1152e0
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/svm_lbr_nested_state.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2026, Google, Inc.
+ */
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "svm_util.h"
+
+
+#define L2_GUEST_STACK_SIZE 64
+
+#define DO_BRANCH() do { asm volatile("jmp 1f\n 1: nop"); } while (0)
+
+struct lbr_branch {
+ u64 from, to;
+};
+
+volatile struct lbr_branch l2_branch;
+
+#define RECORD_AND_CHECK_BRANCH(b) \
+do { \
+ wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR); \
+ DO_BRANCH(); \
+ (b)->from = rdmsr(MSR_IA32_LASTBRANCHFROMIP); \
+ (b)->to = rdmsr(MSR_IA32_LASTBRANCHTOIP); \
+ /* Disable LBR right after to avoid overriding the IPs */ \
+ wrmsr(MSR_IA32_DEBUGCTLMSR, 0); \
+ \
+ GUEST_ASSERT_NE((b)->from, 0); \
+ GUEST_ASSERT_NE((b)->to, 0); \
+} while (0)
+
+#define CHECK_BRANCH_MSRS(b) \
+do { \
+ GUEST_ASSERT_EQ((b)->from, rdmsr(MSR_IA32_LASTBRANCHFROMIP)); \
+ GUEST_ASSERT_EQ((b)->to, rdmsr(MSR_IA32_LASTBRANCHTOIP)); \
+} while (0)
+
+#define CHECK_BRANCH_VMCB(b, vmcb) \
+do { \
+ GUEST_ASSERT_EQ((b)->from, vmcb->save.br_from); \
+ GUEST_ASSERT_EQ((b)->to, vmcb->save.br_to); \
+} while (0)
+
+static void l2_guest_code(struct svm_test_data *svm)
+{
+ /* Record a branch, trigger save/restore, and make sure LBRs are intact */
+ RECORD_AND_CHECK_BRANCH(&l2_branch);
+ GUEST_SYNC(true);
+ CHECK_BRANCH_MSRS(&l2_branch);
+ vmmcall();
+}
+
+static void l1_guest_code(struct svm_test_data *svm, bool nested_lbrv)
+{
+ unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+ struct vmcb *vmcb = svm->vmcb;
+ struct lbr_branch l1_branch;
+
+ /* Record a branch, trigger save/restore, and make sure LBRs are intact */
+ RECORD_AND_CHECK_BRANCH(&l1_branch);
+ GUEST_SYNC(true);
+ CHECK_BRANCH_MSRS(&l1_branch);
+
+ /* Run L2, which will also do the same */
+ generic_svm_setup(svm, l2_guest_code,
+ &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+ if (nested_lbrv)
+ vmcb->control.virt_ext = LBR_CTL_ENABLE_MASK;
+ else
+ vmcb->control.virt_ext &= ~LBR_CTL_ENABLE_MASK;
+
+ run_guest(vmcb, svm->vmcb_gpa);
+ GUEST_ASSERT(svm->vmcb->control.exit_code == SVM_EXIT_VMMCALL);
+
+ /* Trigger save/restore one more time before checking, just for kicks */
+ GUEST_SYNC(true);
+
+ /*
+ * If LBR_CTL_ENABLE is set, L1 and L2 should have separate LBR MSRs, so
+ * expect L1's LBRs to remain intact and L2 LBRs to be in the VMCB.
+ * Otherwise, the MSRs are shared between L1 & L2 so expect L2's LBRs.
+ */
+ if (nested_lbrv) {
+ CHECK_BRANCH_MSRS(&l1_branch);
+ CHECK_BRANCH_VMCB(&l2_branch, vmcb);
+ } else {
+ CHECK_BRANCH_MSRS(&l2_branch);
+ }
+ GUEST_DONE();
+}
+
+void test_lbrv_nested_state(bool nested_lbrv)
+{
+ struct kvm_x86_state *state = NULL;
+ struct kvm_vcpu *vcpu;
+ vm_vaddr_t svm_gva;
+ struct kvm_vm *vm;
+ struct ucall uc;
+
+ pr_info("Testing with nested LBRV %s\n", nested_lbrv ? "enabled" : "disabled");
+
+ vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
+ vcpu_alloc_svm(vm, &svm_gva);
+ vcpu_args_set(vcpu, 2, svm_gva, nested_lbrv);
+
+ for (;;) {
+ vcpu_run(vcpu);
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_SYNC:
+ /* Save the vCPU state and restore it in a new VM on sync */
+ pr_info("Guest triggered save/restore.\n");
+ state = vcpu_save_state(vcpu);
+ kvm_vm_release(vm);
+ vcpu = vm_recreate_with_one_vcpu(vm);
+ vcpu_load_state(vcpu, state);
+ kvm_x86_state_cleanup(state);
+ break;
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ /* NOT REACHED */
+ case UCALL_DONE:
+ goto done;
+ default:
+ TEST_FAIL("Unknown ucall %lu", uc.cmd);
+ }
+ }
+done:
+ kvm_vm_free(vm);
+}
+
+int main(int argc, char *argv[])
+{
+ TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SVM));
+ TEST_REQUIRE(kvm_is_lbrv_enabled());
+
+ test_lbrv_nested_state(/*nested_lbrv=*/false);
+ test_lbrv_nested_state(/*nested_lbrv=*/true);
+
+ return 0;
+}
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 05/31] KVM: nSVM: Always inject a #GP if mapping VMCB12 fails on nested VMRUN
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (3 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 04/31] KVM: selftests: Add a test for LBR save/restore (ft. nested) Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 06/31] KVM: nSVM: Refactor checking LBRV enablement in vmcb12 into a helper Yosry Ahmed
` (26 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
nested_svm_vmrun() currently only injects a #GP if kvm_vcpu_map() fails
with -EINVAL. But it could also fail with -EFAULT if creating a host
mapping failed. Inject a #GP in all cases, no reason to treat failure
modes differently.
Fixes: 8c5fbf1a7231 ("KVM/nSVM: Use the new mapping API for mapping guest memory")
CC: stable@vger.kernel.org
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 52d8536845927..fab0d3d5baa27 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1011,12 +1011,9 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
}
vmcb12_gpa = svm->vmcb->save.rax;
- ret = kvm_vcpu_map(vcpu, gpa_to_gfn(vmcb12_gpa), &map);
- if (ret == -EINVAL) {
+ if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcb12_gpa), &map)) {
kvm_inject_gp(vcpu, 0);
return 1;
- } else if (ret) {
- return kvm_skip_emulated_instruction(vcpu);
}
ret = kvm_skip_emulated_instruction(vcpu);
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 06/31] KVM: nSVM: Refactor checking LBRV enablement in vmcb12 into a helper
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (4 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 05/31] KVM: nSVM: Always inject a #GP if mapping VMCB12 fails on nested VMRUN Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 07/31] KVM: nSVM: Refactor writing vmcb12 on nested #VMEXIT as " Yosry Ahmed
` (25 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
Refactor the vCPU cap and vmcb12 flag checks into a helper. The
unlikely() annotation is dropped, it's unlikely (huh) to make a
difference and the CPU will probably predict it better on its own.
CC: stable@vger.kernel.org
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index fab0d3d5baa27..d11cf4968adbe 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -639,6 +639,12 @@ void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm)
svm->nested.vmcb02.ptr->save.g_pat = svm->vmcb01.ptr->save.g_pat;
}
+static bool nested_vmcb12_has_lbrv(struct kvm_vcpu *vcpu)
+{
+ return guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
+ (to_svm(vcpu)->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK);
+}
+
static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12)
{
bool new_vmcb12 = false;
@@ -703,8 +709,7 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
vmcb_mark_dirty(vmcb02, VMCB_DR);
}
- if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
- (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
+ if (nested_vmcb12_has_lbrv(vcpu)) {
/*
* Reserved bits of DEBUGCTL are ignored. Be consistent with
* svm_set_msr's definition of reserved bits.
@@ -1232,8 +1237,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
if (!nested_exit_on_intr(svm))
kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
- if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
- (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
+ if (nested_vmcb12_has_lbrv(vcpu)) {
svm_copy_lbrs(&vmcb12->save, &vmcb02->save);
} else {
svm_copy_lbrs(&vmcb01->save, &vmcb02->save);
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 07/31] KVM: nSVM: Refactor writing vmcb12 on nested #VMEXIT as a helper
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (5 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 06/31] KVM: nSVM: Refactor checking LBRV enablement in vmcb12 into a helper Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 08/31] KVM: nSVM: Triple fault if mapping VMCB12 fails on nested #VMEXIT Yosry Ahmed
` (24 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
Move mapping vmcb12 and updating it out of nested_svm_vmexit() into a
helper, no functional change intended.
CC: stable@vger.kernel.org
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 77 ++++++++++++++++++++++-----------------
1 file changed, 44 insertions(+), 33 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index d11cf4968adbe..6364c52e6e0a8 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1123,36 +1123,20 @@ void svm_copy_vmloadsave_state(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
to_vmcb->save.sysenter_eip = from_vmcb->save.sysenter_eip;
}
-int nested_svm_vmexit(struct vcpu_svm *svm)
+static int nested_svm_vmexit_update_vmcb12(struct kvm_vcpu *vcpu)
{
- struct kvm_vcpu *vcpu = &svm->vcpu;
- struct vmcb *vmcb01 = svm->vmcb01.ptr;
+ struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
- struct vmcb *vmcb12;
struct kvm_host_map map;
+ struct vmcb *vmcb12;
int rc;
rc = kvm_vcpu_map(vcpu, gpa_to_gfn(svm->nested.vmcb12_gpa), &map);
- if (rc) {
- if (rc == -EINVAL)
- kvm_inject_gp(vcpu, 0);
- return 1;
- }
+ if (rc)
+ return rc;
vmcb12 = map.hva;
- /* Exit Guest-Mode */
- leave_guest_mode(vcpu);
- svm->nested.vmcb12_gpa = 0;
- WARN_ON_ONCE(svm->nested.nested_run_pending);
-
- kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
-
- /* in case we halted in L2 */
- kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
-
- /* Give the current vmcb to the guest */
-
vmcb12->save.es = vmcb02->save.es;
vmcb12->save.cs = vmcb02->save.cs;
vmcb12->save.ss = vmcb02->save.ss;
@@ -1189,10 +1173,48 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
vmcb12->control.next_rip = vmcb02->control.next_rip;
+ if (nested_vmcb12_has_lbrv(vcpu))
+ svm_copy_lbrs(&vmcb12->save, &vmcb02->save);
+
vmcb12->control.int_ctl = svm->nested.ctl.int_ctl;
vmcb12->control.event_inj = svm->nested.ctl.event_inj;
vmcb12->control.event_inj_err = svm->nested.ctl.event_inj_err;
+ trace_kvm_nested_vmexit_inject(vmcb12->control.exit_code,
+ vmcb12->control.exit_info_1,
+ vmcb12->control.exit_info_2,
+ vmcb12->control.exit_int_info,
+ vmcb12->control.exit_int_info_err,
+ KVM_ISA_SVM);
+
+ kvm_vcpu_unmap(vcpu, &map);
+ return 0;
+}
+
+int nested_svm_vmexit(struct vcpu_svm *svm)
+{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct vmcb *vmcb01 = svm->vmcb01.ptr;
+ struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
+ int rc;
+
+ rc = nested_svm_vmexit_update_vmcb12(vcpu);
+ if (rc) {
+ if (rc == -EINVAL)
+ kvm_inject_gp(vcpu, 0);
+ return 1;
+ }
+
+ /* Exit Guest-Mode */
+ leave_guest_mode(vcpu);
+ svm->nested.vmcb12_gpa = 0;
+ WARN_ON_ONCE(svm->nested.nested_run_pending);
+
+ kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
+
+ /* in case we halted in L2 */
+ kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
+
if (!kvm_pause_in_guest(vcpu->kvm)) {
vmcb01->control.pause_filter_count = vmcb02->control.pause_filter_count;
vmcb_mark_dirty(vmcb01, VMCB_INTERCEPTS);
@@ -1237,9 +1259,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
if (!nested_exit_on_intr(svm))
kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
- if (nested_vmcb12_has_lbrv(vcpu)) {
- svm_copy_lbrs(&vmcb12->save, &vmcb02->save);
- } else {
+ if (!nested_vmcb12_has_lbrv(vcpu)) {
svm_copy_lbrs(&vmcb01->save, &vmcb02->save);
vmcb_mark_dirty(vmcb01, VMCB_LBR);
}
@@ -1295,15 +1315,6 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
svm->vcpu.arch.dr7 = DR7_FIXED_1;
kvm_update_dr7(&svm->vcpu);
- trace_kvm_nested_vmexit_inject(vmcb12->control.exit_code,
- vmcb12->control.exit_info_1,
- vmcb12->control.exit_info_2,
- vmcb12->control.exit_int_info,
- vmcb12->control.exit_int_info_err,
- KVM_ISA_SVM);
-
- kvm_vcpu_unmap(vcpu, &map);
-
nested_svm_transition_tlb_flush(vcpu);
nested_svm_uninit_mmu_context(vcpu);
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 08/31] KVM: nSVM: Triple fault if mapping VMCB12 fails on nested #VMEXIT
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (6 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 07/31] KVM: nSVM: Refactor writing vmcb12 on nested #VMEXIT as " Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 09/31] KVM: nSVM: Triple fault if restore host CR3 " Yosry Ahmed
` (23 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
KVM currently injects a #GP and hopes for the best if mapping VMCB12
fails on nested #VMEXIT, and only if the failure mode is -EINVAL.
Mapping the VMCB12 could also fail if creating host mappings fails.
After the #GP is injected, nested_svm_vmexit() bails early, without
cleaning up (e.g. KVM_REQ_GET_NESTED_STATE_PAGES is set, is_guest_mode()
is true, etc).
Instead of optionally injecting a #GP, triple fault the guest if mapping
VMCB12 fails since KVM cannot make a sane recovery. The APM states that
a #VMEXIT will triple fault if host state is illegal or an exception
occurs while loading host state, so the behavior is not entirely made
up.
Do not return early from nested_svm_vmexit(), continue cleaning up the
vCPU state (e.g. switch back to vmcb01), to handle the failure as
gracefully as possible.
Fixes: cf74a78b229d ("KVM: SVM: Add VMEXIT handler and intercepts")
CC: stable@vger.kernel.org
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 6364c52e6e0a8..280d0fccd1971 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1198,12 +1198,8 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
int rc;
- rc = nested_svm_vmexit_update_vmcb12(vcpu);
- if (rc) {
- if (rc == -EINVAL)
- kvm_inject_gp(vcpu, 0);
- return 1;
- }
+ if (nested_svm_vmexit_update_vmcb12(vcpu))
+ kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
/* Exit Guest-Mode */
leave_guest_mode(vcpu);
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 09/31] KVM: nSVM: Triple fault if restore host CR3 fails on nested #VMEXIT
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (7 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 08/31] KVM: nSVM: Triple fault if mapping VMCB12 fails on nested #VMEXIT Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 10/31] KVM: nSVM: Drop nested_vmcb_check_{save/control}() wrappers Yosry Ahmed
` (22 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
If loading L1's CR3 fails on a nested #VMEXIT, nested_svm_vmexit()
returns an error code that is ignored by most callers, and continues to
run L1 with corrupted state. A sane recovery is not possible in this
case, and HW behavior is to cause a shutdown. Inject a triple fault
,nstead, and do not return early from nested_svm_vmexit(). Continue
cleaning up the vCPU state (e.g. clear pending exceptions), to handle
the failure as gracefully as possible.
From the APM:
Upon #VMEXIT, the processor performs the following actions in
order to return to the host execution context:
...
if (illegal host state loaded, or exception while loading
host state)
shutdown
else
execute first host instruction following the VMRUN
Remove the return value of nested_svm_vmexit(), which is mostly
unchecked anyway.
Fixes: d82aaef9c88a ("KVM: nSVM: use nested_svm_load_cr3() on guest->host switch")
CC: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 10 +++-------
arch/x86/kvm/svm/svm.c | 11 ++---------
arch/x86/kvm/svm/svm.h | 6 +++---
3 files changed, 8 insertions(+), 19 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 280d0fccd1971..d734cd5eef5e7 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1191,12 +1191,11 @@ static int nested_svm_vmexit_update_vmcb12(struct kvm_vcpu *vcpu)
return 0;
}
-int nested_svm_vmexit(struct vcpu_svm *svm)
+void nested_svm_vmexit(struct vcpu_svm *svm)
{
struct kvm_vcpu *vcpu = &svm->vcpu;
struct vmcb *vmcb01 = svm->vmcb01.ptr;
struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
- int rc;
if (nested_svm_vmexit_update_vmcb12(vcpu))
kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
@@ -1315,9 +1314,8 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
nested_svm_uninit_mmu_context(vcpu);
- rc = nested_svm_load_cr3(vcpu, vmcb01->save.cr3, false, true);
- if (rc)
- return 1;
+ if (nested_svm_load_cr3(vcpu, vmcb01->save.cr3, false, true))
+ kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
/*
* Drop what we picked up for L2 via svm_complete_interrupts() so it
@@ -1342,8 +1340,6 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
*/
if (kvm_apicv_activated(vcpu->kvm))
__kvm_vcpu_update_apicv(vcpu);
-
- return 0;
}
static void nested_svm_triple_fault(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cb53174583a26..1b31b033d79b0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2234,13 +2234,9 @@ static int emulate_svm_instr(struct kvm_vcpu *vcpu, int opcode)
[SVM_INSTR_VMSAVE] = vmsave_interception,
};
struct vcpu_svm *svm = to_svm(vcpu);
- int ret;
if (is_guest_mode(vcpu)) {
- /* Returns '1' or -errno on failure, '0' on success. */
- ret = nested_svm_simple_vmexit(svm, guest_mode_exit_codes[opcode]);
- if (ret)
- return ret;
+ nested_svm_simple_vmexit(svm, guest_mode_exit_codes[opcode]);
return 1;
}
return svm_instr_handlers[opcode](vcpu);
@@ -4796,7 +4792,6 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram)
{
struct vcpu_svm *svm = to_svm(vcpu);
struct kvm_host_map map_save;
- int ret;
if (!is_guest_mode(vcpu))
return 0;
@@ -4816,9 +4811,7 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram)
svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
- ret = nested_svm_simple_vmexit(svm, SVM_EXIT_SW);
- if (ret)
- return ret;
+ nested_svm_simple_vmexit(svm, SVM_EXIT_SW);
/*
* KVM uses VMCB01 to store L1 host state while L2 runs but
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 44d767cd1d25a..7629cb37c9302 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -793,14 +793,14 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu);
void svm_copy_vmrun_state(struct vmcb_save_area *to_save,
struct vmcb_save_area *from_save);
void svm_copy_vmloadsave_state(struct vmcb *to_vmcb, struct vmcb *from_vmcb);
-int nested_svm_vmexit(struct vcpu_svm *svm);
+void nested_svm_vmexit(struct vcpu_svm *svm);
-static inline int nested_svm_simple_vmexit(struct vcpu_svm *svm, u32 exit_code)
+static inline void nested_svm_simple_vmexit(struct vcpu_svm *svm, u32 exit_code)
{
svm->vmcb->control.exit_code = exit_code;
svm->vmcb->control.exit_info_1 = 0;
svm->vmcb->control.exit_info_2 = 0;
- return nested_svm_vmexit(svm);
+ nested_svm_vmexit(svm);
}
int nested_svm_exit_handled(struct vcpu_svm *svm);
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 10/31] KVM: nSVM: Drop nested_vmcb_check_{save/control}() wrappers
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (8 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 09/31] KVM: nSVM: Triple fault if restore host CR3 " Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 11/31] KVM: nSVM: Call enter_guest_mode() before switching to VMCB02 Yosry Ahmed
` (21 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
The wrappers provide little value and make it harder to see what KVM is
checking in the normal flow. Drop them.
Opportunistically fixup comments referring to the functions, adding '()'
to make it clear it's a reference to a function.
No functional change intended.
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 36 ++++++++++--------------------------
1 file changed, 10 insertions(+), 26 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index d734cd5eef5e7..0592690e75164 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -339,8 +339,8 @@ static bool nested_svm_check_bitmap_pa(struct kvm_vcpu *vcpu, u64 pa, u32 size)
kvm_vcpu_is_legal_gpa(vcpu, addr + size - 1);
}
-static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
- struct vmcb_ctrl_area_cached *control)
+static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
+ struct vmcb_ctrl_area_cached *control)
{
if (CC(!vmcb12_is_intercept(control, INTERCEPT_VMRUN)))
return false;
@@ -367,8 +367,8 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
}
/* Common checks that apply to both L1 and L2 state. */
-static bool __nested_vmcb_check_save(struct kvm_vcpu *vcpu,
- struct vmcb_save_area_cached *save)
+static bool nested_vmcb_check_save(struct kvm_vcpu *vcpu,
+ struct vmcb_save_area_cached *save)
{
if (CC(!(save->efer & EFER_SVME)))
return false;
@@ -402,22 +402,6 @@ static bool __nested_vmcb_check_save(struct kvm_vcpu *vcpu,
return true;
}
-static bool nested_vmcb_check_save(struct kvm_vcpu *vcpu)
-{
- struct vcpu_svm *svm = to_svm(vcpu);
- struct vmcb_save_area_cached *save = &svm->nested.save;
-
- return __nested_vmcb_check_save(vcpu, save);
-}
-
-static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu)
-{
- struct vcpu_svm *svm = to_svm(vcpu);
- struct vmcb_ctrl_area_cached *ctl = &svm->nested.ctl;
-
- return __nested_vmcb_check_controls(vcpu, ctl);
-}
-
/*
* If a feature is not advertised to L1, clear the corresponding vmcb12
* intercept.
@@ -469,7 +453,7 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
to->pause_filter_count = from->pause_filter_count;
to->pause_filter_thresh = from->pause_filter_thresh;
- /* Copy asid here because nested_vmcb_check_controls will check it. */
+ /* Copy asid here because nested_vmcb_check_controls() will check it */
to->asid = from->asid;
to->msrpm_base_pa &= ~0x0fffULL;
to->iopm_base_pa &= ~0x0fffULL;
@@ -1031,8 +1015,8 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
- if (!nested_vmcb_check_save(vcpu) ||
- !nested_vmcb_check_controls(vcpu)) {
+ if (!nested_vmcb_check_save(vcpu, &svm->nested.save) ||
+ !nested_vmcb_check_controls(vcpu, &svm->nested.ctl)) {
vmcb12->control.exit_code = SVM_EXIT_ERR;
vmcb12->control.exit_info_1 = 0;
vmcb12->control.exit_info_2 = 0;
@@ -1871,12 +1855,12 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
ret = -EINVAL;
__nested_copy_vmcb_control_to_cache(vcpu, &ctl_cached, ctl);
- if (!__nested_vmcb_check_controls(vcpu, &ctl_cached))
+ if (!nested_vmcb_check_controls(vcpu, &ctl_cached))
goto out_free;
/*
* Processor state contains L2 state. Check that it is
- * valid for guest mode (see nested_vmcb_check_save).
+ * valid for guest mode (see nested_vmcb_check_save()).
*/
cr0 = kvm_read_cr0(vcpu);
if (((cr0 & X86_CR0_CD) == 0) && (cr0 & X86_CR0_NW))
@@ -1890,7 +1874,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
if (!(save->cr0 & X86_CR0_PG) ||
!(save->cr0 & X86_CR0_PE) ||
(save->rflags & X86_EFLAGS_VM) ||
- !__nested_vmcb_check_save(vcpu, &save_cached))
+ !nested_vmcb_check_save(vcpu, &save_cached))
goto out_free;
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 11/31] KVM: nSVM: Call enter_guest_mode() before switching to VMCB02
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (9 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 10/31] KVM: nSVM: Drop nested_vmcb_check_{save/control}() wrappers Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 12/31] KVM: nSVM: Make nested_svm_merge_msrpm() return an errno Yosry Ahmed
` (20 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
In preparation for moving more changes that rely on is_guest_mode()
before switching to VMCB02, move entering guest mode a bit earlier.
Nothing between the new callsite(s) and the old ones rely on
is_guest_mode(), so this should be safe.
No functional change intended.
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 0592690e75164..5ba3f8de0a939 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -746,9 +746,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
nested_svm_transition_tlb_flush(vcpu);
- /* Enter Guest-Mode */
- enter_guest_mode(vcpu);
-
/*
* Filled at exit: exit_code, exit_info_1, exit_info_2, exit_int_info,
* exit_int_info_err, next_rip, insn_len, insn_bytes.
@@ -949,6 +946,8 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
WARN_ON(svm->vmcb == svm->nested.vmcb02.ptr);
+ enter_guest_mode(vcpu);
+
nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr);
svm_switch_vmcb(svm, &svm->nested.vmcb02);
@@ -1900,6 +1899,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
svm_copy_vmrun_state(&svm->vmcb01.ptr->save, save);
nested_copy_vmcb_control_to_cache(svm, ctl);
+ enter_guest_mode(vcpu);
svm_switch_vmcb(svm, &svm->nested.vmcb02);
nested_vmcb02_prepare_control(svm, svm->vmcb->save.rip, svm->vmcb->save.cs.base);
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 12/31] KVM: nSVM: Make nested_svm_merge_msrpm() return an errno
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (10 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 11/31] KVM: nSVM: Call enter_guest_mode() before switching to VMCB02 Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 13/31] KVM: nSVM: Call nested_svm_merge_msrpm() from enter_svm_guest_mode() Yosry Ahmed
` (19 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
In preparation for moving nested_svm_merge_msrpm() within
enter_svm_guest_mode(), which returns an errno, return an errno from
nested_svm_merge_msrpm().
No functional change intended.
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 5ba3f8de0a939..78caa75fe619a 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -276,7 +276,7 @@ int __init nested_svm_init_msrpm_merge_offsets(void)
* is optimized in that it only merges the parts where KVM MSR permission bitmap
* may contain zero bits.
*/
-static bool nested_svm_merge_msrpm(struct kvm_vcpu *vcpu)
+static int nested_svm_merge_msrpm(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
nsvm_msrpm_merge_t *msrpm02 = svm->nested.msrpm;
@@ -303,17 +303,19 @@ static bool nested_svm_merge_msrpm(struct kvm_vcpu *vcpu)
#endif
if (!(vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_MSR_PROT)))
- return true;
+ return 0;
for (i = 0; i < nested_svm_nr_msrpm_merge_offsets; i++) {
const int p = nested_svm_msrpm_merge_offsets[i];
nsvm_msrpm_merge_t l1_val;
gpa_t gpa;
+ int r;
gpa = svm->nested.ctl.msrpm_base_pa + (p * sizeof(l1_val));
- if (kvm_vcpu_read_guest(vcpu, gpa, &l1_val, sizeof(l1_val)))
- return false;
+ r = kvm_vcpu_read_guest(vcpu, gpa, &l1_val, sizeof(l1_val));
+ if (r)
+ return r;
msrpm02[p] = msrpm01[p] | l1_val;
}
@@ -325,7 +327,7 @@ static bool nested_svm_merge_msrpm(struct kvm_vcpu *vcpu)
#endif
svm->vmcb->control.msrpm_base_pa = __sme_set(__pa(svm->nested.msrpm));
- return true;
+ return 0;
}
/*
@@ -1040,7 +1042,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
if (enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, true))
goto out_exit_err;
- if (nested_svm_merge_msrpm(vcpu))
+ if (!nested_svm_merge_msrpm(vcpu))
goto out;
out_exit_err:
@@ -1940,7 +1942,7 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
if (CC(!load_pdptrs(vcpu, vcpu->arch.cr3)))
return false;
- if (!nested_svm_merge_msrpm(vcpu)) {
+ if (nested_svm_merge_msrpm(vcpu)) {
vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
vcpu->run->internal.suberror =
KVM_INTERNAL_ERROR_EMULATION;
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 13/31] KVM: nSVM: Call nested_svm_merge_msrpm() from enter_svm_guest_mode()
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (11 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 12/31] KVM: nSVM: Make nested_svm_merge_msrpm() return an errno Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 14/31] KVM: nSVM: Call nested_svm_init_mmu_context() before switching to VMCB02 Yosry Ahmed
` (18 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
In preparation for unifying the VMRUN failure code paths, move calling
nested_svm_merge_msrpm() into enter_svm_guest_mode() next to the
nested_svm_load_cr3() call (the other failure path in
enter_svm_guest_mode()).
Adding more uses of the from_vmrun parameter is not pretty, but it is
plumbed all the way to nested_svm_load_cr3() so it's not going away soon
anyway.
No functional change intended.
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 29 +++++++++++++++--------------
1 file changed, 15 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 78caa75fe619a..5276e41605d43 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -961,6 +961,12 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
if (ret)
return ret;
+ if (from_vmrun) {
+ ret = nested_svm_merge_msrpm(vcpu);
+ if (ret)
+ return ret;
+ }
+
if (!from_vmrun)
kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
@@ -1039,22 +1045,17 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
svm->nested.nested_run_pending = 1;
- if (enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, true))
- goto out_exit_err;
-
- if (!nested_svm_merge_msrpm(vcpu))
- goto out;
-
-out_exit_err:
- svm->nested.nested_run_pending = 0;
- svm->nmi_l1_to_l2 = false;
- svm->soft_int_injected = false;
+ if (enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, true)) {
+ svm->nested.nested_run_pending = 0;
+ svm->nmi_l1_to_l2 = false;
+ svm->soft_int_injected = false;
- svm->vmcb->control.exit_code = SVM_EXIT_ERR;
- svm->vmcb->control.exit_info_1 = 0;
- svm->vmcb->control.exit_info_2 = 0;
+ svm->vmcb->control.exit_code = SVM_EXIT_ERR;
+ svm->vmcb->control.exit_info_1 = 0;
+ svm->vmcb->control.exit_info_2 = 0;
- nested_svm_vmexit(svm);
+ nested_svm_vmexit(svm);
+ }
out:
kvm_vcpu_unmap(vcpu, &map);
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 14/31] KVM: nSVM: Call nested_svm_init_mmu_context() before switching to VMCB02
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (12 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 13/31] KVM: nSVM: Call nested_svm_merge_msrpm() from enter_svm_guest_mode() Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 15/31] KVM: nSVM: Refactor minimal #VMEXIT handling out of nested_svm_vmexit() Yosry Ahmed
` (17 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
In preparation for moving more code that depends on
nested_svm_init_mmu_context() before switching to VMCB02, move the call
outside of nested_vmcb02_prepare_control() into callers, a bit earlier.
nested_svm_init_mmu_context() needs to be called after
enter_guest_mode(), but not after switching to VMCB02.
No functional change intended.
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 5276e41605d43..13d1276940330 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -808,10 +808,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
/* Also overwritten later if necessary. */
vmcb02->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
- /* nested_cr3. */
- if (nested_npt_enabled(svm))
- nested_svm_init_mmu_context(vcpu);
-
vcpu->arch.tsc_offset = kvm_calc_nested_tsc_offset(
vcpu->arch.l1_tsc_offset,
svm->nested.ctl.tsc_offset,
@@ -950,6 +946,9 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
enter_guest_mode(vcpu);
+ if (nested_npt_enabled(svm))
+ nested_svm_init_mmu_context(vcpu);
+
nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr);
svm_switch_vmcb(svm, &svm->nested.vmcb02);
@@ -1903,6 +1902,10 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
nested_copy_vmcb_control_to_cache(svm, ctl);
enter_guest_mode(vcpu);
+
+ if (nested_npt_enabled(svm))
+ nested_svm_init_mmu_context(vcpu);
+
svm_switch_vmcb(svm, &svm->nested.vmcb02);
nested_vmcb02_prepare_control(svm, svm->vmcb->save.rip, svm->vmcb->save.cs.base);
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 15/31] KVM: nSVM: Refactor minimal #VMEXIT handling out of nested_svm_vmexit()
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (13 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 14/31] KVM: nSVM: Call nested_svm_init_mmu_context() before switching to VMCB02 Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 16/31] KVM: nSVM: Unify handling of VMRUN failures with proper cleanup Yosry Ahmed
` (16 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
In preparation for having a separate minimal #VMEXIT path for handling
failed VMRUNs, move the minimal logic out of nested_svm_vmexit() into a
helper.
This includes clearing the GIF, handling single-stepping on VMRUN, and a
few data structure cleanups. Basically, everything that is required by
the architecture (or KVM) on a #VMEXIT where L2 never actually ran.
Additionally move uninitializing the nested MMU and reloading host CR3
to the new helper. It is not required at this point, but following
changes will require it.
No functional change intended.
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 54 ++++++++++++++++++++++-----------------
1 file changed, 30 insertions(+), 24 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 13d1276940330..a6e8c0d26c64e 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -979,6 +979,33 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
return 0;
}
+static void __nested_svm_vmexit(struct vcpu_svm *svm)
+{
+ struct vmcb *vmcb01 = svm->vmcb01.ptr;
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+
+ svm->nested.vmcb12_gpa = 0;
+ svm->nested.ctl.nested_cr3 = 0;
+
+ /* GIF is cleared on #VMEXIT, no event can be injected in L1 */
+ svm_set_gif(svm, false);
+ vmcb01->control.exit_int_info = 0;
+
+ nested_svm_uninit_mmu_context(vcpu);
+
+ if (nested_svm_load_cr3(vcpu, vmcb01->save.cr3, false, true))
+ kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+
+ /*
+ * If we are here following the completion of a VMRUN that
+ * is being single-stepped, queue the pending #DB intercept
+ * right now so that it an be accounted for before we execute
+ * L1's next instruction.
+ */
+ if (unlikely(vmcb01->save.rflags & X86_EFLAGS_TF))
+ kvm_queue_exception(vcpu, DB_VECTOR);
+}
+
int nested_svm_vmrun(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -1187,7 +1214,6 @@ void nested_svm_vmexit(struct vcpu_svm *svm)
/* Exit Guest-Mode */
leave_guest_mode(vcpu);
- svm->nested.vmcb12_gpa = 0;
WARN_ON_ONCE(svm->nested.nested_run_pending);
kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
@@ -1260,13 +1286,6 @@ void nested_svm_vmexit(struct vcpu_svm *svm)
}
}
- /*
- * On vmexit the GIF is set to false and
- * no event can be injected in L1.
- */
- svm_set_gif(svm, false);
- vmcb01->control.exit_int_info = 0;
-
svm->vcpu.arch.tsc_offset = svm->vcpu.arch.l1_tsc_offset;
if (vmcb01->control.tsc_offset != svm->vcpu.arch.tsc_offset) {
vmcb01->control.tsc_offset = svm->vcpu.arch.tsc_offset;
@@ -1279,8 +1298,6 @@ void nested_svm_vmexit(struct vcpu_svm *svm)
svm_write_tsc_multiplier(vcpu);
}
- svm->nested.ctl.nested_cr3 = 0;
-
/*
* Restore processor state that had been saved in vmcb01
*/
@@ -1297,11 +1314,6 @@ void nested_svm_vmexit(struct vcpu_svm *svm)
nested_svm_transition_tlb_flush(vcpu);
- nested_svm_uninit_mmu_context(vcpu);
-
- if (nested_svm_load_cr3(vcpu, vmcb01->save.cr3, false, true))
- kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
-
/*
* Drop what we picked up for L2 via svm_complete_interrupts() so it
* doesn't end up in L1.
@@ -1310,21 +1322,15 @@ void nested_svm_vmexit(struct vcpu_svm *svm)
kvm_clear_exception_queue(vcpu);
kvm_clear_interrupt_queue(vcpu);
- /*
- * If we are here following the completion of a VMRUN that
- * is being single-stepped, queue the pending #DB intercept
- * right now so that it an be accounted for before we execute
- * L1's next instruction.
- */
- if (unlikely(vmcb01->save.rflags & X86_EFLAGS_TF))
- kvm_queue_exception(&(svm->vcpu), DB_VECTOR);
-
/*
* Un-inhibit the AVIC right away, so that other vCPUs can start
* to benefit from it right away.
*/
if (kvm_apicv_activated(vcpu->kvm))
__kvm_vcpu_update_apicv(vcpu);
+
+ /* After clearing exceptions as it may need to queue an exception */
+ __nested_svm_vmexit(svm);
}
static void nested_svm_triple_fault(struct kvm_vcpu *vcpu)
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 16/31] KVM: nSVM: Unify handling of VMRUN failures with proper cleanup
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (14 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 15/31] KVM: nSVM: Refactor minimal #VMEXIT handling out of nested_svm_vmexit() Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-25 22:44 ` Yosry Ahmed
2026-02-28 0:07 ` Sean Christopherson
2026-02-24 22:33 ` [PATCH v6 17/31] KVM: nSVM: Clear EVENTINJ field in VMCB12 on nested #VMEXIT Yosry Ahmed
` (15 subsequent siblings)
31 siblings, 2 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
There are currently two possible causes of VMRUN failures emulated by
KVM:
1) Consistency checks failures. In this case, KVM updates the exit code
in the mapped VMCB12 and exits early in nested_svm_vmrun(). This
causes a few problems:
A) KVM does not clear the GIF if the early consistency checks fail
(because nested_svm_vmexit() is not called). Nothing requires
GIF=0 before a VMRUN, from the APM:
It is assumed that VMM software cleared GIF some time before
executing the VMRUN instruction, to ensure an atomic state
switch.
So an early #VMEXIT from early consistency checks could leave the
GIF set.
B) svm_leave_smm() is missing consistency checks on the newly loaded
guest state, because the checks aren't performed by
enter_svm_guest_mode().
2) Failure to load L2's CR3 or merge the MSR bitmaps. In this case, a
fully-fledged #VMEXIT injection is performed as VMCB02 is already
prepared.
Arguably all VMRUN failures should be handled before the VMCB02 is
prepared, but with proper cleanup (e.g. clear the GIF). Move all the
potential failure checks inside enter_svm_guest_mode() before switching
to VMCB02. On failure of any of these checks, nested_svm_vmrun()
synthesizes a minimal #VMEXIT through the new nested_svm_failed_vmrun()
helper.
__nested_svm_vmexit() already performs the necessary cleanup for a
failed VMRUN, including uninitializing the nested MMU and reloading L1's
CR3. This ensures that consistency check failures do proper necessary
cleanup, while other failures do not doo too much cleanup. It also
leaves a unified path for handling VMRUN failures.
Cc: stable@vger.kernel.org
Fixes: 52c65a30a5c6 ("KVM: SVM: Check for nested vmrun intercept before emulating vmrun")
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 66 +++++++++++++++++++++++++--------------
1 file changed, 42 insertions(+), 24 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index a6e8c0d26c64e..2c5878cbb3940 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -939,22 +939,19 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
vmcb12->control.intercepts[INTERCEPT_WORD4],
vmcb12->control.intercepts[INTERCEPT_WORD5]);
-
svm->nested.vmcb12_gpa = vmcb12_gpa;
WARN_ON(svm->vmcb == svm->nested.vmcb02.ptr);
enter_guest_mode(vcpu);
+ if (!nested_vmcb_check_save(vcpu, &svm->nested.save) ||
+ !nested_vmcb_check_controls(vcpu, &svm->nested.ctl))
+ return -EINVAL;
+
if (nested_npt_enabled(svm))
nested_svm_init_mmu_context(vcpu);
- nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr);
-
- svm_switch_vmcb(svm, &svm->nested.vmcb02);
- nested_vmcb02_prepare_control(svm, vmcb12->save.rip, vmcb12->save.cs.base);
- nested_vmcb02_prepare_save(svm, vmcb12);
-
ret = nested_svm_load_cr3(&svm->vcpu, svm->nested.save.cr3,
nested_npt_enabled(svm), from_vmrun);
if (ret)
@@ -966,6 +963,17 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
return ret;
}
+ /*
+ * Any VMRUN failure needs to happen before this point, such that the
+ * nested #VMEXIT is injected properly by nested_svm_vmrun_error_vmexit().
+ */
+
+ nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr);
+
+ svm_switch_vmcb(svm, &svm->nested.vmcb02);
+ nested_vmcb02_prepare_control(svm, vmcb12->save.rip, vmcb12->save.cs.base);
+ nested_vmcb02_prepare_save(svm, vmcb12);
+
if (!from_vmrun)
kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
@@ -984,6 +992,8 @@ static void __nested_svm_vmexit(struct vcpu_svm *svm)
struct vmcb *vmcb01 = svm->vmcb01.ptr;
struct kvm_vcpu *vcpu = &svm->vcpu;
+ WARN_ON_ONCE(is_guest_mode(vcpu));
+
svm->nested.vmcb12_gpa = 0;
svm->nested.ctl.nested_cr3 = 0;
@@ -1006,6 +1016,20 @@ static void __nested_svm_vmexit(struct vcpu_svm *svm)
kvm_queue_exception(vcpu, DB_VECTOR);
}
+static void nested_svm_vmrun_error_vmexit(struct kvm_vcpu *vcpu, struct vmcb *vmcb12)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ WARN_ON_ONCE(svm->vmcb == svm->nested.vmcb02.ptr);
+
+ leave_guest_mode(vcpu);
+
+ vmcb12->control.exit_code = SVM_EXIT_ERR;
+ vmcb12->control.exit_info_1 = 0;
+ vmcb12->control.exit_info_2 = 0;
+ __nested_svm_vmexit(svm);
+}
+
int nested_svm_vmrun(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -1048,14 +1072,6 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
- if (!nested_vmcb_check_save(vcpu, &svm->nested.save) ||
- !nested_vmcb_check_controls(vcpu, &svm->nested.ctl)) {
- vmcb12->control.exit_code = SVM_EXIT_ERR;
- vmcb12->control.exit_info_1 = 0;
- vmcb12->control.exit_info_2 = 0;
- goto out;
- }
-
/*
* Since vmcb01 is not in use, we can use it to store some of the L1
* state.
@@ -1076,14 +1092,9 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
svm->nmi_l1_to_l2 = false;
svm->soft_int_injected = false;
- svm->vmcb->control.exit_code = SVM_EXIT_ERR;
- svm->vmcb->control.exit_info_1 = 0;
- svm->vmcb->control.exit_info_2 = 0;
-
- nested_svm_vmexit(svm);
+ nested_svm_vmrun_error_vmexit(vcpu, vmcb12);
}
-out:
kvm_vcpu_unmap(vcpu, &map);
return ret;
@@ -1241,6 +1252,13 @@ void nested_svm_vmexit(struct vcpu_svm *svm)
if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS))
vmcb01->control.erap_ctl |= ERAP_CONTROL_CLEAR_RAP;
+ /*
+ * nested_svm_vmexit() is intended for use only when KVM is synthesizing
+ * a #VMEXIT after a successful nested VMRUN. All VMRUN consistency
+ * checks must be performed before loading guest state, and so should
+ * use __nested_svm_vmexit().
+ */
+ WARN_ON_ONCE(svm->vmcb != svm->nested.vmcb02.ptr);
svm_switch_vmcb(svm, &svm->vmcb01);
/*
@@ -1912,9 +1930,6 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
if (nested_npt_enabled(svm))
nested_svm_init_mmu_context(vcpu);
- svm_switch_vmcb(svm, &svm->nested.vmcb02);
- nested_vmcb02_prepare_control(svm, svm->vmcb->save.rip, svm->vmcb->save.cs.base);
-
/*
* While the nested guest CR3 is already checked and set by
* KVM_SET_SREGS, it was set when nested state was yet loaded,
@@ -1926,6 +1941,9 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
if (ret)
goto out_free;
+ svm_switch_vmcb(svm, &svm->nested.vmcb02);
+ nested_vmcb02_prepare_control(svm, svm->vmcb->save.rip, svm->vmcb->save.cs.base);
+
svm->nested.force_msr_bitmap_recalc = true;
kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 17/31] KVM: nSVM: Clear EVENTINJ field in VMCB12 on nested #VMEXIT
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (15 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 16/31] KVM: nSVM: Unify handling of VMRUN failures with proper cleanup Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 18/31] KVM: nSVM: Drop the non-architectural consistency check for NP_ENABLE Yosry Ahmed
` (14 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
According to the APM, from the reference of the VMRUN instruction:
Upon #VMEXIT, the processor performs the following actions in
order to return to the host execution context:
...
clear EVENTINJ field in VMCB
KVM correctly cleared EVENTINJ (i.e. event_inj and event_inj_err) on
nested #VMEXIT before commit 2d8a42be0e2b ("KVM: nSVM: synchronize VMCB
controls updated by the processor on every vmexit"). That commit made
sure the fields are synchronized between VMCB02 and KVM's cached VMCB12
on every L2->L0 #VMEXIT, such that they are serialized correctly on
save/restore.
However, the commit also incorrectly copied the fields from KVM's cached
VMCB12 to L1's VMCB12 on nested #VMEXIT. This is mostly correct, as the
fields will be zeroed by the CPU, but it doesn't handle a #VMEXIT due to
an invalid VMRUN. Explicitly clear the fields all nested #VMEXIT code
paths.
Fixes: 2d8a42be0e2b ("KVM: nSVM: synchronize VMCB controls updated by the processor on every vmexit")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 2c5878cbb3940..dd95c6434403f 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1027,6 +1027,8 @@ static void nested_svm_vmrun_error_vmexit(struct kvm_vcpu *vcpu, struct vmcb *vm
vmcb12->control.exit_code = SVM_EXIT_ERR;
vmcb12->control.exit_info_1 = 0;
vmcb12->control.exit_info_2 = 0;
+ vmcb12->control.event_inj = 0;
+ vmcb12->control.event_inj_err = 0;
__nested_svm_vmexit(svm);
}
@@ -1199,9 +1201,9 @@ static int nested_svm_vmexit_update_vmcb12(struct kvm_vcpu *vcpu)
if (nested_vmcb12_has_lbrv(vcpu))
svm_copy_lbrs(&vmcb12->save, &vmcb02->save);
+ vmcb12->control.event_inj = 0;
+ vmcb12->control.event_inj_err = 0;
vmcb12->control.int_ctl = svm->nested.ctl.int_ctl;
- vmcb12->control.event_inj = svm->nested.ctl.event_inj;
- vmcb12->control.event_inj_err = svm->nested.ctl.event_inj_err;
trace_kvm_nested_vmexit_inject(vmcb12->control.exit_code,
vmcb12->control.exit_info_1,
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 18/31] KVM: nSVM: Drop the non-architectural consistency check for NP_ENABLE
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (16 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 17/31] KVM: nSVM: Clear EVENTINJ field in VMCB12 on nested #VMEXIT Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 19/31] KVM: nSVM: Add missing consistency check for nCR3 validity Yosry Ahmed
` (13 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
KVM currenty fails a nested VMRUN and injects VMEXIT_INVALID (aka
SVM_EXIT_ERR) if L1 sets NP_ENABLE and the host does not support NPTs.
On first glance, it seems like the check should actually be for
guest_cpu_cap_has(X86_FEATURE_NPT) instead, as it is possible for the
host to support NPTs but the guest CPUID to not advertise it.
However, the consistency check is not architectural to begin with. The
APM does not mention VMEXIT_INVALID if NP_ENABLE is set on a processor
that does not have X86_FEATURE_NPT. Hence, NP_ENABLE should be ignored
if X86_FEATURE_NPT is not available for L1, so sanitize it when copying
from the VMCB12 to KVM's cache.
Apart from the consistency check, NP_ENABLE in VMCB12 is currently
ignored because the bit is actually copied from VMCB01 to VMCB02, not
from VMCB12.
Fixes: 4b16184c1cca ("KVM: SVM: Initialize Nested Nested MMU context on VMRUN")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index dd95c6434403f..0f18123f5461f 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -350,9 +350,6 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
if (CC(control->asid == 0))
return false;
- if (CC((control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) && !npt_enabled))
- return false;
-
if (CC(!nested_svm_check_bitmap_pa(vcpu, control->msrpm_base_pa,
MSRPM_SIZE)))
return false;
@@ -433,6 +430,11 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
nested_svm_sanitize_intercept(vcpu, to, SKINIT);
nested_svm_sanitize_intercept(vcpu, to, RDPRU);
+ /* Always clear SVM_NESTED_CTL_NP_ENABLE if the guest cannot use NPTs */
+ to->nested_ctl = from->nested_ctl;
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NPT))
+ to->nested_ctl &= ~SVM_NESTED_CTL_NP_ENABLE;
+
to->iopm_base_pa = from->iopm_base_pa;
to->msrpm_base_pa = from->msrpm_base_pa;
to->tsc_offset = from->tsc_offset;
@@ -446,7 +448,6 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
to->exit_info_2 = from->exit_info_2;
to->exit_int_info = from->exit_int_info;
to->exit_int_info_err = from->exit_int_info_err;
- to->nested_ctl = from->nested_ctl;
to->event_inj = from->event_inj;
to->event_inj_err = from->event_inj_err;
to->next_rip = from->next_rip;
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 19/31] KVM: nSVM: Add missing consistency check for nCR3 validity
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (17 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 18/31] KVM: nSVM: Drop the non-architectural consistency check for NP_ENABLE Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 20/31] KVM: nSVM: Add missing consistency check for hCR0.PG and NP_ENABLE Yosry Ahmed
` (12 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
From the APM Volume #2, 15.25.4 (24593—Rev. 3.42—March 2024):
When VMRUN is executed with nested paging enabled
(NP_ENABLE = 1), the following conditions are considered illegal
state combinations, in addition to those mentioned in
“Canonicalization and Consistency Checks”:
• Any MBZ bit of nCR3 is set.
• Any G_PAT.PA field has an unsupported type encoding or any
reserved field in G_PAT has a nonzero value.
Add the consistency check for nCR3 being a legal GPA with no MBZ bits
set. The G_PAT.PA check was proposed separately [*].
[*]https://lore.kernel.org/kvm/20260205214326.1029278-3-jmattson@google.com/
Fixes: 4b16184c1cca ("KVM: SVM: Initialize Nested Nested MMU context on VMRUN")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 0f18123f5461f..752dd9eb98a84 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -350,6 +350,11 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
if (CC(control->asid == 0))
return false;
+ if (control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) {
+ if (CC(!kvm_vcpu_is_legal_gpa(vcpu, control->nested_cr3)))
+ return false;
+ }
+
if (CC(!nested_svm_check_bitmap_pa(vcpu, control->msrpm_base_pa,
MSRPM_SIZE)))
return false;
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 20/31] KVM: nSVM: Add missing consistency check for hCR0.PG and NP_ENABLE
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (18 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 19/31] KVM: nSVM: Add missing consistency check for nCR3 validity Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-03-03 0:00 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 21/31] KVM: nSVM: Add missing consistency check for EFER, CR0, CR4, and CS Yosry Ahmed
` (11 subsequent siblings)
31 siblings, 1 reply; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
From the APM Volume #2, 15.25.3 (24593—Rev. 3.42—March 2024):
If VMRUN is executed with hCR0.PG cleared to zero and NP_ENABLE
set to 1 , VMRUN terminates with #VMEXIT(VMEXIT_INVALID).
Add the consistency check by plumbing L1's CR0 to
nested_vmcb_check_controls().
Fixes: 4b16184c1cca ("KVM: SVM: Initialize Nested Nested MMU context on VMRUN")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 752dd9eb98a84..6fffb6ae6b88b 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -342,7 +342,8 @@ static bool nested_svm_check_bitmap_pa(struct kvm_vcpu *vcpu, u64 pa, u32 size)
}
static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
- struct vmcb_ctrl_area_cached *control)
+ struct vmcb_ctrl_area_cached *control,
+ unsigned long l1_cr0)
{
if (CC(!vmcb12_is_intercept(control, INTERCEPT_VMRUN)))
return false;
@@ -353,6 +354,8 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
if (control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) {
if (CC(!kvm_vcpu_is_legal_gpa(vcpu, control->nested_cr3)))
return false;
+ if (CC(!(l1_cr0 & X86_CR0_PG)))
+ return false;
}
if (CC(!nested_svm_check_bitmap_pa(vcpu, control->msrpm_base_pa,
@@ -952,7 +955,8 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
enter_guest_mode(vcpu);
if (!nested_vmcb_check_save(vcpu, &svm->nested.save) ||
- !nested_vmcb_check_controls(vcpu, &svm->nested.ctl))
+ !nested_vmcb_check_controls(vcpu, &svm->nested.ctl,
+ svm->vmcb01.ptr->save.cr0))
return -EINVAL;
if (nested_npt_enabled(svm))
@@ -1888,7 +1892,8 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
ret = -EINVAL;
__nested_copy_vmcb_control_to_cache(vcpu, &ctl_cached, ctl);
- if (!nested_vmcb_check_controls(vcpu, &ctl_cached))
+ /* 'save' contains L1 state saved from before VMRUN */
+ if (!nested_vmcb_check_controls(vcpu, &ctl_cached, save->cr0))
goto out_free;
/*
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 21/31] KVM: nSVM: Add missing consistency check for EFER, CR0, CR4, and CS
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (19 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 20/31] KVM: nSVM: Add missing consistency check for hCR0.PG and NP_ENABLE Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-03-03 0:03 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 22/31] KVM: nSVM: Add missing consistency check for EVENTINJ Yosry Ahmed
` (10 subsequent siblings)
31 siblings, 1 reply; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
According to the APM Volume #2, 15.5, Canonicalization and Consistency
Checks (24593—Rev. 3.42—March 2024), the following condition (among
others) results in a #VMEXIT with VMEXIT_INVALID (aka SVM_EXIT_ERR):
EFER.LME, CR0.PG, CR4.PAE, CS.L, and CS.D are all non-zero.
Add the missing consistency check. This is functionally a nop because
the nested VMRUN results in SVM_EXIT_ERR in HW, which is forwarded to
L1, but KVM makes all consistency checks before a VMRUN is actually
attempted.
Fixes: 3d6368ef580a ("KVM: SVM: Add VMRUN handler")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 7 +++++++
arch/x86/kvm/svm/svm.h | 1 +
2 files changed, 8 insertions(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 6fffb6ae6b88b..2c852e94a9ad9 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -397,6 +397,11 @@ static bool nested_vmcb_check_save(struct kvm_vcpu *vcpu,
CC(!(save->cr0 & X86_CR0_PE)) ||
CC(!kvm_vcpu_is_legal_cr3(vcpu, save->cr3)))
return false;
+
+ if (CC((save->cr4 & X86_CR4_PAE) &&
+ (save->cs.attrib & SVM_SELECTOR_L_MASK) &&
+ (save->cs.attrib & SVM_SELECTOR_DB_MASK)))
+ return false;
}
/* Note, SVM doesn't have any additional restrictions on CR4. */
@@ -492,6 +497,8 @@ static void __nested_copy_vmcb_save_to_cache(struct vmcb_save_area_cached *to,
* Copy only fields that are validated, as we need them
* to avoid TOC/TOU races.
*/
+ to->cs = from->cs;
+
to->efer = from->efer;
to->cr0 = from->cr0;
to->cr3 = from->cr3;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 7629cb37c9302..0a5d5a4453b7e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -140,6 +140,7 @@ struct kvm_vmcb_info {
};
struct vmcb_save_area_cached {
+ struct vmcb_seg cs;
u64 efer;
u64 cr4;
u64 cr3;
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 22/31] KVM: nSVM: Add missing consistency check for EVENTINJ
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (20 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 21/31] KVM: nSVM: Add missing consistency check for EFER, CR0, CR4, and CS Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 23/31] KVM: SVM: Rename vmcb->nested_ctl to vmcb->misc_ctl Yosry Ahmed
` (9 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
According to the APM Volume #2, 15.20 (24593—Rev. 3.42—March 2024):
VMRUN exits with VMEXIT_INVALID error code if either:
• Reserved values of TYPE have been specified, or
• TYPE = 3 (exception) has been specified with a vector that does not
correspond to an exception (this includes vector 2, which is an NMI,
not an exception).
Add the missing consistency checks to KVM. For the second point, inject
VMEXIT_INVALID if the vector is anything but the vectors defined by the
APM for exceptions. Reserved vectors are also considered invalid, which
matches the HW behavior. Vector 9 (i.e. #CSO) is considered invalid
because it is reserved on modern CPUs, and according to LLMs no CPUs
exist supporting SVM and producing #CSOs.
Defined exceptions could be different between virtual CPUs as new CPUs
define new vectors. In a best effort to dynamically define the valid
vectors, make all currently defined vectors as valid except those
obviously tied to a CPU feature: SHSTK -> #CP and SEV-ES -> #VC. As new
vectors are defined, they can similarly be tied to corresponding CPU
features.
Invalid vectors on specific (e.g. old) CPUs that are missed by KVM
should be rejected by HW anyway.
Fixes: 3d6368ef580a ("KVM: SVM: Add VMRUN handler")
CC: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 51 +++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 2c852e94a9ad9..dc0e0ac881979 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -341,6 +341,54 @@ static bool nested_svm_check_bitmap_pa(struct kvm_vcpu *vcpu, u64 pa, u32 size)
kvm_vcpu_is_legal_gpa(vcpu, addr + size - 1);
}
+static bool nested_svm_event_inj_valid_exept(struct kvm_vcpu *vcpu, u8 vector)
+{
+ /*
+ * Vectors that do not correspond to a defined exception are invalid
+ * (including #NMI and reserved vectors). In a best effort to define
+ * valid exceptions based on the virtual CPU, make all exceptions always
+ * valid except those obviously tied to a CPU feature.
+ */
+ switch (vector) {
+ case DE_VECTOR: case DB_VECTOR: case BP_VECTOR: case OF_VECTOR:
+ case BR_VECTOR: case UD_VECTOR: case NM_VECTOR: case DF_VECTOR:
+ case TS_VECTOR: case NP_VECTOR: case SS_VECTOR: case GP_VECTOR:
+ case PF_VECTOR: case MF_VECTOR: case AC_VECTOR: case MC_VECTOR:
+ case XM_VECTOR: case HV_VECTOR: case SX_VECTOR:
+ return true;
+ case CP_VECTOR:
+ return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
+ case VC_VECTOR:
+ return guest_cpu_cap_has(vcpu, X86_FEATURE_SEV_ES);
+ }
+ return false;
+}
+
+/*
+ * According to the APM, VMRUN exits with SVM_EXIT_ERR if SVM_EVTINJ_VALID is
+ * set and:
+ * - The type of event_inj is not one of the defined values.
+ * - The type is SVM_EVTINJ_TYPE_EXEPT, but the vector is not a valid exception.
+ */
+static bool nested_svm_check_event_inj(struct kvm_vcpu *vcpu, u32 event_inj)
+{
+ u32 type = event_inj & SVM_EVTINJ_TYPE_MASK;
+ u8 vector = event_inj & SVM_EVTINJ_VEC_MASK;
+
+ if (!(event_inj & SVM_EVTINJ_VALID))
+ return true;
+
+ if (type != SVM_EVTINJ_TYPE_INTR && type != SVM_EVTINJ_TYPE_NMI &&
+ type != SVM_EVTINJ_TYPE_EXEPT && type != SVM_EVTINJ_TYPE_SOFT)
+ return false;
+
+ if (type == SVM_EVTINJ_TYPE_EXEPT &&
+ !nested_svm_event_inj_valid_exept(vcpu, vector))
+ return false;
+
+ return true;
+}
+
static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
struct vmcb_ctrl_area_cached *control,
unsigned long l1_cr0)
@@ -370,6 +418,9 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
return false;
}
+ if (CC(!nested_svm_check_event_inj(vcpu, control->event_inj)))
+ return false;
+
return true;
}
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 23/31] KVM: SVM: Rename vmcb->nested_ctl to vmcb->misc_ctl
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (21 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 22/31] KVM: nSVM: Add missing consistency check for EVENTINJ Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 24/31] KVM: SVM: Rename vmcb->virt_ext to vmcb->misc_ctl2 Yosry Ahmed
` (8 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
The 'nested_ctl' field is misnamed. Although the first bit is for nested
paging, the other defined bits are for SEV/SEV-ES. Other bits in the
same field according to the APM (but not defined by KVM) include "Guest
Mode Execution Trap", "Enable INVLPGB/TLBSYNC", and other control bits
unrelated to 'nested'.
There is nothing common among these bits, so just name the field
misc_ctl. Also rename the flags accordingly.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/include/asm/svm.h | 8 ++++----
arch/x86/kvm/svm/nested.c | 14 +++++++-------
arch/x86/kvm/svm/sev.c | 4 ++--
arch/x86/kvm/svm/svm.c | 4 ++--
arch/x86/kvm/svm/svm.h | 4 ++--
tools/testing/selftests/kvm/include/x86/svm.h | 6 +++---
tools/testing/selftests/kvm/lib/x86/svm.c | 2 +-
7 files changed, 21 insertions(+), 21 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index edde36097ddc3..983db6575141d 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -142,7 +142,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u64 exit_info_2;
u32 exit_int_info;
u32 exit_int_info_err;
- u64 nested_ctl;
+ u64 misc_ctl;
u64 avic_vapic_bar;
u64 ghcb_gpa;
u32 event_inj;
@@ -239,9 +239,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define SVM_IOIO_SIZE_MASK (7 << SVM_IOIO_SIZE_SHIFT)
#define SVM_IOIO_ASIZE_MASK (7 << SVM_IOIO_ASIZE_SHIFT)
-#define SVM_NESTED_CTL_NP_ENABLE BIT(0)
-#define SVM_NESTED_CTL_SEV_ENABLE BIT(1)
-#define SVM_NESTED_CTL_SEV_ES_ENABLE BIT(2)
+#define SVM_MISC_ENABLE_NP BIT(0)
+#define SVM_MISC_ENABLE_SEV BIT(1)
+#define SVM_MISC_ENABLE_SEV_ES BIT(2)
#define SVM_TSC_RATIO_RSVD 0xffffff0000000000ULL
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index dc0e0ac881979..099cdab878d45 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -399,7 +399,7 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
if (CC(control->asid == 0))
return false;
- if (control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) {
+ if (control->misc_ctl & SVM_MISC_ENABLE_NP) {
if (CC(!kvm_vcpu_is_legal_gpa(vcpu, control->nested_cr3)))
return false;
if (CC(!(l1_cr0 & X86_CR0_PG)))
@@ -494,10 +494,10 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
nested_svm_sanitize_intercept(vcpu, to, SKINIT);
nested_svm_sanitize_intercept(vcpu, to, RDPRU);
- /* Always clear SVM_NESTED_CTL_NP_ENABLE if the guest cannot use NPTs */
- to->nested_ctl = from->nested_ctl;
+ /* Always clear SVM_MISC_ENABLE_NP if the guest cannot use NPTs */
+ to->misc_ctl = from->misc_ctl;
if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NPT))
- to->nested_ctl &= ~SVM_NESTED_CTL_NP_ENABLE;
+ to->misc_ctl &= ~SVM_MISC_ENABLE_NP;
to->iopm_base_pa = from->iopm_base_pa;
to->msrpm_base_pa = from->msrpm_base_pa;
@@ -838,7 +838,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
}
/* Copied from vmcb01. msrpm_base can be overwritten later. */
- vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
+ vmcb02->control.misc_ctl = vmcb01->control.misc_ctl;
vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
vmcb02->control.msrpm_base_pa = vmcb01->control.msrpm_base_pa;
vmcb_mark_dirty(vmcb02, VMCB_PERM_MAP);
@@ -994,7 +994,7 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
vmcb12->save.rip,
vmcb12->control.int_ctl,
vmcb12->control.event_inj,
- vmcb12->control.nested_ctl,
+ vmcb12->control.misc_ctl,
vmcb12->control.nested_cr3,
vmcb12->save.cr3,
KVM_ISA_SVM);
@@ -1810,7 +1810,7 @@ static void nested_copy_vmcb_cache_to_control(struct vmcb_control_area *dst,
dst->exit_info_2 = from->exit_info_2;
dst->exit_int_info = from->exit_int_info;
dst->exit_int_info_err = from->exit_int_info_err;
- dst->nested_ctl = from->nested_ctl;
+ dst->misc_ctl = from->misc_ctl;
dst->event_inj = from->event_inj;
dst->event_inj_err = from->event_inj_err;
dst->next_rip = from->next_rip;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ea515cf411686..0ed9cfed1cbcc 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4599,7 +4599,7 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm, bool init_event)
struct kvm_sev_info *sev = to_kvm_sev_info(svm->vcpu.kvm);
struct vmcb *vmcb = svm->vmcb01.ptr;
- svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ES_ENABLE;
+ svm->vmcb->control.misc_ctl |= SVM_MISC_ENABLE_SEV_ES;
/*
* An SEV-ES guest requires a VMSA area that is a separate from the
@@ -4670,7 +4670,7 @@ void sev_init_vmcb(struct vcpu_svm *svm, bool init_event)
{
struct kvm_vcpu *vcpu = &svm->vcpu;
- svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ENABLE;
+ svm->vmcb->control.misc_ctl |= SVM_MISC_ENABLE_SEV;
clr_exception_intercept(svm, UD_VECTOR);
/*
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1b31b033d79b0..7bc8b72fe5ad8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1152,7 +1152,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
if (npt_enabled) {
/* Setup VMCB for Nested Paging */
- control->nested_ctl |= SVM_NESTED_CTL_NP_ENABLE;
+ control->misc_ctl |= SVM_MISC_ENABLE_NP;
svm_clr_intercept(svm, INTERCEPT_INVLPG);
clr_exception_intercept(svm, PF_VECTOR);
svm_clr_intercept(svm, INTERCEPT_CR3_READ);
@@ -3362,7 +3362,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
pr_err("%-20s%016llx\n", "exit_info2:", control->exit_info_2);
pr_err("%-20s%08x\n", "exit_int_info:", control->exit_int_info);
pr_err("%-20s%08x\n", "exit_int_info_err:", control->exit_int_info_err);
- pr_err("%-20s%lld\n", "nested_ctl:", control->nested_ctl);
+ pr_err("%-20s%lld\n", "misc_ctl:", control->misc_ctl);
pr_err("%-20s%016llx\n", "nested_cr3:", control->nested_cr3);
pr_err("%-20s%016llx\n", "avic_vapic_bar:", control->avic_vapic_bar);
pr_err("%-20s%016llx\n", "ghcb:", control->ghcb_gpa);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0a5d5a4453b7e..f66e5c8565aad 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -167,7 +167,7 @@ struct vmcb_ctrl_area_cached {
u64 exit_info_2;
u32 exit_int_info;
u32 exit_int_info_err;
- u64 nested_ctl;
+ u64 misc_ctl;
u32 event_inj;
u32 event_inj_err;
u64 next_rip;
@@ -579,7 +579,7 @@ static inline bool gif_set(struct vcpu_svm *svm)
static inline bool nested_npt_enabled(struct vcpu_svm *svm)
{
- return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
+ return svm->nested.ctl.misc_ctl & SVM_MISC_ENABLE_NP;
}
static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
diff --git a/tools/testing/selftests/kvm/include/x86/svm.h b/tools/testing/selftests/kvm/include/x86/svm.h
index 10b30b38bb3fd..d81d8a9f5bfb6 100644
--- a/tools/testing/selftests/kvm/include/x86/svm.h
+++ b/tools/testing/selftests/kvm/include/x86/svm.h
@@ -97,7 +97,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u64 exit_info_2;
u32 exit_int_info;
u32 exit_int_info_err;
- u64 nested_ctl;
+ u64 misc_ctl;
u64 avic_vapic_bar;
u8 reserved_4[8];
u32 event_inj;
@@ -175,8 +175,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define SVM_VM_CR_SVM_LOCK_MASK 0x0008ULL
#define SVM_VM_CR_SVM_DIS_MASK 0x0010ULL
-#define SVM_NESTED_CTL_NP_ENABLE BIT(0)
-#define SVM_NESTED_CTL_SEV_ENABLE BIT(1)
+#define SVM_MISC_ENABLE_NP BIT(0)
+#define SVM_MISC_ENABLE_SEV BIT(1)
struct __attribute__ ((__packed__)) vmcb_seg {
u16 selector;
diff --git a/tools/testing/selftests/kvm/lib/x86/svm.c b/tools/testing/selftests/kvm/lib/x86/svm.c
index 2e5c480c9afd4..eb20b00112c76 100644
--- a/tools/testing/selftests/kvm/lib/x86/svm.c
+++ b/tools/testing/selftests/kvm/lib/x86/svm.c
@@ -126,7 +126,7 @@ void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_r
guest_regs.rdi = (u64)svm;
if (svm->ncr3_gpa) {
- ctrl->nested_ctl |= SVM_NESTED_CTL_NP_ENABLE;
+ ctrl->misc_ctl |= SVM_MISC_ENABLE_NP;
ctrl->nested_cr3 = svm->ncr3_gpa;
}
}
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 24/31] KVM: SVM: Rename vmcb->virt_ext to vmcb->misc_ctl2
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (22 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 23/31] KVM: SVM: Rename vmcb->nested_ctl to vmcb->misc_ctl Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 25/31] KVM: nSVM: Cache all used fields from VMCB12 Yosry Ahmed
` (7 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
'virt' is confusing in the VMCB because it is relative and ambiguous.
The 'virt_ext' field includes bits for LBR virtualization and
VMSAVE/VMLOAD virtualization, so it's just another miscellaneous control
field. Name it as such.
While at it, move the definitions of the bits below those for
'misc_ctl' and rename them for consistency.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/include/asm/svm.h | 7 +++----
arch/x86/kvm/svm/nested.c | 16 +++++++--------
arch/x86/kvm/svm/svm.c | 20 +++++++++----------
arch/x86/kvm/svm/svm.h | 2 +-
tools/testing/selftests/kvm/include/x86/svm.h | 8 ++++----
.../kvm/x86/nested_vmsave_vmload_test.c | 16 +++++++--------
.../selftests/kvm/x86/svm_lbr_nested_state.c | 4 ++--
7 files changed, 36 insertions(+), 37 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 983db6575141d..c169256c415fb 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -148,7 +148,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u32 event_inj;
u32 event_inj_err;
u64 nested_cr3;
- u64 virt_ext;
+ u64 misc_ctl2;
u32 clean;
u32 reserved_5;
u64 next_rip;
@@ -222,9 +222,6 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define X2APIC_MODE_SHIFT 30
#define X2APIC_MODE_MASK (1 << X2APIC_MODE_SHIFT)
-#define LBR_CTL_ENABLE_MASK BIT_ULL(0)
-#define VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK BIT_ULL(1)
-
#define SVM_INTERRUPT_SHADOW_MASK BIT_ULL(0)
#define SVM_GUEST_INTERRUPT_MASK BIT_ULL(1)
@@ -243,6 +240,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define SVM_MISC_ENABLE_SEV BIT(1)
#define SVM_MISC_ENABLE_SEV_ES BIT(2)
+#define SVM_MISC2_ENABLE_V_LBR BIT_ULL(0)
+#define SVM_MISC2_ENABLE_V_VMLOAD_VMSAVE BIT_ULL(1)
#define SVM_TSC_RATIO_RSVD 0xffffff0000000000ULL
#define SVM_TSC_RATIO_MIN 0x0000000000000001ULL
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 099cdab878d45..679ac9f6dfe80 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -116,7 +116,7 @@ static bool nested_vmcb_needs_vls_intercept(struct vcpu_svm *svm)
if (!nested_npt_enabled(svm))
return true;
- if (!(svm->nested.ctl.virt_ext & VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK))
+ if (!(svm->nested.ctl.misc_ctl2 & SVM_MISC2_ENABLE_V_VMLOAD_VMSAVE))
return true;
return false;
@@ -179,7 +179,7 @@ void recalc_intercepts(struct vcpu_svm *svm)
vmcb_set_intercept(c, INTERCEPT_VMLOAD);
vmcb_set_intercept(c, INTERCEPT_VMSAVE);
} else {
- WARN_ON(!(c->virt_ext & VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK));
+ WARN_ON(!(c->misc_ctl2 & SVM_MISC2_ENABLE_V_VMLOAD_VMSAVE));
}
}
@@ -516,7 +516,7 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
to->event_inj_err = from->event_inj_err;
to->next_rip = from->next_rip;
to->nested_cr3 = from->nested_cr3;
- to->virt_ext = from->virt_ext;
+ to->misc_ctl2 = from->misc_ctl2;
to->pause_filter_count = from->pause_filter_count;
to->pause_filter_thresh = from->pause_filter_thresh;
@@ -695,7 +695,7 @@ void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm)
static bool nested_vmcb12_has_lbrv(struct kvm_vcpu *vcpu)
{
return guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
- (to_svm(vcpu)->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK);
+ (to_svm(vcpu)->nested.ctl.misc_ctl2 & SVM_MISC2_ENABLE_V_LBR);
}
static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12)
@@ -919,10 +919,10 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
svm->soft_int_next_rip = vmcb12_rip;
}
- /* LBR_CTL_ENABLE_MASK is controlled by svm_update_lbrv() */
+ /* SVM_MISC2_ENABLE_V_LBR is controlled by svm_update_lbrv() */
if (!nested_vmcb_needs_vls_intercept(svm))
- vmcb02->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
+ vmcb02->control.misc_ctl2 |= SVM_MISC2_ENABLE_V_VMLOAD_VMSAVE;
if (guest_cpu_cap_has(vcpu, X86_FEATURE_PAUSEFILTER))
pause_count12 = svm->nested.ctl.pause_filter_count;
@@ -1814,8 +1814,8 @@ static void nested_copy_vmcb_cache_to_control(struct vmcb_control_area *dst,
dst->event_inj = from->event_inj;
dst->event_inj_err = from->event_inj_err;
dst->next_rip = from->next_rip;
- dst->nested_cr3 = from->nested_cr3;
- dst->virt_ext = from->virt_ext;
+ dst->nested_cr3 = from->nested_cr3;
+ dst->misc_ctl2 = from->misc_ctl2;
dst->pause_filter_count = from->pause_filter_count;
dst->pause_filter_thresh = from->pause_filter_thresh;
/* 'clean' and 'hv_enlightenments' are not changed by KVM */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7bc8b72fe5ad8..94e14badddfa2 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -710,7 +710,7 @@ void *svm_alloc_permissions_map(unsigned long size, gfp_t gfp_mask)
static void svm_recalc_lbr_msr_intercepts(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- bool intercept = !(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK);
+ bool intercept = !(svm->vmcb->control.misc_ctl2 & SVM_MISC2_ENABLE_V_LBR);
if (intercept == svm->lbr_msrs_intercepted)
return;
@@ -843,7 +843,7 @@ static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
static void __svm_enable_lbrv(struct kvm_vcpu *vcpu)
{
- to_svm(vcpu)->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
+ to_svm(vcpu)->vmcb->control.misc_ctl2 |= SVM_MISC2_ENABLE_V_LBR;
}
void svm_enable_lbrv(struct kvm_vcpu *vcpu)
@@ -855,16 +855,16 @@ void svm_enable_lbrv(struct kvm_vcpu *vcpu)
static void __svm_disable_lbrv(struct kvm_vcpu *vcpu)
{
KVM_BUG_ON(sev_es_guest(vcpu->kvm), vcpu->kvm);
- to_svm(vcpu)->vmcb->control.virt_ext &= ~LBR_CTL_ENABLE_MASK;
+ to_svm(vcpu)->vmcb->control.misc_ctl2 &= ~SVM_MISC2_ENABLE_V_LBR;
}
void svm_update_lbrv(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- bool current_enable_lbrv = svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK;
+ bool current_enable_lbrv = svm->vmcb->control.misc_ctl2 & SVM_MISC2_ENABLE_V_LBR;
bool enable_lbrv = (svm->vmcb->save.dbgctl & DEBUGCTLMSR_LBR) ||
(is_guest_mode(vcpu) && guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
- (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK));
+ (svm->nested.ctl.misc_ctl2 & SVM_MISC2_ENABLE_V_LBR));
if (enable_lbrv && !current_enable_lbrv)
__svm_enable_lbrv(vcpu);
@@ -1023,7 +1023,7 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu)
}
/*
- * No need to toggle VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK here, it is
+ * No need to toggle SVM_MISC2_ENABLE_V_VMLOAD_VMSAVE here, it is
* always set if vls is enabled. If the intercepts are set, the bit is
* meaningless anyway.
*/
@@ -1191,7 +1191,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
}
if (vls)
- svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
+ svm->vmcb->control.misc_ctl2 |= SVM_MISC2_ENABLE_V_VMLOAD_VMSAVE;
if (vcpu->kvm->arch.bus_lock_detection_enabled)
svm_set_intercept(svm, INTERCEPT_BUSLOCK);
@@ -3368,7 +3368,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
pr_err("%-20s%016llx\n", "ghcb:", control->ghcb_gpa);
pr_err("%-20s%08x\n", "event_inj:", control->event_inj);
pr_err("%-20s%08x\n", "event_inj_err:", control->event_inj_err);
- pr_err("%-20s%lld\n", "virt_ext:", control->virt_ext);
+ pr_err("%-20s%lld\n", "misc_ctl2:", control->misc_ctl2);
pr_err("%-20s%016llx\n", "next_rip:", control->next_rip);
pr_err("%-20s%016llx\n", "avic_backing_page:", control->avic_backing_page);
pr_err("%-20s%016llx\n", "avic_logical_id:", control->avic_logical_id);
@@ -4363,7 +4363,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
* VM-Exit), as running with the host's DEBUGCTL can negatively affect
* guest state and can even be fatal, e.g. due to Bus Lock Detect.
*/
- if (!(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK) &&
+ if (!(svm->vmcb->control.misc_ctl2 & SVM_MISC2_ENABLE_V_LBR) &&
vcpu->arch.host_debugctl != svm->vmcb->save.dbgctl)
update_debugctlmsr(svm->vmcb->save.dbgctl);
@@ -4394,7 +4394,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
- if (!(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK) &&
+ if (!(svm->vmcb->control.misc_ctl2 & SVM_MISC2_ENABLE_V_LBR) &&
vcpu->arch.host_debugctl != svm->vmcb->save.dbgctl)
update_debugctlmsr(vcpu->arch.host_debugctl);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f66e5c8565aad..304328c33e960 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -172,7 +172,7 @@ struct vmcb_ctrl_area_cached {
u32 event_inj_err;
u64 next_rip;
u64 nested_cr3;
- u64 virt_ext;
+ u64 misc_ctl2;
u32 clean;
u64 bus_lock_rip;
union {
diff --git a/tools/testing/selftests/kvm/include/x86/svm.h b/tools/testing/selftests/kvm/include/x86/svm.h
index d81d8a9f5bfb6..c8539166270ea 100644
--- a/tools/testing/selftests/kvm/include/x86/svm.h
+++ b/tools/testing/selftests/kvm/include/x86/svm.h
@@ -103,7 +103,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u32 event_inj;
u32 event_inj_err;
u64 nested_cr3;
- u64 virt_ext;
+ u64 misc_ctl2;
u32 clean;
u32 reserved_5;
u64 next_rip;
@@ -155,9 +155,6 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define AVIC_ENABLE_SHIFT 31
#define AVIC_ENABLE_MASK (1 << AVIC_ENABLE_SHIFT)
-#define LBR_CTL_ENABLE_MASK BIT_ULL(0)
-#define VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK BIT_ULL(1)
-
#define SVM_INTERRUPT_SHADOW_MASK 1
#define SVM_IOIO_STR_SHIFT 2
@@ -178,6 +175,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define SVM_MISC_ENABLE_NP BIT(0)
#define SVM_MISC_ENABLE_SEV BIT(1)
+#define SVM_MISC2_ENABLE_V_LBR BIT_ULL(0)
+#define SVM_MISC2_ENABLE_V_VMLOAD_VMSAVE BIT_ULL(1)
+
struct __attribute__ ((__packed__)) vmcb_seg {
u16 selector;
u16 attrib;
diff --git a/tools/testing/selftests/kvm/x86/nested_vmsave_vmload_test.c b/tools/testing/selftests/kvm/x86/nested_vmsave_vmload_test.c
index 6764a48f9d4d9..71717118d6924 100644
--- a/tools/testing/selftests/kvm/x86/nested_vmsave_vmload_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_vmsave_vmload_test.c
@@ -79,8 +79,8 @@ static void l1_guest_code(struct svm_test_data *svm)
svm->vmcb->control.intercept |= (BIT_ULL(INTERCEPT_VMSAVE) |
BIT_ULL(INTERCEPT_VMLOAD));
- /* ..VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK cleared.. */
- svm->vmcb->control.virt_ext &= ~VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
+ /* ..SVM_MISC2_ENABLE_V_VMLOAD_VMSAVE cleared.. */
+ svm->vmcb->control.misc_ctl2 &= ~SVM_MISC2_ENABLE_V_VMLOAD_VMSAVE;
svm->vmcb->save.rip = (u64)l2_guest_code_vmsave;
run_guest(svm->vmcb, svm->vmcb_gpa);
@@ -90,8 +90,8 @@ static void l1_guest_code(struct svm_test_data *svm)
run_guest(svm->vmcb, svm->vmcb_gpa);
GUEST_ASSERT_EQ(svm->vmcb->control.exit_code, SVM_EXIT_VMLOAD);
- /* ..and VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK set */
- svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
+ /* ..and SVM_MISC2_ENABLE_V_VMLOAD_VMSAVE set */
+ svm->vmcb->control.misc_ctl2 |= SVM_MISC2_ENABLE_V_VMLOAD_VMSAVE;
svm->vmcb->save.rip = (u64)l2_guest_code_vmsave;
run_guest(svm->vmcb, svm->vmcb_gpa);
@@ -106,20 +106,20 @@ static void l1_guest_code(struct svm_test_data *svm)
BIT_ULL(INTERCEPT_VMLOAD));
/*
- * Without VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK, the GPA will be
+ * Without SVM_MISC2_ENABLE_V_VMLOAD_VMSAVE, the GPA will be
* interpreted as an L1 GPA, so VMCB0 should be used.
*/
svm->vmcb->save.rip = (u64)l2_guest_code_vmcb0;
- svm->vmcb->control.virt_ext &= ~VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
+ svm->vmcb->control.misc_ctl2 &= ~SVM_MISC2_ENABLE_V_VMLOAD_VMSAVE;
run_guest(svm->vmcb, svm->vmcb_gpa);
GUEST_ASSERT_EQ(svm->vmcb->control.exit_code, SVM_EXIT_VMMCALL);
/*
- * With VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK, the GPA will be interpeted as
+ * With SVM_MISC2_ENABLE_V_VMLOAD_VMSAVE, the GPA will be interpeted as
* an L2 GPA, and translated through the NPT to VMCB1.
*/
svm->vmcb->save.rip = (u64)l2_guest_code_vmcb1;
- svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
+ svm->vmcb->control.misc_ctl2 |= SVM_MISC2_ENABLE_V_VMLOAD_VMSAVE;
run_guest(svm->vmcb, svm->vmcb_gpa);
GUEST_ASSERT_EQ(svm->vmcb->control.exit_code, SVM_EXIT_VMMCALL);
diff --git a/tools/testing/selftests/kvm/x86/svm_lbr_nested_state.c b/tools/testing/selftests/kvm/x86/svm_lbr_nested_state.c
index bf16abb1152e0..ff99438824d3a 100644
--- a/tools/testing/selftests/kvm/x86/svm_lbr_nested_state.c
+++ b/tools/testing/selftests/kvm/x86/svm_lbr_nested_state.c
@@ -69,9 +69,9 @@ static void l1_guest_code(struct svm_test_data *svm, bool nested_lbrv)
&l2_guest_stack[L2_GUEST_STACK_SIZE]);
if (nested_lbrv)
- vmcb->control.virt_ext = LBR_CTL_ENABLE_MASK;
+ vmcb->control.misc_ctl2 = SVM_MISC2_ENABLE_V_LBR;
else
- vmcb->control.virt_ext &= ~LBR_CTL_ENABLE_MASK;
+ vmcb->control.misc_ctl2 &= ~SVM_MISC2_ENABLE_V_LBR;
run_guest(vmcb, svm->vmcb_gpa);
GUEST_ASSERT(svm->vmcb->control.exit_code == SVM_EXIT_VMMCALL);
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 25/31] KVM: nSVM: Cache all used fields from VMCB12
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (23 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 24/31] KVM: SVM: Rename vmcb->virt_ext to vmcb->misc_ctl2 Yosry Ahmed
@ 2026-02-24 22:33 ` Yosry Ahmed
2026-02-24 23:58 ` Yosry Ahmed
2026-02-24 22:34 ` [PATCH v6 26/31] KVM: nSVM: Restrict mapping VMCB12 on nested VMRUN Yosry Ahmed
` (6 subsequent siblings)
31 siblings, 1 reply; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
Currently, most fields used from VMCB12 are cached in
svm->nested.{ctl/save}. This is mainly to avoid TOC-TOU bugs. However,
for the save area, only the fields used in the consistency checks (i.e.
nested_vmcb_check_save()) were being cached. Other fields are read
directly from guest memory in nested_vmcb02_prepare_save().
While probably benign, this still makes it possible for TOC-TOU bugs to
happen. For example, RAX, RSP, and RIP are read twice, once to store in
VMCB02, and once to store in vcpu->arch.regs. It is possible for the
guest to modify the value between both reads, potentially causing nasty
bugs.
Harden against such bugs by caching everything in svm->nested.save.
Cache all the needed fields, and keep all accesses to the VMCB12
strictly in nested_svm_vmrun() for caching and early error injection.
Following changes will further limit the access to the VMCB12 in the
nested VMRUN path.
Introduce vmcb12_is_dirty() to use with the cached control fields
instead of vmcb_is_dirty(), similar to vmcb12_is_intercept().
Opportunistically order the copies in __nested_copy_vmcb_save_to_cache()
by the order in which the fields are defined in struct vmcb_save_area.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 116 ++++++++++++++++++++++----------------
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/svm/svm.h | 27 ++++++++-
3 files changed, 93 insertions(+), 52 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 679ac9f6dfe80..2159f5fbfc314 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -544,19 +544,34 @@ void nested_copy_vmcb_control_to_cache(struct vcpu_svm *svm,
static void __nested_copy_vmcb_save_to_cache(struct vmcb_save_area_cached *to,
struct vmcb_save_area *from)
{
- /*
- * Copy only fields that are validated, as we need them
- * to avoid TOC/TOU races.
- */
+ to->es = from->es;
to->cs = from->cs;
+ to->ss = from->ss;
+ to->ds = from->ds;
+ to->gdtr = from->gdtr;
+ to->idtr = from->idtr;
+
+ to->cpl = from->cpl;
to->efer = from->efer;
- to->cr0 = from->cr0;
- to->cr3 = from->cr3;
to->cr4 = from->cr4;
-
- to->dr6 = from->dr6;
+ to->cr3 = from->cr3;
+ to->cr0 = from->cr0;
to->dr7 = from->dr7;
+ to->dr6 = from->dr6;
+
+ to->rflags = from->rflags;
+ to->rip = from->rip;
+ to->rsp = from->rsp;
+
+ to->s_cet = from->s_cet;
+ to->ssp = from->ssp;
+ to->isst_addr = from->isst_addr;
+
+ to->rax = from->rax;
+ to->cr2 = from->cr2;
+
+ svm_copy_lbrs(to, from);
}
void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
@@ -698,8 +713,10 @@ static bool nested_vmcb12_has_lbrv(struct kvm_vcpu *vcpu)
(to_svm(vcpu)->nested.ctl.misc_ctl2 & SVM_MISC2_ENABLE_V_LBR);
}
-static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12)
+static void nested_vmcb02_prepare_save(struct vcpu_svm *svm)
{
+ struct vmcb_ctrl_area_cached *control = &svm->nested.ctl;
+ struct vmcb_save_area_cached *save = &svm->nested.save;
bool new_vmcb12 = false;
struct vmcb *vmcb01 = svm->vmcb01.ptr;
struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
@@ -715,48 +732,48 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
svm->nested.force_msr_bitmap_recalc = true;
}
- if (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_SEG))) {
- vmcb02->save.es = vmcb12->save.es;
- vmcb02->save.cs = vmcb12->save.cs;
- vmcb02->save.ss = vmcb12->save.ss;
- vmcb02->save.ds = vmcb12->save.ds;
- vmcb02->save.cpl = vmcb12->save.cpl;
+ if (unlikely(new_vmcb12 || vmcb12_is_dirty(control, VMCB_SEG))) {
+ vmcb02->save.es = save->es;
+ vmcb02->save.cs = save->cs;
+ vmcb02->save.ss = save->ss;
+ vmcb02->save.ds = save->ds;
+ vmcb02->save.cpl = save->cpl;
vmcb_mark_dirty(vmcb02, VMCB_SEG);
}
- if (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_DT))) {
- vmcb02->save.gdtr = vmcb12->save.gdtr;
- vmcb02->save.idtr = vmcb12->save.idtr;
+ if (unlikely(new_vmcb12 || vmcb12_is_dirty(control, VMCB_DT))) {
+ vmcb02->save.gdtr = save->gdtr;
+ vmcb02->save.idtr = save->idtr;
vmcb_mark_dirty(vmcb02, VMCB_DT);
}
if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
- (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_CET)))) {
- vmcb02->save.s_cet = vmcb12->save.s_cet;
- vmcb02->save.isst_addr = vmcb12->save.isst_addr;
- vmcb02->save.ssp = vmcb12->save.ssp;
+ (unlikely(new_vmcb12 || vmcb12_is_dirty(control, VMCB_CET)))) {
+ vmcb02->save.s_cet = save->s_cet;
+ vmcb02->save.isst_addr = save->isst_addr;
+ vmcb02->save.ssp = save->ssp;
vmcb_mark_dirty(vmcb02, VMCB_CET);
}
- kvm_set_rflags(vcpu, vmcb12->save.rflags | X86_EFLAGS_FIXED);
+ kvm_set_rflags(vcpu, save->rflags | X86_EFLAGS_FIXED);
svm_set_efer(vcpu, svm->nested.save.efer);
svm_set_cr0(vcpu, svm->nested.save.cr0);
svm_set_cr4(vcpu, svm->nested.save.cr4);
- svm->vcpu.arch.cr2 = vmcb12->save.cr2;
+ svm->vcpu.arch.cr2 = save->cr2;
- kvm_rax_write(vcpu, vmcb12->save.rax);
- kvm_rsp_write(vcpu, vmcb12->save.rsp);
- kvm_rip_write(vcpu, vmcb12->save.rip);
+ kvm_rax_write(vcpu, save->rax);
+ kvm_rsp_write(vcpu, save->rsp);
+ kvm_rip_write(vcpu, save->rip);
/* In case we don't even reach vcpu_run, the fields are not updated */
- vmcb02->save.rax = vmcb12->save.rax;
- vmcb02->save.rsp = vmcb12->save.rsp;
- vmcb02->save.rip = vmcb12->save.rip;
+ vmcb02->save.rax = save->rax;
+ vmcb02->save.rsp = save->rsp;
+ vmcb02->save.rip = save->rip;
- if (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_DR))) {
+ if (unlikely(new_vmcb12 || vmcb12_is_dirty(control, VMCB_DR))) {
vmcb02->save.dr7 = svm->nested.save.dr7 | DR7_FIXED_1;
svm->vcpu.arch.dr6 = svm->nested.save.dr6 | DR6_ACTIVE_LOW;
vmcb_mark_dirty(vmcb02, VMCB_DR);
@@ -767,7 +784,7 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
* Reserved bits of DEBUGCTL are ignored. Be consistent with
* svm_set_msr's definition of reserved bits.
*/
- svm_copy_lbrs(&vmcb02->save, &vmcb12->save);
+ svm_copy_lbrs(&vmcb02->save, save);
vmcb02->save.dbgctl &= ~DEBUGCTL_RESERVED_BITS;
} else {
svm_copy_lbrs(&vmcb02->save, &vmcb01->save);
@@ -983,28 +1000,29 @@ static void nested_svm_copy_common_state(struct vmcb *from_vmcb, struct vmcb *to
to_vmcb->save.spec_ctrl = from_vmcb->save.spec_ctrl;
}
-int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
- struct vmcb *vmcb12, bool from_vmrun)
+int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa, bool from_vmrun)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ struct vmcb_ctrl_area_cached *control = &svm->nested.ctl;
+ struct vmcb_save_area_cached *save = &svm->nested.save;
int ret;
trace_kvm_nested_vmenter(svm->vmcb->save.rip,
vmcb12_gpa,
- vmcb12->save.rip,
- vmcb12->control.int_ctl,
- vmcb12->control.event_inj,
- vmcb12->control.misc_ctl,
- vmcb12->control.nested_cr3,
- vmcb12->save.cr3,
+ save->rip,
+ control->int_ctl,
+ control->event_inj,
+ control->misc_ctl,
+ control->nested_cr3,
+ save->cr3,
KVM_ISA_SVM);
- trace_kvm_nested_intercepts(vmcb12->control.intercepts[INTERCEPT_CR] & 0xffff,
- vmcb12->control.intercepts[INTERCEPT_CR] >> 16,
- vmcb12->control.intercepts[INTERCEPT_EXCEPTION],
- vmcb12->control.intercepts[INTERCEPT_WORD3],
- vmcb12->control.intercepts[INTERCEPT_WORD4],
- vmcb12->control.intercepts[INTERCEPT_WORD5]);
+ trace_kvm_nested_intercepts(control->intercepts[INTERCEPT_CR] & 0xffff,
+ control->intercepts[INTERCEPT_CR] >> 16,
+ control->intercepts[INTERCEPT_EXCEPTION],
+ control->intercepts[INTERCEPT_WORD3],
+ control->intercepts[INTERCEPT_WORD4],
+ control->intercepts[INTERCEPT_WORD5]);
svm->nested.vmcb12_gpa = vmcb12_gpa;
@@ -1039,8 +1057,8 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr);
svm_switch_vmcb(svm, &svm->nested.vmcb02);
- nested_vmcb02_prepare_control(svm, vmcb12->save.rip, vmcb12->save.cs.base);
- nested_vmcb02_prepare_save(svm, vmcb12);
+ nested_vmcb02_prepare_control(svm, save->rip, save->cs.base);
+ nested_vmcb02_prepare_save(svm);
if (!from_vmrun)
kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
@@ -1157,7 +1175,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
svm->nested.nested_run_pending = 1;
- if (enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, true)) {
+ if (enter_svm_guest_mode(vcpu, vmcb12_gpa, true)) {
svm->nested.nested_run_pending = 0;
svm->nmi_l1_to_l2 = false;
svm->soft_int_injected = false;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 94e14badddfa2..19112ec48c0f7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4885,7 +4885,7 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram)
vmcb12 = map.hva;
nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
- ret = enter_svm_guest_mode(vcpu, smram64->svm_guest_vmcb_gpa, vmcb12, false);
+ ret = enter_svm_guest_mode(vcpu, smram64->svm_guest_vmcb_gpa, false);
if (ret)
goto unmap_save;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 304328c33e960..388aaa5d63d29 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -140,13 +140,32 @@ struct kvm_vmcb_info {
};
struct vmcb_save_area_cached {
+ struct vmcb_seg es;
struct vmcb_seg cs;
+ struct vmcb_seg ss;
+ struct vmcb_seg ds;
+ struct vmcb_seg gdtr;
+ struct vmcb_seg idtr;
+ u8 cpl;
u64 efer;
u64 cr4;
u64 cr3;
u64 cr0;
u64 dr7;
u64 dr6;
+ u64 rflags;
+ u64 rip;
+ u64 rsp;
+ u64 s_cet;
+ u64 ssp;
+ u64 isst_addr;
+ u64 rax;
+ u64 cr2;
+ u64 dbgctl;
+ u64 br_from;
+ u64 br_to;
+ u64 last_excp_from;
+ u64 last_excp_to;
};
struct vmcb_ctrl_area_cached {
@@ -421,6 +440,11 @@ static inline bool vmcb_is_dirty(struct vmcb *vmcb, int bit)
return !test_bit(bit, (unsigned long *)&vmcb->control.clean);
}
+static inline bool vmcb12_is_dirty(struct vmcb_ctrl_area_cached *control, int bit)
+{
+ return !test_bit(bit, (unsigned long *)&control->clean);
+}
+
static __always_inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
{
return container_of(vcpu, struct vcpu_svm, vcpu);
@@ -785,8 +809,7 @@ static inline bool nested_exit_on_nmi(struct vcpu_svm *svm)
int __init nested_svm_init_msrpm_merge_offsets(void);
-int enter_svm_guest_mode(struct kvm_vcpu *vcpu,
- u64 vmcb_gpa, struct vmcb *vmcb12, bool from_vmrun);
+int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb_gpa, bool from_vmrun);
void svm_leave_nested(struct kvm_vcpu *vcpu);
void svm_free_nested(struct vcpu_svm *svm);
int svm_allocate_nested(struct vcpu_svm *svm);
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 26/31] KVM: nSVM: Restrict mapping VMCB12 on nested VMRUN
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (24 preceding siblings ...)
2026-02-24 22:33 ` [PATCH v6 25/31] KVM: nSVM: Cache all used fields from VMCB12 Yosry Ahmed
@ 2026-02-24 22:34 ` Yosry Ahmed
2026-02-24 22:34 ` [PATCH v6 27/31] KVM: nSVM: Use PAGE_MASK to drop lower bits of bitmap GPAs from vmcb12 Yosry Ahmed
` (5 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
All accesses to the VMCB12 in the guest memory on nested VMRUN are
limited to nested_svm_vmrun() and nested_svm_vmrun_error_vmexit().
However, the VMCB12 remains mapped throughout nested_svm_vmrun().
Mapping and unmapping around usages is possible, but it becomes easy-ish
to introduce bugs where 'vmcb12' is used after being unmapped.
Move reading the VMCB12 and copying to cache from nested_svm_vmrun()
into a new helper, nested_svm_copy_vmcb12_to_cache(), that maps the
VMCB12, caches the needed fields, and unmaps it. Use
kvm_vcpu_map_readonly() as only reading the VMCB12 is needed.
Similarly, move mapping the VMCB12 on VMRUN failure into
nested_svm_vmrun_error_vmexit(). Inject a triple fault if the mapping
fails, similar to nested_svm_vmexit().
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 49 +++++++++++++++++++++++++++------------
1 file changed, 34 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 2159f5fbfc314..5c8449bc6fa0c 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1102,28 +1102,54 @@ static void __nested_svm_vmexit(struct vcpu_svm *svm)
kvm_queue_exception(vcpu, DB_VECTOR);
}
-static void nested_svm_vmrun_error_vmexit(struct kvm_vcpu *vcpu, struct vmcb *vmcb12)
+static void nested_svm_vmrun_error_vmexit(struct kvm_vcpu *vcpu, u64 vmcb12_gpa)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ struct kvm_host_map map;
+ struct vmcb *vmcb12;
WARN_ON_ONCE(svm->vmcb == svm->nested.vmcb02.ptr);
leave_guest_mode(vcpu);
+ if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcb12_gpa), &map)) {
+ kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+ goto out;
+ }
+
+ vmcb12 = map.hva;
vmcb12->control.exit_code = SVM_EXIT_ERR;
vmcb12->control.exit_info_1 = 0;
vmcb12->control.exit_info_2 = 0;
vmcb12->control.event_inj = 0;
vmcb12->control.event_inj_err = 0;
+ kvm_vcpu_unmap(vcpu, &map);
+out:
__nested_svm_vmexit(svm);
}
+static int nested_svm_copy_vmcb12_to_cache(struct kvm_vcpu *vcpu, u64 vmcb12_gpa)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ struct kvm_host_map map;
+ struct vmcb *vmcb12;
+ int r;
+
+ r = kvm_vcpu_map_readonly(vcpu, gpa_to_gfn(vmcb12_gpa), &map);
+ if (r)
+ return r;
+
+ vmcb12 = map.hva;
+ nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
+ nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
+ kvm_vcpu_unmap(vcpu, &map);
+ return 0;
+}
+
int nested_svm_vmrun(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
int ret;
- struct vmcb *vmcb12;
- struct kvm_host_map map;
u64 vmcb12_gpa;
struct vmcb *vmcb01 = svm->vmcb01.ptr;
@@ -1144,22 +1170,17 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
return ret;
}
+ if (WARN_ON_ONCE(!svm->nested.initialized))
+ return -EINVAL;
+
vmcb12_gpa = svm->vmcb->save.rax;
- if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcb12_gpa), &map)) {
+ if (nested_svm_copy_vmcb12_to_cache(vcpu, vmcb12_gpa)) {
kvm_inject_gp(vcpu, 0);
return 1;
}
ret = kvm_skip_emulated_instruction(vcpu);
- vmcb12 = map.hva;
-
- if (WARN_ON_ONCE(!svm->nested.initialized))
- return -EINVAL;
-
- nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
- nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
-
/*
* Since vmcb01 is not in use, we can use it to store some of the L1
* state.
@@ -1180,11 +1201,9 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
svm->nmi_l1_to_l2 = false;
svm->soft_int_injected = false;
- nested_svm_vmrun_error_vmexit(vcpu, vmcb12);
+ nested_svm_vmrun_error_vmexit(vcpu, vmcb12_gpa);
}
- kvm_vcpu_unmap(vcpu, &map);
-
return ret;
}
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 27/31] KVM: nSVM: Use PAGE_MASK to drop lower bits of bitmap GPAs from vmcb12
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (25 preceding siblings ...)
2026-02-24 22:34 ` [PATCH v6 26/31] KVM: nSVM: Restrict mapping VMCB12 on nested VMRUN Yosry Ahmed
@ 2026-02-24 22:34 ` Yosry Ahmed
2026-02-24 22:34 ` [PATCH v6 28/31] KVM: nSVM: Sanitize TLB_CONTROL field when copying " Yosry Ahmed
` (4 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
Use PAGE_MASK to drop the lower bits from IOPM_BASE_PA and MSRPM_BASE_PA
while copying them instead of dropping the bits afterward with a
hardcoded mask.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 5c8449bc6fa0c..28a8bfc632ef5 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -499,8 +499,8 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NPT))
to->misc_ctl &= ~SVM_MISC_ENABLE_NP;
- to->iopm_base_pa = from->iopm_base_pa;
- to->msrpm_base_pa = from->msrpm_base_pa;
+ to->iopm_base_pa = from->iopm_base_pa & PAGE_MASK;
+ to->msrpm_base_pa = from->msrpm_base_pa & PAGE_MASK;
to->tsc_offset = from->tsc_offset;
to->tlb_ctl = from->tlb_ctl;
to->erap_ctl = from->erap_ctl;
@@ -522,8 +522,6 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
/* Copy asid here because nested_vmcb_check_controls() will check it */
to->asid = from->asid;
- to->msrpm_base_pa &= ~0x0fffULL;
- to->iopm_base_pa &= ~0x0fffULL;
#ifdef CONFIG_KVM_HYPERV
/* Hyper-V extensions (Enlightened VMCB) */
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 28/31] KVM: nSVM: Sanitize TLB_CONTROL field when copying from vmcb12
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (26 preceding siblings ...)
2026-02-24 22:34 ` [PATCH v6 27/31] KVM: nSVM: Use PAGE_MASK to drop lower bits of bitmap GPAs from vmcb12 Yosry Ahmed
@ 2026-02-24 22:34 ` Yosry Ahmed
2026-02-24 22:34 ` [PATCH v6 29/31] KVM: nSVM: Sanitize INT/EVENTINJ fields " Yosry Ahmed
` (3 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
The APM defines possible values for TLB_CONTROL as 0, 1, 3, and 7 -- all
of which are always allowed for KVM guests as KVM always supports
X86_FEATURE_FLUSHBYASID. Only copy bits 0 to 2 from vmcb12's
TLB_CONTROL, such that no unhandled or reserved bits end up in vmcb02.
Note that TLB_CONTROL in vmcb12 is currently ignored by KVM, as it nukes
the TLB on nested transitions anyway (see
nested_svm_transition_tlb_flush()). However, such sanitization will be
needed once the TODOs there are addressed, and it's minimal churn to add
it now.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/include/asm/svm.h | 2 ++
arch/x86/kvm/svm/nested.c | 2 +-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index c169256c415fb..16cf4f435aebd 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -182,6 +182,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define TLB_CONTROL_FLUSH_ASID 3
#define TLB_CONTROL_FLUSH_ASID_LOCAL 7
+#define TLB_CONTROL_MASK GENMASK(2, 0)
+
#define ERAP_CONTROL_ALLOW_LARGER_RAP BIT(0)
#define ERAP_CONTROL_CLEAR_RAP BIT(1)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 28a8bfc632ef5..d7c353ac42d88 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -502,7 +502,7 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
to->iopm_base_pa = from->iopm_base_pa & PAGE_MASK;
to->msrpm_base_pa = from->msrpm_base_pa & PAGE_MASK;
to->tsc_offset = from->tsc_offset;
- to->tlb_ctl = from->tlb_ctl;
+ to->tlb_ctl = from->tlb_ctl & TLB_CONTROL_MASK;
to->erap_ctl = from->erap_ctl;
to->int_ctl = from->int_ctl;
to->int_vector = from->int_vector;
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 29/31] KVM: nSVM: Sanitize INT/EVENTINJ fields when copying from vmcb12
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (27 preceding siblings ...)
2026-02-24 22:34 ` [PATCH v6 28/31] KVM: nSVM: Sanitize TLB_CONTROL field when copying " Yosry Ahmed
@ 2026-02-24 22:34 ` Yosry Ahmed
2026-02-24 22:34 ` [PATCH v6 30/31] KVM: nSVM: Only copy SVM_MISC_ENABLE_NP from VMCB01's misc_ctl Yosry Ahmed
` (2 subsequent siblings)
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:34 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, Jim Mattson
Make sure all fields used from vmcb12 in creating the vmcb02 are
sanitized, such that no unhandled or reserved bits end up in the vmcb02.
The following control fields are read from vmcb12 and have bits that are
either reserved or not handled/advertised by KVM: tlb_ctl, int_ctl,
int_state, int_vector, event_inj, misc_ctl, and misc_ctl2.
The following fields do not require any extra sanitizing:
- tlb_ctl: already being sanitized.
- int_ctl: bits from vmcb12 are copied bit-by-bit as needed.
- misc_ctl: only used in consistency checks (particularly NP_ENABLE).
- misc_ctl2: bits from vmcb12 are copied bit-by-bit as needed.
For the remaining fields (int_vector, int_state, and event_inj), make
sure only defined bits are copied from L1's vmcb12 into KVM'cache by
defining appropriate masks where needed.
Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/include/asm/svm.h | 5 +++++
arch/x86/kvm/svm/nested.c | 8 ++++----
2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 16cf4f435aebd..bcfeb5e7c0edf 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -224,6 +224,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define X2APIC_MODE_SHIFT 30
#define X2APIC_MODE_MASK (1 << X2APIC_MODE_SHIFT)
+#define SVM_INT_VECTOR_MASK GENMASK(7, 0)
+
#define SVM_INTERRUPT_SHADOW_MASK BIT_ULL(0)
#define SVM_GUEST_INTERRUPT_MASK BIT_ULL(1)
@@ -637,6 +639,9 @@ static inline void __unused_size_checks(void)
#define SVM_EVTINJ_VALID (1 << 31)
#define SVM_EVTINJ_VALID_ERR (1 << 11)
+#define SVM_EVTINJ_RESERVED_BITS ~(SVM_EVTINJ_VEC_MASK | SVM_EVTINJ_TYPE_MASK | \
+ SVM_EVTINJ_VALID_ERR | SVM_EVTINJ_VALID)
+
#define SVM_EXITINTINFO_VEC_MASK SVM_EVTINJ_VEC_MASK
#define SVM_EXITINTINFO_TYPE_MASK SVM_EVTINJ_TYPE_MASK
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index d7c353ac42d88..6dd31ed2bed5e 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -505,18 +505,18 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
to->tlb_ctl = from->tlb_ctl & TLB_CONTROL_MASK;
to->erap_ctl = from->erap_ctl;
to->int_ctl = from->int_ctl;
- to->int_vector = from->int_vector;
- to->int_state = from->int_state;
+ to->int_vector = from->int_vector & SVM_INT_VECTOR_MASK;
+ to->int_state = from->int_state & SVM_INTERRUPT_SHADOW_MASK;
to->exit_code = from->exit_code;
to->exit_info_1 = from->exit_info_1;
to->exit_info_2 = from->exit_info_2;
to->exit_int_info = from->exit_int_info;
to->exit_int_info_err = from->exit_int_info_err;
- to->event_inj = from->event_inj;
+ to->event_inj = from->event_inj & ~SVM_EVTINJ_RESERVED_BITS;
to->event_inj_err = from->event_inj_err;
to->next_rip = from->next_rip;
to->nested_cr3 = from->nested_cr3;
- to->misc_ctl2 = from->misc_ctl2;
+ to->misc_ctl2 = from->misc_ctl2;
to->pause_filter_count = from->pause_filter_count;
to->pause_filter_thresh = from->pause_filter_thresh;
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 30/31] KVM: nSVM: Only copy SVM_MISC_ENABLE_NP from VMCB01's misc_ctl
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (28 preceding siblings ...)
2026-02-24 22:34 ` [PATCH v6 29/31] KVM: nSVM: Sanitize INT/EVENTINJ fields " Yosry Ahmed
@ 2026-02-24 22:34 ` Yosry Ahmed
2026-02-24 22:34 ` [PATCH v6 31/31] KVM: selftest: Add a selftest for VMRUN/#VMEXIT with unmappable vmcb12 Yosry Ahmed
2026-02-24 22:37 ` [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:34 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, Jim Mattson
The 'misc_ctl' field in VMCB02 is taken as-is from VMCB01. However, the
only bit that needs to copied is SVM_MISC_ENABLE_NP, as all other known
bits in misc_ctl are related to SEV guests, and KVM doesn't support
nested virtualization for SEV guests.
Only copy SVM_MISC_ENABLE_NP to harden against future bugs if/when other
bits are set for L1 but should not be set for L2.
Opportunistically add a comment explaining why SVM_MISC_ENABLE_NP is
taken from VMCB01 and not VMCB02.
Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 6dd31ed2bed5e..acc213e675ac7 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -852,8 +852,16 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
V_NMI_BLOCKING_MASK);
}
- /* Copied from vmcb01. msrpm_base can be overwritten later. */
- vmcb02->control.misc_ctl = vmcb01->control.misc_ctl;
+ /*
+ * Copied from vmcb01. msrpm_base can be overwritten later.
+ *
+ * SVM_MISC_ENABLE_NP in vmcb12 is only used for consistency checks. If
+ * L1 enables NPTs, KVM shadows L1's NPTs and uses those to run L2. If
+ * L1 disables NPT, KVM runs L2 with the same NPTs used to run L1. For
+ * the latter, L1 runs L2 with shadow page tables that translate L2 GVAs
+ * to L1 GPAs, so the same NPTs can be used for L1 and L2.
+ */
+ vmcb02->control.misc_ctl = vmcb01->control.misc_ctl & SVM_MISC_ENABLE_NP;
vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
vmcb02->control.msrpm_base_pa = vmcb01->control.msrpm_base_pa;
vmcb_mark_dirty(vmcb02, VMCB_PERM_MAP);
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v6 31/31] KVM: selftest: Add a selftest for VMRUN/#VMEXIT with unmappable vmcb12
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (29 preceding siblings ...)
2026-02-24 22:34 ` [PATCH v6 30/31] KVM: nSVM: Only copy SVM_MISC_ENABLE_NP from VMCB01's misc_ctl Yosry Ahmed
@ 2026-02-24 22:34 ` Yosry Ahmed
2026-03-03 0:13 ` Yosry Ahmed
2026-02-24 22:37 ` [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
31 siblings, 1 reply; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
Add a test that verifies that KVM correctly injects a #GP for nested
VMRUN and a shutdown for nested #VMEXIT, if the GPA of vmcb12 cannot be
mapped.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../kvm/x86/svm_nested_invalid_vmcb12_gpa.c | 95 +++++++++++++++++++
2 files changed, 96 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86/svm_nested_invalid_vmcb12_gpa.c
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 36b48e766e499..f12e7c17d379d 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -110,6 +110,7 @@ TEST_GEN_PROGS_x86 += x86/state_test
TEST_GEN_PROGS_x86 += x86/vmx_preemption_timer_test
TEST_GEN_PROGS_x86 += x86/svm_vmcall_test
TEST_GEN_PROGS_x86 += x86/svm_int_ctl_test
+TEST_GEN_PROGS_x86 += x86/svm_nested_invalid_vmcb12_gpa
TEST_GEN_PROGS_x86 += x86/svm_nested_shutdown_test
TEST_GEN_PROGS_x86 += x86/svm_nested_soft_inject_test
TEST_GEN_PROGS_x86 += x86/svm_lbr_nested_state
diff --git a/tools/testing/selftests/kvm/x86/svm_nested_invalid_vmcb12_gpa.c b/tools/testing/selftests/kvm/x86/svm_nested_invalid_vmcb12_gpa.c
new file mode 100644
index 0000000000000..5b0dec4240937
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/svm_nested_invalid_vmcb12_gpa.c
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2026, Google LLC.
+ */
+#include "kvm_util.h"
+#include "vmx.h"
+#include "svm_util.h"
+#include "kselftest.h"
+
+
+#define L2_GUEST_STACK_SIZE 64
+
+#define SYNC_GP 101
+#define SYNC_L2_STARTED 102
+
+extern char invalid_vmrun;
+
+static void guest_gp_handler(struct ex_regs *regs)
+{
+ GUEST_SYNC(SYNC_GP);
+ regs->rip = (uintptr_t)&invalid_vmrun;
+}
+
+static void l2_guest_code(void)
+{
+ GUEST_SYNC(SYNC_L2_STARTED);
+ vmcall();
+}
+
+static void l1_guest_code(struct svm_test_data *svm, u64 invalid_vmcb12_gpa)
+{
+ unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+
+ generic_svm_setup(svm, l2_guest_code,
+ &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+ run_guest(svm->vmcb, invalid_vmcb12_gpa); /* #GP */
+
+ /* GP handler should jump here */
+ asm volatile ("invalid_vmrun:");
+ run_guest(svm->vmcb, svm->vmcb_gpa);
+ GUEST_ASSERT(svm->vmcb->control.exit_code == SVM_EXIT_VMMCALL);
+ GUEST_DONE();
+}
+
+int main(int argc, char *argv[])
+{
+ struct kvm_x86_state *state;
+ vm_vaddr_t nested_gva = 0;
+ struct kvm_vcpu *vcpu;
+ uint32_t maxphyaddr;
+ u64 max_legal_gpa;
+ struct kvm_vm *vm;
+ struct ucall uc;
+
+ TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SVM));
+
+ vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
+ vm_install_exception_handler(vcpu->vm, GP_VECTOR, guest_gp_handler);
+
+ /*
+ * Find the max legal GPA that is not backed by a memslot (i.e. cannot
+ * be mapped by KVM).
+ */
+ maxphyaddr = kvm_cpuid_property(vcpu->cpuid, X86_PROPERTY_MAX_PHY_ADDR);
+ max_legal_gpa = BIT_ULL(maxphyaddr) - PAGE_SIZE;
+ vcpu_alloc_svm(vm, &nested_gva);
+ vcpu_args_set(vcpu, 2, nested_gva, max_legal_gpa);
+
+ /* VMRUN with max_legal_gpa, KVM injects a #GP */
+ vcpu_run(vcpu);
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+ TEST_ASSERT_EQ(get_ucall(vcpu, &uc), UCALL_SYNC);
+ TEST_ASSERT_EQ(uc.args[1], SYNC_GP);
+
+ /*
+ * Enter L2 (with a legit vmcb12 GPA), then overwrite vmcb12 GPA with
+ * max_legal_gpa. KVM will fail to map vmcb12 on nested VM-Exit and
+ * cause a shutdown.
+ */
+ vcpu_run(vcpu);
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+ TEST_ASSERT_EQ(get_ucall(vcpu, &uc), UCALL_SYNC);
+ TEST_ASSERT_EQ(uc.args[1], SYNC_L2_STARTED);
+
+ state = vcpu_save_state(vcpu);
+ state->nested.hdr.svm.vmcb_pa = max_legal_gpa;
+ vcpu_load_state(vcpu, state);
+ vcpu_run(vcpu);
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_SHUTDOWN);
+
+ kvm_x86_state_cleanup(state);
+ kvm_vm_free(vm);
+ return 0;
+}
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 00/31] Nested SVM fixes, cleanups, and hardening
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (30 preceding siblings ...)
2026-02-24 22:34 ` [PATCH v6 31/31] KVM: selftest: Add a selftest for VMRUN/#VMEXIT with unmappable vmcb12 Yosry Ahmed
@ 2026-02-24 22:37 ` Yosry Ahmed
31 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 22:37 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
Apparently I messed up the version in the subject of the cover letter
(but not the patches). This is v6.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v6 25/31] KVM: nSVM: Cache all used fields from VMCB12
2026-02-24 22:33 ` [PATCH v6 25/31] KVM: nSVM: Cache all used fields from VMCB12 Yosry Ahmed
@ 2026-02-24 23:58 ` Yosry Ahmed
0 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-24 23:58 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
> @@ -715,48 +732,48 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
> svm->nested.force_msr_bitmap_recalc = true;
> }
>
> - if (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_SEG))) {
> - vmcb02->save.es = vmcb12->save.es;
> - vmcb02->save.cs = vmcb12->save.cs;
> - vmcb02->save.ss = vmcb12->save.ss;
> - vmcb02->save.ds = vmcb12->save.ds;
> - vmcb02->save.cpl = vmcb12->save.cpl;
> + if (unlikely(new_vmcb12 || vmcb12_is_dirty(control, VMCB_SEG))) {
Internal AI review caught a bug here. We only copy clean bits in
__nested_copy_vmcb_control_to_cache() if Hyper-V extensions are used,
so this patch will treat everything as dirty. Not a correctness bug,
but a perf one. Probably need the following:
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 2159f5fbfc314..3c9643c03b1a4 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -524,11 +524,11 @@ void __nested_copy_vmcb_control_to_cache(struct
kvm_vcpu *vcpu,
to->asid = from->asid;
to->msrpm_base_pa &= ~0x0fffULL;
to->iopm_base_pa &= ~0x0fffULL;
+ to->clean = from->clean;
#ifdef CONFIG_KVM_HYPERV
/* Hyper-V extensions (Enlightened VMCB) */
if (kvm_hv_hypercall_enabled(vcpu)) {
- to->clean = from->clean;
memcpy(&to->hv_enlightenments, &from->hv_enlightenments,
sizeof(to->hv_enlightenments));
}
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v6 16/31] KVM: nSVM: Unify handling of VMRUN failures with proper cleanup
2026-02-24 22:33 ` [PATCH v6 16/31] KVM: nSVM: Unify handling of VMRUN failures with proper cleanup Yosry Ahmed
@ 2026-02-25 22:44 ` Yosry Ahmed
2026-02-28 0:07 ` Sean Christopherson
1 sibling, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-25 22:44 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
> @@ -939,22 +939,19 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
> vmcb12->control.intercepts[INTERCEPT_WORD4],
> vmcb12->control.intercepts[INTERCEPT_WORD5]);
>
> -
> svm->nested.vmcb12_gpa = vmcb12_gpa;
>
> WARN_ON(svm->vmcb == svm->nested.vmcb02.ptr);
>
> enter_guest_mode(vcpu);
>
> + if (!nested_vmcb_check_save(vcpu, &svm->nested.save) ||
> + !nested_vmcb_check_controls(vcpu, &svm->nested.ctl))
> + return -EINVAL;
> +
> if (nested_npt_enabled(svm))
> nested_svm_init_mmu_context(vcpu);
>
> - nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr);
> -
> - svm_switch_vmcb(svm, &svm->nested.vmcb02);
> - nested_vmcb02_prepare_control(svm, vmcb12->save.rip, vmcb12->save.cs.base);
> - nested_vmcb02_prepare_save(svm, vmcb12);
> -
> ret = nested_svm_load_cr3(&svm->vcpu, svm->nested.save.cr3,
> nested_npt_enabled(svm), from_vmrun);
> if (ret)
I think the above is wrong (as pointed out by an internal AI bot).
nested_vmcb02_prepare_save() also initializes architectural state to
L2's, moving it after nested_svm_load_cr3() means that
nested_svm_load_cr3() will use L1's architectural state (e.g. in
is_pae_paging(), but more importantly in kvm_init_mmu()).
The only clean-ish solution I can think of is to refactor loading L2's
architectural state out of nested_vmcb02_prepare_save(), and keep it
before nested_svm_load_cr3(). Then, also refactor restoring L1's
architectural state out of nested_svm_vmexit() and also do it in
nested_svm_vmrun_error_vmexit().
We can probably move it to __nested_svm_vmexit(), assuming nothing
else in nested_svm_vmexit() relies on it.
That being said, I am open to suggestions for better options.
> @@ -966,6 +963,17 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
> return ret;
> }
>
> + /*
> + * Any VMRUN failure needs to happen before this point, such that the
> + * nested #VMEXIT is injected properly by nested_svm_vmrun_error_vmexit().
> + */
> +
> + nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr);
> +
> + svm_switch_vmcb(svm, &svm->nested.vmcb02);
> + nested_vmcb02_prepare_control(svm, vmcb12->save.rip, vmcb12->save.cs.base);
> + nested_vmcb02_prepare_save(svm, vmcb12);
> +
Another problem here, hidden above the context lines is a call to
nested_svm_merge_msrpm(), which updates msrpm_base_pa in the active
VMCB, which should be vmcb02, but will be vmcb01 because we moved the
VMCB switch below it.
I think the easiest thing to do here is make nested_svm_merge_msrpm()
explicitly update msrpm_base_pa in vmcb02, I think this is probably
better anyway.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v6 16/31] KVM: nSVM: Unify handling of VMRUN failures with proper cleanup
2026-02-24 22:33 ` [PATCH v6 16/31] KVM: nSVM: Unify handling of VMRUN failures with proper cleanup Yosry Ahmed
2026-02-25 22:44 ` Yosry Ahmed
@ 2026-02-28 0:07 ` Sean Christopherson
2026-02-28 0:46 ` Yosry Ahmed
1 sibling, 1 reply; 44+ messages in thread
From: Sean Christopherson @ 2026-02-28 0:07 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
On Tue, Feb 24, 2026, Yosry Ahmed wrote:
> There are currently two possible causes of VMRUN failures emulated by
> KVM:
>
> 1) Consistency checks failures. In this case, KVM updates the exit code
> in the mapped VMCB12 and exits early in nested_svm_vmrun(). This
> causes a few problems:
>
> A) KVM does not clear the GIF if the early consistency checks fail
> (because nested_svm_vmexit() is not called). Nothing requires
> GIF=0 before a VMRUN, from the APM:
>
> It is assumed that VMM software cleared GIF some time before
> executing the VMRUN instruction, to ensure an atomic state
> switch.
>
> So an early #VMEXIT from early consistency checks could leave the
> GIF set.
>
> B) svm_leave_smm() is missing consistency checks on the newly loaded
> guest state, because the checks aren't performed by
> enter_svm_guest_mode().
This is flat out wrong. RSM isn't missing any consistency checks that are
provided by nested_vmcb_check_save().
if (CC(!(save->efer & EFER_SVME))) <=== irrelevant given KVM's implementation
return false;
if (CC((save->cr0 & X86_CR0_CD) == 0 && (save->cr0 & X86_CR0_NW)) || <== kvm_set_cr0() in rsm_enter_protected_mode()
CC(save->cr0 & ~0xffffffffULL))
return false;
if (CC(!kvm_dr6_valid(save->dr6)) || CC(!kvm_dr7_valid(save->dr7))) <== kvm_set_dr() in rsm_load_state_{32,64}
return false;
/*
* These checks are also performed by KVM_SET_SREGS,
* except that EFER.LMA is not checked by SVM against
* CR0.PG && EFER.LME.
*/
if ((save->efer & EFER_LME) && (save->cr0 & X86_CR0_PG)) {
if (CC(!(save->cr4 & X86_CR4_PAE)) || <== kvm_set_cr4() in rsm_enter_protected_mode()
CC(!(save->cr0 & X86_CR0_PE)) || <== kvm_set_cr0() in rsm_enter_protected_mode()
CC(!kvm_vcpu_is_legal_cr3(vcpu, save->cr3))) <== kvm_set_cr3() in rsm_enter_protected_mode()
return false;
}
/* Note, SVM doesn't have any additional restrictions on CR4. */
if (CC(!__kvm_is_valid_cr4(vcpu, save->cr4))) <== kvm_set_cr4() in rsm_enter_protected_mode()
return false;
if (CC(!kvm_valid_efer(vcpu, save->efer))) <== __kvm_emulate_msr_write() in rsm_load_state_64()
return false;
Even if RSM were missing checks on the L2 state being loaded, I'm not willing to
take on _any_ complexity in nested VMRUN to make RSM suck a little less. KVM's
L2 => SMM => RSM => L2 is fundamentally broken. Anyone that argues otherwise is
ignoring architecturally defined behavior in the SDM and APM.
If someone wants to actually put in the effort to properly emulating SMI => RSM
from L2, then I'd be happy to take on some complexity, but even then it's not at
all clear that it would be necessary.
> 2) Failure to load L2's CR3 or merge the MSR bitmaps. In this case, a
> fully-fledged #VMEXIT injection is performed as VMCB02 is already
> prepared.
>
> Arguably all VMRUN failures should be handled before the VMCB02 is
> prepared, but with proper cleanup (e.g. clear the GIF).
Eh, so long as KVM cleans up after itself, I don't see anything wrong with
preparing some of vmcb02.
So after staring at this for some time, us having gone through multiple attempts
to get things right, and this being tagged for stable@, unless I'm missing some
massive simplification this provides down the road, I am strongly against refactoring
this code, and 100% against reworking things to "fix" SMM.
And so for the stable@ patches, I'm also opposed to all of these:
KVM: nSVM: Refactor minimal #VMEXIT handling out of nested_svm_vmexit()
KVM: nSVM: Call nested_svm_init_mmu_context() before switching to VMCB02
KVM: nSVM: Call nested_svm_merge_msrpm() from enter_svm_guest_mode()
KVM: nSVM: Make nested_svm_merge_msrpm() return an errno
KVM: nSVM: Call enter_guest_mode() before switching to VMCB02
KVM: nSVM: Drop nested_vmcb_check_{save/control}() wrappers
unless they're *needed* by some later commit (I didn't look super closely).
For stable@, just fix the GIF case and move on.
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index d734cd5eef5e..d9790e37d4e8 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1036,6 +1036,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
vmcb12->control.exit_code = SVM_EXIT_ERR;
vmcb12->control.exit_info_1 = 0;
vmcb12->control.exit_info_2 = 0;
+ svm_set_gif(svm, false);
goto out;
}
Sorry for not catching this earlier, I didn't actually read the changelog until
this version. /facepalm
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v6 16/31] KVM: nSVM: Unify handling of VMRUN failures with proper cleanup
2026-02-28 0:07 ` Sean Christopherson
@ 2026-02-28 0:46 ` Yosry Ahmed
2026-02-28 1:05 ` Sean Christopherson
0 siblings, 1 reply; 44+ messages in thread
From: Yosry Ahmed @ 2026-02-28 0:46 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
On Fri, Feb 27, 2026 at 4:07 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Feb 24, 2026, Yosry Ahmed wrote:
> > There are currently two possible causes of VMRUN failures emulated by
> > KVM:
> >
> > 1) Consistency checks failures. In this case, KVM updates the exit code
> > in the mapped VMCB12 and exits early in nested_svm_vmrun(). This
> > causes a few problems:
> >
> > A) KVM does not clear the GIF if the early consistency checks fail
> > (because nested_svm_vmexit() is not called). Nothing requires
> > GIF=0 before a VMRUN, from the APM:
> >
> > It is assumed that VMM software cleared GIF some time before
> > executing the VMRUN instruction, to ensure an atomic state
> > switch.
> >
> > So an early #VMEXIT from early consistency checks could leave the
> > GIF set.
> >
> > B) svm_leave_smm() is missing consistency checks on the newly loaded
> > guest state, because the checks aren't performed by
> > enter_svm_guest_mode().
>
> This is flat out wrong. RSM isn't missing any consistency checks that are
> provided by nested_vmcb_check_save().
>
> if (CC(!(save->efer & EFER_SVME))) <=== irrelevant given KVM's implementation
> return false;
>
> if (CC((save->cr0 & X86_CR0_CD) == 0 && (save->cr0 & X86_CR0_NW)) || <== kvm_set_cr0() in rsm_enter_protected_mode()
> CC(save->cr0 & ~0xffffffffULL))
> return false;
>
> if (CC(!kvm_dr6_valid(save->dr6)) || CC(!kvm_dr7_valid(save->dr7))) <== kvm_set_dr() in rsm_load_state_{32,64}
> return false;
>
> /*
> * These checks are also performed by KVM_SET_SREGS,
> * except that EFER.LMA is not checked by SVM against
> * CR0.PG && EFER.LME.
> */
> if ((save->efer & EFER_LME) && (save->cr0 & X86_CR0_PG)) {
> if (CC(!(save->cr4 & X86_CR4_PAE)) || <== kvm_set_cr4() in rsm_enter_protected_mode()
> CC(!(save->cr0 & X86_CR0_PE)) || <== kvm_set_cr0() in rsm_enter_protected_mode()
> CC(!kvm_vcpu_is_legal_cr3(vcpu, save->cr3))) <== kvm_set_cr3() in rsm_enter_protected_mode()
> return false;
> }
>
> /* Note, SVM doesn't have any additional restrictions on CR4. */
> if (CC(!__kvm_is_valid_cr4(vcpu, save->cr4))) <== kvm_set_cr4() in rsm_enter_protected_mode()
> return false;
>
> if (CC(!kvm_valid_efer(vcpu, save->efer))) <== __kvm_emulate_msr_write() in rsm_load_state_64()
> return false;
>
> Even if RSM were missing checks on the L2 state being loaded, I'm not willing to
> take on _any_ complexity in nested VMRUN to make RSM suck a little less. KVM's
> L2 => SMM => RSM => L2 is fundamentally broken. Anyone that argues otherwise is
> ignoring architecturally defined behavior in the SDM and APM.
>
> If someone wants to actually put in the effort to properly emulating SMI => RSM
> from L2, then I'd be happy to take on some complexity, but even then it's not at
> all clear that it would be necessary.
>
> > 2) Failure to load L2's CR3 or merge the MSR bitmaps. In this case, a
> > fully-fledged #VMEXIT injection is performed as VMCB02 is already
> > prepared.
> >
> > Arguably all VMRUN failures should be handled before the VMCB02 is
> > prepared, but with proper cleanup (e.g. clear the GIF).
>
> Eh, so long as KVM cleans up after itself, I don't see anything wrong with
> preparing some of vmcb02.
>
> So after staring at this for some time, us having gone through multiple attempts
> to get things right, and this being tagged for stable@, unless I'm missing some
> massive simplification this provides down the road, I am strongly against refactoring
> this code, and 100% against reworking things to "fix" SMM.
For context, this patch (and others you quoted below) were a direct
result of this discussion in v2:
https://lore.kernel.org/kvm/aThIQzni6fC1qdgj@google.com/. I didn't
look too closely into the SMM bug tbh I just copy/pasted that verbatim
into the changelog.
As for refactoring the code, I didn't really do it for SMM, but I
think the code is generally cleaner with the single VMRUN failure
path. That being said..
> And so for the stable@ patches, I'm also opposed to all of these:
>
> KVM: nSVM: Refactor minimal #VMEXIT handling out of nested_svm_vmexit()
> KVM: nSVM: Call nested_svm_init_mmu_context() before switching to VMCB02
> KVM: nSVM: Call nested_svm_merge_msrpm() from enter_svm_guest_mode()
> KVM: nSVM: Make nested_svm_merge_msrpm() return an errno
> KVM: nSVM: Call enter_guest_mode() before switching to VMCB02
> KVM: nSVM: Drop nested_vmcb_check_{save/control}() wrappers
>
> unless they're *needed* by some later commit (I didn't look super closely).
>
> For stable@, just fix the GIF case and move on.
.. I am not sure if you mean dropping them completely, or dropping
them from stable@.
I am fine with dropping the stable@ tag from everything from this
point onward, or re-ordering the patches to keep it for the missing
consistency checks.
If you mean drop them completely, it's a bit of a shame because I
think the code ends up looking much better, but I also understand
given all the back-and-forth, and the new problem I reported recently
that will need further refactoring to address (see my other reply to
the same patch).
Let me know how you want to proceed: drop the patches entirely and
just fix GIF, or fix GIF first for stable, and keep the refactoring
for non-stable.
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index d734cd5eef5e..d9790e37d4e8 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -1036,6 +1036,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
> vmcb12->control.exit_code = SVM_EXIT_ERR;
> vmcb12->control.exit_info_1 = 0;
> vmcb12->control.exit_info_2 = 0;
> + svm_set_gif(svm, false);
> goto out;
> }
>
> Sorry for not catching this earlier, I didn't actually read the changelog until
> this version. /facepalm
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v6 16/31] KVM: nSVM: Unify handling of VMRUN failures with proper cleanup
2026-02-28 0:46 ` Yosry Ahmed
@ 2026-02-28 1:05 ` Sean Christopherson
2026-03-02 16:02 ` Yosry Ahmed
0 siblings, 1 reply; 44+ messages in thread
From: Sean Christopherson @ 2026-02-28 1:05 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
On Fri, Feb 27, 2026, Yosry Ahmed wrote:
> On Fri, Feb 27, 2026 at 4:07 PM Sean Christopherson <seanjc@google.com> wrote:
> > So after staring at this for some time, us having gone through multiple attempts
> > to get things right, and this being tagged for stable@, unless I'm missing some
> > massive simplification this provides down the road, I am strongly against refactoring
> > this code, and 100% against reworking things to "fix" SMM.
>
> For context, this patch (and others you quoted below) were a direct
> result of this discussion in v2:
> https://lore.kernel.org/kvm/aThIQzni6fC1qdgj@google.com/. I didn't
> look too closely into the SMM bug tbh I just copy/pasted that verbatim
> into the changelog.
Well fudge. I'm sorry. I obviously got a bit hasty with the whole svm_leave_smm()
thing. *sigh*
> As for refactoring the code, I didn't really do it for SMM, but I
> think the code is generally cleaner with the single VMRUN failure
> path.
Except for the minor detail of being wrong :-)
And while I agree it's probably "cleaner", I think it's actually harder for
readers to understand, especially for readers that are familiar with nVMX. E.g.
having a separate #VMEXIT path for consistency checks can actually be helpful,
because it highlights things that happen on #VMEXIT even when KVM hasn't started
loading L2 state.
> That being said..
>
> > And so for the stable@ patches, I'm also opposed to all of these:
> >
> > KVM: nSVM: Refactor minimal #VMEXIT handling out of nested_svm_vmexit()
> > KVM: nSVM: Call nested_svm_init_mmu_context() before switching to VMCB02
> > KVM: nSVM: Call nested_svm_merge_msrpm() from enter_svm_guest_mode()
> > KVM: nSVM: Make nested_svm_merge_msrpm() return an errno
> > KVM: nSVM: Call enter_guest_mode() before switching to VMCB02
> > KVM: nSVM: Drop nested_vmcb_check_{save/control}() wrappers
> >
> > unless they're *needed* by some later commit (I didn't look super closely).
> >
> > For stable@, just fix the GIF case and move on.
>
> .. I am not sure if you mean dropping them completely, or dropping
> them from stable@.
My preference is to completely drop these:
KVM: nSVM: Unify handling of VMRUN failures with proper cleanup
KVM: nSVM: Refactor minimal #VMEXIT handling out of nested_svm_vmexit()
KVM: nSVM: Call nested_svm_init_mmu_context() before switching to VMCB02
KVM: nSVM: Call nested_svm_merge_msrpm() from enter_svm_guest_mode()
KVM: nSVM: Call enter_guest_mode() before switching to VMCB02
> I am fine with dropping the stable@ tag from everything from this
> point onward, or re-ordering the patches to keep it for the missing
> consistency checks.
And then moving these to the end of the series (or at least, beyond the stable@
patches):
KVM: nSVM: Make nested_svm_merge_msrpm() return an errno
KVM: nSVM: Drop nested_vmcb_check_{save/control}() wrappers
> If you mean drop them completely, it's a bit of a shame because I
> think the code ends up looking much better, but I also understand
> given all the back-and-forth, and the new problem I reported recently
> that will need further refactoring to address (see my other reply to
> the same patch).
After paging more of this stuff back in (it's been a while since I looked at the
equivalent nVMX flow in depth), I'm quite opposed to aiming for a unified #VMEXIT
path for VMRUN. Although it might seem otherwise at times, nVMX and nSVM didn't
end up with nested_vmx_load_cr3() buried toward the end of their flows purely to
make the code harder to read, there are real dependencies that need to be taken
into account.
And there's also value in having similar flows for nVMX and nSVM, e.g. where most
consistency checks occur before KVM starts loading L2 state. VMX just happens to
architecturally _require_ that, whereas with SVM it was a naturally consequence
of writing the code.
Without the unification, a minimal #VMEXIT helper doesn't make any sense, and
I don't see any strong justification for shuffling around the order.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v6 16/31] KVM: nSVM: Unify handling of VMRUN failures with proper cleanup
2026-02-28 1:05 ` Sean Christopherson
@ 2026-03-02 16:02 ` Yosry Ahmed
2026-03-02 16:11 ` Sean Christopherson
0 siblings, 1 reply; 44+ messages in thread
From: Yosry Ahmed @ 2026-03-02 16:02 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
> > As for refactoring the code, I didn't really do it for SMM, but I
> > think the code is generally cleaner with the single VMRUN failure
> > path.
>
> Except for the minor detail of being wrong :-)
I guess we're nitpicking now :P
> My preference is to completely drop these:
>
> KVM: nSVM: Unify handling of VMRUN failures with proper cleanup
> KVM: nSVM: Refactor minimal #VMEXIT handling out of nested_svm_vmexit()
> KVM: nSVM: Call nested_svm_init_mmu_context() before switching to VMCB02
> KVM: nSVM: Call nested_svm_merge_msrpm() from enter_svm_guest_mode()
> KVM: nSVM: Call enter_guest_mode() before switching to VMCB02
>
> > I am fine with dropping the stable@ tag from everything from this
> > point onward, or re-ordering the patches to keep it for the missing
> > consistency checks.
>
> And then moving these to the end of the series (or at least, beyond the stable@
> patches):
>
> KVM: nSVM: Make nested_svm_merge_msrpm() return an errno
I don't think there's much value in keeping this now, it was mainly needed for:
> KVM: nSVM: Call nested_svm_merge_msrpm() from enter_svm_guest_mode()
But I can keep it if you like it on its own.
> KVM: nSVM: Drop nested_vmcb_check_{save/control}() wrappers
This one will still be needed ahead of the consistency checks, specifically:
> KVM: nSVM: Add missing consistency check for hCR0.PG and NP_ENABLE
As we pass in L1's CR0, and with the wrappers in place it isn't
obviously correct that the current CR0 is L1's.
> > If you mean drop them completely, it's a bit of a shame because I
> > think the code ends up looking much better, but I also understand
> > given all the back-and-forth, and the new problem I reported recently
> > that will need further refactoring to address (see my other reply to
> > the same patch).
>
> After paging more of this stuff back in (it's been a while since I looked at the
> equivalent nVMX flow in depth), I'm quite opposed to aiming for a unified #VMEXIT
> path for VMRUN. Although it might seem otherwise at times, nVMX and nSVM didn't
> end up with nested_vmx_load_cr3() buried toward the end of their flows purely to
> make the code harder to read, there are real dependencies that need to be taken
> into account.
Yeah, and I tried to take them into account but that obviously didn't
work out well :)
> And there's also value in having similar flows for nVMX and nSVM, e.g. where most
> consistency checks occur before KVM starts loading L2 state. VMX just happens to
> architecturally _require_ that, whereas with SVM it was a naturally consequence
> of writing the code.
>
> Without the unification, a minimal #VMEXIT helper doesn't make any sense, and
> I don't see any strong justification for shuffling around the order.
Yeah that's fair.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v6 16/31] KVM: nSVM: Unify handling of VMRUN failures with proper cleanup
2026-03-02 16:02 ` Yosry Ahmed
@ 2026-03-02 16:11 ` Sean Christopherson
0 siblings, 0 replies; 44+ messages in thread
From: Sean Christopherson @ 2026-03-02 16:11 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
On Mon, Mar 02, 2026, Yosry Ahmed wrote:
> > > As for refactoring the code, I didn't really do it for SMM, but I
> > > think the code is generally cleaner with the single VMRUN failure
> > > path.
> >
> > Except for the minor detail of being wrong :-)
>
> I guess we're nitpicking now :P
>
> > My preference is to completely drop these:
> >
> > KVM: nSVM: Unify handling of VMRUN failures with proper cleanup
> > KVM: nSVM: Refactor minimal #VMEXIT handling out of nested_svm_vmexit()
> > KVM: nSVM: Call nested_svm_init_mmu_context() before switching to VMCB02
> > KVM: nSVM: Call nested_svm_merge_msrpm() from enter_svm_guest_mode()
> > KVM: nSVM: Call enter_guest_mode() before switching to VMCB02
> >
> > > I am fine with dropping the stable@ tag from everything from this
> > > point onward, or re-ordering the patches to keep it for the missing
> > > consistency checks.
> >
> > And then moving these to the end of the series (or at least, beyond the stable@
> > patches):
> >
> > KVM: nSVM: Make nested_svm_merge_msrpm() return an errno
>
> I don't think there's much value in keeping this now, it was mainly needed for:
>
> > KVM: nSVM: Call nested_svm_merge_msrpm() from enter_svm_guest_mode()
>
> But I can keep it if you like it on its own.
Hmm. I don't have a strong preference. Let's skip it for now. As much as I
dislike boolean returns, 0/-errno isn't obviously better in this case, and we
can always change it later.
> > KVM: nSVM: Drop nested_vmcb_check_{save/control}() wrappers
>
> This one will still be needed ahead of the consistency checks, specifically:
>
> > KVM: nSVM: Add missing consistency check for hCR0.PG and NP_ENABLE
>
> As we pass in L1's CR0, and with the wrappers in place it isn't
> obviously correct that the current CR0 is L1's.
Oh, gotcha. I'm a-ok keeping that one in the stable@ path, it's not at all
scary.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v6 20/31] KVM: nSVM: Add missing consistency check for hCR0.PG and NP_ENABLE
2026-02-24 22:33 ` [PATCH v6 20/31] KVM: nSVM: Add missing consistency check for hCR0.PG and NP_ENABLE Yosry Ahmed
@ 2026-03-03 0:00 ` Yosry Ahmed
0 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-03-03 0:00 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 752dd9eb98a84..6fffb6ae6b88b 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -342,7 +342,8 @@ static bool nested_svm_check_bitmap_pa(struct kvm_vcpu *vcpu, u64 pa, u32 size)
> }
>
> static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
> - struct vmcb_ctrl_area_cached *control)
> + struct vmcb_ctrl_area_cached *control,
> + unsigned long l1_cr0)
> {
> if (CC(!vmcb12_is_intercept(control, INTERCEPT_VMRUN)))
> return false;
> @@ -353,6 +354,8 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
> if (control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) {
> if (CC(!kvm_vcpu_is_legal_gpa(vcpu, control->nested_cr3)))
> return false;
> + if (CC(!(l1_cr0 & X86_CR0_PG)))
> + return false;
This is already checked by nested_svm_check_permissions() -> is_paging().
> }
>
> if (CC(!nested_svm_check_bitmap_pa(vcpu, control->msrpm_base_pa,
> @@ -952,7 +955,8 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
> enter_guest_mode(vcpu);
>
> if (!nested_vmcb_check_save(vcpu, &svm->nested.save) ||
> - !nested_vmcb_check_controls(vcpu, &svm->nested.ctl))
> + !nested_vmcb_check_controls(vcpu, &svm->nested.ctl,
> + svm->vmcb01.ptr->save.cr0))
> return -EINVAL;
>
> if (nested_npt_enabled(svm))
> @@ -1888,7 +1892,8 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
>
> ret = -EINVAL;
> __nested_copy_vmcb_control_to_cache(vcpu, &ctl_cached, ctl);
> - if (!nested_vmcb_check_controls(vcpu, &ctl_cached))
> + /* 'save' contains L1 state saved from before VMRUN */
> + if (!nested_vmcb_check_controls(vcpu, &ctl_cached, save->cr0))
..and this is checked slightly below.
I will drop this patch.
> goto out_free;
>
> /*
> --
> 2.53.0.414.gf7e9f6c205-goog
>
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v6 03/31] KVM: SVM: Add missing save/restore handling of LBR MSRs
2026-02-24 22:33 ` [PATCH v6 03/31] KVM: SVM: Add missing save/restore handling of LBR MSRs Yosry Ahmed
@ 2026-03-03 0:02 ` Yosry Ahmed
0 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-03-03 0:02 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable, Jim Mattson
On Tue, Feb 24, 2026 at 2:34 PM Yosry Ahmed <yosry@kernel.org> wrote:
>
> MSR_IA32_DEBUGCTLMSR and LBR MSRs are currently not enumerated by
> KVM_GET_MSR_INDEX_LIST, and LBR MSRs cannot be set with KVM_SET_MSRS. So
> save/restore is completely broken.
>
> Fix it by adding the MSRs to msrs_to_save_base, and allowing writes to
> LBR MSRs from userspace only (as they are read-only MSRs). Additionally,
> to correctly restore L1's LBRs while L2 is running, make sure the LBRs
> are copied from the captured VMCB01 save area in svm_copy_vmrun_state().
>
> Fixes: 24e09cbf480a ("KVM: SVM: enable LBR virtualization")
> Cc: stable@vger.kernel.org
> Reported-by: Jim Mattson <jmattson@google.com>
> Signed-off-by: Yosry Ahmed <yosry@kernel.org>
> ---
> arch/x86/kvm/svm/nested.c | 3 +++
> arch/x86/kvm/svm/svm.c | 24 ++++++++++++++++++++++++
> arch/x86/kvm/x86.c | 3 +++
> 3 files changed, 30 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index f7d5db0af69ac..52d8536845927 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -1100,6 +1100,9 @@ void svm_copy_vmrun_state(struct vmcb_save_area *to_save,
> to_save->isst_addr = from_save->isst_addr;
> to_save->ssp = from_save->ssp;
> }
> +
> + if (lbrv)
> + svm_copy_lbrs(to_save, from_save);
We need to clear reserved bits here, similar to
nested_vmcb02_prepare_save(), as this is stuff by userspace.
> }
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v6 21/31] KVM: nSVM: Add missing consistency check for EFER, CR0, CR4, and CS
2026-02-24 22:33 ` [PATCH v6 21/31] KVM: nSVM: Add missing consistency check for EFER, CR0, CR4, and CS Yosry Ahmed
@ 2026-03-03 0:03 ` Yosry Ahmed
0 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-03-03 0:03 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
On Tue, Feb 24, 2026 at 2:34 PM Yosry Ahmed <yosry@kernel.org> wrote:
>
> According to the APM Volume #2, 15.5, Canonicalization and Consistency
> Checks (24593—Rev. 3.42—March 2024), the following condition (among
> others) results in a #VMEXIT with VMEXIT_INVALID (aka SVM_EXIT_ERR):
>
> EFER.LME, CR0.PG, CR4.PAE, CS.L, and CS.D are all non-zero.
>
> Add the missing consistency check. This is functionally a nop because
> the nested VMRUN results in SVM_EXIT_ERR in HW, which is forwarded to
> L1, but KVM makes all consistency checks before a VMRUN is actually
> attempted.
>
> Fixes: 3d6368ef580a ("KVM: SVM: Add VMRUN handler")
> Cc: stable@vger.kernel.org
> Signed-off-by: Yosry Ahmed <yosry@kernel.org>
> ---
> arch/x86/kvm/svm/nested.c | 7 +++++++
> arch/x86/kvm/svm/svm.h | 1 +
> 2 files changed, 8 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 6fffb6ae6b88b..2c852e94a9ad9 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -397,6 +397,11 @@ static bool nested_vmcb_check_save(struct kvm_vcpu *vcpu,
> CC(!(save->cr0 & X86_CR0_PE)) ||
> CC(!kvm_vcpu_is_legal_cr3(vcpu, save->cr3)))
> return false;
> +
> + if (CC((save->cr4 & X86_CR4_PAE) &&
No need to check X86_CR4_PAE here, as it's checked right above the
context lines.
> + (save->cs.attrib & SVM_SELECTOR_L_MASK) &&
> + (save->cs.attrib & SVM_SELECTOR_DB_MASK)))
> + return false;
> }
>
> /* Note, SVM doesn't have any additional restrictions on CR4. */
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v6 31/31] KVM: selftest: Add a selftest for VMRUN/#VMEXIT with unmappable vmcb12
2026-02-24 22:34 ` [PATCH v6 31/31] KVM: selftest: Add a selftest for VMRUN/#VMEXIT with unmappable vmcb12 Yosry Ahmed
@ 2026-03-03 0:13 ` Yosry Ahmed
0 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2026-03-03 0:13 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
> +++ b/tools/testing/selftests/kvm/x86/svm_nested_invalid_vmcb12_gpa.c
> @@ -0,0 +1,95 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2026, Google LLC.
> + */
> +#include "kvm_util.h"
> +#include "vmx.h"
> +#include "svm_util.h"
> +#include "kselftest.h"
> +
> +
> +#define L2_GUEST_STACK_SIZE 64
> +
> +#define SYNC_GP 101
> +#define SYNC_L2_STARTED 102
> +
> +extern char invalid_vmrun;
> +
> +static void guest_gp_handler(struct ex_regs *regs)
> +{
> + GUEST_SYNC(SYNC_GP);
> + regs->rip = (uintptr_t)&invalid_vmrun;
Instead of jumping after run_guest() and skipping the host restore
sequence, it's probably better to fixup RAX here and have a single
run_guest() call in l1_guest_code().
> +}
> +
> +static void l2_guest_code(void)
> +{
> + GUEST_SYNC(SYNC_L2_STARTED);
> + vmcall();
> +}
> +
> +static void l1_guest_code(struct svm_test_data *svm, u64 invalid_vmcb12_gpa)
> +{
> + unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
> +
> + generic_svm_setup(svm, l2_guest_code,
> + &l2_guest_stack[L2_GUEST_STACK_SIZE]);
> +
> + run_guest(svm->vmcb, invalid_vmcb12_gpa); /* #GP */
> +
> + /* GP handler should jump here */
> + asm volatile ("invalid_vmrun:");
> + run_guest(svm->vmcb, svm->vmcb_gpa);
> + GUEST_ASSERT(svm->vmcb->control.exit_code == SVM_EXIT_VMMCALL);
> + GUEST_DONE();
> +}
^ permalink raw reply [flat|nested] 44+ messages in thread
end of thread, other threads:[~2026-03-03 0:13 UTC | newest]
Thread overview: 44+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-24 22:33 [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 01/31] KVM: nSVM: Avoid clearing VMCB_LBR in vmcb12 Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 02/31] KVM: SVM: Switch svm_copy_lbrs() to a macro Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 03/31] KVM: SVM: Add missing save/restore handling of LBR MSRs Yosry Ahmed
2026-03-03 0:02 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 04/31] KVM: selftests: Add a test for LBR save/restore (ft. nested) Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 05/31] KVM: nSVM: Always inject a #GP if mapping VMCB12 fails on nested VMRUN Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 06/31] KVM: nSVM: Refactor checking LBRV enablement in vmcb12 into a helper Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 07/31] KVM: nSVM: Refactor writing vmcb12 on nested #VMEXIT as " Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 08/31] KVM: nSVM: Triple fault if mapping VMCB12 fails on nested #VMEXIT Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 09/31] KVM: nSVM: Triple fault if restore host CR3 " Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 10/31] KVM: nSVM: Drop nested_vmcb_check_{save/control}() wrappers Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 11/31] KVM: nSVM: Call enter_guest_mode() before switching to VMCB02 Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 12/31] KVM: nSVM: Make nested_svm_merge_msrpm() return an errno Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 13/31] KVM: nSVM: Call nested_svm_merge_msrpm() from enter_svm_guest_mode() Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 14/31] KVM: nSVM: Call nested_svm_init_mmu_context() before switching to VMCB02 Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 15/31] KVM: nSVM: Refactor minimal #VMEXIT handling out of nested_svm_vmexit() Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 16/31] KVM: nSVM: Unify handling of VMRUN failures with proper cleanup Yosry Ahmed
2026-02-25 22:44 ` Yosry Ahmed
2026-02-28 0:07 ` Sean Christopherson
2026-02-28 0:46 ` Yosry Ahmed
2026-02-28 1:05 ` Sean Christopherson
2026-03-02 16:02 ` Yosry Ahmed
2026-03-02 16:11 ` Sean Christopherson
2026-02-24 22:33 ` [PATCH v6 17/31] KVM: nSVM: Clear EVENTINJ field in VMCB12 on nested #VMEXIT Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 18/31] KVM: nSVM: Drop the non-architectural consistency check for NP_ENABLE Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 19/31] KVM: nSVM: Add missing consistency check for nCR3 validity Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 20/31] KVM: nSVM: Add missing consistency check for hCR0.PG and NP_ENABLE Yosry Ahmed
2026-03-03 0:00 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 21/31] KVM: nSVM: Add missing consistency check for EFER, CR0, CR4, and CS Yosry Ahmed
2026-03-03 0:03 ` Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 22/31] KVM: nSVM: Add missing consistency check for EVENTINJ Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 23/31] KVM: SVM: Rename vmcb->nested_ctl to vmcb->misc_ctl Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 24/31] KVM: SVM: Rename vmcb->virt_ext to vmcb->misc_ctl2 Yosry Ahmed
2026-02-24 22:33 ` [PATCH v6 25/31] KVM: nSVM: Cache all used fields from VMCB12 Yosry Ahmed
2026-02-24 23:58 ` Yosry Ahmed
2026-02-24 22:34 ` [PATCH v6 26/31] KVM: nSVM: Restrict mapping VMCB12 on nested VMRUN Yosry Ahmed
2026-02-24 22:34 ` [PATCH v6 27/31] KVM: nSVM: Use PAGE_MASK to drop lower bits of bitmap GPAs from vmcb12 Yosry Ahmed
2026-02-24 22:34 ` [PATCH v6 28/31] KVM: nSVM: Sanitize TLB_CONTROL field when copying " Yosry Ahmed
2026-02-24 22:34 ` [PATCH v6 29/31] KVM: nSVM: Sanitize INT/EVENTINJ fields " Yosry Ahmed
2026-02-24 22:34 ` [PATCH v6 30/31] KVM: nSVM: Only copy SVM_MISC_ENABLE_NP from VMCB01's misc_ctl Yosry Ahmed
2026-02-24 22:34 ` [PATCH v6 31/31] KVM: selftest: Add a selftest for VMRUN/#VMEXIT with unmappable vmcb12 Yosry Ahmed
2026-03-03 0:13 ` Yosry Ahmed
2026-02-24 22:37 ` [PATCH 00/31] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox