* [PATCH 00/11] Nested SVM fixes, cleanups, and hardening
@ 2025-11-04 19:59 Yosry Ahmed
2025-11-04 19:59 ` [PATCH 01/11] KVM: nSVM: Fix consistency checks for NP_ENABLE Yosry Ahmed
` (10 more replies)
0 siblings, 11 replies; 17+ messages in thread
From: Yosry Ahmed @ 2025-11-04 19:59 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, kvm, linux-kernel, Yosry Ahmed
A group of semi-related fixes, cleanups, and hardening patches for nSVM.
Patches 1-3 fix or add missing consistency checks.
Patches 4-5 are renames to clarify some VMCB fields.
Patches 6-10 harden reads of the VMCB12, caching all used
fields in the save area to prevent theoretical TOC-TOU bugs, sanitizing
used fields in the control area, and restricting accesses to the VMCB12
through guest memory.
Patch 11 further restricts the fields copied from VMCB01 to VMCB02.
Yosry Ahmed (11):
KVM: nSVM: Fix consistency checks for NP_ENABLE
KVM: nSVM: Add missing consistency check for EFER, CR0, CR4, and CS
KVM: nSVM: Add missing consistency check for event_inj
KVM: SVM: Rename vmcb->nested_ctl to vmcb->misc_ctl
KVM: SVM: Rename vmcb->virt_ext to vmcb->misc_ctl2
KVM: SVM: switch svm_copy_lbrs() to a macro
KVM: nSVM: Cache all used fields from VMCB12
KVM: nSVM: Restrict mapping VMCB12 on nested VMRUN
KVM: nSVM: Simplify nested_svm_vmrun()
KVM: nSVM: Sanitize control fields copied from VMCB12
KVM: nSVM: Only copy NP_ENABLE from VMCB01's nested_ctl
arch/x86/include/asm/svm.h | 31 +-
arch/x86/kvm/svm/nested.c | 335 +++++++++++-------
arch/x86/kvm/svm/sev.c | 4 +-
arch/x86/kvm/svm/svm.c | 51 ++-
arch/x86/kvm/svm/svm.h | 46 ++-
tools/testing/selftests/kvm/include/x86/svm.h | 14 +-
6 files changed, 302 insertions(+), 179 deletions(-)
--
2.51.2.1026.g39e6a42477-goog
* [PATCH 01/11] KVM: nSVM: Fix consistency checks for NP_ENABLE
2025-11-04 19:59 [PATCH 00/11] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
@ 2025-11-04 19:59 ` Yosry Ahmed
2025-11-05 18:52 ` Yosry Ahmed
2025-11-04 19:59 ` [PATCH 02/11] KVM: nSVM: Add missing consistency check for EFER, CR0, CR4, and CS Yosry Ahmed
` (9 subsequent siblings)
10 siblings, 1 reply; 17+ messages in thread
From: Yosry Ahmed @ 2025-11-04 19:59 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, kvm, linux-kernel, Yosry Ahmed,
stable
KVM currently fails a nested VMRUN and injects VMEXIT_INVALID (aka
SVM_EXIT_ERR) if L1 sets NP_ENABLE and the host does not support NPTs.
On first glance, it seems like the check should actually be for
guest_cpu_cap_has(X86_FEATURE_NPT) instead, as it is possible for the
host to support NPTs but the guest CPUID to not advertise it.
However, the consistency check is not architectural to begin with. The
APM does not mention VMEXIT_INVALID if NP_ENABLE is set on a processor
that does not have X86_FEATURE_NPT. Hence, NP_ENABLE should be ignored
if X86_FEATURE_NPT is not available for L1. Apart from the consistency
check, this is currently the case because NP_ENABLE is actually copied
from VMCB01 to VMCB02, not from VMCB12.
On the other hand, the APM does mention two other consistency checks for
NP_ENABLE, both of which are missing (paraphrased):
In Volume #2, 15.25.3 (24593—Rev. 3.42—March 2024):
If VMRUN is executed with hCR0.PG cleared to zero and NP_ENABLE set to
1, VMRUN terminates with #VMEXIT(VMEXIT_INVALID)
In Volume #2, 15.25.4 (24593—Rev. 3.42—March 2024):
When VMRUN is executed with nested paging enabled (NP_ENABLE = 1), the
following conditions are considered illegal state combinations, in
addition to those mentioned in “Canonicalization and Consistency
Checks”:
• Any MBZ bit of nCR3 is set.
• Any G_PAT.PA field has an unsupported type encoding or any
reserved field in G_PAT has a nonzero value.
Replace the existing consistency check with consistency checks on
hCR0.PG and nCR3. The G_PAT consistency check will be addressed
separately.
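As an illustration, a minimal sketch in KVM selftest style (hypothetical L1
guest code, assuming the usual generic_svm_setup() boilerplate; SVM_EXIT_ERR
here is the -1 error exit code) of the two conditions that now fail:

	/* Enable nested paging in VMCB12... */
	vmcb->control.nested_ctl |= SVM_NESTED_CTL_NP_ENABLE;

	/* ...with MBZ bits set in nCR3 (an illegal GPA), or alternatively
	 * while L1 itself runs with CR0.PG == 0.
	 */
	vmcb->control.nested_cr3 = BIT_ULL(63);

	run_guest(vmcb, svm->vmcb_gpa);
	GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_ERR);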
Pass L1's CR0 to __nested_vmcb_check_controls(). In
nested_vmcb_check_controls(), L1's CR0 is available through
kvm_read_cr0(), as vcpu->arch.cr0 is not updated to L2's CR0 until later
through nested_vmcb02_prepare_save() -> svm_set_cr0().
In svm_set_nested_state(), L1's CR0 is available in the captured save
area, as svm_get_nested_state() captures L1's save area when running L2,
and L1's CR0 is stashed in VMCB01 on nested VMRUN (in
nested_svm_vmrun()).
Fixes: 4b16184c1cca ("KVM: SVM: Initialize Nested Nested MMU context on VMRUN")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
arch/x86/kvm/svm/nested.c | 21 ++++++++++++++++-----
arch/x86/kvm/svm/svm.h | 3 ++-
2 files changed, 18 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 83de3456df708..9a534f04bdc83 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -325,7 +325,8 @@ static bool nested_svm_check_bitmap_pa(struct kvm_vcpu *vcpu, u64 pa, u32 size)
}
static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
- struct vmcb_ctrl_area_cached *control)
+ struct vmcb_ctrl_area_cached *control,
+ unsigned long l1_cr0)
{
if (CC(!vmcb12_is_intercept(control, INTERCEPT_VMRUN)))
return false;
@@ -333,8 +334,12 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
if (CC(control->asid == 0))
return false;
- if (CC((control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) && !npt_enabled))
- return false;
+ if (control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) {
+ if (CC(!kvm_vcpu_is_legal_gpa(vcpu, control->nested_cr3)))
+ return false;
+ if (CC(!(l1_cr0 & X86_CR0_PG)))
+ return false;
+ }
if (CC(!nested_svm_check_bitmap_pa(vcpu, control->msrpm_base_pa,
MSRPM_SIZE)))
@@ -400,7 +405,12 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb_ctrl_area_cached *ctl = &svm->nested.ctl;
- return __nested_vmcb_check_controls(vcpu, ctl);
+ /*
+ * Make sure we did not enter guest mode yet, in which case
+ * kvm_read_cr0() could return L2's CR0.
+ */
+ WARN_ON_ONCE(is_guest_mode(vcpu));
+ return __nested_vmcb_check_controls(vcpu, ctl, kvm_read_cr0(vcpu));
}
static
@@ -1832,7 +1842,8 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
ret = -EINVAL;
__nested_copy_vmcb_control_to_cache(vcpu, &ctl_cached, ctl);
- if (!__nested_vmcb_check_controls(vcpu, &ctl_cached))
+ /* 'save' contains L1 state saved from before VMRUN */
+ if (!__nested_vmcb_check_controls(vcpu, &ctl_cached, save->cr0))
goto out_free;
/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 6765a5e433cea..0a2908e22d746 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -552,7 +552,8 @@ static inline bool gif_set(struct vcpu_svm *svm)
static inline bool nested_npt_enabled(struct vcpu_svm *svm)
{
- return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
+ return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_NPT) &&
+ svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
}
static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
--
2.51.2.1026.g39e6a42477-goog
* [PATCH 02/11] KVM: nSVM: Add missing consistency check for EFER, CR0, CR4, and CS
2025-11-04 19:59 [PATCH 00/11] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
2025-11-04 19:59 ` [PATCH 01/11] KVM: nSVM: Fix consistency checks for NP_ENABLE Yosry Ahmed
@ 2025-11-04 19:59 ` Yosry Ahmed
2025-11-04 19:59 ` [PATCH 03/11] KVM: nSVM: Add missing consistency check for event_inj Yosry Ahmed
` (8 subsequent siblings)
10 siblings, 0 replies; 17+ messages in thread
From: Yosry Ahmed @ 2025-11-04 19:59 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, kvm, linux-kernel, Yosry Ahmed
According to the APM Volume #2, 15.5, Canonicalization and Consistency
Checks (24593—Rev. 3.42—March 2024), the following condition (among
others) results in a #VMEXIT with VMEXIT_INVALID (aka SVM_EXIT_ERR):
EFER.LME, CR0.PG, CR4.PAE, CS.L, and CS.D are all non-zero.
Add the missing consistency check. This is functionally a nop because, even
without it, the nested VMRUN would result in SVM_EXIT_ERR in HW, which would
be forwarded to L1. However, KVM performs all of its consistency checks in
software before a VMRUN is actually attempted.
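For illustration, a hypothetical L1 state that now fails the check (where
'save' stands for the VMCB12 save area, using the same masks as the check
below) is a long-mode code segment with the default-size bit also set:

	/* Illustration only: all five conditions non-zero at once is illegal */
	save->efer = EFER_SVME | EFER_LME | EFER_LMA;
	save->cr0 |= X86_CR0_PE | X86_CR0_PG;
	save->cr4 |= X86_CR4_PAE;
	save->cs.attrib |= SVM_SELECTOR_L_MASK | SVM_SELECTOR_DB_MASK; /* CS.L=1, CS.D=1 */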
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
arch/x86/kvm/svm/nested.c | 7 +++++++
arch/x86/kvm/svm/svm.h | 1 +
2 files changed, 8 insertions(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 9a534f04bdc83..9866b2fd8f32a 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -380,6 +380,11 @@ static bool __nested_vmcb_check_save(struct kvm_vcpu *vcpu,
CC(!(save->cr0 & X86_CR0_PE)) ||
CC(!kvm_vcpu_is_legal_cr3(vcpu, save->cr3)))
return false;
+
+ if (CC((save->cr4 & X86_CR4_PAE) &&
+ (save->cs.attrib & SVM_SELECTOR_L_MASK) &&
+ (save->cs.attrib & SVM_SELECTOR_DB_MASK)))
+ return false;
}
/* Note, SVM doesn't have any additional restrictions on CR4. */
@@ -473,6 +478,8 @@ static void __nested_copy_vmcb_save_to_cache(struct vmcb_save_area_cached *to,
* Copy only fields that are validated, as we need them
* to avoid TOC/TOU races.
*/
+ to->cs = from->cs;
+
to->efer = from->efer;
to->cr0 = from->cr0;
to->cr3 = from->cr3;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0a2908e22d746..efcf923264c55 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -142,6 +142,7 @@ struct kvm_vmcb_info {
};
struct vmcb_save_area_cached {
+ struct vmcb_seg cs;
u64 efer;
u64 cr4;
u64 cr3;
--
2.51.2.1026.g39e6a42477-goog
* [PATCH 03/11] KVM: nSVM: Add missing consistency check for event_inj
2025-11-04 19:59 [PATCH 00/11] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
2025-11-04 19:59 ` [PATCH 01/11] KVM: nSVM: Fix consistency checks for NP_ENABLE Yosry Ahmed
2025-11-04 19:59 ` [PATCH 02/11] KVM: nSVM: Add missing consistency check for EFER, CR0, CR4, and CS Yosry Ahmed
@ 2025-11-04 19:59 ` Yosry Ahmed
2025-11-05 18:48 ` Sean Christopherson
2025-11-04 19:59 ` [PATCH 04/11] KVM: SVM: Rename vmcb->nested_ctl to vmcb->misc_ctl Yosry Ahmed
` (7 subsequent siblings)
10 siblings, 1 reply; 17+ messages in thread
From: Yosry Ahmed @ 2025-11-04 19:59 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, kvm, linux-kernel, Yosry Ahmed
According to the APM Volume #2, 15.20 (24593—Rev. 3.42—March 2024):
VMRUN exits with VMEXIT_INVALID error code if either:
• Reserved values of TYPE have been specified, or
• TYPE = 3 (exception) has been specified with a vector that does not
correspond to an exception (this includes vector 2, which is an NMI,
not an exception).
Add the missing consistency checks to KVM. For the second point, inject
VMEXIT_INVALID if the vector is anything but the vectors defined by the
APM for exceptions. Reserved vectors are also considered invalid, which
matches the HW behavior. Vector 9 (i.e. #CSO, Coprocessor Segment Overrun) is
considered invalid because it is reserved on modern CPUs, and no CPU that
supports SVM is known to generate #CSO.
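For example, hypothetical event_inj encodings built from the existing EVTINJ
macros (illustration only, not part of the patch):

	u32 ok   = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_EXEPT | PF_VECTOR;   /* #PF: accepted */
	u32 bad1 = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_EXEPT | NMI_VECTOR;  /* vector 2: SVM_EXIT_ERR */
	u32 bad2 = SVM_EVTINJ_VALID | (5 << SVM_EVTINJ_TYPE_SHIFT);        /* reserved type: SVM_EXIT_ERR */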
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
arch/x86/include/asm/svm.h | 5 +++++
arch/x86/kvm/svm/nested.c | 33 +++++++++++++++++++++++++++++++++
2 files changed, 38 insertions(+)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index e69b6d0dedcf0..3a9441a8954f3 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -633,6 +633,11 @@ static inline void __unused_size_checks(void)
#define SVM_EVTINJ_VALID (1 << 31)
#define SVM_EVTINJ_VALID_ERR (1 << 11)
+/* Only valid exceptions (and not NMIs) are allowed for SVM_EVTINJ_TYPE_EXEPT */
+#define SVM_EVNTINJ_INVALID_EXEPTS (BIT_ULL(NMI_VECTOR) | BIT_ULL(9) | BIT_ULL(15) | \
+ BIT_ULL(20) | GENMASK_ULL(27, 22) | \
+ BIT_ULL(31))
+
#define SVM_EXITINTINFO_VEC_MASK SVM_EVTINJ_VEC_MASK
#define SVM_EXITINTINFO_TYPE_MASK SVM_EVTINJ_TYPE_MASK
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 9866b2fd8f32a..8641ac9331c5d 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -324,6 +324,36 @@ static bool nested_svm_check_bitmap_pa(struct kvm_vcpu *vcpu, u64 pa, u32 size)
kvm_vcpu_is_legal_gpa(vcpu, addr + size - 1);
}
+/*
+ * According to the APM, VMRUN exits with SVM_EXIT_ERR if:
+ * - The type of event_inj is not one of the defined values.
+ * - The type is SVM_EVTINJ_TYPE_EXEPT, but the vector does not
+ * correspond to an exception (no NMIs and no reserved values).
+ *
+ * These checks are only performed if SVM_EVTINJ_VALID is set.
+ */
+static bool nested_svm_check_event_inj(u32 event_inj)
+{
+ u32 type = event_inj & SVM_EVTINJ_TYPE_MASK;
+ u8 vector = event_inj & SVM_EVTINJ_VEC_MASK;
+
+ if (!(event_inj & SVM_EVTINJ_VALID))
+ return true;
+
+ if (type != SVM_EVTINJ_TYPE_INTR && type != SVM_EVTINJ_TYPE_NMI &&
+ type != SVM_EVTINJ_TYPE_EXEPT && type != SVM_EVTINJ_TYPE_SOFT)
+ return false;
+
+ if (type == SVM_EVTINJ_TYPE_EXEPT) {
+ if (vector >= FIRST_EXTERNAL_VECTOR)
+ return false;
+
+ if (BIT_ULL(vector) & SVM_EVNTINJ_INVALID_EXEPTS)
+ return false;
+ }
+ return true;
+}
+
static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
struct vmcb_ctrl_area_cached *control,
unsigned long l1_cr0)
@@ -353,6 +383,9 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
return false;
}
+ if (CC(!nested_svm_check_event_inj(control->event_inj)))
+ return false;
+
return true;
}
--
2.51.2.1026.g39e6a42477-goog
* [PATCH 04/11] KVM: SVM: Rename vmcb->nested_ctl to vmcb->misc_ctl
2025-11-04 19:59 [PATCH 00/11] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (2 preceding siblings ...)
2025-11-04 19:59 ` [PATCH 03/11] KVM: nSVM: Add missing consistency check for event_inj Yosry Ahmed
@ 2025-11-04 19:59 ` Yosry Ahmed
2025-11-04 19:59 ` [PATCH 05/11] KVM: SVM: Rename vmcb->virt_ext to vmcb->misc_ctl2 Yosry Ahmed
` (6 subsequent siblings)
10 siblings, 0 replies; 17+ messages in thread
From: Yosry Ahmed @ 2025-11-04 19:59 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, kvm, linux-kernel, Yosry Ahmed
The 'nested_ctl' field is misnamed. Although the first bit is for nested
paging, the other defined bits are for SEV/SEV-ES. According to the APM,
other bits in the same field (not defined by KVM) include "Guest Mode
Execution Trap", "Enable INVLPGB/TLBSYNC", and other control bits unrelated
to 'nested'.
These bits have nothing in common, so just name the field misc_ctl.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
arch/x86/include/asm/svm.h | 8 ++++----
arch/x86/kvm/svm/nested.c | 10 +++++-----
arch/x86/kvm/svm/sev.c | 4 ++--
arch/x86/kvm/svm/svm.c | 4 ++--
arch/x86/kvm/svm/svm.h | 4 ++--
tools/testing/selftests/kvm/include/x86/svm.h | 6 +++---
6 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 3a9441a8954f3..ead275da9850e 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -142,7 +142,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u64 exit_info_2;
u32 exit_int_info;
u32 exit_int_info_err;
- u64 nested_ctl;
+ u64 misc_ctl;
u64 avic_vapic_bar;
u64 ghcb_gpa;
u32 event_inj;
@@ -236,9 +236,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define SVM_IOIO_SIZE_MASK (7 << SVM_IOIO_SIZE_SHIFT)
#define SVM_IOIO_ASIZE_MASK (7 << SVM_IOIO_ASIZE_SHIFT)
-#define SVM_NESTED_CTL_NP_ENABLE BIT(0)
-#define SVM_NESTED_CTL_SEV_ENABLE BIT(1)
-#define SVM_NESTED_CTL_SEV_ES_ENABLE BIT(2)
+#define SVM_MISC_CTL_NP_ENABLE BIT(0)
+#define SVM_MISC_CTL_SEV_ENABLE BIT(1)
+#define SVM_MISC_CTL_SEV_ES_ENABLE BIT(2)
#define SVM_TSC_RATIO_RSVD 0xffffff0000000000ULL
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 8641ac9331c5d..9e6b996753e4e 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -364,7 +364,7 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
if (CC(control->asid == 0))
return false;
- if (control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) {
+ if (control->misc_ctl & SVM_MISC_CTL_NP_ENABLE) {
if (CC(!kvm_vcpu_is_legal_gpa(vcpu, control->nested_cr3)))
return false;
if (CC(!(l1_cr0 & X86_CR0_PG)))
@@ -474,7 +474,7 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
to->exit_info_2 = from->exit_info_2;
to->exit_int_info = from->exit_int_info;
to->exit_int_info_err = from->exit_int_info_err;
- to->nested_ctl = from->nested_ctl;
+ to->misc_ctl = from->misc_ctl;
to->event_inj = from->event_inj;
to->event_inj_err = from->event_inj_err;
to->next_rip = from->next_rip;
@@ -800,7 +800,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
}
/* Copied from vmcb01. msrpm_base can be overwritten later. */
- vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
+ vmcb02->control.misc_ctl = vmcb01->control.misc_ctl;
vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
vmcb02->control.msrpm_base_pa = vmcb01->control.msrpm_base_pa;
vmcb_mark_dirty(vmcb02, VMCB_PERM_MAP);
@@ -951,7 +951,7 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
vmcb12->save.rip,
vmcb12->control.int_ctl,
vmcb12->control.event_inj,
- vmcb12->control.nested_ctl,
+ vmcb12->control.misc_ctl,
vmcb12->control.nested_cr3,
vmcb12->save.cr3,
KVM_ISA_SVM);
@@ -1742,7 +1742,7 @@ static void nested_copy_vmcb_cache_to_control(struct vmcb_control_area *dst,
dst->exit_info_2 = from->exit_info_2;
dst->exit_int_info = from->exit_int_info;
dst->exit_int_info_err = from->exit_int_info_err;
- dst->nested_ctl = from->nested_ctl;
+ dst->misc_ctl = from->misc_ctl;
dst->event_inj = from->event_inj;
dst->event_inj_err = from->event_inj_err;
dst->next_rip = from->next_rip;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0835c664fbfdb..4eff5cc43821a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4553,7 +4553,7 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm, bool init_event)
struct kvm_sev_info *sev = to_kvm_sev_info(svm->vcpu.kvm);
struct vmcb *vmcb = svm->vmcb01.ptr;
- svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ES_ENABLE;
+ svm->vmcb->control.misc_ctl |= SVM_MISC_CTL_SEV_ES_ENABLE;
/*
* An SEV-ES guest requires a VMSA area that is a separate from the
@@ -4624,7 +4624,7 @@ void sev_init_vmcb(struct vcpu_svm *svm, bool init_event)
{
struct kvm_vcpu *vcpu = &svm->vcpu;
- svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ENABLE;
+ svm->vmcb->control.misc_ctl |= SVM_MISC_CTL_SEV_ENABLE;
clr_exception_intercept(svm, UD_VECTOR);
/*
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f14709a511aa4..9ef7683fb2ff0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1124,7 +1124,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
if (npt_enabled) {
/* Setup VMCB for Nested Paging */
- control->nested_ctl |= SVM_NESTED_CTL_NP_ENABLE;
+ control->misc_ctl |= SVM_MISC_CTL_NP_ENABLE;
svm_clr_intercept(svm, INTERCEPT_INVLPG);
clr_exception_intercept(svm, PF_VECTOR);
svm_clr_intercept(svm, INTERCEPT_CR3_READ);
@@ -3280,7 +3280,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
pr_err("%-20s%016llx\n", "exit_info2:", control->exit_info_2);
pr_err("%-20s%08x\n", "exit_int_info:", control->exit_int_info);
pr_err("%-20s%08x\n", "exit_int_info_err:", control->exit_int_info_err);
- pr_err("%-20s%lld\n", "nested_ctl:", control->nested_ctl);
+ pr_err("%-20s%lld\n", "misc_ctl:", control->misc_ctl);
pr_err("%-20s%016llx\n", "nested_cr3:", control->nested_cr3);
pr_err("%-20s%016llx\n", "avic_vapic_bar:", control->avic_vapic_bar);
pr_err("%-20s%016llx\n", "ghcb:", control->ghcb_gpa);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index efcf923264c55..980560a815868 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -169,7 +169,7 @@ struct vmcb_ctrl_area_cached {
u64 exit_info_2;
u32 exit_int_info;
u32 exit_int_info_err;
- u64 nested_ctl;
+ u64 misc_ctl;
u32 event_inj;
u32 event_inj_err;
u64 next_rip;
@@ -554,7 +554,7 @@ static inline bool gif_set(struct vcpu_svm *svm)
static inline bool nested_npt_enabled(struct vcpu_svm *svm)
{
return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_NPT) &&
- svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
+ svm->nested.ctl.misc_ctl & SVM_MISC_CTL_NP_ENABLE;
}
static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
diff --git a/tools/testing/selftests/kvm/include/x86/svm.h b/tools/testing/selftests/kvm/include/x86/svm.h
index 29cffd0a91816..5d2bcce34c019 100644
--- a/tools/testing/selftests/kvm/include/x86/svm.h
+++ b/tools/testing/selftests/kvm/include/x86/svm.h
@@ -98,7 +98,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u64 exit_info_2;
u32 exit_int_info;
u32 exit_int_info_err;
- u64 nested_ctl;
+ u64 misc_ctl;
u64 avic_vapic_bar;
u8 reserved_4[8];
u32 event_inj;
@@ -176,8 +176,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define SVM_VM_CR_SVM_LOCK_MASK 0x0008ULL
#define SVM_VM_CR_SVM_DIS_MASK 0x0010ULL
-#define SVM_NESTED_CTL_NP_ENABLE BIT(0)
-#define SVM_NESTED_CTL_SEV_ENABLE BIT(1)
+#define SVM_MISC_CTL_NP_ENABLE BIT(0)
+#define SVM_MISC_CTL_SEV_ENABLE BIT(1)
struct __attribute__ ((__packed__)) vmcb_seg {
u16 selector;
--
2.51.2.1026.g39e6a42477-goog
* [PATCH 05/11] KVM: SVM: Rename vmcb->virt_ext to vmcb->misc_ctl2
2025-11-04 19:59 [PATCH 00/11] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (3 preceding siblings ...)
2025-11-04 19:59 ` [PATCH 04/11] KVM: SVM: Rename vmcb->nested_ctl to vmcb->misc_ctl Yosry Ahmed
@ 2025-11-04 19:59 ` Yosry Ahmed
2025-11-04 19:59 ` [PATCH 06/11] KVM: SVM: switch svm_copy_lbrs() to a macro Yosry Ahmed
` (5 subsequent siblings)
10 siblings, 0 replies; 17+ messages in thread
From: Yosry Ahmed @ 2025-11-04 19:59 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, kvm, linux-kernel, Yosry Ahmed
'virt' is a confusing name in the VMCB because it is relative and ambiguous:
everything in the VMCB relates to virtualization in some way.
The 'virt_ext' field includes bits for LBR virtualization and
VMSAVE/VMLOAD virtualization, so it's just another miscellaneous control
field. Name it as such.
While at it, move the definitions of the bits below those for
'misc_ctl'.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
arch/x86/include/asm/svm.h | 7 ++---
arch/x86/kvm/svm/nested.c | 28 +++++++++----------
arch/x86/kvm/svm/svm.c | 22 +++++++--------
arch/x86/kvm/svm/svm.h | 2 +-
tools/testing/selftests/kvm/include/x86/svm.h | 8 +++---
5 files changed, 33 insertions(+), 34 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index ead275da9850e..5f3781587dd01 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -148,7 +148,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u32 event_inj;
u32 event_inj_err;
u64 nested_cr3;
- u64 virt_ext;
+ u64 misc_ctl2;
u32 clean;
u32 reserved_5;
u64 next_rip;
@@ -219,9 +219,6 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define X2APIC_MODE_SHIFT 30
#define X2APIC_MODE_MASK (1 << X2APIC_MODE_SHIFT)
-#define LBR_CTL_ENABLE_MASK BIT_ULL(0)
-#define VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK BIT_ULL(1)
-
#define SVM_INTERRUPT_SHADOW_MASK BIT_ULL(0)
#define SVM_GUEST_INTERRUPT_MASK BIT_ULL(1)
@@ -240,6 +237,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define SVM_MISC_CTL_SEV_ENABLE BIT(1)
#define SVM_MISC_CTL_SEV_ES_ENABLE BIT(2)
+#define SVM_MISC_CTL2_LBR_CTL_ENABLE BIT_ULL(0)
+#define SVM_MISC_CTL2_V_VMLOAD_VMSAVE_ENABLE BIT_ULL(1)
#define SVM_TSC_RATIO_RSVD 0xffffff0000000000ULL
#define SVM_TSC_RATIO_MIN 0x0000000000000001ULL
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 9e6b996753e4e..986e6382dc4fa 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -117,7 +117,7 @@ static bool nested_vmcb_needs_vls_intercept(struct vcpu_svm *svm)
if (!nested_npt_enabled(svm))
return true;
- if (!(svm->nested.ctl.virt_ext & VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK))
+ if (!(svm->nested.ctl.misc_ctl2 & SVM_MISC_CTL2_V_VMLOAD_VMSAVE_ENABLE))
return true;
return false;
@@ -180,7 +180,7 @@ void recalc_intercepts(struct vcpu_svm *svm)
vmcb_set_intercept(c, INTERCEPT_VMLOAD);
vmcb_set_intercept(c, INTERCEPT_VMSAVE);
} else {
- WARN_ON(!(c->virt_ext & VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK));
+ WARN_ON(!(c->misc_ctl2 & SVM_MISC_CTL2_V_VMLOAD_VMSAVE_ENABLE));
}
}
@@ -479,7 +479,7 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
to->event_inj_err = from->event_inj_err;
to->next_rip = from->next_rip;
to->nested_cr3 = from->nested_cr3;
- to->virt_ext = from->virt_ext;
+ to->misc_ctl2 = from->misc_ctl2;
to->pause_filter_count = from->pause_filter_count;
to->pause_filter_thresh = from->pause_filter_thresh;
@@ -721,7 +721,7 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
}
if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
- (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
+ (svm->nested.ctl.misc_ctl2 & SVM_MISC_CTL2_LBR_CTL_ENABLE))) {
/*
* Reserved bits of DEBUGCTL are ignored. Be consistent with
* svm_set_msr's definition of reserved bits.
@@ -730,7 +730,7 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
vmcb02->save.dbgctl &= ~DEBUGCTL_RESERVED_BITS;
svm_update_lbrv(&svm->vcpu);
- } else if (unlikely(vmcb01->control.virt_ext & LBR_CTL_ENABLE_MASK)) {
+ } else if (unlikely(vmcb01->control.misc_ctl2 & SVM_MISC_CTL2_LBR_CTL_ENABLE)) {
svm_copy_lbrs(vmcb02, vmcb01);
}
}
@@ -885,14 +885,14 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
svm->soft_int_next_rip = vmcb12_rip;
}
- vmcb02->control.virt_ext = vmcb01->control.virt_ext &
- LBR_CTL_ENABLE_MASK;
+ vmcb02->control.misc_ctl2 = vmcb01->control.misc_ctl2 &
+ SVM_MISC_CTL2_LBR_CTL_ENABLE;
if (guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV))
- vmcb02->control.virt_ext |=
- (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK);
+ vmcb02->control.misc_ctl2 |=
+ (svm->nested.ctl.misc_ctl2 & SVM_MISC_CTL2_LBR_CTL_ENABLE);
if (!nested_vmcb_needs_vls_intercept(svm))
- vmcb02->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
+ vmcb02->control.misc_ctl2 |= SVM_MISC_CTL2_V_VMLOAD_VMSAVE_ENABLE;
if (guest_cpu_cap_has(vcpu, X86_FEATURE_PAUSEFILTER))
pause_count12 = svm->nested.ctl.pause_filter_count;
@@ -1241,10 +1241,10 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
- (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
+ (svm->nested.ctl.misc_ctl2 & SVM_MISC_CTL2_LBR_CTL_ENABLE))) {
svm_copy_lbrs(vmcb12, vmcb02);
svm_update_lbrv(vcpu);
- } else if (unlikely(vmcb01->control.virt_ext & LBR_CTL_ENABLE_MASK)) {
+ } else if (unlikely(vmcb01->control.misc_ctl2 & SVM_MISC_CTL2_LBR_CTL_ENABLE)) {
svm_copy_lbrs(vmcb01, vmcb02);
svm_update_lbrv(vcpu);
}
@@ -1746,8 +1746,8 @@ static void nested_copy_vmcb_cache_to_control(struct vmcb_control_area *dst,
dst->event_inj = from->event_inj;
dst->event_inj_err = from->event_inj_err;
dst->next_rip = from->next_rip;
- dst->nested_cr3 = from->nested_cr3;
- dst->virt_ext = from->virt_ext;
+ dst->nested_cr3 = from->nested_cr3;
+ dst->misc_ctl2 = from->misc_ctl2;
dst->pause_filter_count = from->pause_filter_count;
dst->pause_filter_thresh = from->pause_filter_thresh;
/* 'clean' and 'hv_enlightenments' are not changed by KVM */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9ef7683fb2ff0..185f17ff2170b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -705,7 +705,7 @@ void *svm_alloc_permissions_map(unsigned long size, gfp_t gfp_mask)
static void svm_recalc_lbr_msr_intercepts(struct kvm_vcpu *vcpu)
{
- bool intercept = !(to_svm(vcpu)->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK);
+ bool intercept = !(to_svm(vcpu)->vmcb->control.misc_ctl2 & SVM_MISC_CTL2_LBR_CTL_ENABLE);
svm_set_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHFROMIP, MSR_TYPE_RW, intercept);
svm_set_intercept_for_msr(vcpu, MSR_IA32_LASTBRANCHTOIP, MSR_TYPE_RW, intercept);
@@ -810,7 +810,7 @@ void svm_enable_lbrv(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
+ svm->vmcb->control.misc_ctl2 |= SVM_MISC_CTL2_LBR_CTL_ENABLE;
svm_recalc_lbr_msr_intercepts(vcpu);
/* Move the LBR msrs to the vmcb02 so that the guest can see them. */
@@ -823,7 +823,7 @@ static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
KVM_BUG_ON(sev_es_guest(vcpu->kvm), vcpu->kvm);
- svm->vmcb->control.virt_ext &= ~LBR_CTL_ENABLE_MASK;
+ svm->vmcb->control.misc_ctl2 &= ~SVM_MISC_CTL2_LBR_CTL_ENABLE;
svm_recalc_lbr_msr_intercepts(vcpu);
/*
@@ -841,17 +841,17 @@ static struct vmcb *svm_get_lbr_vmcb(struct vcpu_svm *svm)
* vmcb01. If LBR virtualization is enabled and L1 is running VMs of
* its own, the MSRs are moved between vmcb01 and vmcb02 as needed.
*/
- return svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK ? svm->vmcb :
+ return svm->vmcb->control.misc_ctl2 & SVM_MISC_CTL2_LBR_CTL_ENABLE ? svm->vmcb :
svm->vmcb01.ptr;
}
void svm_update_lbrv(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- bool current_enable_lbrv = svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK;
+ bool current_enable_lbrv = svm->vmcb->control.misc_ctl2 & SVM_MISC_CTL2_LBR_CTL_ENABLE;
bool enable_lbrv = (svm_get_lbr_vmcb(svm)->save.dbgctl & DEBUGCTLMSR_LBR) ||
(is_guest_mode(vcpu) && guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
- (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK));
+ (svm->nested.ctl.misc_ctl2 & SVM_MISC_CTL2_LBR_CTL_ENABLE));
if (enable_lbrv == current_enable_lbrv)
return;
@@ -1005,7 +1005,7 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu)
if (guest_cpuid_is_intel_compatible(vcpu)) {
svm_set_intercept(svm, INTERCEPT_VMLOAD);
svm_set_intercept(svm, INTERCEPT_VMSAVE);
- svm->vmcb->control.virt_ext &= ~VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
+ svm->vmcb->control.misc_ctl2 &= ~SVM_MISC_CTL2_V_VMLOAD_VMSAVE_ENABLE;
} else {
/*
* If hardware supports Virtual VMLOAD VMSAVE then enable it
@@ -1014,7 +1014,7 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu)
if (vls) {
svm_clr_intercept(svm, INTERCEPT_VMLOAD);
svm_clr_intercept(svm, INTERCEPT_VMSAVE);
- svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
+ svm->vmcb->control.misc_ctl2 |= SVM_MISC_CTL2_V_VMLOAD_VMSAVE_ENABLE;
}
}
}
@@ -3286,7 +3286,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
pr_err("%-20s%016llx\n", "ghcb:", control->ghcb_gpa);
pr_err("%-20s%08x\n", "event_inj:", control->event_inj);
pr_err("%-20s%08x\n", "event_inj_err:", control->event_inj_err);
- pr_err("%-20s%lld\n", "virt_ext:", control->virt_ext);
+ pr_err("%-20s%lld\n", "misc_ctl2:", control->misc_ctl2);
pr_err("%-20s%016llx\n", "next_rip:", control->next_rip);
pr_err("%-20s%016llx\n", "avic_backing_page:", control->avic_backing_page);
pr_err("%-20s%016llx\n", "avic_logical_id:", control->avic_logical_id);
@@ -4268,7 +4268,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
* VM-Exit), as running with the host's DEBUGCTL can negatively affect
* guest state and can even be fatal, e.g. due to Bus Lock Detect.
*/
- if (!(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK) &&
+ if (!(svm->vmcb->control.misc_ctl2 & SVM_MISC_CTL2_LBR_CTL_ENABLE) &&
vcpu->arch.host_debugctl != svm->vmcb->save.dbgctl)
update_debugctlmsr(svm->vmcb->save.dbgctl);
@@ -4299,7 +4299,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
- if (!(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK) &&
+ if (!(svm->vmcb->control.misc_ctl2 & SVM_MISC_CTL2_LBR_CTL_ENABLE) &&
vcpu->arch.host_debugctl != svm->vmcb->save.dbgctl)
update_debugctlmsr(vcpu->arch.host_debugctl);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 980560a815868..26ba9472784eb 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -174,7 +174,7 @@ struct vmcb_ctrl_area_cached {
u32 event_inj_err;
u64 next_rip;
u64 nested_cr3;
- u64 virt_ext;
+ u64 misc_ctl2;
u32 clean;
u64 bus_lock_rip;
union {
diff --git a/tools/testing/selftests/kvm/include/x86/svm.h b/tools/testing/selftests/kvm/include/x86/svm.h
index 5d2bcce34c019..a3f4eadffeb46 100644
--- a/tools/testing/selftests/kvm/include/x86/svm.h
+++ b/tools/testing/selftests/kvm/include/x86/svm.h
@@ -104,7 +104,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u32 event_inj;
u32 event_inj_err;
u64 nested_cr3;
- u64 virt_ext;
+ u64 misc_ctl2;
u32 clean;
u32 reserved_5;
u64 next_rip;
@@ -156,9 +156,6 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define AVIC_ENABLE_SHIFT 31
#define AVIC_ENABLE_MASK (1 << AVIC_ENABLE_SHIFT)
-#define LBR_CTL_ENABLE_MASK BIT_ULL(0)
-#define VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK BIT_ULL(1)
-
#define SVM_INTERRUPT_SHADOW_MASK 1
#define SVM_IOIO_STR_SHIFT 2
@@ -179,6 +176,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define SVM_MISC_CTL_NP_ENABLE BIT(0)
#define SVM_MISC_CTL_SEV_ENABLE BIT(1)
+#define SVM_MISC_CTL2_LBR_CTL_ENABLE BIT_ULL(0)
+#define SVM_MISC_CTL2_V_VMLOAD_VMSAVE_ENABLE BIT_ULL(1)
+
struct __attribute__ ((__packed__)) vmcb_seg {
u16 selector;
u16 attrib;
--
2.51.2.1026.g39e6a42477-goog
* [PATCH 06/11] KVM: SVM: switch svm_copy_lbrs() to a macro
2025-11-04 19:59 [PATCH 00/11] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (4 preceding siblings ...)
2025-11-04 19:59 ` [PATCH 05/11] KVM: SVM: Rename vmcb->virt_ext to vmcb->misc_ctl2 Yosry Ahmed
@ 2025-11-04 19:59 ` Yosry Ahmed
2025-11-04 19:59 ` [PATCH 07/11] KVM: nSVM: Cache all used fields from VMCB12 Yosry Ahmed
` (4 subsequent siblings)
10 siblings, 0 replies; 17+ messages in thread
From: Yosry Ahmed @ 2025-11-04 19:59 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, kvm, linux-kernel, Yosry Ahmed
In preparation for using svm_copy_lbrs() with instances of both 'struct
vmcb_save_area' and 'struct vmcb_save_area_cached', make it a macro
instead. Pull the call to vmcb_mark_dirty() out to the callers.
Macros are generally not preferred compared to functions, mainly due to the
lack of type safety. However, in this case it seems like having a simple macro
copying a few fields is better than copy-pasting the same 5 lines of
code in 6 different places (soon to be 7), or creating 3 different
variants of the function.
On the bright side, pulling vmcb_mark_dirty() calls to the callers makes
it clear that in one case, vmcb_mark_dirty() was being called on VMCB12.
It is not architecturally defined for the CPU to clear arbitrary clean
bits, and it is not needed, so drop that one call.
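A sketch of the resulting call sites (the cached destination only becomes
possible once the next patch adds the LBR fields to vmcb_save_area_cached):

	/* vmcb -> vmcb, with the dirty marking now explicit at the call site */
	svm_copy_lbrs(&vmcb02->save, &vmcb01->save);
	vmcb_mark_dirty(vmcb02, VMCB_LBR);

	/* struct vmcb_save_area -> struct vmcb_save_area_cached also works,
	 * as the macro only relies on the field names being present
	 */
	svm_copy_lbrs(&svm->nested.save, &vmcb12->save);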
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
arch/x86/kvm/svm/nested.c | 11 +++++++----
arch/x86/kvm/svm/svm.c | 23 ++++++++---------------
arch/x86/kvm/svm/svm.h | 10 +++++++++-
3 files changed, 24 insertions(+), 20 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 986e6382dc4fa..81d7a0ed71392 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -726,12 +726,14 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
* Reserved bits of DEBUGCTL are ignored. Be consistent with
* svm_set_msr's definition of reserved bits.
*/
- svm_copy_lbrs(vmcb02, vmcb12);
+ svm_copy_lbrs(&vmcb02->save, &vmcb12->save);
+ vmcb_mark_dirty(vmcb02, VMCB_LBR);
vmcb02->save.dbgctl &= ~DEBUGCTL_RESERVED_BITS;
svm_update_lbrv(&svm->vcpu);
} else if (unlikely(vmcb01->control.misc_ctl2 & SVM_MISC_CTL2_LBR_CTL_ENABLE)) {
- svm_copy_lbrs(vmcb02, vmcb01);
+ svm_copy_lbrs(&vmcb02->save, &vmcb01->save);
+ vmcb_mark_dirty(vmcb02, VMCB_LBR);
}
}
@@ -1242,10 +1244,11 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
(svm->nested.ctl.misc_ctl2 & SVM_MISC_CTL2_LBR_CTL_ENABLE))) {
- svm_copy_lbrs(vmcb12, vmcb02);
+ svm_copy_lbrs(&vmcb12->save, &vmcb02->save);
svm_update_lbrv(vcpu);
} else if (unlikely(vmcb01->control.misc_ctl2 & SVM_MISC_CTL2_LBR_CTL_ENABLE)) {
- svm_copy_lbrs(vmcb01, vmcb02);
+ svm_copy_lbrs(&vmcb01->save, &vmcb02->save);
+ vmcb_mark_dirty(vmcb01, VMCB_LBR);
svm_update_lbrv(vcpu);
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 185f17ff2170b..07958dc7c62ba 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -795,17 +795,6 @@ static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
*/
}
-void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
-{
- to_vmcb->save.dbgctl = from_vmcb->save.dbgctl;
- to_vmcb->save.br_from = from_vmcb->save.br_from;
- to_vmcb->save.br_to = from_vmcb->save.br_to;
- to_vmcb->save.last_excp_from = from_vmcb->save.last_excp_from;
- to_vmcb->save.last_excp_to = from_vmcb->save.last_excp_to;
-
- vmcb_mark_dirty(to_vmcb, VMCB_LBR);
-}
-
void svm_enable_lbrv(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -814,8 +803,10 @@ void svm_enable_lbrv(struct kvm_vcpu *vcpu)
svm_recalc_lbr_msr_intercepts(vcpu);
/* Move the LBR msrs to the vmcb02 so that the guest can see them. */
- if (is_guest_mode(vcpu))
- svm_copy_lbrs(svm->vmcb, svm->vmcb01.ptr);
+ if (is_guest_mode(vcpu)) {
+ svm_copy_lbrs(&svm->vmcb->save, &svm->vmcb01.ptr->save);
+ vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
+ }
}
static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
@@ -830,8 +821,10 @@ static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
* Move the LBR msrs back to the vmcb01 to avoid copying them
* on nested guest entries.
*/
- if (is_guest_mode(vcpu))
- svm_copy_lbrs(svm->vmcb01.ptr, svm->vmcb);
+ if (is_guest_mode(vcpu)) {
+ svm_copy_lbrs(&svm->vmcb01.ptr->save, &svm->vmcb->save);
+ vmcb_mark_dirty(svm->vmcb01.ptr, VMCB_LBR);
+ }
}
static struct vmcb *svm_get_lbr_vmcb(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 26ba9472784eb..8577e35a7096a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -689,8 +689,16 @@ static inline void *svm_vcpu_alloc_msrpm(void)
return svm_alloc_permissions_map(MSRPM_SIZE, GFP_KERNEL_ACCOUNT);
}
+#define svm_copy_lbrs(to, from) \
+({ \
+ (to)->dbgctl = (from)->dbgctl; \
+ (to)->br_from = (from)->br_from; \
+ (to)->br_to = (from)->br_to; \
+ (to)->last_excp_from = (from)->last_excp_from; \
+ (to)->last_excp_to = (from)->last_excp_to; \
+})
+
void svm_vcpu_free_msrpm(void *msrpm);
-void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb);
void svm_enable_lbrv(struct kvm_vcpu *vcpu);
void svm_update_lbrv(struct kvm_vcpu *vcpu);
--
2.51.2.1026.g39e6a42477-goog
* [PATCH 07/11] KVM: nSVM: Cache all used fields from VMCB12
2025-11-04 19:59 [PATCH 00/11] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (5 preceding siblings ...)
2025-11-04 19:59 ` [PATCH 06/11] KVM: SVM: switch svm_copy_lbrs() to a macro Yosry Ahmed
@ 2025-11-04 19:59 ` Yosry Ahmed
2025-11-04 19:59 ` [PATCH 08/11] KVM: nSVM: Restrict mapping VMCB12 on nested VMRUN Yosry Ahmed
` (3 subsequent siblings)
10 siblings, 0 replies; 17+ messages in thread
From: Yosry Ahmed @ 2025-11-04 19:59 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, kvm, linux-kernel, Yosry Ahmed
Currently, most fields used from VMCB12 are cached in
svm->nested.{ctl/save}. This is mainly to avoid TOC-TOU bugs. However,
for the save area, only the fields used in the consistency checks (i.e.
nested_vmcb_check_save()) are cached. Other fields are read
directly from guest memory in nested_vmcb02_prepare_save().
While probably benign, this still makes it possible for TOC-TOU bugs to
happen. For example, RAX, RSP, and RIP are read twice, once to store in
VMCB02, and once to store in vcpu->arch.regs. It is possible for the
guest to modify the value between both reads, potentially causing nasty
bugs.
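A simplified sketch of the racy pattern (vmcb12 points at guest-owned memory
mapped via kvm_vcpu_map(), so another L1 vCPU can change it between the two
reads):

	kvm_rax_write(vcpu, vmcb12->save.rax);	/* read #1 from guest memory */
	...
	vmcb02->save.rax = vmcb12->save.rax;	/* read #2 may see a different value */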
Harden against such bugs by caching all the used save area fields in
svm->nested.save, and keep all accesses to the VMCB12 strictly in
nested_svm_vmrun(), for caching and early error injection. Subsequent changes
will further limit access to the VMCB12 in the nested VMRUN path.
Introduce vmcb12_is_dirty() to use with the cached control fields
instead of vmcb_is_dirty(), similar to vmcb12_is_intercept().
Opportunistically order the copies in __nested_copy_vmcb_save_to_cache()
by the order in which the fields are defined in struct vmcb_save_area.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
arch/x86/kvm/svm/nested.c | 116 ++++++++++++++++++++++----------------
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/svm/svm.h | 27 ++++++++-
3 files changed, 93 insertions(+), 52 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 81d7a0ed71392..901f6dc12b09f 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -507,19 +507,34 @@ void nested_copy_vmcb_control_to_cache(struct vcpu_svm *svm,
static void __nested_copy_vmcb_save_to_cache(struct vmcb_save_area_cached *to,
struct vmcb_save_area *from)
{
- /*
- * Copy only fields that are validated, as we need them
- * to avoid TOC/TOU races.
- */
+ to->es = from->es;
to->cs = from->cs;
+ to->ss = from->ss;
+ to->ds = from->ds;
+ to->gdtr = from->gdtr;
+ to->idtr = from->idtr;
+
+ to->cpl = from->cpl;
to->efer = from->efer;
- to->cr0 = from->cr0;
- to->cr3 = from->cr3;
to->cr4 = from->cr4;
-
- to->dr6 = from->dr6;
+ to->cr3 = from->cr3;
+ to->cr0 = from->cr0;
to->dr7 = from->dr7;
+ to->dr6 = from->dr6;
+
+ to->rflags = from->rflags;
+ to->rip = from->rip;
+ to->rsp = from->rsp;
+
+ to->s_cet = from->s_cet;
+ to->ssp = from->ssp;
+ to->isst_addr = from->isst_addr;
+
+ to->rax = from->rax;
+ to->cr2 = from->cr2;
+
+ svm_copy_lbrs(to, from);
}
void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
@@ -655,8 +670,10 @@ void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm)
svm->nested.vmcb02.ptr->save.g_pat = svm->vmcb01.ptr->save.g_pat;
}
-static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12)
+static void nested_vmcb02_prepare_save(struct vcpu_svm *svm)
{
+ struct vmcb_ctrl_area_cached *control = &svm->nested.ctl;
+ struct vmcb_save_area_cached *save = &svm->nested.save;
bool new_vmcb12 = false;
struct vmcb *vmcb01 = svm->vmcb01.ptr;
struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
@@ -672,49 +689,49 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
svm->nested.force_msr_bitmap_recalc = true;
}
- if (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_SEG))) {
- vmcb02->save.es = vmcb12->save.es;
- vmcb02->save.cs = vmcb12->save.cs;
- vmcb02->save.ss = vmcb12->save.ss;
- vmcb02->save.ds = vmcb12->save.ds;
- vmcb02->save.cpl = vmcb12->save.cpl;
+ if (unlikely(new_vmcb12 || vmcb12_is_dirty(control, VMCB_SEG))) {
+ vmcb02->save.es = save->es;
+ vmcb02->save.cs = save->cs;
+ vmcb02->save.ss = save->ss;
+ vmcb02->save.ds = save->ds;
+ vmcb02->save.cpl = save->cpl;
vmcb_mark_dirty(vmcb02, VMCB_SEG);
}
- if (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_DT))) {
- vmcb02->save.gdtr = vmcb12->save.gdtr;
- vmcb02->save.idtr = vmcb12->save.idtr;
+ if (unlikely(new_vmcb12 || vmcb12_is_dirty(control, VMCB_DT))) {
+ vmcb02->save.gdtr = save->gdtr;
+ vmcb02->save.idtr = save->idtr;
vmcb_mark_dirty(vmcb02, VMCB_DT);
}
if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
- (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_CET)))) {
- vmcb02->save.s_cet = vmcb12->save.s_cet;
- vmcb02->save.isst_addr = vmcb12->save.isst_addr;
- vmcb02->save.ssp = vmcb12->save.ssp;
+ (unlikely(new_vmcb12 || vmcb12_is_dirty(control, VMCB_CET)))) {
+ vmcb02->save.s_cet = save->s_cet;
+ vmcb02->save.isst_addr = save->isst_addr;
+ vmcb02->save.ssp = save->ssp;
vmcb_mark_dirty(vmcb02, VMCB_CET);
}
- kvm_set_rflags(vcpu, vmcb12->save.rflags | X86_EFLAGS_FIXED);
+ kvm_set_rflags(vcpu, save->rflags | X86_EFLAGS_FIXED);
svm_set_efer(vcpu, svm->nested.save.efer);
svm_set_cr0(vcpu, svm->nested.save.cr0);
svm_set_cr4(vcpu, svm->nested.save.cr4);
- svm->vcpu.arch.cr2 = vmcb12->save.cr2;
+ svm->vcpu.arch.cr2 = save->cr2;
- kvm_rax_write(vcpu, vmcb12->save.rax);
- kvm_rsp_write(vcpu, vmcb12->save.rsp);
- kvm_rip_write(vcpu, vmcb12->save.rip);
+ kvm_rax_write(vcpu, save->rax);
+ kvm_rsp_write(vcpu, save->rsp);
+ kvm_rip_write(vcpu, save->rip);
/* In case we don't even reach vcpu_run, the fields are not updated */
- vmcb02->save.rax = vmcb12->save.rax;
- vmcb02->save.rsp = vmcb12->save.rsp;
- vmcb02->save.rip = vmcb12->save.rip;
+ vmcb02->save.rax = save->rax;
+ vmcb02->save.rsp = save->rsp;
+ vmcb02->save.rip = save->rip;
/* These bits will be set properly on the first execution when new_vmc12 is true */
- if (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_DR))) {
+ if (unlikely(new_vmcb12 || vmcb12_is_dirty(control, VMCB_DR))) {
vmcb02->save.dr7 = svm->nested.save.dr7 | DR7_FIXED_1;
svm->vcpu.arch.dr6 = svm->nested.save.dr6 | DR6_ACTIVE_LOW;
vmcb_mark_dirty(vmcb02, VMCB_DR);
@@ -726,7 +743,7 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
* Reserved bits of DEBUGCTL are ignored. Be consistent with
* svm_set_msr's definition of reserved bits.
*/
- svm_copy_lbrs(&vmcb02->save, &vmcb12->save);
+ svm_copy_lbrs(&vmcb02->save, save);
vmcb_mark_dirty(vmcb02, VMCB_LBR);
vmcb02->save.dbgctl &= ~DEBUGCTL_RESERVED_BITS;
svm_update_lbrv(&svm->vcpu);
@@ -942,28 +959,29 @@ static void nested_svm_copy_common_state(struct vmcb *from_vmcb, struct vmcb *to
to_vmcb->save.spec_ctrl = from_vmcb->save.spec_ctrl;
}
-int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
- struct vmcb *vmcb12, bool from_vmrun)
+int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa, bool from_vmrun)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ struct vmcb_ctrl_area_cached *control = &svm->nested.ctl;
+ struct vmcb_save_area_cached *save = &svm->nested.save;
int ret;
trace_kvm_nested_vmenter(svm->vmcb->save.rip,
vmcb12_gpa,
- vmcb12->save.rip,
- vmcb12->control.int_ctl,
- vmcb12->control.event_inj,
- vmcb12->control.misc_ctl,
- vmcb12->control.nested_cr3,
- vmcb12->save.cr3,
+ save->rip,
+ control->int_ctl,
+ control->event_inj,
+ control->misc_ctl,
+ control->nested_cr3,
+ save->cr3,
KVM_ISA_SVM);
- trace_kvm_nested_intercepts(vmcb12->control.intercepts[INTERCEPT_CR] & 0xffff,
- vmcb12->control.intercepts[INTERCEPT_CR] >> 16,
- vmcb12->control.intercepts[INTERCEPT_EXCEPTION],
- vmcb12->control.intercepts[INTERCEPT_WORD3],
- vmcb12->control.intercepts[INTERCEPT_WORD4],
- vmcb12->control.intercepts[INTERCEPT_WORD5]);
+ trace_kvm_nested_intercepts(control->intercepts[INTERCEPT_CR] & 0xffff,
+ control->intercepts[INTERCEPT_CR] >> 16,
+ control->intercepts[INTERCEPT_EXCEPTION],
+ control->intercepts[INTERCEPT_WORD3],
+ control->intercepts[INTERCEPT_WORD4],
+ control->intercepts[INTERCEPT_WORD5]);
svm->nested.vmcb12_gpa = vmcb12_gpa;
@@ -973,8 +991,8 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr);
svm_switch_vmcb(svm, &svm->nested.vmcb02);
- nested_vmcb02_prepare_control(svm, vmcb12->save.rip, vmcb12->save.cs.base);
- nested_vmcb02_prepare_save(svm, vmcb12);
+ nested_vmcb02_prepare_control(svm, save->rip, save->cs.base);
+ nested_vmcb02_prepare_save(svm);
ret = nested_svm_load_cr3(&svm->vcpu, svm->nested.save.cr3,
nested_npt_enabled(svm), from_vmrun);
@@ -1063,7 +1081,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
svm->nested.nested_run_pending = 1;
- if (enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, true))
+ if (enter_svm_guest_mode(vcpu, vmcb12_gpa, true))
goto out_exit_err;
if (nested_svm_merge_msrpm(vcpu))
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 07958dc7c62ba..0062411d8f2be 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4762,7 +4762,7 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram)
vmcb12 = map.hva;
nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
- ret = enter_svm_guest_mode(vcpu, smram64->svm_guest_vmcb_gpa, vmcb12, false);
+ ret = enter_svm_guest_mode(vcpu, smram64->svm_guest_vmcb_gpa, false);
if (ret)
goto unmap_save;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 8577e35a7096a..70cb8985904b8 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -142,13 +142,32 @@ struct kvm_vmcb_info {
};
struct vmcb_save_area_cached {
+ struct vmcb_seg es;
struct vmcb_seg cs;
+ struct vmcb_seg ss;
+ struct vmcb_seg ds;
+ struct vmcb_seg gdtr;
+ struct vmcb_seg idtr;
+ u8 cpl;
u64 efer;
u64 cr4;
u64 cr3;
u64 cr0;
u64 dr7;
u64 dr6;
+ u64 rflags;
+ u64 rip;
+ u64 rsp;
+ u64 s_cet;
+ u64 ssp;
+ u64 isst_addr;
+ u64 rax;
+ u64 cr2;
+ u64 dbgctl;
+ u64 br_from;
+ u64 br_to;
+ u64 last_excp_from;
+ u64 last_excp_to;
};
struct vmcb_ctrl_area_cached {
@@ -422,6 +441,11 @@ static inline bool vmcb_is_dirty(struct vmcb *vmcb, int bit)
return !test_bit(bit, (unsigned long *)&vmcb->control.clean);
}
+static inline bool vmcb12_is_dirty(struct vmcb_ctrl_area_cached *control, int bit)
+{
+ return !test_bit(bit, (unsigned long *)&control->clean);
+}
+
static __always_inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
{
return container_of(vcpu, struct vcpu_svm, vcpu);
@@ -760,8 +784,7 @@ static inline bool nested_exit_on_nmi(struct vcpu_svm *svm)
int __init nested_svm_init_msrpm_merge_offsets(void);
-int enter_svm_guest_mode(struct kvm_vcpu *vcpu,
- u64 vmcb_gpa, struct vmcb *vmcb12, bool from_vmrun);
+int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb_gpa, bool from_vmrun);
void svm_leave_nested(struct kvm_vcpu *vcpu);
void svm_free_nested(struct vcpu_svm *svm);
int svm_allocate_nested(struct vcpu_svm *svm);
--
2.51.2.1026.g39e6a42477-goog
* [PATCH 08/11] KVM: nSVM: Restrict mapping VMCB12 on nested VMRUN
2025-11-04 19:59 [PATCH 00/11] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (6 preceding siblings ...)
2025-11-04 19:59 ` [PATCH 07/11] KVM: nSVM: Cache all used fields from VMCB12 Yosry Ahmed
@ 2025-11-04 19:59 ` Yosry Ahmed
2025-11-04 19:59 ` [PATCH 09/11] KVM: nSVM: Simplify nested_svm_vmrun() Yosry Ahmed
` (2 subsequent siblings)
10 siblings, 0 replies; 17+ messages in thread
From: Yosry Ahmed @ 2025-11-04 19:59 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, kvm, linux-kernel, Yosry Ahmed
All accesses to the VMCB12 in the guest memory are limited to
nested_svm_vmrun(). However, the VMCB12 remains mapped until the end of
the function execution. Unmapping right after the consistency checks is
possible, but it becomes easy-ish to introduce bugs where 'vmcb12' is
used after being unmapped.
Move all accesses to the VMCB12 into a new helper,
nested_svm_vmrun_read_vmcb12(), that maps the VMCB12,
caches the needed fields, performs consistency checks, and unmaps it.
This limits the scope of the VMCB12 mapping appropriately. It also
slightly simplifies the cleanup path of nested_svm_vmrun().
nested_svm_vmrun_read_vmcb12() returns -1 if the consistency checks
fail, maintaining the current behavior of skipping the instruction and
unmapping the VMCB12 (although in the opposite order).
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
arch/x86/kvm/svm/nested.c | 59 ++++++++++++++++++++++-----------------
1 file changed, 34 insertions(+), 25 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 901f6dc12b09f..8d5165df52f57 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1012,12 +1012,39 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa, bool from_vmrun)
return 0;
}
+static int nested_svm_vmrun_read_vmcb12(struct kvm_vcpu *vcpu, u64 vmcb12_gpa)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ struct kvm_host_map map;
+ struct vmcb *vmcb12;
+ int ret;
+
+ ret = kvm_vcpu_map(vcpu, gpa_to_gfn(vmcb12_gpa), &map);
+ if (ret)
+ return ret;
+
+ vmcb12 = map.hva;
+
+ nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
+ nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
+
+ if (!nested_vmcb_check_save(vcpu) ||
+ !nested_vmcb_check_controls(vcpu)) {
+ vmcb12->control.exit_code = SVM_EXIT_ERR;
+ vmcb12->control.exit_code_hi = 0;
+ vmcb12->control.exit_info_1 = 0;
+ vmcb12->control.exit_info_2 = 0;
+ ret = -1;
+ }
+
+ kvm_vcpu_unmap(vcpu, &map);
+ return ret;
+}
+
int nested_svm_vmrun(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
int ret;
- struct vmcb *vmcb12;
- struct kvm_host_map map;
u64 vmcb12_gpa;
struct vmcb *vmcb01 = svm->vmcb01.ptr;
@@ -1038,8 +1065,11 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
return ret;
}
+ if (WARN_ON_ONCE(!svm->nested.initialized))
+ return -EINVAL;
+
vmcb12_gpa = svm->vmcb->save.rax;
- ret = kvm_vcpu_map(vcpu, gpa_to_gfn(vmcb12_gpa), &map);
+ ret = nested_svm_vmrun_read_vmcb12(vcpu, vmcb12_gpa);
if (ret == -EINVAL) {
kvm_inject_gp(vcpu, 0);
return 1;
@@ -1049,23 +1079,6 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
ret = kvm_skip_emulated_instruction(vcpu);
- vmcb12 = map.hva;
-
- if (WARN_ON_ONCE(!svm->nested.initialized))
- return -EINVAL;
-
- nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
- nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
-
- if (!nested_vmcb_check_save(vcpu) ||
- !nested_vmcb_check_controls(vcpu)) {
- vmcb12->control.exit_code = SVM_EXIT_ERR;
- vmcb12->control.exit_code_hi = 0;
- vmcb12->control.exit_info_1 = 0;
- vmcb12->control.exit_info_2 = 0;
- goto out;
- }
-
/*
* Since vmcb01 is not in use, we can use it to store some of the L1
* state.
@@ -1085,7 +1098,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
goto out_exit_err;
if (nested_svm_merge_msrpm(vcpu))
- goto out;
+ return ret;
out_exit_err:
svm->nested.nested_run_pending = 0;
@@ -1098,10 +1111,6 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
svm->vmcb->control.exit_info_2 = 0;
nested_svm_vmexit(svm);
-
-out:
- kvm_vcpu_unmap(vcpu, &map);
-
return ret;
}
--
2.51.2.1026.g39e6a42477-goog
* [PATCH 09/11] KVM: nSVM: Simplify nested_svm_vmrun()
2025-11-04 19:59 [PATCH 00/11] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (7 preceding siblings ...)
2025-11-04 19:59 ` [PATCH 08/11] KVM: nSVM: Restrict mapping VMCB12 on nested VMRUN Yosry Ahmed
@ 2025-11-04 19:59 ` Yosry Ahmed
2025-11-04 19:59 ` [PATCH 10/11] KVM: nSVM: Sanitize control fields copied from VMCB12 Yosry Ahmed
2025-11-04 19:59 ` [PATCH 11/11] KVM: nSVM: Only copy NP_ENABLE from VMCB01's nested_ctl Yosry Ahmed
10 siblings, 0 replies; 17+ messages in thread
From: Yosry Ahmed @ 2025-11-04 19:59 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, kvm, linux-kernel, Yosry Ahmed
Call nested_svm_merge_msrpm() from enter_svm_guest_mode() if called from
the VMRUN path, instead of making the call in nested_svm_vmrun(). This
simplifies the flow of nested_svm_vmrun() and removes all jumps to
cleanup labels.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
arch/x86/kvm/svm/nested.c | 28 +++++++++++++---------------
1 file changed, 13 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 8d5165df52f57..3c0d8990ecd11 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1009,6 +1009,9 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa, bool from_vmrun)
nested_svm_hv_update_vm_vp_ids(vcpu);
+ if (from_vmrun && !nested_svm_merge_msrpm(vcpu))
+ return -1;
+
return 0;
}
@@ -1094,23 +1097,18 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
svm->nested.nested_run_pending = 1;
- if (enter_svm_guest_mode(vcpu, vmcb12_gpa, true))
- goto out_exit_err;
-
- if (nested_svm_merge_msrpm(vcpu))
- return ret;
-
-out_exit_err:
- svm->nested.nested_run_pending = 0;
- svm->nmi_l1_to_l2 = false;
- svm->soft_int_injected = false;
+ if (enter_svm_guest_mode(vcpu, vmcb12_gpa, true)) {
+ svm->nested.nested_run_pending = 0;
+ svm->nmi_l1_to_l2 = false;
+ svm->soft_int_injected = false;
- svm->vmcb->control.exit_code = SVM_EXIT_ERR;
- svm->vmcb->control.exit_code_hi = 0;
- svm->vmcb->control.exit_info_1 = 0;
- svm->vmcb->control.exit_info_2 = 0;
+ svm->vmcb->control.exit_code = SVM_EXIT_ERR;
+ svm->vmcb->control.exit_code_hi = 0;
+ svm->vmcb->control.exit_info_1 = 0;
+ svm->vmcb->control.exit_info_2 = 0;
- nested_svm_vmexit(svm);
+ nested_svm_vmexit(svm);
+ }
return ret;
}
--
2.51.2.1026.g39e6a42477-goog
* [PATCH 10/11] KVM: nSVM: Sanitize control fields copied from VMCB12
2025-11-04 19:59 [PATCH 00/11] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (8 preceding siblings ...)
2025-11-04 19:59 ` [PATCH 09/11] KVM: nSVM: Simplify nested_svm_vmrun() Yosry Ahmed
@ 2025-11-04 19:59 ` Yosry Ahmed
2025-11-04 19:59 ` [PATCH 11/11] KVM: nSVM: Only copy NP_ENABLE from VMCB01's nested_ctl Yosry Ahmed
10 siblings, 0 replies; 17+ messages in thread
From: Yosry Ahmed @ 2025-11-04 19:59 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, kvm, linux-kernel, Yosry Ahmed
Make sure all fields used from VMCB12 when creating VMCB02 are
sanitized, such that no unhandled or reserved bits end up in VMCB02.
The following control fields are read from VMCB12 and have bits that are
either reserved or not handled/advertised by KVM: tlb_ctl, int_ctl,
int_state, int_vector, event_inj, misc_ctl, and misc_ctl2.
The following fields do not require any extra sanitizing:
- int_ctl: bits from VMCB12 are copied bit-by-bit as needed.
- misc_ctl: only used in consistency checks (particularly NP_ENABLE).
- misc_ctl2: bits from VMCB12 are copied bit-by-bit as needed.
For the remaining fields, make sure only defined bits are copied from
VMCB12 by defining appropriate masks where needed. The only exception is
tlb_ctl, which is unused, so remove it.
Opportunistically move some existing definitions in svm.h around so
that they are ordered by bit position, and clean up how the lower bits
of {io,msr}pm_base_pa are masked off in
__nested_copy_vmcb_control_to_cache() by using PAGE_MASK. Also, expand
the comment about the ASID being copied only for consistency checks.
Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
arch/x86/include/asm/svm.h | 11 ++++++++---
arch/x86/kvm/svm/nested.c | 26 ++++++++++++++------------
arch/x86/kvm/svm/svm.h | 1 -
3 files changed, 22 insertions(+), 16 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 5f3781587dd01..f583871dcc3b5 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -213,11 +213,13 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define V_NMI_ENABLE_SHIFT 26
#define V_NMI_ENABLE_MASK (1 << V_NMI_ENABLE_SHIFT)
+#define X2APIC_MODE_SHIFT 30
+#define X2APIC_MODE_MASK (1 << X2APIC_MODE_SHIFT)
+
#define AVIC_ENABLE_SHIFT 31
#define AVIC_ENABLE_MASK (1 << AVIC_ENABLE_SHIFT)
-#define X2APIC_MODE_SHIFT 30
-#define X2APIC_MODE_MASK (1 << X2APIC_MODE_SHIFT)
+#define SVM_INT_VECTOR_MASK (0xff)
#define SVM_INTERRUPT_SHADOW_MASK BIT_ULL(0)
#define SVM_GUEST_INTERRUPT_MASK BIT_ULL(1)
@@ -629,8 +631,11 @@ static inline void __unused_size_checks(void)
#define SVM_EVTINJ_TYPE_EXEPT (3 << SVM_EVTINJ_TYPE_SHIFT)
#define SVM_EVTINJ_TYPE_SOFT (4 << SVM_EVTINJ_TYPE_SHIFT)
-#define SVM_EVTINJ_VALID (1 << 31)
#define SVM_EVTINJ_VALID_ERR (1 << 11)
+#define SVM_EVTINJ_VALID (1 << 31)
+
+#define SVM_EVTINJ_RESERVED_BITS ~(SVM_EVTINJ_VEC_MASK | SVM_EVTINJ_TYPE_MASK | \
+ SVM_EVTINJ_VALID_ERR | SVM_EVTINJ_VALID)
/* Only valid exceptions (and not NMIs) are allowed for SVM_EVTINJ_TYPE_EXEPT */
#define SVM_EVNTINJ_INVALID_EXEPTS (NMI_VECTOR | BIT_ULL(9) | BIT_ULL(15) | \
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 3c0d8990ecd11..a223887e5b78d 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -461,10 +461,11 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
for (i = 0; i < MAX_INTERCEPT; i++)
to->intercepts[i] = from->intercepts[i];
- to->iopm_base_pa = from->iopm_base_pa;
- to->msrpm_base_pa = from->msrpm_base_pa;
+ /* Lower bits of IOPM_BASE_PA and MSRPM_BASE_PA are ignored */
+ to->iopm_base_pa = from->iopm_base_pa & PAGE_MASK;
+ to->msrpm_base_pa = from->msrpm_base_pa & PAGE_MASK;
+
to->tsc_offset = from->tsc_offset;
- to->tlb_ctl = from->tlb_ctl;
to->int_ctl = from->int_ctl;
to->int_vector = from->int_vector;
to->int_state = from->int_state;
@@ -474,19 +475,21 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
to->exit_info_2 = from->exit_info_2;
to->exit_int_info = from->exit_int_info;
to->exit_int_info_err = from->exit_int_info_err;
- to->misc_ctl = from->misc_ctl;
+ to->misc_ctl = from->misc_ctl;
to->event_inj = from->event_inj;
to->event_inj_err = from->event_inj_err;
to->next_rip = from->next_rip;
to->nested_cr3 = from->nested_cr3;
- to->misc_ctl2 = from->misc_ctl2;
+ to->misc_ctl2 = from->misc_ctl2;
to->pause_filter_count = from->pause_filter_count;
to->pause_filter_thresh = from->pause_filter_thresh;
- /* Copy asid here because nested_vmcb_check_controls will check it. */
+ /*
+ * Copy asid here because nested_vmcb_check_controls() will check it.
+ * The ASID could be invalid, or conflict with another VM's ASID, so it
+ * should never be used directly to run L2.
+ */
to->asid = from->asid;
- to->msrpm_base_pa &= ~0x0fffULL;
- to->iopm_base_pa &= ~0x0fffULL;
#ifdef CONFIG_KVM_HYPERV
/* Hyper-V extensions (Enlightened VMCB) */
@@ -875,9 +878,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
(svm->nested.ctl.int_ctl & int_ctl_vmcb12_bits) |
(vmcb01->control.int_ctl & int_ctl_vmcb01_bits);
- vmcb02->control.int_vector = svm->nested.ctl.int_vector;
- vmcb02->control.int_state = svm->nested.ctl.int_state;
- vmcb02->control.event_inj = svm->nested.ctl.event_inj;
+ vmcb02->control.int_vector = svm->nested.ctl.int_vector & SVM_INT_VECTOR_MASK;
+ vmcb02->control.int_state = svm->nested.ctl.int_state & SVM_INTERRUPT_SHADOW_MASK;
+ vmcb02->control.event_inj = svm->nested.ctl.event_inj & ~SVM_EVTINJ_RESERVED_BITS;
vmcb02->control.event_inj_err = svm->nested.ctl.event_inj_err;
/*
@@ -1760,7 +1763,6 @@ static void nested_copy_vmcb_cache_to_control(struct vmcb_control_area *dst,
dst->msrpm_base_pa = from->msrpm_base_pa;
dst->tsc_offset = from->tsc_offset;
dst->asid = from->asid;
- dst->tlb_ctl = from->tlb_ctl;
dst->int_ctl = from->int_ctl;
dst->int_vector = from->int_vector;
dst->int_state = from->int_state;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 70cb8985904b8..23e901cec8799 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -178,7 +178,6 @@ struct vmcb_ctrl_area_cached {
u64 msrpm_base_pa;
u64 tsc_offset;
u32 asid;
- u8 tlb_ctl;
u32 int_ctl;
u32 int_vector;
u32 int_state;
--
2.51.2.1026.g39e6a42477-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 11/11] KVM: nSVM: Only copy NP_ENABLE from VMCB01's nested_ctl
2025-11-04 19:59 [PATCH 00/11] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
` (9 preceding siblings ...)
2025-11-04 19:59 ` [PATCH 10/11] KVM: nSVM: Sanitize control fields copied from VMCB12 Yosry Ahmed
@ 2025-11-04 19:59 ` Yosry Ahmed
10 siblings, 0 replies; 17+ messages in thread
From: Yosry Ahmed @ 2025-11-04 19:59 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, kvm, linux-kernel, Yosry Ahmed
The 'misc_ctl' field in VMCB02 is taken as-is from VMCB01. However, the
only bit that needs to be copied is NP_ENABLE. This is currently a nop
because the other bits are only relevant for SEV guests, which do not
support nested virtualization.
Nonetheless, this hardens against future bugs if/when other bits are
set for L1 but should not be set for L2.
Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
arch/x86/kvm/svm/nested.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index a223887e5b78d..81d8b10e26cdf 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -821,8 +821,16 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
V_NMI_BLOCKING_MASK);
}
- /* Copied from vmcb01. msrpm_base can be overwritten later. */
- vmcb02->control.misc_ctl = vmcb01->control.misc_ctl;
+ /*
+ * Copied from vmcb01. msrpm_base can be overwritten later.
+ *
+ * NP_ENABLE in vmcb12 is only used for consistency checks. If L1
+ * enables NPTs, KVM shadows L1's NPTs and uses those to run L2. If L1
+ * disables NPT, KVM runs L2 with the same NPTs used to run L1. For the
+ * latter, L1 runs L2 with shadow page tables that translate L2 GVAs to
+ * L1 GPAs, so the same NPTs can be used for L1 and L2.
+ */
+ vmcb02->control.misc_ctl = vmcb01->control.misc_ctl & SVM_MISC_CTL_NP_ENABLE;
vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
vmcb02->control.msrpm_base_pa = vmcb01->control.msrpm_base_pa;
vmcb_mark_dirty(vmcb02, VMCB_PERM_MAP);
--
2.51.2.1026.g39e6a42477-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 03/11] KVM: nSVM: Add missing consistency check for event_inj
2025-11-04 19:59 ` [PATCH 03/11] KVM: nSVM: Add missing consistency check for event_inj Yosry Ahmed
@ 2025-11-05 18:48 ` Sean Christopherson
2025-11-05 19:29 ` Yosry Ahmed
0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2025-11-05 18:48 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, Jim Mattson, kvm, linux-kernel
On Tue, Nov 04, 2025, Yosry Ahmed wrote:
> According to the APM Volume #2, 15.20 (24593—Rev. 3.42—March 2024):
>
> VMRUN exits with VMEXIT_INVALID error code if either:
> • Reserved values of TYPE have been specified, or
> • TYPE = 3 (exception) has been specified with a vector that does not
> correspond to an exception (this includes vector 2, which is an NMI,
> not an exception).
>
> Add the missing consistency checks to KVM. For the second point, inject
> VMEXIT_INVALID if the vector is anything but the vectors defined by the
> APM for exceptions. Reserved vectors are also considered invalid, which
> matches the HW behavior.
Ugh. Strictly speaking, that means KVM needs to match the capabilities of the
virtual CPU. E.g. if the virtual CPU predates SEV-ES, then #VC should be reserved
from the guest's perspective.
> Vector 9 (i.e. #CSO) is considered invalid because it is reserved on modern
> CPUs, and according to LLMs no CPUs exist supporting SVM and producing #CSOs.
>
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
> arch/x86/include/asm/svm.h | 5 +++++
> arch/x86/kvm/svm/nested.c | 33 +++++++++++++++++++++++++++++++++
> 2 files changed, 38 insertions(+)
>
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index e69b6d0dedcf0..3a9441a8954f3 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -633,6 +633,11 @@ static inline void __unused_size_checks(void)
> #define SVM_EVTINJ_VALID (1 << 31)
> #define SVM_EVTINJ_VALID_ERR (1 << 11)
>
> +/* Only valid exceptions (and not NMIs) are allowed for SVM_EVTINJ_TYPE_EXEPT */
> +#define SVM_EVNTINJ_INVALID_EXEPTS (NMI_VECTOR | BIT_ULL(9) | BIT_ULL(15) | \
> + BIT_ULL(20) | GENMASK_ULL(27, 22) | \
> + BIT_ULL(31))
As above, hardcoding this won't work. E.g. if a VM is migrated from a CPU where
vector X is reserved to a CPU where vector X is valid, then the VM will observe
a change in behavior.
Even if we're ok being overly permissive today (e.g. by taking an erratum), this
will create problems in the future when one of the reserved vectors is defined,
at which point we'll end up changing guest-visible behavior (and will have to
take another erratum, or maybe define the erratum to be that KVM straight up
doesn't enforce this correctly?)
And if we do throw in the towel and don't try to enforce this, we'll still want
a safeguard against this becoming stale, e.g. when KVM adds support for new
feature XYZ that comes with a new vector.
Off the cuff, the best idea I have is to define the positive set of vectors
somewhere common with a static assert, and then invert that. E.g. maybe something
shared with kvm_trace_sym_exc()?
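Completely untested, and KVM_KNOWN_EXCEPTION_VECTORS is a made-up name
just to sketch the "positive set, then invert" idea, but e.g.:

	/* Positive set of exception vectors KVM knows about. */
	#define KVM_KNOWN_EXCEPTION_VECTORS				\
		(BIT(DE_VECTOR) | BIT(DB_VECTOR) | BIT(BP_VECTOR) |	\
		 BIT(OF_VECTOR) | BIT(BR_VECTOR) | BIT(UD_VECTOR) |	\
		 BIT(NM_VECTOR) | BIT(DF_VECTOR) | BIT(TS_VECTOR) |	\
		 BIT(NP_VECTOR) | BIT(SS_VECTOR) | BIT(GP_VECTOR) |	\
		 BIT(PF_VECTOR) | BIT(MF_VECTOR) | BIT(AC_VECTOR) |	\
		 BIT(MC_VECTOR) | BIT(XM_VECTOR) | BIT(VE_VECTOR) |	\
		 BIT(CP_VECTOR) | BIT(HV_VECTOR) | BIT(VC_VECTOR) |	\
		 BIT(SX_VECTOR))

	/* Anything not in the positive set (including NMI) is invalid for TYPE = exception. */
	#define SVM_EVNTINJ_INVALID_EXEPTS	(~KVM_KNOWN_EXCEPTION_VECTORS)

with a static assert (or a shared x-macro) tying that list to the
vectors in kvm_trace_sym_exc so it can't silently go stale.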
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 01/11] KVM: nSVM: Fix consistency checks for NP_ENABLE
2025-11-04 19:59 ` [PATCH 01/11] KVM: nSVM: Fix consistency checks for NP_ENABLE Yosry Ahmed
@ 2025-11-05 18:52 ` Yosry Ahmed
0 siblings, 0 replies; 17+ messages in thread
From: Yosry Ahmed @ 2025-11-05 18:52 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm, linux-kernel, stable
On Tue, Nov 04, 2025 at 07:59:39PM +0000, Yosry Ahmed wrote:
> KVM currenty fails a nested VMRUN and injects VMEXIT_INVALID (aka
> SVM_EXIT_ERR) if L1 sets NP_ENABLE and the host does not support NPTs.
> On first glance, it seems like the check should actually be for
> guest_cpu_cap_has(X86_FEATURE_NPT) instead, as it is possible for the
> host to support NPTs but the guest CPUID to not advertise it.
>
> However, the consistency check is not architectural to begin with. The
> APM does not mention VMEXIT_INVALID if NP_ENABLE is set on a processor
> that does not have X86_FEATURE_NPT. Hence, NP_ENABLE should be ignored
> if X86_FEATURE_NPT is not available for L1. Apart from the consistency
> check, this is currently the case because NP_ENABLE is actually copied
> from VMCB01 to VMCB02, not from VMCB12.
>
> On the other hand, the APM does mention two other consistency checks for
> NP_ENABLE, both of which are missing (paraphrased):
>
> In Volume #2, 15.25.3 (24593—Rev. 3.42—March 2024):
>
> If VMRUN is executed with hCR0.PG cleared to zero and NP_ENABLE set to
> 1, VMRUN terminates with #VMEXIT(VMEXIT_INVALID)
>
> In Volume #2, 15.25.4 (24593—Rev. 3.42—March 2024):
>
> When VMRUN is executed with nested paging enabled (NP_ENABLE = 1), the
> following conditions are considered illegal state combinations, in
> addition to those mentioned in “Canonicalization and Consistency
> Checks”:
> • Any MBZ bit of nCR3 is set.
> • Any G_PAT.PA field has an unsupported type encoding or any
> reserved field in G_PAT has a nonzero value.
>
> Replace the existing consistency check with consistency checks on
> hCR0.PG and nCR3. The G_PAT consistency check will be addressed
> separately.
>
> Pass L1's CR0 to __nested_vmcb_check_controls(). In
> nested_vmcb_check_controls(), L1's CR0 is available through
> kvm_read_cr0(), as vcpu->arch.cr0 is not updated to L2's CR0 until later
> through nested_vmcb02_prepare_save() -> svm_set_cr0().
>
> In svm_set_nested_state(), L1's CR0 is available in the captured save
> area, as svm_get_nested_state() captures L1's save area when running L2,
> and L1's CR0 is stashed in VMCB01 on nested VMRUN (in
> nested_svm_vmrun()).
>
> Fixes: 4b16184c1cca ("KVM: SVM: Initialize Nested Nested MMU context on VMRUN")
> Cc: stable@vger.kernel.org
>
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
> arch/x86/kvm/svm/nested.c | 21 ++++++++++++++++-----
> arch/x86/kvm/svm/svm.h | 3 ++-
> 2 files changed, 18 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 83de3456df708..9a534f04bdc83 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -325,7 +325,8 @@ static bool nested_svm_check_bitmap_pa(struct kvm_vcpu *vcpu, u64 pa, u32 size)
> }
>
> static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
> - struct vmcb_ctrl_area_cached *control)
> + struct vmcb_ctrl_area_cached *control,
> + unsigned long l1_cr0)
> {
> if (CC(!vmcb12_is_intercept(control, INTERCEPT_VMRUN)))
> return false;
> @@ -333,8 +334,12 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
> if (CC(control->asid == 0))
> return false;
>
> - if (CC((control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) && !npt_enabled))
> - return false;
> + if (control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) {
I think this should actually be nested_npt_enabled(), because we
shouldn't do these consistency checks if NPT is not supported on the
vCPU at all (which was kinda the point).
> + if (CC(!kvm_vcpu_is_legal_gpa(vcpu, control->nested_cr3)))
> + return false;
> + if (CC(!(l1_cr0 & X86_CR0_PG)))
> + return false;
> + }
>
> if (CC(!nested_svm_check_bitmap_pa(vcpu, control->msrpm_base_pa,
> MSRPM_SIZE)))
> @@ -400,7 +405,12 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu)
> struct vcpu_svm *svm = to_svm(vcpu);
> struct vmcb_ctrl_area_cached *ctl = &svm->nested.ctl;
>
> - return __nested_vmcb_check_controls(vcpu, ctl);
> + /*
> + * Make sure we did not enter guest mode yet, in which case
> + * kvm_read_cr0() could return L2's CR0.
> + */
> + WARN_ON_ONCE(is_guest_mode(vcpu));
> + return __nested_vmcb_check_controls(vcpu, ctl, kvm_read_cr0(vcpu));
> }
>
> static
> @@ -1832,7 +1842,8 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
>
> ret = -EINVAL;
> __nested_copy_vmcb_control_to_cache(vcpu, &ctl_cached, ctl);
> - if (!__nested_vmcb_check_controls(vcpu, &ctl_cached))
> + /* 'save' contains L1 state saved from before VMRUN */
> + if (!__nested_vmcb_check_controls(vcpu, &ctl_cached, save->cr0))
> goto out_free;
>
> /*
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 6765a5e433cea..0a2908e22d746 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -552,7 +552,8 @@ static inline bool gif_set(struct vcpu_svm *svm)
>
> static inline bool nested_npt_enabled(struct vcpu_svm *svm)
> {
> - return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> + return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_NPT) &&
> + svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> }
>
> static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
> --
> 2.51.2.1026.g39e6a42477-goog
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 03/11] KVM: nSVM: Add missing consistency check for event_inj
2025-11-05 18:48 ` Sean Christopherson
@ 2025-11-05 19:29 ` Yosry Ahmed
2025-11-06 1:17 ` Sean Christopherson
0 siblings, 1 reply; 17+ messages in thread
From: Yosry Ahmed @ 2025-11-05 19:29 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm, linux-kernel
On Wed, Nov 05, 2025 at 10:48:28AM -0800, Sean Christopherson wrote:
> On Tue, Nov 04, 2025, Yosry Ahmed wrote:
> > According to the APM Volume #2, 15.20 (24593—Rev. 3.42—March 2024):
> >
> > VMRUN exits with VMEXIT_INVALID error code if either:
> > • Reserved values of TYPE have been specified, or
> > • TYPE = 3 (exception) has been specified with a vector that does not
> > correspond to an exception (this includes vector 2, which is an NMI,
> > not an exception).
> >
> > Add the missing consistency checks to KVM. For the second point, inject
> > VMEXIT_INVALID if the vector is anything but the vectors defined by the
> > APM for exceptions. Reserved vectors are also considered invalid, which
> > matches the HW behavior.
>
> Ugh. Strictly speaking, that means KVM needs to match the capabilities of the
> virtual CPU. E.g. if the virtual CPU predates SEV-ES, then #VC should be reserved
> from the guest's perspective.
>
> > Vector 9 (i.e. #CSO) is considered invalid because it is reserved on modern
> > CPUs, and according to LLMs no CPUs exist supporting SVM and producing #CSOs.
> >
> > Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> > ---
> > arch/x86/include/asm/svm.h | 5 +++++
> > arch/x86/kvm/svm/nested.c | 33 +++++++++++++++++++++++++++++++++
> > 2 files changed, 38 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> > index e69b6d0dedcf0..3a9441a8954f3 100644
> > --- a/arch/x86/include/asm/svm.h
> > +++ b/arch/x86/include/asm/svm.h
> > @@ -633,6 +633,11 @@ static inline void __unused_size_checks(void)
> > #define SVM_EVTINJ_VALID (1 << 31)
> > #define SVM_EVTINJ_VALID_ERR (1 << 11)
> >
> > +/* Only valid exceptions (and not NMIs) are allowed for SVM_EVTINJ_TYPE_EXEPT */
> > +#define SVM_EVNTINJ_INVALID_EXEPTS (NMI_VECTOR | BIT_ULL(9) | BIT_ULL(15) | \
> > + BIT_ULL(20) | GENMASK_ULL(27, 22) | \
> > + BIT_ULL(31))
>
> As above, hardcoding this won't work. E.g. if a VM is migrated from a CPU where
> vector X is reserved to a CPU where vector X is valid, then the VM will observe
> a change in behavior.
>
> Even if we're ok being overly permissive today (e.g. by taking an erratum), this
> will create problems in the future when one of the reserved vectors is defined,
> at which point we'll end up changing guest-visible behavior (and will have to
> take another erratum, or maybe define the erratum to be that KVM straight up
> doesn't enforce this correctly?)
>
> And if we do throw in the towel and don't try to enforce this, we'll still want
> a safeguard against this becoming stale, e.g. when KVM adds support for new
> feature XYZ that comes with a new vector.
>
> Off the cuff, the best idea I have is to define the positive set of vectors
> somewhere common with a static assert, and then invert that. E.g. maybe something
> shared with kvm_trace_sym_exc()?
Do you mean define the positive set of vectors dynamically based on the
vCPU caps? Like a helper returning a dynamic bitmask instead of
SVM_EVNTINJ_INVALID_EXEPTS?
If we'll reuse that for kvm_trace_sym_exc() it will need more work, but
I don't see why we need a dynamic list for kvm_trace_sym_exc().
So my best guess is that I didn't really understand your suggestion :)
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 03/11] KVM: nSVM: Add missing consistency check for event_inj
2025-11-05 19:29 ` Yosry Ahmed
@ 2025-11-06 1:17 ` Sean Christopherson
2025-11-08 2:29 ` Yosry Ahmed
0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2025-11-06 1:17 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, Jim Mattson, kvm, linux-kernel
On Wed, Nov 05, 2025, Yosry Ahmed wrote:
> On Wed, Nov 05, 2025 at 10:48:28AM -0800, Sean Christopherson wrote:
> > On Tue, Nov 04, 2025, Yosry Ahmed wrote:
> > > According to the APM Volume #2, 15.20 (24593—Rev. 3.42—March 2024):
> > >
> > > VMRUN exits with VMEXIT_INVALID error code if either:
> > > • Reserved values of TYPE have been specified, or
> > > • TYPE = 3 (exception) has been specified with a vector that does not
> > > correspond to an exception (this includes vector 2, which is an NMI,
> > > not an exception).
> > >
> > > Add the missing consistency checks to KVM. For the second point, inject
> > > VMEXIT_INVALID if the vector is anything but the vectors defined by the
> > > APM for exceptions. Reserved vectors are also considered invalid, which
> > > matches the HW behavior.
> >
> > Ugh. Strictly speaking, that means KVM needs to match the capabilities of the
> > virtual CPU. E.g. if the virtual CPU predates SEV-ES, then #VC should be reserved
> > from the guest's perspective.
> >
> > > Vector 9 (i.e. #CSO) is considered invalid because it is reserved on modern
> > > CPUs, and according to LLMs no CPUs exist supporting SVM and producing #CSOs.
> > >
> > > Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> > > ---
> > > arch/x86/include/asm/svm.h | 5 +++++
> > > arch/x86/kvm/svm/nested.c | 33 +++++++++++++++++++++++++++++++++
> > > 2 files changed, 38 insertions(+)
> > >
> > > diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> > > index e69b6d0dedcf0..3a9441a8954f3 100644
> > > --- a/arch/x86/include/asm/svm.h
> > > +++ b/arch/x86/include/asm/svm.h
> > > @@ -633,6 +633,11 @@ static inline void __unused_size_checks(void)
> > > #define SVM_EVTINJ_VALID (1 << 31)
> > > #define SVM_EVTINJ_VALID_ERR (1 << 11)
> > >
> > > +/* Only valid exceptions (and not NMIs) are allowed for SVM_EVTINJ_TYPE_EXEPT */
> > > +#define SVM_EVNTINJ_INVALID_EXEPTS (NMI_VECTOR | BIT_ULL(9) | BIT_ULL(15) | \
> > > + BIT_ULL(20) | GENMASK_ULL(27, 22) | \
> > > + BIT_ULL(31))
> >
> > As above, hardcoding this won't work. E.g. if a VM is migrated from a CPU where
> > vector X is reserved to a CPU where vector X is valid, then the VM will observe
> > a change in behavior.
> >
> > Even if we're ok being overly permissive today (e.g. by taking an erratum), this
> > will create problems in the future when one of the reserved vectors is defined,
> > at which point we'll end up changing guest-visible behavior (and will have to
> > take another erratum, or maybe define the erratum to be that KVM straight up
> > doesn't enforce this correctly?)
> >
> > And if we do throw in the towel and don't try to enforce this, we'll still want
> > a safeguard against this becoming stale, e.g. when KVM adds support for new
> > feature XYZ that comes with a new vector.
> >
> > Off the cuff, the best idea I have is to define the positive set of vectors
> > somewhere common with a static assert, and then invert that. E.g. maybe something
> > shared with kvm_trace_sym_exc()?
>
> Do you mean define the positive set of vectors dynamically based on the
> vCPU caps? Like a helper returning a dynamic bitmask instead of
> SVM_EVNTINJ_INVALID_EXEPTS?
Ya, that would be option #1, though I'm not entirely sure it's a good option.
The validity of vectors isn't architecturally tied to the existence of any
particular feature, at least not explicitly. For the "newer" vectors, i.e. the
ones that we can treat as conditionally valid, it's pretty obvious which features
they "belong" to, but even then I hesitate to draw connections, e.g. on the off
chance that some weird hypervisor checks Family/Model/Stepping or something.
> If we'll reuse that for kvm_trace_sym_exc() it will need more work, but
> I don't see why we need a dynamic list for kvm_trace_sym_exc().
Sorry, this is for option #2. Hardcode the set of vectors that KVM allows (to
prevent L1 from throwing pure garbage at hardware), but otherwise defer to the
CPU to enforce the reserved vectors.
Hrm, but option #2 just delays the inevitable, e.g. we'll be stuck once again
when KVM supports some new vector, in which case we'll have to change guest
visible behavior _again_, or bite the bullet and do option #1.
So I guess do option #1 straight away and hope nothing breaks? Maybe hardcode
everything as supported except #CP (SHSTK) and #VC (SEV-ES)?
> So my best guess is that I didn't really understand your suggestion :)
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 03/11] KVM: nSVM: Add missing consistency check for event_inj
2025-11-06 1:17 ` Sean Christopherson
@ 2025-11-08 2:29 ` Yosry Ahmed
0 siblings, 0 replies; 17+ messages in thread
From: Yosry Ahmed @ 2025-11-08 2:29 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm, linux-kernel
On Wed, Nov 05, 2025 at 05:17:38PM -0800, Sean Christopherson wrote:
> On Wed, Nov 05, 2025, Yosry Ahmed wrote:
> > On Wed, Nov 05, 2025 at 10:48:28AM -0800, Sean Christopherson wrote:
> > > On Tue, Nov 04, 2025, Yosry Ahmed wrote:
> > > > According to the APM Volume #2, 15.20 (24593—Rev. 3.42—March 2024):
> > > >
> > > > VMRUN exits with VMEXIT_INVALID error code if either:
> > > > • Reserved values of TYPE have been specified, or
> > > > • TYPE = 3 (exception) has been specified with a vector that does not
> > > > correspond to an exception (this includes vector 2, which is an NMI,
> > > > not an exception).
> > > >
> > > > Add the missing consistency checks to KVM. For the second point, inject
> > > > VMEXIT_INVALID if the vector is anything but the vectors defined by the
> > > > APM for exceptions. Reserved vectors are also considered invalid, which
> > > > matches the HW behavior.
> > >
> > > Ugh. Strictly speaking, that means KVM needs to match the capabilities of the
> > > virtual CPU. E.g. if the virtual CPU predates SEV-ES, then #VC should be reserved
> > > from the guest's perspective.
> > >
> > > > Vector 9 (i.e. #CSO) is considered invalid because it is reserved on modern
> > > > CPUs, and according to LLMs no CPUs exist supporting SVM and producing #CSOs.
> > > >
> > > > Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> > > > ---
> > > > arch/x86/include/asm/svm.h | 5 +++++
> > > > arch/x86/kvm/svm/nested.c | 33 +++++++++++++++++++++++++++++++++
> > > > 2 files changed, 38 insertions(+)
> > > >
> > > > diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> > > > index e69b6d0dedcf0..3a9441a8954f3 100644
> > > > --- a/arch/x86/include/asm/svm.h
> > > > +++ b/arch/x86/include/asm/svm.h
> > > > @@ -633,6 +633,11 @@ static inline void __unused_size_checks(void)
> > > > #define SVM_EVTINJ_VALID (1 << 31)
> > > > #define SVM_EVTINJ_VALID_ERR (1 << 11)
> > > >
> > > > +/* Only valid exceptions (and not NMIs) are allowed for SVM_EVTINJ_TYPE_EXEPT */
> > > > +#define SVM_EVNTINJ_INVALID_EXEPTS (NMI_VECTOR | BIT_ULL(9) | BIT_ULL(15) | \
> > > > + BIT_ULL(20) | GENMASK_ULL(27, 22) | \
> > > > + BIT_ULL(31))
> > >
> > > As above, hardcoding this won't work. E.g. if a VM is migrated from a CPU where
> > > vector X is reserved to a CPU where vector X is valid, then the VM will observe
> > > a change in behavior.
> > >
> > > Even if we're ok being overly permissive today (e.g. by taking an erratum), this
> > > will create problems in the future when one of the reserved vectors is defined,
> > > at which point we'll end up changing guest-visible behavior (and will have to
> > > take another erratum, or maybe define the erratum to be that KVM straight up
> > > doesn't enforce this correctly?)
> > >
> > > And if we do throw in the towel and don't try to enforce this, we'll still want
> > > a safeguard against this becoming stale, e.g. when KVM adds support for new
> > > feature XYZ that comes with a new vector.
> > >
> > > Off the cuff, the best idea I have is to define the positive set of vectors
> > > somewhere common with a static assert, and then invert that. E.g. maybe something
> > > shared with kvm_trace_sym_exc()?
> >
> > Do you mean define the positive set of vectors dynamically based on the
> > vCPU caps? Like a helper returning a dynamic bitmask instead of
> > SVM_EVNTINJ_INVALID_EXEPTS?
>
> Ya, that would be option #1, though I'm not entirely sure it's a good option.
> The validity of vectors isn't architecturally tied to the existence of any
> particular feature, at least not explicitly. For the "newer" vectors, i.e. the
> ones that we can treat as conditionally valid, it's pretty obvious which features
> they "belong" to, but even then I hesitate to draw connections, e.g. on the off
> chance that some weird hypervisor checks Family/Model/Stepping or something.
>
> > If we'll reuse that for kvm_trace_sym_exc() it will need more work, but
> > I don't see why we need a dynamic list for kvm_trace_sym_exc().
>
> Sorry, this is for option #2. Hardcode the set of vectors that KVM allows (to
> prevent L1 from throwing pure garbage at hardware), but otherwise defer to the
> CPU to enforce the reserved vectors.
>
> Hrm, but option #2 just delays the inevitable, e.g. we'll be stuck once again
> when KVM supports some new vector, in which case we'll have to change guest
> visible behavior _again_, or bite the bullet and do option #1.
>
> So I guess do option #1 straight away and hope nothing breaks? Maybe hardcode
> everything as supported except #CP (SHSTK) and #VC (SEV-ES)?
Ack, will do something like:
#define ALWAYS_VALID_EVTINJ_EXEPTS (BIT(DE_VECTOR) | BIT(DB_VECTOR) | \
				    BIT(BP_VECTOR) | BIT(OF_VECTOR) | \
				    BIT(BR_VECTOR) | BIT(UD_VECTOR) | \
				    BIT(NM_VECTOR) | BIT(DF_VECTOR) | \
				    BIT(TS_VECTOR) | BIT(NP_VECTOR) | \
				    BIT(SS_VECTOR) | BIT(GP_VECTOR) | \
				    BIT(PF_VECTOR) | BIT(MF_VECTOR) | \
				    BIT(AC_VECTOR) | BIT(MC_VECTOR) | \
				    BIT(XM_VECTOR) | BIT(VE_VECTOR) | \
				    BIT(HV_VECTOR) | BIT(SX_VECTOR))

static bool nested_svm_event_inj_valid_exept(struct kvm_vcpu *vcpu, u8 vector)
{
	/* Vectors >= 32 are never valid exception vectors. */
	if (vector >= 32)
		return false;

	return (BIT(vector) & ALWAYS_VALID_EVTINJ_EXEPTS) ||
	       (vector == CP_VECTOR && guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) ||
	       (vector == VC_VECTOR && guest_cpu_cap_has(vcpu, X86_FEATURE_SEV_ES));
}
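and then wire it into the event_inj consistency check, roughly like the
below (also untested; 'control' is the cached vmcb_ctrl_area_cached and
the exact placement may differ):

	if ((control->event_inj & SVM_EVTINJ_VALID) &&
	    (control->event_inj & SVM_EVTINJ_TYPE_MASK) == SVM_EVTINJ_TYPE_EXEPT &&
	    CC(!nested_svm_event_inj_valid_exept(vcpu, control->event_inj & SVM_EVTINJ_VEC_MASK)))
		return false;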
I will rebase this on top of the LBRV fixes I just sent out, include
this change, and send a v2 early next week.
>
> > So my best guess is that I didn't really understand your suggestion :)
^ permalink raw reply [flat|nested] 17+ messages in thread
Thread overview: 17+ messages
2025-11-04 19:59 [PATCH 00/11] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
2025-11-04 19:59 ` [PATCH 01/11] KVM: nSVM: Fix consistency checks for NP_ENABLE Yosry Ahmed
2025-11-05 18:52 ` Yosry Ahmed
2025-11-04 19:59 ` [PATCH 02/11] KVM: nSVM: Add missing consistency check for EFER, CR0, CR4, and CS Yosry Ahmed
2025-11-04 19:59 ` [PATCH 03/11] KVM: nSVM: Add missing consistency check for event_inj Yosry Ahmed
2025-11-05 18:48 ` Sean Christopherson
2025-11-05 19:29 ` Yosry Ahmed
2025-11-06 1:17 ` Sean Christopherson
2025-11-08 2:29 ` Yosry Ahmed
2025-11-04 19:59 ` [PATCH 04/11] KVM: SVM: Rename vmcb->nested_ctl to vmcb->misc_ctl Yosry Ahmed
2025-11-04 19:59 ` [PATCH 05/11] KVM: SVM: Rename vmcb->virt_ext to vmcb->misc_ctl2 Yosry Ahmed
2025-11-04 19:59 ` [PATCH 06/11] KVM: SVM: switch svm_copy_lbrs() to a macro Yosry Ahmed
2025-11-04 19:59 ` [PATCH 07/11] KVM: nSVM: Cache all used fields from VMCB12 Yosry Ahmed
2025-11-04 19:59 ` [PATCH 08/11] KVM: nSVM: Restrict mapping VMCB12 on nested VMRUN Yosry Ahmed
2025-11-04 19:59 ` [PATCH 09/11] KVM: nSVM: Simplify nested_svm_vmrun() Yosry Ahmed
2025-11-04 19:59 ` [PATCH 10/11] KVM: nSVM: Sanitize control fields copied from VMCB12 Yosry Ahmed
2025-11-04 19:59 ` [PATCH 11/11] KVM: nSVM: Only copy NP_ENABLE from VMCB01's nested_ctl Yosry Ahmed