public inbox for stable@vger.kernel.org
* [PATCH v7 01/26] KVM: nSVM: Avoid clearing VMCB_LBR in vmcb12
       [not found] <20260303003421.2185681-1-yosry@kernel.org>
@ 2026-03-03  0:33 ` Yosry Ahmed
  2026-03-03  0:33 ` [PATCH v7 02/26] KVM: SVM: Switch svm_copy_lbrs() to a macro Yosry Ahmed
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-03  0:33 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable

svm_copy_lbrs() always marks VMCB_LBR dirty in the destination VMCB.
However, nested_svm_vmexit() uses it to copy LBRs to vmcb12, and
clearing clean bits in vmcb12 is not architecturally defined.

Move vmcb_mark_dirty() to callers and drop it for vmcb12.

This also facilitates incoming refactoring that does not pass the entire
VMCB to svm_copy_lbrs().

Fixes: d20c796ca370 ("KVM: x86: nSVM: implement nested LBR virtualization")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 7 +++++--
 arch/x86/kvm/svm/svm.c    | 2 --
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index de90b104a0dd5..a31f3be1e16ec 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -714,6 +714,7 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
 	} else {
 		svm_copy_lbrs(vmcb02, vmcb01);
 	}
+	vmcb_mark_dirty(vmcb02, VMCB_LBR);
 	svm_update_lbrv(&svm->vcpu);
 }
 
@@ -1232,10 +1233,12 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 		kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
 
 	if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
-		     (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK)))
+		     (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
 		svm_copy_lbrs(vmcb12, vmcb02);
-	else
+	} else {
 		svm_copy_lbrs(vmcb01, vmcb02);
+		vmcb_mark_dirty(vmcb01, VMCB_LBR);
+	}
 
 	svm_update_lbrv(vcpu);
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8f8bc863e2143..a2452b8ec49db 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -848,8 +848,6 @@ void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
 	to_vmcb->save.br_to		= from_vmcb->save.br_to;
 	to_vmcb->save.last_excp_from	= from_vmcb->save.last_excp_from;
 	to_vmcb->save.last_excp_to	= from_vmcb->save.last_excp_to;
-
-	vmcb_mark_dirty(to_vmcb, VMCB_LBR);
 }
 
 static void __svm_enable_lbrv(struct kvm_vcpu *vcpu)
-- 
2.53.0.473.g4a7958ca14-goog
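
The split described in the commit message above can be sketched in user space as follows. This is only an illustration under stated assumptions: `toy_vmcb`, `TOY_VMCB_LBR`, and the `toy_*` helpers are hypothetical stand-ins, not KVM definitions. The copy helper leaves the clean bits alone, and each caller decides whether the destination should be marked dirty.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define TOY_VMCB_LBR 10u

struct toy_vmcb {
	uint32_t clean;		/* one bit per "clean" state group */
	uint64_t dbgctl;
	uint64_t br_from;
	uint64_t br_to;
};

/* Clearing a clean bit asks for that state group to be reloaded. */
static inline void toy_mark_dirty(struct toy_vmcb *vmcb, unsigned int bit)
{
	assert(bit < 32);
	vmcb->clean &= ~(1u << bit);
}

/* Copy only; deliberately does not touch the clean bits. */
static inline void toy_copy_lbrs(struct toy_vmcb *to, const struct toy_vmcb *from)
{
	to->dbgctl  = from->dbgctl;
	to->br_from = from->br_from;
	to->br_to   = from->br_to;
}

/* Caller for a VMCB owned by the hypervisor: copy, then mark dirty. */
void toy_copy_to_owned(struct toy_vmcb *owned, const struct toy_vmcb *src)
{
	toy_copy_lbrs(owned, src);
	toy_mark_dirty(owned, TOY_VMCB_LBR);
}

/* Caller for a guest-owned VMCB (vmcb12 analogue): copy only. */
void toy_copy_to_guest(struct toy_vmcb *guest, const struct toy_vmcb *src)
{
	toy_copy_lbrs(guest, src);
}
```

Keeping the dirty marking out of the copy helper is also what lets the helper later take something other than a full VMCB.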



* [PATCH v7 02/26] KVM: SVM: Switch svm_copy_lbrs() to a macro
       [not found] <20260303003421.2185681-1-yosry@kernel.org>
  2026-03-03  0:33 ` [PATCH v7 01/26] KVM: nSVM: Avoid clearing VMCB_LBR in vmcb12 Yosry Ahmed
@ 2026-03-03  0:33 ` Yosry Ahmed
  2026-03-03  0:33 ` [PATCH v7 03/26] KVM: SVM: Add missing save/restore handling of LBR MSRs Yosry Ahmed
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-03  0:33 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable

In preparation for using svm_copy_lbrs() with 'struct vmcb_save_area'
without a containing 'struct vmcb', and later even 'struct
vmcb_save_area_cached', make it a macro.

Functions are generally preferred over macros, mainly for type safety.
However, in this case a simple macro that copies a few fields seems
better than copy-pasting the same 5 lines of code in different places.

Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c |  8 ++++----
 arch/x86/kvm/svm/svm.c    |  9 ---------
 arch/x86/kvm/svm/svm.h    | 10 +++++++++-
 3 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index a31f3be1e16ec..f7d5db0af69ac 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -709,10 +709,10 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
 		 * Reserved bits of DEBUGCTL are ignored.  Be consistent with
 		 * svm_set_msr's definition of reserved bits.
 		 */
-		svm_copy_lbrs(vmcb02, vmcb12);
+		svm_copy_lbrs(&vmcb02->save, &vmcb12->save);
 		vmcb02->save.dbgctl &= ~DEBUGCTL_RESERVED_BITS;
 	} else {
-		svm_copy_lbrs(vmcb02, vmcb01);
+		svm_copy_lbrs(&vmcb02->save, &vmcb01->save);
 	}
 	vmcb_mark_dirty(vmcb02, VMCB_LBR);
 	svm_update_lbrv(&svm->vcpu);
@@ -1234,9 +1234,9 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 
 	if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
 		     (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
-		svm_copy_lbrs(vmcb12, vmcb02);
+		svm_copy_lbrs(&vmcb12->save, &vmcb02->save);
 	} else {
-		svm_copy_lbrs(vmcb01, vmcb02);
+		svm_copy_lbrs(&vmcb01->save, &vmcb02->save);
 		vmcb_mark_dirty(vmcb01, VMCB_LBR);
 	}
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a2452b8ec49db..f52e588317fcf 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -841,15 +841,6 @@ static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 	 */
 }
 
-void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
-{
-	to_vmcb->save.dbgctl		= from_vmcb->save.dbgctl;
-	to_vmcb->save.br_from		= from_vmcb->save.br_from;
-	to_vmcb->save.br_to		= from_vmcb->save.br_to;
-	to_vmcb->save.last_excp_from	= from_vmcb->save.last_excp_from;
-	to_vmcb->save.last_excp_to	= from_vmcb->save.last_excp_to;
-}
-
 static void __svm_enable_lbrv(struct kvm_vcpu *vcpu)
 {
 	to_svm(vcpu)->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ebd7b36b1ceb9..44d767cd1d25a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -713,8 +713,16 @@ static inline void *svm_vcpu_alloc_msrpm(void)
 	return svm_alloc_permissions_map(MSRPM_SIZE, GFP_KERNEL_ACCOUNT);
 }
 
+#define svm_copy_lbrs(to, from)					\
+do {								\
+	(to)->dbgctl		= (from)->dbgctl;		\
+	(to)->br_from		= (from)->br_from;		\
+	(to)->br_to		= (from)->br_to;		\
+	(to)->last_excp_from	= (from)->last_excp_from;	\
+	(to)->last_excp_to	= (from)->last_excp_to;		\
+} while (0)
+
 void svm_vcpu_free_msrpm(void *msrpm);
-void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb);
 void svm_enable_lbrv(struct kvm_vcpu *vcpu);
 void svm_update_lbrv(struct kvm_vcpu *vcpu);
 
-- 
2.53.0.473.g4a7958ca14-goog
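
As a rough illustration of why a macro helps here, the sketch below applies one field-copy macro to two unrelated struct types. The `toy_save_area` and `toy_save_area_cached` types and the `toy_*` names are hypothetical, trimmed-down stand-ins chosen for this example; only the shape of the macro mirrors the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for the two save-area types. */
struct toy_save_area {
	uint64_t rip;
	uint64_t dbgctl;
	uint64_t br_from;
	uint64_t br_to;
};

struct toy_save_area_cached {
	uint64_t dbgctl;
	uint64_t br_from;
	uint64_t br_to;
};

/* do { } while (0) makes the macro behave as a single statement. */
#define toy_copy_lbrs(to, from)			\
do {						\
	(to)->dbgctl	= (from)->dbgctl;	\
	(to)->br_from	= (from)->br_from;	\
	(to)->br_to	= (from)->br_to;	\
} while (0)

/* Same macro, a different pointer type on each side. */
void toy_cache_lbrs(struct toy_save_area_cached *cache,
		    const struct toy_save_area *save)
{
	assert(cache && save);
	toy_copy_lbrs(cache, save);
}

void toy_restore_lbrs(struct toy_save_area *save,
		      const struct toy_save_area_cached *cache)
{
	assert(cache && save);
	toy_copy_lbrs(save, cache);
}
```

A function would need one prototype per type pair; the macro only requires that both sides have the named fields, at the cost of weaker type checking.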



* [PATCH v7 03/26] KVM: SVM: Add missing save/restore handling of LBR MSRs
       [not found] <20260303003421.2185681-1-yosry@kernel.org>
  2026-03-03  0:33 ` [PATCH v7 01/26] KVM: nSVM: Avoid clearing VMCB_LBR in vmcb12 Yosry Ahmed
  2026-03-03  0:33 ` [PATCH v7 02/26] KVM: SVM: Switch svm_copy_lbrs() to a macro Yosry Ahmed
@ 2026-03-03  0:33 ` Yosry Ahmed
  2026-03-03 16:37   ` Sean Christopherson
  2026-03-03  0:33 ` [PATCH v7 05/26] KVM: nSVM: Always inject a #GP if mapping VMCB12 fails on nested VMRUN Yosry Ahmed
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-03  0:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable,
	Jim Mattson

MSR_IA32_DEBUGCTLMSR and LBR MSRs are currently not enumerated by
KVM_GET_MSR_INDEX_LIST, and LBR MSRs cannot be set with KVM_SET_MSRS. So
save/restore is completely broken.

Fix it by adding the MSRs to msrs_to_save_base, and allowing writes to
LBR MSRs from userspace only (as they are read-only MSRs). Additionally,
to correctly restore L1's LBRs while L2 is running, make sure the LBRs
are copied from the captured VMCB01 save area in svm_copy_vmrun_state().

For VMX, this also adds save/restore handling of KVM_GET_MSR_INDEX_LIST.
For unsupported MSR_IA32_LAST* MSRs, kvm_do_msr_access() should zero these
MSRs on userspace reads, and ignore KVM_MSR_RET_UNSUPPORTED on userspace
writes.

Fixes: 24e09cbf480a ("KVM: SVM: enable LBR virtualization")
Cc: stable@vger.kernel.org
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c |  5 +++++
 arch/x86/kvm/svm/svm.c    | 24 ++++++++++++++++++++++++
 arch/x86/kvm/x86.c        |  3 +++
 3 files changed, 32 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index f7d5db0af69ac..3bf758c9cb85c 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1100,6 +1100,11 @@ void svm_copy_vmrun_state(struct vmcb_save_area *to_save,
 		to_save->isst_addr = from_save->isst_addr;
 		to_save->ssp = from_save->ssp;
 	}
+
+	if (lbrv) {
+		svm_copy_lbrs(to_save, from_save);
+		to_save->dbgctl &= ~DEBUGCTL_RESERVED_BITS;
+	}
 }
 
 void svm_copy_vmloadsave_state(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f52e588317fcf..cb53174583a26 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3071,6 +3071,30 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
 		svm_update_lbrv(vcpu);
 		break;
+	case MSR_IA32_LASTBRANCHFROMIP:
+		if (!msr->host_initiated)
+			return 1;
+		svm->vmcb->save.br_from = data;
+		vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
+		break;
+	case MSR_IA32_LASTBRANCHTOIP:
+		if (!msr->host_initiated)
+			return 1;
+		svm->vmcb->save.br_to = data;
+		vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
+		break;
+	case MSR_IA32_LASTINTFROMIP:
+		if (!msr->host_initiated)
+			return 1;
+		svm->vmcb->save.last_excp_from = data;
+		vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
+		break;
+	case MSR_IA32_LASTINTTOIP:
+		if (!msr->host_initiated)
+			return 1;
+		svm->vmcb->save.last_excp_to = data;
+		vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
+		break;
 	case MSR_VM_HSAVE_PA:
 		/*
 		 * Old kernels did not validate the value written to
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index db3f393192d94..416899b5dbe4d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -351,6 +351,9 @@ static const u32 msrs_to_save_base[] = {
 	MSR_IA32_U_CET, MSR_IA32_S_CET,
 	MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
 	MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB,
+	MSR_IA32_DEBUGCTLMSR,
+	MSR_IA32_LASTBRANCHFROMIP, MSR_IA32_LASTBRANCHTOIP,
+	MSR_IA32_LASTINTFROMIP, MSR_IA32_LASTINTTOIP,
 };
 
 static const u32 msrs_to_save_pmu[] = {
-- 
2.53.0.473.g4a7958ca14-goog
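
The "writable from userspace only" rule can be sketched in isolation as below. This is a simplified user-space model, not KVM code: `toy_msr_data`, `toy_vcpu`, `toy_set_msr()`, and the `TOY_MSR_LASTBRANCHFROMIP` index are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the index value is illustrative only. */
#define TOY_MSR_LASTBRANCHFROMIP 0x1dbu

struct toy_msr_data {
	bool host_initiated;	/* true when the write comes from userspace */
	uint32_t index;
	uint64_t data;
};

struct toy_vcpu {
	uint64_t br_from;
	bool lbr_dirty;
};

/* Returns 0 on success, 1 to signal that the access should fault. */
int toy_set_msr(struct toy_vcpu *vcpu, const struct toy_msr_data *msr)
{
	assert(vcpu && msr);

	switch (msr->index) {
	case TOY_MSR_LASTBRANCHFROMIP:
		/* Read-only from the guest's point of view. */
		if (!msr->host_initiated)
			return 1;
		vcpu->br_from = msr->data;
		vcpu->lbr_dirty = true;
		return 0;
	default:
		return 1;
	}
}
```

The guest keeps seeing a read-only register, while a VMM restoring state after migration can still write the saved value back.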



* [PATCH v7 05/26] KVM: nSVM: Always inject a #GP if mapping VMCB12 fails on nested VMRUN
       [not found] <20260303003421.2185681-1-yosry@kernel.org>
                   ` (2 preceding siblings ...)
  2026-03-03  0:33 ` [PATCH v7 03/26] KVM: SVM: Add missing save/restore handling of LBR MSRs Yosry Ahmed
@ 2026-03-03  0:33 ` Yosry Ahmed
  2026-03-03  0:34 ` [PATCH v7 06/26] KVM: nSVM: Refactor checking LBRV enablement in vmcb12 into a helper Yosry Ahmed
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-03  0:33 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable

nested_svm_vmrun() currently only injects a #GP if kvm_vcpu_map() fails
with -EINVAL. But it could also fail with -EFAULT if creating a host
mapping failed. Inject a #GP in all cases, as there is no reason to treat
failure modes differently.

Fixes: 8c5fbf1a7231 ("KVM/nSVM: Use the new mapping API for mapping guest memory")
CC: stable@vger.kernel.org
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 3bf758c9cb85c..25f769a0d9a0d 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1011,12 +1011,9 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
 	}
 
 	vmcb12_gpa = svm->vmcb->save.rax;
-	ret = kvm_vcpu_map(vcpu, gpa_to_gfn(vmcb12_gpa), &map);
-	if (ret == -EINVAL) {
+	if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcb12_gpa), &map)) {
 		kvm_inject_gp(vcpu, 0);
 		return 1;
-	} else if (ret) {
-		return kvm_skip_emulated_instruction(vcpu);
 	}
 
 	ret = kvm_skip_emulated_instruction(vcpu);
-- 
2.53.0.473.g4a7958ca14-goog



* [PATCH v7 06/26] KVM: nSVM: Refactor checking LBRV enablement in vmcb12 into a helper
       [not found] <20260303003421.2185681-1-yosry@kernel.org>
                   ` (3 preceding siblings ...)
  2026-03-03  0:33 ` [PATCH v7 05/26] KVM: nSVM: Always inject a #GP if mapping VMCB12 fails on nested VMRUN Yosry Ahmed
@ 2026-03-03  0:34 ` Yosry Ahmed
  2026-03-03  0:34 ` [PATCH v7 07/26] KVM: nSVM: Refactor writing vmcb12 on nested #VMEXIT as " Yosry Ahmed
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-03  0:34 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable

Refactor the vCPU cap and vmcb12 flag checks into a helper. Drop the
unlikely() annotation, as it's unlikely (huh) to make a difference and
the CPU will probably predict the branch better on its own.

CC: stable@vger.kernel.org
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 25f769a0d9a0d..d84af051f65bc 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -639,6 +639,12 @@ void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm)
 	svm->nested.vmcb02.ptr->save.g_pat = svm->vmcb01.ptr->save.g_pat;
 }
 
+static bool nested_vmcb12_has_lbrv(struct kvm_vcpu *vcpu)
+{
+	return guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
+		(to_svm(vcpu)->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK);
+}
+
 static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12)
 {
 	bool new_vmcb12 = false;
@@ -703,8 +709,7 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
 		vmcb_mark_dirty(vmcb02, VMCB_DR);
 	}
 
-	if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
-		     (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
+	if (nested_vmcb12_has_lbrv(vcpu)) {
 		/*
 		 * Reserved bits of DEBUGCTL are ignored.  Be consistent with
 		 * svm_set_msr's definition of reserved bits.
@@ -1234,8 +1239,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	if (!nested_exit_on_intr(svm))
 		kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
 
-	if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
-		     (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
+	if (nested_vmcb12_has_lbrv(vcpu)) {
 		svm_copy_lbrs(&vmcb12->save, &vmcb02->save);
 	} else {
 		svm_copy_lbrs(&vmcb01->save, &vmcb02->save);
-- 
2.53.0.473.g4a7958ca14-goog



* [PATCH v7 07/26] KVM: nSVM: Refactor writing vmcb12 on nested #VMEXIT as a helper
       [not found] <20260303003421.2185681-1-yosry@kernel.org>
                   ` (4 preceding siblings ...)
  2026-03-03  0:34 ` [PATCH v7 06/26] KVM: nSVM: Refactor checking LBRV enablement in vmcb12 into a helper Yosry Ahmed
@ 2026-03-03  0:34 ` Yosry Ahmed
  2026-03-03  0:34 ` [PATCH v7 08/26] KVM: nSVM: Triple fault if mapping VMCB12 fails on nested #VMEXIT Yosry Ahmed
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-03  0:34 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable

Move mapping vmcb12 and updating it out of nested_svm_vmexit() into a
helper. No functional change intended.

CC: stable@vger.kernel.org
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 77 ++++++++++++++++++++++-----------------
 1 file changed, 44 insertions(+), 33 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index d84af051f65bc..82a92501ee86a 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1125,36 +1125,20 @@ void svm_copy_vmloadsave_state(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
 	to_vmcb->save.sysenter_eip = from_vmcb->save.sysenter_eip;
 }
 
-int nested_svm_vmexit(struct vcpu_svm *svm)
+static int nested_svm_vmexit_update_vmcb12(struct kvm_vcpu *vcpu)
 {
-	struct kvm_vcpu *vcpu = &svm->vcpu;
-	struct vmcb *vmcb01 = svm->vmcb01.ptr;
+	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
-	struct vmcb *vmcb12;
 	struct kvm_host_map map;
+	struct vmcb *vmcb12;
 	int rc;
 
 	rc = kvm_vcpu_map(vcpu, gpa_to_gfn(svm->nested.vmcb12_gpa), &map);
-	if (rc) {
-		if (rc == -EINVAL)
-			kvm_inject_gp(vcpu, 0);
-		return 1;
-	}
+	if (rc)
+		return rc;
 
 	vmcb12 = map.hva;
 
-	/* Exit Guest-Mode */
-	leave_guest_mode(vcpu);
-	svm->nested.vmcb12_gpa = 0;
-	WARN_ON_ONCE(svm->nested.nested_run_pending);
-
-	kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
-
-	/* in case we halted in L2 */
-	kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
-
-	/* Give the current vmcb to the guest */
-
 	vmcb12->save.es     = vmcb02->save.es;
 	vmcb12->save.cs     = vmcb02->save.cs;
 	vmcb12->save.ss     = vmcb02->save.ss;
@@ -1191,10 +1175,48 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
 		vmcb12->control.next_rip  = vmcb02->control.next_rip;
 
+	if (nested_vmcb12_has_lbrv(vcpu))
+		svm_copy_lbrs(&vmcb12->save, &vmcb02->save);
+
 	vmcb12->control.int_ctl           = svm->nested.ctl.int_ctl;
 	vmcb12->control.event_inj         = svm->nested.ctl.event_inj;
 	vmcb12->control.event_inj_err     = svm->nested.ctl.event_inj_err;
 
+	trace_kvm_nested_vmexit_inject(vmcb12->control.exit_code,
+				       vmcb12->control.exit_info_1,
+				       vmcb12->control.exit_info_2,
+				       vmcb12->control.exit_int_info,
+				       vmcb12->control.exit_int_info_err,
+				       KVM_ISA_SVM);
+
+	kvm_vcpu_unmap(vcpu, &map);
+	return 0;
+}
+
+int nested_svm_vmexit(struct vcpu_svm *svm)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct vmcb *vmcb01 = svm->vmcb01.ptr;
+	struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
+	int rc;
+
+	rc = nested_svm_vmexit_update_vmcb12(vcpu);
+	if (rc) {
+		if (rc == -EINVAL)
+			kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	/* Exit Guest-Mode */
+	leave_guest_mode(vcpu);
+	svm->nested.vmcb12_gpa = 0;
+	WARN_ON_ONCE(svm->nested.nested_run_pending);
+
+	kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
+
+	/* in case we halted in L2 */
+	kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
+
 	if (!kvm_pause_in_guest(vcpu->kvm)) {
 		vmcb01->control.pause_filter_count = vmcb02->control.pause_filter_count;
 		vmcb_mark_dirty(vmcb01, VMCB_INTERCEPTS);
@@ -1239,9 +1261,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	if (!nested_exit_on_intr(svm))
 		kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
 
-	if (nested_vmcb12_has_lbrv(vcpu)) {
-		svm_copy_lbrs(&vmcb12->save, &vmcb02->save);
-	} else {
+	if (!nested_vmcb12_has_lbrv(vcpu)) {
 		svm_copy_lbrs(&vmcb01->save, &vmcb02->save);
 		vmcb_mark_dirty(vmcb01, VMCB_LBR);
 	}
@@ -1297,15 +1317,6 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	svm->vcpu.arch.dr7 = DR7_FIXED_1;
 	kvm_update_dr7(&svm->vcpu);
 
-	trace_kvm_nested_vmexit_inject(vmcb12->control.exit_code,
-				       vmcb12->control.exit_info_1,
-				       vmcb12->control.exit_info_2,
-				       vmcb12->control.exit_int_info,
-				       vmcb12->control.exit_int_info_err,
-				       KVM_ISA_SVM);
-
-	kvm_vcpu_unmap(vcpu, &map);
-
 	nested_svm_transition_tlb_flush(vcpu);
 
 	nested_svm_uninit_mmu_context(vcpu);
-- 
2.53.0.473.g4a7958ca14-goog



* [PATCH v7 08/26] KVM: nSVM: Triple fault if mapping VMCB12 fails on nested #VMEXIT
       [not found] <20260303003421.2185681-1-yosry@kernel.org>
                   ` (5 preceding siblings ...)
  2026-03-03  0:34 ` [PATCH v7 07/26] KVM: nSVM: Refactor writing vmcb12 on nested #VMEXIT as " Yosry Ahmed
@ 2026-03-03  0:34 ` Yosry Ahmed
  2026-03-03  0:34 ` [PATCH v7 09/26] KVM: nSVM: Triple fault if restore host CR3 " Yosry Ahmed
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-03  0:34 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable

KVM currently injects a #GP and hopes for the best if mapping VMCB12
fails on nested #VMEXIT, and only if the failure mode is -EINVAL.
Mapping the VMCB12 could also fail if creating host mappings fails.

After the #GP is injected, nested_svm_vmexit() bails early, without
cleaning up (e.g. KVM_REQ_GET_NESTED_STATE_PAGES is set, is_guest_mode()
is true, etc).

Instead of optionally injecting a #GP, triple fault the guest if mapping
VMCB12 fails since KVM cannot make a sane recovery. The APM states that
a #VMEXIT will triple fault if host state is illegal or an exception
occurs while loading host state, so the behavior is not entirely made
up.

Do not return early from nested_svm_vmexit(), continue cleaning up the
vCPU state (e.g. switch back to vmcb01), to handle the failure as
gracefully as possible.

Fixes: cf74a78b229d ("KVM: SVM: Add VMEXIT handler and intercepts")
CC: stable@vger.kernel.org
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 82a92501ee86a..5ad0ac3680fdd 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1200,12 +1200,8 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
 	int rc;
 
-	rc = nested_svm_vmexit_update_vmcb12(vcpu);
-	if (rc) {
-		if (rc == -EINVAL)
-			kvm_inject_gp(vcpu, 0);
-		return 1;
-	}
+	if (nested_svm_vmexit_update_vmcb12(vcpu))
+		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
 
 	/* Exit Guest-Mode */
 	leave_guest_mode(vcpu);
-- 
2.53.0.473.g4a7958ca14-goog
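
The control-flow change argued for above (record the failure, then keep cleaning up instead of bailing out) can be modeled with a small user-space sketch. All names here (`toy_vcpu`, `toy_update_vmcb12()`, `toy_nested_vmexit()`) are hypothetical and the state is reduced to three flags; it is not KVM's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for vCPU state; not KVM's structures. */
struct toy_vcpu {
	bool guest_mode;
	bool nested_pages_request;
	bool triple_fault_request;
};

/* Pretend mapping step; fails when 'mappable' is false. */
static int toy_update_vmcb12(bool mappable)
{
	return mappable ? 0 : -1;
}

void toy_nested_vmexit(struct toy_vcpu *vcpu, bool mappable)
{
	assert(vcpu);

	/* Record the unrecoverable failure, but do not return early. */
	if (toy_update_vmcb12(mappable))
		vcpu->triple_fault_request = true;

	/* Cleanup runs on both the success and the failure path. */
	vcpu->guest_mode = false;
	vcpu->nested_pages_request = false;
}
```

With an early return, the failure path would leave `guest_mode` and the pending request set; here both paths end in the same cleaned-up state and differ only in the pending triple fault.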



* [PATCH v7 09/26] KVM: nSVM: Triple fault if restore host CR3 fails on nested #VMEXIT
       [not found] <20260303003421.2185681-1-yosry@kernel.org>
                   ` (6 preceding siblings ...)
  2026-03-03  0:34 ` [PATCH v7 08/26] KVM: nSVM: Triple fault if mapping VMCB12 fails on nested #VMEXIT Yosry Ahmed
@ 2026-03-03  0:34 ` Yosry Ahmed
  2026-03-03 16:49   ` Sean Christopherson
  2026-03-03  0:34 ` [PATCH v7 10/26] KVM: nSVM: Clear GIF on nested #VMEXIT(INVALID) Yosry Ahmed
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-03  0:34 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable

If loading L1's CR3 fails on a nested #VMEXIT, nested_svm_vmexit()
returns an error code that is ignored by most callers, and KVM continues
to run L1 with corrupted state. A sane recovery is not possible in this
case, and HW behavior is to cause a shutdown. Inject a triple fault
instead, and do not return early from nested_svm_vmexit(). Continue
cleaning up the vCPU state (e.g. clear pending exceptions), to handle
the failure as gracefully as possible.

From the APM:
	Upon #VMEXIT, the processor performs the following actions in
	order to return to the host execution context:

	...
	if (illegal host state loaded, or exception while loading
	    host state)
		shutdown
	else
		execute first host instruction following the VMRUN

Remove the return value of nested_svm_vmexit(), which is mostly
unchecked anyway.

Fixes: d82aaef9c88a ("KVM: nSVM: use nested_svm_load_cr3() on guest->host switch")
CC: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 10 +++-------
 arch/x86/kvm/svm/svm.c    | 11 ++---------
 arch/x86/kvm/svm/svm.h    |  6 +++---
 3 files changed, 8 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 5ad0ac3680fdd..bb2cec5fd0434 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1193,12 +1193,11 @@ static int nested_svm_vmexit_update_vmcb12(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-int nested_svm_vmexit(struct vcpu_svm *svm)
+void nested_svm_vmexit(struct vcpu_svm *svm)
 {
 	struct kvm_vcpu *vcpu = &svm->vcpu;
 	struct vmcb *vmcb01 = svm->vmcb01.ptr;
 	struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
-	int rc;
 
 	if (nested_svm_vmexit_update_vmcb12(vcpu))
 		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
@@ -1317,9 +1316,8 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 
 	nested_svm_uninit_mmu_context(vcpu);
 
-	rc = nested_svm_load_cr3(vcpu, vmcb01->save.cr3, false, true);
-	if (rc)
-		return 1;
+	if (nested_svm_load_cr3(vcpu, vmcb01->save.cr3, false, true))
+		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
 
 	/*
 	 * Drop what we picked up for L2 via svm_complete_interrupts() so it
@@ -1344,8 +1342,6 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	 */
 	if (kvm_apicv_activated(vcpu->kvm))
 		__kvm_vcpu_update_apicv(vcpu);
-
-	return 0;
 }
 
 static void nested_svm_triple_fault(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cb53174583a26..1b31b033d79b0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2234,13 +2234,9 @@ static int emulate_svm_instr(struct kvm_vcpu *vcpu, int opcode)
 		[SVM_INSTR_VMSAVE] = vmsave_interception,
 	};
 	struct vcpu_svm *svm = to_svm(vcpu);
-	int ret;
 
 	if (is_guest_mode(vcpu)) {
-		/* Returns '1' or -errno on failure, '0' on success. */
-		ret = nested_svm_simple_vmexit(svm, guest_mode_exit_codes[opcode]);
-		if (ret)
-			return ret;
+		nested_svm_simple_vmexit(svm, guest_mode_exit_codes[opcode]);
 		return 1;
 	}
 	return svm_instr_handlers[opcode](vcpu);
@@ -4796,7 +4792,6 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct kvm_host_map map_save;
-	int ret;
 
 	if (!is_guest_mode(vcpu))
 		return 0;
@@ -4816,9 +4811,7 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram)
 	svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
 	svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
 
-	ret = nested_svm_simple_vmexit(svm, SVM_EXIT_SW);
-	if (ret)
-		return ret;
+	nested_svm_simple_vmexit(svm, SVM_EXIT_SW);
 
 	/*
 	 * KVM uses VMCB01 to store L1 host state while L2 runs but
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 44d767cd1d25a..7629cb37c9302 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -793,14 +793,14 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu);
 void svm_copy_vmrun_state(struct vmcb_save_area *to_save,
 			  struct vmcb_save_area *from_save);
 void svm_copy_vmloadsave_state(struct vmcb *to_vmcb, struct vmcb *from_vmcb);
-int nested_svm_vmexit(struct vcpu_svm *svm);
+void nested_svm_vmexit(struct vcpu_svm *svm);
 
-static inline int nested_svm_simple_vmexit(struct vcpu_svm *svm, u32 exit_code)
+static inline void nested_svm_simple_vmexit(struct vcpu_svm *svm, u32 exit_code)
 {
 	svm->vmcb->control.exit_code	= exit_code;
 	svm->vmcb->control.exit_info_1	= 0;
 	svm->vmcb->control.exit_info_2	= 0;
-	return nested_svm_vmexit(svm);
+	nested_svm_vmexit(svm);
 }
 
 int nested_svm_exit_handled(struct vcpu_svm *svm);
-- 
2.53.0.473.g4a7958ca14-goog



* [PATCH v7 10/26] KVM: nSVM: Clear GIF on nested #VMEXIT(INVALID)
       [not found] <20260303003421.2185681-1-yosry@kernel.org>
                   ` (7 preceding siblings ...)
  2026-03-03  0:34 ` [PATCH v7 09/26] KVM: nSVM: Triple fault if restore host CR3 " Yosry Ahmed
@ 2026-03-03  0:34 ` Yosry Ahmed
  2026-03-03  0:34 ` [PATCH v7 11/26] KVM: nSVM: Clear EVENTINJ fields in vmcb12 on nested #VMEXIT Yosry Ahmed
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-03  0:34 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable

According to the APM, GIF is set to 0 on any #VMEXIT, including
an #VMEXIT(INVALID) due to failed consistency checks. Clear GIF on
consistency check failures.

Fixes: 3d6368ef580a ("KVM: SVM: Add VMRUN handler")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index bb2cec5fd0434..04ccab887c5ec 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1036,6 +1036,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
 		vmcb12->control.exit_code    = SVM_EXIT_ERR;
 		vmcb12->control.exit_info_1  = 0;
 		vmcb12->control.exit_info_2  = 0;
+		svm_set_gif(svm, false);
 		goto out;
 	}
 
-- 
2.53.0.473.g4a7958ca14-goog



* [PATCH v7 11/26] KVM: nSVM: Clear EVENTINJ fields in vmcb12 on nested #VMEXIT
       [not found] <20260303003421.2185681-1-yosry@kernel.org>
                   ` (8 preceding siblings ...)
  2026-03-03  0:34 ` [PATCH v7 10/26] KVM: nSVM: Clear GIF on nested #VMEXIT(INVALID) Yosry Ahmed
@ 2026-03-03  0:34 ` Yosry Ahmed
  2026-03-03  0:34 ` [PATCH v7 12/26] KVM: nSVM: Clear tracking of L1->L2 NMI and soft IRQ " Yosry Ahmed
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-03  0:34 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable

According to the APM, from the reference of the VMRUN instruction:

	Upon #VMEXIT, the processor performs the following actions in
	order to return to the host execution context:
	...
	clear EVENTINJ field in VMCB

KVM syncs EVENTINJ fields from vmcb02 to cached vmcb12 on every L2->L0
 #VMEXIT. Since these fields are zeroed by the CPU on #VMEXIT, they will
mostly be zeroed in vmcb12 on nested #VMEXIT by nested_svm_vmexit().
However, this is not the case when:
1. Consistency checks fail, as nested_svm_vmexit() is not called.
2. Entering guest mode fails before L2 runs (e.g. due to failed load of
   CR3).

(2) was broken by commit 2d8a42be0e2b ("KVM: nSVM: synchronize VMCB
controls updated by the processor on every vmexit"), as prior to that
nested_svm_vmexit() always zeroed EVENTINJ fields.

Explicitly clear the fields in all nested #VMEXIT code paths.

Fixes: 3d6368ef580a ("KVM: SVM: Add VMRUN handler")
Fixes: 2d8a42be0e2b ("KVM: nSVM: synchronize VMCB controls updated by the processor on every vmexit")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 04ccab887c5ec..f0ed352a3e901 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1036,6 +1036,8 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
 		vmcb12->control.exit_code    = SVM_EXIT_ERR;
 		vmcb12->control.exit_info_1  = 0;
 		vmcb12->control.exit_info_2  = 0;
+		vmcb12->control.event_inj = 0;
+		vmcb12->control.event_inj_err = 0;
 		svm_set_gif(svm, false);
 		goto out;
 	}
@@ -1179,9 +1181,9 @@ static int nested_svm_vmexit_update_vmcb12(struct kvm_vcpu *vcpu)
 	if (nested_vmcb12_has_lbrv(vcpu))
 		svm_copy_lbrs(&vmcb12->save, &vmcb02->save);
 
+	vmcb12->control.event_inj	  = 0;
+	vmcb12->control.event_inj_err	  = 0;
 	vmcb12->control.int_ctl           = svm->nested.ctl.int_ctl;
-	vmcb12->control.event_inj         = svm->nested.ctl.event_inj;
-	vmcb12->control.event_inj_err     = svm->nested.ctl.event_inj_err;
 
 	trace_kvm_nested_vmexit_inject(vmcb12->control.exit_code,
 				       vmcb12->control.exit_info_1,
-- 
2.53.0.473.g4a7958ca14-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v7 12/26] KVM: nSVM: Clear tracking of L1->L2 NMI and soft IRQ on nested #VMEXIT
       [not found] <20260303003421.2185681-1-yosry@kernel.org>
                   ` (9 preceding siblings ...)
  2026-03-03  0:34 ` [PATCH v7 11/26] KVM: nSVM: Clear EVENTINJ fields in vmcb12 on nested #VMEXIT Yosry Ahmed
@ 2026-03-03  0:34 ` Yosry Ahmed
  2026-03-03 16:50   ` Sean Christopherson
  2026-03-03  0:34 ` [PATCH v7 13/26] KVM: nSVM: Drop nested_vmcb_check_{save/control}() wrappers Yosry Ahmed
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-03  0:34 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable

KVM clears tracking of L1->L2 injected NMIs (i.e. nmi_l1_to_l2) and soft
IRQs (i.e. soft_int_injected) on a synthesized #VMEXIT(INVALID) due to
failed VMRUN. However, they are not explicitly cleared in other
synthesized #VMEXITs.

soft_int_injected is always cleared after the first VMRUN of L2 when
completing interrupts, as any re-injection is then tracked by KVM
(instead of purely in vmcb02).

nmi_l1_to_l2 is not cleared after the first VMRUN if NMI injection
failed, as KVM still needs to keep track that the NMI originated from L1
to avoid blocking NMIs for L1. It is only cleared when the NMI injection
succeeds.

KVM could synthesize a #VMEXIT to L1 before successfully injecting the
NMI into L2 (e.g. due to a #NPF on L2's NMI handler in L1's NPTs). In
this case, nmi_l1_to_l2 will remain true, and KVM may not correctly mask
NMIs and intercept IRET when injecting an NMI into L1.

Clear both nmi_l1_to_l2 and soft_int_injected in nested_svm_vmexit() to
capture all #VMEXITs, except those that occur due to failed consistency
checks, as those happen before nmi_l1_to_l2 or soft_int_injected are
set.

Fixes: 159fc6fa3b7d ("KVM: nSVM: Transparently handle L1 -> L2 NMI re-injection")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index f0ed352a3e901..b66bd9bfce9d8 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1065,8 +1065,6 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
 
 out_exit_err:
 	svm->nested.nested_run_pending = 0;
-	svm->nmi_l1_to_l2 = false;
-	svm->soft_int_injected = false;
 
 	svm->vmcb->control.exit_code    = SVM_EXIT_ERR;
 	svm->vmcb->control.exit_info_1  = 0;
@@ -1322,6 +1320,10 @@ void nested_svm_vmexit(struct vcpu_svm *svm)
 	if (nested_svm_load_cr3(vcpu, vmcb01->save.cr3, false, true))
 		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
 
+	/* Drop tracking for L1->L2 injected NMIs and soft IRQs */
+	svm->nmi_l1_to_l2 = false;
+	svm->soft_int_injected = false;
+
 	/*
 	 * Drop what we picked up for L2 via svm_complete_interrupts() so it
 	 * doesn't end up in L1.
-- 
2.53.0.473.g4a7958ca14-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v7 13/26] KVM: nSVM: Drop nested_vmcb_check_{save/control}() wrappers
       [not found] <20260303003421.2185681-1-yosry@kernel.org>
                   ` (10 preceding siblings ...)
  2026-03-03  0:34 ` [PATCH v7 12/26] KVM: nSVM: Clear tracking of L1->L2 NMI and soft IRQ " Yosry Ahmed
@ 2026-03-03  0:34 ` Yosry Ahmed
  2026-03-03  0:34 ` [PATCH v7 14/26] KVM: nSVM: Drop the non-architectural consistency check for NP_ENABLE Yosry Ahmed
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-03  0:34 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable

The wrappers provide little value and make it harder to see what KVM is
checking in the normal flow. Drop them.

Opportunistically fixup comments referring to the functions, adding '()'
to make it clear it's a reference to a function.

No functional change intended.

Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 36 ++++++++++--------------------------
 1 file changed, 10 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index b66bd9bfce9d8..21e1a43c91879 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -339,8 +339,8 @@ static bool nested_svm_check_bitmap_pa(struct kvm_vcpu *vcpu, u64 pa, u32 size)
 	    kvm_vcpu_is_legal_gpa(vcpu, addr + size - 1);
 }
 
-static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
-					 struct vmcb_ctrl_area_cached *control)
+static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
+				       struct vmcb_ctrl_area_cached *control)
 {
 	if (CC(!vmcb12_is_intercept(control, INTERCEPT_VMRUN)))
 		return false;
@@ -367,8 +367,8 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
 }
 
 /* Common checks that apply to both L1 and L2 state.  */
-static bool __nested_vmcb_check_save(struct kvm_vcpu *vcpu,
-				     struct vmcb_save_area_cached *save)
+static bool nested_vmcb_check_save(struct kvm_vcpu *vcpu,
+				   struct vmcb_save_area_cached *save)
 {
 	if (CC(!(save->efer & EFER_SVME)))
 		return false;
@@ -402,22 +402,6 @@ static bool __nested_vmcb_check_save(struct kvm_vcpu *vcpu,
 	return true;
 }
 
-static bool nested_vmcb_check_save(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-	struct vmcb_save_area_cached *save = &svm->nested.save;
-
-	return __nested_vmcb_check_save(vcpu, save);
-}
-
-static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-	struct vmcb_ctrl_area_cached *ctl = &svm->nested.ctl;
-
-	return __nested_vmcb_check_controls(vcpu, ctl);
-}
-
 /*
  * If a feature is not advertised to L1, clear the corresponding vmcb12
  * intercept.
@@ -469,7 +453,7 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
 	to->pause_filter_count  = from->pause_filter_count;
 	to->pause_filter_thresh = from->pause_filter_thresh;
 
-	/* Copy asid here because nested_vmcb_check_controls will check it.  */
+	/* Copy asid here because nested_vmcb_check_controls() will check it */
 	to->asid           = from->asid;
 	to->msrpm_base_pa &= ~0x0fffULL;
 	to->iopm_base_pa  &= ~0x0fffULL;
@@ -1031,8 +1015,8 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
 	nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
 	nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
 
-	if (!nested_vmcb_check_save(vcpu) ||
-	    !nested_vmcb_check_controls(vcpu)) {
+	if (!nested_vmcb_check_save(vcpu, &svm->nested.save) ||
+	    !nested_vmcb_check_controls(vcpu, &svm->nested.ctl)) {
 		vmcb12->control.exit_code    = SVM_EXIT_ERR;
 		vmcb12->control.exit_info_1  = 0;
 		vmcb12->control.exit_info_2  = 0;
@@ -1878,12 +1862,12 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
 
 	ret = -EINVAL;
 	__nested_copy_vmcb_control_to_cache(vcpu, &ctl_cached, ctl);
-	if (!__nested_vmcb_check_controls(vcpu, &ctl_cached))
+	if (!nested_vmcb_check_controls(vcpu, &ctl_cached))
 		goto out_free;
 
 	/*
 	 * Processor state contains L2 state.  Check that it is
-	 * valid for guest mode (see nested_vmcb_check_save).
+	 * valid for guest mode (see nested_vmcb_check_save()).
 	 */
 	cr0 = kvm_read_cr0(vcpu);
         if (((cr0 & X86_CR0_CD) == 0) && (cr0 & X86_CR0_NW))
@@ -1897,7 +1881,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
 	if (!(save->cr0 & X86_CR0_PG) ||
 	    !(save->cr0 & X86_CR0_PE) ||
 	    (save->rflags & X86_EFLAGS_VM) ||
-	    !__nested_vmcb_check_save(vcpu, &save_cached))
+	    !nested_vmcb_check_save(vcpu, &save_cached))
 		goto out_free;
 
 
-- 
2.53.0.473.g4a7958ca14-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v7 14/26] KVM: nSVM: Drop the non-architectural consistency check for NP_ENABLE
       [not found] <20260303003421.2185681-1-yosry@kernel.org>
                   ` (11 preceding siblings ...)
  2026-03-03  0:34 ` [PATCH v7 13/26] KVM: nSVM: Drop nested_vmcb_check_{save/control}() wrappers Yosry Ahmed
@ 2026-03-03  0:34 ` Yosry Ahmed
  2026-03-03  0:34 ` [PATCH v7 15/26] KVM: nSVM: Add missing consistency check for nCR3 validity Yosry Ahmed
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-03  0:34 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable

KVM currently fails a nested VMRUN and injects VMEXIT_INVALID (aka
SVM_EXIT_ERR) if L1 sets NP_ENABLE and the host does not support NPTs.
At first glance, it seems like the check should actually be for
guest_cpu_cap_has(X86_FEATURE_NPT) instead, as it is possible for the
host to support NPTs but the guest CPUID to not advertise it.

However, the consistency check is not architectural to begin with. The
APM does not mention VMEXIT_INVALID if NP_ENABLE is set on a processor
that does not have X86_FEATURE_NPT. Hence, NP_ENABLE should be ignored
if X86_FEATURE_NPT is not available for L1, so sanitize it when copying
from the VMCB12 to KVM's cache.

Apart from the consistency check, NP_ENABLE in VMCB12 is currently
ignored because the bit is actually copied from VMCB01 to VMCB02, not
from VMCB12.

Fixes: 4b16184c1cca ("KVM: SVM: Initialize Nested Nested MMU context on VMRUN")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 21e1a43c91879..613d5e2e7c3d1 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -348,9 +348,6 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
 	if (CC(control->asid == 0))
 		return false;
 
-	if (CC((control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) && !npt_enabled))
-		return false;
-
 	if (CC(!nested_svm_check_bitmap_pa(vcpu, control->msrpm_base_pa,
 					   MSRPM_SIZE)))
 		return false;
@@ -431,6 +428,11 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
 	nested_svm_sanitize_intercept(vcpu, to, SKINIT);
 	nested_svm_sanitize_intercept(vcpu, to, RDPRU);
 
+	/* Always clear SVM_NESTED_CTL_NP_ENABLE if the guest cannot use NPTs */
+	to->nested_ctl          = from->nested_ctl;
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NPT))
+		to->nested_ctl &= ~SVM_NESTED_CTL_NP_ENABLE;
+
 	to->iopm_base_pa        = from->iopm_base_pa;
 	to->msrpm_base_pa       = from->msrpm_base_pa;
 	to->tsc_offset          = from->tsc_offset;
@@ -444,7 +446,6 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
 	to->exit_info_2         = from->exit_info_2;
 	to->exit_int_info       = from->exit_int_info;
 	to->exit_int_info_err   = from->exit_int_info_err;
-	to->nested_ctl          = from->nested_ctl;
 	to->event_inj           = from->event_inj;
 	to->event_inj_err       = from->event_inj_err;
 	to->next_rip            = from->next_rip;
-- 
2.53.0.473.g4a7958ca14-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v7 15/26] KVM: nSVM: Add missing consistency check for nCR3 validity
       [not found] <20260303003421.2185681-1-yosry@kernel.org>
                   ` (12 preceding siblings ...)
  2026-03-03  0:34 ` [PATCH v7 14/26] KVM: nSVM: Drop the non-architectural consistency check for NP_ENABLE Yosry Ahmed
@ 2026-03-03  0:34 ` Yosry Ahmed
  2026-03-03 16:56   ` Sean Christopherson
  2026-03-03  0:34 ` [PATCH v7 16/26] KVM: nSVM: Add missing consistency check for EFER, CR0, CR4, and CS Yosry Ahmed
  2026-03-03  0:34 ` [PATCH v7 17/26] KVM: nSVM: Add missing consistency check for EVENTINJ Yosry Ahmed
  15 siblings, 1 reply; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-03  0:34 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable

From the APM Volume #2, 15.25.4 (24593—Rev. 3.42—March 2024):

	When VMRUN is executed with nested paging enabled
	(NP_ENABLE = 1), the following conditions are considered illegal
	state combinations, in addition to those mentioned in
	“Canonicalization and Consistency Checks”:
	• Any MBZ bit of nCR3 is set.
	• Any G_PAT.PA field has an unsupported type encoding or any
	reserved field in G_PAT has a nonzero value.

Add the consistency check for nCR3 being a legal GPA with no MBZ bits
set. The G_PAT.PA check was proposed separately [*].

[*]https://lore.kernel.org/kvm/20260205214326.1029278-3-jmattson@google.com/

Fixes: 4b16184c1cca ("KVM: SVM: Initialize Nested Nested MMU context on VMRUN")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 613d5e2e7c3d1..3aaa4f0bb31ab 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -348,6 +348,11 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
 	if (CC(control->asid == 0))
 		return false;
 
+	if (control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) {
+		if (CC(!kvm_vcpu_is_legal_gpa(vcpu, control->nested_cr3)))
+			return false;
+	}
+
 	if (CC(!nested_svm_check_bitmap_pa(vcpu, control->msrpm_base_pa,
 					   MSRPM_SIZE)))
 		return false;
-- 
2.53.0.473.g4a7958ca14-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread
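
A minimal userspace sketch of the nCR3 check described in patch 15/26
above, assuming that "legal GPA" means no bits set at or above the guest's
MAXPHYADDR. The helper names and the maxphyaddr parameter are hypothetical
stand-ins for illustration, not KVM's API, and any low-order MBZ bits of
nCR3 are not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the vCPU's reserved GPA bits: every bit at or
 * above the guest's MAXPHYADDR is treated as must-be-zero.
 */
static uint64_t reserved_gpa_bits(unsigned int maxphyaddr)
{
	if (maxphyaddr >= 64)
		return 0;
	return ~((UINT64_C(1) << maxphyaddr) - 1);
}

/* A GPA is legal if it sets none of the reserved bits. */
static bool is_legal_gpa(uint64_t gpa, unsigned int maxphyaddr)
{
	return !(gpa & reserved_gpa_bits(maxphyaddr));
}

/* nCR3 is only checked when nested paging is enabled in vmcb12. */
static bool check_ncr3(bool np_enable, uint64_t ncr3, unsigned int maxphyaddr)
{
	if (!np_enable)
		return true;
	return is_legal_gpa(ncr3, maxphyaddr);
}
```

With a 48-bit MAXPHYADDR, an nCR3 with bit 48 set would fail the check
only when NP_ENABLE is set; with NP_ENABLE clear the field is ignored.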

* [PATCH v7 16/26] KVM: nSVM: Add missing consistency check for EFER, CR0, CR4, and CS
       [not found] <20260303003421.2185681-1-yosry@kernel.org>
                   ` (13 preceding siblings ...)
  2026-03-03  0:34 ` [PATCH v7 15/26] KVM: nSVM: Add missing consistency check for nCR3 validity Yosry Ahmed
@ 2026-03-03  0:34 ` Yosry Ahmed
  2026-03-03  0:34 ` [PATCH v7 17/26] KVM: nSVM: Add missing consistency check for EVENTINJ Yosry Ahmed
  15 siblings, 0 replies; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-03  0:34 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable

According to the APM Volume #2, 15.5, Canonicalization and Consistency
Checks (24593—Rev. 3.42—March 2024), the following condition (among
others) results in a #VMEXIT with VMEXIT_INVALID (aka SVM_EXIT_ERR):

  EFER.LME, CR0.PG, CR4.PAE, CS.L, and CS.D are all non-zero.

In the list of consistency checks done when EFER.LME and CR0.PG are set,
add a check that CS.L and CS.D are not both set, after the existing
check that CR4.PAE is set.

This is functionally a nop because the nested VMRUN results in
SVM_EXIT_ERR in HW, which is forwarded to L1, but KVM makes all
consistency checks before a VMRUN is actually attempted.

Fixes: 3d6368ef580a ("KVM: SVM: Add VMRUN handler")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 6 ++++++
 arch/x86/kvm/svm/svm.h    | 1 +
 2 files changed, 7 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 3aaa4f0bb31ab..93b3fab9b415d 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -392,6 +392,10 @@ static bool nested_vmcb_check_save(struct kvm_vcpu *vcpu,
 		    CC(!(save->cr0 & X86_CR0_PE)) ||
 		    CC(!kvm_vcpu_is_legal_cr3(vcpu, save->cr3)))
 			return false;
+
+		if (CC((save->cs.attrib & SVM_SELECTOR_L_MASK) &&
+		       (save->cs.attrib & SVM_SELECTOR_DB_MASK)))
+			return false;
 	}
 
 	/* Note, SVM doesn't have any additional restrictions on CR4. */
@@ -487,6 +491,8 @@ static void __nested_copy_vmcb_save_to_cache(struct vmcb_save_area_cached *to,
 	 * Copy only fields that are validated, as we need them
 	 * to avoid TOC/TOU races.
 	 */
+	to->cs = from->cs;
+
 	to->efer = from->efer;
 	to->cr0 = from->cr0;
 	to->cr3 = from->cr3;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 7629cb37c9302..0a5d5a4453b7e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -140,6 +140,7 @@ struct kvm_vmcb_info {
 };
 
 struct vmcb_save_area_cached {
+	struct vmcb_seg cs;
 	u64 efer;
 	u64 cr4;
 	u64 cr3;
-- 
2.53.0.473.g4a7958ca14-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread
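
The long-mode consistency rule from patch 16/26 above can be sketched in
isolation as follows. This is only an illustration: the constants are
local definitions of the architectural bit positions (the CS attribute
bits assume the packed VMCB segment attribute layout), the function name
is hypothetical, and the CR3 legality check from the real code is
deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local illustrative definitions of the architectural bits involved. */
#define EFER_LME   (UINT64_C(1) << 8)
#define CR0_PE     (UINT64_C(1) << 0)
#define CR0_PG     (UINT64_C(1) << 31)
#define CR4_PAE    (UINT64_C(1) << 5)
#define CS_ATTR_L  (1u << 9)	/* assumed position in the packed attrib */
#define CS_ATTR_DB (1u << 10)	/* assumed position in the packed attrib */

/*
 * Returns false for the illegal combinations that apply once EFER.LME and
 * CR0.PG are both set: CR4.PAE clear, CR0.PE clear, or CS.L and CS.D both
 * set. CR3 validity is not modeled.
 */
static bool long_mode_state_ok(uint64_t efer, uint64_t cr0, uint64_t cr4,
			       uint16_t cs_attrib)
{
	if (!(efer & EFER_LME) || !(cr0 & CR0_PG))
		return true;

	if (!(cr4 & CR4_PAE) || !(cr0 & CR0_PE))
		return false;

	if ((cs_attrib & CS_ATTR_L) && (cs_attrib & CS_ATTR_DB))
		return false;

	return true;
}
```

Note that CS.L and CS.D both being set is only rejected when long mode is
being activated; with EFER.LME or CR0.PG clear the sketch accepts it.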

* [PATCH v7 17/26] KVM: nSVM: Add missing consistency check for EVENTINJ
       [not found] <20260303003421.2185681-1-yosry@kernel.org>
                   ` (14 preceding siblings ...)
  2026-03-03  0:34 ` [PATCH v7 16/26] KVM: nSVM: Add missing consistency check for EFER, CR0, CR4, and CS Yosry Ahmed
@ 2026-03-03  0:34 ` Yosry Ahmed
  15 siblings, 0 replies; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-03  0:34 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable

According to the APM Volume #2, 15.20 (24593—Rev. 3.42—March 2024):

  VMRUN exits with VMEXIT_INVALID error code if either:
  • Reserved values of TYPE have been specified, or
  • TYPE = 3 (exception) has been specified with a vector that does not
    correspond to an exception (this includes vector 2, which is an NMI,
    not an exception).

Add the missing consistency checks to KVM. For the second point, inject
VMEXIT_INVALID if the vector is anything but the vectors defined by the
APM for exceptions. Reserved vectors are also considered invalid, which
matches the HW behavior. Vector 9 (i.e. #CSO) is considered invalid
because it is reserved on modern CPUs, and according to LLMs no CPUs
exist supporting SVM and producing #CSOs.

Defined exceptions could be different between virtual CPUs as new CPUs
define new vectors. In a best effort to dynamically define the valid
vectors, treat all currently defined vectors as valid except those
obviously tied to a CPU feature: SHSTK -> #CP and SEV-ES -> #VC. As new
vectors are defined, they can similarly be tied to corresponding CPU
features.

Invalid vectors on specific (e.g. old) CPUs that are missed by KVM
should be rejected by HW anyway.

Fixes: 3d6368ef580a ("KVM: SVM: Add VMRUN handler")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 51 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 93b3fab9b415d..15f483fac28a0 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -339,6 +339,54 @@ static bool nested_svm_check_bitmap_pa(struct kvm_vcpu *vcpu, u64 pa, u32 size)
 	    kvm_vcpu_is_legal_gpa(vcpu, addr + size - 1);
 }
 
+static bool nested_svm_event_inj_valid_exept(struct kvm_vcpu *vcpu, u8 vector)
+{
+	/*
+	 * Vectors that do not correspond to a defined exception are invalid
+	 * (including #NMI and reserved vectors). In a best effort to define
+	 * valid exceptions based on the virtual CPU, make all exceptions always
+	 * valid except those obviously tied to a CPU feature.
+	 */
+	switch (vector) {
+	case DE_VECTOR: case DB_VECTOR: case BP_VECTOR: case OF_VECTOR:
+	case BR_VECTOR: case UD_VECTOR: case NM_VECTOR: case DF_VECTOR:
+	case TS_VECTOR: case NP_VECTOR: case SS_VECTOR: case GP_VECTOR:
+	case PF_VECTOR: case MF_VECTOR: case AC_VECTOR: case MC_VECTOR:
+	case XM_VECTOR: case HV_VECTOR: case SX_VECTOR:
+		return true;
+	case CP_VECTOR:
+		return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
+	case VC_VECTOR:
+		return guest_cpu_cap_has(vcpu, X86_FEATURE_SEV_ES);
+	}
+	return false;
+}
+
+/*
+ * According to the APM, VMRUN exits with SVM_EXIT_ERR if SVM_EVTINJ_VALID is
+ * set and:
+ * - The type of event_inj is not one of the defined values.
+ * - The type is SVM_EVTINJ_TYPE_EXEPT, but the vector is not a valid exception.
+ */
+static bool nested_svm_check_event_inj(struct kvm_vcpu *vcpu, u32 event_inj)
+{
+	u32 type = event_inj & SVM_EVTINJ_TYPE_MASK;
+	u8 vector = event_inj & SVM_EVTINJ_VEC_MASK;
+
+	if (!(event_inj & SVM_EVTINJ_VALID))
+		return true;
+
+	if (type != SVM_EVTINJ_TYPE_INTR && type != SVM_EVTINJ_TYPE_NMI &&
+	    type != SVM_EVTINJ_TYPE_EXEPT && type != SVM_EVTINJ_TYPE_SOFT)
+		return false;
+
+	if (type == SVM_EVTINJ_TYPE_EXEPT &&
+	    !nested_svm_event_inj_valid_exept(vcpu, vector))
+		return false;
+
+	return true;
+}
+
 static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
 				       struct vmcb_ctrl_area_cached *control)
 {
@@ -365,6 +413,9 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
 		return false;
 	}
 
+	if (CC(!nested_svm_check_event_inj(vcpu, control->event_inj)))
+		return false;
+
 	return true;
 }
 
-- 
2.53.0.473.g4a7958ca14-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread
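
For experimenting with the EVENTINJ rules from patch 17/26 outside the
kernel, a standalone model along the following lines may help. It is a
sketch under assumptions: the field encodings and vector numbers are
written out as local constants, the two bool parameters are hypothetical
stand-ins for the guest CPUID checks (SHSTK and SEV-ES), and the function
names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local illustrative encodings of the EVENTINJ field. */
#define EVTINJ_VEC_MASK   0xffu
#define EVTINJ_TYPE_MASK  (7u << 8)
#define EVTINJ_TYPE_INTR  (0u << 8)
#define EVTINJ_TYPE_NMI   (2u << 8)
#define EVTINJ_TYPE_EXEPT (3u << 8)
#define EVTINJ_TYPE_SOFT  (4u << 8)
#define EVTINJ_VALID      (1u << 31)

/* Vector 2 (NMI), vector 9 and reserved vectors fall through to false. */
static bool valid_exception_vector(uint8_t vector, bool has_shstk,
				   bool has_sev_es)
{
	switch (vector) {
	case 0: case 1: case 3: case 4: case 5: case 6: case 7: case 8:
	case 10: case 11: case 12: case 13: case 14:
	case 16: case 17: case 18: case 19:
	case 28: case 30:
		return true;
	case 21:	/* #CP, tied to shadow stacks */
		return has_shstk;
	case 29:	/* #VC, tied to SEV-ES */
		return has_sev_es;
	}
	return false;
}

/* Returns true if VMRUN would not fail due to the EVENTINJ contents. */
static bool check_event_inj(uint32_t event_inj, bool has_shstk,
			    bool has_sev_es)
{
	uint32_t type = event_inj & EVTINJ_TYPE_MASK;
	uint8_t vector = event_inj & EVTINJ_VEC_MASK;

	/* Nothing is checked unless the valid bit is set. */
	if (!(event_inj & EVTINJ_VALID))
		return true;

	switch (type) {
	case EVTINJ_TYPE_INTR:
	case EVTINJ_TYPE_NMI:
	case EVTINJ_TYPE_SOFT:
		return true;
	case EVTINJ_TYPE_EXEPT:
		return valid_exception_vector(vector, has_shstk, has_sev_es);
	}

	/* Reserved TYPE encodings. */
	return false;
}
```

Injecting vector 2 with TYPE = 3 is rejected by this model, while the
same vector with TYPE = 2 (NMI) is accepted.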

* Re: [PATCH v7 03/26] KVM: SVM: Add missing save/restore handling of LBR MSRs
  2026-03-03  0:33 ` [PATCH v7 03/26] KVM: SVM: Add missing save/restore handling of LBR MSRs Yosry Ahmed
@ 2026-03-03 16:37   ` Sean Christopherson
  2026-03-03 19:14     ` Yosry Ahmed
  0 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2026-03-03 16:37 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, stable, Jim Mattson

On Tue, Mar 03, 2026, Yosry Ahmed wrote:
> MSR_IA32_DEBUGCTLMSR and LBR MSRs are currently not enumerated by
> KVM_GET_MSR_INDEX_LIST, and LBR MSRs cannot be set with KVM_SET_MSRS. So
> save/restore is completely broken.
> 
> Fix it by adding the MSRs to msrs_to_save_base, and allowing writes to
> LBR MSRs from userspace only (as they are read-only MSRs). Additionally,
> to correctly restore L1's LBRs while L2 is running, make sure the LBRs
> are copied from the captured VMCB01 save area in svm_copy_vmrun_state().
> 
> For VMX, this also adds save/restore handling of KVM_GET_MSR_INDEX_LIST.
> For unsupported MSR_IA32_LAST* MSRs, kvm_do_msr_access() should zero these
> MSRs on userspace reads, and ignore KVM_MSR_RET_UNSUPPORTED on userspace
> writes.
> 
> Fixes: 24e09cbf480a ("KVM: SVM: enable LBR virtualization")
> Cc: stable@vger.kernel.org
> Reported-by: Jim Mattson <jmattson@google.com>
> Signed-off-by: Yosry Ahmed <yosry@kernel.org>
> ---
>  arch/x86/kvm/svm/nested.c |  5 +++++
>  arch/x86/kvm/svm/svm.c    | 24 ++++++++++++++++++++++++
>  arch/x86/kvm/x86.c        |  3 +++
>  3 files changed, 32 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index f7d5db0af69ac..3bf758c9cb85c 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -1100,6 +1100,11 @@ void svm_copy_vmrun_state(struct vmcb_save_area *to_save,
>  		to_save->isst_addr = from_save->isst_addr;
>  		to_save->ssp = from_save->ssp;
>  	}
> +
> +	if (lbrv) {

Tomato, tomato, but maybe make this

	if (kvm_cpu_cap_has(X86_FEATURE_LBRV)) {

to capture that this requires nested support.  I can't imagine we'll ever disable
X86_FEATURE_LBRV when nested=1 && lbrv=1, but I don't see any harm in being
paranoid in this case.

> +		svm_copy_lbrs(to_save, from_save);
> +		to_save->dbgctl &= ~DEBUGCTL_RESERVED_BITS;
> +	}
>  }
>  
>  void svm_copy_vmloadsave_state(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index f52e588317fcf..cb53174583a26 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3071,6 +3071,30 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>  		vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
>  		svm_update_lbrv(vcpu);
>  		break;
> +	case MSR_IA32_LASTBRANCHFROMIP:

Shouldn't these be gated on lbrv?  If LBRV is truly unsupported, KVM would be
writing "undefined" fields and clearing "unknown" clean bits.

Specifically, if we do:

		if (!lbrv)
			return KVM_MSR_RET_UNSUPPORTED;

then kvm_do_msr_access() will allow writes of '0' from the host, via this code:

	if (host_initiated && !*data && kvm_is_advertised_msr(msr))
		return 0;

And then in the read side, do e.g.:

	msr_info->data = lbrv ? svm->vmcb->save.dbgctl : 0;

to ensure KVM won't feed userspace garbage (the VMCB fields should be '0', but
there's no reason to risk that).
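
To make the interaction above concrete, here is a small userspace model
of the write path, assuming the common code only special-cases the
"unsupported" return value for host-initiated writes of '0' to advertised
MSRs, as in the snippet quoted above. All names and the numeric value of
MSR_RET_UNSUPPORTED are hypothetical; this is a sketch, not KVM's code.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_RET_UNSUPPORTED 2	/* illustrative value only */

/* Models the vendor handler for one LBR MSR backed by a VMCB field. */
static int model_set_lbr_msr(bool lbrv, bool host_initiated, uint64_t data,
			     uint64_t *field)
{
	/* Without LBRV the VMCB field is undefined, so never touch it. */
	if (!lbrv)
		return MSR_RET_UNSUPPORTED;

	/* The MSRs are read-only from the guest's point of view. */
	if (!host_initiated)
		return 1;

	*field = data;
	return 0;
}

/* Models the common wrapper that filters the "unsupported" result. */
static int model_do_msr_write(bool lbrv, bool host_initiated, bool advertised,
			      uint64_t data, uint64_t *field)
{
	int ret = model_set_lbr_msr(lbrv, host_initiated, data, field);

	if (ret != MSR_RET_UNSUPPORTED)
		return ret;

	/* Userspace may write '0' to an advertised but unsupported MSR. */
	if (host_initiated && !data && advertised)
		return 0;

	return ret;
}
```

In this model, restoring a saved value of '0' succeeds even with lbrv
disabled, and the backing field is left untouched.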

The changelog also needs to call out that kvm_set_msr_common() returns
KVM_MSR_RET_UNSUPPORTED for unhandled MSRs (i.e. for VMX and TDX), and that
kvm_get_msr_common() explicitly zeros the data for MSR_IA32_LASTxxx (because per
b5e2fec0ebc3 ("KVM: Ignore DEBUGCTL MSRs with no effect"), old and crusty kernels
would read the MSRs on Intel...).

So all in all (not yet tested), this?  If this is the only issue in the series,
or at least in the stable@ part of the series, no need for a v8 (I've obviously
already done the fixup).

--
From: Yosry Ahmed <yosry@kernel.org>
Date: Tue, 3 Mar 2026 00:33:57 +0000
Subject: [PATCH] KVM: SVM: Add missing save/restore handling of LBR MSRs

MSR_IA32_DEBUGCTLMSR and LBR MSRs are currently not enumerated by
KVM_GET_MSR_INDEX_LIST, and LBR MSRs cannot be set with KVM_SET_MSRS. So
save/restore is completely broken.

Fix it by adding the MSRs to msrs_to_save_base, and allowing writes to
LBR MSRs from userspace only (as they are read-only MSRs) if LBR
virtualization is enabled.  Additionally, to correctly restore L1's LBRs
while L2 is running, make sure the LBRs are copied from the captured
VMCB01 save area in svm_copy_vmrun_state().

Note, for VMX, this also fixes a flaw where MSR_IA32_DEBUGCTLMSR isn't
reported as an MSR to save/restore.

Note #2, over-reporting MSR_IA32_LASTxxx on Intel is ok, as KVM already
handles unsupported reads and writes thanks to commit b5e2fec0ebc3 ("KVM:
Ignore DEBUGCTL MSRs with no effect") (kvm_do_msr_access() will morph the
unsupported userspace write into a nop).

Fixes: 24e09cbf480a ("KVM: SVM: enable LBR virtualization")
Cc: stable@vger.kernel.org
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
Link: https://patch.msgid.link/20260303003421.2185681-4-yosry@kernel.org
[sean: guard with lbrv checks, massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/nested.c |  5 +++++
 arch/x86/kvm/svm/svm.c    | 44 +++++++++++++++++++++++++++++++++------
 arch/x86/kvm/x86.c        |  3 +++
 3 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index d0faa3e2dc97..d142761ad517 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1098,6 +1098,11 @@ void svm_copy_vmrun_state(struct vmcb_save_area *to_save,
 		to_save->isst_addr = from_save->isst_addr;
 		to_save->ssp = from_save->ssp;
 	}
+
+	if (kvm_cpu_cap_has(X86_FEATURE_LBRV)) {
+		svm_copy_lbrs(to_save, from_save);
+		to_save->dbgctl &= ~DEBUGCTL_RESERVED_BITS;
+	}
 }
 
 void svm_copy_vmloadsave_state(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4649cef966f6..317c8c28443a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2788,19 +2788,19 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = svm->tsc_aux;
 		break;
 	case MSR_IA32_DEBUGCTLMSR:
-		msr_info->data = svm->vmcb->save.dbgctl;
+		msr_info->data = lbrv ? svm->vmcb->save.dbgctl : 0;
 		break;
 	case MSR_IA32_LASTBRANCHFROMIP:
-		msr_info->data = svm->vmcb->save.br_from;
+		msr_info->data = lbrv ? svm->vmcb->save.br_from : 0;
 		break;
 	case MSR_IA32_LASTBRANCHTOIP:
-		msr_info->data = svm->vmcb->save.br_to;
+		msr_info->data = lbrv ? svm->vmcb->save.br_to : 0;
 		break;
 	case MSR_IA32_LASTINTFROMIP:
-		msr_info->data = svm->vmcb->save.last_excp_from;
-		break;
+		msr_info->data = lbrv ? svm->vmcb->save.last_excp_from : 0;
+		break;
 	case MSR_IA32_LASTINTTOIP:
-		msr_info->data = svm->vmcb->save.last_excp_to;
+		msr_info->data = lbrv ? svm->vmcb->save.last_excp_to : 0;
 		break;
 	case MSR_VM_HSAVE_PA:
 		msr_info->data = svm->nested.hsave_msr;
@@ -3075,6 +3075,38 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
 		svm_update_lbrv(vcpu);
 		break;
+	case MSR_IA32_LASTBRANCHFROMIP:
+		if (!lbrv)
+			return KVM_MSR_RET_UNSUPPORTED;
+		if (!msr->host_initiated)
+			return 1;
+		svm->vmcb->save.br_from = data;
+		vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
+		break;
+	case MSR_IA32_LASTBRANCHTOIP:
+		if (!lbrv)
+			return KVM_MSR_RET_UNSUPPORTED;
+		if (!msr->host_initiated)
+			return 1;
+		svm->vmcb->save.br_to = data;
+		vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
+		break;
+	case MSR_IA32_LASTINTFROMIP:
+		if (!lbrv)
+			return KVM_MSR_RET_UNSUPPORTED;
+		if (!msr->host_initiated)
+			return 1;
+		svm->vmcb->save.last_excp_from = data;
+		vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
+		break;
+	case MSR_IA32_LASTINTTOIP:
+		if (!lbrv)
+			return KVM_MSR_RET_UNSUPPORTED;
+		if (!msr->host_initiated)
+			return 1;
+		svm->vmcb->save.last_excp_to = data;
+		vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
+		break;
 	case MSR_VM_HSAVE_PA:
 		/*
 		 * Old kernels did not validate the value written to
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6e87ec52fa06..64da02d1ee00 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -351,6 +351,9 @@ static const u32 msrs_to_save_base[] = {
 	MSR_IA32_U_CET, MSR_IA32_S_CET,
 	MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
 	MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB,
+	MSR_IA32_DEBUGCTLMSR,
+	MSR_IA32_LASTBRANCHFROMIP, MSR_IA32_LASTBRANCHTOIP,
+	MSR_IA32_LASTINTFROMIP, MSR_IA32_LASTINTTOIP,
 };
 
 static const u32 msrs_to_save_pmu[] = {

base-commit: 149b996ea367eef39faf82ccba0659a5f3d389ea
--

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 09/26] KVM: nSVM: Triple fault if restore host CR3 fails on nested #VMEXIT
  2026-03-03  0:34 ` [PATCH v7 09/26] KVM: nSVM: Triple fault if restore host CR3 " Yosry Ahmed
@ 2026-03-03 16:49   ` Sean Christopherson
  2026-03-03 19:15     ` Yosry Ahmed
  0 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2026-03-03 16:49 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, stable

On Tue, Mar 03, 2026, Yosry Ahmed wrote:
> If loading L1's CR3 fails on a nested #VMEXIT, nested_svm_vmexit()
> returns an error code that is ignored by most callers, and continues to
> run L1 with corrupted state. A sane recovery is not possible in this
> case, and HW behavior is to cause a shutdown. Inject a triple fault
> ,nstead, and do not return early from nested_svm_vmexit(). Continue

s/,/i

> cleaning up the vCPU state (e.g. clear pending exceptions), to handle
> the failure as gracefully as possible.
> 
> From the APM:
> 	Upon #VMEXIT, the processor performs the following actions in
> 	order to return to the host execution context:
> 
> 	...
> 	if (illegal host state loaded, or exception while loading
> 	    host state)
> 		shutdown
> 	else
> 		execute first host instruction following the VMRUN

Uber nit, use spaces instead of tabs in changelogs, as indenting eight chars is
almost always overkill and changelogs are more likely to be viewed in a reader
that has tab-stops set to something other than eight.  E.g. using two spaces as
the margin and then manual indentation of four:

From the APM:

  Upon #VMEXIT, the processor performs the following actions in order to
  return to the host execution context:

  ...

  if (illegal host state loaded, or exception while loading host state)
      shutdown
  else
      execute first host instruction following the VMRUN

Remove the return value of nested_svm_vmexit(), which is mostly
unchecked anyway.


> Remove the return value of nested_svm_vmexit(), which is mostly
> unchecked anyway.
> 
> Fixes: d82aaef9c88a ("KVM: nSVM: use nested_svm_load_cr3() on guest->host switch")
> CC: stable@vger.kernel.org

Heh, and super duper uber nit, "Cc:" is much more common than "CC:" (I'm actually
somewhat surprised checkpatch didn't complain since it's so particular about case
for other trailers).

$ git log -10000 | grep "CC:" | wc -l
38
$ git log -10000 | grep "Cc:" | wc -l
11238


* Re: [PATCH v7 12/26] KVM: nSVM: Clear tracking of L1->L2 NMI and soft IRQ on nested #VMEXIT
  2026-03-03  0:34 ` [PATCH v7 12/26] KVM: nSVM: Clear tracking of L1->L2 NMI and soft IRQ " Yosry Ahmed
@ 2026-03-03 16:50   ` Sean Christopherson
  2026-03-03 19:15     ` Yosry Ahmed
  0 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2026-03-03 16:50 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, stable

On Tue, Mar 03, 2026, Yosry Ahmed wrote:
> KVM clears tracking of L1->L2 injected NMIs (i.e. nmi_l1_to_l2) and soft
> IRQs (i.e. soft_int_injected) on a synthesized #VMEXIT(INVALID) due to
> failed VMRUN. However, they are not explicitly cleared in other
> synthesized #VMEXITs.
> 
> soft_int_injected is always cleared after the first VMRUN of L2 when
> completing interrupts, as any re-injection is then tracked by KVM
> (instead of purely in vmcb02).
> 
> nmi_l1_to_l2 is not cleared after the first VMRUN if NMI injection
> failed, as KVM still needs to keep track that the NMI originated from L1
> to avoid blocking NMIs for L1. It is only cleared when the NMI injection
> succeeds.
> 
> KVM could synthesize a #VMEXIT to L1 before successfully injecting the
> NMI into L2 (e.g. due to a #NPF on L2's NMI handler in L1's NPTs). In
> this case, nmi_l1_to_l2 will remain true, and KVM may not correctly mask
> NMIs and intercept IRET when injecting an NMI into L1.
> 
> Clear both nmi_l1_to_l2 and soft_int_injected in nested_svm_vmexit() to
> capture all #VMEXITs, except those that occur due to failed consistency
> checks, as those happen before nmi_l1_to_l2 or soft_int_injected are
> set.

This last paragraph confused me a little bit.  I read "to capture all #VMEXITs"
as some sort of "catching" that KVM was doing.  I've got it reworded to this:

Clear both nmi_l1_to_l2 and soft_int_injected in nested_svm_vmexit(), i.e.
for all #VMEXITs except those that occur due to failed consistency checks,
as those happen before nmi_l1_to_l2 or soft_int_injected are set.


* Re: [PATCH v7 15/26] KVM: nSVM: Add missing consistency check for nCR3 validity
  2026-03-03  0:34 ` [PATCH v7 15/26] KVM: nSVM: Add missing consistency check for nCR3 validity Yosry Ahmed
@ 2026-03-03 16:56   ` Sean Christopherson
  2026-03-03 19:17     ` Yosry Ahmed
  0 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2026-03-03 16:56 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, stable

On Tue, Mar 03, 2026, Yosry Ahmed wrote:
> From the APM Volume #2, 15.25.4 (24593—Rev. 3.42—March 2024):
> 
> 	When VMRUN is executed with nested paging enabled
> 	(NP_ENABLE = 1), the following conditions are considered illegal
> 	state combinations, in addition to those mentioned in
> 	“Canonicalization and Consistency Checks”:
> 	• Any MBZ bit of nCR3 is set.
> 	• Any G_PAT.PA field has an unsupported type encoding or any
> 	reserved field in G_PAT has a nonzero value.
> 
> Add the consistency check for nCR3 being a legal GPA with no MBZ bits
> set. The G_PAT.PA check was proposed separately [*].
> 
> [*]https://lore.kernel.org/kvm/20260205214326.1029278-3-jmattson@google.com/
> 
> Fixes: 4b16184c1cca ("KVM: SVM: Initialize Nested Nested MMU context on VMRUN")
> Cc: stable@vger.kernel.org
> Signed-off-by: Yosry Ahmed <yosry@kernel.org>
> ---
>  arch/x86/kvm/svm/nested.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 613d5e2e7c3d1..3aaa4f0bb31ab 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -348,6 +348,11 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
>  	if (CC(control->asid == 0))
>  		return false;
>  
> +	if (control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) {
> +		if (CC(!kvm_vcpu_is_legal_gpa(vcpu, control->nested_cr3)))
> +			return false;

Put the full if-statement in CC(), that way the tracepoint will capture the entire
clause, i.e. will help the reader understand that nested_cr3 was checked
specifically because NPT was enabled.

	if (CC((control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) &&
	       !kvm_vcpu_is_legal_gpa(vcpu, control->nested_cr3)))
		return false;
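
To illustrate why the placement of the condition matters, here is a minimal
userspace sketch, assuming a CC()-style macro that records the stringified
expression of a failed check. The names cc_report(), last_failed_check,
struct ctl_model, NP_ENABLE and gpa_is_legal() are hypothetical stand-ins and
are not KVM's definitions; the 48-bit limit in gpa_is_legal() is likewise an
assumption made only for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Text of the most recent failed check; stands in for a tracepoint. */
static const char *last_failed_check;

static bool cc_report(bool failed, const char *expr)
{
	if (failed)
		last_failed_check = expr;
	return failed;
}

/* Hypothetical CC(): evaluates the check and records its source text. */
#define CC(check) cc_report((check), #check)

#define NP_ENABLE 0x1u	/* hypothetical flag bit for this model */

struct ctl_model {
	uint64_t nested_ctl;
	uint64_t nested_cr3;
};

/* Hypothetical legality rule: no bits set at or above bit 48. */
static bool gpa_is_legal(uint64_t gpa)
{
	return (gpa >> 48) == 0;
}

/* Outer condition is outside CC(), so it is absent from the recorded text. */
static bool check_split(const struct ctl_model *c)
{
	if (c->nested_ctl & NP_ENABLE) {
		if (CC(!gpa_is_legal(c->nested_cr3)))
			return false;
	}
	return true;
}

/* Whole clause is inside CC(), so the recorded text mentions NP_ENABLE. */
static bool check_merged(const struct ctl_model *c)
{
	if (CC((c->nested_ctl & NP_ENABLE) && !gpa_is_legal(c->nested_cr3)))
		return false;
	return true;
}
```

Both variants reject the same inputs in this model; only the recorded text
differs, and the merged form is the one that shows the check was conditional
on nested paging.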


* Re: [PATCH v7 03/26] KVM: SVM: Add missing save/restore handling of LBR MSRs
  2026-03-03 16:37   ` Sean Christopherson
@ 2026-03-03 19:14     ` Yosry Ahmed
  2026-03-04  0:44       ` Sean Christopherson
  0 siblings, 1 reply; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-03 19:14 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable, Jim Mattson

> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index f7d5db0af69ac..3bf758c9cb85c 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -1100,6 +1100,11 @@ void svm_copy_vmrun_state(struct vmcb_save_area *to_save,
> >               to_save->isst_addr = from_save->isst_addr;
> >               to_save->ssp = from_save->ssp;
> >       }
> > +
> > +     if (lbrv) {
>
> Tomato, tomato, but maybe make this
>
>         if (kvm_cpu_cap_has(X86_FEATURE_LBRV)) {
>
> to capture that this requires nested support.  I can't imagine we'll ever disable
> X86_FEATURE_LBRV when nested=1 && lbrv=1, but I don't see any harm in being
> paranoid in this case.

Sounds good.

>
> > +             svm_copy_lbrs(to_save, from_save);
> > +             to_save->dbgctl &= ~DEBUGCTL_RESERVED_BITS;
> > +     }
> >  }
> >
> >  void svm_copy_vmloadsave_state(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index f52e588317fcf..cb53174583a26 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -3071,6 +3071,30 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> >               vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
> >               svm_update_lbrv(vcpu);
> >               break;
> > +     case MSR_IA32_LASTBRANCHFROMIP:
>
> Shouldn't these be gated on lbrv?  If LBRV is truly unsupported, KVM would be
> writing "undefined" fields and clearing "unknown" clean bits.
>
> Specifically, if we do:
>
>                 if (!lbrv)
>                         return KVM_MSR_RET_UNSUPPORTED;
>
> then kvm_do_msr_access() will allow writes of '0' from the host, via this code:
>
>         if (host_initiated && !*data && kvm_is_advertised_msr(msr))
>                 return 0;
>
> And then in the read side, do e.g.:
>
>         msr_info->data = lbrv ? svm->vmcb->save.dbgctl : 0;
>
> to ensure KVM won't feed userspace garbage (the VMCB fields should be '0', but
> there's no reason to risk that).

Good call.

>
> The changelog also needs to call out that kvm_set_msr_common() returns
> KVM_MSR_RET_UNSUPPORTED for unhandled MSRs (i.e. for VMX and TDX), and that
> kvm_get_msr_common() explicitly zeros the data for MSR_IA32_LASTxxx (because per
> b5e2fec0ebc3 ("KVM: Ignore DEBUGCTL MSRs with no effect"), old and crusty kernels
> would read the MSRs on Intel...).

That was captured (somehow):

For VMX, this also adds save/restore handling of KVM_GET_MSR_INDEX_LIST.
For unsupported MSR_IA32_LAST* MSRs, kvm_do_msr_access() should 0 these
MSRs on userspace reads, and ignore KVM_MSR_RET_UNSUPPORTED on userspace
writes.
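
The write-side behavior discussed above can be modeled in a few lines. This
is a rough sketch, assuming a vendor handler that reports "unsupported" and a
common layer that tolerates host-initiated writes of zero to advertised MSRs.
The enum values and the function names vendor_set_lbr() and common_set_msr()
are hypothetical and do not match KVM's actual definitions; reads are not
modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result codes for this model; not KVM's actual values. */
enum msr_result { MSR_OK = 0, MSR_FAULT = 1, MSR_UNSUPPORTED = 2 };

/* Models the vendor handler: gate on lbrv first, then on the initiator. */
static enum msr_result vendor_set_lbr(bool lbrv, bool host_initiated,
				      uint64_t *field, uint64_t data)
{
	if (!lbrv)
		return MSR_UNSUPPORTED;
	if (!host_initiated)
		return MSR_FAULT;
	*field = data;
	return MSR_OK;
}

/*
 * Models the common layer: an unsupported MSR still accepts a write of
 * zero from the host if the MSR is advertised, without touching the field.
 */
static enum msr_result common_set_msr(bool lbrv, bool host_initiated,
				      bool advertised, uint64_t *field,
				      uint64_t data)
{
	enum msr_result ret = vendor_set_lbr(lbrv, host_initiated, field, data);

	if (ret != MSR_UNSUPPORTED)
		return ret;
	if (host_initiated && data == 0 && advertised)
		return MSR_OK;
	return ret;
}
```

In this model, a host write of zero succeeds with lbrv disabled and leaves the
field untouched, while a non-zero host write or any guest write still reports
the MSR as unsupported.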

>
> So all in all (not yet tested), this?  If this is the only issue in the series,
> or at least in the stable@ part of the series, no need for a v8 (I've obviously
> already done the fixup).

Looks good with a minor nit below (could be a followup).

> @@ -3075,6 +3075,38 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>                 vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
>                 svm_update_lbrv(vcpu);
>                 break;
> +       case MSR_IA32_LASTBRANCHFROMIP:
> +               if (!lbrv)
> +                       return KVM_MSR_RET_UNSUPPORTED;
> +               if (!msr->host_initiated)
> +                       return 1;
> +               svm->vmcb->save.br_from = data;
> +               vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
> +               break;
> +       case MSR_IA32_LASTBRANCHTOIP:
> +               if (!lbrv)
> +                       return KVM_MSR_RET_UNSUPPORTED;
> +               if (!msr->host_initiated)
> +                       return 1;
> +               svm->vmcb->save.br_to = data;
> +               vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
> +               break;
> +       case MSR_IA32_LASTINTFROMIP:
> +               if (!lbrv)
> +                       return KVM_MSR_RET_UNSUPPORTED;
> +               if (!msr->host_initiated)
> +                       return 1;
> +               svm->vmcb->save.last_excp_from = data;
> +               vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
> +               break;
> +       case MSR_IA32_LASTINTTOIP:
> +               if (!lbrv)
> +                       return KVM_MSR_RET_UNSUPPORTED;
> +               if (!msr->host_initiated)
> +                       return 1;
> +               svm->vmcb->save.last_excp_to = data;
> +               vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
> +               break;

There's so much repeated code here. We can use gotos to share code,
but I am not sure if that's a strict improvement. We can also use a
helper, perhaps?

static int svm_set_lbr_msr(struct vcpu_svm *svm, struct msr_data *msr,
                           u64 data, u64 *field)
{
        if (!lbrv)
                return KVM_MSR_RET_UNSUPPORTED;
        if (!msr->host_initiated)
                return 1;
        *field = data;
        vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
        return 0;
}

...

        case MSR_IA32_LASTBRANCHFROMIP:
                ret = svm_set_lbr_msr(svm, msr, data, &svm->vmcb->save.br_from);
                if (ret)
                        return ret;
                break;
...


* Re: [PATCH v7 09/26] KVM: nSVM: Triple fault if restore host CR3 fails on nested #VMEXIT
  2026-03-03 16:49   ` Sean Christopherson
@ 2026-03-03 19:15     ` Yosry Ahmed
  0 siblings, 0 replies; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-03 19:15 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable

On Tue, Mar 3, 2026 at 8:49 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Mar 03, 2026, Yosry Ahmed wrote:
> > If loading L1's CR3 fails on a nested #VMEXIT, nested_svm_vmexit()
> > returns an error code that is ignored by most callers, and continues to
> > run L1 with corrupted state. A sane recovery is not possible in this
> > case, and HW behavior is to cause a shutdown. Inject a triple fault
> > ,nstead, and do not return early from nested_svm_vmexit(). Continue
>
> s/,/i

Not sure how that happened lol.

>
> > cleaning up the vCPU state (e.g. clear pending exceptions), to handle
> > the failure as gracefully as possible.
> >
> > From the APM:
> >       Upon #VMEXIT, the processor performs the following actions in
> >       order to return to the host execution context:
> >
> >       ...
> >       if (illegal host state loaded, or exception while loading
> >           host state)
> >               shutdown
> >       else
> >               execute first host instruction following the VMRUN
>
> Uber nit, use spaces instead of tabs in changelogs, as indenting eight chars is
> almost always overkill and changelogs are more likely to be viewed in a reader
> that has tab-stops set to something other than eight.  E.g. using two spaces as
> the margin and then manual indentation of four:

Yeah I started doing that recently but I didn't go back to change old ones.

[..]
> >
> > Fixes: d82aaef9c88a ("KVM: nSVM: use nested_svm_load_cr3() on guest->host switch")
> > CC: stable@vger.kernel.org
>
> Heh, and super duper uber nit, "Cc:" is much more common than "CC:" (I'm actually
> somewhat surprised checkpatch didn't complain since it's so particular about case
> for other trailers).
>
> $ git log -10000 | grep "CC:" | wc -l
> 38
> $ git log -10000 | grep "Cc:" | wc -l
> 11238

That was a mistake, I think I generally use Cc.


* Re: [PATCH v7 12/26] KVM: nSVM: Clear tracking of L1->L2 NMI and soft IRQ on nested #VMEXIT
  2026-03-03 16:50   ` Sean Christopherson
@ 2026-03-03 19:15     ` Yosry Ahmed
  0 siblings, 0 replies; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-03 19:15 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable

On Tue, Mar 3, 2026 at 8:50 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Mar 03, 2026, Yosry Ahmed wrote:
> > KVM clears tracking of L1->L2 injected NMIs (i.e. nmi_l1_to_l2) and soft
> > IRQs (i.e. soft_int_injected) on a synthesized #VMEXIT(INVALID) due to
> > failed VMRUN. However, they are not explicitly cleared in other
> > synthesized #VMEXITs.
> >
> > soft_int_injected is always cleared after the first VMRUN of L2 when
> > completing interrupts, as any re-injection is then tracked by KVM
> > (instead of purely in vmcb02).
> >
> > nmi_l1_to_l2 is not cleared after the first VMRUN if NMI injection
> > failed, as KVM still needs to keep track that the NMI originated from L1
> > to avoid blocking NMIs for L1. It is only cleared when the NMI injection
> > succeeds.
> >
> > KVM could synthesize a #VMEXIT to L1 before successfully injecting the
> > NMI into L2 (e.g. due to a #NPF on L2's NMI handler in L1's NPTs). In
> > this case, nmi_l1_to_l2 will remain true, and KVM may not correctly mask
> > NMIs and intercept IRET when injecting an NMI into L1.
> >
> > Clear both nmi_l1_to_l2 and soft_int_injected in nested_svm_vmexit() to
> > capture all #VMEXITs, except those that occur due to failed consistency
> > checks, as those happen before nmi_l1_to_l2 or soft_int_injected are
> > set.
>
> This last paragraph confused me a little bit.  I read "to capture all #VMEXITs"
> as some sort of "catching" that KVM was doing.  I've got it reworded to this:
>
> Clear both nmi_l1_to_l2 and soft_int_injected in nested_svm_vmexit(), i.e.
> for all #VMEXITs except those that occur due to failed consistency checks,
> as those happen before nmi_l1_to_l2 or soft_int_injected are set.

LGTM.


* Re: [PATCH v7 15/26] KVM: nSVM: Add missing consistency check for nCR3 validity
  2026-03-03 16:56   ` Sean Christopherson
@ 2026-03-03 19:17     ` Yosry Ahmed
  0 siblings, 0 replies; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-03 19:17 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable

On Tue, Mar 3, 2026 at 8:56 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Mar 03, 2026, Yosry Ahmed wrote:
> > From the APM Volume #2, 15.25.4 (24593—Rev. 3.42—March 2024):
> >
> >       When VMRUN is executed with nested paging enabled
> >       (NP_ENABLE = 1), the following conditions are considered illegal
> >       state combinations, in addition to those mentioned in
> >       “Canonicalization and Consistency Checks”:
> >       • Any MBZ bit of nCR3 is set.
> >       • Any G_PAT.PA field has an unsupported type encoding or any
> >       reserved field in G_PAT has a nonzero value.
> >
> > Add the consistency check for nCR3 being a legal GPA with no MBZ bits
> > set. The G_PAT.PA check was proposed separately [*].
> >
> > [*]https://lore.kernel.org/kvm/20260205214326.1029278-3-jmattson@google.com/
> >
> > Fixes: 4b16184c1cca ("KVM: SVM: Initialize Nested Nested MMU context on VMRUN")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Yosry Ahmed <yosry@kernel.org>
> > ---
> >  arch/x86/kvm/svm/nested.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 613d5e2e7c3d1..3aaa4f0bb31ab 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -348,6 +348,11 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
> >       if (CC(control->asid == 0))
> >               return false;
> >
> > +     if (control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) {
> > +             if (CC(!kvm_vcpu_is_legal_gpa(vcpu, control->nested_cr3)))
> > +                     return false;
>
> Put the full if-statement in CC(), that way the tracepoint will capture the entire
> > clause, i.e. will help the reader understand that nested_cr3 was checked
> specifically because NPT was enabled.

I had it this way in v6 because there was another consistency check
dependent on NPT being enabled:
https://lore.kernel.org/kvm/20260224223405.3270433-21-yosry@kernel.org/.

I dropped the patch in v7 as I realized L1's CR0.PG was already being
checked, but it didn't occur to me to go back and update this. Good
catch.

>
>         if (CC((control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) &&
>                !kvm_vcpu_is_legal_gpa(vcpu, control->nested_cr3)))
>                 return false;


* Re: [PATCH v7 03/26] KVM: SVM: Add missing save/restore handling of LBR MSRs
  2026-03-03 19:14     ` Yosry Ahmed
@ 2026-03-04  0:44       ` Sean Christopherson
  2026-03-04  0:48         ` Yosry Ahmed
  0 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2026-03-04  0:44 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, stable, Jim Mattson

On Tue, Mar 03, 2026, Yosry Ahmed wrote:
> > > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > So all in all (not yet tested), this?  If this is the only issue in the series,
> > or at least in the stable@ part of the series, no need for a v8 (I've obviously
> > already done the fixup).
> 
> Looks good with a minor nit below (could be a followup).
> 
> > @@ -3075,6 +3075,38 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> >                 vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
> >                 svm_update_lbrv(vcpu);
> >                 break;
> > +       case MSR_IA32_LASTBRANCHFROMIP:
> > +               if (!lbrv)
> > +                       return KVM_MSR_RET_UNSUPPORTED;
> > +               if (!msr->host_initiated)
> > +                       return 1;
> > +               svm->vmcb->save.br_from = data;
> > +               vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
> > +               break;
> > +       case MSR_IA32_LASTBRANCHTOIP:
> > +               if (!lbrv)
> > +                       return KVM_MSR_RET_UNSUPPORTED;
> > +               if (!msr->host_initiated)
> > +                       return 1;
> > +               svm->vmcb->save.br_to = data;
> > +               vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
> > +               break;
> > +       case MSR_IA32_LASTINTFROMIP:
> > +               if (!lbrv)
> > +                       return KVM_MSR_RET_UNSUPPORTED;
> > +               if (!msr->host_initiated)
> > +                       return 1;
> > +               svm->vmcb->save.last_excp_from = data;
> > +               vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
> > +               break;
> > +       case MSR_IA32_LASTINTTOIP:
> > +               if (!lbrv)
> > +                       return KVM_MSR_RET_UNSUPPORTED;
> > +               if (!msr->host_initiated)
> > +                       return 1;
> > +               svm->vmcb->save.last_excp_to = data;
> > +               vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
> > +               break;
> 
> There's so much repeated code here. 

Ya :-(

> We can use gotos to share code, but I am not sure if that's a strict
> improvement. We can also use a helper, perhaps?


Where's your sense of adventure?

	case MSR_IA32_LASTBRANCHFROMIP:
	case MSR_IA32_LASTBRANCHTOIP:
	case MSR_IA32_LASTINTFROMIP:
	case MSR_IA32_LASTINTTOIP:
		if (!lbrv)
			return KVM_MSR_RET_UNSUPPORTED;
		if (!msr->host_initiated)
			return 1;
		*(&svm->vmcb->save.br_from + (ecx - MSR_IA32_LASTBRANCHFROMIP)) = data;
		vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
		break;

Jokes aside, maybe this, to dedup get() at the same time?

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 68b747a94294..f1811105e89f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2720,6 +2720,23 @@ static int svm_get_feature_msr(u32 msr, u64 *data)
        return 0;
 }
 
+static __always_inline u64 *svm_vmcb_lbr(struct vcpu_svm *svm, u32 msr)
+{
+       switch (msr) {
+       case MSR_IA32_LASTBRANCHFROMIP:
+               return &svm->vmcb->save.br_from;
+       case MSR_IA32_LASTBRANCHTOIP:
+               return &svm->vmcb->save.br_to;
+       case MSR_IA32_LASTINTFROMIP:
+               return &svm->vmcb->save.last_excp_from;
+       case MSR_IA32_LASTINTTOIP:
+               return &svm->vmcb->save.last_excp_to;
+       default:
+               break;
+       }
+       BUILD_BUG();
+}
+
 static bool sev_es_prevent_msr_access(struct kvm_vcpu *vcpu,
                                      struct msr_data *msr_info)
 {
@@ -2838,16 +2855,10 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
                msr_info->data = lbrv ? svm->vmcb->save.dbgctl : 0;
                break;
        case MSR_IA32_LASTBRANCHFROMIP:
-               msr_info->data = lbrv ? svm->vmcb->save.br_from : 0;
-               break;
        case MSR_IA32_LASTBRANCHTOIP:
-               msr_info->data = lbrv ? svm->vmcb->save.br_to : 0;
-               break;
        case MSR_IA32_LASTINTFROMIP:
-               msr_info->data = lbrv ? svm->vmcb->save.last_excp_from : 0;
-               break;
        case MSR_IA32_LASTINTTOIP:
-               msr_info->data = lbrv ? svm->vmcb->save.last_excp_to : 0;
+               msr_info->data = lbrv ? *svm_vmcb_lbr(svm, msr_info->index) : 0;
                break;
        case MSR_VM_HSAVE_PA:
                msr_info->data = svm->nested.hsave_msr;
@@ -3122,35 +3133,14 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
                svm_update_lbrv(vcpu);
                break;
        case MSR_IA32_LASTBRANCHFROMIP:
-               if (!lbrv)
-                       return KVM_MSR_RET_UNSUPPORTED;
-               if (!msr->host_initiated)
-                       return 1;
-               svm->vmcb->save.br_from = data;
-               vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
-               break;
        case MSR_IA32_LASTBRANCHTOIP:
-               if (!lbrv)
-                       return KVM_MSR_RET_UNSUPPORTED;
-               if (!msr->host_initiated)
-                       return 1;
-               svm->vmcb->save.br_to = data;
-               vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
-               break;
        case MSR_IA32_LASTINTFROMIP:
-               if (!lbrv)
-                       return KVM_MSR_RET_UNSUPPORTED;
-               if (!msr->host_initiated)
-                       return 1;
-               svm->vmcb->save.last_excp_from = data;
-               vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
-               break;
        case MSR_IA32_LASTINTTOIP:
                if (!lbrv)
                        return KVM_MSR_RET_UNSUPPORTED;
                if (!msr->host_initiated)
                        return 1;
-               svm->vmcb->save.last_excp_to = data;
+               *svm_vmcb_lbr(svm, ecx) = data;
                vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
                break;
        case MSR_VM_HSAVE_PA:
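
The lookup-helper idea in the diff above can be tried outside the kernel. This
is a minimal userspace sketch, assuming a simplified save area;
struct lbr_save_model, lbr_field(), lbr_set() and lbr_get() are hypothetical
names, and the sketch returns NULL for an unexpected index instead of using
BUILD_BUG() as the proposal does. The MSR indices are the values from the
kernel's msr-index.h.

```c
#include <stddef.h>
#include <stdint.h>

#define MSR_IA32_LASTBRANCHFROMIP	0x000001db
#define MSR_IA32_LASTBRANCHTOIP		0x000001dc
#define MSR_IA32_LASTINTFROMIP		0x000001dd
#define MSR_IA32_LASTINTTOIP		0x000001de

/* Hypothetical stand-in for the LBR fields of the VMCB save area. */
struct lbr_save_model {
	uint64_t br_from;
	uint64_t br_to;
	uint64_t last_excp_from;
	uint64_t last_excp_to;
};

/* Map an MSR index to its backing field; NULL if it is not an LBR MSR. */
static uint64_t *lbr_field(struct lbr_save_model *save, uint32_t msr)
{
	switch (msr) {
	case MSR_IA32_LASTBRANCHFROMIP:
		return &save->br_from;
	case MSR_IA32_LASTBRANCHTOIP:
		return &save->br_to;
	case MSR_IA32_LASTINTFROMIP:
		return &save->last_excp_from;
	case MSR_IA32_LASTINTTOIP:
		return &save->last_excp_to;
	default:
		return NULL;
	}
}

/* Shared write path: returns 0 on success, 1 for an unknown index. */
static int lbr_set(struct lbr_save_model *save, uint32_t msr, uint64_t data)
{
	uint64_t *field = lbr_field(save, msr);

	if (!field)
		return 1;
	*field = data;
	return 0;
}

/* Shared read path: returns 0 on success, 1 for an unknown index. */
static int lbr_get(struct lbr_save_model *save, uint32_t msr, uint64_t *data)
{
	uint64_t *field = lbr_field(save, msr);

	if (!field)
		return 1;
	*data = *field;
	return 0;
}
```

Unlike the pointer-arithmetic variant joked about earlier in the message, the
switch does not depend on the four fields being laid out contiguously in MSR
order.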


* Re: [PATCH v7 03/26] KVM: SVM: Add missing save/restore handling of LBR MSRs
  2026-03-04  0:44       ` Sean Christopherson
@ 2026-03-04  0:48         ` Yosry Ahmed
  0 siblings, 0 replies; 26+ messages in thread
From: Yosry Ahmed @ 2026-03-04  0:48 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable, Jim Mattson

> > There's so much repeated code here.
>
> Ya :-(
>
> > We can use gotos to share code, but I am not sure if that's a strict
> > improvement. We can also use a helper, perhaps?
>
>
> Where's your sense of adventure?
>
>         case MSR_IA32_LASTBRANCHFROMIP:
>         case MSR_IA32_LASTBRANCHTOIP:
>         case MSR_IA32_LASTINTFROMIP:
>         case MSR_IA32_LASTINTTOIP:
>                 if (!lbrv)
>                         return KVM_MSR_RET_UNSUPPORTED;
>                 if (!msr->host_initiated)
>                         return 1;
>                 *(&svm->vmcb->save.br_from + (ecx - MSR_IA32_LASTBRANCHFROMIP)) = data;
>                 vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
>                 break;
>
> Jokes aside, maybe this, to dedup get() at the same time?

Looks good to me!


Thread overview: 26+ messages
     [not found] <20260303003421.2185681-1-yosry@kernel.org>
2026-03-03  0:33 ` [PATCH v7 01/26] KVM: nSVM: Avoid clearing VMCB_LBR in vmcb12 Yosry Ahmed
2026-03-03  0:33 ` [PATCH v7 02/26] KVM: SVM: Switch svm_copy_lbrs() to a macro Yosry Ahmed
2026-03-03  0:33 ` [PATCH v7 03/26] KVM: SVM: Add missing save/restore handling of LBR MSRs Yosry Ahmed
2026-03-03 16:37   ` Sean Christopherson
2026-03-03 19:14     ` Yosry Ahmed
2026-03-04  0:44       ` Sean Christopherson
2026-03-04  0:48         ` Yosry Ahmed
2026-03-03  0:33 ` [PATCH v7 05/26] KVM: nSVM: Always inject a #GP if mapping VMCB12 fails on nested VMRUN Yosry Ahmed
2026-03-03  0:34 ` [PATCH v7 06/26] KVM: nSVM: Refactor checking LBRV enablement in vmcb12 into a helper Yosry Ahmed
2026-03-03  0:34 ` [PATCH v7 07/26] KVM: nSVM: Refactor writing vmcb12 on nested #VMEXIT as " Yosry Ahmed
2026-03-03  0:34 ` [PATCH v7 08/26] KVM: nSVM: Triple fault if mapping VMCB12 fails on nested #VMEXIT Yosry Ahmed
2026-03-03  0:34 ` [PATCH v7 09/26] KVM: nSVM: Triple fault if restore host CR3 " Yosry Ahmed
2026-03-03 16:49   ` Sean Christopherson
2026-03-03 19:15     ` Yosry Ahmed
2026-03-03  0:34 ` [PATCH v7 10/26] KVM: nSVM: Clear GIF on nested #VMEXIT(INVALID) Yosry Ahmed
2026-03-03  0:34 ` [PATCH v7 11/26] KVM: nSVM: Clear EVENTINJ fields in vmcb12 on nested #VMEXIT Yosry Ahmed
2026-03-03  0:34 ` [PATCH v7 12/26] KVM: nSVM: Clear tracking of L1->L2 NMI and soft IRQ " Yosry Ahmed
2026-03-03 16:50   ` Sean Christopherson
2026-03-03 19:15     ` Yosry Ahmed
2026-03-03  0:34 ` [PATCH v7 13/26] KVM: nSVM: Drop nested_vmcb_check_{save/control}() wrappers Yosry Ahmed
2026-03-03  0:34 ` [PATCH v7 14/26] KVM: nSVM: Drop the non-architectural consistency check for NP_ENABLE Yosry Ahmed
2026-03-03  0:34 ` [PATCH v7 15/26] KVM: nSVM: Add missing consistency check for nCR3 validity Yosry Ahmed
2026-03-03 16:56   ` Sean Christopherson
2026-03-03 19:17     ` Yosry Ahmed
2026-03-03  0:34 ` [PATCH v7 16/26] KVM: nSVM: Add missing consistency check for EFER, CR0, CR4, and CS Yosry Ahmed
2026-03-03  0:34 ` [PATCH v7 17/26] KVM: nSVM: Add missing consistency check for EVENTINJ Yosry Ahmed
