* [PATCH v3 0/2] Address syzkaller warnings in nested VM-Exit after RSM
@ 2024-05-01 20:29 Kishen Maloor
2024-05-01 20:29 ` [PATCH v3 1/2] KVM: x86: nSVM/nVMX: Move nested_run_pending to kvm_vcpu_arch Kishen Maloor
2024-05-01 20:29 ` [PATCH v3 2/2] KVM: x86: nSVM/nVMX: Fix RSM logic leading to L2 VM-Entries Kishen Maloor
0 siblings, 2 replies; 4+ messages in thread
From: Kishen Maloor @ 2024-05-01 20:29 UTC (permalink / raw)
To: kvm, seanjc, pbonzini, mlevitsk, zheyuma97; +Cc: Kishen Maloor
This series aims to close the loop on a prior conversation on this matter.
I have picked this up from Michal Wilczynski, who proposed different
fixes (v1 and v2):
v2: https://lore.kernel.org/all/20240123001555.4168188-1-michal.wilczynski@intel.com/
v1: https://lore.kernel.org/all/20231222164543.918037-1-michal.wilczynski@intel.com/
The issue was initially reported here:
https://lore.kernel.org/all/CAMhUBjmXMYsEoVYw_M8hSZjBMHh24i88QYm-RY6HDta5YZ7Wgw@mail.gmail.com/
It is caused by setting nested_run_pending in the vendor-specific leave_smm()
callback during RSM emulation. The syzkaller test produced a triple fault in
rsm_load_state_64(), resulting in a nested VM-Exit with nested_run_pending
still set, which triggered the warnings. The commit message for patch 2 has a
detailed description of the flow.
The patches do the following:
a) Move nested_run_pending out of vendor structs and into the x86 kvm_vcpu_arch
so it can be accessed by common x86 code (e.g., the SMM emulation).
The usage and semantics of this flag are common between SVM and VMX.
b) Set nested_run_pending only after a successful RSM emulation.
This evidently resolves the issue, but I would appreciate feedback on
whether the patches are acceptable, and/or any suggestions.
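For illustration only, the before/after ordering in (b) can be modeled as a
small userspace sketch. The struct, helpers, and names below are hypothetical
stand-ins, not KVM code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for emulator return codes and vCPU state. */
enum { X86EMUL_CONTINUE = 0, X86EMUL_UNHANDLEABLE = 1 };

struct model_vcpu {
	bool guest_mode;          /* RSM returns to a nested (L2) guest */
	bool nested_run_pending;
};

/* Old ordering: the vendor leave_smm() callback set the flag before
 * rsm_load_state_*() ran, so a failed state load left it dangling. */
static int rsm_old(struct model_vcpu *vcpu, int load_ret)
{
	if (vcpu->guest_mode)
		vcpu->nested_run_pending = true;  /* set prematurely */
	return load_ret;                          /* state-load result */
}

/* New ordering (patch 2): set the flag only after a successful load. */
static int rsm_new(struct model_vcpu *vcpu, int load_ret)
{
	if (load_ret == X86EMUL_CONTINUE && vcpu->guest_mode)
		vcpu->nested_run_pending = true;
	return load_ret;
}
```

A failed state load (e.g. from an invalid EFER) leaves the flag set only
under the old ordering, which is what trips the WARN on the subsequent
nested VM-Exit.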
Kishen Maloor (2):
KVM: x86: nSVM/nVMX: Move nested_run_pending to kvm_vcpu_arch
KVM: x86: nSVM/nVMX: Fix RSM logic leading to L2 VM-Entries
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/smm.c | 12 ++++++++--
arch/x86/kvm/svm/nested.c | 14 +++++------
arch/x86/kvm/svm/svm.c | 12 ++++------
arch/x86/kvm/svm/svm.h | 4 ----
arch/x86/kvm/vmx/nested.c | 42 ++++++++++++++++-----------------
arch/x86/kvm/vmx/vmx.c | 13 +++++-----
arch/x86/kvm/vmx/vmx.h | 3 ---
8 files changed, 50 insertions(+), 51 deletions(-)
--
2.31.1
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v3 1/2] KVM: x86: nSVM/nVMX: Move nested_run_pending to kvm_vcpu_arch
2024-05-01 20:29 [PATCH v3 0/2] Address syzkaller warnings in nested VM-Exit after RSM Kishen Maloor
@ 2024-05-01 20:29 ` Kishen Maloor
2024-05-01 20:29 ` [PATCH v3 2/2] KVM: x86: nSVM/nVMX: Fix RSM logic leading to L2 VM-Entries Kishen Maloor
1 sibling, 0 replies; 4+ messages in thread
From: Kishen Maloor @ 2024-05-01 20:29 UTC (permalink / raw)
To: kvm, seanjc, pbonzini, mlevitsk, zheyuma97; +Cc: Kishen Maloor
nested_run_pending is used in SVM/VMX code to signify
that a nested VM-Entry has been initiated but not yet performed,
and that a nested VM-Exit therefore cannot be injected.
nested_run_pending is presently stored in vendor structs,
and its usage is replicated in vendor code.
This change merely moves nested_run_pending into kvm_vcpu_arch
so that it may also be operated upon by common x86 code.
The RSM emulation is one such case.
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm/nested.c | 14 +++++------
arch/x86/kvm/svm/svm.c | 12 +++++-----
arch/x86/kvm/svm/svm.h | 4 ----
arch/x86/kvm/vmx/nested.c | 42 ++++++++++++++++-----------------
arch/x86/kvm/vmx/vmx.c | 14 +++++------
arch/x86/kvm/vmx/vmx.h | 3 ---
7 files changed, 42 insertions(+), 48 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6efd1497b026..f2c9e3813800 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -770,6 +770,7 @@ struct kvm_vcpu_arch {
u64 ia32_misc_enable_msr;
u64 smbase;
u64 smi_count;
+ bool nested_run_pending;
bool at_instruction_boundary;
bool tpr_access_reporting;
bool xfd_no_write_intercept;
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 55b9a6d96bcf..ab8c8c9e1e46 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -901,7 +901,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
if (!npt_enabled)
vmcb01->save.cr3 = kvm_read_cr3(vcpu);
- svm->nested.nested_run_pending = 1;
+ vcpu->arch.nested_run_pending = 1;
if (enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, true))
goto out_exit_err;
@@ -910,7 +910,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
goto out;
out_exit_err:
- svm->nested.nested_run_pending = 0;
+ vcpu->arch.nested_run_pending = 0;
svm->nmi_l1_to_l2 = false;
svm->soft_int_injected = false;
@@ -985,7 +985,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
/* Exit Guest-Mode */
leave_guest_mode(vcpu);
svm->nested.vmcb12_gpa = 0;
- WARN_ON_ONCE(svm->nested.nested_run_pending);
+ WARN_ON_ONCE(vcpu->arch.nested_run_pending);
kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
@@ -1231,7 +1231,7 @@ void svm_leave_nested(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
if (is_guest_mode(vcpu)) {
- svm->nested.nested_run_pending = 0;
+ vcpu->arch.nested_run_pending = 0;
svm->nested.vmcb12_gpa = INVALID_GPA;
leave_guest_mode(vcpu);
@@ -1427,7 +1427,7 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu)
* previously injected event, the pending exception occurred while said
* event was being delivered and thus needs to be handled.
*/
- bool block_nested_exceptions = svm->nested.nested_run_pending;
+ bool block_nested_exceptions = vcpu->arch.nested_run_pending;
/*
* New events (not exceptions) are only recognized at instruction
* boundaries. If an event needs reinjection, then KVM is handling a
@@ -1604,7 +1604,7 @@ static int svm_get_nested_state(struct kvm_vcpu *vcpu,
kvm_state.size += KVM_STATE_NESTED_SVM_VMCB_SIZE;
kvm_state.flags |= KVM_STATE_NESTED_GUEST_MODE;
- if (svm->nested.nested_run_pending)
+ if (vcpu->arch.nested_run_pending)
kvm_state.flags |= KVM_STATE_NESTED_RUN_PENDING;
}
@@ -1743,7 +1743,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
svm_set_gif(svm, !!(kvm_state->flags & KVM_STATE_NESTED_GIF_SET));
- svm->nested.nested_run_pending =
+ vcpu->arch.nested_run_pending =
!!(kvm_state->flags & KVM_STATE_NESTED_RUN_PENDING);
svm->nested.vmcb12_gpa = kvm_state->hdr.svm.vmcb_pa;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9aaf83c8d57d..debc53b73ea3 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3750,7 +3750,7 @@ bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
{
struct vcpu_svm *svm = to_svm(vcpu);
- if (svm->nested.nested_run_pending)
+ if (vcpu->arch.nested_run_pending)
return -EBUSY;
if (svm_nmi_blocked(vcpu))
@@ -3792,7 +3792,7 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
{
struct vcpu_svm *svm = to_svm(vcpu);
- if (svm->nested.nested_run_pending)
+ if (vcpu->arch.nested_run_pending)
return -EBUSY;
if (svm_interrupt_blocked(vcpu))
@@ -4215,11 +4215,11 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
nested_sync_control_from_vmcb02(svm);
/* Track VMRUNs that have made past consistency checking */
- if (svm->nested.nested_run_pending &&
+ if (vcpu->arch.nested_run_pending &&
svm->vmcb->control.exit_code != SVM_EXIT_ERR)
++vcpu->stat.nested_run;
- svm->nested.nested_run_pending = 0;
+ vcpu->arch.nested_run_pending = 0;
}
svm->vmcb->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
@@ -4581,7 +4581,7 @@ bool svm_smi_blocked(struct kvm_vcpu *vcpu)
static int svm_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
{
struct vcpu_svm *svm = to_svm(vcpu);
- if (svm->nested.nested_run_pending)
+ if (vcpu->arch.nested_run_pending)
return -EBUSY;
if (svm_smi_blocked(vcpu))
@@ -4699,7 +4699,7 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram)
if (ret)
goto unmap_save;
- svm->nested.nested_run_pending = 1;
+ vcpu->arch.nested_run_pending = 1;
unmap_save:
kvm_vcpu_unmap(vcpu, &map_save, true);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 33878efdebc8..b78e7c562cea 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -165,10 +165,6 @@ struct svm_nested_state {
/* These are the merged vectors */
u32 *msrpm;
- /* A VMRUN has started but has not yet been performed, so
- * we cannot inject a nested vmexit yet. */
- bool nested_run_pending;
-
/* cache for control fields of the guest */
struct vmcb_ctrl_area_cached ctl;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index d05ddf751491..5510d3667eb8 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2188,7 +2188,7 @@ static void vmx_start_preemption_timer(struct kvm_vcpu *vcpu,
static u64 nested_vmx_calc_efer(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
{
- if (vmx->nested.nested_run_pending &&
+ if (vmx->vcpu.arch.nested_run_pending &&
(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER))
return vmcs12->guest_ia32_efer;
else if (vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE)
@@ -2422,7 +2422,7 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
/*
* Interrupt/Exception Fields
*/
- if (vmx->nested.nested_run_pending) {
+ if (vmx->vcpu.arch.nested_run_pending) {
vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
vmcs12->vm_entry_intr_info_field);
vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
@@ -2503,7 +2503,7 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
vmcs_write64(GUEST_PDPTR3, vmcs12->guest_pdptr3);
}
- if (kvm_mpx_supported() && vmx->nested.nested_run_pending &&
+ if (kvm_mpx_supported() && vmx->vcpu.arch.nested_run_pending &&
(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
vmcs_write64(GUEST_BNDCFGS, vmcs12->guest_bndcfgs);
}
@@ -2583,7 +2583,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
!(evmcs->hv_clean_fields & HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1);
}
- if (vmx->nested.nested_run_pending &&
+ if (vmx->vcpu.arch.nested_run_pending &&
(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) {
kvm_set_dr(vcpu, 7, vmcs12->guest_dr7);
vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl);
@@ -2591,7 +2591,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl);
}
- if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending ||
+ if (kvm_mpx_supported() && (!vmx->vcpu.arch.nested_run_pending ||
!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
vmcs_write64(GUEST_BNDCFGS, vmx->nested.pre_vmenter_bndcfgs);
vmx_set_rflags(vcpu, vmcs12->guest_rflags);
@@ -2604,7 +2604,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
vcpu->arch.cr0_guest_owned_bits &= ~vmcs12->cr0_guest_host_mask;
vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu->arch.cr0_guest_owned_bits);
- if (vmx->nested.nested_run_pending &&
+ if (vmx->vcpu.arch.nested_run_pending &&
(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT)) {
vmcs_write64(GUEST_IA32_PAT, vmcs12->guest_ia32_pat);
vcpu->arch.pat = vmcs12->guest_ia32_pat;
@@ -3101,7 +3101,7 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
* to bit 8 (LME) if bit 31 in the CR0 field (corresponding to
* CR0.PG) is 1.
*/
- if (to_vmx(vcpu)->nested.nested_run_pending &&
+ if (vcpu->arch.nested_run_pending &&
(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER)) {
if (CC(!kvm_valid_efer(vcpu, vmcs12->guest_ia32_efer)) ||
CC(ia32e != !!(vmcs12->guest_ia32_efer & EFER_LMA)) ||
@@ -3456,11 +3456,11 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
if (!evaluate_pending_interrupts)
evaluate_pending_interrupts |= kvm_apic_has_pending_init_or_sipi(vcpu);
- if (!vmx->nested.nested_run_pending ||
+ if (!vcpu->arch.nested_run_pending ||
!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
vmx->nested.pre_vmenter_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
if (kvm_mpx_supported() &&
- (!vmx->nested.nested_run_pending ||
+ (!vcpu->arch.nested_run_pending ||
!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
vmx->nested.pre_vmenter_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
@@ -3667,7 +3667,7 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
* We're finally done with prerequisite checking, and can start with
* the nested entry.
*/
- vmx->nested.nested_run_pending = 1;
+ vcpu->arch.nested_run_pending = 1;
vmx->nested.has_preemption_timer_deadline = false;
status = nested_vmx_enter_non_root_mode(vcpu, true);
if (unlikely(status != NVMX_VMENTRY_SUCCESS))
@@ -3707,12 +3707,12 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
!nested_cpu_has(vmcs12, CPU_BASED_NMI_WINDOW_EXITING) &&
!(nested_cpu_has(vmcs12, CPU_BASED_INTR_WINDOW_EXITING) &&
(vmcs12->guest_rflags & X86_EFLAGS_IF))) {
- vmx->nested.nested_run_pending = 0;
+ vcpu->arch.nested_run_pending = 0;
return kvm_emulate_halt_noskip(vcpu);
}
break;
case GUEST_ACTIVITY_WAIT_SIPI:
- vmx->nested.nested_run_pending = 0;
+ vcpu->arch.nested_run_pending = 0;
vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
break;
default:
@@ -3722,7 +3722,7 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
return 1;
vmentry_failed:
- vmx->nested.nested_run_pending = 0;
+ vcpu->arch.nested_run_pending = 0;
if (status == NVMX_VMENTRY_KVM_INTERNAL_ERROR)
return 0;
if (status == NVMX_VMENTRY_VMEXIT)
@@ -4104,7 +4104,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
* previously injected event, the pending exception occurred while said
* event was being delivered and thus needs to be handled.
*/
- bool block_nested_exceptions = vmx->nested.nested_run_pending;
+ bool block_nested_exceptions = vcpu->arch.nested_run_pending;
/*
* New events (not exceptions) are only recognized at instruction
* boundaries. If an event needs reinjection, then KVM is handling a
@@ -4401,7 +4401,7 @@ static void sync_vmcs02_to_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
if (nested_cpu_has_preemption_timer(vmcs12) &&
vmcs12->vm_exit_controls & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER &&
- !vmx->nested.nested_run_pending)
+ !vcpu->arch.nested_run_pending)
vmcs12->vmx_preemption_timer_value =
vmx_get_preemption_timer_value(vcpu);
@@ -4774,7 +4774,7 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
vmx->nested.mtf_pending = false;
/* trying to cancel vmlaunch/vmresume is a bug */
- WARN_ON_ONCE(vmx->nested.nested_run_pending);
+ WARN_ON_ONCE(vcpu->arch.nested_run_pending);
#ifdef CONFIG_KVM_HYPERV
if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
@@ -6397,7 +6397,7 @@ bool nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu)
unsigned long exit_qual;
u32 exit_intr_info;
- WARN_ON_ONCE(vmx->nested.nested_run_pending);
+ WARN_ON_ONCE(vcpu->arch.nested_run_pending);
/*
* Late nested VM-Fail shares the same flow as nested VM-Exit since KVM
@@ -6493,7 +6493,7 @@ static int vmx_get_nested_state(struct kvm_vcpu *vcpu,
if (is_guest_mode(vcpu)) {
kvm_state.flags |= KVM_STATE_NESTED_GUEST_MODE;
- if (vmx->nested.nested_run_pending)
+ if (vcpu->arch.nested_run_pending)
kvm_state.flags |= KVM_STATE_NESTED_RUN_PENDING;
if (vmx->nested.mtf_pending)
@@ -6568,7 +6568,7 @@ static int vmx_get_nested_state(struct kvm_vcpu *vcpu,
void vmx_leave_nested(struct kvm_vcpu *vcpu)
{
if (is_guest_mode(vcpu)) {
- to_vmx(vcpu)->nested.nested_run_pending = 0;
+ vcpu->arch.nested_run_pending = 0;
nested_vmx_vmexit(vcpu, -1, 0, 0);
}
free_nested(vcpu);
@@ -6705,7 +6705,7 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
if (!(kvm_state->flags & KVM_STATE_NESTED_GUEST_MODE))
return 0;
- vmx->nested.nested_run_pending =
+ vcpu->arch.nested_run_pending =
!!(kvm_state->flags & KVM_STATE_NESTED_RUN_PENDING);
vmx->nested.mtf_pending =
@@ -6757,7 +6757,7 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
return 0;
error_guest_mode:
- vmx->nested.nested_run_pending = 0;
+ vcpu->arch.nested_run_pending = 0;
return ret;
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 22411f4aff53..e83439ecd956 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5030,7 +5030,7 @@ bool vmx_nmi_blocked(struct kvm_vcpu *vcpu)
static int vmx_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
{
- if (to_vmx(vcpu)->nested.nested_run_pending)
+ if (vcpu->arch.nested_run_pending)
return -EBUSY;
/* An NMI must not be injected into L2 if it's supposed to VM-Exit. */
@@ -5052,7 +5052,7 @@ bool vmx_interrupt_blocked(struct kvm_vcpu *vcpu)
static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
{
- if (to_vmx(vcpu)->nested.nested_run_pending)
+ if (vcpu->arch.nested_run_pending)
return -EBUSY;
/*
@@ -6446,7 +6446,7 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
* invalid guest state should never happen as that means KVM knowingly
* allowed a nested VM-Enter with an invalid vmcs12. More below.
*/
- if (KVM_BUG_ON(vmx->nested.nested_run_pending, vcpu->kvm))
+ if (KVM_BUG_ON(vcpu->arch.nested_run_pending, vcpu->kvm))
return -EIO;
if (is_guest_mode(vcpu)) {
@@ -7437,11 +7437,11 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
* Track VMLAUNCH/VMRESUME that have made past guest state
* checking.
*/
- if (vmx->nested.nested_run_pending &&
+ if (vcpu->arch.nested_run_pending &&
!vmx->exit_reason.failed_vmentry)
++vcpu->stat.nested_run;
- vmx->nested.nested_run_pending = 0;
+ vcpu->arch.nested_run_pending = 0;
}
if (unlikely(vmx->fail))
@@ -8173,7 +8173,7 @@ static void vmx_setup_mce(struct kvm_vcpu *vcpu)
static int vmx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
{
/* we need a nested vmexit to enter SMM, postpone if run is pending */
- if (to_vmx(vcpu)->nested.nested_run_pending)
+ if (vcpu->arch.nested_run_pending)
return -EBUSY;
return !is_smm(vcpu);
}
@@ -8214,7 +8214,7 @@ static int vmx_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram)
if (ret)
return ret;
- vmx->nested.nested_run_pending = 1;
+ vcpu->arch.nested_run_pending = 1;
vmx->nested.smm.guest_mode = false;
}
return 0;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 90f9e4434646..571ab070c2af 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -188,9 +188,6 @@ struct nested_vmx {
*/
bool enlightened_vmcs_enabled;
- /* L2 must run next, and mustn't decide to exit to L1. */
- bool nested_run_pending;
-
/* Pending MTF VM-exit into L1. */
bool mtf_pending;
--
2.31.1
* [PATCH v3 2/2] KVM: x86: nSVM/nVMX: Fix RSM logic leading to L2 VM-Entries
2024-05-01 20:29 [PATCH v3 0/2] Address syzkaller warnings in nested VM-Exit after RSM Kishen Maloor
2024-05-01 20:29 ` [PATCH v3 1/2] KVM: x86: nSVM/nVMX: Move nested_run_pending to kvm_vcpu_arch Kishen Maloor
@ 2024-05-01 20:29 ` Kishen Maloor
2024-08-16 17:22 ` Sean Christopherson
1 sibling, 1 reply; 4+ messages in thread
From: Kishen Maloor @ 2024-05-01 20:29 UTC (permalink / raw)
To: kvm, seanjc, pbonzini, mlevitsk, zheyuma97
Cc: Kishen Maloor, Michal Wilczynski
Syzkaller found a warning triggered in nested_vmx_vmexit().
The nested_run_pending flag is non-zero, even though we're in
nested_vmx_vmexit(). Generally, trying to cancel a pending entry is
considered a bug. However, in this particular scenario, the kernel
behavior seems correct.
Syzkaller scenario:
1) Set up VCPU's
2) Run some code with KVM_RUN in L2 as a nested guest
3) Return from KVM_RUN
4) Inject KVM_SMI into the VCPU
5) Change the EFER register with KVM_SET_SREGS to value 0x2501
6) Run some code on the VCPU using KVM_RUN
7) Observe following behavior:
kvm_smm_transition: vcpu 0: entering SMM, smbase 0x30000
kvm_entry: vcpu 0, rip 0x8000
kvm_entry: vcpu 0, rip 0x8000
kvm_entry: vcpu 0, rip 0x8002
kvm_smm_transition: vcpu 0: leaving SMM, smbase 0x30000
kvm_nested_vmenter: rip: 0x0000000000008002 vmcs: 0x0000000000007000
nested_rip: 0x0000000000000000 int_ctl: 0x00000000
event_inj: 0x00000000 nested_ept=n guest
cr3: 0x0000000000002000
kvm_nested_vmexit_inject: reason: TRIPLE_FAULT ext_inf1: 0x0000000000000000
ext_inf2: 0x0000000000000000 ext_int: 0x00000000
ext_int_err: 0x00000000
What happened here is an SMI was injected immediately and the handler was
called at address 0x8000; all is good. Later, an RSM instruction is
executed in an emulator to return to the nested VM. em_rsm() is called,
which leads to emulator_leave_smm(). A part of this function calls the
VMX/SVM callback, in this case vmx_leave_smm(). It attempts to set up a
pending reentry to the guest VM by calling nested_vmx_enter_non_root_mode()
and sets the nested_run_pending flag to one. Unfortunately, later in
emulator_leave_smm(), rsm_load_state_64() fails because the invalid EFER
value cannot be written to the MSR. This results in em_rsm() invoking the
triple_fault callback. At this point, it's clear that KVM should perform a
VM-Exit, but nested_run_pending is left set to 1.
A similar flow applies to SVM, as the bug also reproduces on AMD platforms.
[Notes above by Michal Wilczynski of the bug analysis]
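As a rough model of the sequence above (a hypothetical userspace sketch, not
kernel code), the WARN fires exactly when the flag is set before the state
load has a chance to fail:

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if the WARN_ON_ONCE(nested_run_pending) in the nested
 * VM-Exit path would fire for the given ordering. Hypothetical model. */
static bool rsm_flow_warns(bool flag_set_in_leave_smm, bool efer_load_fails)
{
	bool nested_run_pending = false;

	/* em_rsm() -> emulator_leave_smm() -> vmx/svm leave_smm() */
	if (flag_set_in_leave_smm)
		nested_run_pending = true;

	/* rsm_load_state_64() rejects the invalid EFER ... */
	if (efer_load_fails) {
		/* ... em_rsm() calls the triple_fault callback, leading to
		 * a nested VM-Exit, which WARN_ON_ONCE()s on the flag. */
		return nested_run_pending;
	}
	return false;  /* entry completes; flag is consumed by vcpu_run */
}
```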
To address the issue, this change sets the nested_run_pending flag only
after a successful emulation of the RSM instruction. Previously, it
was set (prematurely) inside the vendor-specific leave_smm() callback since
commit 759cbd59674a ("KVM: x86: nSVM/nVMX: set nested_run_pending on VM
entry which is a result of RSM").
Fixes: 759cbd59674a ("KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM")
Reported-by: Zheyu Ma <zheyuma97@gmail.com>
Closes: https://lore.kernel.org/all/CAMhUBjmXMYsEoVYw_M8hSZjBMHh24i88QYm-RY6HDta5YZ7Wgw@mail.gmail.com
Analyzed-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
arch/x86/kvm/smm.c | 12 ++++++++++--
arch/x86/kvm/svm/svm.c | 2 --
arch/x86/kvm/vmx/vmx.c | 1 -
3 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index d06d43d8d2aa..b1dac967f1a5 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -633,8 +633,16 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
#ifdef CONFIG_X86_64
if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
- return rsm_load_state_64(ctxt, &smram.smram64);
+ ret = rsm_load_state_64(ctxt, &smram.smram64);
else
#endif
- return rsm_load_state_32(ctxt, &smram.smram32);
+ ret = rsm_load_state_32(ctxt, &smram.smram32);
+
+ /*
+ * Set nested_run_pending to ensure completion of a nested VM-Entry
+ */
+ if (ret == X86EMUL_CONTINUE && ctxt->ops->is_guest_mode(ctxt))
+ vcpu->arch.nested_run_pending = 1;
+
+ return ret;
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index debc53b73ea3..4c3f0e1f0dd0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4699,8 +4699,6 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram)
if (ret)
goto unmap_save;
- vcpu->arch.nested_run_pending = 1;
-
unmap_save:
kvm_vcpu_unmap(vcpu, &map_save, true);
unmap_map:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e83439ecd956..e66fc14b54be 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8214,7 +8214,6 @@ static int vmx_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram)
if (ret)
return ret;
- vcpu->arch.nested_run_pending = 1;
vmx->nested.smm.guest_mode = false;
}
return 0;
--
2.31.1
* Re: [PATCH v3 2/2] KVM: x86: nSVM/nVMX: Fix RSM logic leading to L2 VM-Entries
2024-05-01 20:29 ` [PATCH v3 2/2] KVM: x86: nSVM/nVMX: Fix RSM logic leading to L2 VM-Entries Kishen Maloor
@ 2024-08-16 17:22 ` Sean Christopherson
0 siblings, 0 replies; 4+ messages in thread
From: Sean Christopherson @ 2024-08-16 17:22 UTC (permalink / raw)
To: Kishen Maloor; +Cc: kvm, pbonzini, mlevitsk, zheyuma97, Michal Wilczynski
Sorry for the super slow review, I was dreading looking at KVM's SMM mess :-)
On Wed, May 01, 2024, Kishen Maloor wrote:
> diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
> index d06d43d8d2aa..b1dac967f1a5 100644
> --- a/arch/x86/kvm/smm.c
> +++ b/arch/x86/kvm/smm.c
> @@ -633,8 +633,16 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
>
> #ifdef CONFIG_X86_64
> if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
> - return rsm_load_state_64(ctxt, &smram.smram64);
> + ret = rsm_load_state_64(ctxt, &smram.smram64);
> else
> #endif
> - return rsm_load_state_32(ctxt, &smram.smram32);
> + ret = rsm_load_state_32(ctxt, &smram.smram32);
> +
> + /*
> + * Set nested_run_pending to ensure completion of a nested VM-Entry
> + */
> + if (ret == X86EMUL_CONTINUE && ctxt->ops->is_guest_mode(ctxt))
No need to bounce through the emulation context, this can simply be
if (ret == X86EMUL_CONTINUE && is_guest_mode(vcpu))
> + vcpu->arch.nested_run_pending = 1;
That said, while I 100% agree that KVM shouldn't set nested_run_pending before
loading state from SMRAM, I think it's actually the wrong fix. I am fairly certain
that the real issue is that KVM is synthesizing shutdown _for L2_. Neither
the SDM nor the APM states that a shutdown on RSM to guest mode can trigger a
VM-Exit;
Intel's pseudocode explicitly says state is loaded from SMRAM before transitioning
the CPU back to non-root mode, and AMD saves VMCB state in SMRAM, i.e. it's not
treated differently (like Intel does for VMX state).
So, the more correct, and amusingly easier, fix is to forcibly leave nested mode
prior to signalling shutdown. I'll post a patch for that.
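Schematically, that alternative might look like the sketch below. This is a
hypothetical userspace model, not the actual patch; leave_nested() stands in
for something like KVM's kvm_leave_nested():

```c
#include <assert.h>
#include <stdbool.h>

struct vcpu_model {
	bool guest_mode;
	bool nested_run_pending;
	bool shutdown;
};

/* Stand-in for forcibly leaving nested operation. */
static void leave_nested(struct vcpu_model *v)
{
	v->nested_run_pending = false;
	v->guest_mode = false;
}

/* Proposed ordering: drop out of nested mode before synthesizing
 * shutdown, so no nested VM-Exit (and thus no WARN) is attempted. */
static void synthesize_shutdown(struct vcpu_model *v)
{
	if (v->guest_mode)
		leave_nested(v);
	v->shutdown = true;
}
```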
I think I'll also grab patch 1 and post it in a slightly larger cleanup series
at some point. Making nested_run_pending a common field is worthwhile regardless
of this SMM mess.