* [PATCH] KVM: x86: Free the MSR filter after destorying VCPUs
@ 2024-07-03 17:56 Aaron Lewis
2024-08-22 17:24 ` Sean Christopherson
0 siblings, 1 reply; 2+ messages in thread
From: Aaron Lewis @ 2024-07-03 17:56 UTC (permalink / raw)
To: kvm; +Cc: pbonzini, jmattson, seanjc, Aaron Lewis
Delay freeing the MSR filter until after the VCPUs are destroyed to
avoid a possible use-after-free when freeing a nested guest.
This callstack is from a 5.15 kernel, but the issue still appears to be
possible upstream.
kvm_msr_allowed+0x4c/0xd0
__kvm_set_msr+0x12d/0x1e0
kvm_set_msr+0x19/0x40
load_vmcs12_host_state+0x2d8/0x6e0 [kvm_intel]
nested_vmx_vmexit+0x715/0xbd0 [kvm_intel]
nested_vmx_free_vcpu+0x33/0x50 [kvm_intel]
vmx_free_vcpu+0x54/0xc0 [kvm_intel]
kvm_arch_vcpu_destroy+0x28/0xf0
kvm_vcpu_destroy+0x12/0x50
kvm_arch_destroy_vm+0x12c/0x1c0
kvm_put_kvm+0x263/0x3c0
kvm_vm_release+0x21/0x30
__fput+0xb9/0x240
____fput+0xe/0x20
task_work_run+0x6f/0xd0
syscall_exit_to_user_mode+0x123/0x300
do_syscall_64+0x72/0xb0
entry_SYSCALL_64_after_hwframe+0x61/0xc6
Fixes: b318e8decf6b ("KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish")
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Suggested-by: Jim Mattson <jmattson@google.com>
---
arch/x86/kvm/x86.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 66c4381460dc..638696efa17e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12711,12 +12711,12 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
}
kvm_unload_vcpu_mmus(kvm);
static_call_cond(kvm_x86_vm_destroy)(kvm);
- kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
kvm_pic_destroy(kvm);
kvm_ioapic_destroy(kvm);
kvm_destroy_vcpus(kvm);
kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1));
+ kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
kvm_mmu_uninit_vm(kvm);
kvm_page_track_cleanup(kvm);
kvm_xen_destroy_vm(kvm);
--
2.45.2.803.g4e1b14247a-goog
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: [PATCH] KVM: x86: Free the MSR filter after destorying VCPUs
2024-07-03 17:56 [PATCH] KVM: x86: Free the MSR filter after destorying VCPUs Aaron Lewis
@ 2024-08-22 17:24 ` Sean Christopherson
0 siblings, 0 replies; 2+ messages in thread
From: Sean Christopherson @ 2024-08-22 17:24 UTC (permalink / raw)
To: Aaron Lewis; +Cc: kvm, pbonzini, jmattson
On Wed, Jul 03, 2024, Aaron Lewis wrote:
> Delay freeing the MSR filter until after the VCPUs are destroyed to
> avoid a possible use-after-free when freeing a nested guest.
>
> This callstack is from a 5.15 kernel, but the issue still appears to be
> possible upstream.
>
> kvm_msr_allowed+0x4c/0xd0
> __kvm_set_msr+0x12d/0x1e0
> kvm_set_msr+0x19/0x40
> load_vmcs12_host_state+0x2d8/0x6e0 [kvm_intel]
> nested_vmx_vmexit+0x715/0xbd0 [kvm_intel]
> nested_vmx_free_vcpu+0x33/0x50 [kvm_intel]
> vmx_free_vcpu+0x54/0xc0 [kvm_intel]
> kvm_arch_vcpu_destroy+0x28/0xf0
> kvm_vcpu_destroy+0x12/0x50
> kvm_arch_destroy_vm+0x12c/0x1c0
> kvm_put_kvm+0x263/0x3c0
> kvm_vm_release+0x21/0x30
> __fput+0xb9/0x240
> ____fput+0xe/0x20
> task_work_run+0x6f/0xd0
> syscall_exit_to_user_mode+0x123/0x300
> do_syscall_64+0x72/0xb0
> entry_SYSCALL_64_after_hwframe+0x61/0xc6
>
> Fixes: b318e8decf6b ("KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish")
>
> Signed-off-by: Aaron Lewis <aaronlewis@google.com>
> Suggested-by: Jim Mattson <jmattson@google.com>
> ---
> arch/x86/kvm/x86.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 66c4381460dc..638696efa17e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12711,12 +12711,12 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> }
> kvm_unload_vcpu_mmus(kvm);
> static_call_cond(kvm_x86_vm_destroy)(kvm);
> - kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
> kvm_pic_destroy(kvm);
> kvm_ioapic_destroy(kvm);
> kvm_destroy_vcpus(kvm);
> kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
> kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1));
> + kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
This makes the KFENCE/KASAN issues go away, but I'm not so sure it actually fixes
the main problem, which is that doing nested_vmx_vmexit() when freeing a vCPU is
insane. E.g. it can do VMWRITEs, VMPTRLD, read and write guest memory, write
guest MSRs (which could theoretically write _host_ MSRs), and so on and so fort.
And free_nested() really should free _everything_, i.e. it shouldn't rely on
nested_vmx_vmexit() to do things like cancel hrtimers.
Completely untested, but at a glance I think we can/should do this. If we need
a fix for stable, then we can move the MSR filter freeing, but going forward, I
think we need to fix the underlying mess.
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 2392a7ef254d..f5210834a246 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -359,6 +359,8 @@ static void free_nested(struct kvm_vcpu *vcpu)
kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true);
vmx->nested.pi_desc = NULL;
+ hrtimer_cancel(&to_vmx(vcpu)->nested.preemption_timer);
+
kvm_mmu_free_roots(vcpu->kvm, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
nested_release_evmcs(vcpu);
@@ -372,9 +374,10 @@ static void free_nested(struct kvm_vcpu *vcpu)
*/
void nested_vmx_free_vcpu(struct kvm_vcpu *vcpu)
{
- vcpu_load(vcpu);
- vmx_leave_nested(vcpu);
- vcpu_put(vcpu);
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+ vmx->loaded_vmcs = &vmx->vmcs01;
+ free_nested(vcpu);
}
#define EPTP_PA_MASK GENMASK_ULL(51, 12)
^ permalink raw reply related [flat|nested] 2+ messages in thread
end of thread, other threads:[~2024-08-22 17:24 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-03 17:56 [PATCH] KVM: x86: Free the MSR filter after destorying VCPUs Aaron Lewis
2024-08-22 17:24 ` Sean Christopherson
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.