* [PATCH 1/7] KVM: x86: Free vCPUs before freeing VM state
2025-02-24 23:55 [PATCH 0/7] KVM: x86: nVMX IRQ fix and VM teardown cleanups Sean Christopherson
@ 2025-02-24 23:55 ` Sean Christopherson
2025-02-25 7:44 ` Yan Zhao
2025-02-25 23:47 ` Paolo Bonzini
2025-02-24 23:55 ` [PATCH 2/7] KVM: nVMX: Process events on nested VM-Exit if injectable IRQ or NMI is pending Sean Christopherson
` (7 subsequent siblings)
8 siblings, 2 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-02-24 23:55 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Madhavan Srinivasan, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson, Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, Aaron Lewis,
Jim Mattson, Yan Zhao, Rick P Edgecombe, Kai Huang,
Isaku Yamahata
Free vCPUs before freeing any VM state, as both SVM and VMX may access
VM state when "freeing" a vCPU that is currently "in" L2, i.e. that needs
to be kicked out of nested guest mode.
Commit 6fcee03df6a1 ("KVM: x86: avoid loading a vCPU after .vm_destroy was
called") partially fixed the issue, but for unknown reasons only moved the
MMU unloading before VM destruction. Complete the change, and free all
vCPU state prior to destroying VM state, as nVMX accesses even more state
than nSVM.
In addition to the AVIC, KVM can hit a use-after-free on MSR filters:
kvm_msr_allowed+0x4c/0xd0
__kvm_set_msr+0x12d/0x1e0
kvm_set_msr+0x19/0x40
load_vmcs12_host_state+0x2d8/0x6e0 [kvm_intel]
nested_vmx_vmexit+0x715/0xbd0 [kvm_intel]
nested_vmx_free_vcpu+0x33/0x50 [kvm_intel]
vmx_free_vcpu+0x54/0xc0 [kvm_intel]
kvm_arch_vcpu_destroy+0x28/0xf0
kvm_vcpu_destroy+0x12/0x50
kvm_arch_destroy_vm+0x12c/0x1c0
kvm_put_kvm+0x263/0x3c0
kvm_vm_release+0x21/0x30
and an upcoming fix to process injectable interrupts on nested VM-Exit
will access the PIC:
BUG: kernel NULL pointer dereference, address: 0000000000000090
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
CPU: 23 UID: 1000 PID: 2658 Comm: kvm-nx-lpage-re
RIP: 0010:kvm_cpu_has_extint+0x2f/0x60 [kvm]
Call Trace:
<TASK>
kvm_cpu_has_injectable_intr+0xe/0x60 [kvm]
nested_vmx_vmexit+0x2d7/0xdf0 [kvm_intel]
nested_vmx_free_vcpu+0x40/0x50 [kvm_intel]
vmx_vcpu_free+0x2d/0x80 [kvm_intel]
kvm_arch_vcpu_destroy+0x2d/0x130 [kvm]
kvm_destroy_vcpus+0x8a/0x100 [kvm]
kvm_arch_destroy_vm+0xa7/0x1d0 [kvm]
kvm_destroy_vm+0x172/0x300 [kvm]
kvm_vcpu_release+0x31/0x50 [kvm]
Inarguably, both nSVM and nVMX need to be fixed, but punt on those
cleanups for the moment. Conceptually, vCPUs should be freed before VM
state. Assets like the I/O APIC and PIC _must_ be allocated before vCPUs
are created, so it stands to reason that they must be freed _after_ vCPUs
are destroyed.
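For reference, the resulting teardown order in kvm_arch_destroy_vm() with this
patch applied (reconstructed from the diff below, other calls omitted) is
roughly:

	kvm_unload_vcpu_mmus(kvm);
	kvm_destroy_vcpus(kvm);
	kvm_x86_call(vm_destroy)(kvm);
	kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
	kvm_pic_destroy(kvm);
	kvm_ioapic_destroy(kvm);
	kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
	kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1));
	kvm_mmu_uninit_vm(kvm);

i.e. vCPUs (and any nested VM-Exit processing done while freeing them) are torn
down while the MSR filters, PIC, and I/O APIC are still allocated.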
Reported-by: Aaron Lewis <aaronlewis@google.com>
Closes: https://lore.kernel.org/all/20240703175618.2304869-2-aaronlewis@google.com
Cc: Jim Mattson <jmattson@google.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>
Cc: Rick P Edgecombe <rick.p.edgecombe@intel.com>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 58b82d6fd77c..045c61cc7e54 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12890,11 +12890,11 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
mutex_unlock(&kvm->slots_lock);
}
kvm_unload_vcpu_mmus(kvm);
+ kvm_destroy_vcpus(kvm);
kvm_x86_call(vm_destroy)(kvm);
kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
kvm_pic_destroy(kvm);
kvm_ioapic_destroy(kvm);
- kvm_destroy_vcpus(kvm);
kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1));
kvm_mmu_uninit_vm(kvm);
--
2.48.1.658.g4767266eb4-goog
* Re: [PATCH 1/7] KVM: x86: Free vCPUs before freeing VM state
2025-02-24 23:55 ` [PATCH 1/7] KVM: x86: Free vCPUs before freeing VM state Sean Christopherson
@ 2025-02-25 7:44 ` Yan Zhao
2025-02-25 15:04 ` Sean Christopherson
2025-02-25 23:47 ` Paolo Bonzini
1 sibling, 1 reply; 22+ messages in thread
From: Yan Zhao @ 2025-02-25 7:44 UTC (permalink / raw)
To: Sean Christopherson
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Madhavan Srinivasan, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Paolo Bonzini, linux-arm-kernel, kvmarm, kvm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Aaron Lewis, Jim Mattson, Rick P Edgecombe, Kai Huang,
Isaku Yamahata
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 58b82d6fd77c..045c61cc7e54 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12890,11 +12890,11 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> mutex_unlock(&kvm->slots_lock);
> }
> kvm_unload_vcpu_mmus(kvm);
> + kvm_destroy_vcpus(kvm);
> kvm_x86_call(vm_destroy)(kvm);
> kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
> kvm_pic_destroy(kvm);
> kvm_ioapic_destroy(kvm);
> - kvm_destroy_vcpus(kvm);
> kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
> kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1));
> kvm_mmu_uninit_vm(kvm);
After this change, now the sequence is that
1. kvm_arch_pre_destroy_vm()
2. kvm_arch_destroy_vm()
2.1 kvm_destroy_vcpus()
2.2 .vm_destroy hook
2.3 kvm_mmu_uninit_vm() --> mirror root ref is 1 at this point. Zap the mirror
root and reclaim S-EPT page table pages.
2.4 .vm_free hook
Since TDX needs to reclaim the TDR page after reclaiming all other pages, we
currently add a .vm_free hook at 2.4, after 2.3.
Could we move kvm_mmu_uninit_vm() before the .vm_destroy hook and after
kvm_destroy_vcpus()?
Or move the .vm_destroy hook after kvm_mmu_uninit_vm(), e.g. after
kvm_page_track_cleanup()?
Otherwise, TDX still needs to introduce the .vm_free hook, which is invoked at
the end of kvm_arch_destroy_vm().
* Re: [PATCH 1/7] KVM: x86: Free vCPUs before freeing VM state
2025-02-25 7:44 ` Yan Zhao
@ 2025-02-25 15:04 ` Sean Christopherson
2025-02-26 7:34 ` Yan Zhao
0 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2025-02-25 15:04 UTC (permalink / raw)
To: Yan Zhao
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Madhavan Srinivasan, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Paolo Bonzini, linux-arm-kernel, kvmarm, kvm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Aaron Lewis, Jim Mattson, Rick P Edgecombe, Kai Huang,
Isaku Yamahata
On Tue, Feb 25, 2025, Yan Zhao wrote:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 58b82d6fd77c..045c61cc7e54 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -12890,11 +12890,11 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> > mutex_unlock(&kvm->slots_lock);
> > }
> > kvm_unload_vcpu_mmus(kvm);
> > + kvm_destroy_vcpus(kvm);
> > kvm_x86_call(vm_destroy)(kvm);
> > kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
> > kvm_pic_destroy(kvm);
> > kvm_ioapic_destroy(kvm);
> > - kvm_destroy_vcpus(kvm);
> > kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
> > kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1));
> > kvm_mmu_uninit_vm(kvm);
> After this change, now the sequence is that
>
> 1. kvm_arch_pre_destroy_vm()
> 2. kvm_arch_destroy_vm()
> 2.1 kvm_destroy_vcpus()
> 2.2 .vm_destroy hook
> 2.3 kvm_mmu_uninit_vm() --> mirror root ref is 1 at this point. Zap the mirror
> root and reclaim S-EPT page table pages.
> 2.4 .vm_free hook
>
> Since TDX needs to reclaim the TDR page after reclaiming all other pages, we
> currently add a .vm_free hook at 2.4, after 2.3.
>
> Could we move kvm_mmu_uninit_vm() before the .vm_destroy hook and after
> kvm_destroy_vcpus()?
>
> Or move the .vm_destroy hook after kvm_mmu_uninit_vm(), e.g. after
> kvm_page_track_cleanup()?
I would go for the first option. I'll tack on a patch since I need to test all
of these flows anyways, and I would much prefer to change course sooner rather
than later if it doesn't work for whatever reason.
Is this comment accurate?
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1e5f6f820c0b..f5685f153e08 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12874,13 +12874,19 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
mutex_unlock(&kvm->slots_lock);
}
kvm_destroy_vcpus(kvm);
+
+ /*
+ * Do final MMU teardown prior to calling into vendor code. All pages
+ * that were donated to the TDX module, e.g. for S-EPT tables, need to
+ * be reclaimed before the VM metadata page can be freed.
+ */
+ kvm_mmu_uninit_vm(kvm);
kvm_x86_call(vm_destroy)(kvm);
kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
kvm_pic_destroy(kvm);
kvm_ioapic_destroy(kvm);
kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1));
- kvm_mmu_uninit_vm(kvm);
kvm_page_track_cleanup(kvm);
kvm_xen_destroy_vm(kvm);
kvm_hv_destroy_vm(kvm);
* Re: [PATCH 1/7] KVM: x86: Free vCPUs before freeing VM state
2025-02-25 15:04 ` Sean Christopherson
@ 2025-02-26 7:34 ` Yan Zhao
0 siblings, 0 replies; 22+ messages in thread
From: Yan Zhao @ 2025-02-26 7:34 UTC (permalink / raw)
To: Sean Christopherson
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Madhavan Srinivasan, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Paolo Bonzini, linux-arm-kernel, kvmarm, kvm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Aaron Lewis, Jim Mattson, Rick P Edgecombe, Kai Huang,
Isaku Yamahata
On Tue, Feb 25, 2025 at 07:04:55AM -0800, Sean Christopherson wrote:
> On Tue, Feb 25, 2025, Yan Zhao wrote:
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 58b82d6fd77c..045c61cc7e54 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -12890,11 +12890,11 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> > > mutex_unlock(&kvm->slots_lock);
> > > }
> > > kvm_unload_vcpu_mmus(kvm);
> > > + kvm_destroy_vcpus(kvm);
> > > kvm_x86_call(vm_destroy)(kvm);
> > > kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
> > > kvm_pic_destroy(kvm);
> > > kvm_ioapic_destroy(kvm);
> > > - kvm_destroy_vcpus(kvm);
> > > kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
> > > kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1));
> > > kvm_mmu_uninit_vm(kvm);
> > After this change, now the sequence is that
> >
> > 1. kvm_arch_pre_destroy_vm()
> > 2. kvm_arch_destroy_vm()
> > 2.1 kvm_destroy_vcpus()
> > 2.2 .vm_destroy hook
> > 2.3 kvm_mmu_uninit_vm() --> mirror root ref is 1 at this point. Zap the mirror
> > root and reclaim S-EPT page table pages.
> > 2.4 .vm_free hook
> >
> > Since TDX needs to reclaim the TDR page after reclaiming all other pages, we
> > currently add a .vm_free hook at 2.4, after 2.3.
> >
> > Could we move kvm_mmu_uninit_vm() before the .vm_destroy hook and after
> > kvm_destroy_vcpus()?
> >
> > Or move the .vm_destroy hook after kvm_mmu_uninit_vm(), e.g. after
> > kvm_page_track_cleanup()?
>
> I would go for the first option. I'll tack on a patch since I need to test all
> of these flows anyways, and I would much prefer to change course sooner rather
> than later if it doesn't work for whatever reason.
>
> Is this comment accurate?
Yes.
And since tdx_mmu_release_hkid() is called before kvm_unload_vcpu_mmus(),
patch 4 in this series is required by TDX. Otherwise, the list_add() in
tdx_vcpu_load() will cause corruption during teardown, since
tdx_mmu_release_hkid() has already invoked tdx_disassociate_vp() on all
vCPUs.
kvm_unload_vcpu_mmu
vcpu_load
tdx_vcpu_load
list_add(&tdx->cpu_list, &per_cpu(associated_tdvcpus, cpu))
So, maybe a change like the one below is also required by TDX in case vcpu_load() is
accidentally invoked after .vm_pre_destroy.
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index b67df0af64f3..183192706ced 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -703,7 +704,7 @@ void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
struct vcpu_tdx *tdx = to_tdx(vcpu);
vmx_vcpu_pi_load(vcpu, cpu);
- if (vcpu->cpu == cpu)
+ if (vcpu->cpu == cpu || !is_hkid_assigned(to_kvm_tdx(vcpu->kvm)))
return;
tdx_flush_vp_on_cpu(vcpu);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1e5f6f820c0b..f5685f153e08 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12874,13 +12874,19 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> mutex_unlock(&kvm->slots_lock);
> }
> kvm_destroy_vcpus(kvm);
> +
> + /*
> + * Do final MMU teardown prior to calling into vendor code. All pages
> + * that were donated to the TDX module, e.g. for S-EPT tables, need to
> + * be reclaimed before the VM metadata page can be freed.
> + */
> + kvm_mmu_uninit_vm(kvm);
> kvm_x86_call(vm_destroy)(kvm);
> kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
> kvm_pic_destroy(kvm);
> kvm_ioapic_destroy(kvm);
> kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
> kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1));
> - kvm_mmu_uninit_vm(kvm);
> kvm_page_track_cleanup(kvm);
> kvm_xen_destroy_vm(kvm);
> kvm_hv_destroy_vm(kvm);
* Re: [PATCH 1/7] KVM: x86: Free vCPUs before freeing VM state
2025-02-24 23:55 ` [PATCH 1/7] KVM: x86: Free vCPUs before freeing VM state Sean Christopherson
2025-02-25 7:44 ` Yan Zhao
@ 2025-02-25 23:47 ` Paolo Bonzini
2025-02-26 0:27 ` Sean Christopherson
1 sibling, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2025-02-25 23:47 UTC (permalink / raw)
To: Sean Christopherson, Marc Zyngier, Oliver Upton, Tianrui Zhao,
Bibo Mao, Huacai Chen, Madhavan Srinivasan, Anup Patel,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Christian Borntraeger,
Janosch Frank, Claudio Imbrenda
Cc: linux-arm-kernel, kvmarm, kvm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, Aaron Lewis,
Jim Mattson, Yan Zhao, Rick P Edgecombe, Kai Huang,
Isaku Yamahata
On 2/25/25 00:55, Sean Christopherson wrote:
> Free vCPUs before freeing any VM state, as both SVM and VMX may access
> VM state when "freeing" a vCPU that is currently "in" L2, i.e. that needs
> to be kicked out of nested guest mode.
>
> Commit 6fcee03df6a1 ("KVM: x86: avoid loading a vCPU after .vm_destroy was
> called") partially fixed the issue, but for unknown reasons only moved the
> MMU unloading before VM destruction. Complete the change, and free all
> vCPU state prior to destroying VM state, as nVMX accesses even more state
> than nSVM.
I applied this to kvm-coco-queue, I will place it in kvm/master too
unless you shout.
Paolo
> In addition to the AVIC, KVM can hit a use-after-free on MSR filters:
>
> kvm_msr_allowed+0x4c/0xd0
> __kvm_set_msr+0x12d/0x1e0
> kvm_set_msr+0x19/0x40
> load_vmcs12_host_state+0x2d8/0x6e0 [kvm_intel]
> nested_vmx_vmexit+0x715/0xbd0 [kvm_intel]
> nested_vmx_free_vcpu+0x33/0x50 [kvm_intel]
> vmx_free_vcpu+0x54/0xc0 [kvm_intel]
> kvm_arch_vcpu_destroy+0x28/0xf0
> kvm_vcpu_destroy+0x12/0x50
> kvm_arch_destroy_vm+0x12c/0x1c0
> kvm_put_kvm+0x263/0x3c0
> kvm_vm_release+0x21/0x30
>
> and an upcoming fix to process injectable interrupts on nested VM-Exit
> will access the PIC:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000090
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> CPU: 23 UID: 1000 PID: 2658 Comm: kvm-nx-lpage-re
> RIP: 0010:kvm_cpu_has_extint+0x2f/0x60 [kvm]
> Call Trace:
> <TASK>
> kvm_cpu_has_injectable_intr+0xe/0x60 [kvm]
> nested_vmx_vmexit+0x2d7/0xdf0 [kvm_intel]
> nested_vmx_free_vcpu+0x40/0x50 [kvm_intel]
> vmx_vcpu_free+0x2d/0x80 [kvm_intel]
> kvm_arch_vcpu_destroy+0x2d/0x130 [kvm]
> kvm_destroy_vcpus+0x8a/0x100 [kvm]
> kvm_arch_destroy_vm+0xa7/0x1d0 [kvm]
> kvm_destroy_vm+0x172/0x300 [kvm]
> kvm_vcpu_release+0x31/0x50 [kvm]
>
> Inarguably, both nSVM and nVMX need to be fixed, but punt on those
> cleanups for the moment. Conceptually, vCPUs should be freed before VM
> state. Assets like the I/O APIC and PIC _must_ be allocated before vCPUs
> are created, so it stands to reason that they must be freed _after_ vCPUs
> are destroyed.
>
> Reported-by: Aaron Lewis <aaronlewis@google.com>
> Closes: https://lore.kernel.org/all/20240703175618.2304869-2-aaronlewis@google.com
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Yan Zhao <yan.y.zhao@intel.com>
> Cc: Rick P Edgecombe <rick.p.edgecombe@intel.com>
> Cc: Kai Huang <kai.huang@intel.com>
> Cc: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/x86.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 58b82d6fd77c..045c61cc7e54 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12890,11 +12890,11 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> mutex_unlock(&kvm->slots_lock);
> }
> kvm_unload_vcpu_mmus(kvm);
> + kvm_destroy_vcpus(kvm);
> kvm_x86_call(vm_destroy)(kvm);
> kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
> kvm_pic_destroy(kvm);
> kvm_ioapic_destroy(kvm);
> - kvm_destroy_vcpus(kvm);
> kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
> kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1));
> kvm_mmu_uninit_vm(kvm);
* Re: [PATCH 1/7] KVM: x86: Free vCPUs before freeing VM state
2025-02-25 23:47 ` Paolo Bonzini
@ 2025-02-26 0:27 ` Sean Christopherson
2025-02-26 9:18 ` Paolo Bonzini
0 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2025-02-26 0:27 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Madhavan Srinivasan, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
linux-arm-kernel, kvmarm, kvm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, Aaron Lewis,
Jim Mattson, Yan Zhao, Rick P Edgecombe, Kai Huang,
Isaku Yamahata
On Wed, Feb 26, 2025, Paolo Bonzini wrote:
> On 2/25/25 00:55, Sean Christopherson wrote:
> > Free vCPUs before freeing any VM state, as both SVM and VMX may access
> > VM state when "freeing" a vCPU that is currently "in" L2, i.e. that needs
> > to be kicked out of nested guest mode.
> >
> > Commit 6fcee03df6a1 ("KVM: x86: avoid loading a vCPU after .vm_destroy was
> > called") partially fixed the issue, but for unknown reasons only moved the
> > MMU unloading before VM destruction. Complete the change, and free all
> > vCPU state prior to destroying VM state, as nVMX accesses even more state
> > than nSVM.
>
> I applied this to kvm-coco-queue, I will place it in kvm/master too unless
> you shout.
Depends on what "this" is :-)
My plan/hope is to land patches 1 and 2 in 6.14, i.e. in kvm/master, but the
rest are firmly 6.15 IMO. And based on Yan's feedback, I'm planning on adding a
few more cleanups (though I think they're fully additive, i.e. can go on top).
* Re: [PATCH 1/7] KVM: x86: Free vCPUs before freeing VM state
2025-02-26 0:27 ` Sean Christopherson
@ 2025-02-26 9:18 ` Paolo Bonzini
0 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2025-02-26 9:18 UTC (permalink / raw)
To: Sean Christopherson
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Madhavan Srinivasan, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
linux-arm-kernel, kvmarm, kvm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, Aaron Lewis,
Jim Mattson, Yan Zhao, Rick P Edgecombe, Kai Huang,
Isaku Yamahata
On Wed, Feb 26, 2025 at 1:27 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Feb 26, 2025, Paolo Bonzini wrote:
> > On 2/25/25 00:55, Sean Christopherson wrote:
> > > Free vCPUs before freeing any VM state, as both SVM and VMX may access
> > > VM state when "freeing" a vCPU that is currently "in" L2, i.e. that needs
> > > to be kicked out of nested guest mode.
> > >
> > > Commit 6fcee03df6a1 ("KVM: x86: avoid loading a vCPU after .vm_destroy was
> > > called") partially fixed the issue, but for unknown reasons only moved the
> > > MMU unloading before VM destruction. Complete the change, and free all
> > > vCPU state prior to destroying VM state, as nVMX accesses even more state
> > > than nSVM.
> >
> > I applied this to kvm-coco-queue, I will place it in kvm/master too unless
> > you shout.
>
> Depends on what "this" is :-)
>
> My plan/hope is to land patches 1 and 2 in 6.14, i.e. in kvm/master
I meant only 1, but if you want to have 2 as well then that's fine too.
As to kvm-coco-queue, based on Yan's reply I have 1 (of course), 4 and
an extra patch to move kvm_x86_call(vm_destroy) to the very end of
kvm_arch_destroy_vm; I'll post everything as soon as I finish building
and testing.
Paolo
> rest are firmly 6.15 IMO. And based on Yan's feedback, I'm planning on adding a
> few more cleanups (though I think they're fully additive, i.e. can go on top).
>
* [PATCH 2/7] KVM: nVMX: Process events on nested VM-Exit if injectable IRQ or NMI is pending
2025-02-24 23:55 [PATCH 0/7] KVM: x86: nVMX IRQ fix and VM teardown cleanups Sean Christopherson
2025-02-24 23:55 ` [PATCH 1/7] KVM: x86: Free vCPUs before freeing VM state Sean Christopherson
@ 2025-02-24 23:55 ` Sean Christopherson
2025-02-24 23:55 ` [PATCH 3/7] KVM: Assert that a destroyed/freed vCPU is no longer visible Sean Christopherson
` (6 subsequent siblings)
8 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-02-24 23:55 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Madhavan Srinivasan, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson, Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, Aaron Lewis,
Jim Mattson, Yan Zhao, Rick P Edgecombe, Kai Huang,
Isaku Yamahata
Process pending events on nested VM-Exit if the vCPU has an injectable IRQ
or NMI, as the event may have become pending while L2 was active, i.e. may
not be tracked in the context of vmcs01. E.g. if L1 has passed its APIC
through to L2 and an IRQ arrives while L2 is active, then KVM needs to
request an IRQ window prior to running L1, otherwise delivery of the IRQ
will be delayed until KVM happens to process events for some other reason.
The missed failure is detected by vmx_apic_passthrough_tpr_threshold_test
in KVM-Unit-Tests, but has effectively been masked due to a flaw in KVM's
PIC emulation that causes KVM to make spurious KVM_REQ_EVENT requests (and
apparently no one ever ran the test with split IRQ chips).
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/nested.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index bca2575837ce..8220b09e91ce 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5084,6 +5084,17 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
load_vmcs12_host_state(vcpu, vmcs12);
+ /*
+ * Process events if an injectable IRQ or NMI is pending, even
+ * if the event is blocked (RFLAGS.IF is cleared on VM-Exit).
+ * If an event became pending while L2 was active, KVM needs to
+ * either inject the event or request an IRQ/NMI window. SMIs
+ * don't need to be processed as SMM is mutually exclusive with
+ * non-root mode. INIT/SIPI don't need to be checked as INIT
+ * is blocked post-VMXON, and SIPIs are ignored.
+ */
+ if (kvm_cpu_has_injectable_intr(vcpu) || vcpu->arch.nmi_pending)
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
return;
}
--
2.48.1.658.g4767266eb4-goog
* [PATCH 3/7] KVM: Assert that a destroyed/freed vCPU is no longer visible
2025-02-24 23:55 [PATCH 0/7] KVM: x86: nVMX IRQ fix and VM teardown cleanups Sean Christopherson
2025-02-24 23:55 ` [PATCH 1/7] KVM: x86: Free vCPUs before freeing VM state Sean Christopherson
2025-02-24 23:55 ` [PATCH 2/7] KVM: nVMX: Process events on nested VM-Exit if injectable IRQ or NMI is pending Sean Christopherson
@ 2025-02-24 23:55 ` Sean Christopherson
2025-02-25 7:07 ` Yan Zhao
2025-02-24 23:55 ` [PATCH 4/7] KVM: x86: Don't load/put vCPU when unloading its MMU during teardown Sean Christopherson
` (5 subsequent siblings)
8 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2025-02-24 23:55 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Madhavan Srinivasan, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson, Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, Aaron Lewis,
Jim Mattson, Yan Zhao, Rick P Edgecombe, Kai Huang,
Isaku Yamahata
After freeing a vCPU, assert that it is no longer reachable, and that
kvm_get_vcpu() doesn't return garbage or a pointer to some other vCPU.
While KVM obviously shouldn't be attempting to access a freed vCPU, it's
all too easy for KVM to make a VM-wide request, e.g. via KVM_BUG_ON() or
kvm_flush_remote_tlbs().
Alternatively, KVM could short-circuit problematic paths if the VM's
refcount has gone to zero, e.g. in kvm_make_all_cpus_request(), or KVM
could try disallow making global requests during teardown. But given that
deleting the vCPU from the array Just Works, adding logic to the requests
path is unnecessary, and trying to make requests illegal during teardown
would be a fool's errand.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 201c14ff476f..991e8111e88b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -489,6 +489,14 @@ void kvm_destroy_vcpus(struct kvm *kvm)
kvm_for_each_vcpu(i, vcpu, kvm) {
kvm_vcpu_destroy(vcpu);
xa_erase(&kvm->vcpu_array, i);
+
+ /*
+ * Assert that the vCPU isn't visible in any way, to ensure KVM
+ * doesn't trigger a use-after-free if destroying vCPUs results
+ * in VM-wide request, e.g. to flush remote TLBs when tearing
+ * down MMUs, or to mark the VM dead if a KVM_BUG_ON() fires.
+ */
+ WARN_ON_ONCE(xa_load(&kvm->vcpu_array, i) || kvm_get_vcpu(kvm, i));
}
atomic_set(&kvm->online_vcpus, 0);
--
2.48.1.658.g4767266eb4-goog
* Re: [PATCH 3/7] KVM: Assert that a destroyed/freed vCPU is no longer visible
2025-02-24 23:55 ` [PATCH 3/7] KVM: Assert that a destroyed/freed vCPU is no longer visible Sean Christopherson
@ 2025-02-25 7:07 ` Yan Zhao
2025-02-25 14:35 ` Sean Christopherson
0 siblings, 1 reply; 22+ messages in thread
From: Yan Zhao @ 2025-02-25 7:07 UTC (permalink / raw)
To: Sean Christopherson
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Madhavan Srinivasan, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Paolo Bonzini, linux-arm-kernel, kvmarm, kvm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Aaron Lewis, Jim Mattson, Rick P Edgecombe, Kai Huang,
Isaku Yamahata
On Mon, Feb 24, 2025 at 03:55:38PM -0800, Sean Christopherson wrote:
> After freeing a vCPU, assert that it is no longer reachable, and that
> kvm_get_vcpu() doesn't return garbage or a pointer to some other vCPU.
> While KVM obviously shouldn't be attempting to access a freed vCPU, it's
> all too easy for KVM to make a VM-wide request, e.g. via KVM_BUG_ON() or
> kvm_flush_remote_tlbs().
>
> Alternatively, KVM could short-circuit problematic paths if the VM's
> refcount has gone to zero, e.g. in kvm_make_all_cpus_request(), or KVM
> could try disallow making global requests during teardown. But given that
> deleting the vCPU from the array Just Works, adding logic to the requests
> path is unnecessary, and trying to make requests illegal during teardown
> would be a fool's errand.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> virt/kvm/kvm_main.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 201c14ff476f..991e8111e88b 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -489,6 +489,14 @@ void kvm_destroy_vcpus(struct kvm *kvm)
> kvm_for_each_vcpu(i, vcpu, kvm) {
> kvm_vcpu_destroy(vcpu);
> xa_erase(&kvm->vcpu_array, i);
> +
> + /*
> + * Assert that the vCPU isn't visible in any way, to ensure KVM
> + * doesn't trigger a use-after-free if destroying vCPUs results
> + * in VM-wide request, e.g. to flush remote TLBs when tearing
> + * down MMUs, or to mark the VM dead if a KVM_BUG_ON() fires.
> + */
> + WARN_ON_ONCE(xa_load(&kvm->vcpu_array, i) || kvm_get_vcpu(kvm, i));
As xa_erase() says "After this function returns, loading from @index will return
%NULL", is this checking of xa_load() necessary?
> }
>
> atomic_set(&kvm->online_vcpus, 0);
> --
> 2.48.1.658.g4767266eb4-goog
>
* Re: [PATCH 3/7] KVM: Assert that a destroyed/freed vCPU is no longer visible
2025-02-25 7:07 ` Yan Zhao
@ 2025-02-25 14:35 ` Sean Christopherson
0 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-02-25 14:35 UTC (permalink / raw)
To: Yan Zhao
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Madhavan Srinivasan, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Paolo Bonzini, linux-arm-kernel, kvmarm, kvm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Aaron Lewis, Jim Mattson, Rick P Edgecombe, Kai Huang,
Isaku Yamahata
On Tue, Feb 25, 2025, Yan Zhao wrote:
> On Mon, Feb 24, 2025 at 03:55:38PM -0800, Sean Christopherson wrote:
> > After freeing a vCPU, assert that it is no longer reachable, and that
> > kvm_get_vcpu() doesn't return garbage or a pointer to some other vCPU.
> > While KVM obviously shouldn't be attempting to access a freed vCPU, it's
> > all too easy for KVM to make a VM-wide request, e.g. via KVM_BUG_ON() or
> > kvm_flush_remote_tlbs().
> >
> > Alternatively, KVM could short-circuit problematic paths if the VM's
> > refcount has gone to zero, e.g. in kvm_make_all_cpus_request(), or KVM
> > could try disallow making global requests during teardown. But given that
> > deleting the vCPU from the array Just Works, adding logic to the requests
> > path is unnecessary, and trying to make requests illegal during teardown
> > would be a fool's errand.
> >
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > virt/kvm/kvm_main.c | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 201c14ff476f..991e8111e88b 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -489,6 +489,14 @@ void kvm_destroy_vcpus(struct kvm *kvm)
> > kvm_for_each_vcpu(i, vcpu, kvm) {
> > kvm_vcpu_destroy(vcpu);
> > xa_erase(&kvm->vcpu_array, i);
> > +
> > + /*
> > + * Assert that the vCPU isn't visible in any way, to ensure KVM
> > + * doesn't trigger a use-after-free if destroying vCPUs results
> > + * in VM-wide request, e.g. to flush remote TLBs when tearing
> > + * down MMUs, or to mark the VM dead if a KVM_BUG_ON() fires.
> > + */
> > + WARN_ON_ONCE(xa_load(&kvm->vcpu_array, i) || kvm_get_vcpu(kvm, i));
> As xa_erase() says "After this function returns, loading from @index will return
> %NULL", is this checking of xa_load() necessary?
None of this is "necessary". My goal with the assert is to (a) document that KVM
relies on the vCPU being NULL/unreachable and (b) help ensure that doesn't change
in the future. Checking xa_load() is mostly about (a).
That said, I agree checking xa_load() is more than a bit gratuitous. I have no
objection to checking only kvm_get_vcpu().
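For illustration only, a minimal sketch of what checking only kvm_get_vcpu()
might look like (hypothetical, not a patch from this series):

	kvm_for_each_vcpu(i, vcpu, kvm) {
		kvm_vcpu_destroy(vcpu);
		xa_erase(&kvm->vcpu_array, i);

		/*
		 * Assert that the vCPU is unreachable via kvm_get_vcpu(), to
		 * catch a use-after-free if destroying vCPUs triggers a
		 * VM-wide request.
		 */
		WARN_ON_ONCE(kvm_get_vcpu(kvm, i));
	}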
* [PATCH 4/7] KVM: x86: Don't load/put vCPU when unloading its MMU during teardown
2025-02-24 23:55 [PATCH 0/7] KVM: x86: nVMX IRQ fix and VM teardown cleanups Sean Christopherson
` (2 preceding siblings ...)
2025-02-24 23:55 ` [PATCH 3/7] KVM: Assert that a destroyed/freed vCPU is no longer visible Sean Christopherson
@ 2025-02-24 23:55 ` Sean Christopherson
2025-02-25 7:13 ` Yan Zhao
2025-02-24 23:55 ` [PATCH 5/7] KVM: x86: Unload MMUs during vCPU destruction, not before Sean Christopherson
` (4 subsequent siblings)
8 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2025-02-24 23:55 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Madhavan Srinivasan, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson, Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, Aaron Lewis,
Jim Mattson, Yan Zhao, Rick P Edgecombe, Kai Huang,
Isaku Yamahata
Don't load (and then put) a vCPU when unloading its MMU during VM
destruction, as nothing in kvm_mmu_unload() accesses vCPU state beyond the
root page/address of each MMU, i.e. can't possibly need to run with the
vCPU loaded.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 045c61cc7e54..9978ed4c0917 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12767,13 +12767,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
return ret;
}
-static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
-{
- vcpu_load(vcpu);
- kvm_mmu_unload(vcpu);
- vcpu_put(vcpu);
-}
-
static void kvm_unload_vcpu_mmus(struct kvm *kvm)
{
unsigned long i;
@@ -12781,7 +12774,7 @@ static void kvm_unload_vcpu_mmus(struct kvm *kvm)
kvm_for_each_vcpu(i, vcpu, kvm) {
kvm_clear_async_pf_completion_queue(vcpu);
- kvm_unload_vcpu_mmu(vcpu);
+ kvm_mmu_unload(vcpu);
}
}
--
2.48.1.658.g4767266eb4-goog
* Re: [PATCH 4/7] KVM: x86: Don't load/put vCPU when unloading its MMU during teardown
2025-02-24 23:55 ` [PATCH 4/7] KVM: x86: Don't load/put vCPU when unloading its MMU during teardown Sean Christopherson
@ 2025-02-25 7:13 ` Yan Zhao
2025-02-25 14:44 ` Sean Christopherson
0 siblings, 1 reply; 22+ messages in thread
From: Yan Zhao @ 2025-02-25 7:13 UTC (permalink / raw)
To: Sean Christopherson
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Madhavan Srinivasan, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Paolo Bonzini, linux-arm-kernel, kvmarm, kvm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Aaron Lewis, Jim Mattson, Rick P Edgecombe, Kai Huang,
Isaku Yamahata
On Mon, Feb 24, 2025 at 03:55:39PM -0800, Sean Christopherson wrote:
> Don't load (and then put) a vCPU when unloading its MMU during VM
> destruction, as nothing in kvm_mmu_unload() accesses vCPU state beyond the
> root page/address of each MMU, i.e. can't possibly need to run with the
> vCPU loaded.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/x86.c | 9 +--------
> 1 file changed, 1 insertion(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 045c61cc7e54..9978ed4c0917 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12767,13 +12767,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> return ret;
> }
>
> -static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
> -{
> - vcpu_load(vcpu);
> - kvm_mmu_unload(vcpu);
> - vcpu_put(vcpu);
> -}
> -
> static void kvm_unload_vcpu_mmus(struct kvm *kvm)
> {
> unsigned long i;
> @@ -12781,7 +12774,7 @@ static void kvm_unload_vcpu_mmus(struct kvm *kvm)
>
> kvm_for_each_vcpu(i, vcpu, kvm) {
> kvm_clear_async_pf_completion_queue(vcpu);
> - kvm_unload_vcpu_mmu(vcpu);
> + kvm_mmu_unload(vcpu);
What about just dropping kvm_unload_vcpu_mmu() here?
kvm_mmu_unload() will be invoked again in kvm_mmu_destroy().
kvm_arch_vcpu_destroy() --> kvm_mmu_destroy() --> kvm_mmu_unload().
> }
> }
>
> --
> 2.48.1.658.g4767266eb4-goog
>
* Re: [PATCH 4/7] KVM: x86: Don't load/put vCPU when unloading its MMU during teardown
2025-02-25 7:13 ` Yan Zhao
@ 2025-02-25 14:44 ` Sean Christopherson
0 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-02-25 14:44 UTC (permalink / raw)
To: Yan Zhao
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Madhavan Srinivasan, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Paolo Bonzini, linux-arm-kernel, kvmarm, kvm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Aaron Lewis, Jim Mattson, Rick P Edgecombe, Kai Huang,
Isaku Yamahata
On Tue, Feb 25, 2025, Yan Zhao wrote:
> On Mon, Feb 24, 2025 at 03:55:39PM -0800, Sean Christopherson wrote:
> > Don't load (and then put) a vCPU when unloading its MMU during VM
> > destruction, as nothing in kvm_mmu_unload() accesses vCPU state beyond the
> > root page/address of each MMU, i.e. can't possibly need to run with the
> > vCPU loaded.
> >
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > arch/x86/kvm/x86.c | 9 +--------
> > 1 file changed, 1 insertion(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 045c61cc7e54..9978ed4c0917 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -12767,13 +12767,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> > return ret;
> > }
> >
> > -static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
> > -{
> > - vcpu_load(vcpu);
> > - kvm_mmu_unload(vcpu);
> > - vcpu_put(vcpu);
> > -}
> > -
> > static void kvm_unload_vcpu_mmus(struct kvm *kvm)
> > {
> > unsigned long i;
> > @@ -12781,7 +12774,7 @@ static void kvm_unload_vcpu_mmus(struct kvm *kvm)
> >
> > kvm_for_each_vcpu(i, vcpu, kvm) {
> > kvm_clear_async_pf_completion_queue(vcpu);
> > - kvm_unload_vcpu_mmu(vcpu);
> > + kvm_mmu_unload(vcpu);
> What about just dropping kvm_unload_vcpu_mmu() here?
> kvm_mmu_unload() will be invoked again in kvm_mmu_destroy().
>
> kvm_arch_vcpu_destroy() --> kvm_mmu_destroy() --> kvm_mmu_unload().
Ugh, I missed that there's yet another call to kvm_mmu_unload(). I definitely
agree with dropping the first kvm_mmu_unload(), but I'll do it in a follow-up patch
so that all three changes are isolated (not doing the load/put, doing unload as
part of vCPU destruction, doing unload only once at the end).
And looking at both calls to kvm_mmu_unload(), I suspect that grabbing kvm->srcu
around kvm_mmu_destroy() is unnecessary. I'll try cleaning that up as well.
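For context, the pattern in question is the SRCU-protected call in
kvm_arch_vcpu_destroy(), which looks roughly like the following (sketched from
the surrounding discussion; details may differ from the actual code):

	idx = srcu_read_lock(&vcpu->kvm->srcu);
	kvm_mmu_destroy(vcpu);
	srcu_read_unlock(&vcpu->kvm->srcu, idx);

The prospective cleanup would drop the srcu_read_lock()/srcu_read_unlock() pair
if kvm_mmu_destroy() turns out not to need SRCU protection.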
* [PATCH 5/7] KVM: x86: Unload MMUs during vCPU destruction, not before
2025-02-24 23:55 [PATCH 0/7] KVM: x86: nVMX IRQ fix and VM teardown cleanups Sean Christopherson
` (3 preceding siblings ...)
2025-02-24 23:55 ` [PATCH 4/7] KVM: x86: Don't load/put vCPU when unloading its MMU during teardown Sean Christopherson
@ 2025-02-24 23:55 ` Sean Christopherson
2025-02-24 23:55 ` [PATCH 6/7] KVM: x86: Fold guts of kvm_arch_sync_events() into kvm_arch_pre_destroy_vm() Sean Christopherson
` (3 subsequent siblings)
8 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-02-24 23:55 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Madhavan Srinivasan, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson, Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, Aaron Lewis,
Jim Mattson, Yan Zhao, Rick P Edgecombe, Kai Huang,
Isaku Yamahata
When destroying a VM, unload a vCPU's MMUs as part of normal vCPU freeing,
instead of as a separate preparatory action. Unloading MMUs ahead of time
is a holdover from commit 7b53aa565084 ("KVM: Fix vcpu freeing for guest
smp"), which "fixed" a rather egregious flaw where KVM would attempt to
free *all* MMU pages when destroying a vCPU.
At the time, KVM would spin on all MMU pages in a VM when freeing a single
vCPU, and so would hang due to the way KVM pins and zaps root pages
(roots are invalidated but not freed if they are pinned by a vCPU).
static void free_mmu_pages(struct kvm_vcpu *vcpu)
{
	struct kvm_mmu_page *page;

	while (!list_empty(&vcpu->kvm->active_mmu_pages)) {
		page = container_of(vcpu->kvm->active_mmu_pages.next,
				    struct kvm_mmu_page, link);
		kvm_mmu_zap_page(vcpu->kvm, page);
	}
	free_page((unsigned long)vcpu->mmu.pae_root);
}
Now that KVM doesn't try to free all MMU pages when destroying a single
vCPU, there's no need to unpin roots prior to destroying a vCPU.
Note! While KVM mostly destroys all MMUs before calling
kvm_arch_destroy_vm() (see commit f00be0cae4e6 ("KVM: MMU: do not free
active mmu pages in free_mmu_pages()")), unpinning MMU roots during vCPU
destruction will unfortunately trigger remote TLB flushes, i.e. will try
to send requests to all vCPUs.
Happily, thanks to commit 27592ae8dbe4 ("KVM: Move wiping of the kvm->vcpus
array to common code"), that's a non-issue as freed vCPUs are naturally
skipped by xa_for_each_range(), i.e. by kvm_for_each_vcpu(). Prior to that
commit, KVM x86 rather stupidly freed vCPUs one-by-one, and _then_
nullified them, one-by-one. I.e. triggering a VM-wide request would hit a
use-after-free.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 15 +++------------
1 file changed, 3 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9978ed4c0917..a61dbd1f0d01 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12374,6 +12374,9 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
int idx;
+ kvm_clear_async_pf_completion_queue(vcpu);
+ kvm_mmu_unload(vcpu);
+
kvmclock_reset(vcpu);
kvm_x86_call(vcpu_free)(vcpu);
@@ -12767,17 +12770,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
return ret;
}
-static void kvm_unload_vcpu_mmus(struct kvm *kvm)
-{
- unsigned long i;
- struct kvm_vcpu *vcpu;
-
- kvm_for_each_vcpu(i, vcpu, kvm) {
- kvm_clear_async_pf_completion_queue(vcpu);
- kvm_mmu_unload(vcpu);
- }
-}
-
void kvm_arch_sync_events(struct kvm *kvm)
{
cancel_delayed_work_sync(&kvm->arch.kvmclock_sync_work);
@@ -12882,7 +12874,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
__x86_set_memory_region(kvm, TSS_PRIVATE_MEMSLOT, 0, 0);
mutex_unlock(&kvm->slots_lock);
}
- kvm_unload_vcpu_mmus(kvm);
kvm_destroy_vcpus(kvm);
kvm_x86_call(vm_destroy)(kvm);
kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
--
2.48.1.658.g4767266eb4-goog
* [PATCH 6/7] KVM: x86: Fold guts of kvm_arch_sync_events() into kvm_arch_pre_destroy_vm()
2025-02-24 23:55 [PATCH 0/7] KVM: x86: nVMX IRQ fix and VM teardown cleanups Sean Christopherson
` (4 preceding siblings ...)
2025-02-24 23:55 ` [PATCH 5/7] KVM: x86: Unload MMUs during vCPU destruction, not before Sean Christopherson
@ 2025-02-24 23:55 ` Sean Christopherson
2025-02-24 23:55 ` [PATCH 7/7] KVM: Drop kvm_arch_sync_events() now that all implementations are nops Sean Christopherson
` (2 subsequent siblings)
8 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-02-24 23:55 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Madhavan Srinivasan, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson, Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, Aaron Lewis,
Jim Mattson, Yan Zhao, Rick P Edgecombe, Kai Huang,
Isaku Yamahata
Fold the guts of kvm_arch_sync_events() into kvm_arch_pre_destroy_vm(), as
the kvmclock and PIT background workers only need to be stopped before
destroying vCPUs (to avoid accessing vCPUs as they are being freed); it's
a-ok for them to be running while the VM is visible on the global vm_list.
Note, the PIT also needs to be stopped before IRQ routing is freed
(because KVM's IRQ routing is garbage and assumes there is always non-NULL
routing).
Opportunistically add comments to explain why KVM stops/frees certain
assets early.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a61dbd1f0d01..ea445e6579f1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12772,9 +12772,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
void kvm_arch_sync_events(struct kvm *kvm)
{
- cancel_delayed_work_sync(&kvm->arch.kvmclock_sync_work);
- cancel_delayed_work_sync(&kvm->arch.kvmclock_update_work);
- kvm_free_pit(kvm);
+
}
/**
@@ -12855,6 +12853,17 @@ EXPORT_SYMBOL_GPL(__x86_set_memory_region);
void kvm_arch_pre_destroy_vm(struct kvm *kvm)
{
+ /*
+ * Stop all background workers and kthreads before destroying vCPUs, as
+ * iterating over vCPUs in a different task while vCPUs are being freed
+ * is unsafe, i.e. will lead to use-after-free. The PIT also needs to
+ * be stopped before IRQ routing is freed.
+ */
+ cancel_delayed_work_sync(&kvm->arch.kvmclock_sync_work);
+ cancel_delayed_work_sync(&kvm->arch.kvmclock_update_work);
+
+ kvm_free_pit(kvm);
+
kvm_mmu_pre_destroy_vm(kvm);
}
--
2.48.1.658.g4767266eb4-goog
* [PATCH 7/7] KVM: Drop kvm_arch_sync_events() now that all implementations are nops
2025-02-24 23:55 [PATCH 0/7] KVM: x86: nVMX IRQ fix and VM teardown cleanups Sean Christopherson
` (5 preceding siblings ...)
2025-02-24 23:55 ` [PATCH 6/7] KVM: x86: Fold guts of kvm_arch_sync_events() into kvm_arch_pre_destroy_vm() Sean Christopherson
@ 2025-02-24 23:55 ` Sean Christopherson
2025-02-25 12:05 ` bibo mao
2025-02-25 16:15 ` Claudio Imbrenda
2025-02-26 18:38 ` [PATCH 0/7] KVM: x86: nVMX IRQ fix and VM teardown cleanups Paolo Bonzini
2025-03-27 3:24 ` patchwork-bot+linux-riscv
8 siblings, 2 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-02-24 23:55 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Madhavan Srinivasan, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson, Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, Aaron Lewis,
Jim Mattson, Yan Zhao, Rick P Edgecombe, Kai Huang,
Isaku Yamahata
Remove kvm_arch_sync_events() now that x86 no longer uses it (no other
arch has ever used it).
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/arm64/include/asm/kvm_host.h | 2 --
arch/loongarch/include/asm/kvm_host.h | 1 -
arch/mips/include/asm/kvm_host.h | 1 -
arch/powerpc/include/asm/kvm_host.h | 1 -
arch/riscv/include/asm/kvm_host.h | 2 --
arch/s390/include/asm/kvm_host.h | 1 -
arch/x86/kvm/x86.c | 5 -----
include/linux/kvm_host.h | 1 -
virt/kvm/kvm_main.c | 1 -
9 files changed, 15 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7cfa024de4e3..40897bd2b4a3 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1346,8 +1346,6 @@ static inline bool kvm_system_needs_idmapped_vectors(void)
return cpus_have_final_cap(ARM64_SPECTRE_V3A);
}
-static inline void kvm_arch_sync_events(struct kvm *kvm) {}
-
void kvm_init_host_debug_data(void);
void kvm_vcpu_load_debug(struct kvm_vcpu *vcpu);
void kvm_vcpu_put_debug(struct kvm_vcpu *vcpu);
diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
index 590982cd986e..ab5b7001e2ff 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -320,7 +320,6 @@ static inline bool kvm_is_ifetch_fault(struct kvm_vcpu_arch *arch)
/* Misc */
static inline void kvm_arch_hardware_unsetup(void) {}
-static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index f7222eb594ea..c14b10821817 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -886,7 +886,6 @@ extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
extern int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu,
struct kvm_mips_interrupt *irq);
-static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_free_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot) {}
static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 6e1108f8fce6..2d139c807577 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -902,7 +902,6 @@ struct kvm_vcpu_arch {
#define __KVM_HAVE_ARCH_WQP
#define __KVM_HAVE_CREATE_DEVICE
-static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index cc33e35cd628..0e9c2fab6378 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -301,8 +301,6 @@ static inline bool kvm_arch_pmi_in_guest(struct kvm_vcpu *vcpu)
return IS_ENABLED(CONFIG_GUEST_PERF_EVENTS) && !!vcpu;
}
-static inline void kvm_arch_sync_events(struct kvm *kvm) {}
-
#define KVM_RISCV_GSTAGE_TLB_MIN_ORDER 12
void kvm_riscv_local_hfence_gvma_vmid_gpa(unsigned long vmid,
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 9a367866cab0..424f899d8163 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -1056,7 +1056,6 @@ bool kvm_s390_pv_cpu_is_protected(struct kvm_vcpu *vcpu);
extern int kvm_s390_gisc_register(struct kvm *kvm, u32 gisc);
extern int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc);
-static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_free_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot) {}
static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ea445e6579f1..454fd6b8f3db 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12770,11 +12770,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
return ret;
}
-void kvm_arch_sync_events(struct kvm *kvm)
-{
-
-}
-
/**
* __x86_set_memory_region: Setup KVM internal memory slot
*
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c28a6aa1f2ed..5438a1b446a6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1747,7 +1747,6 @@ static inline void kvm_unregister_perf_callbacks(void) {}
int kvm_arch_init_vm(struct kvm *kvm, unsigned long type);
void kvm_arch_destroy_vm(struct kvm *kvm);
-void kvm_arch_sync_events(struct kvm *kvm);
int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 991e8111e88b..55153494ac70 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1271,7 +1271,6 @@ static void kvm_destroy_vm(struct kvm *kvm)
kvm_destroy_pm_notifier(kvm);
kvm_uevent_notify_change(KVM_EVENT_DESTROY_VM, kvm);
kvm_destroy_vm_debugfs(kvm);
- kvm_arch_sync_events(kvm);
mutex_lock(&kvm_lock);
list_del(&kvm->vm_list);
mutex_unlock(&kvm_lock);
--
2.48.1.658.g4767266eb4-goog
* Re: [PATCH 7/7] KVM: Drop kvm_arch_sync_events() now that all implementations are nops
2025-02-24 23:55 ` [PATCH 7/7] KVM: Drop kvm_arch_sync_events() now that all implementations are nops Sean Christopherson
@ 2025-02-25 12:05 ` bibo mao
2025-02-25 16:15 ` Claudio Imbrenda
1 sibling, 0 replies; 22+ messages in thread
From: bibo mao @ 2025-02-25 12:05 UTC (permalink / raw)
To: Sean Christopherson, Marc Zyngier, Oliver Upton, Tianrui Zhao,
Huacai Chen, Madhavan Srinivasan, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, Aaron Lewis,
Jim Mattson, Yan Zhao, Rick P Edgecombe, Kai Huang,
Isaku Yamahata
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
On 2025/2/25 7:55 AM, Sean Christopherson wrote:
> Remove kvm_arch_sync_events() now that x86 no longer uses it (no other
> arch has ever used it).
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/arm64/include/asm/kvm_host.h | 2 --
> arch/loongarch/include/asm/kvm_host.h | 1 -
> arch/mips/include/asm/kvm_host.h | 1 -
> arch/powerpc/include/asm/kvm_host.h | 1 -
> arch/riscv/include/asm/kvm_host.h | 2 --
> arch/s390/include/asm/kvm_host.h | 1 -
> arch/x86/kvm/x86.c | 5 -----
> include/linux/kvm_host.h | 1 -
> virt/kvm/kvm_main.c | 1 -
> 9 files changed, 15 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 7cfa024de4e3..40897bd2b4a3 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1346,8 +1346,6 @@ static inline bool kvm_system_needs_idmapped_vectors(void)
> return cpus_have_final_cap(ARM64_SPECTRE_V3A);
> }
>
> -static inline void kvm_arch_sync_events(struct kvm *kvm) {}
> -
> void kvm_init_host_debug_data(void);
> void kvm_vcpu_load_debug(struct kvm_vcpu *vcpu);
> void kvm_vcpu_put_debug(struct kvm_vcpu *vcpu);
> diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
> index 590982cd986e..ab5b7001e2ff 100644
> --- a/arch/loongarch/include/asm/kvm_host.h
> +++ b/arch/loongarch/include/asm/kvm_host.h
> @@ -320,7 +320,6 @@ static inline bool kvm_is_ifetch_fault(struct kvm_vcpu_arch *arch)
>
> /* Misc */
> static inline void kvm_arch_hardware_unsetup(void) {}
> -static inline void kvm_arch_sync_events(struct kvm *kvm) {}
> static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
> static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
> index f7222eb594ea..c14b10821817 100644
> --- a/arch/mips/include/asm/kvm_host.h
> +++ b/arch/mips/include/asm/kvm_host.h
> @@ -886,7 +886,6 @@ extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
> extern int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu,
> struct kvm_mips_interrupt *irq);
>
> -static inline void kvm_arch_sync_events(struct kvm *kvm) {}
> static inline void kvm_arch_free_memslot(struct kvm *kvm,
> struct kvm_memory_slot *slot) {}
> static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 6e1108f8fce6..2d139c807577 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -902,7 +902,6 @@ struct kvm_vcpu_arch {
> #define __KVM_HAVE_ARCH_WQP
> #define __KVM_HAVE_CREATE_DEVICE
>
> -static inline void kvm_arch_sync_events(struct kvm *kvm) {}
> static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
> static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
> static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> index cc33e35cd628..0e9c2fab6378 100644
> --- a/arch/riscv/include/asm/kvm_host.h
> +++ b/arch/riscv/include/asm/kvm_host.h
> @@ -301,8 +301,6 @@ static inline bool kvm_arch_pmi_in_guest(struct kvm_vcpu *vcpu)
> return IS_ENABLED(CONFIG_GUEST_PERF_EVENTS) && !!vcpu;
> }
>
> -static inline void kvm_arch_sync_events(struct kvm *kvm) {}
> -
> #define KVM_RISCV_GSTAGE_TLB_MIN_ORDER 12
>
> void kvm_riscv_local_hfence_gvma_vmid_gpa(unsigned long vmid,
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 9a367866cab0..424f899d8163 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -1056,7 +1056,6 @@ bool kvm_s390_pv_cpu_is_protected(struct kvm_vcpu *vcpu);
> extern int kvm_s390_gisc_register(struct kvm *kvm, u32 gisc);
> extern int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc);
>
> -static inline void kvm_arch_sync_events(struct kvm *kvm) {}
> static inline void kvm_arch_free_memslot(struct kvm *kvm,
> struct kvm_memory_slot *slot) {}
> static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ea445e6579f1..454fd6b8f3db 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12770,11 +12770,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> return ret;
> }
>
> -void kvm_arch_sync_events(struct kvm *kvm)
> -{
> -
> -}
> -
> /**
> * __x86_set_memory_region: Setup KVM internal memory slot
> *
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index c28a6aa1f2ed..5438a1b446a6 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1747,7 +1747,6 @@ static inline void kvm_unregister_perf_callbacks(void) {}
>
> int kvm_arch_init_vm(struct kvm *kvm, unsigned long type);
> void kvm_arch_destroy_vm(struct kvm *kvm);
> -void kvm_arch_sync_events(struct kvm *kvm);
>
> int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 991e8111e88b..55153494ac70 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1271,7 +1271,6 @@ static void kvm_destroy_vm(struct kvm *kvm)
> kvm_destroy_pm_notifier(kvm);
> kvm_uevent_notify_change(KVM_EVENT_DESTROY_VM, kvm);
> kvm_destroy_vm_debugfs(kvm);
> - kvm_arch_sync_events(kvm);
> mutex_lock(&kvm_lock);
> list_del(&kvm->vm_list);
> mutex_unlock(&kvm_lock);
>
* Re: [PATCH 7/7] KVM: Drop kvm_arch_sync_events() now that all implementations are nops
2025-02-24 23:55 ` [PATCH 7/7] KVM: Drop kvm_arch_sync_events() now that all implementations are nops Sean Christopherson
2025-02-25 12:05 ` bibo mao
@ 2025-02-25 16:15 ` Claudio Imbrenda
1 sibling, 0 replies; 22+ messages in thread
From: Claudio Imbrenda @ 2025-02-25 16:15 UTC (permalink / raw)
To: Sean Christopherson
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Madhavan Srinivasan, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Paolo Bonzini,
linux-arm-kernel, kvmarm, kvm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, Aaron Lewis,
Jim Mattson, Yan Zhao, Rick P Edgecombe, Kai Huang,
Isaku Yamahata
On Mon, 24 Feb 2025 15:55:42 -0800
Sean Christopherson <seanjc@google.com> wrote:
> Remove kvm_arch_sync_events() now that x86 no longer uses it (no other
> arch has ever used it).
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> [...]
* Re: [PATCH 0/7] KVM: x86: nVMX IRQ fix and VM teardown cleanups
2025-02-24 23:55 [PATCH 0/7] KVM: x86: nVMX IRQ fix and VM teardown cleanups Sean Christopherson
` (6 preceding siblings ...)
2025-02-24 23:55 ` [PATCH 7/7] KVM: Drop kvm_arch_sync_events() now that all implementations are nops Sean Christopherson
@ 2025-02-26 18:38 ` Paolo Bonzini
2025-03-27 3:24 ` patchwork-bot+linux-riscv
8 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2025-02-26 18:38 UTC (permalink / raw)
To: Sean Christopherson, Marc Zyngier, Oliver Upton, Tianrui Zhao,
Bibo Mao, Huacai Chen, Madhavan Srinivasan, Anup Patel,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Christian Borntraeger,
Janosch Frank, Claudio Imbrenda
Cc: linux-arm-kernel, kvmarm, kvm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, Aaron Lewis,
Jim Mattson, Yan Zhao, Rick P Edgecombe, Kai Huang,
Isaku Yamahata
On 2/25/25 00:55, Sean Christopherson wrote:
> This was _supposed_ to be a tiny one-off patch to fix a nVMX bug where KVM
> fails to detect that, after nested VM-Exit, L1 has a pending IRQ (or NMI).
> But because x86's nested teardown flows are garbage (KVM simply forces a
> nested VM-Exit to put the vCPU back into L1), that simple fix snowballed.
>
> The immediate issue is that checking for a pending interrupt accesses the
> legacy PIC, and x86's kvm_arch_destroy_vm() currently frees the PIC before
> destroying vCPUs, i.e. checking for IRQs during the forced nested VM-Exit
> results in a NULL pointer deref (or use-after-free if KVM didn't nullify
> the PIC pointer). That's patch 1.
>
> Patch 2 is the original nVMX fix.
>
> The remaining patches attempt to bring a bit of sanity to x86's VM
> teardown code, which has accumulated a lot of cruft over the years. E.g.
> KVM currently unloads each vCPU's MMUs in a separate operation from
> destroying vCPUs, all because when guest SMP support was added, KVM had a
> kludgy MMU teardown flow that broke when a VM had more than one vCPU.
> And that oddity lived on, for 18 years...
Queued patches 1 and 2 to kvm/master, and everything to kvm/queue
(pending a little more testing and the related TDX change).
Paolo
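A minimal, self-contained C sketch of the ordering problem described in the quoted cover letter: vCPU teardown may still dereference VM-level interrupt state, so VM-level assets must outlive the vCPUs. All structure and function names below are hypothetical stand-ins, not the upstream KVM code.

/*
 * Toy model of the teardown-ordering bug: destroying a vCPU that is still
 * "in" a nested guest consults VM-level state (a stand-in for the legacy
 * PIC), so that state must be freed only after every vCPU is destroyed.
 */
#include <stdio.h>
#include <stdlib.h>

struct pic { int pending_irq; };

struct vm {
	struct pic *pic;	/* allocated before any vCPU is created */
	int nr_vcpus;
};

struct vcpu {
	struct vm *vm;
	int in_nested_guest;	/* still "in L2" at teardown time */
};

/* Forced nested VM-Exit on teardown reads VM-level interrupt state. */
static void vcpu_destroy(struct vcpu *vcpu)
{
	if (vcpu->in_nested_guest && vcpu->vm->pic->pending_irq)
		printf("vcpu: completing nested VM-Exit with pending IRQ\n");
	free(vcpu);
}

/* Correct order: vCPUs first, then the VM-level assets they depend on. */
static void vm_destroy(struct vm *vm, struct vcpu **vcpus)
{
	for (int i = 0; i < vm->nr_vcpus; i++)
		vcpu_destroy(vcpus[i]);	/* may still read vm->pic */

	free(vm->pic);			/* only now is the PIC unreachable */
	free(vm);
}

int main(void)
{
	struct vm *vm = calloc(1, sizeof(*vm));
	struct vcpu *vcpu = calloc(1, sizeof(*vcpu));
	struct vcpu *vcpus[] = { vcpu };

	vm->pic = calloc(1, sizeof(*vm->pic));
	vm->pic->pending_irq = 1;
	vm->nr_vcpus = 1;
	vcpu->vm = vm;
	vcpu->in_nested_guest = 1;

	vm_destroy(vm, vcpus);	/* freeing vm->pic first would be a UAF here */
	return 0;
}

Freeing vm->pic before the vCPU loop would turn the read in vcpu_destroy() into a use-after-free, which is the class of bug the cover letter describes patch 1 avoiding by reordering kvm_arch_destroy_vm().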
* Re: [PATCH 0/7] KVM: x86: nVMX IRQ fix and VM teardown cleanups
2025-02-24 23:55 [PATCH 0/7] KVM: x86: nVMX IRQ fix and VM teardown cleanups Sean Christopherson
` (7 preceding siblings ...)
2025-02-26 18:38 ` [PATCH 0/7] KVM: x86: nVMX IRQ fix and VM teardown cleanups Paolo Bonzini
@ 2025-03-27 3:24 ` patchwork-bot+linux-riscv
8 siblings, 0 replies; 22+ messages in thread
From: patchwork-bot+linux-riscv @ 2025-03-27 3:24 UTC (permalink / raw)
To: Sean Christopherson
Cc: linux-riscv, maz, oliver.upton, zhaotianrui, maobibo, chenhuacai,
maddy, anup, paul.walmsley, palmer, aou, borntraeger, frankja,
imbrenda, pbonzini, linux-arm-kernel, kvmarm, kvm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-kernel, aaronlewis,
jmattson, yan.y.zhao, rick.p.edgecombe, kai.huang, isaku.yamahata
Hello:
This series was applied to riscv/linux.git (for-next)
by Paolo Bonzini <pbonzini@redhat.com>:
On Mon, 24 Feb 2025 15:55:35 -0800 you wrote:
> This was _supposed_ to be a tiny one-off patch to fix a nVMX bug where KVM
> fails to detect that, after nested VM-Exit, L1 has a pending IRQ (or NMI).
> But because x86's nested teardown flows are garbage (KVM simply forces a
> nested VM-Exit to put the vCPU back into L1), that simple fix snowballed.
>
> The immediate issue is that checking for a pending interrupt accesses the
> legacy PIC, and x86's kvm_arch_destroy_vm() currently frees the PIC before
> destroying vCPUs, i.e. checking for IRQs during the forced nested VM-Exit
> results in a NULL pointer deref (or use-after-free if KVM didn't nullify
> the PIC pointer). That's patch 1.
>
> [...]
Here is the summary with links:
- [1/7] KVM: x86: Free vCPUs before freeing VM state
https://git.kernel.org/riscv/c/17bcd7144263
- [2/7] KVM: nVMX: Process events on nested VM-Exit if injectable IRQ or NMI is pending
https://git.kernel.org/riscv/c/982caaa11504
- [3/7] KVM: Assert that a destroyed/freed vCPU is no longer visible
(no matching commit)
- [4/7] KVM: x86: Don't load/put vCPU when unloading its MMU during teardown
(no matching commit)
- [5/7] KVM: x86: Unload MMUs during vCPU destruction, not before
(no matching commit)
- [6/7] KVM: x86: Fold guts of kvm_arch_sync_events() into kvm_arch_pre_destroy_vm()
(no matching commit)
- [7/7] KVM: Drop kvm_arch_sync_events() now that all implementations are nops
(no matching commit)
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html