From: Sean Christopherson <seanjc@google.com>
To: Marc Zyngier <maz@kernel.org>,
Oliver Upton <oliver.upton@linux.dev>,
Tianrui Zhao <zhaotianrui@loongson.cn>,
Bibo Mao <maobibo@loongson.cn>,
Huacai Chen <chenhuacai@kernel.org>,
Madhavan Srinivasan <maddy@linux.ibm.com>,
Anup Patel <anup@brainfault.org>,
Paul Walmsley <paul.walmsley@sifive.com>,
Palmer Dabbelt <palmer@dabbelt.com>,
Albert Ou <aou@eecs.berkeley.edu>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Janosch Frank <frankja@linux.ibm.com>,
Claudio Imbrenda <imbrenda@linux.ibm.com>,
Sean Christopherson <seanjc@google.com>,
Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org, loongarch@lists.linux.dev,
linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org,
linux-kernel@vger.kernel.org,
Aaron Lewis <aaronlewis@google.com>,
Jim Mattson <jmattson@google.com>,
Yan Zhao <yan.y.zhao@intel.com>,
Rick P Edgecombe <rick.p.edgecombe@intel.com>,
Kai Huang <kai.huang@intel.com>,
Isaku Yamahata <isaku.yamahata@intel.com>
Subject: [PATCH 1/7] KVM: x86: Free vCPUs before freeing VM state
Date: Mon, 24 Feb 2025 15:55:36 -0800 [thread overview]
Message-ID: <20250224235542.2562848-2-seanjc@google.com> (raw)
In-Reply-To: <20250224235542.2562848-1-seanjc@google.com>
Free vCPUs before freeing any VM state, as both SVM and VMX may access
VM state when "freeing" a vCPU that is currently "in" L2, i.e. that needs
to be kicked out of nested guest mode.
Commit 6fcee03df6a1 ("KVM: x86: avoid loading a vCPU after .vm_destroy was
called") partially fixed the issue, but for unknown reasons only moved the
MMU unloading before VM destruction. Complete the change, and free all
vCPU state prior to destroying VM state, as nVMX accesses even more state
than nSVM.
In addition to the AVIC, KVM can hit a use-after-free on MSR filters:
kvm_msr_allowed+0x4c/0xd0
__kvm_set_msr+0x12d/0x1e0
kvm_set_msr+0x19/0x40
load_vmcs12_host_state+0x2d8/0x6e0 [kvm_intel]
nested_vmx_vmexit+0x715/0xbd0 [kvm_intel]
nested_vmx_free_vcpu+0x33/0x50 [kvm_intel]
vmx_free_vcpu+0x54/0xc0 [kvm_intel]
kvm_arch_vcpu_destroy+0x28/0xf0
kvm_vcpu_destroy+0x12/0x50
kvm_arch_destroy_vm+0x12c/0x1c0
kvm_put_kvm+0x263/0x3c0
kvm_vm_release+0x21/0x30
and an upcoming fix to process injectable interrupts on nested VM-Exit
will access the PIC:
BUG: kernel NULL pointer dereference, address: 0000000000000090
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
CPU: 23 UID: 1000 PID: 2658 Comm: kvm-nx-lpage-re
RIP: 0010:kvm_cpu_has_extint+0x2f/0x60 [kvm]
Call Trace:
<TASK>
kvm_cpu_has_injectable_intr+0xe/0x60 [kvm]
nested_vmx_vmexit+0x2d7/0xdf0 [kvm_intel]
nested_vmx_free_vcpu+0x40/0x50 [kvm_intel]
vmx_vcpu_free+0x2d/0x80 [kvm_intel]
kvm_arch_vcpu_destroy+0x2d/0x130 [kvm]
kvm_destroy_vcpus+0x8a/0x100 [kvm]
kvm_arch_destroy_vm+0xa7/0x1d0 [kvm]
kvm_destroy_vm+0x172/0x300 [kvm]
kvm_vcpu_release+0x31/0x50 [kvm]
Inarguably, both nSVM and nVMX need to be fixed, but punt on those
cleanups for the moment. Conceptually, vCPUs should be freed before VM
state. Assets like the I/O APIC and PIC _must_ be allocated before vCPUs
are created, so it stands to reason that they must be freed _after_ vCPUs
are destroyed.
Reported-by: Aaron Lewis <aaronlewis@google.com>
Closes: https://lore.kernel.org/all/20240703175618.2304869-2-aaronlewis@google.com
Cc: Jim Mattson <jmattson@google.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>
Cc: Rick P Edgecombe <rick.p.edgecombe@intel.com>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 58b82d6fd77c..045c61cc7e54 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12890,11 +12890,11 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
mutex_unlock(&kvm->slots_lock);
}
kvm_unload_vcpu_mmus(kvm);
+ kvm_destroy_vcpus(kvm);
kvm_x86_call(vm_destroy)(kvm);
kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
kvm_pic_destroy(kvm);
kvm_ioapic_destroy(kvm);
- kvm_destroy_vcpus(kvm);
kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1));
kvm_mmu_uninit_vm(kvm);
--
2.48.1.658.g4767266eb4-goog
next prev parent reply other threads:[~2025-02-24 23:55 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-24 23:55 [PATCH 0/7] KVM: x86: nVMX IRQ fix and VM teardown cleanups Sean Christopherson
2025-02-24 23:55 ` Sean Christopherson [this message]
2025-02-25 7:44 ` [PATCH 1/7] KVM: x86: Free vCPUs before freeing VM state Yan Zhao
2025-02-25 15:04 ` Sean Christopherson
2025-02-26 7:34 ` Yan Zhao
2025-02-25 23:47 ` Paolo Bonzini
2025-02-26 0:27 ` Sean Christopherson
2025-02-26 9:18 ` Paolo Bonzini
2025-02-24 23:55 ` [PATCH 2/7] KVM: nVMX: Process events on nested VM-Exit if injectable IRQ or NMI is pending Sean Christopherson
2025-02-24 23:55 ` [PATCH 3/7] KVM: Assert that a destroyed/freed vCPU is no longer visible Sean Christopherson
2025-02-25 7:07 ` Yan Zhao
2025-02-25 14:35 ` Sean Christopherson
2025-02-24 23:55 ` [PATCH 4/7] KVM: x86: Don't load/put vCPU when unloading its MMU during teardown Sean Christopherson
2025-02-25 7:13 ` Yan Zhao
2025-02-25 14:44 ` Sean Christopherson
2025-02-24 23:55 ` [PATCH 5/7] KVM: x86: Unload MMUs during vCPU destruction, not before Sean Christopherson
2025-02-24 23:55 ` [PATCH 6/7] KVM: x86: Fold guts of kvm_arch_sync_events() into kvm_arch_pre_destroy_vm() Sean Christopherson
2025-02-24 23:55 ` [PATCH 7/7] KVM: Drop kvm_arch_sync_events() now that all implementations are nops Sean Christopherson
2025-02-25 12:05 ` bibo mao
2025-02-25 16:15 ` Claudio Imbrenda
2025-02-26 18:38 ` [PATCH 0/7] KVM: x86: nVMX IRQ fix and VM teardown cleanups Paolo Bonzini
2025-03-27 3:24 ` patchwork-bot+linux-riscv
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250224235542.2562848-2-seanjc@google.com \
--to=seanjc@google.com \
--cc=aaronlewis@google.com \
--cc=anup@brainfault.org \
--cc=aou@eecs.berkeley.edu \
--cc=borntraeger@linux.ibm.com \
--cc=chenhuacai@kernel.org \
--cc=frankja@linux.ibm.com \
--cc=imbrenda@linux.ibm.com \
--cc=isaku.yamahata@intel.com \
--cc=jmattson@google.com \
--cc=kai.huang@intel.com \
--cc=kvm-riscv@lists.infradead.org \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.linux.dev \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mips@vger.kernel.org \
--cc=linux-riscv@lists.infradead.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=loongarch@lists.linux.dev \
--cc=maddy@linux.ibm.com \
--cc=maobibo@loongson.cn \
--cc=maz@kernel.org \
--cc=oliver.upton@linux.dev \
--cc=palmer@dabbelt.com \
--cc=paul.walmsley@sifive.com \
--cc=pbonzini@redhat.com \
--cc=rick.p.edgecombe@intel.com \
--cc=yan.y.zhao@intel.com \
--cc=zhaotianrui@loongson.cn \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).