From: Greg KH <gregkh@linuxfoundation.org>
To: Kevin Cheng <chengkev@google.com>
Cc: sashal@kernel.org, stable@vger.kernel.org,
Sean Christopherson <seanjc@google.com>,
Aaron Lewis <aaronlewis@google.com>,
Jim Mattson <jmattson@google.com>,
Yan Zhao <yan.y.zhao@intel.com>,
Rick P Edgecombe <rick.p.edgecombe@intel.com>,
Kai Huang <kai.huang@intel.com>,
Isaku Yamahata <isaku.yamahata@intel.com>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH 6.12.y v2] KVM: x86: Free vCPUs before freeing VM state
Date: Tue, 29 Jul 2025 16:46:47 +0200 [thread overview]
Message-ID: <2025072934-path-slightly-24f2@gregkh> (raw)
In-Reply-To: <20250728175122.4021478-1-chengkev@google.com>
On Mon, Jul 28, 2025 at 05:51:22PM +0000, Kevin Cheng wrote:
> From: Sean Christopherson <seanjc@google.com>
>
> [ Upstream commit 17bcd714426386fda741a4bccd96a2870179344b ]
>
> Free vCPUs before freeing any VM state, as both SVM and VMX may access
> VM state when "freeing" a vCPU that is currently "in" L2, i.e. that needs
> to be kicked out of nested guest mode.
>
> Commit 6fcee03df6a1 ("KVM: x86: avoid loading a vCPU after .vm_destroy was
> called") partially fixed the issue, but for unknown reasons only moved the
> MMU unloading before VM destruction. Complete the change, and free all
> vCPU state prior to destroying VM state, as nVMX accesses even more state
> than nSVM.
>
> In addition to the AVIC, KVM can hit a use-after-free on MSR filters:
>
> kvm_msr_allowed+0x4c/0xd0
> __kvm_set_msr+0x12d/0x1e0
> kvm_set_msr+0x19/0x40
> load_vmcs12_host_state+0x2d8/0x6e0 [kvm_intel]
> nested_vmx_vmexit+0x715/0xbd0 [kvm_intel]
> nested_vmx_free_vcpu+0x33/0x50 [kvm_intel]
> vmx_free_vcpu+0x54/0xc0 [kvm_intel]
> kvm_arch_vcpu_destroy+0x28/0xf0
> kvm_vcpu_destroy+0x12/0x50
> kvm_arch_destroy_vm+0x12c/0x1c0
> kvm_put_kvm+0x263/0x3c0
> kvm_vm_release+0x21/0x30
>
> and an upcoming fix to process injectable interrupts on nested VM-Exit
> will access the PIC:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000090
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> CPU: 23 UID: 1000 PID: 2658 Comm: kvm-nx-lpage-re
> RIP: 0010:kvm_cpu_has_extint+0x2f/0x60 [kvm]
> Call Trace:
> <TASK>
> kvm_cpu_has_injectable_intr+0xe/0x60 [kvm]
> nested_vmx_vmexit+0x2d7/0xdf0 [kvm_intel]
> nested_vmx_free_vcpu+0x40/0x50 [kvm_intel]
> vmx_vcpu_free+0x2d/0x80 [kvm_intel]
> kvm_arch_vcpu_destroy+0x2d/0x130 [kvm]
> kvm_destroy_vcpus+0x8a/0x100 [kvm]
> kvm_arch_destroy_vm+0xa7/0x1d0 [kvm]
> kvm_destroy_vm+0x172/0x300 [kvm]
> kvm_vcpu_release+0x31/0x50 [kvm]
>
> Inarguably, both nSVM and nVMX need to be fixed, but punt on those
> cleanups for the moment. Conceptually, vCPUs should be freed before VM
> state. Assets like the I/O APIC and PIC _must_ be allocated before vCPUs
> are created, so it stands to reason that they must be freed _after_ vCPUs
> are destroyed.
>
> Reported-by: Aaron Lewis <aaronlewis@google.com>
> Closes: https://lore.kernel.org/all/20240703175618.2304869-2-aaronlewis@google.com
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Yan Zhao <yan.y.zhao@intel.com>
> Cc: Rick P Edgecombe <rick.p.edgecombe@intel.com>
> Cc: Kai Huang <kai.huang@intel.com>
> Cc: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Message-ID: <20250224235542.2562848-2-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Kevin Cheng <chengkev@google.com>
> ---
> arch/x86/kvm/x86.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f378d479fea3f..7f91b11e6f0ec 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12888,11 +12888,11 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> mutex_unlock(&kvm->slots_lock);
> }
> kvm_unload_vcpu_mmus(kvm);
> + kvm_destroy_vcpus(kvm);
> kvm_x86_call(vm_destroy)(kvm);
> kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
> kvm_pic_destroy(kvm);
> kvm_ioapic_destroy(kvm);
> - kvm_destroy_vcpus(kvm);
> kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
> kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1));
> kvm_mmu_uninit_vm(kvm);
> --
> 2.50.1.487.gc89ff58d15-goog
>
>
What changed in this v2 version?
confused,
greg k-h
next prev parent reply other threads:[~2025-07-29 14:46 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-25 18:58 [PATCH 6.12.y] KVM: x86: Free vCPUs before freeing VM state Kevin Cheng
2025-07-26 1:37 ` Sasha Levin
2025-07-28 17:51 ` [PATCH 6.12.y v2] " Kevin Cheng
2025-07-29 14:46 ` Greg KH [this message]
2025-07-29 15:31 ` Kevin Cheng
2025-07-29 19:45 ` Sasha Levin
-- strict thread matches above, loose matches on Subject: below --
2025-07-05 21:40 [PATCH 6.12.y v2] mm/vmalloc: fix data race in show_numa_info() Sasha Levin
2025-07-28 17:50 ` [PATCH 6.12.y v2] KVM: x86: Free vCPUs before freeing VM state Kevin Cheng
2025-07-29 19:49 ` Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2025072934-path-slightly-24f2@gregkh \
--to=gregkh@linuxfoundation.org \
--cc=aaronlewis@google.com \
--cc=chengkev@google.com \
--cc=isaku.yamahata@intel.com \
--cc=jmattson@google.com \
--cc=kai.huang@intel.com \
--cc=pbonzini@redhat.com \
--cc=rick.p.edgecombe@intel.com \
--cc=sashal@kernel.org \
--cc=seanjc@google.com \
--cc=stable@vger.kernel.org \
--cc=yan.y.zhao@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.