linuxppc-dev.lists.ozlabs.org archive mirror
From: Sean Christopherson <seanjc@google.com>
To: Marc Zyngier <maz@kernel.org>,
	Oliver Upton <oliver.upton@linux.dev>,
	 Tianrui Zhao <zhaotianrui@loongson.cn>,
	Bibo Mao <maobibo@loongson.cn>,
	 Huacai Chen <chenhuacai@kernel.org>,
	Madhavan Srinivasan <maddy@linux.ibm.com>,
	 Anup Patel <anup@brainfault.org>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	 Palmer Dabbelt <palmer@dabbelt.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	 Christian Borntraeger <borntraeger@linux.ibm.com>,
	Janosch Frank <frankja@linux.ibm.com>,
	 Claudio Imbrenda <imbrenda@linux.ibm.com>,
	Sean Christopherson <seanjc@google.com>,
	 Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	 kvm@vger.kernel.org, loongarch@lists.linux.dev,
	linux-mips@vger.kernel.org,  linuxppc-dev@lists.ozlabs.org,
	kvm-riscv@lists.infradead.org,  linux-riscv@lists.infradead.org,
	linux-kernel@vger.kernel.org,
	 Aaron Lewis <aaronlewis@google.com>,
	Jim Mattson <jmattson@google.com>,
	 Yan Zhao <yan.y.zhao@intel.com>,
	Rick P Edgecombe <rick.p.edgecombe@intel.com>,
	 Kai Huang <kai.huang@intel.com>,
	Isaku Yamahata <isaku.yamahata@intel.com>
Subject: [PATCH 5/7] KVM: x86: Unload MMUs during vCPU destruction, not before
Date: Mon, 24 Feb 2025 15:55:40 -0800
Message-ID: <20250224235542.2562848-6-seanjc@google.com>
In-Reply-To: <20250224235542.2562848-1-seanjc@google.com>

When destroying a VM, unload a vCPU's MMUs as part of normal vCPU freeing,
instead of as a separate preparatory action.  Unloading MMUs ahead of time
is a holdover from commit 7b53aa565084 ("KVM: Fix vcpu freeing for guest
smp"), which "fixed" a rather egregious flaw where KVM would attempt to
free *all* MMU pages when destroying a vCPU.

At the time, KVM would spin on all MMU pages in a VM when freeing a single
vCPU, and so would hang due to the way KVM pins and zaps root pages
(roots are invalidated but not freed if they are pinned by a vCPU).

  static void free_mmu_pages(struct kvm_vcpu *vcpu)
  {
        struct kvm_mmu_page *page;

        while (!list_empty(&vcpu->kvm->active_mmu_pages)) {
                page = container_of(vcpu->kvm->active_mmu_pages.next,
                                    struct kvm_mmu_page, link);
                kvm_mmu_zap_page(vcpu->kvm, page);
        }
        free_page((unsigned long)vcpu->mmu.pae_root);
  }

Now that KVM doesn't try to free all MMU pages when destroying a single
vCPU, there's no need to unpin roots prior to destroying a vCPU.

Note!  While KVM mostly destroys all MMUs before calling
kvm_arch_destroy_vm() (see commit f00be0cae4e6 ("KVM: MMU: do not free
active mmu pages in free_mmu_pages()")), unpinning MMU roots during vCPU
destruction will unfortunately trigger remote TLB flushes, i.e. will try
to send requests to all vCPUs.

Happily, thanks to commit 27592ae8dbe4 ("KVM: Move wiping of the kvm->vcpus
array to common code"), that's a non-issue as freed vCPUs are naturally
skipped by xa_for_each_range(), i.e. by kvm_for_each_vcpu().  Prior to that
commit, KVM x86 rather stupidly freed vCPUs one-by-one, and _then_
nullified them, one-by-one.  I.e. triggering a VM-wide request would hit a
use-after-free.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9978ed4c0917..a61dbd1f0d01 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12374,6 +12374,9 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
 	int idx;
 
+	kvm_clear_async_pf_completion_queue(vcpu);
+	kvm_mmu_unload(vcpu);
+
 	kvmclock_reset(vcpu);
 
 	kvm_x86_call(vcpu_free)(vcpu);
@@ -12767,17 +12770,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	return ret;
 }
 
-static void kvm_unload_vcpu_mmus(struct kvm *kvm)
-{
-	unsigned long i;
-	struct kvm_vcpu *vcpu;
-
-	kvm_for_each_vcpu(i, vcpu, kvm) {
-		kvm_clear_async_pf_completion_queue(vcpu);
-		kvm_mmu_unload(vcpu);
-	}
-}
-
 void kvm_arch_sync_events(struct kvm *kvm)
 {
 	cancel_delayed_work_sync(&kvm->arch.kvmclock_sync_work);
@@ -12882,7 +12874,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 		__x86_set_memory_region(kvm, TSS_PRIVATE_MEMSLOT, 0, 0);
 		mutex_unlock(&kvm->slots_lock);
 	}
-	kvm_unload_vcpu_mmus(kvm);
 	kvm_destroy_vcpus(kvm);
 	kvm_x86_call(vm_destroy)(kvm);
 	kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
-- 
2.48.1.658.g4767266eb4-goog

Thread overview: 22+ messages
2025-02-24 23:55 [PATCH 0/7] KVM: x86: nVMX IRQ fix and VM teardown cleanups Sean Christopherson
2025-02-24 23:55 ` [PATCH 1/7] KVM: x86: Free vCPUs before freeing VM state Sean Christopherson
2025-02-25  7:44   ` Yan Zhao
2025-02-25 15:04     ` Sean Christopherson
2025-02-26  7:34       ` Yan Zhao
2025-02-25 23:47   ` Paolo Bonzini
2025-02-26  0:27     ` Sean Christopherson
2025-02-26  9:18       ` Paolo Bonzini
2025-02-24 23:55 ` [PATCH 2/7] KVM: nVMX: Process events on nested VM-Exit if injectable IRQ or NMI is pending Sean Christopherson
2025-02-24 23:55 ` [PATCH 3/7] KVM: Assert that a destroyed/freed vCPU is no longer visible Sean Christopherson
2025-02-25  7:07   ` Yan Zhao
2025-02-25 14:35     ` Sean Christopherson
2025-02-24 23:55 ` [PATCH 4/7] KVM: x86: Don't load/put vCPU when unloading its MMU during teardown Sean Christopherson
2025-02-25  7:13   ` Yan Zhao
2025-02-25 14:44     ` Sean Christopherson
2025-02-24 23:55 ` Sean Christopherson [this message]
2025-02-24 23:55 ` [PATCH 6/7] KVM: x86: Fold guts of kvm_arch_sync_events() into kvm_arch_pre_destroy_vm() Sean Christopherson
2025-02-24 23:55 ` [PATCH 7/7] KVM: Drop kvm_arch_sync_events() now that all implementations are nops Sean Christopherson
2025-02-25 12:05   ` bibo mao
2025-02-25 16:15   ` Claudio Imbrenda
2025-02-26 18:38 ` [PATCH 0/7] KVM: x86: nVMX IRQ fix and VM teardown cleanups Paolo Bonzini
2025-03-27  3:24 ` patchwork-bot+linux-riscv
