Kernel KVM virtualization development
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Naveen N Rao <naveen@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] KVM: SVM: Drop unnecessary avic_vm_destroy() call on init failure
Date: Mon, 29 Jun 2026 17:17:17 -0700	[thread overview]
Message-ID: <akMLDayXmzhmvJBG@google.com> (raw)
In-Reply-To: <akJw3nXeWO-Xavc5@blrnaveerao1>

On Mon, Jun 29, 2026, Naveen N Rao wrote:
> On Thu, Jun 25, 2026 at 03:09:33PM -0700, Sean Christopherson wrote:
> > Don't bother calling avic_vm_destroy() when allocating the logical ID table
> > fails, as there is nothing to clean up now that the physical ID table is
> > allocated later, on-demand at first vCPU creation.
> > 
> > For all intents and purposes, no functional change intended, as calling
> > avic_vm_destroy() is a big nop in this case.
> > 
> > Fixes: 54ffe74cc4ab ("KVM: SVM: Move AVIC Physical ID table allocation to vcpu_precreate()")
> > Cc: Naveen N Rao (AMD) <naveen@kernel.org>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/kvm/svm/avic.c | 7 +------
> >  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> This LGTM, though I'm not sure this deserves a fixes tag.

Yeah, it's borderline.  Because KVM (x86) doesn't do AUTOSEL, i.e. requires an
explicit "Cc: stable@vger.kernel.org" for the vast majority of cases, we can be
very liberal with Fixes.  And so I like to use Fixes to create a paper trail for
anything where a commit did something that would have necessitate a new version
of the patch, had it been caught in code review.

> FWIW, I have had a patch (part of a larger series I am going to post 
> soon) to move the logical_id_table allocation alongside the physical ID 
> table allocation. I'm fine if you want to queue your patch first, but 
> this is what I have locally:

Ah shoot, I should have read this mail earlier.  While floundering around, trying
to figure out how to deal with kvm_arch_pre_destroy_vm() not being called when
VM creation fails after kvm_arch_init_vm(), I ended up with the same idea, though
I moved *all* of avic_vm_init() to the vCPU-precreate phase.  Because what I
needed was to defer adding the VM to the GA Log list until the VM is fully
created.

> @@ -303,10 +303,16 @@ int avic_alloc_physical_id_table(struct kvm *kvm)
>  	if (kvm_svm->avic_physical_id_table)
>  		return 0;
>  
> +	kvm_svm->avic_logical_id_table = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
> +	if (!kvm_svm->avic_logical_id_table)
> +		return -ENOMEM;
> +
>  	kvm_svm->avic_physical_id_table = (void *)__get_free_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO,
>  								   avic_get_physical_id_table_order(kvm));
> -	if (!kvm_svm->avic_physical_id_table)
> +	if (!kvm_svm->avic_physical_id_table) {
> +		free_page((unsigned long)kvm_svm->avic_logical_id_table);

Freeing the page on failure is "wrong".  To keep vCPU creation idempotent from
userspace's perspective (well, not truly idempotent, but close-ish), KVM
deliberately doesn't unwind if .vcpu_precreate() fails.  E.g. so that if
allocating the tables succeeds, but a later stage of vCPU creation fails,
subsequent calls to create vCPUs won't fail at the earlier stage.

The nice thing is that since the VM has already been created, there's no need to
every unwind on failure, because it's no different than if KVM had successfully
allocate the assets during VM creation.

This is what I have (I reworked the avic_alloc_logical_id_table() into
avic_vcpu_pre_create() and dropped svm_vcpu_precreate() in a previous commit.
Hrm, maybe that should be avic_vcpu_precreate()...

---
From: Sean Christopherson <seanjc@google.com>
Date: Mon, 29 Jun 2026 12:23:20 -0700
Subject: [PATCH 2/3] KVM: SVM: Do all per-VM AVIC initialization during vCPU
 precreation phase

Move all per-VM AVIC initialization from VM creation to vCPU pre-creation,
i.e. defer allocating the logical ID table and adding the VM to the GA Log
list until vCPUs are created.  This will allow removing the VM from the GA
Log list before vCPUs are destroyed without needing yet another kvm_x86_ops
hook (.vm_pre_destroy() is very intentionally called if and only if VM
creation fully succeeds).

As a bonus, this re-unites physical and logic table allocation, and avoids
allocating a logical table in the unlikely scenario that userspace creates
a VM without an in-kernel local APIC.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c | 94 +++++++++++++++++++++++++----------------
 arch/x86/kvm/svm/svm.c  |  6 ---
 2 files changed, 57 insertions(+), 43 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 5d7adc394e5a..3393e90879e6 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -308,47 +308,32 @@ static int avic_alloc_physical_id_table(struct kvm *kvm)
 	return 0;
 }
 
-int avic_vcpu_pre_create(struct kvm *kvm)
+static int avic_alloc_logical_id_table(struct kvm *kvm)
 {
-	if (!irqchip_in_kernel(kvm) || WARN_ON_ONCE(!enable_apicv))
-		return 0;
-
-	return avic_alloc_physical_id_table(kvm);
-}
-
-void avic_vm_destroy(struct kvm *kvm)
-{
-	unsigned long flags;
-	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
-
-	if (!enable_apicv)
-		return;
-
-	free_page((unsigned long)kvm_svm->avic_logical_id_table);
-	free_pages((unsigned long)kvm_svm->avic_physical_id_table,
-		   avic_get_physical_id_table_order(kvm));
-
-	spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
-	hash_del(&kvm_svm->hnode);
-	spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags);
-}
-
-int avic_vm_init(struct kvm *kvm)
-{
-	unsigned long flags;
-	int err = -ENOMEM;
 	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
-	struct kvm_svm *k2;
-	u32 vm_id;
 
-	if (!enable_apicv)
+	if (kvm_svm->avic_logical_id_table)
 		return 0;
 
 	kvm_svm->avic_logical_id_table = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
 	if (!kvm_svm->avic_logical_id_table)
-		goto free_avic;
+		return -ENOMEM;
 
-	spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
+	return 0;
+}
+
+static void avic_add_vm_to_ga_log_list(struct kvm *kvm)
+{
+	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
+	struct kvm_svm *k2;
+	u32 vm_id;
+
+	lockdep_assert_held(&kvm->lock);
+
+	if (kvm_svm->avic_vm_id)
+		return;
+
+	guard(spinlock_irqsave)(&svm_vm_data_hash_lock);
  again:
 	vm_id = next_vm_id = (next_vm_id + 1) & AVIC_VM_ID_MASK;
 	if (vm_id == 0) { /* id is 1-based, zero is not okay */
@@ -364,13 +349,48 @@ int avic_vm_init(struct kvm *kvm)
 	}
 	kvm_svm->avic_vm_id = vm_id;
 	hash_add(svm_vm_data_hash, &kvm_svm->hnode, kvm_svm->avic_vm_id);
-	spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags);
+}
 
+int avic_vcpu_pre_create(struct kvm *kvm)
+{
+	int r;
+
+	if (!irqchip_in_kernel(kvm) || WARN_ON_ONCE(!enable_apicv))
+		return 0;
+
+	/*
+	 * Don't unwind on failure, all actions must be idempotent with respect
+	 * to creating multiple vCPUs, i.e. must persist until the VM is destroyed.
+	 */
+	r = avic_alloc_physical_id_table(kvm);
+	if (r)
+		return r;
+
+	r = avic_alloc_logical_id_table(kvm);
+	if (r)
+		return r;
+
+	avic_add_vm_to_ga_log_list(kvm);
 	return 0;
+}
 
-free_avic:
-	avic_vm_destroy(kvm);
-	return err;
+void avic_vm_destroy(struct kvm *kvm)
+{
+	unsigned long flags;
+	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
+
+	if (!enable_apicv)
+		return;
+
+	free_page((unsigned long)kvm_svm->avic_logical_id_table);
+	free_pages((unsigned long)kvm_svm->avic_physical_id_table,
+		   avic_get_physical_id_table_order(kvm));
+
+	if (kvm_svm->avic_vm_id) {
+		spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
+		hash_del(&kvm_svm->hnode);
+		spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags);
+	}
 }
 
 static phys_addr_t avic_get_backing_page_address(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ab3ffe04bc35..ecd4abd41040 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5297,12 +5297,6 @@ static int svm_vm_init(struct kvm *kvm)
 	if (!pause_filter_count || !pause_filter_thresh)
 		kvm_disable_exits(kvm, KVM_X86_DISABLE_EXITS_PAUSE);
 
-	if (enable_apicv) {
-		int ret = avic_vm_init(kvm);
-		if (ret)
-			return ret;
-	}
-
 	svm_srso_vm_init();
 	return 0;
 }
--

      reply	other threads:[~2026-06-30  0:17 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-25 22:09 [PATCH 0/2] KVM: SVM: Fix a (very) unlikely UAF for GA Log IRQs Sean Christopherson
2026-06-25 22:09 ` [PATCH 1/2] KVM: SVM: Remove VM from the GA Log notifier list before VM destruction Sean Christopherson
2026-06-25 22:26   ` sashiko-bot
2026-06-28 18:30     ` XIAO WU
2026-06-25 22:09 ` [PATCH 2/2] KVM: SVM: Drop unnecessary avic_vm_destroy() call on init failure Sean Christopherson
2026-06-25 22:25   ` sashiko-bot
2026-06-29 13:27   ` Naveen N Rao
2026-06-30  0:17     ` Sean Christopherson [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=akMLDayXmzhmvJBG@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=naveen@kernel.org \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox