linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Naveen N Rao <naveen@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>,
	Oliver Upton <oliver.upton@linux.dev>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	Joerg Roedel <joro@8bytes.org>,
	 David Woodhouse <dwmw2@infradead.org>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	 linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	 kvm@vger.kernel.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org,  Sairaj Kodilkar <sarunkod@amd.com>,
	Vasant Hegde <vasant.hegde@amd.com>,
	 Maxim Levitsky <mlevitsk@redhat.com>,
	Joao Martins <joao.m.martins@oracle.com>,
	 Francesco Lavra <francescolavra.fl@gmail.com>,
	David Matlack <dmatlack@google.com>
Subject: Re: [PATCH v3 12/62] KVM: SVM: Inhibit AVIC if ID is too big instead of rejecting vCPU creation
Date: Tue, 17 Jun 2025 09:10:10 -0700	[thread overview]
Message-ID: <aFGTYoxlLhZsiMUC@google.com> (raw)
In-Reply-To: <bmx43lru6mwdyun3ktcmoeezqw6baih7vgtykolsrmu5q2x7vu@xp5jrbbqwzej>

On Tue, Jun 17, 2025, Naveen N Rao wrote:
> On Wed, Jun 11, 2025 at 03:45:15PM -0700, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index ab228872a19b..f0a74b102c57 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -277,9 +277,19 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
> >  	int id = vcpu->vcpu_id;
> >  	struct vcpu_svm *svm = to_svm(vcpu);
> >  
> > +	/*
> > +	 * Inhibit AVIC if the vCPU ID is bigger than what is supported by AVIC
> > +	 * hardware.  Immediately clear apicv_active, i.e. don't wait until the
> > +	 * KVM_REQ_APICV_UPDATE request is processed on the first KVM_RUN, as
> > +	 * avic_vcpu_load() expects to be called if and only if the vCPU has
> > +	 * fully initialized AVIC.
> > +	 */
> >  	if ((!x2avic_enabled && id > AVIC_MAX_PHYSICAL_ID) ||
> > -	    (id > X2AVIC_MAX_PHYSICAL_ID))
> > -		return -EINVAL;
> > +	    (id > X2AVIC_MAX_PHYSICAL_ID)) {
> > +		kvm_set_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_PHYSICAL_ID_TOO_BIG);
> > +		vcpu->arch.apic->apicv_active = false;
> 
> This bothers me a bit. kvm_create_lapic() does this:
>           if (enable_apicv) {
>                   apic->apicv_active = true;
>                   kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
> 	  }
> 
> But, setting apic->apicv_active to false here means KVM_REQ_APICV_UPDATE 
> is going to be a no-op.

That's fine, KVM_REQ_APICV_UPDATE is a nop in other scenarios, too.  I agree it's
not ideal, but this is a rather extreme edge case and a one-time slow path, so I
don't think it's worth doing anything special just to avoid KVM_REQ_APICV_UPDATE.

> This does not look to be a big deal given that kvm_create_lapic() itself 
> is called just a bit before svm_vcpu_create() (which calls the above 
> function through avic_init_vcpu()) in kvm_arch_vcpu_create(), so there 
> isn't that much done before apicv_active is toggled.
> 
> But, this made me wonder if introducing a kvm_x86_op to check and 
> enable/disable apic->apicv_active in kvm_create_lapic() might be cleaner 
> overall

Hmm, yes and no.  I completely agree that clearing apicv_active in avic.c is all
kinds of gross, but clearing apic->apicv_active in lapic.c to handle this particular
scenario is just as problematic, because then avic_init_backing_page() would need
to check kvm_vcpu_apicv_active() to determine whether or not to allocate the backing
page.  In a way, that's even worse, because setting apic->apicv_active by default
is purely an optimization, i.e. leaving it %false _should_ work as well, it would
just be suboptimal.  But if AVIC were to key off apic->apicv_active, that could
lead to KVM incorrectly skipping allocation of the AVIC backing page.

> Maybe even have it be the initialization point for APICv: 
> apicv_init(), so we can invoke avic_init_vcpu() right away?

I mostly like this idea (actually, I take that back; see below), but VMX throws
a big wrench in things.  Unlike SVM, VMX doesn't have a singular "enable APICv"
flag.  Rather, KVM considers "APICv" to be the combination of APIC-register
virtualization, virtual interrupt delivery, and posted interrupts.

Which is totally fine.  The big wrench is that the are other features that interact
with "APICv" and require allocations.  E.g.  the APIC access page isn't actually
tied to enable_apicv, it's tied to yet another VMX feature, "virtualize APIC
accesses" (not to be confused with APIC-register virtualization; don't blame me,
I didn't create the control knobs/names).

As a result, KVM may need to allocate the APIC access page (not to be confused
with the APIC *backing* page; again, don't blame me :-D) when enable_apicv=false,
and even more confusingly, NOT allocate the APIC access page when enable_apicv=true.

	if (cpu_need_virtualize_apic_accesses(vcpu)) {  <=== not an enable_apic check, *sigh*
		err = kvm_alloc_apic_access_page(vcpu->kvm);
		if (err)
			goto free_vmcs;
	}

And for both SVM and VMX, IPI virtualization adds another wrinkle, in that the
per-vCPU setup needs to fill an entry in a per-VM table.  And filling that entry
needs to happen at the very end of vCPU creation, e.g. so that the vCPU can't be
targeted until KVM is ready to run the vCPU.

Ouch.  And I'm pretty sure there's a use-after-free in the AVIC code.  If
svm_vcpu_alloc_msrpm() fails, the avic_physical_id_table[] will still have a pointer
to freed vAPIC page.

Thus, invoking avic_init_vcpu() "right away" is unfortunately flat out unsafe
(took me a while to realize that).

So while I 100% agree with your dislike of this patch, I think it's the least
awful band-aid, at least in the short term.

Longer term, I'd definitely be in favor of cleaning up the related flows, but I
think that's best done on top of this series, because I suspect it'll be somewhat
invasive.

  reply	other threads:[~2025-06-17 16:10 UTC|newest]

Thread overview: 100+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-11 22:45 [PATCH v3 00/62] KVM: iommu: Overhaul device posted IRQs support Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 01/62] KVM: arm64: Explicitly treat routing entry type changes as changes Sean Christopherson
2025-06-13 19:43   ` Oliver Upton
2025-06-19 12:36   ` (subset) " Marc Zyngier
2025-06-11 22:45 ` [PATCH v3 02/62] KVM: arm64: WARN if unmapping vLPI fails Sean Christopherson
2025-06-12 11:59   ` Marc Zyngier
2025-06-12 14:34     ` Sean Christopherson
2025-06-13 20:47       ` Oliver Upton
2025-06-20 17:22         ` Sean Christopherson
2025-06-20 18:00           ` David Woodhouse
2025-06-20 18:48           ` Oliver Upton
2025-06-20 19:04             ` Sean Christopherson
2025-06-20 19:27               ` Oliver Upton
2025-06-20 20:31                 ` Sean Christopherson
2025-06-20 20:45                   ` Oliver Upton
2025-06-11 22:45 ` [PATCH v3 03/62] KVM: Pass new routing entries and irqfd when updating IRTEs Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 04/62] KVM: SVM: Track per-vCPU IRTEs using kvm_kernel_irqfd structure Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 05/62] KVM: SVM: Delete IRTE link from previous vCPU before setting new IRTE Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 06/62] iommu/amd: KVM: SVM: Delete now-unused cached/previous GA tag fields Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 07/62] KVM: SVM: Delete IRTE link from previous vCPU irrespective of new routing Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 08/62] KVM: SVM: Drop pointless masking of default APIC base when setting V_APIC_BAR Sean Christopherson
2025-06-13 14:15   ` Naveen N Rao
2025-06-11 22:45 ` [PATCH v3 09/62] KVM: SVM: Drop pointless masking of kernel page pa's with AVIC HPA masks Sean Christopherson
2025-06-13 14:37   ` Naveen N Rao
2025-06-11 22:45 ` [PATCH v3 10/62] KVM: SVM: Add helper to deduplicate code for getting AVIC backing page Sean Christopherson
2025-06-13 14:38   ` Naveen N Rao
2025-06-11 22:45 ` [PATCH v3 11/62] KVM: SVM: Drop vcpu_svm's pointless avic_backing_page field Sean Christopherson
2025-06-13 14:44   ` Naveen N Rao
2025-06-11 22:45 ` [PATCH v3 12/62] KVM: SVM: Inhibit AVIC if ID is too big instead of rejecting vCPU creation Sean Christopherson
2025-06-17 14:25   ` Naveen N Rao
2025-06-17 16:10     ` Sean Christopherson [this message]
2025-06-18 14:33       ` Naveen N Rao
2025-06-18 20:59         ` Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 13/62] KVM: SVM: Drop redundant check in AVIC code on ID during " Sean Christopherson
2025-06-17 14:49   ` Naveen N Rao
2025-06-17 16:33     ` Sean Christopherson
2025-06-18 14:39       ` Naveen N Rao
2025-06-11 22:45 ` [PATCH v3 14/62] KVM: SVM: Track AVIC tables as natively sized pointers, not "struct pages" Sean Christopherson
2025-06-17 15:01   ` Naveen N Rao
2025-06-11 22:45 ` [PATCH v3 15/62] KVM: SVM: Drop superfluous "cache" of AVIC Physical ID entry pointer Sean Christopherson
2025-06-19 11:09   ` Naveen N Rao
2025-06-11 22:45 ` [PATCH v3 16/62] KVM: VMX: Move enable_ipiv knob to common x86 Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 17/62] KVM: SVM: Add enable_ipiv param, never set IsRunning if disabled Sean Christopherson
2025-06-19 11:31   ` Naveen N Rao
2025-06-19 12:01     ` Naveen N Rao
2025-06-20 14:39     ` Sean Christopherson
2025-06-23 10:45       ` Naveen N Rao
2025-06-11 22:45 ` [PATCH v3 18/62] KVM: SVM: Disable (x2)AVIC IPI virtualization if CPU has erratum #1235 Sean Christopherson
2025-06-23 14:05   ` Naveen N Rao
2025-06-23 15:30     ` Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 19/62] KVM: VMX: Suppress PI notifications whenever the vCPU is put Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 20/62] KVM: SVM: Add a comment to explain why avic_vcpu_blocking() ignores IRQ blocking Sean Christopherson
2025-06-23 15:54   ` Naveen N Rao
2025-06-23 16:18     ` Sean Christopherson
2025-06-25 15:28       ` Naveen N Rao
2025-06-11 22:45 ` [PATCH v3 21/62] iommu/amd: KVM: SVM: Use pi_desc_addr to derive ga_root_ptr Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 22/62] iommu/amd: KVM: SVM: Pass NULL @vcpu_info to indicate "not guest mode" Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 23/62] KVM: SVM: Stop walking list of routing table entries when updating IRTE Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 24/62] KVM: VMX: " Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 25/62] KVM: SVM: Extract SVM specific code out of get_pi_vcpu_info() Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 26/62] KVM: x86: Move IRQ routing/delivery APIs from x86.c => irq.c Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 27/62] KVM: x86: Nullify irqfd->producer after updating IRTEs Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 28/62] KVM: x86: Dedup AVIC vs. PI code for identifying target vCPU Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 29/62] KVM: x86: Move posted interrupt tracepoint to common code Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 30/62] KVM: SVM: Clean up return handling in avic_pi_update_irte() Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 31/62] iommu: KVM: Split "struct vcpu_data" into separate AMD vs. Intel structs Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 32/62] KVM: Don't WARN if updating IRQ bypass route fails Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 33/62] KVM: Fold kvm_arch_irqfd_route_changed() into kvm_arch_update_irqfd_routing() Sean Christopherson
2025-06-13 20:50   ` Oliver Upton
2025-06-11 22:45 ` [PATCH v3 34/62] KVM: x86: Track irq_bypass_vcpu in common x86 code Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 35/62] KVM: x86: Skip IOMMU IRTE updates if there's no old or new vCPU being targeted Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 36/62] KVM: x86: Don't update IRTE entries when old and new routes were !MSI Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 37/62] KVM: SVM: Revert IRTE to legacy mode if IOMMU doesn't provide IR metadata Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 38/62] KVM: SVM: Take and hold ir_list_lock across IRTE updates in IOMMU Sean Christopherson
2025-06-17 15:42   ` Naveen N Rao
2025-06-11 22:45 ` [PATCH v3 39/62] iommu/amd: Document which IRTE fields amd_iommu_update_ga() can modify Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 40/62] iommu/amd: KVM: SVM: Infer IsRun from validity of pCPU destination Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 41/62] iommu/amd: Factor out helper for manipulating IRTE GA/CPU info Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 42/62] iommu/amd: KVM: SVM: Set pCPU info in IRTE when setting vCPU affinity Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 43/62] iommu/amd: KVM: SVM: Add IRTE metadata to affined vCPU's list if AVIC is inhibited Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 44/62] KVM: SVM: Don't check for assigned device(s) when updating affinity Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 45/62] KVM: SVM: Don't check for assigned device(s) when activating AVIC Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 46/62] KVM: SVM: WARN if (de)activating guest mode in IOMMU fails Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 47/62] KVM: SVM: Process all IRTEs on affinity change even if one update fails Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 48/62] KVM: SVM: WARN if updating IRTE GA fields in IOMMU fails Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 49/62] KVM: x86: Drop superfluous "has assigned device" check in kvm_pi_update_irte() Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 50/62] KVM: x86: WARN if IRQ bypass isn't supported " Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 51/62] KVM: x86: WARN if IRQ bypass routing is updated without in-kernel local APIC Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 52/62] KVM: SVM: WARN if ir_list is non-empty at vCPU free Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 53/62] KVM: x86: Decouple device assignment from IRQ bypass Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 54/62] KVM: VMX: WARN if VT-d Posted IRQs aren't possible when starting " Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 55/62] KVM: SVM: Use vcpu_idx, not vcpu_id, for GA log tag/metadata Sean Christopherson
2025-06-11 22:45 ` [PATCH v3 56/62] iommu/amd: WARN if KVM calls GA IRTE helpers without virtual APIC support Sean Christopherson
2025-06-11 22:46 ` [PATCH v3 57/62] KVM: SVM: Fold avic_set_pi_irte_mode() into its sole caller Sean Christopherson
2025-06-11 22:46 ` [PATCH v3 58/62] KVM: SVM: Don't check vCPU's blocking status when toggling AVIC on/off Sean Christopherson
2025-06-11 22:46 ` [PATCH v3 59/62] KVM: SVM: Consolidate IRTE update " Sean Christopherson
2025-06-11 22:46 ` [PATCH v3 60/62] iommu/amd: KVM: SVM: Allow KVM to control need for GA log interrupts Sean Christopherson
2025-06-11 22:46 ` [PATCH v3 61/62] KVM: SVM: Generate GA log IRQs only if the associated vCPUs is blocking Sean Christopherson
2025-06-11 22:46 ` [PATCH v3 62/62] KVM: x86: Rename kvm_set_msi_irq() => kvm_msi_to_lapic_irq() Sean Christopherson
2025-06-24 19:38 ` [PATCH v3 00/62] KVM: iommu: Overhaul device posted IRQs support Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aFGTYoxlLhZsiMUC@google.com \
    --to=seanjc@google.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=dmatlack@google.com \
    --cc=dwmw2@infradead.org \
    --cc=francescolavra.fl@gmail.com \
    --cc=iommu@lists.linux.dev \
    --cc=joao.m.martins@oracle.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=mlevitsk@redhat.com \
    --cc=naveen@kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=sarunkod@amd.com \
    --cc=vasant.hegde@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).