* [PATCH 0/2] KVM: SVM: Fix CR8 intercpetion woes with AVIC
@ 2026-02-03 19:07 Sean Christopherson
2026-02-03 19:07 ` [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC Sean Christopherson
` (2 more replies)
0 siblings, 3 replies; 25+ messages in thread
From: Sean Christopherson @ 2026-02-03 19:07 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Jim Mattson, Naveen N Rao,
Maciej S . Szmigiero
Fix a bug (or rather, a class of bugs) where SVM leaves the CR8 write
intercept enabled after AVIC is enabled. On its own, the dangling CR8
intercept is "just" a performance issue. But combined with the TPR sync bug
fixed by commit d02e48830e3f ("KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR
even if AVIC is active"), the dangling intercept is fatal to Windows guests as
the TPR seen by hardware gets wildly out of sync with reality.
Tagged for stable even though there shouldn't be functional issues so long as
the TPR sync bug is fixed, because (a) write_cr8 exits can represent the
overwhelming majority of exits (hence the quotes around "just" a performance
issue), and (b) running with a bad/wrong configuration increases the chances
of encountering other lurking TPR bugs (if there are any), i.e. of hitting
bugs that would otherwise be rare edge cases (which is good for testing, but bad
for production).
Sean Christopherson (2):
KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with
in-kernel APIC
KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
arch/x86/kvm/svm/avic.c | 8 +++++---
arch/x86/kvm/svm/svm.c | 11 ++++++-----
2 files changed, 11 insertions(+), 8 deletions(-)
base-commit: e944fe2c09f405a2e2d147145c9b470084bc4c9a
--
2.53.0.rc2.204.g2597b5adb4-goog
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC
2026-02-03 19:07 [PATCH 0/2] KVM: SVM: Fix CR8 intercpetion woes with AVIC Sean Christopherson
@ 2026-02-03 19:07 ` Sean Christopherson
2026-02-05 4:21 ` Jim Mattson
2026-02-06 14:00 ` Naveen N Rao
2026-02-03 19:07 ` [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated Sean Christopherson
2026-03-05 17:07 ` [PATCH 0/2] KVM: SVM: Fix CR8 intercpetion woes with AVIC Sean Christopherson
2 siblings, 2 replies; 25+ messages in thread
From: Sean Christopherson @ 2026-02-03 19:07 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Jim Mattson, Naveen N Rao,
Maciej S . Szmigiero

Initialize all per-vCPU AVIC control fields in the VMCB if AVIC is enabled
in KVM and the VM has an in-kernel local APIC, i.e. if it's _possible_ the
vCPU could activate AVIC at any point in its lifecycle. Configuring the
VMCB if and only if AVIC is active "works" purely because of optimizations
in kvm_create_lapic() to speculatively set apicv_active if AVIC is enabled
*and* to defer updates until the first KVM_RUN. In quotes because KVM
likely won't do the right thing if kvm_apicv_activated() is false, i.e. if
a vCPU is created while APICv is inhibited at the VM level for whatever
reason. E.g. if the inhibit is *removed* before KVM_REQ_APICV_UPDATE is
handled in KVM_RUN, then __kvm_vcpu_update_apicv() will elide calls to
vendor code due to seeing "apicv_active == activate".

Cleaning up the initialization code will also allow fixing a bug where KVM
incorrectly leaves CR8 interception enabled when AVIC is activated without
creating a mess with respect to whether AVIC is activated or not.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c | 2 +-
 arch/x86/kvm/svm/svm.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index f92214b1a938..44e07c27b190 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -368,7 +368,7 @@ void avic_init_vmcb(struct vcpu_svm *svm, struct vmcb *vmcb)
 	vmcb->control.avic_physical_id = __sme_set(__pa(kvm_svm->avic_physical_id_table));
 	vmcb->control.avic_vapic_bar = APIC_DEFAULT_PHYS_BASE;
 
-	if (kvm_apicv_activated(svm->vcpu.kvm))
+	if (kvm_vcpu_apicv_active(&svm->vcpu))
 		avic_activate_vmcb(svm);
 	else
 		avic_deactivate_vmcb(svm);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5f0136dbdde6..e8313fdc5465 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1189,7 +1189,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
 	if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS))
 		svm->vmcb->control.erap_ctl |= ERAP_CONTROL_ALLOW_LARGER_RAP;
 
-	if (kvm_vcpu_apicv_active(vcpu))
+	if (enable_apicv && irqchip_in_kernel(vcpu->kvm))
 		avic_init_vmcb(svm, vmcb);
 
 	if (vnmi)
-- 
2.53.0.rc2.204.g2597b5adb4-goog

^ permalink raw reply related [flat|nested] 25+ messages in thread
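The elided-update scenario from the changelog above can be illustrated with a
small userspace model. This is only a sketch under stated assumptions: it is
not kernel code, and every name in it (toy_vm, toy_vcpu, toy_init_vmcb_old(),
toy_init_vmcb_new(), toy_update_apicv(), toy_vmcb_consistent()) is
hypothetical. It assumes the vCPU is created with apicv_active speculatively
set while the VM is inhibited, and that the inhibit is removed before the
first update is processed.

```c
/*
 * Toy userspace model of the elided-update problem. This is NOT kernel
 * code; every name below is hypothetical and exists only for illustration.
 */
#include <stdbool.h>

struct toy_vm {
	bool inhibited;		/* VM-level APICv inhibit */
};

struct toy_vcpu {
	struct toy_vm *vm;
	bool apicv_active;	/* vCPU-level APICv status */
	bool vmcb_avic_enabled;	/* what the VMCB would tell hardware */
};

/* Pre-patch model: VMCB state follows the VM-level inhibit status. */
static void toy_init_vmcb_old(struct toy_vcpu *v)
{
	v->vmcb_avic_enabled = !v->vm->inhibited;
}

/* Post-patch model: VMCB state follows the vCPU's own status. */
static void toy_init_vmcb_new(struct toy_vcpu *v)
{
	v->vmcb_avic_enabled = v->apicv_active;
}

/* Models an update that skips vendor code when nothing appears to change. */
static void toy_update_apicv(struct toy_vcpu *v)
{
	bool activate = !v->vm->inhibited;

	if (v->apicv_active == activate)
		return;		/* elided: the VMCB is left untouched */

	v->apicv_active = activate;
	v->vmcb_avic_enabled = activate;
}

/* Returns true if the VMCB agrees with the vCPU status after the update. */
static bool toy_vmcb_consistent(bool use_new_init)
{
	struct toy_vm vm = { .inhibited = true };
	/* apicv_active is set speculatively at creation in this model. */
	struct toy_vcpu v = { .vm = &vm, .apicv_active = true };

	if (use_new_init)
		toy_init_vmcb_new(&v);
	else
		toy_init_vmcb_old(&v);

	vm.inhibited = false;	/* inhibit removed before the first update */
	toy_update_apicv(&v);

	return v.vmcb_avic_enabled == v.apicv_active;
}
```

In this model, toy_vmcb_consistent(false) leaves the VMCB disabled while the
vCPU believes APICv is active, whereas toy_vmcb_consistent(true) keeps the two
in agreement because initialization never consults the VM-level status.
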
* Re: [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC
2026-02-03 19:07 ` [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC Sean Christopherson
@ 2026-02-05 4:21 ` Jim Mattson
2026-02-06 14:00 ` Naveen N Rao
1 sibling, 0 replies; 25+ messages in thread
From: Jim Mattson @ 2026-02-05 4:21 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Naveen N Rao,
Maciej S . Szmigiero

On Tue, Feb 3, 2026 at 11:07 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Initialize all per-vCPU AVIC control fields in the VMCB if AVIC is enabled
> in KVM and the VM has an in-kernel local APIC, i.e. if it's _possible_ the
> vCPU could activate AVIC at any point in its lifecycle. Configuring the
> VMCB if and only if AVIC is active "works" purely because of optimizations
> in kvm_create_lapic() to speculatively set apicv_active if AVIC is enabled
> *and* to defer updates until the first KVM_RUN. In quotes because KVM
> likely won't do the right thing if kvm_apicv_activated() is false, i.e. if
> a vCPU is created while APICv is inhibited at the VM level for whatever
> reason. E.g. if the inhibit is *removed* before KVM_REQ_APICV_UPDATE is
> handled in KVM_RUN, then __kvm_vcpu_update_apicv() will elide calls to
> vendor code due to seeing "apicv_active == activate".
>
> Cleaning up the initialization code will also allow fixing a bug where KVM
> incorrectly leaves CR8 interception enabled when AVIC is activated without
> creating a mess with respect to whether AVIC is activated or not.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Jim Mattson <jmattson@google.com>

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC 2026-02-03 19:07 ` [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC Sean Christopherson 2026-02-05 4:21 ` Jim Mattson @ 2026-02-06 14:00 ` Naveen N Rao 2026-02-06 18:17 ` Sean Christopherson 1 sibling, 1 reply; 25+ messages in thread From: Naveen N Rao @ 2026-02-06 14:00 UTC (permalink / raw) To: Sean Christopherson Cc: Paolo Bonzini, kvm, linux-kernel, Jim Mattson, Maciej S . Szmigiero On Tue, Feb 03, 2026 at 11:07:09AM -0800, Sean Christopherson wrote: > Initialize all per-vCPU AVIC control fields in the VMCB if AVIC is enabled > in KVM and the VM has an in-kernel local APIC, i.e. if it's _possible_ the > vCPU could activate AVIC at any point in its lifecycle. Configuring the > VMCB if and only if AVIC is active "works" purely because of optimizations > in kvm_create_lapic() to speculatively set apicv_active if AVIC is enabled > *and* to defer updates until the first KVM_RUN. In quotes because KVM I think it will be good to clarify that two issues are being addressed here (it wasn't clear to me to begin with): - One, described above, is about calling into avic_init_vmcb() regardless of the vCPU APICv status. - Two, described below is about using the vCPU APICv status for init and not consulting the VM-level APICv inhibit status. > likely won't do the right thing if kvm_apicv_activated() is false, i.e. if > a vCPU is created while APICv is inhibited at the VM level for whatever > reason. E.g. if the inhibit is *removed* before KVM_REQ_APICV_UPDATE is > handled in KVM_RUN, then __kvm_vcpu_update_apicv() will elide calls to > vendor code due to seeing "apicv_active == activate". > > Cleaning up the initialization code will also allow fixing a bug where KVM > incorrectly leaves CR8 interception enabled when AVIC is activated without > creating a mess with respect to whether AVIC is activated or not. 
> > Cc: stable@vger.kernel.org > Signed-off-by: Sean Christopherson <seanjc@google.com> Any reason not to add a Fixes: tag? It looks like the below commits are to blame, but those are really old so I understand if you don't think this is useful: Fixes: 67034bb9dd5e ("KVM: SVM: Add irqchip_split() checks before enabling AVIC") Fixes: 6c3e4422dd20 ("svm: Add support for dynamic APICv") Other than that: Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> > --- > arch/x86/kvm/svm/avic.c | 2 +- > arch/x86/kvm/svm/svm.c | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c > index f92214b1a938..44e07c27b190 100644 > --- a/arch/x86/kvm/svm/avic.c > +++ b/arch/x86/kvm/svm/avic.c > @@ -368,7 +368,7 @@ void avic_init_vmcb(struct vcpu_svm *svm, struct vmcb *vmcb) > vmcb->control.avic_physical_id = __sme_set(__pa(kvm_svm->avic_physical_id_table)); > vmcb->control.avic_vapic_bar = APIC_DEFAULT_PHYS_BASE; > > - if (kvm_apicv_activated(svm->vcpu.kvm)) > + if (kvm_vcpu_apicv_active(&svm->vcpu)) > avic_activate_vmcb(svm); > else > avic_deactivate_vmcb(svm); > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > index 5f0136dbdde6..e8313fdc5465 100644 > --- a/arch/x86/kvm/svm/svm.c > +++ b/arch/x86/kvm/svm/svm.c > @@ -1189,7 +1189,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event) > if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS)) > svm->vmcb->control.erap_ctl |= ERAP_CONTROL_ALLOW_LARGER_RAP; > > - if (kvm_vcpu_apicv_active(vcpu)) > + if (enable_apicv && irqchip_in_kernel(vcpu->kvm)) > avic_init_vmcb(svm, vmcb); Doesn't have to be done as part of this series, but I'm wondering if it makes sense to turn this into a helper to clarify the intent and to make it more obvious: --- diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index e441f270f354..4e0ec4bf0db6 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -2289,6 +2289,7 @@ 
gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, struct x86_exception *exception); +bool kvm_apicv_possible(struct kvm *kvm); bool kvm_apicv_activated(struct kvm *kvm); bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu); void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c index 13a4a8949aba..f7b1271cea88 100644 --- a/arch/x86/kvm/svm/avic.c +++ b/arch/x86/kvm/svm/avic.c @@ -285,7 +285,7 @@ int avic_alloc_physical_id_table(struct kvm *kvm) { struct kvm_svm *kvm_svm = to_kvm_svm(kvm); - if (!irqchip_in_kernel(kvm) || !enable_apicv) + if (!kvm_apicv_possible(kvm)) return 0; if (kvm_svm->avic_physical_id_table) @@ -839,7 +839,7 @@ int avic_init_vcpu(struct vcpu_svm *svm) INIT_LIST_HEAD(&svm->ir_list); raw_spin_lock_init(&svm->ir_list_lock); - if (!enable_apicv || !irqchip_in_kernel(vcpu->kvm)) + if (!kvm_apicv_possible(vcpu->kvm)) return 0; ret = avic_init_backing_page(vcpu); diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 4115fe583052..b964d834512e 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -1188,7 +1188,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event) if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS)) svm->vmcb->control.erap_ctl |= ERAP_CONTROL_ALLOW_LARGER_RAP; - if (enable_apicv && irqchip_in_kernel(vcpu->kvm)) + if (kvm_apicv_possible(vcpu->kvm)) avic_init_vmcb(svm, vmcb); if (vnmi) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8acfdfc583a1..86f99c5b831a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10270,6 +10270,12 @@ static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid) kvm_irq_delivery_to_apic(kvm, NULL, &lapic_irq); } +bool kvm_apicv_possible(struct kvm *kvm) +{ + return enable_apicv && irqchip_in_kernel(kvm); +} +EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apicv_possible); + bool kvm_apicv_activated(struct kvm *kvm) { return 
(READ_ONCE(kvm->arch.apicv_inhibit_reasons) == 0); - Naveen ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC 2026-02-06 14:00 ` Naveen N Rao @ 2026-02-06 18:17 ` Sean Christopherson 2026-02-09 10:23 ` Naveen N Rao 0 siblings, 1 reply; 25+ messages in thread From: Sean Christopherson @ 2026-02-06 18:17 UTC (permalink / raw) To: Naveen N Rao Cc: Paolo Bonzini, kvm, linux-kernel, Jim Mattson, Maciej S . Szmigiero On Fri, Feb 06, 2026, Naveen N Rao wrote: > On Tue, Feb 03, 2026 at 11:07:09AM -0800, Sean Christopherson wrote: > > Initialize all per-vCPU AVIC control fields in the VMCB if AVIC is enabled > > in KVM and the VM has an in-kernel local APIC, i.e. if it's _possible_ the > > vCPU could activate AVIC at any point in its lifecycle. Configuring the > > VMCB if and only if AVIC is active "works" purely because of optimizations > > in kvm_create_lapic() to speculatively set apicv_active if AVIC is enabled > > *and* to defer updates until the first KVM_RUN. In quotes because KVM > > I think it will be good to clarify that two issues are being addressed > here (it wasn't clear to me to begin with): > - One, described above, is about calling into avic_init_vmcb() > regardless of the vCPU APICv status. > - Two, described below is about using the vCPU APICv status for init and > not consulting the VM-level APICv inhibit status. Yeah, I was worried the changelog didn't capture the second one well, but I was struggling to come up with wording. How about this as a penultimate paragraph? Note! Use the vCPU's current APICv status when initializing the VMCB, not the VM-level inhibit status. The state of the VMCB *must* be kept consistent with the vCPU's APICv status at all times (KVM elides updates that are supposed be nops). If the vCPU's APICv status isn't up-to-date with the VM-level status, then there is guaranteed to be a pending KVM_REQ_APICV_UPDATE, i.e. KVM will sync the vCPU with the VM before entering the guest. 
> > likely won't do the right thing if kvm_apicv_activated() is false, i.e. if > > a vCPU is created while APICv is inhibited at the VM level for whatever > > reason. E.g. if the inhibit is *removed* before KVM_REQ_APICV_UPDATE is > > handled in KVM_RUN, then __kvm_vcpu_update_apicv() will elide calls to > > vendor code due to seeing "apicv_active == activate". > > > > Cleaning up the initialization code will also allow fixing a bug where KVM > > incorrectly leaves CR8 interception enabled when AVIC is activated without > > creating a mess with respect to whether AVIC is activated or not. > > > > Cc: stable@vger.kernel.org > > Signed-off-by: Sean Christopherson <seanjc@google.com> > > Any reason not to add a Fixes: tag? Purely that I couldn't pin down exactly what commit(s) to blame. Well, that's a bit of a lie. If I'm being 100% truthful, I got as far as commit 67034bb9dd5e and decided I didn't care enough to spend the effort to figure out whether or not that commit was truly to blame :-) > It looks like the below commits are to blame, but those are really old so I > understand if you don't think this is useful: > Fixes: 67034bb9dd5e ("KVM: SVM: Add irqchip_split() checks before enabling AVIC") > Fixes: 6c3e4422dd20 ("svm: Add support for dynamic APICv") LGTM, I'll tack them on. > Other than that: > Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> Thanks! 
(Seriously, I really appreciate the in-depth reviews) > > --- > > arch/x86/kvm/svm/avic.c | 2 +- > > arch/x86/kvm/svm/svm.c | 2 +- > > 2 files changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c > > index f92214b1a938..44e07c27b190 100644 > > --- a/arch/x86/kvm/svm/avic.c > > +++ b/arch/x86/kvm/svm/avic.c > > @@ -368,7 +368,7 @@ void avic_init_vmcb(struct vcpu_svm *svm, struct vmcb *vmcb) > > vmcb->control.avic_physical_id = __sme_set(__pa(kvm_svm->avic_physical_id_table)); > > vmcb->control.avic_vapic_bar = APIC_DEFAULT_PHYS_BASE; > > > > - if (kvm_apicv_activated(svm->vcpu.kvm)) > > + if (kvm_vcpu_apicv_active(&svm->vcpu)) > > avic_activate_vmcb(svm); > > else > > avic_deactivate_vmcb(svm); > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > > index 5f0136dbdde6..e8313fdc5465 100644 > > --- a/arch/x86/kvm/svm/svm.c > > +++ b/arch/x86/kvm/svm/svm.c > > @@ -1189,7 +1189,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event) > > if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS)) > > svm->vmcb->control.erap_ctl |= ERAP_CONTROL_ALLOW_LARGER_RAP; > > > > - if (kvm_vcpu_apicv_active(vcpu)) > > + if (enable_apicv && irqchip_in_kernel(vcpu->kvm)) > > avic_init_vmcb(svm, vmcb); > > Doesn't have to be done as part of this series, but I'm wondering if it > makes sense to turn this into a helper to clarify the intent and to make > it more obvious: Hmm, yeah, though my only hesitation is the name. For whatever reason, "possible" makes me think "is APICv possible *right now*" (ignoring that I wrote exactly that in the changelog). What if we go with kvm_can_use_apicv()? That would align with vmx_can_use_ipiv() and vmx_can_use_vtd_pi(), which are pretty much identical in concept. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC 2026-02-06 18:17 ` Sean Christopherson @ 2026-02-09 10:23 ` Naveen N Rao 2026-02-09 21:36 ` Sean Christopherson 0 siblings, 1 reply; 25+ messages in thread From: Naveen N Rao @ 2026-02-09 10:23 UTC (permalink / raw) To: Sean Christopherson Cc: Paolo Bonzini, kvm, linux-kernel, Jim Mattson, Maciej S . Szmigiero On Fri, Feb 06, 2026 at 10:17:29AM -0800, Sean Christopherson wrote: > On Fri, Feb 06, 2026, Naveen N Rao wrote: > > On Tue, Feb 03, 2026 at 11:07:09AM -0800, Sean Christopherson wrote: > > > Initialize all per-vCPU AVIC control fields in the VMCB if AVIC is enabled > > > in KVM and the VM has an in-kernel local APIC, i.e. if it's _possible_ the > > > vCPU could activate AVIC at any point in its lifecycle. Configuring the > > > VMCB if and only if AVIC is active "works" purely because of optimizations > > > in kvm_create_lapic() to speculatively set apicv_active if AVIC is enabled > > > *and* to defer updates until the first KVM_RUN. In quotes because KVM > > > > I think it will be good to clarify that two issues are being addressed > > here (it wasn't clear to me to begin with): > > - One, described above, is about calling into avic_init_vmcb() > > regardless of the vCPU APICv status. > > - Two, described below is about using the vCPU APICv status for init and > > not consulting the VM-level APICv inhibit status. > > Yeah, I was worried the changelog didn't capture the second one well, but I was > struggling to come up with wording. How about this as a penultimate paragraph? > > Note! Use the vCPU's current APICv status when initializing the VMCB, > not the VM-level inhibit status. The state of the VMCB *must* be kept > consistent with the vCPU's APICv status at all times (KVM elides updates > that are supposed be nops). If the vCPU's APICv status isn't up-to-date > with the VM-level status, then there is guaranteed to be a pending > KVM_REQ_APICV_UPDATE, i.e. 
KVM will sync the vCPU with the VM before > entering the guest. LGTM. > > > > likely won't do the right thing if kvm_apicv_activated() is false, i.e. if > > > a vCPU is created while APICv is inhibited at the VM level for whatever > > > reason. E.g. if the inhibit is *removed* before KVM_REQ_APICV_UPDATE is > > > handled in KVM_RUN, then __kvm_vcpu_update_apicv() will elide calls to > > > vendor code due to seeing "apicv_active == activate". > > > > > > Cleaning up the initialization code will also allow fixing a bug where KVM > > > incorrectly leaves CR8 interception enabled when AVIC is activated without > > > creating a mess with respect to whether AVIC is activated or not. > > > > > > Cc: stable@vger.kernel.org > > > Signed-off-by: Sean Christopherson <seanjc@google.com> > > > > Any reason not to add a Fixes: tag? > > Purely that I couldn't pin down exactly what commit(s) to blame. Well, that's a > bit of a lie. If I'm being 100% truthful, I got as far as commit 67034bb9dd5e > and decided I didn't care enough to spend the effort to figure out whether or not > that commit was truly to blame :-) > > > It looks like the below commits are to blame, but those are really old so I > > understand if you don't think this is useful: > > Fixes: 67034bb9dd5e ("KVM: SVM: Add irqchip_split() checks before enabling AVIC") > > Fixes: 6c3e4422dd20 ("svm: Add support for dynamic APICv") > > LGTM, I'll tack them on. > > > Other than that: > > Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> > > Thanks! (Seriously, I really appreciate the in-depth reviews) Glad to hear that! 
> > > > --- > > > arch/x86/kvm/svm/avic.c | 2 +- > > > arch/x86/kvm/svm/svm.c | 2 +- > > > 2 files changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c > > > index f92214b1a938..44e07c27b190 100644 > > > --- a/arch/x86/kvm/svm/avic.c > > > +++ b/arch/x86/kvm/svm/avic.c > > > @@ -368,7 +368,7 @@ void avic_init_vmcb(struct vcpu_svm *svm, struct vmcb *vmcb) > > > vmcb->control.avic_physical_id = __sme_set(__pa(kvm_svm->avic_physical_id_table)); > > > vmcb->control.avic_vapic_bar = APIC_DEFAULT_PHYS_BASE; > > > > > > - if (kvm_apicv_activated(svm->vcpu.kvm)) > > > + if (kvm_vcpu_apicv_active(&svm->vcpu)) > > > avic_activate_vmcb(svm); > > > else > > > avic_deactivate_vmcb(svm); > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > > > index 5f0136dbdde6..e8313fdc5465 100644 > > > --- a/arch/x86/kvm/svm/svm.c > > > +++ b/arch/x86/kvm/svm/svm.c > > > @@ -1189,7 +1189,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event) > > > if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS)) > > > svm->vmcb->control.erap_ctl |= ERAP_CONTROL_ALLOW_LARGER_RAP; > > > > > > - if (kvm_vcpu_apicv_active(vcpu)) > > > + if (enable_apicv && irqchip_in_kernel(vcpu->kvm)) > > > avic_init_vmcb(svm, vmcb); > > > > Doesn't have to be done as part of this series, but I'm wondering if it > > makes sense to turn this into a helper to clarify the intent and to make > > it more obvious: > > Hmm, yeah, though my only hesitation is the name. For whatever reason, "possible" > makes me think "is APICv possible *right now*" (ignoring that I wrote exactly that > in the changelog). > > What if we go with kvm_can_use_apicv()? That would align with vmx_can_use_ipiv() > and vmx_can_use_vtd_pi(), which are pretty much identical in concept. Yes, that's better. I'll use that and post it as a subsequent cleanup, unless you want to pick it up rightaway. Thanks! - Naveen ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC 2026-02-09 10:23 ` Naveen N Rao @ 2026-02-09 21:36 ` Sean Christopherson 0 siblings, 0 replies; 25+ messages in thread From: Sean Christopherson @ 2026-02-09 21:36 UTC (permalink / raw) To: Naveen N Rao Cc: Paolo Bonzini, kvm, linux-kernel, Jim Mattson, Maciej S . Szmigiero On Mon, Feb 09, 2026, Naveen N Rao wrote: > On Fri, Feb 06, 2026 at 10:17:29AM -0800, Sean Christopherson wrote: > > > > arch/x86/kvm/svm/avic.c | 2 +- > > > > arch/x86/kvm/svm/svm.c | 2 +- > > > > 2 files changed, 2 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c > > > > index f92214b1a938..44e07c27b190 100644 > > > > --- a/arch/x86/kvm/svm/avic.c > > > > +++ b/arch/x86/kvm/svm/avic.c > > > > @@ -368,7 +368,7 @@ void avic_init_vmcb(struct vcpu_svm *svm, struct vmcb *vmcb) > > > > vmcb->control.avic_physical_id = __sme_set(__pa(kvm_svm->avic_physical_id_table)); > > > > vmcb->control.avic_vapic_bar = APIC_DEFAULT_PHYS_BASE; > > > > > > > > - if (kvm_apicv_activated(svm->vcpu.kvm)) > > > > + if (kvm_vcpu_apicv_active(&svm->vcpu)) > > > > avic_activate_vmcb(svm); > > > > else > > > > avic_deactivate_vmcb(svm); > > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > > > > index 5f0136dbdde6..e8313fdc5465 100644 > > > > --- a/arch/x86/kvm/svm/svm.c > > > > +++ b/arch/x86/kvm/svm/svm.c > > > > @@ -1189,7 +1189,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event) > > > > if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS)) > > > > svm->vmcb->control.erap_ctl |= ERAP_CONTROL_ALLOW_LARGER_RAP; > > > > > > > > - if (kvm_vcpu_apicv_active(vcpu)) > > > > + if (enable_apicv && irqchip_in_kernel(vcpu->kvm)) > > > > avic_init_vmcb(svm, vmcb); > > > > > > Doesn't have to be done as part of this series, but I'm wondering if it > > > makes sense to turn this into a helper to clarify the intent and to make > > > it more obvious: > > > > 
Hmm, yeah, though my only hesitation is the name. For whatever reason, "possible" > > makes me think "is APICv possible *right now*" (ignoring that I wrote exactly that > > in the changelog). > > > > What if we go with kvm_can_use_apicv()? That would align with vmx_can_use_ipiv() > > and vmx_can_use_vtd_pi(), which are pretty much identical in concept. > > Yes, that's better. I'll use that and post it as a subsequent cleanup, > unless you want to pick it up rightaway. Go ahead and post it separately, it's nice to have a proper paper trail. ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
2026-02-03 19:07 [PATCH 0/2] KVM: SVM: Fix CR8 intercpetion woes with AVIC Sean Christopherson
2026-02-03 19:07 ` [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC Sean Christopherson
@ 2026-02-03 19:07 ` Sean Christopherson
2026-02-05 4:22 ` Jim Mattson
` (2 more replies)
2026-03-05 17:07 ` [PATCH 0/2] KVM: SVM: Fix CR8 intercpetion woes with AVIC Sean Christopherson
2 siblings, 3 replies; 25+ messages in thread
From: Sean Christopherson @ 2026-02-03 19:07 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Jim Mattson, Naveen N Rao,
Maciej S . Szmigiero

Explicitly set/clear CR8 write interception when AVIC is (de)activated to
fix a bug where KVM leaves the interception enabled after AVIC is
activated. E.g. if KVM emulates INIT=>WFS while AVIC is deactivated, CR8
will remain intercepted in perpetuity.

On its own, the dangling CR8 intercept is "just" a performance issue, but
combined with the TPR sync bug fixed by commit d02e48830e3f ("KVM: SVM:
Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active"), the dangling
intercept is fatal to Windows guests as the TPR seen by hardware gets
wildly out of sync with reality.

Note, VMX isn't affected by the bug as TPR_THRESHOLD is explicitly ignored
when Virtual Interrupt Delivery is enabled, i.e. when APICv is active in
KVM's world. I.e. there's no need to trigger update_cr8_intercept(), this
is firmly an SVM implementation flaw/detail.

WARN if KVM gets a CR8 write #VMEXIT while AVIC is active, as KVM should
never enter the guest with AVIC enabled and CR8 writes intercepted.

Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC")
Cc: stable@vger.kernel.org
Cc: Jim Mattson <jmattson@google.com>
Cc: Naveen N Rao (AMD) <naveen@kernel.org>
Cc: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c | 6 ++++--
 arch/x86/kvm/svm/svm.c | 9 +++++----
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 44e07c27b190..13a4a8949aba 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -189,12 +189,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
 	struct kvm_vcpu *vcpu = &svm->vcpu;
 
 	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
-
 	vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
 	vmcb->control.avic_physical_id |= avic_get_max_physical_id(vcpu);
-
 	vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
 
+	svm_clr_intercept(svm, INTERCEPT_CR8_WRITE);
+
 	/*
 	 * Note: KVM supports hybrid-AVIC mode, where KVM emulates x2APIC MSR
 	 * accesses, while interrupt injection to a running vCPU can be
@@ -226,6 +226,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
 	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
 	vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
 
+	svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
+
 	/*
 	 * If running nested and the guest uses its own MSR bitmap, there
 	 * is no need to update L0's msr bitmap
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e8313fdc5465..aa3ab22215f5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1077,8 +1077,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
 	svm_set_intercept(svm, INTERCEPT_CR0_WRITE);
 	svm_set_intercept(svm, INTERCEPT_CR3_WRITE);
 	svm_set_intercept(svm, INTERCEPT_CR4_WRITE);
-	if (!kvm_vcpu_apicv_active(vcpu))
-		svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
+	svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
 
 	set_dr_intercepts(svm);
 
@@ -2674,9 +2673,11 @@ static int dr_interception(struct kvm_vcpu *vcpu)
 
 static int cr8_write_interception(struct kvm_vcpu *vcpu)
 {
-	int r;
-
 	u8 cr8_prev = kvm_get_cr8(vcpu);
+	int r;
+
+	WARN_ON_ONCE(kvm_vcpu_apicv_active(vcpu));
+
 	/* instruction emulation calls kvm_set_cr8() */
 	r = cr_interception(vcpu);
 	if (lapic_in_kernel(vcpu))
-- 
2.53.0.rc2.204.g2597b5adb4-goog

^ permalink raw reply related [flat|nested] 25+ messages in thread
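The dangling-intercept bug fixed by the patch above can be sketched with a toy
userspace model. This is a simplified illustration under stated assumptions,
not the kernel implementation: the names toy_vcpu, toy_init_vmcb_old(),
toy_set_avic_old(), toy_init_vmcb_new(), toy_set_avic_new() and
toy_dangling_intercept() are all hypothetical. The model assumes the only
places that touch the CR8 write intercept are VMCB initialization and AVIC
(de)activation, and it replays the "init while AVIC is deactivated, then
activate" sequence from the changelog.

```c
/*
 * Toy userspace model of the dangling CR8 write intercept. This is NOT
 * kernel code; every name below is hypothetical.
 */
#include <stdbool.h>

struct toy_vcpu {
	bool avic_active;		/* is AVIC currently active? */
	bool cr8_write_intercepted;	/* would CR8 writes cause an exit? */
};

/* Pre-patch model: the intercept is decided only at VMCB init time. */
static void toy_init_vmcb_old(struct toy_vcpu *v)
{
	v->cr8_write_intercepted = !v->avic_active;
}

/* Pre-patch model: (de)activation never touches the intercept. */
static void toy_set_avic_old(struct toy_vcpu *v, bool active)
{
	v->avic_active = active;
}

/* Post-patch model: (de)activation owns the intercept. */
static void toy_set_avic_new(struct toy_vcpu *v, bool active)
{
	v->avic_active = active;
	v->cr8_write_intercepted = !active;
}

/* Post-patch model: intercept by default, then sync with AVIC state. */
static void toy_init_vmcb_new(struct toy_vcpu *v)
{
	v->cr8_write_intercepted = true;
	toy_set_avic_new(v, v->avic_active);
}

/* Returns true if CR8 writes are still intercepted with AVIC active. */
static bool toy_dangling_intercept(bool fixed)
{
	struct toy_vcpu v = { .avic_active = false };

	if (fixed) {
		toy_init_vmcb_new(&v);		/* init while deactivated */
		toy_set_avic_new(&v, true);	/* AVIC later activated */
	} else {
		toy_init_vmcb_old(&v);
		toy_set_avic_old(&v, true);
	}

	return v.avic_active && v.cr8_write_intercepted;
}
```

In the pre-patch model the intercept set during initialization survives
activation, which is the state the new WARN_ON_ONCE() in
cr8_write_interception() is meant to catch. In the post-patch model the
intercept always mirrors the AVIC state.
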
* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
2026-02-03 19:07 ` [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated Sean Christopherson
@ 2026-02-05 4:22 ` Jim Mattson
2026-02-06 17:11 ` Naveen N Rao
2026-03-10 15:41 ` Aithal, Srikanth
2 siblings, 0 replies; 25+ messages in thread
From: Jim Mattson @ 2026-02-05 4:22 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Naveen N Rao,
Maciej S . Szmigiero

On Tue, Feb 3, 2026 at 11:07 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Explicitly set/clear CR8 write interception when AVIC is (de)activated to
> fix a bug where KVM leaves the interception enabled after AVIC is
> activated. E.g. if KVM emulates INIT=>WFS while AVIC is deactivated, CR8
> will remain intercepted in perpetuity.
>
> On its own, the dangling CR8 intercept is "just" a performance issue, but
> combined with the TPR sync bug fixed by commit d02e48830e3f ("KVM: SVM:
> Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active"), the dangling
> intercept is fatal to Windows guests as the TPR seen by hardware gets
> wildly out of sync with reality.
>
> Note, VMX isn't affected by the bug as TPR_THRESHOLD is explicitly ignored
> when Virtual Interrupt Delivery is enabled, i.e. when APICv is active in
> KVM's world. I.e. there's no need to trigger update_cr8_intercept(), this
> is firmly an SVM implementation flaw/detail.
>
> WARN if KVM gets a CR8 write #VMEXIT while AVIC is active, as KVM should
> never enter the guest with AVIC enabled and CR8 writes intercepted.
>
> Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC")
> Cc: stable@vger.kernel.org
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Naveen N Rao (AMD) <naveen@kernel.org>
> Cc: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Jim Mattson <jmattson@google.com>

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated 2026-02-03 19:07 ` [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated Sean Christopherson 2026-02-05 4:22 ` Jim Mattson @ 2026-02-06 17:11 ` Naveen N Rao 2026-02-06 17:55 ` Sean Christopherson 2026-03-10 15:41 ` Aithal, Srikanth 2 siblings, 1 reply; 25+ messages in thread From: Naveen N Rao @ 2026-02-06 17:11 UTC (permalink / raw) To: Sean Christopherson Cc: Paolo Bonzini, kvm, linux-kernel, Jim Mattson, Maciej S . Szmigiero On Tue, Feb 03, 2026 at 11:07:10AM -0800, Sean Christopherson wrote: > Explicitly set/clear CR8 write interception when AVIC is (de)activated to > fix a bug where KVM leaves the interception enabled after AVIC is > activated. E.g. if KVM emulates INIT=>WFS while AVIC is deactivated, CR8 > will remain intercepted in perpetuity. Looking at svm_update_cr8_intercept(), I suppose this could also more commonly happen whenever AVIC is inhibited (IRQ Windows, as an example)? > > On its own, the dangling CR8 intercept is "just" a performance issue, but > combined with the TPR sync bug fixed by commit d02e48830e3f ("KVM: SVM: > Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active"), the danging > intercept is fatal to Windows guests as the TPR seen by hardware gets > wildly out of sync with reality. > > Note, VMX isn't affected by the bug as TPR_THRESHOLD is explicitly ignored > when Virtual Interrupt Delivery is enabled, i.e. when APICv is active in > KVM's world. I.e. there's no need to trigger update_cr8_intercept(), this > is firmly an SVM implementation flaw/detail. > > WARN if KVM gets a CR8 write #VMEXIT while AVIC is active, as KVM should > never enter the guest with AVIC enabled and CR8 writes intercepted. > > Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC") > Cc: stable@vger.kernel.org > Cc: Jim Mattson <jmattson@google.com> > Cc: Naveen N Rao (AMD) <naveen@kernel.org> > Cc: Maciej S. 
Szmigiero <maciej.szmigiero@oracle.com> > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > arch/x86/kvm/svm/avic.c | 6 ++++-- > arch/x86/kvm/svm/svm.c | 9 +++++---- > 2 files changed, 9 insertions(+), 6 deletions(-) LGTM. Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> Thanks, Naveen ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
2026-02-06 17:11 ` Naveen N Rao
@ 2026-02-06 17:55 ` Sean Christopherson
0 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2026-02-06 17:55 UTC (permalink / raw)
To: Naveen N Rao
Cc: Paolo Bonzini, kvm, linux-kernel, Jim Mattson, Maciej S . Szmigiero

On Fri, Feb 06, 2026, Naveen N Rao wrote:
> On Tue, Feb 03, 2026 at 11:07:10AM -0800, Sean Christopherson wrote:
> > Explicitly set/clear CR8 write interception when AVIC is (de)activated to
> > fix a bug where KVM leaves the interception enabled after AVIC is
> > activated. E.g. if KVM emulates INIT=>WFS while AVIC is deactivated, CR8
> > will remain intercepted in perpetuity.
>
> Looking at svm_update_cr8_intercept(), I suppose this could also more
> commonly happen whenever AVIC is inhibited (IRQ Windows, as an example)?

Maybe? I don't think it's actually common in practice, because the bug
requires the source of the inhibition to be removed while the vCPU still has a
pending IRQ that is below PPR. That is definitely possible, but it seems
unlikely overall, and it'd also be self-healing to some extent. E.g. if a
workload is triggering ExtINT, then odds are good it's going to _keep_
generating ExtINT, keep toggling the inhibit, and thus reconcile CR8
interception every time AVIC is inhibited.

^ permalink raw reply [flat|nested] 25+ messages in thread
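For readers following along, the INIT case from the changelog can be sketched
as a toy state machine. This is only an illustrative model, not kernel code:
struct toy_vcpu and every toy_*() helper below are hypothetical names invented
for this sketch, and the real logic lives in init_vmcb(), avic_activate_vmcb()
and avic_deactivate_vmcb().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for per-vCPU state; NOT the kernel's struct vcpu_svm. */
struct toy_vcpu {
	bool avic_active;           /* models kvm_vcpu_apicv_active() */
	bool cr8_write_intercepted; /* models INTERCEPT_CR8_WRITE in the VMCB */
};

/* Pre-patch init: sets the intercept only if AVIC is inactive, never clears it. */
static void toy_init_vmcb_old(struct toy_vcpu *v)
{
	assert(v);
	if (!v->avic_active)
		v->cr8_write_intercepted = true;
}

/* Post-patch init: always starts with the intercept set. */
static void toy_init_vmcb_new(struct toy_vcpu *v)
{
	assert(v);
	v->cr8_write_intercepted = true;
}

/* Pre-patch activation: never touches CR8 interception. */
static void toy_avic_activate_old(struct toy_vcpu *v)
{
	v->avic_active = true;
}

/* Post-patch activation: clears the intercept along with enabling AVIC. */
static void toy_avic_activate_new(struct toy_vcpu *v)
{
	v->avic_active = true;
	v->cr8_write_intercepted = false;
}

/* Post-patch deactivation: sets the intercept along with disabling AVIC. */
static void toy_avic_deactivate_new(struct toy_vcpu *v)
{
	v->avic_active = false;
	v->cr8_write_intercepted = true;
}

/* Old flow: INIT while AVIC is deactivated, then AVIC is activated. */
static bool toy_dangling_intercept_old(void)
{
	struct toy_vcpu v = { .avic_active = false };

	toy_init_vmcb_old(&v);
	toy_avic_activate_old(&v);
	return v.avic_active && v.cr8_write_intercepted;
}

/* New flow: same sequence, preceded by one activate/deactivate cycle. */
static bool toy_dangling_intercept_new(void)
{
	struct toy_vcpu v = { .avic_active = false };

	toy_avic_activate_new(&v);
	toy_avic_deactivate_new(&v);
	toy_init_vmcb_new(&v);
	toy_avic_activate_new(&v);
	return v.avic_active && v.cr8_write_intercepted;
}
```

In this model the old flow ends with AVIC active and the intercept still set,
which is the "dangling" state the patch WARNs about; the new flow cannot reach
that state because activation and deactivation own the intercept bit.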
* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated 2026-02-03 19:07 ` [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated Sean Christopherson 2026-02-05 4:22 ` Jim Mattson 2026-02-06 17:11 ` Naveen N Rao @ 2026-03-10 15:41 ` Aithal, Srikanth 2026-03-10 17:17 ` Sean Christopherson 2 siblings, 1 reply; 25+ messages in thread From: Aithal, Srikanth @ 2026-03-10 15:41 UTC (permalink / raw) To: Sean Christopherson, Paolo Bonzini Cc: kvm, linux-kernel, Jim Mattson, Naveen N Rao, Maciej S . Szmigiero Hello Sean, From next-20260304 onwards [1], including recent next kernel next-20260309, booting an SEV-ES guest on AMD EPYC Turin and AMD EPYC Genoa has been failing. However, on EPYC Milan, the SEV-ES guest boots fine. I am using the same QEMU command line (given below) with the same versions of QEMU and OVMF on all three platforms. "$QEMU_BIN" \ -machine q35,confidential-guest-support=sev0,vmport=off \ -object sev-guest,id=sev0,policy=0x5,cbitpos=51,reduced-phys-bits=1 \ -name guest=vm,debug-threads=on \ -drive if=pflash,format=raw,unit=0,file="$OVMF_PATH",readonly=on \ -m 2048 \ -object memory-backend-ram,size=2048M,id=mem-machine_mem \ -smp 1,maxcpus=1,cores=1,threads=1,dies=1,sockets=1 \ -cpu host \ -drive id=disk0,file="$DISK_IMAGE",format=qcow2,if=none \ -device virtio-scsi-pci,id=scsi0,disable-legacy=on,iommu_platform=true \ -device scsi-hd,drive=disk0 \ -enable-kvm \ -nographic \ -monitor tcp:localhost:4444,server,nowait QEMU version: v10.2.1 OVMF version: edk2-stable202602 The SEV-ES guest crashes with the following QEMU trace: error: kvm run failed Invalid argument EAX=00000000 EBX=00000000 ECX=00000000 EDX=00a10f10 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0000 00000000 0000ffff 00009300 CS =f000 ffff0000 0000ffff 00009b00 SS =0000 00000000 0000ffff 00009300 DS =0000 00000000 0000ffff 00009300 FS =0000 00000000 0000ffff 
00009300 GS =0000 00000000 0000ffff 00009300 LDT=0000 00000000 0000ffff 00008200 TR =0000 00000000 0000ffff 00008b00 GDT= 00000000 0000ffff IDT= 00000000 0000ffff CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400 EFER=0000000000000000 Code=30 17 4d 99 a6 74 ad 5a a1 1d d2 22 78 9f 73 25 ab 00 2f c0 <cd> d3 ee 26 63 0d f5 de f3 ea c3 91 28 ba b5 ac ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? KVM host serial log message that appears when the crash happens: text [ 4379.695497] kvm_amd: kvm [5809]: vcpu0, guest rIP: 0x0 vmgexit: unsupported event - exit_info_1=0x18, exit_info_2=0x0 Bisecting shows that this commit is the first bad one. When I revert it, I am able to boot the SEV-ES guest successfully on both Turin and Genoa platforms: e992bf67bcbab07a7f59963b2c4ed32ef65c8431 is the first bad commit commit e992bf67bcbab07a7f59963b2c4ed32ef65c8431 Author: Sean Christopherson <seanjc@google.com> Date: Tue Feb 3 11:07:10 2026 -0800 KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated [1]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git, next-20260304 Will be happy to get any more information required. Thank you. Srikanth Aithal <sraithal@amd.com> On 2/4/2026 12:37 AM, Sean Christopherson wrote: > Explicitly set/clear CR8 write interception when AVIC is (de)activated to > fix a bug where KVM leaves the interception enabled after AVIC is > activated. E.g. if KVM emulates INIT=>WFS while AVIC is deactivated, CR8 > will remain intercepted in perpetuity. > > On its own, the dangling CR8 intercept is "just" a performance issue, but > combined with the TPR sync bug fixed by commit d02e48830e3f ("KVM: SVM: > Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active"), the danging > intercept is fatal to Windows guests as the TPR seen by hardware gets > wildly out of sync with reality. 
> > Note, VMX isn't affected by the bug as TPR_THRESHOLD is explicitly ignored > when Virtual Interrupt Delivery is enabled, i.e. when APICv is active in > KVM's world. I.e. there's no need to trigger update_cr8_intercept(), this > is firmly an SVM implementation flaw/detail. > > WARN if KVM gets a CR8 write #VMEXIT while AVIC is active, as KVM should > never enter the guest with AVIC enabled and CR8 writes intercepted. > > Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC") > Cc: stable@vger.kernel.org > Cc: Jim Mattson <jmattson@google.com> > Cc: Naveen N Rao (AMD) <naveen@kernel.org> > Cc: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > arch/x86/kvm/svm/avic.c | 6 ++++-- > arch/x86/kvm/svm/svm.c | 9 +++++---- > 2 files changed, 9 insertions(+), 6 deletions(-) > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c > index 44e07c27b190..13a4a8949aba 100644 > --- a/arch/x86/kvm/svm/avic.c > +++ b/arch/x86/kvm/svm/avic.c > @@ -189,12 +189,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm) > struct kvm_vcpu *vcpu = &svm->vcpu; > > vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK); > - > vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK; > vmcb->control.avic_physical_id |= avic_get_max_physical_id(vcpu); > - > vmcb->control.int_ctl |= AVIC_ENABLE_MASK; > > + svm_clr_intercept(svm, INTERCEPT_CR8_WRITE); > + > /* > * Note: KVM supports hybrid-AVIC mode, where KVM emulates x2APIC MSR > * accesses, while interrupt injection to a running vCPU can be > @@ -226,6 +226,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm) > vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK); > vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK; > > + svm_set_intercept(svm, INTERCEPT_CR8_WRITE); > + > /* > * If running nested and the guest uses its own MSR bitmap, there > * is no need to update L0's msr bitmap > diff --git 
a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > index e8313fdc5465..aa3ab22215f5 100644 > --- a/arch/x86/kvm/svm/svm.c > +++ b/arch/x86/kvm/svm/svm.c > @@ -1077,8 +1077,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event) > svm_set_intercept(svm, INTERCEPT_CR0_WRITE); > svm_set_intercept(svm, INTERCEPT_CR3_WRITE); > svm_set_intercept(svm, INTERCEPT_CR4_WRITE); > - if (!kvm_vcpu_apicv_active(vcpu)) > - svm_set_intercept(svm, INTERCEPT_CR8_WRITE); > + svm_set_intercept(svm, INTERCEPT_CR8_WRITE); > > set_dr_intercepts(svm); > > @@ -2674,9 +2673,11 @@ static int dr_interception(struct kvm_vcpu *vcpu) > > static int cr8_write_interception(struct kvm_vcpu *vcpu) > { > - int r; > - > u8 cr8_prev = kvm_get_cr8(vcpu); > + int r; > + > + WARN_ON_ONCE(kvm_vcpu_apicv_active(vcpu)); > + > /* instruction emulation calls kvm_set_cr8() */ > r = cr_interception(vcpu); > if (lapic_in_kernel(vcpu)) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated 2026-03-10 15:41 ` Aithal, Srikanth @ 2026-03-10 17:17 ` Sean Christopherson 2026-03-10 17:36 ` Tom Lendacky 0 siblings, 1 reply; 25+ messages in thread From: Sean Christopherson @ 2026-03-10 17:17 UTC (permalink / raw) To: Srikanth Aithal Cc: Paolo Bonzini, kvm, linux-kernel, Jim Mattson, Naveen N Rao, Maciej S . Szmigiero On Tue, Mar 10, 2026, Srikanth Aithal wrote: > > Hello Sean, > > From next-20260304 onwards [1], including recent next kernel next-20260309, > booting an SEV-ES guest on AMD EPYC Turin and AMD EPYC Genoa has been > failing. However, on EPYC Milan, the SEV-ES guest boots fine. ... > Bisecting shows that this commit is the first bad one. When I revert it, I > am able to boot the SEV-ES guest successfully on both Turin and Genoa > platforms: > > e992bf67bcbab07a7f59963b2c4ed32ef65c8431 is the first bad commit > commit e992bf67bcbab07a7f59963b2c4ed32ef65c8431 > Author: Sean Christopherson <seanjc@google.com> > Date: Tue Feb 3 11:07:10 2026 -0800 Gah, I hate how KVM manages intercepts for SEV-ES+. Though to a large extent I blame the architecture for not simply making CR{0,4,8} intercept trap-like. Side topic, is the host actually allowed to trap CR3 writes? That seems like a huge gaping security flaw, especially for SNP+. Anyways, this should fix the immediate problem. diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c index 33172f0e986b..b6072872b785 100644 --- a/arch/x86/kvm/svm/avic.c +++ b/arch/x86/kvm/svm/avic.c @@ -237,7 +237,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm) vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK); vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK; - svm_set_intercept(svm, INTERCEPT_CR8_WRITE); + if (!sev_es_guest(svm->vcpu.kvm)) + svm_set_intercept(svm, INTERCEPT_CR8_WRITE); /* * If running nested and the guest uses its own MSR bitmap, there Argh! 
The more I look at this code, the more frustrated I get. The unconditional
setting of TRAP_CR8_WRITE for SEV-ES+ is flawed. When AVIC is enabled, KVM doesn't
need to trap CR8 writes because hardware will update the backing page. I'm guessing
Windows doesn't support running as an SEV-ES guest, which is why no one has noticed.

Actually, it's worse than that. sync_cr8_to_lapic() will straight up clobber the
backing page. Presumably hardware never actually uses TPR from the AVIC backing
page, but it's still gross. sync_lapic_to_cr8() is also beyond useless.

And all of the sync code should pivot on guest_state_protected, not sev_es_guest().

For now, I'll just post the above (assuming it fixes the issue). But this code
needs some love sooner rather than later.

^ permalink raw reply related [flat|nested] 25+ messages in thread
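As a rough illustration of what the proposed guard changes, here is a toy model
of the deactivation path. It is a sketch under stated assumptions, not the
kernel implementation: struct toy_es_vcpu and the toy_*() helpers are
hypothetical, and the sev_es_guest field merely stands in for the
sev_es_guest(svm->vcpu.kvm) check in the diff above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the state the guard looks at; NOT kernel structures. */
struct toy_es_vcpu {
	bool sev_es_guest;          /* models sev_es_guest(svm->vcpu.kvm) */
	bool avic_active;           /* models AVIC being enabled in the VMCB */
	bool cr8_write_intercepted; /* models INTERCEPT_CR8_WRITE */
};

/* Deactivation with the proposed guard: skip the intercept for SEV-ES. */
static void toy_avic_deactivate_guarded(struct toy_es_vcpu *v)
{
	assert(v);
	v->avic_active = false;
	if (!v->sev_es_guest)
		v->cr8_write_intercepted = true;
}

/* Returns whether CR8 writes are intercepted after AVIC is deactivated. */
static bool toy_intercepted_after_deactivate(bool sev_es)
{
	struct toy_es_vcpu v = {
		.sev_es_guest = sev_es,
		.avic_active = true,
	};

	toy_avic_deactivate_guarded(&v);
	return v.cr8_write_intercepted;
}
```

Calling toy_intercepted_after_deactivate(false) models a regular guest, where
the intercept is set; calling it with true models an SEV-ES guest, where the
intercept is left alone, as in the pre-regression behavior.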
* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated 2026-03-10 17:17 ` Sean Christopherson @ 2026-03-10 17:36 ` Tom Lendacky 2026-03-10 17:48 ` Naveen N Rao 0 siblings, 1 reply; 25+ messages in thread From: Tom Lendacky @ 2026-03-10 17:36 UTC (permalink / raw) To: Sean Christopherson, Srikanth Aithal Cc: Paolo Bonzini, kvm, linux-kernel, Jim Mattson, Naveen N Rao, Maciej S . Szmigiero On 3/10/26 12:17, Sean Christopherson wrote: > On Tue, Mar 10, 2026, Srikanth Aithal wrote: >> >> Hello Sean, >> >> From next-20260304 onwards [1], including recent next kernel next-20260309, >> booting an SEV-ES guest on AMD EPYC Turin and AMD EPYC Genoa has been >> failing. However, on EPYC Milan, the SEV-ES guest boots fine. > > ... > >> Bisecting shows that this commit is the first bad one. When I revert it, I >> am able to boot the SEV-ES guest successfully on both Turin and Genoa >> platforms: >> >> e992bf67bcbab07a7f59963b2c4ed32ef65c8431 is the first bad commit >> commit e992bf67bcbab07a7f59963b2c4ed32ef65c8431 >> Author: Sean Christopherson <seanjc@google.com> >> Date: Tue Feb 3 11:07:10 2026 -0800 > > Gah, I hate how KVM manages intercepts for SEV-ES+. Though to a large extent I > blame the architecture for not simply making CR{0,4,8} intercept trap-like. > Side topic, is the host actually allowed to trap CR3 writes? That seems like a > huge gaping security flaw, especially for SNP+. > > Anyways, this should fix the immediate problem. 
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c > index 33172f0e986b..b6072872b785 100644 > --- a/arch/x86/kvm/svm/avic.c > +++ b/arch/x86/kvm/svm/avic.c > @@ -237,7 +237,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm) > vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK); > vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK; > > - svm_set_intercept(svm, INTERCEPT_CR8_WRITE); > + if (!sev_es_guest(svm->vcpu.kvm)) > + svm_set_intercept(svm, INTERCEPT_CR8_WRITE); > > /* > * If running nested and the guest uses its own MSR bitmap, there > > Argh! The more I look at this code, the more frustrated I get. The unconditional > setting of TRAP_CR8_WRITE for SEV-ES+ is flawed. When AVIC is enabled, KVM doesn't AVIC is disabled for SEV guests (see __sev_guest_init() and the kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_SEV) call at the end of the function). Thanks, Tom > need to trap CR8 writes because hardware will update the backing page. I'm guessing > Windows doesn't support running as an SEV-ES guest, which is no one has noticed. > > Actually, it's worse than that. sync_cr8_to_lapic() will straight up clobber the > backing page. Presumably hardware never actually uses TPR from the AVIC backing > page, but it's still gross. sync_lapic_to_cr8() is also beyond useless. > > And all of sync code should pivot on guest_state_protected, not sev_es_guest(). > > For now, I'll just post the above (assuming it fixes the issue). But this code > needs some love sooner than later. > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated 2026-03-10 17:36 ` Tom Lendacky @ 2026-03-10 17:48 ` Naveen N Rao 2026-03-10 18:00 ` Naveen N Rao 2026-03-10 18:12 ` Tom Lendacky 0 siblings, 2 replies; 25+ messages in thread From: Naveen N Rao @ 2026-03-10 17:48 UTC (permalink / raw) To: Tom Lendacky Cc: Sean Christopherson, Srikanth Aithal, Paolo Bonzini, kvm, linux-kernel, Jim Mattson, Maciej S . Szmigiero On Tue, Mar 10, 2026 at 12:36:09PM -0500, Tom Lendacky wrote: > On 3/10/26 12:17, Sean Christopherson wrote: > > On Tue, Mar 10, 2026, Srikanth Aithal wrote: > >> > >> Hello Sean, > >> > >> From next-20260304 onwards [1], including recent next kernel next-20260309, > >> booting an SEV-ES guest on AMD EPYC Turin and AMD EPYC Genoa has been > >> failing. However, on EPYC Milan, the SEV-ES guest boots fine. > > > > ... > > > >> Bisecting shows that this commit is the first bad one. When I revert it, I > >> am able to boot the SEV-ES guest successfully on both Turin and Genoa > >> platforms: > >> > >> e992bf67bcbab07a7f59963b2c4ed32ef65c8431 is the first bad commit > >> commit e992bf67bcbab07a7f59963b2c4ed32ef65c8431 > >> Author: Sean Christopherson <seanjc@google.com> > >> Date: Tue Feb 3 11:07:10 2026 -0800 > > > > Gah, I hate how KVM manages intercepts for SEV-ES+. Though to a large extent I > > blame the architecture for not simply making CR{0,4,8} intercept trap-like. > > Side topic, is the host actually allowed to trap CR3 writes? That seems like a > > huge gaping security flaw, especially for SNP+. > > > > Anyways, this should fix the immediate problem. 
> > > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c > > index 33172f0e986b..b6072872b785 100644 > > --- a/arch/x86/kvm/svm/avic.c > > +++ b/arch/x86/kvm/svm/avic.c > > @@ -237,7 +237,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm) > > vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK); > > vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK; > > > > - svm_set_intercept(svm, INTERCEPT_CR8_WRITE); > > + if (!sev_es_guest(svm->vcpu.kvm)) > > + svm_set_intercept(svm, INTERCEPT_CR8_WRITE); > > > > /* > > * If running nested and the guest uses its own MSR bitmap, there > > > > Argh! The more I look at this code, the more frustrated I get. The unconditional > > setting of TRAP_CR8_WRITE for SEV-ES+ is flawed. When AVIC is enabled, KVM doesn't > > AVIC is disabled for SEV guests (see __sev_guest_init() and the > kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_SEV) call at the end of > the function). AVIC gets inhibited globally, but continues to be enabled on vcpu_create() opportunistically -- see kvm_create_lapic(). It only gets disabled later during vcpu setup via vcpu_reset()->svm_vcpu_reset()->init_vmcb()->avic_init_vmcb() - Naveen ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated 2026-03-10 17:48 ` Naveen N Rao @ 2026-03-10 18:00 ` Naveen N Rao 2026-03-10 18:12 ` Tom Lendacky 1 sibling, 0 replies; 25+ messages in thread From: Naveen N Rao @ 2026-03-10 18:00 UTC (permalink / raw) To: Tom Lendacky Cc: Sean Christopherson, Srikanth Aithal, Paolo Bonzini, kvm, linux-kernel, Jim Mattson, Maciej S . Szmigiero On Tue, Mar 10, 2026 at 11:18:16PM +0530, Naveen N Rao wrote: > On Tue, Mar 10, 2026 at 12:36:09PM -0500, Tom Lendacky wrote: > > On 3/10/26 12:17, Sean Christopherson wrote: > > > On Tue, Mar 10, 2026, Srikanth Aithal wrote: > > >> > > >> Hello Sean, > > >> > > >> From next-20260304 onwards [1], including recent next kernel next-20260309, > > >> booting an SEV-ES guest on AMD EPYC Turin and AMD EPYC Genoa has been > > >> failing. However, on EPYC Milan, the SEV-ES guest boots fine. > > > > > > ... > > > > > >> Bisecting shows that this commit is the first bad one. When I revert it, I > > >> am able to boot the SEV-ES guest successfully on both Turin and Genoa > > >> platforms: > > >> > > >> e992bf67bcbab07a7f59963b2c4ed32ef65c8431 is the first bad commit > > >> commit e992bf67bcbab07a7f59963b2c4ed32ef65c8431 > > >> Author: Sean Christopherson <seanjc@google.com> > > >> Date: Tue Feb 3 11:07:10 2026 -0800 > > > > > > Gah, I hate how KVM manages intercepts for SEV-ES+. Though to a large extent I > > > blame the architecture for not simply making CR{0,4,8} intercept trap-like. > > > Side topic, is the host actually allowed to trap CR3 writes? That seems like a > > > huge gaping security flaw, especially for SNP+. > > > > > > Anyways, this should fix the immediate problem. 
> > > > > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c > > > index 33172f0e986b..b6072872b785 100644 > > > --- a/arch/x86/kvm/svm/avic.c > > > +++ b/arch/x86/kvm/svm/avic.c > > > @@ -237,7 +237,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm) > > > vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK); > > > vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK; > > > > > > - svm_set_intercept(svm, INTERCEPT_CR8_WRITE); > > > + if (!sev_es_guest(svm->vcpu.kvm)) > > > + svm_set_intercept(svm, INTERCEPT_CR8_WRITE); > > > > > > /* > > > * If running nested and the guest uses its own MSR bitmap, there I arrived at the same fix and it works for me, so FWIW: Acked-by: Naveen N Rao (AMD) <naveen@kernel.org> > > > > > > Argh! The more I look at this code, the more frustrated I get. > > > The unconditional > > > setting of TRAP_CR8_WRITE for SEV-ES+ is flawed. When AVIC is enabled, KVM doesn't > > > > AVIC is disabled for SEV guests (see __sev_guest_init() and the > > kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_SEV) call at the end of > > the function). > > AVIC gets inhibited globally, but continues to be enabled on > vcpu_create() opportunistically -- see kvm_create_lapic(). It only gets > disabled later during vcpu setup via > vcpu_reset()->svm_vcpu_reset()->init_vmcb()->avic_init_vmcb() ... which explains why this issue is showing up. But reading your response again, I guess you were pointing out that the intercepts are not a problem for SEV-ES guests since AVIC is inhibited, which totally makes sense. Thanks, Naveen ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated 2026-03-10 17:48 ` Naveen N Rao 2026-03-10 18:00 ` Naveen N Rao @ 2026-03-10 18:12 ` Tom Lendacky 2026-03-10 18:35 ` Sean Christopherson 1 sibling, 1 reply; 25+ messages in thread From: Tom Lendacky @ 2026-03-10 18:12 UTC (permalink / raw) To: Naveen N Rao Cc: Sean Christopherson, Srikanth Aithal, Paolo Bonzini, kvm, linux-kernel, Jim Mattson, Maciej S . Szmigiero On 3/10/26 12:48, Naveen N Rao wrote: > On Tue, Mar 10, 2026 at 12:36:09PM -0500, Tom Lendacky wrote: >> On 3/10/26 12:17, Sean Christopherson wrote: >>> On Tue, Mar 10, 2026, Srikanth Aithal wrote: >>>> >>>> Hello Sean, >>>> >>>> From next-20260304 onwards [1], including recent next kernel next-20260309, >>>> booting an SEV-ES guest on AMD EPYC Turin and AMD EPYC Genoa has been >>>> failing. However, on EPYC Milan, the SEV-ES guest boots fine. >>> >>> ... >>> >>>> Bisecting shows that this commit is the first bad one. When I revert it, I >>>> am able to boot the SEV-ES guest successfully on both Turin and Genoa >>>> platforms: >>>> >>>> e992bf67bcbab07a7f59963b2c4ed32ef65c8431 is the first bad commit >>>> commit e992bf67bcbab07a7f59963b2c4ed32ef65c8431 >>>> Author: Sean Christopherson <seanjc@google.com> >>>> Date: Tue Feb 3 11:07:10 2026 -0800 >>> >>> Gah, I hate how KVM manages intercepts for SEV-ES+. Though to a large extent I >>> blame the architecture for not simply making CR{0,4,8} intercept trap-like. >>> Side topic, is the host actually allowed to trap CR3 writes? That seems like a >>> huge gaping security flaw, especially for SNP+. >>> >>> Anyways, this should fix the immediate problem. 
>>> >>> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c >>> index 33172f0e986b..b6072872b785 100644 >>> --- a/arch/x86/kvm/svm/avic.c >>> +++ b/arch/x86/kvm/svm/avic.c >>> @@ -237,7 +237,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm) >>> vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK); >>> vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK; >>> >>> - svm_set_intercept(svm, INTERCEPT_CR8_WRITE); >>> + if (!sev_es_guest(svm->vcpu.kvm)) >>> + svm_set_intercept(svm, INTERCEPT_CR8_WRITE); >>> >>> /* >>> * If running nested and the guest uses its own MSR bitmap, there >>> >>> Argh! The more I look at this code, the more frustrated I get. The unconditional >>> setting of TRAP_CR8_WRITE for SEV-ES+ is flawed. When AVIC is enabled, KVM doesn't >> >> AVIC is disabled for SEV guests (see __sev_guest_init() and the >> kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_SEV) call at the end of >> the function). > > AVIC gets inhibited globally, but continues to be enabled on > vcpu_create() opportunistically -- see kvm_create_lapic(). It only gets > disabled later during vcpu setup via > vcpu_reset()->svm_vcpu_reset()->init_vmcb()->avic_init_vmcb() I'm just saying that the unconditional trap for CR8_WRITE isn't flawed for SEV-ES+ because AVIC can't work with SEV, so there isn't any time that CR8 writes shouldn't be trapped. Thanks, Tom > > > - Naveen > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated 2026-03-10 18:12 ` Tom Lendacky @ 2026-03-10 18:35 ` Sean Christopherson 2026-03-10 21:41 ` Tom Lendacky 0 siblings, 1 reply; 25+ messages in thread From: Sean Christopherson @ 2026-03-10 18:35 UTC (permalink / raw) To: Tom Lendacky Cc: Naveen N Rao, Srikanth Aithal, Paolo Bonzini, kvm, linux-kernel, Jim Mattson, Maciej S . Szmigiero On Tue, Mar 10, 2026, Tom Lendacky wrote: > On 3/10/26 12:48, Naveen N Rao wrote: > > On Tue, Mar 10, 2026 at 12:36:09PM -0500, Tom Lendacky wrote: > >> On 3/10/26 12:17, Sean Christopherson wrote: > >>> On Tue, Mar 10, 2026, Srikanth Aithal wrote: > >>>> > >>>> Hello Sean, > >>>> > >>>> From next-20260304 onwards [1], including recent next kernel next-20260309, > >>>> booting an SEV-ES guest on AMD EPYC Turin and AMD EPYC Genoa has been > >>>> failing. However, on EPYC Milan, the SEV-ES guest boots fine. > >>> > >>> ... > >>> > >>>> Bisecting shows that this commit is the first bad one. When I revert it, I > >>>> am able to boot the SEV-ES guest successfully on both Turin and Genoa > >>>> platforms: > >>>> > >>>> e992bf67bcbab07a7f59963b2c4ed32ef65c8431 is the first bad commit > >>>> commit e992bf67bcbab07a7f59963b2c4ed32ef65c8431 > >>>> Author: Sean Christopherson <seanjc@google.com> > >>>> Date: Tue Feb 3 11:07:10 2026 -0800 > >>> > >>> Gah, I hate how KVM manages intercepts for SEV-ES+. Though to a large extent I > >>> blame the architecture for not simply making CR{0,4,8} intercept trap-like. > >>> Side topic, is the host actually allowed to trap CR3 writes? That seems like a > >>> huge gaping security flaw, especially for SNP+. > >>> > >>> Anyways, this should fix the immediate problem. 
> >>> > >>> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c > >>> index 33172f0e986b..b6072872b785 100644 > >>> --- a/arch/x86/kvm/svm/avic.c > >>> +++ b/arch/x86/kvm/svm/avic.c > >>> @@ -237,7 +237,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm) > >>> vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK); > >>> vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK; > >>> > >>> - svm_set_intercept(svm, INTERCEPT_CR8_WRITE); > >>> + if (!sev_es_guest(svm->vcpu.kvm)) > >>> + svm_set_intercept(svm, INTERCEPT_CR8_WRITE); > >>> > >>> /* > >>> * If running nested and the guest uses its own MSR bitmap, there > >>> > >>> Argh! The more I look at this code, the more frustrated I get. The unconditional > >>> setting of TRAP_CR8_WRITE for SEV-ES+ is flawed. When AVIC is enabled, KVM doesn't > >> > >> AVIC is disabled for SEV guests (see __sev_guest_init() and the > >> kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_SEV) call at the end of > >> the function). > > > > AVIC gets inhibited globally, but continues to be enabled on > > vcpu_create() opportunistically -- see kvm_create_lapic(). It only gets > > disabled later during vcpu setup via > > vcpu_reset()->svm_vcpu_reset()->init_vmcb()->avic_init_vmcb() > > I'm just saying that the unconditional trap for CR8_WRITE isn't flawed > for SEV-ES+ because AVIC can't work with SEV, so there isn't any time > that CR8 writes shouldn't be trapped. Yeah, I forgot that (obviously). But sync_cr8_to_lapic() is very broken, no? INTERCEPT_CR8_WRITE will never be set, and svm->vmcb->control.int_ctl will become stale as soon as the VMSA is live, and so in all likelihood KVM is crushing CR8 to zero for SEV-ES guests. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
2026-03-10 18:35 ` Sean Christopherson
@ 2026-03-10 21:41 ` Tom Lendacky
2026-03-10 21:58 ` Sean Christopherson
0 siblings, 1 reply; 25+ messages in thread
From: Tom Lendacky @ 2026-03-10 21:41 UTC (permalink / raw)
To: Sean Christopherson
Cc: Naveen N Rao, Srikanth Aithal, Paolo Bonzini, kvm, linux-kernel,
Jim Mattson, Maciej S . Szmigiero

On 3/10/26 13:35, Sean Christopherson wrote:
> On Tue, Mar 10, 2026, Tom Lendacky wrote:
>> On 3/10/26 12:48, Naveen N Rao wrote:
>>> On Tue, Mar 10, 2026 at 12:36:09PM -0500, Tom Lendacky wrote:
>>>> On 3/10/26 12:17, Sean Christopherson wrote:
>>>>> On Tue, Mar 10, 2026, Srikanth Aithal wrote:
>>>>>>
>>>>>> Hello Sean,
>>>>>>
>>>>>> From next-20260304 onwards [1], including recent next kernel next-20260309,
>>>>>> booting an SEV-ES guest on AMD EPYC Turin and AMD EPYC Genoa has been
>>>>>> failing. However, on EPYC Milan, the SEV-ES guest boots fine.
>>>>>
>>>>> ...
>>>>>
>>>>>> Bisecting shows that this commit is the first bad one. When I revert it, I
>>>>>> am able to boot the SEV-ES guest successfully on both Turin and Genoa
>>>>>> platforms:
>>>>>>
>>>>>> e992bf67bcbab07a7f59963b2c4ed32ef65c8431 is the first bad commit
>>>>>> commit e992bf67bcbab07a7f59963b2c4ed32ef65c8431
>>>>>> Author: Sean Christopherson <seanjc@google.com>
>>>>>> Date: Tue Feb 3 11:07:10 2026 -0800
>>>>>
>>>>> Gah, I hate how KVM manages intercepts for SEV-ES+. Though to a large extent I
>>>>> blame the architecture for not simply making CR{0,4,8} intercept trap-like.
>>>>> Side topic, is the host actually allowed to trap CR3 writes? That seems like a
>>>>> huge gaping security flaw, especially for SNP+.
>>>>>
>>>>> Anyways, this should fix the immediate problem.
>>>>>
>>>>> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
>>>>> index 33172f0e986b..b6072872b785 100644
>>>>> --- a/arch/x86/kvm/svm/avic.c
>>>>> +++ b/arch/x86/kvm/svm/avic.c
>>>>> @@ -237,7 +237,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
>>>>>         vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
>>>>>         vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
>>>>>
>>>>> -       svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
>>>>> +       if (!sev_es_guest(svm->vcpu.kvm))
>>>>> +               svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
>>>>>
>>>>>         /*
>>>>>          * If running nested and the guest uses its own MSR bitmap, there
>>>>>
>>>>> Argh! The more I look at this code, the more frustrated I get. The unconditional
>>>>> setting of TRAP_CR8_WRITE for SEV-ES+ is flawed. When AVIC is enabled, KVM doesn't
>>>>
>>>> AVIC is disabled for SEV guests (see __sev_guest_init() and the
>>>> kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_SEV) call at the end of
>>>> the function).
>>>
>>> AVIC gets inhibited globally, but continues to be enabled on
>>> vcpu_create() opportunistically -- see kvm_create_lapic(). It only gets
>>> disabled later during vcpu setup via
>>> vcpu_reset()->svm_vcpu_reset()->init_vmcb()->avic_init_vmcb()
>>
>> I'm just saying that the unconditional trap for CR8_WRITE isn't flawed
>> for SEV-ES+ because AVIC can't work with SEV, so there isn't any time
>> that CR8 writes shouldn't be trapped.
>
> Yeah, I forgot that (obviously).
>
> But sync_cr8_to_lapic() is very broken, no? INTERCEPT_CR8_WRITE will never be
> set, and svm->vmcb->control.int_ctl will become stale as soon as the VMSA is
> live, and so in all likelihood KVM is crushing CR8 to zero for SEV-ES guests.

I don't think so. V_TPR is written on #VMEXIT even for SEV-ES+ guests,
and since it is a trap, CR8 is set and so V_TPR should have that value.
That would imply sync_cr8_to_lapic() should do the right thing.

After attempting to verify this behavior it turns out that writes to CR8
(and CR2) are, in fact, not trapped, but the APM was not updated with
this information (I'll send a patch to remove that code). KVM's CR8
value is, however, synced with the proper value through
sync_cr8_to_lapic() because V_TPR in the VMCB is updated on #VMEXIT.

Thanks,
Tom

^ permalink raw reply [flat|nested] 25+ messages in thread
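
The V_TPR reasoning in the message above can be sketched in standalone C. This is only an illustration under stated assumptions, not KVM's sync_cr8_to_lapic(): the SKETCH_ macro and the sketch_ helpers are hypothetical names, and the 0x0f mask is assumed to match KVM's V_TPR_MASK (V_TPR in the low bits of the VMCB control area's int_ctl). The only architectural fact relied on is that CR8 mirrors TPR[7:4].

```c
#include <stdint.h>

/* Assumption: V_TPR lives in the low four bits of int_ctl (cf. V_TPR_MASK). */
#define SKETCH_V_TPR_MASK 0x0fu

/* Read the guest's CR8 value out of a saved int_ctl image. */
static inline uint8_t sketch_cr8_from_int_ctl(uint32_t int_ctl)
{
	return (uint8_t)(int_ctl & SKETCH_V_TPR_MASK);
}

/* CR8 is architecturally TPR[7:4], so the APIC TPR is CR8 shifted left by 4. */
static inline uint8_t sketch_tpr_from_cr8(uint8_t cr8)
{
	return (uint8_t)((cr8 & 0x0fu) << 4);
}

/* Write a new CR8 value into int_ctl without disturbing the other bits. */
static inline uint32_t sketch_int_ctl_set_v_tpr(uint32_t int_ctl, uint8_t cr8)
{
	return (int_ctl & ~SKETCH_V_TPR_MASK) | (cr8 & SKETCH_V_TPR_MASK);
}
```

Because the processor refreshes V_TPR on #VMEXIT, reading it back with a helper like sketch_cr8_from_int_ctl() after every exit would keep the host's view current even when CR8 writes are not intercepted.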
* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
2026-03-10 21:41 ` Tom Lendacky
@ 2026-03-10 21:58 ` Sean Christopherson
2026-03-10 22:33 ` Tom Lendacky
0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2026-03-10 21:58 UTC (permalink / raw)
To: Tom Lendacky
Cc: Naveen N Rao, Srikanth Aithal, Paolo Bonzini, kvm, linux-kernel,
Jim Mattson, Maciej S . Szmigiero

On Tue, Mar 10, 2026, Tom Lendacky wrote:
> On 3/10/26 13:35, Sean Christopherson wrote:
> > On Tue, Mar 10, 2026, Tom Lendacky wrote:
> >> I'm just saying that the unconditional trap for CR8_WRITE isn't flawed
> >> for SEV-ES+ because AVIC can't work with SEV, so there isn't any time
> >> that CR8 writes shouldn't be trapped.
> >
> > Yeah, I forgot that (obviously).
> >
> > But sync_cr8_to_lapic() is very broken, no? INTERCEPT_CR8_WRITE will never be
> > set, and svm->vmcb->control.int_ctl will become stale as soon as the VMSA is
> > live, and so in all likelihood KVM is crushing CR8 to zero for SEV-ES guests.
>
> I don't think so. V_TPR is written on #VMEXIT even for SEV-ES+ guests,
> and since it is a trap, CR8 is set and so V_TPR should have that value.
> That would imply sync_cr8_to_lapic() should do the right thing.

But isn't svm->vmcb->control.int_ctl stale? Oh. "control", not "save". /facepalm

Ah, and I assume Secure AVIC hides vTPR from the host? Or at least prevents the
host from setting it?

> After attempting to verify this behavior it turns out that writes to CR8
> (and CR2) are, in fact, not trapped, but the APM was not updated with
> this information (I'll send a patch to remove that code). KVM's CR8
> value is, however, synced with the proper value through
> sync_cr8_to_lapic() because V_TPR in the VMCB is updated on #VMEXIT.

Oh. Huh. So doesn't that mean that supporting Windows (or any other guest that
uses TPR to mask interrupts) as an SEV-ES guest is practically impossible? Because
while KVM can observe and manipulate guest CR8, KVM won't be able to precisely
detect when TPR drops below a pending IRQ.

^ permalink raw reply [flat|nested] 25+ messages in thread
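
The concern in the message above rests on the APIC priority rule: an interrupt is held pending while its priority class does not exceed the task priority. A rough sketch of that check follows. It is deliberately simplified (it ignores in-service interrupts and the resulting processor priority), and the sketch_ helper names are hypothetical rather than KVM functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Priority class of an interrupt vector: its upper four bits. */
static inline uint8_t sketch_vector_prio(uint8_t vector)
{
	return (uint8_t)(vector >> 4);
}

/*
 * Simplified rule: a pending vector stays masked while its priority class
 * is less than or equal to CR8 (TPR[7:4]). In-service interrupts, which
 * also raise the effective priority, are ignored in this sketch.
 */
static inline bool sketch_irq_masked_by_cr8(uint8_t vector, uint8_t cr8)
{
	return sketch_vector_prio(vector) <= (cr8 & 0x0fu);
}
```

Under this rule, vector 0x51 stays masked while CR8 is 5 and becomes deliverable as soon as the guest lowers CR8 to 4. Without an exit on that CR8 write, the host only learns about the change at the next unrelated #VMEXIT.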
* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
2026-03-10 21:58 ` Sean Christopherson
@ 2026-03-10 22:33 ` Tom Lendacky
2026-03-10 22:40 ` Sean Christopherson
2026-03-11 17:39 ` Paolo Bonzini
0 siblings, 2 replies; 25+ messages in thread
From: Tom Lendacky @ 2026-03-10 22:33 UTC (permalink / raw)
To: Sean Christopherson
Cc: Naveen N Rao, Srikanth Aithal, Paolo Bonzini, kvm, linux-kernel,
Jim Mattson, Maciej S . Szmigiero

On 3/10/26 16:58, Sean Christopherson wrote:
> On Tue, Mar 10, 2026, Tom Lendacky wrote:
>> On 3/10/26 13:35, Sean Christopherson wrote:
>>> On Tue, Mar 10, 2026, Tom Lendacky wrote:
>>>> I'm just saying that the unconditional trap for CR8_WRITE isn't flawed
>>>> for SEV-ES+ because AVIC can't work with SEV, so there isn't any time
>>>> that CR8 writes shouldn't be trapped.
>>>
>>> Yeah, I forgot that (obviously).
>>>
>>> But sync_cr8_to_lapic() is very broken, no? INTERCEPT_CR8_WRITE will never be
>>> set, and svm->vmcb->control.int_ctl will become stale as soon as the VMSA is
>>> live, and so in all likelihood KVM is crushing CR8 to zero for SEV-ES guests.
>>
>> I don't think so. V_TPR is written on #VMEXIT even for SEV-ES+ guests,
>> and since it is a trap, CR8 is set and so V_TPR should have that value.
>> That would imply sync_cr8_to_lapic() should do the right thing.
>
> But isn't svm->vmcb->control.int_ctl stale? Oh. "control", not "save". /facepalm
>
> Ah, and I assume Secure AVIC hides vTPR from the host? Or at least prevents the
> host from setting it?

Secure AVIC will prevent the host from setting it since the backing page
lives in guest memory and is encrypted/private.

>
>> After attempting to verify this behavior it turns out that writes to CR8
>> (and CR2) are, in fact, not trapped, but the APM was not updated with
>> this information (I'll send a patch to remove that code). KVM's CR8
>> value is, however, synced with the proper value through
>> sync_cr8_to_lapic() because V_TPR in the VMCB is updated on #VMEXIT.
>
> Oh. Huh. So doesn't that mean that supporting Windows (or any other guest that
> uses TPR to mask interrupts) as an SEV-ES guest is practically impossible? Because
> while KVM can observe and manipulate guest CR8, KVM won't be able to precisely
> detect when TPR drops below a pending IRQ.

Could we do something with virtual interrupt support? Today KVM uses the
virtual interrupt control to detect when an IRQ window opens. We could
do something similar by setting up the virtual interrupt priority,
V_INTR_PRIO, at the level of the current TPR/CR8 level. When the TPR
drops, that would trigger a #VMEXIT and allow the pending IRQ to be
injected. Thoughts?

Thanks,
Tom

>

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
2026-03-10 22:33 ` Tom Lendacky
@ 2026-03-10 22:40 ` Sean Christopherson
2026-03-11 13:43 ` Tom Lendacky
2026-03-11 17:39 ` Paolo Bonzini
1 sibling, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2026-03-10 22:40 UTC (permalink / raw)
To: Tom Lendacky
Cc: Naveen N Rao, Srikanth Aithal, Paolo Bonzini, kvm, linux-kernel,
Jim Mattson, Maciej S . Szmigiero

On Tue, Mar 10, 2026, Tom Lendacky wrote:
> On 3/10/26 16:58, Sean Christopherson wrote:
> > On Tue, Mar 10, 2026, Tom Lendacky wrote:
> >> On 3/10/26 13:35, Sean Christopherson wrote:
> >>> On Tue, Mar 10, 2026, Tom Lendacky wrote:
> >>>> I'm just saying that the unconditional trap for CR8_WRITE isn't flawed
> >>>> for SEV-ES+ because AVIC can't work with SEV, so there isn't any time
> >>>> that CR8 writes shouldn't be trapped.
> >>>
> >>> Yeah, I forgot that (obviously).
> >>>
> >>> But sync_cr8_to_lapic() is very broken, no? INTERCEPT_CR8_WRITE will never be
> >>> set, and svm->vmcb->control.int_ctl will become stale as soon as the VMSA is
> >>> live, and so in all likelihood KVM is crushing CR8 to zero for SEV-ES guests.
> >>
> >> I don't think so. V_TPR is written on #VMEXIT even for SEV-ES+ guests,
> >> and since it is a trap, CR8 is set and so V_TPR should have that value.
> >> That would imply sync_cr8_to_lapic() should do the right thing.
> >
> > But isn't svm->vmcb->control.int_ctl stale? Oh. "control", not "save". /facepalm
> >
> > Ah, and I assume Secure AVIC hides vTPR from the host? Or at least prevents the
> > host from setting it?
>
> Secure AVIC will prevent the host from setting it since the backing page
> lives in guest memory and is encrypted/private.

What about vmcb->control.int_ctl though? IIUC, that's the source of truth for
the effective vTPR, not the value in the virtual APIC page.

> >> After attempting to verify this behavior it turns out that writes to CR8
> >> (and CR2) are, in fact, not trapped, but the APM was not updated with
> >> this information (I'll send a patch to remove that code). KVM's CR8
> >> value is, however, synced with the proper value through
> >> sync_cr8_to_lapic() because V_TPR in the VMCB is updated on #VMEXIT.
> >
> > Oh. Huh. So doesn't that mean that supporting Windows (or any other guest that
> > uses TPR to mask interrupts) as an SEV-ES guest is practically impossible? Because
> > while KVM can observe and manipulate guest CR8, KVM won't be able to precisely
> > detect when TPR drops below a pending IRQ.
>
> Could we do something with virtual interrupt support? Today KVM uses the
> virtual interrupt control to detect when an IRQ window opens. We could
> do something similar by setting up the virtual interrupt priority,
> V_INTR_PRIO, at the level of the current TPR/CR8 level. When the TPR
> drops, that would trigger a #VMEXIT and allow the pending IRQ to be
> injected. Thoughts?

Uh, yes, that would work? I was thinking we couldn't model the priority, but
obviously that's not true.

FWIW, my preference would be to not add support unless someone asks for it :-)

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
2026-03-10 22:40 ` Sean Christopherson
@ 2026-03-11 13:43 ` Tom Lendacky
0 siblings, 0 replies; 25+ messages in thread
From: Tom Lendacky @ 2026-03-11 13:43 UTC (permalink / raw)
To: Sean Christopherson
Cc: Naveen N Rao, Srikanth Aithal, Paolo Bonzini, kvm, linux-kernel,
Jim Mattson, Maciej S . Szmigiero

On 3/10/26 17:40, Sean Christopherson wrote:
> On Tue, Mar 10, 2026, Tom Lendacky wrote:
>> On 3/10/26 16:58, Sean Christopherson wrote:
>>> On Tue, Mar 10, 2026, Tom Lendacky wrote:
>>>> On 3/10/26 13:35, Sean Christopherson wrote:
>>>>> On Tue, Mar 10, 2026, Tom Lendacky wrote:
>>>>>> I'm just saying that the unconditional trap for CR8_WRITE isn't flawed
>>>>>> for SEV-ES+ because AVIC can't work with SEV, so there isn't any time
>>>>>> that CR8 writes shouldn't be trapped.
>>>>>
>>>>> Yeah, I forgot that (obviously).
>>>>>
>>>>> But sync_cr8_to_lapic() is very broken, no? INTERCEPT_CR8_WRITE will never be
>>>>> set, and svm->vmcb->control.int_ctl will become stale as soon as the VMSA is
>>>>> live, and so in all likelihood KVM is crushing CR8 to zero for SEV-ES guests.
>>>>
>>>> I don't think so. V_TPR is written on #VMEXIT even for SEV-ES+ guests,
>>>> and since it is a trap, CR8 is set and so V_TPR should have that value.
>>>> That would imply sync_cr8_to_lapic() should do the right thing.
>>>
>>> But isn't svm->vmcb->control.int_ctl stale? Oh. "control", not "save". /facepalm
>>>
>>> Ah, and I assume Secure AVIC hides vTPR from the host? Or at least prevents the
>>> host from setting it?
>>
>> Secure AVIC will prevent the host from setting it since the backing page
>> lives in guest memory and is encrypted/private.
>
> What about vmcb->control.int_ctl though? IIUC, that's the source of truth for
> the effective vTPR, not the value in the virtual APIC page.

For Secure AVIC, V_TPR from the vmcb->control.int_ctl isn't used, instead
it is saved to and restored from the VMSA. The APM should probably be
updated to be clear about that.

>
>>>> After attempting to verify this behavior it turns out that writes to CR8
>>>> (and CR2) are, in fact, not trapped, but the APM was not updated with
>>>> this information (I'll send a patch to remove that code). KVM's CR8
>>>> value is, however, synced with the proper value through
>>>> sync_cr8_to_lapic() because V_TPR in the VMCB is updated on #VMEXIT.
>>>
>>> Oh. Huh. So doesn't that mean that supporting Windows (or any other guest that
>>> uses TPR to mask interrupts) as an SEV-ES guest is practically impossible? Because
>>> while KVM can observe and manipulate guest CR8, KVM won't be able to precisely
>>> detect when TPR drops below a pending IRQ.
>>
>> Could we do something with virtual interrupt support? Today KVM uses the
>> virtual interrupt control to detect when an IRQ window opens. We could
>> do something similar by setting up the virtual interrupt priority,
>> V_INTR_PRIO, at the level of the current TPR/CR8 level. When the TPR
>> drops, that would trigger a #VMEXIT and allow the pending IRQ to be
>> injected. Thoughts?
>
> Uh, yes, that would work? I was thinking we couldn't model the priority, but
> obviously that's not true.
>
> FWIW, my preference would be to not add support unless someone asks for it :-)

Agreed.

Thanks,
Tom

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
2026-03-10 22:33 ` Tom Lendacky
2026-03-10 22:40 ` Sean Christopherson
@ 2026-03-11 17:39 ` Paolo Bonzini
1 sibling, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2026-03-11 17:39 UTC (permalink / raw)
To: Tom Lendacky, Sean Christopherson
Cc: Naveen N Rao, Srikanth Aithal, kvm, linux-kernel, Jim Mattson,
Maciej S . Szmigiero

On 3/10/26 23:33, Tom Lendacky wrote:
> Could we do something with virtual interrupt support? Today KVM uses the
> virtual interrupt control to detect when an IRQ window opens. We could
> do something similar by setting up the virtual interrupt priority,
> V_INTR_PRIO, at the level of the current TPR/CR8 level. When the TPR
> drops, that would trigger a #VMEXIT and allow the pending IRQ to be
> injected. Thoughts?

Yes, in fact Hyper-V uses VINTR injection and mostly doesn't bother with
interrupt windows. KVM does it to keep the code similar between Intel
and AMD.

But even if you don't go all the way with VINTR, it should be possible
to implement something akin to Intel flexpriority through V_INTR_PRIO
and V_TPR (and keeping the VINTR intercept set to detect the moment
V_TPR falls below V_INTR_PRIO).

Paolo

^ permalink raw reply [flat|nested] 25+ messages in thread
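
The V_INTR_PRIO idea discussed above might look roughly like the following. This is a hedged sketch, not a proposed patch: the SKETCH_ bit positions are assumed to match KVM's int_ctl definitions (V_TPR in bits 3:0, V_IRQ at bit 8, V_INTR_PRIO in bits 19:16, V_IGN_TPR at bit 20), the sketch_ helpers are hypothetical, and the check ignores RFLAGS.IF, GIF and interrupt shadows. Which priority to program is left to the caller, since the thread does not settle that detail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed int_ctl bit layout; verify against the APM and asm/svm.h. */
#define SKETCH_V_TPR_MASK        0x0fu
#define SKETCH_V_IRQ             (1u << 8)
#define SKETCH_V_INTR_PRIO_SHIFT 16
#define SKETCH_V_INTR_PRIO_MASK  (0x0fu << SKETCH_V_INTR_PRIO_SHIFT)
#define SKETCH_V_IGN_TPR         (1u << 20)

/* Request a virtual interrupt at priority 'prio' that honors V_TPR. */
static inline uint32_t sketch_arm_vintr(uint32_t int_ctl, uint8_t prio)
{
	int_ctl &= ~(SKETCH_V_INTR_PRIO_MASK | SKETCH_V_IGN_TPR);
	int_ctl |= SKETCH_V_IRQ;
	int_ctl |= (uint32_t)(prio & 0x0fu) << SKETCH_V_INTR_PRIO_SHIFT;
	return int_ctl;
}

/* Model of the priority check: true once V_TPR has dropped below V_INTR_PRIO. */
static inline bool sketch_vintr_would_fire(uint32_t int_ctl)
{
	uint32_t prio = (int_ctl & SKETCH_V_INTR_PRIO_MASK) >> SKETCH_V_INTR_PRIO_SHIFT;
	uint32_t tpr = int_ctl & SKETCH_V_TPR_MASK;

	if (!(int_ctl & SKETCH_V_IRQ))
		return false;
	return (int_ctl & SKETCH_V_IGN_TPR) || prio > tpr;
}
```

With the VINTR intercept set, the point at which sketch_vintr_would_fire() flips to true is the point at which hardware would be expected to exit, which is the signal the host otherwise lacks when CR8 writes are not trapped.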
* Re: [PATCH 0/2] KVM: SVM: Fix CR8 intercpetion woes with AVIC
2026-02-03 19:07 [PATCH 0/2] KVM: SVM: Fix CR8 intercpetion woes with AVIC Sean Christopherson
2026-02-03 19:07 ` [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC Sean Christopherson
2026-02-03 19:07 ` [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated Sean Christopherson
@ 2026-03-05 17:07 ` Sean Christopherson
2 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2026-03-05 17:07 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Jim Mattson, Naveen N Rao,
Maciej S . Szmigiero

On Tue, 03 Feb 2026 11:07:08 -0800, Sean Christopherson wrote:
> Fix a bug (or rather, a class of bugs) where SVM leaves the CR8 write
> intercept enabled after AVIC is enabled. On its own, the dangling CR8
> intercept is "just" a performance issue. But combined with the TPR sync bug
> fixed by commit d02e48830e3f ("KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR
> even if AVIC is active"), the danging intercept is fatal to Windows guests as
> the TPR seen by hardware gets wildly out of sync with reality.
>
> [...]

Applied to kvm-x86 fixes, thanks!

[1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC
https://github.com/kvm-x86/linux/commit/9071d0eb6955
[2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
https://github.com/kvm-x86/linux/commit/e992bf67bcba

--
https://github.com/kvm-x86/linux/tree/next

^ permalink raw reply [flat|nested] 25+ messages in thread