public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/2] KVM: SVM: Fix CR8 interception woes with AVIC
@ 2026-02-03 19:07 Sean Christopherson
  2026-02-03 19:07 ` [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC Sean Christopherson
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Sean Christopherson @ 2026-02-03 19:07 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Naveen N Rao,
	Maciej S . Szmigiero

Fix a bug (or rather, a class of bugs) where SVM leaves the CR8 write
intercept enabled after AVIC is enabled.  On its own, the dangling CR8
intercept is "just" a performance issue.  But combined with the TPR sync bug
fixed by commit d02e48830e3f ("KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR
even if AVIC is active"), the dangling intercept is fatal to Windows guests as
the TPR seen by hardware gets wildly out of sync with reality.

Tagged for stable even though there shouldn't be functional issues so long as
the TPR sync bug is fixed, because (a) write_cr8 exits can represent the
overwhelming majority of exits (hence the quotes around "just" a performance
issue), and (b) running with a bad/wrong configuration increases the chances
of encountering other lurking TPR bugs (if there are any), i.e. of hitting
bugs that would otherwise be rare edge cases (which is good for testing, but
bad for production).

Sean Christopherson (2):
  KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with
    in-kernel APIC
  KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated

 arch/x86/kvm/svm/avic.c |  8 +++++---
 arch/x86/kvm/svm/svm.c  | 11 ++++++-----
 2 files changed, 11 insertions(+), 8 deletions(-)


base-commit: e944fe2c09f405a2e2d147145c9b470084bc4c9a
-- 
2.53.0.rc2.204.g2597b5adb4-goog



* [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC
  2026-02-03 19:07 [PATCH 0/2] KVM: SVM: Fix CR8 interception woes with AVIC Sean Christopherson
@ 2026-02-03 19:07 ` Sean Christopherson
  2026-02-05  4:21   ` Jim Mattson
  2026-02-06 14:00   ` Naveen N Rao
  2026-02-03 19:07 ` [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated Sean Christopherson
  2026-03-05 17:07 ` [PATCH 0/2] KVM: SVM: Fix CR8 interception woes with AVIC Sean Christopherson
  2 siblings, 2 replies; 25+ messages in thread
From: Sean Christopherson @ 2026-02-03 19:07 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Naveen N Rao,
	Maciej S . Szmigiero

Initialize all per-vCPU AVIC control fields in the VMCB if AVIC is enabled
in KVM and the VM has an in-kernel local APIC, i.e. if it's _possible_ the
vCPU could activate AVIC at any point in its lifecycle.  Configuring the
VMCB if and only if AVIC is active "works" purely because of optimizations
in kvm_create_lapic() to speculatively set apicv_active if AVIC is enabled
*and* to defer updates until the first KVM_RUN.  In quotes because KVM
likely won't do the right thing if kvm_apicv_activated() is false, i.e. if
a vCPU is created while APICv is inhibited at the VM level for whatever
reason.  E.g. if the inhibit is *removed* before KVM_REQ_APICV_UPDATE is
handled in KVM_RUN, then __kvm_vcpu_update_apicv() will elide calls to
vendor code due to seeing "apicv_active == activate".

Cleaning up the initialization code will also allow fixing a bug where KVM
incorrectly leaves CR8 interception enabled when AVIC is activated without
creating a mess with respect to whether AVIC is activated or not.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c | 2 +-
 arch/x86/kvm/svm/svm.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index f92214b1a938..44e07c27b190 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -368,7 +368,7 @@ void avic_init_vmcb(struct vcpu_svm *svm, struct vmcb *vmcb)
 	vmcb->control.avic_physical_id = __sme_set(__pa(kvm_svm->avic_physical_id_table));
 	vmcb->control.avic_vapic_bar = APIC_DEFAULT_PHYS_BASE;
 
-	if (kvm_apicv_activated(svm->vcpu.kvm))
+	if (kvm_vcpu_apicv_active(&svm->vcpu))
 		avic_activate_vmcb(svm);
 	else
 		avic_deactivate_vmcb(svm);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5f0136dbdde6..e8313fdc5465 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1189,7 +1189,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
 	if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS))
 		svm->vmcb->control.erap_ctl |= ERAP_CONTROL_ALLOW_LARGER_RAP;
 
-	if (kvm_vcpu_apicv_active(vcpu))
+	if (enable_apicv && irqchip_in_kernel(vcpu->kvm))
 		avic_init_vmcb(svm, vmcb);
 
 	if (vnmi)
-- 
2.53.0.rc2.204.g2597b5adb4-goog
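
For readers following along, the elision described in the changelog above can
be modeled with a small stand-alone C sketch.  This is a toy under stated
assumptions, not kernel code: every toy_* name is hypothetical, the VMCB is
reduced to a single "AVIC enabled" bit, and the vCPU is assumed to start with
apicv_active speculatively set, as the changelog says kvm_create_lapic() does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins; none of these are the real KVM structures. */
struct toy_vm {
	unsigned long inhibit_reasons;	/* non-zero => APICv inhibited for the VM */
};

struct toy_vcpu {
	struct toy_vm *vm;
	bool apicv_active;		/* per-vCPU status, cf. kvm_vcpu_apicv_active() */
	bool vmcb_avic_enabled;		/* models AVIC_ENABLE_MASK in the VMCB */
};

/* VM-level view, cf. kvm_apicv_activated(). */
static bool toy_vm_apicv_activated(const struct toy_vm *vm)
{
	return vm->inhibit_reasons == 0;
}

/* Pre-patch behavior: VMCB state follows the VM-level inhibit status. */
static void toy_init_vmcb_old(struct toy_vcpu *vcpu)
{
	vcpu->vmcb_avic_enabled = toy_vm_apicv_activated(vcpu->vm);
}

/* Patched behavior: VMCB state follows the vCPU's own APICv status. */
static void toy_init_vmcb_new(struct toy_vcpu *vcpu)
{
	vcpu->vmcb_avic_enabled = vcpu->apicv_active;
}

/* Models the "apicv_active == activate" elision in the update path. */
static void toy_update_apicv(struct toy_vcpu *vcpu)
{
	bool activate = toy_vm_apicv_activated(vcpu->vm);

	if (vcpu->apicv_active == activate)
		return;			/* treated as a nop: vendor code is skipped */

	vcpu->apicv_active = activate;
	vcpu->vmcb_avic_enabled = activate;
}

/*
 * A vCPU is created while the VM is inhibited, the inhibit is removed, and
 * only then is the deferred update processed.  Returns true if the VMCB ends
 * up consistent with the vCPU's APICv status.
 */
bool toy_scenario(bool use_new_init)
{
	struct toy_vm vm = { .inhibit_reasons = 1 };
	struct toy_vcpu vcpu = { .vm = &vm, .apicv_active = true };

	assert(!toy_vm_apicv_activated(&vm));

	if (use_new_init)
		toy_init_vmcb_new(&vcpu);
	else
		toy_init_vmcb_old(&vcpu);

	vm.inhibit_reasons = 0;		/* inhibit removed before the first run */
	toy_update_apicv(&vcpu);	/* deferred update, elided in both cases */

	return vcpu.vmcb_avic_enabled == vcpu.apicv_active;
}
```

In this model, toy_scenario(false) leaves the VMCB with AVIC disabled while
the vCPU believes APICv is active, whereas toy_scenario(true) stays
consistent because initialization and the elision check use the same state.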



* [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  2026-02-03 19:07 [PATCH 0/2] KVM: SVM: Fix CR8 interception woes with AVIC Sean Christopherson
  2026-02-03 19:07 ` [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC Sean Christopherson
@ 2026-02-03 19:07 ` Sean Christopherson
  2026-02-05  4:22   ` Jim Mattson
                     ` (2 more replies)
  2026-03-05 17:07 ` [PATCH 0/2] KVM: SVM: Fix CR8 interception woes with AVIC Sean Christopherson
  2 siblings, 3 replies; 25+ messages in thread
From: Sean Christopherson @ 2026-02-03 19:07 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Naveen N Rao,
	Maciej S . Szmigiero

Explicitly set/clear CR8 write interception when AVIC is (de)activated to
fix a bug where KVM leaves the interception enabled after AVIC is
activated.  E.g. if KVM emulates INIT=>WFS while AVIC is deactivated, CR8
will remain intercepted in perpetuity.

On its own, the dangling CR8 intercept is "just" a performance issue, but
combined with the TPR sync bug fixed by commit d02e48830e3f ("KVM: SVM:
Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active"), the dangling
intercept is fatal to Windows guests as the TPR seen by hardware gets
wildly out of sync with reality.

Note, VMX isn't affected by the bug as TPR_THRESHOLD is explicitly ignored
when Virtual Interrupt Delivery is enabled, i.e. when APICv is active in
KVM's world.  I.e. there's no need to trigger update_cr8_intercept(), this
is firmly an SVM implementation flaw/detail.

WARN if KVM gets a CR8 write #VMEXIT while AVIC is active, as KVM should
never enter the guest with AVIC enabled and CR8 writes intercepted.

Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC")
Cc: stable@vger.kernel.org
Cc: Jim Mattson <jmattson@google.com>
Cc: Naveen N Rao (AMD) <naveen@kernel.org>
Cc: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c | 6 ++++--
 arch/x86/kvm/svm/svm.c  | 9 +++++----
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 44e07c27b190..13a4a8949aba 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -189,12 +189,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
 	struct kvm_vcpu *vcpu = &svm->vcpu;
 
 	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
-
 	vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
 	vmcb->control.avic_physical_id |= avic_get_max_physical_id(vcpu);
-
 	vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
 
+	svm_clr_intercept(svm, INTERCEPT_CR8_WRITE);
+
 	/*
 	 * Note: KVM supports hybrid-AVIC mode, where KVM emulates x2APIC MSR
 	 * accesses, while interrupt injection to a running vCPU can be
@@ -226,6 +226,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
 	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
 	vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
 
+	svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
+
 	/*
 	 * If running nested and the guest uses its own MSR bitmap, there
 	 * is no need to update L0's msr bitmap
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e8313fdc5465..aa3ab22215f5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1077,8 +1077,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
 	svm_set_intercept(svm, INTERCEPT_CR0_WRITE);
 	svm_set_intercept(svm, INTERCEPT_CR3_WRITE);
 	svm_set_intercept(svm, INTERCEPT_CR4_WRITE);
-	if (!kvm_vcpu_apicv_active(vcpu))
-		svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
+	svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
 
 	set_dr_intercepts(svm);
 
@@ -2674,9 +2673,11 @@ static int dr_interception(struct kvm_vcpu *vcpu)
 
 static int cr8_write_interception(struct kvm_vcpu *vcpu)
 {
-	int r;
-
 	u8 cr8_prev = kvm_get_cr8(vcpu);
+	int r;
+
+	WARN_ON_ONCE(kvm_vcpu_apicv_active(vcpu));
+
 	/* instruction emulation calls kvm_set_cr8() */
 	r = cr_interception(vcpu);
 	if (lapic_in_kernel(vcpu))
-- 
2.53.0.rc2.204.g2597b5adb4-goog
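
The invariant this patch establishes (CR8 writes are intercepted if and only
if AVIC is deactivated) can be illustrated with a stand-alone C sketch.  It is
a toy model, not kernel code: the toy_* names are hypothetical, the VMCB is
reduced to two booleans, and the sketch assumes AVIC is enabled in KVM and the
VM has an in-kernel APIC, so that the AVIC init path always runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the handful of per-vCPU bits that matter here. */
struct toy_vcpu {
	bool apicv_active;		/* cf. kvm_vcpu_apicv_active() */
	bool avic_enabled;		/* models AVIC_ENABLE_MASK in int_ctl */
	bool cr8_write_intercepted;	/* models INTERCEPT_CR8_WRITE */
};

/* Activation clears the intercept, mirroring the svm_clr_intercept() hunk. */
static void toy_avic_activate(struct toy_vcpu *v)
{
	v->avic_enabled = true;
	v->cr8_write_intercepted = false;
}

/* Deactivation sets the intercept, mirroring the svm_set_intercept() hunk. */
static void toy_avic_deactivate(struct toy_vcpu *v)
{
	v->avic_enabled = false;
	v->cr8_write_intercepted = true;
}

/* Init sets the intercept unconditionally, then defers to (de)activation. */
static void toy_init_vmcb(struct toy_vcpu *v)
{
	v->cr8_write_intercepted = true;

	if (v->apicv_active)
		toy_avic_activate(v);
	else
		toy_avic_deactivate(v);
}

/* Models a later APICv status change being pushed into the VMCB. */
static void toy_set_apicv(struct toy_vcpu *v, bool activate)
{
	v->apicv_active = activate;

	if (activate)
		toy_avic_activate(v);
	else
		toy_avic_deactivate(v);
}

/* The condition the new WARN_ON_ONCE() would trip on at a CR8 write exit. */
static bool toy_cr8_exit_would_warn(const struct toy_vcpu *v)
{
	return v->cr8_write_intercepted && v->apicv_active;
}

/* INIT is emulated while AVIC is deactivated, then AVIC is activated. */
bool toy_init_then_activate_is_clean(void)
{
	struct toy_vcpu v = { .apicv_active = false };

	toy_init_vmcb(&v);
	assert(v.cr8_write_intercepted);

	toy_set_apicv(&v, true);

	return !v.cr8_write_intercepted && !toy_cr8_exit_would_warn(&v);
}
```

Tying the intercept to the activate/deactivate helpers means no other path
has to remember to reconcile it, which is the property the WARN then checks.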



* Re: [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC
  2026-02-03 19:07 ` [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC Sean Christopherson
@ 2026-02-05  4:21   ` Jim Mattson
  2026-02-06 14:00   ` Naveen N Rao
  1 sibling, 0 replies; 25+ messages in thread
From: Jim Mattson @ 2026-02-05  4:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Naveen N Rao,
	Maciej S . Szmigiero

On Tue, Feb 3, 2026 at 11:07 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Initialize all per-vCPU AVIC control fields in the VMCB if AVIC is enabled
> in KVM and the VM has an in-kernel local APIC, i.e. if it's _possible_ the
> vCPU could activate AVIC at any point in its lifecycle.  Configuring the
> VMCB if and only if AVIC is active "works" purely because of optimizations
> in kvm_create_lapic() to speculatively set apicv_active if AVIC is enabled
> *and* to defer updates until the first KVM_RUN.  In quotes because KVM
> likely won't do the right thing if kvm_apicv_activated() is false, i.e. if
> a vCPU is created while APICv is inhibited at the VM level for whatever
> reason.  E.g. if the inhibit is *removed* before KVM_REQ_APICV_UPDATE is
> handled in KVM_RUN, then __kvm_vcpu_update_apicv() will elide calls to
> vendor code due to seeing "apicv_active == activate".
>
> Cleaning up the initialization code will also allow fixing a bug where KVM
> incorrectly leaves CR8 interception enabled when AVIC is activated without
> creating a mess with respect to whether AVIC is activated or not.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>


* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  2026-02-03 19:07 ` [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated Sean Christopherson
@ 2026-02-05  4:22   ` Jim Mattson
  2026-02-06 17:11   ` Naveen N Rao
  2026-03-10 15:41   ` Aithal, Srikanth
  2 siblings, 0 replies; 25+ messages in thread
From: Jim Mattson @ 2026-02-05  4:22 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Naveen N Rao,
	Maciej S . Szmigiero

On Tue, Feb 3, 2026 at 11:07 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Explicitly set/clear CR8 write interception when AVIC is (de)activated to
> fix a bug where KVM leaves the interception enabled after AVIC is
> activated.  E.g. if KVM emulates INIT=>WFS while AVIC is deactivated, CR8
> will remain intercepted in perpetuity.
>
> On its own, the dangling CR8 intercept is "just" a performance issue, but
> combined with the TPR sync bug fixed by commit d02e48830e3f ("KVM: SVM:
> Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active"), the dangling
> intercept is fatal to Windows guests as the TPR seen by hardware gets
> wildly out of sync with reality.
>
> Note, VMX isn't affected by the bug as TPR_THRESHOLD is explicitly ignored
> when Virtual Interrupt Delivery is enabled, i.e. when APICv is active in
> KVM's world.  I.e. there's no need to trigger update_cr8_intercept(), this
> is firmly an SVM implementation flaw/detail.
>
> WARN if KVM gets a CR8 write #VMEXIT while AVIC is active, as KVM should
> never enter the guest with AVIC enabled and CR8 writes intercepted.
>
> Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC")
> Cc: stable@vger.kernel.org
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Naveen N Rao (AMD) <naveen@kernel.org>
> Cc: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>


* Re: [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC
  2026-02-03 19:07 ` [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC Sean Christopherson
  2026-02-05  4:21   ` Jim Mattson
@ 2026-02-06 14:00   ` Naveen N Rao
  2026-02-06 18:17     ` Sean Christopherson
  1 sibling, 1 reply; 25+ messages in thread
From: Naveen N Rao @ 2026-02-06 14:00 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Jim Mattson,
	Maciej S . Szmigiero

On Tue, Feb 03, 2026 at 11:07:09AM -0800, Sean Christopherson wrote:
> Initialize all per-vCPU AVIC control fields in the VMCB if AVIC is enabled
> in KVM and the VM has an in-kernel local APIC, i.e. if it's _possible_ the
> vCPU could activate AVIC at any point in its lifecycle.  Configuring the
> VMCB if and only if AVIC is active "works" purely because of optimizations
> in kvm_create_lapic() to speculatively set apicv_active if AVIC is enabled
> *and* to defer updates until the first KVM_RUN.  In quotes because KVM

I think it will be good to clarify that two issues are being addressed 
here (it wasn't clear to me to begin with):
- One, described above, is about calling into avic_init_vmcb() 
  regardless of the vCPU APICv status.
- Two, described below is about using the vCPU APICv status for init and 
  not consulting the VM-level APICv inhibit status.

> likely won't do the right thing if kvm_apicv_activated() is false, i.e. if
> a vCPU is created while APICv is inhibited at the VM level for whatever
> reason.  E.g. if the inhibit is *removed* before KVM_REQ_APICV_UPDATE is
> handled in KVM_RUN, then __kvm_vcpu_update_apicv() will elide calls to
> vendor code due to seeing "apicv_active == activate".
> 
> Cleaning up the initialization code will also allow fixing a bug where KVM
> incorrectly leaves CR8 interception enabled when AVIC is activated without
> creating a mess with respect to whether AVIC is activated or not.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Any reason not to add a Fixes: tag? It looks like the below commits are 
to blame, but those are really old so I understand if you don't think 
this is useful:
Fixes: 67034bb9dd5e ("KVM: SVM: Add irqchip_split() checks before enabling AVIC")
Fixes: 6c3e4422dd20 ("svm: Add support for dynamic APICv")

Other than that:
Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>

> ---
>  arch/x86/kvm/svm/avic.c | 2 +-
>  arch/x86/kvm/svm/svm.c  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index f92214b1a938..44e07c27b190 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -368,7 +368,7 @@ void avic_init_vmcb(struct vcpu_svm *svm, struct vmcb *vmcb)
>  	vmcb->control.avic_physical_id = __sme_set(__pa(kvm_svm->avic_physical_id_table));
>  	vmcb->control.avic_vapic_bar = APIC_DEFAULT_PHYS_BASE;
>  
> -	if (kvm_apicv_activated(svm->vcpu.kvm))
> +	if (kvm_vcpu_apicv_active(&svm->vcpu))
>  		avic_activate_vmcb(svm);
>  	else
>  		avic_deactivate_vmcb(svm);
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 5f0136dbdde6..e8313fdc5465 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1189,7 +1189,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
>  	if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS))
>  		svm->vmcb->control.erap_ctl |= ERAP_CONTROL_ALLOW_LARGER_RAP;
>  
> -	if (kvm_vcpu_apicv_active(vcpu))
> +	if (enable_apicv && irqchip_in_kernel(vcpu->kvm))
>  		avic_init_vmcb(svm, vmcb);

Doesn't have to be done as part of this series, but I'm wondering if it 
makes sense to turn this into a helper to clarify the intent and to make 
it more obvious:

---
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e441f270f354..4e0ec4bf0db6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2289,6 +2289,7 @@ gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva,
 gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
                                struct x86_exception *exception);

+bool kvm_apicv_possible(struct kvm *kvm);
 bool kvm_apicv_activated(struct kvm *kvm);
 bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu);
 void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 13a4a8949aba..f7b1271cea88 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -285,7 +285,7 @@ int avic_alloc_physical_id_table(struct kvm *kvm)
 {
        struct kvm_svm *kvm_svm = to_kvm_svm(kvm);

-       if (!irqchip_in_kernel(kvm) || !enable_apicv)
+       if (!kvm_apicv_possible(kvm))
                return 0;

        if (kvm_svm->avic_physical_id_table)
@@ -839,7 +839,7 @@ int avic_init_vcpu(struct vcpu_svm *svm)
        INIT_LIST_HEAD(&svm->ir_list);
        raw_spin_lock_init(&svm->ir_list_lock);

-       if (!enable_apicv || !irqchip_in_kernel(vcpu->kvm))
+       if (!kvm_apicv_possible(vcpu->kvm))
                return 0;

        ret = avic_init_backing_page(vcpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4115fe583052..b964d834512e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1188,7 +1188,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
        if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS))
                svm->vmcb->control.erap_ctl |= ERAP_CONTROL_ALLOW_LARGER_RAP;

-       if (enable_apicv && irqchip_in_kernel(vcpu->kvm))
+       if (kvm_apicv_possible(vcpu->kvm))
                avic_init_vmcb(svm, vmcb);

        if (vnmi)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8acfdfc583a1..86f99c5b831a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10270,6 +10270,12 @@ static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
        kvm_irq_delivery_to_apic(kvm, NULL, &lapic_irq);
 }

+bool kvm_apicv_possible(struct kvm *kvm)
+{
+       return enable_apicv && irqchip_in_kernel(kvm);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apicv_possible);
+
 bool kvm_apicv_activated(struct kvm *kvm)
 {
	return (READ_ONCE(kvm->arch.apicv_inhibit_reasons) == 0);


- Naveen
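
As a sanity check on the conversion above, the open-coded early-return
conditions should be exactly the negation of the proposed helper.  The
following stand-alone C sketch checks that over all four input combinations.
It is only an illustration: the toy_* names, the one-field struct, and the
global flag are hypothetical stand-ins for enable_apicv and
irqchip_in_kernel().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct kvm; only the bit the helper looks at. */
struct toy_kvm {
	bool irqchip_in_kernel;
};

/* Toy stand-in for the enable_apicv module-level flag. */
bool toy_enable_apicv;

/* Shape of the proposed helper. */
bool toy_apicv_possible(const struct toy_kvm *kvm)
{
	return toy_enable_apicv && kvm->irqchip_in_kernel;
}

/* Shape of the open-coded "bail early" checks being replaced. */
bool toy_open_coded_bail(const struct toy_kvm *kvm)
{
	return !kvm->irqchip_in_kernel || !toy_enable_apicv;
}

/* Asserts equivalence for every input; returns the number of cases checked. */
int toy_check_helper_equivalence(void)
{
	int checked = 0;

	for (int e = 0; e <= 1; e++) {
		for (int i = 0; i <= 1; i++) {
			struct toy_kvm kvm = { .irqchip_in_kernel = (i != 0) };

			toy_enable_apicv = (e != 0);
			assert(toy_open_coded_bail(&kvm) == !toy_apicv_possible(&kvm));
			checked++;
		}
	}

	return checked;
}
```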



* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  2026-02-03 19:07 ` [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated Sean Christopherson
  2026-02-05  4:22   ` Jim Mattson
@ 2026-02-06 17:11   ` Naveen N Rao
  2026-02-06 17:55     ` Sean Christopherson
  2026-03-10 15:41   ` Aithal, Srikanth
  2 siblings, 1 reply; 25+ messages in thread
From: Naveen N Rao @ 2026-02-06 17:11 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Jim Mattson,
	Maciej S . Szmigiero

On Tue, Feb 03, 2026 at 11:07:10AM -0800, Sean Christopherson wrote:
> Explicitly set/clear CR8 write interception when AVIC is (de)activated to
> fix a bug where KVM leaves the interception enabled after AVIC is
> activated.  E.g. if KVM emulates INIT=>WFS while AVIC is deactivated, CR8
> will remain intercepted in perpetuity.

Looking at svm_update_cr8_intercept(), I suppose this could also more 
commonly happen whenever AVIC is inhibited (IRQ Windows, as an example)?

> 
> On its own, the dangling CR8 intercept is "just" a performance issue, but
> combined with the TPR sync bug fixed by commit d02e48830e3f ("KVM: SVM:
> Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active"), the dangling
> intercept is fatal to Windows guests as the TPR seen by hardware gets
> wildly out of sync with reality.
> 
> Note, VMX isn't affected by the bug as TPR_THRESHOLD is explicitly ignored
> when Virtual Interrupt Delivery is enabled, i.e. when APICv is active in
> KVM's world.  I.e. there's no need to trigger update_cr8_intercept(), this
> is firmly an SVM implementation flaw/detail.
> 
> WARN if KVM gets a CR8 write #VMEXIT while AVIC is active, as KVM should
> never enter the guest with AVIC enabled and CR8 writes intercepted.
> 
> Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC")
> Cc: stable@vger.kernel.org
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Naveen N Rao (AMD) <naveen@kernel.org>
> Cc: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/avic.c | 6 ++++--
>  arch/x86/kvm/svm/svm.c  | 9 +++++----
>  2 files changed, 9 insertions(+), 6 deletions(-)

LGTM.
Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>


Thanks,
Naveen



* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  2026-02-06 17:11   ` Naveen N Rao
@ 2026-02-06 17:55     ` Sean Christopherson
  0 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2026-02-06 17:55 UTC (permalink / raw)
  To: Naveen N Rao
  Cc: Paolo Bonzini, kvm, linux-kernel, Jim Mattson,
	Maciej S . Szmigiero

On Fri, Feb 06, 2026, Naveen N Rao wrote:
> On Tue, Feb 03, 2026 at 11:07:10AM -0800, Sean Christopherson wrote:
> > Explicitly set/clear CR8 write interception when AVIC is (de)activated to
> > fix a bug where KVM leaves the interception enabled after AVIC is
> > activated.  E.g. if KVM emulates INIT=>WFS while AVIC is deactivated, CR8
> > will remain intercepted in perpetuity.
> 
> Looking at svm_update_cr8_intercept(), I suppose this could also more 
> commonly happen whenever AVIC is inhibited (IRQ Windows, as an example)?

Maybe?  I don't think it's actually common in practice, because the bug requires
the source of the inhibition to be removed while the vCPU still has a pending IRQ
that is below PPR.  That is definitely possible, but it seems unlikely overall,
and it'd also be self-healing to some extent.  E.g. if a workload is triggering
ExtINT, then odds are good it's going to _keep_ generating ExtINT, keep toggling
the inhibit, and thus reconcile CR8 interception every time AVIC is inhibited.
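
The window described above can be sketched as a stand-alone C toy.  It rests
on assumptions that are not spelled out in this thread: that the pre-patch
update path skips reconciling the intercept while APICv is active, and that
it otherwise intercepts CR8 writes when the TPR is at or above the highest
pending priority class.  The toy_* names are hypothetical, and the real
svm_update_cr8_intercept() has additional checks that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the two bits of per-vCPU state involved. */
struct toy_vcpu {
	bool apicv_active;
	bool cr8_write_intercepted;
};

/*
 * Simplified pre-patch update: nothing is reconciled while APICv is active.
 * max_irr is the highest pending priority class, or -1 if nothing is pending.
 */
void toy_update_cr8_intercept(struct toy_vcpu *v, int tpr, int max_irr)
{
	assert(max_irr >= -1);

	if (v->apicv_active)
		return;

	v->cr8_write_intercepted = false;

	/* A pending IRQ masked by the TPR: watch for the guest lowering it. */
	if (max_irr != -1 && tpr >= max_irr)
		v->cr8_write_intercepted = true;
}

/*
 * The intercept is computed while AVIC is inhibited, then the inhibit is
 * removed.  Pre-patch, activation left the intercept bit alone, so whatever
 * was set while inhibited is still set afterwards.
 */
bool toy_dangling_after_uninhibit(int tpr, int max_irr)
{
	struct toy_vcpu v = { .apicv_active = false };

	toy_update_cr8_intercept(&v, tpr, max_irr);	/* while inhibited */
	v.apicv_active = true;				/* inhibit removed */
	toy_update_cr8_intercept(&v, tpr, max_irr);	/* skipped: APICv active */

	return v.cr8_write_intercepted;
}
```

In this model the intercept only dangles when an IRQ is pending and masked at
the moment the inhibit goes away, e.g. toy_dangling_after_uninhibit(8, 3); with
nothing pending, or with the TPR below the pending class, it comes back clear.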


* Re: [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC
  2026-02-06 14:00   ` Naveen N Rao
@ 2026-02-06 18:17     ` Sean Christopherson
  2026-02-09 10:23       ` Naveen N Rao
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2026-02-06 18:17 UTC (permalink / raw)
  To: Naveen N Rao
  Cc: Paolo Bonzini, kvm, linux-kernel, Jim Mattson,
	Maciej S . Szmigiero

On Fri, Feb 06, 2026, Naveen N Rao wrote:
> On Tue, Feb 03, 2026 at 11:07:09AM -0800, Sean Christopherson wrote:
> > Initialize all per-vCPU AVIC control fields in the VMCB if AVIC is enabled
> > in KVM and the VM has an in-kernel local APIC, i.e. if it's _possible_ the
> > vCPU could activate AVIC at any point in its lifecycle.  Configuring the
> > VMCB if and only if AVIC is active "works" purely because of optimizations
> > in kvm_create_lapic() to speculatively set apicv_active if AVIC is enabled
> > *and* to defer updates until the first KVM_RUN.  In quotes because KVM
> 
> I think it will be good to clarify that two issues are being addressed 
> here (it wasn't clear to me to begin with):
> - One, described above, is about calling into avic_init_vmcb() 
>   regardless of the vCPU APICv status.
> - Two, described below is about using the vCPU APICv status for init and 
>   not consulting the VM-level APICv inhibit status.

Yeah, I was worried the changelog didn't capture the second one well, but I was
struggling to come up with wording.  How about this as a penultimate paragraph?

  Note!  Use the vCPU's current APICv status when initializing the VMCB,
  not the VM-level inhibit status.  The state of the VMCB *must* be kept
  consistent with the vCPU's APICv status at all times (KVM elides updates
  that are supposed to be nops).  If the vCPU's APICv status isn't up-to-date
  with the VM-level status, then there is guaranteed to be a pending
  KVM_REQ_APICV_UPDATE, i.e. KVM will sync the vCPU with the VM before
  entering the guest.
 
> > likely won't do the right thing if kvm_apicv_activated() is false, i.e. if
> > a vCPU is created while APICv is inhibited at the VM level for whatever
> > reason.  E.g. if the inhibit is *removed* before KVM_REQ_APICV_UPDATE is
> > handled in KVM_RUN, then __kvm_vcpu_update_apicv() will elide calls to
> > vendor code due to seeing "apicv_active == activate".
> >
> > Cleaning up the initialization code will also allow fixing a bug where KVM
> > incorrectly leaves CR8 interception enabled when AVIC is activated without
> > creating a mess with respect to whether AVIC is activated or not.
> > 
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> 
> Any reason not to add a Fixes: tag?

Purely that I couldn't pin down exactly what commit(s) to blame.  Well, that's a
bit of a lie.  If I'm being 100% truthful, I got as far as commit 67034bb9dd5e
and decided I didn't care enough to spend the effort to figure out whether or not
that commit was truly to blame :-)

> It looks like the below commits are to blame, but those are really old so I
> understand if you don't think this is useful:
> Fixes: 67034bb9dd5e ("KVM: SVM: Add irqchip_split() checks before enabling AVIC")
> Fixes: 6c3e4422dd20 ("svm: Add support for dynamic APICv")

LGTM, I'll tack them on.

> Other than that:
> Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>

Thanks!  (Seriously, I really appreciate the in-depth reviews)

> > ---
> >  arch/x86/kvm/svm/avic.c | 2 +-
> >  arch/x86/kvm/svm/svm.c  | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index f92214b1a938..44e07c27b190 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -368,7 +368,7 @@ void avic_init_vmcb(struct vcpu_svm *svm, struct vmcb *vmcb)
> >  	vmcb->control.avic_physical_id = __sme_set(__pa(kvm_svm->avic_physical_id_table));
> >  	vmcb->control.avic_vapic_bar = APIC_DEFAULT_PHYS_BASE;
> >  
> > -	if (kvm_apicv_activated(svm->vcpu.kvm))
> > +	if (kvm_vcpu_apicv_active(&svm->vcpu))
> >  		avic_activate_vmcb(svm);
> >  	else
> >  		avic_deactivate_vmcb(svm);
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 5f0136dbdde6..e8313fdc5465 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -1189,7 +1189,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
> >  	if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS))
> >  		svm->vmcb->control.erap_ctl |= ERAP_CONTROL_ALLOW_LARGER_RAP;
> >  
> > -	if (kvm_vcpu_apicv_active(vcpu))
> > +	if (enable_apicv && irqchip_in_kernel(vcpu->kvm))
> >  		avic_init_vmcb(svm, vmcb);
> 
> Doesn't have to be done as part of this series, but I'm wondering if it 
> makes sense to turn this into a helper to clarify the intent and to make 
> it more obvious:

Hmm, yeah, though my only hesitation is the name.  For whatever reason, "possible"
makes me think "is APICv possible *right now*" (ignoring that I wrote exactly that
in the changelog).

What if we go with kvm_can_use_apicv()?  That would align with vmx_can_use_ipiv()
and vmx_can_use_vtd_pi(), which are pretty much identical in concept.


* Re: [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC
  2026-02-06 18:17     ` Sean Christopherson
@ 2026-02-09 10:23       ` Naveen N Rao
  2026-02-09 21:36         ` Sean Christopherson
  0 siblings, 1 reply; 25+ messages in thread
From: Naveen N Rao @ 2026-02-09 10:23 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Jim Mattson,
	Maciej S . Szmigiero

On Fri, Feb 06, 2026 at 10:17:29AM -0800, Sean Christopherson wrote:
> On Fri, Feb 06, 2026, Naveen N Rao wrote:
> > On Tue, Feb 03, 2026 at 11:07:09AM -0800, Sean Christopherson wrote:
> > > Initialize all per-vCPU AVIC control fields in the VMCB if AVIC is enabled
> > > in KVM and the VM has an in-kernel local APIC, i.e. if it's _possible_ the
> > > vCPU could activate AVIC at any point in its lifecycle.  Configuring the
> > > VMCB if and only if AVIC is active "works" purely because of optimizations
> > > in kvm_create_lapic() to speculatively set apicv_active if AVIC is enabled
> > > *and* to defer updates until the first KVM_RUN.  In quotes because KVM
> > 
> > I think it will be good to clarify that two issues are being addressed 
> > here (it wasn't clear to me to begin with):
> > - One, described above, is about calling into avic_init_vmcb() 
> >   regardless of the vCPU APICv status.
> > - Two, described below is about using the vCPU APICv status for init and 
> >   not consulting the VM-level APICv inhibit status.
> 
> Yeah, I was worried the changelog didn't capture the second one well, but I was
> struggling to come up with wording.  How about this as a penultimate paragraph?
> 
>   Note!  Use the vCPU's current APICv status when initializing the VMCB,
>   not the VM-level inhibit status.  The state of the VMCB *must* be kept
>   consistent with the vCPU's APICv status at all times (KVM elides updates
> >   that are supposed to be nops).  If the vCPU's APICv status isn't up-to-date
>   with the VM-level status, then there is guaranteed to be a pending
>   KVM_REQ_APICV_UPDATE, i.e. KVM will sync the vCPU with the VM before
>   entering the guest.

LGTM.

>  
> > > likely won't do the right thing if kvm_apicv_activated() is false, i.e. if
> > > a vCPU is created while APICv is inhibited at the VM level for whatever
> > > reason.  E.g. if the inhibit is *removed* before KVM_REQ_APICV_UPDATE is
> > > handled in KVM_RUN, then __kvm_vcpu_update_apicv() will elide calls to
> > > vendor code due to seeing "apicv_active == activate".
> > >
> > > Cleaning up the initialization code will also allow fixing a bug where KVM
> > > incorrectly leaves CR8 interception enabled when AVIC is activated without
> > > creating a mess with respect to whether AVIC is activated or not.
> > > 
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > 
> > Any reason not to add a Fixes: tag?
> 
> Purely that I couldn't pin down exactly what commit(s) to blame.  Well, that's a
> bit of a lie.  If I'm being 100% truthful, I got as far as commit 67034bb9dd5e
> and decided I didn't care enough to spend the effort to figure out whether or not
> that commit was truly to blame :-)
> 
> > It looks like the below commits are to blame, but those are really old so I
> > understand if you don't think this is useful:
> > Fixes: 67034bb9dd5e ("KVM: SVM: Add irqchip_split() checks before enabling AVIC")
> > Fixes: 6c3e4422dd20 ("svm: Add support for dynamic APICv")
> 
> LGTM, I'll tack them on.
> 
> > Other than that:
> > Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>
> 
> Thanks!  (Seriously, I really appreciate the in-depth reviews)

Glad to hear that!

> 
> > > ---
> > >  arch/x86/kvm/svm/avic.c | 2 +-
> > >  arch/x86/kvm/svm/svm.c  | 2 +-
> > >  2 files changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > > index f92214b1a938..44e07c27b190 100644
> > > --- a/arch/x86/kvm/svm/avic.c
> > > +++ b/arch/x86/kvm/svm/avic.c
> > > @@ -368,7 +368,7 @@ void avic_init_vmcb(struct vcpu_svm *svm, struct vmcb *vmcb)
> > >  	vmcb->control.avic_physical_id = __sme_set(__pa(kvm_svm->avic_physical_id_table));
> > >  	vmcb->control.avic_vapic_bar = APIC_DEFAULT_PHYS_BASE;
> > >  
> > > -	if (kvm_apicv_activated(svm->vcpu.kvm))
> > > +	if (kvm_vcpu_apicv_active(&svm->vcpu))
> > >  		avic_activate_vmcb(svm);
> > >  	else
> > >  		avic_deactivate_vmcb(svm);
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index 5f0136dbdde6..e8313fdc5465 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -1189,7 +1189,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
> > >  	if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS))
> > >  		svm->vmcb->control.erap_ctl |= ERAP_CONTROL_ALLOW_LARGER_RAP;
> > >  
> > > -	if (kvm_vcpu_apicv_active(vcpu))
> > > +	if (enable_apicv && irqchip_in_kernel(vcpu->kvm))
> > >  		avic_init_vmcb(svm, vmcb);
> > 
> > Doesn't have to be done as part of this series, but I'm wondering if it 
> > makes sense to turn this into a helper to clarify the intent and to make 
> > it more obvious:
> 
> Hmm, yeah, though my only hesitation is the name.  For whatever reason, "possible"
> makes me think "is APICv possible *right now*" (ignoring that I wrote exactly that
> in the changelog).
> 
> What if we go with kvm_can_use_apicv()?  That would align with vmx_can_use_ipiv()
> and vmx_can_use_vtd_pi(), which are pretty much identical in concept.

Yes, that's better. I'll use that and post it as a subsequent cleanup, 
unless you want to pick it up right away.


Thanks!
- Naveen


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC
  2026-02-09 10:23       ` Naveen N Rao
@ 2026-02-09 21:36         ` Sean Christopherson
  0 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2026-02-09 21:36 UTC (permalink / raw)
  To: Naveen N Rao
  Cc: Paolo Bonzini, kvm, linux-kernel, Jim Mattson,
	Maciej S . Szmigiero

On Mon, Feb 09, 2026, Naveen N Rao wrote:
> On Fri, Feb 06, 2026 at 10:17:29AM -0800, Sean Christopherson wrote:
> > > >  arch/x86/kvm/svm/avic.c | 2 +-
> > > >  arch/x86/kvm/svm/svm.c  | 2 +-
> > > >  2 files changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > > > index f92214b1a938..44e07c27b190 100644
> > > > --- a/arch/x86/kvm/svm/avic.c
> > > > +++ b/arch/x86/kvm/svm/avic.c
> > > > @@ -368,7 +368,7 @@ void avic_init_vmcb(struct vcpu_svm *svm, struct vmcb *vmcb)
> > > >  	vmcb->control.avic_physical_id = __sme_set(__pa(kvm_svm->avic_physical_id_table));
> > > >  	vmcb->control.avic_vapic_bar = APIC_DEFAULT_PHYS_BASE;
> > > >  
> > > > -	if (kvm_apicv_activated(svm->vcpu.kvm))
> > > > +	if (kvm_vcpu_apicv_active(&svm->vcpu))
> > > >  		avic_activate_vmcb(svm);
> > > >  	else
> > > >  		avic_deactivate_vmcb(svm);
> > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > > index 5f0136dbdde6..e8313fdc5465 100644
> > > > --- a/arch/x86/kvm/svm/svm.c
> > > > +++ b/arch/x86/kvm/svm/svm.c
> > > > @@ -1189,7 +1189,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
> > > >  	if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS))
> > > >  		svm->vmcb->control.erap_ctl |= ERAP_CONTROL_ALLOW_LARGER_RAP;
> > > >  
> > > > -	if (kvm_vcpu_apicv_active(vcpu))
> > > > +	if (enable_apicv && irqchip_in_kernel(vcpu->kvm))
> > > >  		avic_init_vmcb(svm, vmcb);
> > > 
> > > Doesn't have to be done as part of this series, but I'm wondering if it 
> > > makes sense to turn this into a helper to clarify the intent and to make 
> > > it more obvious:
> > 
> > Hmm, yeah, though my only hesitation is the name.  For whatever reason, "possible"
> > makes me think "is APICv possible *right now*" (ignoring that I wrote exactly that
> > in the changelog).
> > 
> > What if we go with kvm_can_use_apicv()?  That would align with vmx_can_use_ipiv()
> > and vmx_can_use_vtd_pi(), which are pretty much identical in concept.
> 
> Yes, that's better. I'll use that and post it as a subsequent cleanup, 
> unless you want to pick it up right away.

Go ahead and post it separately, it's nice to have a proper paper trail.
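
For reference, a minimal standalone sketch of the helper discussed above.
Only the name kvm_can_use_apicv() and the guarding condition come from this
thread; the signature and the toy_* types below are assumptions made purely
for illustration, and the eventual cleanup may look different.

```c
#include <stdbool.h>

/* Toy stand-ins for kernel state; these are NOT the real KVM types. */
struct toy_kvm {
	bool irqchip_in_kernel;	/* models irqchip_in_kernel(kvm) */
};

struct toy_vcpu {
	struct toy_kvm *kvm;
	bool apicv_active;	/* models kvm_vcpu_apicv_active(vcpu) */
};

/* Models the enable_apicv knob (hypothetical placement as a global). */
static bool toy_enable_apicv = true;

/*
 * Hypothetical helper: "could APICv/AVIC ever be used by this vCPU?", as
 * opposed to "is APICv active right now?".  Mirrors the condition that
 * guards avic_init_vmcb() in patch 1/2, and deliberately ignores the
 * vCPU's current apicv_active state.
 */
static bool toy_kvm_can_use_apicv(const struct toy_vcpu *vcpu)
{
	return toy_enable_apicv && vcpu->kvm->irqchip_in_kernel;
}
```

The point of the separate helper is that a vCPU whose APICv is currently
inhibited still answers "yes" here, so the AVIC VMCB fields get initialized
either way.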

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/2] KVM: SVM: Fix CR8 intercpetion woes with AVIC
  2026-02-03 19:07 [PATCH 0/2] KVM: SVM: Fix CR8 intercpetion woes with AVIC Sean Christopherson
  2026-02-03 19:07 ` [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC Sean Christopherson
  2026-02-03 19:07 ` [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated Sean Christopherson
@ 2026-03-05 17:07 ` Sean Christopherson
  2 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2026-03-05 17:07 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Naveen N Rao,
	Maciej S . Szmigiero

On Tue, 03 Feb 2026 11:07:08 -0800, Sean Christopherson wrote:
> Fix a bug (or rather, a class of bugs) where SVM leaves the CR8 write
> intercept enabled after AVIC is enabled.  On its own, the dangling CR8
> intercept is "just" a performance issue.  But combined with the TPR sync bug
> fixed by commit d02e48830e3f ("KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR
> even if AVIC is active"), the dangling intercept is fatal to Windows guests as
> the TPR seen by hardware gets wildly out of sync with reality.
> 
> [...]

Applied to kvm-x86 fixes, thanks!

[1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC
      https://github.com/kvm-x86/linux/commit/9071d0eb6955
[2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
      https://github.com/kvm-x86/linux/commit/e992bf67bcba

--
https://github.com/kvm-x86/linux/tree/next

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  2026-02-03 19:07 ` [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated Sean Christopherson
  2026-02-05  4:22   ` Jim Mattson
  2026-02-06 17:11   ` Naveen N Rao
@ 2026-03-10 15:41   ` Aithal, Srikanth
  2026-03-10 17:17     ` Sean Christopherson
  2 siblings, 1 reply; 25+ messages in thread
From: Aithal, Srikanth @ 2026-03-10 15:41 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Naveen N Rao,
	Maciej S . Szmigiero


Hello Sean,

From next-20260304 onwards [1], including the recent next-20260309 kernel,
booting an SEV-ES guest on AMD EPYC Turin and AMD EPYC
Genoa has been failing. However, on EPYC Milan, the SEV-ES guest boots 
fine.

I am using the same QEMU command line (given below) with the same 
versions of QEMU and OVMF on all three platforms.

"$QEMU_BIN" \
-machine q35,confidential-guest-support=sev0,vmport=off \
-object sev-guest,id=sev0,policy=0x5,cbitpos=51,reduced-phys-bits=1 \
-name guest=vm,debug-threads=on \
-drive if=pflash,format=raw,unit=0,file="$OVMF_PATH",readonly=on \
-m 2048 \
-object memory-backend-ram,size=2048M,id=mem-machine_mem \
-smp 1,maxcpus=1,cores=1,threads=1,dies=1,sockets=1 \
-cpu host \
-drive id=disk0,file="$DISK_IMAGE",format=qcow2,if=none \
-device virtio-scsi-pci,id=scsi0,disable-legacy=on,iommu_platform=true \
-device scsi-hd,drive=disk0 \
-enable-kvm \
-nographic \
-monitor tcp:localhost:4444,server,nowait

QEMU version: v10.2.1
OVMF version: edk2-stable202602

The SEV-ES guest crashes with the following QEMU trace:
error: kvm run failed Invalid argument
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00a10f10
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=30 17 4d 99 a6 74 ad 5a a1 1d d2 22 78 9f 73 25 ab 00 2f c0 <cd> d3 ee 26 63 0d f5 de f3 ea c3 91 28 ba b5 ac ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

KVM host serial log message that appears when the crash happens:

[ 4379.695497] kvm_amd: kvm [5809]: vcpu0, guest rIP: 0x0 vmgexit: unsupported event - exit_info_1=0x18, exit_info_2=0x0

Bisecting shows that this commit is the first bad one. When I revert it, 
I am able to boot the SEV-ES guest successfully on both Turin and Genoa 
platforms:

e992bf67bcbab07a7f59963b2c4ed32ef65c8431 is the first bad commit
commit e992bf67bcbab07a7f59963b2c4ed32ef65c8431
Author: Sean Christopherson <seanjc@google.com>
Date:   Tue Feb 3 11:07:10 2026 -0800

     KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git, next-20260304

I will be happy to provide any further information required. Thank you.

Srikanth Aithal <sraithal@amd.com>

On 2/4/2026 12:37 AM, Sean Christopherson wrote:
> Explicitly set/clear CR8 write interception when AVIC is (de)activated to
> fix a bug where KVM leaves the interception enabled after AVIC is
> activated.  E.g. if KVM emulates INIT=>WFS while AVIC is deactivated, CR8
> will remain intercepted in perpetuity.
> 
> On its own, the dangling CR8 intercept is "just" a performance issue, but
> combined with the TPR sync bug fixed by commit d02e48830e3f ("KVM: SVM:
> Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active"), the dangling
> intercept is fatal to Windows guests as the TPR seen by hardware gets
> wildly out of sync with reality.
> 
> Note, VMX isn't affected by the bug as TPR_THRESHOLD is explicitly ignored
> when Virtual Interrupt Delivery is enabled, i.e. when APICv is active in
> KVM's world.  I.e. there's no need to trigger update_cr8_intercept(), this
> is firmly an SVM implementation flaw/detail.
> 
> WARN if KVM gets a CR8 write #VMEXIT while AVIC is active, as KVM should
> never enter the guest with AVIC enabled and CR8 writes intercepted.
> 
> Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC")
> Cc: stable@vger.kernel.org
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Naveen N Rao (AMD) <naveen@kernel.org>
> Cc: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/svm/avic.c | 6 ++++--
>   arch/x86/kvm/svm/svm.c  | 9 +++++----
>   2 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 44e07c27b190..13a4a8949aba 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -189,12 +189,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
>   	struct kvm_vcpu *vcpu = &svm->vcpu;
>   
>   	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
> -
>   	vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
>   	vmcb->control.avic_physical_id |= avic_get_max_physical_id(vcpu);
> -
>   	vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
>   
> +	svm_clr_intercept(svm, INTERCEPT_CR8_WRITE);
> +
>   	/*
>   	 * Note: KVM supports hybrid-AVIC mode, where KVM emulates x2APIC MSR
>   	 * accesses, while interrupt injection to a running vCPU can be
> @@ -226,6 +226,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
>   	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
>   	vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
>   
> +	svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
> +
>   	/*
>   	 * If running nested and the guest uses its own MSR bitmap, there
>   	 * is no need to update L0's msr bitmap
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index e8313fdc5465..aa3ab22215f5 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1077,8 +1077,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
>   	svm_set_intercept(svm, INTERCEPT_CR0_WRITE);
>   	svm_set_intercept(svm, INTERCEPT_CR3_WRITE);
>   	svm_set_intercept(svm, INTERCEPT_CR4_WRITE);
> -	if (!kvm_vcpu_apicv_active(vcpu))
> -		svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
> +	svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
>   
>   	set_dr_intercepts(svm);
>   
> @@ -2674,9 +2673,11 @@ static int dr_interception(struct kvm_vcpu *vcpu)
>   
>   static int cr8_write_interception(struct kvm_vcpu *vcpu)
>   {
> -	int r;
> -
>   	u8 cr8_prev = kvm_get_cr8(vcpu);
> +	int r;
> +
> +	WARN_ON_ONCE(kvm_vcpu_apicv_active(vcpu));
> +
>   	/* instruction emulation calls kvm_set_cr8() */
>   	r = cr_interception(vcpu);
>   	if (lapic_in_kernel(vcpu))


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  2026-03-10 15:41   ` Aithal, Srikanth
@ 2026-03-10 17:17     ` Sean Christopherson
  2026-03-10 17:36       ` Tom Lendacky
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2026-03-10 17:17 UTC (permalink / raw)
  To: Srikanth Aithal
  Cc: Paolo Bonzini, kvm, linux-kernel, Jim Mattson, Naveen N Rao,
	Maciej S . Szmigiero

On Tue, Mar 10, 2026, Srikanth Aithal wrote:
> 
> Hello Sean,
> 
> From next-20260304 onwards [1], including recent next kernel next-20260309,
> booting an SEV-ES guest on AMD EPYC Turin and AMD EPYC Genoa has been
> failing. However, on EPYC Milan, the SEV-ES guest boots fine.

...

> Bisecting shows that this commit is the first bad one. When I revert it, I
> am able to boot the SEV-ES guest successfully on both Turin and Genoa
> platforms:
> 
> e992bf67bcbab07a7f59963b2c4ed32ef65c8431 is the first bad commit
> commit e992bf67bcbab07a7f59963b2c4ed32ef65c8431
> Author: Sean Christopherson <seanjc@google.com>
> Date:   Tue Feb 3 11:07:10 2026 -0800

Gah, I hate how KVM manages intercepts for SEV-ES+.  Though to a large extent I
blame the architecture for not simply making CR{0,4,8} intercept trap-like.
Side topic, is the host actually allowed to trap CR3 writes?  That seems like a
huge gaping security flaw, especially for SNP+.

Anyways, this should fix the immediate problem.

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 33172f0e986b..b6072872b785 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -237,7 +237,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
        vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
        vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
 
-       svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
+       if (!sev_es_guest(svm->vcpu.kvm))
+               svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
 
        /*
         * If running nested and the guest uses its own MSR bitmap, there

Argh!  The more I look at this code, the more frustrated I get.  The unconditional
setting of TRAP_CR8_WRITE for SEV-ES+ is flawed.  When AVIC is enabled, KVM doesn't
need to trap CR8 writes because hardware will update the backing page.  I'm guessing
Windows doesn't support running as an SEV-ES guest, which is why no one has noticed.

Actually, it's worse than that.  sync_cr8_to_lapic() will straight up clobber the
backing page.  Presumably hardware never actually uses TPR from the AVIC backing
page, but it's still gross.  sync_lapic_to_cr8() is also beyond useless.

And all of the sync code should pivot on guest_state_protected, not sev_es_guest().

For now, I'll just post the above (assuming it fixes the issue).  But this code
needs some love sooner than later.
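
The guard in the diff above can be modeled outside the kernel as follows.
This is a standalone sketch under stated assumptions, not KVM code: the
toy_* names and the bit position are invented, and the only behavior taken
from the thread is that activation clears the CR8 write intercept while
deactivation sets it again for everything except SEV-ES guests (which,
per the discussion, rely on TRAP_CR8_WRITE instead).

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_INTERCEPT_CR8_WRITE	(1u << 0)	/* made-up bit position */

/* Toy stand-in for the relevant per-vCPU SVM state. */
struct toy_svm {
	bool sev_es;		/* models sev_es_guest(kvm) */
	uint32_t intercepts;	/* models the VMCB intercept vector */
};

/* AVIC active: hardware handles CR8 writes, so no intercept is needed. */
static void toy_avic_activate(struct toy_svm *svm)
{
	svm->intercepts &= ~TOY_INTERCEPT_CR8_WRITE;
}

/* AVIC inactive: intercept CR8 writes, except for SEV-ES guests. */
static void toy_avic_deactivate(struct toy_svm *svm)
{
	if (!svm->sev_es)
		svm->intercepts |= TOY_INTERCEPT_CR8_WRITE;
}

static bool toy_cr8_write_intercepted(const struct toy_svm *svm)
{
	return svm->intercepts & TOY_INTERCEPT_CR8_WRITE;
}
```

With this shape, a deactivate call on an SEV-ES vCPU leaves the intercept
bit untouched, which is the property the reported boot failure depends on.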

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  2026-03-10 17:17     ` Sean Christopherson
@ 2026-03-10 17:36       ` Tom Lendacky
  2026-03-10 17:48         ` Naveen N Rao
  0 siblings, 1 reply; 25+ messages in thread
From: Tom Lendacky @ 2026-03-10 17:36 UTC (permalink / raw)
  To: Sean Christopherson, Srikanth Aithal
  Cc: Paolo Bonzini, kvm, linux-kernel, Jim Mattson, Naveen N Rao,
	Maciej S . Szmigiero

On 3/10/26 12:17, Sean Christopherson wrote:
> On Tue, Mar 10, 2026, Srikanth Aithal wrote:
>>
>> Hello Sean,
>>
>> From next-20260304 onwards [1], including recent next kernel next-20260309,
>> booting an SEV-ES guest on AMD EPYC Turin and AMD EPYC Genoa has been
>> failing. However, on EPYC Milan, the SEV-ES guest boots fine.
> 
> ...
> 
>> Bisecting shows that this commit is the first bad one. When I revert it, I
>> am able to boot the SEV-ES guest successfully on both Turin and Genoa
>> platforms:
>>
>> e992bf67bcbab07a7f59963b2c4ed32ef65c8431 is the first bad commit
>> commit e992bf67bcbab07a7f59963b2c4ed32ef65c8431
>> Author: Sean Christopherson <seanjc@google.com>
>> Date:   Tue Feb 3 11:07:10 2026 -0800
> 
> Gah, I hate how KVM manages intercepts for SEV-ES+.  Though to a large extent I
> blame the architecture for not simply making CR{0,4,8} intercept trap-like.
> Side topic, is the host actually allowed to trap CR3 writes?  That seems like a
> huge gaping security flaw, especially for SNP+.
> 
> Anyways, this should fix the immediate problem.
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 33172f0e986b..b6072872b785 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -237,7 +237,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
>         vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
>         vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
>  
> -       svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
> +       if (!sev_es_guest(svm->vcpu.kvm))
> +               svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
>  
>         /*
>          * If running nested and the guest uses its own MSR bitmap, there
> 
> Argh!  The more I look at this code, the more frustrated I get.  The unconditional
> setting of TRAP_CR8_WRITE for SEV-ES+ is flawed.  When AVIC is enabled, KVM doesn't

AVIC is disabled for SEV guests (see __sev_guest_init() and the
kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_SEV) call at the end of
the function).

Thanks,
Tom

> need to trap CR8 writes because hardware will update the backing page.  I'm guessing
> Windows doesn't support running as an SEV-ES guest, which is why no one has noticed.
> 
> Actually, it's worse than that.  sync_cr8_to_lapic() will straight up clobber the
> backing page.  Presumably hardware never actually uses TPR from the AVIC backing
> page, but it's still gross.  sync_lapic_to_cr8() is also beyond useless.
> 
> And all of the sync code should pivot on guest_state_protected, not sev_es_guest().
> 
> For now, I'll just post the above (assuming it fixes the issue).  But this code
> needs some love sooner than later.
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  2026-03-10 17:36       ` Tom Lendacky
@ 2026-03-10 17:48         ` Naveen N Rao
  2026-03-10 18:00           ` Naveen N Rao
  2026-03-10 18:12           ` Tom Lendacky
  0 siblings, 2 replies; 25+ messages in thread
From: Naveen N Rao @ 2026-03-10 17:48 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Sean Christopherson, Srikanth Aithal, Paolo Bonzini, kvm,
	linux-kernel, Jim Mattson, Maciej S . Szmigiero

On Tue, Mar 10, 2026 at 12:36:09PM -0500, Tom Lendacky wrote:
> On 3/10/26 12:17, Sean Christopherson wrote:
> > On Tue, Mar 10, 2026, Srikanth Aithal wrote:
> >>
> >> Hello Sean,
> >>
> >> From next-20260304 onwards [1], including recent next kernel next-20260309,
> >> booting an SEV-ES guest on AMD EPYC Turin and AMD EPYC Genoa has been
> >> failing. However, on EPYC Milan, the SEV-ES guest boots fine.
> > 
> > ...
> > 
> >> Bisecting shows that this commit is the first bad one. When I revert it, I
> >> am able to boot the SEV-ES guest successfully on both Turin and Genoa
> >> platforms:
> >>
> >> e992bf67bcbab07a7f59963b2c4ed32ef65c8431 is the first bad commit
> >> commit e992bf67bcbab07a7f59963b2c4ed32ef65c8431
> >> Author: Sean Christopherson <seanjc@google.com>
> >> Date:   Tue Feb 3 11:07:10 2026 -0800
> > 
> > Gah, I hate how KVM manages intercepts for SEV-ES+.  Though to a large extent I
> > blame the architecture for not simply making CR{0,4,8} intercept trap-like.
> > Side topic, is the host actually allowed to trap CR3 writes?  That seems like a
> > huge gaping security flaw, especially for SNP+.
> > 
> > Anyways, this should fix the immediate problem.
> > 
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index 33172f0e986b..b6072872b785 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -237,7 +237,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
> >         vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
> >         vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
> >  
> > -       svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
> > +       if (!sev_es_guest(svm->vcpu.kvm))
> > +               svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
> >  
> >         /*
> >          * If running nested and the guest uses its own MSR bitmap, there
> > 
> > Argh!  The more I look at this code, the more frustrated I get.  The unconditional
> > setting of TRAP_CR8_WRITE for SEV-ES+ is flawed.  When AVIC is enabled, KVM doesn't
> 
> AVIC is disabled for SEV guests (see __sev_guest_init() and the
> kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_SEV) call at the end of
> the function).

AVIC gets inhibited globally, but continues to be enabled on 
vcpu_create() opportunistically -- see kvm_create_lapic(). It only gets 
disabled later during vcpu setup via 
vcpu_reset()->svm_vcpu_reset()->init_vmcb()->avic_init_vmcb()


- Naveen


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  2026-03-10 17:48         ` Naveen N Rao
@ 2026-03-10 18:00           ` Naveen N Rao
  2026-03-10 18:12           ` Tom Lendacky
  1 sibling, 0 replies; 25+ messages in thread
From: Naveen N Rao @ 2026-03-10 18:00 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Sean Christopherson, Srikanth Aithal, Paolo Bonzini, kvm,
	linux-kernel, Jim Mattson, Maciej S . Szmigiero

On Tue, Mar 10, 2026 at 11:18:16PM +0530, Naveen N Rao wrote:
> On Tue, Mar 10, 2026 at 12:36:09PM -0500, Tom Lendacky wrote:
> > On 3/10/26 12:17, Sean Christopherson wrote:
> > > On Tue, Mar 10, 2026, Srikanth Aithal wrote:
> > >>
> > >> Hello Sean,
> > >>
> > >> From next-20260304 onwards [1], including recent next kernel next-20260309,
> > >> booting an SEV-ES guest on AMD EPYC Turin and AMD EPYC Genoa has been
> > >> failing. However, on EPYC Milan, the SEV-ES guest boots fine.
> > > 
> > > ...
> > > 
> > >> Bisecting shows that this commit is the first bad one. When I revert it, I
> > >> am able to boot the SEV-ES guest successfully on both Turin and Genoa
> > >> platforms:
> > >>
> > >> e992bf67bcbab07a7f59963b2c4ed32ef65c8431 is the first bad commit
> > >> commit e992bf67bcbab07a7f59963b2c4ed32ef65c8431
> > >> Author: Sean Christopherson <seanjc@google.com>
> > >> Date:   Tue Feb 3 11:07:10 2026 -0800
> > > 
> > > Gah, I hate how KVM manages intercepts for SEV-ES+.  Though to a large extent I
> > > blame the architecture for not simply making CR{0,4,8} intercept trap-like.
> > > Side topic, is the host actually allowed to trap CR3 writes?  That seems like a
> > > huge gaping security flaw, especially for SNP+.
> > > 
> > > Anyways, this should fix the immediate problem.
> > > 
> > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > > index 33172f0e986b..b6072872b785 100644
> > > --- a/arch/x86/kvm/svm/avic.c
> > > +++ b/arch/x86/kvm/svm/avic.c
> > > @@ -237,7 +237,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
> > >         vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
> > >         vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
> > >  
> > > -       svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
> > > +       if (!sev_es_guest(svm->vcpu.kvm))
> > > +               svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
> > >  
> > >         /*
> > >          * If running nested and the guest uses its own MSR bitmap, there

I arrived at the same fix and it works for me, so FWIW:
Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>

> > > 
> > > Argh!  The more I look at this code, the more frustrated I get.  The unconditional
> > > setting of TRAP_CR8_WRITE for SEV-ES+ is flawed.  When AVIC is enabled, KVM doesn't
> > 
> > AVIC is disabled for SEV guests (see __sev_guest_init() and the
> > kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_SEV) call at the end of
> > the function).
> 
> AVIC gets inhibited globally, but continues to be enabled on 
> vcpu_create() opportunistically -- see kvm_create_lapic(). It only gets 
> disabled later during vcpu setup via 
> vcpu_reset()->svm_vcpu_reset()->init_vmcb()->avic_init_vmcb()

... which explains why this issue is showing up.

But reading your response again, I guess you were pointing out that the 
intercepts are not a problem for SEV-ES guests since AVIC is inhibited, 
which totally makes sense.

Thanks,
Naveen


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  2026-03-10 17:48         ` Naveen N Rao
  2026-03-10 18:00           ` Naveen N Rao
@ 2026-03-10 18:12           ` Tom Lendacky
  2026-03-10 18:35             ` Sean Christopherson
  1 sibling, 1 reply; 25+ messages in thread
From: Tom Lendacky @ 2026-03-10 18:12 UTC (permalink / raw)
  To: Naveen N Rao
  Cc: Sean Christopherson, Srikanth Aithal, Paolo Bonzini, kvm,
	linux-kernel, Jim Mattson, Maciej S . Szmigiero

On 3/10/26 12:48, Naveen N Rao wrote:
> On Tue, Mar 10, 2026 at 12:36:09PM -0500, Tom Lendacky wrote:
>> On 3/10/26 12:17, Sean Christopherson wrote:
>>> On Tue, Mar 10, 2026, Srikanth Aithal wrote:
>>>>
>>>> Hello Sean,
>>>>
>>>> From next-20260304 onwards [1], including recent next kernel next-20260309,
>>>> booting an SEV-ES guest on AMD EPYC Turin and AMD EPYC Genoa has been
>>>> failing. However, on EPYC Milan, the SEV-ES guest boots fine.
>>>
>>> ...
>>>
>>>> Bisecting shows that this commit is the first bad one. When I revert it, I
>>>> am able to boot the SEV-ES guest successfully on both Turin and Genoa
>>>> platforms:
>>>>
>>>> e992bf67bcbab07a7f59963b2c4ed32ef65c8431 is the first bad commit
>>>> commit e992bf67bcbab07a7f59963b2c4ed32ef65c8431
>>>> Author: Sean Christopherson <seanjc@google.com>
>>>> Date:   Tue Feb 3 11:07:10 2026 -0800
>>>
>>> Gah, I hate how KVM manages intercepts for SEV-ES+.  Though to a large extent I
>>> blame the architecture for not simply making CR{0,4,8} intercept trap-like.
>>> Side topic, is the host actually allowed to trap CR3 writes?  That seems like a
>>> huge gaping security flaw, especially for SNP+.
>>>
>>> Anyways, this should fix the immediate problem.
>>>
>>> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
>>> index 33172f0e986b..b6072872b785 100644
>>> --- a/arch/x86/kvm/svm/avic.c
>>> +++ b/arch/x86/kvm/svm/avic.c
>>> @@ -237,7 +237,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
>>>         vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
>>>         vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
>>>  
>>> -       svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
>>> +       if (!sev_es_guest(svm->vcpu.kvm))
>>> +               svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
>>>  
>>>         /*
>>>          * If running nested and the guest uses its own MSR bitmap, there
>>>
>>> Argh!  The more I look at this code, the more frustrated I get.  The unconditional
>>> setting of TRAP_CR8_WRITE for SEV-ES+ is flawed.  When AVIC is enabled, KVM doesn't
>>
>> AVIC is disabled for SEV guests (see __sev_guest_init() and the
>> kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_SEV) call at the end of
>> the function).
> 
> AVIC gets inhibited globally, but continues to be enabled on 
> vcpu_create() opportunistically -- see kvm_create_lapic(). It only gets 
> disabled later during vcpu setup via 
> vcpu_reset()->svm_vcpu_reset()->init_vmcb()->avic_init_vmcb()

I'm just saying that the unconditional trap for CR8_WRITE isn't flawed
for SEV-ES+ because AVIC can't work with SEV, so there isn't any time
that CR8 writes shouldn't be trapped.

Thanks,
Tom

> 
> 
> - Naveen
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  2026-03-10 18:12           ` Tom Lendacky
@ 2026-03-10 18:35             ` Sean Christopherson
  2026-03-10 21:41               ` Tom Lendacky
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2026-03-10 18:35 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Naveen N Rao, Srikanth Aithal, Paolo Bonzini, kvm, linux-kernel,
	Jim Mattson, Maciej S . Szmigiero

On Tue, Mar 10, 2026, Tom Lendacky wrote:
> On 3/10/26 12:48, Naveen N Rao wrote:
> > On Tue, Mar 10, 2026 at 12:36:09PM -0500, Tom Lendacky wrote:
> >> On 3/10/26 12:17, Sean Christopherson wrote:
> >>> On Tue, Mar 10, 2026, Srikanth Aithal wrote:
> >>>>
> >>>> Hello Sean,
> >>>>
> >>>> From next-20260304 onwards [1], including recent next kernel next-20260309,
> >>>> booting an SEV-ES guest on AMD EPYC Turin and AMD EPYC Genoa has been
> >>>> failing. However, on EPYC Milan, the SEV-ES guest boots fine.
> >>>
> >>> ...
> >>>
> >>>> Bisecting shows that this commit is the first bad one. When I revert it, I
> >>>> am able to boot the SEV-ES guest successfully on both Turin and Genoa
> >>>> platforms:
> >>>>
> >>>> e992bf67bcbab07a7f59963b2c4ed32ef65c8431 is the first bad commit
> >>>> commit e992bf67bcbab07a7f59963b2c4ed32ef65c8431
> >>>> Author: Sean Christopherson <seanjc@google.com>
> >>>> Date:   Tue Feb 3 11:07:10 2026 -0800
> >>>
> >>> Gah, I hate how KVM manages intercepts for SEV-ES+.  Though to a large extent I
> >>> blame the architecture for not simply making CR{0,4,8} intercept trap-like.
> >>> Side topic, is the host actually allowed to trap CR3 writes?  That seems like a
> >>> huge gaping security flaw, especially for SNP+.
> >>>
> >>> Anyways, this should fix the immediate problem.
> >>>
> >>> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> >>> index 33172f0e986b..b6072872b785 100644
> >>> --- a/arch/x86/kvm/svm/avic.c
> >>> +++ b/arch/x86/kvm/svm/avic.c
> >>> @@ -237,7 +237,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
> >>>         vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
> >>>         vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
> >>>  
> >>> -       svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
> >>> +       if (!sev_es_guest(svm->vcpu.kvm))
> >>> +               svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
> >>>  
> >>>         /*
> >>>          * If running nested and the guest uses its own MSR bitmap, there
> >>>
> >>> Argh!  The more I look at this code, the more frustrated I get.  The unconditional
> >>> setting of TRAP_CR8_WRITE for SEV-ES+ is flawed.  When AVIC is enabled, KVM doesn't
> >>
> >> AVIC is disabled for SEV guests (see __sev_guest_init() and the
> >> kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_SEV) call at the end of
> >> the function).
> > 
> > AVIC gets inhibited globally, but continues to be enabled on 
> > vcpu_create() opportunistically -- see kvm_create_lapic(). It only gets 
> > disabled later during vcpu setup via 
> > vcpu_reset()->svm_vcpu_reset()->init_vmcb()->avic_init_vmcb()
> 
> I'm just saying that the unconditional trap for CR8_WRITE isn't flawed
> for SEV-ES+ because AVIC can't work with SEV, so there isn't any time
> that CR8 writes shouldn't be trapped.

Yeah, I forgot that (obviously).

But sync_cr8_to_lapic() is very broken, no?  INTERCEPT_CR8_WRITE will never be
set, and svm->vmcb->control.int_ctl will become stale as soon as the VMSA is
live, and so in all likelihood KVM is crushing CR8 to zero for SEV-ES guests.

* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  2026-03-10 18:35             ` Sean Christopherson
@ 2026-03-10 21:41               ` Tom Lendacky
  2026-03-10 21:58                 ` Sean Christopherson
  0 siblings, 1 reply; 25+ messages in thread
From: Tom Lendacky @ 2026-03-10 21:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Naveen N Rao, Srikanth Aithal, Paolo Bonzini, kvm, linux-kernel,
	Jim Mattson, Maciej S . Szmigiero

On 3/10/26 13:35, Sean Christopherson wrote:
> On Tue, Mar 10, 2026, Tom Lendacky wrote:
>> On 3/10/26 12:48, Naveen N Rao wrote:
>>> On Tue, Mar 10, 2026 at 12:36:09PM -0500, Tom Lendacky wrote:
>>>> On 3/10/26 12:17, Sean Christopherson wrote:
>>>>> On Tue, Mar 10, 2026, Srikanth Aithal wrote:
>>>>>>
>>>>>> Hello Sean,
>>>>>>
>>>>>> From next-20260304 onwards [1], including recent next kernel next-20260309,
>>>>>> booting an SEV-ES guest on AMD EPYC Turin and AMD EPYC Genoa has been
>>>>>> failing. However, on EPYC Milan, the SEV-ES guest boots fine.
>>>>>
>>>>> ...
>>>>>
>>>>>> Bisecting shows that this commit is the first bad one. When I revert it, I
>>>>>> am able to boot the SEV-ES guest successfully on both Turin and Genoa
>>>>>> platforms:
>>>>>>
>>>>>> e992bf67bcbab07a7f59963b2c4ed32ef65c8431 is the first bad commit
>>>>>> commit e992bf67bcbab07a7f59963b2c4ed32ef65c8431
>>>>>> Author: Sean Christopherson <seanjc@google.com>
>>>>>> Date:   Tue Feb 3 11:07:10 2026 -0800
>>>>>
>>>>> Gah, I hate how KVM manages intercepts for SEV-ES+.  Though to a large extent I
>>>>> blame the architecture for not simply making CR{0,4,8} intercept trap-like.
>>>>> Side topic, is the host actually allowed to trap CR3 writes?  That seems like a
>>>>> huge gaping security flaw, especially for SNP+.
>>>>>
>>>>> Anyways, this should fix the immediate problem.
>>>>>
>>>>> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
>>>>> index 33172f0e986b..b6072872b785 100644
>>>>> --- a/arch/x86/kvm/svm/avic.c
>>>>> +++ b/arch/x86/kvm/svm/avic.c
>>>>> @@ -237,7 +237,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
>>>>>         vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
>>>>>         vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
>>>>>  
>>>>> -       svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
>>>>> +       if (!sev_es_guest(svm->vcpu.kvm))
>>>>> +               svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
>>>>>  
>>>>>         /*
>>>>>          * If running nested and the guest uses its own MSR bitmap, there
>>>>>
>>>>> Argh!  The more I look at this code, the more frustrated I get.  The unconditional
>>>>> setting of TRAP_CR8_WRITE for SEV-ES+ is flawed.  When AVIC is enabled, KVM doesn't
>>>>
>>>> AVIC is disabled for SEV guests (see __sev_guest_init() and the
>>>> kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_SEV) call at the end of
>>>> the function).
>>>
>>> AVIC gets inhibited globally, but continues to be enabled on 
>>> vcpu_create() opportunistically -- see kvm_create_lapic(). It only gets 
>>> disabled later during vcpu setup via 
>>> vcpu_reset()->svm_vcpu_reset()->init_vmcb()->avic_init_vmcb()
>>
>> I'm just saying that the unconditional trap for CR8_WRITE isn't flawed
>> for SEV-ES+ because AVIC can't work with SEV, so there isn't any time
>> that CR8 writes shouldn't be trapped.
> 
> Yeah, I forgot that (obviously).
> 
> But sync_cr8_to_lapic() is very broken, no?  INTERCEPT_CR8_WRITE will never be
> set, and svm->vmcb->control.int_ctl will become stale as soon as the VMSA is
> live, and so in all likelihood KVM is crushing CR8 to zero for SEV-ES guests.

I don't think so. V_TPR is written on #VMEXIT even for SEV-ES+ guests,
and since it is a trap, CR8 is set and so V_TPR should have that value.
That would imply sync_cr8_to_lapic() should do the right thing.

After attempting to verify this behavior it turns out that writes to CR8
(and CR2) are, in fact, not trapped, but the APM was not updated with
this information (I'll send a patch to remove that code). KVM's CR8
value is, however, synced with the proper value through
sync_cr8_to_lapic() because V_TPR in the VMCB is updated on #VMEXIT.

Thanks,
Tom
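
The sync described above can be sketched as a small standalone model. This
is not the KVM implementation: the struct and function names are
hypothetical stand-ins, and only V_TPR_MASK (the low four bits of int_ctl)
is assumed to match the layout in arch/x86/include/asm/svm.h. The sketch
takes it as given that hardware has already written the guest's TPR into
V_TPR of the VMCB control area on #VMEXIT, as stated in the message above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the V_TPR field position in the VMCB int_ctl word. */
#define V_TPR_MASK 0x0fu

/* Hypothetical stand-in for the slice of the VMCB control area used here. */
struct vmcb_control_model {
	uint32_t int_ctl;
};

/* Hypothetical stand-in for KVM's software view of the guest's CR8. */
struct vcpu_model {
	struct vmcb_control_model control;
	uint8_t cr8;
};

/*
 * Model of the exit-time sync: copy V_TPR out of the (unencrypted) control
 * area into the software CR8 value.  Bits of int_ctl outside V_TPR are
 * ignored.
 */
static void model_sync_cr8_from_vmcb(struct vcpu_model *vcpu)
{
	vcpu->cr8 = (uint8_t)(vcpu->control.int_ctl & V_TPR_MASK);

	/* CR8 is a 4-bit priority, so the masked value must fit in 0..15. */
	assert(vcpu->cr8 <= 0x0f);
}
```

Because the value comes from the control area rather than the encrypted
save area, the model needs no access to the VMSA, which is the point made
in the exchange above.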

* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  2026-03-10 21:41               ` Tom Lendacky
@ 2026-03-10 21:58                 ` Sean Christopherson
  2026-03-10 22:33                   ` Tom Lendacky
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2026-03-10 21:58 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Naveen N Rao, Srikanth Aithal, Paolo Bonzini, kvm, linux-kernel,
	Jim Mattson, Maciej S . Szmigiero

On Tue, Mar 10, 2026, Tom Lendacky wrote:
> On 3/10/26 13:35, Sean Christopherson wrote:
> > On Tue, Mar 10, 2026, Tom Lendacky wrote:
> >> I'm just saying that the unconditional trap for CR8_WRITE isn't flawed
> >> for SEV-ES+ because AVIC can't work with SEV, so there isn't any time
> >> that CR8 writes shouldn't be trapped.
> > 
> > Yeah, I forgot that (obviously).
> > 
> > But sync_cr8_to_lapic() is very broken, no?  INTERCEPT_CR8_WRITE will never be
> > set, and svm->vmcb->control.int_ctl will become stale as soon as the VMSA is
> > live, and so in all likelihood KVM is crushing CR8 to zero for SEV-ES guests.
> 
> I don't think so. V_TPR is written on #VMEXIT even for SEV-ES+ guests,
> and since it is a trap, CR8 is set and so V_TPR should have that value.
> That would imply sync_cr8_to_lapic() should do the right thing.

But isn't svm->vmcb->control.int_ctl stale?  Oh.  "control", not "save".  /facepalm

Ah, and I assume Secure AVIC hides vTPR from the host?  Or at least prevents the
host from setting it?

> After attempting to verify this behavior it turns out that writes to CR8
> (and CR2) are, in fact, not trapped, but the APM was not updated with
> this information (I'll send a patch to remove that code). KVM's CR8
> value is, however, synced with the proper value through
> sync_cr8_to_lapic() because V_TPR in the VMCB is updated on #VMEXIT.

Oh.  Huh.  So doesn't that mean that supporting Windows (or any other guest that
uses TPR to mask interrupts) as an SEV-ES guest is practically impossible?  Because
while KVM can observe and manipulate guest CR8, KVM won't be able to precisely
detect when TPR drops below a pending IRQ.

* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  2026-03-10 21:58                 ` Sean Christopherson
@ 2026-03-10 22:33                   ` Tom Lendacky
  2026-03-10 22:40                     ` Sean Christopherson
  2026-03-11 17:39                     ` Paolo Bonzini
  0 siblings, 2 replies; 25+ messages in thread
From: Tom Lendacky @ 2026-03-10 22:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Naveen N Rao, Srikanth Aithal, Paolo Bonzini, kvm, linux-kernel,
	Jim Mattson, Maciej S . Szmigiero

On 3/10/26 16:58, Sean Christopherson wrote:
> On Tue, Mar 10, 2026, Tom Lendacky wrote:
>> On 3/10/26 13:35, Sean Christopherson wrote:
>>> On Tue, Mar 10, 2026, Tom Lendacky wrote:
>>>> I'm just saying that the unconditional trap for CR8_WRITE isn't flawed
>>>> for SEV-ES+ because AVIC can't work with SEV, so there isn't any time
>>>> that CR8 writes shouldn't be trapped.
>>>
>>> Yeah, I forgot that (obviously).
>>>
>>> But sync_cr8_to_lapic() is very broken, no?  INTERCEPT_CR8_WRITE will never be
>>> set, and svm->vmcb->control.int_ctl will become stale as soon as the VMSA is
>>> live, and so in all likelihood KVM is crushing CR8 to zero for SEV-ES guests.
>>
>> I don't think so. V_TPR is written on #VMEXIT even for SEV-ES+ guests,
>> and since it is a trap, CR8 is set and so V_TPR should have that value.
>> That would imply sync_cr8_to_lapic() should do the right thing.
> 
> But isn't svm->vmcb->control.int_ctl stale?  Oh.  "control", not "save".  /facepalm
> 
> Ah, and I assume Secure AVIC hides vTPR from the host?  Or at least prevents the
> host from setting it?

Secure AVIC will prevent the host from setting it since the backing page
lives in guest memory and is encrypted/private.

> 
>> After attempting to verify this behavior it turns out that writes to CR8
>> (and CR2) are, in fact, not trapped, but the APM was not updated with
>> this information (I'll send a patch to remove that code). KVM's CR8
>> value is, however, synced with the proper value through
>> sync_cr8_to_lapic() because V_TPR in the VMCB is updated on #VMEXIT.
> 
> Oh.  Huh.  So doesn't that mean that supporting Windows (or any other guest that
> uses TPR to mask interrupts) as an SEV-ES guest is practically impossible?  Because
> while KVM can observe and manipulate guest CR8, KVM won't be able to precisely
> detect when TPR drops below a pending IRQ.

Could we do something with virtual interrupt support? Today KVM uses the
virtual interrupt control to detect when an IRQ window opens. We could
do something similar by setting up the virtual interrupt priority,
V_INTR_PRIO, at the level of the current TPR/CR8 level. When the TPR
drops, that would trigger a #VMEXIT and allow the pending IRQ to be
injected. Thoughts?

Thanks,
Tom

* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  2026-03-10 22:33                   ` Tom Lendacky
@ 2026-03-10 22:40                     ` Sean Christopherson
  2026-03-11 13:43                       ` Tom Lendacky
  2026-03-11 17:39                     ` Paolo Bonzini
  1 sibling, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2026-03-10 22:40 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Naveen N Rao, Srikanth Aithal, Paolo Bonzini, kvm, linux-kernel,
	Jim Mattson, Maciej S . Szmigiero

On Tue, Mar 10, 2026, Tom Lendacky wrote:
> On 3/10/26 16:58, Sean Christopherson wrote:
> > On Tue, Mar 10, 2026, Tom Lendacky wrote:
> >> On 3/10/26 13:35, Sean Christopherson wrote:
> >>> On Tue, Mar 10, 2026, Tom Lendacky wrote:
> >>>> I'm just saying that the unconditional trap for CR8_WRITE isn't flawed
> >>>> for SEV-ES+ because AVIC can't work with SEV, so there isn't any time
> >>>> that CR8 writes shouldn't be trapped.
> >>>
> >>> Yeah, I forgot that (obviously).
> >>>
> >>> But sync_cr8_to_lapic() is very broken, no?  INTERCEPT_CR8_WRITE will never be
> >>> set, and svm->vmcb->control.int_ctl will become stale as soon as the VMSA is
> >>> live, and so in all likelihood KVM is crushing CR8 to zero for SEV-ES guests.
> >>
> >> I don't think so. V_TPR is written on #VMEXIT even for SEV-ES+ guests,
> >> and since it is a trap, CR8 is set and so V_TPR should have that value.
> >> That would imply sync_cr8_to_lapic() should do the right thing.
> > 
> > But isn't svm->vmcb->control.int_ctl stale?  Oh.  "control", not "save".  /facepalm
> > 
> > Ah, and I assume Secure AVIC hides vTPR from the host?  Or at least prevents the
> > host from setting it?
> 
> Secure AVIC will prevent the host from setting it since the backing page
> lives in guest memory and is encrypted/private.

What about vmcb->control.int_ctl though?  IIUC, that's the source of truth for
the effective vTPR, not the value in the virtual APIC page.

> >> After attempting to verify this behavior it turns out that writes to CR8
> >> (and CR2) are, in fact, not trapped, but the APM was not updated with
> >> this information (I'll send a patch to remove that code). KVM's CR8
> >> value is, however, synced with the proper value through
> >> sync_cr8_to_lapic() because V_TPR in the VMCB is updated on #VMEXIT.
> > 
> > Oh.  Huh.  So doesn't that mean that supporting Windows (or any other guest that
> > uses TPR to mask interrupts) as an SEV-ES guest is practically impossible?  Because
> > while KVM can observe and manipulate guest CR8, KVM won't be able to precisely
> > detect when TPR drops below a pending IRQ.
> 
> Could we do something with virtual interrupt support? Today KVM uses the
> virtual interrupt control to detect when an IRQ window opens. We could
> do something similar by setting up the virtual interrupt priority,
> V_INTR_PRIO, at the level of the current TPR/CR8 level. When the TPR
> drops, that would trigger a #VMEXIT and allow the pending IRQ to be
> injected. Thoughts?

Uh, yes, that would work?  I was thinking we couldn't model the priority, but
obviously that's not true.

FWIW, my preference would be to not add support unless someone asks for it :-)

* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  2026-03-10 22:40                     ` Sean Christopherson
@ 2026-03-11 13:43                       ` Tom Lendacky
  0 siblings, 0 replies; 25+ messages in thread
From: Tom Lendacky @ 2026-03-11 13:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Naveen N Rao, Srikanth Aithal, Paolo Bonzini, kvm, linux-kernel,
	Jim Mattson, Maciej S . Szmigiero

On 3/10/26 17:40, Sean Christopherson wrote:
> On Tue, Mar 10, 2026, Tom Lendacky wrote:
>> On 3/10/26 16:58, Sean Christopherson wrote:
>>> On Tue, Mar 10, 2026, Tom Lendacky wrote:
>>>> On 3/10/26 13:35, Sean Christopherson wrote:
>>>>> On Tue, Mar 10, 2026, Tom Lendacky wrote:
>>>>>> I'm just saying that the unconditional trap for CR8_WRITE isn't flawed
>>>>>> for SEV-ES+ because AVIC can't work with SEV, so there isn't any time
>>>>>> that CR8 writes shouldn't be trapped.
>>>>>
>>>>> Yeah, I forgot that (obviously).
>>>>>
>>>>> But sync_cr8_to_lapic() is very broken, no?  INTERCEPT_CR8_WRITE will never be
>>>>> set, and svm->vmcb->control.int_ctl will become stale as soon as the VMSA is
>>>>> live, and so in all likelihood KVM is crushing CR8 to zero for SEV-ES guests.
>>>>
>>>> I don't think so. V_TPR is written on #VMEXIT even for SEV-ES+ guests,
>>>> and since it is a trap, CR8 is set and so V_TPR should have that value.
>>>> That would imply sync_cr8_to_lapic() should do the right thing.
>>>
>>> But isn't svm->vmcb->control.int_ctl stale?  Oh.  "control", not "save".  /facepalm
>>>
>>> Ah, and I assume Secure AVIC hides vTPR from the host?  Or at least prevents the
>>> host from setting it?
>>
>> Secure AVIC will prevent the host from setting it since the backing page
>> lives in guest memory and is encrypted/private.
> 
> What about vmcb->control.int_ctl though?  IIUC, that's the source of truth for
> the effective vTPR, not the value in the virtual APIC page.

For Secure AVIC, V_TPR from the vmcb->control.int_ctl isn't used,
instead it is saved to and restored from the VMSA. The APM should
probably be updated to be clear about that.

> 
>>>> After attempting to verify this behavior it turns out that writes to CR8
>>>> (and CR2) are, in fact, not trapped, but the APM was not updated with
>>>> this information (I'll send a patch to remove that code). KVM's CR8
>>>> value is, however, synced with the proper value through
>>>> sync_cr8_to_lapic() because V_TPR in the VMCB is updated on #VMEXIT.
>>>
>>> Oh.  Huh.  So doesn't that mean that supporting Windows (or any other guest that
>>> uses TPR to mask interrupts) as an SEV-ES guest is practically impossible?  Because
>>> while KVM can observe and manipulate guest CR8, KVM won't be able to precisely
>>> detect when TPR drops below a pending IRQ.
>>
>> Could we do something with virtual interrupt support? Today KVM uses the
>> virtual interrupt control to detect when an IRQ window opens. We could
>> do something similar by setting up the virtual interrupt priority,
>> V_INTR_PRIO, at the level of the current TPR/CR8 level. When the TPR
>> drops, that would trigger a #VMEXIT and allow the pending IRQ to be
>> injected. Thoughts?
> 
> Uh, yes, that would work?  I was thinking we couldn't model the priority, but
> obviously that's not true.
> 
> FWIW, my preference would be to not add support unless someone asks for it :-)

Agreed.

Thanks,
Tom


* Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  2026-03-10 22:33                   ` Tom Lendacky
  2026-03-10 22:40                     ` Sean Christopherson
@ 2026-03-11 17:39                     ` Paolo Bonzini
  1 sibling, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2026-03-11 17:39 UTC (permalink / raw)
  To: Tom Lendacky, Sean Christopherson
  Cc: Naveen N Rao, Srikanth Aithal, kvm, linux-kernel, Jim Mattson,
	Maciej S . Szmigiero

On 3/10/26 23:33, Tom Lendacky wrote:
> Could we do something with virtual interrupt support? Today KVM uses the
> virtual interrupt control to detect when an IRQ window opens. We could
> do something similar by setting up the virtual interrupt priority,
> V_INTR_PRIO, at the level of the current TPR/CR8 level. When the TPR
> drops, that would trigger a #VMEXIT and allow the pending IRQ to be
> injected. Thoughts?

Yes, in fact Hyper-V uses VINTR injection and mostly doesn't bother with 
interrupt windows.  KVM does it to keep the code similar between Intel 
and AMD.

But even if you don't go all the way with VINTR, it should be possible 
to implement something akin to Intel flexpriority through V_INTR_PRIO 
and V_TPR (and keeping the VINTR intercept set to detect the moment 
V_TPR falls below V_INTR_PRIO).

Paolo
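
A rough standalone model of the V_INTR_PRIO idea follows. It is a sketch
under stated assumptions, not proposed KVM code: the helper names are
hypothetical, the bit positions are assumed to match
arch/x86/include/asm/svm.h, and the priority is taken from the pending
vector's priority class (vector >> 4), which is one possible reading of
the suggestion above. The model also ignores RFLAGS.IF, GIF and interrupt
shadows, all of which gate delivery on real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the int_ctl layout in arch/x86/include/asm/svm.h. */
#define V_TPR_MASK        0x0fu
#define V_IRQ_SHIFT       8
#define V_IRQ_MASK        (1u << V_IRQ_SHIFT)
#define V_INTR_PRIO_SHIFT 16
#define V_INTR_PRIO_MASK  (0x0fu << V_INTR_PRIO_SHIFT)
#define V_IGN_TPR_SHIFT   20
#define V_IGN_TPR_MASK    (1u << V_IGN_TPR_SHIFT)

/*
 * Hypothetical helper: request a virtual interrupt whose priority is the
 * priority class of the pending vector, with TPR checking left enabled.
 */
static uint32_t model_arm_vintr(uint32_t int_ctl, unsigned int vector)
{
	uint32_t prio;

	assert(vector <= 0xff);
	prio = vector >> 4;

	int_ctl &= ~(V_INTR_PRIO_MASK | V_IGN_TPR_MASK);
	int_ctl |= V_IRQ_MASK | (prio << V_INTR_PRIO_SHIFT);
	return int_ctl;
}

/*
 * Simplified model of when the virtual interrupt becomes deliverable, and
 * hence when a VINTR intercept would fire: V_IRQ is set and either TPR is
 * ignored or V_INTR_PRIO is strictly above V_TPR.
 */
static bool model_vintr_window_open(uint32_t int_ctl)
{
	uint32_t tpr = int_ctl & V_TPR_MASK;
	uint32_t prio = (int_ctl & V_INTR_PRIO_MASK) >> V_INTR_PRIO_SHIFT;

	if (!(int_ctl & V_IRQ_MASK))
		return false;
	if (int_ctl & V_IGN_TPR_MASK)
		return true;
	return prio > tpr;
}
```

In this model, an IRQ with vector 0x51 armed while V_TPR is 8 stays
blocked, and the window opens once the guest lowers V_TPR below 5, with no
CR8 write intercept involved.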


end of thread, other threads:[~2026-03-11 17:40 UTC | newest]

Thread overview: 25+ messages
2026-02-03 19:07 [PATCH 0/2] KVM: SVM: Fix CR8 intercpetion woes with AVIC Sean Christopherson
2026-02-03 19:07 ` [PATCH 1/2] KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC Sean Christopherson
2026-02-05  4:21   ` Jim Mattson
2026-02-06 14:00   ` Naveen N Rao
2026-02-06 18:17     ` Sean Christopherson
2026-02-09 10:23       ` Naveen N Rao
2026-02-09 21:36         ` Sean Christopherson
2026-02-03 19:07 ` [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated Sean Christopherson
2026-02-05  4:22   ` Jim Mattson
2026-02-06 17:11   ` Naveen N Rao
2026-02-06 17:55     ` Sean Christopherson
2026-03-10 15:41   ` Aithal, Srikanth
2026-03-10 17:17     ` Sean Christopherson
2026-03-10 17:36       ` Tom Lendacky
2026-03-10 17:48         ` Naveen N Rao
2026-03-10 18:00           ` Naveen N Rao
2026-03-10 18:12           ` Tom Lendacky
2026-03-10 18:35             ` Sean Christopherson
2026-03-10 21:41               ` Tom Lendacky
2026-03-10 21:58                 ` Sean Christopherson
2026-03-10 22:33                   ` Tom Lendacky
2026-03-10 22:40                     ` Sean Christopherson
2026-03-11 13:43                       ` Tom Lendacky
2026-03-11 17:39                     ` Paolo Bonzini
2026-03-05 17:07 ` [PATCH 0/2] KVM: SVM: Fix CR8 intercpetion woes with AVIC Sean Christopherson
