Kernel KVM virtualization development
* [PATCH] KVM: SVM: Clear dummy V_IRQ in vmcb01 when deactivating AVIC
@ 2026-06-10  7:05 xin guo
  2026-06-10 12:45 ` Sean Christopherson
  0 siblings, 1 reply; 4+ messages in thread
From: xin guo @ 2026-06-10  7:05 UTC (permalink / raw)
  To: seanjc, pbonzini
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, kvm, linux-kernel,
	xin guo

When KVM requests an IRQ window via svm_set_vintr(), it programs a
dummy VINTR with int_vector=0 and V_IRQ=1 into the current VMCB.
These int_ctl fields are documented to be ignored while AVIC is
enabled, so the dummy VINTR is harmless during AVIC operation.

However, avic_deactivate_vmcb() only clears AVIC_ENABLE_MASK and
X2APIC_MODE_MASK, and does not clear the VINTR injection state. Once
AVIC is disabled, hardware honors V_IRQ again and injects vector 0
into the guest on the next VMRUN. Windows guests observe this as a
spurious interrupt and crash, e.g. with STATUS_INTEGER_DIVIDE_BY_ZERO.

Fix this by also clearing V_IRQ_INJECTION_BITS_MASK from vmcb01's
int_ctl in avic_deactivate_vmcb(), so that no stale dummy VINTR is
left behind when AVIC transitions from enabled to disabled.

Signed-off-by: xin guo <m18700951735@163.com>
---
 arch/x86/kvm/svm/avic.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index cdd5a6dc646f..b042c3f5f90e 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -257,7 +257,9 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
 {
 	struct vmcb *vmcb = svm->vmcb01.ptr;
 
-	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
+	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK |
+				V_IRQ_INJECTION_BITS_MASK);
+
 	vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
 
 	if (!is_sev_es_guest(&svm->vcpu))
-- 
2.27.0
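
The changelog's hypothesis can be modelled in user space. The following is a minimal sketch, not kernel code: the bit positions are mirrored from arch/x86/include/asm/svm.h as best understood and should be checked against the tree in use, and `struct model_ctl` and every `model_*` helper are hypothetical names introduced only for illustration. It assumes, as the changelog states, that hardware ignores V_IRQ while AVIC is enabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match asm/svm.h; verify against your kernel tree. */
#define V_IRQ_MASK		(1u << 8)
#define V_INTR_PRIO_SHIFT	16
#define V_INTR_PRIO_MASK	(0x0fu << V_INTR_PRIO_SHIFT)
#define V_IGN_TPR_MASK		(1u << 20)
#define X2APIC_MODE_MASK	(1u << 30)
#define AVIC_ENABLE_MASK	(1u << 31)
#define V_IRQ_INJECTION_BITS_MASK \
	(V_IRQ_MASK | V_INTR_PRIO_MASK | V_IGN_TPR_MASK)

/* Hypothetical stand-in for the two VMCB control fields involved. */
struct model_ctl {
	uint32_t int_ctl;
	uint32_t int_vector;
};

/* Models the dummy VINTR (vector 0, V_IRQ=1) used to request an IRQ window. */
static void model_set_vintr(struct model_ctl *c)
{
	c->int_vector = 0;
	c->int_ctl &= ~V_INTR_PRIO_MASK;
	c->int_ctl |= V_IRQ_MASK | (0xfu << V_INTR_PRIO_SHIFT);
}

/* Pre-patch deactivation: only the AVIC mode bits are cleared. */
static void model_deactivate_old(struct model_ctl *c)
{
	c->int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
}

/* Post-patch deactivation: the VINTR injection bits are cleared as well. */
static void model_deactivate_new(struct model_ctl *c)
{
	c->int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK |
			V_IRQ_INJECTION_BITS_MASK);
}

/* Under the changelog's assumption, V_IRQ only matters while AVIC is off. */
static bool model_virq_effective(const struct model_ctl *c)
{
	return !(c->int_ctl & AVIC_ENABLE_MASK) && (c->int_ctl & V_IRQ_MASK);
}
```

Starting from `int_ctl = AVIC_ENABLE_MASK`, calling `model_set_vintr()` and then `model_deactivate_old()` leaves `model_virq_effective()` true, whereas `model_deactivate_new()` leaves it false. The model says nothing about how vmcb01 reaches that state in practice.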



* Re: [PATCH] KVM: SVM: Clear dummy V_IRQ in vmcb01 when deactivating AVIC
  2026-06-10  7:05 [PATCH] KVM: SVM: Clear dummy V_IRQ in vmcb01 when deactivating AVIC xin guo
@ 2026-06-10 12:45 ` Sean Christopherson
  2026-06-10 23:44   ` xinguo
  0 siblings, 1 reply; 4+ messages in thread
From: Sean Christopherson @ 2026-06-10 12:45 UTC (permalink / raw)
  To: xin guo; +Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, hpa, kvm,
	linux-kernel

On Wed, Jun 10, 2026, xin guo wrote:
> When KVM requests an IRQ window via svm_set_vintr(), it programs a
> dummy VINTR with int_vector=0 and V_IRQ=1 into the current VMCB.
> These int_ctl fields are documented to be ignored while AVIC is
> enabled, so the dummy VINTR is harmless during AVIC operation.
> 
> However, avic_deactivate_vmcb() only clears AVIC_ENABLE_MASK and
> X2APIC_MODE_MASK, and does not clear the VINTR injection state. Once
> AVIC is disabled, hardware honors V_IRQ again and injects vector 0
> into the guest on the next VMRUN. Windows guests observe this as a
> spurious interrupt and crash, e.g. with STATUS_INTEGER_DIVIDE_BY_ZERO.

Can you provide a reproducer, or at least instructions to reproduce?  This feels
like we're treating a symptom, not the underlying bug.  And while I can definitely
see KVM leaving a stale V_IRQ_MASK in vmcb01, I don't see how that can happen
while also clearing INTERCEPT_VINTR, as the only place INTERCEPT_VINTR is cleared
in vmcb01 is svm_clear_vintr(), which also purges V_IRQ_MASK.

	svm_clr_intercept(svm, INTERCEPT_VINTR);

	/* Drop int_ctl fields related to VINTR injection.  */
	svm->vmcb->control.int_ctl &= ~V_IRQ_INJECTION_BITS_MASK;
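
The pairing described above implies an invariant that can be checked mechanically: in vmcb01, V_IRQ should only be set while INTERCEPT_VINTR is also set. A minimal sketch of such a check follows, assuming the V_IRQ bit position from asm/svm.h; `struct vintr_snapshot` and `vintr_state_consistent()` are hypothetical names, and the intercept is modelled as a plain bool rather than KVM's real intercept bitmap.

```c
#include <stdbool.h>
#include <stdint.h>

#define V_IRQ_MASK (1u << 8)	/* assumed to match asm/svm.h */

/* Hypothetical snapshot of the vmcb01 state relevant to VINTR. */
struct vintr_snapshot {
	uint32_t int_ctl;	/* copy of control.int_ctl */
	bool intercept_vintr;	/* true if INTERCEPT_VINTR is set */
};

/*
 * Returns true if the snapshot is consistent with the set/clear pairing:
 * either no V_IRQ is pending, or the VINTR intercept is still armed.
 */
static bool vintr_state_consistent(const struct vintr_snapshot *s)
{
	if (!(s->int_ctl & V_IRQ_MASK))
		return true;
	return s->intercept_vintr;
}
```

A snapshot with V_IRQ set and the intercept clear is the state the patch description would need to exist, and finding the code path that produces it is the open question.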

> Fix this by also clearing V_IRQ_INJECTION_BITS_MASK from vmcb01's
> int_ctl in avic_deactivate_vmcb(), so that no stale dummy VINTR is
> left behind when AVIC transitions from enabled to disabled.
> 
> Signed-off-by: xin guo <m18700951735@163.com>
> ---
>  arch/x86/kvm/svm/avic.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index cdd5a6dc646f..b042c3f5f90e 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -257,7 +257,9 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
>  {
>  	struct vmcb *vmcb = svm->vmcb01.ptr;
>  
> -	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
> +	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK |
> +				V_IRQ_INJECTION_BITS_MASK);
> +
>  	vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
>  
>  	if (!is_sev_es_guest(&svm->vcpu))
> -- 
> 2.27.0
> 


* Re: [PATCH] KVM: SVM: Clear dummy V_IRQ in vmcb01 when deactivating AVIC
  2026-06-10 12:45 ` Sean Christopherson
@ 2026-06-10 23:44   ` xinguo
  2026-06-11  0:04     ` Sean Christopherson
  0 siblings, 1 reply; 4+ messages in thread
From: xinguo @ 2026-06-10 23:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, hpa, kvm,
	linux-kernel

Fair point, my changelog reasoning is incomplete and I owe you data
rather than speculation.

What I actually trigger is a workload that repeatedly toggles AVIC
on and off, i.e. avic_activate_vmcb() / avic_deactivate_vmcb() get
called many times in quick succession.  Under that load the Windows
guest blue screens with STATUS_INTEGER_DIVIDE_BY_ZERO.

From the dump, Windows takes the bugcheck while dispatching an
interrupt: an unhandled #DE is raised inside the interrupt dispatch
path and ultimately reported by nt!KiInterruptHandler.  The faulting
RIP saved in the trap frame is:

	je   nt!KiInterruptSubDispatchNoLockNoEtw+0xd5

which is a conditional branch, not a div/idiv.  In other words, the
guest is being vectored through IDT entry 0 (#DE) at an instruction
boundary that has nothing to do with division, which is consistent
with the CPU delivering vector 0 from KVM rather than the guest
actually executing a faulting div.  That is what made me suspect a
stale dummy V_IRQ (vector=0, V_IRQ=1) becoming effective once AVIC
is disabled.

I agree this needs to be backed by traces, not just by that
hypothesis.  Let me instrument svm_set_vintr(), svm_clear_vintr(),
the intercept-recalc paths, and avic_deactivate_vmcb() to capture
vmcb01's int_ctl / int_vector / INTERCEPT_VINTR / is_guest_mode()
at each transition, reproduce the crash, and come back with the
actual call sequence that leaves vmcb01 in a state where V_IRQ
becomes effective once AVIC is disabled.
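
One possible shape for that instrumentation is sketched below as a user-space model. `struct vintr_trace` and `format_vintr_trace()` are hypothetical names, and the field set simply mirrors the list above. In the kernel the same line would more likely be emitted with trace_printk() or a dedicated tracepoint than with snprintf().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-transition record; fields mirror the list above. */
struct vintr_trace {
	const char *site;	/* name of the instrumented function */
	uint32_t int_ctl;	/* vmcb01 control.int_ctl */
	uint32_t int_vector;	/* vmcb01 control.int_vector */
	bool intercept_vintr;	/* INTERCEPT_VINTR set in vmcb01? */
	bool guest_mode;	/* result of is_guest_mode() */
};

/* Formats one record as a single line; returns snprintf()'s result. */
static int format_vintr_trace(char *buf, size_t len,
			      const struct vintr_trace *t)
{
	return snprintf(buf, len,
			"%s: int_ctl=0x%08x int_vector=0x%02x "
			"vintr_icpt=%d guest_mode=%d",
			t->site, (unsigned int)t->int_ctl,
			(unsigned int)t->int_vector,
			t->intercept_vintr, t->guest_mode);
}
```

Emitting the same record from every instrumented site keeps the trace easy to grep and makes the ordering of svm_set_vintr(), svm_clear_vintr(), and avic_deactivate_vmcb() calls directly comparable.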

Please hold off on this patch in the meantime; I'll resend (or drop
it) based on what the trace shows.

Thanks for the review.

> On Jun 10, 2026, at 20:45, Sean Christopherson <seanjc@google.com> wrote:
> 
> On Wed, Jun 10, 2026, xin guo wrote:
>> When KVM requests an IRQ window via svm_set_vintr(), it programs a
>> dummy VINTR with int_vector=0 and V_IRQ=1 into the current VMCB.
>> These int_ctl fields are documented to be ignored while AVIC is
>> enabled, so the dummy VINTR is harmless during AVIC operation.
>> 
>> However, avic_deactivate_vmcb() only clears AVIC_ENABLE_MASK and
>> X2APIC_MODE_MASK, and does not clear the VINTR injection state. Once
>> AVIC is disabled, hardware honors V_IRQ again and injects vector 0
>> into the guest on the next VMRUN. Windows guests observe this as a
>> spurious interrupt and crash, e.g. with STATUS_INTEGER_DIVIDE_BY_ZERO.
> 
> Can you provide a reproducer, or at least instructions to reproduce?  This feels
> like we're treating a symptom, not the underlying bug.  And while I can definitely
> see KVM leaving a stale V_IRQ_MASK in vmcb01, I don't see how that can happen
> while also clearing INTERCEPT_VINTR, as the only place INTERCEPT_VINTR is cleared
> in vmcb01 is svm_clear_vintr(), which also purges V_IRQ_MASK.
> 
> 	svm_clr_intercept(svm, INTERCEPT_VINTR);
> 
> 	/* Drop int_ctl fields related to VINTR injection.  */
> 	svm->vmcb->control.int_ctl &= ~V_IRQ_INJECTION_BITS_MASK;
> 
>> Fix this by also clearing V_IRQ_INJECTION_BITS_MASK from vmcb01's
>> int_ctl in avic_deactivate_vmcb(), so that no stale dummy VINTR is
>> left behind when AVIC transitions from enabled to disabled.
>> 
>> Signed-off-by: xin guo <m18700951735@163.com>
>> ---
>> arch/x86/kvm/svm/avic.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
>> index cdd5a6dc646f..b042c3f5f90e 100644
>> --- a/arch/x86/kvm/svm/avic.c
>> +++ b/arch/x86/kvm/svm/avic.c
>> @@ -257,7 +257,9 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
>> {
>> 	struct vmcb *vmcb = svm->vmcb01.ptr;
>> 
>> -	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
>> +	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK |
>> +				V_IRQ_INJECTION_BITS_MASK);
>> +
>> 	vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
>> 
>> 	if (!is_sev_es_guest(&svm->vcpu))
>> -- 
>> 2.27.0
>> 



* Re: [PATCH] KVM: SVM: Clear dummy V_IRQ in vmcb01 when deactivating AVIC
  2026-06-10 23:44   ` xinguo
@ 2026-06-11  0:04     ` Sean Christopherson
  0 siblings, 0 replies; 4+ messages in thread
From: Sean Christopherson @ 2026-06-11  0:04 UTC (permalink / raw)
  To: xinguo; +Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, hpa, kvm,
	linux-kernel

On Thu, Jun 11, 2026, xinguo wrote:
> Fair point, my changelog reasoning is incomplete and I owe you data
> rather than speculation.

Oh, I'm not doubting that there is a bug, I just don't think that purging V_IRQ
when AVIC is disabled is the right fix.

> What I actually trigger is a workload that repeatedly toggles AVIC
> on and off, i.e. avic_activate_vmcb() / avic_deactivate_vmcb() get
> called many times in quick succession.  Under that load the Windows
> guest blue screens with STATUS_INTEGER_DIVIDE_BY_ZERO.

What kernel version are you using?  And do you happen to know what exactly is
causing AVIC to be (un)inhibited?  I ask because these commits that are landing
in 7.1 might be relevant:

  fa78a514d632ed2428b7c573108d9658c00d536e KVM: Isolate apicv_update_lock and apicv_nr_irq_window_req in a cacheline
  5617dddcfa30129562d7028ec766797d8c345f36 KVM: SVM: Optimize IRQ window inhibit handling
  6563ddadd169cc6f509a75b3ff8354309dcb9080 KVM: SVM: Fix IRQ window inhibit handling across multiple vCPUs
  7b402ec851cb66e73ee35913c7d802bba820086b KVM: SVM: Fix clearing IRQ window inhibit with nested guests

> From the dump, Windows takes the bugcheck while dispatching an
> interrupt: an unhandled #DE is raised inside the interrupt dispatch
> path and ultimately reported by nt!KiInterruptHandler.  The faulting
> RIP saved in the trap frame is:
> 
> 	je   nt!KiInterruptSubDispatchNoLockNoEtw+0xd5
> 
> which is a conditional branch, not a div/idiv.  In other words, the
> guest is being vectored through IDT entry 0 (#DE) at an instruction
> boundary that has nothing to do with division, which is consistent
> with the CPU delivering vector 0 from KVM rather than the guest
> actually executing a faulting div.  That is what made me suspect a
> stale dummy V_IRQ (vector=0, V_IRQ=1) becoming effective once AVIC
> is disabled.
> 
> I agree this needs to be backed by traces, not just by that
> hypothesis.  Let me instrument svm_set_vintr(), svm_clear_vintr(),
> the intercept-recalc paths, and avic_deactivate_vmcb() to capture
> vmcb01's int_ctl / int_vector / INTERCEPT_VINTR / is_guest_mode()
> at each transition, reproduce the crash, and come back with the
> actual call sequence that leaves vmcb01 in a state where V_IRQ
> becomes effective once AVIC is disabled.
> 
> Please hold off on this patch in the meantime; I'll resend (or drop
> it) based on what the trace shows.

