* [PATCH] KVM: SVM: Clear dummy V_IRQ in vmcb01 when deactivating AVIC
From: xin guo @ 2026-06-10 7:05 UTC
To: seanjc, pbonzini
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, kvm, linux-kernel, xin guo

When KVM requests an IRQ window via svm_set_vintr(), it programs a
dummy VINTR with int_vector=0 and V_IRQ=1 into the current VMCB.
These int_ctl fields are documented to be ignored while AVIC is
enabled, so the dummy VINTR is harmless during AVIC operation.

However, avic_deactivate_vmcb() only clears AVIC_ENABLE_MASK and
X2APIC_MODE_MASK, and does not clear the VINTR injection state. Once
AVIC is disabled, hardware honors V_IRQ again and injects vector 0
into the guest on the next VMRUN. Windows guests observe this as a
spurious interrupt and crash, e.g. with STATUS_INTEGER_DIVIDE_BY_ZERO.

Fix this by also clearing V_IRQ_INJECTION_BITS_MASK from vmcb01's
int_ctl in avic_deactivate_vmcb(), so that no stale dummy VINTR is
left behind when AVIC transitions from enabled to disabled.

Signed-off-by: xin guo <m18700951735@163.com>
---
 arch/x86/kvm/svm/avic.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index cdd5a6dc646f..b042c3f5f90e 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -257,7 +257,9 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
 {
 	struct vmcb *vmcb = svm->vmcb01.ptr;
 
-	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
+	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK |
+				   V_IRQ_INJECTION_BITS_MASK);
+
 	vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
 
 	if (!is_sev_es_guest(&svm->vcpu))
-- 
2.27.0
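The effect of the one-line mask change in the patch above can be modeled in userspace. The following is a minimal sketch, not kernel code: the bit positions are assumptions chosen for illustration (the authoritative definitions live in arch/x86/include/asm/svm.h), and deactivate_old() / deactivate_new() are hypothetical helper names that only mirror the "before" and "after" expressions from the diff.

```c
#include <stdint.h>

/*
 * Illustrative int_ctl bit layout. These values are assumptions made for
 * this sketch; consult arch/x86/include/asm/svm.h for the real definitions.
 */
#define V_IRQ_MASK        (UINT32_C(1) << 8)
#define V_INTR_PRIO_MASK  (UINT32_C(0x0f) << 16)
#define V_IGN_TPR_MASK    (UINT32_C(1) << 20)
#define X2APIC_MODE_MASK  (UINT32_C(1) << 30)
#define AVIC_ENABLE_MASK  (UINT32_C(1) << 31)
#define V_IRQ_INJECTION_BITS_MASK \
	(V_IRQ_MASK | V_INTR_PRIO_MASK | V_IGN_TPR_MASK)

/* Hypothetical helper: the pre-patch expression, AVIC mode bits only. */
static uint32_t deactivate_old(uint32_t int_ctl)
{
	return int_ctl & ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
}

/* Hypothetical helper: the post-patch expression, VINTR bits dropped too. */
static uint32_t deactivate_new(uint32_t int_ctl)
{
	return int_ctl & ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK |
			   V_IRQ_INJECTION_BITS_MASK);
}
```

Feeding both helpers a value with AVIC_ENABLE_MASK and V_IRQ_MASK set shows the difference the changelog describes: the old expression leaves V_IRQ_MASK behind, the new one does not. Whether a stale V_IRQ can actually become effective in KVM is exactly what the rest of the thread questions; the sketch only illustrates the bit arithmetic.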
* Re: [PATCH] KVM: SVM: Clear dummy V_IRQ in vmcb01 when deactivating AVIC
From: Sean Christopherson @ 2026-06-10 12:45 UTC
To: xin guo
Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, hpa, kvm, linux-kernel

On Wed, Jun 10, 2026, xin guo wrote:
> When KVM requests an IRQ window via svm_set_vintr(), it programs a
> dummy VINTR with int_vector=0 and V_IRQ=1 into the current VMCB.
> These int_ctl fields are documented to be ignored while AVIC is
> enabled, so the dummy VINTR is harmless during AVIC operation.
>
> However, avic_deactivate_vmcb() only clears AVIC_ENABLE_MASK and
> X2APIC_MODE_MASK, and does not clear the VINTR injection state. Once
> AVIC is disabled, hardware honors V_IRQ again and injects vector 0
> into the guest on the next VMRUN. Windows guests observe this as a
> spurious interrupt and crash, e.g. with STATUS_INTEGER_DIVIDE_BY_ZERO.

Can you provide a reproducer, or at least instructions to reproduce?  This feels
like we're treating a symptom, not the underlying bug.  And while I can definitely
see KVM leaving a stale V_IRQ_MASK in vmcb01, I don't see how that can happen
while also clearing INTERCEPT_VINTR, as the only place INTERCEPT_VINTR is cleared
in vmcb01 is svm_clear_vintr(), which also purges V_IRQ_MASK.

	svm_clr_intercept(svm, INTERCEPT_VINTR);

	/* Drop int_ctl fields related to VINTR injection. */
	svm->vmcb->control.int_ctl &= ~V_IRQ_INJECTION_BITS_MASK;

> Fix this by also clearing V_IRQ_INJECTION_BITS_MASK from vmcb01's
> int_ctl in avic_deactivate_vmcb(), so that no stale dummy VINTR is
> left behind when AVIC transitions from enabled to disabled.
>
> Signed-off-by: xin guo <m18700951735@163.com>
> ---
>  arch/x86/kvm/svm/avic.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index cdd5a6dc646f..b042c3f5f90e 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -257,7 +257,9 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
>  {
>  	struct vmcb *vmcb = svm->vmcb01.ptr;
>
> -	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
> +	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK |
> +				   V_IRQ_INJECTION_BITS_MASK);
> +
>  	vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
>
>  	if (!is_sev_es_guest(&svm->vcpu))
> --
> 2.27.0
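The review point above, that INTERCEPT_VINTR and the V_IRQ bits are dropped together in svm_clear_vintr(), can be read as an invariant: V_IRQ should never be armed in vmcb01 without the VINTR intercept. Below is a small userspace model of that reading. Everything in it is hypothetical (struct vmcb_model, the model_*() helpers, the assumed bit position for V_IRQ_MASK), and it tracks only V_IRQ rather than the full V_IRQ_INJECTION_BITS_MASK, so treat it as a way to reason about the argument rather than a description of KVM's actual code paths.

```c
#include <stdbool.h>
#include <stdint.h>

#define V_IRQ_MASK (UINT32_C(1) << 8) /* assumed bit position, illustration only */

/* Hypothetical, heavily reduced stand-in for vmcb01's control state. */
struct vmcb_model {
	uint32_t int_ctl;
	uint8_t  int_vector;
	bool     intercept_vintr;
};

/* Open an IRQ window: set the intercept and arm a dummy V_IRQ (vector 0). */
static void model_set_vintr(struct vmcb_model *m)
{
	m->intercept_vintr = true;
	m->int_vector = 0;
	m->int_ctl |= V_IRQ_MASK;
}

/* Close the IRQ window: drop the intercept and the dummy V_IRQ together. */
static void model_clear_vintr(struct vmcb_model *m)
{
	m->intercept_vintr = false;
	m->int_ctl &= ~V_IRQ_MASK;
}

/* The invariant under discussion: V_IRQ armed implies the intercept is set. */
static bool model_invariant_holds(const struct vmcb_model *m)
{
	return !(m->int_ctl & V_IRQ_MASK) || m->intercept_vintr;
}
```

As long as state only changes through the two model helpers, the invariant holds after every step. A trace that shows it violated in the real vmcb01 would point at some other path touching int_ctl or the intercept, which is the kind of evidence the review asks for.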
* Re: [PATCH] KVM: SVM: Clear dummy V_IRQ in vmcb01 when deactivating AVIC
From: xinguo @ 2026-06-10 23:44 UTC
To: Sean Christopherson
Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, hpa, kvm, linux-kernel

Fair point, my changelog reasoning is incomplete and I owe you data
rather than speculation.

What I actually trigger is a workload that repeatedly toggles AVIC
on and off, i.e. avic_activate_vmcb() / avic_deactivate_vmcb() get
called many times in quick succession. Under that load the Windows
guest blue screens with STATUS_INTEGER_DIVIDE_BY_ZERO.

From the dump, Windows takes the bugcheck while dispatching an
interrupt: an unhandled #DE is raised inside the interrupt dispatch
path and ultimately reported by nt!KiInterruptHandler. The faulting
RIP saved in the trap frame is:

    je nt!KiInterruptSubDispatchNoLockNoEtw+0xd5

which is a conditional branch, not a div/idiv. In other words, the
guest is being vectored through IDT entry 0 (#DE) at an instruction
boundary that has nothing to do with division, which is consistent
with the CPU delivering vector 0 from KVM rather than the guest
actually executing a faulting div. That is what made me suspect a
stale dummy V_IRQ (vector=0, V_IRQ=1) becoming effective once AVIC
is disabled.

I agree this needs to be backed by traces, not just by that
hypothesis. Let me instrument svm_set_vintr(), svm_clear_vintr(),
the intercept-recalc paths, and avic_deactivate_vmcb() to capture
vmcb01's int_ctl / int_vector / INTERCEPT_VINTR / is_guest_mode()
at each transition, reproduce the crash, and come back with the
actual call sequence that leaves vmcb01 in a state where V_IRQ
becomes effective once AVIC is disabled.

Please hold off on this patch in the meantime; I'll resend (or drop
it) based on what the trace shows.

Thanks for the review.
> On Jun 10, 2026, at 20:45, Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Jun 10, 2026, xin guo wrote:
>> When KVM requests an IRQ window via svm_set_vintr(), it programs a
>> dummy VINTR with int_vector=0 and V_IRQ=1 into the current VMCB.
>> These int_ctl fields are documented to be ignored while AVIC is
>> enabled, so the dummy VINTR is harmless during AVIC operation.
>>
>> However, avic_deactivate_vmcb() only clears AVIC_ENABLE_MASK and
>> X2APIC_MODE_MASK, and does not clear the VINTR injection state. Once
>> AVIC is disabled, hardware honors V_IRQ again and injects vector 0
>> into the guest on the next VMRUN. Windows guests observe this as a
>> spurious interrupt and crash, e.g. with STATUS_INTEGER_DIVIDE_BY_ZERO.
>
> Can you provide a reproducer, or at least instructions to reproduce?  This feels
> like we're treating a symptom, not the underlying bug.  And while I can definitely
> see KVM leaving a stale V_IRQ_MASK in vmcb01, I don't see how that can happen
> while also clearing INTERCEPT_VINTR, as the only place INTERCEPT_VINTR is cleared
> in vmcb01 is svm_clear_vintr(), which also purges V_IRQ_MASK.
>
> 	svm_clr_intercept(svm, INTERCEPT_VINTR);
>
> 	/* Drop int_ctl fields related to VINTR injection. */
> 	svm->vmcb->control.int_ctl &= ~V_IRQ_INJECTION_BITS_MASK;
>
>> Fix this by also clearing V_IRQ_INJECTION_BITS_MASK from vmcb01's
>> int_ctl in avic_deactivate_vmcb(), so that no stale dummy VINTR is
>> left behind when AVIC transitions from enabled to disabled.
>>
>> Signed-off-by: xin guo <m18700951735@163.com>
>> ---
>>  arch/x86/kvm/svm/avic.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
>> index cdd5a6dc646f..b042c3f5f90e 100644
>> --- a/arch/x86/kvm/svm/avic.c
>> +++ b/arch/x86/kvm/svm/avic.c
>> @@ -257,7 +257,9 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
>>  {
>>  	struct vmcb *vmcb = svm->vmcb01.ptr;
>>
>> -	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
>> +	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK |
>> +				   V_IRQ_INJECTION_BITS_MASK);
>> +
>>  	vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
>>
>>  	if (!is_sev_es_guest(&svm->vcpu))
>> --
>> 2.27.0
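The instrumentation plan in the message above (capture int_ctl, int_vector, INTERCEPT_VINTR and is_guest_mode() at each transition) amounts to logging one small record per call site. The sketch below shows one possible shape for such a record and its formatting, as plain userspace C. The struct, its field names and format_snapshot() are all hypothetical; in the kernel the same fields could presumably be emitted with trace_printk() from the instrumented functions, but that wiring is not shown and is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot of the vmcb01 fields named in the message above. */
struct vintr_snapshot {
	const char *site;        /* name of the instrumented call site */
	uint32_t int_ctl;        /* vmcb01 control.int_ctl */
	uint8_t int_vector;      /* vmcb01 control.int_vector */
	bool intercept_vintr;    /* whether INTERCEPT_VINTR is set in vmcb01 */
	bool guest_mode;         /* result of is_guest_mode() at that point */
};

/* Format one snapshot as a single log line; returns snprintf()'s result. */
static int format_snapshot(char *buf, size_t len,
			   const struct vintr_snapshot *s)
{
	return snprintf(buf, len,
			"%s: int_ctl=0x%08x int_vector=%u vintr_icpt=%d guest_mode=%d",
			s->site, (unsigned int)s->int_ctl,
			(unsigned int)s->int_vector,
			s->intercept_vintr, s->guest_mode);
}
```

Keeping every record on one line with fixed field names makes it easy to grep a long trace for the first transition where V_IRQ is set while the VINTR intercept is clear.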
* Re: [PATCH] KVM: SVM: Clear dummy V_IRQ in vmcb01 when deactivating AVIC
From: Sean Christopherson @ 2026-06-11 0:04 UTC
To: xinguo
Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, hpa, kvm, linux-kernel

On Thu, Jun 11, 2026, xinguo wrote:
> Fair point, my changelog reasoning is incomplete and I owe you data
> rather than speculation.

Oh, I'm not doubting that there is a bug, I just don't think that purging V_IRQ
when AVIC is disabled is the right fix.

> What I actually trigger is a workload that repeatedly toggles AVIC
> on and off, i.e. avic_activate_vmcb() / avic_deactivate_vmcb() get
> called many times in quick succession. Under that load the Windows
> guest blue screens with STATUS_INTEGER_DIVIDE_BY_ZERO.

What kernel version are you using?  And do you happen to know what exactly is
causing AVIC to be (un)inhibited?  I ask because these commits that are landing
in 7.1 might be relevant:

  fa78a514d632ed2428b7c573108d9658c00d536e KVM: Isolate apicv_update_lock and apicv_nr_irq_window_req in a cacheline
  5617dddcfa30129562d7028ec766797d8c345f36 KVM: SVM: Optimize IRQ window inhibit handling
  6563ddadd169cc6f509a75b3ff8354309dcb9080 KVM: SVM: Fix IRQ window inhibit handling across multiple vCPUs
  7b402ec851cb66e73ee35913c7d802bba820086b KVM: SVM: Fix clearing IRQ window inhibit with nested guests

> From the dump, Windows takes the bugcheck while dispatching an
> interrupt: an unhandled #DE is raised inside the interrupt dispatch
> path and ultimately reported by nt!KiInterruptHandler. The faulting
> RIP saved in the trap frame is:
>
>     je nt!KiInterruptSubDispatchNoLockNoEtw+0xd5
>
> which is a conditional branch, not a div/idiv. In other words, the
> guest is being vectored through IDT entry 0 (#DE) at an instruction
> boundary that has nothing to do with division, which is consistent
> with the CPU delivering vector 0 from KVM rather than the guest
> actually executing a faulting div. That is what made me suspect a
> stale dummy V_IRQ (vector=0, V_IRQ=1) becoming effective once AVIC
> is disabled.
>
> I agree this needs to be backed by traces, not just by that
> hypothesis. Let me instrument svm_set_vintr(), svm_clear_vintr(),
> the intercept-recalc paths, and avic_deactivate_vmcb() to capture
> vmcb01's int_ctl / int_vector / INTERCEPT_VINTR / is_guest_mode()
> at each transition, reproduce the crash, and come back with the
> actual call sequence that leaves vmcb01 in a state where V_IRQ
> becomes effective once AVIC is disabled.
>
> Please hold off on this patch in the meantime; I'll resend (or drop
> it) based on what the trace shows.