* [PATCH] KVM: SVM: Clear dummy V_IRQ in vmcb01 when deactivating AVIC
From: xin guo @ 2026-06-10 7:05 UTC
To: seanjc, pbonzini
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, kvm, linux-kernel,
xin guo
When KVM requests an IRQ window via svm_set_vintr(), it programs a
dummy VINTR with int_vector=0 and V_IRQ=1 into the current VMCB.
These int_ctl fields are documented to be ignored while AVIC is
enabled, so the dummy VINTR is harmless during AVIC operation.

However, avic_deactivate_vmcb() only clears AVIC_ENABLE_MASK and
X2APIC_MODE_MASK, and does not clear the VINTR injection state. Once
AVIC is disabled, hardware honors V_IRQ again and injects vector 0
into the guest on the next VMRUN. Windows guests observe this as a
spurious interrupt and crash, e.g. with STATUS_INTEGER_DIVIDE_BY_ZERO.

Fix this by also clearing V_IRQ_INJECTION_BITS_MASK from vmcb01's
int_ctl in avic_deactivate_vmcb(), so that no stale dummy VINTR is
left behind when AVIC transitions from enabled to disabled.

Signed-off-by: xin guo <m18700951735@163.com>
---
arch/x86/kvm/svm/avic.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index cdd5a6dc646f..b042c3f5f90e 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -257,7 +257,9 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
 {
 	struct vmcb *vmcb = svm->vmcb01.ptr;
 
-	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
+	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK |
+				   V_IRQ_INJECTION_BITS_MASK);
+
 	vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
 
 	if (!is_sev_es_guest(&svm->vcpu))
--
2.27.0
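
For readers following along outside the kernel tree, the effect of the hunk above can be modelled in user space. This is only a sketch: the bit positions are assumptions based on a reading of asm/svm.h and should be checked against the tree under test, and every model_* name is hypothetical. The check deliberately ignores TPR, priority, RFLAGS.IF and INTERCEPT_VINTR, so it only answers "would hardware look at V_IRQ at all", not "would vector 0 actually be delivered".

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout of int_ctl; verify against asm/svm.h. */
#define V_IRQ_MASK          (1u << 8)
#define V_INTR_PRIO_SHIFT   16
#define V_INTR_PRIO_MASK    (0x0fu << V_INTR_PRIO_SHIFT)
#define V_IGN_TPR_MASK      (1u << 20)
#define X2APIC_MODE_MASK    (1u << 30)
#define AVIC_ENABLE_MASK    (1u << 31)
#define V_IRQ_INJECTION_BITS_MASK \
	(V_IRQ_MASK | V_INTR_PRIO_MASK | V_IGN_TPR_MASK)

/* Hypothetical stand-in for the two VMCB control fields involved. */
struct model_vmcb_ctl {
	uint32_t int_ctl;
	uint8_t  int_vector;
};

/* Behaviour before the patch: only the AVIC mode bits are cleared. */
static void model_deactivate_old(struct model_vmcb_ctl *c)
{
	c->int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
}

/* Behaviour with the patch: the VINTR injection bits are cleared too. */
static void model_deactivate_new(struct model_vmcb_ctl *c)
{
	c->int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK |
			V_IRQ_INJECTION_BITS_MASK);
}

/*
 * Simplified: V_IRQ is considered visible to hardware only when AVIC is
 * off. TPR, priority, RFLAGS.IF and the VINTR intercept are not modelled.
 */
static bool model_virq_visible_to_hw(const struct model_vmcb_ctl *c)
{
	return !(c->int_ctl & AVIC_ENABLE_MASK) && (c->int_ctl & V_IRQ_MASK);
}
```

Starting from int_ctl = AVIC_ENABLE_MASK | V_IRQ_MASK, the old helper leaves V_IRQ visible after deactivation while the new one does not. Whether clearing it there is the right fix is the question the review below raises.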
* Re: [PATCH] KVM: SVM: Clear dummy V_IRQ in vmcb01 when deactivating AVIC
From: Sean Christopherson @ 2026-06-10 12:45 UTC
To: xin guo; +Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, hpa, kvm,
linux-kernel
On Wed, Jun 10, 2026, xin guo wrote:
> When KVM requests an IRQ window via svm_set_vintr(), it programs a
> dummy VINTR with int_vector=0 and V_IRQ=1 into the current VMCB.
> These int_ctl fields are documented to be ignored while AVIC is
> enabled, so the dummy VINTR is harmless during AVIC operation.
>
> However, avic_deactivate_vmcb() only clears AVIC_ENABLE_MASK and
> X2APIC_MODE_MASK, and does not clear the VINTR injection state. Once
> AVIC is disabled, hardware honors V_IRQ again and injects vector 0
> into the guest on the next VMRUN. Windows guests observe this as a
> spurious interrupt and crash, e.g. with STATUS_INTEGER_DIVIDE_BY_ZERO.
Can you provide a reproducer, or at least instructions to reproduce? This feels
like we're treating a symptom, not the underlying bug. And while I can definitely
see KVM leaving a stale V_IRQ_MASK in vmcb01, I don't see how that can happen
while also clearing INTERCEPT_VINTR, as the only place INTERCEPT_VINTR is cleared
in vmcb01 is svm_clear_vintr(), which also purges V_IRQ_MASK.

	svm_clr_intercept(svm, INTERCEPT_VINTR);

	/* Drop int_ctl fields related to VINTR injection. */
	svm->vmcb->control.int_ctl &= ~V_IRQ_INJECTION_BITS_MASK;

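
The pairing described here can be written down as an invariant: V_IRQ set in vmcb01 should imply that INTERCEPT_VINTR is also set, so the dummy vector triggers a VINTR exit instead of being delivered. The following user-space sketch models only that invariant. The model_* names are hypothetical, the bit positions are assumptions to be checked against asm/svm.h, and nested (guest mode) handling is left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout of int_ctl; verify against asm/svm.h. */
#define V_IRQ_MASK          (1u << 8)
#define V_INTR_PRIO_MASK    (0x0fu << 16)
#define V_IGN_TPR_MASK      (1u << 20)
#define V_IRQ_INJECTION_BITS_MASK \
	(V_IRQ_MASK | V_INTR_PRIO_MASK | V_IGN_TPR_MASK)

/* Hypothetical, heavily reduced model of the vmcb01 state involved. */
struct model_vmcb {
	bool     intercept_vintr;
	uint32_t int_ctl;
	uint8_t  int_vector;
};

/* Request an IRQ window: set the intercept and program the dummy VINTR. */
static void model_set_vintr(struct model_vmcb *v)
{
	v->intercept_vintr = true;
	v->int_vector = 0;
	v->int_ctl |= V_IRQ_MASK;
}

/* Drop the IRQ window: clear the intercept and the injection bits together. */
static void model_clear_vintr(struct model_vmcb *v)
{
	v->intercept_vintr = false;
	v->int_ctl &= ~V_IRQ_INJECTION_BITS_MASK;
}

/* Invariant: V_IRQ must never be set without the VINTR intercept. */
static bool model_vintr_state_consistent(const struct model_vmcb *v)
{
	return !(v->int_ctl & V_IRQ_MASK) || v->intercept_vintr;
}
```

As long as every path goes through the set/clear pair, the invariant holds. A crash caused by vector 0 being delivered would therefore point at some other path that modifies one field without the other, which is what a reproducer would help find.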
> Fix this by also clearing V_IRQ_INJECTION_BITS_MASK from vmcb01's
> int_ctl in avic_deactivate_vmcb(), so that no stale dummy VINTR is
> left behind when AVIC transitions from enabled to disabled.
>
> Signed-off-by: xin guo <m18700951735@163.com>
> ---
> arch/x86/kvm/svm/avic.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index cdd5a6dc646f..b042c3f5f90e 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -257,7 +257,9 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
> {
> struct vmcb *vmcb = svm->vmcb01.ptr;
>
> - vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
> + vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK |
> + V_IRQ_INJECTION_BITS_MASK);
> +
> vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
>
> if (!is_sev_es_guest(&svm->vcpu))
> --
> 2.27.0
>
* Re: [PATCH] KVM: SVM: Clear dummy V_IRQ in vmcb01 when deactivating AVIC
From: xinguo @ 2026-06-10 23:44 UTC
To: Sean Christopherson
Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, hpa, kvm,
linux-kernel
Fair point, my changelog reasoning is incomplete and I owe you data
rather than speculation.

What I actually trigger is a workload that repeatedly toggles AVIC
on and off, i.e. avic_activate_vmcb() / avic_deactivate_vmcb() get
called many times in quick succession. Under that load the Windows
guest blue screens with STATUS_INTEGER_DIVIDE_BY_ZERO.

From the dump, Windows takes the bugcheck while dispatching an
interrupt: an unhandled #DE is raised inside the interrupt dispatch
path and ultimately reported by nt!KiInterruptHandler. The faulting
RIP saved in the trap frame is:

    je nt!KiInterruptSubDispatchNoLockNoEtw+0xd5

which is a conditional branch, not a div/idiv. In other words, the
guest is being vectored through IDT entry 0 (#DE) at an instruction
boundary that has nothing to do with division, which is consistent
with the CPU delivering vector 0 from KVM rather than the guest
actually executing a faulting div. That is what made me suspect a
stale dummy V_IRQ (vector=0, V_IRQ=1) becoming effective once AVIC
is disabled.

I agree this needs to be backed by traces, not just by that
hypothesis. Let me instrument svm_set_vintr(), svm_clear_vintr(),
the intercept-recalc paths, and avic_deactivate_vmcb() to capture
vmcb01's int_ctl / int_vector / INTERCEPT_VINTR / is_guest_mode()
at each transition, reproduce the crash, and come back with the
actual call sequence that leaves vmcb01 in a state where V_IRQ
becomes effective once AVIC is disabled.

Please hold off on this patch in the meantime; I'll resend (or drop
it) based on what the trace shows.

Thanks for the review.
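
One possible shape for the instrumentation described above is a small snapshot record that each call site fills in and formats into a single trace line. The sketch below is user-space C so it can be tried in isolation; the struct, the field names and model_format_snapshot() are all hypothetical, and in the kernel the formatted line would presumably go through something like trace_printk() instead of a local buffer.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record of the vmcb01 state of interest at one call site. */
struct model_vintr_snapshot {
	const char *site;        /* name of the instrumented function */
	uint32_t    int_ctl;     /* vmcb01 control.int_ctl */
	uint8_t     int_vector;  /* vmcb01 control.int_vector */
	bool        intercept_vintr;
	bool        guest_mode;  /* result of is_guest_mode() */
};

/* Format one snapshot as a single line; returns snprintf()'s result. */
static int model_format_snapshot(char *buf, size_t len,
				 const struct model_vintr_snapshot *s)
{
	return snprintf(buf, len,
			"%s: int_ctl=0x%08x int_vector=0x%02x vintr_icpt=%d guest_mode=%d",
			s->site, (unsigned int)s->int_ctl,
			(unsigned int)s->int_vector,
			(int)s->intercept_vintr, (int)s->guest_mode);
}
```

Emitting the same fields in the same order from every site keeps the resulting log easy to diff, which should make it simpler to spot the transition where V_IRQ is set but the VINTR intercept is not.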
> On Jun 10, 2026, at 20:45, Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Jun 10, 2026, xin guo wrote:
>> When KVM requests an IRQ window via svm_set_vintr(), it programs a
>> dummy VINTR with int_vector=0 and V_IRQ=1 into the current VMCB.
>> These int_ctl fields are documented to be ignored while AVIC is
>> enabled, so the dummy VINTR is harmless during AVIC operation.
>>
>> However, avic_deactivate_vmcb() only clears AVIC_ENABLE_MASK and
>> X2APIC_MODE_MASK, and does not clear the VINTR injection state. Once
>> AVIC is disabled, hardware honors V_IRQ again and injects vector 0
>> into the guest on the next VMRUN. Windows guests observe this as a
>> spurious interrupt and crash, e.g. with STATUS_INTEGER_DIVIDE_BY_ZERO.
>
> Can you provide a reproducer, or at least instructions to reproduce? This feels
> like we're treating a symptom, not the underlying bug. And while I can definitely
> see KVM leaving a stale V_IRQ_MASK in vmcb01, I don't see how that can happen
> while also clearing INTERCEPT_VINTR, as the only place INTERCEPT_VINTR is cleared
> in vmcb01 is svm_clear_vintr(), which also purges V_IRQ_MASK.
>
> svm_clr_intercept(svm, INTERCEPT_VINTR);
>
> /* Drop int_ctl fields related to VINTR injection. */
> svm->vmcb->control.int_ctl &= ~V_IRQ_INJECTION_BITS_MASK;
>
>> Fix this by also clearing V_IRQ_INJECTION_BITS_MASK from vmcb01's
>> int_ctl in avic_deactivate_vmcb(), so that no stale dummy VINTR is
>> left behind when AVIC transitions from enabled to disabled.
>>
>> Signed-off-by: xin guo <m18700951735@163.com>
>> ---
>> arch/x86/kvm/svm/avic.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
>> index cdd5a6dc646f..b042c3f5f90e 100644
>> --- a/arch/x86/kvm/svm/avic.c
>> +++ b/arch/x86/kvm/svm/avic.c
>> @@ -257,7 +257,9 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
>> {
>> struct vmcb *vmcb = svm->vmcb01.ptr;
>>
>> - vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
>> + vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK |
>> + V_IRQ_INJECTION_BITS_MASK);
>> +
>> vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
>>
>> if (!is_sev_es_guest(&svm->vcpu))
>> --
>> 2.27.0
>>
* Re: [PATCH] KVM: SVM: Clear dummy V_IRQ in vmcb01 when deactivating AVIC
From: Sean Christopherson @ 2026-06-11 0:04 UTC
To: xinguo; +Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, hpa, kvm,
linux-kernel
On Thu, Jun 11, 2026, xinguo wrote:
> Fair point, my changelog reasoning is incomplete and I owe you data
> rather than speculation.
Oh, I'm not doubting that there is a bug, I just don't think that purging V_IRQ
when AVIC is disabled is the right fix.
> What I actually trigger is a workload that repeatedly toggles AVIC
> on and off, i.e. avic_activate_vmcb() / avic_deactivate_vmcb() get
> called many times in quick succession. Under that load the Windows
> guest blue screens with STATUS_INTEGER_DIVIDE_BY_ZERO.
What kernel version are you using? And do you happen to know what exactly is
causing AVIC to be (un)inhibited? I ask because these commits that are landing
in 7.1 might be relevant:
fa78a514d632ed2428b7c573108d9658c00d536e KVM: Isolate apicv_update_lock and apicv_nr_irq_window_req in a cacheline
5617dddcfa30129562d7028ec766797d8c345f36 KVM: SVM: Optimize IRQ window inhibit handling
6563ddadd169cc6f509a75b3ff8354309dcb9080 KVM: SVM: Fix IRQ window inhibit handling across multiple vCPUs
7b402ec851cb66e73ee35913c7d802bba820086b KVM: SVM: Fix clearing IRQ window inhibit with nested guests
> From the dump, Windows takes the bugcheck while dispatching an
> interrupt: an unhandled #DE is raised inside the interrupt dispatch
> path and ultimately reported by nt!KiInterruptHandler. The faulting
> RIP saved in the trap frame is:
>
> je nt!KiInterruptSubDispatchNoLockNoEtw+0xd5
>
> which is a conditional branch, not a div/idiv. In other words, the
> guest is being vectored through IDT entry 0 (#DE) at an instruction
> boundary that has nothing to do with division, which is consistent
> with the CPU delivering vector 0 from KVM rather than the guest
> actually executing a faulting div. That is what made me suspect a
> stale dummy V_IRQ (vector=0, V_IRQ=1) becoming effective once AVIC
> is disabled.
>
> I agree this needs to be backed by traces, not just by that
> hypothesis. Let me instrument svm_set_vintr(), svm_clear_vintr(),
> the intercept-recalc paths, and avic_deactivate_vmcb() to capture
> vmcb01's int_ctl / int_vector / INTERCEPT_VINTR / is_guest_mode()
> at each transition, reproduce the crash, and come back with the
> actual call sequence that leaves vmcb01 in a state where V_IRQ
> becomes effective once AVIC is disabled.
>
> Please hold off on this patch in the meantime; I'll resend (or drop
> it) based on what the trace shows.