* KVM: Nested VGIC emulation leads to infinite IRQ exceptions
From: Volodymyr Babchuk @ 2025-09-30 21:11 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org; +Cc: Marc Zyngier, Dmytro Terletskyi
Hi all,
We are trying to run Xen as a KVM nested hypervisor (again!) and we have
encountered a strange issue with nested GIC emulation. I am certain that
we'll dig down to the root cause, but someone on the ML may save us a
couple of days of debugging by providing some insights.
The setup is as follows: QEMU 9.2 is running Xen 4.20 with KVM (latest
Linux master branch) as the accelerator. QEMU provides a couple of virtio
devices to the VM, and some of these devices are passed through to DomU
(we had to hook these devices to the vSMMU, but that is another
story). Sometimes we observe the following sequence of events:
1. DomU gets an IRQ from a virtio device
2. DomU acknowledges the IRQ by reading the IAR1 register
3. DomU is unable to deactivate the IRQ (there is no write to the
EOI register)
We are not sure why this happens, but our current theory is that DomU's
vvCPU0 is interrupted by Xen's timer interrupt while handling the
IRQ. Unfortunately, we are not able to catch this specific moment in the
KVM trace because of the large number of lost events. Anyway, after this
we see the following loop:
4. The vCPU switches to Xen via an IRQ exception
5. Xen reads IAR1 to get the IRQ number, but gets 1023 (i.e. no pending IRQs)
6. Xen issues ERET to return to the guest
7. GOTO 4.
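For reference, this is roughly the acknowledge/EOI sequence we expect
the guest's Group 1 IRQ handling to follow (a minimal C sketch of the
GICv3 CPU interface flow with EOImode == 0, not Xen's actual code; the
S3_* strings are the sysreg encodings of ICC_IAR1_EL1 and ICC_EOIR1_EL1):

#include <stdint.h>

#define ICC_IAR1_EL1   "S3_0_C12_C12_0"  /* Group 1 interrupt acknowledge */
#define ICC_EOIR1_EL1  "S3_0_C12_C12_1"  /* Group 1 end of interrupt */
#define GIC_SPURIOUS   1023              /* "no pending interrupt" INTID */

static void handle_group1_irq(void (*dispatch)(uint32_t intid))
{
        uint64_t intid;

        /* Acknowledge: IAR1 returns the highest-priority pending INTID */
        asm volatile("mrs %0, " ICC_IAR1_EL1 : "=r"(intid));

        if (intid == GIC_SPURIOUS)
                return;                 /* nothing to acknowledge, as in step 5 */

        dispatch((uint32_t)intid);      /* run the handler for this interrupt */

        /* Priority drop + deactivation (EOImode == 0) */
        asm volatile("msr " ICC_EOIR1_EL1 ", %0" :: "r"(intid));
}

The final write to EOIR1 is exactly the step that never happens in
step 3 above.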
This basically leaves the whole vCPU stuck. We also noticed that DomU's
vvCPU gets stuck right after accessing a virtio MMIO register, so it
looks like this is what happens:
1. QEMU sends a virtio IRQ to the VM
2. Xen handles the IRQ and injects it into DomU
3. DomU tries to handle it and accesses a virtio MMIO register
4. This produces a memory fault that leads to a switch back to KVM (and
then to QEMU, of course) so that QEMU can handle the MMIO access
5. When QEMU resumes the vCPU thread, it immediately gets switched back
to vEL2 (probably due to a timer IRQ, but this is my speculation)
6. The vCPU spins in the aforementioned loop
This looks like it happens because of empty LRs, but we haven't
confirmed it yet because the issue is not 100% reproducible. I'll be glad
to hear any suggestions.
This is part of the KVM trace, where you can see that the vCPU in
question tries to ERET to Linux in DomU but is brought back to vEL2. In
this particular case it is vCPU1 / vvCPU0. I filtered out the other
vCPUs to reduce clutter.
qemu-system-aar-41290 [000] d.... 12023.695620: kvm_entry: PC: 0x00000a0000267c80
qemu-system-aar-41290 [000] d.... 12023.695620: vgic_update_irq_pending: VCPU: 1, IRQ 25, level: 0
qemu-system-aar-41290 [000] d.... 12023.695621: kvm_get_timer_map: VCPU: 1, dv: 2, dp: 3, ev: 1, ep: 0
qemu-system-aar-41290 [000] d.... 12023.695621: kvm_timer_emulate: arch_timer_ctx_index: 1 (should_fire: 1)
qemu-system-aar-41290 [000] d.... 12023.695621: kvm_timer_emulate: arch_timer_ctx_index: 0 (should_fire: 0)
qemu-system-aar-41290 [000] ..... 12023.695621: kvm_exit: TRAP: HSR_EC: 0x001a (ERET), PC: 0x00000a00002674e0
qemu-system-aar-41290 [000] ..... 12023.695621: kvm_get_timer_map: VCPU: 1, dv: 2, dp: 3, ev: 1, ep: 0
qemu-system-aar-41290 [000] d.... 12023.695622: kvm_timer_save_state: CTL: 0x000000 CVAL: 0x0 arch_timer_ctx_index: 2
qemu-system-aar-41290 [000] d.... 12023.695622: kvm_timer_save_state: CTL: 0x000005 CVAL: 0x426f7d24736c arch_timer_ctx_index: 3
qemu-system-aar-41290 [000] ..... 12023.695622: kvm_nested_eret: elr_el2: 0xffffffc0010ac5a4 spsr_el2: 0x024000c5 (M: EL1h) hcr_el2: 807c663f
qemu-system-aar-41290 [000] ..... 12023.695622: kvm_get_timer_map: VCPU: 1, dv: 1, dp: 0, ev: 2, ep: 3
qemu-system-aar-41290 [000] ..... 12023.695622: kvm_timer_update_irq: VCPU: 1, IRQ 27, level 1
qemu-system-aar-41290 [000] ..... 12023.695623: vgic_update_irq_pending: VCPU: 1, IRQ 27, level: 1
qemu-system-aar-41290 [000] ..... 12023.695623: kvm_timer_update_irq: VCPU: 1, IRQ 30, level 0
qemu-system-aar-41290 [000] ..... 12023.695623: vgic_update_irq_pending: VCPU: 1, IRQ 30, level: 0
qemu-system-aar-41290 [000] d.... 12023.695623: kvm_timer_restore_state: CTL: 0x000005 CVAL: 0x48aac64bd arch_timer_ctx_index: 1
qemu-system-aar-41290 [000] d.... 12023.695624: kvm_timer_restore_state: CTL: 0x000000 CVAL: 0x0 arch_timer_ctx_index: 0
qemu-system-aar-41290 [000] ..... 12023.695624: kvm_timer_emulate: arch_timer_ctx_index: 2 (should_fire: 0)
qemu-system-aar-41290 [000] ..... 12023.695624: kvm_timer_emulate: arch_timer_ctx_index: 3 (should_fire: 1)
qemu-system-aar-41290 [000] ..... 12023.695626: kvm_get_timer_map: VCPU: 1, dv: 1, dp: 0, ev: 2, ep: 3
qemu-system-aar-41290 [000] d.... 12023.695626: kvm_timer_save_state: CTL: 0x000005 CVAL: 0x48aac64bd arch_timer_ctx_index: 1
qemu-system-aar-41290 [000] d.... 12023.695627: kvm_timer_save_state: CTL: 0x000000 CVAL: 0x0 arch_timer_ctx_index: 0
qemu-system-aar-41290 [000] ..... 12023.695627: kvm_inject_nested_exception: IRQ: esr_el2 0x0 elr_el2: 0xffffffc0010ac5a4 spsr_el2: 0x024000c5 (M: EL1h) hcr_el2: 807c663f
qemu-system-aar-41290 [000] ..... 12023.695627: kvm_get_timer_map: VCPU: 1, dv: 2, dp: 3, ev: 1, ep: 0
qemu-system-aar-41290 [000] ..... 12023.695627: kvm_timer_update_irq: VCPU: 1, IRQ 28, level 0
qemu-system-aar-41290 [000] ..... 12023.695627: vgic_update_irq_pending: VCPU: 1, IRQ 28, level: 0
qemu-system-aar-41290 [000] ..... 12023.695628: kvm_timer_update_irq: VCPU: 1, IRQ 26, level 1
qemu-system-aar-41290 [000] ..... 12023.695628: vgic_update_irq_pending: VCPU: 1, IRQ 26, level: 1
qemu-system-aar-41290 [000] d.... 12023.695628: kvm_timer_restore_state: CTL: 0x000000 CVAL: 0x0 arch_timer_ctx_index: 2
qemu-system-aar-41290 [000] d.... 12023.695628: kvm_timer_restore_state: CTL: 0x000005 CVAL: 0x426f7d24736c arch_timer_ctx_index: 3
qemu-system-aar-41290 [000] ..... 12023.695629: kvm_timer_emulate: arch_timer_ctx_index: 1 (should_fire: 1)
qemu-system-aar-41290 [000] ..... 12023.695629: kvm_timer_emulate: arch_timer_ctx_index: 0 (should_fire: 0)
qemu-system-aar-41290 [000] d.... 12023.695632: vgic_update_irq_pending: VCPU: 1, IRQ 25, level: 0
qemu-system-aar-41290 [000] d.... 12023.695632: vgic_update_irq_pending: VCPU: 1, IRQ 25, level: 0
qemu-system-aar-41290 [000] d.... 12023.695633: vgic_update_irq_pending: VCPU: 1, IRQ 25, level: 0
qemu-system-aar-41290 [000] d.... 12023.695633: kvm_entry: PC: 0x00000a0000267c80
qemu-system-aar-41290 [000] d.... 12023.695634: vgic_update_irq_pending: VCPU: 1, IRQ 25, level: 0
qemu-system-aar-41290 [000] d.... 12023.695634: kvm_get_timer_map: VCPU: 1, dv: 2, dp: 3, ev: 1, ep: 0
qemu-system-aar-41290 [000] d.... 12023.695634: kvm_timer_emulate: arch_timer_ctx_index: 1 (should_fire: 1)
qemu-system-aar-41290 [000] d.... 12023.695635: kvm_timer_emulate: arch_timer_ctx_index: 0 (should_fire: 0)
qemu-system-aar-41290 [000] ..... 12023.695635: kvm_exit: TRAP: HSR_EC: 0x001a (ERET), PC: 0x00000a00002674e0
qemu-system-aar-41290 [000] ..... 12023.695635: kvm_get_timer_map: VCPU: 1, dv: 2, dp: 3, ev: 1, ep: 0
qemu-system-aar-41290 [000] d.... 12023.695635: kvm_timer_save_state: CTL: 0x000000 CVAL: 0x0 arch_timer_ctx_index: 2
qemu-system-aar-41290 [000] d.... 12023.695635: kvm_timer_save_state: CTL: 0x000005 CVAL: 0x426f7d24736c arch_timer_ctx_index: 3
qemu-system-aar-41290 [000] ..... 12023.695636: kvm_nested_eret: elr_el2: 0xffffffc0010ac5a4 spsr_el2: 0x024000c5 (M: EL1h) hcr_el2: 807c663f
qemu-system-aar-41290 [000] ..... 12023.695636: kvm_get_timer_map: VCPU: 1, dv: 1, dp: 0, ev: 2, ep: 3
qemu-system-aar-41290 [000] ..... 12023.695636: kvm_timer_update_irq: VCPU: 1, IRQ 27, level 1
qemu-system-aar-41290 [000] ..... 12023.695636: vgic_update_irq_pending: VCPU: 1, IRQ 27, level: 1
qemu-system-aar-41290 [000] ..... 12023.695636: kvm_timer_update_irq: VCPU: 1, IRQ 30, level 0
qemu-system-aar-41290 [000] ..... 12023.695637: vgic_update_irq_pending: VCPU: 1, IRQ 30, level: 0
qemu-system-aar-41290 [000] d.... 12023.695637: kvm_timer_restore_state: CTL: 0x000005 CVAL: 0x48aac64bd arch_timer_ctx_index: 1
qemu-system-aar-41290 [000] d.... 12023.695637: kvm_timer_restore_state: CTL: 0x000000 CVAL: 0x0 arch_timer_ctx_index: 0
qemu-system-aar-41290 [000] ..... 12023.695637: kvm_timer_emulate: arch_timer_ctx_index: 2 (should_fire: 0)
qemu-system-aar-41290 [000] ..... 12023.695637: kvm_timer_emulate: arch_timer_ctx_index: 3 (should_fire: 1)
qemu-system-aar-41290 [000] ..... 12023.695640: kvm_get_timer_map: VCPU: 1, dv: 1, dp: 0, ev: 2, ep: 3
qemu-system-aar-41290 [000] d.... 12023.695640: kvm_timer_save_state: CTL: 0x000005 CVAL: 0x48aac64bd arch_timer_ctx_index: 1
qemu-system-aar-41290 [000] d.... 12023.695640: kvm_timer_save_state: CTL: 0x000000 CVAL: 0x0 arch_timer_ctx_index: 0
qemu-system-aar-41290 [000] ..... 12023.695640: kvm_inject_nested_exception: IRQ: esr_el2 0x0 elr_el2: 0xffffffc0010ac5a4 spsr_el2: 0x024000c5 (M: EL1h) hcr_el2: 807c663f
qemu-system-aar-41290 [000] ..... 12023.695641: kvm_get_timer_map: VCPU: 1, dv: 2, dp: 3, ev: 1, ep: 0
qemu-system-aar-41290 [000] ..... 12023.695641: kvm_timer_update_irq: VCPU: 1, IRQ 28, level 0
qemu-system-aar-41290 [000] ..... 12023.695641: vgic_update_irq_pending: VCPU: 1, IRQ 28, level: 0
qemu-system-aar-41290 [000] ..... 12023.695641: kvm_timer_update_irq: VCPU: 1, IRQ 26, level 1
qemu-system-aar-41290 [000] ..... 12023.695641: vgic_update_irq_pending: VCPU: 1, IRQ 26, level: 1
qemu-system-aar-41290 [000] d.... 12023.695642: kvm_timer_restore_state: CTL: 0x000000 CVAL: 0x0 arch_timer_ctx_index: 2
qemu-system-aar-41290 [000] d.... 12023.695642: kvm_timer_restore_state: CTL: 0x000005 CVAL: 0x426f7d24736c arch_timer_ctx_index: 3
qemu-system-aar-41290 [000] ..... 12023.695642: kvm_timer_emulate: arch_timer_ctx_index: 1 (should_fire: 1)
qemu-system-aar-41290 [000] ..... 12023.695642: kvm_timer_emulate: arch_timer_ctx_index: 0 (should_fire: 0)
qemu-system-aar-41290 [000] d.... 12023.695644: vgic_update_irq_pending: VCPU: 1, IRQ 25, level: 0
qemu-system-aar-41290 [000] d.... 12023.695645: vgic_update_irq_pending: VCPU: 1, IRQ 25, level: 0
qemu-system-aar-41290 [000] d.... 12023.695645: vgic_update_irq_pending: VCPU: 1, IRQ 25, level: 0
qemu-system-aar-41290 [000] d.... 12023.695646: kvm_entry: PC: 0x00000a0000267c80
qemu-system-aar-41290 [000] d.... 12023.695647: vgic_update_irq_pending: VCPU: 1, IRQ 25, level: 0
qemu-system-aar-41290 [000] d.... 12023.695647: kvm_get_timer_map: VCPU: 1, dv: 2, dp: 3, ev: 1, ep: 0
qemu-system-aar-41290 [000] d.... 12023.695647: kvm_timer_emulate: arch_timer_ctx_index: 1 (should_fire: 1)
qemu-system-aar-41290 [000] d.... 12023.695647: kvm_timer_emulate: arch_timer_ctx_index: 0 (should_fire: 0)
qemu-system-aar-41290 [000] ..... 12023.695647: kvm_exit: TRAP: HSR_EC: 0x001a (ERET), PC: 0x00000a00002674e0
qemu-system-aar-41290 [000] ..... 12023.695648: kvm_get_timer_map: VCPU: 1, dv: 2, dp: 3, ev: 1, ep: 0
qemu-system-aar-41290 [000] d.... 12023.695648: kvm_timer_save_state: CTL: 0x000000 CVAL: 0x0 arch_timer_ctx_index: 2
qemu-system-aar-41290 [000] d.... 12023.695648: kvm_timer_save_state: CTL: 0x000005 CVAL: 0x426f7d24736c arch_timer_ctx_index: 3
qemu-system-aar-41290 [000] ..... 12023.695648: kvm_nested_eret: elr_el2: 0xffffffc0010ac5a4 spsr_el2: 0x024000c5 (M: EL1h) hcr_el2: 807c663f
--
WBR, Volodymyr
* Re: KVM: Nested VGIC emulation leads to infinite IRQ exceptions
From: Marc Zyngier @ 2025-10-01 7:23 UTC (permalink / raw)
To: Volodymyr Babchuk
Cc: linux-arm-kernel@lists.infradead.org, Dmytro Terletskyi, kvmarm
Please use the kvmarm mailing list for KVM-related discussions (added
for your convenience).
On Tue, 30 Sep 2025 22:11:54 +0100,
Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> wrote:
>
>
> Hi all,
>
> We are trying to run Xen as a KVM nested hypervisor (again!) and we have
> encountered a strange issue with nested GIC emulation. I am certain that
> we'll dig down to the root cause, but someone on the ML may save us a
> couple of days of debugging by providing some insights.
>
> The setup is as follows: QEMU 9.2 is running Xen 4.20 with KVM (latest
> Linux master branch) as the accelerator.
9.2 is an odd choice, especially as it doesn't have any NV support.
ISTR that 10.1 is the first version to have some NV support, although
without E2H0 enablement, which I expect Xen requires.
Anyway, if you're already running something, then I expect you've
patched QEMU to death to get there.
> QEMU provides a couple of virtio
> devices to the VM, and some of these devices are passed through to DomU
> (we had to hook these devices to the vSMMU, but that is another
> story). Sometimes we observe the following sequence of events:
>
> 1. DomU gets an IRQ from a virtio device
> 2. DomU acknowledges the IRQ by reading the IAR1 register
> 3. DomU is unable to deactivate the IRQ (there is no write to the
> EOI register)
>
> We are not sure why this happens, but our current theory is that DomU's
> vvCPU0 is interrupted by Xen's timer interrupt while handling the
> IRQ. Unfortunately, we are not able to catch this specific moment in the
> KVM trace because of the large number of lost events. Anyway, after this
> we see the following loop:
>
> 4. The vCPU switches to Xen via an IRQ exception
> 5. Xen reads IAR1 to get the IRQ number, but gets 1023 (i.e. no pending IRQs)
> 6. Xen issues ERET to return to the guest
> 7. GOTO 4.
What is the configuration of ICH_HCR_EL2 in Xen at the point of
reading IAR? My hunch is that you are taking a maintenance interrupt
and disabling the virtual CPU interface on taking it, which of course
results in no interrupt to acknowledge.
Reading ICH_MISR_EL2 at the point of entering Xen should give you a
clue as to why this is happening -- assuming my hunch is correct.
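Something along these lines, just as a sketch of the decode (bit names
as per the GICv3 architecture spec; how and where you read the register
in Xen is up to you):

#include <stdint.h>
#include <stdio.h>

/* ICH_MISR_EL2: maintenance interrupt status */
#define ICH_MISR_EOI    (1u << 0)  /* EOI maintenance interrupt            */
#define ICH_MISR_U      (1u << 1)  /* underflow: at most one valid LR left */
#define ICH_MISR_LRENP  (1u << 2)  /* EOI with no matching LR entry        */
#define ICH_MISR_NP     (1u << 3)  /* no pending interrupts in the LRs     */
/* bits 4-7 are VGrp0E/VGrp0D/VGrp1E/VGrp1D, omitted here */

static void dump_misr(uint64_t misr)
{
        printf("ICH_MISR_EL2=%#llx%s%s%s%s\n", (unsigned long long)misr,
               (misr & ICH_MISR_EOI)   ? " EOI"   : "",
               (misr & ICH_MISR_U)     ? " U"     : "",
               (misr & ICH_MISR_LRENP) ? " LRENP" : "",
               (misr & ICH_MISR_NP)    ? " NP"    : "");
}

Note that U/NP/LRENP only raise a maintenance interrupt if the matching
enable bits (UIE, NPIE, LRENPIE) are set in ICH_HCR_EL2.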
> This basically leaves the whole vCPU stuck. We also noticed that DomU's
> vvCPU gets stuck right after accessing a virtio MMIO register, so it
> looks like this is what happens:
>
> 1. QEMU sends a virtio IRQ to the VM
> 2. Xen handles the IRQ and injects it into DomU
> 3. DomU tries to handle it and accesses a virtio MMIO register
> 4. This produces a memory fault that leads to a switch back to KVM (and
> then to QEMU, of course) so that QEMU can handle the MMIO access
> 5. When QEMU resumes the vCPU thread, it immediately gets switched back
> to vEL2 (probably due to a timer IRQ, but this is my speculation)
> 6. The vCPU spins in the aforementioned loop
>
> This looks like it happens because of empty LRs, but we haven't
> confirmed it yet because the issue is not 100% reproducible. I'll be glad
> to hear any suggestions.
I don't think this is likely. If the guest hasn't done an EOI, then
the interrupt should still be in the LR in the context of DomU, with
at least an Active state. You want to try and look at what Xen sees
there.
> This is part of the KVM trace, where you can see that the vCPU in
> question tries to ERET to Linux in DomU but is brought back to vEL2. In
> this particular case it is vCPU1 / vvCPU0. I filtered out the other
> vCPUs to reduce clutter.
>
> qemu-system-aar-41290 [000] d.... 12023.695620: kvm_entry: PC: 0x00000a0000267c80
> qemu-system-aar-41290 [000] d.... 12023.695620: vgic_update_irq_pending: VCPU: 1, IRQ 25, level: 0
> qemu-system-aar-41290 [000] d.... 12023.695621: kvm_get_timer_map: VCPU: 1, dv: 2, dp: 3, ev: 1, ep: 0
> qemu-system-aar-41290 [000] d.... 12023.695621: kvm_timer_emulate: arch_timer_ctx_index: 1 (should_fire: 1)
> qemu-system-aar-41290 [000] d.... 12023.695621: kvm_timer_emulate: arch_timer_ctx_index: 0 (should_fire: 0)
> qemu-system-aar-41290 [000] ..... 12023.695621: kvm_exit: TRAP: HSR_EC: 0x001a (ERET), PC: 0x00000a00002674e0
[...]
There isn't much to go on here, as we mostly see the timers, which do
not help at all. I'd suggest you look at the maintenance interrupt,
and how Xen manipulates ICH_HCR_EL2, but that's the extent of what I
can do.
To help you further, I'd need a reproducer. I've asked you more than
once to provide a way to reproduce your setup, but got no answer. The
Debian package doesn't boot (it just messes up grub), and I don't have
the time to learn how to deal with Xen from scratch.
Until then, you'll have to do some debugging by yourself.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
* Re: KVM: Nested VGIC emulation leads to infinite IRQ exceptions
From: Marc Zyngier @ 2025-10-01 16:17 UTC (permalink / raw)
To: Volodymyr Babchuk
Cc: linux-arm-kernel@lists.infradead.org, Dmytro Terletskyi, kvmarm
On Tue, 30 Sep 2025 22:11:54 +0100,
Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> wrote:
[...]
I spent some time looking at this again.
>
> This is part of the KVM trace, where you can see that the vCPU in
> question tries to ERET to Linux in DomU but is brought back to vEL2. In
> this particular case it is vCPU1 / vvCPU0. I filtered out the other
> vCPUs to reduce clutter.
>
> qemu-system-aar-41290 [000] d.... 12023.695620: kvm_entry: PC: 0x00000a0000267c80
> qemu-system-aar-41290 [000] d.... 12023.695620: vgic_update_irq_pending: VCPU: 1, IRQ 25, level: 0
> qemu-system-aar-41290 [000] d.... 12023.695621: kvm_get_timer_map: VCPU: 1, dv: 2, dp: 3, ev: 1, ep: 0
> qemu-system-aar-41290 [000] d.... 12023.695621: kvm_timer_emulate: arch_timer_ctx_index: 1 (should_fire: 1)
> qemu-system-aar-41290 [000] d.... 12023.695621: kvm_timer_emulate: arch_timer_ctx_index: 0 (should_fire: 0)
> qemu-system-aar-41290 [000] ..... 12023.695621: kvm_exit: TRAP: HSR_EC: 0x001a (ERET), PC: 0x00000a00002674e0
Wants to ERET to EL1
> qemu-system-aar-41290 [000] ..... 12023.695621: kvm_get_timer_map: VCPU: 1, dv: 2, dp: 3, ev: 1, ep: 0
> qemu-system-aar-41290 [000] d.... 12023.695622: kvm_timer_save_state: CTL: 0x000000 CVAL: 0x0 arch_timer_ctx_index: 2
> qemu-system-aar-41290 [000] d.... 12023.695622: kvm_timer_save_state: CTL: 0x000005 CVAL: 0x426f7d24736c arch_timer_ctx_index: 3
EL2 physical timer is pending
> qemu-system-aar-41290 [000] ..... 12023.695622: kvm_nested_eret: elr_el2: 0xffffffc0010ac5a4 spsr_el2: 0x024000c5 (M: EL1h) hcr_el2: 807c663f
Return to EL1, reload the EL1 context
> qemu-system-aar-41290 [000] ..... 12023.695622: kvm_get_timer_map: VCPU: 1, dv: 1, dp: 0, ev: 2, ep: 3
> qemu-system-aar-41290 [000] ..... 12023.695622: kvm_timer_update_irq: VCPU: 1, IRQ 27, level 1
> qemu-system-aar-41290 [000] ..... 12023.695623: vgic_update_irq_pending: VCPU: 1, IRQ 27, level: 1
> qemu-system-aar-41290 [000] ..... 12023.695623: kvm_timer_update_irq: VCPU: 1, IRQ 30, level 0
> qemu-system-aar-41290 [000] ..... 12023.695623: vgic_update_irq_pending: VCPU: 1, IRQ 30, level: 0
> qemu-system-aar-41290 [000] d.... 12023.695623: kvm_timer_restore_state: CTL: 0x000005 CVAL: 0x48aac64bd arch_timer_ctx_index: 1
EL1 virtual timer is pending
> qemu-system-aar-41290 [000] d.... 12023.695624: kvm_timer_restore_state: CTL: 0x000000 CVAL: 0x0 arch_timer_ctx_index: 0
> qemu-system-aar-41290 [000] ..... 12023.695624: kvm_timer_emulate: arch_timer_ctx_index: 2 (should_fire: 0)
> qemu-system-aar-41290 [000] ..... 12023.695624: kvm_timer_emulate: arch_timer_ctx_index: 3 (should_fire: 1)
EL2 physical timer still pending
> qemu-system-aar-41290 [000] ..... 12023.695626: kvm_get_timer_map: VCPU: 1, dv: 1, dp: 0, ev: 2, ep: 3
> qemu-system-aar-41290 [000] d.... 12023.695626: kvm_timer_save_state: CTL: 0x000005 CVAL: 0x48aac64bd arch_timer_ctx_index: 1
> qemu-system-aar-41290 [000] d.... 12023.695627: kvm_timer_save_state: CTL: 0x000000 CVAL: 0x0 arch_timer_ctx_index: 0
HW without FEAT_ECV, I presume?
> qemu-system-aar-41290 [000] ..... 12023.695627: kvm_inject_nested_exception: IRQ: esr_el2 0x0 elr_el2: 0xffffffc0010ac5a4 spsr_el2: 0x024000c5 (M: EL1h) hcr_el2: 807c663f
Take an interrupt from EL1 to EL2, flip the world again.
> qemu-system-aar-41290 [000] ..... 12023.695627: kvm_get_timer_map: VCPU: 1, dv: 2, dp: 3, ev: 1, ep: 0
> qemu-system-aar-41290 [000] ..... 12023.695627: kvm_timer_update_irq: VCPU: 1, IRQ 28, level 0
> qemu-system-aar-41290 [000] ..... 12023.695627: vgic_update_irq_pending: VCPU: 1, IRQ 28, level: 0
> qemu-system-aar-41290 [000] ..... 12023.695628: kvm_timer_update_irq: VCPU: 1, IRQ 26, level 1
> qemu-system-aar-41290 [000] ..... 12023.695628: vgic_update_irq_pending: VCPU: 1, IRQ 26, level: 1
> qemu-system-aar-41290 [000] d.... 12023.695628: kvm_timer_restore_state: CTL: 0x000000 CVAL: 0x0 arch_timer_ctx_index: 2
> qemu-system-aar-41290 [000] d.... 12023.695628: kvm_timer_restore_state: CTL: 0x000005 CVAL: 0x426f7d24736c arch_timer_ctx_index: 3
Yup, EL2 timer still pending
> qemu-system-aar-41290 [000] ..... 12023.695629: kvm_timer_emulate: arch_timer_ctx_index: 1 (should_fire: 1)
> qemu-system-aar-41290 [000] ..... 12023.695629: kvm_timer_emulate: arch_timer_ctx_index: 0 (should_fire: 0)
> qemu-system-aar-41290 [000] d.... 12023.695632: vgic_update_irq_pending: VCPU: 1, IRQ 25, level: 0
> qemu-system-aar-41290 [000] d.... 12023.695632: vgic_update_irq_pending: VCPU: 1, IRQ 25, level: 0
> qemu-system-aar-41290 [000] d.... 12023.695633: vgic_update_irq_pending: VCPU: 1, IRQ 25, level: 0
> qemu-system-aar-41290 [000] d.... 12023.695633: kvm_entry: PC: 0x00000a0000267c80
and we go again.
So the MI doesn't seem to be the cause of this, as empty LRs are not
likely to be the problem.
However, we definitely see timer interrupts firing and EL2 being entered,
and yet EL2 doesn't seem to acknowledge the interrupt. So something
is wrong there, either in Xen or in KVM. You want to instrument what
is happening at this stage (I don't see anything of the sort, but my
machines have FEAT_ECV).
M.
--
Without deviation from the norm, progress is not possible.
* Re: KVM: Nested VGIC emulation leads to infinite IRQ exceptions
From: Volodymyr Babchuk @ 2025-10-02 12:29 UTC (permalink / raw)
To: Marc Zyngier
Cc: linux-arm-kernel@lists.infradead.org, Dmytro Terletskyi, kvmarm
Hi Marc,
Marc Zyngier <maz@kernel.org> writes:
> Please use the kvmarm mailing list for KVM-related discussions (added
> for your convenience).
Oops, sorry. I missed that MAINTAINERS has two "L:" entries.
> On Tue, 30 Sep 2025 22:11:54 +0100,
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> wrote:
>>
>>
>> Hi all,
>>
>> We are trying to run Xen as a KVM nested hypervisor (again!) and we have
>> encountered a strange issue with nested GIC emulation. I am certain that
>> we'll dig down to the root cause, but someone on the ML may save us a
>> couple of days of debugging by providing some insights.
>>
>> The setup is as follows: QEMU 9.2 is running Xen 4.20 with KVM (latest
>> Linux master branch) as the accelerator.
>
> 9.2 is an odd choice, especially as it doesn't have any NV support.
> ISTR that 10.1 is the first version to have some NV support, although
> without E2H0 enablement, which I expect Xen requires.
Yep, I had to patch QEMU to enable E2H0 (among other things).
>
> Anyway, if you're already running something, then I expect you've
> patched QEMU to death to get there.
You are certainly correct.
[...]
>
> To help you further, I'd need a reproducer. I've asked you more than
> once to provide a way to reproduce your setup, but got no answer. The
> Debian package doesn't boot (it just messes up grub), and I don't have
> the time to learn how to deal with Xen from scratch.
The current setup is quite complex as it involves a whole Android build,
so there is no easy way to share a reproducer.
> Until then, you'll have to do some debugging by yourself.
This is what Dmytro and I are doing. And it looks like I found the
problem. I added some more tracing, and here we go:
Xen wants to return to the vvCPU:
qemu-system-aar-3378 [085] ..... 246.770716: kvm_inject_nested_exception: IRQ: esr_el2 0x0 elr_el2: 0xffffffc0010e5508 spsr_el2: 0x024000c5 (M: EL1h) hcr_el2: 807c663f
qemu-system-aar-3378 [085] ..... 246.770716: kvm_get_timer_map: VCPU: 1, dv: 2, dp: 3, ev: 1, ep: 0
qemu-system-aar-3378 [085] ..... 246.770716: kvm_timer_update_irq: VCPU: 1, IRQ 28, level 0
qemu-system-aar-3378 [085] ..... 246.770716: vgic_update_irq_pending: VCPU: 1, IRQ 28, level: 0
qemu-system-aar-3378 [085] ..... 246.770717: kvm_timer_update_irq: VCPU: 1, IRQ 26, level 1
We have a pending timer IRQ for Xen
qemu-system-aar-3378 [085] ..... 246.770717: vgic_update_irq_pending: VCPU: 1, IRQ 26, level: 1
qemu-system-aar-3378 [085] d.... 246.770717: kvm_timer_restore_state: CTL: 0x000000 CVAL: 0x0 arch_timer_ctx_index: 2
qemu-system-aar-3378 [085] d.... 246.770717: kvm_timer_restore_state: CTL: 0x000005 CVAL: 0x3e6c59a71a95 arch_timer_ctx_index: 3
qemu-system-aar-3378 [085] ..... 246.770717: kvm_timer_emulate: arch_timer_ctx_index: 1 (should_fire: 1)
qemu-system-aar-3378 [085] ..... 246.770718: kvm_timer_emulate: arch_timer_ctx_index: 0 (should_fire: 0)
qemu-system-aar-3378 [085] d.... 246.770719: vgic_update_irq_pending: VCPU: 1, IRQ 25, level: 0
But we also have a bunch of ACTIVE interrupts which fill all the
available LRs:
qemu-system-aar-3378 [085] d.... 246.770720: vgic_populate_lr: VCPU 1 lr 0 = 90a000000000004f
qemu-system-aar-3378 [085] d.... 246.770720: vgic_populate_lr: VCPU 1 lr 1 = 90a000000000004e
qemu-system-aar-3378 [085] d.... 246.770720: vgic_populate_lr: VCPU 1 lr 2 = d0a000000000004a
qemu-system-aar-3378 [085] d.... 246.770720: vgic_populate_lr: VCPU 1 lr 3 = d0a000000000004b
As all the LR entries have the ACTIVE bit set, a read from IAR1 will of
course return 1023. The problem is that Xen itself can't deactivate
these 4 IRQs, as they are directed to DomU, so DomU has to deactivate
them first. But DomU can't do this as it never gets to run.
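For the record, decoding those four LR values by hand confirms this.
A quick throwaway decoder based on the ICH_LR<n>_EL2 field layout (my
own sketch, not code from KVM or Xen):

#include <stdint.h>
#include <stdio.h>

/* ICH_LR<n>_EL2 fields */
#define LR_STATE(lr)   (((lr) >> 62) & 0x3)  /* 0=inv 1=pend 2=act 3=pend+act */
#define LR_HW(lr)      (((lr) >> 61) & 0x1)  /* hardware interrupt (has pINTID) */
#define LR_GROUP(lr)   (((lr) >> 60) & 0x1)
#define LR_PRIO(lr)    (((lr) >> 48) & 0xff)
#define LR_VINTID(lr)  ((uint32_t)(lr))      /* bits [31:0] */

int main(void)
{
        static const char *state[] = { "invalid", "pending", "active",
                                       "pending+active" };
        static const uint64_t lrs[] = { 0x90a000000000004fULL, 0x90a000000000004eULL,
                                        0xd0a000000000004aULL, 0xd0a000000000004bULL };

        for (unsigned i = 0; i < 4; i++)
                printf("lr %u: vINTID %u, %s, HW=%u, group %u, prio 0x%02x\n",
                       i, LR_VINTID(lrs[i]), state[LR_STATE(lrs[i])],
                       (unsigned)LR_HW(lrs[i]), (unsigned)LR_GROUP(lrs[i]),
                       (unsigned)LR_PRIO(lrs[i]));
        return 0;
}

That reports vINTIDs 79, 78, 74 and 75, all active or pending+active and
all with HW=0, so a read from IAR1 in Xen indeed has nothing to return.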
I am not sure what the correct fix is, but I see two options:
- Prioritize timer IRQs so they are always present in the LRs
- De-prioritize ACTIVE IRQs so they are inserted into the LRs last.
The second one looks better to me.
--
WBR, Volodymyr
* Re: KVM: Nested VGIC emulation leads to infinite IRQ exceptions
From: Marc Zyngier @ 2025-10-02 14:28 UTC (permalink / raw)
To: Volodymyr Babchuk
Cc: linux-arm-kernel@lists.infradead.org, Dmytro Terletskyi, kvmarm
On Thu, 02 Oct 2025 13:29:42 +0100,
Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> wrote:
>
> Xen wants to return to the vvCPU:
>
> qemu-system-aar-3378 [085] ..... 246.770716: kvm_inject_nested_exception: IRQ: esr_el2 0x0 elr_el2: 0xffffffc0010e5508 spsr_el2: 0x024000c5 (M: EL1h) hcr_el2: 807c663f
> qemu-system-aar-3378 [085] ..... 246.770716: kvm_get_timer_map: VCPU: 1, dv: 2, dp: 3, ev: 1, ep: 0
> qemu-system-aar-3378 [085] ..... 246.770716: kvm_timer_update_irq: VCPU: 1, IRQ 28, level 0
> qemu-system-aar-3378 [085] ..... 246.770716: vgic_update_irq_pending: VCPU: 1, IRQ 28, level: 0
> qemu-system-aar-3378 [085] ..... 246.770717: kvm_timer_update_irq: VCPU: 1, IRQ 26, level 1
>
>
> We have a pending timer IRQ for Xen
>
> qemu-system-aar-3378 [085] ..... 246.770717: vgic_update_irq_pending: VCPU: 1, IRQ 26, level: 1
> qemu-system-aar-3378 [085] d.... 246.770717: kvm_timer_restore_state: CTL: 0x000000 CVAL: 0x0 arch_timer_ctx_index: 2
> qemu-system-aar-3378 [085] d.... 246.770717: kvm_timer_restore_state: CTL: 0x000005 CVAL: 0x3e6c59a71a95 arch_timer_ctx_index: 3
> qemu-system-aar-3378 [085] ..... 246.770717: kvm_timer_emulate: arch_timer_ctx_index: 1 (should_fire: 1)
> qemu-system-aar-3378 [085] ..... 246.770718: kvm_timer_emulate: arch_timer_ctx_index: 0 (should_fire: 0)
> qemu-system-aar-3378 [085] d.... 246.770719: vgic_update_irq_pending: VCPU: 1, IRQ 25, level: 0
>
> But we also have a bunch of ACTIVE interrupts which fill all the
> available LRs:
>
> qemu-system-aar-3378 [085] d.... 246.770720: vgic_populate_lr: VCPU 1 lr 0 = 90a000000000004f
> qemu-system-aar-3378 [085] d.... 246.770720: vgic_populate_lr: VCPU 1 lr 1 = 90a000000000004e
> qemu-system-aar-3378 [085] d.... 246.770720: vgic_populate_lr: VCPU 1 lr 2 = d0a000000000004a
> qemu-system-aar-3378 [085] d.... 246.770720: vgic_populate_lr: VCPU 1 lr 3 = d0a000000000004b
>
> As all the LR entries have the ACTIVE bit set, a read from IAR1 will of
> course return 1023. The problem is that Xen itself can't deactivate
> these 4 IRQs, as they are directed to DomU, so DomU has to deactivate
> them first. But DomU can't do this as it never gets to run.
There is a flaw in your reasoning: if these are DomU (an L2 guest)
interrupts, why would they impact Xen itself, which is L1? At the
point of entering Xen, the HW LRs should only contain the virtual
interrupts that are targeting Xen, and nothing else (the DomU
interrupts being stored in the shadow LRs).
So far I can't see how we'd end up in that situation, given that we do
a full context switch of the vgic context on each EL1/EL2 transition.
Unless you are actually acknowledging the DomU interrupts in Xen and
injecting them back into DomU? Which seems very odd as you don't have
the HW bit set, which I'd expect if that was the case...
> I am not sure what the correct fix is, but I see two options:
>
> - Prioritize timer IRQs so they are always present in the LRs
> - De-prioritize ACTIVE IRQs so they are inserted into the LRs last.
>
> The second one looks better to me.
That's indeed something missing in KVM (I have long waited for someone
to do it in my stead, but nobody seems to be bothered), but it isn't
clear, from what you are describing, that this is the actual solution
to your problem.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
* Re: KVM: Nested VGIC emulation leads to infinite IRQ exceptions
From: Volodymyr Babchuk @ 2025-10-02 15:08 UTC (permalink / raw)
To: Marc Zyngier
Cc: linux-arm-kernel@lists.infradead.org, Dmytro Terletskyi, kvmarm
Hi Marc,
Marc Zyngier <maz@kernel.org> writes:
> On Thu, 02 Oct 2025 13:29:42 +0100,
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> wrote:
[...]
>> qemu-system-aar-3378 [085] d.... 246.770720: vgic_populate_lr: VCPU 1 lr 0 = 90a000000000004f
>> qemu-system-aar-3378 [085] d.... 246.770720: vgic_populate_lr: VCPU 1 lr 1 = 90a000000000004e
>> qemu-system-aar-3378 [085] d.... 246.770720: vgic_populate_lr: VCPU 1 lr 2 = d0a000000000004a
>> qemu-system-aar-3378 [085] d.... 246.770720: vgic_populate_lr: VCPU 1 lr 3 = d0a000000000004b
>>
>> As all the LR entries have the ACTIVE bit set, a read from IAR1 will of
>> course return 1023. The problem is that Xen itself can't deactivate
>> these 4 IRQs, as they are directed to DomU, so DomU has to deactivate
>> them first. But DomU can't do this as it never gets to run.
>
> There is a flaw in your reasoning: if these are DomU (an L2 guest)
> interrupts, why would they impact Xen itself, which is L1? At the
> point of entering Xen, the HW LRs should only contain the virtual
> interrupts that are targeting Xen, and nothing else (the DomU
> interrupts being stored in the shadow LRs).
Agreed, they **should**. But it looks like they contain all the IRQs
that target that particular vCPU. I am still studying KVM's vGIC, so I
can't say why this is happening.
Mind you, these are QEMU's IRQs, so from Xen's standpoint they are
HW interrupts and of course they target Xen. Xen injects them into the
guest by writing a vLR with the HW bit set.
IMO, KVM should track these re-injected IRQs and remove them from Xen's
LRs. But this rests on the assumption that Xen (or any other nested
hypervisor) is well-behaved and will not try to deactivate an IRQ that
it has already injected into its own guest.
>
> So far I can't see how we'd end up in that situation, given that we do
> a full context switch of the vgic context on each EL1/EL2 transition.
>
> Unless you are actually acknowledging the DomU interrupts in Xen and
> injecting them back into DomU? Which seems very odd as you don't have
> the HW bit set, which I'd expect if that was the case...
Isn't KVM doing the same? I mean, all HW IRQs target the hypervisor
and are then routed and re-injected into a guest. AFAIR, only LPIs can
be injected directly into a guest. And, as I said, the IRQs in question
are generated externally by QEMU, so Xen considers them HW interrupts.
>
>> I am not sure what the correct fix is, but I see two options:
>>
>> - Prioritize timer IRQs so they are always present in the LRs
>> - De-prioritize ACTIVE IRQs so they are inserted into the LRs last.
>>
>> The second one looks better to me.
>
> That's indeed something missing in KVM (I have long waited for someone
> to do it in my stead, but nobody seems to be bothered), but it isn't
> clear, from what you are describing, that this is the actual solution
> to your problem.
>
Okay, disregard my previous ideas. We can't willy-nilly remove ACTIVE
IRQs from the LRs. So we probably need some sort of heuristic to
determine whether the L1 hypervisor has re-injected an IRQ into an L2
guest. I think we can check the HW bit in the vLR to determine this. In
that case we can differentiate L1- and L2-targeted IRQs during the
context switch from KVM to L1/L2 and fill the LRs accordingly.
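Something like the following, purely to illustrate that heuristic
(made-up names and simplified data structures, not actual KVM code):

#include <stdint.h>
#include <stdbool.h>

#define ICH_LR_STATE_MASK  (3ULL << 62)
#define ICH_LR_HW          (1ULL << 61)
#define ICH_LR_PINTID(lr)  (((lr) >> 32) & 0x1fffULL)  /* valid when HW=1 */

/*
 * Assumption behind the heuristic: if L1 wrote a vLR with the HW bit set
 * and a pINTID equal to an interrupt that KVM itself injected into L1,
 * then L1 has forwarded that interrupt to its L2 guest, and it does not
 * need to occupy a hardware LR while L1 runs.
 */
static bool irq_forwarded_by_l1(uint32_t intid,
                                const uint64_t *shadow_vlrs, unsigned n)
{
        for (unsigned i = 0; i < n; i++) {
                uint64_t vlr = shadow_vlrs[i];

                if ((vlr & ICH_LR_STATE_MASK) && (vlr & ICH_LR_HW) &&
                    ICH_LR_PINTID(vlr) == intid)
                        return true;
        }
        return false;
}

KVM could then de-prioritize (or skip) such interrupts when filling the
hardware LRs before entering L1, leaving room for things like the timer
IRQ.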
Of course, as I said, in this case we'll rely on the good behavior of
the L1 hypervisor, because it can still try to EOI an IRQ that it has
already injected into a guest. This is not a huge deal if we are dealing
with "virtual" HW interrupts (generated by QEMU in this case), but it
can be tricky with real HW interrupts generated by a real HW device and
injected all the way to L2.
--
WBR, Volodymyr
* Re: KVM: Nested VGIC emulation leads to infinite IRQ exceptions
From: Marc Zyngier @ 2025-11-03 17:08 UTC (permalink / raw)
To: Volodymyr Babchuk; +Cc: linux-arm-kernel@lists.infradead.org, Dmytro Terletskyi
On Tue, 30 Sep 2025 22:11:54 +0100,
Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> wrote:
>
>
> Hi all,
>
> We are trying to run Xen as a KVM nested hypervisor (again!) and we have
> encountered a strange issue with nested GIC emulation. I am certain that
> we'll dig down to the root cause, but someone on the ML may save us a
> couple of days of debugging by providing some insights.
[...]
FWIW, I have just posted a (large) series[1] that aims at fixing the
issue you have reported. I'd appreciate it if you could give it a go
and report whether it helps to address it.
Thanks,
M.
[1] https://lore.kernel.org/all/20251103165517.2960148-1-maz@kernel.org/
--
Without deviation from the norm, progress is not possible.