Re: KVM: Nested VGIC emulation leads to infinite IRQ exceptions

From: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
To: Marc Zyngier <maz@kernel.org>
Cc: "linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	Dmytro Terletskyi <Dmytro_Terletskyi@epam.com>,
	kvmarm <kvmarm@lists.linux.dev>
Subject: Re: KVM: Nested VGIC emulation leads to infinite IRQ exceptions
Date: Thu, 2 Oct 2025 15:08:09 +0000
Message-ID: <87cy75nxyf.fsf@epam.com>
In-Reply-To: <86bjmpz8cc.wl-maz@kernel.org> (Marc Zyngier's message of "Thu, 02 Oct 2025 15:28:19 +0100")

Hi Marc,

Marc Zyngier <maz@kernel.org> writes:

> On Thu, 02 Oct 2025 13:29:42 +0100,
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> wrote:

[...]

>>  qemu-system-aar-3378    [085] d....   246.770720: vgic_populate_lr: VCPU 1 lr 0 = 90a000000000004f
>>  qemu-system-aar-3378    [085] d....   246.770720: vgic_populate_lr: VCPU 1 lr 1 = 90a000000000004e
>>  qemu-system-aar-3378    [085] d....   246.770720: vgic_populate_lr: VCPU 1 lr 2 = d0a000000000004a
>>  qemu-system-aar-3378    [085] d....   246.770720: vgic_populate_lr: VCPU 1 lr 3 = d0a000000000004b
>> 
>> As all LR entries have ACTIVE bit set, read from IAR1 will produce 1023,
>> of course. Problem is that Xen itself can't deactivate these 4 IRQs as
>> they are directed to DomU, so DomU should active them first. But DomU
>> can't do this as it is never executed.
>
> There is a flaw in your reasoning: if these are DomU (an L2 guest)
> interrupts, why would they impact Xen itself, which is L1? At the
> point of entering Xen, the HW LRs should only contain the virtual
> interrupts that are targeting Xen, and nothing else (the DomU
> interrupts being stored in the shadow LRs).

Agreed, they **should**. But it looks like they contain all IRQs that are
targeted at that particular vCPU. I am still studying KVM's vGIC, so I
can't say why this is happening.

Mind you, these are QEMU's IRQs, so from Xen's standpoint they are
HW interrupts and of course they are targeting Xen. Xen injects them into
a guest by writing a vLR with the HW bit set.
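
For reference, the four LR values from the trace quoted above can be
decoded field by field. This is only a small sketch, assuming the values
are GICv3 ICH_LR<n>_EL2 images; the constant names mirror the ICH_LR_*
definitions in Linux's include/linux/irqchip/arm-gic-v3.h, while
decode_lr(), can_be_acknowledged() and TRACE_LRS are names made up for
this illustration. The acknowledge check is a simplified model that
ignores priority masking and group enables.

```python
# Bit layout of a GICv3 list register (ICH_LR<n>_EL2).
ICH_LR_VIRTUAL_ID_MASK = (1 << 32) - 1
ICH_LR_PHYS_ID_SHIFT = 32
ICH_LR_PHYS_ID_MASK = 0x3FF << ICH_LR_PHYS_ID_SHIFT
ICH_LR_PRIORITY_SHIFT = 48
ICH_LR_PRIORITY_MASK = 0xFF << ICH_LR_PRIORITY_SHIFT
ICH_LR_GROUP = 1 << 60
ICH_LR_HW = 1 << 61
ICH_LR_STATE_SHIFT = 62

STATE_NAMES = {0: "invalid", 1: "pending", 2: "active", 3: "pending+active"}

# LR values copied from the vgic_populate_lr trace lines (lr 0..3).
TRACE_LRS = [
    0x90A000000000004F,
    0x90A000000000004E,
    0xD0A000000000004A,
    0xD0A000000000004B,
]


def lr_state(val):
    """Return the 2-bit State field (bits [63:62])."""
    return (val >> ICH_LR_STATE_SHIFT) & 0x3


def decode_lr(val):
    """Split a list register value into its architectural fields."""
    return {
        "vintid": val & ICH_LR_VIRTUAL_ID_MASK,
        # pINTID only carries a physical INTID when the HW bit is set.
        "pintid": (val & ICH_LR_PHYS_ID_MASK) >> ICH_LR_PHYS_ID_SHIFT,
        "priority": (val & ICH_LR_PRIORITY_MASK) >> ICH_LR_PRIORITY_SHIFT,
        "group": 1 if val & ICH_LR_GROUP else 0,
        "hw": bool(val & ICH_LR_HW),
        "state": STATE_NAMES[lr_state(val)],
    }


def can_be_acknowledged(val):
    """Simplified model: only a plain pending LR is a candidate for IAR."""
    return lr_state(val) == 1


if __name__ == "__main__":
    for n, lr in enumerate(TRACE_LRS):
        print(f"lr {n}: {decode_lr(lr)}")
    print("any LR acknowledgeable:",
          any(can_be_acknowledged(lr) for lr in TRACE_LRS))
```

Under this model none of the four entries is in the plain pending
state, which matches the read of IAR1 returning 1023.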

IMO, KVM should track these re-injected IRQs and remove them from Xen's
LRs. But this relies on the assumption that Xen (or any other nested
hypervisor) is well-behaved and will not try to deactivate an IRQ that it
has already injected into its own guest.

>
> I can't see so far how we'd end-up in that situation, given that we do
> a full context switch of the vgic context on each EL1/EL2 transition.
>
> Unless you are actually acknowledging the DomU interrupts in Xen and
> injecting them back into DomU? Which seems very odd as you don't have
> the HW bit set, which I'd expect if that was the case...

Isn't KVM doing the same? I mean, all HW IRQs target the hypervisor
and are then routed and re-injected into a guest. AFAIR, only LPIs can
be injected directly into a guest. And, as I said, the IRQs in question are
generated by an external QEMU, so they are considered HW interrupts by Xen.

>
>> I am not sure what is the correct fix, but I see two options:
>> 
>> - Prioritize timer IRQs so they always present in LRs
>> - De-prioritize ACTIVE IRQs so they are inserted into LRs last.
>> 
>> Looks like the second one is better.
>
> That's indeed something missing in KVM (I have long waited until
> someone would do it in my stead, but nobody seem to be bothered) but
> it isn't clear, from what you are describing, that this is the actual
> solution to your problem.
>

Okay, disregard my previous ideas. We can't willy-nilly remove ACTIVE
IRQs from LRs. So we probably need some sort of heuristic to determine
whether the L1 hypervisor re-injects an IRQ into an L2 guest. I think we
can check the HW bit in the vLR to determine this. In that case we can
differentiate L1- and L2-targeted IRQs during the context switch from KVM
to L1/L2 and fill the LRs accordingly.
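
As a rough illustration of that heuristic (not KVM code, just a toy
model): treat every valid vLR that L1 wrote with the HW bit set as a
sign that the INTID in its pINTID field was forwarded to L2, and leave
those INTIDs out when filling the LRs for L1. All function names below
(make_vlr, reinjected_intids, irqs_for_l1) are hypothetical, and INTID 27
merely stands in for an interrupt that L1 handles itself.

```python
# Relevant ICH_LR<n>_EL2 bits (GICv3 list register layout).
ICH_LR_PHYS_ID_SHIFT = 32
ICH_LR_PHYS_ID_MASK = 0x3FF
ICH_LR_HW = 1 << 61
ICH_LR_STATE_SHIFT = 62
ICH_LR_PENDING = 1  # State field value for "pending"


def make_vlr(vintid, pintid=0, hw=False, state=ICH_LR_PENDING):
    """Build a vLR value the way an L1 hypervisor might write it."""
    val = vintid | (state << ICH_LR_STATE_SHIFT)
    if hw:
        val |= ICH_LR_HW | (pintid << ICH_LR_PHYS_ID_SHIFT)
    return val


def reinjected_intids(l1_vlrs):
    """INTIDs that L1 forwarded to L2: valid vLRs with the HW bit set."""
    forwarded = set()
    for lr in l1_vlrs:
        valid = (lr >> ICH_LR_STATE_SHIFT) & 0x3 != 0
        if valid and lr & ICH_LR_HW:
            forwarded.add((lr >> ICH_LR_PHYS_ID_SHIFT) & ICH_LR_PHYS_ID_MASK)
    return forwarded


def irqs_for_l1(vcpu_irqs, l1_vlrs):
    """Keep only the IRQs that L1 has not already passed on to L2."""
    forwarded = reinjected_intids(l1_vlrs)
    return [intid for intid in vcpu_irqs if intid not in forwarded]


if __name__ == "__main__":
    # L1 forwards INTIDs 79, 78, 74 and 75 to its guest as HW interrupts.
    vlrs = [make_vlr(vintid=v, pintid=v, hw=True) for v in (79, 78, 74, 75)]
    print(irqs_for_l1([79, 78, 74, 75, 27], vlrs))
```

With the four forwarded INTIDs filtered out, only 27 is left to be
placed in the LRs when entering L1.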

Of course, as I said, in this case we'll rely on good behavior of the L1
hypervisor, because it can try to EOI an IRQ that it has already injected
into a guest. This is not a huge deal if we are dealing with "virtual" HW
interrupts (generated by QEMU in this case), but it can be tricky with
real HW interrupts generated by a real HW device and injected all the
way to L2.

-- 
WBR, Volodymyr

Thread overview: 7+ messages
2025-09-30 21:11 KVM: Nested VGIC emulation leads to infinite IRQ exceptions Volodymyr Babchuk
2025-10-01  7:23 ` Marc Zyngier
2025-10-02 12:29   ` Volodymyr Babchuk
2025-10-02 14:28     ` Marc Zyngier
2025-10-02 15:08       ` Volodymyr Babchuk [this message]
2025-10-01 16:17 ` Marc Zyngier
2025-11-03 17:08 ` Marc Zyngier
