* [PATCH] KVM: arm64: preserve pending during kvm_irqfd_deassign
@ 2025-07-02 14:41 Steve Sistare
2025-07-02 15:19 ` Oliver Upton
0 siblings, 1 reply; 3+ messages in thread
From: Steve Sistare @ 2025-07-02 14:41 UTC (permalink / raw)
To: kvmarm, linux-kernel
Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Steve Sistare
When kvm_irqfd_deassign ... -> kvm_vgic_v4_unset_forwarding is called,
if an interrupt is pending in irq->pending_latch, then transfer it to
the producer's eventfd. This way, if the KVM instance is subsequently
destroyed, the interrupt is preserved in producer state. If the irqfd
is re-created in a new KVM instance, kvm_irqfd_assign finds the producer,
polls the eventfd, finds the interrupt, and injects it into KVM.
QEMU live update does that: it passes the VFIO device descriptors to the
new process, but destroys and recreates the KVM instance, without
quiescing VFIO interrupts.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
arch/arm64/kvm/arm.c | 8 ++++++--
arch/arm64/kvm/vgic/vgic-v4.c | 13 ++++++++++---
include/kvm/arm_vgic.h | 2 +-
3 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 38a91bb5d4c7..315f4829875b 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2751,6 +2751,7 @@ int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
struct irq_bypass_producer *prod)
{
+ bool pending = false;
struct kvm_kernel_irqfd *irqfd =
container_of(cons, struct kvm_kernel_irqfd, consumer);
struct kvm_kernel_irq_routing_entry *irq_entry = &irqfd->irq_entry;
@@ -2758,7 +2759,10 @@ void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
if (irq_entry->type != KVM_IRQ_ROUTING_MSI)
return;
- kvm_vgic_v4_unset_forwarding(irqfd->kvm, prod->irq);
+ kvm_vgic_v4_unset_forwarding(irqfd->kvm, prod->irq, &pending);
+
+ if (pending)
+ eventfd_signal((struct eventfd_ctx *)prod->token);
}
bool kvm_arch_irqfd_route_changed(struct kvm_kernel_irq_routing_entry *old,
@@ -2781,7 +2785,7 @@ int kvm_arch_update_irqfd_routing(struct kvm *kvm, unsigned int host_irq,
*
* Unmap the vLPI and fall back to software LPI injection.
*/
- return kvm_vgic_v4_unset_forwarding(kvm, host_irq);
+ return kvm_vgic_v4_unset_forwarding(kvm, host_irq, NULL);
}
void kvm_arch_irq_bypass_stop(struct irq_bypass_consumer *cons)
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index 193946108192..b4cc576f9b51 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -527,13 +527,14 @@ static struct vgic_irq *__vgic_host_irq_get_vlpi(struct kvm *kvm, int host_irq)
return NULL;
}
-int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq)
+int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq, bool *pending)
{
struct vgic_irq *irq;
unsigned long flags;
int ret = 0;
+ bool direct_msi = vgic_supports_direct_msis(kvm);
- if (!vgic_supports_direct_msis(kvm))
+ if (!pending && !direct_msi)
return 0;
irq = __vgic_host_irq_get_vlpi(kvm, host_irq);
@@ -542,7 +543,13 @@ int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq)
raw_spin_lock_irqsave(&irq->irq_lock, flags);
WARN_ON(irq->hw && irq->host_irq != host_irq);
- if (irq->hw) {
+
+ if (pending) {
+ *pending = irq->pending_latch;
+ irq->pending_latch = false;
+ }
+
+ if (direct_msi && irq->hw) {
atomic_dec(&irq->target_vcpu->arch.vgic_cpu.vgic_v3.its_vpe.vlpi_count);
irq->hw = false;
ret = its_unmap_vlpi(host_irq);
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 4a34f7f0a864..249b39e8da02 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -434,7 +434,7 @@ struct kvm_kernel_irq_routing_entry;
int kvm_vgic_v4_set_forwarding(struct kvm *kvm, int irq,
struct kvm_kernel_irq_routing_entry *irq_entry);
-int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq);
+int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq, bool *pending);
int vgic_v4_load(struct kvm_vcpu *vcpu);
void vgic_v4_commit(struct kvm_vcpu *vcpu);
--
2.39.3
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] KVM: arm64: preserve pending during kvm_irqfd_deassign
2025-07-02 14:41 [PATCH] KVM: arm64: preserve pending during kvm_irqfd_deassign Steve Sistare
@ 2025-07-02 15:19 ` Oliver Upton
2025-07-14 16:51 ` Steven Sistare
0 siblings, 1 reply; 3+ messages in thread
From: Oliver Upton @ 2025-07-02 15:19 UTC (permalink / raw)
To: Steve Sistare
Cc: kvmarm, linux-kernel, Marc Zyngier, Joey Gouly, Suzuki K Poulose,
Zenghui Yu
On Wed, Jul 02, 2025 at 07:41:37AM -0700, Steve Sistare wrote:
> When kvm_irqfd_deassign ... -> kvm_vgic_v4_unset_forwarding is called,
> if an interrupt is pending in irq->pending_latch, then transfer it to
> the producer's eventfd. This way, if the KVM instance is subsequently
> destroyed, the interrupt is preserved in producer state. If the irqfd
> is re-created in a new KVM instance, kvm_irqfd_assign finds the producer,
> polls the eventfd, finds the interrupt, and injects it into KVM.
>
> QEMU live update does that: it passes the VFIO device descriptors to the
> new process, but destroys and recreates the KVM instance, without
> quiescing VFIO interrupts.
This *does not work*. Updates to the ITS mapping are non-atomic and a
poorly timed MSI will get dropped on the floor. Or generate an SError if
your system implementer has a sense of humor.
KVM already provides the SAVE_PENDING_TABLES ioctl for saving the
pending state of LPIs to memory which also retrieves the pending state
of vLPIs from hardware. The expectation for this ioctl is that userspace
has already quiesced the interrupt generator.
If userspace can't honor that I don't see a reason for KVM to go out of
the way to forward the pending state, especially considering the fact
that the architecture doesn't support this behavior.
A spurious interrupt doesn't seem that bad here.
> -int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq)
> +int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq, bool *pending)
> {
> struct vgic_irq *irq;
> unsigned long flags;
> int ret = 0;
> + bool direct_msi = vgic_supports_direct_msis(kvm);
>
> - if (!vgic_supports_direct_msis(kvm))
> + if (!pending && !direct_msi)
> return 0;
You've broken the early return in case hardware doesn't support GICv4...
> irq = __vgic_host_irq_get_vlpi(kvm, host_irq);
> @@ -542,7 +543,13 @@ int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq)
>
> raw_spin_lock_irqsave(&irq->irq_lock, flags);
> WARN_ON(irq->hw && irq->host_irq != host_irq);
> - if (irq->hw) {
> +
> + if (pending) {
> + *pending = irq->pending_latch;
> + irq->pending_latch = false;
> + }
> +
So this "works" for software-injected LPIs (notice that this function is
for handling *vLPIs*) as KVM's pending state is always the source of
truth. Is that why you're allowing GICv3 to get here now?
This (importantly) doesn't work for hardware vLPIs, as the pending state
is maintained in the vLPI pending table for the vPE.
Overall, I'm not convinced KVM needs to do anything here. We have state
save/restore mechanisms readily available, if userspace wants to go
off-label then it's up to userspace to figure that out.
Thanks,
Oliver
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] KVM: arm64: preserve pending during kvm_irqfd_deassign
2025-07-02 15:19 ` Oliver Upton
@ 2025-07-14 16:51 ` Steven Sistare
0 siblings, 0 replies; 3+ messages in thread
From: Steven Sistare @ 2025-07-14 16:51 UTC (permalink / raw)
To: Oliver Upton
Cc: kvmarm, linux-kernel, Marc Zyngier, Joey Gouly, Suzuki K Poulose,
Zenghui Yu
On 7/2/2025 11:19 AM, Oliver Upton wrote:
> On Wed, Jul 02, 2025 at 07:41:37AM -0700, Steve Sistare wrote:
>> When kvm_irqfd_deassign ... -> kvm_vgic_v4_unset_forwarding is called,
>> if an interrupt is pending in irq->pending_latch, then transfer it to
>> the producer's eventfd. This way, if the KVM instance is subsequently
>> destroyed, the interrupt is preserved in producer state. If the irqfd
>> is re-created in a new KVM instance, kvm_irqfd_assign finds the producer,
>> polls the eventfd, finds the interrupt, and injects it into KVM.
>>
>> QEMU live update does that: it passes the VFIO device descriptors to the
>> new process, but destroys and recreates the KVM instance, without
>> quiescing VFIO interrupts.
>
> This *does not work*. Updates to the ITS mapping are non-atomic and a
> poorly timed MSI will get dropped on the floor. Or generate an SError if
> your system implementer has a sense of humor.
>
> KVM already provides the SAVE_PENDING_TABLES ioctl for saving the
> pending state of LPIs to memory which also retrieves the pending state
> of vLPIs from hardware. The expectation for this ioctl is that userspace
> has already quiesced the interrupt generator.
Thank you Oliver for reviewing this and suggesting SAVE_PENDING_TABLES.
I have added it to the QEMU live update sequence, after calling
kvm_irqfd_deassign, and it does the trick.
- Steve
> If userspace can't honor that I don't see a reason for KVM to go out of
> the way to forward the pending state, especially considering the fact
> that the architecture doesn't support this behavior.
>
> A spurious interrupt doesn't seem that bad here.
>
>> -int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq)
>> +int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq, bool *pending)
>> {
>> struct vgic_irq *irq;
>> unsigned long flags;
>> int ret = 0;
>> + bool direct_msi = vgic_supports_direct_msis(kvm);
>>
>> - if (!vgic_supports_direct_msis(kvm))
>> + if (!pending && !direct_msi)
>> return 0;
>
> You've broken the early return in case hardware doesn't support GICv4...
>
>> irq = __vgic_host_irq_get_vlpi(kvm, host_irq);
>> @@ -542,7 +543,13 @@ int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq)
>>
>> raw_spin_lock_irqsave(&irq->irq_lock, flags);
>> WARN_ON(irq->hw && irq->host_irq != host_irq);
>> - if (irq->hw) {
>> +
>> + if (pending) {
>> + *pending = irq->pending_latch;
>> + irq->pending_latch = false;
>> + }
>> +
>
> So this "works" for software-injected LPIs (notice that this function is
> for handling *vLPIs*) as KVM's pending state is always the source of
> truth. Is that why you're allowing GICv3 to get here now?
>
> This (importantly) doesn't work for hardware vLPIs, as the pending state
> is maintained in the vLPI pending table for the vPE.
>
> Overall, I'm not convinced KVM needs to do anything here. We have state
> save/restore mechanisms readily available, if userspace wants to go
> off-label then it's up to userspace to figure that out.
>
> Thanks,
> Oliver
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-07-14 16:51 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-02 14:41 [PATCH] KVM: arm64: preserve pending during kvm_irqfd_deassign Steve Sistare
2025-07-02 15:19 ` Oliver Upton
2025-07-14 16:51 ` Steven Sistare
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).