* [PATCH v3 0/2] KVM: x86/ioapic: Fix oneshot interrupts forwarding
@ 2022-08-18 20:26 Dmytro Maluka
2022-08-18 20:27 ` [PATCH v3 1/2] KVM: irqfd: Make resampler_list an RCU list Dmytro Maluka
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Dmytro Maluka @ 2022-08-18 20:26 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, kvm
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-kernel, Marc Zyngier, Eric Auger,
Alex Williamson, Rong L Liu, Zhenyu Wang, Tomasz Nowicki,
Grzegorz Jaszczyk, upstream, Dmitry Torokhov, Dong, Eddie,
Dmytro Maluka
KVM irqfd based emulation of level-triggered interrupts doesn't work
quite correctly in some cases, particularly in the case of interrupts
that are handled in a Linux guest as oneshot interrupts (IRQF_ONESHOT).
Such an interrupt is acked to the device in its threaded irq handler,
i.e. later than it is acked to the interrupt controller (EOI at the end
of hardirq), not earlier.
Linux keeps such interrupt masked until its threaded handler finishes,
to prevent the EOI from re-asserting an unacknowledged interrupt.
However, with KVM + vfio (or whatever is listening on the resamplefd)
we always notify resamplefd at the EOI, so vfio prematurely unmasks the
host physical IRQ, thus a new physical interrupt is fired in the host.
This extra interrupt in the host is not a problem per se. The problem is
that it is unconditionally queued for injection into the guest, so the
guest sees an extra bogus interrupt. [*]
There are observed at least 2 user-visible issues caused by those
extra erroneous interrupts for a oneshot irq in the guest:
1. System suspend aborted due to a pending wakeup interrupt from
ChromeOS EC (drivers/platform/chrome/cros_ec.c).
2. Annoying "invalid report id data" errors from ELAN0000 touchpad
(drivers/input/mouse/elan_i2c_core.c), flooding the guest dmesg
every time the touchpad is touched.
The core issue here is that by the time when the guest unmasks the IRQ,
the physical IRQ line is no longer asserted (since the guest has
acked the interrupt to the device in the meantime), yet we
unconditionally inject the interrupt queued into the guest by the
previous resampling. So to fix the issue, we need a way to detect that
the IRQ is no longer pending, and cancel the queued interrupt in this
case.
With IOAPIC we are not able to probe the physical IRQ line state
directly (at least not if the underlying physical interrupt controller
is an IOAPIC too), so in this patch series we use irqfd resampler for
that. Namely, instead of injecting the queued interrupt, we just notify
the resampler that this interrupt is done. If the IRQ line is actually
already deasserted, we are done. If it is still asserted, a new
interrupt will be shortly triggered through irqfd and injected into the
guest.
In the case if there is no irqfd resampler registered for this IRQ, we
cannot fix the issue, so we keep the existing behavior: immediately
unconditionally inject the queued interrupt.
This patch series fixes the issue for x86 IOAPIC only. In the long run,
we can fix it for other irqchips and other architectures too, possibly
taking advantage of reading the physical state of the IRQ line, which is
possible with some other irqchips (e.g. with arm64 GIC, maybe even with
the legacy x86 PIC).
[*] In this description we assume that the interrupt is a physical host
interrupt forwarded to the guest e.g. by vfio. Potentially the same
issue may occur also with a purely virtual interrupt from an
emulated device, e.g. if the guest handles this interrupt, again, as
a oneshot interrupt.
v3:
- Completely reworked: instead of postponing resamplefd notify until
unmask (to avoid extra interrupts in the host), resample the pending
status at unmask to avoid erroneous propagation of those extra
interrupts to the guest.
Thanks to Marc Zyngier for helping to identify the core issue, which
resulted in a simpler and probably more sensible implementation
(even though Marc's concern about presenting inaccurate pending
status to the guest is a non-issue in the case of IOAPIC, since
IOAPIC doesn't present this information anyway).
v2:
- Fixed compilation failure on non-x86: mask_notifier_list moved from
x86 "struct kvm_arch" to generic "struct kvm".
- kvm_fire_mask_notifiers() also moved from x86 to generic code, even
though it is not called on other architectures for now.
- Instead of kvm_irq_is_masked() implemented
kvm_register_and_fire_irq_mask_notifier() to fix potential race
when reading the initial IRQ mask state.
- Renamed for clarity:
- irqfd_resampler_mask() -> irqfd_resampler_mask_notify()
- kvm_irq_has_notifier() -> kvm_irq_has_ack_notifier()
- resampler->notifier -> resampler->ack_notifier
- Reorganized code in irqfd_resampler_ack() and
irqfd_resampler_mask_notify() to make it easier to follow.
- Don't follow unwanted "return type on separate line" style for
irqfd_resampler_mask_notify().
Dmytro Maluka (2):
KVM: irqfd: Make resampler_list an RCU list
KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking
arch/x86/kvm/ioapic.c | 36 ++++++++++++++++++++++++++++--
include/linux/kvm_host.h | 9 ++++++++
include/linux/kvm_irqfd.h | 2 +-
virt/kvm/eventfd.c | 47 ++++++++++++++++++++++++++++++++-------
4 files changed, 83 insertions(+), 11 deletions(-)
--
2.37.1.595.g718a3a8f04-goog
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v3 1/2] KVM: irqfd: Make resampler_list an RCU list
2022-08-18 20:26 [PATCH v3 0/2] KVM: x86/ioapic: Fix oneshot interrupts forwarding Dmytro Maluka
@ 2022-08-18 20:27 ` Dmytro Maluka
2022-08-18 20:27 ` [PATCH v3 2/2] KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking Dmytro Maluka
2022-10-21 13:31 ` [PATCH v3 0/2] KVM: x86/ioapic: Fix oneshot interrupts forwarding Dmytro Maluka
2 siblings, 0 replies; 6+ messages in thread
From: Dmytro Maluka @ 2022-08-18 20:27 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, kvm
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-kernel, Marc Zyngier, Eric Auger,
Alex Williamson, Rong L Liu, Zhenyu Wang, Tomasz Nowicki,
Grzegorz Jaszczyk, upstream, Dmitry Torokhov, Dong, Eddie,
Dmytro Maluka
It is useful to be able to do read-only traversal of the list of all the
registered irqfd resamplers without locking the resampler_lock mutex.
In particular, we are going to traverse it to search for a resampler
registered for the given irq of an irqchip, and that will be done with
an irqchip spinlock (ioapic->lock) held, so it is undesirable to lock a
mutex in this context. So turn this list into an RCU list.
For protecting the read side, reuse kvm->irq_srcu which is already used
for protecting a number of irq related things (kvm->irq_routing,
irqfd->resampler->list, kvm->irq_ack_notifier_list,
kvm->arch.mask_notifier_list).
Signed-off-by: Dmytro Maluka <dmy@semihalf.com>
---
include/linux/kvm_host.h | 1 +
include/linux/kvm_irqfd.h | 2 +-
virt/kvm/eventfd.c | 8 ++++++--
3 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1c480b1821e1..ee6d906e0138 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -738,6 +738,7 @@ struct kvm {
struct {
spinlock_t lock;
struct list_head items;
+ /* resampler_list update side is protected by resampler_lock. */
struct list_head resampler_list;
struct mutex resampler_lock;
} irqfds;
diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
index dac047abdba7..8ad43692e3bb 100644
--- a/include/linux/kvm_irqfd.h
+++ b/include/linux/kvm_irqfd.h
@@ -31,7 +31,7 @@ struct kvm_kernel_irqfd_resampler {
/*
* Entry in list of kvm->irqfd.resampler_list. Use for sharing
* resamplers among irqfds on the same gsi.
- * Accessed and modified under kvm->irqfds.resampler_lock
+ * RCU list modified under kvm->irqfds.resampler_lock
*/
struct list_head link;
};
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 2a3ed401ce46..61aea70dd888 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -96,8 +96,12 @@ irqfd_resampler_shutdown(struct kvm_kernel_irqfd *irqfd)
synchronize_srcu(&kvm->irq_srcu);
if (list_empty(&resampler->list)) {
- list_del(&resampler->link);
+ list_del_rcu(&resampler->link);
kvm_unregister_irq_ack_notifier(kvm, &resampler->notifier);
+ /*
+ * synchronize_srcu(&kvm->irq_srcu) already called
+ * in kvm_unregister_irq_ack_notifier().
+ */
kvm_set_irq(kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID,
resampler->notifier.gsi, 0, false);
kfree(resampler);
@@ -369,7 +373,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
resampler->notifier.irq_acked = irqfd_resampler_ack;
INIT_LIST_HEAD(&resampler->link);
- list_add(&resampler->link, &kvm->irqfds.resampler_list);
+ list_add_rcu(&resampler->link, &kvm->irqfds.resampler_list);
kvm_register_irq_ack_notifier(kvm,
&resampler->notifier);
irqfd->resampler = resampler;
--
2.37.1.595.g718a3a8f04-goog
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v3 2/2] KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking
2022-08-18 20:26 [PATCH v3 0/2] KVM: x86/ioapic: Fix oneshot interrupts forwarding Dmytro Maluka
2022-08-18 20:27 ` [PATCH v3 1/2] KVM: irqfd: Make resampler_list an RCU list Dmytro Maluka
@ 2022-08-18 20:27 ` Dmytro Maluka
2023-03-16 0:16 ` Sean Christopherson
2022-10-21 13:31 ` [PATCH v3 0/2] KVM: x86/ioapic: Fix oneshot interrupts forwarding Dmytro Maluka
2 siblings, 1 reply; 6+ messages in thread
From: Dmytro Maluka @ 2022-08-18 20:27 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, kvm
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-kernel, Marc Zyngier, Eric Auger,
Alex Williamson, Rong L Liu, Zhenyu Wang, Tomasz Nowicki,
Grzegorz Jaszczyk, upstream, Dmitry Torokhov, Dong, Eddie,
Dmytro Maluka
KVM irqfd based emulation of level-triggered interrupts doesn't work
quite correctly in some cases, particularly in the case of interrupts
that are handled in a Linux guest as oneshot interrupts (IRQF_ONESHOT).
Such an interrupt is acked to the device in its threaded irq handler,
i.e. later than it is acked to the interrupt controller (EOI at the end
of hardirq), not earlier.
Linux keeps such interrupt masked until its threaded handler finishes,
to prevent the EOI from re-asserting an unacknowledged interrupt.
However, with KVM + vfio (or whatever is listening on the resamplefd)
we always notify resamplefd at the EOI, so vfio prematurely unmasks the
host physical IRQ, thus a new physical interrupt is fired in the host.
This extra interrupt in the host is not a problem per se. The problem is
that it is unconditionally queued for injection into the guest, so the
guest sees an extra bogus interrupt. [*]
There are observed at least 2 user-visible issues caused by those
extra erroneous interrupts for a oneshot irq in the guest:
1. System suspend aborted due to a pending wakeup interrupt from
ChromeOS EC (drivers/platform/chrome/cros_ec.c).
2. Annoying "invalid report id data" errors from ELAN0000 touchpad
(drivers/input/mouse/elan_i2c_core.c), flooding the guest dmesg
every time the touchpad is touched.
The core issue here is that by the time when the guest unmasks the IRQ,
the physical IRQ line is no longer asserted (since the guest has
acked the interrupt to the device in the meantime), yet we
unconditionally inject the interrupt queued into the guest by the
previous resampling. So to fix the issue, we need a way to detect that
the IRQ is no longer pending, and cancel the queued interrupt in this
case.
With IOAPIC we are not able to probe the physical IRQ line state
directly (at least not if the underlying physical interrupt controller
is an IOAPIC too), so in this patch we use irqfd resampler for that.
Namely, instead of injecting the queued interrupt, we just notify the
resampler that this interrupt is done. If the IRQ line is actually
already deasserted, we are done. If it is still asserted, a new
interrupt will be shortly triggered through irqfd and injected into the
guest.
In the case if there is no irqfd resampler registered for this IRQ, we
cannot fix the issue, so we keep the existing behavior: immediately
unconditionally inject the queued interrupt.
This patch fixes the issue for x86 IOAPIC only. In the long run, we can
fix it for other irqchips and other architectures too, possibly taking
advantage of reading the physical state of the IRQ line, which is
possible with some other irqchips (e.g. with arm64 GIC, maybe even with
the legacy x86 PIC).
[*] In this description we assume that the interrupt is a physical host
interrupt forwarded to the guest e.g. by vfio. Potentially the same
issue may occur also with a purely virtual interrupt from an
emulated device, e.g. if the guest handles this interrupt, again, as
a oneshot interrupt.
Signed-off-by: Dmytro Maluka <dmy@semihalf.com>
Link: https://lore.kernel.org/kvm/31420943-8c5f-125c-a5ee-d2fde2700083@semihalf.com/
Link: https://lore.kernel.org/lkml/87o7wrug0w.wl-maz@kernel.org/
---
arch/x86/kvm/ioapic.c | 36 ++++++++++++++++++++++++++++++++++--
include/linux/kvm_host.h | 8 ++++++++
virt/kvm/eventfd.c | 39 +++++++++++++++++++++++++++++++++------
3 files changed, 75 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 765943d7cfa5..da7074d9b04e 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -368,8 +368,40 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
if (mask_before != mask_after)
kvm_fire_mask_notifiers(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index, mask_after);
if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG
- && ioapic->irr & (1 << index))
- ioapic_service(ioapic, index, false);
+ && ioapic->irr & (1 << index)
+ && !e->fields.mask
+ && !e->fields.remote_irr) {
+ /*
+ * Pending status in irr may be outdated: the IRQ line may have
+ * already been deasserted by a device while the IRQ was masked.
+ * This occurs, for instance, if the interrupt is handled in a
+ * Linux guest as a oneshot interrupt (IRQF_ONESHOT). In this
+ * case the guest acknowledges the interrupt to the device in
+ * its threaded irq handler, i.e. after the EOI but before
+ * unmasking, so at the time of unmasking the IRQ line is
+ * already down but our pending irr bit is still set. In such
+ * cases, injecting this pending interrupt to the guest is
+ * buggy: the guest will receive an extra unwanted interrupt.
+ *
+ * So we need to check here if the IRQ is actually still pending.
+ * As we are generally not able to probe the IRQ line status
+ * directly, we do it through irqfd resampler. Namely, we clear
+ * the pending status and notify the resampler that this interrupt
+ * is done, without actually injecting it into the guest. If the
+ * IRQ line is actually already deasserted, we are done. If it is
+ * still asserted, a new interrupt will be shortly triggered
+ * through irqfd and injected into the guest.
+ *
+ * If, however, it's not possible to resample (no irqfd resampler
+ * registered for this irq), then unconditionally inject this
+ * pending interrupt into the guest, so the guest will not miss
+ * an interrupt, although may get an extra unwanted interrupt.
+ */
+ if (kvm_notify_irqfd_resampler(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index))
+ ioapic->irr &= ~(1 << index);
+ else
+ ioapic_service(ioapic, index, false);
+ }
if (e->fields.delivery_mode == APIC_DM_FIXED) {
struct kvm_lapic_irq irq;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ee6d906e0138..b3ca17c74b44 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1979,6 +1979,7 @@ int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
#ifdef CONFIG_HAVE_KVM_IRQFD
int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
void kvm_irqfd_release(struct kvm *kvm);
+bool kvm_notify_irqfd_resampler(struct kvm *kvm, unsigned irqchip, unsigned pin);
void kvm_irq_routing_update(struct kvm *);
#else
static inline int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
@@ -1987,6 +1988,13 @@ static inline int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
}
static inline void kvm_irqfd_release(struct kvm *kvm) {}
+
+static inline bool kvm_notify_irqfd_resampler(struct kvm *kvm,
+ unsigned irqchip,
+ unsigned pin)
+{
+ return false;
+}
#endif
#else
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 61aea70dd888..71f327019f1e 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -55,6 +55,16 @@ irqfd_inject(struct work_struct *work)
irqfd->gsi, 1, false);
}
+/* Called within kvm->irq_srcu read side. */
+static void __irqfd_resampler_notify(struct kvm_kernel_irqfd_resampler *resampler)
+{
+ struct kvm_kernel_irqfd *irqfd;
+
+ list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
+ srcu_read_lock_held(&resampler->kvm->irq_srcu))
+ eventfd_signal(irqfd->resamplefd, 1);
+}
+
/*
* Since resampler irqfds share an IRQ source ID, we de-assert once
* then notify all of the resampler irqfds using this GSI. We can't
@@ -65,7 +75,6 @@ irqfd_resampler_ack(struct kvm_irq_ack_notifier *kian)
{
struct kvm_kernel_irqfd_resampler *resampler;
struct kvm *kvm;
- struct kvm_kernel_irqfd *irqfd;
int idx;
resampler = container_of(kian,
@@ -76,11 +85,7 @@ irqfd_resampler_ack(struct kvm_irq_ack_notifier *kian)
resampler->notifier.gsi, 0, false);
idx = srcu_read_lock(&kvm->irq_srcu);
-
- list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
- srcu_read_lock_held(&kvm->irq_srcu))
- eventfd_signal(irqfd->resamplefd, 1);
-
+ __irqfd_resampler_notify(resampler);
srcu_read_unlock(&kvm->irq_srcu, idx);
}
@@ -648,6 +653,28 @@ void kvm_irq_routing_update(struct kvm *kvm)
spin_unlock_irq(&kvm->irqfds.lock);
}
+bool kvm_notify_irqfd_resampler(struct kvm *kvm, unsigned irqchip, unsigned pin)
+{
+ struct kvm_kernel_irqfd_resampler *resampler;
+ int gsi, idx;
+
+ idx = srcu_read_lock(&kvm->irq_srcu);
+ gsi = kvm_irq_map_chip_pin(kvm, irqchip, pin);
+ if (gsi != -1)
+ list_for_each_entry_srcu(resampler,
+ &kvm->irqfds.resampler_list, link,
+ srcu_read_lock_held(&kvm->irq_srcu)) {
+ if (resampler->notifier.gsi == gsi) {
+ __irqfd_resampler_notify(resampler);
+ srcu_read_unlock(&kvm->irq_srcu, idx);
+ return true;
+ }
+ }
+ srcu_read_unlock(&kvm->irq_srcu, idx);
+
+ return false;
+}
+
/*
* create a host-wide workqueue for issuing deferred shutdown requests
* aggregated from all vm* instances. We need our own isolated
--
2.37.1.595.g718a3a8f04-goog
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v3 0/2] KVM: x86/ioapic: Fix oneshot interrupts forwarding
2022-08-18 20:26 [PATCH v3 0/2] KVM: x86/ioapic: Fix oneshot interrupts forwarding Dmytro Maluka
2022-08-18 20:27 ` [PATCH v3 1/2] KVM: irqfd: Make resampler_list an RCU list Dmytro Maluka
2022-08-18 20:27 ` [PATCH v3 2/2] KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking Dmytro Maluka
@ 2022-10-21 13:31 ` Dmytro Maluka
2 siblings, 0 replies; 6+ messages in thread
From: Dmytro Maluka @ 2022-10-21 13:31 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, kvm
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-kernel, Marc Zyngier, Eric Auger,
Alex Williamson, Rong L Liu, Zhenyu Wang, Tomasz Nowicki,
Grzegorz Jaszczyk, upstream, Dmitry Torokhov, Dong, Eddie
Hi Paolo, Marc, Eric and others,
Did you have a chance to take a look at this v3 patchset?
Just to remind the context: this patchset implements a solution I
proposed in [1], only that it doesn't introduce the additional "pending
oneshot" state (which was Marc's concern in [2]): I realized that this
extra state is not really needed, as we can just unconditionally notify
the resamplefd once again at unmask, with the same end result.
[1] https://lore.kernel.org/kvm/72e40c17-e5cd-1ffd-9a38-00b47e1cbd8e@semihalf.com/
[2] https://lore.kernel.org/kvm/87o7wrug0w.wl-maz@kernel.org/
On 8/18/22 22:26, Dmytro Maluka wrote:
> KVM irqfd based emulation of level-triggered interrupts doesn't work
> quite correctly in some cases, particularly in the case of interrupts
> that are handled in a Linux guest as oneshot interrupts (IRQF_ONESHOT).
> Such an interrupt is acked to the device in its threaded irq handler,
> i.e. later than it is acked to the interrupt controller (EOI at the end
> of hardirq), not earlier.
>
> Linux keeps such interrupt masked until its threaded handler finishes,
> to prevent the EOI from re-asserting an unacknowledged interrupt.
> However, with KVM + vfio (or whatever is listening on the resamplefd)
> we always notify resamplefd at the EOI, so vfio prematurely unmasks the
> host physical IRQ, thus a new physical interrupt is fired in the host.
> This extra interrupt in the host is not a problem per se. The problem is
> that it is unconditionally queued for injection into the guest, so the
> guest sees an extra bogus interrupt. [*]
>
> There are observed at least 2 user-visible issues caused by those
> extra erroneous interrupts for a oneshot irq in the guest:
>
> 1. System suspend aborted due to a pending wakeup interrupt from
> ChromeOS EC (drivers/platform/chrome/cros_ec.c).
> 2. Annoying "invalid report id data" errors from ELAN0000 touchpad
> (drivers/input/mouse/elan_i2c_core.c), flooding the guest dmesg
> every time the touchpad is touched.
>
> The core issue here is that by the time when the guest unmasks the IRQ,
> the physical IRQ line is no longer asserted (since the guest has
> acked the interrupt to the device in the meantime), yet we
> unconditionally inject the interrupt queued into the guest by the
> previous resampling. So to fix the issue, we need a way to detect that
> the IRQ is no longer pending, and cancel the queued interrupt in this
> case.
>
> With IOAPIC we are not able to probe the physical IRQ line state
> directly (at least not if the underlying physical interrupt controller
> is an IOAPIC too), so in this patch series we use irqfd resampler for
> that. Namely, instead of injecting the queued interrupt, we just notify
> the resampler that this interrupt is done. If the IRQ line is actually
> already deasserted, we are done. If it is still asserted, a new
> interrupt will be shortly triggered through irqfd and injected into the
> guest.
>
> In the case if there is no irqfd resampler registered for this IRQ, we
> cannot fix the issue, so we keep the existing behavior: immediately
> unconditionally inject the queued interrupt.
>
> This patch series fixes the issue for x86 IOAPIC only. In the long run,
> we can fix it for other irqchips and other architectures too, possibly
> taking advantage of reading the physical state of the IRQ line, which is
> possible with some other irqchips (e.g. with arm64 GIC, maybe even with
> the legacy x86 PIC).
>
> [*] In this description we assume that the interrupt is a physical host
> interrupt forwarded to the guest e.g. by vfio. Potentially the same
> issue may occur also with a purely virtual interrupt from an
> emulated device, e.g. if the guest handles this interrupt, again, as
> a oneshot interrupt.
>
>
> v3:
> - Completely reworked: instead of postponing resamplefd notify until
> unmask (to avoid extra interrupts in the host), resample the pending
> status at unmask to avoid erroneous propagation of those extra
> interrupts to the guest.
> Thanks to Marc Zyngier for helping to identify the core issue, which
> resulted in a simpler and probably more sensible implementation
> (even though Marc's concern about presenting inaccurate pending
> status to the guest is a non-issue in the case of IOAPIC, since
> IOAPIC doesn't present this information anyway).
>
> v2:
> - Fixed compilation failure on non-x86: mask_notifier_list moved from
> x86 "struct kvm_arch" to generic "struct kvm".
> - kvm_fire_mask_notifiers() also moved from x86 to generic code, even
> though it is not called on other architectures for now.
> - Instead of kvm_irq_is_masked() implemented
> kvm_register_and_fire_irq_mask_notifier() to fix potential race
> when reading the initial IRQ mask state.
> - Renamed for clarity:
> - irqfd_resampler_mask() -> irqfd_resampler_mask_notify()
> - kvm_irq_has_notifier() -> kvm_irq_has_ack_notifier()
> - resampler->notifier -> resampler->ack_notifier
> - Reorganized code in irqfd_resampler_ack() and
> irqfd_resampler_mask_notify() to make it easier to follow.
> - Don't follow unwanted "return type on separate line" style for
> irqfd_resampler_mask_notify().
>
> Dmytro Maluka (2):
> KVM: irqfd: Make resampler_list an RCU list
> KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking
>
> arch/x86/kvm/ioapic.c | 36 ++++++++++++++++++++++++++++--
> include/linux/kvm_host.h | 9 ++++++++
> include/linux/kvm_irqfd.h | 2 +-
> virt/kvm/eventfd.c | 47 ++++++++++++++++++++++++++++++++-------
> 4 files changed, 83 insertions(+), 11 deletions(-)
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3 2/2] KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking
2022-08-18 20:27 ` [PATCH v3 2/2] KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking Dmytro Maluka
@ 2023-03-16 0:16 ` Sean Christopherson
2023-03-16 21:13 ` Dmytro Maluka
0 siblings, 1 reply; 6+ messages in thread
From: Sean Christopherson @ 2023-03-16 0:16 UTC (permalink / raw)
To: Dmytro Maluka
Cc: Paolo Bonzini, kvm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, linux-kernel, Marc Zyngier,
Eric Auger, Alex Williamson, Rong L Liu, Zhenyu Wang,
Tomasz Nowicki, Grzegorz Jaszczyk, upstream, Dmitry Torokhov,
Eddie Dong
Looks sane to me, just a bunch of cosmetic comments. But this really needs input/review
from others. I/O APIC and level triggered interrupts are not exactly in my wheelhouse.
On Thu, Aug 18, 2022, Dmytro Maluka wrote:
> ---
> arch/x86/kvm/ioapic.c | 36 ++++++++++++++++++++++++++++++++++--
> include/linux/kvm_host.h | 8 ++++++++
> virt/kvm/eventfd.c | 39 +++++++++++++++++++++++++++++++++------
> 3 files changed, 75 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 765943d7cfa5..da7074d9b04e 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -368,8 +368,40 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
> if (mask_before != mask_after)
> kvm_fire_mask_notifiers(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index, mask_after);
> if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG
> - && ioapic->irr & (1 << index))
> - ioapic_service(ioapic, index, false);
> + && ioapic->irr & (1 << index)
> + && !e->fields.mask
> + && !e->fields.remote_irr) {
Can you opportunistically change these to fit the preferred style of putting the &&
on the previous line? Ignore the file's existing "style", this crud is ancient and
ugly (this goes for all of my comments).
> @@ -1987,6 +1988,13 @@ static inline int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
> }
>
> static inline void kvm_irqfd_release(struct kvm *kvm) {}
> +
> +static inline bool kvm_notify_irqfd_resampler(struct kvm *kvm,
> + unsigned irqchip,
> + unsigned pin)
"unsigned int" instead of bare "unsigned"
> +{
> + return false;
> +}
> #endif
>
> #else
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 61aea70dd888..71f327019f1e 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -55,6 +55,16 @@ irqfd_inject(struct work_struct *work)
> irqfd->gsi, 1, false);
> }
>
> +/* Called within kvm->irq_srcu read side. */
Ne need for the comment, let lockdep do the heavy lifting.
> +static void __irqfd_resampler_notify(struct kvm_kernel_irqfd_resampler *resampler)
I don't see a need for the double underscores. I assume the idea is to convey
that this is called under kvm->irq_srcu, but I just ended up looking for a version
without the underscores.
> +{
> + struct kvm_kernel_irqfd *irqfd;
> +
> + list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
> + srcu_read_lock_held(&resampler->kvm->irq_srcu))
Align the indentation, i.e.
struct kvm_kernel_irqfd *irqfd;
list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
srcu_read_lock_held(&resampler->kvm->irq_srcu))
eventfd_signal(irqfd->resamplefd, 1);
> @@ -648,6 +653,28 @@ void kvm_irq_routing_update(struct kvm *kvm)
> spin_unlock_irq(&kvm->irqfds.lock);
> }
>
> +bool kvm_notify_irqfd_resampler(struct kvm *kvm, unsigned irqchip, unsigned pin)
> +{
> + struct kvm_kernel_irqfd_resampler *resampler;
> + int gsi, idx;
> +
> + idx = srcu_read_lock(&kvm->irq_srcu);
> + gsi = kvm_irq_map_chip_pin(kvm, irqchip, pin);
> + if (gsi != -1)
This if-statement needs curly braces, the exemption doesn't apply if there are
multiple blocks? (can't think of the right name at the moment) in the guts of
the if-statement.
> + list_for_each_entry_srcu(resampler,
> + &kvm->irqfds.resampler_list, link,
> + srcu_read_lock_held(&kvm->irq_srcu)) {
> + if (resampler->notifier.gsi == gsi) {
> + __irqfd_resampler_notify(resampler);
> + srcu_read_unlock(&kvm->irq_srcu, idx);
> + return true;
> + }
> + }
> + srcu_read_unlock(&kvm->irq_srcu, idx);
> +
> + return false;
> +}
> +
> /*
> * create a host-wide workqueue for issuing deferred shutdown requests
> * aggregated from all vm* instances. We need our own isolated
> --
> 2.37.1.595.g718a3a8f04-goog
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3 2/2] KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking
2023-03-16 0:16 ` Sean Christopherson
@ 2023-03-16 21:13 ` Dmytro Maluka
0 siblings, 0 replies; 6+ messages in thread
From: Dmytro Maluka @ 2023-03-16 21:13 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, linux-kernel, Marc Zyngier,
Eric Auger, Alex Williamson, Rong L Liu, Zhenyu Wang,
Tomasz Nowicki, Grzegorz Jaszczyk, upstream, Dmitry Torokhov,
Eddie Dong
Hi Sean,
On 3/16/23 01:16, Sean Christopherson wrote:
> Looks sane to me, just a bunch of cosmetic comments. But this really needs input/review
> from others. I/O APIC and level triggered interrupts are not exactly in my wheelhouse.
Ok, sure. All of your cosmetic suggestions below sound good to me.
>
> On Thu, Aug 18, 2022, Dmytro Maluka wrote:
>> ---
>> arch/x86/kvm/ioapic.c | 36 ++++++++++++++++++++++++++++++++++--
>> include/linux/kvm_host.h | 8 ++++++++
>> virt/kvm/eventfd.c | 39 +++++++++++++++++++++++++++++++++------
>> 3 files changed, 75 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>> index 765943d7cfa5..da7074d9b04e 100644
>> --- a/arch/x86/kvm/ioapic.c
>> +++ b/arch/x86/kvm/ioapic.c
>> @@ -368,8 +368,40 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>> if (mask_before != mask_after)
>> kvm_fire_mask_notifiers(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index, mask_after);
>> if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG
>> - && ioapic->irr & (1 << index))
>> - ioapic_service(ioapic, index, false);
>> + && ioapic->irr & (1 << index)
>> + && !e->fields.mask
>> + && !e->fields.remote_irr) {
>
> Can you opportunistically change these to fit the preferred style of putting the &&
> on the previous line? Ignore the file's existing "style", this crud is ancient and
> ugly (this goes for all of my comments).
>
>> @@ -1987,6 +1988,13 @@ static inline int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
>> }
>>
>> static inline void kvm_irqfd_release(struct kvm *kvm) {}
>> +
>> +static inline bool kvm_notify_irqfd_resampler(struct kvm *kvm,
>> + unsigned irqchip,
>> + unsigned pin)
>
> "unsigned int" instead of bare "unsigned"
>
>> +{
>> + return false;
>> +}
>> #endif
>>
>> #else
>> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
>> index 61aea70dd888..71f327019f1e 100644
>> --- a/virt/kvm/eventfd.c
>> +++ b/virt/kvm/eventfd.c
>> @@ -55,6 +55,16 @@ irqfd_inject(struct work_struct *work)
>> irqfd->gsi, 1, false);
>> }
>>
>> +/* Called within kvm->irq_srcu read side. */
>
> Ne need for the comment, let lockdep do the heavy lifting.
>
>> +static void __irqfd_resampler_notify(struct kvm_kernel_irqfd_resampler *resampler)
>
> I don't see a need for the double underscores. I assume the idea is to convey
> that this is called under kvm->irq_srcu, but I just ended up looking for a version
> without the underscores.
>
>> +{
>> + struct kvm_kernel_irqfd *irqfd;
>> +
>> + list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
>> + srcu_read_lock_held(&resampler->kvm->irq_srcu))
>
> Align the indentation, i.e.
>
> struct kvm_kernel_irqfd *irqfd;
>
> list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
> srcu_read_lock_held(&resampler->kvm->irq_srcu))
> eventfd_signal(irqfd->resamplefd, 1);
>
>> @@ -648,6 +653,28 @@ void kvm_irq_routing_update(struct kvm *kvm)
>> spin_unlock_irq(&kvm->irqfds.lock);
>> }
>>
>> +bool kvm_notify_irqfd_resampler(struct kvm *kvm, unsigned irqchip, unsigned pin)
>> +{
>> + struct kvm_kernel_irqfd_resampler *resampler;
>> + int gsi, idx;
>> +
>> + idx = srcu_read_lock(&kvm->irq_srcu);
>> + gsi = kvm_irq_map_chip_pin(kvm, irqchip, pin);
>> + if (gsi != -1)
>
> This if-statement needs curly braces, the exemption doesn't apply if there are
> multiple blocks? (can't think of the right name at the moment) in the guts of
> the if-statement.
>
>> + list_for_each_entry_srcu(resampler,
>> + &kvm->irqfds.resampler_list, link,
>> + srcu_read_lock_held(&kvm->irq_srcu)) {
>> + if (resampler->notifier.gsi == gsi) {
>> + __irqfd_resampler_notify(resampler);
>> + srcu_read_unlock(&kvm->irq_srcu, idx);
>> + return true;
>> + }
>> + }
>> + srcu_read_unlock(&kvm->irq_srcu, idx);
>> +
>> + return false;
>> +}
>> +
>> /*
>> * create a host-wide workqueue for issuing deferred shutdown requests
>> * aggregated from all vm* instances. We need our own isolated
>> --
>> 2.37.1.595.g718a3a8f04-goog
>>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-03-16 21:13 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-18 20:26 [PATCH v3 0/2] KVM: x86/ioapic: Fix oneshot interrupts forwarding Dmytro Maluka
2022-08-18 20:27 ` [PATCH v3 1/2] KVM: irqfd: Make resampler_list an RCU list Dmytro Maluka
2022-08-18 20:27 ` [PATCH v3 2/2] KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking Dmytro Maluka
2023-03-16 0:16 ` Sean Christopherson
2023-03-16 21:13 ` Dmytro Maluka
2022-10-21 13:31 ` [PATCH v3 0/2] KVM: x86/ioapic: Fix oneshot interrupts forwarding Dmytro Maluka
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox