From: Paolo Bonzini <pbonzini@redhat.com> To: Alex Williamson <alex.williamson@redhat.com> Cc: Peter Xu <peterx@redhat.com>, qemu-devel@nongnu.org, Igor Mammedov <imammedo@redhat.com>, Eduardo Habkost <ehabkost@redhat.com>, "Michael S . Tsirkin" <mst@redhat.com> Subject: Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default Date: Sat, 27 Apr 2019 01:29:10 -0400 (EDT) [thread overview] Message-ID: <94415012.15677076.1556342950794.JavaMail.zimbra@redhat.com> (raw) In-Reply-To: <20190426150236.1af2ff08@x1.home> > > In my testing it looks like KVM advertises supporting the KVM_IRQFD > > resample feature, but vfio never gets the unmask notification, so the > > device remains with DisINTx set and no further interrupts are > > generated. Do we expect KVM's IRQFD with resampler to work in the > > split IRQ mode? We can certainly hope that "high performance" devices > > use MSI or MSI/X, but this would be quite a performance regression with > > split mode if our userspace bypass for INTx goes away. Thanks, > > arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before > kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via > kvm_notify_acked_gsi(), That wouldn't help because kvm_ioapic_update_eoi would not even be able to access vcpu->kvm->arch.vioapic (it's NULL). The following untested patch would signal the resamplefd in kvm_ioapic_send_eoi, before requesting the exit to userspace. However I am not sure how QEMU sets up the VFIO eventfds: if I understand correctly, when VFIO writes again to the irq eventfd, the interrupt request would not reach the userspace IOAPIC, but only the in-kernel LAPIC. That would be incorrect, and if my understanding is correct we need to trigger resampling from hw/intc/ioapic.c. Something like: 1) VFIO uses kvm_irqchip_add_irqfd_notifier_gsi instead of KVM_IRQFD directly 2) add a NotifierList in kvm_irqchip_assign_irqfd 3) if kvm_irqchip_is_split(), register a corresponding Notifier in ioapic.c which stores the resamplefd as an EventNotifier* 4) ioapic_eoi_broadcast writes to the resamplefd BTW, how do non-x86 platforms handle intx resampling? Paolo ---- 8< ----- From: Paolo Bonzini <pbonzini@redhat.com> Subject: [PATCH WIP] kvm: lapic: run ack notifiers for userspace IOAPIC routes diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h index ea1a4e0297da..be337e06e3fd 100644 --- a/arch/x86/kvm/ioapic.h +++ b/arch/x86/kvm/ioapic.h @@ -135,4 +135,6 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, ulong *ioapic_handled_vectors); void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, ulong *ioapic_handled_vectors); +void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector); + #endif diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c index 3cc3b2d130a0..46ea79a0dd8a 100644 --- a/arch/x86/kvm/irq_comm.c +++ b/arch/x86/kvm/irq_comm.c @@ -435,6 +435,34 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, srcu_read_unlock(&kvm->irq_srcu, idx); } +void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector) +{ + struct kvm_kernel_irq_routing_entry *entry; + struct kvm_irq_routing_table *table; + u32 i, nr_ioapic_pins; + int idx; + + idx = srcu_read_lock(&kvm->irq_srcu); + table = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu); + nr_ioapic_pins = min_t(u32, table->nr_rt_entries, + kvm->arch.nr_reserved_ioapic_pins); + + for (i = 0; i < nr_ioapic_pins; i++) { + hlist_for_each_entry(entry, &table->map[i], link) { + struct kvm_lapic_irq irq; + + if (entry->type != KVM_IRQ_ROUTING_MSI) + continue; + + kvm_set_msi_irq(kvm, entry, &irq); + if (irq.level && irq.vector == vector) + kvm_notify_acked_gsi(kvm, i); + } + } + + srcu_read_unlock(&kvm->irq_srcu, idx); +} + void kvm_arch_irq_routing_update(struct kvm *kvm) { kvm_hv_irq_routing_update(kvm); diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 9f089e2e09d0..8f8e76d77925 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1142,6 +1142,7 @@ static bool kvm_ioapic_handles_vector(struct kvm_lapic *apic, int vector) static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) { + struct kvm_vcpu *vcpu = apic->vcpu; int trigger_mode; /* Eoi the ioapic only if the ioapic doesn't own the vector. */ @@ -1149,9 +1150,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) return; /* Request a KVM exit to inform the userspace IOAPIC. */ - if (irqchip_split(apic->vcpu->kvm)) { - apic->vcpu->arch.pending_ioapic_eoi = vector; - kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu); + if (irqchip_split(vcpu->kvm)) { + kvm_notify_userspace_ioapic_routes(vcpu->kvm, vector); + vcpu->arch.pending_ioapic_eoi = vector; + kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, vcpu); return; } @@ -1160,7 +1162,7 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) else trigger_mode = IOAPIC_EDGE_TRIG; - kvm_ioapic_update_eoi(apic->vcpu, vector, trigger_mode); + kvm_ioapic_update_eoi(vcpu, vector, trigger_mode); } static int apic_set_eoi(struct kvm_lapic *apic) > so it looks like KVM really ought to return an > error when trying to register a resample IRQFD when irqchip_split(), > but do we have better options? EOI handling in QEMU is pretty much > non-existent and even if it was in place, it's a big performance > regression for vfio INTx handling. Is a split irqchip that retains > performant resampling (EOI) support possible? Thanks,
WARNING: multiple messages have this Message-ID (diff)
From: Paolo Bonzini <pbonzini@redhat.com> To: Alex Williamson <alex.williamson@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com>, "Michael S . Tsirkin" <mst@redhat.com>, qemu-devel@nongnu.org, Peter Xu <peterx@redhat.com>, Eduardo Habkost <ehabkost@redhat.com> Subject: Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default Date: Sat, 27 Apr 2019 01:29:10 -0400 (EDT) [thread overview] Message-ID: <94415012.15677076.1556342950794.JavaMail.zimbra@redhat.com> (raw) Message-ID: <20190427052910.XTeK4rDkcoJaUSsIxCGlwmc7FAPeOQ3gk1lSLYRDkGs@z> (raw) In-Reply-To: <20190426150236.1af2ff08@x1.home> > > In my testing it looks like KVM advertises supporting the KVM_IRQFD > > resample feature, but vfio never gets the unmask notification, so the > > device remains with DisINTx set and no further interrupts are > > generated. Do we expect KVM's IRQFD with resampler to work in the > > split IRQ mode? We can certainly hope that "high performance" devices > > use MSI or MSI/X, but this would be quite a performance regression with > > split mode if our userspace bypass for INTx goes away. Thanks, > > arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before > kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via > kvm_notify_acked_gsi(), That wouldn't help because kvm_ioapic_update_eoi would not even be able to access vcpu->kvm->arch.vioapic (it's NULL). The following untested patch would signal the resamplefd in kvm_ioapic_send_eoi, before requesting the exit to userspace. However I am not sure how QEMU sets up the VFIO eventfds: if I understand correctly, when VFIO writes again to the irq eventfd, the interrupt request would not reach the userspace IOAPIC, but only the in-kernel LAPIC. That would be incorrect, and if my understanding is correct we need to trigger resampling from hw/intc/ioapic.c. Something like: 1) VFIO uses kvm_irqchip_add_irqfd_notifier_gsi instead of KVM_IRQFD directly 2) add a NotifierList in kvm_irqchip_assign_irqfd 3) if kvm_irqchip_is_split(), register a corresponding Notifier in ioapic.c which stores the resamplefd as an EventNotifier* 4) ioapic_eoi_broadcast writes to the resamplefd BTW, how do non-x86 platforms handle intx resampling? Paolo ---- 8< ----- From: Paolo Bonzini <pbonzini@redhat.com> Subject: [PATCH WIP] kvm: lapic: run ack notifiers for userspace IOAPIC routes diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h index ea1a4e0297da..be337e06e3fd 100644 --- a/arch/x86/kvm/ioapic.h +++ b/arch/x86/kvm/ioapic.h @@ -135,4 +135,6 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, ulong *ioapic_handled_vectors); void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, ulong *ioapic_handled_vectors); +void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector); + #endif diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c index 3cc3b2d130a0..46ea79a0dd8a 100644 --- a/arch/x86/kvm/irq_comm.c +++ b/arch/x86/kvm/irq_comm.c @@ -435,6 +435,34 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, srcu_read_unlock(&kvm->irq_srcu, idx); } +void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector) +{ + struct kvm_kernel_irq_routing_entry *entry; + struct kvm_irq_routing_table *table; + u32 i, nr_ioapic_pins; + int idx; + + idx = srcu_read_lock(&kvm->irq_srcu); + table = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu); + nr_ioapic_pins = min_t(u32, table->nr_rt_entries, + kvm->arch.nr_reserved_ioapic_pins); + + for (i = 0; i < nr_ioapic_pins; i++) { + hlist_for_each_entry(entry, &table->map[i], link) { + struct kvm_lapic_irq irq; + + if (entry->type != KVM_IRQ_ROUTING_MSI) + continue; + + kvm_set_msi_irq(kvm, entry, &irq); + if (irq.level && irq.vector == vector) + kvm_notify_acked_gsi(kvm, i); + } + } + + srcu_read_unlock(&kvm->irq_srcu, idx); +} + void kvm_arch_irq_routing_update(struct kvm *kvm) { kvm_hv_irq_routing_update(kvm); diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 9f089e2e09d0..8f8e76d77925 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1142,6 +1142,7 @@ static bool kvm_ioapic_handles_vector(struct kvm_lapic *apic, int vector) static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) { + struct kvm_vcpu *vcpu = apic->vcpu; int trigger_mode; /* Eoi the ioapic only if the ioapic doesn't own the vector. */ @@ -1149,9 +1150,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) return; /* Request a KVM exit to inform the userspace IOAPIC. */ - if (irqchip_split(apic->vcpu->kvm)) { - apic->vcpu->arch.pending_ioapic_eoi = vector; - kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu); + if (irqchip_split(vcpu->kvm)) { + kvm_notify_userspace_ioapic_routes(vcpu->kvm, vector); + vcpu->arch.pending_ioapic_eoi = vector; + kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, vcpu); return; } @@ -1160,7 +1162,7 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) else trigger_mode = IOAPIC_EDGE_TRIG; - kvm_ioapic_update_eoi(apic->vcpu, vector, trigger_mode); + kvm_ioapic_update_eoi(vcpu, vector, trigger_mode); } static int apic_set_eoi(struct kvm_lapic *apic) > so it looks like KVM really ought to return an > error when trying to register a resample IRQFD when irqchip_split(), > but do we have better options? EOI handling in QEMU is pretty much > non-existent and even if it was in place, it's a big performance > regression for vfio INTx handling. Is a split irqchip that retains > performant resampling (EOI) support possible? Thanks,
next prev parent reply other threads:[~2019-04-27 5:29 UTC|newest] Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-12-20 5:40 [Qemu-devel] [PATCH v2 0/3] q35: change defaults for kernel irqchip and IR Peter Xu 2018-12-20 5:40 ` [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default Peter Xu 2018-12-20 12:13 ` Eduardo Habkost 2019-04-26 19:27 ` Alex Williamson 2019-04-26 19:27 ` Alex Williamson 2019-04-26 21:02 ` Alex Williamson 2019-04-26 21:02 ` Alex Williamson 2019-04-27 5:29 ` Paolo Bonzini [this message] 2019-04-27 5:29 ` Paolo Bonzini 2019-04-27 8:09 ` Paolo Bonzini 2019-04-27 8:09 ` Paolo Bonzini 2019-04-29 14:21 ` Alex Williamson 2019-04-29 14:21 ` Alex Williamson 2019-04-29 14:55 ` Eduardo Habkost 2019-04-29 14:55 ` Eduardo Habkost 2019-04-29 15:22 ` Alex Williamson 2019-04-29 15:22 ` Alex Williamson 2019-05-03 18:42 ` Eduardo Habkost 2019-05-03 18:42 ` Eduardo Habkost 2019-05-05 9:06 ` Peter Xu 2019-05-05 9:06 ` Peter Xu 2019-05-06 16:13 ` Paolo Bonzini 2019-05-06 21:16 ` Eduardo Habkost 2019-05-03 20:00 ` Michael S. Tsirkin 2019-05-03 20:00 ` Michael S. Tsirkin 2019-05-03 20:23 ` Eduardo Habkost 2019-05-03 20:23 ` Eduardo Habkost 2019-05-06 16:17 ` Paolo Bonzini 2019-04-30 23:01 ` Alex Williamson 2019-04-30 23:01 ` Alex Williamson 2019-04-30 23:09 ` Paolo Bonzini 2019-04-30 23:09 ` Paolo Bonzini 2019-05-01 0:28 ` Alex Williamson 2019-05-01 0:28 ` Alex Williamson 2018-12-20 5:40 ` [Qemu-devel] [PATCH v2 2/3] x86-iommu: switch intr_supported to OnOffAuto type Peter Xu 2018-12-20 5:40 ` [Qemu-devel] [PATCH v2 3/3] x86-iommu: turn on IR by default if proper Peter Xu 2018-12-20 10:00 ` [Qemu-devel] [PATCH v2 0/3] q35: change defaults for kernel irqchip and IR Paolo Bonzini 2018-12-20 10:16 ` Peter Xu
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=94415012.15677076.1556342950794.JavaMail.zimbra@redhat.com \ --to=pbonzini@redhat.com \ --cc=alex.williamson@redhat.com \ --cc=ehabkost@redhat.com \ --cc=imammedo@redhat.com \ --cc=mst@redhat.com \ --cc=peterx@redhat.com \ --cc=qemu-devel@nongnu.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).