qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Peter Xu <peterx@redhat.com>,
	qemu-devel@nongnu.org, Igor Mammedov <imammedo@redhat.com>,
	Eduardo Habkost <ehabkost@redhat.com>,
	"Michael S . Tsirkin" <mst@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
Date: Sat, 27 Apr 2019 01:29:10 -0400 (EDT)	[thread overview]
Message-ID: <94415012.15677076.1556342950794.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <20190426150236.1af2ff08@x1.home>


> > In my testing it looks like KVM advertises supporting the KVM_IRQFD
> > resample feature, but vfio never gets the unmask notification, so the
> > device remains with DisINTx set and no further interrupts are
> > generated.  Do we expect KVM's IRQFD with resampler to work in the
> > split IRQ mode?  We can certainly hope that "high performance" devices
> > use MSI or MSI/X, but this would be quite a performance regression with
> > split mode if our userspace bypass for INTx goes away.  Thanks,
> 
> arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
> kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
> kvm_notify_acked_gsi(),

That wouldn't help because kvm_ioapic_update_eoi would not even be
able to access vcpu->kvm->arch.vioapic (it's NULL).

The following untested patch would signal the resamplefd in kvm_ioapic_send_eoi,
before requesting the exit to userspace.  However I am not sure how QEMU
sets up the VFIO eventfds: if I understand correctly, when VFIO writes again to
the irq eventfd, the interrupt request would not reach the userspace IOAPIC, but
only the in-kernel LAPIC.  That would be incorrect, and if my understanding is
correct we need to trigger resampling from hw/intc/ioapic.c.

Something like:

1) VFIO uses kvm_irqchip_add_irqfd_notifier_gsi instead of KVM_IRQFD directly

2) add a NotifierList in kvm_irqchip_assign_irqfd

3) if kvm_irqchip_is_split(), register a corresponding Notifier in ioapic.c which
stores the resamplefd as an EventNotifier*

4) ioapic_eoi_broadcast writes to the resamplefd

BTW, how do non-x86 platforms handle intx resampling?

Paolo

---- 8< -----
From: Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH WIP] kvm: lapic: run ack notifiers for userspace IOAPIC routes

diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index ea1a4e0297da..be337e06e3fd 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -135,4 +135,6 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
 			   ulong *ioapic_handled_vectors);
 void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
 			    ulong *ioapic_handled_vectors);
+void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector);
+
 #endif
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 3cc3b2d130a0..46ea79a0dd8a 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -435,6 +435,34 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
 	srcu_read_unlock(&kvm->irq_srcu, idx);
 }
 
+void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector)
+{
+        struct kvm_kernel_irq_routing_entry *entry;
+        struct kvm_irq_routing_table *table;
+        u32 i, nr_ioapic_pins;
+        int idx;
+
+        idx = srcu_read_lock(&kvm->irq_srcu);
+        table = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
+        nr_ioapic_pins = min_t(u32, table->nr_rt_entries,
+                               kvm->arch.nr_reserved_ioapic_pins);
+
+	for (i = 0; i < nr_ioapic_pins; i++) {
+                hlist_for_each_entry(entry, &table->map[i], link) {
+                        struct kvm_lapic_irq irq;
+
+                        if (entry->type != KVM_IRQ_ROUTING_MSI)
+                                continue;
+
+			kvm_set_msi_irq(kvm, entry, &irq);
+			if (irq.level && irq.vector == vector)
+				kvm_notify_acked_gsi(kvm, i);
+		}
+	}
+
+	srcu_read_unlock(&kvm->irq_srcu, idx);
+}
+
 void kvm_arch_irq_routing_update(struct kvm *kvm)
 {
 	kvm_hv_irq_routing_update(kvm);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9f089e2e09d0..8f8e76d77925 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1142,6 +1142,7 @@ static bool kvm_ioapic_handles_vector(struct kvm_lapic *apic, int vector)
 
 static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
 {
+	struct kvm_vcpu *vcpu = apic->vcpu;
 	int trigger_mode;
 
 	/* Eoi the ioapic only if the ioapic doesn't own the vector. */
@@ -1149,9 +1150,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
 		return;
 
 	/* Request a KVM exit to inform the userspace IOAPIC. */
-	if (irqchip_split(apic->vcpu->kvm)) {
-		apic->vcpu->arch.pending_ioapic_eoi = vector;
-		kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
+	if (irqchip_split(vcpu->kvm)) {
+		kvm_notify_userspace_ioapic_routes(vcpu->kvm, vector);
+		vcpu->arch.pending_ioapic_eoi = vector;
+		kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, vcpu);
 		return;
 	}
 
@@ -1160,7 +1162,7 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
 	else
 		trigger_mode = IOAPIC_EDGE_TRIG;
 
-	kvm_ioapic_update_eoi(apic->vcpu, vector, trigger_mode);
+	kvm_ioapic_update_eoi(vcpu, vector, trigger_mode);
 }
 
 static int apic_set_eoi(struct kvm_lapic *apic)


> so it looks like KVM really ought to return an
> error when trying to register a resample IRQFD when irqchip_split(),
> but do we have better options?  EOI handling in QEMU is pretty much
> non-existent and even if it was in place, it's a big performance
> regression for vfio INTx handling.  Is a split irqchip that retains
> performant resampling (EOI) support possible?  Thanks,

WARNING: multiple messages have this Message-ID (diff)
From: Paolo Bonzini <pbonzini@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	qemu-devel@nongnu.org, Peter Xu <peterx@redhat.com>,
	Eduardo Habkost <ehabkost@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
Date: Sat, 27 Apr 2019 01:29:10 -0400 (EDT)	[thread overview]
Message-ID: <94415012.15677076.1556342950794.JavaMail.zimbra@redhat.com> (raw)
Message-ID: <20190427052910.XTeK4rDkcoJaUSsIxCGlwmc7FAPeOQ3gk1lSLYRDkGs@z> (raw)
In-Reply-To: <20190426150236.1af2ff08@x1.home>


> > In my testing it looks like KVM advertises supporting the KVM_IRQFD
> > resample feature, but vfio never gets the unmask notification, so the
> > device remains with DisINTx set and no further interrupts are
> > generated.  Do we expect KVM's IRQFD with resampler to work in the
> > split IRQ mode?  We can certainly hope that "high performance" devices
> > use MSI or MSI/X, but this would be quite a performance regression with
> > split mode if our userspace bypass for INTx goes away.  Thanks,
> 
> arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
> kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
> kvm_notify_acked_gsi(),

That wouldn't help because kvm_ioapic_update_eoi would not even be
able to access vcpu->kvm->arch.vioapic (it's NULL).

The following untested patch would signal the resamplefd in kvm_ioapic_send_eoi,
before requesting the exit to userspace.  However I am not sure how QEMU
sets up the VFIO eventfds: if I understand correctly, when VFIO writes again to
the irq eventfd, the interrupt request would not reach the userspace IOAPIC, but
only the in-kernel LAPIC.  That would be incorrect, and if my understanding is
correct we need to trigger resampling from hw/intc/ioapic.c.

Something like:

1) VFIO uses kvm_irqchip_add_irqfd_notifier_gsi instead of KVM_IRQFD directly

2) add a NotifierList in kvm_irqchip_assign_irqfd

3) if kvm_irqchip_is_split(), register a corresponding Notifier in ioapic.c which
stores the resamplefd as an EventNotifier*

4) ioapic_eoi_broadcast writes to the resamplefd

BTW, how do non-x86 platforms handle intx resampling?

Paolo

---- 8< -----
From: Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH WIP] kvm: lapic: run ack notifiers for userspace IOAPIC routes

diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index ea1a4e0297da..be337e06e3fd 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -135,4 +135,6 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
 			   ulong *ioapic_handled_vectors);
 void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
 			    ulong *ioapic_handled_vectors);
+void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector);
+
 #endif
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 3cc3b2d130a0..46ea79a0dd8a 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -435,6 +435,34 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
 	srcu_read_unlock(&kvm->irq_srcu, idx);
 }
 
+void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector)
+{
+        struct kvm_kernel_irq_routing_entry *entry;
+        struct kvm_irq_routing_table *table;
+        u32 i, nr_ioapic_pins;
+        int idx;
+
+        idx = srcu_read_lock(&kvm->irq_srcu);
+        table = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
+        nr_ioapic_pins = min_t(u32, table->nr_rt_entries,
+                               kvm->arch.nr_reserved_ioapic_pins);
+
+	for (i = 0; i < nr_ioapic_pins; i++) {
+                hlist_for_each_entry(entry, &table->map[i], link) {
+                        struct kvm_lapic_irq irq;
+
+                        if (entry->type != KVM_IRQ_ROUTING_MSI)
+                                continue;
+
+			kvm_set_msi_irq(kvm, entry, &irq);
+			if (irq.level && irq.vector == vector)
+				kvm_notify_acked_gsi(kvm, i);
+		}
+	}
+
+	srcu_read_unlock(&kvm->irq_srcu, idx);
+}
+
 void kvm_arch_irq_routing_update(struct kvm *kvm)
 {
 	kvm_hv_irq_routing_update(kvm);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9f089e2e09d0..8f8e76d77925 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1142,6 +1142,7 @@ static bool kvm_ioapic_handles_vector(struct kvm_lapic *apic, int vector)
 
 static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
 {
+	struct kvm_vcpu *vcpu = apic->vcpu;
 	int trigger_mode;
 
 	/* Eoi the ioapic only if the ioapic doesn't own the vector. */
@@ -1149,9 +1150,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
 		return;
 
 	/* Request a KVM exit to inform the userspace IOAPIC. */
-	if (irqchip_split(apic->vcpu->kvm)) {
-		apic->vcpu->arch.pending_ioapic_eoi = vector;
-		kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
+	if (irqchip_split(vcpu->kvm)) {
+		kvm_notify_userspace_ioapic_routes(vcpu->kvm, vector);
+		vcpu->arch.pending_ioapic_eoi = vector;
+		kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, vcpu);
 		return;
 	}
 
@@ -1160,7 +1162,7 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
 	else
 		trigger_mode = IOAPIC_EDGE_TRIG;
 
-	kvm_ioapic_update_eoi(apic->vcpu, vector, trigger_mode);
+	kvm_ioapic_update_eoi(vcpu, vector, trigger_mode);
 }
 
 static int apic_set_eoi(struct kvm_lapic *apic)


> so it looks like KVM really ought to return an
> error when trying to register a resample IRQFD when irqchip_split(),
> but do we have better options?  EOI handling in QEMU is pretty much
> non-existent and even if it was in place, it's a big performance
> regression for vfio INTx handling.  Is a split irqchip that retains
> performant resampling (EOI) support possible?  Thanks,


  parent reply	other threads:[~2019-04-27  5:29 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-20  5:40 [Qemu-devel] [PATCH v2 0/3] q35: change defaults for kernel irqchip and IR Peter Xu
2018-12-20  5:40 ` [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default Peter Xu
2018-12-20 12:13   ` Eduardo Habkost
2019-04-26 19:27   ` Alex Williamson
2019-04-26 19:27     ` Alex Williamson
2019-04-26 21:02     ` Alex Williamson
2019-04-26 21:02       ` Alex Williamson
2019-04-27  5:29       ` Paolo Bonzini [this message]
2019-04-27  5:29         ` Paolo Bonzini
2019-04-27  8:09         ` Paolo Bonzini
2019-04-27  8:09           ` Paolo Bonzini
2019-04-29 14:21           ` Alex Williamson
2019-04-29 14:21             ` Alex Williamson
2019-04-29 14:55             ` Eduardo Habkost
2019-04-29 14:55               ` Eduardo Habkost
2019-04-29 15:22               ` Alex Williamson
2019-04-29 15:22                 ` Alex Williamson
2019-05-03 18:42                 ` Eduardo Habkost
2019-05-03 18:42                   ` Eduardo Habkost
2019-05-05  9:06                   ` Peter Xu
2019-05-05  9:06                     ` Peter Xu
2019-05-06 16:13                     ` Paolo Bonzini
2019-05-06 21:16                       ` Eduardo Habkost
2019-05-03 20:00               ` Michael S. Tsirkin
2019-05-03 20:00                 ` Michael S. Tsirkin
2019-05-03 20:23                 ` Eduardo Habkost
2019-05-03 20:23                   ` Eduardo Habkost
2019-05-06 16:17                 ` Paolo Bonzini
2019-04-30 23:01             ` Alex Williamson
2019-04-30 23:01               ` Alex Williamson
2019-04-30 23:09               ` Paolo Bonzini
2019-04-30 23:09                 ` Paolo Bonzini
2019-05-01  0:28                 ` Alex Williamson
2019-05-01  0:28                   ` Alex Williamson
2018-12-20  5:40 ` [Qemu-devel] [PATCH v2 2/3] x86-iommu: switch intr_supported to OnOffAuto type Peter Xu
2018-12-20  5:40 ` [Qemu-devel] [PATCH v2 3/3] x86-iommu: turn on IR by default if proper Peter Xu
2018-12-20 10:00 ` [Qemu-devel] [PATCH v2 0/3] q35: change defaults for kernel irqchip and IR Paolo Bonzini
2018-12-20 10:16   ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=94415012.15677076.1556342950794.JavaMail.zimbra@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=mst@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).