* [KVM PATCH v7 0/3] kvm: eventfd interfaces (formerly irqfd)
@ 2009-05-12 18:26 Gregory Haskins
2009-05-12 18:26 ` [KVM PATCH v7 1/3] eventfd: export eventfd interfaces for module use Gregory Haskins
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Gregory Haskins @ 2009-05-12 18:26 UTC (permalink / raw)
To: kvm; +Cc: viro, linux-kernel, avi, davidel
(Applies to kvm.git:b5e725fa)
This is v7 of the series. We have generalized the name of the series (as well
as some of the hunks in the series) to reflect the fact that we have multiple
eventfd based components.
This series has been tested and appears to be working as intended. You can
download the unit-test used to verify this here:
ftp://ftp.novell.com/dev/ghaskins/kvm-eventfd.tar.bz
This will also serve as an example on how to use the new interfaces.
[ Changelog:
v7:
*) Added "iofd" to allow PIO/MMIO writes to generate an eventfd
signal. This was previously discussed as "hypercallfd", but
since explicit hypercalls are not looking to be very popular,
and based on the fact that they were not going to carry payload
anyway, I named them "iofd".
*) Generalized some of the code so that irqfd and iofd could be
logically grouped together. For instance
s/KVM_CAP_IRQFD/KVM_CAP_EVENTFD and
virt/kvm/irqfd.c becomes virt/kvm/eventfd.c
*) Added support for "deassign" operations to ensure we can properly
support hot-unplug.
*) Reinstated the eventfd EXPORT_SYMBOL patch since we need it again
for supporting iofd.
*) Rebased to kvm.git:b5e725fa
v6:
*) Moved eventfd creation back to userspace, per Avi's request
*) Dropped no longer necessary supporting patches from series
*) Rebased to kvm.git:833367b57
v5:
*) Added padding to the ioctl structure
*) Added proper ref-count increment to the file before returning
success. (Needs review by Al Viro, Davide Libenzi)
*) Cleaned up error-handling path to make sure we remove ourself
from the waitq if necessary.
*) Make sure we only add ourselves to kvm->irqfds if successful
creating the irqfd in the first place.
*) Rebased to kvm.git:66b0aed4
v4:
*) Changed allocation model to create the new fd last, after
we get past the last potential error point by using Davide's
new eventfd_file_create interface (Al Viro, Davide Libenzi)
*) We no longer export sys_eventfd2() since it is replaced
functionally with eventfd_file_create();
*) Rebased to kvm.git:7da2e3ba
v3:
*) The kernel now allocates the eventfd (need to export sys_eventfd2)
*) Added a flags field for future expansion to kvm_irqfd()
*) We properly toggle the irq level 1+0.
*) We re-use the USERSPACE_SRC_ID instead of creating our own
*) Properly check for failures establishing a poll-table with eventfd
*) Fixed fd/file leaks on failure
*) Rebased to lateste kvm.git::41b76d8d04
v2:
*) Dropped notifier_chain based callbacks in favor of
wait_queue_t::func and file::poll based callbacks (Thanks to
Davide for the suggestion)
v1:
*) Initial release
]
[ Todo:
*) Implement the bus_io_unregister() function so the iofd hot-unplug
path may be completed
*) Test the hot-unplug path
]
---
Gregory Haskins (3):
kvm: add iofd support
kvm: add support for irqfd via eventfd-notification interface
eventfd: export eventfd interfaces for module use
arch/x86/kvm/Makefile | 2
arch/x86/kvm/x86.c | 1
fs/eventfd.c | 3
include/linux/kvm.h | 22 +++
include/linux/kvm_host.h | 7 +
virt/kvm/eventfd.c | 294 ++++++++++++++++++++++++++++++++++++++++++++++
virt/kvm/kvm_main.c | 33 +++++
7 files changed, 361 insertions(+), 1 deletions(-)
create mode 100644 virt/kvm/eventfd.c
--
Signature
^ permalink raw reply [flat|nested] 20+ messages in thread* [KVM PATCH v7 1/3] eventfd: export eventfd interfaces for module use 2009-05-12 18:26 [KVM PATCH v7 0/3] kvm: eventfd interfaces (formerly irqfd) Gregory Haskins @ 2009-05-12 18:26 ` Gregory Haskins 2009-05-12 19:02 ` Davide Libenzi 2009-05-12 18:26 ` [KVM PATCH v7 2/3] kvm: add support for irqfd via eventfd-notification interface Gregory Haskins 2009-05-12 18:27 ` [KVM PATCH v7 3/3] kvm: add iofd support Gregory Haskins 2 siblings, 1 reply; 20+ messages in thread From: Gregory Haskins @ 2009-05-12 18:26 UTC (permalink / raw) To: kvm; +Cc: viro, linux-kernel, avi, davidel We want to use eventfd from KVM which can be compiled as a module, so export the interfaces. Signed-off-by: Gregory Haskins <ghaskins@novell.com> --- fs/eventfd.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/fs/eventfd.c b/fs/eventfd.c index 2a701d5..3f0e197 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -16,6 +16,7 @@ #include <linux/anon_inodes.h> #include <linux/eventfd.h> #include <linux/syscalls.h> +#include <linux/module.h> struct eventfd_ctx { wait_queue_head_t wqh; @@ -56,6 +57,7 @@ int eventfd_signal(struct file *file, int n) return n; } +EXPORT_SYMBOL_GPL(eventfd_signal); static int eventfd_release(struct inode *inode, struct file *file) { @@ -197,6 +199,7 @@ struct file *eventfd_fget(int fd) return file; } +EXPORT_SYMBOL_GPL(eventfd_fget); SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags) { ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [KVM PATCH v7 1/3] eventfd: export eventfd interfaces for module use 2009-05-12 18:26 ` [KVM PATCH v7 1/3] eventfd: export eventfd interfaces for module use Gregory Haskins @ 2009-05-12 19:02 ` Davide Libenzi 0 siblings, 0 replies; 20+ messages in thread From: Davide Libenzi @ 2009-05-12 19:02 UTC (permalink / raw) To: Gregory Haskins; +Cc: kvm, viro, Linux Kernel Mailing List, avi On Tue, 12 May 2009, Gregory Haskins wrote: > We want to use eventfd from KVM which can be compiled as a module, so > export the interfaces. > > Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> - Davide ^ permalink raw reply [flat|nested] 20+ messages in thread
* [KVM PATCH v7 2/3] kvm: add support for irqfd via eventfd-notification interface 2009-05-12 18:26 [KVM PATCH v7 0/3] kvm: eventfd interfaces (formerly irqfd) Gregory Haskins 2009-05-12 18:26 ` [KVM PATCH v7 1/3] eventfd: export eventfd interfaces for module use Gregory Haskins @ 2009-05-12 18:26 ` Gregory Haskins 2009-05-14 9:47 ` Avi Kivity 2009-05-14 11:22 ` Avi Kivity 2009-05-12 18:27 ` [KVM PATCH v7 3/3] kvm: add iofd support Gregory Haskins 2 siblings, 2 replies; 20+ messages in thread From: Gregory Haskins @ 2009-05-12 18:26 UTC (permalink / raw) To: kvm; +Cc: viro, linux-kernel, avi, davidel KVM provides a complete virtual system environment for guests, including support for injecting interrupts modeled after the real exception/interrupt facilities present on the native platform (such as the IDT on x86). Virtual interrupts can come from a variety of sources (emulated devices, pass-through devices, etc) but all must be injected to the guest via the KVM infrastructure. This patch adds a new mechanism to inject a specific interrupt to a guest using a decoupled eventfd mechnanism: Any legal signal on the irqfd (using eventfd semantics from either userspace or kernel) will translate into an injected interrupt in the guest at the next available interrupt window. Signed-off-by: Gregory Haskins <ghaskins@novell.com> --- arch/x86/kvm/Makefile | 2 arch/x86/kvm/x86.c | 1 include/linux/kvm.h | 10 ++ include/linux/kvm_host.h | 5 + virt/kvm/eventfd.c | 187 ++++++++++++++++++++++++++++++++++++++++++++++ virt/kvm/kvm_main.c | 20 +++++ 6 files changed, 224 insertions(+), 1 deletions(-) create mode 100644 virt/kvm/eventfd.c diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index b43c4ef..4d50904 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -3,7 +3,7 @@ # common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \ - coalesced_mmio.o irq_comm.o) + coalesced_mmio.o irq_comm.o eventfd.o) ifeq ($(CONFIG_KVM_TRACE),y) common-objs += $(addprefix ../../../virt/kvm/, kvm_trace.o) endif diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index fd0a571..ba541f6 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1026,6 +1026,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_REINJECT_CONTROL: case KVM_CAP_IRQ_INJECT_STATUS: case KVM_CAP_ASSIGN_DEV_IRQ: + case KVM_CAP_EVENTFD: r = 1; break; case KVM_CAP_COALESCED_MMIO: diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 3db5d8d..dfc4bcc 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -415,6 +415,7 @@ struct kvm_trace_rec { #define KVM_CAP_ASSIGN_DEV_IRQ 29 /* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */ #define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30 +#define KVM_CAP_EVENTFD 31 #ifdef KVM_CAP_IRQ_ROUTING @@ -454,6 +455,13 @@ struct kvm_irq_routing { #endif +struct kvm_irqfd { + __u32 fd; + __u32 gsi; + __u32 flags; + __u8 pad[20]; +}; + /* * ioctls for VM fds */ @@ -498,6 +506,8 @@ struct kvm_irq_routing { #define KVM_ASSIGN_SET_MSIX_ENTRY \ _IOW(KVMIO, 0x74, struct kvm_assigned_msix_entry) #define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq) +#define KVM_ASSIGN_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd) +#define KVM_DEASSIGN_IRQFD _IOW(KVMIO, 0x77, __u32) /* * ioctls for vcpu fds diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 2b8df0c..1acc528 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -134,6 +134,7 @@ struct kvm { struct list_head vm_list; struct kvm_io_bus mmio_bus; struct kvm_io_bus pio_bus; + struct list_head irqfds; struct kvm_vm_stat stat; struct kvm_arch arch; atomic_t users_count; @@ -525,4 +526,8 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {} #endif +int kvm_assign_irqfd(struct kvm *kvm, int fd, int gsi, int flags); +int kvm_deassign_irqfd(struct kvm *kvm, int fd); +void kvm_irqfd_release(struct kvm *kvm); + #endif diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c new file mode 100644 index 0000000..71afd62 --- /dev/null +++ b/virt/kvm/eventfd.c @@ -0,0 +1,187 @@ +/* + * kvm eventfd support - use eventfd objects to signal various KVM events + * + * Copyright 2009 Novell. All Rights Reserved. + * + * Author: + * Gregory Haskins <ghaskins@novell.com> + * + * This file is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. + */ + +#include <linux/kvm_host.h> +#include <linux/workqueue.h> +#include <linux/syscalls.h> +#include <linux/wait.h> +#include <linux/poll.h> +#include <linux/file.h> +#include <linux/list.h> + +/* + * -------------------------------------------------------------------- + * irqfd: Allows an fd to be used to inject an interrupt to the guest + * + * Credit goes to Avi Kivity for the original idea. + * -------------------------------------------------------------------- + */ +struct _irqfd { + struct kvm *kvm; + int gsi; + int fd; + struct file *file; + struct list_head list; + poll_table pt; + wait_queue_head_t *wqh; + wait_queue_t wait; + struct work_struct work; +}; + +static void +irqfd_inject(struct work_struct *work) +{ + struct _irqfd *irqfd = container_of(work, struct _irqfd, work); + struct kvm *kvm = irqfd->kvm; + + mutex_lock(&kvm->lock); + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1); + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0); + mutex_unlock(&kvm->lock); +} + +static int +irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) +{ + struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait); + + /* + * The wake_up with interrupts disabled. Therefore we need to defer + * the IRQ injection until later since we need to acquire the + * kvm->lock to do so. + */ + schedule_work(&irqfd->work); + + return 0; +} + +static void +irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh, + poll_table *pt) +{ + struct _irqfd *irqfd = container_of(pt, struct _irqfd, pt); + + irqfd->wqh = wqh; + add_wait_queue(wqh, &irqfd->wait); +} + +int +kvm_assign_irqfd(struct kvm *kvm, int fd, int gsi, int flags) +{ + struct _irqfd *irqfd; + struct file *file = NULL; + int ret; + + irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL); + if (!irqfd) + return -ENOMEM; + + irqfd->kvm = kvm; + irqfd->gsi = gsi; + irqfd->fd = fd; + INIT_LIST_HEAD(&irqfd->list); + INIT_WORK(&irqfd->work, irqfd_inject); + + /* + * Embed the file* lifetime in the irqfd. + */ + file = fget(fd); + if (IS_ERR(file)) { + ret = PTR_ERR(file); + goto fail; + } + + /* + * Install our own custom wake-up handling so we are notified via + * a callback whenever someone signals the underlying eventfd + */ + init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup); + init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc); + + ret = file->f_op->poll(file, &irqfd->pt); + if (ret < 0) + goto fail; + + irqfd->file = file; + + mutex_lock(&kvm->lock); + list_add_tail(&irqfd->list, &kvm->irqfds); + mutex_unlock(&kvm->lock); + + return 0; + +fail: + if (irqfd->wqh) + remove_wait_queue(irqfd->wqh, &irqfd->wait); + + if (file && !IS_ERR(file)) + fput(file); + + kfree(irqfd); + return ret; +} + +static void +irqfd_release(struct _irqfd *irqfd) +{ + remove_wait_queue(irqfd->wqh, &irqfd->wait); + + flush_work(&irqfd->work); + fput(irqfd->file); + + list_del(&irqfd->list); + kfree(irqfd); +} + +int +kvm_deassign_irqfd(struct kvm *kvm, int fd) +{ + struct _irqfd *irqfd, *tmp; + + mutex_lock(&kvm->lock); + + /* + * linear search isn't brilliant, but this should be a infrequent + * operation and the list should not grow very large + */ + list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds, list) { + if (irqfd->fd != fd) + continue; + + irqfd_release(irqfd); + mutex_unlock(&kvm->lock); + return 0; + } + mutex_unlock(&kvm->lock); + + return -ENOENT; +} + +void +kvm_irqfd_release(struct kvm *kvm) +{ + struct _irqfd *irqfd, *tmp; + + /* don't bother with the lock..we are shutting down */ + list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds, list) + irqfd_release(irqfd); +} diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 4d00942..7aa9f0a 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -983,6 +983,7 @@ static struct kvm *kvm_create_vm(void) atomic_inc(&kvm->mm->mm_count); spin_lock_init(&kvm->mmu_lock); kvm_io_bus_init(&kvm->pio_bus); + INIT_LIST_HEAD(&kvm->irqfds); mutex_init(&kvm->lock); kvm_io_bus_init(&kvm->mmio_bus); init_rwsem(&kvm->slots_lock); @@ -1034,6 +1035,7 @@ static void kvm_destroy_vm(struct kvm *kvm) spin_lock(&kvm_lock); list_del(&kvm->vm_list); spin_unlock(&kvm_lock); + kvm_irqfd_release(kvm); kvm_free_irq_routing(kvm); kvm_io_bus_destroy(&kvm->pio_bus); kvm_io_bus_destroy(&kvm->mmio_bus); @@ -2208,6 +2210,24 @@ static long kvm_vm_ioctl(struct file *filp, } #endif #endif /* KVM_CAP_IRQ_ROUTING */ + case KVM_ASSIGN_IRQFD: { + struct kvm_irqfd data; + + r = -EFAULT; + if (copy_from_user(&data, argp, sizeof data)) + goto out; + r = kvm_assign_irqfd(kvm, data.fd, data.gsi, data.flags); + break; + } + case KVM_DEASSIGN_IRQFD: { + u32 data; + + r = -EFAULT; + if (copy_from_user(&data, argp, sizeof data)) + goto out; + r = kvm_deassign_irqfd(kvm, data); + break; + } default: r = kvm_arch_vm_ioctl(filp, ioctl, arg); } ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [KVM PATCH v7 2/3] kvm: add support for irqfd via eventfd-notification interface 2009-05-12 18:26 ` [KVM PATCH v7 2/3] kvm: add support for irqfd via eventfd-notification interface Gregory Haskins @ 2009-05-14 9:47 ` Avi Kivity 2009-05-14 11:52 ` Gregory Haskins 2009-05-14 11:22 ` Avi Kivity 1 sibling, 1 reply; 20+ messages in thread From: Avi Kivity @ 2009-05-14 9:47 UTC (permalink / raw) To: Gregory Haskins; +Cc: kvm, viro, linux-kernel, davidel Gregory Haskins wrote: > KVM provides a complete virtual system environment for guests, including > support for injecting interrupts modeled after the real exception/interrupt > facilities present on the native platform (such as the IDT on x86). > Virtual interrupts can come from a variety of sources (emulated devices, > pass-through devices, etc) but all must be injected to the guest via > the KVM infrastructure. This patch adds a new mechanism to inject a specific > interrupt to a guest using a decoupled eventfd mechnanism: Any legal signal > on the irqfd (using eventfd semantics from either userspace or kernel) will > translate into an injected interrupt in the guest at the next available > interrupt window. > > r = 1; > break; > case KVM_CAP_COALESCED_MMIO: > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > index 3db5d8d..dfc4bcc 100644 > --- a/include/linux/kvm.h > +++ b/include/linux/kvm.h > @@ -415,6 +415,7 @@ struct kvm_trace_rec { > #define KVM_CAP_ASSIGN_DEV_IRQ 29 > /* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */ > #define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30 > +#define KVM_CAP_EVENTFD 31 > Let's keep a fine granularity and call it IRQFD. > + > +int > +kvm_deassign_irqfd(struct kvm *kvm, int fd) > +{ > + struct _irqfd *irqfd, *tmp; > + > + mutex_lock(&kvm->lock); > + > + /* > + * linear search isn't brilliant, but this should be a infrequent > + * operation and the list should not grow very large > + */ > + list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds, list) { > + if (irqfd->fd != fd) > + continue; > Please fget() the new fd and compare the filps; fds aren't meaningful in the kernel. You can also drop _irqfd::fd. It may also be useful to compare the gsi, this allows a "make-before-break" switchover: - guest reroutes irq to a different gsi - associate irqfd with new gsi - disassociate irqfd from old gsi > + > + irqfd_release(irqfd); > + mutex_unlock(&kvm->lock); > + return 0; > Don't return, userspace may have multiple associations? -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [KVM PATCH v7 2/3] kvm: add support for irqfd via eventfd-notification interface 2009-05-14 9:47 ` Avi Kivity @ 2009-05-14 11:52 ` Gregory Haskins 2009-05-14 12:20 ` Avi Kivity 0 siblings, 1 reply; 20+ messages in thread From: Gregory Haskins @ 2009-05-14 11:52 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, viro, linux-kernel, davidel [-- Attachment #1: Type: text/plain, Size: 2604 bytes --] Avi Kivity wrote: > Gregory Haskins wrote: >> KVM provides a complete virtual system environment for guests, including >> support for injecting interrupts modeled after the real >> exception/interrupt >> facilities present on the native platform (such as the IDT on x86). >> Virtual interrupts can come from a variety of sources (emulated devices, >> pass-through devices, etc) but all must be injected to the guest via >> the KVM infrastructure. This patch adds a new mechanism to inject a >> specific >> interrupt to a guest using a decoupled eventfd mechnanism: Any legal >> signal >> on the irqfd (using eventfd semantics from either userspace or >> kernel) will >> translate into an injected interrupt in the guest at the next available >> interrupt window. >> > >> r = 1; >> break; >> case KVM_CAP_COALESCED_MMIO: >> diff --git a/include/linux/kvm.h b/include/linux/kvm.h >> index 3db5d8d..dfc4bcc 100644 >> --- a/include/linux/kvm.h >> +++ b/include/linux/kvm.h >> @@ -415,6 +415,7 @@ struct kvm_trace_rec { >> #define KVM_CAP_ASSIGN_DEV_IRQ 29 >> /* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */ >> #define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30 >> +#define KVM_CAP_EVENTFD 31 >> > > Let's keep a fine granularity and call it IRQFD. Yeah, the iofd stuff is still immature and is not likely to be ready at the same time anyway. The CAP bits are cheap enough as it is, so not sure what I was thinking. Will fix. > >> + >> +int >> +kvm_deassign_irqfd(struct kvm *kvm, int fd) >> +{ >> + struct _irqfd *irqfd, *tmp; >> + >> + mutex_lock(&kvm->lock); >> + >> + /* >> + * linear search isn't brilliant, but this should be a infrequent >> + * operation and the list should not grow very large >> + */ >> + list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds, list) { >> + if (irqfd->fd != fd) >> + continue; >> > > Please fget() the new fd and compare the filps; fds aren't meaningful > in the kernel. You can also drop _irqfd::fd. I like this as a second option... > > It may also be useful to compare the gsi, this allows a > "make-before-break" switchover: ...but I like this best. Good idea. > > - guest reroutes irq to a different gsi > - associate irqfd with new gsi > - disassociate irqfd from old gsi > >> + >> + irqfd_release(irqfd); >> + mutex_unlock(&kvm->lock); >> + return 0; >> > > Don't return, userspace may have multiple associations? Parse error. Can you elaborate? -Greg > > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 266 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [KVM PATCH v7 2/3] kvm: add support for irqfd via eventfd-notification interface 2009-05-14 11:52 ` Gregory Haskins @ 2009-05-14 12:20 ` Avi Kivity 2009-05-14 13:12 ` Gregory Haskins 0 siblings, 1 reply; 20+ messages in thread From: Avi Kivity @ 2009-05-14 12:20 UTC (permalink / raw) To: Gregory Haskins; +Cc: kvm, viro, linux-kernel, davidel Gregory Haskins wrote: >> Please fget() the new fd and compare the filps; fds aren't meaningful >> in the kernel. You can also drop _irqfd::fd. >> > > I like this as a second option... > > >> It may also be useful to compare the gsi, this allows a >> "make-before-break" switchover: >> > > ...but I like this best. Good idea. > I thought of comparing both. >> - guest reroutes irq to a different gsi >> - associate irqfd with new gsi >> - disassociate irqfd from old gsi >> >> >>> + >>> + irqfd_release(irqfd); >>> + mutex_unlock(&kvm->lock); >>> + return 0; >>> >>> >> Don't return, userspace may have multiple associations? >> > > Parse error. Can you elaborate? > > You break out of the look when you match your irqfd. But there may be multiple matches. Granted, it doesn't make much sense to hook the same fd to the same gsi multiple times (it may make sense to hook multiple fds to a single gsi, or maybe a single fd to multiple gsis), but it pays to have a consistent do-what-I-said-even-if-it-doesn't-make-sense interface. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [KVM PATCH v7 2/3] kvm: add support for irqfd via eventfd-notification interface 2009-05-14 12:20 ` Avi Kivity @ 2009-05-14 13:12 ` Gregory Haskins 0 siblings, 0 replies; 20+ messages in thread From: Gregory Haskins @ 2009-05-14 13:12 UTC (permalink / raw) To: Avi Kivity; +Cc: Gregory Haskins, kvm, viro, linux-kernel, davidel [-- Attachment #1: Type: text/plain, Size: 1276 bytes --] Avi Kivity wrote: > Gregory Haskins wrote: >>> Please fget() the new fd and compare the filps; fds aren't meaningful >>> in the kernel. You can also drop _irqfd::fd. >>> >> >> I like this as a second option... >> >> >>> It may also be useful to compare the gsi, this allows a >>> "make-before-break" switchover: >>> >> >> ...but I like this best. Good idea. >> > > I thought of comparing both. Ah, ok. I misunderstood. We can do that. > >>> - guest reroutes irq to a different gsi >>> - associate irqfd with new gsi >>> - disassociate irqfd from old gsi >>> >>> >>>> + >>>> + irqfd_release(irqfd); >>>> + mutex_unlock(&kvm->lock); >>>> + return 0; >>>> >>> Don't return, userspace may have multiple associations? >>> >> >> Parse error. Can you elaborate? >> >> > > You break out of the look when you match your irqfd. But there may be > multiple matches. > > Granted, it doesn't make much sense to hook the same fd to the same > gsi multiple times (it may make sense to hook multiple fds to a single > gsi, or maybe a single fd to multiple gsis), but it pays to have a > consistent do-what-I-said-even-if-it-doesn't-make-sense interface. Ack, will do. -Greg [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 266 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [KVM PATCH v7 2/3] kvm: add support for irqfd via eventfd-notification interface 2009-05-12 18:26 ` [KVM PATCH v7 2/3] kvm: add support for irqfd via eventfd-notification interface Gregory Haskins 2009-05-14 9:47 ` Avi Kivity @ 2009-05-14 11:22 ` Avi Kivity 2009-05-14 15:52 ` Gregory Haskins 1 sibling, 1 reply; 20+ messages in thread From: Avi Kivity @ 2009-05-14 11:22 UTC (permalink / raw) To: Gregory Haskins; +Cc: kvm, viro, linux-kernel, davidel Gregory Haskins wrote: > KVM provides a complete virtual system environment for guests, including > support for injecting interrupts modeled after the real exception/interrupt > facilities present on the native platform (such as the IDT on x86). > Virtual interrupts can come from a variety of sources (emulated devices, > pass-through devices, etc) but all must be injected to the guest via > the KVM infrastructure. This patch adds a new mechanism to inject a specific > interrupt to a guest using a decoupled eventfd mechnanism: Any legal signal > on the irqfd (using eventfd semantics from either userspace or kernel) will > translate into an injected interrupt in the guest at the next available > interrupt window. > > + > +static void > +irqfd_inject(struct work_struct *work) > +{ > + struct _irqfd *irqfd = container_of(work, struct _irqfd, work); > + struct kvm *kvm = irqfd->kvm; > + > I think you need to ->read() from the irqfd, otherwise the count will never clear. > + mutex_lock(&kvm->lock); > + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1); > + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0); > + mutex_unlock(&kvm->lock); > +} > -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [KVM PATCH v7 2/3] kvm: add support for irqfd via eventfd-notification interface 2009-05-14 11:22 ` Avi Kivity @ 2009-05-14 15:52 ` Gregory Haskins 2009-05-15 3:22 ` Davide Libenzi 0 siblings, 1 reply; 20+ messages in thread From: Gregory Haskins @ 2009-05-14 15:52 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, viro, linux-kernel, davidel [-- Attachment #1: Type: text/plain, Size: 1974 bytes --] Avi Kivity wrote: > Gregory Haskins wrote: >> KVM provides a complete virtual system environment for guests, including >> support for injecting interrupts modeled after the real >> exception/interrupt >> facilities present on the native platform (such as the IDT on x86). >> Virtual interrupts can come from a variety of sources (emulated devices, >> pass-through devices, etc) but all must be injected to the guest via >> the KVM infrastructure. This patch adds a new mechanism to inject a >> specific >> interrupt to a guest using a decoupled eventfd mechnanism: Any legal >> signal >> on the irqfd (using eventfd semantics from either userspace or >> kernel) will >> translate into an injected interrupt in the guest at the next available >> interrupt window. >> >> + >> +static void >> +irqfd_inject(struct work_struct *work) >> +{ >> + struct _irqfd *irqfd = container_of(work, struct _irqfd, work); >> + struct kvm *kvm = irqfd->kvm; >> + >> > > > I think you need to ->read() from the irqfd, otherwise the count will > never clear. Yeah, and this is a disavantage to using eventfd vs a custom anon-fd implementation. However, the count is really only there for deciding whether to sleep a traditional eventfd recipient which doesn't really apply in this application. I suppose we could try to invoke the read method (or add a new method to eventfd to allow it to be cleared independent of the f_ops->read() (ala eventfd_signal() vs f_ops->write()). I'm not convinced we really need to worry about it, though. IMO we can just let the count accumulate. But if you insist this loose end should be addressed, perhaps Davide has some thoughts on how to best do this? -Greg > >> + mutex_lock(&kvm->lock); >> + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1); >> + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0); >> + mutex_unlock(&kvm->lock); >> +} >> > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 266 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [KVM PATCH v7 2/3] kvm: add support for irqfd via eventfd-notification interface 2009-05-14 15:52 ` Gregory Haskins @ 2009-05-15 3:22 ` Davide Libenzi 2009-05-15 3:35 ` Gregory Haskins 0 siblings, 1 reply; 20+ messages in thread From: Davide Libenzi @ 2009-05-15 3:22 UTC (permalink / raw) To: Gregory Haskins; +Cc: Avi Kivity, kvm, viro, Linux Kernel Mailing List On Thu, 14 May 2009, Gregory Haskins wrote: > Avi Kivity wrote: > > Gregory Haskins wrote: > >> KVM provides a complete virtual system environment for guests, including > >> support for injecting interrupts modeled after the real > >> exception/interrupt > >> facilities present on the native platform (such as the IDT on x86). > >> Virtual interrupts can come from a variety of sources (emulated devices, > >> pass-through devices, etc) but all must be injected to the guest via > >> the KVM infrastructure. This patch adds a new mechanism to inject a > >> specific > >> interrupt to a guest using a decoupled eventfd mechnanism: Any legal > >> signal > >> on the irqfd (using eventfd semantics from either userspace or > >> kernel) will > >> translate into an injected interrupt in the guest at the next available > >> interrupt window. > >> > >> + > >> +static void > >> +irqfd_inject(struct work_struct *work) > >> +{ > >> + struct _irqfd *irqfd = container_of(work, struct _irqfd, work); > >> + struct kvm *kvm = irqfd->kvm; > >> + > >> > > > > > > I think you need to ->read() from the irqfd, otherwise the count will > > never clear. > > Yeah, and this is a disavantage to using eventfd vs a custom anon-fd > implementation. > > However, the count is really only there for deciding whether to sleep a > traditional eventfd recipient which doesn't really apply in this > application. I suppose we could try to invoke the read method (or add a > new method to eventfd to allow it to be cleared independent of the > f_ops->read() (ala eventfd_signal() vs f_ops->write()). I'm not > convinced we really need to worry about it, though. IMO we can just let > the count accumulate. > > But if you insist this loose end should be addressed, perhaps Davide has > some thoughts on how to best do this? The counter is 64bit, so at 1M IRQ/s will take about 585K years to saturate. But from a symmetry POV, it may be better to clear it. Maybe with a kernel-side eventfd_read()? - Davide ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [KVM PATCH v7 2/3] kvm: add support for irqfd via eventfd-notification interface 2009-05-15 3:22 ` Davide Libenzi @ 2009-05-15 3:35 ` Gregory Haskins 0 siblings, 0 replies; 20+ messages in thread From: Gregory Haskins @ 2009-05-15 3:35 UTC (permalink / raw) To: Davide Libenzi; +Cc: Avi Kivity, kvm, viro, Linux Kernel Mailing List [-- Attachment #1: Type: text/plain, Size: 2503 bytes --] Davide Libenzi wrote: > On Thu, 14 May 2009, Gregory Haskins wrote: > > >> Avi Kivity wrote: >> >>> Gregory Haskins wrote: >>> >>>> KVM provides a complete virtual system environment for guests, including >>>> support for injecting interrupts modeled after the real >>>> exception/interrupt >>>> facilities present on the native platform (such as the IDT on x86). >>>> Virtual interrupts can come from a variety of sources (emulated devices, >>>> pass-through devices, etc) but all must be injected to the guest via >>>> the KVM infrastructure. This patch adds a new mechanism to inject a >>>> specific >>>> interrupt to a guest using a decoupled eventfd mechnanism: Any legal >>>> signal >>>> on the irqfd (using eventfd semantics from either userspace or >>>> kernel) will >>>> translate into an injected interrupt in the guest at the next available >>>> interrupt window. >>>> >>>> + >>>> +static void >>>> +irqfd_inject(struct work_struct *work) >>>> +{ >>>> + struct _irqfd *irqfd = container_of(work, struct _irqfd, work); >>>> + struct kvm *kvm = irqfd->kvm; >>>> + >>>> >>>> >>> I think you need to ->read() from the irqfd, otherwise the count will >>> never clear. >>> >> Yeah, and this is a disavantage to using eventfd vs a custom anon-fd >> implementation. >> >> However, the count is really only there for deciding whether to sleep a >> traditional eventfd recipient which doesn't really apply in this >> application. I suppose we could try to invoke the read method (or add a >> new method to eventfd to allow it to be cleared independent of the >> f_ops->read() (ala eventfd_signal() vs f_ops->write()). I'm not >> convinced we really need to worry about it, though. IMO we can just let >> the count accumulate. >> >> But if you insist this loose end should be addressed, perhaps Davide has >> some thoughts on how to best do this? >> > > The counter is 64bit, so at 1M IRQ/s will take about 585K years to > saturate. But from a symmetry POV, it may be better to clear it. Maybe > with a kernel-side eventfd_read()? > Hi Davide, I think ultimately that would be the direction to go. I will defer to Avi, but I think we have reached consensus that while its perhaps sloppy to leave the counter untouched, we can back-burner this issue for now and just let it accumulate indefinately. If it becomes an issue down the road we can always fix it then. Thanks, -Greg [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 266 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* [KVM PATCH v7 3/3] kvm: add iofd support 2009-05-12 18:26 [KVM PATCH v7 0/3] kvm: eventfd interfaces (formerly irqfd) Gregory Haskins 2009-05-12 18:26 ` [KVM PATCH v7 1/3] eventfd: export eventfd interfaces for module use Gregory Haskins 2009-05-12 18:26 ` [KVM PATCH v7 2/3] kvm: add support for irqfd via eventfd-notification interface Gregory Haskins @ 2009-05-12 18:27 ` Gregory Haskins 2009-05-12 19:05 ` Gregory Haskins ` (3 more replies) 2 siblings, 4 replies; 20+ messages in thread From: Gregory Haskins @ 2009-05-12 18:27 UTC (permalink / raw) To: kvm; +Cc: viro, linux-kernel, avi, davidel iofd is a mechanism to register PIO/MMIO regions to trigger an eventfd signal when written to. Userspace can register any arbitrary address with a corresponding eventfd. Signed-off-by: Gregory Haskins <ghaskins@novell.com> --- include/linux/kvm.h | 12 +++++ include/linux/kvm_host.h | 2 + virt/kvm/eventfd.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++ virt/kvm/kvm_main.c | 13 ++++++ 4 files changed, 134 insertions(+), 0 deletions(-) diff --git a/include/linux/kvm.h b/include/linux/kvm.h index dfc4bcc..99b6e45 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -292,6 +292,17 @@ struct kvm_guest_debug { struct kvm_guest_debug_arch arch; }; +#define KVM_IOFD_FLAG_DEASSIGN (1 << 0) +#define KVM_IOFD_FLAG_PIO (1 << 1) + +struct kvm_iofd { + __u64 addr; + __u32 len; + __u32 fd; + __u32 flags; + __u8 pad[12]; +}; + #define KVM_TRC_SHIFT 16 /* * kvm trace categories @@ -508,6 +519,7 @@ struct kvm_irqfd { #define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq) #define KVM_ASSIGN_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd) #define KVM_DEASSIGN_IRQFD _IOW(KVMIO, 0x77, __u32) +#define KVM_IOFD _IOW(KVMIO, 0x78, struct kvm_iofd) /* * ioctls for vcpu fds diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 1acc528..d53cb70 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -529,5 +529,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {} int kvm_assign_irqfd(struct kvm *kvm, int fd, int gsi, int flags); int kvm_deassign_irqfd(struct kvm *kvm, int fd); void kvm_irqfd_release(struct kvm *kvm); +int kvm_iofd(struct kvm *kvm, unsigned long addr, size_t len, + int fd, int flags); #endif diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 71afd62..8b23317 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -21,12 +21,16 @@ */ #include <linux/kvm_host.h> +#include <linux/kvm.h> #include <linux/workqueue.h> #include <linux/syscalls.h> #include <linux/wait.h> #include <linux/poll.h> #include <linux/file.h> #include <linux/list.h> +#include <linux/eventfd.h> + +#include "iodev.h" /* * -------------------------------------------------------------------- @@ -185,3 +189,106 @@ kvm_irqfd_release(struct kvm *kvm) list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds, list) irqfd_release(irqfd); } + +/* + * -------------------------------------------------------------------- + * iofd: translate a PIO/MMIO memory write to an eventfd signal. + * + * userspace can register a PIO/MMIO address with an eventfd for recieving + * notification when the memory has been touched. + * -------------------------------------------------------------------- + */ + +struct _iofd { + u64 addr; + size_t length; + struct file *file; + struct kvm_io_device dev; +}; + +static int +iofd_in_range(struct kvm_io_device *this, gpa_t addr, int len, int is_write) +{ + struct _iofd *iofd = (struct _iofd *)this->private; + + return ((addr >= iofd->addr && (addr < iofd->addr + iofd->length))); +} + +/* writes trigger an event */ +static void +iofd_write(struct kvm_io_device *this, gpa_t addr, int len, const void *val) +{ + struct _iofd *iofd = (struct _iofd *)this->private; + + eventfd_signal(iofd->file, 1); +} + +/* reads return all zeros */ +static void +iofd_read(struct kvm_io_device *this, gpa_t addr, int len, void *val) +{ + memset(val, 0, len); +} + +static void +iofd_destructor(struct kvm_io_device *this) +{ + struct _iofd *iofd = (struct _iofd *)this->private; + + fput(iofd->file); + kfree(iofd); +} + +static int +kvm_assign_iofd(struct kvm *kvm, unsigned long addr, size_t len, + int fd, int flags) +{ + int pio = flags & KVM_IOFD_FLAG_PIO; + struct kvm_io_bus *bus = pio ? &kvm->pio_bus : &kvm->mmio_bus; + struct _iofd *iofd; + struct file *file; + + file = eventfd_fget(fd); + if (IS_ERR(file)) + return PTR_ERR(file); + + iofd = kzalloc(sizeof(*iofd), GFP_KERNEL); + if (!iofd) { + fput(file); + return -ENOMEM; + } + + iofd->dev.read = iofd_read; + iofd->dev.write = iofd_write; + iofd->dev.in_range = iofd_in_range; + iofd->dev.destructor = iofd_destructor; + iofd->dev.private = iofd; + + iofd->addr = addr; + iofd->length = len; + iofd->file = file; + + kvm_io_bus_register_dev(bus, &iofd->dev); + + printk(KERN_DEBUG "registering %s iofd at %lx of size %d\n", + pio ? "PIO" : "MMIO", addr, (int)len); + + return 0; +} + +static int +kvm_deassign_iofd(struct kvm *kvm, unsigned long addr, size_t len, + int fd, int flags) +{ + /* FIXME: We need an io_bus_unregister() function */ + return -EINVAL; +} + +int +kvm_iofd(struct kvm *kvm, unsigned long addr, size_t len, int fd, int flags) +{ + if (flags & KVM_IOFD_FLAG_DEASSIGN) + return kvm_deassign_iofd(kvm, addr, len, fd, flags); + + return kvm_assign_iofd(kvm, addr, len, fd, flags); +} diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 7aa9f0a..a443974 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2228,6 +2228,19 @@ static long kvm_vm_ioctl(struct file *filp, r = kvm_deassign_irqfd(kvm, data); break; } + case KVM_IOFD: { + struct kvm_iofd entry; + + r = -EFAULT; + if (copy_from_user(&entry, argp, sizeof entry)) + goto out; + + r = kvm_iofd(kvm, entry.addr, entry.len, entry.fd, + entry.flags); + if (r) + goto out; + break; + } default: r = kvm_arch_vm_ioctl(filp, ioctl, arg); } ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [KVM PATCH v7 3/3] kvm: add iofd support 2009-05-12 18:27 ` [KVM PATCH v7 3/3] kvm: add iofd support Gregory Haskins @ 2009-05-12 19:05 ` Gregory Haskins 2009-05-12 19:29 ` [KVM PATCH v7.1] " Gregory Haskins ` (2 subsequent siblings) 3 siblings, 0 replies; 20+ messages in thread From: Gregory Haskins @ 2009-05-12 19:05 UTC (permalink / raw) Cc: kvm, viro, linux-kernel, avi, davidel [-- Attachment #1: Type: text/plain, Size: 7271 bytes --] Gregory Haskins wrote: > iofd is a mechanism to register PIO/MMIO regions to trigger an eventfd > signal when written to. Userspace can register any arbitrary address > with a corresponding eventfd. > Ugg..this patch header sucks, especially given all the talk around how we need to do them better lately :) I will add this text as well for future versions: -------------- Traditional MMIO/PIO exit paths are expensive because they are done within the same context as the VCPU thread and therefore cause a VMX/SVM "heavy-weight" exit, a transition back to userspace, and overhead with the qemu processing of the operation. An eventfd mechanism, on the other hand, allows the VCPU to take a very brief lightweight exit only long enough to trigger the eventfd_signal. This means that clients of the eventfd (supporting both userspace or kernel end-points) can potentially get notified much more efficiently than if we were to register through the traditional mechanism via qemu MMIO/PIO notification. ----------- > Signed-off-by: Gregory Haskins <ghaskins@novell.com> > --- > > include/linux/kvm.h | 12 +++++ > include/linux/kvm_host.h | 2 + > virt/kvm/eventfd.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++ > virt/kvm/kvm_main.c | 13 ++++++ > 4 files changed, 134 insertions(+), 0 deletions(-) > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > index dfc4bcc..99b6e45 100644 > --- a/include/linux/kvm.h > +++ b/include/linux/kvm.h > @@ -292,6 +292,17 @@ struct kvm_guest_debug { > struct kvm_guest_debug_arch arch; > }; > > +#define KVM_IOFD_FLAG_DEASSIGN (1 << 0) > +#define KVM_IOFD_FLAG_PIO (1 << 1) > + > +struct kvm_iofd { > + __u64 addr; > + __u32 len; > + __u32 fd; > + __u32 flags; > + __u8 pad[12]; > +}; > + > #define KVM_TRC_SHIFT 16 > /* > * kvm trace categories > @@ -508,6 +519,7 @@ struct kvm_irqfd { > #define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq) > #define KVM_ASSIGN_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd) > #define KVM_DEASSIGN_IRQFD _IOW(KVMIO, 0x77, __u32) > +#define KVM_IOFD _IOW(KVMIO, 0x78, struct kvm_iofd) > > /* > * ioctls for vcpu fds > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index 1acc528..d53cb70 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -529,5 +529,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {} > int kvm_assign_irqfd(struct kvm *kvm, int fd, int gsi, int flags); > int kvm_deassign_irqfd(struct kvm *kvm, int fd); > void kvm_irqfd_release(struct kvm *kvm); > +int kvm_iofd(struct kvm *kvm, unsigned long addr, size_t len, > + int fd, int flags); > > #endif > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > index 71afd62..8b23317 100644 > --- a/virt/kvm/eventfd.c > +++ b/virt/kvm/eventfd.c > @@ -21,12 +21,16 @@ > */ > > #include <linux/kvm_host.h> > +#include <linux/kvm.h> > #include <linux/workqueue.h> > #include <linux/syscalls.h> > #include <linux/wait.h> > #include <linux/poll.h> > #include <linux/file.h> > #include <linux/list.h> > +#include <linux/eventfd.h> > + > +#include "iodev.h" > > /* > * -------------------------------------------------------------------- > @@ -185,3 +189,106 @@ kvm_irqfd_release(struct kvm *kvm) > list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds, list) > irqfd_release(irqfd); > } > + > +/* > + * -------------------------------------------------------------------- > + * iofd: translate a PIO/MMIO memory write to an eventfd signal. > + * > + * userspace can register a PIO/MMIO address with an eventfd for recieving > + * notification when the memory has been touched. > + * -------------------------------------------------------------------- > + */ > + > +struct _iofd { > + u64 addr; > + size_t length; > + struct file *file; > + struct kvm_io_device dev; > +}; > + > +static int > +iofd_in_range(struct kvm_io_device *this, gpa_t addr, int len, int is_write) > +{ > + struct _iofd *iofd = (struct _iofd *)this->private; > + > + return ((addr >= iofd->addr && (addr < iofd->addr + iofd->length))); > +} > + > +/* writes trigger an event */ > +static void > +iofd_write(struct kvm_io_device *this, gpa_t addr, int len, const void *val) > +{ > + struct _iofd *iofd = (struct _iofd *)this->private; > + > + eventfd_signal(iofd->file, 1); > +} > + > +/* reads return all zeros */ > +static void > +iofd_read(struct kvm_io_device *this, gpa_t addr, int len, void *val) > +{ > + memset(val, 0, len); > +} > + > +static void > +iofd_destructor(struct kvm_io_device *this) > +{ > + struct _iofd *iofd = (struct _iofd *)this->private; > + > + fput(iofd->file); > + kfree(iofd); > +} > + > +static int > +kvm_assign_iofd(struct kvm *kvm, unsigned long addr, size_t len, > + int fd, int flags) > +{ > + int pio = flags & KVM_IOFD_FLAG_PIO; > + struct kvm_io_bus *bus = pio ? &kvm->pio_bus : &kvm->mmio_bus; > + struct _iofd *iofd; > + struct file *file; > + > + file = eventfd_fget(fd); > + if (IS_ERR(file)) > + return PTR_ERR(file); > + > + iofd = kzalloc(sizeof(*iofd), GFP_KERNEL); > + if (!iofd) { > + fput(file); > + return -ENOMEM; > + } > + > + iofd->dev.read = iofd_read; > + iofd->dev.write = iofd_write; > + iofd->dev.in_range = iofd_in_range; > + iofd->dev.destructor = iofd_destructor; > + iofd->dev.private = iofd; > + > + iofd->addr = addr; > + iofd->length = len; > + iofd->file = file; > + > + kvm_io_bus_register_dev(bus, &iofd->dev); > + > + printk(KERN_DEBUG "registering %s iofd at %lx of size %d\n", > + pio ? "PIO" : "MMIO", addr, (int)len); > + > + return 0; > +} > + > +static int > +kvm_deassign_iofd(struct kvm *kvm, unsigned long addr, size_t len, > + int fd, int flags) > +{ > + /* FIXME: We need an io_bus_unregister() function */ > + return -EINVAL; > +} > + > +int > +kvm_iofd(struct kvm *kvm, unsigned long addr, size_t len, int fd, int flags) > +{ > + if (flags & KVM_IOFD_FLAG_DEASSIGN) > + return kvm_deassign_iofd(kvm, addr, len, fd, flags); > + > + return kvm_assign_iofd(kvm, addr, len, fd, flags); > +} > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 7aa9f0a..a443974 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -2228,6 +2228,19 @@ static long kvm_vm_ioctl(struct file *filp, > r = kvm_deassign_irqfd(kvm, data); > break; > } > + case KVM_IOFD: { > + struct kvm_iofd entry; > + > + r = -EFAULT; > + if (copy_from_user(&entry, argp, sizeof entry)) > + goto out; > + > + r = kvm_iofd(kvm, entry.addr, entry.len, entry.fd, > + entry.flags); > + if (r) > + goto out; > + break; > + } > default: > r = kvm_arch_vm_ioctl(filp, ioctl, arg); > } > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 266 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* [KVM PATCH v7.1] kvm: add iofd support 2009-05-12 18:27 ` [KVM PATCH v7 3/3] kvm: add iofd support Gregory Haskins 2009-05-12 19:05 ` Gregory Haskins @ 2009-05-12 19:29 ` Gregory Haskins 2009-05-12 22:17 ` [KVM PATCH v7.2] " Gregory Haskins 2009-05-14 11:11 ` [KVM PATCH v7 3/3] " Avi Kivity 3 siblings, 0 replies; 20+ messages in thread From: Gregory Haskins @ 2009-05-12 19:29 UTC (permalink / raw) To: kvm; +Cc: viro, linux-kernel, avi, davidel [ here is the updated header ] iofd is a mechanism to register PIO/MMIO regions to trigger an eventfd signal when written to. Userspace can register any arbitrary address with a corresponding eventfd. Normal IO requires a synchronous/blocking round-trip since the operation may cause side-effects in the emulated model or may return data to the caller. However, there is a subclass of IO which acts purely as a trigger for other IO (such as to kick off an out-of-band DMA request, etc). For these patterns, the synchronous call is particularly expensive because they are done within the same context as the VCPU thread and therefore cause a VMX/SVM "heavy-weight" exit, a transition back to userspace, and overhead with the qemu locking and decoding operations. Therefore providing the registration of an in-kernel trigger point allows the VCPU to take a very brief lightweight exit only long enough to signal the eventfd. This also means that any clients compatible with the eventfd interface (which includes userspace and kernelspace equally well) can now register for asynchronous notification when one of these signals are generated in the guest. The end result should be a more flexible and higher performance notification API for the backend KVM hypervisor and perhipheral components. Signed-off-by: Gregory Haskins <ghaskins@novell.com> --- include/linux/kvm.h | 12 +++++ include/linux/kvm_host.h | 2 + virt/kvm/eventfd.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++ virt/kvm/kvm_main.c | 13 ++++++ 4 files changed, 134 insertions(+), 0 deletions(-) diff --git a/include/linux/kvm.h b/include/linux/kvm.h index dfc4bcc..99b6e45 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -292,6 +292,17 @@ struct kvm_guest_debug { struct kvm_guest_debug_arch arch; }; +#define KVM_IOFD_FLAG_DEASSIGN (1 << 0) +#define KVM_IOFD_FLAG_PIO (1 << 1) + +struct kvm_iofd { + __u64 addr; + __u32 len; + __u32 fd; + __u32 flags; + __u8 pad[12]; +}; + #define KVM_TRC_SHIFT 16 /* * kvm trace categories @@ -508,6 +519,7 @@ struct kvm_irqfd { #define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq) #define KVM_ASSIGN_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd) #define KVM_DEASSIGN_IRQFD _IOW(KVMIO, 0x77, __u32) +#define KVM_IOFD _IOW(KVMIO, 0x78, struct kvm_iofd) /* * ioctls for vcpu fds diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 1acc528..d53cb70 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -529,5 +529,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {} int kvm_assign_irqfd(struct kvm *kvm, int fd, int gsi, int flags); int kvm_deassign_irqfd(struct kvm *kvm, int fd); void kvm_irqfd_release(struct kvm *kvm); +int kvm_iofd(struct kvm *kvm, unsigned long addr, size_t len, + int fd, int flags); #endif diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 71afd62..8b23317 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -21,12 +21,16 @@ */ #include <linux/kvm_host.h> +#include <linux/kvm.h> #include <linux/workqueue.h> #include <linux/syscalls.h> #include <linux/wait.h> #include <linux/poll.h> #include <linux/file.h> #include <linux/list.h> +#include <linux/eventfd.h> + +#include "iodev.h" /* * -------------------------------------------------------------------- @@ -185,3 +189,106 @@ kvm_irqfd_release(struct kvm *kvm) list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds, list) irqfd_release(irqfd); } + +/* + * -------------------------------------------------------------------- + * iofd: translate a PIO/MMIO memory write to an eventfd signal. + * + * userspace can register a PIO/MMIO address with an eventfd for recieving + * notification when the memory has been touched. + * -------------------------------------------------------------------- + */ + +struct _iofd { + u64 addr; + size_t length; + struct file *file; + struct kvm_io_device dev; +}; + +static int +iofd_in_range(struct kvm_io_device *this, gpa_t addr, int len, int is_write) +{ + struct _iofd *iofd = (struct _iofd *)this->private; + + return ((addr >= iofd->addr && (addr < iofd->addr + iofd->length))); +} + +/* writes trigger an event */ +static void +iofd_write(struct kvm_io_device *this, gpa_t addr, int len, const void *val) +{ + struct _iofd *iofd = (struct _iofd *)this->private; + + eventfd_signal(iofd->file, 1); +} + +/* reads return all zeros */ +static void +iofd_read(struct kvm_io_device *this, gpa_t addr, int len, void *val) +{ + memset(val, 0, len); +} + +static void +iofd_destructor(struct kvm_io_device *this) +{ + struct _iofd *iofd = (struct _iofd *)this->private; + + fput(iofd->file); + kfree(iofd); +} + +static int +kvm_assign_iofd(struct kvm *kvm, unsigned long addr, size_t len, + int fd, int flags) +{ + int pio = flags & KVM_IOFD_FLAG_PIO; + struct kvm_io_bus *bus = pio ? &kvm->pio_bus : &kvm->mmio_bus; + struct _iofd *iofd; + struct file *file; + + file = eventfd_fget(fd); + if (IS_ERR(file)) + return PTR_ERR(file); + + iofd = kzalloc(sizeof(*iofd), GFP_KERNEL); + if (!iofd) { + fput(file); + return -ENOMEM; + } + + iofd->dev.read = iofd_read; + iofd->dev.write = iofd_write; + iofd->dev.in_range = iofd_in_range; + iofd->dev.destructor = iofd_destructor; + iofd->dev.private = iofd; + + iofd->addr = addr; + iofd->length = len; + iofd->file = file; + + kvm_io_bus_register_dev(bus, &iofd->dev); + + printk(KERN_DEBUG "registering %s iofd at %lx of size %d\n", + pio ? "PIO" : "MMIO", addr, (int)len); + + return 0; +} + +static int +kvm_deassign_iofd(struct kvm *kvm, unsigned long addr, size_t len, + int fd, int flags) +{ + /* FIXME: We need an io_bus_unregister() function */ + return -EINVAL; +} + +int +kvm_iofd(struct kvm *kvm, unsigned long addr, size_t len, int fd, int flags) +{ + if (flags & KVM_IOFD_FLAG_DEASSIGN) + return kvm_deassign_iofd(kvm, addr, len, fd, flags); + + return kvm_assign_iofd(kvm, addr, len, fd, flags); +} diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 7aa9f0a..a443974 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2228,6 +2228,19 @@ static long kvm_vm_ioctl(struct file *filp, r = kvm_deassign_irqfd(kvm, data); break; } + case KVM_IOFD: { + struct kvm_iofd entry; + + r = -EFAULT; + if (copy_from_user(&entry, argp, sizeof entry)) + goto out; + + r = kvm_iofd(kvm, entry.addr, entry.len, entry.fd, + entry.flags); + if (r) + goto out; + break; + } default: r = kvm_arch_vm_ioctl(filp, ioctl, arg); } ^ permalink raw reply related [flat|nested] 20+ messages in thread
* [KVM PATCH v7.2] kvm: add iofd support 2009-05-12 18:27 ` [KVM PATCH v7 3/3] kvm: add iofd support Gregory Haskins 2009-05-12 19:05 ` Gregory Haskins 2009-05-12 19:29 ` [KVM PATCH v7.1] " Gregory Haskins @ 2009-05-12 22:17 ` Gregory Haskins 2009-05-13 2:46 ` Gregory Haskins 2009-05-14 11:11 ` [KVM PATCH v7 3/3] " Avi Kivity 3 siblings, 1 reply; 20+ messages in thread From: Gregory Haskins @ 2009-05-12 22:17 UTC (permalink / raw) To: kvm; +Cc: viro, linux-kernel, avi, davidel [ updated with figures, graphs, performance-test-harness info ] iofd is a mechanism to register PIO/MMIO regions to trigger an eventfd signal when written to by a guest. Userspace can register any arbitrary address with a corresponding eventfd and then pass the eventfd to a specific end-point of interest for handling. Normal IO requires a blocking round-trip since the operation may cause side-effects in the emulated model or may return data to the caller. Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM "heavy-weight" exit back to userspace, and is ultimately serviced by qemu's device model synchronously before returning control back to the vcpu. However, there is a subclass of IO which acts purely as a trigger for other IO (such as to kick off an out-of-band DMA request, etc). For these patterns, the synchronous call is particularly expensive since we really only want to simply get our notification transmitted asychronously and return as quickly as possible. All the sychronous infrastructure to ensure proper data-dependencies are met in the normal IO case are just unecessary overhead for signalling. This adds additional computational load on the system, as well as latency to the signalling path. Therefore, we provide a mechanism for registration of an in-kernel trigger point that allows the VCPU to only require a very brief, lightweight exit just long enough to signal an eventfd. This also means that any clients compatible with the eventfd interface (which includes userspace and kernelspace equally well) can now register to be notified. The end result should be a more flexible and higher performance notification API for the backend KVM hypervisor and perhipheral components. To test this theory, we built a test-harness called "doorbell". This module has a function called "doorbell_ring()" which simply increments a counter for each time the doorbell is signaled. It supports signalling from either an eventfd, or an ioctl(). We then wired up two paths to the doorbell: One via QEMU via a registered io region and through the doorbell ioctl(). The other is direct via iofd. You can download this test harness here: ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2 The measured results are as follows: qemu-mmio: 110000 iops, 9.09us rtt iofd-mmio: 200100 iops, 5.00us rtt iofd-pio: 367300 iops, 2.72us rtt I didn't measure qemu-pio, because I have to figure out how to register a PIO region, and I got lazy. However, for now we can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO, and -350ns for HC, we get: qemu-pio: 153139 iops, 6.53us rtt iofd-hc: 412585 iops, 2.37us rtt these are just for fun, for now, until I can gather more data. Here is a graph for your convenience: http://developer.novell.com/wiki/images/7/76/Iofd-chart.png The conclusion to draw is that we save about 4us by skipping the userspace hop. -------------------- Signed-off-by: Gregory Haskins <ghaskins@novell.com> --- include/linux/kvm.h | 12 +++++ include/linux/kvm_host.h | 2 + virt/kvm/eventfd.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++ virt/kvm/kvm_main.c | 13 ++++++ 4 files changed, 134 insertions(+), 0 deletions(-) diff --git a/include/linux/kvm.h b/include/linux/kvm.h index dfc4bcc..99b6e45 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -292,6 +292,17 @@ struct kvm_guest_debug { struct kvm_guest_debug_arch arch; }; +#define KVM_IOFD_FLAG_DEASSIGN (1 << 0) +#define KVM_IOFD_FLAG_PIO (1 << 1) + +struct kvm_iofd { + __u64 addr; + __u32 len; + __u32 fd; + __u32 flags; + __u8 pad[12]; +}; + #define KVM_TRC_SHIFT 16 /* * kvm trace categories @@ -508,6 +519,7 @@ struct kvm_irqfd { #define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq) #define KVM_ASSIGN_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd) #define KVM_DEASSIGN_IRQFD _IOW(KVMIO, 0x77, __u32) +#define KVM_IOFD _IOW(KVMIO, 0x78, struct kvm_iofd) /* * ioctls for vcpu fds diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 1acc528..d53cb70 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -529,5 +529,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {} int kvm_assign_irqfd(struct kvm *kvm, int fd, int gsi, int flags); int kvm_deassign_irqfd(struct kvm *kvm, int fd); void kvm_irqfd_release(struct kvm *kvm); +int kvm_iofd(struct kvm *kvm, unsigned long addr, size_t len, + int fd, int flags); #endif diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 71afd62..8b23317 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -21,12 +21,16 @@ */ #include <linux/kvm_host.h> +#include <linux/kvm.h> #include <linux/workqueue.h> #include <linux/syscalls.h> #include <linux/wait.h> #include <linux/poll.h> #include <linux/file.h> #include <linux/list.h> +#include <linux/eventfd.h> + +#include "iodev.h" /* * -------------------------------------------------------------------- @@ -185,3 +189,106 @@ kvm_irqfd_release(struct kvm *kvm) list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds, list) irqfd_release(irqfd); } + +/* + * -------------------------------------------------------------------- + * iofd: translate a PIO/MMIO memory write to an eventfd signal. + * + * userspace can register a PIO/MMIO address with an eventfd for recieving + * notification when the memory has been touched. + * -------------------------------------------------------------------- + */ + +struct _iofd { + u64 addr; + size_t length; + struct file *file; + struct kvm_io_device dev; +}; + +static int +iofd_in_range(struct kvm_io_device *this, gpa_t addr, int len, int is_write) +{ + struct _iofd *iofd = (struct _iofd *)this->private; + + return ((addr >= iofd->addr && (addr < iofd->addr + iofd->length))); +} + +/* writes trigger an event */ +static void +iofd_write(struct kvm_io_device *this, gpa_t addr, int len, const void *val) +{ + struct _iofd *iofd = (struct _iofd *)this->private; + + eventfd_signal(iofd->file, 1); +} + +/* reads return all zeros */ +static void +iofd_read(struct kvm_io_device *this, gpa_t addr, int len, void *val) +{ + memset(val, 0, len); +} + +static void +iofd_destructor(struct kvm_io_device *this) +{ + struct _iofd *iofd = (struct _iofd *)this->private; + + fput(iofd->file); + kfree(iofd); +} + +static int +kvm_assign_iofd(struct kvm *kvm, unsigned long addr, size_t len, + int fd, int flags) +{ + int pio = flags & KVM_IOFD_FLAG_PIO; + struct kvm_io_bus *bus = pio ? &kvm->pio_bus : &kvm->mmio_bus; + struct _iofd *iofd; + struct file *file; + + file = eventfd_fget(fd); + if (IS_ERR(file)) + return PTR_ERR(file); + + iofd = kzalloc(sizeof(*iofd), GFP_KERNEL); + if (!iofd) { + fput(file); + return -ENOMEM; + } + + iofd->dev.read = iofd_read; + iofd->dev.write = iofd_write; + iofd->dev.in_range = iofd_in_range; + iofd->dev.destructor = iofd_destructor; + iofd->dev.private = iofd; + + iofd->addr = addr; + iofd->length = len; + iofd->file = file; + + kvm_io_bus_register_dev(bus, &iofd->dev); + + printk(KERN_DEBUG "registering %s iofd at %lx of size %d\n", + pio ? "PIO" : "MMIO", addr, (int)len); + + return 0; +} + +static int +kvm_deassign_iofd(struct kvm *kvm, unsigned long addr, size_t len, + int fd, int flags) +{ + /* FIXME: We need an io_bus_unregister() function */ + return -EINVAL; +} + +int +kvm_iofd(struct kvm *kvm, unsigned long addr, size_t len, int fd, int flags) +{ + if (flags & KVM_IOFD_FLAG_DEASSIGN) + return kvm_deassign_iofd(kvm, addr, len, fd, flags); + + return kvm_assign_iofd(kvm, addr, len, fd, flags); +} diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 7aa9f0a..a443974 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2228,6 +2228,19 @@ static long kvm_vm_ioctl(struct file *filp, r = kvm_deassign_irqfd(kvm, data); break; } + case KVM_IOFD: { + struct kvm_iofd entry; + + r = -EFAULT; + if (copy_from_user(&entry, argp, sizeof entry)) + goto out; + + r = kvm_iofd(kvm, entry.addr, entry.len, entry.fd, + entry.flags); + if (r) + goto out; + break; + } default: r = kvm_arch_vm_ioctl(filp, ioctl, arg); } ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [KVM PATCH v7.2] kvm: add iofd support 2009-05-12 22:17 ` [KVM PATCH v7.2] " Gregory Haskins @ 2009-05-13 2:46 ` Gregory Haskins 0 siblings, 0 replies; 20+ messages in thread From: Gregory Haskins @ 2009-05-13 2:46 UTC (permalink / raw) To: Gregory Haskins; +Cc: kvm, viro, linux-kernel, avi, davidel [-- Attachment #1: Type: text/plain, Size: 9610 bytes --] Gregory Haskins wrote: > [ updated with figures, graphs, performance-test-harness info ] > > iofd is a mechanism to register PIO/MMIO regions to trigger an eventfd > signal when written to by a guest. Userspace can register any arbitrary > address with a corresponding eventfd and then pass the eventfd to a specific > end-point of interest for handling. > > Normal IO requires a blocking round-trip since the operation may cause > side-effects in the emulated model or may return data to the caller. > Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM > "heavy-weight" exit back to userspace, and is ultimately serviced by qemu's > device model synchronously before returning control back to the vcpu. > > However, there is a subclass of IO which acts purely as a trigger for > other IO (such as to kick off an out-of-band DMA request, etc). For these > patterns, the synchronous call is particularly expensive since we really > only want to simply get our notification transmitted asychronously and > return as quickly as possible. All the sychronous infrastructure to ensure > proper data-dependencies are met in the normal IO case are just unecessary > overhead for signalling. This adds additional computational load on the > system, as well as latency to the signalling path. > > Therefore, we provide a mechanism for registration of an in-kernel trigger > point that allows the VCPU to only require a very brief, lightweight > exit just long enough to signal an eventfd. This also means that any > clients compatible with the eventfd interface (which includes userspace > and kernelspace equally well) can now register to be notified. The end > result should be a more flexible and higher performance notification API > for the backend KVM hypervisor and perhipheral components. > > To test this theory, we built a test-harness called "doorbell". This > module has a function called "doorbell_ring()" which simply increments a > counter for each time the doorbell is signaled. It supports signalling > from either an eventfd, or an ioctl(). > > We then wired up two paths to the doorbell: One via QEMU via a registered > io region and through the doorbell ioctl(). The other is direct via iofd. > > You can download this test harness here: > > ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2 > > The measured results are as follows: > > qemu-mmio: 110000 iops, 9.09us rtt > iofd-mmio: 200100 iops, 5.00us rtt > iofd-pio: 367300 iops, 2.72us rtt > > I didn't measure qemu-pio, because I have to figure out how to register a > PIO region, and I got lazy. However, for now we can extrapolate based on > the data from the NULLIO runs of +2.56us for MMIO, and -350ns for HC, we > get: > > qemu-pio: 153139 iops, 6.53us rtt > iofd-hc: 412585 iops, 2.37us rtt > > these are just for fun, for now, until I can gather more data. > > Here is a graph for your convenience: > > http://developer.novell.com/wiki/images/7/76/Iofd-chart.png > > The conclusion to draw is that we save about 4us by skipping the userspace > hop. > > -------------------- > > Signed-off-by: Gregory Haskins <ghaskins@novell.com> > --- > > include/linux/kvm.h | 12 +++++ > include/linux/kvm_host.h | 2 + > virt/kvm/eventfd.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++ > virt/kvm/kvm_main.c | 13 ++++++ > 4 files changed, 134 insertions(+), 0 deletions(-) > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > index dfc4bcc..99b6e45 100644 > --- a/include/linux/kvm.h > +++ b/include/linux/kvm.h > @@ -292,6 +292,17 @@ struct kvm_guest_debug { > struct kvm_guest_debug_arch arch; > }; > > +#define KVM_IOFD_FLAG_DEASSIGN (1 << 0) > +#define KVM_IOFD_FLAG_PIO (1 << 1) > + > +struct kvm_iofd { > + __u64 addr; > + __u32 len; > + __u32 fd; > + __u32 flags; > + __u8 pad[12]; > +}; > + > #define KVM_TRC_SHIFT 16 > /* > * kvm trace categories > @@ -508,6 +519,7 @@ struct kvm_irqfd { > #define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq) > #define KVM_ASSIGN_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd) > #define KVM_DEASSIGN_IRQFD _IOW(KVMIO, 0x77, __u32) > +#define KVM_IOFD _IOW(KVMIO, 0x78, struct kvm_iofd) > > /* > * ioctls for vcpu fds > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index 1acc528..d53cb70 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -529,5 +529,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {} > int kvm_assign_irqfd(struct kvm *kvm, int fd, int gsi, int flags); > int kvm_deassign_irqfd(struct kvm *kvm, int fd); > void kvm_irqfd_release(struct kvm *kvm); > +int kvm_iofd(struct kvm *kvm, unsigned long addr, size_t len, > + int fd, int flags); > > #endif > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > index 71afd62..8b23317 100644 > --- a/virt/kvm/eventfd.c > +++ b/virt/kvm/eventfd.c > @@ -21,12 +21,16 @@ > */ > > #include <linux/kvm_host.h> > +#include <linux/kvm.h> > #include <linux/workqueue.h> > #include <linux/syscalls.h> > #include <linux/wait.h> > #include <linux/poll.h> > #include <linux/file.h> > #include <linux/list.h> > +#include <linux/eventfd.h> > + > +#include "iodev.h" > > /* > * -------------------------------------------------------------------- > @@ -185,3 +189,106 @@ kvm_irqfd_release(struct kvm *kvm) > list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds, list) > irqfd_release(irqfd); > } > + > +/* > + * -------------------------------------------------------------------- > + * iofd: translate a PIO/MMIO memory write to an eventfd signal. > + * > + * userspace can register a PIO/MMIO address with an eventfd for recieving > + * notification when the memory has been touched. > + * -------------------------------------------------------------------- > + */ > + > +struct _iofd { > + u64 addr; > + size_t length; > + struct file *file; > + struct kvm_io_device dev; > +}; > + > +static int > +iofd_in_range(struct kvm_io_device *this, gpa_t addr, int len, int is_write) > +{ > + struct _iofd *iofd = (struct _iofd *)this->private; > + > + return ((addr >= iofd->addr && (addr < iofd->addr + iofd->length))); > +} > + > +/* writes trigger an event */ > +static void > +iofd_write(struct kvm_io_device *this, gpa_t addr, int len, const void *val) > +{ > + struct _iofd *iofd = (struct _iofd *)this->private; > + > + eventfd_signal(iofd->file, 1); > +} > + > +/* reads return all zeros */ > +static void > +iofd_read(struct kvm_io_device *this, gpa_t addr, int len, void *val) > +{ > + memset(val, 0, len); > +} > + > +static void > +iofd_destructor(struct kvm_io_device *this) > +{ > + struct _iofd *iofd = (struct _iofd *)this->private; > + > + fput(iofd->file); > + kfree(iofd); > +} > + > +static int > +kvm_assign_iofd(struct kvm *kvm, unsigned long addr, size_t len, > + int fd, int flags) > +{ > + int pio = flags & KVM_IOFD_FLAG_PIO; > + struct kvm_io_bus *bus = pio ? &kvm->pio_bus : &kvm->mmio_bus; > + struct _iofd *iofd; > + struct file *file; > + > + file = eventfd_fget(fd); > + if (IS_ERR(file)) > + return PTR_ERR(file); > + > + iofd = kzalloc(sizeof(*iofd), GFP_KERNEL); > + if (!iofd) { > + fput(file); > + return -ENOMEM; > + } > + > + iofd->dev.read = iofd_read; > + iofd->dev.write = iofd_write; > + iofd->dev.in_range = iofd_in_range; > + iofd->dev.destructor = iofd_destructor; > + iofd->dev.private = iofd; > + > + iofd->addr = addr; > + iofd->length = len; > + iofd->file = file; > + > + kvm_io_bus_register_dev(bus, &iofd->dev); > Hmm..this needs to get beefed up too. It can BUG_ON() if there are too many io-devs registered, which would be an attack vector from userspace. Will fix to return an error (and to check the error in this function). > + > + printk(KERN_DEBUG "registering %s iofd at %lx of size %d\n", > + pio ? "PIO" : "MMIO", addr, (int)len); > + > + return 0; > +} > + > +static int > +kvm_deassign_iofd(struct kvm *kvm, unsigned long addr, size_t len, > + int fd, int flags) > +{ > + /* FIXME: We need an io_bus_unregister() function */ > + return -EINVAL; > +} > + > +int > +kvm_iofd(struct kvm *kvm, unsigned long addr, size_t len, int fd, int flags) > +{ > + if (flags & KVM_IOFD_FLAG_DEASSIGN) > + return kvm_deassign_iofd(kvm, addr, len, fd, flags); > + > + return kvm_assign_iofd(kvm, addr, len, fd, flags); > +} > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 7aa9f0a..a443974 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -2228,6 +2228,19 @@ static long kvm_vm_ioctl(struct file *filp, > r = kvm_deassign_irqfd(kvm, data); > break; > } > + case KVM_IOFD: { > + struct kvm_iofd entry; > + > + r = -EFAULT; > + if (copy_from_user(&entry, argp, sizeof entry)) > + goto out; > + > + r = kvm_iofd(kvm, entry.addr, entry.len, entry.fd, > + entry.flags); > + if (r) > + goto out; > + break; > + } > default: > r = kvm_arch_vm_ioctl(filp, ioctl, arg); > } > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 266 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [KVM PATCH v7 3/3] kvm: add iofd support 2009-05-12 18:27 ` [KVM PATCH v7 3/3] kvm: add iofd support Gregory Haskins ` (2 preceding siblings ...) 2009-05-12 22:17 ` [KVM PATCH v7.2] " Gregory Haskins @ 2009-05-14 11:11 ` Avi Kivity 2009-05-14 12:02 ` Gregory Haskins 3 siblings, 1 reply; 20+ messages in thread From: Avi Kivity @ 2009-05-14 11:11 UTC (permalink / raw) To: Gregory Haskins; +Cc: kvm, viro, linux-kernel, davidel Gregory Haskins wrote: > iofd is a mechanism to register PIO/MMIO regions to trigger an eventfd > signal when written to. Userspace can register any arbitrary address > with a corresponding eventfd. > > Please start a separate patchset for this so I can merge irqfd. > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > index dfc4bcc..99b6e45 100644 > --- a/include/linux/kvm.h > +++ b/include/linux/kvm.h > @@ -292,6 +292,17 @@ struct kvm_guest_debug { > struct kvm_guest_debug_arch arch; > }; > > +#define KVM_IOFD_FLAG_DEASSIGN (1 << 0) > +#define KVM_IOFD_FLAG_PIO (1 << 1) > + > +struct kvm_iofd { > + __u64 addr; > + __u32 len; > + __u32 fd; > + __u32 flags; > + __u8 pad[12]; > +}; > + > Please add a data match capability. virtio uses a write with the data containing the queue ID, and we want a separate event for each queue. > * kvm trace categories > @@ -508,6 +519,7 @@ struct kvm_irqfd { > #define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq) > #define KVM_ASSIGN_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd) > #define KVM_DEASSIGN_IRQFD _IOW(KVMIO, 0x77, __u32) > +#define KVM_IOFD _IOW(KVMIO, 0x78, struct kvm_iofd) > Too general a name. It's not doing IO, just sending out notifications. Why have assign/deassign for irqfd and a single ioctl for iofd? The rest looks good. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [KVM PATCH v7 3/3] kvm: add iofd support 2009-05-14 11:11 ` [KVM PATCH v7 3/3] " Avi Kivity @ 2009-05-14 12:02 ` Gregory Haskins 2009-05-14 12:22 ` Avi Kivity 0 siblings, 1 reply; 20+ messages in thread From: Gregory Haskins @ 2009-05-14 12:02 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, viro, linux-kernel, davidel [-- Attachment #1: Type: text/plain, Size: 2130 bytes --] Avi Kivity wrote: > Gregory Haskins wrote: >> iofd is a mechanism to register PIO/MMIO regions to trigger an eventfd >> signal when written to. Userspace can register any arbitrary address >> with a corresponding eventfd. >> >> > > Please start a separate patchset for this so I can merge irqfd. Ack. Will spin a new split series with your irqfd review changes > >> >> diff --git a/include/linux/kvm.h b/include/linux/kvm.h >> index dfc4bcc..99b6e45 100644 >> --- a/include/linux/kvm.h >> +++ b/include/linux/kvm.h >> @@ -292,6 +292,17 @@ struct kvm_guest_debug { >> struct kvm_guest_debug_arch arch; >> }; >> >> +#define KVM_IOFD_FLAG_DEASSIGN (1 << 0) >> +#define KVM_IOFD_FLAG_PIO (1 << 1) >> + >> +struct kvm_iofd { >> + __u64 addr; >> + __u32 len; >> + __u32 fd; >> + __u32 flags; >> + __u8 pad[12]; >> +}; >> + >> > Please add a data match capability. virtio uses a write with the data > containing the queue ID, and we want a separate event for each queue. How about "u64 cookie" ? > > >> * kvm trace categories >> @@ -508,6 +519,7 @@ struct kvm_irqfd { >> #define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct >> kvm_assigned_irq) >> #define KVM_ASSIGN_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd) >> #define KVM_DEASSIGN_IRQFD _IOW(KVMIO, 0x77, __u32) >> +#define KVM_IOFD _IOW(KVMIO, 0x78, struct kvm_iofd) >> > > Too general a name. It's not doing IO, just sending out notifications. Hmm...good point. I was trying to reflect "[MM/P]IO-FD". How about "IOSIGNALFD" > > Why have assign/deassign for irqfd and a single ioctl for iofd? Heh.. irqfd "liked" two because the deassign only needed a u32. iofd needed more or less the same structure for both so I guess I thought I would be "slick" and condense the vectors. Will fix so they are symmetrical. > > The rest looks good. > I will also submit a patch to fix the io_bus stuff so that registrations can gracefully fail instead of BUG_ON(), and to provide an unregister function. Thanks Avi, -Greg [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 266 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [KVM PATCH v7 3/3] kvm: add iofd support 2009-05-14 12:02 ` Gregory Haskins @ 2009-05-14 12:22 ` Avi Kivity 0 siblings, 0 replies; 20+ messages in thread From: Avi Kivity @ 2009-05-14 12:22 UTC (permalink / raw) To: Gregory Haskins; +Cc: kvm, viro, linux-kernel, davidel Gregory Haskins wrote: >>> +#define KVM_IOFD_FLAG_PIO (1 << 1) >>> + >>> +struct kvm_iofd { >>> + __u64 addr; >>> + __u32 len; >>> + __u32 fd; >>> + __u32 flags; >>> + __u8 pad[12]; >>> +}; >>> + >>> >>> >> Please add a data match capability. virtio uses a write with the data >> containing the queue ID, and we want a separate event for each queue. >> > > How about "u64 cookie" ? > Sure, and a bit in flags to enable it. >>> * kvm trace categories >>> @@ -508,6 +519,7 @@ struct kvm_irqfd { >>> #define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct >>> kvm_assigned_irq) >>> #define KVM_ASSIGN_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd) >>> #define KVM_DEASSIGN_IRQFD _IOW(KVMIO, 0x77, __u32) >>> +#define KVM_IOFD _IOW(KVMIO, 0x78, struct kvm_iofd) >>> >>> >> Too general a name. It's not doing IO, just sending out notifications. >> > > Hmm...good point. I was trying to reflect "[MM/P]IO-FD". How about > "IOSIGNALFD" > Okay. >> Why have assign/deassign for irqfd and a single ioctl for iofd? >> > Heh.. irqfd "liked" two because the deassign only needed a u32. iofd > needed more or less the same structure for both so I guess I thought I > would be "slick" and condense the vectors. Will fix so they are > symmetrical. > Yeah. You could have both use just one, or both use two. Not sure which is better. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2009-05-15 3:36 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-05-12 18:26 [KVM PATCH v7 0/3] kvm: eventfd interfaces (formerly irqfd) Gregory Haskins 2009-05-12 18:26 ` [KVM PATCH v7 1/3] eventfd: export eventfd interfaces for module use Gregory Haskins 2009-05-12 19:02 ` Davide Libenzi 2009-05-12 18:26 ` [KVM PATCH v7 2/3] kvm: add support for irqfd via eventfd-notification interface Gregory Haskins 2009-05-14 9:47 ` Avi Kivity 2009-05-14 11:52 ` Gregory Haskins 2009-05-14 12:20 ` Avi Kivity 2009-05-14 13:12 ` Gregory Haskins 2009-05-14 11:22 ` Avi Kivity 2009-05-14 15:52 ` Gregory Haskins 2009-05-15 3:22 ` Davide Libenzi 2009-05-15 3:35 ` Gregory Haskins 2009-05-12 18:27 ` [KVM PATCH v7 3/3] kvm: add iofd support Gregory Haskins 2009-05-12 19:05 ` Gregory Haskins 2009-05-12 19:29 ` [KVM PATCH v7.1] " Gregory Haskins 2009-05-12 22:17 ` [KVM PATCH v7.2] " Gregory Haskins 2009-05-13 2:46 ` Gregory Haskins 2009-05-14 11:11 ` [KVM PATCH v7 3/3] " Avi Kivity 2009-05-14 12:02 ` Gregory Haskins 2009-05-14 12:22 ` Avi Kivity
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox