* [KVM PATCH v3 0/2] irqfd
@ 2009-04-27 18:33 Gregory Haskins
2009-04-27 18:33 ` [KVM PATCH v3 1/2] eventfd: export fget and signal interfaces for module use Gregory Haskins
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Gregory Haskins @ 2009-04-27 18:33 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel, avi, davidel
(Applies to kvm.git 41b76d8d0487c26d6d4d3fe53c1ff59b3236f096)
This series implements a mechanism called "irqfd". It lets you create
an eventfd based file-desriptor to inject interrupts to a kvm guest. We
associate one gsi per fd for fine-grained routing.
[ Changelog:
v3:
*) The kernel now allocates the eventfd (need to export sys_eventfd2)
*) Added a flags field for future expansion to kvm_irqfd()
*) We properly toggle the irq level 1+0.
*) We re-use the USERSPACE_SRC_ID instead of creating our own
*) Properly check for failures establishing a poll-table with eventfd
*) Fixed fd/file leaks on failure
*) Rebased to lateste kvm.git::41b76d8d04
v2:
*) Dropped notifier_chain based callbacks in favor of
wait_queue_t::func and file::poll based callbacks (Thanks to
Davide for the suggestion)
v1:
*) Initial release
--------
We do not have a user of this interface in this series, though note
future version of virtual-bus (v4 and above) will be based on this.
The first patch will require mainline buy-in, particularly from Davide
(cc'd). The last patch is kvm specific.
qemu-kvm.git patch to follow.
-Greg
---
Gregory Haskins (2):
kvm: add support for irqfd via eventfd-notification interface
eventfd: export fget and signal interfaces for module use
arch/x86/kvm/Makefile | 2 -
arch/x86/kvm/x86.c | 1
fs/eventfd.c | 4 +
include/linux/kvm.h | 7 ++
include/linux/kvm_host.h | 4 +
virt/kvm/irqfd.c | 158 ++++++++++++++++++++++++++++++++++++++++++++++
virt/kvm/kvm_main.c | 11 +++
7 files changed, 186 insertions(+), 1 deletions(-)
create mode 100644 virt/kvm/irqfd.c
--
Signature
^ permalink raw reply [flat|nested] 16+ messages in thread* [KVM PATCH v3 1/2] eventfd: export fget and signal interfaces for module use 2009-04-27 18:33 [KVM PATCH v3 0/2] irqfd Gregory Haskins @ 2009-04-27 18:33 ` Gregory Haskins 2009-04-27 18:33 ` [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface Gregory Haskins 2009-04-29 6:10 ` [KVM PATCH v3 0/2] irqfd Andrew Morton 2 siblings, 0 replies; 16+ messages in thread From: Gregory Haskins @ 2009-04-27 18:33 UTC (permalink / raw) To: kvm; +Cc: linux-kernel, avi, davidel We will use this later in the series Signed-off-by: Gregory Haskins <ghaskins@novell.com> --- fs/eventfd.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/fs/eventfd.c b/fs/eventfd.c index 2a701d5..faf9e60 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -16,6 +16,7 @@ #include <linux/anon_inodes.h> #include <linux/eventfd.h> #include <linux/syscalls.h> +#include <linux/module.h> struct eventfd_ctx { wait_queue_head_t wqh; @@ -56,6 +57,7 @@ int eventfd_signal(struct file *file, int n) return n; } +EXPORT_SYMBOL_GPL(eventfd_signal); static int eventfd_release(struct inode *inode, struct file *file) { @@ -197,6 +199,7 @@ struct file *eventfd_fget(int fd) return file; } +EXPORT_SYMBOL_GPL(eventfd_fget); SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags) { @@ -228,6 +231,7 @@ SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags) kfree(ctx); return fd; } +EXPORT_SYMBOL_GPL(sys_eventfd2); SYSCALL_DEFINE1(eventfd, unsigned int, count) { ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface 2009-04-27 18:33 [KVM PATCH v3 0/2] irqfd Gregory Haskins 2009-04-27 18:33 ` [KVM PATCH v3 1/2] eventfd: export fget and signal interfaces for module use Gregory Haskins @ 2009-04-27 18:33 ` Gregory Haskins 2009-04-30 13:07 ` Michael S. Tsirkin 2009-05-03 6:44 ` Al Viro 2009-04-29 6:10 ` [KVM PATCH v3 0/2] irqfd Andrew Morton 2 siblings, 2 replies; 16+ messages in thread From: Gregory Haskins @ 2009-04-27 18:33 UTC (permalink / raw) To: kvm; +Cc: linux-kernel, avi, davidel This allows an eventfd to be registered as an irq source with a guest. Any signaling operation on the eventfd (via userspace or kernel) will inject the registered GSI at the next available window. Signed-off-by: Gregory Haskins <ghaskins@novell.com> --- arch/x86/kvm/Makefile | 2 - arch/x86/kvm/x86.c | 1 include/linux/kvm.h | 7 ++ include/linux/kvm_host.h | 4 + virt/kvm/irqfd.c | 158 ++++++++++++++++++++++++++++++++++++++++++++++ virt/kvm/kvm_main.c | 11 +++ 6 files changed, 182 insertions(+), 1 deletions(-) create mode 100644 virt/kvm/irqfd.c diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index b43c4ef..d5fff51 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -3,7 +3,7 @@ # common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \ - coalesced_mmio.o irq_comm.o) + coalesced_mmio.o irq_comm.o irqfd.o) ifeq ($(CONFIG_KVM_TRACE),y) common-objs += $(addprefix ../../../virt/kvm/, kvm_trace.o) endif diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ab1fdac..0773433 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1027,6 +1027,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_REINJECT_CONTROL: case KVM_CAP_IRQ_INJECT_STATUS: case KVM_CAP_ASSIGN_DEV_IRQ: + case KVM_CAP_IRQFD: r = 1; break; case KVM_CAP_COALESCED_MMIO: diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 3db5d8d..0e594f2 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -415,6 +415,7 @@ struct kvm_trace_rec { #define KVM_CAP_ASSIGN_DEV_IRQ 29 /* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */ #define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30 +#define KVM_CAP_IRQFD 31 #ifdef KVM_CAP_IRQ_ROUTING @@ -454,6 +455,11 @@ struct kvm_irq_routing { #endif +struct kvm_irqfd { + __u32 gsi; + __u32 flags; +}; + /* * ioctls for VM fds */ @@ -498,6 +504,7 @@ struct kvm_irq_routing { #define KVM_ASSIGN_SET_MSIX_ENTRY \ _IOW(KVMIO, 0x74, struct kvm_assigned_msix_entry) #define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq) +#define KVM_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd) /* * ioctls for vcpu fds diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 095ebb6..6a8d1c1 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -134,6 +134,7 @@ struct kvm { struct list_head vm_list; struct kvm_io_bus mmio_bus; struct kvm_io_bus pio_bus; + struct list_head irqfds; struct kvm_vm_stat stat; struct kvm_arch arch; atomic_t users_count; @@ -524,4 +525,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {} #endif +int kvm_irqfd(struct kvm *kvm, int gsi, int flags); +void kvm_irqfd_release(struct kvm *kvm); + #endif diff --git a/virt/kvm/irqfd.c b/virt/kvm/irqfd.c new file mode 100644 index 0000000..4fd2bea --- /dev/null +++ b/virt/kvm/irqfd.c @@ -0,0 +1,158 @@ +/* + * irqfd: Allows an eventfd to be used to inject an interrupt to the guest + * + * Credit goes to Avi Kivity for the original idea. + * + * Copyright 2009 Novell. All Rights Reserved. + * + * Author: + * Gregory Haskins <ghaskins@novell.com> + * + * This file is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. + */ + +#include <linux/kvm_host.h> +#include <linux/eventfd.h> +#include <linux/workqueue.h> +#include <linux/syscalls.h> +#include <linux/wait.h> +#include <linux/poll.h> +#include <linux/file.h> +#include <linux/list.h> + +struct _irqfd { + struct kvm *kvm; + int gsi; + struct file *file; + struct list_head list; + poll_table pt; + wait_queue_head_t *wqh; + wait_queue_t wait; + struct work_struct work; +}; + +static void +irqfd_inject(struct work_struct *work) +{ + struct _irqfd *irqfd = container_of(work, struct _irqfd, work); + struct kvm *kvm = irqfd->kvm; + + mutex_lock(&kvm->lock); + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1); + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0); + mutex_unlock(&kvm->lock); +} + +static int +irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) +{ + struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait); + + /* + * The eventfd calls its wake_up with interrupts disabled, and + * eventfd_signal() may be called from interrupt context. Therefore + * we need to defer the IRQ injection until later since we need to + * acquire the kvm->lock to do so. + */ + schedule_work(&irqfd->work); + + return 0; +} + +static void +irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh, + poll_table *pt) +{ + struct _irqfd *irqfd = container_of(pt, struct _irqfd, pt); + + irqfd->wqh = wqh; + add_wait_queue(wqh, &irqfd->wait); +} + +int +kvm_irqfd(struct kvm *kvm, int gsi, int flags) +{ + struct _irqfd *irqfd; + struct file *file = NULL; + int fd = -1; + int ret; + + irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL); + if (!irqfd) + return -ENOMEM; + + irqfd->kvm = kvm; + irqfd->gsi = gsi; + INIT_LIST_HEAD(&irqfd->list); + INIT_WORK(&irqfd->work, irqfd_inject); + + /* We re-use eventfd for irqfd */ + fd = sys_eventfd2(0, 0); + if (fd < 0) { + ret = fd; + goto fail; + } + + /* We maintain a reference to eventfd for the irqfd lifetime */ + file = eventfd_fget(fd); + if (IS_ERR(file)) { + ret = PTR_ERR(file); + goto fail; + } + + irqfd->file = file; + + /* + * Install our own custom wake-up handling so we are notified via + * a callback whenever someone signals the underlying eventfd + */ + init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup); + init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc); + + ret = file->f_op->poll(file, &irqfd->pt); + if (ret < 0) + goto fail; + + mutex_lock(&kvm->lock); + list_add_tail(&irqfd->list, &kvm->irqfds); + mutex_unlock(&kvm->lock); + + return fd; + +fail: + if (fd > 0) + sys_close(fd); + + if (file && !IS_ERR(file)) + fput(file); + + kfree(irqfd); + return ret; +} + +void +kvm_irqfd_release(struct kvm *kvm) +{ + struct _irqfd *irqfd, *tmp; + + list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds, list) { + remove_wait_queue(irqfd->wqh, &irqfd->wait); + + flush_work(&irqfd->work); + fput(irqfd->file); + + list_del(&irqfd->list); + kfree(irqfd); + } +} diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 3265566..21211ee 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -972,6 +972,7 @@ static struct kvm *kvm_create_vm(void) atomic_inc(&kvm->mm->mm_count); spin_lock_init(&kvm->mmu_lock); kvm_io_bus_init(&kvm->pio_bus); + INIT_LIST_HEAD(&kvm->irqfds); mutex_init(&kvm->lock); kvm_io_bus_init(&kvm->mmio_bus); init_rwsem(&kvm->slots_lock); @@ -1023,6 +1024,7 @@ static void kvm_destroy_vm(struct kvm *kvm) spin_lock(&kvm_lock); list_del(&kvm->vm_list); spin_unlock(&kvm_lock); + kvm_irqfd_release(kvm); kvm_free_irq_routing(kvm); kvm_io_bus_destroy(&kvm->pio_bus); kvm_io_bus_destroy(&kvm->mmio_bus); @@ -2197,6 +2199,15 @@ static long kvm_vm_ioctl(struct file *filp, } #endif #endif /* KVM_CAP_IRQ_ROUTING */ + case KVM_IRQFD: { + struct kvm_irqfd data; + + r = -EFAULT; + if (copy_from_user(&data, argp, sizeof data)) + goto out; + r = kvm_irqfd(kvm, data.gsi, data.flags); + break; + } default: r = kvm_arch_vm_ioctl(filp, ioctl, arg); } ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface 2009-04-27 18:33 ` [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface Gregory Haskins @ 2009-04-30 13:07 ` Michael S. Tsirkin 2009-05-03 16:59 ` Avi Kivity 2009-05-03 6:44 ` Al Viro 1 sibling, 1 reply; 16+ messages in thread From: Michael S. Tsirkin @ 2009-04-30 13:07 UTC (permalink / raw) To: Gregory Haskins; +Cc: kvm, linux-kernel, avi, davidel On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote: > This allows an eventfd to be registered as an irq source with a guest. Any > signaling operation on the eventfd (via userspace or kernel) will inject > the registered GSI at the next available window. > > Signed-off-by: Gregory Haskins <ghaskins@novell.com> If we ever want to use this with e.g. MSI-X emulation in guest, and want to be stricly compliant to MSI-X, we'll need a way for guest to mask interrupts, and for host to report that a masked interrupt is pending. Ideally, all this will be doable with a couple of mmapped pages to avoid vmexits/system calls. > +static void > +irqfd_inject(struct work_struct *work) > +{ > + struct _irqfd *irqfd = container_of(work, struct _irqfd, work); > + struct kvm *kvm = irqfd->kvm; > + > + mutex_lock(&kvm->lock); > + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1); > + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0); > + mutex_unlock(&kvm->lock); This will do weird stuff (deliver the irq twice) if the irq is MSI/MSI-X. I know this was discussed already and is a temporary shortcut, but maybe add a comment that we really want kvm_toggle_irq, so that we won't forget? > +} > + -- MST ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface 2009-04-30 13:07 ` Michael S. Tsirkin @ 2009-05-03 16:59 ` Avi Kivity 2009-05-03 19:04 ` Michael S. Tsirkin 0 siblings, 1 reply; 16+ messages in thread From: Avi Kivity @ 2009-05-03 16:59 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: Gregory Haskins, kvm, linux-kernel, davidel Michael S. Tsirkin wrote: > On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote: > >> This allows an eventfd to be registered as an irq source with a guest. Any >> signaling operation on the eventfd (via userspace or kernel) will inject >> the registered GSI at the next available window. >> >> Signed-off-by: Gregory Haskins <ghaskins@novell.com> >> > > If we ever want to use this with e.g. MSI-X emulation in guest, and want > to be stricly compliant to MSI-X, we'll need a way for guest to mask > interrupts, and for host to report that a masked interrupt is pending. > Ideally, all this will be doable with a couple of mmapped pages to avoid > vmexits/system calls. > > We could do this in two ways: - move msix entry emulation into the kernel - require the device to support replacing its irqfd, and juggle it like so: - guest disables msi - replace device model fd with eventfd belonging to us - when the device fires its eventfd, set the irq pending bit - guest enables msi - if the pending bit is set, fire the interrupt? - replace device model fd with the real irqfd I'm leaning towards the latter, though it's not an easy call. >> +static void >> +irqfd_inject(struct work_struct *work) >> +{ >> + struct _irqfd *irqfd = container_of(work, struct _irqfd, work); >> + struct kvm *kvm = irqfd->kvm; >> + >> + mutex_lock(&kvm->lock); >> + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1); >> + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0); >> + mutex_unlock(&kvm->lock); >> > > This will do weird stuff (deliver the irq twice) if the irq is > MSI/MSI-X. I know this was discussed already and is a temporary > shortcut, but maybe add a comment that we really want kvm_toggle_irq, > so that we won't forget? > If so, that's a bug. MSI should ignore kvm_set_irq(..., 0). -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface 2009-05-03 16:59 ` Avi Kivity @ 2009-05-03 19:04 ` Michael S. Tsirkin 2009-05-03 19:17 ` Avi Kivity 0 siblings, 1 reply; 16+ messages in thread From: Michael S. Tsirkin @ 2009-05-03 19:04 UTC (permalink / raw) To: Avi Kivity; +Cc: Gregory Haskins, kvm, linux-kernel, davidel On Sun, May 03, 2009 at 07:59:40PM +0300, Avi Kivity wrote: > Michael S. Tsirkin wrote: >> On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote: >> >>> This allows an eventfd to be registered as an irq source with a guest. Any >>> signaling operation on the eventfd (via userspace or kernel) will inject >>> the registered GSI at the next available window. >>> >>> Signed-off-by: Gregory Haskins <ghaskins@novell.com> >>> >> >> If we ever want to use this with e.g. MSI-X emulation in guest, and want >> to be stricly compliant to MSI-X, we'll need a way for guest to mask >> interrupts, and for host to report that a masked interrupt is pending. >> Ideally, all this will be doable with a couple of mmapped pages to avoid >> vmexits/system calls. >> >> > > We could do this in two ways: > > - move msix entry emulation into the kernel It's not too bad IMO: MSIX is just a table with a list of vectors, you check the mask bit on each interrupt, if masked mark vector pending and poll until unmasked. > - require the device to support replacing its irqfd, and juggle it like so: > - guest disables msi > - replace device model fd with eventfd belonging to us > - when the device fires its eventfd, set the irq pending bit > - guest enables msi > - if the pending bit is set, fire the interrupt? > - replace device model fd with the real irqfd Looks like a lot of code. No? > I'm leaning towards the latter, though it's not an easy call. Actually there's a third option: add KVM_MASK_IRQ, KVM_UNMASK_IRQ ioctls which will block/unblock guest from getting interrupt on this irq, whatever the source. Interrupts are queued in kernel while masked. A third ioctl KVM_PENDING_IRQS will return the status for a set if IRQs. qemu would call these ioctls when guest edits the MSIX vector control or reads the pending bit array. >>> +static void >>> +irqfd_inject(struct work_struct *work) >>> +{ >>> + struct _irqfd *irqfd = container_of(work, struct _irqfd, work); >>> + struct kvm *kvm = irqfd->kvm; >>> + >>> + mutex_lock(&kvm->lock); >>> + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1); >>> + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0); >>> + mutex_unlock(&kvm->lock); >>> >> >> This will do weird stuff (deliver the irq twice) if the irq is >> MSI/MSI-X. I know this was discussed already and is a temporary >> shortcut, but maybe add a comment that we really want kvm_toggle_irq, >> so that we won't forget? >> > > If so, that's a bug. MSI should ignore kvm_set_irq(..., 0). -- MST ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface 2009-05-03 19:04 ` Michael S. Tsirkin @ 2009-05-03 19:17 ` Avi Kivity 2009-05-03 19:25 ` Michael S. Tsirkin 0 siblings, 1 reply; 16+ messages in thread From: Avi Kivity @ 2009-05-03 19:17 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: Gregory Haskins, kvm, linux-kernel, davidel Michael S. Tsirkin wrote: > On Sun, May 03, 2009 at 07:59:40PM +0300, Avi Kivity wrote: > >> Michael S. Tsirkin wrote: >> >>> On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote: >>> >>> >>>> This allows an eventfd to be registered as an irq source with a guest. Any >>>> signaling operation on the eventfd (via userspace or kernel) will inject >>>> the registered GSI at the next available window. >>>> >>>> Signed-off-by: Gregory Haskins <ghaskins@novell.com> >>>> >>>> >>> If we ever want to use this with e.g. MSI-X emulation in guest, and want >>> to be stricly compliant to MSI-X, we'll need a way for guest to mask >>> interrupts, and for host to report that a masked interrupt is pending. >>> Ideally, all this will be doable with a couple of mmapped pages to avoid >>> vmexits/system calls. >>> >>> >>> >> We could do this in two ways: >> >> - move msix entry emulation into the kernel >> > > It's not too bad IMO: MSIX is just a table with a list > of vectors, you check the mask bit on each interrupt, > if masked mark vector pending and poll until unmasked. > Right, but it's more and more core, and more and more bugs. Bugs in the kernel are more expensive to fix and get to users. > >> - require the device to support replacing its irqfd, and juggle it like so: >> - guest disables msi >> - replace device model fd with eventfd belonging to us >> - when the device fires its eventfd, set the irq pending bit >> - guest enables msi >> - if the pending bit is set, fire the interrupt? >> - replace device model fd with the real irqfd >> > > Looks like a lot of code. No? > We'll need exactly the same code if we do it in the kernel. The only addition is replacing the fd. >> I'm leaning towards the latter, though it's not an easy call. >> > > Actually there's a third option: add KVM_MASK_IRQ, KVM_UNMASK_IRQ ioctls > which will block/unblock guest from getting interrupt on this irq, > whatever the source. Interrupts are queued in kernel while masked. A > third ioctl KVM_PENDING_IRQS will return the status for a set if IRQs. > qemu would call these ioctls when guest edits the MSIX vector control or > reads the pending bit array. > I think this is the best option. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface 2009-05-03 19:17 ` Avi Kivity @ 2009-05-03 19:25 ` Michael S. Tsirkin 0 siblings, 0 replies; 16+ messages in thread From: Michael S. Tsirkin @ 2009-05-03 19:25 UTC (permalink / raw) To: Avi Kivity; +Cc: Gregory Haskins, kvm, linux-kernel, davidel On Sun, May 03, 2009 at 10:17:16PM +0300, Avi Kivity wrote: >> Actually there's a third option: add KVM_MASK_IRQ, KVM_UNMASK_IRQ ioctls >> which will block/unblock guest from getting interrupt on this irq, >> whatever the source. Interrupts are queued in kernel while masked. A >> third ioctl KVM_PENDING_IRQS will return the status for a set if IRQs. >> qemu would call these ioctls when guest edits the MSIX vector control or >> reads the pending bit array. >> > > I think this is the best option. Sounds good. -- MST ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface 2009-04-27 18:33 ` [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface Gregory Haskins 2009-04-30 13:07 ` Michael S. Tsirkin @ 2009-05-03 6:44 ` Al Viro 2009-05-03 18:07 ` Davide Libenzi 1 sibling, 1 reply; 16+ messages in thread From: Al Viro @ 2009-05-03 6:44 UTC (permalink / raw) To: Gregory Haskins; +Cc: kvm, linux-kernel, avi, davidel On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote: > + /* We re-use eventfd for irqfd */ > + fd = sys_eventfd2(0, 0); > + if (fd < 0) { > + ret = fd; > + goto fail; > + } > + > + /* We maintain a reference to eventfd for the irqfd lifetime */ > + file = eventfd_fget(fd); > + if (IS_ERR(file)) { > + ret = PTR_ERR(file); > + goto fail; > + } > + > + irqfd->file = file; This is just plain wrong. You have no promise whatsoever that caller of that sucker won't race with e.g. dup2(). IOW, you can't assume that file will be of the expected kind. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface 2009-05-03 6:44 ` Al Viro @ 2009-05-03 18:07 ` Davide Libenzi 2009-05-03 19:01 ` Al Viro 0 siblings, 1 reply; 16+ messages in thread From: Davide Libenzi @ 2009-05-03 18:07 UTC (permalink / raw) To: Al Viro; +Cc: Gregory Haskins, kvm, Linux Kernel Mailing List, avi On Sun, 3 May 2009, Al Viro wrote: > On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote: > > + /* We re-use eventfd for irqfd */ > > + fd = sys_eventfd2(0, 0); > > + if (fd < 0) { > > + ret = fd; > > + goto fail; > > + } > > + > > + /* We maintain a reference to eventfd for the irqfd lifetime */ > > + file = eventfd_fget(fd); > > + if (IS_ERR(file)) { > > + ret = PTR_ERR(file); > > + goto fail; > > + } > > + > > + irqfd->file = file; > > This is just plain wrong. You have no promise whatsoever that caller of > that sucker won't race with e.g. dup2(). IOW, you can't assume that > file will be of the expected kind. The eventfd_fget() checks for the file_operations pointer, before returning the file*, and fails if the fd in not an eventfd. Or you have other concerns? - Davide ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface 2009-05-03 18:07 ` Davide Libenzi @ 2009-05-03 19:01 ` Al Viro 2009-05-03 19:22 ` file descriptor abuses Al Viro 2009-05-03 20:11 ` [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface Davide Libenzi 0 siblings, 2 replies; 16+ messages in thread From: Al Viro @ 2009-05-03 19:01 UTC (permalink / raw) To: Davide Libenzi; +Cc: Gregory Haskins, kvm, Linux Kernel Mailing List, avi On Sun, May 03, 2009 at 11:07:26AM -0700, Davide Libenzi wrote: > On Sun, 3 May 2009, Al Viro wrote: > > On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote: > > > + /* We re-use eventfd for irqfd */ > > > + fd = sys_eventfd2(0, 0); > > > + if (fd < 0) { > > > + ret = fd; > > > + goto fail; > > > + } > > > + > > > + /* We maintain a reference to eventfd for the irqfd lifetime */ > > > + file = eventfd_fget(fd); > > > + if (IS_ERR(file)) { > > > + ret = PTR_ERR(file); > > > + goto fail; > > > + } > > > + > > > + irqfd->file = file; > > > > This is just plain wrong. You have no promise whatsoever that caller of > > that sucker won't race with e.g. dup2(). IOW, you can't assume that > > file will be of the expected kind. > > The eventfd_fget() checks for the file_operations pointer, before > returning the file*, and fails if the fd in not an eventfd. Or you have > other concerns? OK, but... it's still wrong. Descriptor numbers are purely for interaction with userland; using them that way violates very general race-prevention rules, even if you do paper over the worst of consequences with check in eventfd_fget(). General rules: * descriptor you've generated is fit only for return to userland; * descriptor you've got from userland is fit only for *single* fget() or equivalent, unless you are one of the core syscalls manipulating the descriptor table itself (dup2, etc.) * once file is installed in descriptor table, you'd better be past the last failure exit; sys_close() on cleanup path is not acceptable. That's what reserving descriptors is for. IOW, the sane solution would be to export something that returns your struct file *. And stop playing with fd completely. ^ permalink raw reply [flat|nested] 16+ messages in thread
* file descriptor abuses 2009-05-03 19:01 ` Al Viro @ 2009-05-03 19:22 ` Al Viro 2009-05-03 20:11 ` [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface Davide Libenzi 1 sibling, 0 replies; 16+ messages in thread From: Al Viro @ 2009-05-03 19:22 UTC (permalink / raw) To: Davide Libenzi Cc: Gregory Haskins, kvm, Linux Kernel Mailing List, avi, hirofuchi, ericvh On Sun, May 03, 2009 at 08:01:36PM +0100, Al Viro wrote: > General rules: > * descriptor you've generated is fit only for return to userland; > * descriptor you've got from userland is fit only for *single* > fget() or equivalent, unless you are one of the core syscalls manipulating > the descriptor table itself (dup2, etc.) > * once file is installed in descriptor table, you'd better be past > the last failure exit; sys_close() on cleanup path is not acceptable. > That's what reserving descriptors is for. > > IOW, the sane solution would be to export something that returns your > struct file *. And stop playing with fd completely. Speaking of which, quick look through fget() callers shows this turd: static int p9_socket_open(struct p9_client *client, struct socket *csocket) { int fd, ret; fd = sock_map_fd(csocket, 0); ..... ret = p9_fd_open(client, fd, fd); if (ret < 0) { P9_EPRINTK(KERN_ERR, "p9_socket_open: failed to open fd\n"); sockfd_put(csocket); return ret; } ..... return 0; } where p9_fd_open() calls fget() on its 2nd and 3rd arguments. Which does worse than just a leak, AFAICT - on failure exit it leaves a dangling pointer from descriptor table. On the almost unrelated note, we have (in drivers/sneak_in^Wstaging/usbip) sockfd_to_socket(), with all callers leaking struct file, AFAICS. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface 2009-05-03 19:01 ` Al Viro 2009-05-03 19:22 ` file descriptor abuses Al Viro @ 2009-05-03 20:11 ` Davide Libenzi 2009-05-03 20:31 ` Al Viro 1 sibling, 1 reply; 16+ messages in thread From: Davide Libenzi @ 2009-05-03 20:11 UTC (permalink / raw) To: Al Viro; +Cc: Gregory Haskins, kvm, Linux Kernel Mailing List, avi On Sun, 3 May 2009, Al Viro wrote: > IOW, the sane solution would be to export something that returns your > struct file *. And stop playing with fd completely. This builds but it's not tested at all. - Make all the work of the old anon_inode_getfd(), done by a new anon_inode_getfile(), with anon_inode_getfd() using its services - Make all the work done by sys_eventfd(), done by a new eventfd_file_create() (which in turn uses anon_inode_getfile()), with sys_eventfd() using its services IRQfd can use eventfd_file_create(), fget(), get_unused_fd_flags() and fd_install() just before returning. Is that what you had in mind? - Davide --- fs/anon_inodes.c | 68 +++++++++++++++++++++++++++++++++----------- fs/eventfd.c | 44 +++++++++++++++++++++------- include/linux/anon_inodes.h | 3 + include/linux/eventfd.h | 6 +++ 4 files changed, 92 insertions(+), 29 deletions(-) Index: linux-2.6.mod/fs/anon_inodes.c =================================================================== --- linux-2.6.mod.orig/fs/anon_inodes.c 2009-05-03 12:21:09.000000000 -0700 +++ linux-2.6.mod/fs/anon_inodes.c 2009-05-03 12:54:02.000000000 -0700 @@ -64,28 +64,24 @@ static const struct dentry_operations an * * Creates a new file by hooking it on a single inode. This is useful for files * that do not need to have a full-fledged inode in order to operate correctly. - * All the files created with anon_inode_getfd() will share a single inode, + * All the files created with anon_inode_getfile() will share a single inode, * hence saving memory and avoiding code duplication for the file/inode/dentry - * setup. Returns new descriptor or -error. + * setup. Returns the newly created file* or error. */ -int anon_inode_getfd(const char *name, const struct file_operations *fops, - void *priv, int flags) +struct file *anon_inode_getfile(const char *name, + const struct file_operations *fops, + void *priv, int flags) { struct qstr this; struct dentry *dentry; struct file *file; - int error, fd; + int error; if (IS_ERR(anon_inode_inode)) - return -ENODEV; + return ERR_PTR(-ENODEV); if (fops->owner && !try_module_get(fops->owner)) - return -ENOENT; - - error = get_unused_fd_flags(flags); - if (error < 0) - goto err_module; - fd = error; + return ERR_PTR(-ENOENT); /* * Link the inode to a directory entry by creating a unique name @@ -97,7 +93,7 @@ int anon_inode_getfd(const char *name, c this.hash = 0; dentry = d_alloc(anon_inode_mnt->mnt_sb->s_root, &this); if (!dentry) - goto err_put_unused_fd; + goto err_module; /* * We know the anon_inode inode count is always greater than zero, @@ -123,16 +119,54 @@ int anon_inode_getfd(const char *name, c file->f_version = 0; file->private_data = priv; + return file; + +err_dput: + dput(dentry); +err_module: + module_put(fops->owner); + return ERR_PTR(error); +} +EXPORT_SYMBOL_GPL(anon_inode_getfile); + +/** + * anon_inode_getfd - creates a new file instance by hooking it up to an + * anonymous inode, and a dentry that describe the "class" + * of the file + * + * @name: [in] name of the "class" of the new file + * @fops: [in] file operations for the new file + * @priv: [in] private data for the new file (will be file's private_data) + * @flags: [in] flags + * + * Creates a new file by hooking it on a single inode. This is useful for files + * that do not need to have a full-fledged inode in order to operate correctly. + * All the files created with anon_inode_getfd() will share a single inode, + * hence saving memory and avoiding code duplication for the file/inode/dentry + * setup. Returns new descriptor or -error. + */ +int anon_inode_getfd(const char *name, const struct file_operations *fops, + void *priv, int flags) +{ + int error, fd; + struct file *file; + + error = get_unused_fd_flags(flags); + if (error < 0) + return error; + fd = error; + + file = anon_inode_getfile(name, fops, priv, flags); + if (IS_ERR(file)) { + error = PTR_ERR(file); + goto err_put_unused_fd; + } fd_install(fd, file); return fd; -err_dput: - dput(dentry); err_put_unused_fd: put_unused_fd(fd); -err_module: - module_put(fops->owner); return error; } EXPORT_SYMBOL_GPL(anon_inode_getfd); Index: linux-2.6.mod/fs/eventfd.c =================================================================== --- linux-2.6.mod.orig/fs/eventfd.c 2009-05-03 12:21:09.000000000 -0700 +++ linux-2.6.mod/fs/eventfd.c 2009-05-03 12:54:07.000000000 -0700 @@ -198,9 +198,9 @@ struct file *eventfd_fget(int fd) return file; } -SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags) +struct file *eventfd_file_create(unsigned int count, int flags) { - int fd; + struct file *file; struct eventfd_ctx *ctx; /* Check the EFD_* constants for consistency. */ @@ -208,25 +208,47 @@ SYSCALL_DEFINE2(eventfd2, unsigned int, BUILD_BUG_ON(EFD_NONBLOCK != O_NONBLOCK); if (flags & ~EFD_FLAGS_SET) - return -EINVAL; + return ERR_PTR(-EINVAL); ctx = kmalloc(sizeof(*ctx), GFP_KERNEL); if (!ctx) - return -ENOMEM; + return ERR_PTR(-ENOMEM); init_waitqueue_head(&ctx->wqh); ctx->count = count; ctx->flags = flags; - /* - * When we call this, the initialization must be complete, since - * anon_inode_getfd() will install the fd. - */ - fd = anon_inode_getfd("[eventfd]", &eventfd_fops, ctx, - flags & EFD_SHARED_FCNTL_FLAGS); - if (fd < 0) + file = anon_inode_getfile("[eventfd]", &eventfd_fops, ctx, + flags & EFD_SHARED_FCNTL_FLAGS); + if (IS_ERR(file)) kfree(ctx); + + return file; +} + +SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags) +{ + int fd, error; + struct file *file; + + error = get_unused_fd_flags(flags & EFD_SHARED_FCNTL_FLAGS); + if (error < 0) + return error; + fd = error; + + file = eventfd_file_create(count, flags); + if (IS_ERR(file)) { + error = PTR_ERR(file); + goto err_put_unused_fd; + } + fd_install(fd, file); + return fd; + +err_put_unused_fd: + put_unused_fd(fd); + + return error; } SYSCALL_DEFINE1(eventfd, unsigned int, count) Index: linux-2.6.mod/include/linux/anon_inodes.h =================================================================== --- linux-2.6.mod.orig/include/linux/anon_inodes.h 2009-05-03 12:21:09.000000000 -0700 +++ linux-2.6.mod/include/linux/anon_inodes.h 2009-05-03 12:48:08.000000000 -0700 @@ -8,6 +8,9 @@ #ifndef _LINUX_ANON_INODES_H #define _LINUX_ANON_INODES_H +struct file *anon_inode_getfile(const char *name, + const struct file_operations *fops, + void *priv, int flags); int anon_inode_getfd(const char *name, const struct file_operations *fops, void *priv, int flags); Index: linux-2.6.mod/include/linux/eventfd.h =================================================================== --- linux-2.6.mod.orig/include/linux/eventfd.h 2009-05-03 12:21:09.000000000 -0700 +++ linux-2.6.mod/include/linux/eventfd.h 2009-05-03 12:43:54.000000000 -0700 @@ -29,12 +29,16 @@ struct file *eventfd_fget(int fd); int eventfd_signal(struct file *file, int n); +struct file *eventfd_file_create(unsigned int count, int flags); #else /* CONFIG_EVENTFD */ -#define eventfd_fget(fd) ERR_PTR(-ENOSYS) +static inline struct file *eventfd_fget(int fd) +{ ERR_PTR(-ENOSYS); } static inline int eventfd_signal(struct file *file, int n) { return 0; } +struct file *eventfd_file_create(unsigned int count, int flags) +{ ERR_PTR(-ENOSYS); } #endif /* CONFIG_EVENTFD */ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface 2009-05-03 20:11 ` [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface Davide Libenzi @ 2009-05-03 20:31 ` Al Viro 0 siblings, 0 replies; 16+ messages in thread From: Al Viro @ 2009-05-03 20:31 UTC (permalink / raw) To: Davide Libenzi; +Cc: Gregory Haskins, kvm, Linux Kernel Mailing List, avi On Sun, May 03, 2009 at 01:11:39PM -0700, Davide Libenzi wrote: > On Sun, 3 May 2009, Al Viro wrote: > > > IOW, the sane solution would be to export something that returns your > > struct file *. And stop playing with fd completely. > > This builds but it's not tested at all. > > - Make all the work of the old anon_inode_getfd(), done by a new > anon_inode_getfile(), with anon_inode_getfd() using its services > > - Make all the work done by sys_eventfd(), done by a new > eventfd_file_create() (which in turn uses anon_inode_getfile()), with > sys_eventfd() using its services > > IRQfd can use eventfd_file_create(), fget(), get_unused_fd_flags() and > fd_install() just before returning. > Is that what you had in mind? More or less, but I'd like to see the irqfd side of that... ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [KVM PATCH v3 0/2] irqfd 2009-04-27 18:33 [KVM PATCH v3 0/2] irqfd Gregory Haskins 2009-04-27 18:33 ` [KVM PATCH v3 1/2] eventfd: export fget and signal interfaces for module use Gregory Haskins 2009-04-27 18:33 ` [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface Gregory Haskins @ 2009-04-29 6:10 ` Andrew Morton 2009-04-29 8:18 ` Avi Kivity 2 siblings, 1 reply; 16+ messages in thread From: Andrew Morton @ 2009-04-29 6:10 UTC (permalink / raw) To: Gregory Haskins; +Cc: kvm, linux-kernel, avi, davidel On Mon, 27 Apr 2009 14:33:24 -0400 Gregory Haskins <ghaskins@novell.com> wrote: > (Applies to kvm.git 41b76d8d0487c26d6d4d3fe53c1ff59b3236f096) > > This series implements a mechanism called "irqfd". It lets you create > an eventfd based file-desriptor to inject interrupts to a kvm guest. We > associate one gsi per fd for fine-grained routing. It'd be nice if the KVM weenies amongst us were to be told why one would want to inject interrupts into a KVM guest. Monosyllabic words would be preferred ;) > We do not have a user of this interface in this series, though note > future version of virtual-bus (v4 and above) will be based on this. So I assume that this patchset will be merged if/when a user of it is merged? > The first patch will require mainline buy-in, particularly from Davide > (cc'd). The last patch is kvm specific. Three EXPORT_SYMBOL_GPL()s. Once the shouting has subsided I'd suggest that this be merged via the KVM tree. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [KVM PATCH v3 0/2] irqfd 2009-04-29 6:10 ` [KVM PATCH v3 0/2] irqfd Andrew Morton @ 2009-04-29 8:18 ` Avi Kivity 0 siblings, 0 replies; 16+ messages in thread From: Avi Kivity @ 2009-04-29 8:18 UTC (permalink / raw) To: Andrew Morton; +Cc: Gregory Haskins, kvm, linux-kernel, davidel Andrew Morton wrote: > On Mon, 27 Apr 2009 14:33:24 -0400 Gregory Haskins <ghaskins@novell.com> wrote: > > >> (Applies to kvm.git 41b76d8d0487c26d6d4d3fe53c1ff59b3236f096) >> >> This series implements a mechanism called "irqfd". It lets you create >> an eventfd based file-desriptor to inject interrupts to a kvm guest. We >> associate one gsi per fd for fine-grained routing. >> > > It'd be nice if the KVM weenies amongst us were to be told why one > would want to inject interrupts into a KVM guest. Monosyllabic words > would be preferred ;) > Interrupts are injected (better word, raised) into a guest because real hardware has interrupts. This patchset does not add the ability to raise interrupts (that existed from day 1); it adds an eventfd interface to do so. An eventfd interface is useful, because it allows components to talk to kvm guests without being tied to kvm internals; they signal an eventfd; if the eventfd is terminated in kvm, it injects an interrupt. If the eventfd is terminated in userspace, it returns from epoll(). >> We do not have a user of this interface in this series, though note >> future version of virtual-bus (v4 and above) will be based on this. >> > > So I assume that this patchset will be merged if/when a user of it is > merged? > This interface is applicable to both the kernel and userspace; userspace users won't be merged. But I certainly want to see how the whole thing works. >> The first patch will require mainline buy-in, particularly from Davide >> (cc'd). The last patch is kvm specific. >> > > Three EXPORT_SYMBOL_GPL()s. Once the shouting has subsided I'd suggest > that this be merged via the KVM tree. > I think eventfd makes tons of sense as a generic 'wake me up' mechanism that can be used from both sides of the kernel/user line. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2009-05-03 20:31 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-04-27 18:33 [KVM PATCH v3 0/2] irqfd Gregory Haskins 2009-04-27 18:33 ` [KVM PATCH v3 1/2] eventfd: export fget and signal interfaces for module use Gregory Haskins 2009-04-27 18:33 ` [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface Gregory Haskins 2009-04-30 13:07 ` Michael S. Tsirkin 2009-05-03 16:59 ` Avi Kivity 2009-05-03 19:04 ` Michael S. Tsirkin 2009-05-03 19:17 ` Avi Kivity 2009-05-03 19:25 ` Michael S. Tsirkin 2009-05-03 6:44 ` Al Viro 2009-05-03 18:07 ` Davide Libenzi 2009-05-03 19:01 ` Al Viro 2009-05-03 19:22 ` file descriptor abuses Al Viro 2009-05-03 20:11 ` [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface Davide Libenzi 2009-05-03 20:31 ` Al Viro 2009-04-29 6:10 ` [KVM PATCH v3 0/2] irqfd Andrew Morton 2009-04-29 8:18 ` Avi Kivity
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox