* [PATCH 0/3] RFC: virtual device as irq injection interface
From: Michael S. Tsirkin @ 2009-05-31 18:58 UTC
To: Gregory Haskins, kvm, avi, mtosatti

As promised, here's a (compile-tested only) patchset that proposes
an alternative interrupt injection interface, not using eventfd.

The idea here is that we give the user the ability to create "virtual
device" file descriptors from kvm context, and bind them to in-kernel
drivers. One kind of such device would be virt_irq, which lets the user
inject interrupts. This seems to solve all potential lifetime
and locking issues because we control file_operations for both kvm fd
and the device (irq) fd.

Another kind of device could be a kernel-level virtio_net_host
implementation (which is really why I started writing this code).

As an attempt to make virtual devices more useful, they actually use an
abstract virt_hypervisor interface. I have currently only implemented
it in kvm, but it will be possible to have lguest implement it as well,
and then lguest will be able to use e.g. in-kernel virtio-net.

Let's discuss whether we want this, or eventfd, or both.

--
MST

Michael S. Tsirkin (3):
  virt-core: binding together drivers and hypervisors
  kvm: virtual device support
  virt_irq: virtual device for injecting interrupts

 arch/x86/kvm/Kconfig     |    1 +
 drivers/Makefile         |    1 +
 drivers/virt/Kconfig     |   11 +++++
 drivers/virt/Makefile    |    2 +
 drivers/virt/virt_core.c |  111 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/virt/virt_irq.c  |   78 ++++++++++++++++++++++++++++++++
 include/linux/kvm.h      |   13 +++++
 include/linux/kvm_host.h |    3 +
 include/linux/virt.h     |   94 +++++++++++++++++++++++++++++++++++++++
 include/linux/virt_irq.h |   19 ++++++++
 virt/kvm/kvm_main.c      |   47 +++++++++++++++++++
 11 files changed, 380 insertions(+), 0 deletions(-)
 create mode 100644 drivers/virt/Kconfig
 create mode 100644 drivers/virt/Makefile
 create mode 100644 drivers/virt/virt_core.c
 create mode 100644 drivers/virt/virt_irq.c
 create mode 100644 include/linux/virt.h
 create mode 100644 include/linux/virt_irq.h
* Re: [PATCH 0/3] RFC: virtual device as irq injection interface
From: Avi Kivity @ 2009-05-31 19:40 UTC
To: Michael S. Tsirkin
Cc: Gregory Haskins, kvm, mtosatti

Michael S. Tsirkin wrote:
> As promised, here's a (compile-tested only) patchset that proposes
> an alternative interrupt injection interface, not using eventfd.
>
> The idea here is that we give the user the ability to create "virtual
> device" file descriptors from kvm context, and bind them to in-kernel
> drivers. One kind of such device would be virt_irq, which lets the user
> inject interrupts. This seems to solve all potential lifetime
> and locking issues because we control file_operations for both kvm fd
> and the device (irq) fd.
>
> Another kind of device could be a kernel-level virtio_net_host
> implementation (which is really why I started writing this code).
>
> As an attempt to make virtual devices more useful, they actually use an
> abstract virt_hypervisor interface. I have currently only implemented
> it in kvm, but it will be possible to have lguest implement it as well,
> and then lguest will be able to use e.g. in-kernel virtio-net.
>
> Let's discuss whether we want this, or eventfd, or both.
>

Certainly not both.

Version N of irqfd actually had the kernel create the fd, due to
concerns about eventfd's flexibility (thread wakeup vs function call).
As it turned out these concerns were misplaced (well, we still want the
call to happen in process context when available).

I'd really like to stick with eventfd if we can solve all the problems
there, rather than creating yet another interface. Especially if we
want uio to communicate directly with kvm.

--
Do not meddle in the internals of kernels, for they are subtle and quick
to panic.
* Re: [PATCH 0/3] RFC: virtual device as irq injection interface
From: Michael S. Tsirkin @ 2009-05-31 20:10 UTC
To: Avi Kivity
Cc: Gregory Haskins, kvm, mtosatti

On Sun, May 31, 2009 at 10:40:59PM +0300, Avi Kivity wrote:
> Michael S. Tsirkin wrote:
>> As promised, here's a (compile-tested only) patchset that proposes
>> an alternative interrupt injection interface, not using eventfd.
>>
>> The idea here is that we give the user the ability to create "virtual
>> device" file descriptors from kvm context, and bind them to in-kernel
>> drivers. One kind of such device would be virt_irq, which lets the user
>> inject interrupts. This seems to solve all potential lifetime
>> and locking issues because we control file_operations for both kvm fd
>> and the device (irq) fd.
>>
>> Another kind of device could be a kernel-level virtio_net_host
>> implementation (which is really why I started writing this code).
>>
>> As an attempt to make virtual devices more useful, they actually use an
>> abstract virt_hypervisor interface. I have currently only implemented
>> it in kvm, but it will be possible to have lguest implement it as well,
>> and then lguest will be able to use e.g. in-kernel virtio-net.
>>
>> Let's discuss whether we want this, or eventfd, or both.
>>
>
> Certainly not both.
>
> Version N of irqfd actually had the kernel create the fd, due to
> concerns about eventfd's flexibility (thread wakeup vs function call).
> As it turned out these concerns were misplaced (well, we still want the
> call to happen in process context when available).

I'm afraid there are deep lifetime issues there, and the recent patch
calling eventfd_fget seems to be just papering over the worst of them.

> I'd really like to stick with eventfd if we can solve all the problems
> there, rather than creating yet another interface.
> Especially if we want uio to communicate directly with kvm.

Actually, current irqfd might not be able to handle assigned pci devices
because of the set_irq(1)/set_irq(0) trick it does.
Guest drivers for pci devices likely assume the interrupt
is level.

With virt devices, what we'd do is create a virt device that attaches to
the uio driver. This would handle interrupts and everything else that
needs to live in the kernel.

--
MST
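The level-versus-pulse concern above can be illustrated with a toy model. This is only a sketch under stated assumptions: the names toy_irq_line, toy_set_irq and toy_guest_ack are hypothetical and are not KVM code, and the model assumes that a level-triggered line re-raises the interrupt after each guest acknowledge for as long as the device keeps it asserted. Under that assumption, a 1-then-0 pulse delivers exactly one interrupt no matter how much work the device still has outstanding.

```c
/* Toy model, not KVM code: a level-triggered line versus a 1/0 pulse. */
#include <assert.h>
#include <stdbool.h>

struct toy_irq_line {
    bool level;      /* current state of the line */
    bool pending;    /* interrupt latched for the guest */
};

/* Hypothetical stand-in for set_irq(): drive the line high or low. */
static void toy_set_irq(struct toy_irq_line *l, bool level)
{
    bool rising = !l->level && level;

    l->level = level;
    if (rising)
        l->pending = true;   /* latch on the rising edge */
}

/* Guest acknowledge: a level line re-latches while it is still asserted. */
static void toy_guest_ack(struct toy_irq_line *l)
{
    l->pending = l->level;
}

/* Pulse injection: set_irq(1) immediately followed by set_irq(0). */
static int toy_deliveries_with_pulse(void)
{
    struct toy_irq_line l = { false, false };
    int delivered = 0;

    toy_set_irq(&l, true);
    toy_set_irq(&l, false);
    while (l.pending) {
        delivered++;
        toy_guest_ack(&l);   /* line is already low, nothing re-latches */
    }
    return delivered;
}

/* True level: the device holds the line until `work` events are serviced. */
static int toy_deliveries_with_level(int work)
{
    struct toy_irq_line l = { false, false };
    int delivered = 0;

    assert(work > 0);
    toy_set_irq(&l, true);
    while (l.pending) {
        delivered++;
        if (--work == 0)
            toy_set_irq(&l, false);   /* device deasserts when done */
        toy_guest_ack(&l);
    }
    return delivered;
}
```

Calling toy_deliveries_with_level(3) walks through three deliveries because the line stays high across the first two acknowledges, while toy_deliveries_with_pulse() stops after the first one.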
* Re: [PATCH 0/3] RFC: virtual device as irq injection interface
From: Avi Kivity @ 2009-05-31 20:30 UTC
To: Michael S. Tsirkin
Cc: Gregory Haskins, kvm, mtosatti

Michael S. Tsirkin wrote:
>> Version N of irqfd actually had the kernel create the fd, due to
>> concerns about eventfd's flexibility (thread wakeup vs function call).
>> As it turned out these concerns were misplaced (well, we still want the
>> call to happen in process context when available).
>>
>
> I'm afraid there are deep lifetime issues there, and the recent patch
> calling eventfd_fget seems to be just papering over the worst of them.
>

You'll have to be more specific.

>> I'd really like to stick with eventfd if we can solve all the problems
>> there, rather than creating yet another interface.
>> Especially if we want uio to communicate directly with kvm.
>>
>
> Actually, current irqfd might not be able to handle assigned pci devices
> because of the set_irq(1)/set_irq(0) trick it does.
> Guest drivers for pci devices likely assume the interrupt
> is level.
>

Right. I'm willing to have some userspace mediation for level-triggered
interrupts. It's a corner case anyway as we don't support shared
interrupts on the host, and PCI level-triggered interrupts are very
likely to be shared.

> With virt devices, what we'd do is create a virt device that attaches to
> the uio driver. This would handle interrupts and everything else that
> needs to live in the kernel.

With irqfd, what we do is attach an eventfd to the MSI we're interested
in. Given that eventfds are usable from userspace, we're adding a
non-virt-specific interface to uio that serves kvm well. Both uio and
kvm win.

--
I have a truly marvellous patch that fixes the bug which this signature
is too narrow to contain.
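For readers unfamiliar with the userspace side of eventfd mentioned above, here is a minimal sketch using the standard Linux eventfd(2) call. It shows only how a userspace producer signals and a consumer drains the counter; how kvm or uio would consume the same descriptor in the kernel is not shown in this thread and is not modeled here. The helper name signal_and_drain is hypothetical.

```c
/* Userspace-only eventfd sketch (Linux). Not the irqfd implementation. */
#include <assert.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/*
 * Signal a fresh eventfd `times` times, then drain it with one read.
 * Returns the counter value read, or 0 on any error.
 */
static uint64_t signal_and_drain(int times)
{
    uint64_t one = 1;
    uint64_t count = 0;
    int efd = eventfd(0, 0);   /* initial counter 0, default flags */

    if (efd < 0)
        return 0;

    for (int i = 0; i < times; i++) {
        /* Each 8-byte write adds its value to the kernel-side counter. */
        if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
            close(efd);
            return 0;
        }
    }

    /* Without EFD_SEMAPHORE, one read returns the sum and resets it. */
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        count = 0;

    close(efd);
    return count;
}
```

Note that several signals collapse into a single counter value, so the consumer learns that events happened and how many, but not when each one arrived. That is a natural fit for edge-like or MSI-like notification.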
* Re: [PATCH 0/3] RFC: virtual device as irq injection interface
From: Michael S. Tsirkin @ 2009-06-01 4:18 UTC
To: Avi Kivity
Cc: Gregory Haskins, kvm, mtosatti

On Sun, May 31, 2009 at 11:30:48PM +0300, Avi Kivity wrote:
> Michael S. Tsirkin wrote:
>>> Version N of irqfd actually had the kernel create the fd, due to
>>> concerns about eventfd's flexibility (thread wakeup vs function
>>> call). As it turned out these concerns were misplaced (well, we
>>> still want the call to happen in process context when available).
>>>
>>
>> I'm afraid there are deep lifetime issues there, and the recent patch
>> calling eventfd_fget seems to be just papering over the worst of them.
>>
>
> You'll have to be more specific.

My concern is that we do fget on eventfd and keep this reference until
fput is done on vm fd. This works as long as no one else does
similar tricks. Imagine for example eventfd or another fs/ change that
makes eventfd do fget on descriptor X and keep it until fput is done on
eventfd. We'll get a resource leak if kvm fd is substituted for X.

What do you think?

>>
>>> I'd really like to stick with eventfd if we can solve all the
>>> problems there, rather than creating yet another interface.
>>> Especially if we want uio to communicate directly with kvm.
>>>
>>
>> Actually, current irqfd might not be able to handle assigned pci devices
>> because of the set_irq(1)/set_irq(0) trick it does.
>> Guest drivers for pci devices likely assume the interrupt
>> is level.
>>
>
> Right. I'm willing to have some userspace mediation for level-triggered
> interrupts.

In other words, you want to keep using KVM_IRQ_LINE for this, as well?

> It's a corner case anyway as we don't support shared
> interrupts on the host, and PCI level-triggered interrupts are very
> likely to be shared.

If you think about virtio-net-host, there's no host interrupt there.

>> With virt devices, what we'd do is create a virt device that attaches to
>> the uio driver. This would handle interrupts and everything else that
>> needs to live in the kernel.
>
> With irqfd, what we do is attach an eventfd to the MSI we're interested
> in. Given that eventfds are usable from userspace, we're adding a
> non-virt-specific interface to uio that serves kvm well. Both uio and
> kvm win.
* Re: [PATCH 0/3] RFC: virtual device as irq injection interface
From: Avi Kivity @ 2009-06-01 7:45 UTC
To: Michael S. Tsirkin
Cc: Gregory Haskins, kvm, mtosatti

Michael S. Tsirkin wrote:
> On Sun, May 31, 2009 at 11:30:48PM +0300, Avi Kivity wrote:
>> Michael S. Tsirkin wrote:
>>>> Version N of irqfd actually had the kernel create the fd, due to
>>>> concerns about eventfd's flexibility (thread wakeup vs function
>>>> call). As it turned out these concerns were misplaced (well, we
>>>> still want the call to happen in process context when available).
>>>>
>>> I'm afraid there are deep lifetime issues there, and the recent patch
>>> calling eventfd_fget seems to be just papering over the worst of them.
>>>
>> You'll have to be more specific.
>>
>
> My concern is that we do fget on eventfd and keep this reference until
> fput is done on vm fd. This works as long as no one else does
> similar tricks. Imagine for example eventfd or another fs/ change that
> makes eventfd do fget on descriptor X and keep it until fput is done on
> eventfd. We'll get a resource leak if kvm fd is substituted for X.
>
> What do you think?
>

I think it's unlikely that eventfd will start hanging on to fds. If it
does, it will have to deal with recursion anyway (eventfd holding on to
itself), so irqfd will be just a part of the problem. It's better to
have one big problem rather than many small problems.

>>>> I'd really like to stick with eventfd if we can solve all the
>>>> problems there, rather than creating yet another interface.
>>>> Especially if we want uio to communicate directly with kvm.
>>>>
>>> Actually, current irqfd might not be able to handle assigned pci devices
>>> because of the set_irq(1)/set_irq(0) trick it does.
>>> Guest drivers for pci devices likely assume the interrupt
>>> is level.
>>>
>> Right. I'm willing to have some userspace mediation for level-triggered
>> interrupts.
>>
>
> In other words, you want to keep using KVM_IRQ_LINE for this, as well?
>

We'll need something more than level-triggered interrupts since we need
to pass the acknowledge from the guest to the host somehow.

>> It's a corner case anyway as we don't support shared
>> interrupts on the host, and PCI level-triggered interrupts are very
>> likely to be shared.
>>
>
> If you think about virtio-net-host, there's no host interrupt there.
>

I was talking about uio, sorry.

--
Do not meddle in the internals of kernels, for they are subtle and quick
to panic.
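The reference-cycle scenario debated above can be made concrete with a toy refcount model. This is a sketch only: toy_file, toy_fget and toy_fput are hypothetical stand-ins and do not reflect the real struct file, fget() or fput() code paths. The assumption is simply that each object drops the reference it holds only when its own count reaches zero.

```c
/* Toy refcount model of the hypothetical fget cycle. Not kernel code. */
#include <assert.h>
#include <stddef.h>

struct toy_file {
    int refcount;
    struct toy_file *held;   /* reference dropped only on release */
    int released;            /* set to 1 once the release path has run */
};

static void toy_fget(struct toy_file *f)
{
    f->refcount++;
}

static void toy_fput(struct toy_file *f)
{
    assert(f->refcount > 0);
    if (--f->refcount == 0) {
        f->released = 1;
        if (f->held)
            toy_fput(f->held);   /* release path drops the held reference */
    }
}

/*
 * Userspace opens both files, the references are wired up, then userspace
 * closes both. Returns how many of the two files were actually released.
 */
static int toy_close_both(int eventfd_holds_kvm)
{
    struct toy_file kvm = { 1, NULL, 0 };   /* 1 = userspace's fd reference */
    struct toy_file efd = { 1, NULL, 0 };

    /* irqfd today: kvm keeps the eventfd until kvm itself is released. */
    toy_fget(&efd);
    kvm.held = &efd;

    /* The hypothetical change: eventfd also pins descriptor X == kvm. */
    if (eventfd_holds_kvm) {
        toy_fget(&kvm);
        efd.held = &kvm;
    }

    toy_fput(&kvm);   /* close(kvmfd) */
    toy_fput(&efd);   /* close(eventfd) */
    return kvm.released + efd.released;
}
```

With a one-way reference both objects are released after the two closes. With the mutual reference neither count reaches zero, which is the leak described above; as noted in the reply, an eventfd that pinned other descriptors would hit the same cycle with itself, independent of irqfd.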
* Re: [PATCH 0/3] RFC: virtual device as irq injection interface
From: Gregory Haskins @ 2009-06-01 12:00 UTC
To: Michael S. Tsirkin
Cc: Avi Kivity, kvm, mtosatti, Davide Libenzi

Michael S. Tsirkin wrote:
> On Sun, May 31, 2009 at 11:30:48PM +0300, Avi Kivity wrote:
>> Michael S. Tsirkin wrote:
>>>> Version N of irqfd actually had the kernel create the fd, due to
>>>> concerns about eventfd's flexibility (thread wakeup vs function
>>>> call). As it turned out these concerns were misplaced (well, we
>>>> still want the call to happen in process context when available).
>>>>
>>> I'm afraid there are deep lifetime issues there, and the recent patch
>>> calling eventfd_fget seems to be just papering over the worst of them.
>>>
>> You'll have to be more specific.
>>
>
> My concern is that we do fget on eventfd and keep this reference until
> fput is done on vm fd.

Hi Michael,

This is not really the full picture, and I think it might be where all
the confusion starts. You are only covering the case where kvm is the
first to close (and if you think about it, you need to handle that case
as well just like me or the tables are turned).

We both agree that an irqfd or irqfd-like concept and kvm have a
relationship with one another, and that we have to manage that
relationship, right? The relationship starts with an IRQFD_ASSIGN, and
it stops when either the irqfd is closed, or if the kvm is closed
(whichever comes first). The lifetimes are actually identical with your
proposal if you think about it. Only the mechanics of how to get there
are (slightly) different.

i.e. If the IRQFD wants to close first, you do an ioctl(kvmfd,
IRQFD_DEASSIGN)+close(irqfd). If kvm wants to close first, you do a
close(kvmfd). I do not think there is really any issue with lifetimes
there.

I suppose you could argue: "well what if they do the close(irqfd) but
not the ioctl() (or vice versa)?", and to that I would say that it's no
different than if userspace forgot to do "X" in any other resource. The
fact is that userspace holds a number of kernel resources, and they can
either be explicitly freed (such as with a close()), or they will be
implicitly freed when the task exits. I think all of these requirements
are met here, so I do not see a problem.

Yes, I agree that having to do two system calls to completely close it
is not as attractive as one, but the tradeoff is to potentially not use
eventfd as the underlying basis for the construct. There are distinct
advantages to using eventfd here, so we would like to continue to do so
unless someone can display a compelling reason not to. So far I am not
seeing such a reason.

A potential compromise is to investigate the POLLHUP technique that
Davide mentioned so that kvmfd can get notified of the closure without
needing an additional explicit ioctl to do it. Note that we already
have irqfd in the tree so I assume we would need to do this in an
ABI-friendly way, but it's possible.

> This works as long as no one else does
> similar tricks. Imagine for example eventfd or another fs/ change that
> makes eventfd do fget on descriptor X and keep it until fput is done on
> eventfd. We'll get a resource leak if kvm fd is substituted for X.
>

I don't think that's a realistic concern to assume eventfd would ever be
grabbing other fd's, but I think Avi answered this succinctly in his
reply to this mail so I won't rehash it.

> What do you think?
>
>>>> I'd really like to stick with eventfd if we can solve all the
>>>> problems there, rather than creating yet another interface.
>>>> Especially if we want uio to communicate directly with kvm.
>>>>
>>> Actually, current irqfd might not be able to handle assigned pci devices
>>> because of the set_irq(1)/set_irq(0) trick it does.
>>> Guest drivers for pci devices likely assume the interrupt
>>> is level.
>>>
>> Right. I'm willing to have some userspace mediation for level-triggered
>> interrupts.
>>
>
> In other words, you want to keep using KVM_IRQ_LINE for this, as well?
>

Or more specifically, if you need something more than a basic edge
interrupt, you should use the existing interfaces. We set the stake in
the ground during review that irqfd would only support interfaces that
can do MSI/edge like injections.

>> It's a corner case anyway as we don't support shared
>> interrupts on the host, and PCI level-triggered interrupts are very
>> likely to be shared.
>>
>
> If you think about virtio-net-host, there's no host interrupt there.
>
>>> With virt devices, what we'd do is create a virt device that attaches to
>>> the uio driver. This would handle interrupts and everything else that
>>> needs to live in the kernel.
>>>
>> With irqfd, what we do is attach an eventfd to the MSI we're interested
>> in. Given that eventfds are usable from userspace, we're adding a
>> non-virt-specific interface to uio that serves kvm well. Both uio and
>> kvm win.
>>
* Re: [PATCH 0/3] RFC: virtual device as irq injection interface
From: Avi Kivity @ 2009-06-01 12:04 UTC
To: Gregory Haskins
Cc: Michael S. Tsirkin, kvm, mtosatti, Davide Libenzi

Gregory Haskins wrote:
> A potential compromise is to investigate the POLLHUP technique that
> Davide mentioned so that kvmfd can get notified of the closure without
> needing an additional explicit ioctl to do it. Note that we already
> have irqfd in the tree so I assume we would need to do this in an
> ABI-friendly way, but it's possible.
>

We don't have irqfd in any released tree. I'm only submitting it for
2.6.32 (exactly so we can iron these things out), so we can change it
any way we like or even pull it out completely.

The POLLHUP stuff is something I'd like to see in.

--
error compiling committee.c: too many arguments to function
* Re: [PATCH 0/3] RFC: virtual device as irq injection interface
From: Gregory Haskins @ 2009-06-01 12:14 UTC
To: Avi Kivity
Cc: Michael S. Tsirkin, kvm, mtosatti, Davide Libenzi

Avi Kivity wrote:
> Gregory Haskins wrote:
>> A potential compromise is to investigate the POLLHUP technique that
>> Davide mentioned so that kvmfd can get notified of the closure without
>> needing an additional explicit ioctl to do it. Note that we already
>> have irqfd in the tree so I assume we would need to do this in an
>> ABI-friendly way, but it's possible.
>>
>
> We don't have irqfd in any released tree. I'm only submitting it for
> 2.6.32 (exactly so we can iron these things out), so we can change it
> any way we like or even pull it out completely.
>
> The POLLHUP stuff is something I'd like to see in.
>

Ah, perfect. I will submit a patch to implement this, then.

Thanks,
-Greg
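The general idea of learning about a peer's close through a hangup event, rather than through an explicit call, can be shown from userspace with poll(2). This sketch uses a pipe purely as a stand-in: the thread does not describe how the proposed in-kernel POLLHUP notification for eventfd would work, so nothing here should be read as that mechanism. The helper name hangup_seen_after_close is hypothetical.

```c
/* Userspace illustration of hangup notification (Linux). A pipe is used
 * as a stand-in; this is not the in-kernel eventfd/irqfd mechanism. */
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Close the write end of a pipe and poll the read end.
 * Returns 1 if POLLHUP is reported, 0 if not, -1 on error.
 */
static int hangup_seen_after_close(void)
{
    int fds[2];
    struct pollfd pfd;
    int n;

    if (pipe(fds) < 0)
        return -1;

    close(fds[1]);            /* the peer goes away without telling us */

    pfd.fd = fds[0];
    pfd.events = POLLIN;      /* POLLHUP is reported even if not requested */
    pfd.revents = 0;

    n = poll(&pfd, 1, 0);     /* zero timeout: just sample the state */
    close(fds[0]);

    if (n < 0)
        return -1;
    return (pfd.revents & POLLHUP) ? 1 : 0;
}
```

The watcher never issues a teardown call of its own; the hangup arrives as a side effect of the peer's close(), which is the property that would remove the need for a separate deassign ioctl.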