public inbox for kvm@vger.kernel.org
* [PATCH 0/3] RFC: virtual device as irq injection interface
@ 2009-05-31 18:58 Michael S. Tsirkin
  2009-05-31 19:40 ` Avi Kivity
  0 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2009-05-31 18:58 UTC (permalink / raw)
  To: Gregory Haskins, kvm, avi, mtosatti

As promised, here's a (compile-tested only) patchset that proposes
an alternative interrupt injection interface, not using eventfd.

The idea here is that we give the user the ability to create "virtual
device" file descriptors from kvm context, and bind them to in-kernel
drivers. One kind of such device would be virt_irq, which lets the user
inject interrupts. This seems to solve all potential lifetime
and locking issues because we control file_operations for both the kvm fd
and the device (irq) fd.

Another kind of device could be a kernel-level virtio_net_host implementation
(which is really why I started writing this code).

As an attempt to make virtual devices more useful, they actually use an
abstract virt_hypervisor interface.  I have currently only implemented
it in kvm, but it will be possible to have lguest implement it as well,
and then lguest will be able to use e.g. in-kernel virtio-net.
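
As an illustration only (this is not code from the patchset), the idea of
drivers talking to an abstract hypervisor interface can be modelled in
userspace C with an ops table. Every name below (hv_ops, hv, virt_dev,
toy_guest) is hypothetical and chosen for the sketch; the real
virt_hypervisor interface lives in the patches themselves.

```c
#include <assert.h>

/* Hypothetical names throughout: this is NOT the virt_core API from the
 * patchset, only a userspace model of "drivers call hypervisors through
 * an ops table". */
struct hv_ops {
    /* Raise interrupt `irq` in the guest owned by `ctx`; 0 on success. */
    int (*inject_irq)(void *ctx, unsigned int irq);
};

struct hv {
    const struct hv_ops *ops;
    void *ctx;                /* hypervisor-private state */
};

/* A "virtual device" bound to one hypervisor instance. */
struct virt_dev {
    struct hv *hv;
    unsigned int irq;
};

/* Driver-side code: it does not know which hypervisor sits behind `hv`. */
static int virt_dev_kick(struct virt_dev *dev)
{
    assert(dev && dev->hv && dev->hv->ops && dev->hv->ops->inject_irq);
    return dev->hv->ops->inject_irq(dev->hv->ctx, dev->irq);
}

/* One toy backend: records the last injected irq and counts injections. */
struct toy_guest {
    unsigned int last_irq;
    unsigned int count;
};

static int toy_inject_irq(void *ctx, unsigned int irq)
{
    struct toy_guest *g = ctx;

    g->last_irq = irq;
    g->count++;
    return 0;
}

static const struct hv_ops toy_ops = { .inject_irq = toy_inject_irq };
```

A second backend would only need its own hv_ops instance; virt_dev_kick()
stays unchanged, which is the property the abstract interface is after.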

Let's discuss whether we want this, or eventfd, or both.

-- 
MST


Michael S. Tsirkin (3):
  virt-core: binding together drivers and hypervisors
  kvm: virtual device support
  virt_irq: virtual device for injecting interrupts

 arch/x86/kvm/Kconfig     |    1 +
 drivers/Makefile         |    1 +
 drivers/virt/Kconfig     |   11 +++++
 drivers/virt/Makefile    |    2 +
 drivers/virt/virt_core.c |  111 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/virt/virt_irq.c  |   78 ++++++++++++++++++++++++++++++++
 include/linux/kvm.h      |   13 +++++
 include/linux/kvm_host.h |    3 +
 include/linux/virt.h     |   94 +++++++++++++++++++++++++++++++++++++++
 include/linux/virt_irq.h |   19 ++++++++
 virt/kvm/kvm_main.c      |   47 +++++++++++++++++++
 11 files changed, 380 insertions(+), 0 deletions(-)
 create mode 100644 drivers/virt/Kconfig
 create mode 100644 drivers/virt/Makefile
 create mode 100644 drivers/virt/virt_core.c
 create mode 100644 drivers/virt/virt_irq.c
 create mode 100644 include/linux/virt.h
 create mode 100644 include/linux/virt_irq.h


* Re: [PATCH 0/3] RFC: virtual device as irq injection interface
  2009-05-31 18:58 [PATCH 0/3] RFC: virtual device as irq injection interface Michael S. Tsirkin
@ 2009-05-31 19:40 ` Avi Kivity
  2009-05-31 20:10   ` Michael S. Tsirkin
  0 siblings, 1 reply; 9+ messages in thread
From: Avi Kivity @ 2009-05-31 19:40 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Gregory Haskins, kvm, mtosatti

Michael S. Tsirkin wrote:
> As promised, here's a (compile-tested only) patchset that proposes
> an alternative interrupt injection interface, not using eventfd.
>
> The idea here is that we give the user the ability to create "virtual
> device" file descriptors from kvm context, and bind them to in-kernel
> drivers. One kind of such device would be virt_irq, which lets the user
> inject interrupts. This seems to solve all potential lifetime
> and locking issues because we control file_operations for both the kvm fd
> and the device (irq) fd.
>
> Another kind of device could be a kernel-level virtio_net_host implementation
> (which is really why I started writing this code).
>
> As an attempt to make virtual devices more useful, they actually use an
> abstract virt_hypervisor interface.  I have currently only implemented
> it in kvm, but it will be possible to have lguest implement it as well,
> and then lguest will be able to use e.g. in-kernel virtio-net.
>
> Let's discuss whether we want this, or eventfd, or both.
>   

Certainly not both.

Version N of irqfd actually had the kernel create the fd, due to 
concerns about eventfd's flexibility (thread wakeup vs function call).  
As it turned out these concerns were misplaced (well, we still want the 
call to happen in process context when available).

I'd really like to stick with eventfd if we can solve all the problems 
there, rather than creating yet another interface.  Especially if we 
want uio to communicate directly with kvm.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [PATCH 0/3] RFC: virtual device as irq injection interface
  2009-05-31 19:40 ` Avi Kivity
@ 2009-05-31 20:10   ` Michael S. Tsirkin
  2009-05-31 20:30     ` Avi Kivity
  0 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2009-05-31 20:10 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gregory Haskins, kvm, mtosatti

On Sun, May 31, 2009 at 10:40:59PM +0300, Avi Kivity wrote:
> Michael S. Tsirkin wrote:
>> As promised, here's a (compile-tested only) patchset that proposes
>> an alternative interrupt injection interface, not using eventfd.
>>
>> The idea here is that we give the user the ability to create "virtual
>> device" file descriptors from kvm context, and bind them to in-kernel
>> drivers. One kind of such device would be virt_irq, which lets the user
>> inject interrupts. This seems to solve all potential lifetime
>> and locking issues because we control file_operations for both the kvm fd
>> and the device (irq) fd.
>>
>> Another kind of device could be a kernel-level virtio_net_host implementation
>> (which is really why I started writing this code).
>>
>> As an attempt to make virtual devices more useful, they actually use an
>> abstract virt_hypervisor interface.  I have currently only implemented
>> it in kvm, but it will be possible to have lguest implement it as well,
>> and then lguest will be able to use e.g. in-kernel virtio-net.
>>
>> Let's discuss whether we want this, or eventfd, or both.
>>   
>
> Certainly not both.
>
> Version N of irqfd actually had the kernel create the fd, due to  
> concerns about eventfd's flexibility (thread wakeup vs function call).   
> As it turned out these concerns were misplaced (well, we still want the  
> call to happen in process context when available).

I'm afraid there are deep lifetime issues there, and the recent patch
calling eventfd_fget seems to be just papering over the worst of them.

> I'd really like to stick with eventfd if we can solve all the problems  
> there, rather than creating yet another interface.
> Especially if we want uio to communicate directly with kvm.

Actually, the current irqfd might not be able to handle assigned PCI devices
because of the set_irq(1)/set_irq(0) trick it uses.
Guest drivers for PCI devices likely assume the interrupt
is level-triggered.
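
To make the edge-versus-level point concrete, here is a toy model (plain
userspace C, not KVM code; irq_line and its helpers are names invented for
this sketch). A set(1)/set(0) pulse leaves a latched edge behind, but a
consumer that samples the line level afterwards sees nothing asserted.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy interrupt line; all names here are illustrative, not KVM's. */
struct irq_line {
    bool level;           /* current state of the wire */
    unsigned int edges;   /* rising edges latched so far */
};

static void line_set(struct irq_line *l, bool level)
{
    assert(l);
    if (level && !l->level)
        l->edges++;       /* an edge-triggered controller latches here */
    l->level = level;
}

/* The pulse discussed above: assert, then immediately deassert. */
static void line_pulse(struct irq_line *l)
{
    line_set(l, true);
    line_set(l, false);
}

/* Edge-triggered consumer: sees each latched rising edge exactly once. */
static bool edge_pending(struct irq_line *l)
{
    if (l->edges == 0)
        return false;
    l->edges--;
    return true;
}

/* Level-triggered consumer: only cares whether the wire is high now. */
static bool level_pending(const struct irq_line *l)
{
    return l->level;
}
```

In this model a level-triggered source would instead call
line_set(&l, true) and keep the line high until the driver acknowledges
the device, which is the behaviour a pulse cannot express.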

With virt devices, what we'd do is create a virt device that attaches to
uio driver.  This would handle interrupts and everything else that needs
to live in kernel.


-- 
MST


* Re: [PATCH 0/3] RFC: virtual device as irq injection interface
  2009-05-31 20:10   ` Michael S. Tsirkin
@ 2009-05-31 20:30     ` Avi Kivity
  2009-06-01  4:18       ` Michael S. Tsirkin
  0 siblings, 1 reply; 9+ messages in thread
From: Avi Kivity @ 2009-05-31 20:30 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Gregory Haskins, kvm, mtosatti

Michael S. Tsirkin wrote:
>> Version N of irqfd actually had the kernel create the fd, due to  
>> concerns about eventfd's flexibility (thread wakeup vs function call).   
>> As it turned out these concerns were misplaced (well, we still want the  
>> call to happen in process context when available).
>>     
>
> I'm afraid there are deep lifetime issues there, and the recent patch
> calling eventfd_fget seems to be just papering over the worst of them.
>   

You'll have to be more specific.

>   
>> I'd really like to stick with eventfd if we can solve all the problems  
>> there, rather than creating yet another interface.
>> Especially if we want uio to communicate directly with kvm.
>>     
>
> Actually, the current irqfd might not be able to handle assigned PCI devices
> because of the set_irq(1)/set_irq(0) trick it uses.
> Guest drivers for PCI devices likely assume the interrupt
> is level-triggered.
>   

Right.  I'm willing to have some userspace mediation for level-triggered 
interrupts.  It's a corner case anyway as we don't support shared 
interrupts on the host, and PCI level-triggered interrupts are very 
likely to be shared.

> With virt devices, what we'd do is create a virt device that attaches to
> uio driver.  This would handle interrupts and everything else that needs
> to live in kernel

With irqfd, what we do is attach an eventfd to the MSI we're interested 
in.  Given that eventfds are usable from userspace, we're adding a 
non-virt-specific interface to uio that serves kvm well.  Both uio and 
kvm win.
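
For readers less familiar with eventfd, the sketch below shows the
userspace side of the primitive using only the standard eventfd(2),
write(2) and read(2) calls. The helper names signal_events() and
drain_events() are made up for this example; nothing here touches kvm or
uio, it only shows that any producer holding the fd can signal it and that
signals accumulate in a counter.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal the eventfd `n` times; each write adds 1 to its counter. */
static int signal_events(int efd, unsigned int n)
{
    uint64_t one = 1;

    assert(efd >= 0);
    for (unsigned int i = 0; i < n; i++)
        if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one))
            return -1;
    return 0;
}

/* Consume: returns the accumulated count and resets it to zero
 * (default, non-semaphore mode). Returns 0 if nothing could be read,
 * e.g. a non-blocking eventfd whose counter is already zero. */
static uint64_t drain_events(int efd)
{
    uint64_t count = 0;

    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return 0;
    return count;
}
```

A caller would create the descriptor with eventfd(0, EFD_NONBLOCK), hand
it to whichever side should signal, and call drain_events() on the
consuming side.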

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



* Re: [PATCH 0/3] RFC: virtual device as irq injection interface
  2009-05-31 20:30     ` Avi Kivity
@ 2009-06-01  4:18       ` Michael S. Tsirkin
  2009-06-01  7:45         ` Avi Kivity
  2009-06-01 12:00         ` Gregory Haskins
  0 siblings, 2 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2009-06-01  4:18 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gregory Haskins, kvm, mtosatti

On Sun, May 31, 2009 at 11:30:48PM +0300, Avi Kivity wrote:
> Michael S. Tsirkin wrote:
>>> Version N of irqfd actually had the kernel create the fd, due to   
>>> concerns about eventfd's flexibility (thread wakeup vs function 
>>> call).   As it turned out these concerns were misplaced (well, we 
>>> still want the  call to happen in process context when available).
>>>     
>>
>> I'm afraid there are deep lifetime issues there, and the recent patch
>> calling eventfd_fget seems to be just papering over the worst of them.
>>   
>
> You'll have to be more specific.

My concern is that we do fget on eventfd and keep this reference until
fput is done on vm fd. This works as long as no one else does
similar tricks. Imagine for example eventfd or another fs/ change that makes
eventfd do fget on descriptor X and keep it until fput is done on eventfd.
We'll get a resource leak if kvm fd is substituted for X.
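
The scenario can be modelled with a toy reference counter (userspace C;
toy_file, toy_get, toy_put and toy_pin are invented for this sketch and
are not the VFS fget/fput API). One-directional pinning unwinds cleanly,
while mutual pinning leaves both objects with a reference that is never
dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a struct file; names are illustrative only. */
struct toy_file {
    int refs;                 /* reference count */
    int released;             /* set once the release hook has run */
    struct toy_file *pinned;  /* reference dropped only in our release */
};

static void toy_get(struct toy_file *f)
{
    f->refs++;
}

static void toy_put(struct toy_file *f)
{
    assert(f->refs > 0);
    if (--f->refs > 0)
        return;
    /* Last reference gone: run "release" and drop what we pinned. */
    f->released = 1;
    if (f->pinned) {
        struct toy_file *p = f->pinned;

        f->pinned = NULL;
        toy_put(p);
    }
}

/* `holder` keeps `target` alive until `holder` itself is released. */
static void toy_pin(struct toy_file *holder, struct toy_file *target)
{
    toy_get(target);
    holder->pinned = target;
}
```

With only kvm pinning the eventfd, dropping the user references to both
releases both. If the eventfd also pinned the kvm file, each would end up
with refs == 1 after userspace closes everything, and neither release hook
would ever run.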

What do you think?

>>   
>>> I'd really like to stick with eventfd if we can solve all the 
>>> problems  there, rather than creating yet another interface.
>>> Especially if we want uio to communicate directly with kvm.
>>>     
>>
>> Actually, the current irqfd might not be able to handle assigned PCI devices
>> because of the set_irq(1)/set_irq(0) trick it uses.
>> Guest drivers for PCI devices likely assume the interrupt
>> is level-triggered.
>>   
>
> Right.  I'm willing to have some userspace mediation for level-triggered  
> interrupts.

In other words, you want to keep using KVM_IRQ_LINE for this, as well?


> It's a corner case anyway as we don't support shared  
> interrupts on the host, and PCI level-triggered interrupts are very  
> likely to be shared.

If you think about virtio-net-host, there's no host interrupt there.

>> With virt devices, what we'd do is create a virt device that attaches to
>> uio driver.  This would handle interrupts and everything else that needs
>> to live in kernel
>
> With irqfd, what we do is attach an eventfd to the MSI we're interested  
> in.  Given that eventfds are usable from userspace, we're adding a  
> non-virt-specific interface to uio that serves kvm well.  Both uio and  
> kvm win.


* Re: [PATCH 0/3] RFC: virtual device as irq injection interface
  2009-06-01  4:18       ` Michael S. Tsirkin
@ 2009-06-01  7:45         ` Avi Kivity
  2009-06-01 12:00         ` Gregory Haskins
  1 sibling, 0 replies; 9+ messages in thread
From: Avi Kivity @ 2009-06-01  7:45 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Gregory Haskins, kvm, mtosatti

Michael S. Tsirkin wrote:
> On Sun, May 31, 2009 at 11:30:48PM +0300, Avi Kivity wrote:
>   
>> Michael S. Tsirkin wrote:
>>     
>>>> Version N of irqfd actually had the kernel create the fd, due to   
>>>> concerns about eventfd's flexibility (thread wakeup vs function 
>>>> call).   As it turned out these concerns were misplaced (well, we 
>>>> still want the  call to happen in process context when available).
>>>>     
>>>>         
>>> I'm afraid there are deep lifetime issues there, and the recent patch
>>> calling eventfd_fget seems to be just papering over the worst of them.
>>>   
>>>       
>> You'll have to be more specific.
>>     
>
> My concern is that we do fget on eventfd and keep this reference until
> fput is done on vm fd. This works as long as no one else does
> similar tricks. Imagine for example eventfd or another fs/ change that makes
> eventfd do fget on descriptor X and keep it until fput is done on eventfd.
> We'll get a resource leak if kvm fd is substituted for X.
>
> What do you think?
>
>   

I think it's unlikely that eventfd will start hanging on to fds.  If it 
does, it will have to deal with recursion anyway (eventfd holding on to 
itself), so irqfd will be just a part of the problem.

It's better to have one big problem rather than many small problems.

>>>> I'd really like to stick with eventfd if we can solve all the 
>>>> problems  there, rather than creating yet another interface.
>>>> Especially if we want uio to communicate directly with kvm.
>>>>     
>>>>         
>>> Actually, the current irqfd might not be able to handle assigned PCI devices
>>> because of the set_irq(1)/set_irq(0) trick it uses.
>>> Guest drivers for PCI devices likely assume the interrupt
>>> is level-triggered.
>>>   
>>>       
>> Right.  I'm willing to have some userspace mediation for level-triggered  
>> interrupts.
>>     
>
> In other words, you want to keep using KVM_IRQ_LINE for this, as well?
>   

We'll need something more than level-triggered interrupts since we need 
to pass the acknowledge from the guest to the host somehow.

>> It's a corner case anyway as we don't support shared  
>> interrupts on the host, and PCI level-triggered interrupts are very  
>> likely to be shared.
>>     
>
> If you think about virtio-net-host, there's no host interrupt there.
>   

I was talking about uio, sorry.


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [PATCH 0/3] RFC: virtual device as irq injection interface
  2009-06-01  4:18       ` Michael S. Tsirkin
  2009-06-01  7:45         ` Avi Kivity
@ 2009-06-01 12:00         ` Gregory Haskins
  2009-06-01 12:04           ` Avi Kivity
  1 sibling, 1 reply; 9+ messages in thread
From: Gregory Haskins @ 2009-06-01 12:00 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, kvm, mtosatti, Davide Libenzi

Michael S. Tsirkin wrote:
> On Sun, May 31, 2009 at 11:30:48PM +0300, Avi Kivity wrote:
>   
>> Michael S. Tsirkin wrote:
>>     
>>>> Version N of irqfd actually had the kernel create the fd, due to   
>>>> concerns about eventfd's flexibility (thread wakeup vs function 
>>>> call).   As it turned out these concerns were misplaced (well, we 
>>>> still want the  call to happen in process context when available).
>>>>     
>>>>         
>>> I'm afraid there are deep lifetime issues there, and the recent patch
>>> calling eventfd_fget seems to be just papering over the worst of them.
>>>   
>>>       
>> You'll have to be more specific.
>>     
>
> My concern is that we do fget on eventfd and keep this reference until
> fput is done on vm fd.

Hi Michael,
  This is not really the full picture, and I think it might be where all
the confusion starts.  You are only covering the case where kvm is the
first to close (and if you think about it, you need to handle that case
as well just like me or the tables are turned).

We both agree that an irqfd or irqfd-like concept and kvm have a
relationship with one another, and that we have to manage that
relationship, right?  The relationship starts with an IRQFD_ASSIGN, and
it stops when either the irqfd is closed, or if the kvm is closed
(whichever comes first).  The lifetimes are actually identical with your
proposal if you think about it.  Only the mechanics of how to get there
are (slightly) different.

i.e. If the IRQFD wants to close first, you do an ioctl(kvmfd,
IRQFD_DEASSIGN)+close(irqfd).   If kvm wants to close first, you do a
close(kvmfd).  I do not think there is really any issue with lifetimes
there.

I suppose you could argue: "well what if they do the close(irqfd) but
not the ioctl() (or vice versa)?", and to that I would say that it's no
different than if userspace forgot to do "X" in any other resource.  The
fact is that userspace holds a number of kernel resources, and they can
either be explicitly freed (such as with a close()), or they will be
implicitly freed when the task exits.  I think all of these requirements
are met here, so I do not see a problem.

Yes, I agree that having to do two system calls to completely close it
is not as attractive as one, but the tradeoff is to potentially not use
eventfd as the underlying basis for the construct.  There are distinct
advantages to using eventfd here, so we would like to continue to do so
unless someone can display a compelling reason not to.  So far I am not
seeing such a reason.

A potential compromise is to investigate the POLLHUP technique that
Davide mentioned so that kvmfd can get notified of the closure without
needing an additional explicit ioctl to do it.  Note that we already
have irqfd in the tree so I assume we would need to do this in an
ABI-friendly way, but it's possible.
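
As a userspace analogy only (the in-kernel mechanism for eventfd would
differ, and this is not the patch being proposed), the general shape of
"learn about the other side closing via POLLHUP" can be seen with an
ordinary pipe and poll(2). The helper name peer_hung_up() is invented for
this sketch.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Returns 1 if `fd` reports POLLHUP right now, 0 if not, -1 on error. */
static int peer_hung_up(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n;

    assert(fd >= 0);
    n = poll(&p, 1, 0);       /* timeout 0: just sample the state */
    if (n < 0)
        return -1;
    return (p.revents & POLLHUP) ? 1 : 0;
}
```

On Linux, polling the read end of a pipe reports POLLHUP once the last
write end has been closed, so the reader needs no separate "I am going
away" call from the writer. That is the property being sought for the
irqfd/kvm relationship.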


>  This works as long as no one else does
> similar tricks. Imagine for example eventfd or another fs/ change that makes
> eventfd do fget on descriptor X and keep it until fput is done on eventfd.
> We'll get a resource leak if kvm fd is substituted for X.
>   

I don't think it's realistic to assume eventfd would ever be
grabbing other fds, but I think Avi answered this succinctly in his
reply to this mail so I won't rehash it.

> What do you think?
>
>   
>>>   
>>>       
>>>> I'd really like to stick with eventfd if we can solve all the 
>>>> problems  there, rather than creating yet another interface.
>>>> Especially if we want uio to communicate directly with kvm.
>>>>     
>>>>         
>>> Actually, the current irqfd might not be able to handle assigned PCI devices
>>> because of the set_irq(1)/set_irq(0) trick it uses.
>>> Guest drivers for PCI devices likely assume the interrupt
>>> is level-triggered.
>>>   
>>>       
>> Right.  I'm willing to have some userspace mediation for level-triggered  
>> interrupts.
>>     
>
> In other words, you want to keep using KVM_IRQ_LINE for this, as well?
>   

Or more specifically, if you need something more than a basic edge
interrupt, you should use the existing interfaces.  We set the stake in
the ground during review that irqfd would only support interfaces that
can do MSI/edge like injections.
>
>   
>> It's a corner case anyway as we don't support shared  
>> interrupts on the host, and PCI level-triggered interrupts are very  
>> likely to be shared.
>>     
>
> If you think about virtio-net-host, there's no host interrupt there.
>
>   
>>> With virt devices, what we'd do is create a virt device that attaches to
>>> uio driver.  This would handle interrupts and everything else that needs
>>> to live in kernel
>>>       
>> With irqfd, what we do is attach an eventfd to the MSI we're interested  
>> in.  Given that eventfds are usable from userspace, we're adding a  
>> non-virt-specific interface to uio that serves kvm well.  Both uio and  
>> kvm win.
>>     





* Re: [PATCH 0/3] RFC: virtual device as irq injection interface
  2009-06-01 12:00         ` Gregory Haskins
@ 2009-06-01 12:04           ` Avi Kivity
  2009-06-01 12:14             ` Gregory Haskins
  0 siblings, 1 reply; 9+ messages in thread
From: Avi Kivity @ 2009-06-01 12:04 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: Michael S. Tsirkin, kvm, mtosatti, Davide Libenzi

Gregory Haskins wrote:
> A potential compromise is to investigate the POLLHUP technique that
> Davide mentioned so that kvmfd can get notified of the closure without
> needing an additional explicit ioctl to do it.  Note that we already
> have irqfd in the tree so I assume we would need to do this in an
> ABI-friendly way, but it's possible.
>   

We don't have irqfd in any released tree.  I'm only submitting it for 
2.6.32 (exactly so we can iron these things out), so we can change it 
any way we like or even pull it out completely.

The POLLHUP stuff is something I'd like to see in.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 0/3] RFC: virtual device as irq injection interface
  2009-06-01 12:04           ` Avi Kivity
@ 2009-06-01 12:14             ` Gregory Haskins
  0 siblings, 0 replies; 9+ messages in thread
From: Gregory Haskins @ 2009-06-01 12:14 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Michael S. Tsirkin, kvm, mtosatti, Davide Libenzi

Avi Kivity wrote:
> Gregory Haskins wrote:
>> A potential compromise is to investigate the POLLHUP technique that
>> Davide mentioned so that kvmfd can get notified of the closure without
>> needing an additional explicit ioctl to do it.  Note that we already
>> have irqfd in the tree so I assume we would need to do this in an
>> ABI-friendly way, but it's possible.
>>   
>
> We don't have irqfd in any released tree.  I'm only submitting it for
> 2.6.32 (exactly so we can iron these things out), so we can change it
> any way we like or even pull it out completely.
>
> The POLLHUP stuff is something I'd like to see in.
>
Ah, perfect.  I will submit a patch to implement this, then.

Thanks,
-Greg


