linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] eventfd: new EFD_STATE flag
@ 2009-08-20 15:56 Michael S. Tsirkin
  2009-08-20 16:20 ` Davide Libenzi
  0 siblings, 1 reply; 47+ messages in thread
From: Michael S. Tsirkin @ 2009-08-20 15:56 UTC (permalink / raw)
  To: davidel, avi, gleb, kvm, linux-kernel

Davide,
Looks like I got an ack from Avi and no comments from others,
so the following patchset is identical to the last RFC.
Can it be merged for 2.6.32?
Thanks!

This series implements a new EFD_STATE flag for eventfd.
When set, this changes eventfd behaviour in the following way:
- write simply stores the value written, and is always non-blocking
- read unblocks when the value written changes, and
  returns the value written

Motivation: we'd like to use eventfd in qemu to pass interrupts from
(emulated or assigned) devices to guest. For level interrupts, the
counter supported currently by eventfd is not a good match: we really
need to set interrupt to a level, typically 0 or 1, wake the guest if
there was a change and give the guest ability to see the last value
written. The level can be set either from kernel (e.g. with assigned
devices) or from userspace.

Michael S. Tsirkin (2):
  eventfd: reorganize the code to simplify new flags
  eventfd: EFD_STATE flag

 fs/eventfd.c            |   83 +++++++++++++++++++++++++++++++++++++++--------
 include/linux/eventfd.h |    3 +-
 2 files changed, 71 insertions(+), 15 deletions(-)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-20 15:56 [PATCH 0/2] eventfd: new EFD_STATE flag Michael S. Tsirkin
@ 2009-08-20 16:20 ` Davide Libenzi
  2009-08-20 17:38   ` Avi Kivity
  0 siblings, 1 reply; 47+ messages in thread
From: Davide Libenzi @ 2009-08-20 16:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, Linux Kernel Mailing List

On Thu, 20 Aug 2009, Michael S. Tsirkin wrote:

> Davide,
> Looks like I got an ack from Avi and no comments from others,
> so the following patchset is identical to the last RFC.
> Can it be merged for 2.6.32?
> Thanks!

I briefly looked at this while in vacation, although I did not reply 
hoping the horrible feeling about this code would go away.
It didn't.
I find this to be an ugly and ad-hoc multiplexing of eventfd with added 
functionalities of questionable general use.
I'm pretty sure you can do better on KVM side, to solve the problem w/out 
littering eventfd.


- Davide


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-20 16:20 ` Davide Libenzi
@ 2009-08-20 17:38   ` Avi Kivity
  2009-08-20 17:44     ` Davide Libenzi
  2009-08-20 17:55     ` Michael S. Tsirkin
  0 siblings, 2 replies; 47+ messages in thread
From: Avi Kivity @ 2009-08-20 17:38 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Michael S. Tsirkin, gleb, kvm, Linux Kernel Mailing List

On 08/20/2009 07:20 PM, Davide Libenzi wrote:
>
> I briefly looked at this while in vacation, although I did not reply
> hoping the horrible feeling about this code would go away.
> It didn't.
> I find this to be an ugly and ad-hoc multiplexing of eventfd with added
> functionalities of questionable general use.
> I'm pretty sure you can do better on KVM side, to solve the problem w/out
> littering eventfd.
>
>    

While we could argue about this my feeling is that we should drop this, 
at least until we can quantify what benefit it has and whether there are 
any Davide-acceptable alternatives.

In the meanwhile, we can let vhost-net support edge-triggered interrupts 
only, let qemu terminate those eventfds and convert then to 
level-triggered interrupts (which it can then inject using the existing 
ioctl).  It will keep vhost-net and kvm simpler at the cost of some 
performance penalty to guests using level interrupts.  These suck anyway 
so we'll point users at msi.

-- 
I have a truly marvellous patch that fixes the bug which thisb
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-20 17:38   ` Avi Kivity
@ 2009-08-20 17:44     ` Davide Libenzi
  2009-08-20 17:56       ` Paolo Bonzini
  2009-08-20 17:55     ` Michael S. Tsirkin
  1 sibling, 1 reply; 47+ messages in thread
From: Davide Libenzi @ 2009-08-20 17:44 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Michael S. Tsirkin, gleb, kvm, Linux Kernel Mailing List

On Thu, 20 Aug 2009, Avi Kivity wrote:

> On 08/20/2009 07:20 PM, Davide Libenzi wrote:
> > 
> > I briefly looked at this while in vacation, although I did not reply
> > hoping the horrible feeling about this code would go away.
> > It didn't.
> > I find this to be an ugly and ad-hoc multiplexing of eventfd with added
> > functionalities of questionable general use.
> > I'm pretty sure you can do better on KVM side, to solve the problem w/out
> > littering eventfd.
> > 
> >    
> 
> While we could argue about this my feeling is that we should drop this, at
> least until we can quantify what benefit it has and whether there are any
> Davide-acceptable alternatives.

I really didn't mean to be harsh, but seriously, we cannot just have a 
Multiplexing Feast over eventfd, with one-time users.


- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-20 17:38   ` Avi Kivity
  2009-08-20 17:44     ` Davide Libenzi
@ 2009-08-20 17:55     ` Michael S. Tsirkin
  2009-08-20 18:06       ` Avi Kivity
  1 sibling, 1 reply; 47+ messages in thread
From: Michael S. Tsirkin @ 2009-08-20 17:55 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Davide Libenzi, gleb, kvm, Linux Kernel Mailing List

On Thu, Aug 20, 2009 at 08:38:48PM +0300, Avi Kivity wrote:
> On 08/20/2009 07:20 PM, Davide Libenzi wrote:
>>
>> I briefly looked at this while in vacation, although I did not reply
>> hoping the horrible feeling about this code would go away.
>> It didn't.
>> I find this to be an ugly and ad-hoc multiplexing of eventfd with added
>> functionalities of questionable general use.
>> I'm pretty sure you can do better on KVM side, to solve the problem w/out
>> littering eventfd.
>>
>>    
>
> While we could argue about this my feeling is that we should drop this,  
> at least until we can quantify what benefit it has and whether there are  
> any Davide-acceptable alternatives.
>
> In the meanwhile, we can let vhost-net support edge-triggered interrupts  
> only, let qemu terminate those eventfds and convert then to  
> level-triggered interrupts (which it can then inject using the existing  
> ioctl).  It will keep vhost-net and kvm simpler at the cost of some  
> performance penalty to guests using level interrupts.  These suck anyway  
> so we'll point users at msi.

I thought the point was to move assigned devices out of KVM?

> -- 
> I have a truly marvellous patch that fixes the bug which thisb
> signature is too narrow to contain.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-20 17:44     ` Davide Libenzi
@ 2009-08-20 17:56       ` Paolo Bonzini
  2009-08-21 17:21         ` Davide Libenzi
  0 siblings, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2009-08-20 17:56 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Avi Kivity, Michael S. Tsirkin, gleb, kvm,
	Linux Kernel Mailing List

On 08/20/2009 07:44 PM, Davide Libenzi wrote:
> On Thu, 20 Aug 2009, Avi Kivity wrote:
>
>> On 08/20/2009 07:20 PM, Davide Libenzi wrote:
>>>
>>> I briefly looked at this while in vacation, although I did not reply
>>> hoping the horrible feeling about this code would go away.
>>> It didn't.
>>> I find this to be an ugly and ad-hoc multiplexing of eventfd with added
>>> functionalities of questionable general use.
>>> I'm pretty sure you can do better on KVM side, to solve the problem w/out
>>> littering eventfd.
>>
>> While we could argue about this my feeling is that we should drop this, at
>> least until we can quantify what benefit it has and whether there are any
>> Davide-acceptable alternatives.
>
> I really didn't mean to be harsh, but seriously, we cannot just have a
> Multiplexing Feast over eventfd, with one-time users.

EFD_STATE actually does two changes: a) makes read block until the value 
changes; b) makes each write override the previous one.  How would you 
feel if the two changes were separated?  I can see each of them has use 
cases

For example, (a) could be implemented by using select's xfds (POLLPRI) 
to poll for value changes (rfds would still poll for non-zeroness). 
Then Michael does not need even to read the eventfd; instead he'd check 
POLLIN with a zero timeout.

(b) could be implemented with a flag like Michael did.

Paolo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-20 17:55     ` Michael S. Tsirkin
@ 2009-08-20 18:06       ` Avi Kivity
  2009-08-20 18:28         ` Michael S. Tsirkin
  0 siblings, 1 reply; 47+ messages in thread
From: Avi Kivity @ 2009-08-20 18:06 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Davide Libenzi, gleb, kvm, Linux Kernel Mailing List

On 08/20/2009 08:55 PM, Michael S. Tsirkin wrote:
>
>> While we could argue about this my feeling is that we should drop this,
>> at least until we can quantify what benefit it has and whether there are
>> any Davide-acceptable alternatives.
>>
>> In the meanwhile, we can let vhost-net support edge-triggered interrupts
>> only, let qemu terminate those eventfds and convert then to
>> level-triggered interrupts (which it can then inject using the existing
>> ioctl).  It will keep vhost-net and kvm simpler at the cost of some
>> performance penalty to guests using level interrupts.  These suck anyway
>> so we'll point users at msi.
>>      
> I thought the point was to move assigned devices out of KVM?
>    

Grr.  Forgot about that.

That's much more important.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-20 18:06       ` Avi Kivity
@ 2009-08-20 18:28         ` Michael S. Tsirkin
  2009-08-23 13:01           ` Avi Kivity
  0 siblings, 1 reply; 47+ messages in thread
From: Michael S. Tsirkin @ 2009-08-20 18:28 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Davide Libenzi, gleb, kvm, Linux Kernel Mailing List

On Thu, Aug 20, 2009 at 09:06:51PM +0300, Avi Kivity wrote:
> On 08/20/2009 08:55 PM, Michael S. Tsirkin wrote:
>>
>>> While we could argue about this my feeling is that we should drop this,
>>> at least until we can quantify what benefit it has and whether there are
>>> any Davide-acceptable alternatives.
>>>
>>> In the meanwhile, we can let vhost-net support edge-triggered interrupts
>>> only, let qemu terminate those eventfds and convert then to
>>> level-triggered interrupts (which it can then inject using the existing
>>> ioctl).  It will keep vhost-net and kvm simpler at the cost of some
>>> performance penalty to guests using level interrupts.  These suck anyway
>>> so we'll point users at msi.
>>>      
>> I thought the point was to move assigned devices out of KVM?
>>    
>
> Grr.  Forgot about that.
>
> That's much more important.

Looks like we'll have do with a separate char device or
something?

> -- 
> I have a truly marvellous patch that fixes the bug which this
> signature is too narrow to contain.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-20 17:56       ` Paolo Bonzini
@ 2009-08-21 17:21         ` Davide Libenzi
  0 siblings, 0 replies; 47+ messages in thread
From: Davide Libenzi @ 2009-08-21 17:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Avi Kivity, Michael S. Tsirkin, gleb, kvm,
	Linux Kernel Mailing List

On Thu, 20 Aug 2009, Paolo Bonzini wrote:

> On 08/20/2009 07:44 PM, Davide Libenzi wrote:
> > On Thu, 20 Aug 2009, Avi Kivity wrote:
> > 
> > > On 08/20/2009 07:20 PM, Davide Libenzi wrote:
> > > > 
> > > > I briefly looked at this while in vacation, although I did not reply
> > > > hoping the horrible feeling about this code would go away.
> > > > It didn't.
> > > > I find this to be an ugly and ad-hoc multiplexing of eventfd with added
> > > > functionalities of questionable general use.
> > > > I'm pretty sure you can do better on KVM side, to solve the problem
> > > > w/out
> > > > littering eventfd.
> > > 
> > > While we could argue about this my feeling is that we should drop this, at
> > > least until we can quantify what benefit it has and whether there are any
> > > Davide-acceptable alternatives.
> > 
> > I really didn't mean to be harsh, but seriously, we cannot just have a
> > Multiplexing Feast over eventfd, with one-time users.
> 
> EFD_STATE actually does two changes: a) makes read block until the value
> changes; b) makes each write override the previous one.  How would you feel if
> the two changes were separated?

I'm afraid that splitting the change won't make less inappropriate for 
eventfd.
I've no doubt KVM needs something like this, but this just don't belong to 
eventfd, which act as either a mutex or a semaphore.
This EFD_STATE does not belong to a generic interface, proof of which is 
the lack of such abstraction in any generic library I'm aware of.
This aside from any consideration about how the code will look with that 
multiplexing in place.


- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-20 18:28         ` Michael S. Tsirkin
@ 2009-08-23 13:01           ` Avi Kivity
  2009-08-23 13:36             ` Michael S. Tsirkin
  0 siblings, 1 reply; 47+ messages in thread
From: Avi Kivity @ 2009-08-23 13:01 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Davide Libenzi, gleb, kvm, Linux Kernel Mailing List

On 08/20/2009 09:28 PM, Michael S. Tsirkin wrote:
>>> I thought the point was to move assigned devices out of KVM?
>>>
>>>        
>> Grr.  Forgot about that.
>>
>> That's much more important.
>>      
> Looks like we'll have do with a separate char device or
> something?
>    

We can still tunnel it through userspace.  Most assigned devices will 
have MSI.  uio can signal the eventfd when the level changes, qemu can 
read the level and write it to kvm.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-23 13:01           ` Avi Kivity
@ 2009-08-23 13:36             ` Michael S. Tsirkin
  2009-08-23 13:40               ` Avi Kivity
  0 siblings, 1 reply; 47+ messages in thread
From: Michael S. Tsirkin @ 2009-08-23 13:36 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Davide Libenzi, gleb, kvm, Linux Kernel Mailing List

On Sun, Aug 23, 2009 at 04:01:29PM +0300, Avi Kivity wrote:
> On 08/20/2009 09:28 PM, Michael S. Tsirkin wrote:
>>>> I thought the point was to move assigned devices out of KVM?
>>>>
>>>>        
>>> Grr.  Forgot about that.
>>>
>>> That's much more important.
>>>      
>> Looks like we'll have do with a separate char device or
>> something?
>>    
>
> We can still tunnel it through userspace.

More important here is realization that eventfd is a mutex/semaphore
implementation, not a generic event reporting interface as we are trying
to use it.

> Most assigned devices will  
> have MSI.  uio can signal the eventfd when the level changes, qemu can  
> read the level and write it to kvm.
> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-23 13:36             ` Michael S. Tsirkin
@ 2009-08-23 13:40               ` Avi Kivity
  2009-08-23 14:30                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 47+ messages in thread
From: Avi Kivity @ 2009-08-23 13:40 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Davide Libenzi, gleb, kvm, Linux Kernel Mailing List

On 08/23/2009 04:36 PM, Michael S. Tsirkin wrote:
> More important here is realization that eventfd is a mutex/semaphore
> implementation, not a generic event reporting interface as we are trying
> to use it.
>    

Well it is a generic event reporting interface (for example, aio uses it).

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-23 13:40               ` Avi Kivity
@ 2009-08-23 14:30                 ` Michael S. Tsirkin
  2009-08-23 16:51                   ` Paolo Bonzini
  2009-08-24 18:25                   ` Davide Libenzi
  0 siblings, 2 replies; 47+ messages in thread
From: Michael S. Tsirkin @ 2009-08-23 14:30 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Davide Libenzi, gleb, kvm, Linux Kernel Mailing List

On Sun, Aug 23, 2009 at 04:40:51PM +0300, Avi Kivity wrote:
> On 08/23/2009 04:36 PM, Michael S. Tsirkin wrote:
>> More important here is realization that eventfd is a mutex/semaphore
>> implementation, not a generic event reporting interface as we are trying
>> to use it.
>>    
>
> Well it is a generic event reporting interface (for example, aio uses it).

Davide, I think it's a valid point.  For example, what read on eventfd
does (zero a counter and return) is not like any semaphore I saw.

Look at eventfd as pipe replacement: for wake-up between processes, it
works well. But what if we want to pass around some kind of key as well?
pipe lets you pass single-byte values atomically. This flag allows us to
use 8 byte values as keys.

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-23 14:30                 ` Michael S. Tsirkin
@ 2009-08-23 16:51                   ` Paolo Bonzini
  2009-08-24 18:25                   ` Davide Libenzi
  1 sibling, 0 replies; 47+ messages in thread
From: Paolo Bonzini @ 2009-08-23 16:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Davide Libenzi, gleb, kvm, Linux Kernel Mailing List

On 08/23/2009 04:30 PM, Michael S. Tsirkin wrote:
> On Sun, Aug 23, 2009 at 04:40:51PM +0300, Avi Kivity wrote:
>> On 08/23/2009 04:36 PM, Michael S. Tsirkin wrote:
>>> More important here is realization that eventfd is a mutex/semaphore
>>> implementation, not a generic event reporting interface as we are trying
>>> to use it.
>>>
>>
>> Well it is a generic event reporting interface (for example, aio uses it).
>
> Davide, I think it's a valid point.  For example, what read on eventfd
> does (zero a counter and return) is not like any semaphore I saw.

It is similar to Win32 events (which can be used like contorted mutexes 
to sync two processes).

Paolo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-23 14:30                 ` Michael S. Tsirkin
  2009-08-23 16:51                   ` Paolo Bonzini
@ 2009-08-24 18:25                   ` Davide Libenzi
  2009-08-24 18:31                     ` Avi Kivity
  2009-08-24 21:49                     ` Michael S. Tsirkin
  1 sibling, 2 replies; 47+ messages in thread
From: Davide Libenzi @ 2009-08-24 18:25 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, gleb, kvm, Linux Kernel Mailing List

On Sun, 23 Aug 2009, Michael S. Tsirkin wrote:

> On Sun, Aug 23, 2009 at 04:40:51PM +0300, Avi Kivity wrote:
> > On 08/23/2009 04:36 PM, Michael S. Tsirkin wrote:
> >> More important here is realization that eventfd is a mutex/semaphore
> >> implementation, not a generic event reporting interface as we are trying
> >> to use it.
> >>    
> >
> > Well it is a generic event reporting interface (for example, aio uses it).
> 
> Davide, I think it's a valid point.  For example, what read on eventfd
> does (zero a counter and return) is not like any semaphore I saw.


Indeed, the default eventfd behaviour is like, well, an event. Signaling 
(kernel side) or writing (userspace side), signals the event.
Waiting (reading) it, will reset the event.
If you use EFD_SEMAPHORE, you get a semaphore-like behavior.
Events and sempahores are two widely known and used abstractions.
The EFD_STATE proposed one, well, no. Not at all.



- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-24 18:25                   ` Davide Libenzi
@ 2009-08-24 18:31                     ` Avi Kivity
  2009-08-24 22:08                       ` Davide Libenzi
  2009-08-24 21:49                     ` Michael S. Tsirkin
  1 sibling, 1 reply; 47+ messages in thread
From: Avi Kivity @ 2009-08-24 18:31 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Michael S. Tsirkin, gleb, kvm, Linux Kernel Mailing List

On 08/24/2009 09:25 PM, Davide Libenzi wrote:
> Indeed, the default eventfd behaviour is like, well, an event. Signaling
> (kernel side) or writing (userspace side), signals the event.
> Waiting (reading) it, will reset the event.
> If you use EFD_SEMAPHORE, you get a semaphore-like behavior.
> Events and sempahores are two widely known and used abstractions.
> The EFD_STATE proposed one, well, no. Not at all.
>    

There are libraries that provide notifications (or fire watches) when 
some value changes.  They're much less frequently used than events or 
semaphores, though.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-24 18:25                   ` Davide Libenzi
  2009-08-24 18:31                     ` Avi Kivity
@ 2009-08-24 21:49                     ` Michael S. Tsirkin
  2009-08-24 22:15                       ` Davide Libenzi
  1 sibling, 1 reply; 47+ messages in thread
From: Michael S. Tsirkin @ 2009-08-24 21:49 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Avi Kivity, gleb, kvm, Linux Kernel Mailing List

On Mon, Aug 24, 2009 at 11:25:01AM -0700, Davide Libenzi wrote:
> On Sun, 23 Aug 2009, Michael S. Tsirkin wrote:
> 
> > On Sun, Aug 23, 2009 at 04:40:51PM +0300, Avi Kivity wrote:
> > > On 08/23/2009 04:36 PM, Michael S. Tsirkin wrote:
> > >> More important here is realization that eventfd is a mutex/semaphore
> > >> implementation, not a generic event reporting interface as we are trying
> > >> to use it.
> > >>    
> > >
> > > Well it is a generic event reporting interface (for example, aio uses it).
> > 
> > Davide, I think it's a valid point.  For example, what read on eventfd
> > does (zero a counter and return) is not like any semaphore I saw.
> 
> 
> Indeed, the default eventfd behaviour is like, well, an event. Signaling 
> (kernel side) or writing (userspace side), signals the event.
> Waiting (reading) it, will reset the event.
> If you use EFD_SEMAPHORE, you get a semaphore-like behavior.
> Events and sempahores are two widely known and used abstractions.
> The EFD_STATE proposed one, well, no. Not at all.

Hmm. All we try to do is, associate a small key with the event
that we signal. Is it really that uncommon/KVM specific?

> 
> 
> - Davide
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-24 18:31                     ` Avi Kivity
@ 2009-08-24 22:08                       ` Davide Libenzi
  2009-08-24 22:10                         ` Paolo Bonzini
  2009-08-25  4:26                         ` Avi Kivity
  0 siblings, 2 replies; 47+ messages in thread
From: Davide Libenzi @ 2009-08-24 22:08 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Michael S. Tsirkin, gleb, kvm, Linux Kernel Mailing List

On Mon, 24 Aug 2009, Avi Kivity wrote:

> On 08/24/2009 09:25 PM, Davide Libenzi wrote:
> > Indeed, the default eventfd behaviour is like, well, an event. Signaling
> > (kernel side) or writing (userspace side), signals the event.
> > Waiting (reading) it, will reset the event.
> > If you use EFD_SEMAPHORE, you get a semaphore-like behavior.
> > Events and sempahores are two widely known and used abstractions.
> > The EFD_STATE proposed one, well, no. Not at all.
> >    
> 
> There are libraries that provide notifications (or fire watches) when some
> value changes.  They're much less frequently used than events or semaphores,
> though.

There are userspace libraries that do almost everything, but you hardly 
see things like pthread_(EFD_STATE-like)_create() or similar system 
interfaces based on such abstraction.
Is that really difficult to understand where I'm standing, leaving the KVM 
hat off for a moment?



- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-24 22:08                       ` Davide Libenzi
@ 2009-08-24 22:10                         ` Paolo Bonzini
  2009-08-24 22:32                           ` Davide Libenzi
  2009-08-25  4:26                         ` Avi Kivity
  1 sibling, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2009-08-24 22:10 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Avi Kivity, Michael S. Tsirkin, gleb, kvm,
	Linux Kernel Mailing List


> There are userspace libraries that do almost everything, but you hardly
> see things like pthread_(EFD_STATE-like)_create() or similar system
> interfaces based on such abstraction.

It actually seems as close to a condition variable as an eventfd can be.

Paolo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-24 21:49                     ` Michael S. Tsirkin
@ 2009-08-24 22:15                       ` Davide Libenzi
  2009-08-25  7:22                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 47+ messages in thread
From: Davide Libenzi @ 2009-08-24 22:15 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, gleb, kvm, Linux Kernel Mailing List

On Tue, 25 Aug 2009, Michael S. Tsirkin wrote:

> On Mon, Aug 24, 2009 at 11:25:01AM -0700, Davide Libenzi wrote:
> > On Sun, 23 Aug 2009, Michael S. Tsirkin wrote:
> > 
> > > On Sun, Aug 23, 2009 at 04:40:51PM +0300, Avi Kivity wrote:
> > > > On 08/23/2009 04:36 PM, Michael S. Tsirkin wrote:
> > > >> More important here is realization that eventfd is a mutex/semaphore
> > > >> implementation, not a generic event reporting interface as we are trying
> > > >> to use it.
> > > >>    
> > > >
> > > > Well it is a generic event reporting interface (for example, aio uses it).
> > > 
> > > Davide, I think it's a valid point.  For example, what read on eventfd
> > > does (zero a counter and return) is not like any semaphore I saw.
> > 
> > 
> > Indeed, the default eventfd behaviour is like, well, an event. Signaling 
> > (kernel side) or writing (userspace side), signals the event.
> > Waiting (reading) it, will reset the event.
> > If you use EFD_SEMAPHORE, you get a semaphore-like behavior.
> > Events and sempahores are two widely known and used abstractions.
> > The EFD_STATE proposed one, well, no. Not at all.
> 
> Hmm. All we try to do is, associate a small key with the event
> that we signal. Is it really that uncommon/KVM specific?

All I'm trying to do, is to avoid that eventfd will become an horrible 
multiplexor for every freaky one-time-use behaviors arising inside kernel 
modules.



- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-24 22:10                         ` Paolo Bonzini
@ 2009-08-24 22:32                           ` Davide Libenzi
  2009-08-25  6:59                             ` Paolo Bonzini
  0 siblings, 1 reply; 47+ messages in thread
From: Davide Libenzi @ 2009-08-24 22:32 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Avi Kivity, Michael S. Tsirkin, gleb, kvm,
	Linux Kernel Mailing List

On Tue, 25 Aug 2009, Paolo Bonzini wrote:

> 
> > There are userspace libraries that do almost everything, but you hardly
> > see things like pthread_(EFD_STATE-like)_create() or similar system
> > interfaces based on such abstraction.
> 
> It actually seems as close to a condition variable as an eventfd can be.

A pthread condition typical code usage maps to eventfd like:

	while (read(efd, ...) > 0)
		if (CONDITION)
			break;

So a pthread condition is really a wakeup gate like eventfd is.
EFD_STATE has nothing to do with a pthread condition.



- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-24 22:08                       ` Davide Libenzi
  2009-08-24 22:10                         ` Paolo Bonzini
@ 2009-08-25  4:26                         ` Avi Kivity
  1 sibling, 0 replies; 47+ messages in thread
From: Avi Kivity @ 2009-08-25  4:26 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Michael S. Tsirkin, gleb, kvm, Linux Kernel Mailing List

On 08/25/2009 01:08 AM, Davide Libenzi wrote:
> Is that really difficult to understand where I'm standing, leaving the KVM
> hat off for a moment?
>    

I understand it perfectly.  I take the same position with kvm.  I'm 
providing more data in the hope that you'll change you mind, not trying 
to flood you with email so you'll give up.

We can always create our eventfd-lookalike for kvm, but I'd rather not 
do that (other options include a userspace proxy through existing 
interfaces, it might even be better than changing eventfd if we decide 
performance for level-triggered interrupts is not critical).

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-24 22:32                           ` Davide Libenzi
@ 2009-08-25  6:59                             ` Paolo Bonzini
  0 siblings, 0 replies; 47+ messages in thread
From: Paolo Bonzini @ 2009-08-25  6:59 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Avi Kivity, Michael S. Tsirkin, gleb, kvm,
	Linux Kernel Mailing List


>>> There are userspace libraries that do almost everything, but you hardly
>>> see things like pthread_(EFD_STATE-like)_create() or similar system
>>> interfaces based on such abstraction.
>>
>> It actually seems as close to a condition variable as an eventfd can be.
>
> A pthread condition typical code usage maps to eventfd like:
>
> 	while (read(efd, ...)>  0)
> 		if (CONDITION)
> 			break;
>
> So a pthread condition is really a wakeup gate like eventfd is.
> EFD_STATE has nothing to do with a pthread condition.

No, your code does not work for pthread_cond_broadcast (which arguably 
is the common case) because all the eventfd readers after the first 
would block.

Paolo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-24 22:15                       ` Davide Libenzi
@ 2009-08-25  7:22                         ` Michael S. Tsirkin
  2009-08-25 21:57                           ` Davide Libenzi
  0 siblings, 1 reply; 47+ messages in thread
From: Michael S. Tsirkin @ 2009-08-25  7:22 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Avi Kivity, gleb, kvm, Linux Kernel Mailing List

On Mon, Aug 24, 2009 at 03:15:05PM -0700, Davide Libenzi wrote:
> On Tue, 25 Aug 2009, Michael S. Tsirkin wrote:
> 
> > On Mon, Aug 24, 2009 at 11:25:01AM -0700, Davide Libenzi wrote:
> > > On Sun, 23 Aug 2009, Michael S. Tsirkin wrote:
> > > 
> > > > On Sun, Aug 23, 2009 at 04:40:51PM +0300, Avi Kivity wrote:
> > > > > On 08/23/2009 04:36 PM, Michael S. Tsirkin wrote:
> > > > >> More important here is realization that eventfd is a mutex/semaphore
> > > > >> implementation, not a generic event reporting interface as we are trying
> > > > >> to use it.
> > > > >>    
> > > > >
> > > > > Well it is a generic event reporting interface (for example, aio uses it).
> > > > 
> > > > Davide, I think it's a valid point.  For example, what read on eventfd
> > > > does (zero a counter and return) is not like any semaphore I saw.
> > > 
> > > 
> > > Indeed, the default eventfd behaviour is like, well, an event. Signaling 
> > > (kernel side) or writing (userspace side), signals the event.
> > > Waiting (reading) it, will reset the event.
> > > If you use EFD_SEMAPHORE, you get a semaphore-like behavior.
> > > Events and sempahores are two widely known and used abstractions.
> > > The EFD_STATE proposed one, well, no. Not at all.
> > 
> > Hmm. All we try to do is, associate a small key with the event
> > that we signal. Is it really that uncommon/KVM specific?
> 
> All I'm trying to do, is to avoid that eventfd will become an horrible 
> multiplexor for every freaky one-time-use behaviors arising inside kernel 
> modules.

Yes, we don't want that. The best thing is to try to restate the problem
in a way that is generic, and then either solve or best use existing
solution. Right?

I thought I had that, but apparently not.  The reason I'm Cc-ing you is
not to try and spam you until you give up and accept the patch, it's
hoping that you see the pattern behind our usage, and help generalize
it.

If I understand it correctly, you believe this is not possible and so
any solution will have to be in KVM? Or maybe I didn't state the problem
clearly enough and should restate it?


> 
> 
> - Davide

-- 
MST

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-25  7:22                         ` Michael S. Tsirkin
@ 2009-08-25 21:57                           ` Davide Libenzi
  2009-08-26 10:29                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 47+ messages in thread
From: Davide Libenzi @ 2009-08-25 21:57 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, gleb, kvm, Linux Kernel Mailing List

On Tue, 25 Aug 2009, Michael S. Tsirkin wrote:

> Yes, we don't want that. The best thing is to try to restate the problem
> in a way that is generic, and then either solve or best use existing
> solution. Right?
> 
> I thought I had that, but apparently not.  The reason I'm Cc-ing you is
> not to try and spam you until you give up and accept the patch, it's
> hoping that you see the pattern behind our usage, and help generalize
> it.
> 
> If I understand it correctly, you believe this is not possible and so
> any solution will have to be in KVM? Or maybe I didn't state the problem
> clearly enough and should restate it?

Please do.



- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-25 21:57                           ` Davide Libenzi
@ 2009-08-26 10:29                             ` Michael S. Tsirkin
  2009-08-26 10:41                               ` Avi Kivity
  2009-08-26 17:45                               ` Davide Libenzi
  0 siblings, 2 replies; 47+ messages in thread
From: Michael S. Tsirkin @ 2009-08-26 10:29 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Avi Kivity, gleb, kvm, Linux Kernel Mailing List

On Tue, Aug 25, 2009 at 02:57:01PM -0700, Davide Libenzi wrote:
> On Tue, 25 Aug 2009, Michael S. Tsirkin wrote:
> 
> > Yes, we don't want that. The best thing is to try to restate the problem
> > in a way that is generic, and then either solve or best use existing
> > solution. Right?
> > 
> > I thought I had that, but apparently not.  The reason I'm Cc-ing you is
> > not to try and spam you until you give up and accept the patch, it's
> > hoping that you see the pattern behind our usage, and help generalize
> > it.
> > 
> > If I understand it correctly, you believe this is not possible and so
> > any solution will have to be in KVM? Or maybe I didn't state the problem
> > clearly enough and should restate it?
> 
> Please do.
> 
> 
> 
> - Davide


Problem looks like this:

There are multiple processes (devices) where each has a condition
(interrupt line) which it has logic to determine is either true or
false.

A single other process (hypervisor) is interested in a condition
(interrupt level) which is a logical OR of all interrupt lines.
On changes, an interrupt level value needs to be read and copied to
guest virtual cpu.

We also want ability to replace some or all processes above by a kernel
components, with condition changes done potentially from hardware
interrupt context.


How we wanted to solve it with EFD_STATE: Share a separate eventfd
between each device and the hypervisor.  device sets state to either 0
or 1.  hypervisor polls all eventfds, reads interrupt line on changes,
calculates the interrupt level and updates guest.

Alternative solution: shared memory where each device writes interrupt
line value. This makes setup more complex (need to share around much more
than just an fd), and makes access from interrupt impossible unless we
lock the memory (and locking userspace memory introduces yet another set
of issues).


-- 
MST

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-26 10:29                             ` Michael S. Tsirkin
@ 2009-08-26 10:41                               ` Avi Kivity
  2009-08-26 17:45                               ` Davide Libenzi
  1 sibling, 0 replies; 47+ messages in thread
From: Avi Kivity @ 2009-08-26 10:41 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Davide Libenzi, gleb, kvm, Linux Kernel Mailing List

On 08/26/2009 01:29 PM, Michael S. Tsirkin wrote:
> How we wanted to solve it with EFD_STATE: Share a separate eventfd
> between each device and the hypervisor.  device sets state to either 0
> or 1.  hypervisor polls all eventfds, reads interrupt line on changes,
> calculates the interrupt level and updates guest.
>
> Alternative solution: shared memory where each device writes interrupt
> line value. This makes setup more complex (need to share around much more
> than just an fd), and makes access from interrupt impossible unless we
> lock the memory (and locking userspace memory introduces yet another set
> of issues).
>    

For completeness:

If the device is implemented in the same process as the hypervisor, an 
eventfd isn't really needed, as there is an ioctl which performs the 
same operation.

An important class of device implementations is real devices that are 
assigned to a guest.  We would like to forward the interrupt directly 
from the host interrupt handler to qemu.  Currently, we have a 
kvm-specific interrupt handler that forwards the interrupt using 
kvm-specific interfaces.  We would like to use a generic interrupt 
handler implemented by uio, so we want a generic interrupt transfer 
mechanism.

uio already supports edge-triggered interrupts using an eventfd-like 
mechanism.  So it makes sense to extend uio to support real eventfds, 
and to make it also support level-triggered interrupts.

We can work around the lack of state eventfd by having userspace wait on 
whatever mechanism uio uses to make the interrupt state visible, and 
then use the ioctl mentioned above to inform the hypervisor of this 
state.  But it's faster and nicer to give both components an eventfd and 
let them communicate directly.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-26 10:29                             ` Michael S. Tsirkin
  2009-08-26 10:41                               ` Avi Kivity
@ 2009-08-26 17:45                               ` Davide Libenzi
  2009-08-26 18:58                                 ` Avi Kivity
  1 sibling, 1 reply; 47+ messages in thread
From: Davide Libenzi @ 2009-08-26 17:45 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, gleb, kvm, Linux Kernel Mailing List

On Wed, 26 Aug 2009, Michael S. Tsirkin wrote:

> On Tue, Aug 25, 2009 at 02:57:01PM -0700, Davide Libenzi wrote:
> > On Tue, 25 Aug 2009, Michael S. Tsirkin wrote:
> > 
> > > Yes, we don't want that. The best thing is to try to restate the problem
> > > in a way that is generic, and then either solve or best use existing
> > > solution. Right?
> > > 
> > > I thought I had that, but apparently not.  The reason I'm Cc-ing you is
> > > not to try and spam you until you give up and accept the patch, it's
> > > hoping that you see the pattern behind our usage, and help generalize
> > > it.
> > > 
> > > If I understand it correctly, you believe this is not possible and so
> > > any solution will have to be in KVM? Or maybe I didn't state the problem
> > > clearly enough and should restate it?
> > 
> > Please do.
> > 
> > 
> > 
> > - Davide
> 
> 
> Problem looks like this:
> 
> There are multiple processes (devices) where each has a condition
> (interrupt line) which it has logic to determine is either true or
> false.
> 
> A single other process (hypervisor) is interested in a condition
> (interrupt level) which is a logical OR of all interrupt lines.
> On changes, an interrupt level value needs to be read and copied to
> guest virtual cpu.
> 
> We also want ability to replace some or all processes above by a kernel
> components, with condition changes done potentially from hardware
> interrupt context.
> 
> 
> How we wanted to solve it with EFD_STATE: Share a separate eventfd
> between each device and the hypervisor.  device sets state to either 0
> or 1.  hypervisor polls all eventfds, reads interrupt line on changes,
> calculates the interrupt level and updates guest.
> 
> Alternative solution: shared memory where each device writes interrupt
> line value. This makes setup more complex (need to share around much more
> than just an fd), and makes access from interrupt impossible unless we
> lock the memory (and locking userspace memory introduces yet another set
> of issues).

OK, if I get it correctly, there is one eventfd signaler (the device), and 
one eventfd reader (the hypervisor), right?
Each hypervisor listens for multiple devices detecting state changes, and 
associating the eventfd "line" to the IRQ number by some configuration 
(ala PCI), right?



- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-26 17:45                               ` Davide Libenzi
@ 2009-08-26 18:58                                 ` Avi Kivity
  2009-08-26 19:13                                   ` Davide Libenzi
  0 siblings, 1 reply; 47+ messages in thread
From: Avi Kivity @ 2009-08-26 18:58 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Michael S. Tsirkin, gleb, kvm, Linux Kernel Mailing List

On 08/26/2009 08:45 PM, Davide Libenzi wrote:
> OK, if I get it correctly, there is one eventfd signaler (the device), and
> one eventfd reader (the hypervisor), right?
> Each hypervisor listens for multiple devices detecting state changes, and
> associating the eventfd "line" to the IRQ number by some configuration
> (ala PCI), right?
>    

Yes.  The PCI stuff happens in userspace, all the hypervisor sees is 
"this eventfd is IRQ 10".  There may be multiple eventfds routed to one 
IRQ (corresponding to a shared IRQ line).

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-26 18:58                                 ` Avi Kivity
@ 2009-08-26 19:13                                   ` Davide Libenzi
  2009-08-26 19:42                                     ` Avi Kivity
  2009-08-27  9:05                                     ` Paolo Bonzini
  0 siblings, 2 replies; 47+ messages in thread
From: Davide Libenzi @ 2009-08-26 19:13 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Michael S. Tsirkin, gleb, kvm, Linux Kernel Mailing List

On Wed, 26 Aug 2009, Avi Kivity wrote:

> On 08/26/2009 08:45 PM, Davide Libenzi wrote:
> > OK, if I get it correctly, there is one eventfd signaler (the device), and
> > one eventfd reader (the hypervisor), right?
> > Each hypervisor listens for multiple devices detecting state changes, and
> > associating the eventfd "line" to the IRQ number by some configuration
> > (ala PCI), right?
> >    
> 
> Yes.  The PCI stuff happens in userspace, all the hypervisor sees is "this
> eventfd is IRQ 10".  There may be multiple eventfds routed to one IRQ
> (corresponding to a shared IRQ line).

Ok, so why not using the eventfd counter as state?
On the device side:

void write_state(int sfd, int state) {
	u64 cnt;

	/* Clear the current state, sfd is in non-blocking mode */
	read(sfd, &cnt, sizeof(cnt));
	/* Writes new state */
	cnt = 1 + !!state;
	write(sfd, &cnt, sizeof(cnt));
}


On the hypervisor side:

int read_state(int sfd) {
	u64 cnt;

	read(sfd, &cnt, sizeof(cnt));
	return state - 1;
}




- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-26 19:13                                   ` Davide Libenzi
@ 2009-08-26 19:42                                     ` Avi Kivity
  2009-08-26 19:44                                       ` Davide Libenzi
  2009-08-26 19:50                                       ` Gleb Natapov
  2009-08-27  9:05                                     ` Paolo Bonzini
  1 sibling, 2 replies; 47+ messages in thread
From: Avi Kivity @ 2009-08-26 19:42 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Michael S. Tsirkin, gleb, kvm, Linux Kernel Mailing List

On 08/26/2009 10:13 PM, Davide Libenzi wrote:
> Ok, so why not using the eventfd counter as state?
> On the device side:
>
> void write_state(int sfd, int state) {
> 	u64 cnt;
>
> 	/* Clear the current state, sfd is in non-blocking mode */
> 	read(sfd,&cnt, sizeof(cnt));
> 	/* Writes new state */
> 	cnt = 1 + !!state;
> 	write(sfd,&cnt, sizeof(cnt));
> }
>
>
> On the hypervisor side:
>
> int read_state(int sfd) {
> 	u64 cnt;
>
> 	read(sfd,&cnt, sizeof(cnt));
> 	return state - 1;
> }
>
>    

Hadn't though of read+write as set.  While the 1+ is a little ugly, it's 
workable.

I see no kernel equivalent to read(), but that's easily done.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-26 19:42                                     ` Avi Kivity
@ 2009-08-26 19:44                                       ` Davide Libenzi
  2009-08-26 23:30                                         ` Davide Libenzi
  2009-08-26 19:50                                       ` Gleb Natapov
  1 sibling, 1 reply; 47+ messages in thread
From: Davide Libenzi @ 2009-08-26 19:44 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Michael S. Tsirkin, gleb, kvm, Linux Kernel Mailing List

On Wed, 26 Aug 2009, Avi Kivity wrote:

> On 08/26/2009 10:13 PM, Davide Libenzi wrote:
> > Ok, so why not using the eventfd counter as state?
> > On the device side:
> > 
> > void write_state(int sfd, int state) {
> > 	u64 cnt;
> > 
> > 	/* Clear the current state, sfd is in non-blocking mode */
> > 	read(sfd,&cnt, sizeof(cnt));
> > 	/* Writes new state */
> > 	cnt = 1 + !!state;
> > 	write(sfd,&cnt, sizeof(cnt));
> > }
> > 
> > 
> > On the hypervisor side:
> > 
> > int read_state(int sfd) {
> > 	u64 cnt;
> > 
> > 	read(sfd,&cnt, sizeof(cnt));
> > 	return state - 1;
> > }
> > 
> >    
> 
> Hadn't though of read+write as set.  While the 1+ is a little ugly, it's
> workable.

Pick what you want, as long as it always writes something != 0 :)


> I see no kernel equivalent to read(), but that's easily done.

Adding an in-kernel read based on "ctx", that is no problem at all.



- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-26 19:42                                     ` Avi Kivity
  2009-08-26 19:44                                       ` Davide Libenzi
@ 2009-08-26 19:50                                       ` Gleb Natapov
  2009-08-26 20:04                                         ` Davide Libenzi
  1 sibling, 1 reply; 47+ messages in thread
From: Gleb Natapov @ 2009-08-26 19:50 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Davide Libenzi, Michael S. Tsirkin, kvm,
	Linux Kernel Mailing List

On Wed, Aug 26, 2009 at 10:42:05PM +0300, Avi Kivity wrote:
> On 08/26/2009 10:13 PM, Davide Libenzi wrote:
> >Ok, so why not using the eventfd counter as state?
> >On the device side:
> >
> >void write_state(int sfd, int state) {
> >	u64 cnt;
> >
> >	/* Clear the current state, sfd is in non-blocking mode */
> >	read(sfd,&cnt, sizeof(cnt));
> >	/* Writes new state */
> >	cnt = 1 + !!state;
> >	write(sfd,&cnt, sizeof(cnt));
> >}
> >
> >
> >On the hypervisor side:
> >
> >int read_state(int sfd) {
> >	u64 cnt;
> >
> >	read(sfd,&cnt, sizeof(cnt));
> >	return state - 1;
> >}
> >
> 
> Hadn't though of read+write as set.  While the 1+ is a little ugly,
> it's workable.
> 
It's two system calls instead of one to inject interrupt.

> I see no kernel equivalent to read(), but that's easily done.
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-26 19:50                                       ` Gleb Natapov
@ 2009-08-26 20:04                                         ` Davide Libenzi
  2009-08-27  5:25                                           ` Gleb Natapov
  0 siblings, 1 reply; 47+ messages in thread
From: Davide Libenzi @ 2009-08-26 20:04 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Avi Kivity, Michael S. Tsirkin, kvm, Linux Kernel Mailing List

On Wed, 26 Aug 2009, Gleb Natapov wrote:

> On Wed, Aug 26, 2009 at 10:42:05PM +0300, Avi Kivity wrote:
> > On 08/26/2009 10:13 PM, Davide Libenzi wrote:
> > >Ok, so why not using the eventfd counter as state?
> > >On the device side:
> > >
> > >void write_state(int sfd, int state) {
> > >	u64 cnt;
> > >
> > >	/* Clear the current state, sfd is in non-blocking mode */
> > >	read(sfd,&cnt, sizeof(cnt));
> > >	/* Writes new state */
> > >	cnt = 1 + !!state;
> > >	write(sfd,&cnt, sizeof(cnt));
> > >}
> > >
> > >
> > >On the hypervisor side:
> > >
> > >int read_state(int sfd) {
> > >	u64 cnt;
> > >
> > >	read(sfd,&cnt, sizeof(cnt));
> > >	return state - 1;
> > >}
> > >
> > 
> > Hadn't though of read+write as set.  While the 1+ is a little ugly,
> > it's workable.
> > 
> It's two system calls instead of one to inject interrupt.

I guess that's going to completely throw off-chart your RT performance, 
doesn't it?


- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-26 19:44                                       ` Davide Libenzi
@ 2009-08-26 23:30                                         ` Davide Libenzi
  2009-08-27  4:13                                           ` Avi Kivity
  0 siblings, 1 reply; 47+ messages in thread
From: Davide Libenzi @ 2009-08-26 23:30 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Michael S. Tsirkin, gleb, kvm, Linux Kernel Mailing List

On Wed, 26 Aug 2009, Davide Libenzi wrote:

> > I see no kernel equivalent to read(), but that's easily done.
> 
> Adding an in-kernel read based on "ctx", that is no problem at all.

Something like the untested below.
I had thought you said the eventfd readers where in userspace, but I might 
have misunderstood you.



- Davide



---
 fs/eventfd.c            |   51 +++++++++++++++++++++++++++++++++---------------
 include/linux/eventfd.h |    7 ++++++
 2 files changed, 43 insertions(+), 15 deletions(-)

Index: linux-2.6.mod/fs/eventfd.c
===================================================================
--- linux-2.6.mod.orig/fs/eventfd.c	2009-08-26 15:58:03.000000000 -0700
+++ linux-2.6.mod/fs/eventfd.c	2009-08-26 16:20:03.000000000 -0700
@@ -130,26 +130,33 @@ static unsigned int eventfd_poll(struct 
 	return events;
 }
 
-static ssize_t eventfd_read(struct file *file, char __user *buf, size_t count,
-			    loff_t *ppos)
+/**
+ * eventfd_ctx_read - Reads the eventfd counter or wait if it is zero.
+ * @ctx: [in] Pointer to eventfd context.
+ * @no_wait: [in] Different from zero if the operation should not block.
+ * @cnt: [out] Pointer to the 64bit conter value.
+ *
+ * Returns zero if successful, or the following error codes:
+ *
+ * -EAGAIN      : The operation would have blocked but @no_wait was nonzero.
+ * -ERESTARTSYS : A signal interrupted the wait operation.
+ */
+ssize_t eventfd_ctx_read(struct eventfd_ctx *ctx, int no_wait, __u64 *cnt)
 {
-	struct eventfd_ctx *ctx = file->private_data;
 	ssize_t res;
-	__u64 ucnt = 0;
 	DECLARE_WAITQUEUE(wait, current);
 
-	if (count < sizeof(ucnt))
-		return -EINVAL;
 	spin_lock_irq(&ctx->wqh.lock);
+	*cnt = 0;
 	res = -EAGAIN;
 	if (ctx->count > 0)
-		res = sizeof(ucnt);
-	else if (!(file->f_flags & O_NONBLOCK)) {
+		res = 0;
+	else if (!no_wait) {
 		__add_wait_queue(&ctx->wqh, &wait);
-		for (res = 0;;) {
+		for (;;) {
 			set_current_state(TASK_INTERRUPTIBLE);
 			if (ctx->count > 0) {
-				res = sizeof(ucnt);
+				res = 0;
 				break;
 			}
 			if (signal_pending(current)) {
@@ -163,18 +170,32 @@ static ssize_t eventfd_read(struct file 
 		__remove_wait_queue(&ctx->wqh, &wait);
 		__set_current_state(TASK_RUNNING);
 	}
-	if (likely(res > 0)) {
-		ucnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
-		ctx->count -= ucnt;
+	if (likely(res == 0)) {
+		*cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
+		ctx->count -= *cnt;
 		if (waitqueue_active(&ctx->wqh))
 			wake_up_locked_poll(&ctx->wqh, POLLOUT);
 	}
 	spin_unlock_irq(&ctx->wqh.lock);
-	if (res > 0 && put_user(ucnt, (__u64 __user *) buf))
-		return -EFAULT;
 
 	return res;
 }
+EXPORT_SYMBOL_GPL(eventfd_ctx_read);
+
+static ssize_t eventfd_read(struct file *file, char __user *buf, size_t count,
+			    loff_t *ppos)
+{
+	struct eventfd_ctx *ctx = file->private_data;
+	ssize_t res;
+	__u64 cnt;
+
+	if (count < sizeof(cnt))
+		return -EINVAL;
+	if ((res = eventfd_ctx_read(ctx, file->f_flags & O_NONBLOCK, &cnt)) < 0)
+		return res;
+
+	return put_user(cnt, (__u64 __user *) buf) ? -EFAULT: sizeof(cnt);
+}
 
 static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t count,
 			     loff_t *ppos)
Index: linux-2.6.mod/include/linux/eventfd.h
===================================================================
--- linux-2.6.mod.orig/include/linux/eventfd.h	2009-08-26 15:58:03.000000000 -0700
+++ linux-2.6.mod/include/linux/eventfd.h	2009-08-26 16:17:01.000000000 -0700
@@ -33,6 +33,7 @@ struct file *eventfd_fget(int fd);
 struct eventfd_ctx *eventfd_ctx_fdget(int fd);
 struct eventfd_ctx *eventfd_ctx_fileget(struct file *file);
 int eventfd_signal(struct eventfd_ctx *ctx, int n);
+ssize_t eventfd_ctx_read(struct eventfd_ctx *ctx, int no_wait, __u64 *cnt);
 
 #else /* CONFIG_EVENTFD */
 
@@ -55,6 +56,12 @@ static inline void eventfd_ctx_put(struc
 
 }
 
+static inline ssize_t eventfd_ctx_read(struct eventfd_ctx *ctx, int no_wait,
+				       __u64 *cnt)
+{
+	return -ENOSYS;
+}
+
 #endif
 
 #endif /* _LINUX_EVENTFD_H */

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-26 23:30                                         ` Davide Libenzi
@ 2009-08-27  4:13                                           ` Avi Kivity
  2009-08-27  8:06                                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 47+ messages in thread
From: Avi Kivity @ 2009-08-27  4:13 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Michael S. Tsirkin, gleb, kvm, Linux Kernel Mailing List

On 08/27/2009 02:30 AM, Davide Libenzi wrote:
> On Wed, 26 Aug 2009, Davide Libenzi wrote:
>
>    
>>> I see no kernel equivalent to read(), but that's easily done.
>>>        
>> Adding an in-kernel read based on "ctx", that is no problem at all.
>>      
> Something like the untested below.
> I had thought you said the eventfd readers where in userspace, but I might
> have misunderstood you.
>    

No, they're all over the place.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-26 20:04                                         ` Davide Libenzi
@ 2009-08-27  5:25                                           ` Gleb Natapov
  0 siblings, 0 replies; 47+ messages in thread
From: Gleb Natapov @ 2009-08-27  5:25 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Avi Kivity, Michael S. Tsirkin, kvm, Linux Kernel Mailing List

On Wed, Aug 26, 2009 at 01:04:09PM -0700, Davide Libenzi wrote:
> On Wed, 26 Aug 2009, Gleb Natapov wrote:
> 
> > On Wed, Aug 26, 2009 at 10:42:05PM +0300, Avi Kivity wrote:
> > > On 08/26/2009 10:13 PM, Davide Libenzi wrote:
> > > >Ok, so why not using the eventfd counter as state?
> > > >On the device side:
> > > >
> > > >void write_state(int sfd, int state) {
> > > >	u64 cnt;
> > > >
> > > >	/* Clear the current state, sfd is in non-blocking mode */
> > > >	read(sfd,&cnt, sizeof(cnt));
> > > >	/* Writes new state */
> > > >	cnt = 1 + !!state;
> > > >	write(sfd,&cnt, sizeof(cnt));
> > > >}
> > > >
> > > >
> > > >On the hypervisor side:
> > > >
> > > >int read_state(int sfd) {
> > > >	u64 cnt;
> > > >
> > > >	read(sfd,&cnt, sizeof(cnt));
> > > >	return state - 1;
> > > >}
> > > >
> > > 
> > > Hadn't though of read+write as set.  While the 1+ is a little ugly,
> > > it's workable.
> > > 
> > It's two system calls instead of one to inject interrupt.
> 
> I guess that's going to completely throw off-chart your RT performance, 
> doesn't it?
> 
Do you consider interrupt injection path not worth of optimizing?

--
			Gleb.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-27  4:13                                           ` Avi Kivity
@ 2009-08-27  8:06                                             ` Michael S. Tsirkin
  2009-08-27 14:20                                               ` Davide Libenzi
  0 siblings, 1 reply; 47+ messages in thread
From: Michael S. Tsirkin @ 2009-08-27  8:06 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Davide Libenzi, gleb, kvm, Linux Kernel Mailing List

On Thu, Aug 27, 2009 at 07:13:32AM +0300, Avi Kivity wrote:
> On 08/27/2009 02:30 AM, Davide Libenzi wrote:
>> On Wed, 26 Aug 2009, Davide Libenzi wrote:
>>
>>    
>>>> I see no kernel equivalent to read(), but that's easily done.
>>>>        
>>> Adding an in-kernel read based on "ctx", that is no problem at all.
>>>      
>> Something like the untested below.
>> I had thought you said the eventfd readers where in userspace, but I might
>> have misunderstood you.
>>    
>
> No, they're all over the place.

Further, with Davide's proposal you must be a reader to signal events.

> -- 
> I have a truly marvellous patch that fixes the bug which this
> signature is too narrow to contain.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-26 19:13                                   ` Davide Libenzi
  2009-08-26 19:42                                     ` Avi Kivity
@ 2009-08-27  9:05                                     ` Paolo Bonzini
  2009-08-27  9:09                                       ` Michael S. Tsirkin
  2009-08-27 14:21                                       ` Davide Libenzi
  1 sibling, 2 replies; 47+ messages in thread
From: Paolo Bonzini @ 2009-08-27  9:05 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Avi Kivity, Michael S. Tsirkin, gleb, kvm, linux-kernel

> Ok, so why not using the eventfd counter as state?
> On the device side:
> 
> void write_state(int sfd, int state) {
> 	u64 cnt;
> 
> 	/* Clear the current state, sfd is in non-blocking mode */
> 	read(sfd,&cnt, sizeof(cnt));
> 	/* Writes new state */
> 	cnt = 1 + !!state;
> 	write(sfd,&cnt, sizeof(cnt));
> }

It's interesting [no sarcasm intended, mind] that EFD_SEMAPHORE was
added exactly to avoid a read+write combination for the case of
decrementing a value.  Here it's the same, just it's about the case of
writing a *given* value.  What about having EFD_STATE simply mean "do
not use a counter, just write the value" without affecting the way
read works, and use

	/* Writes new state */
	cnt = 1 + !!state;
	write(sfd,&cnt, sizeof(cnt));

See below?

Paolo

> On the hypervisor side:
> 
> int read_state(int sfd) {
> 	u64 cnt;
> 
> 	read(sfd,&cnt, sizeof(cnt));
> 	return state - 1;
> }


------------- 8<-- ---------------
Subject: [PATCH] eventfd: new EFD_ABSOLUTE flag

This implements a new EFD_ABSOLUTE flag for eventfd.
This changes eventfd behaviour so that write simply
stores the value written, and is always non-blocking.

Untested (I just modified Michael's patch, and given
the simpler code I'm not sure it's now worthwhile
introducing the inline functions), but otherwise

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 fs/eventfd.c            |   13 ++++++++-----
 include/linux/eventfd.h |    4 ++--
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 347a0e0..7b279e3 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -31,8 +31,6 @@
 
 static inline int eventfd_writeable(struct eventfd_ctx *ctx, u64 n)
 {
-	return ULLONG_MAX - n > ctx->count;
-	return (ctx->flags & EFD_ABSOLUTE) || (ULLONG_MAX - n > ctx->count);
 }
 
 static inline int eventfd_overflow(struct eventfd_ctx *ctx, u64 cnt)
@@ -42,10 +40,14 @@
 
 static inline void eventfd_dowrite(struct eventfd_ctx *ctx, u64 ucnt)
 {
-	if (eventfd_writeable(ctx, ucnt))
-		ucnt = ULLONG_MAX - ctx->count;
+	if (ctx->flags & EFD_ABSOLUTE)
+		ctx->count = ucnt;
+	else {
+		if (ULLONG_MAX - ctx->count < ucnt)
+			ucnt = ULLONG_MAX - ctx->count;
 
-	ctx->count += ucnt;
+		ctx->count += ucnt;
+	}
 }
 
 static inline u64 eventfd_doread(struct eventfd_ctx *ctx)
diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
index 3b85ba6..78ff649 100644
--- a/include/linux/eventfd.h
+++ b/include/linux/eventfd.h
@@ -19,11 +19,12 @@
  * shared O_* flags.
  */
 #define EFD_SEMAPHORE (1 << 0)
+#define EFD_ABSOLUTE (1 << 1)
 #define EFD_CLOEXEC O_CLOEXEC
 #define EFD_NONBLOCK O_NONBLOCK
 
 #define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
-#define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE)
+#define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE | EFD_ABSOLUTE)
 
 #ifdef CONFIG_EVENTFD
--
1.6.2.5 

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-27  9:05                                     ` Paolo Bonzini
@ 2009-08-27  9:09                                       ` Michael S. Tsirkin
  2009-08-27 14:21                                       ` Davide Libenzi
  1 sibling, 0 replies; 47+ messages in thread
From: Michael S. Tsirkin @ 2009-08-27  9:09 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Davide Libenzi, Avi Kivity, gleb, kvm, linux-kernel

On Thu, Aug 27, 2009 at 11:05:30AM +0200, Paolo Bonzini wrote:
> > Ok, so why not using the eventfd counter as state?
> > On the device side:
> > 
> > void write_state(int sfd, int state) {
> > 	u64 cnt;
> > 
> > 	/* Clear the current state, sfd is in non-blocking mode */
> > 	read(sfd,&cnt, sizeof(cnt));
> > 	/* Writes new state */
> > 	cnt = 1 + !!state;
> > 	write(sfd,&cnt, sizeof(cnt));
> > }
> 
> It's interesting [no sarcasm intended, mind] that EFD_SEMAPHORE was
> added exactly to avoid a read+write combination for the case of
> decrementing a value.  Here it's the same, just it's about the case of
> writing a *given* value.  What about having EFD_STATE simply mean "do
> not use a counter, just write the value" without affecting the way
> read works, and use
> 
> 	/* Writes new state */
> 	cnt = 1 + !!state;
> 	write(sfd,&cnt, sizeof(cnt));

That would work for kvm.

> See below?
> 
> Paolo
> 
> > On the hypervisor side:
> > 
> > int read_state(int sfd) {
> > 	u64 cnt;
> > 
> > 	read(sfd,&cnt, sizeof(cnt));
> > 	return state - 1;
> > }
> 
> 
> ------------- 8<-- ---------------
> Subject: [PATCH] eventfd: new EFD_ABSOLUTE flag
> 
> This implements a new EFD_ABSOLUTE flag for eventfd.
> This changes eventfd behaviour so that write simply
> stores the value written, and is always non-blocking.
> 
> Untested (I just modified Michael's patch, and given
> the simpler code I'm not sure it's now worthwhile
> introducing the inline functions), but otherwise
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  fs/eventfd.c            |   13 ++++++++-----
>  include/linux/eventfd.h |    4 ++--
>  2 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/eventfd.c b/fs/eventfd.c
> index 347a0e0..7b279e3 100644
> --- a/fs/eventfd.c
> +++ b/fs/eventfd.c
> @@ -31,8 +31,6 @@
>  
>  static inline int eventfd_writeable(struct eventfd_ctx *ctx, u64 n)
>  {
> -	return ULLONG_MAX - n > ctx->count;
> -	return (ctx->flags & EFD_ABSOLUTE) || (ULLONG_MAX - n > ctx->count);
>  }
>  
>  static inline int eventfd_overflow(struct eventfd_ctx *ctx, u64 cnt)
> @@ -42,10 +40,14 @@
>  
>  static inline void eventfd_dowrite(struct eventfd_ctx *ctx, u64 ucnt)
>  {
> -	if (eventfd_writeable(ctx, ucnt))
> -		ucnt = ULLONG_MAX - ctx->count;
> +	if (ctx->flags & EFD_ABSOLUTE)
> +		ctx->count = ucnt;
> +	else {
> +		if (ULLONG_MAX - ctx->count < ucnt)
> +			ucnt = ULLONG_MAX - ctx->count;
>  
> -	ctx->count += ucnt;
> +		ctx->count += ucnt;
> +	}
>  }
>  
>  static inline u64 eventfd_doread(struct eventfd_ctx *ctx)
> diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
> index 3b85ba6..78ff649 100644
> --- a/include/linux/eventfd.h
> +++ b/include/linux/eventfd.h
> @@ -19,11 +19,12 @@
>   * shared O_* flags.
>   */
>  #define EFD_SEMAPHORE (1 << 0)
> +#define EFD_ABSOLUTE (1 << 1)
>  #define EFD_CLOEXEC O_CLOEXEC
>  #define EFD_NONBLOCK O_NONBLOCK
>  
>  #define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
> -#define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE)
> +#define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE | EFD_ABSOLUTE)
>  
>  #ifdef CONFIG_EVENTFD
> --
> 1.6.2.5 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-27  8:06                                             ` Michael S. Tsirkin
@ 2009-08-27 14:20                                               ` Davide Libenzi
  0 siblings, 0 replies; 47+ messages in thread
From: Davide Libenzi @ 2009-08-27 14:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, gleb, kvm, Linux Kernel Mailing List

On Thu, 27 Aug 2009, Michael S. Tsirkin wrote:

> On Thu, Aug 27, 2009 at 07:13:32AM +0300, Avi Kivity wrote:
> > On 08/27/2009 02:30 AM, Davide Libenzi wrote:
> >> On Wed, 26 Aug 2009, Davide Libenzi wrote:
> >>
> >>    
> >>>> I see no kernel equivalent to read(), but that's easily done.
> >>>>        
> >>> Adding an in-kernel read based on "ctx", that is no problem at all.
> >>>      
> >> Something like the untested below.
> >> I had thought you said the eventfd readers where in userspace, but I might
> >> have misunderstood you.
> >>    
> >
> > No, they're all over the place.
> 
> Further, with Davide's proposal you must be a reader to signal events.

To signal events, you must have an instance of "ctx" (with the new 
exposed eventfd_ctx_read). How would you do otherwise?


- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-27  9:05                                     ` Paolo Bonzini
  2009-08-27  9:09                                       ` Michael S. Tsirkin
@ 2009-08-27 14:21                                       ` Davide Libenzi
  2009-08-27 14:30                                         ` Michael S. Tsirkin
  1 sibling, 1 reply; 47+ messages in thread
From: Davide Libenzi @ 2009-08-27 14:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Avi Kivity, Michael S. Tsirkin, gleb, kvm,
	Linux Kernel Mailing List

On Thu, 27 Aug 2009, Paolo Bonzini wrote:

> > Ok, so why not using the eventfd counter as state?
> > On the device side:
> > 
> > void write_state(int sfd, int state) {
> > 	u64 cnt;
> > 
> > 	/* Clear the current state, sfd is in non-blocking mode */
> > 	read(sfd,&cnt, sizeof(cnt));
> > 	/* Writes new state */
> > 	cnt = 1 + !!state;
> > 	write(sfd,&cnt, sizeof(cnt));
> > }
> 
> It's interesting [no sarcasm intended, mind] that EFD_SEMAPHORE was
> added exactly to avoid a read+write combination for the case of
> decrementing a value.

Like I repeated 25 times already, EFD_SEMAPHORE was added, because a 
*semaphore* is a pretty widely known and used abstraction.


- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-27 14:21                                       ` Davide Libenzi
@ 2009-08-27 14:30                                         ` Michael S. Tsirkin
  2009-08-27 14:38                                           ` Davide Libenzi
  0 siblings, 1 reply; 47+ messages in thread
From: Michael S. Tsirkin @ 2009-08-27 14:30 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Paolo Bonzini, Avi Kivity, gleb, kvm, Linux Kernel Mailing List

On Thu, Aug 27, 2009 at 07:21:49AM -0700, Davide Libenzi wrote:
> On Thu, 27 Aug 2009, Paolo Bonzini wrote:
> 
> > > Ok, so why not using the eventfd counter as state?
> > > On the device side:
> > > 
> > > void write_state(int sfd, int state) {
> > > 	u64 cnt;
> > > 
> > > 	/* Clear the current state, sfd is in non-blocking mode */
> > > 	read(sfd,&cnt, sizeof(cnt));
> > > 	/* Writes new state */
> > > 	cnt = 1 + !!state;
> > > 	write(sfd,&cnt, sizeof(cnt));
> > > }
> > 
> > It's interesting [no sarcasm intended, mind] that EFD_SEMAPHORE was
> > added exactly to avoid a read+write combination for the case of
> > decrementing a value.
> 
> Like I repeated 25 times already, EFD_SEMAPHORE was added, because a 
> *semaphore* is a pretty widely known and used abstraction.

what about an atomic variable, btw?  does it make sense to implement
write that does compare and exchange?

> 
> - Davide
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-27 14:30                                         ` Michael S. Tsirkin
@ 2009-08-27 14:38                                           ` Davide Libenzi
  2009-08-27 14:49                                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 47+ messages in thread
From: Davide Libenzi @ 2009-08-27 14:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paolo Bonzini, Avi Kivity, gleb, kvm, Linux Kernel Mailing List

On Thu, 27 Aug 2009, Michael S. Tsirkin wrote:

> On Thu, Aug 27, 2009 at 07:21:49AM -0700, Davide Libenzi wrote:
> > On Thu, 27 Aug 2009, Paolo Bonzini wrote:
> > 
> > > > Ok, so why not using the eventfd counter as state?
> > > > On the device side:
> > > > 
> > > > void write_state(int sfd, int state) {
> > > > 	u64 cnt;
> > > > 
> > > > 	/* Clear the current state, sfd is in non-blocking mode */
> > > > 	read(sfd,&cnt, sizeof(cnt));
> > > > 	/* Writes new state */
> > > > 	cnt = 1 + !!state;
> > > > 	write(sfd,&cnt, sizeof(cnt));
> > > > }
> > > 
> > > It's interesting [no sarcasm intended, mind] that EFD_SEMAPHORE was
> > > added exactly to avoid a read+write combination for the case of
> > > decrementing a value.
> > 
> > Like I repeated 25 times already, EFD_SEMAPHORE was added, because a 
> > *semaphore* is a pretty widely known and used abstraction.
> 
> what about an atomic variable, btw?  does it make sense to implement
> write that does compare and exchange?

It is surprising to me, that is front of a workable solution w/out any 
use-once additions, yet you want to try to add optimizations and new 
ad-hoc abstractions to user visible interfaces.
Now, you tell me what an atomic variable has to do with an eventfd.


- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-27 14:38                                           ` Davide Libenzi
@ 2009-08-27 14:49                                             ` Michael S. Tsirkin
  2009-08-27 15:29                                               ` Davide Libenzi
  0 siblings, 1 reply; 47+ messages in thread
From: Michael S. Tsirkin @ 2009-08-27 14:49 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Paolo Bonzini, Avi Kivity, gleb, kvm, Linux Kernel Mailing List

On Thu, Aug 27, 2009 at 07:38:46AM -0700, Davide Libenzi wrote:
> On Thu, 27 Aug 2009, Michael S. Tsirkin wrote:
> 
> > On Thu, Aug 27, 2009 at 07:21:49AM -0700, Davide Libenzi wrote:
> > > On Thu, 27 Aug 2009, Paolo Bonzini wrote:
> > > 
> > > > > Ok, so why not using the eventfd counter as state?
> > > > > On the device side:
> > > > > 
> > > > > void write_state(int sfd, int state) {
> > > > > 	u64 cnt;
> > > > > 
> > > > > 	/* Clear the current state, sfd is in non-blocking mode */
> > > > > 	read(sfd,&cnt, sizeof(cnt));
> > > > > 	/* Writes new state */
> > > > > 	cnt = 1 + !!state;
> > > > > 	write(sfd,&cnt, sizeof(cnt));
> > > > > }
> > > > 
> > > > It's interesting [no sarcasm intended, mind] that EFD_SEMAPHORE was
> > > > added exactly to avoid a read+write combination for the case of
> > > > decrementing a value.
> > > 
> > > Like I repeated 25 times already, EFD_SEMAPHORE was added, because a 
> > > *semaphore* is a pretty widely known and used abstraction.
> > 
> > what about an atomic variable, btw?  does it make sense to implement
> > write that does compare and exchange?
> 
> It is surprising to me, that is front of a workable solution w/out any 
> use-once additions, yet you want to try to add optimizations and new 
> ad-hoc abstractions to user visible interfaces.
> Now, you tell me what an atomic variable has to do with an eventfd.
> 
> 
> - Davide
> 

Oh, I stopped pushing EFD_STATE since we have a solution.
I am just trying to grok what does and what does not consititute a
use-once addition, in your mind, and what does and what does not
belong in eventfd. The reason atomic does not belong there and
semaphore does is because one waits on semaphore but not
on atomic? Is that it?

-- 
MST

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-27 14:49                                             ` Michael S. Tsirkin
@ 2009-08-27 15:29                                               ` Davide Libenzi
  2009-08-27 17:09                                                 ` Davide Libenzi
  0 siblings, 1 reply; 47+ messages in thread
From: Davide Libenzi @ 2009-08-27 15:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paolo Bonzini, Avi Kivity, gleb, kvm, Linux Kernel Mailing List

On Thu, 27 Aug 2009, Michael S. Tsirkin wrote:

> Oh, I stopped pushing EFD_STATE since we have a solution.
> I am just trying to grok what does and what does not consititute a
> use-once addition, in your mind, and what does and what does not
> belong in eventfd. The reason atomic does not belong there and
> semaphore does is because one waits on semaphore but not
> on atomic? Is that it?

An atomic variable is not a synchronization interface.
Even if it'd come up that we really need an atomic variable abstraction 
via an fd, this should not go via eventfd.
Yeah, maybe the name syncfd would have been better, but the very reason 
why eventfd born was to allow KAIO to signal events to poll/select/epoll.
That first implementation was also usable as a mutex.
Then it was propsed to make eventfd to act as a semaphore too. Code was 
trivial, a semaphore fitted the interface, and the abstraction was a 
pretty damn known too. So it went in.
Yes, you could have implemented a semaphore with the existing eventfd, by 
reading the counter and writing counter-1. But again, a semaphore was 
something widely known and generic enough, and was fitting the bill.


- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 0/2] eventfd: new EFD_STATE flag
  2009-08-27 15:29                                               ` Davide Libenzi
@ 2009-08-27 17:09                                                 ` Davide Libenzi
  0 siblings, 0 replies; 47+ messages in thread
From: Davide Libenzi @ 2009-08-27 17:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paolo Bonzini, Avi Kivity, gleb, kvm, Linux Kernel Mailing List

On Thu, 27 Aug 2009, Michael S. Tsirkin wrote:

> Oh, I stopped pushing EFD_STATE since we have a solution.

Do you guys need the kernel-side eventfd_ctx_read() I posted or not?
Because if nobody uses it, I'm not going to push it.


- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2009-08-27 17:09 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-08-20 15:56 [PATCH 0/2] eventfd: new EFD_STATE flag Michael S. Tsirkin
2009-08-20 16:20 ` Davide Libenzi
2009-08-20 17:38   ` Avi Kivity
2009-08-20 17:44     ` Davide Libenzi
2009-08-20 17:56       ` Paolo Bonzini
2009-08-21 17:21         ` Davide Libenzi
2009-08-20 17:55     ` Michael S. Tsirkin
2009-08-20 18:06       ` Avi Kivity
2009-08-20 18:28         ` Michael S. Tsirkin
2009-08-23 13:01           ` Avi Kivity
2009-08-23 13:36             ` Michael S. Tsirkin
2009-08-23 13:40               ` Avi Kivity
2009-08-23 14:30                 ` Michael S. Tsirkin
2009-08-23 16:51                   ` Paolo Bonzini
2009-08-24 18:25                   ` Davide Libenzi
2009-08-24 18:31                     ` Avi Kivity
2009-08-24 22:08                       ` Davide Libenzi
2009-08-24 22:10                         ` Paolo Bonzini
2009-08-24 22:32                           ` Davide Libenzi
2009-08-25  6:59                             ` Paolo Bonzini
2009-08-25  4:26                         ` Avi Kivity
2009-08-24 21:49                     ` Michael S. Tsirkin
2009-08-24 22:15                       ` Davide Libenzi
2009-08-25  7:22                         ` Michael S. Tsirkin
2009-08-25 21:57                           ` Davide Libenzi
2009-08-26 10:29                             ` Michael S. Tsirkin
2009-08-26 10:41                               ` Avi Kivity
2009-08-26 17:45                               ` Davide Libenzi
2009-08-26 18:58                                 ` Avi Kivity
2009-08-26 19:13                                   ` Davide Libenzi
2009-08-26 19:42                                     ` Avi Kivity
2009-08-26 19:44                                       ` Davide Libenzi
2009-08-26 23:30                                         ` Davide Libenzi
2009-08-27  4:13                                           ` Avi Kivity
2009-08-27  8:06                                             ` Michael S. Tsirkin
2009-08-27 14:20                                               ` Davide Libenzi
2009-08-26 19:50                                       ` Gleb Natapov
2009-08-26 20:04                                         ` Davide Libenzi
2009-08-27  5:25                                           ` Gleb Natapov
2009-08-27  9:05                                     ` Paolo Bonzini
2009-08-27  9:09                                       ` Michael S. Tsirkin
2009-08-27 14:21                                       ` Davide Libenzi
2009-08-27 14:30                                         ` Michael S. Tsirkin
2009-08-27 14:38                                           ` Davide Libenzi
2009-08-27 14:49                                             ` Michael S. Tsirkin
2009-08-27 15:29                                               ` Davide Libenzi
2009-08-27 17:09                                                 ` Davide Libenzi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).