qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Anthony Liguori <anthony@codemonkey.ws>
To: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	Anthony Liguori <aliguori@us.ibm.com>,
	Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>, Avi Kivity <avi@redhat.com>
Subject: Re: [Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO
Date: Tue, 27 Sep 2011 09:36:31 -0500	[thread overview]
Message-ID: <4E81DF6F.8080905@codemonkey.ws> (raw)
In-Reply-To: <4E81DDDF.8050201@siemens.com>

On 09/27/2011 09:29 AM, Jan Kiszka wrote:
> On 2011-09-27 16:07, Anthony Liguori wrote:
>> On 09/27/2011 08:56 AM, Jan Kiszka wrote:
>>> Move qemu_eventfd unmodified to oslib-posix and use it for signaling
>>> POSIX AIO completions. If native eventfd suport is available, this
>>> avoids multiple read accesses to drain multiple pending signals. As
>>> before we use a pipe if eventfd is not supported.
>>>
>>> Signed-off-by: Jan Kiszka<jan.kiszka@siemens.com>
>>> ---
>>>    os-posix.c         |   32 --------------------------------
>>>    oslib-posix.c      |   32 +++++++++++++++++++++++++++++++-
>>>    posix-aio-compat.c |   12 ++++++++----
>>>    3 files changed, 39 insertions(+), 37 deletions(-)
>>>
>>> diff --git a/os-posix.c b/os-posix.c
>>> index dbf3b24..a918895 100644
>>> --- a/os-posix.c
>>> +++ b/os-posix.c
>>> @@ -45,10 +45,6 @@
>>>    #include<sys/syscall.h>
>>>    #endif
>>>
>>> -#ifdef CONFIG_EVENTFD
>>> -#include<sys/eventfd.h>
>>> -#endif
>>> -
>>>    static struct passwd *user_pwd;
>>>    static const char *chroot_dir;
>>>    static int daemonize;
>>> @@ -333,34 +329,6 @@ void os_set_line_buffering(void)
>>>        setvbuf(stdout, NULL, _IOLBF, 0);
>>>    }
>>>
>>> -/*
>>> - * Creates an eventfd that looks like a pipe and has EFD_CLOEXEC set.
>>> - */
>>> -int qemu_eventfd(int fds[2])
>>> -{
>>> -#ifdef CONFIG_EVENTFD
>>> -    int ret;
>>> -
>>> -    ret = eventfd(0, 0);
>>> -    if (ret>= 0) {
>>> -        fds[0] = ret;
>>> -        qemu_set_cloexec(ret);
>>> -        if ((fds[1] = dup(ret)) == -1) {
>>> -            close(ret);
>>> -            return -1;
>>> -        }
>>> -        qemu_set_cloexec(fds[1]);
>>> -        return 0;
>>> -    }
>>> -
>>> -    if (errno != ENOSYS) {
>>> -        return -1;
>>> -    }
>>> -#endif
>>> -
>>> -    return qemu_pipe(fds);
>>> -}
>>> -
>>>    int qemu_create_pidfile(const char *filename)
>>>    {
>>>        char buffer[128];
>>> diff --git a/oslib-posix.c b/oslib-posix.c
>>> index a304fb0..8ef7bd7 100644
>>> --- a/oslib-posix.c
>>> +++ b/oslib-posix.c
>>> @@ -47,7 +47,9 @@ extern int daemon(int, int);
>>>    #include "trace.h"
>>>    #include "qemu_socket.h"
>>>
>>> -
>>> +#ifdef CONFIG_EVENTFD
>>> +#include<sys/eventfd.h>
>>> +#endif
>>>
>>>    int qemu_daemon(int nochdir, int noclose)
>>>    {
>>> @@ -139,6 +141,34 @@ int qemu_pipe(int pipefd[2])
>>>        return ret;
>>>    }
>>>
>>> +/*
>>> + * Creates an eventfd that looks like a pipe and has EFD_CLOEXEC set.
>>> + */
>>> +int qemu_eventfd(int fds[2])
>>> +{
>>> +#ifdef CONFIG_EVENTFD
>>> +    int ret;
>>> +
>>> +    ret = eventfd(0, 0);
>>> +    if (ret>= 0) {
>>> +        fds[0] = ret;
>>> +        qemu_set_cloexec(ret);
>>> +        if ((fds[1] = dup(ret)) == -1) {
>>> +            close(ret);
>>> +            return -1;
>>> +        }
>>> +        qemu_set_cloexec(fds[1]);
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (errno != ENOSYS) {
>>> +        return -1;
>>> +    }
>>> +#endif
>>> +
>>> +    return qemu_pipe(fds);
>>> +}
>>> +
>>
>> I think it's a bit dangerous to implement eventfd() in terms of pipe().
>>
>> You don't expect to handle EAGAIN with eventfd() whereas you have to handle it
>> with pipe().
>
> EAGAIN is returned on eventfd read if no event is pending and the fd is
> non-blocking - just as we configure it.
>
>>
>> Moreover, the eventfd() counter is not lossy (practically speaking) whereas if
>> you use pipe() as a counter, it will be lossy in practice.
>>
>> This is why posix aio uses pipe() and not eventfd().
>
> I don't get this yet. eventfd is lossy by default. It only decreases the
> counter on read if you specify EFD_SEMAPHORE - which we do not do.

uint64_t value;

for (i = 0; i < 1 << 32; i++) {
    value = 1;
    write(fd, &value, sizeof(value));
}

uint64_t count = 0;

do {
    len = read(fd, &value, sizeof(value));
    count += value;
} while (len != -1);

With eventfd, count == 2^32.  With pipe, count == 8192.

That's each '1' is stored in the pipe buffer whereas with eventfd, an index is 
just incremented.

Not sure what you mean re: EFD_SEMAPHORE.  EFD_SEMAPHORE basically means any 
non-zero value is returned as 1 and the counter is decremented by 1.  Without 
EFD_SEMAPHORE, the count is returned and the counter is reset to 0.

Regards,

Anthony Liguori

>
> Jan
>

  parent reply	other threads:[~2011-09-27 14:36 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <4E78C42D.5030207@siemens.com>
     [not found] ` <20110921080600.GA9847@stefanha-thinkpad.localdomain>
     [not found]   ` <4E80B50B.9000301@siemens.com>
     [not found]     ` <4E80B55F.5020203@redhat.com>
     [not found]       ` <4E80BFF3.8000907@us.ibm.com>
     [not found]         ` <4E8190BE.3000801@redhat.com>
2011-09-27 13:56           ` [Qemu-devel] [PATCH] Use qemu_eventfd for POSIX AIO Jan Kiszka
2011-09-27 14:07             ` Anthony Liguori
2011-09-27 14:11               ` Avi Kivity
2011-09-27 14:19                 ` Anthony Liguori
2011-09-27 14:22                   ` Avi Kivity
2011-09-27 14:22                   ` Jan Kiszka
2011-09-27 14:38                     ` Anthony Liguori
2011-09-27 14:29               ` Jan Kiszka
2011-09-27 14:34                 ` Avi Kivity
2011-09-27 14:36                   ` Jan Kiszka
2011-09-27 14:42                     ` Avi Kivity
2011-09-27 14:45                       ` Jan Kiszka
2011-09-27 14:48                         ` Avi Kivity
2011-09-27 14:50                           ` Jan Kiszka
2011-09-27 14:54                             ` Avi Kivity
2011-09-27 14:57                               ` Anthony Liguori
2011-09-27 14:59                               ` Jan Kiszka
2011-09-27 14:36                 ` Anthony Liguori [this message]
2011-09-27 14:41               ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E81DF6F.8080905@codemonkey.ws \
    --to=anthony@codemonkey.ws \
    --cc=aliguori@us.ibm.com \
    --cc=avi@redhat.com \
    --cc=jan.kiszka@siemens.com \
    --cc=kwolf@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).