Date: Wed, 24 Jul 2013 09:54:39 +0200
From: Stefan Hajnoczi
Message-ID: <20130724075439.GC31445@stefanha-thinkpad.muc.redhat.com>
References: <26DE76D4FD616A2955DC1732@nimrod.local> <20130723121825.GB20857@stefanha-thinkpad.redhat.com> <2B93060044B2D160D39B27F0@nimrod.local>
In-Reply-To: <2B93060044B2D160D39B27F0@nimrod.local>
Subject: Re: [Qemu-devel] Question on aio_poll
To: Alex Bligh
Cc: qemu-devel@nongnu.org

On Tue, Jul 23, 2013 at 03:46:23PM +0100, Alex Bligh wrote:
> --On 23 July 2013 14:18:25 +0200 Stefan Hajnoczi wrote:
> >Unfortunately there is an issue with the series which I haven't had time
> >to look into yet. I don't remember the details but I think make check
> >is failing.
> >
> >The current qemu.git/master code is doing the "correct" thing though.
> >Callers of aio_poll() are using it to complete any pending I/O requests
> >and process BHs. If there is no work left, we do not want to block
> >indefinitely. Instead we want to return.
>
> If we have no work to do (no FDs) and have a timer, then this should
> wait for the timer to expire (i.e. wait until progress has been
> made). Hence without a timer, it would be peculiar if it returned
> earlier.
>
> I think it should behave like select really, i.e. if you give it
> an infinite timeout (blocking) and no descriptors to work on, it hangs
> for ever. At the very least it should warn, as this is in my opinion
> an error by the caller.
>
> I left this how it was in the end (I think), and got round it by
> creating a bogus pipe for the test to listen to.

Doing that requires the changes in my patch series, otherwise you break
aio_poll() loops that are waiting for pending I/O requests. They don't
want to wait for timers.
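To make that concrete, the kind of loop I mean looks roughly like this
(requests_pending() is a made-up placeholder for whatever tracks the
caller's in-flight requests, not a real QEMU function):

    #include <stdbool.h>
    #include "block/aio.h"

    /* Hypothetical predicate: true while our own requests are in flight. */
    extern bool requests_pending(void);

    /* Complete our pending I/O requests on this AioContext.  The loop
     * relies on aio_poll() blocking only while there is pending I/O; it
     * must not sit there waiting for some unrelated timer to expire. */
    static void drain_requests(AioContext *ctx)
    {
        while (requests_pending()) {
            aio_poll(ctx, true); /* blocking=true: wait for progress */
        }
    }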
> >>Thirdly, I don't quite understand how/why busy is being set. It seems
> >>to be set if the flush callback returns non-zero. That would imply (I
> >>think) the fd handler has something to write. But what if it is just
> >>interested in any data to read that is available (and never writes)? If
> >>this is the only fd aio_poll has, it would appear it never polls.
> >
> >The point of .io_flush() is to select file descriptors that are awaiting
> >I/O (either direction). For example, consider an iSCSI TCP socket with
> >no I/O requests pending. In that case .io_flush() returns 0 and we will
> >not block in aio_poll(). But if there is an iSCSI request pending, then
> >.io_flush() will return 1 and we'll wait for the iSCSI response to be
> >received.
> >
> >The effect of .io_flush() is that aio_poll() will return false if there
> >is no I/O pending.
>
> Right, but take that example. If the tcp socket is idle because it's an
> iSCSI server and it is waiting for an iSCSI request, then io_flush
> returns 0. That will mean busy will not be set, and if it's the only
> FD, g_poll won't be called AT ALL - forget the fact it won't block -
> because it will exit aio_poll a couple of lines before the g_poll. That
> means you'll never actually poll for the incoming iSCSI command.
> Surely that can't be right!
>
> Or are you saying that this type of FD never appears in the aio poll
> set so it is just returning for the main loop to handle them.

That happens because QEMU has two types of fd monitoring. There is
AioContext's aio_poll(), which is designed for asynchronous I/O requests
initiated by QEMU; callers use it to wait for those requests to complete.

QEMU also has the main loop's qemu_set_fd_handler() (iohandler), which is
used for server connections like the one you described. The NBD server
uses it, for example. (There is a rough sketch of this split at the end
of this mail.)

I hope we can eventually unify the event loops, and then the select
function should behave as you described. For now, though, we need to keep
the current behavior at least until my .io_flush() removal series, or
something equivalent, is merged.

> >It turned out that this behavior could be implemented at the block layer
> >instead of using the .io_flush() interface at the AioContext layer. The
> >patch series I linked to above modifies the code so AioContext can
> >eliminate the .io_flush() concept.
>
> I've just had a quick read of that.
>
> I think the key one is:
> http://lists.nongnu.org/archive/html/qemu-devel/2013-07/msg00099.html
>
> I note you've eliminated 'busy' - hurrah.
>
> I note you now have:
> if (ctx->pollfds->len == 1) {
>     return progress;
> }
>
> Is the '1' there the event notifier? How do we know there is only
> one of them?

There may be many EventNotifier instances. That's not what matters,
though. What matters is the aio_notify() EventNotifier.

Each AioContext has its own EventNotifier, which can be signalled with
aio_notify(). The purpose of this function is to kick an event loop that
is blocking in select()/poll(). This is necessary when another thread
modifies something that the AioContext needs to act upon, such as adding
or removing an fd.
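Roughly, the interaction looks like this (event_loop_thread() and
change_fd_handlers() are invented names for illustration; aio_poll() and
aio_notify() are the real functions):

    #include "block/aio.h"

    /* Hypothetical helper: whatever change the loop must notice, e.g.
     * adding or removing an fd handler on this AioContext. */
    extern void change_fd_handlers(AioContext *ctx);

    /* Event loop thread: may block in select()/poll() inside aio_poll(). */
    static void *event_loop_thread(void *opaque)
    {
        AioContext *ctx = opaque;

        for (;;) {
            aio_poll(ctx, true);
        }
        return NULL;
    }

    /* Another thread: make the change, then kick the blocked event loop
     * with aio_notify() so it wakes up and acts on it. */
    static void update_from_another_thread(AioContext *ctx)
    {
        change_fd_handlers(ctx);
        aio_notify(ctx); /* signals the AioContext's own EventNotifier */
    }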
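And going back to the two kinds of fd monitoring above, this is the rough
shape of the iohandler side (accept_client(), listen_fd and server_state
are invented names; the AioContext side is just the aio_poll() loop
sketched earlier in this mail):

    #include "qemu/main-loop.h"

    /* Hypothetical main-loop callback: the listening socket is readable,
     * i.e. a new client is connecting (NBD-style server connection). */
    static void accept_client(void *opaque)
    {
        /* accept(2) the connection and install per-client handlers here */
    }

    static void register_server(int listen_fd, void *server_state)
    {
        /* Watched by the main loop's iohandler mechanism, not by aio_poll() */
        qemu_set_fd_handler(listen_fd, accept_client, NULL, server_state);
    }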