Date: Mon, 8 Dec 2014 10:34:55 +0100
From: Kevin Wolf
To: Paolo Bonzini
Cc: ming.lei@canonical.com, pl@kamp.de, qemu-devel@nongnu.org,
    stefanha@redhat.com, mreitz@redhat.com
Subject: Re: [Qemu-devel] [RFC PATCH v2 6/6] linux-aio: Queue requests
    instead of returning -EAGAIN
Message-ID: <20141208093455.GA3792@noname.str.redhat.com>
In-Reply-To: <5481F64A.3070203@redhat.com>
References: <1417795568-3201-1-git-send-email-kwolf@redhat.com>
    <1417795568-3201-7-git-send-email-kwolf@redhat.com>
    <5481F64A.3070203@redhat.com>

On 05.12.2014 at 19:15, Paolo Bonzini wrote:
> 
> 
> On 05/12/2014 17:06, Kevin Wolf wrote:
> > If the queue array for io_submit() is already full, but a new request
> > arrives, we cannot add it to that queue anymore. We can, however, use a
> > CoQueue, which is implemented as a list and can therefore queue as many
> > requests as we want.
> > 
> > Signed-off-by: Kevin Wolf
> > ---
> >  block/linux-aio.c | 31 ++++++++++++++++++++++++++++++-----
> >  1 file changed, 26 insertions(+), 5 deletions(-)
> > 
> > diff --git a/block/linux-aio.c b/block/linux-aio.c
> > index 373ec4b..8e6328b 100644
> > --- a/block/linux-aio.c
> > +++ b/block/linux-aio.c
> > @@ -44,6 +44,7 @@ typedef struct {
> >      int plugged;
> >      unsigned int size;
> >      unsigned int idx;
> > +    CoQueue waiting;
> >  } LaioQueue;
> > 
> >  struct qemu_laio_state {
> > @@ -160,6 +161,8 @@ static void ioq_init(LaioQueue *io_q)
> >      io_q->size = MAX_QUEUED_IO;
> >      io_q->idx = 0;
> >      io_q->plugged = 0;
> > +
> > +    qemu_co_queue_init(&io_q->waiting);
> >  }
> > 
> >  static int ioq_submit(struct qemu_laio_state *s)
> > @@ -201,15 +204,29 @@ static int ioq_submit(struct qemu_laio_state *s)
> >                  s->io_q.idx * sizeof(s->io_q.iocbs[0]));
> >      }
> > 
> > +    /* Now there should be room for some more requests */
> > +    if (!qemu_co_queue_empty(&s->io_q.waiting)) {
> > +        if (qemu_in_coroutine()) {
> > +            qemu_co_queue_next(&s->io_q.waiting);
> > +        } else {
> > +            qemu_co_enter_next(&s->io_q.waiting);
> 
> We should get better performance by wrapping these with
> plug/unplug.  Trivial for the qemu_co_enter_next case, much less for
> qemu_co_queue_next...

We can probably just use qemu_co_enter_next() everywhere. The only
reason why I put a qemu_co_queue_next() there was that it saves a
coroutine switch - probably premature optimisation anyway...
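
To make the plug/unplug idea concrete, the wakeup side could then look
roughly like this (completely untested sketch - the helper name is
invented, and manipulating the plugged counter directly is only for
illustration):

static void ioq_wake_waiting(struct qemu_laio_state *s)
{
    /* Plug the queue so that requests resubmitted by the woken
     * coroutines are batched into a single io_submit() call instead
     * of going out one by one. */
    s->io_q.plugged++;

    /* Only wake as many waiters as there are free slots; a woken
     * request that finds the array full again would only re-queue
     * itself. */
    while (s->io_q.idx < s->io_q.size &&
           qemu_co_enter_next(&s->io_q.waiting)) {
        /* each iteration resumes one waiting request */
    }

    /* Unplug and flush whatever the woken requests queued up */
    if (--s->io_q.plugged == 0 && s->io_q.idx > 0) {
        ioq_submit(s);
    }
}

Calling ioq_submit() from here obviously needs some care when the
helper is itself invoked from ioq_submit(); in the simple case the
normal unplug path would do the flushing anyway.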
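
For completeness, since the hunk isn't visible in the quote above: the
submit path that goes with this is conceptually just a loop around
qemu_co_queue_wait() (again only a sketch - the helper name is made
up, the fields are the ones from the patch):

static void coroutine_fn ioq_enqueue(struct qemu_laio_state *s,
                                     struct iocb *iocb)
{
    while (s->io_q.idx == s->io_q.size) {
        /* The iocb array is full. Instead of failing the request
         * with -EAGAIN, park this coroutine on the list-based
         * CoQueue; ioq_submit() wakes us up once it has drained
         * some slots. */
        qemu_co_queue_wait(&s->io_q.waiting);
    }

    s->io_q.iocbs[s->io_q.idx++] = iocb;
}

That is the point of the patch: the array keeps its fixed size of
MAX_QUEUED_IO entries for io_submit(), while the CoQueue, being a
simple list, can hold an arbitrary number of waiting requests.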
> This exposes what I think is the main wrinkle in these patches: I'm not
> sure linux-aio is a great match for the coroutine architecture.  You
> introduce some infrastructure duplication with block.c to track
> coroutines, and I don't find the coroutine code to be an improvement
> over Ming Lei's asynchronous one---in fact I actually find it more
> complicated.

Really? I found the callback-based one really ugly: it introduces new
BHs and an additional state for a queue that is being aborted, which
must be considered everywhere. The resulting code from this
coroutine-based series strikes me as rather clean in comparison. I
honestly expected that people would debate whether it does the right
thing, but that nobody would disagree that it looks nicer - but maybe
it's a matter of taste.

Also note that this specific patch does an additional step that isn't
part of Ming's series: his series simply lets requests fail if the
queue is full.

And regardless of that (though I do find readability important), my
benchmarks suggest that without this conversion, the other
optimisations in the queue don't work that well. The fastest
performance I've seen so far - including both the coroutine and the
callback based versions - has this conversion applied (measured
without patches 4-6 yet, though).

Kevin