From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anthony Liguori
Date: Tue, 09 Oct 2012 13:26:05 -0500
Message-ID: <87ehl7lcxu.fsf@codemonkey.ws>
In-Reply-To: <5074502F.5030706@redhat.com>
References: <1348577763-12920-1-git-send-email-pbonzini@redhat.com>
 <20121008113932.GB16332@stefanha-thinkpad.redhat.com>
 <5072CE54.8020208@redhat.com>
 <20121009090811.GB13775@stefanha-thinkpad.redhat.com>
 <877gqzn0xc.fsf@codemonkey.ws>
 <50743D91.4010900@redhat.com>
 <87391n8xmq.fsf@codemonkey.ws>
 <5074502F.5030706@redhat.com>
Subject: Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
To: Paolo Bonzini
Cc: Kevin Wolf, Stefan Hajnoczi, Ping Fan Liu, qemu-devel@nongnu.org, Avi Kivity

Paolo Bonzini writes:

> Il 09/10/2012 17:37, Anthony Liguori ha scritto:
>>>> In the very short term, I can imagine an aio fastpath that was only
>>>> implemented in terms of the device API.  We could have a slow path
>>>> that acquired the BQL.
>>>
>>> Not sure I follow.
>>
>> As long as the ioeventfd thread can acquire qemu_mutex in order to
>> call bdrv_* functions, the new device-only API could do this under
>> the covers for everything but the linux-aio fast path initially.
>
> Ok, so it's about the locking.  I'm not even sure we need locking if
> we have cooperative multitasking.  For example, if bdrv_aio_readv/writev
> is called from a VCPU thread, it can just schedule a bottom half for
> itself in the appropriate AioContext.  Similarly for block jobs.

Okay, let's separate out the two issues here, though.  One is whether
we need a device-specific block API.  The second is whether we should
shortcut to a fast path in the short term and go after a fully unlocked
bdrv_ layer in the long (shortish?) term.

So let's talk about your proposal...

> The only part where I'm not sure how it would work is bdrv_read/write,
> because of the strange "qemu_aio_wait() calls select with a lock
> taken".  Maybe we can just forbid synchronous I/O if you set a
> non-default AioContext.

Not sure how practical that is.  There is an awful lot of sync I/O
still left.
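Going back to your bottom-half idea for a second, just so we're talking
about the same thing: here's roughly the VCPU-side deferral I think you
mean.  This is purely illustrative; bdrv_get_aio_context() and the
DeferredRW bounce struct are made up, and I'm assuming the aio_bh_new()
from your series:

/* Illustrative sketch: bounce a request from a VCPU thread into the
 * BlockDriverState's AioContext via a bottom half, so the bdrv_* call
 * runs in that context's thread without taking the global mutex.
 * bdrv_get_aio_context() and DeferredRW are made up for this sketch. */

typedef struct DeferredRW {
    BlockDriverState *bs;
    int64_t sector_num;
    QEMUIOVector *qiov;
    int nb_sectors;
    BlockDriverCompletionFunc *cb;
    void *opaque;
    QEMUBH *bh;
} DeferredRW;

static void deferred_readv_bh(void *opaque)
{
    DeferredRW *d = opaque;

    qemu_bh_delete(d->bh);
    /* Now running in the AioContext's thread: no locking needed. */
    bdrv_aio_readv(d->bs, d->sector_num, d->qiov, d->nb_sectors,
                   d->cb, d->opaque);
    g_free(d);
}

static void vcpu_submit_readv(BlockDriverState *bs, int64_t sector_num,
                              QEMUIOVector *qiov, int nb_sectors,
                              BlockDriverCompletionFunc *cb, void *opaque)
{
    DeferredRW *d = g_new0(DeferredRW, 1);

    d->bs = bs;
    d->sector_num = sector_num;
    d->qiov = qiov;
    d->nb_sectors = nb_sectors;
    d->cb = cb;
    d->opaque = opaque;
    d->bh = aio_bh_new(bdrv_get_aio_context(bs), deferred_readv_bh, d);
    qemu_bh_schedule(d->bh);
}

If that matches what you have in mind, the open question is which
thread the completion callback fires in; the device-only API would need
to define that.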
> This would be entirely hidden in the block layer.  For example, the
> following does it for bdrv_aio_readv/writev:
>
> diff --git a/block.c b/block.c
> index e95f613..7165e82 100644
> --- a/block.c
> +++ b/block.c
> @@ -3712,15 +3712,6 @@ static AIOPool bdrv_em_co_aio_pool = {
>      .cancel = bdrv_aio_co_cancel_em,
>  };
>
> -static void bdrv_co_em_bh(void *opaque)
> -{
> -    BlockDriverAIOCBCoroutine *acb = opaque;
> -
> -    acb->common.cb(acb->common.opaque, acb->req.error);
> -    qemu_bh_delete(acb->bh);
> -    qemu_aio_release(acb);
> -}
> -
>  /* Invoke bdrv_co_do_readv/bdrv_co_do_writev */
>  static void coroutine_fn bdrv_co_do_rw(void *opaque)
>  {
> @@ -3735,8 +3726,18 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
>                                acb->req.nb_sectors, acb->req.qiov, 0);
>      }
>
> -    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
> -    qemu_bh_schedule(acb->bh);
> +    acb->common.cb(acb->common.opaque, acb->req.error);
> +    qemu_aio_release(acb);
> +}
> +
> +static void bdrv_co_em_bh(void *opaque)
> +{
> +    BlockDriverAIOCBCoroutine *acb = opaque;
> +    Coroutine *co;
> +
> +    qemu_bh_delete(acb->bh);
> +    co = qemu_coroutine_create(bdrv_co_do_rw);
> +    qemu_coroutine_enter(co, acb);
>  }
>
>  static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
> @@ -3756,8 +3757,8 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
>      acb->req.qiov = qiov;
>      acb->is_write = is_write;
>
> -    co = qemu_coroutine_create(bdrv_co_do_rw);
> -    qemu_coroutine_enter(co, acb);
> +    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
> +    qemu_bh_schedule(acb->bh);
>
>      return &acb->common;
>  }
>
> Then we can add a bdrv_aio_readv/writev_unlocked API to the protocols,
> which would run outside the bottom half and provide the desired fast
> path.

This works for some of the block layer, I think, but how does it
interact with the thread pool used for AIO?  And it wouldn't work well
with things like NBD or curl, right?  What's the plan there?

Regards,

Anthony Liguori

>
> Paolo
>
>> That means that we can convert block devices to use the device-only
>> API across the board (provided we make BQL recursive).
>>
>> It also means we get at least some of the benefits of data-plane in
>> the short term.
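P.S.  To be concrete about the short-term shortcut I mentioned above:
something like the following is what I have in mind for the device-only
API dispatching under the covers.  The bdrv_aio_readv_unlocked hook and
the device_aio_readv wrapper are hypothetical; only the BQL fallback
uses functions that exist today:

/* Hypothetical sketch of "the slow path acquires the BQL": if a
 * protocol provides an unlocked hook (say, raw + linux-aio), call it
 * directly from the ioeventfd thread; otherwise take the global mutex
 * around the existing bdrv_aio_readv().  The bdrv_aio_readv_unlocked
 * member does not exist today. */
static BlockDriverAIOCB *device_aio_readv(BlockDriverState *bs,
                                          int64_t sector_num,
                                          QEMUIOVector *qiov,
                                          int nb_sectors,
                                          BlockDriverCompletionFunc *cb,
                                          void *opaque)
{
    BlockDriverAIOCB *acb;

    if (bs->drv && bs->drv->bdrv_aio_readv_unlocked) {
        /* Fast path: submit without the global mutex. */
        return bs->drv->bdrv_aio_readv_unlocked(bs, sector_num, qiov,
                                                nb_sectors, cb, opaque);
    }

    /* Slow path for NBD, curl, etc.: take the BQL and use the
     * existing locked implementation. */
    qemu_mutex_lock_iothread();
    acb = bdrv_aio_readv(bs, sector_num, qiov, nb_sectors, cb, opaque);
    qemu_mutex_unlock_iothread();
    return acb;
}

That would let us convert devices to the new API across the board now
and peel protocols out of the BQL one at a time.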