From: Paolo Bonzini
Date: Wed, 10 Oct 2012 09:11:39 +0200
Subject: Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
To: Anthony Liguori
Cc: Kevin Wolf, Stefan Hajnoczi, Ping Fan Liu, qemu-devel@nongnu.org,
 Avi Kivity
Message-ID: <50751FAB.1000201@redhat.com>
In-Reply-To: <87ehl7lcxu.fsf@codemonkey.ws>

On 09/10/2012 20:26, Anthony Liguori wrote:
> Paolo Bonzini writes:
>
>> On 09/10/2012 17:37, Anthony Liguori wrote:
>>>>>>> In the very short term, I can imagine an aio fastpath that was
>>>>>>> only implemented in terms of the device API.  We could have a
>>>>>>> slow path that acquired the BQL.
>>>>>
>>>>> Not sure I follow.
>>>
>>> As long as the ioeventfd thread can acquire qemu_mutex in order to
>>> call bdrv_* functions.  The new device-only API could do this under
>>> the covers for everything but the linux-aio fast path initially.
>>
>> Ok, so it's about the locking.  I'm not even sure we need locking if
>> we have cooperative multitasking.  For example, if bdrv_aio_readv/writev
>> is called from a VCPU thread, it can just schedule a bottom half for
>> itself in the appropriate AioContext.  Similarly for block jobs.
>
> Okay, let's separate out the two issues here, though.  One is whether
> we need a device-specific block API.  The second is whether we should
> shortcut to a fast path in the short term and go after a fully
> unlocked bdrv_ layer in the long (shortish?) term.
>
> So let's talk about your proposal...
>
>> The only part where I'm not sure how it would work is bdrv_read/write,
>> because of the strange "qemu_aio_wait() calls select with a lock
>> taken".  Maybe we can just forbid synchronous I/O if you set a
>> non-default AioContext.
>
> Not sure how practical that is.  There is an awful lot of sync I/O
> still left.

Hmm, yeah, perhaps we need to bite the bullet and use a recursive lock.
The lock would be taken by:

- sync I/O ops
- monitor commands that currently call bdrv_drain_all
- aio_poll when calling bottom halves or handlers

The rest of the proposal, however, would still stand (especially with
reference to block jobs).

I think we can proceed incrementally.  The first obvious step is to
s/qemu_bh_new/aio_bh_new/ in the whole block layer (including the
CoQueue stuff), which would also help fix the qemu-char bug that Jan
reported.
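A minimal sketch of what that s/qemu_bh_new/aio_bh_new/ step could look
like at a completion call site: the bottom half gets bound to the
AioContext the BlockDriverState lives in instead of the main loop.  The
ExampleAIOCB type, the example_* function names and the
bdrv_get_aio_context() helper are illustrative assumptions rather than
code from the series, and the header paths are only approximate for a
tree of that period.

#include "qemu-common.h"
#include "block.h"      /* BlockDriverState, BlockDriverCompletionFunc */
#include "qemu-aio.h"   /* AioContext, aio_bh_new(), QEMUBH */

/* Illustrative AIOCB-like request state; not a type from the tree. */
typedef struct ExampleAIOCB {
    BlockDriverState *bs;
    QEMUBH *bh;
    BlockDriverCompletionFunc *cb;
    void *opaque;
    int ret;
} ExampleAIOCB;

static void example_complete_bh(void *opaque)
{
    ExampleAIOCB *acb = opaque;

    qemu_bh_delete(acb->bh);
    acb->cb(acb->opaque, acb->ret);
    g_free(acb);
}

/* Before: qemu_bh_new() always binds the bottom half to the main loop. */
static void example_complete_main_loop(ExampleAIOCB *acb)
{
    acb->bh = qemu_bh_new(example_complete_bh, acb);
    qemu_bh_schedule(acb->bh);
}

/* After: the bottom half is bound to whatever AioContext the
 * BlockDriverState has been moved to, so it runs in the dataplane
 * thread when the device has one.  bdrv_get_aio_context() is an
 * assumed helper here, not something introduced by the RFC series. */
static void example_complete_aio_context(ExampleAIOCB *acb)
{
    AioContext *ctx = bdrv_get_aio_context(acb->bs);

    acb->bh = aio_bh_new(ctx, example_complete_bh, acb);
    qemu_bh_schedule(acb->bh);
}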
>> This would be entirely hidden in the block layer.  For example, the
>> following does it for bdrv_aio_readv/writev:
>>
>> diff --git a/block.c b/block.c
>> index e95f613..7165e82 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -3712,15 +3712,6 @@ static AIOPool bdrv_em_co_aio_pool = {
>>      .cancel = bdrv_aio_co_cancel_em,
>>  };
>>
>> -static void bdrv_co_em_bh(void *opaque)
>> -{
>> -    BlockDriverAIOCBCoroutine *acb = opaque;
>> -
>> -    acb->common.cb(acb->common.opaque, acb->req.error);
>> -    qemu_bh_delete(acb->bh);
>> -    qemu_aio_release(acb);
>> -}
>> -
>>  /* Invoke bdrv_co_do_readv/bdrv_co_do_writev */
>>  static void coroutine_fn bdrv_co_do_rw(void *opaque)
>>  {
>> @@ -3735,8 +3726,17 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
>>                                acb->req.nb_sectors, acb->req.qiov, 0);
>>      }
>>
>> -    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
>> -    qemu_bh_schedule(acb->bh);
>> +    acb->common.cb(acb->common.opaque, acb->req.error);
>> +    qemu_aio_release(acb);
>> +}
>> +
>> +static void bdrv_co_em_bh(void *opaque)
>> +{
>> +    BlockDriverAIOCBCoroutine *acb = opaque;
>> +    Coroutine *co = qemu_coroutine_create(bdrv_co_do_rw);
>> +
>> +    qemu_bh_delete(acb->bh);
>> +    qemu_coroutine_enter(co, acb);
>>  }
>>
>>  static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
>> @@ -3756,8 +3756,8 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
>>      acb->req.qiov = qiov;
>>      acb->is_write = is_write;
>>
>> -    co = qemu_coroutine_create(bdrv_co_do_rw);
>> -    qemu_coroutine_enter(co, acb);
>> +    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
>> +    qemu_bh_schedule(acb->bh);
>>
>>      return &acb->common;
>>  }
>>
>> Then we can add a bdrv_aio_readv/writev_unlocked API to the protocols,
>> which would run outside the bottom half and provide the desired fast
>> path.
>
> This works for some of the block layer, I think.  How does this
> interact with thread pools for AIO?
>
> But this wouldn't work well with things like NBD or curl, right?
> What's the plan there?

NBD uses coroutines; curl can use the non-unlocked bdrv_aio_readv/writev.
In both cases they would execute in the dataplane thread.  qcow2-over-raw
would also execute its read/write code entirely from the dataplane
thread, for example.

Paolo
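A minimal sketch of how the bdrv_aio_readv/writev_unlocked idea above
could be consumed: a device-facing helper that prefers an unlocked
protocol hook and otherwise falls back to the ordinary bdrv_aio_readv(),
which with the patch quoted above defers into the right AioContext via
a bottom half.  The example_* names, the ExampleUnlockedReadv typedef
and the whole dispatch are hypothetical; only bdrv_aio_readv() and the
core types are real, and header paths are approximate for that period.

#include "qemu-common.h"
#include "block.h"   /* bdrv_aio_readv(), BlockDriverAIOCB, QEMUIOVector */

/* Hypothetical signature of an unlocked protocol hook.  In a real
 * series this would presumably be a new member of BlockDriver,
 * implemented only by protocols that can submit I/O without the
 * global mutex (e.g. raw over linux-aio). */
typedef BlockDriverAIOCB *ExampleUnlockedReadv(BlockDriverState *bs,
                                               int64_t sector_num,
                                               QEMUIOVector *qiov,
                                               int nb_sectors,
                                               BlockDriverCompletionFunc *cb,
                                               void *opaque);

static ExampleUnlockedReadv *example_get_unlocked_readv(BlockDriverState *bs)
{
    /* Placeholder: with a real BlockDriver member this would return
     * something like bs->drv->bdrv_aio_readv_unlocked. */
    return NULL;
}

/* Device-facing helper: take the fast path when the protocol offers
 * one, otherwise go through the normal, BH-deferred slow path. */
static BlockDriverAIOCB *example_aio_readv(BlockDriverState *bs,
                                           int64_t sector_num,
                                           QEMUIOVector *qiov,
                                           int nb_sectors,
                                           BlockDriverCompletionFunc *cb,
                                           void *opaque)
{
    ExampleUnlockedReadv *fast = example_get_unlocked_readv(bs);

    if (fast) {
        /* Runs directly in the dataplane thread, no global mutex. */
        return fast(bs, sector_num, qiov, nb_sectors, cb, opaque);
    }
    return bdrv_aio_readv(bs, sector_num, qiov, nb_sectors, cb, opaque);
}

Keeping the hook behind a standalone helper is only to keep the sketch
self-contained; the real dispatch would live inside block.c rather than
in device code.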