From: Anthony Liguori <aliguori@us.ibm.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
Stefan Hajnoczi <stefanha@gmail.com>,
Ping Fan Liu <pingfank@linux.vnet.ibm.com>,
qemu-devel@nongnu.org, Avi Kivity <avi@redhat.com>
Subject: Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
Date: Wed, 10 Oct 2012 07:25:12 -0500 [thread overview]
Message-ID: <87haq2ldjr.fsf@codemonkey.ws> (raw)
In-Reply-To: <50751FAB.1000201@redhat.com>
Paolo Bonzini <pbonzini@redhat.com> writes:
> Il 09/10/2012 20:26, Anthony Liguori ha scritto:
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>>
>>> Il 09/10/2012 17:37, Anthony Liguori ha scritto:
>>>>>>>> In the very short term, I can imagine an aio fastpath that was only
>>>>>>>> implemented in terms of the device API. We could have a slow path that
>>>>>>>> acquired the BQL.
>>>>>>
>>>>>> Not sure I follow.
>>>>
>>>> As long as the ioeventfd thread can acquire qemu_mutex in order to call
>>>> bdrv_* functions. The new device-only API could do this under the
>>>> covers for everything but the linux-aio fast path initially.
>>>
>>> Ok, so it's about the locking. I'm not even sure we need locking if we
>>> have cooperative multitasking. For example if bdrv_aio_readv/writev
>>> is called from a VCPU thread, it can just schedule a bottom half for
>>> itself in the appropriate AioContext. Similarly for block jobs.
>>
>> Okay, let's separate out the two issues here though. One is whether we
>> need a device specific block API. The second is whether we should short
>> cut to a fast path in the short term and go after a fully unlocked bdrv_
>> layer in the long(shortish?) term.
>>
>> So let's talk about your proposal...
>>
>>> The only part where I'm not sure how it would work is bdrv_read/write,
>>> because of the strange "qemu_aio_wait() calls select with a lock taken".
>>> Maybe we can just forbid synchronous I/O if you set a non-default
>>> AioContext.
>>
>> Not sure how practical that is. The is an awful lot of sync I/O still left.
>
> Hmm, yeah, perhaps we need to bite the bullet and use a recursive lock.
> The lock would be taken by:
>
> - sync I/O ops
>
> - monitor commands that currently call bdrv_drain_all
>
> - aio_poll when calling bottom halves or handlers
>
> The rest of the proposal however would stand (especially with reference
> to block jobs).
>
> I think we can proceed incrementally. The first obvious step is to
> s/qemu_bh_new/aio_bh_new/ in the whole block layer (including the
> CoQueue stuff), which would also help fixing the qemu-char bug that Jan
> reported.
>
>>> This would be entirely hidden in the block layer. For example the
>>> following does it for bdrv_aio_readv/writev:
>>>
>>> diff --git a/block.c b/block.c
>>> index e95f613..7165e82 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -3712,15 +3712,6 @@ static AIOPool bdrv_em_co_aio_pool = {
>>> .cancel = bdrv_aio_co_cancel_em,
>>> };
>>>
>>> -static void bdrv_co_em_bh(void *opaque)
>>> -{
>>> - BlockDriverAIOCBCoroutine *acb = opaque;
>>> -
>>> - acb->common.cb(acb->common.opaque, acb->req.error);
>>> - qemu_bh_delete(acb->bh);
>>> - qemu_aio_release(acb);
>>> -}
>>> -
>>> /* Invoke bdrv_co_do_readv/bdrv_co_do_writev */
>>> static void coroutine_fn bdrv_co_do_rw(void *opaque)
>>> {
>>> @@ -3735,8 +3726,17 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
>>> acb->req.nb_sectors, acb->req.qiov, 0);
>>> }
>>>
>>> - acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
>>> - qemu_bh_schedule(acb->bh);
>>> + acb->common.cb(acb->common.opaque, acb->req.error);
>>> + qemu_aio_release(acb);
>>> +}
>>> +
>>> +static void bdrv_co_em_bh(void *opaque)
>>> +{
>>> + BlockDriverAIOCBCoroutine *acb = opaque;
>>> +
>>> + qemu_bh_delete(acb->bh);
>>> + co = qemu_coroutine_create(bdrv_co_do_rw);
>>> + qemu_coroutine_enter(co, acb);
>>> }
>>>
>>> static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
>>> @@ -3756,8 +3756,8 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
>>> acb->req.qiov = qiov;
>>> acb->is_write = is_write;
>>>
>>> - co = qemu_coroutine_create(bdrv_co_do_rw);
>>> - qemu_coroutine_enter(co, acb);
>>> + acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
>>> + qemu_bh_schedule(acb->bh);
>>>
>>> return &acb->common;
>>> }
>>>
>>>
>>> Then we can add a bdrv_aio_readv/writev_unlocked API to the protocols, which
>>> would run outside the bottom half and provide the desired fast path.
>>
>> This works for some of the block layer I think. How does this interact
>> with thread pools for AIO?
>>
>> But this wouldn't work well with things like NBD or curl, right? What's
>> the plan there?
>
> NBD uses coroutines; curl can use the non-unlocked
> bdrv_aio_readv/writev. In both cases they would execute in the
> dataplane thread. qcow2-over-raw would also execute its read/write code
> entirely from the dataplane thread, for example.
Does that mean that we'd stop processing the queue if we're waiting for
an I/O completion to handle meta data operations?
If that's the case, that probably will hurt performance overall.
Regards,
Anthony Liguori
>
> Paolo
next prev parent reply other threads:[~2012-10-10 12:25 UTC|newest]
Thread overview: 72+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 01/17] build: do not rely on indirect inclusion of qemu-config.h Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 02/17] event_notifier: enable it to use pipes Paolo Bonzini
2012-10-08 7:03 ` Stefan Hajnoczi
2012-09-25 12:55 ` [Qemu-devel] [PATCH 03/17] event_notifier: add Win32 implementation Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 04/17] aio: change qemu_aio_set_fd_handler to return void Paolo Bonzini
2012-09-25 21:47 ` Anthony Liguori
2012-09-25 12:55 ` [Qemu-devel] [PATCH 05/17] aio: provide platform-independent API Paolo Bonzini
2012-09-25 21:48 ` Anthony Liguori
2012-09-25 12:55 ` [Qemu-devel] [PATCH 06/17] aio: introduce AioContext, move bottom halves there Paolo Bonzini
2012-09-25 21:51 ` Anthony Liguori
2012-09-26 6:30 ` Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 07/17] aio: add I/O handlers to the AioContext interface Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 08/17] aio: add non-blocking variant of aio_wait Paolo Bonzini
2012-09-25 21:56 ` Anthony Liguori
2012-09-25 12:55 ` [Qemu-devel] [PATCH 09/17] aio: prepare for introducing GSource-based dispatch Paolo Bonzini
2012-09-25 22:01 ` Anthony Liguori
2012-09-26 6:36 ` Paolo Bonzini
2012-09-26 6:48 ` Paolo Bonzini
2012-09-29 11:28 ` Blue Swirl
2012-10-01 6:40 ` Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 10/17] aio: add Win32 implementation Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 11/17] aio: make AioContexts GSources Paolo Bonzini
2012-09-25 22:06 ` Anthony Liguori
2012-09-26 6:40 ` Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 12/17] aio: add aio_notify Paolo Bonzini
2012-09-25 22:07 ` Anthony Liguori
2012-09-25 12:55 ` [Qemu-devel] [PATCH 13/17] aio: call aio_notify after setting I/O handlers Paolo Bonzini
2012-09-25 22:07 ` Anthony Liguori
2012-09-25 12:56 ` [Qemu-devel] [PATCH 14/17] main-loop: use GSource to poll AIO file descriptors Paolo Bonzini
2012-09-25 22:09 ` Anthony Liguori
2012-09-26 6:38 ` Paolo Bonzini
2012-09-25 12:56 ` [Qemu-devel] [PATCH 15/17] main-loop: use aio_notify for qemu_notify_event Paolo Bonzini
2012-09-25 22:10 ` Anthony Liguori
2012-09-25 12:56 ` [Qemu-devel] [PATCH 16/17] aio: clean up now-unused functions Paolo Bonzini
2012-09-25 22:11 ` Anthony Liguori
2012-09-25 12:56 ` [Qemu-devel] [PATCH 17/17] linux-aio: use event notifiers Paolo Bonzini
2012-09-26 12:28 ` [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Kevin Wolf
2012-09-26 13:32 ` Paolo Bonzini
2012-09-26 14:31 ` Kevin Wolf
2012-09-26 15:48 ` Paolo Bonzini
2012-09-27 7:11 ` Kevin Wolf
2012-09-27 7:43 ` Paolo Bonzini
2012-10-08 11:39 ` Stefan Hajnoczi
2012-10-08 13:00 ` Paolo Bonzini
2012-10-09 9:08 ` [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts"" Stefan Hajnoczi
2012-10-09 9:26 ` Avi Kivity
2012-10-09 10:36 ` Paolo Bonzini
2012-10-09 10:52 ` Avi Kivity
2012-10-09 11:08 ` Paolo Bonzini
2012-10-09 11:55 ` Avi Kivity
2012-10-09 12:01 ` Paolo Bonzini
2012-10-09 12:18 ` Jan Kiszka
2012-10-09 12:28 ` Avi Kivity
2012-10-09 12:22 ` Avi Kivity
2012-10-09 13:11 ` Paolo Bonzini
2012-10-09 13:21 ` Avi Kivity
2012-10-09 13:50 ` Paolo Bonzini
2012-10-09 14:24 ` Avi Kivity
2012-10-09 14:35 ` Paolo Bonzini
2012-10-09 14:41 ` Avi Kivity
2012-10-09 14:05 ` Stefan Hajnoczi
2012-10-09 15:02 ` Anthony Liguori
2012-10-09 15:06 ` Paolo Bonzini
2012-10-09 15:37 ` Anthony Liguori
2012-10-09 16:26 ` Paolo Bonzini
2012-10-09 18:26 ` Anthony Liguori
2012-10-10 7:11 ` Paolo Bonzini
2012-10-10 12:25 ` Anthony Liguori [this message]
2012-10-10 13:31 ` Paolo Bonzini
2012-10-10 14:44 ` Anthony Liguori
2012-10-11 12:28 ` Kevin Wolf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87haq2ldjr.fsf@codemonkey.ws \
--to=aliguori@us.ibm.com \
--cc=avi@redhat.com \
--cc=kwolf@redhat.com \
--cc=pbonzini@redhat.com \
--cc=pingfank@linux.vnet.ibm.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).