From: Paolo Bonzini <pbonzini@redhat.com>
To: Anthony Liguori <aliguori@us.ibm.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
Stefan Hajnoczi <stefanha@gmail.com>,
Ping Fan Liu <pingfank@linux.vnet.ibm.com>,
qemu-devel@nongnu.org, Avi Kivity <avi@redhat.com>
Subject: Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
Date: Wed, 10 Oct 2012 09:11:39 +0200 [thread overview]
Message-ID: <50751FAB.1000201@redhat.com> (raw)
In-Reply-To: <87ehl7lcxu.fsf@codemonkey.ws>
Il 09/10/2012 20:26, Anthony Liguori ha scritto:
> Paolo Bonzini <pbonzini@redhat.com> writes:
>
>> Il 09/10/2012 17:37, Anthony Liguori ha scritto:
>>>>>>> In the very short term, I can imagine an aio fastpath that was only
>>>>>>> implemented in terms of the device API. We could have a slow path that
>>>>>>> acquired the BQL.
>>>>>
>>>>> Not sure I follow.
>>>
>>> As long as the ioeventfd thread can acquire qemu_mutex in order to call
>>> bdrv_* functions. The new device-only API could do this under the
>>> covers for everything but the linux-aio fast path initially.
>>
>> Ok, so it's about the locking. I'm not even sure we need locking if we
>> have cooperative multitasking. For example if bdrv_aio_readv/writev
>> is called from a VCPU thread, it can just schedule a bottom half for
>> itself in the appropriate AioContext. Similarly for block jobs.
>
> Okay, let's separate out the two issues here though. One is whether we
> need a device specific block API. The second is whether we should short
> cut to a fast path in the short term and go after a fully unlocked bdrv_
> layer in the long(shortish?) term.
>
> So let's talk about your proposal...
>
>> The only part where I'm not sure how it would work is bdrv_read/write,
>> because of the strange "qemu_aio_wait() calls select with a lock taken".
>> Maybe we can just forbid synchronous I/O if you set a non-default
>> AioContext.
>
> Not sure how practical that is. The is an awful lot of sync I/O still left.
Hmm, yeah, perhaps we need to bite the bullet and use a recursive lock.
The lock would be taken by:
- sync I/O ops
- monitor commands that currently call bdrv_drain_all
- aio_poll when calling bottom halves or handlers
The rest of the proposal however would stand (especially with reference
to block jobs).
I think we can proceed incrementally. The first obvious step is to
s/qemu_bh_new/aio_bh_new/ in the whole block layer (including the
CoQueue stuff), which would also help fixing the qemu-char bug that Jan
reported.
>> This would be entirely hidden in the block layer. For example the
>> following does it for bdrv_aio_readv/writev:
>>
>> diff --git a/block.c b/block.c
>> index e95f613..7165e82 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -3712,15 +3712,6 @@ static AIOPool bdrv_em_co_aio_pool = {
>> .cancel = bdrv_aio_co_cancel_em,
>> };
>>
>> -static void bdrv_co_em_bh(void *opaque)
>> -{
>> - BlockDriverAIOCBCoroutine *acb = opaque;
>> -
>> - acb->common.cb(acb->common.opaque, acb->req.error);
>> - qemu_bh_delete(acb->bh);
>> - qemu_aio_release(acb);
>> -}
>> -
>> /* Invoke bdrv_co_do_readv/bdrv_co_do_writev */
>> static void coroutine_fn bdrv_co_do_rw(void *opaque)
>> {
>> @@ -3735,8 +3726,17 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
>> acb->req.nb_sectors, acb->req.qiov, 0);
>> }
>>
>> - acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
>> - qemu_bh_schedule(acb->bh);
>> + acb->common.cb(acb->common.opaque, acb->req.error);
>> + qemu_aio_release(acb);
>> +}
>> +
>> +static void bdrv_co_em_bh(void *opaque)
>> +{
>> + BlockDriverAIOCBCoroutine *acb = opaque;
>> +
>> + qemu_bh_delete(acb->bh);
>> + co = qemu_coroutine_create(bdrv_co_do_rw);
>> + qemu_coroutine_enter(co, acb);
>> }
>>
>> static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
>> @@ -3756,8 +3756,8 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
>> acb->req.qiov = qiov;
>> acb->is_write = is_write;
>>
>> - co = qemu_coroutine_create(bdrv_co_do_rw);
>> - qemu_coroutine_enter(co, acb);
>> + acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
>> + qemu_bh_schedule(acb->bh);
>>
>> return &acb->common;
>> }
>>
>>
>> Then we can add a bdrv_aio_readv/writev_unlocked API to the protocols, which
>> would run outside the bottom half and provide the desired fast path.
>
> This works for some of the block layer I think. How does this interact
> with thread pools for AIO?
>
> But this wouldn't work well with things like NBD or curl, right? What's
> the plan there?
NBD uses coroutines; curl can use the non-unlocked
bdrv_aio_readv/writev. In both cases they would execute in the
dataplane thread. qcow2-over-raw would also execute its read/write code
entirely from the dataplane thread, for example.
Paolo
next prev parent reply other threads:[~2012-10-10 7:12 UTC|newest]
Thread overview: 72+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 01/17] build: do not rely on indirect inclusion of qemu-config.h Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 02/17] event_notifier: enable it to use pipes Paolo Bonzini
2012-10-08 7:03 ` Stefan Hajnoczi
2012-09-25 12:55 ` [Qemu-devel] [PATCH 03/17] event_notifier: add Win32 implementation Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 04/17] aio: change qemu_aio_set_fd_handler to return void Paolo Bonzini
2012-09-25 21:47 ` Anthony Liguori
2012-09-25 12:55 ` [Qemu-devel] [PATCH 05/17] aio: provide platform-independent API Paolo Bonzini
2012-09-25 21:48 ` Anthony Liguori
2012-09-25 12:55 ` [Qemu-devel] [PATCH 06/17] aio: introduce AioContext, move bottom halves there Paolo Bonzini
2012-09-25 21:51 ` Anthony Liguori
2012-09-26 6:30 ` Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 07/17] aio: add I/O handlers to the AioContext interface Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 08/17] aio: add non-blocking variant of aio_wait Paolo Bonzini
2012-09-25 21:56 ` Anthony Liguori
2012-09-25 12:55 ` [Qemu-devel] [PATCH 09/17] aio: prepare for introducing GSource-based dispatch Paolo Bonzini
2012-09-25 22:01 ` Anthony Liguori
2012-09-26 6:36 ` Paolo Bonzini
2012-09-26 6:48 ` Paolo Bonzini
2012-09-29 11:28 ` Blue Swirl
2012-10-01 6:40 ` Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 10/17] aio: add Win32 implementation Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 11/17] aio: make AioContexts GSources Paolo Bonzini
2012-09-25 22:06 ` Anthony Liguori
2012-09-26 6:40 ` Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 12/17] aio: add aio_notify Paolo Bonzini
2012-09-25 22:07 ` Anthony Liguori
2012-09-25 12:55 ` [Qemu-devel] [PATCH 13/17] aio: call aio_notify after setting I/O handlers Paolo Bonzini
2012-09-25 22:07 ` Anthony Liguori
2012-09-25 12:56 ` [Qemu-devel] [PATCH 14/17] main-loop: use GSource to poll AIO file descriptors Paolo Bonzini
2012-09-25 22:09 ` Anthony Liguori
2012-09-26 6:38 ` Paolo Bonzini
2012-09-25 12:56 ` [Qemu-devel] [PATCH 15/17] main-loop: use aio_notify for qemu_notify_event Paolo Bonzini
2012-09-25 22:10 ` Anthony Liguori
2012-09-25 12:56 ` [Qemu-devel] [PATCH 16/17] aio: clean up now-unused functions Paolo Bonzini
2012-09-25 22:11 ` Anthony Liguori
2012-09-25 12:56 ` [Qemu-devel] [PATCH 17/17] linux-aio: use event notifiers Paolo Bonzini
2012-09-26 12:28 ` [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Kevin Wolf
2012-09-26 13:32 ` Paolo Bonzini
2012-09-26 14:31 ` Kevin Wolf
2012-09-26 15:48 ` Paolo Bonzini
2012-09-27 7:11 ` Kevin Wolf
2012-09-27 7:43 ` Paolo Bonzini
2012-10-08 11:39 ` Stefan Hajnoczi
2012-10-08 13:00 ` Paolo Bonzini
2012-10-09 9:08 ` [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts"" Stefan Hajnoczi
2012-10-09 9:26 ` Avi Kivity
2012-10-09 10:36 ` Paolo Bonzini
2012-10-09 10:52 ` Avi Kivity
2012-10-09 11:08 ` Paolo Bonzini
2012-10-09 11:55 ` Avi Kivity
2012-10-09 12:01 ` Paolo Bonzini
2012-10-09 12:18 ` Jan Kiszka
2012-10-09 12:28 ` Avi Kivity
2012-10-09 12:22 ` Avi Kivity
2012-10-09 13:11 ` Paolo Bonzini
2012-10-09 13:21 ` Avi Kivity
2012-10-09 13:50 ` Paolo Bonzini
2012-10-09 14:24 ` Avi Kivity
2012-10-09 14:35 ` Paolo Bonzini
2012-10-09 14:41 ` Avi Kivity
2012-10-09 14:05 ` Stefan Hajnoczi
2012-10-09 15:02 ` Anthony Liguori
2012-10-09 15:06 ` Paolo Bonzini
2012-10-09 15:37 ` Anthony Liguori
2012-10-09 16:26 ` Paolo Bonzini
2012-10-09 18:26 ` Anthony Liguori
2012-10-10 7:11 ` Paolo Bonzini [this message]
2012-10-10 12:25 ` Anthony Liguori
2012-10-10 13:31 ` Paolo Bonzini
2012-10-10 14:44 ` Anthony Liguori
2012-10-11 12:28 ` Kevin Wolf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=50751FAB.1000201@redhat.com \
--to=pbonzini@redhat.com \
--cc=aliguori@us.ibm.com \
--cc=avi@redhat.com \
--cc=kwolf@redhat.com \
--cc=pingfank@linux.vnet.ibm.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.