qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Ming Lei <ming.lei@canonical.com>, Fam Zheng <famz@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
Date: Wed, 13 Aug 2014 15:16:59 +0200	[thread overview]
Message-ID: <53EB654B.2020200@redhat.com> (raw)
In-Reply-To: <20140813095400.GA3701@noname.redhat.com>

Il 13/08/2014 11:54, Kevin Wolf ha scritto:
> Am 12.08.2014 um 21:08 hat Paolo Bonzini geschrieben:
>> Il 12/08/2014 10:12, Ming Lei ha scritto:
>>>>> The below patch is basically the minimal change to bypass coroutines.  Of course
>>>>> the block.c part is not acceptable as is (the change to refresh_total_sectors
>>>>> is broken, the others are just ugly), but it is a start.  Please run it with
>>>>> your fio workloads, or write an aio-based version of a qemu-img/qemu-io *I/O*
>>>>> benchmark.
>>> Could you explain why the new change is introduced?
>>
>> It provides a fast path for bdrv_aio_readv/writev whenever there is
>> nothing to do after the driver routine returns.  In this case there is
>> no need to wrap the AIOCB returned by the driver routine.
>>
>> It doesn't go all the way, and in particular it doesn't reverse
>> completely the roles of bdrv_co_readv/writev vs. bdrv_aio_readv/writev.
> 
> That's actually why I think it's an option. Remember that, like you say
> below, we're optimising for an extreme case here, and I certainly don't
> want to hurt the common case for it. I can't imagine a way of reversing
> the roles without multiplying the cost for the coroutine path.

I'm not that worried about it.  Perhaps it's enough to add an
!qemu_in_coroutine() to the AIO fast path, and let the driver provide
optimized coroutine paths like in your patches that allocate AIOCBs on
the stack.

> Or do you have a clever solution how you'd go about it without having an
> impact on the common case?

I don't really have any ace up my sleeve, but there are some things that
bother me in the block layer's AIO API and in block.c in general.

One is that block.c can do all the pre-processing it wants before
issuing AIO, but nothing before calling the callback.  This means that
my patches break bdrv_drain_all (they cannot call tracked_request_end).

Another is all the similar structs that we have (RwCo,
BdrvTrackedRequest, BlockRequest, etc.).

Perhaps it would help if we had a single "real" block request object,
which is an extension of the BlockDriverAIOCB and includes enough data
to subsume all these request structs.  That should help commonizing
stuff between the coroutine and AIO paths, for the common case where a
single yield is enough.  I think the single-yield case is the one that
is really worth optimizing for.  If done properly, I think this can
simplify a lot of block.c code, but it is really difficult to get it
right, and unless the design is sound the code is going to come up
really ugly. :(

Another thing to evaluate is the performance gap (if there is any)
between aio=threads and aio=native.  The only advantage of aio=native,
AFAIU, is the batch submission of requests (plug/unplug).  But
aio=threads often ends up having better performance because the kernel
folks have optimized VFS a lot.  So, in the aio=threads case, we might
as well move the format code out of the iothread and into the worker
thread, and get rid of the coroutine cost simply by making everything
synchronous.  Looking within QEMU, this worked out very well for migration.

(We could do batch submission of requests to the thread pool if there
were a variant of sem_post that can add multiple signals to the same
semaphore, similar to ReleaseSemaphore on Windows).

>> The problem is that your patches to do touch too much code and subtly
>> break too much stuff.  The one I wrote does have a little breakage
>> because I don't understand bs->growable 100% and I didn't really put
>> much effort into it (my deadline being basically "be done as soon as the
>> shower is free"), and it is ugly as hell, _but_ it should be compatible
>> with the way the block layer works.
> 
> Yes, your patch is definitely much more palatable than Ming's. The part
> that I still don't like about it is that it would be stating "in the
> common case, we're only doing the second best thing". I'm not yet
> convinced that coroutines perform necessarily worse than state-passing
> callbacks.

Coroutines lump all the allocation costs together at the time you
allocate the stack, but have (much) more expensive context switching.
Your patches decrease the allocation costs by placing the AIOCB on the
stack.

Since you have to allocate an AIOCB anyway if the caller uses
bdrv_aio_readv/writev, coroutines do perform necessarily worse if the
drivers have little or no state to pass around.  This is the case for
both thread-pool and linux-aio AIOCBs; and which is also the case for
the generic block layer whenever the long "if" statements evaluate to true.

Paolo

  reply	other threads:[~2014-08-13 13:17 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 01/17] qemu/obj_pool.h: introduce object allocation pool Ming Lei
2014-08-05 11:55   ` Eric Blake
2014-08-05 12:05     ` Michael S. Tsirkin
2014-08-05 12:21       ` Eric Blake
2014-08-05 12:51         ` Michael S. Tsirkin
2014-08-06  2:35     ` Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 02/17] dataplane: use object pool to speed up allocation for virtio blk request Ming Lei
2014-08-05 12:30   ` Eric Blake
2014-08-06  2:45     ` Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 03/17] qemu coroutine: support bypass mode Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 04/17] block: prepare for supporting selective bypass coroutine Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 05/17] garbage collector: introduced for support of " Ming Lei
2014-08-05 12:43   ` Eric Blake
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 06/17] block: introduce bdrv_co_can_bypass_co Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 07/17] block: support to bypass qemu coroutinue Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 08/17] Revert "raw-posix: drop raw_get_aio_fd() since it is no longer used" Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 09/17] dataplane: enable selective bypassing coroutine Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 10/17] linux-aio: fix submit aio as a batch Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 11/17] linux-aio: handling -EAGAIN for !s->io_q.plugged case Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 12/17] linux-aio: increase max event to 256 Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 13/17] linux-aio: remove 'node' from 'struct qemu_laiocb' Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 14/17] hw/virtio/virtio-blk.h: introduce VIRTIO_BLK_F_MQ Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 15/17] virtio-blk: support multi queue for non-dataplane Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 16/17] virtio-blk: dataplane: support multi virtqueue Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 17/17] hw/virtio-pci: introduce num_queues property Ming Lei
2014-08-05  9:38 ` [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Stefan Hajnoczi
2014-08-05  9:50   ` Ming Lei
2014-08-05  9:56     ` Kevin Wolf
2014-08-05 10:50       ` Ming Lei
2014-08-05 13:59     ` Stefan Hajnoczi
2014-08-05  9:48 ` Kevin Wolf
2014-08-05 10:00   ` Ming Lei
2014-08-05 11:44     ` Paolo Bonzini
2014-08-05 13:48     ` Stefan Hajnoczi
2014-08-05 14:47       ` Kevin Wolf
2014-08-06  5:33         ` Ming Lei
2014-08-06  7:45           ` Paolo Bonzini
2014-08-06  8:38             ` Ming Lei
2014-08-06  8:50               ` Paolo Bonzini
2014-08-06 13:53                 ` Ming Lei
2014-08-06  8:48           ` Kevin Wolf
2014-08-06  9:37             ` Ming Lei
2014-08-06 10:09               ` Kevin Wolf
2014-08-06 11:28                 ` Ming Lei
2014-08-06 11:44                   ` Ming Lei
2014-08-06 15:40                   ` Kevin Wolf
2014-08-07 10:27                     ` Ming Lei
2014-08-07 10:52                       ` Ming Lei
2014-08-07 11:06                         ` Kevin Wolf
2014-08-07 13:03                           ` Ming Lei
2014-08-07 13:51                       ` Kevin Wolf
2014-08-08 10:32                         ` Ming Lei
2014-08-08 11:26                           ` Ming Lei
2014-08-10  3:46             ` Ming Lei
2014-08-11 14:03               ` Kevin Wolf
2014-08-12  7:53                 ` Ming Lei
2014-08-12 11:40                   ` Kevin Wolf
2014-08-12 12:14                     ` Ming Lei
2014-08-11 19:37               ` Paolo Bonzini
2014-08-12  8:12                 ` Ming Lei
2014-08-12 19:08                   ` Paolo Bonzini
2014-08-13  9:54                     ` Kevin Wolf
2014-08-13 13:16                       ` Paolo Bonzini [this message]
2014-08-13 13:49                         ` Ming Lei
2014-08-14  9:39                           ` Stefan Hajnoczi
2014-08-14 10:12                             ` Ming Lei
2014-08-15 20:16                             ` Paolo Bonzini
2014-08-13 10:19                     ` Ming Lei
2014-08-13 12:35                       ` Paolo Bonzini
2014-08-13  8:55                 ` Stefan Hajnoczi
2014-08-13 11:43                 ` Ming Lei
2014-08-13 12:35                   ` Paolo Bonzini
2014-08-13 13:07                     ` Ming Lei
2014-08-14 10:46                 ` Kevin Wolf
2014-08-15 10:39                   ` Ming Lei
2014-08-15 20:15                   ` Paolo Bonzini
2014-08-16  8:20                     ` Ming Lei
2014-08-17  5:29                     ` Paolo Bonzini
2014-08-18  8:58                       ` Kevin Wolf
2014-08-06  9:37           ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53EB654B.2020200@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=famz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=ming.lei@canonical.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).