From: Paolo Bonzini <pbonzini@redhat.com>
To: qemu-devel@nongnu.org
Cc: stefanha@redhat.com, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite
Date: Tue, 8 Mar 2016 18:51:00 +0100 [thread overview]
Message-ID: <56DF1104.6080901@redhat.com> (raw)
In-Reply-To: <1455645388-32401-1-git-send-email-pbonzini@redhat.com>
On 16/02/2016 18:56, Paolo Bonzini wrote:
> So the fine-grained locking series has grown from 2 parts to 3...
>
> This first part stops where we remove RFifoLock.
>
> In the previous submission, the bulk of the series took care of
> associating an AioContext to a thread, so that aio_poll could run event
> handlers only on that thread. This was done to fix a race in bdrv_drain.
> There were two disadvantages:
>
> 1) reporting progress from aio_poll to the main thread was Really Hard.
> Despite being relatively sure of the correctness---also thanks to formal
> models---it's not the kind of code that I'd lightly donate to posterity.
>
> 2) the strict dependency between bdrv_drain and a single AioContext
> would have been bad for multiqueue.
>
> Instead, this series does the same trick (do not run dataplane event handlers
> from the main thread) but reports progress directly at the BlockDriverState
> level.
>
> To do so, the mechanism to track in-flight requests is changed to a
> simple counter. This is more flexible than BdrvTrackedRequest, and lets
> the block/io.c code track precisely the initiation and completion of the
> requests. In particular, the counter remains nonzero when a bottom half
> is required to "really" complete the request. bdrv_drain doesn't rely
> anymore on a non-blocking aio_poll to execute those bottom halves.
>
> It is also more efficient; while BdrvTrackedRequest has to remain
> for request serialisation, with this series we can in principle make
> BdrvTrackedRequest optional and enable it only when something (automatic
> RMW or copy-on-read) requires request serialisation.
>
> In general this flows nicely, the only snag being patch 5. The problem
> here is that mirror is calling bdrv_drain from an AIO callback, which
> used to be okay (because bdrv_drain more or less tried to guess when
> all AIO callbacks were done) but now causes a deadlock (because bdrv_drain
> really checks for AIO callbacks to be complete). To add to the complication,
> the bdrv_drain happens far away from the AIO callback, in the coroutine that
> the AIO callback enters. The situation here is admittedly underdefined,
> and Stefan pointed out that the same solution is found in many other places
> in the QEMU block layer. Therefore I think it's acceptable. I'm pointing
> it out explicitly though, because (together with patch 8) it is an example
> of the issues caused by the new, stricter definition of bdrv_drain.
>
> Patches 1-7 organize bdrv_drain into separate phases, with a well-defined
> order between the BDS and it children. The phases are: 1) disable
> throttling; 2) disable io_plug; 3) call bdrv_drain callbacks; 4) wait
> for I/O to finish; 5) re-enable io_plug and throttling. Previously,
> throttling of throttling and io_plug were handled by bdrv_flush_io_queue,
> which was repeatedly called as part of the I/O wait loop. This part of
> the series removes bdrv_flush_io_queue.
>
> Patch 8-11 replace aio_poll with bdrv_drain so that the work done
> so far applies to all former callers of aio_poll.
>
> Patches 12-13 introduce the logic that lets the main thread wait remotely
> for an iothread to drain the BDS. Patches 14-16 then achieve the purpose
> of this series, removing the long-running AioContext critical section
> from iothread_run and removing RFifoLock.
>
> The series passes check-block.sh with a few fixes applied for pre-existing
> bugs:
>
> vl: fix migration from prelaunch state
> migration: fix incorrect memory_global_dirty_log_start outside BQL
> qed: fix bdrv_qed_drain
>
> Paolo
>
> Paolo Bonzini (16):
> block: make bdrv_start_throttled_reqs return void
> block: move restarting of throttled reqs to block/throttle-groups.c
> block: introduce bdrv_no_throttling_begin/end
> block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
> mirror: use bottom half to re-enter coroutine
> block: add BDS field to count in-flight requests
> block: change drain to look only at one child at a time
> blockjob: introduce .drain callback for jobs
> block: wait for all pending I/O when doing synchronous requests
> nfs: replace aio_poll with bdrv_drain
> sheepdog: disable dataplane
> aio: introduce aio_context_in_iothread
> block: only call aio_poll from iothread
> iothread: release AioContext around aio_poll
> qemu-thread: introduce QemuRecMutex
> aio: convert from RFifoLock to QemuRecMutex
>
> async.c | 28 +---
> block.c | 4 +-
> block/backup.c | 7 +
> block/io.c | 281 +++++++++++++++++++++++++---------------
> block/linux-aio.c | 13 +-
> block/mirror.c | 37 +++++-
> block/nfs.c | 50 ++++---
> block/qed-table.c | 16 +--
> block/qed.c | 4 +-
> block/raw-aio.h | 2 +-
> block/raw-posix.c | 16 +--
> block/sheepdog.c | 19 +++
> block/throttle-groups.c | 19 +++
> blockjob.c | 16 ++-
> docs/multiple-iothreads.txt | 40 +++---
> include/block/aio.h | 13 +-
> include/block/block.h | 3 +-
> include/block/block_int.h | 22 +++-
> include/block/blockjob.h | 7 +
> include/block/throttle-groups.h | 1 +
> include/qemu/rfifolock.h | 54 --------
> include/qemu/thread-posix.h | 6 +
> include/qemu/thread-win32.h | 10 ++
> include/qemu/thread.h | 3 +
> iothread.c | 20 +--
> stubs/Makefile.objs | 1 +
> stubs/iothread.c | 8 ++
> tests/.gitignore | 1 -
> tests/Makefile | 2 -
> tests/qemu-iotests/060 | 8 +-
> tests/qemu-iotests/060.out | 4 +-
> tests/test-aio.c | 22 ++--
> tests/test-rfifolock.c | 91 -------------
> util/Makefile.objs | 1 -
> util/qemu-thread-posix.c | 13 ++
> util/qemu-thread-win32.c | 25 ++++
> util/rfifolock.c | 78 -----------
> 37 files changed, 471 insertions(+), 474 deletions(-)
> delete mode 100644 include/qemu/rfifolock.h
> create mode 100644 stubs/iothread.c
> delete mode 100644 tests/test-rfifolock.c
> delete mode 100644 util/rfifolock.c
>
Ping? It's been almost a month and I still have one series to send
before hard freeze...
Paolo
next prev parent reply other threads:[~2016-03-08 17:51 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-02-16 17:56 [Qemu-devel] [PATCH 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 01/16] block: make bdrv_start_throttled_reqs return void Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 02/16] block: move restarting of throttled reqs to block/throttle-groups.c Paolo Bonzini
2016-03-09 1:26 ` Fam Zheng
2016-03-09 7:37 ` Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 03/16] block: introduce bdrv_no_throttling_begin/end Paolo Bonzini
2016-03-09 1:45 ` Fam Zheng
2016-03-09 7:40 ` Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 04/16] block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 05/16] mirror: use bottom half to re-enter coroutine Paolo Bonzini
2016-03-09 3:19 ` Fam Zheng
2016-03-09 7:41 ` Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 06/16] block: add BDS field to count in-flight requests Paolo Bonzini
2016-03-09 3:35 ` Fam Zheng
2016-03-09 7:43 ` Paolo Bonzini
2016-03-09 8:00 ` Fam Zheng
2016-03-09 8:22 ` Paolo Bonzini
2016-03-09 8:33 ` Fam Zheng
2016-02-16 17:56 ` [Qemu-devel] [PATCH 07/16] block: change drain to look only at one child at a time Paolo Bonzini
2016-03-09 3:41 ` Fam Zheng
2016-03-09 7:49 ` Paolo Bonzini
2016-03-16 16:39 ` Stefan Hajnoczi
2016-03-16 17:41 ` Paolo Bonzini
2016-03-17 0:57 ` Fam Zheng
2016-02-16 17:56 ` [Qemu-devel] [PATCH 08/16] blockjob: introduce .drain callback for jobs Paolo Bonzini
2016-03-16 17:56 ` Stefan Hajnoczi
2016-02-16 17:56 ` [Qemu-devel] [PATCH 09/16] block: wait for all pending I/O when doing synchronous requests Paolo Bonzini
2016-03-09 8:13 ` Fam Zheng
2016-03-09 8:23 ` Paolo Bonzini
2016-03-16 18:04 ` Stefan Hajnoczi
2016-02-16 17:56 ` [Qemu-devel] [PATCH 10/16] nfs: replace aio_poll with bdrv_drain Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 11/16] sheepdog: disable dataplane Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 12/16] aio: introduce aio_context_in_iothread Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 13/16] block: only call aio_poll from iothread Paolo Bonzini
2016-03-09 8:30 ` Fam Zheng
2016-03-09 8:55 ` Paolo Bonzini
2016-03-09 9:10 ` Paolo Bonzini
2016-03-09 9:27 ` Fam Zheng
2016-02-16 17:56 ` [Qemu-devel] [PATCH 14/16] iothread: release AioContext around aio_poll Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 15/16] qemu-thread: introduce QemuRecMutex Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 16/16] aio: convert from RFifoLock to QemuRecMutex Paolo Bonzini
2016-03-08 17:51 ` Paolo Bonzini [this message]
2016-03-09 8:46 ` [Qemu-devel] [PATCH 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Fam Zheng
2016-03-16 18:18 ` Stefan Hajnoczi
2016-03-16 22:29 ` Paolo Bonzini
2016-03-17 13:44 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2016-03-17 13:48 ` Paolo Bonzini
2016-03-18 15:49 ` Stefan Hajnoczi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56DF1104.6080901@redhat.com \
--to=pbonzini@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.