From: Stefan Hajnoczi <stefanha@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite
Date: Wed, 16 Mar 2016 18:18:19 +0000
Message-ID: <20160316181819.GD2012@stefanha-x1.localdomain>
In-Reply-To: <1455645388-32401-1-git-send-email-pbonzini@redhat.com>
On Tue, Feb 16, 2016 at 06:56:12PM +0100, Paolo Bonzini wrote:
> So the fine-grained locking series has grown from 2 parts to 3...
>
> This first part stops where we remove RFifoLock.
>
> In the previous submission, the bulk of the series took care of
> associating an AioContext to a thread, so that aio_poll could run event
> handlers only on that thread. This was done to fix a race in bdrv_drain.
> There were two disadvantages:
>
> 1) reporting progress from aio_poll to the main thread was Really Hard.
> Despite being relatively sure of the correctness---also thanks to formal
> models---it's not the kind of code that I'd lightly donate to posterity.
>
> 2) the strict dependency between bdrv_drain and a single AioContext
> would have been bad for multiqueue.
>
> Instead, this series does the same trick (do not run dataplane event handlers
> from the main thread) but reports progress directly at the BlockDriverState
> level.
>
> To do so, the mechanism to track in-flight requests is changed to a
> simple counter. This is more flexible than BdrvTrackedRequest, and lets
> the block/io.c code track precisely the initiation and completion of the
> requests. In particular, the counter remains nonzero when a bottom half
> is required to "really" complete the request. bdrv_drain no longer
> relies on a non-blocking aio_poll to execute those bottom halves.
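The counter idea described above can be sketched roughly as follows. This is a single-threaded toy model, not code from the series: `Node`, `req_begin`, `req_io_done`, `poll_once` and `drain` are hypothetical names, and the real in-flight field lives in the BlockDriverState. The point it illustrates is that the counter stays nonzero until the bottom half has run, so a drain loop that only watches the counter cannot return early.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a BlockDriverState; not the QEMU struct. */
typedef struct Node {
    unsigned in_flight;   /* requests started but not "really" completed */
    unsigned bh_pending;  /* requests whose I/O is done, bottom half not yet run */
} Node;

/* A request is initiated: count it immediately. */
static void req_begin(Node *n)
{
    n->in_flight++;
}

/* The I/O finished, but completion is deferred to a bottom half,
 * so in_flight is deliberately left untouched here. */
static void req_io_done(Node *n)
{
    assert(n->bh_pending < n->in_flight);
    n->bh_pending++;
}

/* Run at most one pending bottom half; report whether progress was made. */
static bool poll_once(Node *n)
{
    if (n->bh_pending == 0) {
        return false;
    }
    n->bh_pending--;
    n->in_flight--;       /* only now is the request complete */
    return true;
}

/* Drain by watching the counter. Returns false if requests are still
 * in flight but nothing can make progress in this toy model. */
static bool drain(Node *n)
{
    while (n->in_flight > 0) {
        if (!poll_once(n)) {
            return false;
        }
    }
    return true;
}
```

In this model a request whose I/O has finished but whose bottom half is still queued keeps `in_flight` above zero, which is the property the cover letter relies on.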
>
> It is also more efficient; while BdrvTrackedRequest has to remain
> for request serialisation, with this series we can in principle make
> BdrvTrackedRequest optional and enable it only when something (automatic
> RMW or copy-on-read) requires request serialisation.
>
> In general this flows nicely, the only snag being patch 5. The problem
> here is that mirror is calling bdrv_drain from an AIO callback, which
> used to be okay (because bdrv_drain more or less tried to guess when
> all AIO callbacks were done) but now causes a deadlock (because bdrv_drain
> really checks for AIO callbacks to be complete). To add to the complication,
> the bdrv_drain happens far away from the AIO callback, in the coroutine that
> the AIO callback enters. The situation here is admittedly underdefined,
> and Stefan pointed out that the same solution is found in many other places
> in the QEMU block layer. Therefore I think it's acceptable. I'm pointing
> it out explicitly though, because (together with patch 8) it is an example
> of the issues caused by the new, stricter definition of bdrv_drain.
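To make the patch 5 problem concrete, here is a minimal sketch of why deferring the re-entry to a bottom half helps. It assumes a toy event loop: `Loop`, `schedule_bh`, `job_continue`, `complete_request` and `run_bottom_halves` are hypothetical names, not QEMU functions. The request only stops counting as in flight after its callback returns, so anything that drains from inside the callback would wait on itself; scheduling the continuation lets the request complete first.

```c
#include <assert.h>
#include <stddef.h>

typedef void DeferredFn(void *opaque);

typedef struct Deferred {
    DeferredFn *fn;
    void *opaque;
} Deferred;

#define MAX_DEFERRED 8

/* Hypothetical toy event loop with a small ring buffer of bottom halves. */
typedef struct Loop {
    Deferred queue[MAX_DEFERRED];
    size_t head, tail;
    unsigned in_flight;       /* requests whose callback has not returned */
    int drain_would_block;    /* -1: not evaluated yet, 0: no, 1: yes */
} Loop;

/* Queue fn for later; returns 0 on success, -1 if the queue is full. */
static int schedule_bh(Loop *l, DeferredFn *fn, void *opaque)
{
    size_t next = (l->tail + 1) % MAX_DEFERRED;
    if (next == l->head) {
        return -1;
    }
    l->queue[l->tail].fn = fn;
    l->queue[l->tail].opaque = opaque;
    l->tail = next;
    return 0;
}

/* Stands in for the coroutine that eventually drains: a drain here
 * would block if any request were still in flight. */
static void job_continue(void *opaque)
{
    Loop *l = opaque;
    l->drain_would_block = (l->in_flight > 0);
}

/* Completion path: the callback only schedules the continuation,
 * then the request stops being in flight. */
static void complete_request(Loop *l)
{
    assert(l->in_flight > 0);
    int rc = schedule_bh(l, job_continue, l);
    assert(rc == 0);
    l->in_flight--;
}

/* Run every queued bottom half in FIFO order. */
static void run_bottom_halves(Loop *l)
{
    while (l->head != l->tail) {
        Deferred d = l->queue[l->head];
        l->head = (l->head + 1) % MAX_DEFERRED;
        d.fn(d.opaque);
    }
}
```

Calling `job_continue` directly from `complete_request`, before the decrement, would set `drain_would_block` to 1 in this model, which corresponds to the deadlock described above.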
>
> Patches 1-7 organize bdrv_drain into separate phases, with a well-defined
> order between the BDS and its children. The phases are: 1) disable
> throttling; 2) disable io_plug; 3) call bdrv_drain callbacks; 4) wait
> for I/O to finish; 5) re-enable io_plug and throttling. Previously,
> throttling and io_plug were handled by bdrv_flush_io_queue,
> which was repeatedly called as part of the I/O wait loop. This part of
> the series removes bdrv_flush_io_queue.
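The five phases listed above can be sketched for a single node as follows. This is only an illustration of the ordering: `Node`, `trace`, `drain_callbacks` and `drain_phases` are hypothetical names, the counters merely imitate begin/end pairs, and the sketch ignores the BDS/children ordering that the actual patches define.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical single-node model; not the QEMU BlockDriverState. */
typedef struct Node {
    int throttling_disabled;  /* > 0 while throttling is switched off */
    int io_unplugged;         /* > 0 while io_plug batching is switched off */
    unsigned queued;          /* requests a job still holds back */
    unsigned in_flight;       /* requests submitted and not yet complete */
    char log[16];             /* phases recorded in the order they ran */
} Node;

/* Append one phase marker to the log, dropping it if the log is full. */
static void trace(Node *n, char phase)
{
    size_t len = strlen(n->log);
    if (len + 1 < sizeof(n->log)) {
        n->log[len] = phase;
        n->log[len + 1] = '\0';
    }
}

/* Stand-in for a drain callback: submit whatever was being held back. */
static void drain_callbacks(Node *n)
{
    n->in_flight += n->queued;
    n->queued = 0;
}

static void drain_phases(Node *n)
{
    n->throttling_disabled++;      /* 1) disable throttling */
    trace(n, '1');
    n->io_unplugged++;             /* 2) disable io_plug */
    trace(n, '2');
    drain_callbacks(n);            /* 3) call drain callbacks */
    trace(n, '3');
    while (n->in_flight > 0) {     /* 4) wait for I/O to finish */
        n->in_flight--;            /* stands in for one completion */
    }
    trace(n, '4');
    n->io_unplugged--;             /* 5) re-enable io_plug and throttling */
    n->throttling_disabled--;
    trace(n, '5');
}
```

Because throttling and plugging are switched off once, before the wait loop, nothing needs to be flushed repeatedly inside the loop, which is the job bdrv_flush_io_queue used to do.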
>
> Patches 8-11 replace aio_poll with bdrv_drain so that the work done
> so far applies to all former callers of aio_poll.
>
> Patches 12-13 introduce the logic that lets the main thread wait remotely
> for an iothread to drain the BDS. Patches 14-16 then achieve the purpose
> of this series, removing the long-running AioContext critical section
> from iothread_run and removing RFifoLock.
>
> The series passes check-block.sh with a few fixes applied for pre-existing
> bugs:
>
> vl: fix migration from prelaunch state
> migration: fix incorrect memory_global_dirty_log_start outside BQL
> qed: fix bdrv_qed_drain
>
> Paolo
>
> Paolo Bonzini (16):
> block: make bdrv_start_throttled_reqs return void
> block: move restarting of throttled reqs to block/throttle-groups.c
> block: introduce bdrv_no_throttling_begin/end
> block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
> mirror: use bottom half to re-enter coroutine
> block: add BDS field to count in-flight requests
> block: change drain to look only at one child at a time
> blockjob: introduce .drain callback for jobs
> block: wait for all pending I/O when doing synchronous requests
> nfs: replace aio_poll with bdrv_drain
> sheepdog: disable dataplane
> aio: introduce aio_context_in_iothread
> block: only call aio_poll from iothread
> iothread: release AioContext around aio_poll
> qemu-thread: introduce QemuRecMutex
> aio: convert from RFifoLock to QemuRecMutex
>
> async.c | 28 +---
> block.c | 4 +-
> block/backup.c | 7 +
> block/io.c | 281 +++++++++++++++++++++++++---------------
> block/linux-aio.c | 13 +-
> block/mirror.c | 37 +++++-
> block/nfs.c | 50 ++++---
> block/qed-table.c | 16 +--
> block/qed.c | 4 +-
> block/raw-aio.h | 2 +-
> block/raw-posix.c | 16 +--
> block/sheepdog.c | 19 +++
> block/throttle-groups.c | 19 +++
> blockjob.c | 16 ++-
> docs/multiple-iothreads.txt | 40 +++---
> include/block/aio.h | 13 +-
> include/block/block.h | 3 +-
> include/block/block_int.h | 22 +++-
> include/block/blockjob.h | 7 +
> include/block/throttle-groups.h | 1 +
> include/qemu/rfifolock.h | 54 --------
> include/qemu/thread-posix.h | 6 +
> include/qemu/thread-win32.h | 10 ++
> include/qemu/thread.h | 3 +
> iothread.c | 20 +--
> stubs/Makefile.objs | 1 +
> stubs/iothread.c | 8 ++
> tests/.gitignore | 1 -
> tests/Makefile | 2 -
> tests/qemu-iotests/060 | 8 +-
> tests/qemu-iotests/060.out | 4 +-
> tests/test-aio.c | 22 ++--
> tests/test-rfifolock.c | 91 -------------
> util/Makefile.objs | 1 -
> util/qemu-thread-posix.c | 13 ++
> util/qemu-thread-win32.c | 25 ++++
> util/rfifolock.c | 78 -----------
> 37 files changed, 471 insertions(+), 474 deletions(-)
> delete mode 100644 include/qemu/rfifolock.h
> create mode 100644 stubs/iothread.c
> delete mode 100644 tests/test-rfifolock.c
> delete mode 100644 util/rfifolock.c
Looks good overall. I'm a little nervous about merging it for QEMU 2.6
but the block job, NBD, and data plane tests should give it a good
workout.
I have posted comments on a few patches.
Stefan
Thread overview: 48+ messages
2016-02-16 17:56 [Qemu-devel] [PATCH 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 01/16] block: make bdrv_start_throttled_reqs return void Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 02/16] block: move restarting of throttled reqs to block/throttle-groups.c Paolo Bonzini
2016-03-09 1:26 ` Fam Zheng
2016-03-09 7:37 ` Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 03/16] block: introduce bdrv_no_throttling_begin/end Paolo Bonzini
2016-03-09 1:45 ` Fam Zheng
2016-03-09 7:40 ` Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 04/16] block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 05/16] mirror: use bottom half to re-enter coroutine Paolo Bonzini
2016-03-09 3:19 ` Fam Zheng
2016-03-09 7:41 ` Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 06/16] block: add BDS field to count in-flight requests Paolo Bonzini
2016-03-09 3:35 ` Fam Zheng
2016-03-09 7:43 ` Paolo Bonzini
2016-03-09 8:00 ` Fam Zheng
2016-03-09 8:22 ` Paolo Bonzini
2016-03-09 8:33 ` Fam Zheng
2016-02-16 17:56 ` [Qemu-devel] [PATCH 07/16] block: change drain to look only at one child at a time Paolo Bonzini
2016-03-09 3:41 ` Fam Zheng
2016-03-09 7:49 ` Paolo Bonzini
2016-03-16 16:39 ` Stefan Hajnoczi
2016-03-16 17:41 ` Paolo Bonzini
2016-03-17 0:57 ` Fam Zheng
2016-02-16 17:56 ` [Qemu-devel] [PATCH 08/16] blockjob: introduce .drain callback for jobs Paolo Bonzini
2016-03-16 17:56 ` Stefan Hajnoczi
2016-02-16 17:56 ` [Qemu-devel] [PATCH 09/16] block: wait for all pending I/O when doing synchronous requests Paolo Bonzini
2016-03-09 8:13 ` Fam Zheng
2016-03-09 8:23 ` Paolo Bonzini
2016-03-16 18:04 ` Stefan Hajnoczi
2016-02-16 17:56 ` [Qemu-devel] [PATCH 10/16] nfs: replace aio_poll with bdrv_drain Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 11/16] sheepdog: disable dataplane Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 12/16] aio: introduce aio_context_in_iothread Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 13/16] block: only call aio_poll from iothread Paolo Bonzini
2016-03-09 8:30 ` Fam Zheng
2016-03-09 8:55 ` Paolo Bonzini
2016-03-09 9:10 ` Paolo Bonzini
2016-03-09 9:27 ` Fam Zheng
2016-02-16 17:56 ` [Qemu-devel] [PATCH 14/16] iothread: release AioContext around aio_poll Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 15/16] qemu-thread: introduce QemuRecMutex Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 16/16] aio: convert from RFifoLock to QemuRecMutex Paolo Bonzini
2016-03-08 17:51 ` [Qemu-devel] [PATCH 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
2016-03-09 8:46 ` Fam Zheng
2016-03-16 18:18 ` Stefan Hajnoczi [this message]
2016-03-16 22:29 ` Paolo Bonzini
2016-03-17 13:44 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2016-03-17 13:48 ` Paolo Bonzini
2016-03-18 15:49 ` Stefan Hajnoczi