qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite
Date: Wed, 16 Mar 2016 18:18:19 +0000	[thread overview]
Message-ID: <20160316181819.GD2012@stefanha-x1.localdomain> (raw)
In-Reply-To: <1455645388-32401-1-git-send-email-pbonzini@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 6746 bytes --]

On Tue, Feb 16, 2016 at 06:56:12PM +0100, Paolo Bonzini wrote:
> So the fine-grained locking series has grown from 2 parts to 3...
> 
> This first part stops where we remove RFifoLock.
> 
> In the previous submission, the bulk of the series took care of
> associating an AioContext to a thread, so that aio_poll could run event
> handlers only on that thread.  This was done to fix a race in bdrv_drain.
> There were two disadvantages:
> 
> 1) reporting progress from aio_poll to the main thread was Really Hard.
> Despite being relatively sure of the correctness---also thanks to formal
> models---it's not the kind of code that I'd lightly donate to posterity.
> 
> 2) the strict dependency between bdrv_drain and a single AioContext
> would have been bad for multiqueue.
> 
> Instead, this series does the same trick (do not run dataplane event handlers
> from the main thread) but reports progress directly at the BlockDriverState
> level.
> 
> To do so, the mechanism to track in-flight requests is changed to a
> simple counter.  This is more flexible than BdrvTrackedRequest, and lets
> the block/io.c code track precisely the initiation and completion of the
> requests.  In particular, the counter remains nonzero when a bottom half
> is required to "really" complete the request.  bdrv_drain doesn't rely
> anymore on a non-blocking aio_poll to execute those bottom halves.
> 
> It is also more efficient; while BdrvTrackedRequest has to remain
> for request serialisation, with this series we can in principle make
> BdrvTrackedRequest optional and enable it only when something (automatic
> RMW or copy-on-read) requires request serialisation.
> 
> In general this flows nicely, the only snag being patch 5.  The problem
> here is that mirror is calling bdrv_drain from an AIO callback, which
> used to be okay (because bdrv_drain more or less tried to guess when
> all AIO callbacks were done) but now causes a deadlock (because bdrv_drain
> really checks for AIO callbacks to be complete).  To add to the complication,
> the bdrv_drain happens far away from the AIO callback, in the coroutine that
> the AIO callback enters.  The situation here is admittedly underdefined,
> and Stefan pointed out that the same solution is found in many other places
> in the QEMU block layer.  Therefore I think it's acceptable.  I'm pointing
> it out explicitly though, because (together with patch 8) it is an example
> of the issues caused by the new, stricter definition of bdrv_drain.
> 
> Patches 1-7 organize bdrv_drain into separate phases, with a well-defined
> order between the BDS and it children.  The phases are: 1) disable
> throttling; 2) disable io_plug; 3) call bdrv_drain callbacks; 4) wait
> for I/O to finish; 5) re-enable io_plug and throttling.  Previously,
> throttling of throttling and io_plug were handled by bdrv_flush_io_queue,
> which was repeatedly called as part of the I/O wait loop.  This part of
> the series removes bdrv_flush_io_queue.
> 
> Patch 8-11 replace aio_poll with bdrv_drain so that the work done
> so far applies to all former callers of aio_poll.
> 
> Patches 12-13 introduce the logic that lets the main thread wait remotely
> for an iothread to drain the BDS.  Patches 14-16 then achieve the purpose
> of this series, removing the long-running AioContext critical section
> from iothread_run and removing RFifoLock.
> 
> The series passes check-block.sh with a few fixes applied for pre-existing
> bugs:
> 
>     vl: fix migration from prelaunch state
>     migration: fix incorrect memory_global_dirty_log_start outside BQL
>     qed: fix bdrv_qed_drain
> 
> Paolo
> 
> Paolo Bonzini (16):
>   block: make bdrv_start_throttled_reqs return void
>   block: move restarting of throttled reqs to block/throttle-groups.c
>   block: introduce bdrv_no_throttling_begin/end
>   block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
>   mirror: use bottom half to re-enter coroutine
>   block: add BDS field to count in-flight requests
>   block: change drain to look only at one child at a time
>   blockjob: introduce .drain callback for jobs
>   block: wait for all pending I/O when doing synchronous requests
>   nfs: replace aio_poll with bdrv_drain
>   sheepdog: disable dataplane
>   aio: introduce aio_context_in_iothread
>   block: only call aio_poll from iothread
>   iothread: release AioContext around aio_poll
>   qemu-thread: introduce QemuRecMutex
>   aio: convert from RFifoLock to QemuRecMutex
> 
>  async.c                         |  28 +---
>  block.c                         |   4 +-
>  block/backup.c                  |   7 +
>  block/io.c                      | 281 +++++++++++++++++++++++++---------------
>  block/linux-aio.c               |  13 +-
>  block/mirror.c                  |  37 +++++-
>  block/nfs.c                     |  50 ++++---
>  block/qed-table.c               |  16 +--
>  block/qed.c                     |   4 +-
>  block/raw-aio.h                 |   2 +-
>  block/raw-posix.c               |  16 +--
>  block/sheepdog.c                |  19 +++
>  block/throttle-groups.c         |  19 +++
>  blockjob.c                      |  16 ++-
>  docs/multiple-iothreads.txt     |  40 +++---
>  include/block/aio.h             |  13 +-
>  include/block/block.h           |   3 +-
>  include/block/block_int.h       |  22 +++-
>  include/block/blockjob.h        |   7 +
>  include/block/throttle-groups.h |   1 +
>  include/qemu/rfifolock.h        |  54 --------
>  include/qemu/thread-posix.h     |   6 +
>  include/qemu/thread-win32.h     |  10 ++
>  include/qemu/thread.h           |   3 +
>  iothread.c                      |  20 +--
>  stubs/Makefile.objs             |   1 +
>  stubs/iothread.c                |   8 ++
>  tests/.gitignore                |   1 -
>  tests/Makefile                  |   2 -
>  tests/qemu-iotests/060          |   8 +-
>  tests/qemu-iotests/060.out      |   4 +-
>  tests/test-aio.c                |  22 ++--
>  tests/test-rfifolock.c          |  91 -------------
>  util/Makefile.objs              |   1 -
>  util/qemu-thread-posix.c        |  13 ++
>  util/qemu-thread-win32.c        |  25 ++++
>  util/rfifolock.c                |  78 -----------
>  37 files changed, 471 insertions(+), 474 deletions(-)
>  delete mode 100644 include/qemu/rfifolock.h
>  create mode 100644 stubs/iothread.c
>  delete mode 100644 tests/test-rfifolock.c
>  delete mode 100644 util/rfifolock.c

Looks good overall.  I'm a little nervous about merging it for QEMU 2.6
but the block job, NBD, and data plane tests should give it a good
workout.

I have posted comments on a few patches.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

  parent reply	other threads:[~2016-03-16 18:18 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-16 17:56 [Qemu-devel] [PATCH 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 01/16] block: make bdrv_start_throttled_reqs return void Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 02/16] block: move restarting of throttled reqs to block/throttle-groups.c Paolo Bonzini
2016-03-09  1:26   ` Fam Zheng
2016-03-09  7:37     ` Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 03/16] block: introduce bdrv_no_throttling_begin/end Paolo Bonzini
2016-03-09  1:45   ` Fam Zheng
2016-03-09  7:40     ` Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 04/16] block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 05/16] mirror: use bottom half to re-enter coroutine Paolo Bonzini
2016-03-09  3:19   ` Fam Zheng
2016-03-09  7:41     ` Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 06/16] block: add BDS field to count in-flight requests Paolo Bonzini
2016-03-09  3:35   ` Fam Zheng
2016-03-09  7:43     ` Paolo Bonzini
2016-03-09  8:00       ` Fam Zheng
2016-03-09  8:22         ` Paolo Bonzini
2016-03-09  8:33           ` Fam Zheng
2016-02-16 17:56 ` [Qemu-devel] [PATCH 07/16] block: change drain to look only at one child at a time Paolo Bonzini
2016-03-09  3:41   ` Fam Zheng
2016-03-09  7:49     ` Paolo Bonzini
2016-03-16 16:39   ` Stefan Hajnoczi
2016-03-16 17:41     ` Paolo Bonzini
2016-03-17  0:57       ` Fam Zheng
2016-02-16 17:56 ` [Qemu-devel] [PATCH 08/16] blockjob: introduce .drain callback for jobs Paolo Bonzini
2016-03-16 17:56   ` Stefan Hajnoczi
2016-02-16 17:56 ` [Qemu-devel] [PATCH 09/16] block: wait for all pending I/O when doing synchronous requests Paolo Bonzini
2016-03-09  8:13   ` Fam Zheng
2016-03-09  8:23     ` Paolo Bonzini
2016-03-16 18:04   ` Stefan Hajnoczi
2016-02-16 17:56 ` [Qemu-devel] [PATCH 10/16] nfs: replace aio_poll with bdrv_drain Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 11/16] sheepdog: disable dataplane Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 12/16] aio: introduce aio_context_in_iothread Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 13/16] block: only call aio_poll from iothread Paolo Bonzini
2016-03-09  8:30   ` Fam Zheng
2016-03-09  8:55     ` Paolo Bonzini
2016-03-09  9:10     ` Paolo Bonzini
2016-03-09  9:27       ` Fam Zheng
2016-02-16 17:56 ` [Qemu-devel] [PATCH 14/16] iothread: release AioContext around aio_poll Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 15/16] qemu-thread: introduce QemuRecMutex Paolo Bonzini
2016-02-16 17:56 ` [Qemu-devel] [PATCH 16/16] aio: convert from RFifoLock to QemuRecMutex Paolo Bonzini
2016-03-08 17:51 ` [Qemu-devel] [PATCH 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite Paolo Bonzini
2016-03-09  8:46 ` Fam Zheng
2016-03-16 18:18 ` Stefan Hajnoczi [this message]
2016-03-16 22:29   ` Paolo Bonzini
2016-03-17 13:44     ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2016-03-17 13:48       ` Paolo Bonzini
2016-03-18 15:49         ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160316181819.GD2012@stefanha-x1.localdomain \
    --to=stefanha@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).