qemu-devel.nongnu.org archive mirror
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 07/16] block: change drain to look only at one child at a time
Date: Wed, 16 Mar 2016 16:39:08 +0000
Message-ID: <20160316163908.GA2012@stefanha-x1.localdomain>
In-Reply-To: <1455645388-32401-8-git-send-email-pbonzini@redhat.com>

On Tue, Feb 16, 2016 at 06:56:19PM +0100, Paolo Bonzini wrote:
> bdrv_requests_pending is checking children to also wait until internal
> requests (such as metadata writes) have completed.  However, checking
> children is in general overkill because, apart from this special case,
> the parent's in_flight count will always be incremented by at least one
> for every request in the child.
> 
> Since internal requests are only generated by the parent in the child,
> instead visit the tree parent first, and then wait for internal I/O in
> the children to complete.

This assumption is true if the BDS graph is a tree.  When there are
multiple roots (i.e. it's a directed acyclic graph), especially with
roots at different levels, then I wonder if pre-order bdrv_drain
traversal leaves open the possibility that I/Os sneak into parts of the
BDS graph that we thought were already drained.

The advantage of the current approach is that it really waits until the
whole (sub)tree is idle.

Concrete example: image fleecing involves adding an ephemeral qcow2 file
which is exported via NBD.  The guest keeps writing to the disk image as
normal but the original data will be backed up to the ephemeral qcow2
file before being overwritten, so that the client sees a point-in-time
snapshot of the disk image.

The graph looks like this:

          [NBD export]
             /
            v
[guest] temporary qcow2
   \    /
    v  v
    disk

Block backend access is in square brackets.  Nodes without square
brackets are BDS nodes.

If the guest wants to drain the disk, it's possible for new I/O requests
to enter the disk BDS while we're recursing to disk's children because
the NBD export socket fd is in the same AIOContext.  The socket fd is
therefore handled during aio_poll() calls.
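
To make the concern concrete, here is a toy model of the two drain
strategies.  It is only a sketch and not QEMU code: Node, toy_poll(),
the "file" child below disk and the timing of the NBD request are all
assumptions made up for illustration.  toy_poll() stands in for
aio_poll(): it may dispatch a request from the other root (the NBD
export) into disk before completing one request on the node we are
waiting for.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these types or helpers are QEMU's. */
typedef struct Node {
    const char *name;
    int in_flight;          /* requests currently pending on this node */
    struct Node *child;     /* at most one child, to keep the toy small */
} Node;

/* Hypothetical protocol-level child of "disk"; not shown in the diagram. */
static Node file_node = { "file", 0, NULL };
static Node disk_node = { "disk", 0, &file_node };

static int polls;             /* event loop iterations so far */
static int nbd_arrival_poll;  /* iteration at which the NBD socket fires */

static void toy_reset(void)
{
    disk_node.in_flight = 1;  /* one guest request */
    file_node.in_flight = 1;  /* one request pending in the child */
    polls = 0;
    nbd_arrival_poll = 2;     /* assumed timing: fires on the second poll */
}

/*
 * Stand-in for aio_poll(): one event loop iteration.  Because the NBD
 * socket lives in the same context, its handler can run here and submit
 * a new request to disk.  Then one request on waiting_on completes.
 */
static void toy_poll(Node *waiting_on)
{
    assert(waiting_on != NULL);
    if (++polls == nbd_arrival_poll) {
        disk_node.in_flight++;        /* request from the other root */
    }
    if (waiting_on->in_flight > 0) {
        waiting_on->in_flight--;
    }
}

/* Pre-order: wait for the parent, then recurse and never look back. */
static void drain_preorder(Node *n)
{
    while (n->in_flight > 0) {
        toy_poll(n);
    }
    if (n->child) {
        drain_preorder(n->child);
    }
}

/* Return the first node in the chain that still has pending requests. */
static Node *first_busy(Node *n)
{
    for (; n != NULL; n = n->child) {
        if (n->in_flight > 0) {
            return n;
        }
    }
    return NULL;
}

/* Whole-subtree: keep polling until every node is idle at the same time. */
static void drain_whole_subtree(Node *root)
{
    Node *busy;

    while ((busy = first_busy(root)) != NULL) {
        toy_poll(busy);
    }
}

/* Total number of pending requests in the chain starting at n. */
static int subtree_in_flight(const Node *n)
{
    int total = 0;

    for (; n != NULL; n = n->child) {
        total += n->in_flight;
    }
    return total;
}

int leftover_after_preorder(void)
{
    toy_reset();
    drain_preorder(&disk_node);
    return subtree_in_flight(&disk_node);
}

int leftover_after_whole_subtree(void)
{
    toy_reset();
    drain_whole_subtree(&disk_node);
    return subtree_in_flight(&disk_node);
}
```

In this model leftover_after_preorder() returns 1, because the NBD
request lands in disk while we are polling for "file", whereas
leftover_after_whole_subtree() returns 0.  Whether the real code can
hit this depends on things the toy ignores, such as whether the NBD
export is quiesced before the drain starts.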

I'm not 100% sure that this is a problem, but I wonder if you've thought
about this?

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/io.c | 27 +++++++++++++++++++--------
>  1 file changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index a9a23a6..e0c9215 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -249,6 +249,23 @@ static void bdrv_drain_recurse(BlockDriverState *bs)
>      }
>  }
>  
> +static bool bdrv_drain_io_recurse(BlockDriverState *bs)
> +{
> +    BdrvChild *child;
> +    bool waited = false;
> +
> +    while (atomic_read(&bs->in_flight) > 0) {
> +        aio_poll(bdrv_get_aio_context(bs), true);
> +        waited = true;
> +    }
> +
> +    QLIST_FOREACH(child, &bs->children, next) {
> +        waited |= bdrv_drain_io_recurse(child->bs);
> +    }
> +
> +    return waited;
> +}
> +
>  /*
>   * Wait for pending requests to complete on a single BlockDriverState subtree,
>   * and suspend block driver's internal I/O until next request arrives.
> @@ -265,10 +282,7 @@ void bdrv_drain(BlockDriverState *bs)
>      bdrv_no_throttling_begin(bs);
>      bdrv_io_unplugged_begin(bs);
>      bdrv_drain_recurse(bs);
> -    while (bdrv_requests_pending(bs)) {
> -        /* Keep iterating */
> -         aio_poll(bdrv_get_aio_context(bs), true);
> -    }
> +    bdrv_drain_io_recurse(bs);
>      bdrv_io_unplugged_end(bs);
>      bdrv_no_throttling_end(bs);
>  }
> @@ -319,10 +333,7 @@ void bdrv_drain_all(void)
>              aio_context_acquire(aio_context);
>              while ((bs = bdrv_next(bs))) {
>                  if (aio_context == bdrv_get_aio_context(bs)) {
> -                    if (bdrv_requests_pending(bs)) {
> -                        aio_poll(aio_context, true);
> -                        waited = true;
> -                    }
> +                    waited |= bdrv_drain_io_recurse(bs);
>                  }
>              }
>              aio_context_release(aio_context);
> -- 
> 2.5.0
> 
> 
