All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Max Reitz <mreitz@redhat.com>
Cc: qemu-block@nongnu.org, pbonzini@redhat.com, famz@redhat.com,
	stefanha@redhat.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 14/19] block: Defer .bdrv_drain_begin callback to polling phase
Date: Fri, 29 Jun 2018 17:14:55 +0200	[thread overview]
Message-ID: <20180629151455.GG15588@localhost.localdomain> (raw)
In-Reply-To: <dd88cb0f-1b0e-ce65-7418-78a25d3b8855@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 3613 bytes --]

Am 27.06.2018 um 16:30 hat Max Reitz geschrieben:
> On 2018-04-11 18:39, Kevin Wolf wrote:
> > We cannot allow aio_poll() in bdrv_drain_invoke(begin=true) until we're
> > done with propagating the drain through the graph and are doing the
> > single final BDRV_POLL_WHILE().
> > 
> > Just schedule the coroutine with the callback and increase bs->in_flight
> > to make sure that the polling phase will wait for it.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block/io.c | 28 +++++++++++++++++++++++-----
> >  1 file changed, 23 insertions(+), 5 deletions(-)
> 
> According to bisect, this breaks blockdev-snapshot with QED:

I have no idea why it would trigger only with qed, but I think the real
bug is in commit dcf94a23b1 ('block: Don't poll in parent drain
callbacks').

The crash is specifically in the place where the new overlay needs to be
drained because it's new backing file is drained. First,
bdrv_drain_invoke() creates new activity on the overlay when it gets the
old active layer attached as a backing file:

#0  0x000055b90eb73b88 in bdrv_drain_invoke (bs=0x55b910fd8440, begin=<optimized out>) at block/io.c:217
#1  0x000055b90eb1e2b0 in bdrv_replace_child_noperm (child=0x55b911f32450, new_bs=<optimized out>) at block.c:2069
#2  0x000055b90eb20875 in bdrv_replace_child (child=<optimized out>, new_bs=0x55b910fd2130) at block.c:2098
#3  0x000055b90eb20c53 in bdrv_root_attach_child (child_bs=child_bs@entry=0x55b910fd2130, child_name=child_name@entry=0x55b90eda75e5 "backing", child_role=child_role@entry=0x55b90f3d3300 <child_backing>, perm=0, shared_perm=31, opaque=opaque@entry=0x55b910fd8440, errp=0x7fffb8943620) at block.c:2141
#4  0x000055b90eb20d60 in bdrv_attach_child (parent_bs=parent_bs@entry=0x55b910fd8440, child_bs=child_bs@entry=0x55b910fd2130, child_name=child_name@entry=0x55b90eda75e5 "backing", child_role=child_role@entry=0x55b90f3d3300 <child_backing>, errp=0x7fffb8943620) at block.c:2162
#5  0x000055b90eb23c30 in bdrv_set_backing_hd (bs=bs@entry=0x55b910fd8440, backing_hd=backing_hd@entry=0x55b910fd2130, errp=errp@entry=0x7fffb8943620) at block.c:2249
#6  0x000055b90eb24a76 in bdrv_append (bs_new=0x55b910fd8440, bs_top=0x55b910fd2130, errp=errp@entry=0x7fffb8943680) at block.c:3535
#7  0x000055b90e937a89 in external_snapshot_prepare (common=0x55b910f5cf90, errp=0x7fffb89436f8) at blockdev.c:1680

And then, when trying to move all users of the old active layer to the
new overlay, it turns out that the bdrv_drain_invoke() BH hasn't been
executed yet:

#0  0x00007fdfcef5d9fb in raise () at /lib64/libc.so.6
#1  0x00007fdfcef5f800 in abort () at /lib64/libc.so.6
#2  0x00007fdfcef560da in __assert_fail_base () at /lib64/libc.so.6
#3  0x00007fdfcef56152 in  () at /lib64/libc.so.6
#4  0x000055b90eb24867 in bdrv_replace_node (from=<optimized out>, to=0x55b910fd8440, errp=0x7fffb8943620) at block.c:3470
#5  0x000055b90eb24abe in bdrv_append (bs_new=0x55b910fd8440, bs_top=0x55b910fd2130, errp=errp@entry=0x7fffb8943680) at block.c:3541
#6  0x000055b90e937a89 in external_snapshot_prepare (common=0x55b910f5cf90, errp=0x7fffb89436f8) at blockdev.c:1680

Not polling in bdrv_child_cb_drained_begin() is great if the original
drain is still polling, but if the original drain_begin has already
returned, we obviously do need to poll so we don't enter a drained
section while requests are still running.

Another thing I noticed is that qmp_transaction() only calls a one-time
bdrv_drain_all() instead of using a proper begin/end section. I suppose
this should be fixed as well.

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

  reply	other threads:[~2018-06-29 15:15 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 01/19] test-bdrv-drain: bdrv_drain() works with cross-AioContext events Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 02/19] block: Use bdrv_do_drain_begin/end in bdrv_drain_all() Kevin Wolf
2018-04-20  7:07   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 03/19] block: Remove 'recursive' parameter from bdrv_drain_invoke() Kevin Wolf
2018-04-20  7:09   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 04/19] block: Don't manually poll in bdrv_drain_all() Kevin Wolf
2018-04-11 18:32   ` Eric Blake
2018-04-20  7:11   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 05/19] tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now Kevin Wolf
2018-04-11 18:33   ` Eric Blake
2018-04-20  7:12   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 06/19] block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE() Kevin Wolf
2018-04-11 17:33   ` Su Hang
2018-04-20  7:17   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain Kevin Wolf
2018-04-12  8:37   ` Paolo Bonzini
2018-04-12  9:51     ` Kevin Wolf
2018-04-12 10:12       ` Paolo Bonzini
2018-04-12 11:11         ` Kevin Wolf
2018-04-12 11:30           ` Paolo Bonzini
2018-04-12 11:53             ` Kevin Wolf
2018-04-12 12:02               ` Paolo Bonzini
2018-04-12 13:27                 ` Kevin Wolf
2018-04-12 13:42                   ` Paolo Bonzini
2018-04-12 14:25                     ` Kevin Wolf
2018-04-12 20:44                       ` Paolo Bonzini
2018-04-13  8:01                         ` Kevin Wolf
2018-04-13 11:05                           ` [Qemu-devel] [Qemu-block] " Paolo Bonzini
2018-04-13 12:40                             ` Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 08/19] block: Remove bdrv_drain_recurse() Kevin Wolf
2018-04-12  8:39   ` Paolo Bonzini
2018-04-20  7:20   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 09/19] test-bdrv-drain: Add test for node deletion Kevin Wolf
2018-04-20  7:32   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 10/19] block: Drain recursively with a single BDRV_POLL_WHILE() Kevin Wolf
2018-04-12  8:41   ` Paolo Bonzini
2018-04-11 16:39 ` [Qemu-devel] [PATCH 11/19] test-bdrv-drain: Test node deletion in subtree recursion Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 12/19] block: Don't poll in parent drain callbacks Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 13/19] test-bdrv-drain: Graph change through parent callback Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 14/19] block: Defer .bdrv_drain_begin callback to polling phase Kevin Wolf
2018-06-27 14:30   ` Max Reitz
2018-06-29 15:14     ` Kevin Wolf [this message]
2018-04-11 16:39 ` [Qemu-devel] [PATCH 15/19] test-bdrv-drain: Test that bdrv_drain_invoke() doesn't poll Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 16/19] block: Allow AIO_WAIT_WHILE with NULL ctx Kevin Wolf
2018-04-12  8:43   ` Paolo Bonzini
2018-04-11 16:39 ` [Qemu-devel] [PATCH 17/19] block: Move bdrv_drain_all_begin() out of coroutine context Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 18/19] block: Allow graph changes in bdrv_drain_all_begin/end sections Kevin Wolf
2018-04-12  8:47   ` Paolo Bonzini
2018-04-11 16:39 ` [Qemu-devel] [PATCH 19/19] test-bdrv-drain: Test graph changes in drain_all section Kevin Wolf
2018-04-11 17:05 ` [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 no-reply
2018-04-20  7:35 ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180629151455.GG15588@localhost.localdomain \
    --to=kwolf@redhat.com \
    --cc=famz@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.