All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Cody <jcody@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Fam Zheng <famz@redhat.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org,
	Max Reitz <mreitz@redhat.com>,
	pbonzini@redhat.com, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 0/1] block: Workaround for the iotests errors
Date: Tue, 28 Nov 2017 00:43:05 -0500	[thread overview]
Message-ID: <20171128054305.GP5399@localhost.localdomain> (raw)
In-Reply-To: <20171127232909.GF4903@localhost.localdomain>

On Tue, Nov 28, 2017 at 12:29:09AM +0100, Kevin Wolf wrote:
> Am 23.11.2017 um 18:57 hat Fam Zheng geschrieben:
> > Jeff's block job patch made the latent drain bug visible, and I find this
> > patch, which by itself also makes some sense, can hide it again. :) With it
> > applied we are at least back to the ground where patchew's iotests (make
> > docker-test-block@fedora) can pass.
> > 
> > The real bug is that in the middle of bdrv_parent_drained_end(), bs's parent
> > list changes. One drained_end call before the mirror_exit() already did one
> > blk_root_drained_end(), a second drained_end on an updated parent node can do
> > another same blk_root_drained_end(), making it unbalanced with
> > blk_root_drained_begin(). This is shown by the following three backtraces as
> > captured by rr with a crashed "qemu-img commit", essentially the same as in
> > the failed iotest 020:
> 
> My conclusion what really happens in 020 is that we have a graph like
> this:
> 
>                              mirror target BB --+
>                                                 |
>                                                 v
>     qemu-img BB -> mirror_top_bs -> overlay -> base
> 
> bdrv_drained_end(base) results in it being available for requests again,
> so it calls bdrv_parent_drained_end() for overlay. While draining
> itself, the mirror job completes and changes the BdrvChild between
> mirror_top_bs and overlay (which is currently being drained) to point to
> base instead. After returning, QLIST_FOREACH() continues to iterate the
> parents of base instead of those of overlay, resulting in a second
> blk_drained_end() for the mirror target BB.
> 
> This instance can be fixed relatively easily (see below) by using
> QLIST_FOREACH_SAFE() instead.
> 
> However, I'm not sure if all problems with the graph change can be
> solved this way and whether we can really allow graph changes while
> iterating the graph for bdrv_drained_begin/end. Not allowing it would
> require some more serious changes to the block jobs that delays their
> completion until after bdrv_drain_end() has finished (not sure how to
> even get a callback at that point...)
> 

That is at least part of what is causing the segfaults that I am still
seeing (after your patch):

We enter bdrv_drain_recurse(), and the BDS has been reaped:


Thread 1 "qemu-img" received signal SIGSEGV, Segmentation fault.
0x000000010014b56e in qemu_mutex_unlock (mutex=0x76767676767676d6) at util/qemu-thread-posix.c:92
92          assert(mutex->initialized);

#0  0x000000010014b56e in qemu_mutex_unlock (mutex=0x76767676767676d6) at util/qemu-thread-posix.c:92
#1  0x00000001001450bf in aio_context_release (ctx=0x7676767676767676) at util/async.c:507
#2  0x000000010009d5c7 in bdrv_drain_recurse (bs=0x100843270, begin=false) at block/io.c:201
#3  0x000000010009d949 in bdrv_drained_end (bs=0x100843270) at block/io.c:297
#4  0x000000010002e705 in bdrv_child_cb_drained_end (child=0x100870c40) at block.c:822
#5  0x000000010009cdc6 in bdrv_parent_drained_end (bs=0x100863b10) at block/io.c:60
#6  0x000000010009d938 in bdrv_drained_end (bs=0x100863b10) at block/io.c:296
#7  0x000000010009d9d5 in bdrv_drain (bs=0x100863b10) at block/io.c:322
#8  0x000000010008c39c in blk_drain (blk=0x100881220) at block/block-backend.c:1523
#9  0x000000010009a85e in mirror_drain (job=0x10088df50) at block/mirror.c:996
#10 0x0000000100037eca in block_job_drain (job=0x10088df50) at blockjob.c:187
#11 0x000000010003856d in block_job_finish_sync (job=0x10088df50, finish=0x100038926 <block_job_complete>, errp=0x7fffffffd1d8) at blockjob.c:378
#12 0x0000000100038b80 in block_job_complete_sync (job=0x10088df50, errp=0x7fffffffd1d8) at blockjob.c:544
#13 0x0000000100023de5 in run_block_job (job=0x10088df50, errp=0x7fffffffd1d8) at qemu-img.c:872
#14 0x0000000100024305 in img_commit (argc=3, argv=0x7fffffffd390) at qemu-img.c:1034
#15 0x000000010002ccd1 in main (argc=3, argv=0x7fffffffd390) at qemu-img.c:4763


(gdb) f 2
#2  0x000000010009d5c7 in bdrv_drain_recurse (bs=0x100843270, begin=false) at block/io.c:201
201         waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
(gdb) list
196
197         /* Ensure any pending metadata writes are submitted to bs->file.  */
198         bdrv_drain_invoke(bs, begin);
199
200         /* Wait for drained requests to finish */
201         waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
202
203         QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
204             BlockDriverState *bs = child->bs;
205             bool in_main_loop =


(gdb) p *bs
$1 = {
  open_flags = 8835232, 
  read_only = true, 
  encrypted = false, 
  sg = false, 
  probed = false, 
  force_share = 56, 
  implicit = 59, 
  drv = 0x0, 
  opaque = 0x0, 
  aio_context = 0x7676767676767676, 
  aio_notifiers = {
    lh_first = 0x7676767676767676
  }, 
  [...]




> And the test cases that Jeff mentions still fail with this patch, too.
> But at least it doesn't only make failure less likely by reducing the
> window for a race condition, but seems to attack a real problem.
> 
> Kevin
> 
> 
> diff --git a/block/io.c b/block/io.c
> index 4fdf93a014..6773926fc1 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -42,9 +42,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>  
>  void bdrv_parent_drained_begin(BlockDriverState *bs)
>  {
> -    BdrvChild *c;
> +    BdrvChild *c, *next;
>  
> -    QLIST_FOREACH(c, &bs->parents, next_parent) {
> +    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
>          if (c->role->drained_begin) {
>              c->role->drained_begin(c);
>          }
> @@ -53,9 +53,9 @@ void bdrv_parent_drained_begin(BlockDriverState *bs)
>  
>  void bdrv_parent_drained_end(BlockDriverState *bs)
>  {
> -    BdrvChild *c;
> +    BdrvChild *c, *next;
>  
> -    QLIST_FOREACH(c, &bs->parents, next_parent) {
> +    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
>          if (c->role->drained_end) {
>              c->role->drained_end(c);
>          }
>

  parent reply	other threads:[~2017-11-28  5:43 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-23 17:57 [Qemu-devel] [PATCH 0/1] block: Workaround for the iotests errors Fam Zheng
2017-11-23 17:57 ` [Qemu-devel] [PATCH 1/1] block: Don't poll for drain end Fam Zheng
2017-11-24  6:12 ` [Qemu-devel] [PATCH 0/1] block: Workaround for the iotests errors Jeff Cody
2017-11-24  8:41   ` Fam Zheng
2017-11-24 16:39 ` Kevin Wolf
2017-11-28  2:53   ` Fam Zheng
2017-11-27 23:29 ` Kevin Wolf
2017-11-28  0:21   ` [Qemu-devel] [Qemu-block] " John Snow
2017-11-28  5:43   ` Jeff Cody [this message]
2017-11-28 11:42     ` [Qemu-devel] " Kevin Wolf
2017-11-28 12:28       ` [Qemu-devel] [Qemu-block] " Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171128054305.GP5399@localhost.localdomain \
    --to=jcody@redhat.com \
    --cc=famz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.