qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Jeff Cody <jcody@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Fam Zheng <famz@redhat.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org,
	Max Reitz <mreitz@redhat.com>,
	pbonzini@redhat.com, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 0/1] block: Workaround for the iotests errors
Date: Tue, 28 Nov 2017 00:43:05 -0500	[thread overview]
Message-ID: <20171128054305.GP5399@localhost.localdomain> (raw)
In-Reply-To: <20171127232909.GF4903@localhost.localdomain>

On Tue, Nov 28, 2017 at 12:29:09AM +0100, Kevin Wolf wrote:
> Am 23.11.2017 um 18:57 hat Fam Zheng geschrieben:
> > Jeff's block job patch made the latent drain bug visible, and I find this
> > patch, which by itself also makes some sense, can hide it again. :) With it
> > applied we are at least back to the ground where patchew's iotests (make
> > docker-test-block@fedora) can pass.
> > 
> > The real bug is that in the middle of bdrv_parent_drained_end(), bs's parent
> > list changes. One drained_end call before the mirror_exit() already did one
> > blk_root_drained_end(), a second drained_end on an updated parent node can do
> > another same blk_root_drained_end(), making it unbalanced with
> > blk_root_drained_begin(). This is shown by the following three backtraces as
> > captured by rr with a crashed "qemu-img commit", essentially the same as in
> > the failed iotest 020:
> 
> My conclusion what really happens in 020 is that we have a graph like
> this:
> 
>                              mirror target BB --+
>                                                 |
>                                                 v
>     qemu-img BB -> mirror_top_bs -> overlay -> base
> 
> bdrv_drained_end(base) results in it being available for requests again,
> so it calls bdrv_parent_drained_end() for overlay. While draining
> itself, the mirror job completes and changes the BdrvChild between
> mirror_top_bs and overlay (which is currently being drained) to point to
> base instead. After returning, QLIST_FOREACH() continues to iterate the
> parents of base instead of those of overlay, resulting in a second
> blk_drained_end() for the mirror target BB.
> 
> This instance can be fixed relatively easily (see below) by using
> QLIST_FOREACH_SAFE() instead.
> 
> However, I'm not sure if all problems with the graph change can be
> solved this way and whether we can really allow graph changes while
> iterating the graph for bdrv_drained_begin/end. Not allowing it would
> require some more serious changes to the block jobs that delays their
> completion until after bdrv_drain_end() has finished (not sure how to
> even get a callback at that point...)
> 

That is at least part of what is causing the segfaults that I am still
seeing (after your patch):

We enter bdrv_drain_recurse(), and the BDS has been reaped:


Thread 1 "qemu-img" received signal SIGSEGV, Segmentation fault.
0x000000010014b56e in qemu_mutex_unlock (mutex=0x76767676767676d6) at util/qemu-thread-posix.c:92
92          assert(mutex->initialized);

#0  0x000000010014b56e in qemu_mutex_unlock (mutex=0x76767676767676d6) at util/qemu-thread-posix.c:92
#1  0x00000001001450bf in aio_context_release (ctx=0x7676767676767676) at util/async.c:507
#2  0x000000010009d5c7 in bdrv_drain_recurse (bs=0x100843270, begin=false) at block/io.c:201
#3  0x000000010009d949 in bdrv_drained_end (bs=0x100843270) at block/io.c:297
#4  0x000000010002e705 in bdrv_child_cb_drained_end (child=0x100870c40) at block.c:822
#5  0x000000010009cdc6 in bdrv_parent_drained_end (bs=0x100863b10) at block/io.c:60
#6  0x000000010009d938 in bdrv_drained_end (bs=0x100863b10) at block/io.c:296
#7  0x000000010009d9d5 in bdrv_drain (bs=0x100863b10) at block/io.c:322
#8  0x000000010008c39c in blk_drain (blk=0x100881220) at block/block-backend.c:1523
#9  0x000000010009a85e in mirror_drain (job=0x10088df50) at block/mirror.c:996
#10 0x0000000100037eca in block_job_drain (job=0x10088df50) at blockjob.c:187
#11 0x000000010003856d in block_job_finish_sync (job=0x10088df50, finish=0x100038926 <block_job_complete>, errp=0x7fffffffd1d8) at blockjob.c:378
#12 0x0000000100038b80 in block_job_complete_sync (job=0x10088df50, errp=0x7fffffffd1d8) at blockjob.c:544
#13 0x0000000100023de5 in run_block_job (job=0x10088df50, errp=0x7fffffffd1d8) at qemu-img.c:872
#14 0x0000000100024305 in img_commit (argc=3, argv=0x7fffffffd390) at qemu-img.c:1034
#15 0x000000010002ccd1 in main (argc=3, argv=0x7fffffffd390) at qemu-img.c:4763


(gdb) f 2
#2  0x000000010009d5c7 in bdrv_drain_recurse (bs=0x100843270, begin=false) at block/io.c:201
201         waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
(gdb) list
196
197         /* Ensure any pending metadata writes are submitted to bs->file.  */
198         bdrv_drain_invoke(bs, begin);
199
200         /* Wait for drained requests to finish */
201         waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
202
203         QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
204             BlockDriverState *bs = child->bs;
205             bool in_main_loop =


(gdb) p *bs
$1 = {
  open_flags = 8835232, 
  read_only = true, 
  encrypted = false, 
  sg = false, 
  probed = false, 
  force_share = 56, 
  implicit = 59, 
  drv = 0x0, 
  opaque = 0x0, 
  aio_context = 0x7676767676767676, 
  aio_notifiers = {
    lh_first = 0x7676767676767676
  }, 
  [...]




> And the test cases that Jeff mentions still fail with this patch, too.
> But at least it doesn't only make failure less likely by reducing the
> window for a race condition, but seems to attack a real problem.
> 
> Kevin
> 
> 
> diff --git a/block/io.c b/block/io.c
> index 4fdf93a014..6773926fc1 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -42,9 +42,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>  
>  void bdrv_parent_drained_begin(BlockDriverState *bs)
>  {
> -    BdrvChild *c;
> +    BdrvChild *c, *next;
>  
> -    QLIST_FOREACH(c, &bs->parents, next_parent) {
> +    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
>          if (c->role->drained_begin) {
>              c->role->drained_begin(c);
>          }
> @@ -53,9 +53,9 @@ void bdrv_parent_drained_begin(BlockDriverState *bs)
>  
>  void bdrv_parent_drained_end(BlockDriverState *bs)
>  {
> -    BdrvChild *c;
> +    BdrvChild *c, *next;
>  
> -    QLIST_FOREACH(c, &bs->parents, next_parent) {
> +    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
>          if (c->role->drained_end) {
>              c->role->drained_end(c);
>          }
>

  parent reply	other threads:[~2017-11-28  5:43 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-23 17:57 [Qemu-devel] [PATCH 0/1] block: Workaround for the iotests errors Fam Zheng
2017-11-23 17:57 ` [Qemu-devel] [PATCH 1/1] block: Don't poll for drain end Fam Zheng
2017-11-24  6:12 ` [Qemu-devel] [PATCH 0/1] block: Workaround for the iotests errors Jeff Cody
2017-11-24  8:41   ` Fam Zheng
2017-11-24 16:39 ` Kevin Wolf
2017-11-28  2:53   ` Fam Zheng
2017-11-27 23:29 ` Kevin Wolf
2017-11-28  0:21   ` [Qemu-devel] [Qemu-block] " John Snow
2017-11-28  5:43   ` Jeff Cody [this message]
2017-11-28 11:42     ` [Qemu-devel] " Kevin Wolf
2017-11-28 12:28       ` [Qemu-devel] [Qemu-block] " Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171128054305.GP5399@localhost.localdomain \
    --to=jcody@redhat.com \
    --cc=famz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).