qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, jsnow@redhat.com, mreitz@redhat.com,
	kwolf@redhat.com, philmd@redhat.com, peter.maydell@linaro.org,
	berto@igalia.com, stefanha@redhat.com, pbonzini@redhat.com,
	vsementsov@virtuozzo.com, den@openvz.org, eblake@redhat.com
Subject: [PATCH RFC 0/5] Fix accidental crash in iotest 30
Date: Fri, 20 Nov 2020 19:16:17 +0300	[thread overview]
Message-ID: <20201120161622.1537-1-vsementsov@virtuozzo.com> (raw)

Hi all!

As Peter recently noted, iotest 30 accidentally fails.

I found that Qemu crashes due to interleaving of graph-update operations of parallel mirror and stream block-jobs.

So, here is a "workaround" to discuss.

It's of course not the full solution, as if we decide to go this way we should protect by the mutex all graph-modifying operations, not only here. And move everything into coroutine..

So, I send this mostly as a starting point for discussion, may be someone imagine better solution.

Main patches are 04-05. 01-02 only simplify debugging and 03 is
preparation for 04.

Original qemu crash looks like this:

#0  0x00007f7029b23e35 in raise () at /lib64/libc.so.6
#1  0x00007f7029b0e895 in abort () at /lib64/libc.so.6
#2  0x00007f7029b0e769 in _nl_load_domain.cold () at /lib64/libc.so.6
#3  0x00007f7029b1c566 in annobin_assert.c_end () at /lib64/libc.so.6
#4  0x0000558f3d92f15a in bdrv_replace_child (child=0x558f3fa7c400, new_bs=0x0) at ../block.c:2648
#5  0x0000558f3d92f6e1 in bdrv_detach_child (child=0x558f3fa7c400) at ../block.c:2777
#6  0x0000558f3d92f723 in bdrv_root_unref_child (child=0x558f3fa7c400) at ../block.c:2789
#7  0x0000558f3d897f4c in block_job_remove_all_bdrv (job=0x558f3f626940) at ../blockjob.c:191
#8  0x0000558f3d897c73 in block_job_free (job=0x558f3f626940) at ../blockjob.c:88
#9  0x0000558f3d891456 in job_unref (job=0x558f3f626940) at ../job.c:380
#10 0x0000558f3d892602 in job_exit (opaque=0x558f3f626940) at ../job.c:894
#11 0x0000558f3d9ce2fb in aio_bh_call (bh=0x558f3f5dc480) at ../util/async.c:136
#12 0x0000558f3d9ce405 in aio_bh_poll (ctx=0x558f3e80c5f0) at ../util/async.c:164
#13 0x0000558f3d9f75ea in aio_dispatch (ctx=0x558f3e80c5f0) at ../util/aio-posix.c:381
#14 0x0000558f3d9ce836 in aio_ctx_dispatch (source=0x558f3e80c5f0, callback=0x0, user_data=0x0)
    at ../util/async.c:306
#15 0x00007f702ae75ecd in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#16 0x0000558f3da09e33 in glib_pollfds_poll () at ../util/main-loop.c:221
#17 0x0000558f3da09ead in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:244
#18 0x0000558f3da09fb5 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:520
#19 0x0000558f3d7836b7 in qemu_main_loop () at ../softmmu/vl.c:1678
#20 0x0000558f3d317316 in main (argc=20, argv=0x7fffa94d35a8, envp=0x7fffa94d3650)
    at ../softmmu/main.c:50
(gdb) fr 4
#4  0x0000558f3d92f15a in bdrv_replace_child (child=0x558f3fa7c400, new_bs=0x0) at ../block.c:2648
2648            assert(tighten_restrictions == false);
(gdb) list
2643            int ret;
2644
2645            bdrv_get_cumulative_perm(old_bs, &perm, &shared_perm);
2646            ret = bdrv_check_perm(old_bs, NULL, perm, shared_perm, NULL,
2647                                  &tighten_restrictions, NULL);
2648            assert(tighten_restrictions == false);
2649            if (ret < 0) {
2650                /* We only tried to loosen restrictions, so errors are not fatal */
2651                bdrv_abort_perm_update(old_bs);
2652            } else {


And my exploration shows that this due to permission-graph already broken before this permission update. So we tighten restrictions not because removing the child but because we recalculate broken permissions graph and it becomes correct (and more strict unfortunately).


Also, please look through my explorations on this topic in threads:

"iotest 030 still occasionally intermittently failing"
https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg04018.html

"question about bdrv_replace_node"
https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg04478.html

Vladimir Sementsov-Ogievskiy (5):
  abort-on-set-to-true
  iotest-30-shorten: concentrate on failing test case
  scripts/block-coroutine-wrapper.py: allow more function types
  block: move some mirror and stream handlers to coroutine
  block: protect some graph-modifyng things by mutex

 block/coroutines.h                 | 11 +++++++
 include/block/block.h              |  2 ++
 block.c                            | 36 +++++++++++++++------
 block/mirror.c                     |  9 ++++--
 block/stream.c                     |  9 ++++--
 scripts/block-coroutine-wrapper.py | 36 +++++++++++++--------
 tests/qemu-iotests/030             | 52 +++++++++++++++---------------
 tests/qemu-iotests/030.out         |  4 +--
 8 files changed, 105 insertions(+), 54 deletions(-)

-- 
2.21.3



             reply	other threads:[~2020-11-20 16:18 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-20 16:16 Vladimir Sementsov-Ogievskiy [this message]
2020-11-20 16:16 ` [PATCH 1/5] abort-on-set-to-true Vladimir Sementsov-Ogievskiy
2020-11-20 16:16 ` [PATCH 2/5] iotest-30-shorten: concentrate on failing test case Vladimir Sementsov-Ogievskiy
2020-11-20 16:16 ` [PATCH 3/5] scripts/block-coroutine-wrapper.py: allow more function types Vladimir Sementsov-Ogievskiy
2020-11-20 16:16 ` [PATCH 4/5] block: move some mirror and stream handlers to coroutine Vladimir Sementsov-Ogievskiy
2020-11-20 16:16 ` [PATCH 5/5] block: protect some graph-modifyng things by mutex Vladimir Sementsov-Ogievskiy
2020-11-20 16:27 ` [PATCH RFC 0/5] Fix accidental crash in iotest 30 no-reply
2020-11-20 16:35   ` Vladimir Sementsov-Ogievskiy
2020-11-20 16:36 ` Kevin Wolf
2020-11-20 16:43   ` Vladimir Sementsov-Ogievskiy
2020-11-20 17:22     ` Kevin Wolf
2020-11-20 18:19       ` Vladimir Sementsov-Ogievskiy
2020-11-23 10:10         ` Kevin Wolf
2020-11-23 10:29           ` Vladimir Sementsov-Ogievskiy
2020-11-23 11:10             ` Kevin Wolf
2020-11-23 13:44               ` Vladimir Sementsov-Ogievskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201120161622.1537-1-vsementsov@virtuozzo.com \
    --to=vsementsov@virtuozzo.com \
    --cc=berto@igalia.com \
    --cc=den@openvz.org \
    --cc=eblake@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=philmd@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).