From: Max Reitz <mreitz@redhat.com>
To: qemu-devel@nongnu.org, Qemu-block <qemu-block@nongnu.org>,
	Bug 1570134 <1570134@bugs.launchpad.net>
Cc: Fam Zheng <famz@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [Bug 1570134] Re: While committing snapshot qemu crashes with SIGABRT
Date: Wed, 20 Apr 2016 20:09:55 +0200
Message-ID: <5717C5F3.90603@redhat.com>
In-Reply-To: <20160420000318.17358.96092.malone@soybean.canonical.com>



On 20.04.2016 02:03, Matthew Schumacher wrote:
> Max,
> 
> Qemu still crashes for me, but the debug is again very different.  When
> I attach to the qemu process from gdb, it is unable to provide a
> backtrace when it crashes.  The log file is different too.  Any ideas?
> 
> qemu-system-x86_64: block.c:2307: bdrv_replace_in_backing_chain:
> Assertion `!bdrv_requests_pending(old)' failed.

This message is exactly the same as you saw in 2.5.1, so I guess we've
at least averted a regression in 2.6.0.

I'm CC-ing some people who are more involved with this (although Paolo
is on PTO right now, but well...). (The following is more of a note to
those people than to you, Matthew.)

Summary: I think bdrv_drained_begin() does not behave as advertised.

So the assertion that is failing here asserts that no requests are
pending on the mirror block job's source BDS. However, we do invoke
bdrv_drained_begin() on exactly that BDS at the end of mirror_run().

When that function returns, there are indeed no more requests pending
for that BDS. But once mirror_exit() is invoked, there may be new
requests pending.

I reproduced that by running bonnie++ in a guest, then committing a
snapshot and invoking block-job-complete right after the
BLOCK_JOB_READY event; sometimes bdrv_requests_pending(s->common.bs)
is true in mirror_exit() (which is bad), sometimes it's false. I just
used a plain virtio-blk drive without dataplane.

I'm not sure exactly how bdrv_drained_begin() and in turn
aio_disable_external() are supposed to work, but as a matter of fact a
BDS may receive requests even after those functions have been called.
Just putting an assert(!bs->quiesce_counter) in tracked_request_begin()
makes it fail even before I start the mirror block job (due to some
flush).

So in my case the problematic request regarding the mirroring comes from
blk_aio_ready_entry(); putting an assert(!blk_bs(blk)->quiesce_counter)
into blk_aio_readv() yields the following backtrace:

#0  0x00007f3e750bd2a8 in raise () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007f3e750be72a in abort () from /usr/lib/libc.so.6
No symbol table info available.
#2  0x00007f3e750b61b7 in __assert_fail_base () from /usr/lib/libc.so.6
No symbol table info available.
#3  0x00007f3e750b6262 in __assert_fail () from /usr/lib/libc.so.6
No symbol table info available.
#4  0x0000564cf7d4e25e in blk_aio_readv (blk=<optimized out>,
sector_num=<optimized out>, iov=<optimized out>, nb_sectors=<optimized
out>, cb=<optimized out>, opaque=<optimized out>) at
qemu/block/block-backend.c:1002
        __PRETTY_FUNCTION__ = "blk_aio_readv"
#5  0x0000564cf7ab2cf3 in submit_requests (niov=<optimized out>,
num_reqs=<optimized out>, start=<optimized out>, mrb=<optimized out>,
blk=<optimized out>) at qemu/hw/block/virtio-blk.c:361
        nb_sectors = <optimized out>
        is_write = <optimized out>
        qiov = <optimized out>
        sector_num = <optimized out>
#6  virtio_blk_submit_multireq (blk=0x564cf9f80250,
mrb=mrb@entry=0x7ffeffbfce40) at qemu/hw/block/virtio-blk.c:391
        i = <optimized out>
        start = <optimized out>
        num_reqs = <optimized out>
        niov = <optimized out>
        nb_sectors = <optimized out>
        max_xfer_len = <optimized out>
        sector_num = <optimized out>
#7  0x0000564cf7ab38c2 in virtio_blk_handle_vq (s=0x564cf9e51268,
vq=<optimized out>) at qemu/hw/block/virtio-blk.c:593
        req = 0x0
        mrb = {reqs = {0x564cfb8e8c30, 0x564cfb7bc290, 0x0 <repeats 30
times>}, num_reqs = 2, is_write = false}
#8  0x0000564cf7addcf5 in virtio_queue_notify_vq (vq=0x564cfa000be0) at
qemu/hw/virtio/virtio.c:1108
        vdev = 0x564cf9e51268
#9  0x0000564cf7d19980 in aio_dispatch (ctx=0x564cf9e42f40) at
qemu/aio-posix.c:327
        tmp = <optimized out>
        revents = <optimized out>
        node = 0x7f3e54015030
        progress = false
#10 0x0000564cf7d0eecd in aio_ctx_dispatch (source=<optimized out>,
callback=<optimized out>, user_data=<optimized out>) at qemu/async.c:233
        ctx = <optimized out>
#11 0x00007f3e781d7f07 in g_main_context_dispatch () from
/usr/lib/libglib-2.0.so.0
No symbol table info available.
#12 0x0000564cf7d1803b in glib_pollfds_poll () at qemu/main-loop.c:213
        context = 0x564cf9e44800
        pfds = <optimized out>
#13 os_host_main_loop_wait (timeout=<optimized out>) at qemu/main-loop.c:258
        ret = 2
        spin_counter = 2
#14 main_loop_wait (nonblocking=<optimized out>) at qemu/main-loop.c:506
        ret = 2
        timeout = 1000
        timeout_ns = <optimized out>
#15 0x0000564cf7a4c91c in main_loop () at qemu/vl.c:1934
        nonblocking = <optimized out>
        last_io = 0
#16 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
out>) at qemu/vl.c:4658


Maybe bdrv_drained_begin() is supposed to work like this and let this
request through, but that would be pretty counter-intuitive.

Max


