From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44082) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1at2a0-0001jT-7M for qemu-devel@nongnu.org; Wed, 20 Apr 2016 20:34:49 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1at2Zz-0002DG-Bg for qemu-devel@nongnu.org; Wed, 20 Apr 2016 20:34:48 -0400 Date: Thu, 21 Apr 2016 08:34:46 +0800 From: Fam Zheng Message-ID: <20160421003413.GA24822@dhcp-14-234.nay.redhat.com> References: <20160413231801.31850.67186.malonedeb@chaenomeles.canonical.com> <20160420000318.17358.96092.malone@soybean.canonical.com> <5717C5F3.90603@redhat.com> <5717E09C.5050209@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5717E09C.5050209@redhat.com> Subject: Re: [Qemu-devel] [Bug 1570134] Re: While committing snapshot qemu crashes with SIGABRT List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Max Reitz Cc: qemu-devel@nongnu.org, Qemu-block , Bug 1570134 <1570134@bugs.launchpad.net>, Kevin Wolf , Paolo Bonzini , Stefan Hajnoczi On Wed, 04/20 22:03, Max Reitz wrote: > On 20.04.2016 20:09, Max Reitz wrote: > > On 20.04.2016 02:03, Matthew Schumacher wrote: > >> Max, > >> > >> Qemu still crashes for me, but the debug is again very different. When > >> I attach to the qemu process from gdb, it is unable to provide a > >> backtrace when it crashes. The log file is different too. Any ideas? > >> > >> qemu-system-x86_64: block.c:2307: bdrv_replace_in_backing_chain: > >> Assertion `!bdrv_requests_pending(old)' failed. > > > > This message is exactly the same as you saw in 2.5.1, so I guess we've > > at least averted a regression in 2.6.0. > > I get the same message in 2.5.0, in 2.4.0 it's "Co-routine re-entered > recursively". 2.3.0 works fine. > > Bisecting the regression between 2.3.0 and 2.4.0 interestingly yields > 48ac0a4df84662f as the problematic commit, but I can't imagine that this > is the root issue. The effective change it brings is that for active > commits, the buf_size is no longer the same as the granularity, but the > default mirror buf_size instead. > > When forcing buf_size to the granularity, the issue first appears with > commit 3f09bfbc7bee812 (after 2.4.0, before 2.5.0), which is much less > surprising, because this is the one that introduced the assertion in the > first place. > > However, I still don't think the assertion is the problem but the fact > that the guest device can still send requests after bdrv_drained_begin(). Thanks for debugging this. bdrv_drained_begin isn't effective because the guest notifier handler is not registered as "external": virtio_queue_set_host_notifier_fd_handler event_notifier_set_handler qemu_set_fd_handler aio_set_fd_handler(ctx, fd, is_external, /* false */ ...) is_external SHOULD be true here.