From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:55034) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1at427-0002uV-JL for qemu-devel@nongnu.org; Wed, 20 Apr 2016 22:07:56 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1at426-0004ot-IP for qemu-devel@nongnu.org; Wed, 20 Apr 2016 22:07:55 -0400 Date: Thu, 21 Apr 2016 10:07:51 +0800 From: Fam Zheng Message-ID: <20160421020751.GB24822@dhcp-14-234.nay.redhat.com> References: <20160413231801.31850.67186.malonedeb@chaenomeles.canonical.com> <20160420000318.17358.96092.malone@soybean.canonical.com> <5717C5F3.90603@redhat.com> <5717E09C.5050209@redhat.com> <20160421003413.GA24822@dhcp-14-234.nay.redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160421003413.GA24822@dhcp-14-234.nay.redhat.com> Subject: Re: [Qemu-devel] [Bug 1570134] Re: While committing snapshot qemu crashes with SIGABRT List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Max Reitz Cc: Kevin Wolf , Qemu-block , qemu-devel@nongnu.org, Stefan Hajnoczi , Paolo Bonzini , Bug 1570134 <1570134@bugs.launchpad.net> On Thu, 04/21 08:34, Fam Zheng wrote: > On Wed, 04/20 22:03, Max Reitz wrote: > > On 20.04.2016 20:09, Max Reitz wrote: > > > On 20.04.2016 02:03, Matthew Schumacher wrote: > > >> Max, > > >> > > >> Qemu still crashes for me, but the debug is again very different. When > > >> I attach to the qemu process from gdb, it is unable to provide a > > >> backtrace when it crashes. The log file is different too. Any ideas? > > >> > > >> qemu-system-x86_64: block.c:2307: bdrv_replace_in_backing_chain: > > >> Assertion `!bdrv_requests_pending(old)' failed. > > > > > > This message is exactly the same as you saw in 2.5.1, so I guess we've > > > at least averted a regression in 2.6.0. > > > > I get the same message in 2.5.0, in 2.4.0 it's "Co-routine re-entered > > recursively". 2.3.0 works fine. > > > > Bisecting the regression between 2.3.0 and 2.4.0 interestingly yields > > 48ac0a4df84662f as the problematic commit, but I can't imagine that this > > is the root issue. The effective change it brings is that for active > > commits, the buf_size is no longer the same as the granularity, but the > > default mirror buf_size instead. > > > > When forcing buf_size to the granularity, the issue first appears with > > commit 3f09bfbc7bee812 (after 2.4.0, before 2.5.0), which is much less > > surprising, because this is the one that introduced the assertion in the > > first place. > > > > However, I still don't think the assertion is the problem but the fact > > that the guest device can still send requests after bdrv_drained_begin(). > > Thanks for debugging this. > > bdrv_drained_begin isn't effective because the guest notifier handler is not > registered as "external": > > virtio_queue_set_host_notifier_fd_handler > event_notifier_set_handler > qemu_set_fd_handler > aio_set_fd_handler(ctx, fd, > is_external, /* false */ > ...) > > > is_external SHOULD be true here. > This patch survives the reproducer I have on top of master (also submitted to qemu-devel for 2.6): --- diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index f745c4a..002c2c6 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -1829,10 +1829,11 @@ void virtio_queue_set_host_notifier_fd_handler(VirtQueue *vq, bool assign, bool set_handler) { if (assign && set_handler) { - event_notifier_set_handler(&vq->host_notifier, - virtio_queue_host_notifier_read); + aio_set_event_notifier(qemu_get_aio_context(), &vq->host_notifier, + true, virtio_queue_host_notifier_read); } else { - event_notifier_set_handler(&vq->host_notifier, NULL); + aio_set_event_notifier(qemu_get_aio_context(), &vq->host_notifier, + true, NULL); } if (!assign) { /* Test and clear notifier before after disabling event,