From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:55034)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <famz@redhat.com>) id 1at427-0002uV-JL
	for qemu-devel@nongnu.org; Wed, 20 Apr 2016 22:07:56 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <famz@redhat.com>) id 1at426-0004ot-IP
	for qemu-devel@nongnu.org; Wed, 20 Apr 2016 22:07:55 -0400
Date: Thu, 21 Apr 2016 10:07:51 +0800
From: Fam Zheng <famz@redhat.com>
Message-ID: <20160421020751.GB24822@dhcp-14-234.nay.redhat.com>
References: <20160413231801.31850.67186.malonedeb@chaenomeles.canonical.com>
	<20160420000318.17358.96092.malone@soybean.canonical.com>
	<5717C5F3.90603@redhat.com> <5717E09C.5050209@redhat.com>
	<20160421003413.GA24822@dhcp-14-234.nay.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160421003413.GA24822@dhcp-14-234.nay.redhat.com>
Subject: Re: [Qemu-devel] [Bug 1570134] Re: While committing snapshot qemu
 crashes with SIGABRT
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Max Reitz <mreitz@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Qemu-block <qemu-block@nongnu.org>, qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, Bug 1570134 <1570134@bugs.launchpad.net>

On Thu, 04/21 08:34, Fam Zheng wrote:
> On Wed, 04/20 22:03, Max Reitz wrote:
> > On 20.04.2016 20:09, Max Reitz wrote:
> > > On 20.04.2016 02:03, Matthew Schumacher wrote:
> > >> Max,
> > >>
> > >> Qemu still crashes for me, but the debug is again very different.  When
> > >> I attach to the qemu process from gdb, it is unable to provide a
> > >> backtrace when it crashes.  The log file is different too.  Any ideas?
> > >>
> > >> qemu-system-x86_64: block.c:2307: bdrv_replace_in_backing_chain:
> > >> Assertion `!bdrv_requests_pending(old)' failed.
> > > 
> > > This message is exactly the same as you saw in 2.5.1, so I guess we've
> > > at least averted a regression in 2.6.0.
> > 
> > I get the same message in 2.5.0, in 2.4.0 it's "Co-routine re-entered
> > recursively". 2.3.0 works fine.
> > 
> > Bisecting the regression between 2.3.0 and 2.4.0 interestingly yields
> > 48ac0a4df84662f as the problematic commit, but I can't imagine that this
> > is the root issue. The effective change it brings is that for active
> > commits, the buf_size is no longer the same as the granularity, but the
> > default mirror buf_size instead.
> > 
> > When forcing buf_size to the granularity, the issue first appears with
> > commit 3f09bfbc7bee812 (after 2.4.0, before 2.5.0), which is much less
> > surprising, because this is the one that introduced the assertion in the
> > first place.
> > 
> > However, I still don't think the assertion is the problem but the fact
> > that the guest device can still send requests after bdrv_drained_begin().
> 
> Thanks for debugging this.
> 
> bdrv_drained_begin isn't effective because the guest notifier handler is not
> registered as "external":
> 
>   virtio_queue_set_host_notifier_fd_handler
>     event_notifier_set_handler
>       qemu_set_fd_handler
>         aio_set_fd_handler(ctx, fd,
>                            is_external, /* false */
>                            ...)
> 
> 
> is_external SHOULD be true here.
> 

This patch survives the reproducer I have on top of master (also submitted to
qemu-devel for 2.6):

---

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index f745c4a..002c2c6 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1829,10 +1829,11 @@ void virtio_queue_set_host_notifier_fd_handler(VirtQueue *vq, bool assign,
                                                bool set_handler)
 {
     if (assign && set_handler) {
-        event_notifier_set_handler(&vq->host_notifier,
-                                   virtio_queue_host_notifier_read);
+        aio_set_event_notifier(qemu_get_aio_context(), &vq->host_notifier,
+                               true, virtio_queue_host_notifier_read);
     } else {
-        event_notifier_set_handler(&vq->host_notifier, NULL);
+        aio_set_event_notifier(qemu_get_aio_context(), &vq->host_notifier,
+                               true, NULL);
     }
     if (!assign) {
         /* Test and clear notifier before after disabling event,