Date: Tue, 28 Jul 2015 09:34:46 +0100
From: Stefan Hajnoczi
To: Cornelia Huck
Cc: borntraeger@de.ibm.com, qemu-devel@nongnu.org, Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH for-2.4 0/2] AioContext: fix deadlock after aio_context_acquire() race
Message-ID: <20150728083446.GC32719@stefanha-thinkpad.redhat.com>
In-Reply-To: <20150728100226.49bafc67.cornelia.huck@de.ibm.com>
References: <1438014819-18125-1-git-send-email-stefanha@redhat.com> <20150728090700.73f001d3.cornelia.huck@de.ibm.com> <20150728100226.49bafc67.cornelia.huck@de.ibm.com>

On Tue, Jul 28, 2015 at 10:02:26AM +0200, Cornelia Huck wrote:
> On Tue, 28 Jul 2015 09:07:00 +0200
> Cornelia Huck wrote:
>
> > On Mon, 27 Jul 2015 17:33:37 +0100
> > Stefan Hajnoczi wrote:
> >
> > > See Patch 2 for details on the deadlock after two aio_context_acquire() calls
> > > race. This caused dataplane to hang on startup.
> > >
> > > Patch 1 is a memory leak fix for AioContext that's needed by Patch 2.
> > >
> > > Stefan Hajnoczi (2):
> > >   AioContext: avoid leaking BHs on cleanup
> > >   AioContext: force event loop iteration using BH
> > >
> > >  async.c             | 29 +++++++++++++++++++++++++++--
> > >  include/block/aio.h |  3 +++
> > >  2 files changed, 30 insertions(+), 2 deletions(-)
> > >
> >
> > Just gave this a try: The stripped-down guest that hangs during startup
> > on master is working fine with these patches applied, and my full setup
> > works as well.
> >
> > So,
> >
> > Tested-by: Cornelia Huck
>
> Uh-oh, spoke too soon. It starts, but when I try a virsh managedsave, I
> get
>
> qemu-system-s390x: /data/git/yyy/qemu/async.c:242: aio_ctx_finalize: Assertion `ctx->first_bh->deleted' failed.

Please pretty-print ctx->first_bh in gdb. In particular, which function is
ctx->first_bh->cb pointing to?

I tried reproducing with qemu-system-x86_64 and a RHEL 7 guest but
couldn't trigger the assertion failure.

This assertion means that there is an *existing* QEMUBH memory leak. It is
not introduced by this patch series.

If we run out of time for QEMU 2.4, it would be acceptable to replace the
assertion with:

  /* TODO track down leaked BHs and turn this into an assertion */
  if (ctx->first_bh->deleted) {
      g_free(ctx->first_bh);
  }
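To make the trade-off in that workaround concrete, here is a minimal, self-contained C sketch of a finalize loop that frees deleted BHs and deliberately leaks live ones instead of asserting. It assumes the finalize code walks ctx->first_bh as a singly linked list; the BH and Ctx types and the bh_new() and ctx_finalize_lenient() helpers are hypothetical stand-ins, not QEMU's real QEMUBH/AioContext code, and calloc()/free() take the place of GLib's allocators.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMUBH: only the fields this sketch needs. */
typedef struct BH {
    struct BH *next;
    bool deleted;   /* set once the owner has asked for the BH to be deleted */
} BH;

/* Hypothetical stand-in for AioContext: just the list head. */
typedef struct {
    BH *first_bh;   /* singly linked list of bottom halves */
} Ctx;

/* Push a new, live (not deleted) BH onto the front of the list. */
static BH *bh_new(Ctx *ctx)
{
    BH *bh = calloc(1, sizeof(*bh));
    if (!bh) {
        abort();    /* mirror GLib allocators, which abort on failure */
    }
    bh->next = ctx->first_bh;
    ctx->first_bh = bh;
    return bh;
}

/*
 * Lenient finalize: free BHs that were marked deleted, but leave live
 * BHs allocated (leak them) instead of asserting, because their owners
 * may still hold pointers to them. Returns the number of leaked BHs.
 */
static int ctx_finalize_lenient(Ctx *ctx)
{
    int leaked = 0;

    while (ctx->first_bh) {
        BH *next = ctx->first_bh->next;

        /* TODO track down leaked BHs and turn this into an assertion */
        if (ctx->first_bh->deleted) {
            free(ctx->first_bh);
        } else {
            leaked++;   /* not freed: the owner may still reference it */
        }
        ctx->first_bh = next;
    }
    assert(ctx->first_bh == NULL);  /* list is always detached at the end */
    return leaked;
}
```

Leaking rather than freeing a BH that was never deleted is presumably the safer fallback: its owner may still hold a pointer to it, so freeing it could turn a memory leak into a use-after-free, while the leak only costs memory at context teardown.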