From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:59767)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1Wnob8-0005ei-RO
	for qemu-devel@nongnu.org; Fri, 23 May 2014 08:29:26 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1Wnob1-0007Wq-8Z
	for qemu-devel@nongnu.org; Fri, 23 May 2014 08:29:18 -0400
Received: from mx1.redhat.com ([209.132.183.28]:17258)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1Wnob1-0007Wb-01
	for qemu-devel@nongnu.org; Fri, 23 May 2014 08:29:11 -0400
Date: Fri, 23 May 2014 14:29:06 +0200
From: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20140523122906.GB5254@noname.redhat.com>
References: <537E62CE.2050302@beyond.pl> <537E66A5.9010609@beyond.pl>
	<537F047C.9010800@redhat.com> <537F141C.5080102@beyond.pl>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <537F141C.5080102@beyond.pl>
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Marcin =?utf-8?Q?Gibu=C5=82a?= <m.gibula@beyond.pl>
Cc: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org

On 23.05.2014 at 11:25, Marcin Gibuła wrote:
> On 23.05.2014 10:19, Paolo Bonzini wrote:
> >On 22/05/2014 23:05, Marcin Gibuła wrote:
> >>Some more info.
> >>The VM was doing a lot of write I/O during this test.
> >
> >QEMU is waiting for librados to complete I/O.  Can you reproduce it with
> >a different driver?
>
> Hi,
>
> I've reproduced it without RBD. Backtrace below:
> [...]

I see that you have a mix of aio=native and aio=threads. I can't say
much about the aio=native disks (perhaps try to reproduce without
them?), but there are definitely no worker threads for the other disks
that bdrv_drain_all() would have to wait for.

bdrv_requests_pending(), called by bdrv_requests_pending_all(), is the
function that determines for each of the disks in your VM if it still
has requests in flight that need to be completed. This function must
have returned true even though there is nothing to wait for.

Can you check which of its conditions led to this behaviour, and for
which disk? Either set a breakpoint there and single-step through the
function the next time it is called (if the poll even has a timeout),
or inspect the conditions manually in gdb.
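
To make "which of its conditions, and for which disk" a bit more
concrete, here is a rough Python model of the kind of check involved.
It is only a sketch and not the QEMU code: the Disk class and its field
names are hypothetical, and it assumes the check looks at tracked
requests and throttled request queues, then recurses into the file and
backing layers. Please compare against the actual function in your
source tree before relying on it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Disk:
    # Hypothetical stand-in for one block device node; not a QEMU structure.
    name: str
    tracked_requests: List[str] = field(default_factory=list)
    throttled_reads: List[str] = field(default_factory=list)
    throttled_writes: List[str] = field(default_factory=list)
    file: Optional["Disk"] = None     # underlying protocol layer, if any
    backing: Optional["Disk"] = None  # backing file, if any


def pending_reason(disk: Disk) -> Optional[str]:
    """Return the first condition that reports pending requests, or None."""
    if disk.tracked_requests:
        return f"{disk.name}: tracked requests"
    if disk.throttled_reads:
        return f"{disk.name}: throttled reads"
    if disk.throttled_writes:
        return f"{disk.name}: throttled writes"
    # Assumed recursion: a disk counts as busy if a lower layer is busy.
    for child in (disk.file, disk.backing):
        if child is not None:
            reason = pending_reason(child)
            if reason is not None:
                return reason
    return None


def pending_all(disks: List[Disk]) -> Dict[str, str]:
    """Map each top-level disk that looks busy to the condition that fired."""
    result = {}
    for disk in disks:
        reason = pending_reason(disk)
        if reason is not None:
            result[disk.name] = reason
    return result


# Made-up example data: one disk whose backing layer has a queued write.
base = Disk("base", throttled_writes=["req"])
top = Disk("top", backing=base)
idle = Disk("idle")
print(pending_all([top, idle]))
```

What I am after from gdb is the equivalent of that dictionary for your
hung process: the disk, the layer, and the condition that keeps
returning true.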

Kevin