From: Paolo Bonzini
Subject: Re: [Qemu-devel] [RFC PATCH 00/40] Sneak peek of virtio and dataplane changes for 2.6
Date: Wed, 16 Dec 2015 18:42:55 +0100
Message-ID: <5671A29F.8050607@redhat.com>
In-Reply-To: <56715EEE.6020208@de.ibm.com>
References: <1448388091-117282-1-git-send-email-pbonzini@redhat.com> <5656D2B9.3010802@de.ibm.com> <5656E158.3090505@redhat.com> <5668907A.80604@redhat.com> <56715EEE.6020208@de.ibm.com>
To: Christian Borntraeger, qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: mlin@kernel.org, famz@redhat.com, ming.lei@canonical.com, stefanha@redhat.com, mst@redhat.com

On 16/12/2015 13:54, Christian Borntraeger wrote:
> Just some quick remarks before I leave for vacation:
>
> Performance seems to be better than the initial version. I have some
> hangs from time to time when shutting down (also with your previous
> version).

Yes, I've seen them too.  To reproduce, just start some "dd" and
repeatedly stop/cont; it typically hangs after 3-4 tries.  It's like
this:

    I/O thread                      main thread
    ---------------------------------------------------
                                    aio_context_acquire
                                    bdrv_drain
    bh->scheduled = 0
    call bottom half
      aio_context_acquire

The bottom half is hung in the I/O thread, and will never be processed
by the main thread because bh->scheduled = 0.

My current idea is that *all* callers of aio_poll have to call it
without the AioContext acquired, including bdrv_drain.  It's not a
problem now that Fam has fixed almost all bdrv_drain users to use
bdrv_drained_begin/end instead.

The nastiness here is two-fold:

1) doing

       aio_context_release
       aio_poll
       aio_context_acquire

   in bdrv_drain is not very safe because the mutex is recursive.
   (Have I ever mentioned I don't like recursive mutexes?...)

2) there is still a race in bdrv_drain:

       bdrv_flush_io_queue(bs);
       if (bdrv_requests_pending(bs)) {
           busy = true;
           aio_context_release(aio_context);
           aio_poll(aio_context, busy);
           aio_context_acquire(aio_context);

   If we release before aio_poll, the last request can complete before
   aio_poll is invoked, and aio_poll will then block forever.  In Linux
   terms, bdrv_requests_pending would be prepare_to_wait() and aio_poll
   would be schedule().  But we don't have prepare_to_wait().

I have a hackish solution here, but I don't really like it.

Paolo
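
To make the hang in the diagram concrete, here is a minimal,
self-contained pthreads sketch of the same situation.  This is not
QEMU code: ctx_lock stands in for the AioContext lock, the busy loop
for aio_poll, and the worker thread for the bottom half whose
scheduled flag has already been cleared.

    /* Sketch of the bottom-half deadlock above -- not QEMU code.
     * Build with: cc -pthread deadlock.c */
    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;
    static volatile bool bh_done;

    /* I/O thread: runs the bottom half, which needs the context lock.
     * bh->scheduled was set to 0 before the callback was invoked, so
     * no other thread will ever run this work for us. */
    static void *io_thread(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&ctx_lock);   /* blocks: main thread holds it */
        bh_done = true;
        pthread_mutex_unlock(&ctx_lock);
        return NULL;
    }

    /* Main thread: bdrv_drain with the AioContext acquired. */
    int main(void)
    {
        pthread_t t;
        pthread_mutex_lock(&ctx_lock);   /* aio_context_acquire */
        pthread_create(&t, NULL, io_thread, NULL);
        while (!bh_done) {
            /* aio_poll: waits forever; it cannot run the bottom half
             * itself because bh->scheduled is already 0, and the I/O
             * thread cannot finish it without the lock we hold. */
        }
        pthread_mutex_unlock(&ctx_lock);
        pthread_join(t, NULL);
        return 0;
    }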
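And here is a sketch of the lost-wakeup race in point 2, again in bare
pthreads with invented names.  drain_racy checks the condition and
goes to sleep in two separate steps, just like releasing the
AioContext between bdrv_requests_pending() and aio_poll(); drain_safe
re-checks the condition under the lock the completer signals with,
which is what the prepare_to_wait()/schedule() pairing provides on
Linux.

    /* Lost-wakeup sketch -- not QEMU code; "pending" stands in for
     * bdrv_requests_pending().  Build with: cc -pthread race.c */
    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static bool pending = true;

    /* Completion path: clears the flag and wakes any waiter. */
    static void *completer(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock);
        pending = false;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    /* Racy wait, like bdrv_drain above: the request can complete in
     * the window between the check and the sleep, and the wakeup is
     * lost because nobody is waiting yet. */
    static void drain_racy(void)
    {
        pthread_mutex_lock(&lock);
        bool busy = pending;              /* bdrv_requests_pending() */
        pthread_mutex_unlock(&lock);      /* aio_context_release()   */
        /* <-- completer() can run entirely inside this window */
        if (busy) {
            pthread_mutex_lock(&lock);
            pthread_cond_wait(&cond, &lock);  /* aio_poll(): may block forever */
            pthread_mutex_unlock(&lock);
        }
    }

    /* Safe wait: the condition is re-checked while holding the lock
     * the completer signals under, so no completion can slip through. */
    static void drain_safe(void)
    {
        pthread_mutex_lock(&lock);
        while (pending) {
            pthread_cond_wait(&cond, &lock);
        }
        pthread_mutex_unlock(&lock);
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, completer, NULL);
        drain_safe();        /* drain_racy() can hang here instead */
        pthread_join(t, NULL);
        return 0;
    }

The while loop around pthread_cond_wait() is the condition-variable
analogue of prepare_to_wait()/schedule(): registering as a waiter and
the final check of the condition are atomic with respect to the
completer, which is exactly the ordering the release-before-aio_poll
version of bdrv_drain cannot guarantee.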