From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:59227)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <armbru@redhat.com>) id 1aVK05-0001l3-Pl
	for qemu-devel@nongnu.org; Mon, 15 Feb 2016 09:19:42 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <armbru@redhat.com>) id 1aVK02-0003L2-Iu
	for qemu-devel@nongnu.org; Mon, 15 Feb 2016 09:19:41 -0500
Received: from mx1.redhat.com ([209.132.183.28]:34397)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <armbru@redhat.com>) id 1aVK02-0003Kx-BC
	for qemu-devel@nongnu.org; Mon, 15 Feb 2016 09:19:38 -0500
From: Markus Armbruster <armbru@redhat.com>
References: <20160208151722.GB3022@work-vm>
	<20160209134706.GE6510@stefanha-x1.localdomain>
	<20160214062210.GD9723@ad.usersys.redhat.com>
	<20160215134214.GC10217@stefanha-x1.localdomain>
Date: Mon, 15 Feb 2016 15:19:35 +0100
In-Reply-To: <20160215134214.GC10217@stefanha-x1.localdomain> (Stefan
	Hajnoczi's message of "Mon, 15 Feb 2016 13:42:14 +0000")
Message-ID: <874mda2gzs.fsf@blackfin.pond.sub.org>
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: [Qemu-devel] lock-free monitor?
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: xiecl.fnst@cn.fujitsu.com, Fam Zheng <famz@redhat.com>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>, qemu-devel@nongnu.org

Stefan Hajnoczi <stefanha@redhat.com> writes:

> On Sun, Feb 14, 2016 at 02:22:10PM +0800, Fam Zheng wrote:
>> On Tue, 02/09 13:47, Stefan Hajnoczi wrote:
>> > On Mon, Feb 08, 2016 at 03:17:23PM +0000, Dr. David Alan Gilbert wrote:
>> > > Does this make sense to everyone else, or does anyone have any better
>> > > suggestions?
>> > 
>> > As a concrete example, any monitor command that calls bdrv_drain_all()
>> > can hang forever with the QEMU global mutex held if I/O requests are
>> > stuck (e.g. NFS mount is unreachable).
>> > 
>> > bdrv_aio_cancel() can also hang but is mostly exposed to device
>> > emulation, not the monitor.
>> > 
>> > One solution for these block layer functions is to add a timeout
>> > argument and let them return an error.  This way the monitor and device
>> > emulation do not hang forever.
>> 
>> Yes, there are a few places in block layer invoking aio_poll() in a loop
>> waiting for certain events, and a disconnected network link could make QEMU
>> hang. In these cases a timeout is a huge improvement.  Maybe we can mark the
>> BDS as "hanging" (-EIO is returned for all further requests) and let
>> bdrv_drain_all() return.
>
> No, we need to be very careful about hung requests that are still in
> flight.  They could complete after a long time and modify the disk or
> guest RAM.
>
> The only thing bdrv_drain_all() can return is -ETIME.  Beyond that it
> cannot pretend that requests have been drained since that could lead to
> data corruption/loss.
>
> The caller has to abort whatever it was trying to do since it has no
> guarantee that requests have drained.  Maybe the remaining requests will
> fail and go away, maybe they will complete.  We don't know...
>
>> > 
>> > The benefit of the timeout is that both monitor and device emulation
>> > hangs are tackled.  It also doesn't require monitor changes.
>> > 
>> > I'm not sure who chooses the timeout value and which value makes sense
>> > (policy vs mechanism separation)...
>> 
>> Default to 30 seconds like Linux, and make it tunable through command line
>> options as well as QMP?
>
> Yes.  Either commands that block can take an optional timeout (seconds)
> parameter, or we could have a global disk I/O timeout value.  I prefer
> passing an optional per-command value.
>
> Libvirt and other sophisticated clients could begin using it in a
> backwards-compatible way.  Existing code wouldn't use it and still
> suffer from the hang problem (but at least QEMU is
> backwards-compatible).

I'm not arguing against timeouts, I just want to remind everyone that
blocking while holding the BQL is a bug, and limiting the block time
with timeouts is a work-around, no more, no less.