Date: Sun, 14 Feb 2016 14:22:10 +0800
From: Fam Zheng
To: Stefan Hajnoczi
Cc: armbru@redhat.com, xiecl.fnst@cn.fujitsu.com, "Dr. David Alan Gilbert", qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] lock-free monitor?

On Tue, 02/09 13:47, Stefan Hajnoczi wrote:
> On Mon, Feb 08, 2016 at 03:17:23PM +0000, Dr. David Alan Gilbert wrote:
> > Does this make sense to everyone else, or does anyone have any better
> > suggestions?
>
> As a concrete example, any monitor command that calls bdrv_drain_all()
> can hang forever with the QEMU global mutex held if I/O requests are
> stuck (e.g. NFS mount is unreachable).
>
> bdrv_aio_cancel() can also hang but is mostly exposed to device
> emulation, not the monitor.
>
> One solution for these block layer functions is to add a timeout
> argument and let them return an error.  This way the monitor and device
> emulation do not hang forever.

Yes, there are a few places in the block layer that invoke aio_poll() in a
loop while waiting for certain events, and a disconnected network link could
make QEMU hang.  In these cases a timeout is a huge improvement.

Maybe we can mark the BDS as "hanging" (-EIO is returned for all further
requests) and let bdrv_drain_all() return.

> The benefit of the timeout is that both monitor and device emulation
> hangs are tackled.  It also doesn't require monitor changes.
>
> I'm not sure who chooses the timeout value and which value makes sense
> (policy vs mechanism separation)...

Default to 30 seconds like Linux, and make it tunable through command line
options as well as QMP?

Fam
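
To make the timeout-plus-"hanging" idea above concrete, here is a rough,
self-contained sketch.  It is not QEMU code: FakeDev, fake_poll(),
fake_drain() and fake_submit() are hypothetical names, the clock is
simulated, and fake_poll() merely stands in for one aio_poll() iteration.
Returning -ETIMEDOUT from the drain is also an assumption; the thread only
mentions -EIO for later requests.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a block device; not QEMU's BlockDriverState. */
typedef struct {
    int in_flight;    /* requests that have not completed yet */
    bool hanging;     /* set once a drain has timed out */
    bool link_up;     /* false simulates an unreachable NFS server */
    int64_t now_ms;   /* simulated clock, advanced by each poll */
} FakeDev;

/* Stand-in for one aio_poll() iteration: costs 10 ms of simulated time and
 * completes at most one request, and only while the link is up. */
static void fake_poll(FakeDev *dev)
{
    dev->now_ms += 10;
    if (dev->link_up && dev->in_flight > 0) {
        dev->in_flight--;
    }
}

/* Drain with a deadline.  Returns 0 once the device is idle, or -ETIMEDOUT
 * after marking the device as hanging so that the caller is not stuck. */
static int fake_drain(FakeDev *dev, int64_t timeout_ms)
{
    assert(timeout_ms >= 0);
    int64_t deadline = dev->now_ms + timeout_ms;

    while (dev->in_flight > 0) {
        if (dev->now_ms >= deadline) {
            dev->hanging = true;
            return -ETIMEDOUT;
        }
        fake_poll(dev);
    }
    return 0;
}

/* Once the device is marked hanging, new requests fail fast with -EIO. */
static int fake_submit(FakeDev *dev)
{
    if (dev->hanging) {
        return -EIO;
    }
    dev->in_flight++;
    return 0;
}
```

With a 30000 ms timeout, a device whose link is down leaves fake_drain()
with an error instead of spinning forever, and every later fake_submit()
gets -EIO.  Who picks the timeout value (policy vs mechanism) is left open
here, as it is in the thread.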