From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:47842)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1aT8ia-0006C4-Kj
    for qemu-devel@nongnu.org; Tue, 09 Feb 2016 08:52:37 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from )
    id 1aT8iX-00043D-FY
    for qemu-devel@nongnu.org; Tue, 09 Feb 2016 08:52:36 -0500
Received: from mx1.redhat.com ([209.132.183.28]:49054)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1aT8iX-00042x-9j
    for qemu-devel@nongnu.org; Tue, 09 Feb 2016 08:52:33 -0500
Date: Tue, 9 Feb 2016 13:52:28 +0000
From: "Dr. David Alan Gilbert"
Message-ID: <20160209135228.GC2688@work-vm>
References: <20160208151722.GB3022@work-vm>
    <20160209134706.GE6510@stefanha-x1.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160209134706.GE6510@stefanha-x1.localdomain>
Subject: Re: [Qemu-devel] lock-free monitor?
To: Stefan Hajnoczi
Cc: xiecl.fnst@cn.fujitsu.com, qemu-devel@nongnu.org, armbru@redhat.com

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Mon, Feb 08, 2016 at 03:17:23PM +0000, Dr. David Alan Gilbert wrote:
> > Does this make sense to everyone else, or does anyone have any better
> > suggestions?
>
> As a concrete example, any monitor command that calls bdrv_drain_all()
> can hang forever with the QEMU global mutex held if I/O requests are
> stuck (e.g. NFS mount is unreachable).
>
> bdrv_aio_cancel() can also hang but is mostly exposed to device
> emulation, not the monitor.
>
> One solution for these block layer functions is to add a timeout
> argument and let them return an error.  This way the monitor and device
> emulation do not hang forever.
>
> The benefit of the timeout is that both monitor and device emulation
> hangs are tackled.  It also doesn't require monitor changes.
>
> I'm not sure who chooses the timeout value and which value makes sense
> (policy vs mechanism separation)...

Choosing that value tends to be very difficult. For example, if it's
iSCSI then you have to make all the same kinds of choices as multipath
does, and worry about switch reconfiguration, SANs failing over between
controllers, and so on.

And then what do you do when you time out? Anywhere that already calls
bdrv_drain_all() would have to make a decision.

> Stefan

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
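
To make the shape of the timeout proposal concrete, here is a rough sketch
of a drain loop that gives up after a deadline and returns an error. None
of this is QEMU code: FakeBlockLayer, fake_poll(),
fake_drain_all_timeout() and fake_monitor_command() are hypothetical
names, and the clock is simulated so the example is deterministic. The
last function shows just one possible caller policy (fail the command);
every real caller would need its own answer.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for block layer state; not a QEMU structure. */
typedef struct {
    int in_flight;            /* requests that have not completed yet */
    int completions_per_poll; /* 0 simulates a stuck backend (e.g. dead NFS) */
    int64_t now_ms;           /* simulated monotonic clock */
    int64_t ms_per_poll;      /* simulated cost of one poll; must be > 0 */
} FakeBlockLayer;

/* One event-loop iteration: time advances, some requests may complete. */
static void fake_poll(FakeBlockLayer *bl)
{
    bl->now_ms += bl->ms_per_poll;
    bl->in_flight -= bl->completions_per_poll;
    if (bl->in_flight < 0) {
        bl->in_flight = 0;
    }
}

/*
 * Wait for all in-flight requests, but stop once timeout_ms has elapsed.
 * Returns 0 when drained, -ETIMEDOUT if requests are still pending.
 */
static int fake_drain_all_timeout(FakeBlockLayer *bl, int64_t timeout_ms)
{
    assert(timeout_ms >= 0);
    int64_t deadline = bl->now_ms + timeout_ms;

    while (bl->in_flight > 0) {
        if (bl->now_ms >= deadline) {
            return -ETIMEDOUT;
        }
        fake_poll(bl);
    }
    return 0;
}

/* Assumed caller policy: report the failure instead of hanging. */
static const char *fake_monitor_command(FakeBlockLayer *bl, int64_t timeout_ms)
{
    int ret = fake_drain_all_timeout(bl, timeout_ms);
    return ret == -ETIMEDOUT ? "error: drain timed out" : "ok";
}
```

Note that the sketch only covers the mechanism. Who picks timeout_ms, and
what a caller should do with requests that are still in flight after
-ETIMEDOUT, are exactly the policy questions raised above.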