From mboxrd@z Thu Jan 1 00:00:00 1970
References: <1452676712-24239-1-git-send-email-xiecl.fnst@cn.fujitsu.com>
 <1452676712-24239-8-git-send-email-xiecl.fnst@cn.fujitsu.com>
 <20160127144644.GL26163@stefanha-x1.localdomain>
 <56A96B34.9090906@cn.fujitsu.com>
 <20160128151543.GH9825@stefanha-x1.localdomain>
 <56AAD8E6.8020208@cn.fujitsu.com>
 <20160129154648.GE11427@stefanha-x1.localdomain>
From: Wen Congyang
Message-ID: <56AEB140.3060509@cn.fujitsu.com>
Date: Mon, 1 Feb 2016 09:13:36 +0800
MIME-Version: 1.0
In-Reply-To: <20160129154648.GE11427@stefanha-x1.localdomain>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v14 7/8] Implement new driver for block replication
To: Stefan Hajnoczi, Changlong Xie
Cc: Kevin Wolf, Fam Zheng, zhanghailiang, fnstml-hwcolo@cn.fujitsu.com,
 qemu devel, Max Reitz, Gonglei, Paolo Bonzini

On 01/29/2016 11:46 PM, Stefan Hajnoczi wrote:
> On Fri, Jan 29, 2016 at 11:13:42AM +0800, Changlong Xie wrote:
>> On 01/28/2016 11:15 PM, Stefan Hajnoczi wrote:
>>> On Thu, Jan 28, 2016 at 09:13:24AM +0800, Wen Congyang wrote:
>>>> On 01/27/2016 10:46 PM, Stefan Hajnoczi wrote:
>>>>> On Wed, Jan 13, 2016 at 05:18:31PM +0800, Changlong Xie wrote:
>>> I'm concerned that the bdrv_drain_all() in vm_stop() can take a long
>>> time if the disk is slow/failing.  bdrv_drain_all() blocks until all
>>> in-flight I/O requests have completed.  What does the Primary do if the
>>> Secondary becomes unresponsive?
>>
>> Actually, we know about this problem, but currently there seems to be
>> no better way to resolve it. Do you have any ideas?
>
> Is it possible to hold the checkpoint information and acknowledge the
> checkpoint right away, without waiting for bdrv_drain_all() or any
> Secondary guest activity to complete?

There is no way to know that the Secondary has become unresponsive.

>
> I think this really means falling back to microcheckpointing until the
> Secondary guest can checkpoint.  Instead of a blocking vm_stop() we
> would prevent vcpus from running and when the last pending I/O finishes
> the Secondary could apply the last checkpoint.  This approach does not
> block QEMU (the monitor, etc).
>

If the Secondary host becomes unresponsive, it means that we cannot do
microcheckpointing. We should do failover in this case.

Thanks
Wen Congyang
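
For illustration only, the non-blocking alternative discussed above (hold the
checkpoint, keep the vcpus stopped, apply the checkpoint when the last
in-flight request completes, and fail over if that never happens) might be
sketched as below. Every name here (SecondaryState, checkpoint_received,
io_completed, checkpoint_timeout) is hypothetical and is not part of QEMU or
of this patch series. The timeout hook in particular is an assumption about
how unresponsiveness could be noticed; the thread does not settle that point.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for one Secondary; not a QEMU structure. */
typedef struct SecondaryState {
    int  in_flight;            /* guest I/O requests not yet completed */
    bool vcpus_paused;         /* vcpus are prevented from running */
    bool checkpoint_pending;   /* a checkpoint is held but not yet applied */
    int  applied_checkpoints;  /* number of checkpoints applied so far */
    bool failed_over;          /* the Primary gave up on this Secondary */
} SecondaryState;

/* Apply the held checkpoint and let the guest run again. */
void apply_checkpoint(SecondaryState *s)
{
    s->checkpoint_pending = false;
    s->applied_checkpoints++;
    s->vcpus_paused = false;
}

/*
 * A checkpoint arrives. Instead of draining all I/O synchronously,
 * pause the vcpus, remember the checkpoint and return at once.
 */
void checkpoint_received(SecondaryState *s)
{
    s->vcpus_paused = true;
    s->checkpoint_pending = true;
    if (s->in_flight == 0) {
        apply_checkpoint(s);   /* nothing to wait for */
    }
}

/* Completion callback for one in-flight request. */
void io_completed(SecondaryState *s)
{
    assert(s->in_flight > 0);
    s->in_flight--;
    if (s->in_flight == 0 && s->checkpoint_pending) {
        apply_checkpoint(s);   /* last pending I/O has finished */
    }
}

/*
 * Assumed timer hook: if the checkpoint is still pending when the
 * deadline expires, treat the Secondary as unresponsive and fail over.
 */
void checkpoint_timeout(SecondaryState *s)
{
    if (s->checkpoint_pending) {
        s->failed_over = true;
    }
}
```

In this sketch nothing ever blocks: checkpoint_received() returns immediately
whether or not requests are still in flight, and the decision to fail over is
left to a separate deadline rather than to a stalled drain.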