From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:52125)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <wency@cn.fujitsu.com>) id 1aQ33P-00005U-09
	for qemu-devel@nongnu.org; Sun, 31 Jan 2016 20:13:19 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <wency@cn.fujitsu.com>) id 1aQ33K-0004hQ-0J
	for qemu-devel@nongnu.org; Sun, 31 Jan 2016 20:13:18 -0500
Received: from [59.151.112.132] (port=4264 helo=heian.cn.fujitsu.com)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <wency@cn.fujitsu.com>) id 1aQ33J-0004Zk-IE
	for qemu-devel@nongnu.org; Sun, 31 Jan 2016 20:13:13 -0500
References: <1452676712-24239-1-git-send-email-xiecl.fnst@cn.fujitsu.com>
	<1452676712-24239-8-git-send-email-xiecl.fnst@cn.fujitsu.com>
	<20160127144644.GL26163@stefanha-x1.localdomain>
	<56A96B34.9090906@cn.fujitsu.com>
	<20160128151543.GH9825@stefanha-x1.localdomain>
	<56AAD8E6.8020208@cn.fujitsu.com>
	<20160129154648.GE11427@stefanha-x1.localdomain>
From: Wen Congyang <wency@cn.fujitsu.com>
Message-ID: <56AEB140.3060509@cn.fujitsu.com>
Date: Mon, 1 Feb 2016 09:13:36 +0800
MIME-Version: 1.0
In-Reply-To: <20160129154648.GE11427@stefanha-x1.localdomain>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v14 7/8] Implement new driver for block
	replication
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Stefan Hajnoczi <stefanha@redhat.com>, Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Fam Zheng <famz@redhat.com>, zhanghailiang <zhang.zhanghailiang@huawei.com>, fnstml-hwcolo@cn.fujitsu.com, qemu devel <qemu-devel@nongnu.org>, Max Reitz <mreitz@redhat.com>, Gonglei <arei.gonglei@huawei.com>, Paolo Bonzini <pbonzini@redhat.com>

On 01/29/2016 11:46 PM, Stefan Hajnoczi wrote:
> On Fri, Jan 29, 2016 at 11:13:42AM +0800, Changlong Xie wrote:
>> On 01/28/2016 11:15 PM, Stefan Hajnoczi wrote:
>>> On Thu, Jan 28, 2016 at 09:13:24AM +0800, Wen Congyang wrote:
>>>> On 01/27/2016 10:46 PM, Stefan Hajnoczi wrote:
>>>>> On Wed, Jan 13, 2016 at 05:18:31PM +0800, Changlong Xie wrote:
>>> I'm concerned that the bdrv_drain_all() in vm_stop() can take a long
>>> time if the disk is slow/failing.  bdrv_drain_all() blocks until all
>>> in-flight I/O requests have completed.  What does the Primary do if the
>>> Secondary becomes unresponsive?
>>
>> Actually, we know about this problem, but currently there seems to be no
>> better way to resolve it. Do you have any ideas?
> 
> Is it possible to hold the checkpoint information and acknowledge the
> checkpoint right away, without waiting for bdrv_drain_all() or any
> Secondary guest activity to complete?

There is no way to know that the secondary has become unresponsive.

> 
> I think this really means falling back to microcheckpointing until the
> Secondary guest can checkpoint.  Instead of a blocking vm_stop() we
> would prevent vcpus from running and when the last pending I/O finishes
> the Secondary could apply the last checkpoint.  This approach does not
> block QEMU (the monitor, etc).
> 

If the secondary host becomes unresponsive, it means that we cannot do microcheckpointing.
We should do a failover in this case.

Thanks
Wen Congyang