From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:34136)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <wency@cn.fujitsu.com>) id 1aQmFM-0008TV-Ds
	for qemu-devel@nongnu.org; Tue, 02 Feb 2016 20:28:41 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <wency@cn.fujitsu.com>) id 1aQmFJ-0006LR-8R
	for qemu-devel@nongnu.org; Tue, 02 Feb 2016 20:28:40 -0500
Received: from [59.151.112.132] (port=58472 helo=heian.cn.fujitsu.com)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <wency@cn.fujitsu.com>) id 1aQmFI-0006KF-IT
	for qemu-devel@nongnu.org; Tue, 02 Feb 2016 20:28:37 -0500
References: <1452676712-24239-1-git-send-email-xiecl.fnst@cn.fujitsu.com>
	<1452676712-24239-8-git-send-email-xiecl.fnst@cn.fujitsu.com>
	<20160127144644.GL26163@stefanha-x1.localdomain>
	<56A96B34.9090906@cn.fujitsu.com>
	<20160128151543.GH9825@stefanha-x1.localdomain>
	<56AAD8E6.8020208@cn.fujitsu.com>
	<20160129154648.GE11427@stefanha-x1.localdomain>
	<56AEB140.3060509@cn.fujitsu.com>
	<20160202143413.GA32084@stefanha-x1.localdomain>
From: Wen Congyang <wency@cn.fujitsu.com>
Message-ID: <56B157EB.60804@cn.fujitsu.com>
Date: Wed, 3 Feb 2016 09:29:15 +0800
MIME-Version: 1.0
In-Reply-To: <20160202143413.GA32084@stefanha-x1.localdomain>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v14 7/8] Implement new driver for block
	replication
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Changlong Xie <xiecl.fnst@cn.fujitsu.com>, Fam Zheng <famz@redhat.com>, zhanghailiang <zhang.zhanghailiang@huawei.com>, fnstml-hwcolo@cn.fujitsu.com, qemu devel <qemu-devel@nongnu.org>, Max Reitz <mreitz@redhat.com>, Gonglei <arei.gonglei@huawei.com>, Paolo Bonzini <pbonzini@redhat.com>

On 02/02/2016 10:34 PM, Stefan Hajnoczi wrote:
> On Mon, Feb 01, 2016 at 09:13:36AM +0800, Wen Congyang wrote:
>> On 01/29/2016 11:46 PM, Stefan Hajnoczi wrote:
>>> On Fri, Jan 29, 2016 at 11:13:42AM +0800, Changlong Xie wrote:
>>>> On 01/28/2016 11:15 PM, Stefan Hajnoczi wrote:
>>>>> On Thu, Jan 28, 2016 at 09:13:24AM +0800, Wen Congyang wrote:
>>>>>> On 01/27/2016 10:46 PM, Stefan Hajnoczi wrote:
>>>>>>> On Wed, Jan 13, 2016 at 05:18:31PM +0800, Changlong Xie wrote:
>>>>> I'm concerned that the bdrv_drain_all() in vm_stop() can take a long
>>>>> time if the disk is slow/failing.  bdrv_drain_all() blocks until all
>>>>> in-flight I/O requests have completed.  What does the Primary do if the
>>>>> Secondary becomes unresponsive?
>>>>
>>>> Actually, we know about this problem, but currently there seems to be
>>>> no better way to resolve it. Do you have any ideas?
>>>
>>> Is it possible to hold the checkpoint information and acknowledge the
>>> checkpoint right away, without waiting for bdrv_drain_all() or any
>>> Secondary guest activity to complete?
>>
>> There is no way to know that the Secondary has become unresponsive.
> 
> I meant whether it is necessary for the Secondary to vm_stop() and apply
> the checkpoint before acknowledging the checkpoint to the Primary?

I don't understand this.
Here is the COLO checkpoint flow:

    Primary                                                Secondary
    new checkpoint notice                 --->
    vm_stop()                                              vm_stop()
    vm state(device state, memory, cpu)   --->
                                                           load state
                                          <---             done
    vm_start()                                             vm_start()
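
The same flow as a toy model, in case that makes the ordering clearer. This
is only a sketch: Node, checkpoint() and the log list are hypothetical names
for illustration, not QEMU code, and the network transfer is reduced to a
plain dictionary copy.

```python
# Toy model of the COLO checkpoint flow. All names here are hypothetical
# illustrations; none of this is QEMU code.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    running: bool = True
    state: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def vm_stop(self):
        self.running = False
        self.log.append("vm_stop")

    def vm_start(self):
        self.running = True
        self.log.append("vm_start")


def checkpoint(primary: Node, secondary: Node) -> None:
    # 1. Primary sends the "new checkpoint" notice; both sides stop.
    primary.vm_stop()
    secondary.vm_stop()
    # 2. Primary sends its VM state (device state, memory, cpu).
    snapshot = dict(primary.state)
    # 3. Secondary loads the state and replies "done".
    secondary.state = snapshot
    secondary.log.append("load state")
    # 4. Only after "done" do both sides resume.
    primary.vm_start()
    secondary.vm_start()


primary = Node("primary", state={"cpu": 1, "memory": "A"})
secondary = Node("secondary", state={"cpu": 0, "memory": "B"})
checkpoint(primary, secondary)
print(secondary.log)
```

The point is that the Secondary loads the state between its vm_stop() and
vm_start(), and the Primary does not resume until it has seen "done".
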
> 
>>> I think this really means falling back to microcheckpointing until the
>>> Secondary guest can checkpoint.  Instead of a blocking vm_stop() we
>>> would prevent vcpus from running and when the last pending I/O finishes
>>> the Secondary could apply the last checkpoint.  This approach does not
>>> block QEMU (the monitor, etc).
>>>
>>
>> If the Secondary host becomes unresponsive, it means that we cannot do
>> microcheckpointing. We should do failover in this case.
> 
> This is dangerous because it means that a delay/failure in the Secondary
> would cause the Primary to fail over to the broken Secondary.  All the
> more reason not to perform blocking operations on the Secondary in the
> checkpoint code path.

If the Secondary is broken, the Primary QEMU will take over.
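
To illustrate what "take over" means from the Primary's side, here is a
minimal sketch. It assumes the Primary treats a missing "done" message as a
broken Secondary by using a timeout; that detection rule, the function name
wait_for_done() and the queue-based channel are my assumptions for the
example, not something COLO is stated to do in this thread.

```python
# Hypothetical sketch: the Primary waits for the Secondary's "done" message
# and takes over if it does not arrive in time. Not QEMU code.
import queue


def wait_for_done(acks: queue.Queue, timeout_s: float) -> str:
    """Return "resume" if the Secondary acknowledged, else "failover"."""
    try:
        msg = acks.get(timeout=timeout_s)
    except queue.Empty:
        # No reply in time: assume the Secondary is broken, Primary takes over.
        return "failover"
    return "resume" if msg == "done" else "failover"


healthy = queue.Queue()
healthy.put("done")      # Secondary finished loading the state
broken = queue.Queue()   # Secondary never replies

print(wait_for_done(healthy, 0.1))
print(wait_for_done(broken, 0.1))
```

In both outcomes it is the Primary guest that keeps running; the Secondary
is never promoted because of its own failure.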

Thanks
Wen Congyang

> 
> Stefan
>