From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:41816) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bzdfr-0000FG-E4 for qemu-devel@nongnu.org; Thu, 27 Oct 2016 01:56:24 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bzdfn-0001YY-FY for qemu-devel@nongnu.org; Thu, 27 Oct 2016 01:56:23 -0400
Received: from szxga03-in.huawei.com ([119.145.14.66]:14976) by eggs.gnu.org with esmtps (TLS1.0:RSA_ARCFOUR_SHA1:16) (Exim 4.71) (envelope-from ) id 1bzdfk-0001TE-Jq for qemu-devel@nongnu.org; Thu, 27 Oct 2016 01:56:19 -0400
References: <1476792613-11712-1-git-send-email-zhang.zhanghailiang@huawei.com> <20161026060931.GR1679@amit-lp.rh> <58105092.2050102@huawei.com> <20161026082609.GT1679@amit-lp.rh> <5810D150.5070709@huawei.com> <20161027035256.GA1476@amit-lp.rh>
From: Hailiang Zhang
Message-ID: <581196EC.7080409@huawei.com>
Date: Thu, 27 Oct 2016 13:55:56 +0800
MIME-Version: 1.0
In-Reply-To: <20161027035256.GA1476@amit-lp.rh>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
To: Amit Shah
Cc: quintela@redhat.com, qemu-devel@nongnu.org, dgilbert@redhat.com, wency@cn.fujitsu.com, lizhijian@cn.fujitsu.com, xiecl.fnst@cn.fujitsu.com, Hai Huang, Weidong Han, Dong eddie, Stefan Hajnoczi, Jason Wang

On 2016/10/27 11:52, Amit Shah wrote:
> On (Wed) 26 Oct 2016 [23:52:48], Hailiang Zhang wrote:
>> Hi Amit,
>>
>> On 2016/10/26 16:26, Amit Shah wrote:
>>> On (Wed) 26 Oct 2016 [14:43:30], Hailiang Zhang wrote:
>>>> Hi Amit,
>>>>
>>>> On 2016/10/26 14:09, Amit Shah wrote:
>>>>> Hello,
>>>>>
>>>>> On (Tue) 18 Oct 2016 [20:09:56], zhanghailiang wrote:
>>>>>> This is the 21st version of the COLO frame series.
>>>>>>
>>>>>> Rebased onto the latest master.
>>>>>
>>>>> I've reviewed the patchset and have some minor comments, but overall it
>>>>> looks good. The changes are contained, and common code / existing
>>>>> code paths are not affected much. We can still target merging this
>>>>> for 2.8.
>>>>>
>>>>
>>>> I really appreciate your help ;), I will fix all the issues later
>>>> and send v22. Hopefully we can still catch the deadline for v2.8.
>>>>
>>>>> Do you have any tests on how much the VM slows down / downtime
>>>>> incurred during checkpoints?
>>>>>
>>>>
>>>> Yes, we tested that a long time ago; it all depends on the workload.
>>>> The downtime is determined by the time spent transferring the dirty pages
>>>> and the time spent flushing ram from the ram buffer.
>>>> But we do have methods to reduce the downtime.
>>>>
>>>> One method is to reduce the amount of data (mainly dirty pages) per checkpoint
>>>> by transferring dirty pages asynchronously while the PVM and SVM are running
>>>> (not during the checkpoint itself). Besides, we can reuse migration
>>>> capabilities, such as compression, etc.
>>>> Another method is to reduce the time of flushing ram by using the userfaultfd
>>>> API to convert copying ram into marking a bitmap. We can also flush the ram
>>>> buffer with multiple threads, as advised by Dave ...
>>>
>>> Yes, I understand that, as with any migration numbers, this too depends
>>> on what the guest is doing. However, can you just pick some standard
>>> workload - kernel compile or something like that - and post a few
>>> observations?
>>>
>>
>> Li Zhijian has sent some test results, which are based on the kernel COLO proxy.
>> After switching to the userspace COLO proxy, there may be some degradation,
>> but for the old scenario some optimizations are not implemented yet.
>> For the new userspace COLO proxy scenario, we didn't test it overall,
>> because it is still WIP; we will start that work after this frame is merged.
>
> OK.
>
>>>>> Also, can you tell how you arrived at the default checkpoint
>>>>> interval?
>>>>>
>>>>
>>>> Er, for this value, we referred to Remus on the Xen platform. ;)
>>>> But after we implement COLO with the COLO proxy, this interval value will be
>>>> changed to a bigger one (10s), and we will make it configurable too. Besides,
>>>> we will add another configurable value to control the minimum interval
>>>> between checkpoints.
>>>
>>> OK - any typical value that is a good trade-off between COLO keeping the
>>> network too busy / guest paused vs. the guest making progress? Again this
>>> is something that's workload-dependent, but I guess you have typical
>>> numbers from a network-bound workload?
>>>
>>
>> Yes, you can refer to Zhijian's email for details.
>> I think it is necessary to add some test/performance results to COLO's wiki.
>> We will do that later.
>
> Yes, please.
>
> Also, in your next iteration, please add the colo files to the
> MAINTAINERS entry so you get CC'ed on future patches (and bugs :-)
>

OK, I will send v23 with it.

Thanks.
Hailiang

> Amit
>
> .
>