From: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
To: Amit Shah
Cc: quintela@redhat.com, qemu-devel@nongnu.org, dgilbert@redhat.com, wency@cn.fujitsu.com, lizhijian@cn.fujitsu.com, xiecl.fnst@cn.fujitsu.com, Hai Huang, Weidong Han, Dong eddie, Stefan Hajnoczi, Jason Wang
Subject: Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
Date: Wed, 26 Oct 2016 23:52:48 +0800
Message-ID: <5810D150.5070709@huawei.com>
In-Reply-To: <20161026082609.GT1679@amit-lp.rh>
References: <1476792613-11712-1-git-send-email-zhang.zhanghailiang@huawei.com> <20161026060931.GR1679@amit-lp.rh> <58105092.2050102@huawei.com> <20161026082609.GT1679@amit-lp.rh>

Hi Amit,

On 2016/10/26 16:26, Amit Shah wrote:
> On (Wed) 26 Oct 2016 [14:43:30], Hailiang Zhang wrote:
>> Hi Amit,
>>
>> On 2016/10/26 14:09, Amit Shah wrote:
>>> Hello,
>>>
>>> On (Tue) 18 Oct 2016 [20:09:56], zhanghailiang wrote:
>>>> This is the 21st version of the COLO frame series.
>>>>
>>>> Rebased to the latest master.
>>>
>>> I've reviewed the patchset and have some minor comments, but overall
>>> it looks good. The changes are contained, and common code / existing
>>> code paths are not affected much. We can still target to merge this
>>> for 2.8.
>>>
>>
>> I really appreciate your help ;) I will fix all the issues later and
>> send v22. I hope we can still make the 2.8 deadline.
>>
>>> Do you have any tests on how much the VM slows down / downtime
>>> incurred during checkpoints?
>>>
>>
>> Yes, we tested that a long time ago; it all depends.
>> The downtime is determined by the time spent transferring the dirty
>> pages plus the time spent flushing RAM from the RAM buffer.
>> But we do have methods to reduce the downtime.
>>
>> One method is to reduce the amount of data (mainly dirty pages) per
>> checkpoint by transferring dirty pages asynchronously while the PVM
>> and SVM are running (rather than only while a checkpoint is being
>> taken). Besides, we can reuse the existing migration capabilities,
>> such as compression.
>> Another method is to reduce the time spent flushing RAM by using the
>> userfaultfd API to turn copying RAM into marking a bitmap. We can
>> also flush the RAM buffer with multiple threads, as advised by
>> Dave ...
>
> Yes, I understand that as with any migration numbers, this too depends
> on what the guest is doing. However, can you just pick some standard
> workload - kernel compile or something like that - and post a few
> observations?
>

Li Zhijian has sent some test results, which are based on the kernel
COLO proxy. After switching to the userspace COLO proxy there may be
some degradation, though in that old scenario some optimizations were
not yet implemented either.
For the new userspace COLO proxy scenario, we haven't tested it
overall, because it is still WIP; we will start that work after this
frame is merged.
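To make the userfaultfd idea above a little more concrete, here is a
minimal standalone sketch. It is not the COLO code itself (the name
ram_buffer and the single-page setup are purely illustrative); it only
shows the mechanism: leave "guest RAM" unpopulated and resolve each
missing-page fault from a staging buffer with UFFDIO_COPY, instead of
eagerly copying everything during the checkpoint:

/*
 * Minimal userfaultfd sketch: resolve missing-page faults on demand
 * from a staging buffer.  Error handling omitted for brevity.
 * Build with: gcc -pthread uffd_sketch.c
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

static int uffd;
static long page_size;
static char *ram_buffer;   /* stand-in for the checkpoint ram buffer */

/* Handler thread: wait for one fault and satisfy it from ram_buffer. */
static void *fault_handler(void *arg)
{
    struct pollfd pfd = { .fd = uffd, .events = POLLIN };
    struct uffd_msg msg;

    (void)arg;
    poll(&pfd, 1, -1);
    read(uffd, &msg, sizeof(msg));
    if (msg.event == UFFD_EVENT_PAGEFAULT) {
        struct uffdio_copy copy = {
            .dst = msg.arg.pagefault.address &
                   ~((unsigned long)page_size - 1),
            .src = (unsigned long)ram_buffer,
            .len = page_size,
        };
        ioctl(uffd, UFFDIO_COPY, &copy);  /* populate the missing page */
    }
    return NULL;
}

int main(void)
{
    pthread_t thread;

    page_size = sysconf(_SC_PAGESIZE);

    /* Open the userfaultfd and handshake on the API version. */
    uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    struct uffdio_api api = { .api = UFFD_API };
    ioctl(uffd, UFFDIO_API, &api);

    /* "Guest RAM": one anonymous page, registered for missing faults. */
    char *ram = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct uffdio_register reg = {
        .range = { .start = (unsigned long)ram, .len = page_size },
        .mode  = UFFDIO_REGISTER_MODE_MISSING,
    };
    ioctl(uffd, UFFDIO_REGISTER, &reg);

    /* Checkpoint data sits in the staging buffer, not in guest RAM. */
    ram_buffer = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    strcpy(ram_buffer, "page restored on demand");

    pthread_create(&thread, NULL, fault_handler, NULL);
    printf("%s\n", ram);    /* first touch faults; handler fills it */
    pthread_join(thread, NULL);
    return 0;
}

With something like this, pages could stay in the ram buffer after a
checkpoint and be pulled into guest RAM lazily, moving most of the
flush cost off the downtime path.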
>>> Also, can you tell how you arrived at the default checkpoint
>>> interval?
>>>
>>
>> Er, for this value, we referred to Remus on the Xen platform. ;)
>> But after we implement COLO with the COLO proxy, this interval will
>> be changed to a bigger one (10s), and we will make it configurable
>> too. Besides, we will add another configurable value to control the
>> minimum interval between checkpoints.
>
> OK - any typical value that is a good mix between COLO keeping the
> network too busy / guest paused vs guest making progress? Again this
> is something that's workload-dependent, but I guess you have typical
> numbers from a network-bound workload?
>

Yes, you can refer to Zhijian's email for the details.
I think it is necessary to add some test/performance results to COLO's
wiki. We will do that later.

Thanks,
hailiang
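P.S. To make the "configurable interval" point concrete: assuming the
knob lands as the experimental x-checkpoint-delay migration parameter
(in milliseconds), setting the 10s interval mentioned above over QMP
would look roughly like this; treat the name and value as illustrative
until the final interface is merged:

{ "execute": "migrate-set-parameters",
  "arguments": { "x-checkpoint-delay": 10000 } }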