From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:41816) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bzdfr-0000FG-E4 for qemu-devel@nongnu.org; Thu, 27 Oct 2016 01:56:24 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bzdfn-0001YY-FY for qemu-devel@nongnu.org; Thu, 27 Oct 2016 01:56:23 -0400
Received: from szxga03-in.huawei.com ([119.145.14.66]:14976) by eggs.gnu.org with esmtps (TLS1.0:RSA_ARCFOUR_SHA1:16) (Exim 4.71) (envelope-from ) id 1bzdfk-0001TE-Jq for qemu-devel@nongnu.org; Thu, 27 Oct 2016 01:56:19 -0400
References: <1476792613-11712-1-git-send-email-zhang.zhanghailiang@huawei.com> <20161026060931.GR1679@amit-lp.rh> <58105092.2050102@huawei.com> <20161026082609.GT1679@amit-lp.rh> <5810D150.5070709@huawei.com> <20161027035256.GA1476@amit-lp.rh>
From: Hailiang Zhang
Message-ID: <581196EC.7080409@huawei.com>
Date: Thu, 27 Oct 2016 13:55:56 +0800
MIME-Version: 1.0
In-Reply-To: <20161027035256.GA1476@amit-lp.rh>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
To: Amit Shah
Cc: quintela@redhat.com, qemu-devel@nongnu.org, dgilbert@redhat.com, wency@cn.fujitsu.com, lizhijian@cn.fujitsu.com, xiecl.fnst@cn.fujitsu.com, Hai Huang, Weidong Han, Dong eddie, Stefan Hajnoczi, Jason Wang

On 2016/10/27 11:52, Amit Shah wrote:
> On (Wed) 26 Oct 2016 [23:52:48], Hailiang Zhang wrote:
>> Hi Amit,
>>
>> On 2016/10/26 16:26, Amit Shah wrote:
>>> On (Wed) 26 Oct 2016 [14:43:30], Hailiang Zhang wrote:
>>>> Hi Amit,
>>>>
>>>> On 2016/10/26 14:09, Amit Shah wrote:
>>>>> Hello,
>>>>>
>>>>> On (Tue) 18 Oct 2016 [20:09:56], zhanghailiang wrote:
>>>>>> This is the 21st version of the COLO frame series.
>>>>>>
>>>>>> Rebased onto the latest master.
>>>>>
>>>>> I've reviewed the patchset and have some minor comments, but overall it
>>>>> looks good. The changes are contained, and common code / existing
>>>>> code paths are not affected much. We can still target merging this
>>>>> for 2.8.
>>>>>
>>>>
>>>> I really appreciate your help ;), I will fix all the issues later
>>>> and send v22. Hopefully we can still catch the deadline for v2.8.
>>>>
>>>>> Do you have any tests on how much the VM slows down / downtime
>>>>> incurred during checkpoints?
>>>>>
>>>>
>>>> Yes, we tested that a long time ago; it all depends on the workload.
>>>> The downtime is determined by the time spent transferring the dirty pages
>>>> and the time spent flushing ram from the ram buffer.
>>>> But we do have methods to reduce the downtime.
>>>>
>>>> One method is to reduce the amount of data (mainly dirty pages) per checkpoint
>>>> by transferring dirty pages asynchronously while the PVM and SVM are running
>>>> (not during the checkpoint itself). Besides, we can reuse migration
>>>> capabilities, such as compression, etc.
>>>> Another method is to reduce the time of flushing ram by using the userfaultfd
>>>> API to convert copying ram into marking a bitmap. We can also flush the ram
>>>> buffer with multiple threads, as advised by Dave ...
>>>
>>> Yes, I understand that, as with any migration numbers, this too depends
>>> on what the guest is doing. However, can you just pick some standard
>>> workload - kernel compile or something like that - and post a few
>>> observations?
>>>
>>
>> Li Zhijian has sent some test results, which are based on the kernel COLO proxy.
>> After switching to the userspace COLO proxy, there may be some degradation,
>> but for the old scenario some optimizations are not implemented yet.
>> For the new userspace COLO proxy scenario, we didn't test it overall,
>> because it is still WIP; we will start that work after this frame is merged.
>
> OK.
>
>>>>> Also, can you tell how you arrived at the default checkpoint
>>>>> interval?
>>>>>
>>>>
>>>> Er, for this value, we referred to Remus on the Xen platform. ;)
>>>> But after we implement COLO with the COLO proxy, this interval value will be
>>>> changed to a bigger one (10s), and we will make it configurable too. Besides,
>>>> we will add another configurable value to control the minimum interval
>>>> between checkpoints.
>>>
>>> OK - any typical value that is a good trade-off between COLO keeping the
>>> network too busy / guest paused vs. the guest making progress? Again this
>>> is something that's workload-dependent, but I guess you have typical
>>> numbers from a network-bound workload?
>>>
>>
>> Yes, you can refer to Zhijian's email for details.
>> I think it is necessary to add some test/performance results to COLO's wiki.
>> We will do that later.
>
> Yes, please.
>
> Also, in your next iteration, please add the colo files to the
> MAINTAINERS entry so you get CC'ed on future patches (and bugs :-)
>

OK, I will send v23 with it.

Thanks.
Hailiang

> Amit
>
> .
>