Date: Thu, 27 Oct 2016 09:22:56 +0530
From: Amit Shah
To: Hailiang Zhang
Cc: quintela@redhat.com, qemu-devel@nongnu.org, dgilbert@redhat.com, wency@cn.fujitsu.com, lizhijian@cn.fujitsu.com, xiecl.fnst@cn.fujitsu.com, Hai Huang, Weidong Han, Dong eddie, Stefan Hajnoczi, Jason Wang
Subject: Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
Message-ID: <20161027035256.GA1476@amit-lp.rh>
In-Reply-To: <5810D150.5070709@huawei.com>
References: <1476792613-11712-1-git-send-email-zhang.zhanghailiang@huawei.com> <20161026060931.GR1679@amit-lp.rh> <58105092.2050102@huawei.com> <20161026082609.GT1679@amit-lp.rh> <5810D150.5070709@huawei.com>

On (Wed) 26 Oct 2016 [23:52:48], Hailiang Zhang wrote:
> Hi Amit,
>
> On 2016/10/26 16:26, Amit Shah wrote:
> >On (Wed) 26 Oct 2016 [14:43:30], Hailiang Zhang wrote:
> >>Hi Amit,
> >>
> >>On 2016/10/26 14:09, Amit Shah wrote:
> >>>Hello,
> >>>
> >>>On (Tue) 18 Oct 2016 [20:09:56], zhanghailiang wrote:
> >>>>This is the 21st version of the COLO frame series.
> >>>>
> >>>>Rebased onto the latest master.
> >>>
> >>>I've reviewed the patchset, have some minor comments, but overall it
> >>>looks good. The changes are contained, and common code / existing
> >>>code paths are not affected much. We can still target merging this
> >>>for 2.8.
> >>>
> >>
> >>I really appreciate your help ;). I will fix all the issues
> >>and send v22. I hope we can still make the 2.8 deadline.
> >>
> >>>Do you have any tests on how much the VM slows down / downtime
> >>>incurred during checkpoints?
> >>>
> >>
> >>Yes, we tested that a long time ago; it all depends.
> >>The downtime is determined by the time taken to transfer the dirty pages
> >>and the time taken to flush RAM from the RAM buffer.
> >>But we do have methods to reduce the downtime.
> >>
> >>One method is to reduce the amount of data (mainly dirty pages) sent during
> >>a checkpoint by transferring dirty pages asynchronously while the PVM and
> >>SVM are running (not at checkpoint time). Besides, we can reuse existing
> >>migration capabilities, such as compression.
> >>Another method is to reduce the time spent flushing RAM by using the
> >>userfaultfd API to replace copying RAM with marking a bitmap. We can also
> >>flush the RAM buffer with multiple threads, as Dave advised ...
> >
> >Yes, I understand that as with any migration numbers, this too depends
> >on what the guest is doing. However, can you just pick some standard
> >workload - kernel compile or something like that - and post a few
> >observations?
> >
>
> Li Zhijian has sent some test results based on the kernel COLO proxy.
> After switching to the userspace COLO proxy, there may be some degradation.
> But in the old scenario, some optimizations were not implemented yet.
> We haven't tested the new userspace COLO proxy scenario as a whole
> because it is still WIP; we will start that work after this frame is merged.

OK.

> >>>Also, can you tell us how you arrived at the default checkpoint
> >>>interval?
> >>>
> >>
> >>Er, for this value, we followed Remus on the Xen platform. ;)
> >>But after we implement COLO with the COLO proxy, this interval will be
> >>increased to a larger value (10s), and we will make it configurable too.
> >>Besides, we will add another configurable value to control the minimum
> >>interval between checkpoints.
> >
> >OK - any typical value that is a good mix between COLO keeping the
> >network too busy / guest paused vs guest making progress? Again this
> >is something that's workload-dependent, but I guess you have typical
> >numbers from a network-bound workload?
> >
>
> Yes, you can refer to Zhijian's email for details.
> I think it is necessary to add some test/performance results to COLO's wiki.
> We will do that later.

Yes, please.

Also, in your next iteration, please add the COLO files to the MAINTAINERS
entry so you get CC'ed on future patches (and bugs :-)

		Amit
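For illustration only, the interval policy discussed above (a periodic checkpoint interval, to be raised to 10s, plus a configurable minimum interval between checkpoints) might be sketched as follows. This is a hypothetical sketch, not COLO's actual code: the names CheckpointPolicy and should_checkpoint, the output_mismatch trigger (standing in for the COLO proxy detecting divergent PVM/SVM output), and the 100 ms minimum used in the example are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointPolicy:
    """Hypothetical checkpoint-rate knobs (names are illustrative, not QEMU's)."""
    max_interval_ms: int  # periodic checkpoint interval, e.g. 10_000 for 10s
    min_interval_ms: int  # never start checkpoints closer together than this


def should_checkpoint(policy: CheckpointPolicy, elapsed_ms: int,
                      output_mismatch: bool) -> bool:
    """Return True if a checkpoint should start now.

    elapsed_ms: time since the previous checkpoint completed.
    output_mismatch: assumed signal that the PVM and SVM produced
    different network output (hypothetical proxy hook).
    """
    if elapsed_ms < policy.min_interval_ms:
        return False  # rate limit: let the guest make progress
    if output_mismatch:
        return True  # outputs diverged: resynchronise the SVM
    return elapsed_ms >= policy.max_interval_ms  # periodic fallback


if __name__ == "__main__":
    # Hypothetical values: only the 10s interval comes from the thread.
    policy = CheckpointPolicy(max_interval_ms=10_000, min_interval_ms=100)
    for elapsed, mismatch in [(50, True), (500, True), (500, False), (10_000, False)]:
        print(elapsed, mismatch, should_checkpoint(policy, elapsed, mismatch))
```

Checking the minimum interval first means a burst of mismatches cannot pause the guest back to back, which is the "guest paused vs guest making progress" trade-off raised above; the periodic fallback presumably bounds how long the SVM can go without a resync when outputs keep matching.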