From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 26 Oct 2016 13:56:09 +0530
From: Amit Shah
Message-ID: <20161026082609.GT1679@amit-lp.rh>
References: <1476792613-11712-1-git-send-email-zhang.zhanghailiang@huawei.com>
 <20161026060931.GR1679@amit-lp.rh>
 <58105092.2050102@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <58105092.2050102@huawei.com>
Subject: Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
To: Hailiang Zhang
Cc: quintela@redhat.com, qemu-devel@nongnu.org, dgilbert@redhat.com,
 wency@cn.fujitsu.com, lizhijian@cn.fujitsu.com, xiecl.fnst@cn.fujitsu.com,
 Hai Huang, Weidong Han, Dong eddie, Stefan Hajnoczi, Jason Wang

On (Wed) 26 Oct 2016 [14:43:30], Hailiang Zhang wrote:
> Hi Amit,
>
> On 2016/10/26 14:09, Amit Shah wrote:
> >Hello,
> >
> >On (Tue) 18 Oct 2016 [20:09:56], zhanghailiang wrote:
> >>This is the 21st version of the COLO frame series.
> >>
> >>Rebase to the latest master.
> >
> >I've reviewed the patchset and have some minor comments, but overall it
> >looks good.  The changes are contained, and common code / existing
> >code paths are not affected much.  We can still target merging this
> >for 2.8.
> >
>
> I really appreciate your help ;). I will fix all the issues later
> and send v22. I hope we can still catch the 2.8 deadline.
>
> >Do you have any tests on how much the VM slows down / downtime
> >incurred during checkpoints?
> >
>
> Yes, we tested that a long time ago; it all depends.
> The downtime is determined by the time taken to transfer the dirty pages
> and the time taken to flush RAM from the RAM buffer.
> But we do have methods to reduce the downtime.
>
> One method is to reduce the amount of data (mainly dirty pages) sent during
> a checkpoint by transferring dirty pages asynchronously while the PVM and SVM
> are running (not during the checkpoint itself). Besides, we can re-use the
> capabilities of migration, such as compression, etc.
> Another method is to reduce the time spent flushing RAM by using the
> userfaultfd API to convert copying RAM into marking a bitmap. We can also
> flush the RAM buffer with multiple threads, as advised by Dave ...

Yes, I understand that as with any migration numbers, this too depends
on what the guest is doing.  However, can you just pick some standard
workload - kernel compile or something like that - and post a few
observations?

> >Also, can you tell how you arrived at the default checkpoint
> >interval?
> >
>
> Er, for this value, we referred to Remus on the Xen platform. ;)
> But after we implement COLO with the colo proxy, this interval will be changed
> to a bigger one (10s). And we will make it configurable too. Besides, we will
> add another configurable value to control the minimum checkpoint interval.

OK - is there a typical value that strikes a good balance between COLO
keeping the network too busy / the guest paused and the guest making
progress?  Again, this is something that's workload-dependent, but I
guess you have typical numbers from a network-bound workload?

Thanks,

		Amit
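
For reference, the downtime description quoted above (time to transfer the
dirty pages plus time to flush the RAM buffer) can be sketched as a small
model. This is only an illustration, not QEMU code: the function name, the
linear cost model, the assumption that flushing scales with thread count,
and every number below are hypothetical and are not measurements from this
thread.

```python
MIB = 1 << 20  # bytes per MiB


def checkpoint_downtime(dirty_bytes, link_bytes_per_s,
                        buffered_bytes, flush_bytes_per_s,
                        flush_threads=1):
    """Estimate the guest pause for one checkpoint, in seconds.

    Hypothetical model: pause = time to send the dirty pages still
    pending at checkpoint time + time to flush the RAM buffer on the
    secondary side.
    """
    if link_bytes_per_s <= 0 or flush_bytes_per_s <= 0:
        raise ValueError("rates must be positive")
    if flush_threads < 1:
        raise ValueError("flush_threads must be >= 1")
    transfer_s = dirty_bytes / link_bytes_per_s
    # Optimistic assumption: flush throughput scales linearly with threads.
    flush_s = buffered_bytes / (flush_bytes_per_s * flush_threads)
    return transfer_s + flush_s


# Made-up baseline: everything is sent and flushed inside the checkpoint.
baseline = checkpoint_downtime(
    dirty_bytes=200 * MIB, link_bytes_per_s=100 * MIB,
    buffered_bytes=200 * MIB, flush_bytes_per_s=400 * MIB)

# Same workload, assuming 75% of the dirty pages were already sent
# asynchronously while the guest ran, and the flush uses 4 threads.
tuned = checkpoint_downtime(
    dirty_bytes=50 * MIB, link_bytes_per_s=100 * MIB,
    buffered_bytes=200 * MIB, flush_bytes_per_s=400 * MIB,
    flush_threads=4)

print(f"baseline pause: {baseline:.3f} s, tuned pause: {tuned:.3f} s")
```

A model like this only shows which term dominates for a given workload;
real numbers would have to come from a measured run such as the kernel
compile suggested above.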