From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wei Liu
Subject: Re: [PATCH v7 12/18] tools/libx{l, c}: add back channel to libxc
Date: Wed, 3 Feb 2016 19:40:47 +0000
Message-ID: <20160203194047.GB23178@citrix.com>
References: <1454045254-3711-1-git-send-email-wency@cn.fujitsu.com>
 <1454045254-3711-13-git-send-email-wency@cn.fujitsu.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <1454045254-3711-13-git-send-email-wency@cn.fujitsu.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Wen Congyang
Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper,
 Jiang Yunhong, Ian Jackson, xen devel, Dong Eddie, Gui Jianfeng,
 Shriram Rajagopalan, Yang Hongyang
List-Id: xen-devel@lists.xenproject.org

On Fri, Jan 29, 2016 at 01:27:28PM +0800, Wen Congyang wrote:
> In COLO mode, both VMs are running, and are considered in sync if the
> visible network traffic is identical.  After some time, they fall out of
> sync.
>
> At this point, the two VMs have definitely diverged.  Lets call the
> primary dirty bitmap set A, while the secondary dirty bitmap set B.
>
> Sets A and B are different.
>
> Under normal migration, the page data for set A will be sent from the
> primary to the secondary.
>
> However, the set difference B - A (the one in B but not in A, lets
> call this C) is out-of-date on the secondary (with respect to the
> primary) and will not be sent by the primary (to secondary), as it
> was not memory dirtied by the primary.  The secondary needs C page data
> to reconstruct an exact copy of the primary at the checkpoint.
>
> The secondary cannot calculate C as it doesn't know A.
> Instead, the secondary must send B to the primary, at which point the
> primary calculates the union of A and B (lets call this D) which is all
> the pages dirtied by both the primary and the secondary, and sends all
> page data covered by D.
>
> In the general case, D is a superset of both A and B.  Without the
> backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid
> copy of the primary.
>
> We transfer the dirty bitmap on libxc side, so we need to introduce back
> channel to libxc.
>
> Note: it is different from the paper. We change the original design to
> the current one, according to our following concerns:
> 1. The original design needs extra memory on Secondary host. When there's
>    multiple backups on one host, the memory cost is high.
> 2. The memory cache code will be another 1k+, it will make the review
>    more time consuming.
>
> Note: the back channel will be used in the patch

Should this say "will not be used"? I don't see any read from or write
to the newly introduced fd in this patch.

>  libxc/restore: send dirty pfn list to primary when checkpoint under COLO
> to send dirty pfn list from secondary to primary. The patch is posted in
> another series.
>
> Signed-off-by: Yang Hongyang
> Signed-off-by: Andrew Cooper
> CC: Ian Campbell
> CC: Ian Jackson
> CC: Wei Liu
> ---
[...]
>
>  /*----- helper execution -----*/
> +static int dup_fd_helper(libxl__gc *gc, int fd, const char *what)
> +{
> +    int dup_fd = fd;
> +
> +    if (fd <= 2) {
> +        dup_fd = dup(fd);
> +        if (dup_fd < 0) {
> +            LOGE(ERROR,"dup %s", what);
> +            exit(-1);
> +        }
> +    }
> +    libxl_fd_set_cloexec(CTX, dup_fd, 0);
> +
> +    return dup_fd;
> +}
>

It would be better if the introduction of this helper were split out
into a separate patch.
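
For reference, the set relationships in the commit message (A and B as
the primary and secondary dirty bitmaps, C = B - A, D = A union B) can
be sketched in plain C as below. This is only an illustration: the type
bitmap_t, the bm_* helpers and the NR_PFNS size are hypothetical names
made up for this sketch, not libxc or libxl API, and the real series may
represent the dirty pfn list differently.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical sizes and names for illustration; not libxc API. */
#define NR_PFNS        128
#define BITS_PER_WORD  (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS       ((NR_PFNS + BITS_PER_WORD - 1) / BITS_PER_WORD)

typedef struct {
    unsigned long w[NR_WORDS];
} bitmap_t;

/* Mark one pfn as dirty. */
static void bm_set(bitmap_t *bm, size_t pfn)
{
    assert(pfn < NR_PFNS);
    bm->w[pfn / BITS_PER_WORD] |= 1UL << (pfn % BITS_PER_WORD);
}

/* Return 1 if the pfn is marked dirty, 0 otherwise. */
static int bm_test(const bitmap_t *bm, size_t pfn)
{
    assert(pfn < NR_PFNS);
    return (int)((bm->w[pfn / BITS_PER_WORD] >> (pfn % BITS_PER_WORD)) & 1UL);
}

/* D = A | B: all pages dirtied on either side since the last checkpoint. */
static void bm_union(bitmap_t *d, const bitmap_t *a, const bitmap_t *b)
{
    for (size_t i = 0; i < NR_WORDS; i++)
        d->w[i] = a->w[i] | b->w[i];
}

/* C = B & ~A: pages dirtied only on the secondary. */
static void bm_diff(bitmap_t *c, const bitmap_t *b, const bitmap_t *a)
{
    for (size_t i = 0; i < NR_WORDS; i++)
        c->w[i] = b->w[i] & ~a->w[i];
}
```

In this sketch only the primary can call bm_union(), because only it
holds both A and the B received from the secondary; that is the reason
the commit message gives for needing the back channel.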