From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: [PATCH v7 12/18] tools/libx{l, c}: add back channel to libxc
Date: Fri, 29 Jan 2016 11:38:39 -0500
Message-ID: <20160129163838.GM22787@char.us.oracle.com>
References: <1454045254-3711-1-git-send-email-wency@cn.fujitsu.com> <1454045254-3711-13-git-send-email-wency@cn.fujitsu.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <1454045254-3711-13-git-send-email-wency@cn.fujitsu.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Wen Congyang
Cc: Lars Kurth, Changlong Xie, Wei Liu, Ian Campbell, Andrew Cooper, Jiang Yunhong, Dong Eddie, xen devel, Gui Jianfeng, Shriram Rajagopalan, Ian Jackson, Yang Hongyang
List-Id: xen-devel@lists.xenproject.org

On Fri, Jan 29, 2016 at 01:27:28PM +0800, Wen Congyang wrote:
> In COLO mode, both VMs are running, and are considered in sync if the
> visible network traffic is identical. After some time, they fall out of
> sync.
>
> At this point, the two VMs have definitely diverged. Lets call the
> primary dirty bitmap set A, while the secondary dirty bitmap set B.
>
> Sets A and B are different.
>
> Under normal migration, the page data for set A will be sent from the
> primary to the secondary.
>
> However, the set difference B - A (the one in B but not in A, lets
> call this C) is out-of-date on the secondary (with respect to the
> primary) and will not be sent by the primary (to secondary), as it
> was not memory dirtied by the primary. The secondary needs C page data
> to reconstruct an exact copy of the primary at the checkpoint.
>
> The secondary cannot calculate C as it doesn't know A. Instead, the
> secondary must send B to the primary, at which point the primary
> calculates the union of A and B (lets call this D) which is all the
> pages dirtied by both the primary and the secondary, and sends all page
> data covered by D.
>
> In the general case, D is a superset of both A and B. Without the
> backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid
> copy of the primary.
>
> We transfer the dirty bitmap on libxc side, so we need to introduce back
> channel to libxc.
>
> Note: it is different from the paper. We change the original design to
> the current one, according to our following concerns:
> 1. The original design needs extra memory on Secondary host. When there's
> multiple backups on one host, the memory cost is high.
> 2. The memory cache code will be another 1k+, it will make the review
> more time consuming.
>
> Note: the back channel will be used in the patch
> libxc/restore: send dirty pfn list to primary when checkpoint under COLO
> to send dirty pfn list from secondary to primary. The patch is posted in
> another series.
>
> Signed-off-by: Yang Hongyang
> Signed-off-by: Andrew Cooper
> CC: Ian Campbell
> CC: Ian Jackson
> CC: Wei Liu

It is a bit confusing to have 'back_fd' and then 'send_fd'.

Could you change the 'send_fd' (in this patch) to be called 'send_back_fd'
so that the connection between:

  tools/libxl: Add back channel to allow migration target send data back

and this patch is clear?

Or perhaps also add it in the commit description that you are using
the 'send_fd' provided by
'tools/libxl: Add back channel to allow migration target send data back'

Otherwise:
Reviewed-by: Konrad Rzeszutek Wilk
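
For illustration only (this is not code from the patch or from libxc): the
set arithmetic in the quoted commit message can be sketched on plain
word-array bitmaps. The helper names mark_dirty, bitmap_union and
bitmap_difference, and the 64-bit word layout, are assumptions made for
this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 64

/* Hypothetical helper: mark page frame `pfn` dirty in a bitmap that
 * covers `nr_pages` pages. */
static void mark_dirty(uint64_t *bm, size_t nr_pages, size_t pfn)
{
    assert(pfn < nr_pages);
    bm[pfn / BITS_PER_WORD] |= UINT64_C(1) << (pfn % BITS_PER_WORD);
}

/* D = A | B: every page dirtied by the primary or by the secondary.
 * This is the set the primary sends once it has received B. */
static void bitmap_union(uint64_t *d, const uint64_t *a,
                         const uint64_t *b, size_t nr_words)
{
    for (size_t i = 0; i < nr_words; i++)
        d[i] = a[i] | b[i];
}

/* C = B & ~A: pages dirtied only by the secondary. Plain migration,
 * which sends A alone, would leave these stale on the secondary. */
static void bitmap_difference(uint64_t *c, const uint64_t *a,
                              const uint64_t *b, size_t nr_words)
{
    for (size_t i = 0; i < nr_words; i++)
        c[i] = b[i] & ~a[i];
}
```

With A = {1, 2} and B = {2, 5}, this gives C = {5} and D = {1, 2, 5}.
Page 5 is the one the primary would never send without the back channel,
which is why B has to travel from the secondary to the primary.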