From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yang Hongyang
Subject: Re: [PATCH v3 COLOPre 16/26] tools/libx{l, c}: add back channel to libxc
Date: Wed, 1 Jul 2015 21:54:30 +0800
Message-ID: <5593F116.2030002@cn.fujitsu.com>
References: <1435213552-10556-1-git-send-email-yanghy@cn.fujitsu.com>
 <1435213552-10556-17-git-send-email-yanghy@cn.fujitsu.com>
 <1435659052.21469.68.camel@citrix.com>
 <559352B6.2010400@cn.fujitsu.com>
 <1435747338.21469.252.camel@citrix.com>
 <5593C881.80600@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <5593C881.80600@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper, Ian Campbell
Cc: wei.liu2@citrix.com, wency@cn.fujitsu.com, guijianfeng@cn.fujitsu.com,
 yunhong.jiang@intel.com, eddie.dong@intel.com, xen-devel@lists.xen.org,
 rshriram@cs.ubc.ca, Ian Jackson
List-Id: xen-devel@lists.xenproject.org

On 07/01/2015 07:01 PM, Andrew Cooper wrote:
> On 01/07/15 11:42, Ian Campbell wrote:
>> On Wed, 2015-07-01 at 10:38 +0800, Yang Hongyang wrote:
>>> On 06/30/2015 06:10 PM, Ian Campbell wrote:
>>>> On Thu, 2015-06-25 at 14:25 +0800, Yang Hongyang wrote:
>>>>> We need to send secondary's dirty page pfns back to primary.
>>>> In v2 Ian asked (<21888.2988.774072.32946@mariner.uk.xensource.com>):
>>>>
>>>>     In the pdf
>>>>       http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
>>>>     linked from the wiki page
>>>>       http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
>>>>     it says that the secondary keeps a copy of the original contents of
>>>>     its dirty pages. So I don't understand why you need to send the dirty
>>>>     bitmap to the primary.
>>>>
>>>> Which I don't see an answer for in my archive. Have I missed (or
>>>> misplaced) the answer?
>>> Sorry, seems that I misplaced the answer to:
>>> [PATCH v2 COLOPre 09/13] tools/libxl: Update libxl_save_msgs_gen.pl to support
>>> return data from xl to xc
>>>
>>> > Thanks for this. I would have some comments on the details, but first
>>> > I want to properly understand your use case. So while I'm the author
>>> > and maintainer of this save helper, I won't review this in detail just
>>> > yet. I'm following the thread about what this is for...
>>>
>>> We need to send secondary's dirty page pfn back to primary. Primary will
>>> then send pages that are both dirtied on primary/secondary to secondary.
>>> In this way the secondary's memory will be consistent with primary.
>>>
>>> As we discussed in [PATCH v2 COLOPre 04/13] tools/libxc: export xc_bitops.h
>>> If we move this operation to libxc layer, this patch could be dropped.
>> This doesn't seem to be a response to Ian's question which I quoted
>> above.
>>
>> The crux of the question is that the design contained in those links
>> does not appear to require a back channel, because it does not require a
>> dirty bitmap to go from secondary to primary. Asserting a need to do so
>> does not answer the question.
>
> It very definitely does require a dirty bitmap moving from the secondary
> to the primary.
>
> Let's see whether I can try explaining it in a different way.
>
> In COLO mode, both VMs are running, and are considered in sync if the
> visible network traffic is identical. After some time, they fall out of
> sync.
>
> At this point, the two VMs have definitely diverged. Let's call the
> primary dirty bitmap set A, while the secondary dirty bitmap set B.
>
> Sets A and B are different.
>
> Under normal migration, the page data for set A will be sent from the
> primary to the secondary.
>
> However, the set difference B - A (let's call this C) is out-of-date on
> the secondary (with respect to the primary) and will not be sent by the
> primary, as it was not memory dirtied by the primary. The secondary
> needs the page data for C to reconstruct an exact copy of the primary at
> the checkpoint.
>
> The secondary cannot calculate C as it doesn't know A. Instead, the
> secondary must send B to the primary, at which point the primary
> calculates the union of A and B (let's call this D) which is all the
> pages dirtied by either the primary or the secondary, and sends all page
> data covered by D.
>
> In the general case, D is a superset of both A and B. Without the
> backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid
> copy of the primary.

Thank you Andy! The explanation is clear enough. Do you mind if I copy
your comments into a code comment or the commit message, with your
Signed-off-by?

>
> ~Andrew
>
> P.S. I have suggested an investigation of the CoW support in Xen as a
> potential optimisation, as this could be used to prevent the secondary
> losing C, but this is very definitely future work and not appropriate at
> this point in COLO.
>

-- 
Thanks,
Yang.
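
The set arithmetic in Andrew's explanation can be sketched as below. This is only an illustration, assuming each dirty bitmap is modelled as a Python set of pfns; the function names and the sample pfn values are hypothetical and are not taken from libxc or libxl.

```python
# Illustrative model only: a dirty bitmap is represented as a set of pfns.
# A = pages dirtied on the primary, B = pages dirtied on the secondary.

def stale_on_secondary(primary_dirty, secondary_dirty):
    """C = B - A: pages the secondary dirtied that the primary did not.

    The primary would never send these on its own, so the secondary would
    keep its own (diverged) contents for them.
    """
    return set(secondary_dirty) - set(primary_dirty)


def pages_to_send(primary_dirty, secondary_dirty):
    """D = A | B: pages the primary sends once it has received B."""
    return set(primary_dirty) | set(secondary_dirty)


# Hypothetical sample values, chosen only to show the overlap case.
A = {1, 2, 3}
B = {3, 4, 5}

C = stale_on_secondary(A, B)
D = pages_to_send(A, B)

print("C (missed without the back channel):", sorted(C))
print("D (sent with the back channel):", sorted(D))
```

With these sample sets, pfns 4 and 5 are the ones that would stay out of date on the secondary if only A were sent, which is why B has to travel back to the primary.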