From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: [PATCH v3 COLOPre 16/26] tools/libx{l, c}: add back channel to libxc
Date: Wed, 1 Jul 2015 15:03:58 +0100
Message-ID: <5593F34E.4070806@citrix.com>
References: <1435213552-10556-1-git-send-email-yanghy@cn.fujitsu.com>
 <1435213552-10556-17-git-send-email-yanghy@cn.fujitsu.com>
 <1435659052.21469.68.camel@citrix.com>
 <559352B6.2010400@cn.fujitsu.com>
 <1435747338.21469.252.camel@citrix.com>
 <5593C881.80600@citrix.com>
 <5593F116.2030002@cn.fujitsu.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <5593F116.2030002@cn.fujitsu.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Yang Hongyang, Ian Campbell
Cc: wei.liu2@citrix.com, wency@cn.fujitsu.com, guijianfeng@cn.fujitsu.com,
 yunhong.jiang@intel.com, eddie.dong@intel.com, xen-devel@lists.xen.org,
 rshriram@cs.ubc.ca, Ian Jackson
List-Id: xen-devel@lists.xenproject.org

On 01/07/15 14:54, Yang Hongyang wrote:
>
>
> On 07/01/2015 07:01 PM, Andrew Cooper wrote:
>> On 01/07/15 11:42, Ian Campbell wrote:
>>> On Wed, 2015-07-01 at 10:38 +0800, Yang Hongyang wrote:
>>>> On 06/30/2015 06:10 PM, Ian Campbell wrote:
>>>>> On Thu, 2015-06-25 at 14:25 +0800, Yang Hongyang wrote:
>>>>>> We need to send secondary's dirty page pfns back to primary.
>>>>> In v2 Ian asked (<21888.2988.774072.32946@mariner.uk.xensource.com>):
>>>>>
>>>>>     In the pdf
>>>>>
>>>>>     http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
>>>>>     linked from the wiki page
>>>>>
>>>>>     http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
>>>>>     it says that the secondary keeps a copy of the original
>>>>>     contents of its dirty pages. So I don't understand why you
>>>>>     need to send the dirty bitmap to the primary.
>>>>>
>>>>> Which I don't see an answer for in my archive. Have I missed (or
>>>>> misplaced) the answer?
>>>> Sorry, seems that I misplaced the answer to:
>>>> [PATCH v2 COLOPre 09/13] tools/libxl: Update libxl_save_msgs_gen.pl
>>>> to support return data from xl to xc
>>>>
>>>> > Thanks for this. I would have some comments on the details, but first
>>>> > I want to properly understand your use case. So while I'm the author
>>>> > and maintainer of this save helper, I won't review this in detail just
>>>> > yet. I'm following the thread about what this is for...
>>>>
>>>> We need to send secondary's dirty page pfn back to primary. Primary
>>>> will then send pages that are both dirtied on primary/secondary to
>>>> secondary. In this way the secondary's memory will be consistent
>>>> with primary.
>>>>
>>>> As we discussed in [PATCH v2 COLOPre 04/13] tools/libxc: export
>>>> xc_bitops.h
>>>> If we move this operation to libxc layer, this patch could be
>>>> dropped.
>>> This doesn't seem to be a response to Ian's question which I quoted
>>> above.
>>>
>>> The crux of the question is that the design contained in those links
>>> does not appear to require a back channel, because it does not
>>> require a dirty bitmap to go from secondary to primary. Asserting a
>>> need to do so does not answer the question.
>>
>> It very definitely does require a dirty bitmap moving from the secondary
>> to the primary.
>>
>> Let's see whether I can try explaining it in a different way.
>>
>> In COLO mode, both VMs are running, and are considered in sync if the
>> visible network traffic is identical. After some time, they fall out of
>> sync.
>>
>> At this point, the two VMs have definitely diverged. Let's call the
>> primary dirty bitmap set A, while the secondary dirty bitmap set B.
>>
>> Sets A and B are different.
>>
>> Under normal migration, the page data for set A will be sent from the
>> primary to the secondary.
>>
>> However, the set difference B - A (let's call this C) is out-of-date on
>> the secondary (with respect to the primary) and will not be sent by the
>> primary, as it was not memory dirtied by the primary. The secondary
>> needs the page data for C to reconstruct an exact copy of the primary at
>> the checkpoint.
>>
>> The secondary cannot calculate C as it doesn't know A. Instead, the
>> secondary must send B to the primary, at which point the primary
>> calculates the union of A and B (let's call this D) which is all the
>> pages dirtied by either the primary or the secondary, and sends all
>> page data covered by D.
>>
>> In the general case, D is a superset of both A and B. Without the
>> backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid
>> copy of the primary.
>
> Thank you Andy! The explanation is clear enough, do you mind if I
> copy your comments into the code comment or commit message and with
> your sob?

Feel free to borrow any/all of the description which you would feel
would be useful, although you probably don't want to take it all
verbatim for a commit message.

~Andrew
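
The set arithmetic in the explanation above can be sketched as follows.
This is only an illustration under stated assumptions: the pfn values are
made up, the function name pages_to_send is hypothetical, and Python sets
stand in for the real dirty bitmaps. It is not the libxc implementation.

```python
def pages_to_send(primary_dirty, secondary_dirty):
    """Return D = A | B: every pfn the primary must resend at a checkpoint.

    Hypothetical helper for illustration; the real code works on bitmaps.
    """
    return set(primary_dirty) | set(secondary_dirty)


# Made-up pfn sets, for illustration only.
A = {1, 4, 7}   # pages dirtied on the primary
B = {4, 9}      # pages dirtied on the secondary (sent over the back channel)

# C = B - A is stale on the secondary, but the primary cannot know about
# it from A alone, which is why B has to be sent back.
C = B - A
D = pages_to_send(A, B)

print("C =", sorted(C))   # C = [9]
print("D =", sorted(D))   # D = [1, 4, 7, 9]
```

Sending only A would leave pfn 9 out of date on the secondary; sending D
covers A and C together, so D is a superset of both A and B.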