From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Cooper Subject: Re: [PATCH Remus v2 00/10] Remus support for Migration-v2 Date: Mon, 11 May 2015 12:01:33 +0100 Message-ID: <55508C0D.6060904@citrix.com> References: <1431077610-3366-1-git-send-email-yanghy@cn.fujitsu.com> <554CFC90.1030301@citrix.com> <55504C20.8020009@cn.fujitsu.com> <55506F94.7070407@citrix.com> <55508906.1010003@cn.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <55508906.1010003@cn.fujitsu.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Hongyang Yang , xen-devel@lists.xen.org Cc: wei.liu2@citrix.com, ian.campbell@citrix.com, wency@cn.fujitsu.com, ian.jackson@eu.citrix.com, yunhong.jiang@intel.com, eddie.dong@intel.com, rshriram@cs.ubc.ca List-Id: xen-devel@lists.xenproject.org On 11/05/15 11:48, Hongyang Yang wrote: > > On 05/11/2015 05:00 PM, Andrew Cooper wrote: >> On 11/05/15 07:28, Hongyang Yang wrote: >>> On 05/09/2015 02:12 AM, Andrew Cooper wrote: >>>> On 08/05/15 10:33, Yang Hongyang wrote: >>>>> This patchset implement the Remus support for Migration v2 but >>>>> without >>>>> memory compressing. > [...] >> >>> >>> >>> >>>> end_of_checkpoint() >>>> Checkpoint record >>> >>> ctx->save.callbacks->postcopy() >>> this callback should not be omitted, it do some necessary work before >>> resume >>> primary (such as call Remus devices preresume callbacks to ensure the >>> disk >>> data is consistent) and then resume the primary guest. I think this >>> callback should be renamed to ctx->save.callbacks->resume(). >> >> That looks to be a useful cleanup (and answers one of my questions of >> what exactly postcopy was) >> >>> >>>> ctx->save.callbacks->checkpoint() >>>> libxl qemu record >>> >>> Maybe we should add another callback to send qemu record instead of >>> using checkpoint callback. We can call it >>> ctx->save.callbacks->save_qemu() >> >> This is another layering violation. libxc should not prescribe what >> libxl might or might not do. One example we are experimenting with in >> XenServer at the moment is support for multiple emulators attached to a >> single domain, which would necessitate two LIBXL_EMULATOR records to be >> sent per checkpoint. libxl might also want to send an updated json blob >> or such. > > Ok, so we'd better not introduce save_qemu callback. > >> >>> Then in checkpoint callback, we only call remus devices commit >>> callbacks( >>> which will release the network buffer etc...) then decide whether we >>> need to >>> do another checkpoint or quit checkpointed stream. >>> With Remus, checkpoint callback only wait for 200ms(can be specified >>> by -i) >>> then return. >>> With COLO, checkpoint callback will ask COLO proxy if we need to do a >>> checkpoint, will return when COLO proxy module indicate a checkpoint >>> is needed. >> >> That sounds like COLO wants a should_checkpoint() callback which >> separates the decision to make a checkpoint from the logic of >> implementing a checkpoint. > > We use checkpoint callback to do should_checkpoint() thing currently. > libxc will check the return value of checkpoint callback. But that causes a chicken & egg problem. I am planning to use a CHECKPOINT record to synchronise the transfer of ownership of the FD between libxc and libxl. Therefore, a CHECKPOINT record must be in the stream ahead of the checkpoint() callback, as libxl will then write/read some records in itself. As a result, the checkpoint() callback itself can't be used to gate whether a CHECKPOINT record is written by libxc. > >> >>> >>>> ... >>>> libxl end-of-checkpoint record >>>> ctx->save.callbacks->checkpoint() returns >>>> start_of_checkpoint() >>> >>> ctx->save.callbacks->suspend() >>> >>>> >>>> end_of_checkpoint() >>>> Checkpoint record >>>> etc... >>>> >>>> This will eventually allow both libxc and libxl to send checkpoint >>>> data >>>> (and by the looks of it, remove the need for postcopy()). With this >>>> libxc/remus work it is fine to use XG_LIBXL_HVM_COMPAT to cover the >>>> current qemu situation, but I would prefer not to be also retrofitting >>>> libxc checkpoint records when doing the libxl/migv2 work. >>>> >>>> Does this look plausible in for Remus (and eventually COLO) support? >>> >>> With comments above, I would suggest the save flow as below: >>> >>> libxc writes: libxl writes: >>> >>> live migration: >>> Image Header >>> Domain Header >>> start_of_stream() >>> start_of_checkpoint() >>> >>> ctx->save.callbacks->suspend() >>> >>> end_of_checkpoint() >>> if ( checkpointd ) >>> End of Checkpoint record >>> /*If resotre side receives this record, input fd should be handed to >>> libxl*/ >>> else >>> goto end >>> >>> loop of checkpointed stream: >>> ctx->save.callbacks->resume() >>> ctx->save.callbacks->save_qemu() >>> libxl qemu record >>> ... >>> libxl end-of-checkpoint record >>> /*If resotre side receives this record, input fd should be handed to >>> libxc*/ >>> ctx->save.callbacks->save_qemu() returns >>> ctx->save.callbacks->checkpoint() >>> start_of_checkpoint() >>> ctx->save.callbacks->suspend() >>> >>> end_of_checkpoint() >>> End of Checkpoint record >>> goto 'loop of checkpointed stream' >>> >>> end: >>> END record >>> /*If resotre side receives this record, input fd should be handed to >>> libxl*/ >>> >>> >>> In order to keep it simple, we can keep the current >>> ctx->save.callbacks->checkpoint() as it is, which do the save_qemu >>> thing, call >>> Remus devices commit callbacks and then decide whether we need a >>> checkpoint. We >>> can also combine the ctx->save.callbacks->resume() with >>> ctx->save.callbacks->checkpoint(), with only one checkpoint() >>> callback, we do >>> the following things: >>> - Call Remus devices preresume callbacks >>> - Resume the primary >>> - Save qemu records >>> - Call Remus devices commit callbacks >>> - Decide whether we need a checkpoint >>> >>> Overall, there are 3 options for the save flow: >>> 1. keep the current callbacks, rename postcopy() to resume() >>> 2. split the checkpoint() callback to save_qemu() and checkpoint() >>> 3. combine the current postcopy() and checkpoint() >>> Which one do you think is the best? >> >> I have a 4th alternative in mind, but would like your feedback from my >> comments in this email first. > > So what's the 4th alternative? I have some corrections to my patch series based on David's feedback, and your comments. After that, it should hopefully be far easier to describe. ~Andrew