From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hongyang Yang
Subject: Re: [PATCH 3/3] libxc/migrationv2: Split {start, end}_of_stream() to make checkpoint variants
Date: Mon, 11 May 2015 17:23:17 +0800
Message-ID: <55507505.7060509@cn.fujitsu.com>
References: <1431089675-31163-1-git-send-email-andrew.cooper3@citrix.com>
 <1431089675-31163-4-git-send-email-andrew.cooper3@citrix.com>
 <1431091837.2660.449.camel@citrix.com>
 <5550147B.1010304@cn.fujitsu.com>
 <55507025.1080309@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <55507025.1080309@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper, Ian Campbell
Cc: Wei Liu, Ian Jackson, Xen-devel
List-Id: xen-devel@lists.xenproject.org

On 05/11/2015 05:02 PM, Andrew Cooper wrote:
> On 11/05/15 03:31, Hongyang Yang wrote:
>> On 05/08/2015 09:30 PM, Ian Campbell wrote:
>>> On Fri, 2015-05-08 at 13:54 +0100, Andrew Cooper wrote:
>>>> This is in preparation for supporting checkpointed streams in
>>>> migration v2.
>>>> - For PV guests, the VCPU context is moved to end_of_checkpoint().
>>>> - For HVM guests, the HVM context and params are moved to
>>>> end_of_checkpoint().
>>>
>>> [...]
>>>> +    /**
>>>> +     * Send records which need to be at the end of the checkpoint. This is
>>>> +     * called once, or once per checkpoint in a checkpointed stream, and is
>>>> +     * after the memory data.
>>>> +     */
>>>> +    int (*end_of_checkpoint)(struct xc_sr_context *ctx);
>>>> +
>>>> +    /**
>>>> +     * Send records which need to be at the end of the stream. This is called
>>>> +     * once, before the END record is written.
>>>>       */
>>>>      int (*end_of_stream)(struct xc_sr_context *ctx);
>>> [...]
>>>> +static int x86_hvm_end_of_stream(struct xc_sr_context *ctx)
>>>> +{
>>>> +    int rc;
>>>> +
>>>> +    rc = write_tsc_info(ctx);
>>>>      if ( rc )
>>>>          return rc;
>>>>
>>>> -    /* Write HVM_PARAMS record contains applicable HVM params. */
>>>> -    rc = write_hvm_params(ctx);
>>>> +#ifdef XG_LIBXL_HVM_COMPAT
>>>> +    rc = write_toolstack(ctx);
>>>
>>> I'm not sure about this end_of_stream thing. In a checkpointing for
>>> fault tolerance scenario (Remus or COLO), failover happens when the
>>> sender has died for some reason, and the sender therefore won't get
>>> the chance to send any end of stream stuff.
>>>
>>> IOW I think everything in end_of_stream actually needs to be in
>>> end_of_checkpoint unless it is just for informational purposes in a
>>> regular migration or something (which write_toolstack surely isn't)
>>
>> Yes, all records should be sent at every checkpoint, except those
>> that only need to be sent once.
>>
>> checkpoint:
>> As the patches show, a Remus migration explicitly includes two
>> stages: the first stage is live migration, the second is the
>> checkpointed stream. After the live migration, both primary and
>> secondary are in the same state. The primary then continues to run
>> until the next checkpoint. At the checkpoint, we sync the secondary
>> state with the primary, so that both sides are in the same state
>> again. So any record that could change while the guest is running
>> should be sent at every checkpoint.
>>
>> failover:
>> The handling of a checkpointed stream on the restore side also
>> includes two stages: the first is buffering records, the second is
>> processing them. This is because if the master dies while sending
>> records, the secondary state would be inconsistent. So we have to
>> make sure all records of a checkpoint are received before we process
>> them. If the master dies, the secondary can recover from the last
>> checkpoint state.
>> Currently Remus failover relies on the migration channel.
>> If the channel breaks, we presume the master is dead, so we fail
>> over. The "goto err_buf" is the failover path: with goto err_buf, we
>> discard the current checkpoint's records because they are incomplete,
>> then resume the guest with the last checkpoint state (the last
>> processed records).
>
> Thank you for the clarification.
>
> It occurs to me that, despite things like 'last_iter', it is actually
> the first iteration which is actually special in Remus.

Yes, the 'last_iter' step actually suspends the guest and sends the
dirty memory pages to the secondary.

>
> Is there a case where the primary decides to explicitly hand over to the
> secondary?

Currently there isn't. The secondary only starts on failover.

>
> ~Andrew
> .
>

-- 
Thanks,
Yang.
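
P.S. A minimal sketch of the buffer-then-process handling described above, in C. It is only an illustration, not the libxc code: rec_kind, restore_state, process_buffered() and handle_record() are hypothetical names, the fixed-size buffer is an assumption, and the guest state is reduced to a running sum so that the commit and discard behaviour is easy to see.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record kinds for illustration; not the real libxc record IDs. */
enum rec_kind { REC_DATA, REC_CHECKPOINT, CHANNEL_BROKEN };

#define MAX_BUFFERED 16   /* assumed limit, for the sketch only */

struct restore_state {
    int applied;                  /* stand-in for guest state: sum of applied payloads */
    int buffered[MAX_BUFFERED];   /* payloads received since the last checkpoint */
    size_t nr_buffered;
};

/* Stage 2: apply everything buffered, only once the checkpoint is complete. */
void process_buffered(struct restore_state *s)
{
    assert(s->nr_buffered <= MAX_BUFFERED);
    for (size_t i = 0; i < s->nr_buffered; i++)
        s->applied += s->buffered[i];
    s->nr_buffered = 0;
}

/*
 * Returns 0 to keep reading, 1 when the caller should fail over to the
 * secondary, and -1 if the (illustrative) buffer is full.
 */
int handle_record(struct restore_state *s, enum rec_kind kind, int payload)
{
    switch (kind) {
    case REC_DATA:
        /* Stage 1: buffer only; the guest state is not touched yet. */
        if (s->nr_buffered == MAX_BUFFERED)
            return -1;
        s->buffered[s->nr_buffered++] = payload;
        return 0;
    case REC_CHECKPOINT:
        /* The whole checkpoint has arrived, so it is safe to apply it. */
        process_buffered(s);
        return 0;
    case CHANNEL_BROKEN:
        /* Sender presumed dead: drop the partial checkpoint, then fail over. */
        s->nr_buffered = 0;
        return 1;
    }
    return -1;
}
```

Feeding this two data records (1 and 2), a checkpoint, one more data record (5) and then a channel break leaves applied at 3: the partial second checkpoint is dropped and the secondary resumes from the last complete one.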