From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Cooper Subject: Re: [PATCH Remus v3 1/3] libxc/save: implement Remus checkpointed save Date: Wed, 13 May 2015 16:14:04 +0100 Message-ID: <55536A3C.80808@citrix.com> References: <1431506101-29612-1-git-send-email-yanghy@cn.fujitsu.com> <1431506101-29612-2-git-send-email-yanghy@cn.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1431506101-29612-2-git-send-email-yanghy@cn.fujitsu.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Yang Hongyang , xen-devel@lists.xen.org Cc: wei.liu2@citrix.com, ian.campbell@citrix.com, wency@cn.fujitsu.com, ian.jackson@eu.citrix.com, yunhong.jiang@intel.com, eddie.dong@intel.com, guijianfeng@cn.fujitsu.com, rshriram@cs.ubc.ca List-Id: xen-devel@lists.xenproject.org On 13/05/15 09:34, Yang Hongyang wrote: > With Remus, the save flow should be: > live migration->{ periodically save(checkpointed save) } > > Signed-off-by: Yang Hongyang > CC: Ian Campbell > CC: Ian Jackson > CC: Wei Liu > CC: Andrew Cooper > --- > tools/libxc/xc_sr_save.c | 73 +++++++++++++++++++++++++++++++++++------------- > 1 file changed, 54 insertions(+), 19 deletions(-) > > diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c > index 1d0a46d..3890c4d 100644 > --- a/tools/libxc/xc_sr_save.c > +++ b/tools/libxc/xc_sr_save.c > @@ -57,6 +57,16 @@ static int write_end_record(struct xc_sr_context *ctx) > } > > /* > + * Writes an CHECKPOINT record into the stream. > + */ > +static int write_checkpoint_record(struct xc_sr_context *ctx) > +{ > + struct xc_sr_record checkpoint = { REC_TYPE_CHECKPOINT, 0, NULL }; > + > + return write_record(ctx, &checkpoint); > +} > + > +/* > * Writes a batch of memory as a PAGE_DATA record into the stream. The batch > * is constructed in ctx->save.batch_pfns. > * > @@ -467,6 +477,9 @@ static int send_domain_memory_live(struct xc_sr_context *ctx) > DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap, > &ctx->save.dirty_bitmap_hbuf); > > + if ( !ctx->save.live ) > + goto last_iter; > + I really want to avoid this code getting into the same spiders web as it used to be with goto, but I cannot suggest a better alternative. However, please leave a comment explaining that subsequent checkpointed loop skip the live part and pause straight away. > rc = enable_logdirty(ctx); > if ( rc ) > goto out; > @@ -505,6 +518,7 @@ static int send_domain_memory_live(struct xc_sr_context *ctx) > goto out; > } > > + last_iter: > rc = suspend_domain(ctx); > if ( rc ) > goto out; > @@ -667,29 +681,50 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type) > if ( rc ) > goto err; > > - rc = ctx->save.ops.start_of_checkpoint(ctx); > - if ( rc ) > - goto err; > + do { > + rc = ctx->save.ops.start_of_checkpoint(ctx); > + if ( rc ) > + goto err; > > - if ( ctx->save.live ) > - rc = send_domain_memory_live(ctx); > - else > - rc = send_domain_memory_nonlive(ctx); > + if ( ctx->save.live || ctx->save.checkpointed ) > + rc = send_domain_memory_live(ctx); > + else > + rc = send_domain_memory_nonlive(ctx); > > - if ( rc ) > - goto err; > + if ( rc ) > + goto err; > > - if ( !ctx->dominfo.shutdown || > - (ctx->dominfo.shutdown_reason != SHUTDOWN_suspend) ) > - { > - ERROR("Domain has not been suspended"); > - rc = -1; > - goto err; > - } > + if ( !ctx->dominfo.shutdown || > + (ctx->dominfo.shutdown_reason != SHUTDOWN_suspend) ) > + { > + ERROR("Domain has not been suspended"); > + rc = -1; > + goto err; > + } > > - rc = ctx->save.ops.end_of_checkpoint(ctx); > - if ( rc ) > - goto err; > + rc = ctx->save.ops.end_of_checkpoint(ctx); > + if ( rc ) > + goto err; > + > + if ( ctx->save.checkpointed ) { > + if ( ctx->save.live ) { Coding style. libxc is admittedly inconsistent, but follows hypervisor style, so { on the next line. > + /* End of live migration, we are sending checkpointed stream */ > + ctx->save.live = false; > + } > + > + rc = write_checkpoint_record(ctx); > + if ( rc ) > + goto err; > + > + ctx->save.callbacks->postcopy(ctx->save.callbacks->data); > + > + rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data); > + if ( rc > 0 ) > + IPRINTF("Checkpointed save"); This should probably be progress information , so xc_report_progress_single() Otherwise, Reviewed-by: Andrew Cooper (I think this code is a bit of an improvement over xc_domain_save.c) > + else > + ctx->save.checkpointed = false; > + } > + } while ( ctx->save.checkpointed ); > > xc_report_progress_single(xch, "End of stream"); >