From mboxrd@z Thu Jan 1 00:00:00 1970 From: Yang Hongyang Subject: Re: [PATCH Remus v3 1/3] libxc/save: implement Remus checkpointed save Date: Thu, 14 May 2015 09:08:47 +0800 Message-ID: <5553F59F.7020304@cn.fujitsu.com> References: <1431506101-29612-1-git-send-email-yanghy@cn.fujitsu.com> <1431506101-29612-2-git-send-email-yanghy@cn.fujitsu.com> <55536A3C.80808@citrix.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <55536A3C.80808@citrix.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Andrew Cooper , xen-devel@lists.xen.org Cc: wei.liu2@citrix.com, ian.campbell@citrix.com, wency@cn.fujitsu.com, ian.jackson@eu.citrix.com, yunhong.jiang@intel.com, eddie.dong@intel.com, guijianfeng@cn.fujitsu.com, rshriram@cs.ubc.ca List-Id: xen-devel@lists.xenproject.org On 05/13/2015 11:14 PM, Andrew Cooper wrote: > On 13/05/15 09:34, Yang Hongyang wrote: >> With Remus, the save flow should be: >> live migration->{ periodically save(checkpointed save) } >> >> Signed-off-by: Yang Hongyang >> CC: Ian Campbell >> CC: Ian Jackson >> CC: Wei Liu >> CC: Andrew Cooper >> --- >> tools/libxc/xc_sr_save.c | 73 +++++++++++++++++++++++++++++++++++------------- >> 1 file changed, 54 insertions(+), 19 deletions(-) >> >> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c >> index 1d0a46d..3890c4d 100644 >> --- a/tools/libxc/xc_sr_save.c >> +++ b/tools/libxc/xc_sr_save.c >> @@ -57,6 +57,16 @@ static int write_end_record(struct xc_sr_context *ctx) >> } >> >> /* >> + * Writes an CHECKPOINT record into the stream. >> + */ >> +static int write_checkpoint_record(struct xc_sr_context *ctx) >> +{ >> + struct xc_sr_record checkpoint = { REC_TYPE_CHECKPOINT, 0, NULL }; >> + >> + return write_record(ctx, &checkpoint); >> +} >> + >> +/* >> * Writes a batch of memory as a PAGE_DATA record into the stream. The batch >> * is constructed in ctx->save.batch_pfns. >> * >> @@ -467,6 +477,9 @@ static int send_domain_memory_live(struct xc_sr_context *ctx) >> DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap, >> &ctx->save.dirty_bitmap_hbuf); >> >> + if ( !ctx->save.live ) >> + goto last_iter; >> + > > I really want to avoid this code getting into the same spiders web as it > used to be with goto, but I cannot suggest a better alternative. > > However, please leave a comment explaining that subsequent checkpointed > loop skip the live part and pause straight away. OK > >> rc = enable_logdirty(ctx); >> if ( rc ) >> goto out; >> @@ -505,6 +518,7 @@ static int send_domain_memory_live(struct xc_sr_context *ctx) >> goto out; >> } >> >> + last_iter: >> rc = suspend_domain(ctx); >> if ( rc ) >> goto out; >> @@ -667,29 +681,50 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type) >> if ( rc ) >> goto err; >> >> - rc = ctx->save.ops.start_of_checkpoint(ctx); >> - if ( rc ) >> - goto err; >> + do { >> + rc = ctx->save.ops.start_of_checkpoint(ctx); >> + if ( rc ) >> + goto err; >> >> - if ( ctx->save.live ) >> - rc = send_domain_memory_live(ctx); >> - else >> - rc = send_domain_memory_nonlive(ctx); >> + if ( ctx->save.live || ctx->save.checkpointed ) >> + rc = send_domain_memory_live(ctx); >> + else >> + rc = send_domain_memory_nonlive(ctx); >> >> - if ( rc ) >> - goto err; >> + if ( rc ) >> + goto err; >> >> - if ( !ctx->dominfo.shutdown || >> - (ctx->dominfo.shutdown_reason != SHUTDOWN_suspend) ) >> - { >> - ERROR("Domain has not been suspended"); >> - rc = -1; >> - goto err; >> - } >> + if ( !ctx->dominfo.shutdown || >> + (ctx->dominfo.shutdown_reason != SHUTDOWN_suspend) ) >> + { >> + ERROR("Domain has not been suspended"); >> + rc = -1; >> + goto err; >> + } >> >> - rc = ctx->save.ops.end_of_checkpoint(ctx); >> - if ( rc ) >> - goto err; >> + rc = ctx->save.ops.end_of_checkpoint(ctx); >> + if ( rc ) >> + goto err; >> + >> + if ( ctx->save.checkpointed ) { >> + if ( ctx->save.live ) { > > Coding style. libxc is admittedly inconsistent, but follows hypervisor > style, so { on the next line. OK > >> + /* End of live migration, we are sending checkpointed stream */ >> + ctx->save.live = false; >> + } >> + >> + rc = write_checkpoint_record(ctx); >> + if ( rc ) >> + goto err; >> + >> + ctx->save.callbacks->postcopy(ctx->save.callbacks->data); >> + >> + rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data); >> + if ( rc > 0 ) >> + IPRINTF("Checkpointed save"); > > This should probably be progress information , so > xc_report_progress_single() Will fix. > > Otherwise, Reviewed-by: Andrew Cooper > > (I think this code is a bit of an improvement over xc_domain_save.c) > >> + else >> + ctx->save.checkpointed = false; >> + } >> + } while ( ctx->save.checkpointed ); >> >> xc_report_progress_single(xch, "End of stream"); >> > > . > -- Thanks, Yang.