From mboxrd@z Thu Jan 1 00:00:00 1970 From: Yang Hongyang Subject: Re: [PATCH Remus v3 3/3] libxc/restore: implement Remus checkpointed restore Date: Thu, 14 May 2015 13:42:43 +0800 Message-ID: <555435D3.8080106@cn.fujitsu.com> References: <1431506101-29612-1-git-send-email-yanghy@cn.fujitsu.com> <1431506101-29612-4-git-send-email-yanghy@cn.fujitsu.com> <555378D3.5000800@citrix.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <555378D3.5000800@citrix.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Andrew Cooper , xen-devel@lists.xen.org Cc: wei.liu2@citrix.com, ian.campbell@citrix.com, wency@cn.fujitsu.com, guijianfeng@cn.fujitsu.com, yunhong.jiang@intel.com, eddie.dong@intel.com, rshriram@cs.ubc.ca, ian.jackson@eu.citrix.com List-Id: xen-devel@lists.xenproject.org On 05/14/2015 12:16 AM, Andrew Cooper wrote: > On 13/05/15 09:35, Yang Hongyang wrote: >> With Remus, the restore flow should be: >> the first full migration stream -> { periodically restore stream } >> >> Signed-off-by: Yang Hongyang >> Signed-off-by: Andrew Cooper >> CC: Ian Campbell >> CC: Ian Jackson >> CC: Wei Liu >> --- >> tools/libxc/xc_sr_common.h | 13 +++++ >> tools/libxc/xc_sr_restore.c | 124 +++++++++++++++++++++++++++++++++++++++----- >> 2 files changed, 125 insertions(+), 12 deletions(-) >> >> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h >> index f8121e7..3740b89 100644 >> --- a/tools/libxc/xc_sr_common.h >> +++ b/tools/libxc/xc_sr_common.h >> @@ -208,6 +208,19 @@ struct xc_sr_context >> /* Plain VM, or checkpoints over time. */ >> bool checkpointed; >> >> + /* Currently buffering records between a checkpoint */ >> + bool buffer_all_records; >> + >> +/* >> + * With Remus, we buffer the records sent by primary at checkpoint, > > "sent by the primary" > >> + * in case the primary will fail, we can recover from the last >> + * checkpoint state. >> + * This should be enough because primary only send dirty pages at >> + * checkpoint. >> + */ >> +#define MAX_BUF_RECORDS 1024 > > As a general point, it occurs to me that this might be better as a > linked list, rather than having a hard limit. A burst of activity on > the primary could potentially hit this limit There are only few records at every checkpoint in my test, mostly under 10, probably because I don't do much operations in the Guest. This limit can be adjusted later by further testing. Even with linked list, we still need a limit IMO, otherwise it will eat too much memory. > >> + struct xc_sr_record *buffered_records[MAX_BUF_RECORDS]; > > Your data handling will be much more simple if this were struct > xc_sr_record *buffered_records; and you do a one-time allocation of > MAX_BUF_RECORDS * sizeof(struct xc_sr_record), although it will require > an index integer as well. Good point, it will avoid the alloc/free of the struct xc_sr_record, I will add a 'buffered_rec_num' to the context also. > > It would probably be best to factor out setup() and clean() just like > you did for the save side. OK. > >> + >> /* >> * Xenstore and Console parameters. >> * INPUT: evtchn & domid >> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c >> index 0e512ec..85534a8 100644 >> --- a/tools/libxc/xc_sr_restore.c [...] >> @@ -541,7 +618,27 @@ static int restore(struct xc_sr_context *ctx) >> { >> rc = read_record(ctx, &rec); >> if ( rc ) >> - goto err; >> + { >> + if ( ctx->restore.buffer_all_records ) >> + goto err_buf; > > Please can we choose a label sufficiently different to "err". > > "resume_from_checkpoint" perhaps? I think "remus_failover" is better? If you don't have objections, I will use it as the label. > > ~Andrew > >> + else >> + goto err; >> + } >> + >> +#ifdef XG_LIBXL_HVM_COMPAT >> + if ( ctx->dominfo.hvm && >> + (rec.type == REC_TYPE_END || rec.type == REC_TYPE_CHECKPOINT) ) >> + { >> + rc = read_qemu(ctx); >> + if ( rc ) >> + { >> + if ( ctx->restore.buffer_all_records ) >> + goto err_buf; >> + else >> + goto err; >> + } >> + } >> +#endif >> >> rc = process_record(ctx, &rec); >> if ( rc ) >> @@ -549,15 +646,11 @@ static int restore(struct xc_sr_context *ctx) >> >> } while ( rec.type != REC_TYPE_END ); >> >> -#ifdef XG_LIBXL_HVM_COMPAT >> - if ( ctx->dominfo.hvm ) >> - { >> - rc = read_qemu(ctx); >> - if ( rc ) >> - goto err; >> - } >> -#endif >> - >> + err_buf: >> + /* >> + * With Remus, if we reach here, there must be some error on primary, >> + * failover from the last checkpoint state. >> + */ >> rc = ctx->restore.ops.stream_complete(ctx); >> if ( rc ) >> goto err; >> @@ -571,6 +664,13 @@ static int restore(struct xc_sr_context *ctx) >> PERROR("Restore failed"); >> >> done: >> + for ( i = 0; i < MAX_BUF_RECORDS; i++) >> + { >> + if ( ctx->restore.buffered_records[i] ) { >> + free(ctx->restore.buffered_records[i]->data); >> + free(ctx->restore.buffered_records[i]); >> + } >> + } >> free(ctx->restore.populated_pfns); >> rc = ctx->restore.ops.cleanup(ctx); >> if ( rc ) > > . > -- Thanks, Yang.