From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ian Campbell Subject: Re: [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore Date: Thu, 14 May 2015 14:05:16 +0100 Message-ID: <1431608716.13579.78.camel@citrix.com> References: <1431597974-15624-1-git-send-email-yanghy@cn.fujitsu.com> <1431597974-15624-3-git-send-email-yanghy@cn.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1431597974-15624-3-git-send-email-yanghy@cn.fujitsu.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Yang Hongyang Cc: wei.liu2@citrix.com, eddie.dong@intel.com, wency@cn.fujitsu.com, andrew.cooper3@citrix.com, yunhong.jiang@intel.com, ian.jackson@eu.citrix.com, xen-devel@lists.xen.org, guijianfeng@cn.fujitsu.com, rshriram@cs.ubc.ca List-Id: xen-devel@lists.xenproject.org On Thu, 2015-05-14 at 18:06 +0800, Yang Hongyang wrote: > With Remus, the restore flow should be: > the first full migration stream -> { periodically restore stream } > > Signed-off-by: Yang Hongyang > Signed-off-by: Andrew Cooper > CC: Ian Campbell > CC: Ian Jackson > CC: Wei Liu > --- > tools/libxc/xc_sr_common.h | 14 ++++++ > tools/libxc/xc_sr_restore.c | 113 ++++++++++++++++++++++++++++++++++++++++---- > 2 files changed, 117 insertions(+), 10 deletions(-) > > diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h > index f8121e7..3bf27f1 100644 > --- a/tools/libxc/xc_sr_common.h > +++ b/tools/libxc/xc_sr_common.h > @@ -208,6 +208,20 @@ struct xc_sr_context > /* Plain VM, or checkpoints over time. */ > bool checkpointed; > > + /* Currently buffering records between a checkpoint */ > + bool buffer_all_records; > + > +/* > + * With Remus, we buffer the records sent by the primary at checkpoint, > + * in case the primary will fail, we can recover from the last > + * checkpoint state. > + * This should be enough because primary only send dirty pages at > + * checkpoint. I'm not sure how it then follows that 1024 buffers is guaranteed to be enough, unless there is something on the sending side arranging it to be so? > + */ > +#define MAX_BUF_RECORDS 1024 > + struct xc_sr_record *buffered_records; > + unsigned buffered_rec_num; > + > /* > * Xenstore and Console parameters. > * INPUT: evtchn & domid > diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c > index 9ab5760..8468ffc 100644 > --- a/tools/libxc/xc_sr_restore.c > +++ b/tools/libxc/xc_sr_restore.c > @@ -468,11 +468,69 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec) > return rc; > } > > +static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec); > +static int handle_checkpoint(struct xc_sr_context *ctx) > +{ > + xc_interface *xch = ctx->xch; > + int rc = 0; > + unsigned i; > + > + if ( !ctx->restore.checkpointed ) > + { > + ERROR("Found checkpoint in non-checkpointed stream"); > + rc = -1; Is it usual in migrv2 to set errno as well? > + goto err; > + } > + > + if ( ctx->restore.buffer_all_records ) > + { > + IPRINTF("All records buffered"); > + > + /* > + * We need to set buffer_all_records to false in > + * order to process records instead of buffer records. > + * buffer_all_records should be set back to true after > + * we successfully processed all records. > + */ > + ctx->restore.buffer_all_records = false; I'm not personally a fan of changing global state in order to simulate the action of what should be a parameter to a function. Preferable IMHO would be to have process_record gain a parameter to override the ctx state but become an internal helper (perhaps with a name change) and then have API function process_record and process_buffered_records or some such which call it in the right way. Andy may have a differing opinion though.