From mboxrd@z Thu Jan 1 00:00:00 1970 From: Yang Hongyang Subject: Re: [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore Date: Fri, 15 May 2015 09:08:33 +0800 Message-ID: <55554711.2080205@cn.fujitsu.com> References: <1431597974-15624-1-git-send-email-yanghy@cn.fujitsu.com> <1431597974-15624-3-git-send-email-yanghy@cn.fujitsu.com> <1431608716.13579.78.camel@citrix.com> <5554A055.5080900@citrix.com> <1431612254.13579.80.camel@citrix.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1431612254.13579.80.camel@citrix.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Ian Campbell , Andrew Cooper Cc: wei.liu2@citrix.com, wency@cn.fujitsu.com, ian.jackson@eu.citrix.com, yunhong.jiang@intel.com, eddie.dong@intel.com, xen-devel@lists.xen.org, guijianfeng@cn.fujitsu.com, rshriram@cs.ubc.ca List-Id: xen-devel@lists.xenproject.org On 05/14/2015 10:04 PM, Ian Campbell wrote: > On Thu, 2015-05-14 at 14:17 +0100, Andrew Cooper wrote: >> On 14/05/15 14:05, Ian Campbell wrote: >>> On Thu, 2015-05-14 at 18:06 +0800, Yang Hongyang wrote: >>>> With Remus, the restore flow should be: >>>> the first full migration stream -> { periodically restore stream } >>>> >>>> Signed-off-by: Yang Hongyang >>>> Signed-off-by: Andrew Cooper >>>> CC: Ian Campbell >>>> CC: Ian Jackson >>>> CC: Wei Liu >>>> --- >>>> tools/libxc/xc_sr_common.h | 14 ++++++ >>>> tools/libxc/xc_sr_restore.c | 113 ++++++++++++++++++++++++++++++++++++++++---- >>>> 2 files changed, 117 insertions(+), 10 deletions(-) >>>> >>>> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h >>>> index f8121e7..3bf27f1 100644 >>>> --- a/tools/libxc/xc_sr_common.h >>>> +++ b/tools/libxc/xc_sr_common.h >>>> @@ -208,6 +208,20 @@ struct xc_sr_context >>>> /* Plain VM, or checkpoints over time. */ >>>> bool checkpointed; >>>> >>>> + /* Currently buffering records between a checkpoint */ >>>> + bool buffer_all_records; >>>> + >>>> +/* >>>> + * With Remus, we buffer the records sent by the primary at checkpoint, >>>> + * in case the primary will fail, we can recover from the last >>>> + * checkpoint state. >>>> + * This should be enough because primary only send dirty pages at >>>> + * checkpoint. >>> I'm not sure how it then follows that 1024 buffers is guaranteed to be >>> enough, unless there is something on the sending side arranging it to be >>> so? >>> >>>> + */ >>>> +#define MAX_BUF_RECORDS 1024 >>>> + struct xc_sr_record *buffered_records; >>>> + unsigned buffered_rec_num; >>>> + >>>> /* >>>> * Xenstore and Console parameters. >>>> * INPUT: evtchn & domid >>>> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c >>>> index 9ab5760..8468ffc 100644 >>>> --- a/tools/libxc/xc_sr_restore.c >>>> +++ b/tools/libxc/xc_sr_restore.c >>>> @@ -468,11 +468,69 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec) >>>> return rc; >>>> } >>>> >>>> +static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec); >>>> +static int handle_checkpoint(struct xc_sr_context *ctx) >>>> +{ >>>> + xc_interface *xch = ctx->xch; >>>> + int rc = 0; >>>> + unsigned i; >>>> + >>>> + if ( !ctx->restore.checkpointed ) >>>> + { >>>> + ERROR("Found checkpoint in non-checkpointed stream"); >>>> + rc = -1; >>> Is it usual in migrv2 to set errno as well? >> >> If a relevant errno is to be had. > > EINVAL or ENOSYS perhaps? >> >> There are a lot of cases which are waiting for some real libxc error >> codes before they can propagate numeric error information, although in >> all cases the log messages will be accurate (and hopefully helpful). >> >> ~Andrew >> >>> >>>> + goto err; >>>> + } >>>> + >>>> + if ( ctx->restore.buffer_all_records ) >>>> + { >>>> + IPRINTF("All records buffered"); >>>> + >>>> + /* >>>> + * We need to set buffer_all_records to false in >>>> + * order to process records instead of buffer records. >>>> + * buffer_all_records should be set back to true after >>>> + * we successfully processed all records. >>>> + */ >>>> + ctx->restore.buffer_all_records = false; >>> I'm not personally a fan of changing global state in order to simulate >>> the action of what should be a parameter to a function. >>> >>> Preferable IMHO would be to have process_record gain a parameter to >>> override the ctx state but become an internal helper (perhaps with a >>> name change) and then have API function process_record and >>> process_buffered_records or some such which call it in the right way. >>> >>> Andy may have a differing opinion though. >> >> Hmm yes - it would be nice to split the buffering logic away from the >> processing logic. >> >> However, the two are slightly related. >> >> Perhaps a process_or_buffer_record() helper, and removing all buffering >> logic from process_record(). > > That seems like it would work. Good idea, add a buffer_record() helper should be an improvement to the code and make it more clearer, thank you! > > > > . > -- Thanks, Yang.