From mboxrd@z Thu Jan  1 00:00:00 1970
From: Yang Hongyang <yanghy@cn.fujitsu.com>
Subject: Re: [PATCH Remus v5 2/2] libxc/restore: implement Remus
 checkpointed restore
Date: Fri, 15 May 2015 09:08:33 +0800
Message-ID: <55554711.2080205@cn.fujitsu.com>
References: <1431597974-15624-1-git-send-email-yanghy@cn.fujitsu.com>	
	<1431597974-15624-3-git-send-email-yanghy@cn.fujitsu.com>	
	<1431608716.13579.78.camel@citrix.com>
	<5554A055.5080900@citrix.com>
	<1431612254.13579.80.camel@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <1431612254.13579.80.camel@citrix.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Ian Campbell <ian.campbell@citrix.com>, Andrew Cooper <andrew.cooper3@citrix.com>
Cc: wei.liu2@citrix.com, wency@cn.fujitsu.com, ian.jackson@eu.citrix.com, yunhong.jiang@intel.com, eddie.dong@intel.com, xen-devel@lists.xen.org, guijianfeng@cn.fujitsu.com, rshriram@cs.ubc.ca
List-Id: xen-devel@lists.xenproject.org


On 05/14/2015 10:04 PM, Ian Campbell wrote:
> On Thu, 2015-05-14 at 14:17 +0100, Andrew Cooper wrote:
>> On 14/05/15 14:05, Ian Campbell wrote:
>>> On Thu, 2015-05-14 at 18:06 +0800, Yang Hongyang wrote:
>>>> With Remus, the restore flow should be:
>>>> the first full migration stream -> { periodically restore stream }
>>>>
>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>>>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
>>>> CC: Wei Liu <wei.liu2@citrix.com>
>>>> ---
>>>>   tools/libxc/xc_sr_common.h  |  14 ++++++
>>>>   tools/libxc/xc_sr_restore.c | 113 ++++++++++++++++++++++++++++++++++++++++----
>>>>   2 files changed, 117 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
>>>> index f8121e7..3bf27f1 100644
>>>> --- a/tools/libxc/xc_sr_common.h
>>>> +++ b/tools/libxc/xc_sr_common.h
>>>> @@ -208,6 +208,20 @@ struct xc_sr_context
>>>>               /* Plain VM, or checkpoints over time. */
>>>>               bool checkpointed;
>>>>
>>>> +            /* Currently buffering records between a checkpoint */
>>>> +            bool buffer_all_records;
>>>> +
>>>> +/*
>>>> + * With Remus, we buffer the records sent by the primary at checkpoint,
>>>> + * in case the primary will fail, we can recover from the last
>>>> + * checkpoint state.
>>>> + * This should be enough because primary only send dirty pages at
>>>> + * checkpoint.
>>> I'm not sure how it then follows that 1024 buffers is guaranteed to be
>>> enough, unless there is something on the sending side arranging it to be
>>> so?
>>>
>>>> + */
>>>> +#define MAX_BUF_RECORDS 1024
>>>> +            struct xc_sr_record *buffered_records;
>>>> +            unsigned buffered_rec_num;
>>>> +
>>>>               /*
>>>>                * Xenstore and Console parameters.
>>>>                * INPUT:  evtchn & domid
>>>> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
>>>> index 9ab5760..8468ffc 100644
>>>> --- a/tools/libxc/xc_sr_restore.c
>>>> +++ b/tools/libxc/xc_sr_restore.c
>>>> @@ -468,11 +468,69 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
>>>>       return rc;
>>>>   }
>>>>
>>>> +static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec);
>>>> +static int handle_checkpoint(struct xc_sr_context *ctx)
>>>> +{
>>>> +    xc_interface *xch = ctx->xch;
>>>> +    int rc = 0;
>>>> +    unsigned i;
>>>> +
>>>> +    if ( !ctx->restore.checkpointed )
>>>> +    {
>>>> +        ERROR("Found checkpoint in non-checkpointed stream");
>>>> +        rc = -1;
>>> Is it usual in migrv2 to set errno as well?
>>
>> If a relevant errno is to be had.
>
> EINVAL or ENOSYS perhaps?
>>
>> There are a lot of cases which are waiting for some real libxc error
>> codes before they can propagate numeric error information, although in
>> all cases the log messages will be accurate (and hopefully helpful).
>>
>> ~Andrew
>>
>>>
>>>> +        goto err;
>>>> +    }
>>>> +
>>>> +    if ( ctx->restore.buffer_all_records )
>>>> +    {
>>>> +        IPRINTF("All records buffered");
>>>> +
>>>> +        /*
>>>> +         * We need to set buffer_all_records to false in
>>>> +         * order to process records instead of buffer records.
>>>> +         * buffer_all_records should be set back to true after
>>>> +         * we successfully processed all records.
>>>> +         */
>>>> +        ctx->restore.buffer_all_records = false;
>>> I'm not personally a fan of changing global state in order to simulate
>>> the action of what should be a parameter to a function.
>>>
>>> Preferable IMHO would be to have process_record gain a parameter to
>>> override the ctx state but become an internal helper (perhaps with a
>>> name change) and then have API function process_record and
>>> process_buffered_records or some such which call it in the right way.
>>>
>>> Andy may have a differing opinion though.
>>
>> Hmm yes - it would be nice to split the buffering logic away from the
>> processing logic.
>>
>> However, the two are slightly related.
>>
>> Perhaps a process_or_buffer_record() helper, and removing all buffering
>> logic from process_record().
>
> That seems like it would work.

Good idea, add a buffer_record() helper should be an improvement to the
code and make it more clearer, thank you!

>
>
>
> .
>

-- 
Thanks,
Yang.