From mboxrd@z Thu Jan  1 00:00:00 1970
From: Yang Hongyang <yanghy@cn.fujitsu.com>
Subject: Re: [PATCH Remus v3 3/3] libxc/restore: implement Remus
 checkpointed restore
Date: Thu, 14 May 2015 13:42:43 +0800
Message-ID: <555435D3.8080106@cn.fujitsu.com>
References: <1431506101-29612-1-git-send-email-yanghy@cn.fujitsu.com>
	<1431506101-29612-4-git-send-email-yanghy@cn.fujitsu.com>
	<555378D3.5000800@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <555378D3.5000800@citrix.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper <andrew.cooper3@citrix.com>, xen-devel@lists.xen.org
Cc: wei.liu2@citrix.com, ian.campbell@citrix.com, wency@cn.fujitsu.com, guijianfeng@cn.fujitsu.com, yunhong.jiang@intel.com, eddie.dong@intel.com, rshriram@cs.ubc.ca, ian.jackson@eu.citrix.com
List-Id: xen-devel@lists.xenproject.org

On 05/14/2015 12:16 AM, Andrew Cooper wrote:
> On 13/05/15 09:35, Yang Hongyang wrote:
>> With Remus, the restore flow should be:
>> the first full migration stream -> { periodically restore stream }
>>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> ---
>>   tools/libxc/xc_sr_common.h  |  13 +++++
>>   tools/libxc/xc_sr_restore.c | 124 +++++++++++++++++++++++++++++++++++++++-----
>>   2 files changed, 125 insertions(+), 12 deletions(-)
>>
>> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
>> index f8121e7..3740b89 100644
>> --- a/tools/libxc/xc_sr_common.h
>> +++ b/tools/libxc/xc_sr_common.h
>> @@ -208,6 +208,19 @@ struct xc_sr_context
>>               /* Plain VM, or checkpoints over time. */
>>               bool checkpointed;
>>
>> +            /* Currently buffering records between a checkpoint */
>> +            bool buffer_all_records;
>> +
>> +/*
>> + * With Remus, we buffer the records sent by primary at checkpoint,
>
> "sent by the primary"
>
>> + * in case the primary will fail, we can recover from the last
>> + * checkpoint state.
>> + * This should be enough because primary only send dirty pages at
>> + * checkpoint.
>> + */
>> +#define MAX_BUF_RECORDS 1024
>
> As a general point, it occurs to me that this might be better as a
> linked list, rather than having a hard limit.  A burst of activity on
> the primary could potentially hit this limit

There are only few records at every checkpoint in my test, mostly under 10,
probably because I don't do much operations in the Guest. This limit can
be adjusted later by further testing.
Even with linked list, we still need a limit IMO, otherwise it will eat too
much memory.

>
>> +            struct xc_sr_record *buffered_records[MAX_BUF_RECORDS];
>
> Your data handling will be much more simple if this were struct
> xc_sr_record *buffered_records; and you do a one-time allocation of
> MAX_BUF_RECORDS * sizeof(struct xc_sr_record), although it will require
> an index integer as well.

Good point, it will avoid the alloc/free of the struct xc_sr_record, I
will add a 'buffered_rec_num' to the context also.

>
> It would probably be best to factor out setup() and clean() just like
> you did for the save side.

OK.

>
>> +
>>               /*
>>                * Xenstore and Console parameters.
>>                * INPUT:  evtchn & domid
>> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
>> index 0e512ec..85534a8 100644
>> --- a/tools/libxc/xc_sr_restore.c
[...]
>> @@ -541,7 +618,27 @@ static int restore(struct xc_sr_context *ctx)
>>       {
>>           rc = read_record(ctx, &rec);
>>           if ( rc )
>> -            goto err;
>> +        {
>> +            if ( ctx->restore.buffer_all_records )
>> +                goto err_buf;
>
> Please can we choose a label sufficiently different to "err".
>
> "resume_from_checkpoint" perhaps?

I think "remus_failover" is better? If you don't have objections, I will
use it as the label.

>
> ~Andrew
>
>> +            else
>> +                goto err;
>> +        }
>> +
>> +#ifdef XG_LIBXL_HVM_COMPAT
>> +        if ( ctx->dominfo.hvm &&
>> +             (rec.type == REC_TYPE_END || rec.type == REC_TYPE_CHECKPOINT) )
>> +        {
>> +            rc = read_qemu(ctx);
>> +            if ( rc )
>> +            {
>> +                if ( ctx->restore.buffer_all_records )
>> +                    goto err_buf;
>> +                else
>> +                    goto err;
>> +            }
>> +        }
>> +#endif
>>
>>           rc = process_record(ctx, &rec);
>>           if ( rc )
>> @@ -549,15 +646,11 @@ static int restore(struct xc_sr_context *ctx)
>>
>>       } while ( rec.type != REC_TYPE_END );
>>
>> -#ifdef XG_LIBXL_HVM_COMPAT
>> -    if ( ctx->dominfo.hvm )
>> -    {
>> -        rc = read_qemu(ctx);
>> -        if ( rc )
>> -            goto err;
>> -    }
>> -#endif
>> -
>> + err_buf:
>> +    /*
>> +     * With Remus, if we reach here, there must be some error on primary,
>> +     * failover from the last checkpoint state.
>> +     */
>>       rc = ctx->restore.ops.stream_complete(ctx);
>>       if ( rc )
>>           goto err;
>> @@ -571,6 +664,13 @@ static int restore(struct xc_sr_context *ctx)
>>       PERROR("Restore failed");
>>
>>    done:
>> +    for ( i = 0; i < MAX_BUF_RECORDS; i++)
>> +    {
>> +        if ( ctx->restore.buffered_records[i] ) {
>> +            free(ctx->restore.buffered_records[i]->data);
>> +            free(ctx->restore.buffered_records[i]);
>> +        }
>> +    }
>>       free(ctx->restore.populated_pfns);
>>       rc = ctx->restore.ops.cleanup(ctx);
>>       if ( rc )
>
> .
>

-- 
Thanks,
Yang.