* [PATCH Remus v4 0/2] Remus support for Migration-v2
@ 2015-05-14 8:56 Yang Hongyang
2015-05-14 8:56 ` [PATCH Remus v4 1/2] libxc/save: implement Remus checkpointed save Yang Hongyang
2015-05-14 8:56 ` [PATCH Remus v4 2/2] libxc/restore: implement Remus checkpointed restore Yang Hongyang
0 siblings, 2 replies; 4+ messages in thread
From: Yang Hongyang @ 2015-05-14 8:56 UTC (permalink / raw)
To: xen-devel
Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
eddie.dong, guijianfeng, rshriram, ian.jackson
This patchset implements Remus support for Migration v2, but without
memory compression.
Git tree available at:
https://github.com/macrosheep/xen/tree/Remus-newmig-v4
This patchset is based on
[PATCH v6 00/16] Misc patches to aid migration v2 Remus support
https://github.com/macrosheep/xen/tree/misc-remus-v6
Yang Hongyang (2):
libxc/save: implement Remus checkpointed save
libxc/restore: implement Remus checkpointed restore
tools/libxc/xc_sr_common.h | 14 +++++
tools/libxc/xc_sr_restore.c | 121 ++++++++++++++++++++++++++++++++++++++++----
tools/libxc/xc_sr_save.c | 80 ++++++++++++++++++++++-------
3 files changed, 186 insertions(+), 29 deletions(-)
--
1.9.1
* [PATCH Remus v4 1/2] libxc/save: implement Remus checkpointed save
2015-05-14 8:56 [PATCH Remus v4 0/2] Remus support for Migration-v2 Yang Hongyang
@ 2015-05-14 8:56 ` Yang Hongyang
2015-05-14 8:56 ` [PATCH Remus v4 2/2] libxc/restore: implement Remus checkpointed restore Yang Hongyang
1 sibling, 0 replies; 4+ messages in thread
From: Yang Hongyang @ 2015-05-14 8:56 UTC (permalink / raw)
To: xen-devel
Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
eddie.dong, guijianfeng, rshriram, ian.jackson
With Remus, the save flow should be:
live migration->{ periodically save(checkpointed save) }
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
tools/libxc/xc_sr_save.c | 80 ++++++++++++++++++++++++++++++++++++------------
1 file changed, 61 insertions(+), 19 deletions(-)
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 1d0a46d..1c5d199 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -57,6 +57,16 @@ static int write_end_record(struct xc_sr_context *ctx)
}
/*
+ * Writes a CHECKPOINT record into the stream.
+ */
+static int write_checkpoint_record(struct xc_sr_context *ctx)
+{
+ struct xc_sr_record checkpoint = { REC_TYPE_CHECKPOINT, 0, NULL };
+
+ return write_record(ctx, &checkpoint);
+}
+
+/*
* Writes a batch of memory as a PAGE_DATA record into the stream. The batch
* is constructed in ctx->save.batch_pfns.
*
@@ -467,6 +477,14 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
&ctx->save.dirty_bitmap_hbuf);
+ /*
+ * With Remus, we will enter checkpointed save after live migration.
+ * In checkpointed save loop, we skip the live part and pause straight
+ * away to send dirty pages between checkpoints.
+ */
+ if ( !ctx->save.live )
+ goto last_iter;
+
rc = enable_logdirty(ctx);
if ( rc )
goto out;
@@ -505,6 +523,7 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
goto out;
}
+ last_iter:
rc = suspend_domain(ctx);
if ( rc )
goto out;
@@ -667,29 +686,52 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
if ( rc )
goto err;
- rc = ctx->save.ops.start_of_checkpoint(ctx);
- if ( rc )
- goto err;
+ do {
+ rc = ctx->save.ops.start_of_checkpoint(ctx);
+ if ( rc )
+ goto err;
- if ( ctx->save.live )
- rc = send_domain_memory_live(ctx);
- else
- rc = send_domain_memory_nonlive(ctx);
+ if ( ctx->save.live || ctx->save.checkpointed )
+ rc = send_domain_memory_live(ctx);
+ else
+ rc = send_domain_memory_nonlive(ctx);
- if ( rc )
- goto err;
+ if ( rc )
+ goto err;
- if ( !ctx->dominfo.shutdown ||
- (ctx->dominfo.shutdown_reason != SHUTDOWN_suspend) )
- {
- ERROR("Domain has not been suspended");
- rc = -1;
- goto err;
- }
+ if ( !ctx->dominfo.shutdown ||
+ (ctx->dominfo.shutdown_reason != SHUTDOWN_suspend) )
+ {
+ ERROR("Domain has not been suspended");
+ rc = -1;
+ goto err;
+ }
- rc = ctx->save.ops.end_of_checkpoint(ctx);
- if ( rc )
- goto err;
+ rc = ctx->save.ops.end_of_checkpoint(ctx);
+ if ( rc )
+ goto err;
+
+ if ( ctx->save.checkpointed )
+ {
+ if ( ctx->save.live )
+ {
+ /* End of live migration, we are sending checkpointed stream */
+ ctx->save.live = false;
+ }
+
+ rc = write_checkpoint_record(ctx);
+ if ( rc )
+ goto err;
+
+ ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
+
+ rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
+ if ( rc > 0 )
+ xc_report_progress_single(xch, "Checkpointed save");
+ else
+ ctx->save.checkpointed = false;
+ }
+ } while ( ctx->save.checkpointed );
xc_report_progress_single(xch, "End of stream");
--
1.9.1
* [PATCH Remus v4 2/2] libxc/restore: implement Remus checkpointed restore
2015-05-14 8:56 [PATCH Remus v4 0/2] Remus support for Migration-v2 Yang Hongyang
2015-05-14 8:56 ` [PATCH Remus v4 1/2] libxc/save: implement Remus checkpointed save Yang Hongyang
@ 2015-05-14 8:56 ` Yang Hongyang
2015-05-14 9:24 ` Andrew Cooper
1 sibling, 1 reply; 4+ messages in thread
From: Yang Hongyang @ 2015-05-14 8:56 UTC (permalink / raw)
To: xen-devel
Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
eddie.dong, guijianfeng, rshriram, ian.jackson
With Remus, the restore flow should be:
the first full migration stream -> { periodically restore stream }
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
tools/libxc/xc_sr_common.h | 14 +++++
tools/libxc/xc_sr_restore.c | 121 ++++++++++++++++++++++++++++++++++++++++----
2 files changed, 125 insertions(+), 10 deletions(-)
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index f8121e7..3bf27f1 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -208,6 +208,20 @@ struct xc_sr_context
/* Plain VM, or checkpoints over time. */
bool checkpointed;
+ /* Currently buffering records between a checkpoint */
+ bool buffer_all_records;
+
+/*
+ * With Remus, we buffer the records sent by the primary at checkpoint,
+ * in case the primary will fail, we can recover from the last
+ * checkpoint state.
+ * This should be enough because primary only send dirty pages at
+ * checkpoint.
+ */
+#define MAX_BUF_RECORDS 1024
+ struct xc_sr_record *buffered_records;
+ unsigned buffered_rec_num;
+
/*
* Xenstore and Console parameters.
* INPUT: evtchn & domid
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 9ab5760..3c93406 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -468,10 +468,73 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
return rc;
}
+static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec);
+static int handle_checkpoint(struct xc_sr_context *ctx)
+{
+ xc_interface *xch = ctx->xch;
+ int rc = 0;
+ unsigned i;
+ struct xc_sr_record *rec;
+
+ if ( !ctx->restore.checkpointed )
+ {
+ ERROR("Found checkpoint in non-checkpointed stream");
+ rc = -1;
+ goto err;
+ }
+
+ if ( ctx->restore.buffer_all_records )
+ {
+ IPRINTF("All records buffered");
+
+ /*
+ * We need to set buffer_all_records to false in
+ * order to process records instead of buffer records.
+ * buffer_all_records should be set back to true after
+ * we successfully processed all records.
+ */
+ ctx->restore.buffer_all_records = false;
+ for ( i = 0; i < ctx->restore.buffered_rec_num; i++)
+ {
+ rec = ctx->restore.buffered_records +
+ i * sizeof(struct xc_sr_record);
+ rc = process_record(ctx, rec);
+ if ( rc )
+ goto err;
+ }
+ ctx->restore.buffered_rec_num = 0;
+ ctx->restore.buffer_all_records = true;
+ IPRINTF("All records processed");
+ }
+ else
+ ctx->restore.buffer_all_records = true;
+
+ err:
+ return rc;
+}
+
static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
{
xc_interface *xch = ctx->xch;
int rc = 0;
+ struct xc_sr_record *buf_rec;
+
+ if ( ctx->restore.buffer_all_records &&
+ rec->type != REC_TYPE_END &&
+ rec->type != REC_TYPE_CHECKPOINT )
+ {
+ if ( ctx->restore.buffered_rec_num >= MAX_BUF_RECORDS )
+ {
+ ERROR("There are too many records within a checkpoint");
+ return -1;
+ }
+
+ buf_rec = ctx->restore.buffered_records +
+ ctx->restore.buffered_rec_num++ * sizeof(struct xc_sr_record);
+ memcpy(buf_rec, rec, sizeof(struct xc_sr_record));
+
+ return 0;
+ }
switch ( rec->type )
{
@@ -487,12 +550,17 @@ static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
ctx->restore.verify = true;
break;
+ case REC_TYPE_CHECKPOINT:
+ rc = handle_checkpoint(ctx);
+ break;
+
default:
rc = ctx->restore.ops.process_record(ctx, rec);
break;
}
free(rec->data);
+ rec->data = NULL;
if ( rc == RECORD_NOT_PROCESSED )
{
@@ -529,6 +597,15 @@ static int setup(struct xc_sr_context *ctx)
goto err;
}
+ ctx->restore.buffered_records = malloc(
+ MAX_BUF_RECORDS * sizeof(struct xc_sr_record));
+ if ( !ctx->restore.buffered_records )
+ {
+ ERROR("Unable to allocate memory for buffered records");
+ rc = -1;
+ goto err;
+ }
+
err:
return rc;
}
@@ -536,7 +613,15 @@ static int setup(struct xc_sr_context *ctx)
static void cleanup(struct xc_sr_context *ctx)
{
xc_interface *xch = ctx->xch;
+ unsigned i;
+ struct xc_sr_record *rec;
+ for ( i = 0; i < ctx->restore.buffered_rec_num; i++)
+ {
+ rec = ctx->restore.buffered_records + i * sizeof(struct xc_sr_record);
+ free(rec->data);
+ }
+ free(ctx->restore.buffered_records);
free(ctx->restore.populated_pfns);
if ( ctx->restore.ops.cleanup(ctx) )
PERROR("Failed to clean up");
@@ -564,7 +649,27 @@ static int restore(struct xc_sr_context *ctx)
{
rc = read_record(ctx, &rec);
if ( rc )
- goto err;
+ {
+ if ( ctx->restore.buffer_all_records )
+ goto remus_failover;
+ else
+ goto err;
+ }
+
+#ifdef XG_LIBXL_HVM_COMPAT
+ if ( ctx->dominfo.hvm &&
+ (rec.type == REC_TYPE_END || rec.type == REC_TYPE_CHECKPOINT) )
+ {
+ rc = read_qemu(ctx);
+ if ( rc )
+ {
+ if ( ctx->restore.buffer_all_records )
+ goto remus_failover;
+ else
+ goto err;
+ }
+ }
+#endif
rc = process_record(ctx, &rec);
if ( rc )
@@ -572,15 +677,11 @@ static int restore(struct xc_sr_context *ctx)
} while ( rec.type != REC_TYPE_END );
-#ifdef XG_LIBXL_HVM_COMPAT
- if ( ctx->dominfo.hvm )
- {
- rc = read_qemu(ctx);
- if ( rc )
- goto err;
- }
-#endif
-
+ remus_failover:
+ /*
+ * With Remus, if we reach here, there must be some error on primary,
+ * failover from the last checkpoint state.
+ */
rc = ctx->restore.ops.stream_complete(ctx);
if ( rc )
goto err;
--
1.9.1
* Re: [PATCH Remus v4 2/2] libxc/restore: implement Remus checkpointed restore
2015-05-14 8:56 ` [PATCH Remus v4 2/2] libxc/restore: implement Remus checkpointed restore Yang Hongyang
@ 2015-05-14 9:24 ` Andrew Cooper
0 siblings, 0 replies; 4+ messages in thread
From: Andrew Cooper @ 2015-05-14 9:24 UTC (permalink / raw)
To: Yang Hongyang, xen-devel
Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
eddie.dong, rshriram, ian.jackson
On 14/05/15 09:56, Yang Hongyang wrote:
> With Remus, the restore flow should be:
> the first full migration stream -> { periodically restore stream }
>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> ---
> tools/libxc/xc_sr_common.h | 14 +++++
> tools/libxc/xc_sr_restore.c | 121 ++++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 125 insertions(+), 10 deletions(-)
>
> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
> index f8121e7..3bf27f1 100644
> --- a/tools/libxc/xc_sr_common.h
> +++ b/tools/libxc/xc_sr_common.h
> @@ -208,6 +208,20 @@ struct xc_sr_context
> /* Plain VM, or checkpoints over time. */
> bool checkpointed;
>
> + /* Currently buffering records between a checkpoint */
> + bool buffer_all_records;
> +
> +/*
> + * With Remus, we buffer the records sent by the primary at checkpoint,
> + * in case the primary will fail, we can recover from the last
> + * checkpoint state.
> + * This should be enough because primary only send dirty pages at
> + * checkpoint.
> + */
> +#define MAX_BUF_RECORDS 1024
> + struct xc_sr_record *buffered_records;
> + unsigned buffered_rec_num;
> +
> /*
> * Xenstore and Console parameters.
> * INPUT: evtchn & domid
> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
> index 9ab5760..3c93406 100644
> --- a/tools/libxc/xc_sr_restore.c
> +++ b/tools/libxc/xc_sr_restore.c
> @@ -468,10 +468,73 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
> return rc;
> }
>
> +static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec);
> +static int handle_checkpoint(struct xc_sr_context *ctx)
> +{
> + xc_interface *xch = ctx->xch;
> + int rc = 0;
> + unsigned i;
> + struct xc_sr_record *rec;
> +
> + if ( !ctx->restore.checkpointed )
> + {
> + ERROR("Found checkpoint in non-checkpointed stream");
> + rc = -1;
> + goto err;
> + }
> +
> + if ( ctx->restore.buffer_all_records )
> + {
> + IPRINTF("All records buffered");
> +
> + /*
> + * We need to set buffer_all_records to false in
> + * order to process records instead of buffer records.
> + * buffer_all_records should be set back to true after
> + * we successfully processed all records.
> + */
> + ctx->restore.buffer_all_records = false;
> + for ( i = 0; i < ctx->restore.buffered_rec_num; i++)
Space before closing bracket.
> + {
> + rec = ctx->restore.buffered_records +
> + i * sizeof(struct xc_sr_record);
This pointer arithmetic looks wrong.
FWIW, "rec = &ctx->restore.buffered_records[i];" would be clearer,
although you don't even need to pull it into a variable as it is only
referenced once.
> + rc = process_record(ctx, rec);
> + if ( rc )
> + goto err;
> + }
> + ctx->restore.buffered_rec_num = 0;
> + ctx->restore.buffer_all_records = true;
> + IPRINTF("All records processed");
> + }
> + else
> + ctx->restore.buffer_all_records = true;
> +
> + err:
> + return rc;
> +}
> +
> static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
> {
> xc_interface *xch = ctx->xch;
> int rc = 0;
> + struct xc_sr_record *buf_rec;
> +
> + if ( ctx->restore.buffer_all_records &&
> + rec->type != REC_TYPE_END &&
> + rec->type != REC_TYPE_CHECKPOINT )
> + {
> + if ( ctx->restore.buffered_rec_num >= MAX_BUF_RECORDS )
> + {
> + ERROR("There are too many records within a checkpoint");
> + return -1;
> + }
> +
> + buf_rec = ctx->restore.buffered_records +
> + ctx->restore.buffered_rec_num++ * sizeof(struct xc_sr_record);
Ah - this is how the other bit of pointer arithmetic doesn't break, but
it will wander off the array if the sender provides more than 64 records.
> + memcpy(buf_rec, rec, sizeof(struct xc_sr_record));
As before,
memcpy(&ctx->restore.buffered_records[ctx->restore.buffered_rec_num++],
rec, sizeof(*rec));
might be a little more simple.
> +
> + return 0;
> + }
>
> switch ( rec->type )
> {
> @@ -487,12 +550,17 @@ static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
> ctx->restore.verify = true;
> break;
>
> + case REC_TYPE_CHECKPOINT:
> + rc = handle_checkpoint(ctx);
> + break;
> +
> default:
> rc = ctx->restore.ops.process_record(ctx, rec);
> break;
> }
>
> free(rec->data);
> + rec->data = NULL;
>
> if ( rc == RECORD_NOT_PROCESSED )
> {
> @@ -529,6 +597,15 @@ static int setup(struct xc_sr_context *ctx)
> goto err;
> }
>
> + ctx->restore.buffered_records = malloc(
> + MAX_BUF_RECORDS * sizeof(struct xc_sr_record));
> + if ( !ctx->restore.buffered_records )
> + {
> + ERROR("Unable to allocate memory for buffered records");
> + rc = -1;
> + goto err;
> + }
> +
> err:
> return rc;
> }
> @@ -536,7 +613,15 @@ static int setup(struct xc_sr_context *ctx)
> static void cleanup(struct xc_sr_context *ctx)
> {
> xc_interface *xch = ctx->xch;
> + unsigned i;
> + struct xc_sr_record *rec;
>
> + for ( i = 0; i < ctx->restore.buffered_rec_num; i++)
Style.
> + {
> + rec = ctx->restore.buffered_records + i * sizeof(struct xc_sr_record);
> + free(rec->data);
More bad pointer arithmetic.
Other than the pointer arithmetic issues (and a few minor style issues),
this patch looks fine.
~Andrew
> + }
> + free(ctx->restore.buffered_records);
> free(ctx->restore.populated_pfns);
> if ( ctx->restore.ops.cleanup(ctx) )
> PERROR("Failed to clean up");
> @@ -564,7 +649,27 @@ static int restore(struct xc_sr_context *ctx)
> {
> rc = read_record(ctx, &rec);
> if ( rc )
> - goto err;
> + {
> + if ( ctx->restore.buffer_all_records )
> + goto remus_failover;
> + else
> + goto err;
> + }
> +
> +#ifdef XG_LIBXL_HVM_COMPAT
> + if ( ctx->dominfo.hvm &&
> + (rec.type == REC_TYPE_END || rec.type == REC_TYPE_CHECKPOINT) )
> + {
> + rc = read_qemu(ctx);
> + if ( rc )
> + {
> + if ( ctx->restore.buffer_all_records )
> + goto remus_failover;
> + else
> + goto err;
> + }
> + }
> +#endif
>
> rc = process_record(ctx, &rec);
> if ( rc )
> @@ -572,15 +677,11 @@ static int restore(struct xc_sr_context *ctx)
>
> } while ( rec.type != REC_TYPE_END );
>
> -#ifdef XG_LIBXL_HVM_COMPAT
> - if ( ctx->dominfo.hvm )
> - {
> - rc = read_qemu(ctx);
> - if ( rc )
> - goto err;
> - }
> -#endif
> -
> + remus_failover:
> + /*
> + * With Remus, if we reach here, there must be some error on primary,
> + * failover from the last checkpoint state.
> + */
> rc = ctx->restore.ops.stream_complete(ctx);
> if ( rc )
> goto err;