On Wed, Jul 9, 2014 at 3:47 AM, Yang Hongyang <yanghy@cn.fujitsu.com> wrote:
implement remus checkpoint in v2 save
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
tools/libxc/saverestore/common.h | 1 +
tools/libxc/saverestore/save.c | 88 ++++++++++++++++++++++++----------------
2 files changed, 55 insertions(+), 34 deletions(-)
diff --git a/tools/libxc/saverestore/common.h b/tools/libxc/saverestore/common.h
index 24ba95b..1dd9f51 100644
--- a/tools/libxc/saverestore/common.h
+++ b/tools/libxc/saverestore/common.h
@@ -153,6 +153,7 @@ struct xc_sr_context
xc_dominfo_t dominfo;
bool checkpointed;
+ bool firsttime;
union
{
diff --git a/tools/libxc/saverestore/save.c b/tools/libxc/saverestore/save.c
index d2fa8a6..98a5c2f 100644
--- a/tools/libxc/saverestore/save.c
+++ b/tools/libxc/saverestore/save.c
@@ -375,6 +375,8 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
goto out;
}
+ if ( ctx->checkpointed && !ctx->firsttime )
+ goto lastiter;
nit: last_iter
I suggest adding a comment before this code block to explain what has happened sofar above why we are jumping to the last_iter block skipping the rest of the stuff below.
I also suggest maintaining some stats structure somewhere (number of dirty pages,time when checkpoint was initiated, etc.). They could be simply debug prints thatcan be enabled on demand.
A better alternative would be to somehow pass this stats structure back to libxl,such that the user can enable/disable stats printing using the xl remus commandfor example.
/* This juggling is required if logdirty is already on, e.g. VRAM tracking */
if ( xc_shadow_control(xch, ctx->domid,
XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
@@ -436,6 +438,7 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
break;
}
+lastiter:
rc = suspend_domain(ctx);
if ( rc )
goto out;
I hope the postsuspend callbacks are called somewhere?@@ -570,44 +573,60 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
if ( rc )
goto err;
- rc = ctx->save.ops.start_of_stream(ctx);
- if ( rc )
- goto err;
+ do {
+ rc = ctx->save.ops.start_of_stream(ctx);
+ if ( rc )
+ goto err;
- if ( ctx->save.live )
- {
- DPRINTF("Starting live migrate");
- rc = send_domain_memory_live(ctx);
- }
- else
- {
- DPRINTF("Starting nonlive save");
- rc = send_domain_memory_nonlive(ctx);
- }
+ if ( ctx->save.live )
+ {
+ DPRINTF("Starting live migrate");
I suggest using the ctx->firsttime bool var before this printf, so thatwhen debug print is enabled under remus, the console is notflooded with the same statement that makes no sense past thefirst iteration.
+ rc = send_domain_memory_live(ctx);
+ }
+ else
+ {
+ DPRINTF("Starting nonlive save");
+ rc = send_domain_memory_nonlive(ctx);
+ }
- if ( rc )
- goto err;
+ if ( rc )
+ goto err;
- /* Refresh domain information now it has paused. */
- if ( (xc_domain_getinfo(xch, ctx->domid, 1, &ctx->dominfo) != 1) ||
- (ctx->dominfo.domid != ctx->domid) )
- {
- PERROR("Unable to refresh domain information");
- rc = -1;
- goto err;
- }
- else if ( (!ctx->dominfo.shutdown ||
- ctx->dominfo.shutdown_reason != SHUTDOWN_suspend ) &&
- !ctx->dominfo.paused )
- {
- ERROR("Domain has not been suspended");
- rc = -1;
- goto err;
- }
+ /* Refresh domain information now it has paused. */
+ if ( (xc_domain_getinfo(xch, ctx->domid, 1, &ctx->dominfo) != 1) ||
+ (ctx->dominfo.domid != ctx->domid) )
+ {
+ PERROR("Unable to refresh domain information");
+ rc = -1;
+ goto err;
+ }
+ else if ( (!ctx->dominfo.shutdown ||
+ ctx->dominfo.shutdown_reason != SHUTDOWN_suspend ) &&
+ !ctx->dominfo.paused )
+ {
+ ERROR("Domain has not been suspended");
+ rc = -1;
+ goto err;
+ }
A silly question: Shouldn't this check for 'suspended status' be before the call tosend_domain_memory_live() [under remus]. I am assuming that thesend_domain_memory_live() function is the one that sends the dirty page dataout on the wire.
- rc = ctx->save.ops.end_of_stream(ctx);
- if ( rc )
- goto err;
+ rc = ctx->save.ops.end_of_stream(ctx);
+ if ( rc )
+ goto err;
+
+ if ( ctx->checkpointed ) {
+ if ( ctx->firsttime )
+ ctx->firsttime = false;
+
+ ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
+
+ rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
+ if ( rc > 0 ) {
+ IPRINTF("Next checkpoint\n");
I suggest maintaining checkpoint numbers instead. Much more helpful. Probably even adda gettimeofday call to print out the time the new checkpoint is started. Once again, its usefulto be able to control this printout from libxl
+ } else {
+ ctx->checkpointed = false;
+ }
+ }
+ } while ( ctx->checkpointed );
rc = write_end_record(ctx);
if ( rc )
@@ -653,6 +672,7 @@ int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_ite
ctx.save.live = !!(flags & XCFLAGS_LIVE);
ctx.save.debug = !!(flags & XCFLAGS_DEBUG);
ctx.checkpointed = !!(flags & XCFLAGS_CHECKPOINTED);
+ ctx.firsttime = true;
if ( ctx.checkpointed ) {
/* This is a checkpointed save, we need these callbacks */
--
1.9.1