From: Andrew Cooper <andrew.cooper3@citrix.com>
To: rshriram@cs.ubc.ca, Yang Hongyang <yanghy@cn.fujitsu.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>,
Ian Campbell <Ian.Campbell@citrix.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [RFC PATCH 2/3] remus: implement remus checkpoint in v2 save
Date: Wed, 16 Jul 2014 16:38:56 +0100 [thread overview]
Message-ID: <53C69C90.1040908@citrix.com> (raw)
In-Reply-To: <CAP8mzPNowAjbSJ7JTx-TNkS+hom436VvB4fcdemhimuT+rtZwQ@mail.gmail.com>
[-- Attachment #1.1: Type: text/plain, Size: 8507 bytes --]
On 16/07/14 16:22, Shriram Rajagopalan wrote:
>
>
>
> On Wed, Jul 9, 2014 at 3:47 AM, Yang Hongyang <yanghy@cn.fujitsu.com
> <mailto:yanghy@cn.fujitsu.com>> wrote:
>
> implement remus checkpoint in v2 save
>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com
> <mailto:yanghy@cn.fujitsu.com>>
> ---
> tools/libxc/saverestore/common.h | 1 +
> tools/libxc/saverestore/save.c | 88
> ++++++++++++++++++++++++----------------
> 2 files changed, 55 insertions(+), 34 deletions(-)
>
> diff --git a/tools/libxc/saverestore/common.h
> b/tools/libxc/saverestore/common.h
> index 24ba95b..1dd9f51 100644
> --- a/tools/libxc/saverestore/common.h
> +++ b/tools/libxc/saverestore/common.h
> @@ -153,6 +153,7 @@ struct xc_sr_context
>
> xc_dominfo_t dominfo;
> bool checkpointed;
> + bool firsttime;
>
> union
> {
> diff --git a/tools/libxc/saverestore/save.c
> b/tools/libxc/saverestore/save.c
> index d2fa8a6..98a5c2f 100644
> --- a/tools/libxc/saverestore/save.c
> +++ b/tools/libxc/saverestore/save.c
> @@ -375,6 +375,8 @@ static int send_domain_memory_live(struct
> xc_sr_context *ctx)
> goto out;
> }
>
> + if ( ctx->checkpointed && !ctx->firsttime )
> + goto lastiter;
>
>
> nit: last_iter
>
> I suggest adding a comment before this code block to explain what has
> happened so
> far above why we are jumping to the last_iter block skipping the rest
> of the stuff below.
>
> I also suggest maintaining some stats structure somewhere (number of
> dirty pages,
> time when checkpoint was initiated, etc.). They could be simply debug
> prints that
> can be enabled on demand.
>
> A better alternative would be to somehow pass this stats structure
> back to libxl,
> such that the user can enable/disable stats printing using the xl
> remus command
> for example.
This is partly my fault. I so far haven’t got any of the "progress
report" set of logging functionality wired up. Once that is done, libxl
should be fine, as it can already filter the progress logging from the
regular logging.
>
>
> /* This juggling is required if logdirty is already on, e.g.
> VRAM tracking */
> if ( xc_shadow_control(xch, ctx->domid,
> XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
> @@ -436,6 +438,7 @@ static int send_domain_memory_live(struct
> xc_sr_context *ctx)
> break;
> }
>
> +lastiter:
> rc = suspend_domain(ctx);
> if ( rc )
> goto out;
>
>
> I hope the postsuspend callbacks are called somewhere?
>
>
> @@ -570,44 +573,60 @@ static int save(struct xc_sr_context *ctx,
> uint16_t guest_type)
> if ( rc )
> goto err;
>
> - rc = ctx->save.ops.start_of_stream(ctx);
> - if ( rc )
> - goto err;
> + do {
> + rc = ctx->save.ops.start_of_stream(ctx);
> + if ( rc )
> + goto err;
>
> - if ( ctx->save.live )
> - {
> - DPRINTF("Starting live migrate");
> - rc = send_domain_memory_live(ctx);
> - }
> - else
> - {
> - DPRINTF("Starting nonlive save");
> - rc = send_domain_memory_nonlive(ctx);
> - }
> + if ( ctx->save.live )
> + {
> + DPRINTF("Starting live migrate");
>
>
> I suggest using the ctx->firsttime bool var before this printf, so that
> when debug print is enabled under remus, the console is not
> flooded with the same statement that makes no sense past the
> first iteration.
A thought has just crossed my mind. It might be better to have a
"send_domain_remus()" function which internally deals with the
iterations and looping etc.
e.g.
if ( remus )
rc = send_domain_remus(ctx);
else if ( ctx->save.live )
rc = send_domain_live(ctx);
else
rc = send_domain_memory_nonlive(ctx);
In my upcoming series which fixes the deferred vcpu state problem I have
split some of the common bits out of send_domain_memory_{live,nonline}()
which might be useful.
>
>
> + rc = send_domain_memory_live(ctx);
> + }
> + else
> + {
> + DPRINTF("Starting nonlive save");
> + rc = send_domain_memory_nonlive(ctx);
> + }
>
> - if ( rc )
> - goto err;
> + if ( rc )
> + goto err;
>
> - /* Refresh domain information now it has paused. */
> - if ( (xc_domain_getinfo(xch, ctx->domid, 1, &ctx->dominfo) !=
> 1) ||
> - (ctx->dominfo.domid != ctx->domid) )
> - {
> - PERROR("Unable to refresh domain information");
> - rc = -1;
> - goto err;
> - }
> - else if ( (!ctx->dominfo.shutdown ||
> - ctx->dominfo.shutdown_reason != SHUTDOWN_suspend ) &&
> - !ctx->dominfo.paused )
> - {
> - ERROR("Domain has not been suspended");
> - rc = -1;
> - goto err;
> - }
> + /* Refresh domain information now it has paused. */
> + if ( (xc_domain_getinfo(xch, ctx->domid, 1,
> &ctx->dominfo) != 1) ||
> + (ctx->dominfo.domid != ctx->domid) )
> + {
> + PERROR("Unable to refresh domain information");
> + rc = -1;
> + goto err;
> + }
> + else if ( (!ctx->dominfo.shutdown ||
> + ctx->dominfo.shutdown_reason !=
> SHUTDOWN_suspend ) &&
> + !ctx->dominfo.paused )
> + {
> + ERROR("Domain has not been suspended");
> + rc = -1;
> + goto err;
> + }
>
>
> A silly question: Shouldn't this check for 'suspended status' be
> before the call to
> send_domain_memory_live() [under remus]. I am assuming that the
> send_domain_memory_live() function is the one that sends the dirty
> page data
> out on the wire.
Even for non-live migrates, we have to ensure that the VM has entered
the suspend state. A PV guest cannot possibly recover from resume if it
didn't voluntarily suspend as it will loose its reference to the start
info page (whos mfn lives in vcpu0.edx for the duration of the migrate
and must be updated on the receiving side). Any VM which doesn't
sufficiently quiesce its fronends risks ring and memory corruption on
resume.
In this case, the send_domain_memory_XXX functions do take care of
pausing the domain at the appropriate time, but this here is a sanity check.
~Andrew
>
>
>
> - rc = ctx->save.ops.end_of_stream(ctx);
> - if ( rc )
> - goto err;
> + rc = ctx->save.ops.end_of_stream(ctx);
> + if ( rc )
> + goto err;
> +
> + if ( ctx->checkpointed ) {
> + if ( ctx->firsttime )
> + ctx->firsttime = false;
> +
> + ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
> +
> + rc =
> ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
> + if ( rc > 0 ) {
> + IPRINTF("Next checkpoint\n");
>
>
> I suggest maintaining checkpoint numbers instead. Much more helpful.
> Probably even add
> a gettimeofday call to print out the time the new checkpoint is
> started. Once again, its useful
> to be able to control this printout from libxl
>
> + } else {
> + ctx->checkpointed = false;
> + }
> + }
> + } while ( ctx->checkpointed );
>
> rc = write_end_record(ctx);
> if ( rc )
> @@ -653,6 +672,7 @@ int xc_domain_save2(xc_interface *xch, int
> io_fd, uint32_t dom, uint32_t max_ite
> ctx.save.live = !!(flags & XCFLAGS_LIVE);
> ctx.save.debug = !!(flags & XCFLAGS_DEBUG);
> ctx.checkpointed = !!(flags & XCFLAGS_CHECKPOINTED);
> + ctx.firsttime = true;
>
> if ( ctx.checkpointed ) {
> /* This is a checkpointed save, we need these callbacks */
> --
> 1.9.1
>
>
[-- Attachment #1.2: Type: text/html, Size: 15189 bytes --]
[-- Attachment #2: Type: text/plain, Size: 126 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
next prev parent reply other threads:[~2014-07-16 15:38 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-07-09 7:47 [RFC PATCH 0/3] Remus: add remus support for migration v2 Yang Hongyang
2014-07-09 7:47 ` [RFC PATCH 1/3] remus: add a bool var to indicate checkpointed stream Yang Hongyang
2014-07-09 9:45 ` Andrew Cooper
2014-07-09 9:53 ` Hongyang Yang
2014-07-09 7:47 ` [RFC PATCH 2/3] remus: implement remus checkpoint in v2 save Yang Hongyang
2014-07-09 10:53 ` Andrew Cooper
2014-07-10 3:25 ` Hongyang Yang
2014-07-10 8:49 ` Ian Campbell
2014-07-10 9:24 ` Andrew Cooper
2014-07-16 15:22 ` Shriram Rajagopalan
2014-07-16 15:38 ` Andrew Cooper [this message]
2014-07-16 16:02 ` Shriram Rajagopalan
2014-07-16 16:33 ` Andrew Cooper
2014-07-09 7:47 ` [RFC PATCH 3/3] remus: adjust x86 pv restore to support remus Yang Hongyang
2014-07-09 11:16 ` Andrew Cooper
2014-07-09 11:26 ` Andrew Cooper
2014-07-10 3:30 ` Hongyang Yang
2014-07-10 9:25 ` Andrew Cooper
2014-07-10 9:32 ` Hongyang Yang
2014-07-10 9:42 ` Andrew Cooper
2014-07-10 9:47 ` Hongyang Yang
2014-07-09 8:53 ` [RFC PATCH 0/3] Remus: add remus support for migration v2 Ian Campbell
2014-07-09 9:56 ` Hongyang Yang
2014-07-09 9:42 ` Andrew Cooper
2014-07-09 10:06 ` Hongyang Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=53C69C90.1040908@citrix.com \
--to=andrew.cooper3@citrix.com \
--cc=Ian.Campbell@citrix.com \
--cc=Ian.Jackson@eu.citrix.com \
--cc=rshriram@cs.ubc.ca \
--cc=xen-devel@lists.xen.org \
--cc=yanghy@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).