From: Andrew Cooper <andrew.cooper3@citrix.com>
To: rshriram@cs.ubc.ca
Cc: Ian Campbell <ian.campbell@citrix.com>,
Xen-devel <xen-devel@lists.xen.org>
Subject: Re: [Patch v2] tools/migrate: Fix regression when migrating from older version of Xen
Date: Thu, 18 Jul 2013 17:27:25 +0100 [thread overview]
Message-ID: <51E8176D.5030809@citrix.com> (raw)
In-Reply-To: <CAP8mzPNnjpSt_Ma+7Ab7iwWg0Rpa+kg=9rkww6929nk1ftoBBg@mail.gmail.com>
[-- Attachment #1.1: Type: text/plain, Size: 4837 bytes --]
On 18/07/13 17:20, Shriram Rajagopalan wrote:
> On Tue, Jul 16, 2013 at 6:52 AM, Andrew Cooper
> <andrew.cooper3@citrix.com <mailto:andrew.cooper3@citrix.com>> wrote:
>
> Commit 00a4b65f8534c9e6521eab2e6ce796ae36037774 Sep 7 2010
> "libxc: provide notification of final checkpoint to restore end"
> broke migration from any version of Xen using tools from prior to
> that commit
>
> Older tools have no idea about an XC_SAVE_ID_LAST_CHECKPOINT,
> causing newer
> tools xc_domain_restore() to start reading the qemu save record, as
> ctx->last_checkpoint is 0.
>
> The failure looks like:
> xc: error: Max batch size exceeded (1970103633). Giving up.
> where 1970103633 = 0x756d6551 = *(uint32_t*)"Qemu"
>
> Sadly, the simple fix of just setting ctx->last_checkpoint = 1
> will cause an
> opposite function regresson for anyone using the current behaviour of
> save_callbacks->checkpoint().
>
> The only safe fix is to rely on the toolstack to provide this
> information.
>
> Passing 0 results in unchanged behaviour, while passing nonzero
> means "the
> other end of the migration stream does not know about
> XC_SAVE_ID_LAST_CHECKPOINT but is performing a normal migrate"
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com
> <mailto:andrew.cooper3@citrix.com>>
>
> ---
> Changes since v1:
> * Rename "last_checkpoint_unaware" to "checkpointed_stream" to be
> more
> generic yet still accurate as to the meaning and fix for the issue
> ---
> tools/libxc/xc_domain_restore.c | 3 ++-
> tools/libxc/xc_nomigrate.c | 2 +-
> tools/libxc/xenguest.h | 3 ++-
> tools/libxl/libxl_save_helper.c | 2 +-
> tools/xcutils/xc_restore.c | 2 +-
> 5 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/tools/libxc/xc_domain_restore.c
> b/tools/libxc/xc_domain_restore.c
> index 63d36cd..e50b185 100644
> --- a/tools/libxc/xc_domain_restore.c
> +++ b/tools/libxc/xc_domain_restore.c
> @@ -1401,7 +1401,7 @@ int xc_domain_restore(xc_interface *xch, int
> io_fd, uint32_t dom,
> domid_t store_domid, unsigned int
> console_evtchn,
> unsigned long *console_mfn, domid_t
> console_domid,
> unsigned int hvm, unsigned int pae, int
> superpages,
> - int no_incr_generationid,
> + int no_incr_generationid, int
> checkpointed_stream,
> unsigned long *vm_generationid_addr,
> struct restore_callbacks *callbacks)
> {
> @@ -1472,6 +1472,7 @@ int xc_domain_restore(xc_interface *xch, int
> io_fd, uint32_t dom,
>
> ctx->superpages = superpages;
> ctx->hvm = hvm;
> + ctx->last_checkpoint = !!checkpointed_stream;
>
>
> shouldnt it be a single unary NOT ?
> i.e., ctx->last_checkpoint = !checkpointed_stream;
It should, which reminds me why I had it named the way it was before. I
didn't think enough before running sed to change the name as per Ians
suggestion.
With the current naming, you have to set checpointed_stream=1 to get the
current behaviour, which means "possibly checkpointed or possibly not
but new enough that a non-checkpointed stream sends a LAST_CHECKPOINT
anyway"
>
> Otherwise, I am a bit confused. Basically, there is a memset(ctx, 0)
> right before this.
> And you seem to be passing checkpointed_stream = 0 via both xc_restore and
> libxl which basically sets it back to ctx->last_checkpoint = 0.
>
> Also, after getting pages from the last iteration, the code goes like
> this:
> if (ctx->last_checkpoint)
> goto finish;
> get pagebuf or goto finish on error
> get tailbuf or goto finish on error
> goto loadpages;
>
> finish:
> ...
>
> and you setting ctx->last_checkpoint = 0 basically means that you are
> banking on the far end (with older version of tools) to close the
> socket, causing
> "get pagebuf" to fail and subsequently land on finish.
> IIRC, this was the behavior before XC_SAVE_ID_LAST_CHECKPOINT was
> introduced
> by Ian Campbell, to get rid of this benign error message.
That might have been 'fine' for PV guests, but it cause active breakage
for HVM domains where the qemu save record immediately follows in the
migration stream.
~Andrew
>
> "This allows the restore end to finish up without waiting for a
> spurious timeout on the receive fd and thereby avoids unnecessary
> error logging in the case of a successful migration or restore."
>
> Further,
> "In the normal migration or restore case the first checkpoint is always
> the last. For a rolling checkpoint (such as Remus) the notification is
> currently unused "
>
>
[-- Attachment #1.2: Type: text/html, Size: 9506 bytes --]
[-- Attachment #2: Type: text/plain, Size: 126 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
next prev parent reply other threads:[~2013-07-18 16:27 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-16 10:52 [Patch v2] tools/migrate: Fix regression when migrating from older version of Xen Andrew Cooper
2013-07-18 13:01 ` Andrew Cooper
2013-07-18 13:14 ` Ian Campbell
2013-07-18 16:20 ` Shriram Rajagopalan
2013-07-18 16:27 ` Andrew Cooper [this message]
2013-07-18 18:47 ` Shriram Rajagopalan
2013-07-22 17:52 ` Andrew Cooper
2013-07-22 17:59 ` Ian Campbell
2013-07-22 18:06 ` Andrew Cooper
2013-07-19 9:36 ` Ian Campbell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=51E8176D.5030809@citrix.com \
--to=andrew.cooper3@citrix.com \
--cc=ian.campbell@citrix.com \
--cc=rshriram@cs.ubc.ca \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).