xen-devel.lists.xenproject.org archive mirror
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: rshriram@cs.ubc.ca
Cc: Ian Campbell <ian.campbell@citrix.com>,
	Xen-devel <xen-devel@lists.xen.org>
Subject: Re: [Patch v2] tools/migrate: Fix regression when migrating from older version of Xen
Date: Mon, 22 Jul 2013 18:52:03 +0100	[thread overview]
Message-ID: <51ED7143.6090006@citrix.com> (raw)
In-Reply-To: <CAP8mzPNdgUqoi5ue+n+FNwtoEnr1PuoWBy0pYeehhi2grp17SA@mail.gmail.com>



On 18/07/13 19:47, Shriram Rajagopalan wrote:
> On Thu, Jul 18, 2013 at 12:27 PM, Andrew Cooper
> <andrew.cooper3@citrix.com <mailto:andrew.cooper3@citrix.com>> wrote:
>
>     On 18/07/13 17:20, Shriram Rajagopalan wrote:
>>     On Tue, Jul 16, 2013 at 6:52 AM, Andrew Cooper
>>     <andrew.cooper3@citrix.com <mailto:andrew.cooper3@citrix.com>>
>>     wrote:
>>     and you setting ctx->last_checkpoint = 0 basically means that you
>>     are banking on the far end (with an older version of the tools) to
>>     close the socket, causing "get pagebuf" to fail and subsequently
>>     land on finish.
>>     IIRC, this was the behavior before XC_SAVE_ID_LAST_CHECKPOINT was
>>     introduced
>>     by Ian Campbell, to get rid of this benign error message.
>
>
>     That might have been 'fine' for PV guests, but it causes active
>     breakage for HVM domains, where the qemu save record immediately
>     follows in the migration stream.
>
>
> Just to clarify.. the code flows like this, iirc.
>
> loadpages:
>  while (1)
>      if !completed
>         get pagebufs
>         if 0 pages sent, break
>      endif
>      apply batch (pagebufs)
>  endwhile
>
>  if !completed
>    get tailbuf [[this is where the QEMU record would be obtained]]
>    completed = 1
>  endif
>
>  if last_checkpoint
>    goto finish
>  endif
>
>  get pagebuf, or goto finish on error ---> this is where old code used to exit
>  get tailbuf
>  goto loadpages
> finish:
>    apply tailbuf [tailbuf obtained inside the 'if !completed' block]
>    do the rest of the restore

(Logically joining the two divergent threads, as this is the answer to both)

This has nothing to do with the buffering mode, and that is not where
the Qemu record would be obtained from.

As the code currently stands, if an XC_SAVE_ID_LAST_CHECKPOINT chunk is
not seen in the stream, we complete the loadpages: section, including
the magic pages (TSS, console info, etc.).

We then read the tail buffer via buffer_tail() on line 171, set
ctx->completed on line 1725, but fail the ctx->last_checkpoint check on
line 1758.

What we should do is pass the last_checkpoint test, and goto finish
which then calls dump_qemu().  What actually happens is a call to
pagebuf_get() on line 1766 which raises an error because of finding a
Qemu save record rather than more pages.

So this is very much a bug directly introduced by 00a4b65f85, and can
only be fixed with knowledge from the higher levels of the toolstack.

As for the wording of the parameter, I still prefer the original
"last_checkpoint_unaware" over "checkpointed_stream" as it is more accurate.

Any migration stream started from a version of the tools after c/s
00a4b65f85 will work, whether it is checkpointed or not (as the
XC_SAVE_ID_LAST_CHECKPOINT chunk will be sent in the correct place).
Any migration started from a version of the tools before c/s 00a4b65f85
will fail, because it has no idea that the receiving end expects it to
insert an XC_SAVE_ID_LAST_CHECKPOINT chunk.  The fault here lies with
the receiving end, which expects to find an XC_SAVE_ID_LAST_CHECKPOINT
chunk.

The only fix is for the newer toolstack to be aware that the migration
stream is from an older toolstack, and to set last_checkpoint_unaware=1,
which will set ctx->last_checkpoint to 1, allowing the receiving side of
the migration to read the migration stream correctly.

~Andrew



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  reply	other threads:[~2013-07-22 17:52 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-16 10:52 [Patch v2] tools/migrate: Fix regression when migrating from older version of Xen Andrew Cooper
2013-07-18 13:01 ` Andrew Cooper
2013-07-18 13:14   ` Ian Campbell
2013-07-18 16:20 ` Shriram Rajagopalan
2013-07-18 16:27   ` Andrew Cooper
2013-07-18 18:47     ` Shriram Rajagopalan
2013-07-22 17:52       ` Andrew Cooper [this message]
2013-07-22 17:59         ` Ian Campbell
2013-07-22 18:06           ` Andrew Cooper
2013-07-19  9:36     ` Ian Campbell
