From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Ian Campbell <ian.campbell@citrix.com>
Cc: rshriram@cs.ubc.ca, Xen-devel <xen-devel@lists.xen.org>
Subject: Re: [Patch v2] tools/migrate: Fix regression when migrating from older version of Xen
Date: Mon, 22 Jul 2013 19:06:39 +0100 [thread overview]
Message-ID: <51ED74AF.9010605@citrix.com> (raw)
In-Reply-To: <1374515941.6623.30.camel@hastur.hellion.org.uk>
On 22/07/13 18:59, Ian Campbell wrote:
> On Mon, 2013-07-22 at 18:52 +0100, Andrew Cooper wrote:
>> On 18/07/13 19:47, Shriram Rajagopalan wrote:
>>
>>> On Thu, Jul 18, 2013 at 12:27 PM, Andrew Cooper
>>> <andrew.cooper3@citrix.com> wrote:
>>> On 18/07/13 17:20, Shriram Rajagopalan wrote:
>>>
>>> > On Tue, Jul 16, 2013 at 6:52 AM, Andrew Cooper
>>> > <andrew.cooper3@citrix.com> wrote:
>>> > and you setting ctx->last_checkpoint = 0 basically means
>>> > that you are
>>> > banking on the far end (with older version of tools) to
>>> > close the socket, causing
>>> > "get pagebuf" to fail and subsequently land on finish.
>>> > IIRC, this was the behavior before
>>> > XC_SAVE_ID_LAST_CHECKPOINT was introduced
>>> > by Ian Campbell, to get rid of this benign error message.
>>> >
>>>
>>>
>>> That might have been 'fine' for PV guests, but it cause
>>> active breakage for HVM domains where the qemu save record
>>> immediately follows in the migration stream.
>>>
>>>
>>>
>>>
>>> Just to clarify.. the code flows like this, iirc.
>>>
>>>
>>> loadpages:
>>> while (1)
>>> if !completed
>>> get pagebufs
>>> if 0 pages sent, break
>>> endif
>>> apply batch (pagebufs)
>>> endwhile
>>>
>>>
>>> if !completed
>>> get tailbuf [[this is where the QEMU record would be obtained]]
>>> completed = 1
>>> endif
>>>
>>>
>>> if last_checkpoint
>>> goto finish
>>> endif
>>>
>>>
>>> get pagebuf or goto finish on error ---> this is where old code
>>> used to exit
>>> get tailbuf
>>> goto loadpages
>>> finish:
>>> apply tailbuf [tailbuf obtained inside the 'if !completed' block]
>>> do the rest of the restore
>> (Logically joining the two divergent threads, as this is the answer to
>> both)
>>
>> This has nothing to do with the buffering mode, and that is not where
>> the Qemu record would be obtained from.
>>
>> As the code currently stands, if a XC_SAVE_ID_LAST_CHECKPOINT chunk is
>> not seen in the stream, we complete the loadpages: section, including
>> the magic pages (TSS, console info etc).
>>
>> We then read the buffer_tail() on line 171, set ctx->completed on line
>> 1725, but fail the ctx->last_checkpoint check on line 1758.
>>
>> What we should do is pass the last_checkpoint test, and goto finish
>> which then calls dump_qemu(). What actually happens is a call to
>> pagebuf_get() on line 1766 which raises an error because of finding a
>> Qemu save record rather than more pages.
>>
>> So this is very much a bug directly introduced by 00a4b65f85, and can
>> only be fixed with knowledge from the higher levels of the toolstack.
>>
>> As for the wording of the parameter, I still prefer the original
>> "last_checkpoint_unaware" over "checkpointed_stream" as it is more
>> accurate.
>>
>> Any migration stream started from a version of the tools after c/s
>> 00a4b65f85 will work, whether it is checkpointed or not (as the
>> XC_SAVE_ID_LAST_CHECKPOINT chunk will be sent in the correct place).
>> Any migration started from a version of the tools before c/s
>> 00a4b65f85 will fail because it has no idea that the receiving end is
>> expecting it to insert XC_SAVE_ID_LAST_CHECKPOINT chunk. The fault
>> here is with the receiving end expecting to find a
>> XC_SAVE_ID_LAST_CHECKPOINT chunk.
>>
>> The only fix is for newer toolstack to be aware that the migration
>> stream is from an older toolstack, and to set
>> last_checkpoint_unaware=1, which will set ctx->last_checkpoint to 1,
>> allowing the receiving side of the migration to correctly read the
>> migration stream.
> The toolstack cannot in general know that (i.e. how does xl know?)
>
> It does know if it is doing checkpointing or not though.
>
> There are four cases:
>
> Regular migrate from pre-00a4b65f85 to post-00a4b65f85. Receiver sets
> last_checkpoint = 1 at start of day. Doesn't care that sender never
> sends XC_SAVE_ID_LAST_CHECKPOINT.
>
> Regular migrate from post-00a4b65f85 to post-00a4b65f85. Receiver sets
> last_checkpoint = 1 at start of day. Doesn't care that it sees
> XC_SAVE_ID_LAST_CHECKPOINT again, just ends up (pointlessly) setting
> last_checkpoint = 1.
>
> Checkpointed migrate from post-00a4b65f85 to post-00a4b65f85. Receiver
> doesn't set last_checkpoint = 1 at start of day.
> XC_SAVE_ID_LAST_CHECKPOINT is sent at the right time and both sides
> agree etc. Things work.
>
> Checkpointed migrate from pre-00a4b65f85 to post-00a4b65f85. We don't
> support this, because checkpoints should only be supported between like
> versions of Xen (N->N+1 case doesn't apply).
>
> Ian.
>
>
Ok - that seems to make it easier. I will see about plumbing the
correct information through from xend/remus.
~Andrew
next prev parent reply other threads:[~2013-07-22 18:06 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-16 10:52 [Patch v2] tools/migrate: Fix regression when migrating from older version of Xen Andrew Cooper
2013-07-18 13:01 ` Andrew Cooper
2013-07-18 13:14 ` Ian Campbell
2013-07-18 16:20 ` Shriram Rajagopalan
2013-07-18 16:27 ` Andrew Cooper
2013-07-18 18:47 ` Shriram Rajagopalan
2013-07-22 17:52 ` Andrew Cooper
2013-07-22 17:59 ` Ian Campbell
2013-07-22 18:06 ` Andrew Cooper [this message]
2013-07-19 9:36 ` Ian Campbell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=51ED74AF.9010605@citrix.com \
--to=andrew.cooper3@citrix.com \
--cc=ian.campbell@citrix.com \
--cc=rshriram@cs.ubc.ca \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).