All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Hongyang Yang <yanghy@cn.fujitsu.com>, xen-devel@lists.xen.org
Cc: wei.liu2@citrix.com, ian.campbell@citrix.com,
	wency@cn.fujitsu.com, ian.jackson@eu.citrix.com,
	yunhong.jiang@intel.com, eddie.dong@intel.com,
	rshriram@cs.ubc.ca
Subject: Re: [PATCH Remus v2 00/10] Remus support for Migration-v2
Date: Mon, 11 May 2015 12:01:33 +0100	[thread overview]
Message-ID: <55508C0D.6060904@citrix.com> (raw)
In-Reply-To: <55508906.1010003@cn.fujitsu.com>

On 11/05/15 11:48, Hongyang Yang wrote:
>
> On 05/11/2015 05:00 PM, Andrew Cooper wrote:
>> On 11/05/15 07:28, Hongyang Yang wrote:
>>> On 05/09/2015 02:12 AM, Andrew Cooper wrote:
>>>> On 08/05/15 10:33, Yang Hongyang wrote:
>>>>> This patchset implement the Remus support for Migration v2 but
>>>>> without
>>>>> memory compressing.
> [...]
>>
>>>
>>> <last iter of memory>
>>>
>>>> end_of_checkpoint()
>>>> Checkpoint record
>>>
>>> ctx->save.callbacks->postcopy()
>>> this callback should not be omitted, it do some necessary work before
>>> resume
>>> primary (such as call Remus devices preresume callbacks to ensure the
>>> disk
>>> data is consistent) and then resume the primary guest. I think this
>>> callback should be renamed to ctx->save.callbacks->resume().
>>
>> That looks to be a useful cleanup (and answers one of my questions of
>> what exactly postcopy was)
>>
>>>
>>>>               ctx->save.callbacks->checkpoint()
>>>>                                   libxl qemu record
>>>
>>> Maybe we should add another callback to send qemu record instead of
>>> using checkpoint callback. We can call it
>>> ctx->save.callbacks->save_qemu()
>>
>> This is another layering violation.  libxc should not prescribe what
>> libxl might or might not do.  One example we are experimenting with in
>> XenServer at the moment is support for multiple emulators attached to a
>> single domain, which would necessitate two LIBXL_EMULATOR records to be
>> sent per checkpoint.  libxl might also want to send an updated json blob
>> or such.
>
> Ok, so we'd better not introduce save_qemu callback.
>
>>
>>> Then in checkpoint callback, we only call remus devices commit
>>> callbacks(
>>> which will release the network buffer etc...) then decide whether we
>>> need to
>>> do another checkpoint or quit checkpointed stream.
>>> With Remus, checkpoint callback only wait for 200ms(can be specified
>>> by -i)
>>> then return.
>>> With COLO, checkpoint callback will ask COLO proxy if we need to do a
>>> checkpoint, will return when COLO proxy module indicate a checkpoint
>>> is needed.
>>
>> That sounds like COLO wants a should_checkpoint() callback which
>> separates the decision to make a checkpoint from the logic of
>> implementing a checkpoint.
>
> We use checkpoint callback to do should_checkpoint() thing currently.
> libxc will check the return value of checkpoint callback.

But that causes a chicken & egg problem.

I am planning to use a CHECKPOINT record to synchronise the transfer of
ownership of the FD between libxc and libxl.  Therefore, a CHECKPOINT
record must be in the stream ahead of the checkpoint() callback, as
libxl will then write/read some records in itself.

As a result, the checkpoint() callback itself can't be used to gate
whether a CHECKPOINT record is written by libxc.

>
>>
>>>
>>>>                                   ...
>>>>                                   libxl end-of-checkpoint record
>>>>               ctx->save.callbacks->checkpoint() returns
>>>> start_of_checkpoint()
>>>
>>> ctx->save.callbacks->suspend()
>>>
>>>> <memory>
>>>> end_of_checkpoint()
>>>> Checkpoint record
>>>> etc...
>>>>
>>>> This will eventually allow both libxc and libxl to send checkpoint
>>>> data
>>>> (and by the looks of it, remove the need for postcopy()).  With this
>>>> libxc/remus work it is fine to use XG_LIBXL_HVM_COMPAT to cover the
>>>> current qemu situation, but I would prefer not to be also retrofitting
>>>> libxc checkpoint records when doing the libxl/migv2 work.
>>>>
>>>> Does this look plausible in for Remus (and eventually COLO) support?
>>>
>>> With comments above, I would suggest the save flow as below:
>>>
>>> libxc writes:                   libxl writes:
>>>
>>> live migration:
>>> Image Header
>>> Domain Header
>>> start_of_stream()
>>> start_of_checkpoint()
>>> <live memory>
>>> ctx->save.callbacks->suspend()
>>> <last iter memory>
>>> end_of_checkpoint()
>>> if ( checkpointd )
>>>    End of Checkpoint record
>>>    /*If resotre side receives this record, input fd should be handed to
>>> libxl*/
>>> else
>>>    goto end
>>>
>>> loop of checkpointed stream:
>>> ctx->save.callbacks->resume()
>>> ctx->save.callbacks->save_qemu()
>>>                                  libxl qemu record
>>>                                  ...
>>>                                  libxl end-of-checkpoint record
>>> /*If resotre side receives this record, input fd should be handed to
>>> libxc*/
>>> ctx->save.callbacks->save_qemu() returns
>>> ctx->save.callbacks->checkpoint()
>>> start_of_checkpoint()
>>> ctx->save.callbacks->suspend()
>>> <memory>
>>> end_of_checkpoint()
>>> End of Checkpoint record
>>> goto 'loop of checkpointed stream'
>>>
>>> end:
>>> END record
>>> /*If resotre side receives this record, input fd should be handed to
>>> libxl*/
>>>
>>>
>>> In order to keep it simple, we can keep the current
>>> ctx->save.callbacks->checkpoint() as it is, which do the save_qemu
>>> thing, call
>>> Remus devices commit callbacks and then decide whether we need a
>>> checkpoint. We
>>> can also combine the ctx->save.callbacks->resume() with
>>> ctx->save.callbacks->checkpoint(), with only one checkpoint()
>>> callback, we do
>>> the following things:
>>>   - Call Remus devices preresume callbacks
>>>   - Resume the primary
>>>   - Save qemu records
>>>   - Call Remus devices commit callbacks
>>>   - Decide whether we need a checkpoint
>>>
>>> Overall, there are 3 options for the save flow:
>>> 1. keep the current callbacks, rename postcopy() to resume()
>>> 2. split the checkpoint() callback to save_qemu() and checkpoint()
>>> 3. combine the current postcopy() and checkpoint()
>>> Which one do you think is the best?
>>
>> I have a 4th alternative in mind, but would like your feedback from my
>> comments in this email first.
>
> So what's the 4th alternative?

I have some corrections to my patch series based on David's feedback,
and your comments.  After that, it should hopefully be far easier to
describe.

~Andrew

  reply	other threads:[~2015-05-11 11:01 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-08  9:33 [PATCH Remus v2 00/10] Remus support for Migration-v2 Yang Hongyang
2015-05-08  9:33 ` [PATCH Remus v2 01/10] tools/libxc: adjust the memory allocation for migration Yang Hongyang
2015-05-08  9:51   ` Andrew Cooper
2015-05-11 11:50   ` Ian Campbell
2015-05-12  6:43     ` Hongyang Yang
2015-05-08  9:33 ` [PATCH Remus v2 02/10] tools/libxc: introduce setup() and cleanup() on save Yang Hongyang
2015-05-08  9:45   ` Andrew Cooper
2015-05-08  9:59     ` Hongyang Yang
2015-05-08 10:08       ` Andrew Cooper
2015-05-11  1:20         ` Hongyang Yang
2015-05-11 11:47           ` Ian Campbell
2015-05-11 11:49             ` Ian Campbell
2015-05-12  7:04               ` Yang Hongyang
2015-05-08  9:33 ` [PATCH Remus v2 03/10] tools/libxc: rename send_some_pages to send_dirty_pages Yang Hongyang
2015-05-08 10:11   ` Andrew Cooper
2015-05-11  1:21     ` Hongyang Yang
2015-05-08  9:33 ` [PATCH Remus v2 04/10] tools/libxc: introduce DECLARE_HYPERCALL_BUFFER_USER_POINTER Yang Hongyang
2015-05-08 10:16   ` Andrew Cooper
2015-05-11  1:22     ` Hongyang Yang
2015-05-11 11:53   ` Ian Campbell
2015-05-12  7:18     ` Yang Hongyang
2015-05-12  8:19       ` Ian Campbell
2015-05-12  9:24         ` Yang Hongyang
2015-05-12  9:43           ` Ian Campbell
2015-05-12  9:48             ` Yang Hongyang
2015-05-08  9:33 ` [PATCH Remus v2 05/10] tools/libxc: reuse send_dirty_pages() in send_all_pages() Yang Hongyang
2015-05-08 10:17   ` Andrew Cooper
2015-05-08  9:33 ` [PATCH Remus v2 06/10] tools/libxc: introduce process_record() Yang Hongyang
2015-05-08  9:33 ` [PATCH Remus v2 07/10] tools/libxc: split read/handle qemu info Yang Hongyang
2015-05-08  9:33 ` [PATCH Remus v2 08/10] tools/libxc: implement Remus checkpointed save Yang Hongyang
2015-05-08  9:33 ` [PATCH Remus v2 09/10] tools/libxc: implement Remus checkpointed restore Yang Hongyang
2015-05-08  9:33 ` [PATCH Remus v2 10/10] tools/libxc: X86_PV_INFO can be sent multiple times under Remus Yang Hongyang
2015-05-08 18:12 ` [PATCH Remus v2 00/10] Remus support for Migration-v2 Andrew Cooper
2015-05-11  6:28   ` Hongyang Yang
2015-05-11  9:00     ` Andrew Cooper
2015-05-11 10:48       ` Hongyang Yang
2015-05-11 11:01         ` Andrew Cooper [this message]
2015-05-12  8:12           ` Yang Hongyang
2015-05-12  9:40             ` Andrew Cooper
2015-05-12 10:02               ` Yang Hongyang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55508C0D.6060904@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=eddie.dong@intel.com \
    --cc=ian.campbell@citrix.com \
    --cc=ian.jackson@eu.citrix.com \
    --cc=rshriram@cs.ubc.ca \
    --cc=wei.liu2@citrix.com \
    --cc=wency@cn.fujitsu.com \
    --cc=xen-devel@lists.xen.org \
    --cc=yanghy@cn.fujitsu.com \
    --cc=yunhong.jiang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.