From mboxrd@z Thu Jan  1 00:00:00 1970
From: Hongyang Yang <yanghy@cn.fujitsu.com>
Subject: Re: [PATCH Remus v2 00/10] Remus support for
	Migration-v2
Date: Mon, 11 May 2015 18:48:38 +0800
Message-ID: <55508906.1010003@cn.fujitsu.com>
References: <1431077610-3366-1-git-send-email-yanghy@cn.fujitsu.com>
	<554CFC90.1030301@citrix.com> <55504C20.8020009@cn.fujitsu.com>
	<55506F94.7070407@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <55506F94.7070407@citrix.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper <andrew.cooper3@citrix.com>, xen-devel@lists.xen.org
Cc: wei.liu2@citrix.com, ian.campbell@citrix.com, wency@cn.fujitsu.com, ian.jackson@eu.citrix.com, yunhong.jiang@intel.com, eddie.dong@intel.com, rshriram@cs.ubc.ca
List-Id: xen-devel@lists.xenproject.org


On 05/11/2015 05:00 PM, Andrew Cooper wrote:
> On 11/05/15 07:28, Hongyang Yang wrote:
>> On 05/09/2015 02:12 AM, Andrew Cooper wrote:
>>> On 08/05/15 10:33, Yang Hongyang wrote:
>>>> This patchset implement the Remus support for Migration v2 but without
>>>> memory compressing.
[...]
>
>>
>> <last iter of memory>
>>
>>> end_of_checkpoint()
>>> Checkpoint record
>>
>> ctx->save.callbacks->postcopy()
>> this callback should not be omitted, it do some necessary work before
>> resume
>> primary (such as call Remus devices preresume callbacks to ensure the
>> disk
>> data is consistent) and then resume the primary guest. I think this
>> callback should be renamed to ctx->save.callbacks->resume().
>
> That looks to be a useful cleanup (and answers one of my questions of
> what exactly postcopy was)
>
>>
>>>               ctx->save.callbacks->checkpoint()
>>>                                   libxl qemu record
>>
>> Maybe we should add another callback to send qemu record instead of
>> using checkpoint callback. We can call it
>> ctx->save.callbacks->save_qemu()
>
> This is another layering violation.  libxc should not prescribe what
> libxl might or might not do.  One example we are experimenting with in
> XenServer at the moment is support for multiple emulators attached to a
> single domain, which would necessitate two LIBXL_EMULATOR records to be
> sent per checkpoint.  libxl might also want to send an updated json blob
> or such.

Ok, so we'd better not introduce save_qemu callback.

>
>> Then in checkpoint callback, we only call remus devices commit callbacks(
>> which will release the network buffer etc...) then decide whether we
>> need to
>> do another checkpoint or quit checkpointed stream.
>> With Remus, checkpoint callback only wait for 200ms(can be specified
>> by -i)
>> then return.
>> With COLO, checkpoint callback will ask COLO proxy if we need to do a
>> checkpoint, will return when COLO proxy module indicate a checkpoint
>> is needed.
>
> That sounds like COLO wants a should_checkpoint() callback which
> separates the decision to make a checkpoint from the logic of
> implementing a checkpoint.

We use checkpoint callback to do should_checkpoint() thing currently.
libxc will check the return value of checkpoint callback.

>
>>
>>>                                   ...
>>>                                   libxl end-of-checkpoint record
>>>               ctx->save.callbacks->checkpoint() returns
>>> start_of_checkpoint()
>>
>> ctx->save.callbacks->suspend()
>>
>>> <memory>
>>> end_of_checkpoint()
>>> Checkpoint record
>>> etc...
>>>
>>> This will eventually allow both libxc and libxl to send checkpoint data
>>> (and by the looks of it, remove the need for postcopy()).  With this
>>> libxc/remus work it is fine to use XG_LIBXL_HVM_COMPAT to cover the
>>> current qemu situation, but I would prefer not to be also retrofitting
>>> libxc checkpoint records when doing the libxl/migv2 work.
>>>
>>> Does this look plausible in for Remus (and eventually COLO) support?
>>
>> With comments above, I would suggest the save flow as below:
>>
>> libxc writes:                   libxl writes:
>>
>> live migration:
>> Image Header
>> Domain Header
>> start_of_stream()
>> start_of_checkpoint()
>> <live memory>
>> ctx->save.callbacks->suspend()
>> <last iter memory>
>> end_of_checkpoint()
>> if ( checkpointd )
>>    End of Checkpoint record
>>    /*If resotre side receives this record, input fd should be handed to
>> libxl*/
>> else
>>    goto end
>>
>> loop of checkpointed stream:
>> ctx->save.callbacks->resume()
>> ctx->save.callbacks->save_qemu()
>>                                  libxl qemu record
>>                                  ...
>>                                  libxl end-of-checkpoint record
>> /*If resotre side receives this record, input fd should be handed to
>> libxc*/
>> ctx->save.callbacks->save_qemu() returns
>> ctx->save.callbacks->checkpoint()
>> start_of_checkpoint()
>> ctx->save.callbacks->suspend()
>> <memory>
>> end_of_checkpoint()
>> End of Checkpoint record
>> goto 'loop of checkpointed stream'
>>
>> end:
>> END record
>> /*If resotre side receives this record, input fd should be handed to
>> libxl*/
>>
>>
>> In order to keep it simple, we can keep the current
>> ctx->save.callbacks->checkpoint() as it is, which do the save_qemu
>> thing, call
>> Remus devices commit callbacks and then decide whether we need a
>> checkpoint. We
>> can also combine the ctx->save.callbacks->resume() with
>> ctx->save.callbacks->checkpoint(), with only one checkpoint()
>> callback, we do
>> the following things:
>>   - Call Remus devices preresume callbacks
>>   - Resume the primary
>>   - Save qemu records
>>   - Call Remus devices commit callbacks
>>   - Decide whether we need a checkpoint
>>
>> Overall, there are 3 options for the save flow:
>> 1. keep the current callbacks, rename postcopy() to resume()
>> 2. split the checkpoint() callback to save_qemu() and checkpoint()
>> 3. combine the current postcopy() and checkpoint()
>> Which one do you think is the best?
>
> I have a 4th alternative in mind, but would like your feedback from my
> comments in this email first.

So what's the 4th alternative?

>
> ~Andrew
> .
>

-- 
Thanks,
Yang.