From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Yang Hongyang <yanghy@cn.fujitsu.com>, xen-devel@lists.xen.org
Cc: wei.liu2@citrix.com, ian.campbell@citrix.com,
wency@cn.fujitsu.com, guijianfeng@cn.fujitsu.com,
yunhong.jiang@intel.com, eddie.dong@intel.com,
rshriram@cs.ubc.ca, ian.jackson@eu.citrix.com
Subject: Re: [PATCH v6 COLO 06/15] libxc/save: support COLO save
Date: Tue, 09 Jun 2015 08:20:42 +0100 [thread overview]
Message-ID: <557693CA.9090700@citrix.com> (raw)
In-Reply-To: <55765A49.9010101@cn.fujitsu.com>
On 09/06/2015 04:15, Yang Hongyang wrote:
>
>
> On 06/08/2015 09:04 PM, Andrew Cooper wrote:
>> On 08/06/15 04:45, Yang Hongyang wrote:
>>> call callbacks->get_dirty_pfn() after suspend primary vm to
>>> get dirty pages on secondary vm, and send pages both dirty on
>>> primary/secondary to secondary.
>>>
>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>> ---
>>> tools/libxc/xc_sr_save.c | 49
>>> +++++++++++++++++++++++++++++++++++++++++++++++-
>>> 1 file changed, 48 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
>>> index d63b783..cda61ed 100644
>>> --- a/tools/libxc/xc_sr_save.c
>>> +++ b/tools/libxc/xc_sr_save.c
>>> @@ -515,6 +515,31 @@ static int send_memory_live(struct
>>> xc_sr_context *ctx)
>>> return rc;
>>> }
>>>
>>> +static int update_dirty_bitmap(uint8_t *(*get_dirty_pfn)(void *),
>>> void *data,
>>> + unsigned long p2m_size, unsigned
>>> long *bitmap)
>>
>> This function should take a ctx rather than having the caller expand 3
>> parameters. Also, "update_dirty_bitmap" is a little misleading, as it
>> isn't querying the hypervisor for the dirty bitmap.
>
> ok.
(Merging the other thread)
> how about merge_secondary_dirty_bitmap()?
Much better!
>
>>
>>> +{
>>> + uint64_t *pfn_list;
>>> + uint64_t count, i;
>>> + uint64_t pfn;
>>> +
>>> + pfn_list = (uint64_t *)get_dirty_pfn(data);
>>
>> This looks like a recipe for width-errors. The get_dirty_pfn() call
>> should take a pointer to a struct for it to fill.
>
> but the size is unknown for the caller.pfn_list[0] is the count of
> pfn.
>
>>
>>> + assert(pfn_list);
>>
>> This should turn into an error rather than an abort().
>
> Even if there are no dirty pages on secondary, pfn_list shouldn't be
> NULL, it's just that pfn_list[0] will be 0. if pfn_list is NULL,
> there might be unexpected error happened.
get_dirty_pfn() should be declared alongside a
struct pfn_data
{
uint64_t count;
uint64_t *pfns;
};
and this function here should create one of these on the stack and pass
it by pointer to get_dirty_pfn(). I might also be tempted to rename
this to get_remote_logdirty() or similar, to indicate that it is a
source of logdirty data from something other than the current hypervisor.
>
>>
>>> +
>>> + count = pfn_list[0];
>>> + for (i = 0; i < count; i++) {
>>
>> style
>>
>>> + pfn = pfn_list[i + 1];
>>> + if (pfn > p2m_size) {
>>> + errno = EINVAL;
>>> + return -1;
>>> + }
>>> +
>>> + set_bit(pfn, bitmap);
>>> + }
>>> +
>>> + free(pfn_list);
>>> + return 0;
>>> +}
>>> +
>>> /*
>>> * Suspend the domain and send dirty memory.
>>> * This is the last iteration of the live migration and the
>>> @@ -555,6 +580,19 @@ static int suspend_and_send_dirty(struct
>>> xc_sr_context *ctx)
>>>
>>> bitmap_or(dirty_bitmap, ctx->save.deferred_pages,
>>> ctx->save.p2m_size);
>>>
>>> + if ( !ctx->save.live && ctx->save.callbacks->get_dirty_pfn )
>>> + {
>>
>> Shouldn't get_dirty_pfn be mandatory for COLO streams (even if it is a
>> noop to start with) ?
>
> It should be mandatory, it shouldn't be noop under COLO. perhaps we
> should
> add sanity check at the beginning. But problem is save side do not
> have a param
> passed from libxl to indicate the stream type(like checkpointed_stream in
> restore side). So we may need to add another XCFLAGS? Currently there is
> XCFLAGS_CHECKPOINTED which represents Remus, we might need to change
> this to
> XCFLAGS_STREAM_REMUS
> XCFLAGS_STREAM_COLO
> so that we can know what kind of stream we are handling?
checkpointed_stream started out as a bugfix for a legacy stream
migration breakage. Really, this information should have been passed
right from the start.
It would probably be best to take the enum{} suggested elsewhere and
make it a top level ctx item, and have it present for both save and
restore, with sutable parameters passed in from the top. (When I am
finally able to take out the legacy code, there is going to be a severe
pruning/consolidation of the parameters.)
~Andrew
>
>>
>> ~Andrew
>>
>>> + rc = update_dirty_bitmap(ctx->save.callbacks->get_dirty_pfn,
>>> + ctx->save.callbacks->data,
>>> + ctx->save.p2m_size,
>>> + dirty_bitmap);
>>> + if ( rc )
>>> + {
>>> + PERROR("Failed to get secondary vm's dirty pages");
>>> + goto out;
>>> + }
>>> + }
>>> +
>>> rc = send_dirty_pages(ctx, stats.dirty_count +
>>> ctx->save.nr_deferred_pages);
>>> if ( rc )
>>> goto out;
>>> @@ -784,7 +822,16 @@ static int save(struct xc_sr_context *ctx,
>>> uint16_t guest_type)
>>> if ( rc )
>>> goto err;
>>>
>>> - ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
>>> + rc =
>>> ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
>>> + if ( !rc ) {
>>> + if ( !errno )
>>> + {
>>> + /* Postcopy request failed (without errno,
>>> using EINVAL) */
>>> + errno = EINVAL;
>>> + }
>>> + rc = -1;
>>> + goto err;
>>> + }
>>>
>>> rc =
>>> ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
>>> if ( rc <= 0 )
>>
>> .
>>
>
next prev parent reply other threads:[~2015-06-09 7:20 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-08 3:45 [PATCH v6 COLO 00/15] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
2015-06-08 3:45 ` [PATCH v6 COLO 01/15] docs: add colo readme Yang Hongyang
2015-06-16 10:56 ` Ian Campbell
2015-06-24 9:13 ` Yang Hongyang
2015-06-08 3:45 ` [PATCH v6 COLO 02/15] secondary vm suspend/resume/checkpoint code Yang Hongyang
2015-06-12 14:23 ` Wei Liu
2015-06-12 14:51 ` Ian Jackson
2015-06-15 2:10 ` Yang Hongyang
2015-06-15 1:55 ` Yang Hongyang
2015-06-16 11:42 ` Ian Jackson
2015-06-08 3:45 ` [PATCH v6 COLO 03/15] primary vm suspend/get_dirty_pfn/resume/checkpoint code Yang Hongyang
2015-06-16 11:05 ` Ian Campbell
2015-06-08 3:45 ` [PATCH v6 COLO 04/15] libxc/restore: support COLO restore Yang Hongyang
2015-06-08 10:39 ` Andrew Cooper
2015-06-08 14:06 ` Yang Hongyang
2015-06-08 3:45 ` [PATCH v6 COLO 05/15] send store mfn and console mfn to xl before resuming secondary vm Yang Hongyang
2015-06-08 12:16 ` Andrew Cooper
2015-06-08 14:08 ` Yang Hongyang
2015-06-16 11:13 ` Ian Campbell
2015-06-08 3:45 ` [PATCH v6 COLO 06/15] libxc/save: support COLO save Yang Hongyang
2015-06-08 13:04 ` Andrew Cooper
2015-06-09 3:15 ` Yang Hongyang
2015-06-09 7:20 ` Andrew Cooper [this message]
2015-06-09 8:45 ` Yang Hongyang
2015-06-09 8:51 ` Andrew Cooper
2015-06-09 9:09 ` Yang Hongyang
2015-06-09 9:10 ` Andrew Cooper
2015-06-09 9:16 ` Yang Hongyang
2015-06-09 3:18 ` Yang Hongyang
2015-06-08 3:45 ` [PATCH v6 COLO 07/15] implement the cmdline for COLO Yang Hongyang
2015-06-16 11:19 ` Ian Campbell
2015-06-25 4:06 ` Yang Hongyang
2015-07-14 15:14 ` Ian Campbell
2015-06-08 3:45 ` [PATCH v6 COLO 08/15] Support colo mode for qemu disk Yang Hongyang
2015-06-16 11:21 ` Ian Campbell
2015-06-08 3:45 ` [PATCH v6 COLO 09/15] COLO: use qemu block replication Yang Hongyang
2015-06-16 11:22 ` Ian Campbell
2015-06-08 3:45 ` [PATCH v6 COLO 10/15] COLO proxy: implement setup/teardown of COLO proxy module Yang Hongyang
2015-06-16 11:24 ` Ian Campbell
2015-06-16 11:26 ` Ian Campbell
2015-06-25 5:22 ` Yang Hongyang
2015-06-25 8:39 ` Ian Campbell
2015-06-25 8:48 ` Yang Hongyang
2015-06-08 3:45 ` [PATCH v6 COLO 11/15] COLO proxy: preresume, postresume and checkpoint Yang Hongyang
2015-06-08 3:45 ` [PATCH v6 COLO 12/15] COLO nic: implement COLO nic subkind Yang Hongyang
2015-06-12 14:35 ` Wei Liu
2015-06-15 2:13 ` Yang Hongyang
2015-06-08 3:45 ` [PATCH v6 COLO 13/15] setup and control colo proxy on primary side Yang Hongyang
2015-06-08 3:45 ` [PATCH v6 COLO 14/15] setup and control colo proxy on secondary side Yang Hongyang
2015-06-08 3:45 ` [PATCH v6 COLO 15/15] cmdline switches and config vars to control colo-proxy Yang Hongyang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=557693CA.9090700@citrix.com \
--to=andrew.cooper3@citrix.com \
--cc=eddie.dong@intel.com \
--cc=guijianfeng@cn.fujitsu.com \
--cc=ian.campbell@citrix.com \
--cc=ian.jackson@eu.citrix.com \
--cc=rshriram@cs.ubc.ca \
--cc=wei.liu2@citrix.com \
--cc=wency@cn.fujitsu.com \
--cc=xen-devel@lists.xen.org \
--cc=yanghy@cn.fujitsu.com \
--cc=yunhong.jiang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.